Research on path planning of cleaning robot based on an improved ant colony algorithm

The conventional ant colony algorithm easily falls into local optima in complex environments, and the blindness of its initial search stage leads to long search times and slow convergence. To solve these problems, this paper proposes an improved ant colony algorithm and applies it to the path planning of a cleaning robot. The environmental map is modelled with the grid method, and an obstacle matrix is built to expand and process obstacles so that the robot avoids collisions as far as possible while moving. A directional factor is introduced into a new heuristic function, which reduces the inflection-point value of paths, improves the precision of the algorithm, and helps it avoid local optima. An adaptively adjusted pheromone volatilization factor and an improved pheromone updating rule not only address the problem of falling into local optima but also speed up the algorithm in its later stage. Simulation results show that the algorithm has better global search ability, converges noticeably faster, and can plan an optimal path in a complex environment.


Introduction
As a service-oriented mobile robot, the cleaning robot combines intelligent and automation technologies. It can complete cleaning tasks flexibly and efficiently, replacing dull and heavy human labour. It also improves people's living standards and has broad market prospects. Cleaning robots involve many active research topics in robotics, among which path planning is the most critical in determining the intelligence level of the robot. The primary purpose of path planning is to let the robot plan an optimal path from the starting point to the ending point while avoiding the obstacles in its environment [1].
Typical path planning methods include the artificial potential field method [2], the genetic algorithm [3], the particle swarm optimization algorithm [4], the neural network algorithm [5], etc. However, these algorithms have many problems, such as easily falling into local optima, poor stability and low efficiency. Compared with these algorithms, the ant colony algorithm has many advantages: strong robustness, a good distributed computing mechanism, and ease of combination with other algorithms, so that good path planning results can be achieved.
Although the ant colony algorithm (ACO) has the above advantages, traditional ACO also falls into local optima easily and, in a complex environment, has a long search time and slow convergence. For this reason, many researchers have improved ACO in terms of the search strategy, the pheromone updating and the combination with other intelligent algorithms. Reference [6] proposed a dynamic search operator to improve the quality of the solution and the convergence speed of the algorithm; reference [7] improved the search efficiency of the algorithm by combining ACO with the genetic algorithm; reference [8] added a new heuristic factor by introducing an obstacle repulsion weight to build a new transition probability that improves the ants' ability to avoid obstacles; reference [9] proposed a new mechanism of local and global pheromone updating to reduce the possibility that the algorithm falls into a local optimum.
To better solve these problems in the path planning of a cleaning robot, this paper improves ACO as follows: 1. a directional factor is added to the heuristic function to reduce the inflection-point value of the path, improve the orientation, enhance the accuracy and speed of the algorithm, and effectively avoid local optimal solutions; 2. the pheromone volatilization factor is adjusted adaptively, which not only avoids local optimal solutions but also improves the convergence speed of the algorithm in the later stage and so speeds up its operation; 3. the pheromone updating rule is improved so that pheromones are updated by high-quality paths, which avoids blind pheromone updating and improves the search efficiency of the algorithm.

Establishment of environment model based on grid method
Unlike a walking path in real physical space, the path planning of the algorithm is carried out in an abstract environmental model. The completed environmental model is stored in the computer so that path planning can be carried out conveniently, which lays the foundation for solving practical problems. Because the obstacle environment used in this article requires little computer memory, the grid method is chosen for space planning. The grid method is a grid-based planning method proposed by W.E. Howden in 1968 [10]. It essentially describes the environment as a set of unit cells of the same size, and it is a common way to model the environment in path planning. The grid method handles obstacle boundaries conveniently, and its data are easy to store and manipulate. Its working principle is to assume that the robot moves in a bounded two-dimensional space and to divide that space row by row into many cells of the same size. The state of each cell is marked according to whether it contains an obstacle: when there is an obstacle in the cell, its state is 1; otherwise, its state is 0. If an obstacle does not occupy an entire cell, it is still treated as if it occupied the entire cell. As shown in Fig.1, a white cell is obstacle-free, which means the robot can move into it, and a black cell is an obstacle cell, which means the robot cannot move into it. The robot is regarded as a mass point. When there are no obstacles around it, the robot can move in 8 directions, as shown in Fig.2.
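The grid model described above can be sketched in a few lines of Python. This is only an illustrative sketch: the names `MOVES` and `free_neighbours` and the 3x3 example map are assumptions, not part of the paper, and the sketch allows diagonal moves past obstacle corners, which the paper's obstacle expansion may treat differently.

```python
from typing import List, Tuple

Grid = List[List[int]]  # 1 = obstacle cell, 0 = free cell (as in the text)

# The 8 moves available to a point robot: 4 straight and 4 diagonal.
MOVES = [(-1, -1), (-1, 0), (-1, 1),
         (0, -1),           (0, 1),
         (1, -1),  (1, 0),  (1, 1)]


def free_neighbours(grid: Grid, cell: Tuple[int, int]) -> List[Tuple[int, int]]:
    """Return the free cells reachable from `cell` in one of the 8 directions."""
    rows, cols = len(grid), len(grid[0])
    r, c = cell
    result = []
    for dr, dc in MOVES:
        nr, nc = r + dr, c + dc
        # Keep the move only if it stays on the map and lands on a free cell.
        if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
            result.append((nr, nc))
    return result


# Hypothetical 3x3 map with a single obstacle in the centre.
grid = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
print(free_neighbours(grid, (0, 0)))
```

A cell in the interior of an obstacle-free map yields all 8 neighbours, which corresponds to the 8 movement directions of Fig.2.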

Basic ant colony algorithm
ACO is derived from the foraging behaviour of ants in nature. During foraging, each ant in a colony cooperates with the others by releasing pheromones, which share information about how good each foraging path is. The shorter the path, the more pheromones it carries. Eventually, the ant colony finds the shortest foraging path under the effect of positive feedback [11].

Transition probability function
Ants choose a path according to its length and its pheromone concentration. In any iteration, the probability that ant k (k=1,2,...,m) moves from the current position i to the next position j is given by formula (1), where α is the pheromone heuristic factor, representing the importance of the pheromone concentration τij(t); β is the expected heuristic factor, indicating the importance of the heuristic function ηij(t); τij(t) is the pheromone concentration between position i and position j in the t-th iteration; ηij(t) is the reciprocal of the distance from position i to position j, namely ηij(t)=1/dij; and allowk is the set of positions not yet visited [12].
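As an illustration, the sketch below implements the transition rule in the form commonly used for the basic ant system, in which each candidate j in allowk is weighted by τij^α · ηij^β and the weights are normalized to probabilities. This standard form is an assumption here, and the function names and example values are hypothetical.

```python
import random
from typing import Dict, Hashable, Iterable


def transition_probabilities(tau: Dict[Hashable, float],
                             eta: Dict[Hashable, float],
                             allowed: Iterable[Hashable],
                             alpha: float,
                             beta: float) -> Dict[Hashable, float]:
    """Probability of moving from the current node to each allowed node j.

    `tau[j]` and `eta[j]` hold the pheromone and heuristic values of the
    edge from the current node to j.
    """
    weights = {j: (tau[j] ** alpha) * (eta[j] ** beta) for j in allowed}
    total = sum(weights.values())
    if total == 0:
        # Degenerate case: fall back to a uniform choice.
        return {j: 1.0 / len(weights) for j in weights}
    return {j: w / total for j, w in weights.items()}


def choose_next(probs: Dict[Hashable, float], rng: random.Random) -> Hashable:
    """Roulette-wheel selection of the next node."""
    nodes = list(probs)
    return rng.choices(nodes, weights=[probs[j] for j in nodes], k=1)[0]


# Hypothetical example: two candidates with equal pheromone, "b" twice as far.
tau = {"a": 1.0, "b": 1.0}
eta = {"a": 1.0, "b": 0.5}  # eta = 1 / distance
probs = transition_probabilities(tau, eta, ["a", "b"], alpha=1.0, beta=2.0)
print(probs)
print(choose_next(probs, random.Random(0)))
```

With β = 2, halving the heuristic value of a candidate cuts its weight to a quarter, so the nearer node is chosen four times as often when pheromone levels are equal.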

Updating strategy of pheromones
When all ants have reached the target position or completed their path search, the pheromone concentration on every path is updated from the residual pheromone and the new pheromone deposited by each ant, where Q is a constant denoting the pheromone intensity and Lk is the total length of the path taken by ant k in this iteration [13].
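A minimal sketch of this update is given below, assuming the usual ant-system form: every edge first keeps a fraction (1 - ρ) of its pheromone, then each ant k deposits Q/Lk on every edge of its path. The function name, the directed-edge representation and the example values are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Node = Tuple[int, int]
Edge = Tuple[Node, Node]


def update_pheromone(tau: Dict[Edge, float],
                     paths: List[List[Node]],
                     lengths: List[float],
                     rho: float,
                     q: float) -> Dict[Edge, float]:
    """Evaporate pheromone on all edges, then deposit Q / L_k along each path."""
    # Residual pheromone after volatilization.
    new_tau = {edge: (1.0 - rho) * value for edge, value in tau.items()}
    # New pheromone left by each ant on the edges it travelled.
    for path, length in zip(paths, lengths):
        deposit = q / length
        for a, b in zip(path, path[1:]):
            new_tau[(a, b)] = new_tau.get((a, b), 0.0) + deposit
    return new_tau


# Hypothetical example: one ant walks two edges with total length 2.
tau0 = {((0, 0), (0, 1)): 1.0, ((0, 1), (0, 2)): 1.0, ((0, 0), (1, 0)): 1.0}
tau1 = update_pheromone(tau0, [[(0, 0), (0, 1), (0, 2)]], [2.0], rho=0.5, q=2.0)
print(tau1)
```

Edges on the ant's path end up with more pheromone than the unused edge, which is the positive feedback described earlier.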

Construction of heuristic function
When basic ACO solves the path planning problem, the heuristic function ηij(t) depends only on the distance dij from the current position i to the next position j. Ants inevitably generate a large number of crossing paths in the initial search process. Because of the limitation of the taboo table, ants fall into a local optimum during the optimization process, and the resulting path is not the optimal path. Therefore, the new heuristic function introduces a directional factor, which has the following effects: it reduces the inflection points of the path, improves the orientation, guides ants towards paths of short length and few inflection points, strengthens the accuracy and speed of the algorithm, and prevents ants from falling into a local optimum. The heuristic function after adding the directional factor is given by formula (4), where φij represents the directional factor, which is the value of the inflection point, as shown in formula (5). Suppose the node at the current position is i, the node at the next candidate position is j, and the ending point is E. Then θ is the angle between the line from the current node i to the next node j and the line from the next node j to the ending point E, as shown in Fig.3.

Fig.3 A schematic of the path of the node to be selected
The inflection-point value of the candidate node j is given by formula (5). In formula (5), when θ is smaller, the value of the inflection point is smaller, that is, the value of the directional factor is smaller. Therefore, the path will be shorter and the node will be more easily selected. When θ is 0, the path has no inflection point and the value of the directional factor is extremely small, that is, the value of the heuristic function ηij is very large. Because of this, the ant will move to that node.
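The angle θ defined above can be computed from grid coordinates as in the sketch below. The sketch covers only the geometry of θ and does not reproduce formulas (4) or (5); the function name and the convention of returning 0 when j coincides with E are assumptions.

```python
import math
from typing import Tuple

Node = Tuple[float, float]


def turning_angle(i: Node, j: Node, e: Node) -> float:
    """Angle in radians between the segment i->j and the segment j->e."""
    v1 = (j[0] - i[0], j[1] - i[1])
    v2 = (e[0] - j[0], e[1] - j[1])
    n1 = math.hypot(v1[0], v1[1])
    n2 = math.hypot(v2[0], v2[1])
    if n1 == 0 or n2 == 0:
        # Assumed convention: no turn if i == j or if j is already the end point.
        return 0.0
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.acos(cos_theta)


# Candidate on the straight line to the end point: theta is (almost) 0.
print(turning_angle((0, 0), (1, 1), (3, 3)))
# Candidate that requires a right-angle turn towards the end point.
print(turning_angle((0, 0), (1, 0), (1, 3)))
```

A candidate lying on the straight line from i to E gives θ close to 0, the case in which the text says the path has no inflection point.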

Strategy of adjusting adaptively the volatile factor of pheromones
In basic ACO, the pheromone volatilization factor ρ∈[0,1] remains unchanged during the entire path planning process, and it has an important influence on the global path search and on the convergence of the algorithm. The algorithm has different performance requirements at different stages, so a fixed value of ρ cannot optimize its performance. In response to this problem, this paper proposes a strategy that adaptively adjusts the pheromone volatilization factor according to the complexity of the environment and the number of iterations, as shown in formula (6). This strategy enhances the global search ability of the ants, prevents local optima, improves the convergence speed of the algorithm in the later stage, and speeds up its running rate. Here n is the current iteration number, nmax is the total number of iterations, and nTa and nTb are the iteration thresholds that delimit the stages of adjusting the pheromone volatilization factor. The basic idea of this strategy is as follows. When n≤nTa, the pheromone volatilization factor is set to a small initial value ρ0. At this time, more pheromones are left on the path, so that the ants obtain strong optimization ability through the guidance of positive feedback. When nTa≤n≤nTb and the length of the optimal path stays the same for Ta consecutive iterations, the value of the pheromone volatilization factor is adjusted adaptively. Since nTb/n<1 as n grows larger, the pheromone volatilization factor becomes smaller after the root is taken. As a result, the global search ability of the ants is enhanced, the ants avoid falling into a local optimum, and the final path is prevented from being a non-optimal path.
When n≥nTb and the length of the optimal path stays the same for Tb consecutive iterations and is less than or equal to that of the historical optimal path, or when the optimal path found in the current iteration is longer than the historical optimal path, the pheromone volatilization factor is adjusted adaptively again. Since (nmax-n+1)/nmax<1, the pheromone volatilization factor becomes larger after the root is taken. As a result, subsequent ants search for the optimal path faster, and the convergence of the algorithm in the later stage is accelerated.

Improvement of the updating rule of pheromones
In basic ACO, the pheromones of all paths are updated after each iteration. In the early stage of the algorithm, when the ants are still exploring, it is better for them to try many paths. But once an ant has found the optimal path, updating the pheromones on all paths after each iteration may still lead ants onto non-optimal paths, which makes convergence to the optimal path very slow or even prevents the optimal path from being found [14]. To avoid this situation, an update rule based on high-quality ants is proposed. Under this rule, after each iteration, deadlocked ants are discarded and the ants that reach the end are selected. The high-quality paths of the selected ants are then sorted by path length from long to short, and the pheromones of the high-quality paths are updated globally according to formula (7) [15].
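The selection step of this rule can be sketched as follows: paths that do not end at the goal are treated as deadlocked and dropped, and the remaining paths are sorted from long to short. The sketch stops before the pheromone update of formula (7), which is not reproduced here; the function names and the example paths are hypothetical.

```python
import math
from typing import List, Tuple

Node = Tuple[int, int]


def path_length(path: List[Node]) -> float:
    """Sum of Euclidean step lengths along a path of grid cells."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


def select_high_quality(paths: List[List[Node]], goal: Node) -> List[List[Node]]:
    """Discard deadlocked ants and sort the remaining paths from long to short."""
    finished = [p for p in paths if p and p[-1] == goal]
    return sorted(finished, key=path_length, reverse=True)


goal = (2, 2)
paths = [
    [(0, 0), (1, 1), (2, 2)],                  # reaches the goal diagonally
    [(0, 0), (1, 0)],                          # deadlocked before the goal
    [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)],  # reaches the goal along the edge
]
ranked = select_high_quality(paths, goal)
best = ranked[-1]  # shortest path is last when sorting from long to short
print([round(path_length(p), 3) for p in ranked])
print(best)
```

Because the list runs from long to short, the position of a path in `ranked` can serve as a rank for weighting its pheromone deposit.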
The local pheromone updating is processed by segment, as shown in formula (8). Because the pheromone volatilization factor is adjusted adaptively, new pheromones accumulate on high-quality paths in the early stage of the algorithm. The shorter the path, the more pheromones accumulate. This lets high-quality paths play an effective role in the subsequent process, so that subsequent ants are more likely to find the optimal path and to find it quickly. However, as the number of iterations increases, the accumulation of new pheromones may become excessive, which reduces the global search ability of the ants, and the final path may be non-optimal. To solve this problem, after the pheromone volatilization factor has been adjusted adaptively, if either of the following situations occurs: 1. the number of current iterations is greater than the preset number of iterations; 2. the current optimal path is longer than the historical optimal path, then the pheromones on paths that are longer than the average length of all high-quality paths are weakened according to formula (9), and the pheromones on the historical optimal path are strengthened again according to formula (8).
In these formulas, λ is the proportional increasing coefficient, whose value is a positive number less than 1; Nk is the sequence number of the high-quality path that the current ant passes through in this iteration; Nmax is the total number of high-quality paths in this iteration; and the two remaining quantities are the length of the high-quality path that the ant passes through and the average length of all high-quality paths in this iteration. Fig.4 shows the flow chart of the improved ant colony algorithm in this paper. The specific steps are as follows:

Implementation steps of improved ant colony algorithm
Step 1: Design a grid map, and set the starting grid, the ending grid and obstacle grids.
Step 2: Initialize the parameters, such as the number of ants M, the total number of iterations N, the initial pheromone matrix τ, the pheromone volatilization factor ρ, the pheromone intensity Q, the pheromone heuristic factor α, the expected heuristic factor β, and the taboo table.
Step 3: Search for the path from the starting point. Calculate the value of each inflection point according to formula (5), and substitute it into formula (4) to obtain the value of the heuristic function. Then determine the next reachable node according to the transition probability formula (1), and add it to the taboo table. Continue searching until the ant reaches the ending node or has no reachable nodes left.
Step 4: After all ants have completed a path search, record the paths found by the ants that reached the end. Sort these high-quality paths by length from long to short, and select the optimal path of this iteration.
Step 6: Determine whether the number of iterations has reached the maximum. If not, return to Step 3; otherwise, output the optimal path.

Simulation experiment and result analysis
To verify the effectiveness and practicability of this algorithm in a complex environment with obstacles, path planning simulation experiments on 20x20 and 30x30 environmental models were carried out in MATLAB R2018a. In the same environmental models, the results of the proposed algorithm (T-IACO) are compared with those of basic ACO and with those of the improved ant colony algorithm of reference [9] (L[9]-IACO). The parameters of these experiments are shown in Table 1:

20x20 simple simulation environment
In the 20x20 simple environmental model, the path planning results of the three algorithms are presented in Fig.5: Fig.5 (1) shows the result of basic ACO, Fig.5 (2) shows the result of L[9]-IACO, and Fig.5 (3) shows the result of T-IACO. The convergence curves of the three algorithms are shown in Fig.6: the blue curve is obtained by basic ACO, the green curve by L[9]-IACO, and the red curve by T-IACO. The data of the simulation experiment are shown in Table 2. Fig.5 shows that basic ACO falls into a local optimum during the search, and the subsequent ants keep searching this path under positive feedback. By contrast, the pheromone updating of L[9]-IACO and T-IACO leads the ants to avoid the local optimum. As Fig.6 shows, the new heuristic function constructed with the directional factor enhances the accuracy and speed of T-IACO. Therefore, T-IACO plans the best path among the three algorithms and converges after the seventh iteration. As shown in Table 2, because its pheromone volatilization factor is adjusted adaptively, T-IACO not only has the fastest convergence rate of the three algorithms but also the shortest running time, 6.875 s.

30x30 complex simulation environment
In the 30x30 complex environmental model, the path planning results are presented in Fig.7: Fig.7 (1) shows the result of basic ACO, Fig.7 (2) shows the result of L[9]-IACO, and Fig.7 (3) shows the result of T-IACO. The convergence curves of the three algorithms are shown in Fig.8: the blue curve is obtained by basic ACO, the green curve by L[9]-IACO, and the red curve by T-IACO. The data of the simulation experiment are shown in Table 3. Compared with the other two algorithms, T-IACO still makes a clear improvement in avoiding local optima. The running times of the algorithms are similar, but Fig.7 shows that the optimal path of T-IACO is the shortest and has the fewest inflection points. In Fig.8, T-IACO has the fastest convergence speed and needs the fewest iterations. Table 3 shows that T-IACO effectively reduces the path search time and finds the optimal path. The simulation results show that T-IACO retains high global performance and fast convergence in the complex environment, which demonstrates its superiority and effectiveness. Because the driving time and the number of turns of the robot are reduced, the energy consumption of the robot is also reduced. Moreover, the larger and more complex the environment is, the greater the advantage of T-IACO.
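The comparison above relies on the number of inflection points of a planned path. One simple way to count them, sketched below, is to count every change of step direction between consecutive moves. This sketch assumes the path is a list of grid cells joined by single-cell moves in one of the 8 directions; the function name and the example paths are hypothetical and are not taken from the experiments.

```python
from typing import List, Tuple

Node = Tuple[int, int]


def count_inflections(path: List[Node]) -> int:
    """Count the points where the step direction changes along a grid path."""
    turns = 0
    for a, b, c in zip(path, path[1:], path[2:]):
        step_in = (b[0] - a[0], b[1] - a[1])
        step_out = (c[0] - b[0], c[1] - b[1])
        # With single-cell moves, a different step vector means a turn at b.
        if step_in != step_out:
            turns += 1
    return turns


straight = [(0, 0), (1, 1), (2, 2)]
one_turn = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)]
two_turns = [(0, 0), (0, 1), (1, 2), (2, 2)]
print(count_inflections(straight), count_inflections(one_turn), count_inflections(two_turns))
```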

Conclusions
This paper proposes an improved ant colony algorithm to solve the problems of local optima and slow convergence. A directional factor is introduced to construct a new heuristic function, with the following effects: it reduces the inflection points of the path, improves the orientation of the algorithm, guides the ants towards short paths with few inflection points, improves the accuracy of the algorithm, and helps the algorithm avoid falling into a local optimum. The pheromone volatilization factor is adjusted adaptively during the ants' search: at the early stage of the algorithm, it is small, to enhance the global search ability and prevent ants from falling into a local optimum; at the later stage, it is enlarged, to improve the operating efficiency and accelerate the convergence of the algorithm. The improved pheromone updating rule, combined with the adaptive adjustment of the pheromone volatilization factor, prevents the 'premature' phenomenon and improves the robustness of the algorithm. Simulation results show that T-IACO is feasible and effective and can remedy the deficiencies of basic ACO in path planning, thereby reducing the energy consumption of cleaning robots and improving the level of intelligence and automation of robots.