Discrete Teaching-Learning-Based Optimization Algorithm for Traveling Salesman Problems

Abstract. In this paper, a discrete variant of TLBO (DTLBO) is proposed for solving the traveling salesman problem (TSP). In the proposed method, an effective learner representation scheme is redefined based on the characteristics of the TSP. Moreover, all learners are randomly divided into several sub-swarms with equal numbers of learners, so as to increase the diversity of the population and reduce the probability of being trapped in a local optimum. In each sub-swarm, the new positions of learners in the teaching phase and the learning phase are generated by the crossover operation, the legality detection and the mutation operation, and the offspring learners are then determined by greedy selection. Finally, to verify the performance of the proposed algorithm, benchmark TSP instances are examined, and the results indicate that DTLBO is effective compared with other algorithms used for TSP problems.


Introduction
The traveling salesman problem (TSP) is one of the most studied and famous problems in combinatorial optimization [1]. The main aim of the TSP is to obtain the shortest tour that starts from one city and visits each of the other cities exactly once before returning to the starting city. It is one of the most fundamental NP-complete optimization problems [2]. Since the TSP belongs to the class of NP-complete problems, the effort required to solve it grows exponentially with the number of cities. Thus, exact algorithms [3][4] are not capable of solving problems of large dimension. On the other hand, heuristics [5][6] are considered more practical and efficient for complex TSP instances and have become very popular with researchers.
Although heuristic methods can be used to solve complex TSP instances, they are easily trapped in local optima and cannot guarantee the optimal solution. As a result, a class of optimization algorithms named meta-heuristics has been proposed in recent years, and many such meta-heuristics have been applied to complex problems [7][8]. In contrast to heuristic algorithms, meta-heuristic algorithms make use of randomization of the search space to effectively explore and exploit promising solution regions. Hence, meta-heuristic algorithms offer implementation simplicity, adaptability, flexibility and robustness, and can obtain an optimal or near-optimal solution of high quality in relatively little time. Many meta-heuristic algorithms have therefore been developed to solve complex TSP problems [9][10].
Recently, a meta-heuristic algorithm called teaching-learning-based optimization (TLBO) has been developed [11]. TLBO and its variants have shown significant performance on combinatorial optimization problems [12,13]. In this paper, we present a new discrete version of TLBO to solve TSP problems. In the proposed method, the learner representation scheme is redefined based on the characteristics of the TSP, and the updating rules for learners are also redesigned. Finally, to verify that the proposed DTLBO algorithm is a promising method, 7 different TSP instances are examined. The results indicate that DTLBO is effective compared with other algorithms used for TSP problems.

Teaching-learning-based optimization
Teaching-learning-based optimization (TLBO) is a population-based optimization algorithm inspired by the philosophy of teaching and learning. The learners correspond to the individuals of evolutionary algorithms. The teacher is the learner with the best fitness; he or she shares knowledge with all the learners, thus enhancing the average grade of the class in the teaching phase. Each learner also learns from another randomly selected learner in the class, thus enhancing the average grade of the class in the learning phase.

Learner initialization
In the initial phase, all learners in the class are randomly initialized. The i-th learner X_i = (x_{i1}, x_{i2}, ..., x_{iD}) in the class can be generated as follows:

X_i = X_i^min + rand · (X_i^max − X_i^min)

where X_i^min and X_i^max are the lower and upper bounds of the control variables, respectively, and rand is a random vector within the range [0, 1].

Teaching phase
During the teaching phase, the task of the teacher is to increase the mean marks of the class. In this phase, all learners update their positions according to the difference between the current teacher and the mean position of the class. The updating formula of the i-th learner X_i in the class is expressed as follows:
newX_i = X_i + r · (Teacher − T_F · Mean)

where newX_i is the new position of the i-th learner X_i, Teacher is the best learner of the class, Mean is the mean position of the class, T_F is a teaching factor that decides the value of the mean to be changed (its typical value is either 1 or 2), and r is a random number in the range [0, 1].
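A minimal sketch of this continuous teaching-phase update (function and variable names are ours; the paper does not give an implementation):

```python
import numpy as np

def teaching_phase_update(X, teacher, rng):
    """One teaching-phase step of continuous TLBO.

    X       : (NP, D) array of current learner positions
    teacher : (D,) position of the best learner
    Returns the candidate positions newX (before greedy selection).
    """
    mean = X.mean(axis=0)                 # mean position of the class
    TF = rng.integers(1, 3)               # teaching factor, randomly 1 or 2
    r = rng.random(X.shape)               # random numbers in [0, 1]
    return X + r * (teacher - TF * mean)  # newX_i = X_i + r (Teacher - TF*Mean)
```

The candidate positions are then compared with the current ones by greedy selection.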

Learning phase
Learners can also increase their knowledge by means of group discussions, presentations, formal communications, and so on. In this phase, each learner updates his or her position by learning from another learner who has more knowledge. The updating formula of the i-th learner X_i in the class is expressed as follows:
newX_i = X_i + r · (X_k − X_i),  if f(X_k) is better than f(X_i)
newX_i = X_i + r · (X_i − X_k),  otherwise

where newX_i is the new position of the i-th learner X_i, X_k is a randomly selected learner from the class (k ≠ i), r is a random vector in the range [0, 1], and f(X_i) and f(X_k) are the fitness values of the learners X_i and X_k, respectively.
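The learning-phase rule can be sketched as follows (a sketch assuming minimization, so a lower fitness value is better; names are ours):

```python
import numpy as np

def learning_phase_update(X, fitness, rng):
    """One learning-phase step of continuous TLBO (minimization assumed).

    X       : (NP, D) array of learner positions
    fitness : (NP,) array with f(X_i) for each learner
    """
    NP, D = X.shape
    newX = np.empty_like(X)
    for i in range(NP):
        k = rng.choice([j for j in range(NP) if j != i])  # random partner
        r = rng.random(D)
        if fitness[k] < fitness[i]:          # partner is better: move toward it
            newX[i] = X[i] + r * (X[k] - X[i])
        else:                                # otherwise move away from it
            newX[i] = X[i] + r * (X[i] - X[k])
    return newX
```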
After the new candidate learner has been generated in the teaching and learning phases, greedy selection is executed to determine the offspring learner. That is, if the value f(newX_i) is better than the value f(X_i), then X_i is replaced with newX_i.

The proposed DTLBO
In this section, the discrete variant of TLBO is presented. First, a population with N individuals is initialized based on a problem-specific strategy. Then, all learners are randomly divided into several sub-swarms with equal numbers of learners. Moreover, the best individual and the mean individual in each sub-swarm are determined, and the crossover operation, the legality detection and the mutation operation are performed to generate a candidate learner for each learner in the teaching and learning phases. Finally, each learner is updated by comparing the current learner with the candidate learner. The above process ends when the termination criterion is met; otherwise, the next iteration is performed. The detailed description of the DTLBO algorithm is as follows.
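The random partitioning into sub-swarms can be sketched as follows (a sketch; the function name is ours, and we assume the population size is divisible by the sub-swarm size K):

```python
import random

def split_into_subswarms(learners, K=4):
    """Randomly partition the population into sub-swarms of K learners
    each. Assumes len(learners) is divisible by K."""
    idx = list(range(len(learners)))
    random.shuffle(idx)                       # random assignment each iteration
    return [[learners[i] for i in idx[j:j + K]]
            for j in range(0, len(idx), K)]
```

In DTLBO this partitioning is redone in every iteration, so sub-swarm membership keeps changing and diversity is maintained.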

Individual representation and initialization
For a TSP instance, the goal is to obtain the city sequence with the shortest path. Therefore, it is natural and intuitive to code the solution as a sequence of integer values whose length equals the number of cities, where each integer value corresponds to a unique city number. For a TSP instance with 7 cities, the learner can be represented as shown in Figure 1. Based on this representation, the learners of the population are generated by random initialization. Assume that the population size is NP and the city size of the TSP instance is N; then a matrix of size NP × N is generated after initialization. The i-th row of the matrix corresponds to the i-th learner individual, and each column of the i-th row represents a city. The column order represents the order of visiting cities. Concretely, the i-th learner is initialized as X_i = randperm(N), where randperm(N) yields a random permutation of the integers from 1 to N.
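The initialization above can be sketched as follows (the function name is ours; `rng.permutation` plays the role of randperm):

```python
import numpy as np

def init_population(NP, N, rng):
    """NP x N matrix: row i is the random tour of learner i, a permutation
    of the city numbers 1..N (the randperm(N) of the text)."""
    return np.array([rng.permutation(N) + 1 for _ in range(NP)])
```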

The determination of teachers and mean individuals
In the original TLBO algorithm, the teacher tries to disseminate knowledge among learners, which in turn helps learners to achieve good grades and increases the knowledge level of the entire class, thus improving the mean results of the class. The updating of each learner is implemented based on the teacher and the mean individual. However, all learners learn from the difference between the teacher and the mean individual, and this easily causes learners to become trapped in local optima. To avoid this situation, a multi-swarm mechanism is introduced to maintain the diversity of the population. That is, all learners are randomly divided into several sub-swarms with equal numbers of learners in each iteration. Suppose that there are K (K = 4 in this paper) learner individuals X_{i1}, X_{i2}, ..., X_{iK} in the i-th sub-swarm, and let NTeacher_i and NMean_i be the best individual and the mean individual of the i-th sub-swarm, respectively: NTeacher_i is the learner with the best fitness in the sub-swarm, and NMean_i is the element-wise mean of the K learners. Different from the original TLBO algorithm, in our approach NTeacher_i and NMean_i are integer vectors in which each element should correspond to a unique city number. However, the mean individual computed according to Eq. (6) generally contains repeated elements; for instance, some city numbers may appear twice while others do not appear at all (see the example in Figure 2). Since such duplicate elements make NMean_i illegal, legality detection for the learner individual must be executed.
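The mean individual can be sketched as below (a sketch; the exact averaging rule of Eq. (6) is not reproduced in the text, so element-wise rounding is our assumption):

```python
import numpy as np

def subswarm_mean(subswarm):
    """Element-wise rounded mean of the tours in one sub-swarm. The result
    is generally NOT a valid permutation: duplicate city numbers appear,
    which is why the legality detection is required."""
    return np.rint(np.mean(subswarm, axis=0)).astype(int)
```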

EITCE 2017
Generally, the legality detection for the individual NMean_i is executed as follows. Step 1: Count the number of times each city appears. For the above individual NMean_i, the cities numbered 3, 4 and 6 appear twice, and the city numbered 5 appears once. This can be expressed as a vector A: if a city appears more than once, the number of times it appears is written into the last location where it appears, and the other locations are set to 0.
Step 2: Determine the cities that did not appear. For the above individual NMean_i, the cities numbered 1, 2 and 7 do not appear. This can be expressed as a vector B: the number of a city that did not appear is written into the location associated with that number, and the other locations are set to 0.
Step 3: Find the last location of each city that appears more than once. In the vector C, if a city appears more than once, the number of this city is written into the last location where it appears, and the other locations are set to 0.
Step 4: Determine the locations for the cities that did not appear. For the above individual NMean_i, the numbers of the cities that did not appear are written into the locations marked in the vector C, and the final legal NMean_i is obtained.
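One simple realization of these four steps is sketched below. Note the assumptions: we keep the first occurrence of each duplicated city (the paper marks the last occurrence), and we fill in the missing cities in ascending order (the paper does not specify the fill-in order):

```python
def legalize(tour, N):
    """Repair an illegal tour over cities 1..N: keep the first occurrence
    of each city and overwrite later duplicates with the missing city
    numbers in ascending order (one variant of steps 1-4 above)."""
    seen = set()
    missing = iter(sorted(set(range(1, N + 1)) - set(tour)))
    out = []
    for c in tour:
        out.append(next(missing) if c in seen else c)  # replace repeats
        seen.add(c)
    return out
```

For the crossover operation described later, the inherited segment is additionally protected from replacement, so the repair there differs in which occurrence it overwrites.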

Teaching phase of DTLBO
In the original TLBO algorithm, the updating of each learner is implemented according to Eq. (2), which deals with continuous optimization problems. For discrete TSP problems, the updating strategy of the learners needs to be redefined.
Accordingly, assume that there are four learners X_{i1}, X_{i2}, X_{i3}, X_{i4} in the i-th sub-swarm. Their updating strategies are built from the crossover operation, denoted ⊗, which can be described in detail as follows. Assume that there are two learners A and B:

A = 2 3 |7 5 1| 4 6
B = 1 7 |4 6 5| 2 3

Then, two gene loci are randomly selected and the corresponding elements between the two gene loci are exchanged in A and B:

AA = 2 3 |4 6 5| 4 6
BB = 1 7 |7 5 1| 2 3

Because duplicate elements now exist in AA and BB, they are revised by means of the legality detection to obtain newA and newB:

newA = 2 3 |4 6 5| 7 1
newB = 6 4 |7 5 1| 2 3

After the crossover operation has been completed, the legality detection is executed so that the duplicate elements in the learner are revised. Then, the mutation operation, denoted ⊙, is executed to generate the candidate learner.
In this paper, reverse mutation is performed: two gene loci are randomly selected in the order of visiting cities, and the elements between the two gene loci are reversed to generate the new mutated individual. For example, if the two selected gene loci in newA are 2 and 5, the new mutated individual newAA is obtained by the mutation operation:

newA = 2 |3 4 6 5| 7 1
newAA = 2 |5 6 4 3| 7 1
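The worked example above behaves like a PMX-style (partially mapped) repair: the inherited segment is protected, and each displaced duplicate is resolved through the position-wise mapping between the two segments. Calling it PMX is our reading of the example, not the paper's term; a sketch that reproduces newA, newB and newAA exactly (0-based indices):

```python
def crossover(A, B, lo, hi):
    """Child copies A, takes B's segment [lo:hi), and resolves duplicates
    outside the segment via the mapping B[k] <-> A[k] (PMX-style)."""
    mapping = {B[k]: A[k] for k in range(lo, hi)}
    segment = set(B[lo:hi])
    child = list(A)
    child[lo:hi] = B[lo:hi]
    for i in list(range(lo)) + list(range(hi, len(A))):
        c = child[i]
        while c in segment:      # follow the mapping until the city is free
            c = mapping[c]
        child[i] = c
    return child

def reverse_mutation(tour, lo, hi):
    """Reverse mutation: reverse the elements between the two gene loci."""
    out = list(tour)
    out[lo:hi] = reversed(out[lo:hi])
    return out
```

With A = [2,3,7,5,1,4,6] and B = [1,7,4,6,5,2,3], `crossover(A, B, 2, 5)` gives newA, `crossover(B, A, 2, 5)` gives newB, and `reverse_mutation(newA, 1, 5)` gives newAA as in the example.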

Learning Phase of DTLBO
In the learning phase of the original TLBO algorithm, each learner is updated according to the difference between the current learner X_i and a randomly selected learner X_k. In the learning phase of the DTLBO algorithm, the current learner individual X_i is updated by the crossover operation ⊗, which is the same as the crossover operation in the teaching phase.
As in the teaching phase, when the crossover operation is completed, the duplicate elements in the learner are revised by the legality detection, and then the mutation operation ⊙ is executed to generate the candidate learner.
As in the original TLBO algorithm, greedy selection is executed to determine the offspring learners. That is, after the candidate learner has been generated in the teaching and learning phases, if the value f(newX_i) is better than the value f(X_i), then X_i is replaced with newX_i.
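For the TSP, "better" means a shorter tour; the fitness and greedy selection can be sketched as follows (function names are ours; Euclidean distance is assumed, as in the TSPLIB instances used later):

```python
import math

def tour_length(tour, coords):
    """Total Euclidean length of the closed tour; coords maps a city
    number to its (x, y) coordinates."""
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def greedy_select(X, newX, coords):
    """Greedy selection for the TSP: keep the candidate tour only if it
    is shorter (lower f is better here)."""
    return newX if tour_length(newX, coords) < tour_length(X, coords) else X
```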

DTLBO algorithm for solving TSP
In this section, to verify the validity and correctness of the algorithm, we compare the effectiveness and efficiency of the proposed algorithm with several other meta-heuristic algorithms on 7 classic TSP benchmark problems.

Parameter setting
The algorithms are tested on several sample instances from TSPLIB [14] with different numbers of cities. Every algorithm requires appropriate parameter settings to obtain proper outcomes. For a fair comparison, the population size of all population-based algorithms is set to 100 and the maximum number of iterations is set to 500 on all test problems.

Comparison of simulation result
The simulated experimental results are listed in Table 1. In the table, KOS denotes the known optimal solution as reported in TSPLIB, Best denotes the best solution found by each algorithm, and Mean denotes the average value over 20 independent runs. For Chn31, the average value obtained by DTLBO is closer to the known optimal solution 15397 than those of the other four algorithms. For Att48, the average value obtained by DTLBO is 33524, while the other four algorithms perform worse than DTLBO. For Eil51 and Eil76, ABC has the best performance in terms of the average value and DTLBO is better than ACO, PSO and GA; moreover, ABC reaches the known optimal solution of 426 for Eil51. For rand50 and rand75, DTLBO performs better than PSO and GA.
To demonstrate the effectiveness of the new DTLBO algorithm, the convergence curves and route tracings obtained by PSO, GA and DTLBO are given in Figure 3. We are only able to show convergence graphs for DTLBO, PSO and GA because the data for ACO and ABC were taken from the literature and this information was unavailable. It can be concluded from Table 1 and Figure 3 that DTLBO performs well on most of the TSP instances considered in this paper.

Conclusion
In this paper, a new discrete teaching-learning-based optimization algorithm (DTLBO) is proposed for solving the traveling salesman problem. The proposed approach uses a multi-swarm mechanism to maintain the diversity of the population and reduce the probability of being trapped in a local optimum, and introduces new crossover, legality detection and mutation operations into the teaching phase and the learning phase. Numerical simulation tests were carried out on 7 different TSP instances, and the results show that the DTLBO algorithm is a promising method for TSP problems. However, it can also be seen from the results that the performance of DTLBO decreases as the number of cities increases. Hence, future work will concentrate on improving DTLBO's performance for large-scale TSP instances.

Figure 1. Individual representation of learners

Figure 2. The legality detection for the learner individual

Figure 3. Oliver 30 city TSP optimization curve and path

Table 1. Comparison of simulation results. For Oliver30, DTLBO and ACO have the best performance in terms of the average value, 423.74, over all 20 runs, while the average values obtained by ABC, GA and PSO are 432, 455.45 and 464.71, respectively. For Chn31, DTLBO obtains an average value of 15397, while those of ACO, ABC, PSO and GA are 15602, 15942, 17097 and 16646; DTLBO is thus closer to the known optimal solution than the other algorithms.