Improved Simulated Annealing Genetic Algorithm based Low power mapping for 3D NoC

Mapping IP (Intellectual Property) cores onto NoC (Network-on-Chip) architectures is a key step in NoC-based designs, and energy is a key metric for evaluating such designs. We therefore propose an Improved Simulated Annealing Genetic Algorithm, abbreviated as ISAGA. The algorithm combines the parallelism of the Genetic Algorithm (GA) with the local search ability of Simulated Annealing (SA). We also improve the initial population selection of the GA to obtain mapping schemes with lower power consumption. Experimental results show that, compared with the GA, ISAGA converges well and searches for the optimal solution quickly, which effectively reduces the power consumption of the system. In the case of 124 IP cores, the average power consumption of ISAGA is reduced by 32.0% compared with the GA.


Introduction
With the rapid development of nanometer and VLSI technology, integrating a large number of IP cores into a single chip has become a challenge for the System-on-Chip (SoC) [1]. The emergence of the NoC architecture addresses this SoC problem at the architectural level [2]. However, with the rapid increase in the number of IP cores, 2D NoC faces a range of issues such as chip area, performance, bandwidth, and power consumption [3]. Therefore, the concept of 3D NoC was proposed [4]. Compared with 2D NoC, it has a smaller area and shorter delay, and it greatly improves performance and power consumption [5].
3D NoC research is classified into three major categories: 3D NoC infrastructure, 3D NoC communication mechanisms, and 3D NoC mapping optimization [5]. A NoC mapping algorithm assigns the processing units in the task graph to the resource nodes of the NoC architecture, based on the given task characteristic graph and the NoC architecture [7]. This article is based on the 3D-mesh topology and mainly studies the algorithm for mapping onto the NoC architecture. Similar to the task allocation problem, the IP core mapping problem is NP-hard: we explore the NoC design space to find the Pareto-optimal mapping sequence [8]. Much research has been carried out on this problem. Ascia G proposed a genetic algorithm to solve the multi-objective optimization problem of NoC mapping [7]. Chang proposed an improved tabu search algorithm that continuously searches for the best mapping scheme through local search and elite reorganization [10]. Li proposed a power consumption model for 3D-mesh and, building on 2D NoC research methods, implemented a NoC mapping targeting communication energy consumption using an ant colony algorithm [11]. Kartikeya Bhardwaj proposed the C3Map and ARPSO based mapping algorithms for energy-efficient 3D NoC architectures [12]. P. K. Hamedani proposed an ILP-based static thermal sensing mapping algorithm [13]. Xiao-Hang Wang proposed an energy-efficient run-time incremental mapping algorithm using TSV technology [14]. F. Ge and G. Feng proposed a thermal-aware mapping algorithm for 3D-mesh network-on-chip architectures [15]. However, these mapping algorithms still have deficiencies and need further improvement.
In this paper, we propose an Improved Simulated Annealing Genetic Algorithm (ISAGA) based low-power mapping for 3D NoC architectures. Our main contributions are as follows:
1) We obtain a better mapping scheme by improving the initial population. Improving the initial population effectively raises the quality of the population, so that the algorithm can search for the optimal individual (the optimal mapping scheme) faster and more accurately.
2) We add simulated annealing at the mutation operator stage. The addition of the simulated annealing algorithm makes the algorithm less likely to fall into a local optimum and strengthens its global optimization ability.
The rest of the paper is organized as follows. We describe the NoC mapping problem and give the power consumption model in Section 2. We introduce the principles of GA and ISAGA and give the specific steps in Section 3. We present experiments that demonstrate the effectiveness of the approach in Section 4. Finally, we conclude our work in Section 5.

Mapping Description
3D NoC mapping allocates IP cores to the resource nodes of the NoC according to a mapping algorithm, based on the known NoC architecture and IP core traffic, so that the entire NoC has the smallest amount of traffic and power consumption is ultimately reduced. The mapping process is shown in Figure 1. The subtasks correspond to the IP cores in the NoC architecture.
Definition 1 A weighted directed acyclic task feature graph is defined as $MCG = \langle V, T \rangle$. Each vertex $v_i \in V$ represents a task, and each directed arc $t_{i,j} \in T$ represents the communication between tasks $v_i$ and $v_j$; its weight $\omega_{i,j}$ is the amount of communication data between them. If there is no communication relationship between $v_i$ and $v_j$, then
$$\omega_{i,j} = 0 \tag{1}$$
Definition 2 A NoC topology structure diagram is defined as $NTD = \langle E, P \rangle$. Each vertex $e_i \in E$ represents a resource node, and $p_{i,j} \in P$ represents a communication path between $e_i$ and $e_j$. The weight $f_{i,j}$ of the path $p_{i,j}$ is the communication volume between the two nodes.
Definition 3 For a 3D-mesh NoC, $M_{i,j}$ denotes the hop distance (Manhattan distance) from $e_i$ to $e_j$, where $(x_i, y_i, z_i)$ and $(x_j, y_j, z_j)$ are the coordinates of the two nodes, as expressed in Equation (2).
$$M_{i,j} = |x_i - x_j| + |y_i - y_j| + |z_i - z_j| \tag{2}$$
Therefore, the traffic $f_{i,j}$ between the resource nodes $e_i$ and $e_j$ can be expressed by Equation (3).
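As an illustration of Definition 3, the following minimal sketch computes the hop distance between two resource nodes of a 3D mesh. The node numbering (row by row within a layer, then layer by layer) and the function names are assumptions made for this example; the paper does not specify them.

```python
from typing import Tuple

Coord = Tuple[int, int, int]


def node_to_coord(node: int, dim_x: int, dim_y: int) -> Coord:
    """Convert a linear node index to (x, y, z).

    Assumed numbering: x varies fastest, then y, then z (one layer after another).
    """
    z, rest = divmod(node, dim_x * dim_y)
    y, x = divmod(rest, dim_x)
    return (x, y, z)


def manhattan(a: Coord, b: Coord) -> int:
    """Hop (Manhattan) distance between two mesh coordinates, as in Equation (2)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1]) + abs(a[2] - b[2])


# Example on a 3 x 3 x 3 mesh: opposite corners are six hops apart.
corner_a = node_to_coord(0, 3, 3)
corner_b = node_to_coord(26, 3, 3)
print(corner_a, corner_b, manhattan(corner_a, corner_b))
```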

Power Consumption Model
The power consumption of a NoC system consists mainly of the energy consumed by the processing nodes during operation and processing and the energy consumed by the network topology during data transmission [17]. The energy consumed by the network topology has two parts: the energy consumption of the routing nodes and the energy consumption of the communication paths. To minimize power consumption, this paper adopts the power consumption model proposed in [18]. The energy consumed in transmitting one bit of data in the NoC can be expressed by Equation (4).
$$E_{bit} = E_{Sbit} + E_{Bbit} + E_{Wbit} + E_{Lbit} \tag{4}$$
Here $E_{Sbit}$, $E_{Bbit}$, $E_{Wbit}$, and $E_{Lbit}$ represent the energy consumed in crossbars, buffers, internal interconnects, and the links between adjacent routing nodes, respectively. Because $E_{Bbit}$ and $E_{Wbit}$ are very small in practice, they are usually neglected, so Equation (4) becomes
$$E_{bit} = E_{Sbit} + E_{Lbit} \tag{5}$$
From Equation (5), the energy consumed in sending one bit of data from node $e_i$ to node $e_j$ can be calculated, which leads to Equation (9). In Equation (9), all quantities except the summation term are constants, so the objective function of low-power mapping reduces to minimizing that summation. In summary, we design a good mapping algorithm to reduce the amount of traffic and ultimately reduce the NoC communication power consumption.
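To make the objective concrete, the sketch below evaluates the total traffic of one mapping. It assumes that the traffic contributed by a pair of communicating tasks is their communication volume multiplied by the hop distance between the nodes they are mapped to; this is a common formulation but an assumption here, and the names `mapping_cost`, `traffic`, and `placement` are hypothetical.

```python
from typing import Dict, Tuple

Coord = Tuple[int, int, int]


def mapping_cost(traffic: Dict[Tuple[int, int], float],
                 placement: Dict[int, Coord]) -> float:
    """Sum of (communication volume x hop distance) over all task pairs.

    traffic:   {(source task, destination task): communication volume}
    placement: {task: (x, y, z) coordinate of the node it is mapped to}
    Assumption: cost is volume times Manhattan distance.
    """
    total = 0.0
    for (src, dst), volume in traffic.items():
        a, b = placement[src], placement[dst]
        hops = sum(abs(p - q) for p, q in zip(a, b))
        total += volume * hops
    return total


# Hypothetical three-task example: moving task 2 closer to task 1 lowers the cost.
traffic = {(0, 1): 10.0, (1, 2): 5.0}
far = {0: (0, 0, 0), 1: (1, 0, 0), 2: (1, 1, 1)}
near = {0: (0, 0, 0), 1: (1, 0, 0), 2: (1, 1, 0)}
print(mapping_cost(traffic, far), mapping_cost(traffic, near))
```

Under this assumption a lower cost corresponds to less total traffic, so a fitness function for the GA could be any decreasing function of this value.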

Genetic Algorithms and Improvements
GA is a computational model based on Darwin's theory of evolution and Mendel's theory of genetics. It is a heuristic search method that searches for optimal solutions by simulating the natural evolution process [19].
The IP core mapping problem is NP-hard. GA covers a large part of the search space, which is conducive to global optimization, and it is easy to parallelize. These properties give it advantages in solution quality and computing time, and make it well suited to the IP core mapping problem.
However, the GA is highly random and the quality of its initial population is low. We therefore propose a method that effectively improves the initial population. To address the weakness of the GA in local optimization, we also add SA at the mutation operator stage so that the offspring can jump out of local optima and tend toward the global optimum. In the following, we call the resulting algorithm ISAGA. The entire flow of the algorithm is shown in Figure 2 (the dotted box in the figure marks what we propose in this paper).

Coding scheme
This paper adopts real number coding. A vector is used to represent the assignment of the task graph to the IP cores. Assume the vector $X = (x_1, x_2, \cdots, x_n)$, where $n$ is the number of IP cores and $x_i$ is the IP core number. All IP cores together form a chromosome, which represents one mapping scheme.
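The coding scheme can be illustrated with a short sketch. It assumes that a chromosome is a permutation of the IP core numbers 0 to n-1 and that position i of the vector corresponds to resource node i; the paper does not state the exact convention, and the function names are hypothetical.

```python
import random
from typing import List


def random_chromosome(n: int, rng: random.Random) -> List[int]:
    """Return a random mapping scheme: a permutation of the IP core numbers 0..n-1."""
    genes = list(range(n))
    rng.shuffle(genes)
    return genes


def is_valid(chromosome: List[int]) -> bool:
    """A chromosome is valid if every IP core number appears exactly once."""
    return sorted(chromosome) == list(range(len(chromosome)))


rng = random.Random(42)  # fixed seed so the example is repeatable
individual = random_chromosome(8, rng)
print(individual, is_valid(individual))
```

Because every gene must stay unique, crossover and mutation operators applied to this encoding need to preserve the permutation property, for example by swapping two positions.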

Initial population improvement
The initial population of the GA is selected randomly, so its quality is low. We improve the initial population selection method so that the best population can be obtained within tens to about a hundred generations. This speeds up the iteration of the population and greatly reduces the data traffic, which ultimately reduces power consumption. In [20], a greedy algorithm is proposed to construct the initial population, which improves the initial population quality and reduces power consumption. However, that algorithm requires a large amount of calculation for each individual when the initial population is generated. When determining the position at which each individual is mapped onto the 3D NoC, it must calculate the fitness value of all unmapped individuals for all available positions, and every individual has to go through this operation, so generating the initial population is inefficient.
We use a depth-first approach to traverse the task graph and determine where each individual is mapped onto the 3D NoC, without calculating the fitness value of all unmapped individuals for all available positions. It is only necessary to calculate the Manhattan distance between an individual and its neighbors. Our method greatly reduces the amount of calculation and effectively improves the efficiency of the algorithm. The specific steps of the algorithm are as follows:
1) Construct two arrays to store the node access flags of the task graph MCG and the topology graph NTD (nodes are numbered from 0).
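The depth-first idea can be sketched as follows. This is not the authors' exact procedure, whose remaining steps are not listed here. It is a minimal illustration under the assumption that each task is placed on the free resource node nearest (in Manhattan distance) to the task from which it was reached. The function names and the tie-breaking rule are hypothetical.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int, int]


def manhattan(a: Coord, b: Coord) -> int:
    """Hop distance between two mesh coordinates."""
    return sum(abs(p - q) for p, q in zip(a, b))


def dfs_initial_mapping(adjacency: Dict[int, List[int]],
                        nodes: List[Coord],
                        start: int = 0) -> Dict[int, Coord]:
    """Build one initial individual by depth-first traversal of the task graph.

    adjacency: task -> list of tasks it communicates with (every task is a key).
    nodes:     coordinates of the resource nodes, numbered from 0.
    """
    if len(nodes) < len(adjacency):
        raise ValueError("need at least one resource node per task")

    # The two access-flag arrays: one for the task graph, one for the topology.
    task_visited = {task: False for task in adjacency}
    node_used = [False] * len(nodes)
    placement: Dict[int, Coord] = {}

    def nearest_free(anchor: Coord) -> int:
        # Only distances to the anchor are computed, not full fitness values.
        free = [i for i, used in enumerate(node_used) if not used]
        return min(free, key=lambda i: manhattan(nodes[i], anchor))

    def visit(task: int, anchor: Coord) -> None:
        task_visited[task] = True
        idx = nearest_free(anchor)
        node_used[idx] = True
        placement[task] = nodes[idx]
        for neighbour in adjacency[task]:
            if not task_visited[neighbour]:
                visit(neighbour, placement[task])

    # Start from the chosen task, then cover any disconnected tasks.
    for task in [start] + [t for t in adjacency if t != start]:
        if not task_visited[task]:
            visit(task, nodes[0])
    return placement


# Hypothetical four-task graph on a 2 x 2 x 1 mesh.
tasks = {0: [1, 2], 1: [3], 2: [], 3: []}
mesh = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
print(dfs_initial_mapping(tasks, mesh))
```

Varying the start task or the neighbor order is one possible way to obtain several different individuals for the initial population; that choice is also an assumption of this sketch.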

Simulated Annealing Genetic Algorithm
Simulated Annealing (SA) was proposed by N. Metropolis [21] in 1953. It is a stochastic optimization algorithm based on the Monte Carlo iterative solution strategy. It can jump out of a local optimum probabilistically and eventually tends toward the global optimum. It compensates for the poor local search ability and premature convergence of the genetic algorithm. Combining it with the GA gives full play to the GA's global search ability and SA's local search ability.
Based on this consideration, we add SA at the mutation operator stage of the GA so that the generated offspring do not become trapped in a local optimum; the population can jump out of local optima and eventually approach the global optimum. The core idea of the algorithm is as follows. Let the fitness of a new individual be $f$ and the threshold of change be $f'$. When $f > f'$, the new individual is accepted; otherwise, it is accepted with probability $P = \exp((f - f')/T)$, where $T$ is the temperature. The algorithm is described as follows:
1) Initialization: set the initial temperature $T = T_0$ and $i = 0$.
2) When $T > T_{final}$, do the following steps; otherwise, return the optimal solution.
3) If $i \le$ the number of crossover operations, do the following steps; otherwise, return the optimal solution.
4) Select $n$ pairs of individuals from the population as parents, and for each pair do the following:
   a) Cross the parents $P_1$ and $P_2$ to generate the children $S_1$ and $S_2$.
   b) If $f_{S_1} > f_{P_1}$ and $f_{S_2} > f_{P_2}$, use $S_1$, $S_2$ to replace $P_1$, $P_2$ in the next genetic operation; otherwise, make the replacement with probability $\exp((f_{S} - f_{P})/T)$.
5) Lower the temperature according to the cooling schedule: $T = T_{i+1}$, $i = i + 1$.
6) Return to step 3.
Here $T_0$ and $T_{final}$ denote the initial temperature and the termination temperature, respectively.
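The acceptance rule described above can be sketched in a few lines. The sketch assumes that a larger fitness value is better, as the condition $f > f'$ suggests, and it uses a geometric cooling schedule with factor `alpha`, which is an assumption since the text only says the temperature is lowered "in a certain way". The function names are hypothetical.

```python
import math
import random
from typing import Iterator


def accept(child_fitness: float, parent_fitness: float,
           temperature: float, rng: random.Random) -> bool:
    """Metropolis-style acceptance: always accept a better child, and accept a
    worse (or equal) one with probability exp((f_child - f_parent) / T)."""
    if child_fitness > parent_fitness:
        return True
    probability = math.exp((child_fitness - parent_fitness) / temperature)
    return rng.random() < probability


def cooling_schedule(t0: float, t_final: float, alpha: float = 0.9) -> Iterator[float]:
    """Yield temperatures from t0 down to (but not below) t_final.

    Geometric cooling (T <- alpha * T) is an assumed schedule.
    """
    temperature = t0
    while temperature > t_final:
        yield temperature
        temperature *= alpha


rng = random.Random(0)
for temperature in cooling_schedule(100.0, 10.0, alpha=0.5):
    # A child that is 5 units worse is accepted less often as T falls.
    print(temperature, accept(95.0, 100.0, temperature, rng))
```

At high temperature the exponent is close to zero, so worse children are accepted often and the search explores widely. As the temperature falls, the rule approaches a purely greedy replacement.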

Experimental results and analysis
The source code for this paper is written in C++ using Visual Studio 2012 and runs in a Windows environment. To verify the feasibility of the algorithm, we use three classic task graphs for comparison experiments. To demonstrate the generality of the algorithm, we also run comparison experiments on random task graphs generated by TGFF [22].

Parameter Design
The algorithm in this paper is a mapping algorithm based on GA, and many parameters affect its performance. By controlling these parameters, we can improve the efficiency of finding the optimal solution. The specific parameter settings are shown in Table 1.

Comparison and Analysis of Experimental Results
The three classic task graphs used in this experiment are MWD, VOPD, and DVOPD. They have 12, 16, and 32 nodes, respectively.

Comparison of Convergence Rates Based on Classical APCG
In this section, we compare the convergence speeds of GA and ISAGA on the classic task graphs MWD, VOPD, and DVOPD. The abscissa represents the number of iterations, and the ordinate represents the total amount of communication. The smaller the total amount of communication, the lower the system power consumption. The comparison of GA and ISAGA for MWD, VOPD, and DVOPD is shown in Figure 6(a)-(c). According to Figure 6(a)-(c), the total traffic decreases gradually as the number of iterations increases, and ISAGA has a significantly lower total communication volume than the GA. Because ISAGA starts from a higher-quality initial solution, it reaches a better solution in fewer iterations. ISAGA therefore has an advantage in convergence speed, and this advantage becomes more pronounced as the number of IP cores and the traffic increase.

Power Comparison Based on Classic APCG
Because of the randomness of the GA, for each APCG we run GA and ISAGA 30 times each and report the average values, as shown in Table 2.
As shown in Table 2, for MWD the average power consumption of ISAGA is reduced by 1.3% compared with the GA, the maximum power consumption is reduced by 5.4%, and the minimum power consumption is unchanged. Although there is a reduction, it is relatively small. Because the number of nodes is small, both the GA and ISAGA can find a good solution quickly within a given number of iterations.
For VOPD, compared with the GA, ISAGA reduces the average power consumption by 6.2%, the maximum power consumption by 15.2%, and the minimum power consumption by 3.0%. The reduction in maximum power consumption is significant because the population of the GA is generated randomly, so its quality is relatively low, whereas the initial population quality of ISAGA is significantly better than that of a random population.
For DVOPD, compared with the GA, ISAGA reduces the average power consumption by 14.4%, the maximum power consumption by 19.2%, and the minimum power consumption by 11.8%. As the number of IP cores increases, ISAGA clearly achieves a significant reduction in power consumption in all respects. ISAGA accelerates convergence by improving the initial population quality, and, owing to the global optimization ability of SA, it avoids falling into local optima. Together, these two factors give ISAGA lower power consumption than the GA.
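The percentage reductions quoted above can be read as relative differences between the two algorithms' statistics over repeated runs. The sketch below shows one way to compute them, assuming the reduction is defined as (GA value - ISAGA value) / GA value; this definition and the sample values are hypothetical and are not taken from Table 2.

```python
from statistics import mean
from typing import Dict, List


def summarize(costs: List[float]) -> Dict[str, float]:
    """Average, maximum and minimum cost over repeated runs of one algorithm."""
    return {"avg": mean(costs), "max": max(costs), "min": min(costs)}


def reduction_percent(baseline: float, improved: float) -> float:
    """Relative reduction of `improved` with respect to `baseline`, in percent.

    A negative result means the 'improved' value is actually higher.
    """
    return 100.0 * (baseline - improved) / baseline


# Hypothetical run results (not measured data).
ga_runs = [200.0, 220.0, 180.0]
isaga_runs = [150.0, 165.0, 144.0]
ga, isaga = summarize(ga_runs), summarize(isaga_runs)
for key in ("avg", "max", "min"):
    print(key, reduction_percent(ga[key], isaga[key]))
```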

Comparison of Power Consumption Based on Random Task Graph
TGFF was used to generate random task graphs with 24, 45, 60, 80, 98, and 124 IP cores. The task graph attributes are shown in Table 3. For each random task graph, both algorithms were run 20 times and the results were averaged. The experimental results are presented as percentages in Table 4. Comparing ISAGA and GA, we can see that when the number of IP cores is small, the reduction in power consumption is small, less than 10%. Because the number of nodes is too small, the algorithm becomes trapped in a local optimum, so the reduction in minimum power consumption is negative. However, as the number of IP cores increases, the reduction in average power consumption reaches 32% and the reduction in maximum power consumption reaches 35% at 124 IP cores, which is a very significant improvement. The overall trend of the data in the table shows that this reduction gradually increases with the number of IP cores.

Comparison with other improved genetic algorithms
Reference [20] proposed an improved genetic algorithm based on a greedy algorithm (Improved Genetic Algorithm, referred to as IGA), which uses the greedy algorithm to generate the initial population in order to reduce power consumption. The algorithm proposed in this paper (Improved Simulated Annealing Genetic Algorithm) is referred to as ISAGA. In the following, the two improved genetic algorithms are compared on the classic task graphs and the random task graphs.

1) Classic task graph power comparison
As shown in Figure 7, for VOPD, ISAGA reduces power consumption more than IGA. The experimental results show that the average power consumption is reduced by 38.9%, the maximum power consumption is reduced by 50.3%, and the minimum power consumption is reduced by 74.3%.
For DVOPD, the power reduction of ISAGA compared with IGA is more obvious. The experimental results show that the average power consumption is reduced by 68.8%, the maximum power consumption is reduced by 80.1%, and the minimum power consumption is increased by 44.6%. Because DVOPD has a relatively large number of tasks, the initial population improved by ISAGA does not easily fall into a local optimum, and with the addition of the simulated annealing algorithm, the algorithm can jump out of local optima and tend toward the global optimum.
From the above experimental results and analysis, we can see that the advantage of ISAGA is more obvious when the number of IP cores is large.

2) Comparison of random task graph power consumption
As shown in Figure 8, in terms of average power consumption, when the number of IP cores is small, IGA yields slightly lower power consumption than ISAGA, because with fewer tasks the algorithm is prone to falling into a local optimum. However, as the number of IP cores increases, the overall trend is that the power consumption reduction of ISAGA gradually exceeds that of IGA. With 124 IP cores, the power consumption reduction ratio of ISAGA relative to IGA is increased by 67.8%.
In terms of maximum power consumption, the power consumption reduction ratio of ISAGA relative to IGA is increased by 38.5% with 45 IP cores, by 68.9% with 80 IP cores, and by 61.4% with 124 IP cores.
In terms of minimum power consumption, the power consumption reduction ratio of ISAGA relative to IGA is increased by 84.7% with 45 IP cores, by 65.0% with 80 IP cores, and by 57.1% with 124 IP cores.
From the above experimental data, it can be seen that with the increase of the number of IP cores, the ISAGA has more significant advantages in reducing power consumption, and can reduce power consumption better than the IGA.

Conclusion
For the 3D NoC low-power mapping problem, we propose a genetic algorithm that improves initial populations and combines simulated annealing. We effectively solve the problem of combining the optimization algorithm with the mapping. Experiments show that ISAGA has a good effect in solving 3D NoC low-power mapping. In the case of 124 IP cores, the maximum power consumption generated by ISAGA is reduced by 35.3% compared with GA.

Outlook
1) Heat dissipation is an important issue in the design of 3D NoC. As the research deepens, we will take thermal and other factors into account in the study of mapping algorithms.
2) At present, the research on mapping algorithms for heterogeneous 3D NoC architecture is still in its infancy, and there is still much room for development. In future research, heterogeneous features can be fully utilized, and different mapping algorithms are used for different types of task communication diagrams to improve the mapping efficiency of 3D NoC and reduce the power consumption of the system.