A Novel Dynamic Programming Approach for Optimizing Driving Strategy of Subway Trains

Reducing operational energy consumption without degrading service quality has become a major challenge in daily subway operation. A novel DP-based approach is proposed for optimizing the train driving strategy. The optimal driving problem is first formulated as a multi-objective problem with five goals (i.e., energy saving, punctual arriving, less switching, safe driving and accurate stopping). The optimization problem is then remodelled as a multistage decision problem by discretizing the continuous train movement in space. Dynamic programming is carried out in the velocity-space state space. Owing to the discretization rules of the search space, the goals of safe driving and accurate stopping are satisfied during the search. The remaining goals are split into cost functions and constraints for each stage. Because there are multiple cost functions, a set of Pareto-optimal solutions is obtained at each vertex during dynamic programming. To further improve the efficiency of the algorithm, two evaluation criteria are introduced to limit the size of the Pareto set at each vertex. A case study of the Yizhuang urban rail line in Beijing is conducted to verify the effectiveness and efficiency of the DP-based algorithm.


Introduction
As a fast and green mode of transportation, urban rail transit can efficiently reduce congestion and pollution in daily life. However, its rapid growth has brought huge energy consumption in recent years. According to incomplete statistics, the total energy consumption of urban rail transit in China was 15 billion kilowatt-hours in 2018, of which traction energy accounted for 45.3 percent. Reducing traction energy consumption has therefore become an active research topic.
Traction energy consumption can be efficiently reduced by optimizing the train driving strategy. The fundamental work on driving optimization was established a few decades ago. Most of that work is based on optimal control theory, in particular on Pontryagin's Maximum Principle. The derivation shows that the optimum driving strategy should contain four regimes (i.e., maximum traction, cruising, coasting, and maximum braking) [1,2,3,4]. Based on this derivation, some researchers presented heuristic algorithms that solve the optimal driving problem by determining the positions of multiple coasting switching points [5,6]. Owing to the randomness of the search process, heuristic algorithms converge slowly, and they are not guaranteed to find the global optimum. To obtain the global optimum, other researchers presented exact algorithms that treat the problem as a multistage decision problem [7,8,9,10]. The multistage optimization is carried out in a three-dimensional space-time-velocity search space. However, the efficiency of these exact algorithms is limited by the 'curse of dimensionality'.
To overcome the shortcomings of heuristic algorithms and traditional exact algorithms, a dynamic programming (DP) based exact algorithm is proposed to obtain the global optimum of the multi-objective problem efficiently. Furthermore, a more realistic train driving model is considered with multiple goals (i.e., energy saving, punctual arriving, less switching, safe driving and accurate stopping).

Problem definition
The optimal driving problem can be described as follows. For a given running time, there exist a large number of train driving strategies that ensure the train arrives punctually, accurately and safely. Among those feasible strategies, the one with the minimum energy consumption is regarded as the optimum driving strategy. Frequent switching between driving regimes, which is commonly overlooked by other researchers, may make passengers uncomfortable and increase the wear and tear of vehicle equipment. Thus, reducing the number of regime changes is treated as a further optimization objective in this paper.
For simplicity, we assume that the train is equipped with an Automatic Train Operation (ATO) system that can realize cruising control at any position on the line. We also assume that the train load is constant at all times. Regenerative energy exchange between accelerating and braking trains is not considered in this paper.

Model and algorithm
In our approach, the interstation space range [0, X] is split into K stages. Note that the speed restriction and the track slope are constant within each stage. At the kth stage, the train uses a single regime u_k ∈ {maximum traction, cruising, coasting, maximum braking} as the driving mode. The K regimes together constitute the driving strategy U = {(u_1, x_1), (u_2, x_2), …, (u_K, x_K)}, where x_k is the travelling distance at the kth stage. The speed range [0, V_max(x_k)] at position x_k is discretized into a set of vertices s_{k,i} = (x_k, v_i) with velocity interval Δv, where V_max(x_k) is the maximum train velocity at position x_k under the flat-out driving strategy. Figure 1 shows an example of the discretization, where the solid line is the flat-out train trajectory and the hollow dots are the vertices. The vertex transition equation s_{k+1,j} = F(s_{k,i}, u_k) at the kth stage is given by the train dynamic equations of a single-particle model. In these equations, x is the position of the train, v is its velocity, M is its mass, α is a factor that accounts for the rotating mass, p(u_k) is the traction force at regime u_k, q(u_k) is the braking force at regime u_k, and r is the running resistance, which comprises basic resistance and line resistance. The basic resistance is given by the Davis equation, in which a, b and c are train-specific constants. The line resistance is the force caused by the track gradient θ, and g is the gravitational acceleration.
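As a rough illustration of the vertex transition F, the following Python sketch advances the train across one stage under a single-particle model. It assumes the dynamics take the common form M(1 + α) v dv/dx = p − q − r with r = a + b v + c v² + M g sin θ. The function names, the sub-stepping scheme and the forward direction of integration are illustrative assumptions, not the authors' implementation, which runs backward on the discretized velocity grid.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def resistance(v, mass, a, b, c, theta):
    """Running resistance in N: Davis basic resistance plus line resistance."""
    basic = a + b * v + c * v * v
    line = mass * G * math.sin(theta)
    return basic + line


def advance_stage(v0, dx, traction, braking, mass, alpha, a, b, c, theta,
                  substeps=100):
    """Integrate the assumed dynamics over one stage of length dx (metres).

    traction and braking are the forces p(u_k) and q(u_k) in N.
    Returns (v_end, elapsed_time), or None if the train stops inside the stage.
    """
    h = dx / substeps
    v = v0
    t = 0.0
    for _ in range(substeps):
        force = traction - braking - resistance(v, mass, a, b, c, theta)
        acc = force / (mass * (1.0 + alpha))
        v_sq = v * v + 2.0 * acc * h  # constant acceleration within a sub-step
        if v_sq <= 0.0:
            return None
        v_next = math.sqrt(v_sq)
        t += 2.0 * h / (v + v_next)  # distance divided by mean speed
        v = v_next
    return v, t
```

In a DP setting, the returned end velocity would still have to be mapped to a vertex of the next stage, and the elapsed time would serve as the immediate time cost of the stage.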

Figure 1. Example of the discretization (axes: position, speed).
The traction and braking forces of the train at regime u_k are described in Table 1, where P is the maximum traction force and Q is the maximum braking force; both are constant values. The goal of energy saving is to minimize E(U), the energy consumption of strategy U over the interstation distance X. The goal of punctual arriving is to make T(U), the interstation running time of strategy U, match T_set, the running time specified by the timetable. The goal of less switching is to minimize D(U), the number of regime changes in strategy U. The goal of safe driving requires that the train velocity never exceed the speed restriction V_limit. The goal of accurate stopping requires that the train come to a stop exactly at the destination. According to the discretization rules above, the goals of safe driving and accurate stopping are satisfied during the search. The other three goals (i.e., energy saving, punctual arriving, less switching) are optimized during dynamic programming.
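The five goals can be written compactly as below. This is only one plausible formalization inferred from the symbol definitions in the text, not a reproduction of the source equations. In particular, writing E(U) as the integral of traction force over distance, treating punctuality as a minimized absolute deviation, and the notation v(x), u(x) and V_limit(x) are assumptions.

```latex
\begin{aligned}
  &\text{energy saving:}     && \min_{U}\; E(U), \qquad E(U) = \int_{0}^{X} p\bigl(u(x)\bigr)\,\mathrm{d}x \\
  &\text{punctual arriving:} && \min_{U}\; \bigl|\,T(U) - T_{\mathrm{set}}\,\bigr| \\
  &\text{less switching:}    && \min_{U}\; D(U) \\
  &\text{safe driving:}      && v(x) \le V_{\mathrm{limit}}(x) \quad \forall\, x \in [0, X] \\
  &\text{accurate stopping:} && v(0) = 0, \quad v(X) = 0
\end{aligned}
```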
Dynamic programming is executed backward from the destination to the origin. We define the sub-strategy U(s_{k,i}) as the sequence of regimes from vertex s_{k,i} to the destination. Three cost functions are used to evaluate the optimality of U(s_{k,i}) at the kth stage, in which e(s_{k,i}, u_k) and t(s_{k,i}, u_k) are the immediate costs of energy and time incurred in the kth stage at vertex s_{k,i} using regime u_k. To satisfy the punctual arriving goal, the sub-strategy U(s_{k,i}) is constrained in terms of T_des(s_{k,i}), the minimum running time from s_{k,i} to the destination, and T_org(s_{k,i}), the minimum running time from the origin to s_{k,i}. Specific forms of these constraints apply at the destination vertex s_{K+1,0} and the origin vertex s_{1,0}. The calculation of T_des(s_{k,i}) and T_org(s_{k,i}) is illustrated in Figure 2, where the blue line is the flat-out train trajectory from the origin to s_{k,i} and the red line is the flat-out train trajectory from s_{k,i} to the destination.
Because the energy consumption and running time cost functions conflict, each vertex s_{k,i} has a set of Pareto-optimal solutions with respect to these cost functions. A strategy U* is said to be Pareto optimal if and only if there is no other strategy U such that E(U) ≤ E(U*) and T(U) ≤ T(U*) hold simultaneously with at least one inequality strict. We define U*_w(s_{k,i}) as the wth Pareto solution in the Pareto set of vertex s_{k,i} and W(s_{k,i}) as the size of the Pareto set at vertex s_{k,i}. To improve the efficiency of the algorithm, an upper limit W_limit on the size of the Pareto set at each vertex is introduced. The congestion degree and the number of regime changes of each Pareto solution are used as the evaluation criteria to keep the Pareto set within this limit. This selection is executed only when the size of the Pareto set exceeds W_limit.
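The Pareto dominance test on energy and time can be sketched in a few lines of Python. The tuple layout (energy, time) and the function names are illustrative assumptions; the quadratic-time filter is chosen for clarity rather than speed.

```python
def dominates(a, b):
    """True if solution a = (energy, time) Pareto-dominates solution b."""
    no_worse = a[0] <= b[0] and a[1] <= b[1]
    strictly_better = a[0] < b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(solutions):
    """Keep every solution that no other solution dominates."""
    return [s for s in solutions
            if not any(dominates(other, s) for other in solutions)]
```

For example, among (10, 100), (12, 90), (11, 105) and (15, 95), the third is dominated by the first and the fourth by the second, so only the first two survive.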
According to the less switching goal, Pareto solutions with fewer regime changes are retained during the DP process. For Pareto solutions with the same number of regime changes, the congestion degree is introduced to evaluate their importance. The congestion degree of a Pareto solution is defined as the size of the search space around that solution with respect to the conflicting goals of energy consumption and running time, as illustrated in Figure 3. Here C(U*_w(s_{k,i})) denotes the congestion degree of the wth Pareto solution in the set of s_{k,i}. Pareto solutions with a larger congestion degree are retained during the DP process. The congestion degree improves the efficiency of the DP process while preserving the diversity of the solutions generated at each stage.
At each vertex, the Pareto solutions with more regime changes and a smaller congestion degree are deleted only when the size of the Pareto set exceeds W_limit. Note that the switching number has higher priority than the congestion degree as an evaluation criterion in this paper.
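Since the formula for the congestion degree is not reproduced in this text, the following Python sketch substitutes the crowding distance used in NSGA-II, which measures how much objective space surrounds each solution in a similar spirit. It is a stand-in under that assumption, not the authors' definition, and the name congestion_degrees is hypothetical.

```python
def congestion_degrees(front):
    """Crowding-distance style measure for a list of (energy, time) points.

    Returns one value per point, in the input order. Boundary points get
    infinity so that the extremes of the front are always kept.
    """
    n = len(front)
    if n <= 2:
        return [float("inf")] * n
    degree = [0.0] * n
    for m in range(2):  # m = 0 is energy, m = 1 is time
        order = sorted(range(n), key=lambda i: front[i][m])
        lowest, highest = front[order[0]][m], front[order[-1]][m]
        degree[order[0]] = degree[order[-1]] = float("inf")
        span = highest - lowest
        if span == 0:
            continue
        for pos in range(1, n - 1):
            i = order[pos]
            gap = front[order[pos + 1]][m] - front[order[pos - 1]][m]
            degree[i] += gap / span  # normalized distance between neighbours
    return degree
```

A larger value means the solution sits in a sparser region of the front, so removing it would lose more diversity than removing a solution in a crowded region.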
The DP process is executed from the destination to the origin. The optimum solution is selected as the minimum-energy solution in the Pareto set of the final vertex. The backward DP process is described as follows:
Step 1: Import the line and train data. Set the model parameters (i.e., the number of stages K, the velocity interval Δv and the Pareto set capacity W_limit).
Step 2: Traverse all the vertices at the kth stage. For each vertex s_{k,i} and each u_k ∈ {maximum traction, cruising, coasting, maximum braking}, use the dynamic equations F(s_{k,i}, u_k) to locate the previously processed vertex. Add the decision variable u_k to each Pareto solution in the set of that vertex, {U*_w(F(s_{k,i}, u_k))}, to generate new strategies. Check the feasibility of each new strategy against constraints (13-14). Add the feasible solutions to the solution set {U(s_{k,i})} at vertex s_{k,i}.
Step 3: According to the cost functions (8-9), apply the Pareto dominance operation to the solution sets of all vertices at the kth stage to obtain the Pareto solution set {U*_w(s_{k,i})} for each vertex s_{k,i}. If the size of the Pareto set W(s_{k,i}) > W_limit, calculate the congestion degree of the solutions and rank them from largest to smallest. Starting from this congestion ranking, rank the solutions by switching number from least to most. Remove the redundant solutions until W(s_{k,i}) = W_limit. Note that this two-step ranking keeps the priority of regime changes higher than that of congestion degree. If k - 1 ≠ 0, let k = k - 1 and go to Step 2; otherwise, output the optimum solution from the Pareto set of the final vertex.
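The two-step ranking in Step 3 can be sketched with two stable sorts, as below. The dictionary keys switches and congestion and the function name prune are illustrative assumptions. Because Python's sorted is stable, sorting by congestion degree first and by switching number second leaves switching number as the primary criterion and congestion degree as the tie-breaker.

```python
def prune(solutions, w_limit):
    """Trim a Pareto set to at most w_limit solutions.

    Each solution is a dict with keys 'switches' (number of regime changes)
    and 'congestion' (congestion degree, larger is better).
    """
    if len(solutions) <= w_limit:
        return list(solutions)  # pruning only runs when the set is too large
    # First rank: congestion degree from largest to smallest.
    by_congestion = sorted(solutions, key=lambda s: s["congestion"], reverse=True)
    # Second rank: switching number from least to most; ties keep the first order.
    ranked = sorted(by_congestion, key=lambda s: s["switches"])
    return ranked[:w_limit]
```

A single sort with the key (switches, -congestion) would give the same order; the two-pass form is kept here because it mirrors the description in Step 3.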

Simulation and experiment
A C++ program was developed to verify the effectiveness and efficiency of the Multi-Objective Dynamic Programming (MODP) approach described above. The program is run on a laptop with an Intel(R) Core(TM) i5-2530M CPU and 4 GB of memory. The train parameters and line data of the Beijing Yizhuang subway line are taken from [11]. The search space is discretized with K = 50 stages and a velocity interval of Δv = 1 km/h. The upper limit on the Pareto set size is set to W_limit = 50. For comparison, a Weight-based Dynamic Programming (WDP) approach following [10] is run with the same model settings. WDP introduces weight factors to balance the importance of the multiple objectives. Table 2 shows the results.
Thirteen interstation sections are listed with their distance, trip time and flat-out energy consumption. Arrival deviation, energy saving relative to flat-out driving, switching number and calculation time are used as the evaluation criteria for the two optimization approaches. As shown in Table 2, MODP obtains better solutions in less calculation time. On average, the arrival deviation, energy saving and number of regime changes of WDP are 3.5 s, 52.8% and 11.5, respectively, whereas those of MODP are 0.7 s, 54.5% and 7. Furthermore, the average calculation time of MODP is 2.9 s, which is much less than that of WDP. Figure 4 shows the optimal results for case 3. The red line is the train trajectory of MODP and the blue line is the train trajectory of WDP. The optimal MODP solution has 7 regime changes, whereas the optimal WDP solution has 12.

Conclusion
In this paper, we proposed a novel DP-based approach for optimizing the train driving strategy. The optimal driving problem was first formulated as a multi-objective problem with five goals (i.e., energy saving, punctual arriving, less switching, safe driving and accurate stopping). The optimization problem was then remodelled as a multistage decision problem by discretizing the continuous train movement in space. The DP process was carried out in the velocity-space state space. Owing to the discretization rules of the search space, the goals of safe driving and accurate stopping are satisfied during the search. The remaining goals were split into cost functions and constraints for each stage. Because there are multiple cost functions, a set of Pareto-optimal solutions is obtained at each vertex during dynamic programming. To further improve the efficiency of the algorithm, the number of regime changes and the congestion degree were introduced to limit the size of the Pareto set at each vertex. A case study of the Yizhuang urban rail line in Beijing was conducted to verify the effectiveness and efficiency of the DP-based algorithm. The results show that our approach obtains higher-quality solutions. Compared with the other DP-based algorithm, the proposed algorithm improved energy saving by 1.7 percentage points, while the average running time deviation was -0.7 s and the average number of regime changes was 7. Furthermore, the saving in computation time was significant compared with the other DP-based algorithm. To satisfy the real-time calculation requirement of train-borne equipment, the discretization parameters (i.e., the number of stages K, the velocity interval Δv and the Pareto set capacity W_limit) should be set to suitable values. The choice of parameters, made by users according to the actual subway line, is a trade-off between algorithm effectiveness and efficiency.