Optimizing Subway Train Operation With Hierarchical Adaptive Control Approach

The proportional integral derivative (PID) method is widely used in industrial control applications. However, when applied to complex and dynamic train operation control systems, real-time parameter adjustment becomes a formidable challenge. Moreover, the multifaceted nature of train operation control, encompassing safety, parking precision, passenger comfort, and energy efficiency, exacerbates the difficulty of parameter adjustment. To address this problem, this paper formulates train operation control as a Markov decision process (MDP) and introduces an innovative adaptive control approach. This approach features a hierarchical structure comprising an upper-level deep deterministic policy gradient (DDPG) controller and a lower-level PID controller, leveraging the learning capability of the DDPG algorithm, as well as the stability and interpretability of the PID method. The upper-level controller acquires train status information and autonomously fine-tunes the PID parameters, while the lower-level controller accepts these parameters and adjusts the percentage of traction or braking to achieve train operation control. Furthermore, the reward function has been meticulously designed to reconcile the diverse objectives of train operation. Extensive experiments conducted on a subway simulation platform substantiate the effectiveness and adaptability of the proposed approach in various operational scenarios.


I. INTRODUCTION
Efficient subway train operation is paramount for ensuring the high performance and reliability of urban transportation systems. In the face of ever-increasing public transportation demand, optimizing subway train operation has become imperative for enhancing overall system performance [1].
The control of subway train operation mainly focuses on two aspects [2]: reference speed optimization and speed tracking control. Reference speed optimization entails the pre-calculation of an ideal speed profile for the train journey between stations. This profile is designed to satisfy critical operational objectives, including safety, parking precision, passenger comfort, and energy efficiency. The optimized speed profile acts as a pivotal reference, thereby harmonizing with the train's operational dynamics and guiding the formulation of an effective speed control strategy. Speed tracking control, also known as subway train operation control, aims to ensure that the train operates as closely as possible to the reference speed profile, thus resulting in the desired operational outcomes. Fig. 1 shows the speed-position/time curve and the forces applied to the train. During acceleration, the train experiences traction forces that propel it forward. During braking, deceleration is achieved through braking force. Throughout the entire operational process, the train encounters resistance. Under the combined influence of these forces, the train endeavors to closely adhere to the prescribed reference speed profile.
The associate editor coordinating the review of this manuscript and approving it for publication was Jesus Felez.
A highly efficient and reliable control system is essential for achieving effective train operation between stations. The most classical and commonly used method for train speed control is the proportional-integral-derivative (PID) controller [3], [4], which is still applied in subway lines like the Yizhuang Line and Changping Line of Beijing Subway. The PID controller is widely used in the industry due to its simplicity and good robustness. However, conventional PID controller parameter tuning relies on manual experience and repetitive on-site adjustments, thus leading to high costs and difficulties in achieving dynamic adaptive parameter adjustments [5]. Moreover, during train operation, various internal and external factors, such as mechanical wear, normal aging, and changing weather conditions, continuously affect the train. The fixed-parameter PID controller inevitably experiences performance degradation. Additionally, the PID controller tends to frequently output speed adjustment commands to minimize tracking errors, thus possibly resulting in poor passenger comfort and an inability to balance multiple objectives [6]. Therefore, an advanced control method is needed to overcome these limitations and to enhance the efficiency and effectiveness of subway train operations.
Researchers have explored different methods to improve speed tracking performance. Fuzzy control is one of the most commonly used train operation control methods [7]. Fuzzy control involves fuzzifying the inputs of a control system, establishing a fuzzy rule base, and completing fuzzy inference. For instance, Pu et al. [8] designed a fuzzy PID controller to adaptively adjust PID gains, and they considered multiple objectives, including punctuality, energy consumption, parking accuracy, and comfort, to improve the tracking performance of the nonlinear train system. Similarly, for balancing multiple objectives, Zhu et al. [9] proposed a multi-objective model for urban railway train automatic operation and designed a fuzzy controller to control train operations.
The fuzzy control methods heavily rely on the formulation of logical rules, which can be non-trivial [10]. Furthermore, researchers have recently incorporated neural networks to enhance train operation controllers. Sun et al. [11] investigated an adaptive neural fuzzy sliding mode controller to suppress the disturbance effects on the train model parameters, thus demonstrating its effectiveness in speed tracking control. Pu et al. [12] proposed an adaptive control method for subway trains, whereby the time-varying parameters of train motion were considered and a train model with dynamic parameters was established. They designed a model-free adaptive control system by combining neural networks and the PID algorithm to achieve adaptive control. However, the neural network methods based on supervised learning heavily rely on the quality and quantity of the samples. Moreover, some of the above methods treat most parameters of the train system model as constants [13], [14], whereas in actual subway operations, the traction/braking force and resistance of the train vary at different speeds.
To overcome the limitations of previous methods, we propose a hierarchical adaptive control approach to optimize subway train operation. This approach leverages deep reinforcement learning technology to fine-tune PID parameters. Reinforcement learning enables agents to learn through interaction with the environment. It allows intelligent agents to autonomously learn and improve their behavioral strategies based on feedback and reward signals from the environment. Deep reinforcement learning combines deep neural networks with reinforcement learning algorithms to enable agents to learn complex representations and features from high-dimensional, unstructured input data, resulting in improved generalization and robustness when facing unseen states [15]. Deep reinforcement learning has achieved success in adaptive PID parameter tuning and found wide applications in various domains [16]. For example, in wind turbine control [17], [18], robot control [19], [20], [21], and unmanned aerial vehicle attitude control [22], [23], deep reinforcement learning has demonstrated its effectiveness. However, to the best of our knowledge, this strategy has not been applied in the field of train operation control. Thus, we explore this direction by adopting deep reinforcement learning to adaptively adjust PID parameters in train operation control, with the expectation of achieving similarly successful results in this domain.
As a whole, the proposed approach possesses a hierarchical structure comprising an upper-level deep deterministic policy gradient (DDPG) controller and a lower-level PID controller. The DDPG reinforcement learning algorithm dynamically adjusts PID parameters online, thus allowing for the learning of optimal control strategies for different operating conditions and effective management of the train's complex continuous state and action spaces. The integration of the DDPG algorithm enables the online adjustment of PID parameters through value function approximation, thus facilitating an adaptive control based on continuously changing operational requirements. In addition, the reward function is carefully designed to balance multiple optimization objectives.
By combining the learning capability of the DDPG algorithm with the stability and interpretability of the PID controller, stable and interpretable control signals can be provided under complex and time-varying conditions, thus achieving precise and efficient train operation.
In summary, the main contributions of this paper are as follows:
1) We propose an adaptive control approach to optimize subway train operation, thus addressing the limitations of other PID control strategies.
2) We developed a hierarchical structure that combines the learning capability of the DDPG algorithm with the stability and interpretability of the PID controller, thus enabling accurate tracking of the reference speed profile while considering multiple objectives.
3) We conducted extensive numerical experiments on a city subway simulation platform to evaluate the effectiveness and superiority of the proposed approach.
The rest of this paper is organized as follows. Section II provides a detailed analysis of the train dynamic model, traction/braking force, and resistance, thus laying the theoretical foundation for the proposed approach. In Section III, we elaborate on the adaptive control approach, including the design of the upper-level controller, the lower-level controller, and the reward function. Section IV describes the simulation settings and presents the numerical experiment results, thus demonstrating the adaptability of the proposed approach. Finally, Section V summarizes this paper and suggests potential future research directions.

II. TRAIN MODEL ANALYSIS
A. TRAIN DYNAMIC MODEL
The single-particle train model is the most commonly used model for solving train operation problems [24], [25]. In this paper, we adopt the single-particle model to simulate the train. According to Newton's laws of motion, the train's dynamics can be expressed as
$$\frac{dx}{dt} = v, \qquad M\frac{dv}{dt} = F - R \tag{1}$$
where x, t, and v represent the train's position, time, and speed during its operation, respectively. M represents the mass of the train, and F and R represent the traction/braking force and resistance of the train, respectively. The proposed approach controls the train by sampling at the time interval $\Delta t$, thus enabling an iterative calculation of the train's position and speed using (2) and (3).
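As an illustration, the iterative calculation of position and speed can be sketched in Python as follows. This is a minimal sketch and not the paper's discretization: it assumes SI units (m, m/s, N, kg), a trapezoidal position update, and a clamp that prevents the train from rolling backward; the function name `step_train` is hypothetical.

```python
def step_train(x, v, force, resistance, mass, dt):
    """Advance position x (m) and speed v (m/s) by one sampling interval dt (s).

    Assumptions (not from the paper): SI units, trapezoidal position update,
    and speed clamped at zero so the train never rolls backward.
    """
    a = (force - resistance) / mass          # Newton's second law: M dv/dt = F - R
    v_next = max(v + a * dt, 0.0)            # explicit Euler step for the speed
    x_next = x + 0.5 * (v + v_next) * dt     # average speed over the interval
    return x_next, v_next


if __name__ == "__main__":
    # 300 t train, 300 kN net force, 0.1 s sampling interval
    print(step_train(0.0, 10.0, 300_000.0, 0.0, 300_000.0, 0.1))
```

A smaller sampling interval reduces the discretization error at the cost of more control steps per journey.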

B. TRACTION/BRAKING FORCE
The traction and braking forces of the train are provided by the traction system and braking system, respectively. The maximum traction/braking force that the train can provide at a specific moment depends on the train's speed and is usually represented by the traction/braking force characteristic curve [5], which is illustrated in Fig. 2. From Fig. 2, it can be observed that the maximum traction and braking forces remain constant at lower speeds. However, as the speed approaches the critical value, these forces start to decrease. The maximum traction and braking forces at time t can be expressed as
$$F_{tmax}(t) = f_{tmax}(v_t), \qquad F_{bmax}(t) = f_{bmax}(v_t) \tag{4}$$
where $F_{tmax}(t)$ and $F_{bmax}(t)$ represent the maximum traction and braking forces at time t, respectively, and $f_{tmax}(v_t)$ and $f_{bmax}(v_t)$ are functions describing the variation of the train's maximum traction and braking forces with respect to the speed $v_t$ at time t. The train control system determines the percentage of traction or braking force output [12], which is denoted as u. Therefore, the traction/braking force F(t) can be expressed as
$$F(t) = u(t)\,F_{max}(t) \tag{5}$$
where $F_{max}(t)$ is the maximum traction/braking force at time t, and u(t) is the percentage of the maximum traction/braking force at time t. When u(t) > 0, the traction system applies traction force; when u(t) < 0, the braking system applies braking force; and when u(t) = 0, both the traction force and braking force are zero.
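The relationship between the control percentage u and the applied force can be sketched as follows. The characteristic curves below are hypothetical placeholders (constant up to a critical speed, then decreasing in inverse proportion to speed); their numerical values are assumptions chosen only for illustration and are not the curves of Fig. 2.

```python
def max_traction(v_kmh, f_const=310.0, v_crit=36.0):
    """Hypothetical maximum traction curve in kN: flat, then falling as 1/v."""
    return f_const if v_kmh <= v_crit else f_const * v_crit / v_kmh


def max_braking(v_kmh, f_const=260.0, v_crit=60.0):
    """Hypothetical maximum braking curve in kN: flat, then falling as 1/v."""
    return f_const if v_kmh <= v_crit else f_const * v_crit / v_kmh


def applied_force(u, v_kmh):
    """Signed force F = u * F_max for a control percentage u in [-1, 1]."""
    u = max(-1.0, min(1.0, u))               # keep the percentage in range
    if u > 0:
        return u * max_traction(v_kmh)       # traction system active
    if u < 0:
        return u * max_braking(v_kmh)        # braking system active (negative force)
    return 0.0                               # coasting: no traction, no braking


if __name__ == "__main__":
    print(applied_force(0.5, 20.0), applied_force(-1.0, 120.0))
```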

C. RESISTANCE
Resistance is an important factor that cannot be ignored when controlling a train. Many factors influence a train during its operation, thus making it difficult to calculate the resistance precisely. Empirical formulas obtained through numerous experiments are typically used for calculation.
In this paper, we use the Davis equation to represent the resistance [26], which is formulated as
$$R = D_1 + D_2 v + D_3 v^2 \tag{6}$$
where $D_1$, $D_2$, and $D_3$ are empirical coefficients, each of which is greater than or equal to zero. The specific values of these coefficients may vary depending on the train and track conditions. From (6), it can be seen that the resistance increases with increasing speed. To evaluate the adaptability of our proposed approach, various combinations of these coefficients were considered.
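For illustration, the Davis equation can be evaluated with a few lines of Python. The default coefficients are the values used later in the simulation setup; the units of speed and resistance are not restated here, so the function simply assumes that the coefficients match whatever unit the caller uses for v. The function name is hypothetical.

```python
def davis_resistance(v, d1=0.6841, d2=0.0229, d3=0.000345):
    """Davis resistance R = D1 + D2*v + D3*v^2 for a non-negative speed v.

    The unit of v (and of the result) is assumed to match the coefficients.
    """
    return d1 + d2 * v + d3 * v * v


if __name__ == "__main__":
    for speed in (0, 40, 80):
        print(speed, davis_resistance(speed))
```

Because all three coefficients are non-negative, the returned value grows monotonically with speed, in line with the observation drawn from the Davis equation.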

III. HIERARCHICAL ADAPTIVE CONTROL APPROACH
A. PROBLEM DEFINITION
Subway train control is a complex task that can be effectively addressed by formulating it as an optimal control problem. The main objective is to determine the optimal strategy for adjusting traction and braking forces throughout the entire journey between stations. By carefully adjusting these forces, the train can achieve efficient and safe operation while maximizing performance.
Train control is composed of a series of control commands for the train, and its output depends solely on the current input state of the train, i.e., it is independent of historical states. Therefore, the optimal control of subway train operation can be formulated as a Markov decision process (MDP), which is a fundamental framework used to model decision-making problems involving sequential interactions. An MDP is typically composed of a state space S, an action space A, a state transition function P, a reward function R, and a discount factor γ, which are represented as a quintuple $\langle S, A, P, R, \gamma \rangle$.
The state space S encompasses all possible states s within the environment. In order to comprehensively encapsulate the environment's state and ensure seamless coordination with the lower-level controller, this paper chooses the tracking speed $v_t$, the percentage of traction/braking $u_t$, the speed difference $\Delta v_t$ between the reference speed and tracking speed, and the distance $\Delta d_t$ between the reference position and tracking position to form the state at time t, which is defined as
$$s_t = [v_t,\; u_t,\; \Delta v_t,\; \Delta d_t] \tag{7}$$
The action space A refers to the set of all possible actions a. In this paper, it consists of the PID parameters $k_p(t)$, $k_i(t)$, and $k_d(t)$, as shown in (8), which will be discussed in Section III-D.
$$a_t = [k_p(t),\; k_i(t),\; k_d(t)] \tag{8}$$
The state transition function P is a conditional probability density function that is denoted as $p(s_{t+1} \mid s_t, a_t)$. It represents the probability of the state transitioning to $s_{t+1}$ if the current state $s_t$ and action $a_t$ are observed. The state transition is computed by the train dynamic model. The reward R is a numerical value returned by the environment to the agent after executing an action. The reward function design is discussed in Section III-E.
The reward discount factor $\gamma \in [0, 1]$ is used to reduce the weight of rewards as the time step increases. The goal of reinforcement learning is to find an optimal policy $\pi^*$ that maximizes the cumulative reward $R_t$, which can be expressed as
$$R_t = \sum_{i=t}^{T} \gamma^{\,i-t}\, r_i \tag{9}$$
where $r_i$ is the reward at time i and T is the final time step of the episode.
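As a small worked example, the discounted cumulative reward can be computed by accumulating the rewards backward in time. This is a generic sketch of the standard discounted return; the function name `discounted_return` is hypothetical.

```python
def discounted_return(rewards, gamma):
    """Return sum_k gamma**k * rewards[k], accumulated from the last reward backward."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g                    # G_t = r_t + gamma * G_{t+1}
    return g


if __name__ == "__main__":
    print(discounted_return([1.0, 1.0, 1.0], 0.5))
```

With three unit rewards and γ = 0.5, the return is 1 + 0.5 + 0.25 = 1.75, which shows how later rewards contribute less to the objective.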

B. APPROACH OVERVIEW
In this section, we provide a comprehensive overview of the proposed approach for optimizing subway train operations.
The approach adopts a hierarchical control structure, whereby the advantages of DDPG and PID controllers are combined to improve train operation efficiency and performance, as shown in Fig. 3. The upper-level controller is responsible for adjusting the parameters of the lower-level controller based on environmental state information. The DDPG algorithm effectively addresses the problem of continuous action space in reinforcement learning by combining deterministic policy and the Actor-Critic architecture, thereby reducing exploration complexity and enabling efficient value function estimation. This provides an effective solution for reinforcement learning problems that involve continuous action spaces, such as PID parameter tuning. The lower-level controller dynamically adjusts the percentage of the traction and braking forces applied to the train, thereby allowing it to operate according to the desired control strategy. It is worth noting that the reward function was carefully designed to balance multiple optimization objectives.

C. UPPER-LEVEL CONTROLLER DESIGN
The upper-level controller design in the proposed approach utilizes reinforcement learning techniques to learn the optimal control strategy for subway train operation. Reinforcement learning involves the interaction between an agent and its environment, with the agent making decisions based on feedback in the forms of states and rewards.
The DDPG is a deep reinforcement learning algorithm developed by Lillicrap et al. [27], and it is composed of two key components: an Actor (online) network $\mu(s \mid \theta^{\mu})$ and a Critic (online) network $Q(s, a \mid \theta^{Q})$, where $\theta^{\mu}$ and $\theta^{Q}$ are the network parameters. The Actor network is shown in Fig. 4, and it consists of an input layer, two hidden layers, and an output layer. To enhance non-linear expression capacity, ReLU [28] is used as the activation function for the intermediate layers, and Tanh [29] is used for the output layer. It takes the state information as input and outputs the PID parameters. Note that, due to the use of the Tanh function, the range of the output $x_{out}$ is limited to [−1, 1], and it is scaled to a reasonable range using (10), where $k_p^{up}$, $k_i^{up}$, and $k_d^{up}$ denote the upper limits of the allowable PID parameter values.
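One plausible way to perform such a scaling is sketched below. It is an assumption, not the paper's equation: the sketch maps each Tanh output from [−1, 1] to [0, upper limit] with an affine transformation, and the lower bound of zero as well as the function name `scale_action` are hypothetical.

```python
def scale_action(x_out, upper_limits):
    """Map Tanh outputs in [-1, 1] to PID gains in [0, upper limit].

    Assumption: the lower bound of every gain is 0 and the mapping is affine.
    """
    return [0.5 * (x + 1.0) * k_up for x, k_up in zip(x_out, upper_limits)]


if __name__ == "__main__":
    # Hypothetical upper limits for k_p, k_i, k_d
    print(scale_action([-1.0, 0.0, 1.0], [2.0, 4.0, 8.0]))
```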
The Critic network is similar to the Actor network, as shown in Fig. 5. The difference lies in the input, which is the concatenation of the state and action. The output layer has no activation function, and it only outputs a real value, which is represented by the Q-value q(t).
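The Actor network is updated by calculating the policy gradient, as shown in (11).
$$\nabla_{\theta^{\mu}} J \approx \frac{1}{N} \sum_{i} \nabla_{a} Q(s, a \mid \theta^{Q})\big|_{s=s_i,\, a=\mu(s_i)} \; \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\big|_{s=s_i} \tag{11}$$
The parameters in the Critic network are updated by minimizing the value of the loss function L, which is expressed as
$$L = \frac{1}{N} \sum_{i} \big( y_i - Q(s_i, a_i \mid \theta^{Q}) \big)^2 \tag{12}$$
where N is the number of sampled transitions and $y_i$ is the estimate of the state-action value, which is defined as
$$y_i = r_i + \gamma\, Q'\big(s_{i+1},\, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'}\big) \tag{13}$$
where $\theta^{\mu'}$ and $\theta^{Q'}$ are the parameters of the target networks.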
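The Critic update can be illustrated with plain Python numbers in place of network outputs. This is a sketch of the standard DDPG target and mean squared loss, not the authors' implementation; the handling of terminal transitions through the `done` flag is a common convention that is assumed here, and both function names are hypothetical.

```python
def td_target(reward, next_q, gamma, done=False):
    """Target y = r + gamma * Q'(s', mu'(s')), with no bootstrap on terminal steps.

    next_q stands for the target Critic's value of the target Actor's action.
    The `done` handling is an assumed convention, not taken from the paper.
    """
    return reward + (0.0 if done else gamma * next_q)


def critic_loss(targets, q_values):
    """Mean squared error between targets y_i and online Critic values Q(s_i, a_i)."""
    n = len(targets)
    return sum((y - q) ** 2 for y, q in zip(targets, q_values)) / n


if __name__ == "__main__":
    ys = [td_target(1.0, 2.0, 0.5), td_target(1.0, 2.0, 0.5, done=True)]
    print(ys, critic_loss(ys, [1.0, 1.0]))
```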
In the DDPG algorithm, the incorporation of target networks constitutes a pivotal technique that is aimed at enhancing the stability and convergence of the training process. The Actor/Critic target network is a delayed copy of the Actor/Critic online network. During training, the target networks are updated periodically using a soft update
$$\theta^{Q'} \leftarrow \tau\, \theta^{Q} + (1 - \tau)\, \theta^{Q'}, \qquad \theta^{\mu'} \leftarrow \tau\, \theta^{\mu} + (1 - \tau)\, \theta^{\mu'} \tag{14}$$
where τ is the update rate of the target networks.
138296 VOLUME 11, 2023 Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
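The soft update of a target network can be sketched on flat parameter lists as follows. Real implementations operate on network tensors; the list representation and the function name `soft_update` are simplifying assumptions.

```python
def soft_update(target_params, online_params, tau):
    """Return tau * online + (1 - tau) * target for every parameter pair."""
    return [tau * w + (1.0 - tau) * w_t
            for w_t, w in zip(target_params, online_params)]


if __name__ == "__main__":
    print(soft_update([0.0, 1.0], [1.0, 1.0], tau=0.25))
```

A small τ makes the target networks change slowly, which is what keeps the Critic's regression targets stable during training.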
To enhance the learning process, the DDPG algorithm incorporates an experience replay buffer. This buffer stores past experiences, thereby enabling efficient training by reducing data correlation and preventing policy oscillation.
During training, small batches of experiences are sampled from the replay buffer, and the DDPG network parameters are updated based on the calculated loss and policy gradient. This iterative process improves the control strategy over time, thereby enabling the upper-level controller to adapt to different operating conditions and to optimize the subway train's performance.
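A minimal experience replay buffer of the kind described above can be sketched as follows. The class name, the fixed capacity with first-in-first-out eviction, and uniform sampling are assumptions typical of DDPG implementations rather than details given in the paper.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity buffer of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity, seed=None):
        self._data = deque(maxlen=capacity)   # oldest transitions are evicted first
        self._rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self._data.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Draw a mini-batch uniformly at random without replacement."""
        return self._rng.sample(list(self._data), batch_size)

    def __len__(self):
        return len(self._data)


if __name__ == "__main__":
    buf = ReplayBuffer(capacity=3, seed=0)
    for i in range(5):
        buf.push(i, 0.0, 0.0, i + 1, False)
    print(len(buf), buf.sample(2))
```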

D. LOWER-LEVEL CONTROLLER DESIGN
The lower-level controller plays a crucial role in adjusting the percentage of the traction/braking force based on the parameters received from the upper-level controller.
The PID controller employs three types of terms: proportional (P), integral (I), and derivative (D). The proportional term incorporates a suitable proportion of the error (the difference between the desired value and the controlled object's output) into the control output. The integral term monitors the error over time and corrects the output by reducing the accumulated offset of the error. The derivative term monitors the rate of change of the error, thus modifying the output in the presence of abnormal variations. By adjusting the parameters of the three terms, the desired performance is obtained from the process.
The incremental PID is widely used in industrial applications, as it overcomes the drawback of accumulating significant cumulative errors in the positional PID [19]. Therefore, the incremental PID control law is employed to design the lower-level controller, as shown in (15).
$$\Delta u(t) = k_p(t)\,[e(t) - e(t-1)] + k_i(t)\, e(t) + k_d(t)\,[e(t) - 2e(t-1) + e(t-2)], \qquad \hat{u}(t) = u(t-1) + \Delta u(t) \tag{15}$$
where t represents the discrete sampling time, and the coefficients $k_p(t)$, $k_i(t)$, and $k_d(t)$ correspond to the proportional, integral, and derivative parameters of the incremental PID controller at time t, respectively. $\Delta u(t)$ represents the increment value at time t, and the output value $\hat{u}(t)$ at time t is obtained by adding $\Delta u(t)$ to the previous control output u(t − 1). The terms e(t), e(t − 1), and e(t − 2) represent the system error at times t, t − 1, and t − 2, respectively, i.e., the difference between the reference speed $v_r$ and the tracking speed v.
To ensure that the final output u(t) of the PID controller stays within the desired range, we utilized the clip function. This function restricts the value of u(t) to the range of −1 to 1, which is expressed as
$$u(t) = \mathrm{clip}\big(\hat{u}(t),\, -1,\, 1\big) \tag{16}$$
As described above, the lower-level controller in the proposed approach employs the incremental PID and applies constraints on the output to accurately adjust the percentage of the traction/braking force based on the received control signals. This enables the subway train to respond effectively to continuously changing conditions and helps to accurately track the reference speed curve. Fig. 6 provides an overview of the design and integration of the lower-level controller in the proposed approach.
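The lower-level controller described above can be sketched as a small Python class. This is a minimal illustration of the textbook incremental PID form with output clipping; the class name, the zero initial errors, and the choice to store the clipped output as the previous control value are assumptions.

```python
class IncrementalPID:
    """Incremental PID whose gains are supplied anew at every step."""

    def __init__(self):
        self.e1 = 0.0       # e(t-1), assumed zero before the first step
        self.e2 = 0.0       # e(t-2), assumed zero before the first step
        self.u_prev = 0.0   # previous (clipped) control output u(t-1)

    def step(self, error, kp, ki, kd):
        """Return the traction/braking percentage in [-1, 1] for the current error."""
        du = (kp * (error - self.e1)
              + ki * error
              + kd * (error - 2.0 * self.e1 + self.e2))
        u = max(-1.0, min(1.0, self.u_prev + du))   # clip to the admissible range
        self.e2, self.e1, self.u_prev = self.e1, error, u
        return u


if __name__ == "__main__":
    pid = IncrementalPID()
    # error = reference speed - tracking speed; the gains would come from the upper level
    print(pid.step(1.0, kp=0.5, ki=0.1, kd=0.0))
    print(pid.step(1.0, kp=0.5, ki=0.1, kd=0.0))
```

Because the gains are arguments of `step`, the upper-level controller can change them at every sampling instant without resetting the controller state.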

E. REWARD FUNCTION DESIGN
The design of the reward function is crucial for incentivizing desired behaviors and penalizing undesirable ones. In this paper, the reward function is defined as a weighted combination of individual reward components that correspond to each control objective. The overall reward is represented as R, as shown in (17).
$$R = \sum_{i=1}^{5} w_i R_i \tag{17}$$
where $R_i$ denotes the i-th reward component and the weights $w_i$ (i = 1, 2, 3, 4, 5) determine the relative importance of each reward component. The individual reward components are defined as follows.
1) Safety Reward ($R_{Safety}$): This component encourages the train to operate below the speed limit to ensure safety. It imposes a penalty of $-C$ for exceeding the speed limit, where C is a large positive constant. It is worth noting that, in each simulation, the current iteration is terminated if the speed exceeds the speed limit.
where $v_r$ and $v$ represent the reference speed and tracking speed, respectively, and $p_r$ and $p$ represent the reference position and tracking position, respectively.
3) Parking Reward ($R_{Parking}$): This component evaluates the accuracy of parking at the target position. It provides a reward for precise stops within the required tolerance, and it penalizes large position errors. Typical requirements dictate that the error in the parking position should be within ±0.3 m. It is worth noting that this reward only takes effect when the simulation time reaches the pre-planned time. Otherwise, its value is 0. Furthermore, when the simulation time reaches the pre-planned time and the train's speed is not 0, the iteration is terminated and considered a failure.
4) Comfort Reward ($R_{Comfort}$): Passenger comfort is closely related to the rate of change of acceleration, called jerk. A lower jerk value indicates higher comfort. Incorporating this component helps achieve smooth acceleration and deceleration, thus enhancing passenger comfort.
5) Efficiency Reward ($R_{Efficiency}$): This component encourages energy-efficient operation by penalizing energy-consuming behavior, and it is based on the time integral of the product of train speed v(t) and traction force F(t).
Through the integration of multiple rewards, the proposed approach can learn to make decisions that balance multiple objectives, as well as optimize the operation of the subway train accordingly.
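The weighted combination of reward components can be sketched as follows. Only the weighted sum, the safety penalty constant, and the speed-times-force energy integral follow the description above; the zero safety reward below the speed limit, the rectangle-rule integration, and all function names are assumptions made for this illustration.

```python
def safety_reward(v, v_limit, c=100.0):
    """Return -C above the speed limit; 0 otherwise (the 0 branch is an assumption)."""
    return -c if v > v_limit else 0.0


def traction_energy(speeds, forces, dt):
    """Approximate the integral of v(t) * F(t) dt, counting traction (F > 0) only."""
    return sum(v * max(f, 0.0) * dt for v, f in zip(speeds, forces))


def total_reward(components, weights):
    """Weighted sum of the individual reward components."""
    return sum(w * r for w, r in zip(weights, components))


if __name__ == "__main__":
    weights = [0.2] * 5                       # equal weights for five components
    # Hypothetical component values: safety, tracking, parking, comfort, efficiency
    components = [safety_reward(23.0, 22.2), 0.0, 0.0, 0.0, 0.0]
    print(total_reward(components, weights))
    print(traction_energy([10.0, 10.0], [100.0, -50.0], 0.1))
```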

F. ALGORITHM STATEMENT
In this section, we present the algorithm of the entire process, as shown in Algorithm 1. The proposed approach first initializes the parameters of the DDPG. It then iterates in a loop that interacts with the environment. During each iteration, the current state of the subway train is obtained, and an appropriate action is chosen based on the current policy determined by the DDPG controller. This action parameterizes the PID controller, which, in turn, determines the traction/braking percentage of the train based on the current state. The environment executes the train dynamics based on this traction/braking percentage, and the resulting next state, reward, and termination signal are observed. These transition experiences are stored in a replay buffer for experience replay. Then, the network parameters are updated using a batch sampled from the replay buffer, from which the loss and policy gradient are calculated.
Algorithm 1
  Initialize the DDPG online and target network parameters and the replay buffer
  for each episode do
    Receive the initial current state $s_0$ of the subway train
    t ← 0
    while not done do
      Select action $a_t$ according to the current policy and exploration noise
      Parameterize the PID controller with $a_t$ and compute the traction/braking percentage $u_t$
      Execute $u_t$ in the environment; observe the next state, reward, and termination signal
      Store the transition in the replay buffer
      Sample a batch $T_b$ from the replay buffer
      Calculate the policy gradient $\nabla_{\theta^{\mu}} J$ and loss L based on $T_b$
      Update actor online network parameters using (11)
      Update critic online network parameters using (12)
      Update target networks using (14)
      t ← t + 1
    end while
  end for
The integration of DDPG and PID controllers achieves adaptability and flexibility in the control process. The DDPG controller learns from experience and provides guidance to the PID controller, thus enabling it to adjust its parameters based on different environmental conditions. This adaptive interaction between controllers and the environment contributes to the continuous improvement and optimization of subway train operations.

IV. NUMERICAL EXPERIMENTS
In this section, we establish a train operation simulation environment to conduct experiments to validate the effectiveness of the proposed approach and compare its performance with other popular control methods. Additionally, various reference speeds and resistance parameters are adopted to explore the adaptability of the proposed approach.

A. SIMULATION SETUP
We simulated a subway line with a total length of 1280 m and a speed limit of 80 km/h. The total weight of the train was set to 300 tons. The parameters of the Davis equation were set as $D_1$ = 0.6841, $D_2$ = 0.0229, and $D_3$ = 0.000345.
The characteristics of the traction and braking forces varying with speed were modeled using (23) and (24), respectively, where $F_f(v)$ and $F_b(v)$ represent the train's traction force and braking force in kN, respectively, and v denotes the train's running speed in km/h.
For comparison, we considered three other control methods: the PID controller, F-PID controller [8], [9], and NN-PID controller [12]. The F-PID controller uses fuzzy logic and rules to optimize the PID parameters in real time. The NN-PID controller is an adaptive controller that combines neural networks with the PID algorithm for train operation control. In the NN-PID controller, PID parameters are controlled by a neural network, and the squared error between the reference speed and the tracking speed is used as a loss function for supervised training.
During the experiments, we assigned a weight of w i = 0.2 to each reward component, and the constant C in R safety was defined as 100.Additionally, the upper limits for the PID parameters were set as k

B. MAIN EXPERIMENTS
In this section, we present the experiments conducted to evaluate the performance and effectiveness of the proposed approach. These experiments compare the proposed approach, named the DDPG-PID controller, with other control methods in subway train operation, namely the PID controller, F-PID controller, and NN-PID controller. Figs. 7 and 8 show the speed tracking error and position tracking error, respectively. Table 1 provides additional statistics of the tracking performance.
From Fig. 7, it can be observed that, in terms of speed tracking at around 18 s (although all methods deviate from the reference speed), the DDPG-PID controller produced a smaller error compared to the other methods. At around 23 s, while all methods exhibited overshoot, the DDPG-PID controller demonstrated the least overshoot. It was also evident that inaccurate speed tracking often occurs during acceleration and deceleration periods, while the tracking performance was found to be excellent during stable operation. Fig. 8 reveals that the pattern of position tracking errors was similar to that of speed tracking errors. At around 20 s, all methods exhibited varying degrees of position tracking errors. However, the DDPG-PID controller proposed in this paper consistently maintained the best performance, with errors consistently below 0.4 m.
By combining the statistical data in Table 1, it is evident that the DDPG-PID controller outperforms other control methods in terms of tracking accuracy. Compared to the PID, F-PID, and NN-PID controllers, the DDPG-PID controller significantly reduced both the total error and maximum error in speed tracking. Specifically, when compared to the state-of-the-art NN-PID, the DDPG-PID reduces the total and maximum speed tracking error by 44.5% and 37%, respectively, and the total and maximum distance tracking error by 38.5% and 27.7%, respectively. Furthermore, from Table 1, it can be noted that, except for PID, all other methods almost perfectly achieved precise parking. The jerk indicator, which is calculated as the maximum acceleration rate over the time span of 2 s during the entire tracking process, reflected that the DDPG-PID controller can provide superior passenger comfort compared to other methods.
The above experimental results indicate that the DDPG-PID controller outperforms the other comparative methods. A potential reason is that the PID controller relies on fixed control parameters that are manually tuned for specific systems; it therefore lacks adaptability and learning capabilities, which leads to subpar tracking performance. The F-PID controller introduces some degree of adaptability by employing fuzzy rules to adjust PID parameters based on predefined conditions. Although it is more adaptable than the conventional PID controller, it still relies on rule-based methods, thus potentially failing to fully capture the complex dynamics of subway train systems. The NN-PID controller combines neural networks with the PID algorithm to adjust control parameters. However, it relies on supervised training, using the squared error between reference speed and tracking speed as the loss function. This approach may not effectively balance multiple control objectives, thus leading to suboptimal performance.
In contrast, the DDPG-PID controller has an adaptive nature that is achieved through the fusion of reinforcement learning and the PID control mechanism. This approach enables the controller to learn from previous experiences and to optimize the control strategy via the signals received from the subway train, thus facilitating precise adjustments to the traction/braking force distribution. Furthermore, this controller operates independently of predefined rules or training datasets, thereby allowing it to capture the complex dynamics of train operations and to significantly enhance the adaptability of the underlying PID control mechanism.

C. EXPERIMENTS ON DIFFERENT RESISTANCE PARAMETERS
In this section, we investigate the performance of the DDPG-PID controller under different resistance conditions. Resistance is a critical factor that affects subway train operation and control. By testing the controller under different resistance scenarios without retraining the model, we can assess its adaptability to varying resistance conditions.
To simulate different resistance conditions, we adjusted the coefficients (D_1, D_2, and D_3) in the Davis equation; the resulting statistics are given in Table 3. In Scenario 1, representing the baseline resistance parameters, the DDPG-PID controller achieves excellent speed and position tracking accuracy, as indicated by the low total and maximum tracking errors. This is because the model was trained using Scenario 1's parameters and was then tested in Scenarios 1, 2, and 3.
In Scenarios 2 and 3, where the resistance parameters are increased by 50% and decreased by 50%, respectively, the DDPG-PID controller exhibits slightly higher speed and position tracking errors than in Scenario 1. This is because the model was not trained under these resistance parameters. Despite the higher tracking errors, the controller achieves precise parking in almost all cases. Table 3 also shows that Scenario 2 has the highest energy consumption, which can be attributed to the increased resistance: the controller must exert more effort to maintain the desired speed and position. Regarding the jerk indicator, there is little difference among the scenarios, indicating that all three ensure passenger comfort.
These experimental results demonstrate that the DDPG-PID controller dynamically adjusts its control actions under different resistance conditions without the need to retrain the model, thus significantly reducing tuning costs. This reflects the good adaptability of the DDPG-PID controller and further establishes its potential for practical application in subway train operation control.
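For concreteness, the three resistance scenarios can be sketched with the usual quadratic form of the Davis equation, D_1 + D_2 v + D_3 v^2. The baseline coefficients are the Scenario 1 values reported in the paper; scaling all three coefficients uniformly by 1.5 and 0.5 is an assumption based on the description above (the exact values are those in Table 2), and the units of speed and resistance are left unspecified here because they depend on the paper's train model.

```python
# Scenario 1 coefficients (D1, D2, D3) as reported in the paper.
BASE = (0.6841, 0.0229, 0.000345)


def davis_resistance(v, coeffs):
    """Basic running resistance D1 + D2*v + D3*v**2 at speed v."""
    d1, d2, d3 = coeffs
    return d1 + d2 * v + d3 * v * v


def scale(coeffs, factor):
    """Scale every Davis coefficient by the same factor (an assumption)."""
    return tuple(c * factor for c in coeffs)


SCENARIOS = {
    1: BASE,               # baseline, used for training
    2: scale(BASE, 1.5),   # resistance increased by 50%
    3: scale(BASE, 0.5),   # resistance decreased by 50%
}

if __name__ == "__main__":
    for name, coeffs in SCENARIOS.items():
        print(name, davis_resistance(10.0, coeffs))
```

Because the scaling is uniform in this sketch, the resistance in Scenarios 2 and 3 is simply 1.5 and 0.5 times the baseline at every speed, which is consistent with Scenario 2 showing the highest energy consumption.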

D. EXPERIMENTS ON DIFFERENT REFERENCE SPEEDS
In this section, we investigate the performance of the DDPG-PID controller under different reference speed scenarios. By examining the control performance under different reference speeds, we can assess the controller's ability to handle varying operational requirements and to accurately track the desired speed profile. To evaluate the controller's performance, we constructed two additional reference speed profiles, reference-fast (Scenario 4) and reference-slow (Scenario 5). These profiles were derived from the original reference speed profile (Scenario 1) used in the previous experiments by changing the running time, which raises the reference speed in Scenario 4 and lowers it in Scenario 5. Fig. 10 displays the tracking curves of the different scenarios, and the statistics are summarized in Table 4. Because the total running time differs among the reference speed profiles, Fig. 10 plots the speed-position curves rather than the speed-time curves. For the same reason, Table 4 reports the average error in place of the total error.
In Scenario 4, where the reference speed is higher than in Scenario 1, the DDPG-PID controller exhibits slightly higher tracking errors. The increased speed makes the control task more challenging, resulting in larger deviations in speed and position tracking. Correspondingly, the energy consumption increases. Although Scenario 4 exhibits a relatively high error compared to the other scenarios, the error remains within an acceptable range. In Scenario 5, which has a lower reference speed, the DDPG-PID controller demonstrates excellent accuracy in speed and position tracking. The reduced speed allows the controller to make precise adjustments, leading to significantly lower tracking errors than in the other scenarios.
The experimental results demonstrate that the DDPG-PID controller can generate control parameters under different reference speed profiles without retraining the model. This allows for accurate speed tracking and avoids the resources that would otherwise be spent on retraining the model or re-tuning the parameters whenever the reference speed profile changes because travel plans are re-formulated. These results highlight the adaptability of the DDPG-PID controller in tracking the desired speed profile with small deviation, which contributes to the optimization of subway train operation.
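The paper does not state how the reference-fast and reference-slow profiles were generated. One simple way to obtain such profiles, shown here purely as an assumption, is to stretch or compress the original profile in time while dividing the speed by the same factor, which leaves the inter-station distance unchanged. The sketch also shows why an average error is the fairer statistic when runs have different numbers of samples. All function names are hypothetical.

```python
def rescale_profile(speeds, dt, time_factor):
    """Stretch (factor > 1) or compress (factor < 1) a speed profile in time.

    Speeds are divided by the same factor so the distance covered is kept.
    Requires at least two samples.
    """
    duration = (len(speeds) - 1) * dt
    n_new = round(duration * time_factor / dt) + 1
    out = []
    for j in range(n_new):
        t_orig = j * dt / time_factor              # matching time on the original
        i = min(int(t_orig / dt), len(speeds) - 2)
        frac = t_orig / dt - i
        v = speeds[i] + frac * (speeds[i + 1] - speeds[i])  # linear interpolation
        out.append(v / time_factor)
    return out


def distance(speeds, dt):
    """Trapezoidal integral of speed over time."""
    return sum((a + b) * dt / 2 for a, b in zip(speeds, speeds[1:]))


def average_abs_error(ref, actual):
    """Mean absolute error; comparable across runs of different length."""
    return sum(abs(r - a) for r, a in zip(ref, actual)) / len(ref)


if __name__ == "__main__":
    base = [0, 2, 4, 6, 8, 10, 8, 6, 4, 2, 0]      # toy profile, dt = 1 s
    slow = rescale_profile(base, dt=1.0, time_factor=2.0)
    print(len(base), len(slow), distance(base, 1.0), distance(slow, 1.0))
```

A total error summed over samples would grow with the running time even if the per-sample deviation stayed the same, so dividing by the number of samples removes that bias when comparing Scenarios 1, 4, and 5.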

V. CONCLUSION
In this paper, an adaptive control approach for optimizing subway train operation is proposed. The approach uses a hierarchical structure consisting of an upper-level DDPG controller and a lower-level PID controller, which work together to improve the efficiency and performance of train operation. Comparative experiments with other control methods demonstrate the superiority of the proposed approach. Moreover, experiments with varying resistance parameters and reference speed profiles establish its adaptability.
However, it is worth noting that the proposed approach relies on a reference speed profile, which, to some extent, limits the flexibility of train operation. In future work, we intend to develop techniques for the online control of trains without the need for predefined speed profiles. Another area of future research is the coordination of multiple trains during operation. While our approach has shown promising results in optimizing the operation of an individual train, methods for coordinating the movements and interactions of multiple trains remain to be explored.

FIGURE 1. Speed-position/time curve and the applied train forces. Subway trains A and B simulate the force distribution during the traction and braking processes, respectively.

FIGURE 2. The traction and braking characteristic curves of the train.

FIGURE 3. Structure of the proposed approach.

FIGURE 6. Design and integration of the lower-level controller.

2) Tracking Reward (R_Tracking): This component measures the deviation of the train speed and position from the reference values. It aims to provide a positive reward for accurate tracking.

= 1.5. The initial PID parameters were specified as k_p = 3.4, k_i = 0.3, and k_d = 0.2, with a sampling interval of 0.2 seconds.
The baseline resistance parameters (Scenario 1) were set as D_1 = a_base = 0.6841, D_2 = b_base = 0.0229, and D_3 = c_base = 0.000345. Two additional resistance scenarios, denoted as Scenario 2 and Scenario 3, were created based on the baseline parameters. The adjusted resistance coefficients are shown in Table 2. The speed tracking curves under different resistance scenarios are shown in Fig. 9. The statistical data are presented in Table 3.

FIGURE 9. Speed tracking curves under different resistance scenarios.

TABLE 1. Statistics of the tracking performance.

TABLE 2. Resistance coefficients of the different scenarios.

TABLE 3. Statistics of the different resistance scenarios.

TABLE 4. Statistics of the different reference speed scenarios.

FIGURE 10. Speed tracking curves under different reference speed scenarios.