Optimal Control for Aluminum Electrolysis Process Using Adaptive Dynamic Programming

Optimal control of the aluminum electrolysis production process (AEPP) has long been a challenging industrial problem because an accurate dynamic model is inherently difficult to establish. In this paper, a novel robust optimal control algorithm based on adaptive dynamic programming (ADP) is proposed for the AEPP, where the system is subject to input constraints. First, to establish an accurate dynamic model of the AEPP system, a recursive neural network (RNN) is employed to reconstruct the system dynamics from input-output production data. To ensure that the control inputs do not exceed the actuator bounds, the optimal control problem of the AEPP is formulated with a new nonquadratic performance index function. Then, considering the perturbation of the AEPP, the robust control problem is converted into a constrained optimal control problem via system transformation. Furthermore, a single critic network framework is developed to obtain an approximate solution of the Hamilton-Jacobi-Bellman (HJB) equation. Finally, the proposed ADP controller is applied to the AEPP system to validate its effectiveness and performance.


I. INTRODUCTION
The production process of the aluminum electrolytic industry is a strongly coupled, dynamic nonlinear process. The AEPP mainly depends on the electrolytic reaction in the electrolytic cell, and the quality of this reaction determines the quality of the aluminum product. In actual production, the AEPP cannot produce aluminum with high efficiency and high quality because of the many problems that exist in the process. For example, the reaction in the electrolytic cell is affected by many factors, such as the temperature and the direct current (DC) voltage of the electrolytic cell. Hence, it is difficult to control the reaction process effectively. In the actual AEPP, engineers often adjust the control parameters based on equipment operating status and personal experience to meet the control requirements of aluminum electrolysis. However, due to the limitations of this method and of manual operation, the product quality often fails to meet the requirements. Given the complexity of the AEPP, establishing an effective model and achieving efficient control of the AEPP has great academic and engineering value.
The associate editor coordinating the review of this manuscript and approving it for publication was Ton Duc Do.
In the AEPP, a large amount of online and offline data is generated and stored, such as online detection data, offline analysis data, and operation statistics. These production data contain rich information about system operation [1]. Because it is difficult to establish an accurate mechanism model of aluminum electrolysis, the use of data-driven control theory to solve the optimization and control of complex nonlinear aluminum electrolysis systems has become a hot topic in aluminum electrolysis optimization control. In [1], an intelligently optimized aluminum electrolytic manufacturing system was proposed for the complex AEPP. In [2], a multi-objective bacterial foraging algorithm was proposed, which can maximize current efficiency and reduce resource consumption. In [3], an event-triggered globalized dual heuristic programming control strategy was proposed and applied to the optimization of aluminum electrolytic production. Therefore, data-driven methods can not only identify production processes whose aluminum electrolytic system models are unknown, but can also use online and offline data to model and control the relationship between output variables and process variables [4]-[7].
Adaptive dynamic programming (ADP), a typical data-driven control method, was proposed to approximately solve nonlinear optimal control problems [5]-[7]. Based on offline and online data, ADP uses nonlinear function approximation to approximate the performance index of dynamic programming [8]-[12]. ADP is a powerful tool for solving the HJB equation and overcoming the ''curse of dimensionality''. In recent years, ADP has been used more and more widely in industrial systems. However, it is difficult to establish an accurate mathematical model for a complex and uncertain nonlinear production process. Hence, many data-driven ADP control methods have been presented, where offline or online input and output data are used directly in place of model knowledge. In [13], a data-driven method was used to establish a recursive neural network model of the slag powder production process; a tracking controller with control constraints was designed and applied to that process. In [14], a novel ADP method based on adaptive reinforcement learning was proposed for unknown nonlinear systems with input constraints, and simulation results verified the effectiveness of the algorithm. In [15], a robust adaptive control algorithm based on reinforcement learning was proposed, which transformed the robust problem into an optimal control problem with constraints and guaranteed the stability of the nonlinear system. In [16], a data-driven robust approximate optimal tracking control scheme was proposed, where an unknown nonlinear system model was reconstructed and an approximate optimal tracking controller was designed using the ADP method. In [17], a data-driven adaptive dynamic programming method was proposed for a class of continuous-time nonlinear systems, and a multivariable tracking scheme was designed and demonstrated in simulation.
In [18], a novel data-driven neuro-optimal tracking control algorithm was proposed for unknown nonlinear systems. The proposed ADP controller was applied to a continuous stirred reactor system to verify its effectiveness and performance. In [19], an event-triggered approximate optimal control structure was proposed for a nonlinear continuous-time system with control constraints. The work in [20] addressed the challenging industrial problem of natural gas desulfurization control and proposed an improved unscented Kalman filter assisted ADP method to solve the optimal control problem of the desulfurization system. Hence, data-driven adaptive control methods, which can accurately identify complex systems and achieve optimal control, are widely used in actual industrial systems [21].
In this paper, a novel ADP control algorithm is proposed for the AEPP system with input constraints. First, a recursive neural network is used to establish an accurate model of the AEPP. Then, a robust ADP algorithm with control constraints is proposed to obtain the optimal control law. The robust control problem is converted into a constrained optimal control problem. Furthermore, a single critic network framework is developed to obtain an approximate solution of the HJB equation. Experimental results show the effectiveness of the proposed algorithm. The major contributions of this paper include the following.
1) A novel robust optimal control algorithm is developed for the AEPP system, where only one critic NN is employed. Hence, the computational complexity is reduced.
2) This paper extends the work of [4] and [14] to develop an optimal controller for the AEPP system with input constraints. Specifically, a new non-quadratic performance index function is developed for the AEPP, which ensures that the optimal control law does not exceed the actuator bounds.
3) A typical industrial production process, the AEPP, is used to verify the effectiveness of the proposed method.
The rest of this paper is organized as follows. In Section II, the optimal control problem is formulated for the nonlinear AEPP. In Section III, the dynamics of the unknown nonlinear AEPP are reconstructed by an RNN. Section IV develops the robust optimal control scheme in detail. Section V constructs a single critic network to approximately solve the HJB equation. In Section VI, the proposed algorithm is applied to the AEPP, and experimental results are discussed. Finally, the conclusion is given in Section VII.

II. PROBLEM FORMULATION

A. AEPP DESCRIPTIONS
The AEPP is a complex reaction process. First, alumina and cryolite are melted in the electrolytic cell. Then, physical and chemical reactions occur when high-voltage direct current is applied to the electrolytic cell. Finally, the liquid aluminum produced is extracted, clarified, and cast into aluminum ingots. As shown in Fig. 1, the raw materials of the AEPP system include alumina, carbon anode, cryolite, and fluoride salt [22]. Among them, alumina is the key raw material. The reaction process is roughly as follows. Cryolite and fluoride together form an electrolyte melt. Cryolite is a good conductive melt, while the function of fluoride is to improve the molecular ratio of the electrolyte and reduce the primary crystal temperature. The aluminum electrolytic cell uses a carbon anode and a carbon cathode, and alumina is dissolved in the cryolite melt. Then, DC voltage is applied to the electrolytic cell. When the temperature of the cell reaches about 960 °C, the electrolytic reaction occurs: liquid aluminum is obtained at the cathode and anode gas is generated at the anode. The reaction process is affected by the following factors: the temperature of the electrolytic cell, the DC voltage, the concentration of alumina, the distance between the anode and the surface of the liquid aluminum, the molecular ratio between sodium fluoride and aluminum fluoride, etc.

B. CONTROL SYSTEM DESCRIPTION
The key objective of the aluminum electrolytic production system is to ensure the quality of the liquid aluminum while ensuring safe operation of the system and reducing production accidents. The reaction process is mainly affected by the temperature and the DC voltage of the electrolytic cell.

1) TEMPERATURE OF THE ELECTROLYTIC CELL
The temperature of the electrolytic cell, which reflects the current efficiency of the cell, is an important process parameter. In the real AEPP, when the electrolytic cell temperature falls by 10 °C, the current efficiency can increase by 1%-1.5%. However, this does not mean that a lower temperature always gives better efficiency. If the temperature of the electrolytic cell is too low, it easily causes a cold cell or a sick cell. Therefore, to save energy, the temperature of the electrolytic cell should be kept within an optimal range while guaranteeing normal operation of the cell. In the real AEPP, the temperature range of the electrolytic cell is 940-960 °C.

2) DC VOLTAGE OF THE ELECTROLYTIC CELL
The DC voltage of the electrolytic cell in the AEPP is also an important indicator of the reaction efficiency of the cell. The DC (current) efficiency and the DC power consumption both reflect the DC voltage [23].
The current efficiency is defined as the ratio of the aluminum production per unit time to the theoretical aluminum production calculated according to Faraday's law, that is
$$CE = \frac{Al_1}{Al_2} \times 100\%, \tag{1}$$
where $Al_1$ and $Al_2$ are the actual and the theoretical aluminum production, respectively. $Al_2 = 0.3356 \times I t$, where 0.3356 kg/(kA·h) is the electrochemical equivalent of aluminum, $I$ is the current, and $t$ is the time.
In the AEPP, loss current is inevitable, and reducing the loss current improves the current efficiency. The DC power consumption directly reflects the energy consumed to produce one ton of aluminum, and it is also an important parameter for evaluating the technical level of the AEPP. The relationship among DC power consumption, cell voltage and current efficiency is
$$p = \frac{2980\, U}{CE}, \tag{2}$$
where $p$ is the DC power consumption per ton of aluminum in kW·h/t-Al, $U$ is the working voltage of the electrolytic cell in V, and $CE$ is the current efficiency. The constant follows from the electrochemical equivalent, since $1000/0.3356 \approx 2980$. It can be seen that the DC power consumption is inversely proportional to the current efficiency: the higher the current efficiency, the lower the DC power consumption. In addition, when the cell voltage decreases, the DC power consumption also decreases. Shortening the pole distance can reduce the cell voltage and thus the DC power consumption, but shortening it excessively reduces the current efficiency and increases the DC power consumption. Hence, the voltage of the electrolytic cell cannot be too low; it is generally set at about 4-4.5 V.
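The two indicators above can be evaluated with a few lines of Python. This is a minimal sketch, not part of the original paper: the function names and the operating point (a 300 kA cell, 24 h, 2250 kg of tapped aluminum, 4.2 V) are hypothetical, and the only constant taken from the text is the electrochemical equivalent 0.3356 kg/(kA·h). The power consumption is computed as the cell energy divided by the aluminum actually produced, which is where the factor 1000/0.3356 comes from.

```python
# Electrochemical equivalent of aluminum quoted in the text, in kg per kA*h.
ELECTROCHEMICAL_EQUIVALENT = 0.3356


def current_efficiency(actual_al_kg, current_ka, hours):
    """Ratio of actual to theoretical (Faraday) aluminum production, as a fraction."""
    theoretical_al_kg = ELECTROCHEMICAL_EQUIVALENT * current_ka * hours
    return actual_al_kg / theoretical_al_kg


def dc_power_consumption(cell_voltage_v, ce):
    """DC energy per tonne of aluminum in kW*h/t, with ce given as a fraction."""
    return 1000.0 * cell_voltage_v / (ELECTROCHEMICAL_EQUIVALENT * ce)


# Hypothetical operating point, chosen only for illustration.
ce = current_efficiency(actual_al_kg=2250.0, current_ka=300.0, hours=24.0)
p = dc_power_consumption(cell_voltage_v=4.2, ce=ce)
print(f"CE = {ce:.4f}, p = {p:.1f} kW*h/t-Al")
```

For this operating point the energy drawn is 4.2 V × 300 kA × 24 h = 30240 kW·h for 2.25 t of aluminum, so the sketch gives 13440 kW·h/t-Al; raising the efficiency at the same voltage lowers that figure.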

3) DYNAMIC DESCRIPTION OF AEPP
In the real AEPP, the reaction process of the electrolytic cell is mainly related to the following parameters:
• The main controlled variables, such as the working cell voltage $V_1$ and the working temperature $C$ of the electrolytic cell.
• The main control variables, such as the number of feeds $u$, the molecular ratio, DC power consumption, aluminum output and current efficiency during the reaction.
In this paper, the number of feeds $u$ is taken as the control input. Hence, the AEPP control system can be described by (3), where $u = [u]^T$, $x = [x_1, x_2]^T$, $\theta$ is a constant, $x_1$ represents the working cell voltage, $x_2$ represents the working temperature of the electrolytic cell, and $u$ represents the number of feeds.
In the real AEPP, the actuator is limited by its own physical constraints. Hence, the control action $u(t)$ should be limited to a specified range:
$$T_{\min} \le u(t) \le T_{\max}, \tag{4}$$
where $T_{\min}$ and $T_{\max}$ are the minimum and maximum values of $u(t)$, respectively. The reaction in the electrolytic cell is a complex process. To keep the reaction running smoothly, the temperature and the voltage of the electrolytic cell need to be controlled within the specified range. Therefore, the control target is to design an optimal control law $u^*(t)$ that makes the system states track the desired temperature and voltage values.
However, it is difficult to obtain an optimal control law from system (3), because the system function is unknown in the real AEPP. Meanwhile, the desired control corresponding to the desired tracking state is also not easy to obtain from the unknown system. Therefore, we present a new optimal tracking control method to obtain the controller for the AEPP.

III. DATA-BASED MODELING FOR AEPP
In the real AEPP, the production system is a complicated nonlinear process, and it is difficult to establish an accurate mathematical model of it. Hence, a large amount of AEPP production data is used to reconstruct the system model with an RNN. On this basis, considering the asymmetric control constraints of the AEPP, an optimal control strategy is designed to improve the production quality.
Based on input and output data, an RNN is used to identify the system dynamics. The system is formulated as the RNN model (5), where $x \in \mathbb{R}^n$ is the system state, $u \in \mathbb{R}^m$ is the control law, $W_1$, $W_2$, $W_3$ are the ideal RNN weight matrices, and $\varepsilon(t)$ is the bounded model reconstruction error. $f(\cdot)$ is the activation function, and $\mu(\cdot)$ is a monotonically increasing function that, for any $x, y \in \mathbb{R}$, satisfies a bound with a positive constant $k$. In this paper, $\mu(x) = \tanh(x)$ and $f(x) = \tanh(x)$. According to (5), a data-driven model (6) can be reconstructed, where $\hat{x}(t)$ is the estimate of the system state vector, and $\hat{W}_1$, $\hat{W}_2$, $\hat{W}_3$ are the estimates of the ideal weights $W_1$, $W_2$, $W_3$, respectively. The term $v(t)$ in (6) is defined in terms of the model state error $e_m(t) = x(t) - \hat{x}(t)$ and an adjustment parameter $\eta$. Combining (5) and (6), the dynamic equation of the model error can be derived. Considering this error dynamic equation, the network weight matrices and adjustment parameters of the data-driven model (6) are updated according to a learning law whose gain matrices, indexed by $i = 1, 2, 3$, are positive definite. The model identification error converges asymptotically, $\lim_{t\to\infty} e_m(t) = 0$, and $\hat{W}_1(t)$, $\hat{W}_2(t)$, $\hat{W}_3(t)$ approach the ideal matrices $W_1$, $W_2$, $W_3$, respectively. Therefore, using a large amount of offline data and after a sufficiently long identification time, the nonlinear system can be expressed as (12).
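To make the structure of such an identifier concrete, the following Python sketch rolls out a recurrent state equation with forward Euler. It is only an illustration under stated assumptions: the state equation $\dot{\hat{x}} = W_1 \hat{x} + W_2 \tanh(\hat{x}) + W_3 \tanh(u)$ is an assumed form consistent with the symbols named in the text (three weight matrices and tanh for both $f$ and $\mu$), the robustifying term $v(t)$ and the weight learning law are omitted, and all numerical weights, the step size and the helper names are hypothetical.

```python
import math


def mat_vec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def rnn_rhs(x_hat, u, w1, w2, w3):
    """Assumed model: dx/dt = W1 x + W2 tanh(x) + W3 tanh(u)."""
    linear = mat_vec(w1, x_hat)
    recurrent = mat_vec(w2, [math.tanh(v) for v in x_hat])
    control = mat_vec(w3, [math.tanh(v) for v in u])
    return [a + b + c for a, b, c in zip(linear, recurrent, control)]


def simulate(x0, inputs, w1, w2, w3, dt=0.01):
    """Forward-Euler rollout; returns the list of visited states."""
    states = [list(x0)]
    for u in inputs:
        dx = rnn_rhs(states[-1], u, w1, w2, w3)
        states.append([x + dt * d for x, d in zip(states[-1], dx)])
    return states


# Hypothetical weights for a 2-state (voltage, temperature), 1-input (feeds) model.
W1 = [[-1.0, 0.0], [0.0, -1.0]]
W2 = [[0.1, -0.2], [0.2, 0.1]]
W3 = [[0.5], [0.3]]
trajectory = simulate([1.0, -1.0], [[0.2]] * 500, W1, W2, W3)
print(trajectory[-1])
```

In an actual identification run the weights would be adapted online from the model state error rather than fixed, and the states would be scaled process measurements instead of the illustrative initial condition used here.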

IV. OPTIMAL CONTROL SCHEME BASED ON ADP
For the RNN model, a special index function is used to handle the asymmetric bounded-input problem, and a critic network is employed to approximate the index function. On this basis, an adaptive robust controller is developed that satisfies the control constraints [24].

A. HJB EQUATION FOR AEPP
According to (12), the aluminum electrolytic production system model with the perturbation term can be expressed as (13), where $u \in \{u \mid u \in \mathbb{R}^m,\ T_{\min} \le u(t) \le T_{\max}\}$ is the control input satisfying the constraints. We assume that $\hat{u}(t)$ is the expected control corresponding to the expected state $\hat{x}(t)$. The control error is defined as
$$u_e(t) = u(t) - \hat{u}(t), \tag{14}$$
where $T_{\min} - \hat{u}(t) \le u_e(t) \le T_{\max} - \hat{u}(t)$.
For a constrained optimal control problem, the control goal is to find an optimal control law that satisfies the constraints and makes the system asymptotically stable. The performance index function is formed as (15), where $r(x, u) = x^T Q x + W(u_e)$, $Q$ is a positive definite matrix, and $W(u_e)$ is positive definite. In this paper, we choose a non-quadratic function $W(u_e)$ for the AEPP system, in which $a = (T_{\max} + T_{\min})/2 - \hat{u}(t)$, $k = (T_{\max} - T_{\min})/2$, $R = \mathrm{diag}\{r_1, r_2, \cdots, r_m\}$ with $r_i > 0$, $i = 1, \cdots, m$, $\psi \in \mathbb{R}^m$, and $\psi(\cdot)$ is a bounded function with $|\psi(\cdot)| \le 1$. It should be emphasized that $W(u_e)$ is positive definite since $\psi^{-1}(\cdot)$ is a monotonic odd function and $R$ is positive definite. In this paper, we choose $\psi(\cdot) = \tanh(\cdot)$ to guarantee that the control inputs are bounded. Differentiating (15) with respect to time $t$ gives the Lyapunov equation. The Hamiltonian is then defined for the control law $u(x)$ and the value function $V(x)$, and the optimal value function $V^*(x)$ is defined. The optimal value function can be obtained by solving the HJB equation (19). Supposing that the minimum exists, the optimal control $u^*(t)$ can be derived as (20). Combining (19) and (20), the equation of the nonlinear system can be written as (21). According to [25], [26], we have (22). Using (22), (21) can be rewritten as (23).
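The role of $a$, $k$ and $\psi(\cdot) = \tanh(\cdot)$ can be illustrated with a scalar Python sketch. It is written under assumptions, since the exact expression of $W(u_e)$ is not reproduced here: the cost below is the form commonly used for input-constrained ADP, $W(u_e) = 2 r k \int_a^{u_e} \tanh^{-1}((v - a)/k)\, dv$, evaluated in closed form, and the function names and numerical bounds are hypothetical. The sketch shows that a control error of the form $a + k\tanh(z)$ always stays inside $[T_{\min} - \hat{u},\ T_{\max} - \hat{u}]$ and that the cost is zero at the interval centre and positive elsewhere.

```python
import math


def bounded_control(z, t_min, t_max, u_hat=0.0):
    """Map an unconstrained signal z to a control error inside the shifted bounds."""
    a = (t_max + t_min) / 2.0 - u_hat  # centre of the admissible interval
    k = (t_max - t_min) / 2.0          # half-width of the admissible interval
    return a + k * math.tanh(z)


def nonquadratic_cost(u_e, t_min, t_max, r=1.0, u_hat=0.0):
    """Closed form of 2*r*k * integral from a to u_e of atanh((v - a)/k) dv.

    Assumed cost form; valid for u_e strictly inside the admissible interval.
    """
    a = (t_max + t_min) / 2.0 - u_hat
    k = (t_max - t_min) / 2.0
    s = u_e - a
    return 2.0 * r * k * s * math.atanh(s / k) + r * k * k * math.log(1.0 - (s / k) ** 2)


# Hypothetical actuator range for the number of feeds.
T_MIN, T_MAX = 1.0, 5.0
for z in (-3.0, 0.0, 0.4, 3.0):
    u_e = bounded_control(z, T_MIN, T_MAX)
    print(f"z = {z:+.1f}  u_e = {u_e:.4f}  W = {nonquadratic_cost(u_e, T_MIN, T_MAX):.4f}")
```

Because the cost grows steeply as $u_e$ approaches either bound, minimizing it together with $x^T Q x$ discourages saturation without a hard clipping step.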

B. PROBLEM TRANSFORMATION
In this section, Theorem 1 is used to prove that the robust control of system (13) can be obtained by finding the optimal control solution for the value function (15) of system (12) [27].
Assumption 2: $f(x) + g(x)u$ is Lipschitz continuous on a compact set $\Omega$ containing the origin, and system (13) is stable on the compact set $\Omega$. In addition, $f(0) = 0$.
Assumption 3: The control matrix $g(x)$ is known to be bounded. For each $x \in \Omega$, there exist constants $g_m$ and $g_M$ ($0 < g_m < g_M$) such that $g_m < g(x) < g_M$.
Theorem 1: Consider the nominal system (12) with the value function (15), and suppose Assumptions 1-3 hold. Then, the optimal control $u^*(t)$ designed in (20) guarantees that system (13) is stable in the sense of uniform ultimate boundedness (UUB).
Combining these two formulas, we can get an expression for $\dot{V}^*(x)$. According to (28) and (29), (27) can be rewritten as (30). According to the integral mean value theorem, we can get (31), where the value of $\theta_i$ is between 0 and $\tanh^{-1}(u_i^*/k)$. According to Assumption 3 and (29), we have (32). Combining (30) and (32), we obtain a bound on $\dot{V}^*(x)$ in which $\lambda_{\min}(Q)$ represents the minimum eigenvalue of the matrix $Q$; since $Q$ is positive definite, $\lambda_{\min}(Q) > 0$. Consequently, $\dot{V}^*(x) < 0$ as long as the state $x(t)$ is outside a compact set $\Omega_x$. This shows that $V^*(x)$ is a Lyapunov function for system (13) with the control $u^*$ whenever $x(t)$ lies outside the compact set $\Omega_x$. Therefore, the optimal control $u^*(t)$ developed in (20) ensures that the trajectory of system (13) is UUB.
According to Theorem 1, the robust control of system (13) can be obtained by solving the optimal control problem of system (12) with the value function (15); that is, the solution of the HJB equation (19) is needed. However, (19) is a nonlinear partial differential equation in $V(x)$, which is difficult to solve analytically. In the next section, an optimal control scheme based on a neural network is developed to overcome this difficulty.

V. CONTROLLER IMPLEMENTATION BASED ON NEURAL NETWORK
The key of the proposed ADP algorithm is to obtain the optimal value function and the optimal control law. In this paper, we use a critic neural network to approximate the value function. According to the general approximation property of neural networks, the optimal value function can be represented as
$$V^*(x) = W_c^T \varphi(x) + \varepsilon(x),$$
where $W_c \in \mathbb{R}^{N_0}$ is the ideal neural network weight, $\varphi(x) = [\varphi_1(x), \ldots, \varphi_{N_0}(x)]^T \in \mathbb{R}^{N_0}$ is the activation function, $N_0$ is the number of neurons, and $\varepsilon(x)$ is the function reconstruction error of the neural network.
Hence, by using the mean value theorem, the optimal control law can be rewritten as (36), where $\varepsilon_{u^*} = -\tfrac{1}{2}\left(1 - \tanh^2(\xi)\right) g^T \nabla\varepsilon$ and $\xi \in \mathbb{R}^m$ lies between $\zeta_1(x)$ and $A(x)$.
Because the ideal neural network weight is unknown, (36) cannot be calculated in the actual control process.
VOLUME 8, 2020
Therefore, we choose a critic network to approximate the value function, which can then be represented as
$$\hat{V}(x) = \hat{W}_c^T \varphi(x), \tag{37}$$
where $\hat{W}_c$ is the estimate of $W_c$. The weight error is defined as $\tilde{W}_c = W_c - \hat{W}_c$, and the estimate of the optimal control obtained from (37) is given by (38). According to (17), (36) and (38), the approximate Hamiltonian can be represented as (39). From (35) and (39), we can get (40), where $\Upsilon(\zeta_{\alpha i}) = \ln\left[1 - \tanh^2(\zeta_{\alpha i}(x))\right]$, $\alpha = 1, 2$. For all $\zeta_{\alpha i}(x) \in \mathbb{R}$, $\Upsilon(\zeta_{\alpha i})$ can be expressed as
$$\Upsilon(\zeta_{\alpha i}) = -2\ln\left[1 + \exp\left(-2\zeta_{\alpha i}(x)\,\mathrm{sgn}(\zeta_{\alpha i}(x))\right)\right] - 2\zeta_{\alpha i}(x)\,\mathrm{sgn}(\zeta_{\alpha i}(x)) + \ln 4,$$
where $\mathrm{sgn}(\zeta_{\alpha i}(x)) \in \mathbb{R}$ is the sign function. Hence, combining (40) and (42), we obtain the residual error $e$. To make $e$ as small as possible, we choose $\hat{W}_c$ to minimize the squared residual error $E = (1/2) e^T e$.
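The sign-function expression for the logarithmic term above is an exact rewriting of $\ln[1 - \tanh^2(\zeta)]$, and it is also the numerically safer form. The short Python check below is an illustrative sketch with hypothetical function names; it compares the two forms at moderate arguments and evaluates the rewritten form at a large argument, where $\tanh$ rounds to 1 in double precision and the direct form can no longer be evaluated.

```python
import math


def log_one_minus_tanh_sq_naive(zeta):
    """Direct evaluation of ln(1 - tanh(zeta)^2); fails once tanh rounds to 1."""
    return math.log(1.0 - math.tanh(zeta) ** 2)


def log_one_minus_tanh_sq_stable(zeta):
    """ln 4 - 2*zeta*sgn(zeta) - 2*ln(1 + exp(-2*zeta*sgn(zeta)))."""
    magnitude = abs(zeta)  # equals zeta * sgn(zeta)
    return math.log(4.0) - 2.0 * magnitude - 2.0 * math.log1p(math.exp(-2.0 * magnitude))


for zeta in (-3.0, -0.5, 0.0, 0.7, 2.5):
    gap = abs(log_one_minus_tanh_sq_naive(zeta) - log_one_minus_tanh_sq_stable(zeta))
    print(f"zeta = {zeta:+.1f}  difference = {gap:.2e}")

# The rewritten form stays finite for large arguments.
print(log_one_minus_tanh_sq_stable(400.0))
```

The identity follows from $1 - \tanh^2(\zeta) = 4e^{-2|\zeta|}/(1 + e^{-2|\zeta|})^2$, so taking logarithms gives the three terms used in the stable function.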
From the expressions (36) and (38), we can find that the value function and the critic network have the same weight. Hence, if the value function can be approximated by the critic neural network given in (36), the control strategy is obtained by (38).
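Minimizing the squared residual $E = (1/2) e^T e$ over the critic weights is typically done by gradient descent. The Python sketch below is an assumption-laden illustration rather than the tuning law of this paper, which is not reproduced in the text: it treats the residual as affine in the weights, $e = \sigma^T \hat{W}_c + c$, and uses the normalized gradient step $\hat{W}_c \leftarrow \hat{W}_c - \alpha\, e\, \sigma / (1 + \sigma^T \sigma)^2$ that is common in the ADP literature. The regressors, the learning rate and the target weights are hypothetical.

```python
def residual(weights, sigma, offset):
    """Affine residual e = sigma^T W + c (assumed form)."""
    return sum(s * w for s, w in zip(sigma, weights)) + offset


def critic_step(weights, sigma, offset, learning_rate=0.5):
    """One normalized gradient-descent step on E = 0.5 * e**2."""
    e = residual(weights, sigma, offset)
    norm = (1.0 + sum(s * s for s in sigma)) ** 2
    return [w - learning_rate * e * s / norm for w, s in zip(weights, sigma)]


# Hypothetical data: residuals vanish exactly at TRUE_WEIGHTS.
TRUE_WEIGHTS = [1.0, -2.0]
REGRESSORS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -1.0]]
SAMPLES = [(sigma, -residual(TRUE_WEIGHTS, sigma, 0.0)) for sigma in REGRESSORS]

w_hat = [0.0, 0.0]
for _ in range(2000):
    for sigma, offset in SAMPLES:
        w_hat = critic_step(w_hat, sigma, offset)
print(w_hat)
```

The normalization keeps the step size bounded when the regressor is large; convergence of the weights additionally requires sufficiently rich regressors, which the four hypothetical vectors above provide for a two-weight critic.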

VI. EXPERIMENTS AND DISCUSSION
In this section, actual production data from a 300 kA low-energy aluminum electrolytic cell of Chongqing Tiantai Aluminum Co., Ltd. are used to show the effectiveness and performance of the proposed algorithm.
In the experiment, we collected historical data of the aluminum electrolytic cell production process from January 2017 to February 2018. Because of measurement errors and human factors, the data collected directly from the production process inevitably contain errors and noise. After error elimination, 1680 sets of data are used in the experiment. From the above analysis, the AEPP is a multi-variable and strongly coupled nonlinear process. In the real AEPP, the temperature of the electrolytic cell and the DC power consumption are important indicators of the quality of aluminum products. In the experiment, the temperature and the average DC voltage of the aluminum electrolytic cell are used as the state variables, and the feeding number $u$ is the control variable. The experimental simulation setup is shown in Fig. 2. To maintain product quality and the stability of the AEPP, each control variable must stay within a certain range. At the same time, according to the constraints of the actuator and the experience of field engineers, each control variable has an allowable range of change.
In order to obtain an accurate model, a four-layer RNN with structure 2-2-2-2 is used to identify the input and output data. From Fig. 3 and Fig. 4, it can be seen that the proposed method identifies the states of the system well. Fig. 5 shows the model identification error. In the initial stage, the model error is large because of the inappropriate initial value. After a period of time, the model error converges to zero. Fig. 5 therefore indicates that the proposed identifier network can effectively approximate the unknown nonlinear AEPP system.

GANG YIN received the B.S. and M.S. degrees in mechatronics and automatic control from Chongqing University, in 1985 and 1992, respectively.
He is currently a Professor with the College of Resource and Safety Engineering, Chongqing University. His research interests include aluminum electrolytic smelting detection and monitoring technology, safety control technology, aluminum electrolytic fault detection and diagnosis, and artificial intelligence.

WEN HE is currently working with Meishan Bomei Qimingxing Aluminum Company Ltd. His research interests include aluminum electrolytic process and production, and aluminum electrolytic fault detection and diagnosis.

He is currently an Associate Professor with the College of Intelligent Technology and Engineering, Chongqing University of Science and Technology, Chongqing. His research interests include industrial process modeling and intelligent optimization.