A model-data-fusion pole piece thickness prediction method with multi-sensor fusion for lithium battery rolling machine

Trend prediction based on sensor data is an important topic in the thickness control system of lithium battery electrode mills. As the number of sensors increases, more and more data can be measured and stored. However, the lithium battery electrode thickness control system is nonlinear, uncertain, and time-varying, so the increase in control system complexity and data volume does not by itself improve prediction performance. This paper proposes a physical-data fusion modeling prediction method based on a multiform coupling model and Bayesian LSTM (Bayesian Long Short-Term Memory) to achieve dynamic prediction of lithium battery electrode thickness, overcome data irrelevance and sensor noise, ensure the consistency of electrode thickness, and improve the operational efficiency of battery electrode production. Firstly, we establish the underlying physical model of the roll to obtain the specific parameters affecting thickness control and to overcome data irrelevance and sensor noise. Secondly, we use a Bayesian method to obtain the weight distribution of each sub-prediction network and construct the Bayesian LSTM predictor. An MLP (Multilayer Perceptron) is used as the fusion layer to fuse the results of the different sub-predictors, which improves the robustness of the prediction model for the nonlinear control system and addresses the slow approximation speed and the tendency to fall into local minima of traditional neural networks. Finally, the advantages of the deep learning model are analyzed in terms of automatic feature extraction and model generalizability. Compared with other neural network models, Bayesian LSTM generalizes better on small-sample data. The results show that the predictor can effectively model the large volume of measurement data from the thickness control system of lithium battery electrode mills and improve the prediction performance.


I. INTRODUCTION
The lithium-ion battery electrode coating process is a key process in the manufacture of lithium-ion batteries, and the uniformity of the processed electrode thickness directly affects the safety, consistency, and other key indicators of lithium-ion batteries. Therefore, real-time monitoring and control of the electrode thickness are crucial [1]. The process flow of the lithium battery wafer is shown in Fig. 1. In the production of the battery wafer, the most important processes are coating and drying, and electrode compaction. Coating is the uniform application of a stable, viscous liquid slurry onto the positive and negative current collectors. The coating process must ensure that the parameters of the electrode are consistent from front to back, which effectively avoids problems such as differences in battery capacity and large differences in cycle life. The electrode compaction process relies on the friction between the rolls and the electrode, which continuously drags the electrode between the upper and lower rolls; under the pressure provided by the rolling pressure system, the electrode is then plastically deformed. As shown in Fig. 1, the rolling force set by the upper and lower rolls is $P$, and the compaction running speed of the electrode is $V$. The increase in the relative density of the lithium-ion particles in the electrode sheet is mainly manifested in the displacement of the particles, which are displaced between the pores by the rolling and fill the base material. At the same time, a small part of the particles deforms; when the rolling force is increased further, the voids in the lithium-ion slurry are filled after a larger deformation [2]. The substrate foil strip with the active material attached to its surface becomes denser ($\rho_c > \rho_{c0}$). The compacted density of the electrode is kept within a certain range, and the consistency of the electrode thickness is maintained. After that, the electrodes are slit and stacked, and finally encapsulated and filled with liquid electrolyte.
In the electrode rolling process, the thickness of the electrode sheet is not uniform after rolling because of the uneven thickness left by the preceding coating process. When the thickness of the electrode sheet is not uniform, the rates of lithium-ion and electron transport and conduction differ across the sheet, which easily causes the precipitation of lithium dendrites and is unfavorable to the performance of the battery cell. In addition, the contact resistance between the active material and the current collector also varies with the electrode thickness: the thicker the electrode, the greater the internal resistance and the more serious the battery polarization, which affects the capacity of the electrode. Therefore, ensuring the consistency of the electrode thickness is a key part of the whole process. To solve the problem of inconsistent electrode thickness, many scholars have used real-time thickness measurement to achieve continuous high-quality production. The laser thickness measurement method is characterized by high sensitivity and fast sampling speed; it uses upper and lower dual laser displacement sensors for differential thickness measurement and adds calibration devices to maintain thickness consistency [2][3][4]. However, the optical measurement method is easily disturbed by impurities in the surrounding air and is prone to errors.
With the rapid development of artificial intelligence technology, machine learning algorithms are increasingly applied to the prediction of machinery production status. Fu et al. [5] established a multi-agent collaborative control system to achieve dynamic and synchronous control of the rolling mill. However, because of the complexity of the rolling process and the uncertain influence of various unknown factors, it is difficult to obtain accurate mathematical models of the thickness and plasticity coefficients. Therefore, they established a thickness AGC (automatic gauge control) prediction agent and used a fuzzy neural network approach to encapsulate the neural-network-based thickness and plasticity coefficient predictions into two separate agents, which replace the ordinary simple mathematical models and interact with the other agents as auxiliary agents, thus further improving the system accuracy. In addition, machine learning algorithms have been applied to power system prediction [6][7], photovoltaic power generation prediction [8][9], and agri-environmental prediction [10][11], among others. In the field of machining, many scholars have found that the factors affecting the thickness of rolled parts in steel rolling mills are temperature [12][13], irregular vibration of the machine base [14][15][16], and rolling force [17][18][19]. It was also found that machine learning models can be used instead of mathematical models to eliminate irregular vibrations of the machine [14][15][16]. Starting from the characteristics of hot-rolled strip steel production data, Li et al. [16] used systematic clustering to determine the number of clusters and then used the K-means algorithm to divide the production data into K clusters. Each data cluster is used to build its own BP (Back Propagation) neural network prediction model, and PSO (Particle Swarm Optimization) is used to optimize the network parameters to keep the neural network from falling into local optima.
Because the stability of the rolling force is one of the key parameters affecting the quality of rolled parts, Wang et al. [18] proposed two rolling force prediction methods combining improved PSO and BP neural networks to improve the prediction accuracy of the rolling force during dynamic rolling in reversible cold rolling mills, using a large amount of actual data as the neural network training input and fully considering the interactions between the input parameters.
During the production of battery electrodes, the stability of the roll rolling force and the thickness consistency of the electrodes are the keys to improving production efficiency [20][21][22]. Wang et al. [21] studied a mill rolling force prediction model based on an improved support vector machine. A least-squares support vector machine based on an RBF kernel function and a polynomial kernel function was established, and the parameters of the hybrid kernel function were optimized using a cooperative quantum particle swarm algorithm to improve the performance of the prediction model. Because the electrode thickness control system of lithium batteries is nonlinear, uncertain, and time-varying, Xu et al. [22] used a genetic algorithm and the backpropagation method to optimize a neural network and established a thickness control prediction model, which eliminated the nonlinearity and uncertainty problems caused by mechanical vibration; the RMSE was within 6.72, which effectively improved the accuracy and precision of prediction.
In contrast to machine learning algorithms that require manual input of features, deep learning algorithms can automatically select the features that affect the prediction results of the model [23][24][25]. Deep learning also makes it easy to build problem-specific constraints into the model, which can reduce model bias [25]. An RNN (Recurrent Neural Network) is a special type of neural network model with short-term memory, which preserves the association between data by "remembering" relationships [25]. Liang et al. [26] used LSTM and Encoder-Decoder LSTM models to predict the thickness of coal seams. When the hyperparameters were not optimized, the errors of both models' coal seam thickness predictions were large. Introducing an expert knowledge base and optimizing hyperparameters such as the number of neurons, the number of training epochs, and the input-output sequence step size allowed the models to fit the true coal seam thickness effectively. Relying only on data-driven models to predict the operating state tends to ignore the characteristics of the mechanical equipment itself. Zhang et al. [27] used a theoretically simplified rolling force model and a bounce model to determine the input parameters of the prediction model and established a DBN-LSSVM hybrid model to predict the strip thickness; the prediction accuracy was further improved compared with the conventional deep belief network model. The essence of model-based methods is to predict the system output by constructing an observer and to determine the fault from the residuals between the predicted and observed values [28]. This type of approach is more intuitive in terms of the predicted response of the system, but as the structure and working mechanism of mechanical equipment become more complex and random disturbances increase, the complexity of modeling increases significantly.
Therefore, model-based methods face many constraints in complex engineering application scenarios and can no longer keep pace with the current development of mechanical equipment [29]. The data-driven approach extracts the implied laws from historical data and trains the model through continuous iterations so that its "input-output" mapping fits the distribution characteristics of the historical data [30]. Given this feature, data-driven dynamic prediction and diagnosis technology is largely free from reliance on a priori knowledge: without establishing complex physical models, it only needs to select appropriate parameters from the sensor monitoring data center and apply a corresponding learning algorithm to achieve accurate early warning. However, online prediction methods for frequency dynamics that rely solely on physical models or solely on data models face a trade-off between computational speed and accuracy in practical applications. Hao et al. [31] established a mathematical model based on the influence function method to accurately obtain the thickness distribution during the angle rolling process. After that, a real-time dynamic prediction model was established from experimental data. The model can be applied to the actual production process to develop effective plate shape control strategies and flexible rolling schedules to meet the various thickness requirements of custom production. Wang et al. [32] proposed a method for online prediction of frequency dynamics based on the idea of physical-data fusion modeling: the factors influencing the transient frequency were divided into critical and non-critical factors, the system frequency response model was applied to the critical factors to preserve the causal link between electrical information, and an error correction model based on the extreme learning machine was applied to the non-critical factors to characterize the correlation.
Therefore, we can introduce the nonlinear deformation formulas (including boundary and initial conditions) in battery electrode production into the data-driven model as training constraints, which can achieve the goal with fewer data samples.
In summary, the electrode thickness, as an important indicator of the operating status of the electrode mill, needs to be monitored in real time. To solve the problems of wasted industrial resources and increased costs caused by inconsistent sheet thickness, a more advanced prediction method is needed to predict the machine operating status for some time into the future. At present, the traditional methods of predicting the thickness of lithium battery electrodes can no longer meet the demand for high precision and still have the following problems:
1. Many current models analyze the data purely with data-based machine learning, without sufficiently considering the underlying physical mechanics [32].
2. The processing of Li-ion battery electrodes is a very complex nonlinear system, and even two identical electrodes may react differently when subjected to the same environment.
3. The behavior of nonlinear systems can be predicted with deep learning algorithms, but such a system is not easy to train and validate and often requires large amounts of data.
In response to the above problems, we adopt a physical-data fusion modeling prediction method based on the coupled multimorph model and Bayesian LSTM to achieve dynamic prediction of the thickness of lithium battery electrodes, ensure the consistency of the electrode thickness, and improve the operating efficiency of electrode production. Firstly, we establish the underlying physical model of the roll to obtain the specific parameters affecting thickness control and to reduce the redundancy and complexity of the data. Secondly, we use LSTM and Bayesian LSTM models to predict the electrode thickness under variable working conditions, improve the robustness of the prediction model of the nonlinear control system, and address the slow approximation speed and the tendency to fall into local minima of traditional neural networks.
The details of the research are as follows. First, the upper and lower rolls of the mill apply pressure to the electrode and deform it plastically, which changes the electrode thickness, while the rolls are subjected to the reaction force of the electrode and the bouncing phenomenon occurs. In addition, the calculation accuracy of the rolling force directly affects the quality and thickness of the lithium battery electrode. Therefore, we establish a multi-deformation coupled physical model based on the roll stiffness deformation model, the elastic bending model, and the rolling force model.
Second, the ordinary neural network predictive control model has the following problems: (1) uncertainty in the initial connection weights and thresholds; (2) slow convergence of the error; (3) a tendency to fall into local minima; (4) a variable choice of network structure; (5) a contradiction between prediction ability and training ability.
Therefore, we use the LSTM deep learning model to address the slow convergence and the tendency to fall into local minima. However, the LSTM has a large number of hyperparameters, and the prediction accuracy of the model is poor under the influence of noisy data. We therefore use a Bayesian algorithm to optimize the LSTM model and suppress the effect of noise in the data. We use the variable parameters of the deformation coupling model as the input variables of the prediction model, and the predictions based on the selected input data are fused by a nonlinear fusion network, which effectively balances the predictive and training capabilities of the model. In addition, we use the MSE, RMSE, MAE, R, and $R^2$ metrics to evaluate the convergence ability and fitting accuracy of the neural network.
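As a minimal sketch of the five evaluation metrics named above, the following function computes them with NumPy. It assumes that R denotes the Pearson correlation coefficient and that $R^2$ is the coefficient of determination; the function name `regression_metrics` and the sample thickness values are illustrative assumptions, not data from this paper.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, Pearson R and R^2 for two 1-D sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mse = float(np.mean(err ** 2))
    ss_res = float(np.sum(err ** 2))                        # residual sum of squares
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))   # total sum of squares
    return {
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "MAE": float(np.mean(np.abs(err))),
        "R": float(np.corrcoef(y_true, y_pred)[0, 1]),      # assumed: Pearson correlation
        "R2": 1.0 - ss_res / ss_tot,                        # assumed: coefficient of determination
    }

# Hypothetical thickness readings in micrometres (illustrative values only).
measured = [100.0, 102.0, 101.0, 99.0]
predicted = [101.0, 101.0, 101.0, 100.0]
metrics = regression_metrics(measured, predicted)
```

MSE, RMSE, and MAE measure the size of the error in the units of the thickness signal, whereas R and $R^2$ are dimensionless measures of how well the predicted trend follows the measured one.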
Finally, we compare other deep network models with the method proposed in this paper. The prediction accuracy of the GA-BP model, the DBN-LSSVM model, the LSSVM model, the GRNN model, and the prediction model proposed in this paper is compared and analyzed. The advantages of deep learning models in automatic feature extraction and model generalizability are analyzed. Compared with other neural network models, Bayesian LSTM generalizes better on small-sample data.
Our proposed battery electrode thickness prediction and control method, which combines a multi-deformation coupled model with a data-driven method, has the following innovations:
1. To improve the prediction accuracy of lithium battery electrode thickness, maintain thickness consistency, and improve mill operation efficiency, we propose a distributed prediction model combining a multi-deformation coupled model and Bayesian LSTM, in which the variable parameters of the deformation coupled model are used as the input variables of the prediction model, and the predictions based on the selected input data are fused by a nonlinear fusion network, which effectively balances the prediction ability and training ability of the model. The requirement for large data volumes is reduced while the convergence ability and fitting accuracy are improved.
2. To resolve the trade-off between computational speed and accuracy of traditional data-driven online prediction methods in practical applications, we propose a multi-deformation coupling model of the rolls and a rolling force model, which introduce the nonlinear deformation formulas (including boundary and initial conditions) of battery electrode production into the data-driven model as training constraints, so that the target can be achieved with fewer data samples.
3. Under variable working conditions, the uncertainty of the LSTM hyperparameters leads to poor prediction results. To solve this problem, we propose a Bayesian statistical approach to optimize the LSTM hyperparameters, which reduces the influence of noisy data on the prediction results. We use the MSE, RMSE, MAE, R, and $R^2$ metrics to evaluate the convergence ability and fitting accuracy of the neural network.
4. To further analyze the applicability of the Bayesian LSTM model, we compare other deep network models with the method proposed in this paper. The results show that the Bayesian LSTM generalizes better on small-sample data than other neural network models. It is also more suitable for predicting the future operating state of a double-roller lithium battery electrode mill.
This paper is organized as follows. In Section 2, a coupled model of the electrode mill roll deformation is developed, including a stiffness deformation model, an elastic bending model, and a rolling force model. In Section 3, the distributed LSTM predictor and the MLP-based Bayesian LSTM deep fusion predictor are proposed. In Section 4, we conduct experiments to verify the accuracy of our proposed prediction models.

II. DEFORMATION COUPLING MODEL
In the rolling process, there are two main causes of changes in the thickness of lithium battery electrodes: one is the stiffness of the electrode mill itself; the other is the battery electrode itself. Changes in mill performance are often the main factor, while the battery electrode strip is generally a secondary factor. Therefore, we study the impact of changes in the lithium battery electrode mill on the electrode thickness. The mill stiffness deformation model, the roll elastic bending model, and the rolling force model are introduced below.

A. STIFFNESS DEFORMATION MODEL
When the stiffness of the electrode mill is low, frequent fluctuations of the rolling force affect the accuracy of the electrode thickness. In addition, after the rolls have worked for some time, wear is not uniformly distributed along the roll surface, which also changes the mill stiffness. Electrode mills are therefore generally designed for high stiffness.
The overall elastic deformation of the mill mainly includes the elastic deformation of the mill plate [33], the elastic deformation of the roll bearing seat, and the elastic deflection and flattening of the rolls. This overall elastic deformation, which includes the deformation of the various components, is collectively referred to as the mill bounce. The elastic deformation of the bearing seat is shown in Fig. 2.
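To make the effect of mill bounce on the exit thickness concrete, the following minimal sketch uses the textbook gauge-meter (mill spring) relation, in which the exit thickness equals the unloaded roll gap plus the rolling force divided by the mill stiffness. This relation is an assumption introduced here for illustration and may differ from the bouncing model used in this paper; the function name and all numerical values are hypothetical.

```python
def exit_thickness(roll_gap, rolling_force, mill_stiffness):
    """Gauge-meter estimate: unloaded roll gap plus elastic stretch of the mill.

    Assumed textbook relation (not taken from this paper):
        h = S0 + P / K
    with S0 the unloaded roll gap, P the rolling force, K the mill stiffness.
    """
    if mill_stiffness <= 0:
        raise ValueError("mill_stiffness must be positive")
    bounce = rolling_force / mill_stiffness  # elastic stretch of the mill
    return roll_gap + bounce

# Hypothetical values: gap in mm, force in kN, stiffness in kN/mm.
h_soft = exit_thickness(roll_gap=0.100, rolling_force=200.0, mill_stiffness=4000.0)
h_stiff = exit_thickness(roll_gap=0.100, rolling_force=200.0, mill_stiffness=8000.0)
```

Under this assumption, doubling the stiffness halves the bounce for the same rolling force, which is consistent with the preference for high-stiffness mill designs stated above.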
During the rolling of the lithium battery electrode, the upper and lower rolls of the mill apply pressure to the electrode and deform it plastically, which changes the electrode thickness. At the same time, the electrode exerts a reaction force on the rolls, which deform elastically, so that the roll gap becomes larger and the bouncing phenomenon occurs [34]. Combined with the mill operating parameters, the following bouncing model is established:

B. ELASTIC BENDING MODEL
To simplify the analysis, we model the roll as a solid stepped shaft and use the reference system method to establish the deflection and rotation-angle synthesis relations of the beam model in multiple reference systems. The simplified model is shown in Fig. 3. As shown in Fig. 3, four cross-sections $S_1$, $S_2$, $S_3$, and $S_4$ are taken at the midpoint of the left roll journal, the two end faces of the roll surface, and the midpoint of the right roll journal, respectively.
The lengths between adjacent sections are $L_1$, $L_2$, and $L_3$, and each of the three segments has its own flexural stiffness. The support reaction force at $P$ is $R_P$, and the shear force at $S_2$ is $Q_B$. In dynamic system I, $Q_B$ and $R_P$ each produce a relative deflection of section $S_4$ of the beam model with respect to section $S_1$, and these contributions together give the total relative deflection of section $S_4$ with respect to dynamic system I. Cross-section B lies at support B, so its absolute deflection is zero. The static reference system is represented by support B [35]. The absolute deflection of section B is therefore equal to the sum of the implicated deflection imposed on the section by dynamic system I and the relative deflection of section B with respect to dynamic system I. From this relation, the implicated deflection of dynamic system I at a section $m$ is obtained, and with it the deformation of section $m$. Therefore, in a given dynamic system, the deformation of a roll section depends on a set of force parameters, which are used, after an equivalent transformation, as input parameters when the prediction model is established later.
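The dependence of the roll deflection on the segment stiffnesses can be illustrated numerically. The sketch below is not the reference system method described above; it is a simplified stand-in that assumes an Euler-Bernoulli beam, simple supports at both ends, and a single point load at mid-span, and integrates the curvature twice. The function names, the load case, and all numerical values are hypothetical.

```python
import numpy as np

def cumulative_trapezoid(f, x):
    """Running trapezoidal integral of f over x, starting at zero."""
    steps = 0.5 * (f[1:] + f[:-1]) * np.diff(x)
    return np.concatenate(([0.0], np.cumsum(steps)))

def stepped_beam_deflection(lengths, stiffness, load, n=4001):
    """Downward deflection of a simply supported stepped beam under a
    central point load (Euler-Bernoulli theory, small deflections)."""
    total = float(np.sum(lengths))
    x = np.linspace(0.0, total, n)
    # Piecewise-constant flexural stiffness EI(x), one value per segment.
    edges = np.cumsum(lengths)
    seg = np.minimum(np.searchsorted(edges, x, side="left"), len(lengths) - 1)
    ei = np.asarray(stiffness, dtype=float)[seg]
    # Bending moment for supports at both ends and the load at mid-span.
    moment = np.where(x <= total / 2, load * x / 2, load * (total - x) / 2)
    # w'' = -M / EI for downward deflection w; integrate twice.
    slope = cumulative_trapezoid(-moment / ei, x)
    w = cumulative_trapezoid(slope, x)
    # Remove the rigid-body rotation so that w = 0 at both supports.
    w = w - w[-1] * x / total
    return x, w

# Hypothetical roll: journal, barrel, journal (lengths in m, EI in N*m^2, load in N).
x, w_uniform = stepped_beam_deflection([0.2, 0.6, 0.2], [1.0e6, 1.0e6, 1.0e6], load=1.0e4)
x, w_stepped = stepped_beam_deflection([0.2, 0.6, 0.2], [2.0e5, 1.0e6, 2.0e5], load=1.0e4)
```

For uniform stiffness the mid-span value can be checked against the closed-form result $PL^3/(48EI)$; with more flexible journals the deflection grows, which shows why the force and stiffness parameters of each segment are natural inputs for the prediction model.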

C. ROLLING FORCE MODEL
The rolling pressure model is used to calculate the rolling pressure in the production of the electrode. The rolling pressure is also a very important variable in the mill control system, and its calculation accuracy directly affects the quality and thickness of the lithium battery electrode. We mainly study the contact region between the roll and the lithium battery electrode where plastic deformation occurs under load, as shown in Fig. 4, to form an idealized rolling model.

1) PLASTIC DEFORMATION
In the plastic deformation region, we define the absolute and relative deformations [36], as shown in Table Ⅰ.
Here, $\Delta h$ is the absolute reduction, $\Delta l$ the absolute extension, $\Delta b$ the absolute spread, $\eta_h$ the reduction rate, $\eta_l$ the extension rate, and $\eta_b$ the spreading rate. $\Delta h$ is the result of subtracting the exit thickness $h$ from the entrance thickness $H$ of the lithium battery electrode, $\Delta h = H - h$ (11). $\Delta l$ is the difference between the entrance and exit lengths of the lithium battery electrode, and $\Delta b$ is the difference between its entrance and exit widths. $\eta_h$ is the reduction of the lithium battery electrode as a percentage of the entrance thickness, $\eta_l$ is its extension as a percentage of the entrance length, and $\eta_b$ is its spread as a percentage of the entrance width. From expert experience, the volume of the lithium battery electrode does not change before and after rolling, so the product of its thickness, width, and length is the same at the entrance and the exit.
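The deformation measures and the constant-volume condition can be sketched numerically as follows. The symbols $B$, $L$, $b$, and $l$ for the entrance and exit widths and lengths, the sign convention that extension and spread are positive, the function name, and the sample dimensions are assumptions made for illustration.

```python
def rolling_deformation(H, B, L, h, b):
    """Absolute and relative deformation of a rolled strip.

    H, B, L: entrance thickness, width, length; h, b: exit thickness, width.
    The exit length l follows from constant volume: H * B * L = h * b * l.
    """
    l = H * B * L / (h * b)
    delta_h = H - h   # absolute reduction
    delta_l = l - L   # absolute extension (assumed positive when the strip lengthens)
    delta_b = b - B   # absolute spread (assumed positive when the strip widens)
    return {
        "delta_h": delta_h,
        "delta_l": delta_l,
        "delta_b": delta_b,
        "eta_h": 100.0 * delta_h / H,  # reduction rate in percent
        "eta_l": 100.0 * delta_l / L,  # extension rate in percent
        "eta_b": 100.0 * delta_b / B,  # spreading rate in percent
        "exit_length": l,
    }

# Hypothetical electrode: 0.20 mm thick, 500 mm wide, 1000 mm long, rolled to 0.16 mm
# with no change in width.
result = rolling_deformation(H=0.20, B=500.0, L=1000.0, h=0.16, b=500.0)
```

With the width held constant, a 20% reduction in thickness corresponds to a 25% extension in length, which is the constant-volume coupling that links the exit thickness to the strip speed in the next subsection.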

2) FRONT-SLIP AND BACK-SLIP
During the rolling process, the thickness of the electrode decreases gradually while its speed slowly increases, resulting in a velocity difference between the electrode and the roll. When the velocity of the electrode material in the plastic deformation zone is larger than the horizontal component of the roll speed, front slip is generated; when the velocity of the electrode material is smaller than the horizontal component of the roll speed, back slip is generated. The front-slip and back-slip zones allow the rolling process to continue in an equilibrium state. Assuming that the width of the electrode does not change during rolling, the thicknesses of the electrode before and after rolling are related to the corresponding speeds, where $v_H$ denotes the roll speed and $v_h$ denotes the velocity of the electrode. As shown in Fig. 5, we define the front-slip zone I and the back-slip zone II. A neutral surface exists between zone I and zone II, at which the horizontal speed of the roll is equal to the horizontal speed of the electrode. The back-slip value $f_H$ is the difference between the horizontal speed of the roll and the rolling speed, expressed as a percentage of the horizontal speed of the roll. Here, $f_h$ denotes the front-slip value, $f_H$ denotes the back-slip value, and $v$ denotes the rolling speed. We therefore need to consider the front-slip value when using continuous rolling with tension. The front-slip and back-slip values need to be considered when calculating the torque required to rotate the rolls and the tension between the stands. In addition, the front-slip value needs to be calculated when adjusting the mill; otherwise, the tension will be too high and the electrodes will be pulled apart.
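A minimal numerical sketch of these quantities is given below. It assumes the common textbook definitions: constant-width volume flow (entrance thickness times entrance speed equals exit thickness times exit speed), front slip as the excess of the exit speed over the roll surface speed, and back slip relative to the horizontal component of the roll speed at the entrance. These definitions, the function names, the bite angle, and the sample values are assumptions and are not taken from the equations of this paper.

```python
import math

def exit_speed(H, h, v_entry):
    """Constant-width volume flow: H * v_entry = h * v_exit."""
    return H * v_entry / h

def front_slip(v_exit, v_roll):
    """Front slip in percent: exit speed relative to the roll surface speed."""
    return 100.0 * (v_exit - v_roll) / v_roll

def back_slip(v_entry, v_roll, bite_angle):
    """Back slip in percent, relative to the horizontal roll speed at the entrance.

    bite_angle is the contact angle at the entrance in radians (hypothetical input).
    """
    v_horizontal = v_roll * math.cos(bite_angle)
    return 100.0 * (v_horizontal - v_entry) / v_horizontal

# Hypothetical values: thickness in mm, speeds in m/s.
v_out = exit_speed(H=0.20, h=0.16, v_entry=0.82)
f_front = front_slip(v_out, v_roll=1.0)
f_back = back_slip(v_entry=0.82, v_roll=1.0, bite_angle=0.02)
```

In this example the electrode leaves the roll gap slightly faster than the roll surface and enters it more slowly, so both slip values are positive, and a neutral surface with matching speeds lies between the two zones.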

D. THICKNESS CONTROL INFLUENCING FACTORS
According to the first three sections, we can obtain the expression for the polymorphic coupling model for the thickness of the lithium battery pole piece. From this expression, it can be seen that the electrode thickness formula is complicated and its accuracy cannot be guaranteed, which is not conducive to practical application. When the traditional purely mathematical model is used to predict the electrode thickness, the maximum deviation reaches more than 20% [32], which cannot meet the demand of actual production. We summarize the factors affecting thickness control according to the preceding sections as follows:
• The thickness of the electrode at the entrance and at the exit, whose variation ultimately affects the actual thickness of the electrode.
• The variation of the rolling speed of the battery electrode sheet. It directly affects the size of the front slip and back slip, the size of the rolling force in the plastic deformation zone at the exit of the electrode sheet, and the actual thickness of the electrode sheet.
• The variation in the tension of the lithium battery electrode sheet. In the winding and unwinding mechanism of a rolling mill, the rolling tension can change the stress state of the electrode and ultimately the deformation resistance of the electrode [18]. The unwinding tension and the winding tension can fine-tune the thickness of the electrode sheet.
• The variation of the roll gap. The electrode is subject to friction and plastic deformation between the roll and the electrode during the rolling process, resulting in a larger roll gap, which affects the actual thickness of the electrode.
Through specific analysis of the rolling process and control system, we can get that $H$ (thickness of the electrode entrance), $h$ (thickness of the electrode exit),

A. LSTM MODEL
When the network is too deep, standard RNNs tend to suffer from long-term dependency problems and vanishing gradients [25][39][40]. In other words, when the number of time steps is too large, the information carried by the preceding neurons is lost, because no structure in the standard recurrent layer can control the flow of memory by itself.
To solve this problem, long short-term memory (LSTM) networks have been proposed. They are improved recurrent cell structures whose differences from RNNs are shown in Fig. 6. The network structure of LSTM is much more complex than that of RNN. At the micro level, the LSTM adds three gate structures and a cell state structure to the RNN structure so that it can remember information state values from any past moment [41]. To better characterize the coupling relationship between variables, we adopt a single-to-single LSTM structure and use the training error of each category of variables as the input bias of the next category of variables; the model structure is shown in Fig. 7. The gate structures are the forget gate, the input gate, and the output gate, which are responsible for different functions in the LSTM. The forget gate is mainly responsible for information filtering: it decides which information is kept and which is discarded, and this decision is made by the sigmoid function in the forget gate [40]. The input gate is the information selection and storage unit of the LSTM, which selects the appropriate and useful information to be retained; the output gate mainly determines the output of the neural network.
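The gate mechanism described above can be sketched as a single LSTM time step in NumPy. This is the standard textbook formulation, given as an illustration under stated assumptions and not necessarily the exact single-to-single variant of Fig. 7: the gates act on the concatenation of the previous hidden state and the current input, and the dictionary keys `I`, `F`, `O`, `C`, the function names, and the layer sizes are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step (standard formulation, assumed here).

    W[k] has shape (hidden, hidden + inputs) and b[k] has shape (hidden,)
    for k in "I" (input gate), "F" (forget gate), "O" (output gate),
    and "C" (candidate cell state).
    """
    z = np.concatenate([h_prev, x_t])
    i = sigmoid(W["I"] @ z + b["I"])        # how much new information to store
    f = sigmoid(W["F"] @ z + b["F"])        # how much of the old cell state to keep
    o = sigmoid(W["O"] @ z + b["O"])        # how much of the cell state to output
    c_tilde = np.tanh(W["C"] @ z + b["C"])  # candidate cell state
    c_t = f * c_prev + i * c_tilde          # element-wise update of the cell state
    h_t = o * np.tanh(c_t)                  # new hidden state
    return h_t, c_t

# Hypothetical sizes: 2 input features, 3 hidden units.
rng = np.random.default_rng(0)
W = {k: 0.1 * rng.standard_normal((3, 5)) for k in "IFOC"}
b = {k: np.zeros(3) for k in "IFOC"}
h_t, c_t = lstm_step(np.array([0.5, -1.0]), np.zeros(3), np.zeros(3), W, b)
```

Because the cell state is updated additively and gated element by element, information can be carried across many time steps without being repeatedly squashed, which is what mitigates the vanishing-gradient problem of the plain RNN.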
The LSTM uses memory cells to carry information forward from past outputs, rather than making the output of the RNN unit a nonlinear function of the weighted sum of the current input and the previous output [41][42]. In other words, instead of using only the hidden state $H$, the LSTM uses the cell state $C$ to preserve long-term information. The LSTM mainly uses three gates (forget gate, input gate, and output gate) to control the cell state $C$. The forget gate controls how much information passes from the previous cell state $C_{t-1}$ to the current cell state $C_t$; the input gate determines how much of the input is stored in $C_t$; the output gate determines the output $H_t$ according to $C_t$. The LSTM output for step $t$ is computed from these quantities. In the corresponding equations, '⋅' represents the matrix product between variables, '*' represents the element-wise (Hadamard) product between variables, $W$ and $b$ are trainable weights and biases, respectively, and $I$, $F$, and $O$ represent the input gate, forget gate, and output gate, respectively. These three gates have the same shape,

B. BAYESIAN LSTM MODEL
The parameters in the LSTM model fall into two main categories. The first is the model parameters, which are updated by the gradient descent algorithm during training. The second is the hyperparameters, which are generally fixed or vary according to predefined rules during training, such as batch size, learning rate, weight decay, and gamma in kernel functions [43]. The goal of hyperparameter tuning is usually to minimize the generalization error, although other optimization goals can be customized for specific tasks. Bayesian optimization uses a continuously updated probabilistic model to focus on promising hyperparameters by extrapolating from past results, and it is suitable for search spaces of fewer than 20 dimensions. Through Monte Carlo sampling, Bayesian deep learning networks sample the network several times and take the average of all losses, which is then used for back-propagation to obtain the distribution of the weights and biases. In an ordinary LSTM network, all parameters, including the weights and biases, are constants once trained. A Bayesian LSTM instead treats the weights and biases as random distributions that are updated at every iteration. Each parameter obtained by training the Bayesian LSTM network is described by the mean and variance of its weight or bias distribution [44][45]. Fig.8 shows the differences between the ordinary LSTM network and the Bayesian LSTM network: the ordinary LSTM network obtains fixed weights and biases after training, whereas the Bayesian LSTM obtains weights and biases as distributions with a mean and a variance. Given training data $D$, Bayesian inference can be used to compute the posterior distribution of the weights $P(w \mid D)$ [44].
The predictive distribution for the input data $x$ is given by Eq. (31). To evaluate how well the feature dataset is modeled, we define a cost function. So that the variance is expressed as a non-negative real number, we make the definition in Eq. (34). From these we obtain the gradients with respect to the mean and the standard deviation, in which $\partial L / \partial \theta$ is the gradient found by the backpropagation algorithm on an ordinary LSTM network. Thus, to learn the mean and standard deviation, we can calculate the gradient by backpropagation and then scale and shift it [46][47].
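The idea of computing an ordinary backpropagation gradient and then scaling it can be sketched for a single weight. This is a sketch under stated assumptions, not the derivation of this paper: it assumes the common parameterization $w = \mu + \sigma\,\epsilon$ with $\sigma = \log(1 + e^{\rho})$ to keep the standard deviation non-negative, and it covers only the data-fit part of the loss (any prior or regularization term is omitted). The names `rho`, `sample_weight`, `variational_grads`, and `mc_predict`, and the toy model $y = w x$, are hypothetical.

```python
import math
import random

def softplus(rho):
    # Stable log(1 + exp(rho)); always positive, so it can serve as sigma.
    return max(rho, 0.0) + math.log1p(math.exp(-abs(rho)))

def sigmoid(x):
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def sample_weight(mu, rho, rng):
    """Draw w = mu + sigma * eps with eps ~ N(0, 1); also return eps."""
    eps = rng.gauss(0.0, 1.0)
    return mu + softplus(rho) * eps, eps

def variational_grads(dL_dw, eps, rho):
    """Turn the ordinary gradient dL/dw into gradients for mu and rho.

    Chain rule through w = mu + softplus(rho) * eps:
    dw/dmu = 1 and dw/drho = eps * sigmoid(rho).
    """
    return dL_dw, dL_dw * eps * sigmoid(rho)

def mc_predict(x, mu, rho, n_samples, rng):
    """Monte Carlo mean and variance of the toy prediction y = w * x."""
    ys = [sample_weight(mu, rho, rng)[0] * x for _ in range(n_samples)]
    mean = sum(ys) / n_samples
    var = sum((y - mean) ** 2 for y in ys) / n_samples
    return mean, var
```

For example, `mc_predict(2.0, 1.5, -1.0, 100, random.Random(0))` returns a predictive mean near 3 together with a spread that reflects the weight uncertainty.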

C. THICKNESS PREDICTION MODEL FRAMEWORK
We propose a distributed prediction model that combines a deformation coupling model with a deep learning network for the thickness prediction problem of lithium battery electrode mills. The model framework is shown in Fig.9 and consists of three main components: data preprocessing, data reorganization, and Bayesian LSTM sub-predictors. Data preprocessing uses the K-means and Z-score methods to reduce data dimensionality and eliminate redundant data. Data reorganization classifies the processed features into a total of $n$ data variables, and one Bayesian LSTM sub-predictor is designed for each of the $n$ data variables. Finally, we use a fusion node to fuse the predictions of the multiple sub-predictors. The fusion node is an MLP, an artificial neural network built as a fully connected combination of artificial neurons that applies a nonlinear activation function to model the relationship between input and output.
Our proposed thickness prediction model framework is able to mine the "historical information" of the data while fusing the correlations between different feature data.
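Two stages of the framework, Z-score standardization and MLP fusion of the sub-predictor outputs, can be sketched as follows. This is an illustration only: the K-means step and the Bayesian LSTM sub-predictors are not shown, the fusion MLP is reduced to one hidden layer with ReLU, and the names `z_score` and `mlp_fuse` and all weights are hypothetical.

```python
import math

def z_score(values):
    """Standardize one data variable to zero mean and unit variance."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0.0:
        return [0.0 for _ in values]  # constant variable: nothing to scale
    return [(v - mean) / std for v in values]

def mlp_fuse(sub_predictions, W1, b1, w2, b2):
    """Fuse n sub-predictor outputs into one thickness estimate.

    One fully connected hidden layer with ReLU, then a linear output.
    """
    hidden = [max(0.0, sum(w * p for w, p in zip(row, sub_predictions)) + b)
              for row, b in zip(W1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2
```

In practice the fusion weights would be learned jointly with, or after, the sub-predictors; here they are passed in explicitly to keep the sketch self-contained.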

A. DATASETS
Our experiments use the dataset of the lithium battery electrode thickness control system of a battery equipment company in Xingtai, Hebei Province, China. These data include the system preset values and the actual thickness values. The lithium battery electrode thickness was selected as the prediction target to validate the proposed model. The dataset contains electrode thicknesses under three different ranges of $P_F$ (45t-55t, 55t-65t, and 65t-75t). The data for each working condition are collected by time step, and the time series data are strongly correlated; that is, the data of the previous time step affect the thickness prediction of the next time step. The dataset contains a total of 30,000 data samples, and Table Ⅱ shows some of the data. We selected the first 90% of the data for training and the remaining 10% for testing. Before model training, we normalize the parameters to map the original data into the (-1, 1) interval, maintaining the correlation between the input and output data [10].
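The chronological split and the normalization described above can be sketched as follows. The exact scaling formula is not stated in the text, so this sketch assumes min-max scaling fitted on the training portion only; the function names are hypothetical.

```python
def chronological_split(series, train_fraction=0.9):
    """Keep time order: the first part is for training, the rest for testing."""
    cut = int(round(len(series) * train_fraction))
    return series[:cut], series[cut:]

def fit_minmax(train):
    """Scaling bounds are taken from the training data only."""
    return min(train), max(train)

def scale(values, lo, hi):
    """Map the training range [lo, hi] linearly onto [-1, 1]."""
    if hi == lo:
        return [0.0 for _ in values]
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]
```

Because the bounds come from the training portion, test values outside the training range map slightly outside the interval, which avoids leaking test statistics into training.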

B. MODEL PARAMETER SETTINGS
We use LSTM deep neural networks and Bayesian LSTM deep neural networks for our experiments. We use ReLU as the activation function for the Bayesian LSTM layer and the LSTM layer. For the Bayesian LSTM layer, we set up an MLP layer with the size of each layer set to 24. Supervised training was performed using the Adam algorithm, and the model was trained using mini-batch sampling. The model hyperparameters, such as learning rate and batch size, were obtained from experiments, as shown in Table Ⅲ.
For the LSTM layer, we also use the Adam algorithm for supervised training. The training-test dataset partitioning is kept consistent with that of the Bayesian LSTM layer. The hyperparameters of the model are set based on expert experience, as shown in Table Ⅲ.
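The performance of the model is evaluated by the following five metrics. The mean square error (MSE) reflects the value of the convergence loss function of the neural network and is defined as:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2$$
where $\hat{y}_i$ is the prediction, $y_i$ is the ground truth, and $n$ is the number of data.
The root-mean-squared error (RMSE) expresses the error in the same units as the data and is defined as:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}$$
The mean absolute error (MAE) and Pearson's correlation coefficient (R) between the predicted and reference values were also explored in the experiments [48]. In addition, to describe the relationship between the predictor and response variables, we explored the R-squared coefficient ($R^2$):
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$
where $\bar{y}$ is the mean of the ground-truth values.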
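The five metrics can be computed together as in the following sketch. It uses the standard textbook definitions; in particular, $R^2$ is taken as one minus the ratio of the residual sum of squares to the total sum of squares, which is an assumption about the form used here. The function name `regression_metrics` is hypothetical.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, Pearson R, and R-squared for two equal-length lists."""
    n = len(y_true)
    errors = [p - t for p, t in zip(y_pred, y_true)]
    ss_res = sum(e * e for e in errors)
    mse = ss_res / n
    mae = sum(abs(e) for e in errors) / n
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    ss_pred = sum((p - mean_p) ** 2 for p in y_pred)
    # Note: a constant series makes ss_tot or ss_pred zero; not handled here.
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": mae,
        "R": cov / math.sqrt(ss_tot * ss_pred),
        "R2": 1.0 - ss_res / ss_tot,
    }
```

With this definition, a prediction that is perfectly correlated with the reference but offset by a constant has R = 1 while $R^2$ is below 1, so $R^2$ need not equal the square of R.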

C. EXPERIMENT Ⅰ
In this experiment, the performance of the LSTM model was verified by predicting the electrode thickness and evaluating the causality. We used the 7 variables selected as influencing the thickness control as input data for the distributed deep model. The tests are divided into three groups: 45t-55t, 55t-65t, and 65t-75t. The LSTM model is trained by iteratively optimizing the RMSE, and the training process is shown in Fig.10. Taking the 45t-55t condition as an example, we experimentally set the total number of iterations to 250, specified an initial learning rate of 0.005, and reduced the learning rate after 125 training rounds by multiplying it by a factor of 0.2. As can be seen from Fig.10, the RMSE stabilizes near the 80th iteration and stays in the 20-23 interval. Therefore, the LSTM model is fast to train and has low approximation error. We set the maximum time step in the time series to 250, i.e., 250 observations. Fig.11 compares the measured data with the results of the 250-step prediction; the blue and yellow lines indicate the ground truth of the thickness and the predictions of the model, respectively. At 45t-55t conditions, the best fits were obtained for Observation 17 and Observation 34. The prediction for Observation 78 is poor because the measured data show only a weak variation trend over time, which leads to a poor fit. At 55t-65t conditions, the best fits were obtained for Observation 20 and Observation 24. The prediction results of Observation 42, by contrast, fall on both sides of the measurement results, because as the amount of data increases the hyperparameters no longer match it. Under the conditions of 65t-75t, Observation 84 has the best fitting effect.
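The step learning-rate schedule stated for the 45t-55t condition (250 iterations, initial rate 0.005, multiplied by 0.2 after 125 rounds) can be written as a small function. This is a sketch: it assumes 1-based round numbering and that the reduced rate first applies at round 126; the function name is hypothetical.

```python
def learning_rate(epoch, initial=0.005, drop_epoch=125, factor=0.2):
    """Piecewise-constant schedule: one drop after `drop_epoch` rounds."""
    return initial * factor if epoch > drop_epoch else initial

# Learning rate for each of the 250 training rounds.
schedule = [learning_rate(epoch) for epoch in range(1, 251)]
```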
Observation 4 and Observation 28, by contrast, follow the same trend as the measured data but fluctuate more, because the data themselves are too noisy, which affects the prediction accuracy. The maximum MAE of most observation points (Observation 20, Observation 24) appears at the data inflection point, which indicates that the data are noisy and that the LSTM is not sufficient to suppress the noise. However, the predicted trend is close to the measured data, and most of the predicted values are within the confidence interval. We counted the RMSE of each observation point, and the results are shown in Fig.12. The frequency of RMSE=0 for the pole piece thickness samples is the highest in the 45t-55t and 55t-65t conditions, and the error curve has a centrosymmetric distribution. Under the conditions of 65t-75t, the frequency of the RMSE of the pole piece thickness samples is the highest in the interval of [-10, 10], which indicates that the data distribution is not uniform and the noise is large. We counted additional metrics for the observation points, as shown in Table Ⅳ. In the 45t-55t condition, the absolute error between predicted and observed values is smaller and the data fit better; in the 55t-65t condition, although the absolute error is larger, the correlation between the data is stronger. However, the RMSE of the trained model is too large to meet the accuracy required for pole piece production. Therefore, an optimization algorithm is needed to adjust the LSTM model hyperparameters.

D. EXPERIMENT Ⅱ
In Fig.13, the blue and black lines indicate the thickness measurement data and the model prediction, respectively. The Bayesian LSTM model performs better than the LSTM model when there is a large amount of data noise in each working condition.
In Fig.13 (a), there is spike noise in the data, but the prediction results are not disturbed and remain in a stable range. Fig.13 (b) and (c) show that the data from multiple sub-predictors contain a large amount of attribute noise, which causes a large range of data fluctuations, yet the prediction results still remain in a stable range. As in Experiment Ⅰ, we counted the MSE and RMSE of the Bayesian LSTM predictor under the three working conditions, as shown in Fig.14. The MSE of the predictor under the 45t-55t working condition is 94.57 and the RMSE is 9.72, which indicates that the convergence function error is larger; the MSE and RMSE of the predictor under the 55t-65t and 65t-75t working conditions are lower, and the predictor performs better. In addition, the maximum R and $R^2$ values of the proposed Bayesian LSTM model represent the best fit between the predicted and observed values. The results show that the fitted data are highly correlated and that the model can effectively reduce the uncertainty of some of the large-scale parameters or parameter combinations. The specific predictor indexes are shown in Table Ⅴ. Compared with the LSTM predictor, the Bayesian LSTM predictor has a high fitting ability and strong data denoising ability, and it can effectively characterize the relationship between the predictor variables and the response variables.

E. EXPERIMENT Ⅲ
In this experiment, we compared other deep network models with the approach proposed in this paper. There are few studies on thickness prediction for lithium battery pole mills, but the operating principle and deformation parameters of steel rolling mills are similar to those of lithium battery pole mills. Therefore, we analyzed and compared the relevant results from thickness prediction studies of steel rolling mills. None of the baseline models includes a feature selection process; each uses all features as network inputs. In our previous study, we used a BP neural network prediction model, but the model tended to fall into local minima. Later, a genetic algorithm was used to optimize the weights and thresholds of the BP neural network [22]. The prediction results of the other models are shown in Table Ⅵ.
The other network models we chose are GA-BP [22], DBN-LSSVM [27], LSSVM [49], and GRNN [50]. To make the comparison more accurate, we used the mean value of each metric for the analysis.

V. CONCLUSION
This paper introduces a physics-data-driven prediction model for the control of the thickness of lithium battery electrodes. It analyzes the current state of research on lithium battery electrode mills and electrode thickness prediction control, and summarizes the development trend, research content, and shortcomings of electrode thickness prediction control methods. To reduce the difficulty of data acquisition in industrial production and to further improve the operating performance and productivity of the electrode sheet mill, this paper establishes a multi-deformation coupling model based on the elastic deformation of the roll and the rolling force. Meanwhile, to improve the robustness of the prediction model of the nonlinear control system under variable working conditions and to solve the slow approximation speed of traditional neural networks, this paper builds the thickness prediction model by combining a physical model with a deep learning method and optimizes the LSTM neural network using Bayesian theory.
In building the physical model, we analyze the influence of the elastic deformation of the roll on the thickness by starting from the key components affecting the thickness of the electrode and establishing the stiffness deformation model and elastic bending model. Meanwhile, the rolling force is also one of the factors affecting the thickness of the electrode. We define the plastic deformation region of the rolling process, determine the amount of plastic deformation between the roll and the electrode, and also analyze the effect of front slip and back slip on the consistency of the electrode. We obtained mathematical expressions to control the pole piece thickness production and determined the influencing factors affecting the electrode thickness control.
In building the physics-data-driven prediction model, we introduce the nonlinear deformation formula of battery electrode production into the data-driven model as a training constraint, which accounts for the influence of the physical mechanics of the underlying pole mill on the prediction accuracy. Meanwhile, we use the LSTM predictor to learn to predict long time-series data. We find that the LSTM model does not achieve high prediction accuracy and does not suppress noise well under variable working conditions and insufficient data volume. Therefore, we use Bayesian theory to optimize the hyperparameters of the LSTM neural network, modeled by weighted sampling, and employ an MLP to fuse multiple sub-predictors to obtain a more stable output. The Bayesian LSTM can automatically find the optimal hyperparameters under different operating conditions and can eliminate redundant data, which reduces the need for a large data volume while improving the convergence capability and fitting accuracy. The RMSE, MSE, MAE, R, and $R^2$ of the Bayesian LSTM model are 5.11, 2.25, 1.852, 0.800, and 0.322, respectively. Compared with the other prediction models, it has significantly lower errors and better characterizes the correlation between data.
Although we have improved the thickness prediction accuracy of lithium battery pole pieces and evaluated the correlation between different sensor data, some problems remain to be solved:
1. The working environment of the lithium battery pole piece rolling mill is complex. The parameters related to the pole piece thickness should also include the original density, coating quality, and environmental factors. The thickness prediction model should include more nonlinear factors to improve its robustness.
2. The deformation coupling model proposed in this paper can reduce the network's input to a certain extent and realize feature dimension reduction. However, this method cannot remove the redundant noise of the sensor and cannot extract the high-dimensional hidden features of the data. We can seek feature extraction algorithms suited to real-world industrial data and consider the coupling relationship between different features.
3. Although the Bayesian network can estimate the model uncertainty, it relies on prior knowledge and is computationally expensive. Other optimization algorithms can be sought to improve the uncertainty expression ability of the model.
With the demand for industrial production, the lithium battery pole piece rolling mill is gradually converted from the single-rolling mode to the double-rolling mode, and the process technology will be more complicated. Therefore, we will continue to study the coupling relationship between the double rolls and the battery pole pieces, propose a deformation coupling model that is more in line with the actual operating conditions, and continuously improve the thickness prediction model. The method proposed in this paper can be combined with other parameter estimation algorithms to study nonlinear parameter identification of different disturbances of lithium battery pole piece rolling mills and applied to other fields such as signal processing and process control systems.

Data availability
The code and data used to support the findings of this study have been deposited in the battery electrode thickness using polymorphic variable coupling model and data-driven method repository and can be obtained from the corresponding author upon request.

SHUHAN DENG is studying for a master's degree from the Hebei University of Technology. He graduated from Harbin University of Science and Technology in 2020. He mainly engaged in