Harmonic Reducer Performance Prediction Algorithm Based on Multivariate State Estimation and LargeVis Dimensionality Reduction

As an important transmission component of industrial robots, the harmonic reducer determines the positioning accuracy, bearing capacity and service life of the robot end-effector. Predicting the performance can grasp the working status in advance and avoid major losses caused by uncertain factors such as component damage. The current paper focuses on a harmonic reducer performance prediction algorithm based on Multivariate State Estimation Technique (MSET) and LargeVis dimensionality reduction. Firstly, an accelerated life test platform is designed to collect multi-dimensional parameters that can characterize the operating state of the harmonic reducer throughout the life cycle. Afterwards, as far as the MSET method is concerned, the fault warning threshold is set according to the residual between the constructed memory matrix of the health state data and the actual observed value. Finally, utilizing LargeVis to reduce the dimensionality of multi-dimensional features, combining with Mahalanobis distance to construct a health index degradation model, and then selecting Long Short-Term Memory (LSTM) network to predict the downward trend of the harmonic reducer. The analysis of the accelerated life test data of the harmonic reducer demonstrates that the proposed method can send out the fault warning signal 18 minutes in advance in the sample with a life of 5.7 hours, and has a strong ability to predict the degradation trend of the harmonic reducer.


I. INTRODUCTION
Industrial robots are the most common type of robots, which are widely used in intelligent manufacturing. In the era of Industry 4.0, industrial robots also play an indispensable role in it [1]. Industrial robots can replace people to do many things, such as painting of cars and tightening of screws, which require high stability, low failure rate and low vibration of industrial robots [2]- [5].
Reducers are an important component of industrial robots [6], and their life and stability are directly related to the operating conditions of industrial robots. Precision reducers are utilized in all joints of industrial robots to improve the overall stiffness and output torque of the robot. There are two main types of precision reducers used in robots, RV reducers and harmonic reducers. RV reducers are large in size compared to harmonic reducers and are mainly used in large joints of industrial robots. Harmonic reducer is small in size, large transmission ratio, small mass and high transmission efficiency and transmission accuracy, which is widely used in wrist and small arm hand joints of industrial robots. The performance of harmonic reducer is directly related to the safety and stability of industrial robots.
The model diagram of harmonic reducer is shown in Figure 1. Harmonic reducer mainly consists of flexible wheel, rigid wheel and wave generator. As the use time increases, the harmonic reducer motion accuracy decreases and fatigue damage occurs. Damage to the harmonic reducer occurs in the robot will bring a large economic loss. At present, the factory uses regular inspection and maintenance to troubleshoot the harmonic reducer. There is a certain lag in this troubleshooting method, and it is important to establish a harmonic reducer health assessment model to predict its performance [7].

FIGURE 1. Harmonic reducer model diagram
Researchers have conducted some theoretical and experimental studies on the performance evaluation of mechanical structures such as reducers. Zhang [8] investigated the reliability evaluation method of harmonic reducers using transient finite elements and accelerated life tests. Qian [9] proposed a multi-failure mode time-varying reliability analysis method based on a two-loop Kriging model to analyze the time-varying reliability of reducers. Liu [10] proposed a simple index indicating the failure severity based on the ratio between the characteristic frequency component of the failure and the currently sampled vibration signal. The fault characteristic frequency components are identified by applying wavelet packet decomposition and Hilbert transform to the original signal. Shiau [11] proposed a nonparametric regression-based accelerated life stress model for describing the accelerated degradation curve. An estimate of the mean time to failure under normal conditions of the product was obtained by analyzing the relationship between the stress level and the acceleration factor. Crk [12] proposed a multivariate multiple regression function parametric analysis method for applied stresses using a degradation trajectory-based approach. Doksum [13] proposed a variable stress accelerated life test model based on the concept of cumulative degradation with a Wiener process and inverse Gaussian distribution, which has more flexibility for nonparametric life models. Li [14] built a performance margin model based on the hysteresis and transmission error of harmonic reducer as key parameters, and analyzed and quantified the multi-source uncertainty to construct a reliability model. Jun [15] proposed a new n-ARIMA based equipment performance degradation model method, and used a non-smooth autoregressive integrated moving average model to build an equipment performance degradation model to verify the performance degradation of OTM650. Riascos-Ochoa [16] proposed a Levy framework to simulate the effect of degradation sources on the degraded system, and obtained the moments of reliability function, probability density and lifetime. The performance of harmonic reducers is often expressed through multiple state parameters, such as vibration, transmitted torque, temperature, etc. These performance evaluation methods, designed using a single state parameter, do not reflect the performance of harmonic reducers.
Some scholars have also developed performance evaluation models through neural networks. Ma [17] proposed a particle swarm optimization based reliability life prediction analysis for harmonic reducers. Wang [18] proposed a hybrid prediction method for remaining service life prediction of rolling bearings using a correlated vector machine regression with different kernel parameters to represent the bearing test dataset and adaptively estimate the remaining service life using an exponential degradation model combined with Frechet distance. Prudhom et al [19]. showed that a simple fault severity index using short-time fourier transform (STFT) information can be proposed based on the notation over definition of time-frequency transform instead of fault indicators. Wang [20] proposed a bearing performance evaluation method based on topological representation and hidden Markov model, where a topological network of original features is obtained by self-organizing mapping, and a hidden Markov model is used as the evaluation model to assess the bearing performance degradation trend. These performance evaluation models established using neural networks are able to predict the performance state of the mechanism, but a large amount of experimental data is required to establish convolutional neural networks and machine learning models, harmonic reducers have a long service life, and it takes more than ten years to collect hundreds of data sets to establish performance evaluation datasets, and the model training process is complex and prone to overfitting.
In brief, our significant technical contributions are the following: (1) A test platform for the accelerated life of harmonic reducer is built, and multiple sets of multi-dimensional data parameters such as input speed, output speed, input torque, output torque, temperature and acceleration in X, Y and Z directions are collected during the operation.
(2) The designed MSET method can realize the comprehensive condition monitoring of the harmonic reducer, and the fault warning signal can be sent out 18 minutes in advance by setting the fault threshold.
(3) The dimensionality reduction of the characteristic parameters is realized by the LargeVis technology, and the performance degradation trend information of the harmonic reducer is predicted by combining with the LSTM method.

II. Accelerated Life Test of Harmonic Reducer
In this paper, a test platform is set up to conduct accelerated life experiments and collect multidimensional parameters that can characterize the operation state of the harmonic reducer during its full life cycle. The test platform consists mainly of drive frequency conversion motor, harmonic reducer, torque sensor, magnetic powder brake, accelerometer, as shown in Figure 2. Among them, the performance parameters of the drive motor are shown in Table 1, with a rated power of 1.1kW and a rated torque of 3.5 N·m. It is connected with the harmonic reducer through a coupling to drive the harmonic reducer. The performance parameters of the measured harmonic reducer are shown in Table 2. The torque sensor selects a range of 5 N·m and 200 N·m respectively to measure the torque of the input and output terminals. The TR-3 acquisition instrument matching the sensor is selected, and the RS232 interface is adopted to communicate with the computer. The model of the magnetic powder brake is FZ200J/Y, which can provide a rated torque of 200 N·m, all of which meet the test requirements.  Maximum input speed (r/min) 6000 Average input speed (r/min) 3500 Backlash (Arcsec) 20 Design life (h) 10000 In order to obtain the full life cycle data of the harmonic reducer, several groups of accelerated life tests are designed and carried out in this paper, and the results are shown in Table 3. According to 14 groups of accelerated life tests, most of the failure forms of harmonic reducer are flexible wheel fracture, and a few are flexible bearing damage. The operating time ranges from several hours to dozens of hours, with strong uncertainty. If the traditional method based on physical model is used, there will be a large error. The MSET method based on data drive proposed in this paper is used for condition monitoring and fault warning to inform the appropriate time for preventive maintenance of equipment.
In addition, the lifespans of samples No. 6 and No. 9 are 3.1 hours and 1.1 hours, respectively, which is relatively short. Combine with the analysis of the test site and consultation with the manufacturer of the harmonic reducer, the two groups of samples are found to be short in running time due to manufacturing defects. Therefore, there is no reference value. At the beginning of the test process, the harmonic reducer is in an accelerating state, and the data after the harmonic reducer runs smoothly is intercepted for analysis. In addition, sensor errors during the test may cause a large difference between individual data and the overall sample, which should be discarded in the subsequent model calculation.
When the harmonic reducer is running, the temperature in the body rises due to friction heating. Especially in the later period of use, the vibration intensifies and the input and output torque cannot be maintained. Therefore, input speed, output speed, input torque, output torque, temperature and acceleration in X, Y, and Z directions are collected in the test to analyze the performance of the harmonic reducer.

III. Harmonic Reducer Performance Prediction Algorithm
Harmonic reducer performance prediction algorithm mainly includes condition monitoring and degradation trend prediction. The realization principle diagram is shown in Figure 3. The condition monitoring adopts the MSET method, and compares the residual error between the memory matrix constructed by the health state and the actual observation value, thereby realizing the failure warning of the harmonic reducer. Degradation trend prediction is to employ LargeVis feature dimensionality reduction and LSTM network to predict the degradation trend of harmonic reducer.

A. HARMONIC REDUCER CONDITION MONITORING METHOD 1) MSET method
MSET [21] is a non-parametric statistical modeling method based on the weighted average of historical data, which is suitable for failure warning of mechanical equipment with higher test cost and greater risk factor. For industrial robot equipment, there is a certain risk in its fault test, and it is difficult to obtain the fault feature database. Hence, the MSET technology is utilized to estimate the true state of the observation vector by learning the mechanical knowledge contained in the historical monitoring data, and then a fault warning model is established.
Firstly, the historical memory matrix D is constructed by utilizing the monitoring data under in normal operation state, and then the MSET model is established. Assuming that n variables are collected by the sensor , the observation vector at a certain sampling time tm can be expressed as: Where xn(tm) is the observed value of variable xn at sampling time tm. Then the historical memory matrix D can be constructed as: The memory matrix D is the basis of the MSET model, and the construction of the memory matrix is the process of learning and remembering the normal operating characteristics of the harmonic reducer. Its columns represent the values of all observed variables at a certain sampling moment, and its rows represent the values of an observed variable at different sampling moments.
For a new observation vector Xobs, it can be compared with the historical observation vector stored in the memory matrix D, and then the prediction vector Xest can be estimated. Xest can be expressed as the product of D and the weight vector w : Where Xest is the linear combination of the m historical observation vectors in D. The weight w represents the similarity measure between the vectors in Xest and D, and the weight w can be calculated by minimizing the residuals. The residual formula is as follows: The optimal weight vector w is selected to minimize the residual sum of squares between the newly measured vector Xobs and the predicted value Xest. Then the residual sum of squares can be given by: Take the partial derivatives of S(w) with respect to w1, w2,…, wm and make them equal to 0. Then the weight w can be written as follows: Where the nonlinear operator  chooses to use Euclidean distance operation.
When the harmonic reducer works under normal conditions, the model input value Xobs is located in the normal working space of the memory matrix D, and is close to the historical observation vector in D. Once a failure occurs, Xobs deviates from the observation vector in the memory matrix, the accuracy of the output value decreases, and the residual will increase accordingly. Hence, the residual between Xest and Xobs can reflect the abnormal state and fault information. By means of monitoring changes in residuals and setting reasonable thresholds, fault alarm can be achieved.

2) Feature selection
In view of the construction of the memory matrix D, different strategies are employed to construct the rows and columns.
The purpose of feature selection is to transform the raw data into a more meaningful form, thereby simplifying the model and improving performance. The feature selection of the Pearson correlation coefficient method is employed to screen the rows of the memory matrix. Because it has a simple process, high computational efficiency, and it is suitable for large-scale data. The degree of linear correlation between two variables is calculated by the Pearson correlation coefficient, and the value range is generally [-1, 1]. The larger the absolute value, the higher the degree of linear correlation. Then the overall Pearson correlation coefficient formula between the two variables is as follows: The formula for calculating the correlation coefficient between two samples can be expressed as: Due to the long sampling time and the huge amount of data, the memory matrix will be redundant, thereby reducing the computational efficiency. The commonly used equal interval sampling method is a typical one-time offline sample selection method, which can effectively control the size of the memory matrix. However, considering the changes of actual machining parameters and operating conditions, the estimation accuracy of the MSET with a fixed memory matrix will gradually decrease and frequent maintenance is required. Accordingly, it is necessary to improve the flexibility of the fixed memory matrix. In this paper, a dynamic memory matrix construction method based on KNN algorithm is adopted to construct the columns of the matrix, realize online calculation of MSET, and observe the distance between the vector and each sample in the training set in real time. Taking variable xi as an example, the method flow of adding observation vectors to the memory matrix is shown in Figure 4. Here, δ is a small positive number.

3) Sliding window method
Within the context of the current paper, the sliding window method is adopted to eliminate the uncertain factors and random interference of the harmonic reducer in operation, thereby improving the reliability of fault warning. By finding an appropriate sliding window width, the statistical characteristics of continuously changing residuals can be quickly captured.
Assuming that in a certain period of time, the sequence number of residuals estimated by the MSET method can be given by: Setting the width of the sliding window to N, then the moving average of N consecutive residual values within the window can be calculated as: The threshold of fault warning ɛt is determined by the maximum average residual value in the sliding window. Under the condition of normal operation of the equipment, the maximum residual average value between the normal observation vector and the estimated vector of the MSET model is ɛN. The fault warning threshold can be designed as: Where k is the alarm threshold coefficient, which is determined by the operator according to actual experience, and the value is generally greater than 1. When the residual value is greater than the residual threshold ɛT, the system sends out an alarm signal.

4) Experimental verification
Taking the sample with a lifespan of 5.7 hours in the 8th test as an example, the total number of samples is 20000 after deleting the data of abnormal and initial acceleration stage. The magnitude of the correlation between variables is calculated, and the Pearson feature correlation diagram is shown in Figure 5. After calculation, it can be known that the output speed and torque have little correlation with other variables, as shown in Table 4.   It can be seen that the speed and torque at the output terminal have little correlation with performance decline, so other state variables such as input speed, input torque, temperature and acceleration in the X,Y and Z direction are selected as the row vectors of the memory matrix. The curve diagrams of each variable are shown in Figure 6.
The column vector of memory matrix is constructed by dynamic method based on KNN. For 20000 data samples, the first 2000 samples are taken as the training set, the middle 3000 samples are used as the validation set, and the last 5000 samples are regarded as the test set. A memory matrix with 6 rows and 100 columns is constructed from the training set, and the standardized matrix is shown in Figure 7.
The failure threshold is determined through the validation set and the application of fault warning is carried out on the test set. When the residual exceeds the threshold, it directly indicates that the current state is abnormal, and then a fault alarm is issued. The residual distribution on the validation set is shown in Figure 8.

FIGURE 8. Sliding window analysis of residual distribution based on validation set
Using the sliding window method to calculate the maximum residual average value ɛN, the sliding window width is set to 200. Through calculation, the maximum residual average value is 0.4. Setting the alarm threshold coefficient k to 2.5, then the calculation method of the fault warning threshold is as follows: 2 The 5000 sets of operating data before the failure occurred as the test set, and the residuals are compared with the estimated vector calculated by MSET. The residual results are shown in Figure 9. According to the fault warning threshold, calculate the average residual value in a sliding window with a width of 200. The first alarm is issued at the 3890th sampling point, that is, about 18 minutes before the fault occurs, and the residual value at the subsequent sampling points increases sharply. The MSET algorithm can be used to evaluate the residuals between prediction vector and observation vector, and an alarm can be issued in time, thereby winning precious time for troubleshooting. In the meanwhile, the proposed MSET method can uniformly describe the states of multiple variables without modeling each variable separately, which can quickly realize the fault warning and has higher application value.

B. HARMONIC REDUCER DEGRADATION TREND PREDICTION 1) LargeVis dimensionality reduction method
The LargeVis technique is a data dimensionality reduction method for high-dimensional space. The state data of the harmonic reducer during the operation process contains abundant state information. The high-dimensional vector is mapped to the low-dimensional space through the dimensionality reduction method, and the overall distribution law of the observation data is visualized [22]. The basic process is shown in Figure 10. The conditional probability distribution of xi to xj in the higher dimensional space can be written as: Where pj|i denotes the probability that the sample point xj is a neighbor of xi, and σi represents the variance of the Gaussian distribution with xi as the center point.
The joint probability in high-dimensional space can be given by: The conditional probability distribution of the lowdimensional space can be expressed as: ( Where yi and yj represent two points in low-dimensional space, which have a binary edge with weight eij = 1 in KNN graph, where f(x) is: The objective function of the LargeVis algorithm can be written as: Where E is the set of positive samples, E is the weight uniformly set for the negative sample edge.

2) Distance evaluation method
The Mahalanobis distance is adopted to measure the covariance distance of the data, which is an effective method to calculate the similarity between two location sample sets. The dimension reduction feature of the harmonic reducer in the healthy state is selected as the baseline data, and the Mahalanobis distance between the sample data and the baseline data under different working conditions is compared. The health index is utilized to measure the health degree of the component. The smaller the distance between the sample point to be tested and the baseline, the better the health degree. In this case, the health index is defined as 1. Conversely, it represents the decline of health to a very poor state, and the health index is defined as 0.
Where the parameter k0 is adopted to adjust the density of the health value distribution corresponding to each state, so that the health value is evenly distributed in the interval (0,1). d(x, y) represents the Mahalanobis distance, which can be described as: 3) LSTM prediction method LSTM [23] is a special type of RNN network that can solve the problems of vanishing and exploding gradients of traditional RNN networks. LSTM adds a memory unit to the neural nodes of the hidden layer of RNN to record the historical information. And three gates of input, forget and output are added to control the use of historical information. For LSTM, its parameter settings are shown in Table 5, and the internal structure of neurons is shown in Figure 11. Observing the picture in Figure 11, where i, f, c, and o represent the input gate, forget gate, unit state, and output gate, respectively. σ and tanh represent the sigmoid and hyperbolic tangent activation functions, respectively. Then, at moment t, the calculation method of the input and output values of the LSTM network can be expressed by the following formula: The Root Mean Square Error (RMSE) is the square root of the ratio of the deviation between the predicted value and the true value to the number of observations. Here, RMSE is used as the performance measurement index of the predicted results, and the formula is as follows: Where ypi is the predicted value and yti is the actual value.

4) Experimental verification
In order to verify the effectiveness of LSTM method, the 8th and 14th groups of harmonic reducer samples with lifespans of 5.7 hours and 17.3 hours are used as examples for analysis. According to the correlation degree calculation of state variables in the previous section, acceleration signals in the X, Y and Z directions with the strongest correlation are selected for research. Similarly, the total sample size is 20,000 and 1.2 million after deleting the data of abnormal and initial acceleration phases, respectively. The mean, peak-to-peak value (ppv), standard deviation (std), and kurtosis value (kv) of time domain features are extracted to form the feature vector X, then: Utilizing LargeVis to reduce the dimension of the extracted features, and effectively calculate the similarity between different data sets by means of Mahalanobis distance. Since the data of the harmonic reducer is relatively stable in the first 10 minutes of operation, it is regarded as healthy data. The Mahalanobis distance is based on the data of the first 10 minutes, and the dimensionality reduction features in the health state are taken as the baseline data to calculate the distance between the sample data to be tested and the baseline data. Afterwards, the Mahalanobis distance is nonlinearly mapped, and the health index of each sample point is calculated according to the health index expression. Then the change graph of the health values is shown in Figure 12. For the sake of a simple notation, (a) represents the 8th group of experiment, (b) displays the 14th group of experiment, the following experiments are consistent with the characterization method of this experiment, and will not be repeated one by one. Here, the value of k0 is 1. It can be clearly seen from the figure that the overall trend of the health value is on a downward trend, and the results of the health assessment are consistent with the actual situation. It is directly proved that the combination of the LargeVis dimensionality reduction and the Mahalanobis distance method can better evaluate the performance degradation state of the harmonic reducer.  LSTM is employed to predict the performance state of the harmonic reducer. Figure 13 demonstrates the comparison between the actual and the predicted degradation curves of the two groups of experiments. Where blue indicates the actual degradation curve, and red represents the predicted degradation curve. It is found that with regard to the performance degradation, the predicted results are similar to the actual value, and infinitely close to the actual value.  Here, we select the first 80% of the data in the full life cycle samples as the training set, and the last 20% as the test set. The training error and test error are shown in Figures 14 and 15, respectively. As can be obviously understood from the figures, the training errors of the two experiments are 0.0004 and 0.0002, respectively, and the test errors are 0.0188 and 0.0096, respectively. The test error is slightly higher than the training error. But in general, there is no doubt that the LSTM method for predictive analysis has less error and better effect. It directly indicates that the LSTM method has a greater prediction effect on the performance decline trend of the harmonic reducer.

IV. CONCLUSION
In this study, the performance of harmonic reducer is analyzed based on multi-variable condition monitoring method and performance degradation prediction method. The target includes the following contents: a) The condition monitoring of the harmonic reducer is carried out to realize the function of fault early alarm. b) Predict the decline trend of harmonic reducer and provide technical support for preventive maintenance. We consider utilizing the MSET to realize the fault monitoring of the harmonic reducer, employing LargeVis to reduce the dimensionality of the multi-dimensional features, and combining the LSTM network to solve the prediction of the decline trend of the harmonic reducer. The accelerated life test verifies that the proposed method can not only give fault warning signals 18 minutes in advance in the sample with a life of 5.7 hours, but also has a strong ability to predict the degradation trend.