Quantitative Diagnosis Method of Gearbox Under Varying Conditions Based on ARX Model and Generalized Canonical Correlation Analysis

Fault diagnosis of gearboxes under varying speed and varying load is both a focus and a difficulty in gearbox research. The response signals of a gearbox under varying conditions exhibit non-linear and non-stationary characteristics, which increase the complexity of quantitative diagnosis of gearbox faults. A quantitative diagnosis method for gearbox faults based on an improved autoregressive with exogenous input (ARX) model and generalized canonical correlation analysis (GCCA) is proposed in this paper. The ARX model is improved through incremental recursive identification based on a Kalman filter, and is used to build system transfer characteristic models from the excitation and response signals of gearboxes. Because the ARX models of gearboxes are nonlinear, GCCA is used to build the quantitative relationship between the models of the faulty and healthy states. Simulation and experimental results indicate that the proposed method can effectively identify the severity of gearbox failures under varying conditions and provides a promising approach to the quantitative diagnosis of rotating machinery.


I. INTRODUCTION
The fault diagnosis of gearboxes under varying operating conditions has attracted extensive attention from researchers [1]-[4]. Gearboxes are becoming more integrated, and a gearbox is typically replaced once a fault reaches a certain level. Therefore, diagnosing both the existence and the severity of a fault is more in line with engineering requirements. Two key issues must be resolved in the quantitative diagnosis of gearbox faults under variable working conditions. The first is the model of the gearbox system, and the second is a quantitative evaluation of the model.
The associate editor coordinating the review of this manuscript and approving it for publication was Yu Wang.
Traditional gearbox models are typically built in two ways. The first is a mass-spring-vibrator model [5]-[7], in which a linear approximation is used; the other is built via a subsystem modelling method [8], [9] that simplifies the torque, force, and other factors in the relationship between force and reaction in the system. An external description is often referred to as an output-input description. Its starting point is to treat the gearbox system as a "black box". On this premise, an external description avoids the characteristic information inside the system and directly reflects the dynamic causal relationship between the external variable groups of the system. Differential equations, transfer function models, and time-series models are common external models of a system. Canonical correlation analysis (CCA) is a common pattern recognition method that can identify linearly correlated components in random vectors [10], [11]. However, if there is a nonlinear relation between the variables, CCA does not always extract useful features [12], [13]. Generalized canonical correlation analysis (GCCA) can highlight the nonlinear relationship between vectors and thereby resolves this limitation. It is therefore more universal and more suitable for fault diagnosis based on the system characteristics.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
A typical dynamical system is composed of three main subsystems from the perspective of the transfer path of the system response: an active subsystem, which generates response signals such as noise and vibration; the mechanical and airborne transfer paths; and a passive subsystem, which receives the noise and vibration [14]-[16]. Fault diagnosis methods can be divided into two categories: excitation-based and response-based methods. Excitation-based methods approach fault diagnosis and condition monitoring from the perspective of the load effect of the system. They mainly include methods based on the stator current [17], instantaneous speed analysis [18], [19], and torsional vibration analysis [20], all of which diagnose faults from the system input without considering the impact of the system input on the system output. Response-based methods mainly include methods based on vibration [21], [22], lubrication performance [23], acoustic signals [24], temperature signals [25] and Hertz deformations [26]. These diagnose faults and monitor condition from the system output, and do not consider the impact of the transfer path of the response signals on the diagnosis.
A diagnosis method based on the system characteristics is studied in this paper to overcome the shortcomings of excitation-based and response-based methods. The system-based diagnostic method comprehensively considers the effects of the input signals, the output signals and the signal transmission paths on the fault characteristics. The transfer function is an inherent property of the system: it does not change under varying conditions, and changes only when the gearbox fails. The stator current and the vibration acceleration signals are selected as the excitation and response signals, respectively, and a transfer function is introduced to model the gearbox system characteristics under varying conditions.
Variational mode decomposition (VMD) is recognized as a powerful tool for analyzing nonlinear and nonstationary signals and hence is extensively employed for mechanical vibration analysis [22], [27]. Combined with order analysis, it can effectively diagnose gearbox failures [28]-[30], so it serves as a strong reference for the diagnostic effect of the method proposed in this paper. In this study, an ARX model of incremental recursive identification (IRI) based on a Kalman filter (KF) is proposed, which can better reflect the real status of the system, improve the stability of the model, and achieve a strong anti-interference capability under varying starting and stopping conditions. To address the non-linear quantitative identification of the ARX model, a nonlinear quantitative evaluation method based on GCCA is proposed. It extracts the non-linear correlation of the system transfer function between gearbox features under a faulty status and those under a healthy status, and the quantitative evaluation results are expressed using the canonical correlation coefficients.
A flow chart of the proposed method is shown in Figure 1. To elaborate on this method, the input and output signals of the gearbox under varying conditions are first determined in Section 2.1. In Section 2.2, a model of the system characteristics of the gearbox is established from the perspective of a transfer function and a Laplace transform. Based on this, a gearbox system characteristic model based on ARX is described in Section 2.3. In Section 2.4, an improved ARX model for incremental recursive system parameter identification based on a KF is proposed. The improved model is verified through a simulation in Section 3.1, and a nonlinear quantitative evaluation method based on GCCA is described and compared with the analysis results of CCA in Sections 3.2 and 3.3. An experimental verification of the proposed method is provided and compared with a traditional fault diagnosis method in Section 4. Finally, the conclusions are drawn in Section 5.

II. MODELLING OF GEARBOX SYSTEM UNDER VARYING CONDITIONS
Under variable operating conditions, the dynamic signals representing mechanical faults have not only non-linear characteristics but also time-varying characteristics, creating many difficulties in the feature extraction and fault diagnosis of a gearbox [31]. On the one hand, nonlinear coupling may occur between certain frequencies of the excited dynamic signals owing to mechanical component failures or defects [32]. On the other hand, the fault frequency under variable operating conditions is a function of time and changes with the working conditions [33]. Therefore, both changes in the operating conditions and mechanical failures can change the nonlinear coupling relationship and the coupling degree of the signal. The result is a combination of the two effects rather than a simple superposition, which brings new challenges to the modelling of a gearbox.

A. SELECTION OF INPUT AND OUTPUT SIGNALS
1) SELECTION OF INPUT SIGNAL FOR VARYING CONDITIONS GEARBOX SYSTEM IDENTIFICATION
A monitoring and diagnosis method based on a current signal was first applied to a fault diagnosis of a motor body structure. Numerous research results have shown that a diagnosis method based on a motor stator current can be used to effectively diagnose and analyze equipment faults.
Because the current contains torsional vibration and frequency information of the power supplied to the AC motor, torsional vibration information of the motor can be reflected in the stator current. Any torsional vibrations with a specific frequency will produce a sideband around the fundamental frequency component of the current signal. Moreover, the size of the sideband reflects the magnitude of the input torque, and the width of the sideband represents the frequency of the input torsional vibration. The existence of a gear fault and its location can be qualitatively distinguished by identifying the torsional vibration frequency. Therefore, the stator current is selected as the input signal of the ARX model of the gearbox under the starting and stopping conditions.

2) SELECTION OF OUTPUT SIGNAL FOR VARYING CONDITIONS GEARBOX SYSTEM IDENTIFICATION
The vibration acceleration signal contains the characteristic signals related to the faults and working conditions, and is widely used in fault diagnosis and condition monitoring. The vibration response measured at the surface of the gearbox housing is mainly composed of the gear meshing frequency, the characteristic frequency of faulty gears, the bearing frequency, the motor, and the interference caused by the complicated transmission path between the excitation source and the sensor. Because the response function of the gearbox transmission path is a function of frequency, the vibration response and the side-frequency features of a gearbox vary with the gearbox speed. Figure 2 shows a time-frequency diagram of the vibration acceleration under a sinusoidal change of the given velocity, which further verifies that the vibration acceleration signal contains the speed information. The vibration-based acceleration signal therefore contains characteristic information related not only to the fault but also to the working conditions.

B. MODELLING OF GEARBOX SYSTEM
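The gearbox is a continuous-time dynamic system. For a continuous single-input, single-output gearbox system with input u(t) and output y(t), the corresponding differential equation can be expressed as follows:
$$a_0 y^{(n)}(t) + a_1 y^{(n-1)}(t) + \cdots + a_{n-1} y'(t) + a_n y(t) = b_0 u^{(m)}(t) + b_1 u^{(m-1)}(t) + \cdots + b_{m-1} u'(t) + b_m u(t) \quad (1)$$
Taking the Laplace transform of both sides with zero initial conditions gives
$$G(s) = \frac{Y(s)}{U(s)} = \frac{b_0 s^m + b_1 s^{m-1} + \cdots + b_{m-1} s + b_m}{a_0 s^n + a_1 s^{n-1} + \cdots + a_{n-1} s + a_n} \quad (2)$$
Therefore, the characteristic transfer function of the gearbox system can be uniquely determined by two coefficient vectors.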
From the perspective of system diagnostics, the vectors P and Q represent the inherent characteristics of the gearbox. For a normal gearbox during the starting and stopping process, the vectors P and Q should remain unchanged. A significant change in P and Q therefore indicates that the gearbox has become faulty.
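To illustrate how two coefficient vectors fix the system characteristic, the following minimal sketch evaluates the magnitude |G(jω)| from a numerator and a denominator coefficient list and compares a baseline system with a perturbed one. The second-order system, its 100 Hz natural frequency, the damping ratio and the 10 % frequency shift are assumptions chosen for illustration only; they are not gearbox parameters from this paper.

```python
import math

def polyval(coeffs, s):
    """Evaluate a polynomial given in descending powers of s (Horner scheme)."""
    acc = 0j
    for c in coeffs:
        acc = acc * s + c
    return acc

def magnitude(num, den, freq_hz):
    """|G(j*2*pi*f)| for the transfer function G(s) = num(s) / den(s)."""
    s = 1j * 2.0 * math.pi * freq_hz
    return abs(polyval(num, s) / polyval(den, s))

# Hypothetical second-order system: G(s) = wn^2 / (s^2 + 2*zeta*wn*s + wn^2)
wn = 2.0 * math.pi * 100.0                      # assumed natural frequency (100 Hz)
healthy = ([wn ** 2], [1.0, 2 * 0.05 * wn, wn ** 2])

# Assumed "faulty" case: the natural frequency drops by 10 %
wf = 0.9 * wn
faulty = ([wf ** 2], [1.0, 2 * 0.05 * wf, wf ** 2])

freqs = [float(f) for f in range(10, 201, 10)]
mag_h = [magnitude(*healthy, f) for f in freqs]
mag_f = [magnitude(*faulty, f) for f in freqs]

# The resonance peak moves when the coefficient vectors change
peak_h = freqs[mag_h.index(max(mag_h))]
peak_f = freqs[mag_f.index(max(mag_f))]
print(peak_h, peak_f)
```

In this toy case the change in the denominator coefficients shifts the resonance peak on the frequency grid, which is the kind of change in the transfer characteristic that the diagnosis relies on.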

C. MODELLING OF GEARBOX SYSTEM BASED ON ARX
To establish a system model, a suitable system identification method must be selected. The autoregressive (AR) and autoregressive moving average (ARMA) models are frequently used parametric models. ARX is a controlled autoregressive model that contains not only the autoregressive term but also the input terms that the AR and ARMA models lack. Therefore, ARX models can capture the influence of the input on the output, which suits condition monitoring and fault diagnosis based on the system characteristics [2], [34]-[36]. Following the transfer-function model described above, an ARX model reflecting the status of the gearbox is established. The motor stator current and the vibration response signals are used as the input and output variables of the ARX system, respectively. It is assumed that the known output and input variables of the gearbox are y(1), y(2), ..., y(N + n) and u(1), u(2), ..., u(N + m), respectively, and that the output and input orders of the gearbox system characteristics are n and m.
Let the model parameters be $A_i$ and $B_j$, with a model error term at each time t. A system of N equations can be obtained by combining the time-series relation of the input and output of the gearbox, where φ is the matrix-vector of the gearbox system characteristic, θ is the parameter vector of the gearbox, and $\theta = [A_1, \cdots, A_n, B_1, \cdots, B_m]^T$. The traditional ARX identification minimises its loss function according to the least-squares principle. The loss function is shown through Equation (7).
The estimated value of parameter θ can then be obtained using Equation (8).
By measuring the input time-series $\{u_t\}$ and the output time-series $\{y_t\}$ of the system, the model of the gearbox system characteristics can be established, and the system model parameters of the gearbox can be estimated as $\theta = [A_1, \ldots, A_n, B_1, \ldots, B_m]^T$. That is, the system characteristics and health status of the gearbox are contained in these model parameters. Therefore, the system analysis and fault diagnosis can be conducted based on the established model.
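The least-squares identification described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the sign convention y(t) = Σ A_i y(t−i) + Σ B_j u(t−j) + noise, the function names, and the synthetic second-order test system are all assumed here, and the data are simulated rather than measured stator current and vibration signals.

```python
import numpy as np

def build_regressors(y, u, n, m):
    """Stack rows [y(t-1)..y(t-n), u(t-1)..u(t-m)] and the targets y(t)."""
    start = max(n, m)
    rows = [np.concatenate((y[t - n:t][::-1], u[t - m:t][::-1]))
            for t in range(start, len(y))]
    return np.asarray(rows), y[start:]

def fit_arx(y, u, n, m):
    """Batch least-squares estimate of theta = [A_1..A_n, B_1..B_m]."""
    phi, target = build_regressors(y, u, n, m)
    theta, *_ = np.linalg.lstsq(phi, target, rcond=None)
    return theta

# Synthetic check with assumed "true" parameters (not gearbox data)
rng = np.random.default_rng(0)
true_theta = np.array([1.5, -0.7, 1.0, 0.5])    # A_1, A_2, B_1, B_2
u = rng.standard_normal(2000)                   # stand-in excitation
y = np.zeros_like(u)                            # stand-in response
for t in range(2, len(u)):
    y[t] = (true_theta[0] * y[t - 1] + true_theta[1] * y[t - 2]
            + true_theta[2] * u[t - 1] + true_theta[3] * u[t - 2]
            + 0.01 * rng.standard_normal())

theta_hat = fit_arx(y, u, n=2, m=2)
print(np.round(theta_hat, 3))
```

The whole regression matrix is held in memory and solved in one step, which is the cost that motivates a recursive formulation for online use.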

D. IMPROVED ARX MODEL OF IRI BASED ON KF
The parameter estimation of the transmission characteristics in the traditional ARX model of a gearbox uses the least-squares method, which has the advantage of being easy to understand. However, it has a large identification error, particularly when the operating conditions are complicated, and external interference seriously reduces the identification speed and increases the error, which makes the method difficult to use for online identification. Therefore, an improved ARX model of IRI based on a KF is proposed in this study, which can reduce the memory usage, improve the accuracy of the algorithm, and realise real-time status monitoring of the gearbox under varying conditions.
The IRI algorithms generally take the form shown in Equation (9) [37].
where θ(t) and y(t) are the parameter estimate and system response, respectively, at time t, and ŷ(t) is the predicted system response obtained from the identification model at time t. Because ARX can be written in a linear regression form, the regression vector at time t is the gradient of ŷ(t) with respect to the parameter θ.
where θ(t) is the response parameter of a gearbox, and e(t) is the noise signal for the excitation and response of the gearbox system. Supposing a change in $\theta_0(t)$, the actual excitation and response variables of the gearbox satisfy Equations (11) and (12). The optimal incremental matrix of the system identification algorithm can then be calculated, and the KF can be further applied. The IRI algorithm based on a KF can be completely expressed as follows.
where S 2 is the identification variance matrix of the system. To improve the application of the incremental recursive algorithm, a KF was used, and the forgetting factor λ is added during the identification process. The recursive algorithm of system can then be modified as follows.
The ARX model contains not only the autoregressive term but also the influence of the input. Therefore, it has a strong anti-interference capability for noise signals. When the noise is small, the model is robust and the algorithm is simple. When the noise is large, its influence on the identification accuracy can be compensated by increasing the order of the model. Using the IRI model based on a KF in place of the one-time least-squares operation can greatly reduce the memory occupancy and calculation time of the system, which makes it suitable for online identification.
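The idea of replacing a one-time least-squares solve with a recursive update that uses a forgetting factor λ can be sketched as follows. Because the update equations of the paper are not reproduced in this text, the sketch uses the standard recursive least-squares form with a forgetting factor, which is an assumption and may differ from the authors' KF-based IRI algorithm. The function name, the value λ = 0.99, the initial covariance and the simulated parameter jump are likewise illustrative.

```python
import numpy as np

def rls_forgetting(phi, target, lam=0.99, p0=1e3):
    """Standard recursive least squares with forgetting factor lam."""
    dim = phi.shape[1]
    theta = np.zeros(dim)              # current parameter estimate
    P = p0 * np.eye(dim)               # covariance-like matrix
    history = np.empty((len(target), dim))
    for k, (psi, yk) in enumerate(zip(phi, target)):
        gain = P @ psi / (lam + psi @ P @ psi)
        theta = theta + gain * (yk - psi @ theta)   # correct by prediction error
        P = (P - np.outer(gain, psi) @ P) / lam     # discount old information
        history[k] = theta
    return theta, history

rng = np.random.default_rng(1)
N = 3000
phi = rng.standard_normal((N, 2))
# Assumed parameter jump halfway through, standing in for a change in the system
theta_true = np.where(np.arange(N)[:, None] < N // 2, [1.0, -0.5], [0.6, 0.2])
target = np.sum(phi * theta_true, axis=1) + 0.01 * rng.standard_normal(N)

theta_end, history = rls_forgetting(phi, target)
print(np.round(history[N // 2 - 1], 2), np.round(theta_end, 2))
```

Only the current estimate and one small matrix are stored between samples, so the memory footprint does not grow with the record length, and the forgetting factor lets the estimate follow the parameter jump.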

III. SIMULATION OF IMPROVED MODEL AND QUANTITATIVE EVALUATION BASED ON GCCA
A. SIMULATION OF IMPROVED ARX MODEL
For a signal with noise interference, the formula can be expressed as Equation (20).
where f, τ and q(t) are the signal oscillation frequency, delay factor, and random noise, respectively, $W_s$ is the transient waveform period, and ξ is the damping coefficient. The formula of the signal without noise interference can be expressed through Equation (21), and its waveform is shown in Figure 3.
After adding random noise q(t), the output after identification using the traditional ARX model is as shown in Figure 4.
The simulation results in both Figures 5 and 6 show the existence of boundary effects. Ignoring the boundary effect, the improved ARX model clearly has more anti-noise interference ability and can better reflect the real status of the system.

B. PRINCIPLE OF GENERALISED CANONICAL CORRELATION ANALYSIS
Through GCCA, a more accurate template of the health of a gearbox can be established to better realise fault identification and quantitative fault diagnosis. The following is a description of the GCCA method applied. It is assumed that a random variable has the following nonlinear relationship [13].
where $f_i(x)$ denotes the nonlinear correlation form of the random variables y and x. The random variable x is mapped to the high-dimensional vector m in the form of $f_i(x)$.
where m denotes the mapped high-dimensional random vector, and $f_i(x)$ denotes the mapping function corresponding to m. Through the above mapping, the nonlinear relationship between y and x is transformed into a linear relationship between y and m.
After the high-dimensional mapping, the linear correlation between y and m can be extracted using CCA. By analysing the correlation between y and m, the nonlinear relationship between y and x can be obtained.

C. QUANTITATIVE EVALUATION BASED ON GCCA
The output of the transfer function obtained by the ARX model described in Section 3 is a nonlinear curve. To evaluate a non-linear curve quantitatively, a method for applying a GCCA is proposed. Through this method, an accurate quantitative relationship between the transfer function and the health template of the gearbox under varying conditions can be obtained, and the gearbox under varying conditions can thus be quantitatively diagnosed. To verify the effectiveness of the GCCA further, the simulation data are used to verify its validity in extracting the nonlinear correlation. The specific steps are as follows:

1) GENERATE THE SIMULATION DATA
Firstly, two columns of random variables for samples $x_1$ and $x_2$ are generated. A sufficient number of samples, 5,000, is used to meet the needs of the correlation analysis, so two columns of 5,000 rows of random sample vectors are generated.
Secondly, the samples $y_1$ and $y_2$ are generated from the random samples $x_1$ and $x_2$, where $y_1 = x_1^2 + \mathrm{randn}$ and $y_2 = x_2 + \mathrm{randn}$. Thus, two 5,000 × 2 datasets, X and Y, are obtained, in which $y_1$ depends quadratically on $x_1$ and $y_2$ depends linearly on $x_2$. The correlation between the sample sets X and Y is extracted using GCCA to verify whether it can extract the nonlinear relationship mentioned above.

2) ANALYSE THE RELATIONSHIPS BETWEEN DATASETS USING THE CCA
The traditional CCA was directly applied to the sample sets X and Y , and after the analysis, it was found that only one pair of canonical correlation variables (CCV) has a correlation coefficient of greater than 0.01. The correlation coefficient of the pair of variables reached 0.8922, and the coefficients of each variable are as shown in Table 1.
As can be seen from Table 1, if CCA is used directly, only the linear correlation between $x_2$ and $y_2$ can be extracted, and the nonlinear correlation between $x_1^2$ and $y_1$ is not extracted.

3) ANALYSE RELATIONSHIPS BETWEEN DATASETS USING THE GCCA
For the generated sample sets X (5000 × 2) and Y (5000 × 2), polynomials are selected to map them to a higher dimension. A nonlinear mapping of the random variable x is applied to obtain a higher-dimensional vector m. After mapping, m is extracted according to the generalised eigenvalues of the Lagrange matrix; it contains six random variables, and its sample matrix has the form $[x_1 \; x_1^2 \; x_1^3 \; x_2 \; x_2^2 \; x_2^3]$, of size 5000 × 6. The relationship between m and y is extracted using CCA to analyse the nonlinear relationship between x and y.
The GCCA is used to extract the correlation between y and m, and the six pairs of generalised CCVs are extracted according to the eigenvalues and there are only 2 correlation coefficients larger than 0.01. The correlation coefficient of the first pair of CCVs reached 0.9996, which shows a strong correlation. The specific coefficients of the variables are provided in Table 2.
As can be seen from Table 2, the CCV reflects the correlation between $-0.001x_1 - 0.0278x_1^2 + 0.0008x_2 - 0.0001x_2^3$ and $0.0278y_1$. It can be determined that $x_1^2$ is the main provider of the correlation of dataset M, and $y_1$ is the primary provider of the correlation of dataset Y. Both have the same sign, representing a positive correlation. Therefore, the CCV successfully extracted the correlation between $x_1^2$ and $y_1$.
The correlation coefficient of the second pair of CCVs reached 0.8922, and the coefficients of each variable are as shown in Table 3.
From Table 3, we can see that the generalized CCV obtained can accurately reflect the correlation between $0.0034x_1 - 0.0005x_1^2 - 0.0001x_1^3 - 0.4984x_2 + 0.0002x_2^2 - 0.0003x_2^3$ and $-0.0003y_1 - 0.4465y_2$. An analysis of its coefficient composition clearly shows that $x_2$ and $y_2$ are the main providers of the correlation of datasets M and Y, respectively. The two signs are identical, representing a positive correlation. Therefore, the CCV extracted the correlation between $x_2$ and $y_2$.
To summarise, GCCA can extract the nonlinear correlation between vectors. It can therefore be used to establish a more accurate template of the health of the gearbox, to better realise quantitative fault diagnosis, and to build a more accurate correlation model of a complex system.
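The simulation steps above can be reproduced in outline with the following sketch, which compares plain CCA on (X, Y) with CCA applied after a cubic polynomial mapping of X. Several points are assumptions: the distribution of x is not specified in the text, so standard normal samples are used; the canonical correlations are computed with a QR-plus-SVD formulation rather than the generalised eigenvalue problem mentioned above; and the function names are illustrative. The resulting coefficients therefore differ from the values reported in Tables 1 to 3.

```python
import numpy as np

def canonical_correlations(a, b):
    """Canonical correlations of two sample matrices (rows are samples)."""
    qa, _ = np.linalg.qr(a - a.mean(axis=0))    # orthonormal basis of centred a
    qb, _ = np.linalg.qr(b - b.mean(axis=0))    # orthonormal basis of centred b
    return np.linalg.svd(qa.T @ qb, compute_uv=False)

def polynomial_map(x, degree=3):
    """Map each column x_i to [x_i, x_i^2, ..., x_i^degree]."""
    return np.column_stack([x[:, i] ** p
                            for i in range(x.shape[1])
                            for p in range(1, degree + 1)])

rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 2))                            # x_1, x_2 (assumed N(0, 1))
Y = np.column_stack((X[:, 0] ** 2 + rng.standard_normal(5000),   # y_1 = x_1^2 + noise
                     X[:, 1] + rng.standard_normal(5000)))       # y_2 = x_2 + noise

rho_cca = canonical_correlations(X, Y)                    # linear CCA
rho_gcca = canonical_correlations(polynomial_map(X), Y)   # CCA after mapping
print(np.round(rho_cca, 3), np.round(rho_gcca, 3))
```

Under these assumptions, plain CCA finds only one sizeable correlation (the linear pair), whereas the mapped version finds two, because the squared term makes the quadratic dependence visible to a linear method.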

IV. EXPERIMENTAL VERIFICATION OF QUANTITATIVE EVALUATION OF GEARBOX SYSTEM UNDER VARYING CONDITIONS
The gearbox is a vital transmission component in rotating machinery and has a high failure rate, so diagnosing its faults is of great significance. Gearboxes are becoming more integrated, and a gearbox is replaced once a fault reaches a certain level. Therefore, diagnosing both the existence and the degree of a fault is more in line with engineering requirements, and this paper accordingly studies the quantitative diagnosis of gearboxes. From the perspective of the characteristics of a gearbox system, the input and output signals are comprehensively analysed using the improved ARX model of IRI based on a KF and a nonlinear GCCA. The transmission characteristic curve (TCC) of the gearbox and the correlation coefficient between the faulty gear and the normal gear are obtained, thereby quantitatively evaluating the fault state of the gearbox.

A. EXPERIMENT DESIGN
In this study, normal, worn, and broken tooth gears were further tested under the starting and stopping conditions on an SQ fixed-axis gearbox test bench to verify the effectiveness of the proposed method in terms of its gearbox fault diagnosis.
The layout and mechanical structure of the experimental table are shown in Figure 7. To facilitate the experimental comparison, three gears, namely, a normal gear, gear #1 with a broken tooth, and gear #1' with worn tooth surfaces, were mounted on the input shaft of the gearbox, and the three gears could be manually switched without disassembly. Figure 8 shows photographs of gears #1' and #1: gear #1 has a broken tooth, and gear #1' has general wear on each tooth.
In this experiment, a PCB acceleration sensor with a sensitivity of 100 mV/g and an AEMC current clamp with a sensitivity of 10 mV/A were used to collect the vibration and current signals of the gearbox, respectively, and the sensor installation position is shown in Figure 7(b). The sampling frequency of the vibration acceleration signal is 10 kHz, and the sampling length is 22 s. The sampling frequency of the current signal is 1 kHz and the sampling length is also 22 s; both signals are sampled simultaneously. To ensure the comparability of the test data, the faulty gears and the normal gear experienced the same nonlinear speed-up process preset using a programmable computer, in which the speed increased from 800 to 1,800 rpm, as shown in Figure 9. Because the characteristic frequency of the gearbox is time-varying under a varying speed, the frequency is given in terms of order for analysis, as shown in Table 4.

B. DIAGNOSTIC EFFECT OF TRADITIONAL METHODS
Traditional gearbox fault diagnosis methods can be divided into excitation-based and response-based methods. In this paper, stator current and vibration acceleration signals are used as the excitation and response signals, respectively, and both are comprehensively analysed to monitor the failure status of the gearbox from the perspective of system characteristics, as shown in Figure 10.
Response-based fault diagnosis is widely used among traditional diagnosis methods, especially diagnosis based on the vibration signal, which is the earliest method used in fault diagnosis. Moreover, time-domain analysis and order analysis are the most commonly used analysis methods in fault diagnosis. VMD is recognized as a powerful tool for analysing nonlinear and nonstationary signals, and hence is extensively employed for mechanical vibration analysis [22], [27]. Therefore, VMD combined with order analysis is used in this section as the traditional response-based fault diagnosis method to analyse the vibration signal of the gearbox with a broken tooth, and the analysis results are used to verify the effectiveness of the proposed method. Figures 11(a) and 11(b) show the time-domain waveforms during start-up for the faulty gear and the normal gear, respectively. However, it is difficult to identify the waveform anomaly caused by the fault from Figure 11(a). The root cause is that, although gear #1 has a partially broken tooth, the tooth surface is basically intact along the involute, so the tooth surfaces do not disengage during the meshing process. That is, the fault only affects the local meshing stiffness of the gear, but does not cause a severe tooth surface impact. Therefore, for such a failure, a time-domain analysis often has difficulty achieving the desired diagnostic results.
The vibration signal of the gearbox with a broken tooth is decomposed by VMD, and the result is shown in Figure 12. Figure 12 shows that five intrinsic mode functions (IMFs) were recovered by the VMD approach. The IMFs contain the broken tooth fault information of the gearbox. The gearbox fault signal is reconstructed from the IMFs, and the time-domain waveform of the reconstructed signal is shown in Figure 13.
The change in speed causes many frequency components to be mixed together, so the characteristic frequency of the fault cannot be distinguished. Order analysis is able to correlate the frequency spectrum and time history with the rotational speed of the rotating components. Moreover, after VMD and signal reconstruction, some unnecessary signal components are removed, and the fault is more easily identified. An order spectrum analysis of the reconstructed signal is therefore carried out in this section, and the results are shown in Figure 14.
As can be seen from Figure 14, the order spectral curve has prominent peaks at orders 32 and 38. According to order analysis theory and the gearbox characteristics, order 32 corresponds to the meshing frequency of #1 and #2, which is independent of the gearbox faults. However, the peak at order 38 represents the meshing frequency of the faulty gear, which characterizes the broken tooth fault of the gear. Therefore, VMD combined with order spectrum analysis can effectively diagnose the fault type of the gearbox. However, order analysis distorts the amplitude of signals, which affects quantitative diagnostic results. Many papers have further improved the performance of order analysis, for example through tacho-less order analysis [38], which has significantly mitigated this defect.
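The relationship between order and frequency that underlies the order spectrum can be sketched as follows. The sketch assumes that an order is the ratio of a frequency to the rotation frequency of a reference shaft turning at the stated speed; which shaft serves as the reference in Table 4 is not stated here, so that choice and the function names are assumptions.

```python
def shaft_frequency_hz(rpm):
    """Rotation frequency of the reference shaft in Hz."""
    return rpm / 60.0

def order_to_frequency_hz(order, rpm):
    """Frequency in Hz of a given order at a given shaft speed."""
    return order * shaft_frequency_hz(rpm)

def frequency_to_order(freq_hz, rpm):
    """Order of a given frequency at a given shaft speed."""
    return freq_hz / shaft_frequency_hz(rpm)

# Orders 32 and 38 at the two ends of the 800 to 1,800 rpm speed-up
for rpm in (800, 1800):
    print(rpm, order_to_frequency_hz(32, rpm), order_to_frequency_hz(38, rpm))
```

A component that stays at a fixed order sweeps across a wide frequency band during the speed-up, which is why the frequency spectrum smears while the order spectrum shows a sharp peak.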
This proves that the experimental scheme designed in this paper is reasonable and the fault of gear can be detected effectively. It provides a powerful reference for the diagnostic effect of the method proposed in this paper.

C. DIAGNOSTIC EFFECT OF THE PROPOSED METHOD
In this paper, a new quantitative diagnostic method for gearboxes under variable working conditions is proposed. The method comprehensively analyses the excitation and response signals from the perspective of system characteristics. To compare its diagnostic effect with that of VMD and order analysis, and to further verify its effectiveness for quantitative diagnosis, the stator current and vibration acceleration signals of gearboxes with a broken tooth gear, a worn gear and a normal gear under the starting and stopping conditions were analysed using the proposed method.
If the selected order of the ARX model is too small, the transfer characteristics of the system cannot be fully expressed. However, if the selected order is too large, signal redundancy occurs and many unnecessary burrs appear on the characteristic curve of the system, which may degrade the performance of the subsequent identification. According to the Akaike information criterion (AIC), combined with the characteristics of the experimental objects and experimental experience, the order of the ARX model was determined to be 18 in this study. From the perspective of the gearbox system characteristics, and according to the improved gearbox ARX model and GCCA, the transmission characteristics of the gearboxes with a normal gear, a worn gear, and a broken tooth gear are analysed using the stator current signal and the vibration acceleration signal. The TCCs of the three gearboxes are obtained, as shown in Figure 15.
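Order selection with the AIC can be sketched as follows. This is a minimal illustration, not the procedure used to arrive at order 18: it assumes the common form AIC = N ln(RSS/N) + 2k, equal input and output orders, a batch least-squares fit, and a simulated second-order test system. All names and values are illustrative.

```python
import numpy as np

def arx_rss(y, u, order, start):
    """Residual sum of squares of an ARX(order, order) least-squares fit."""
    rows = [np.concatenate((y[t - order:t][::-1], u[t - order:t][::-1]))
            for t in range(start, len(y))]
    phi, target = np.asarray(rows), y[start:]
    theta, *_ = np.linalg.lstsq(phi, target, rcond=None)
    resid = target - phi @ theta
    return float(resid @ resid), len(target)

def aic(rss, n_samples, n_params):
    """Akaike information criterion for a Gaussian least-squares fit."""
    return n_samples * np.log(rss / n_samples) + 2 * n_params

rng = np.random.default_rng(2)
u = rng.standard_normal(3000)
y = np.zeros_like(u)
for t in range(2, len(u)):                      # assumed second-order test system
    y[t] = (1.5 * y[t - 1] - 0.7 * y[t - 2] + u[t - 1] + 0.5 * u[t - 2]
            + 0.1 * rng.standard_normal())

max_order = 6
scores = {}
for k in range(1, max_order + 1):
    # Use the same samples for every order so that the scores are comparable
    rss, n = arx_rss(y, u, k, start=max_order)
    scores[k] = aic(rss, n, 2 * k)

best = min(scores, key=scores.get)
print(best, {k: round(v, 1) for k, v in scores.items()})
```

An order that is too small leaves a large residual and a high score, while orders beyond the true one reduce the residual only marginally and are penalised by the 2k term. In practice the criterion is combined with engineering judgement, as described above.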
As is clear from Figure 15, the TCCs of the gearboxes with a worn gear, a broken tooth gear and a normal gear each have distinctive characteristics and are easily distinguished. By comparison with the TCC of the normal gearbox, it can be judged whether a gearbox is faulty and which type of fault it has. The diagnosis results are consistent with the results of VMD and order analysis in Section 4.2. Therefore, the method can effectively identify the fault type of the gearbox.
An analysis of the TCCs shows that the TCC of the gearbox with a worn gear is very similar to that of the normal gearbox, except that a significant peak appears near 500 Hz. Therefore, the worn gear used in this experiment can still be used normally as long as its operating frequency is greater than 500 Hz, although this practice is not recommended in practical engineering applications. The TCC of the gearbox with a broken tooth maintains a high amplitude throughout the entire frequency band, which indicates that a broken tooth is a more serious fault and that the gear must be replaced once it occurs.
From the perspective of the characteristics of the gearbox system, and according to the improved gearbox ARX model and GCCA, the canonical correlation coefficients of the gearboxes with the worn gear and the broken tooth gear relative to the normal gearbox are 0.8925 and 0.3454, respectively, as shown in Table 5. Clearly, the correlation coefficient between the broken tooth gearbox and the normal gearbox is smaller than that between the worn gearbox and the normal gearbox. This demonstrates numerically that the broken tooth fault is more serious than the wear fault, which is consistent with the previous analysis. It indicates that the smaller the correlation coefficient between a faulty gearbox and a normal gearbox, the more serious the failure. The proposed method can therefore quantitatively diagnose the fault severity using the correlation coefficient.

V. CONCLUSION
The basic idea of system characteristic diagnosis is that the system characteristic is an inherent property of the gearbox system and is independent of the running state. A change in the system characteristics indicates that the gearbox system is malfunctioning. The system characteristic model of the gearbox is established based on the improved ARX model. For the quantitative diagnosis of the severity of gearbox faults, GCCA was used to evaluate the quantitative relationship between the faulty gearbox and the healthy gearbox. After normalization, the proposed method focuses on the differences between the morphological features of the TCCs rather than on their amplitude or energy. According to the different TCC forms and their correlation coefficients with the TCC form of the normal gear, the type and the severity of the gear fault can be diagnosed. Simulation and experimental results verify the effectiveness of the system characteristic diagnosis method. Compared with VMD and order analysis, the proposed method can realize the quantitative diagnosis of gearbox faults under varying conditions, and has more engineering value because it can provide reference guidance for the replacement of the gearbox.