Introduction
Rolling bearings are vital to modern mechanical systems, but challenging operating conditions and extended use can degrade equipment performance and shorten its lifespan. Therefore, swift and precise fault diagnosis is essential for maintaining durability [1], [2], [3].
With advancements in computer technology, machine learning-based fault diagnosis methods have recently shown substantial success in classifying bearing faults [4], [5]. These methods fall into two categories: traditional and deep learning [6]. Traditional methods [7], [8] typically depend on expert knowledge, which can limit their effectiveness. In contrast, deep learning techniques, such as the convolutional neural network (CNN), the long short-term memory (LSTM) network, and the generative adversarial network, autonomously extract fault features directly from raw data, overcoming the limitations of traditional methods. Zhu et al. [9] transformed one-dimensional vibration signals into two-dimensional time-frequency maps using the Fourier transform and trained a CNN for fault classification. Zhao et al. [10] developed a model integrating CNN with the bidirectional gated recurrent unit for improved bearing fault classification.
Although deep learning models have advanced bearing fault diagnosis, increasing network depth often complicates parameter optimization in the initial layers. The residual network (ResNet) addresses this issue with identity mapping. Hu et al. [11] enhanced diagnostic accuracy in few-shot scenarios using time-frequency augmentation. Liang et al. [12] developed a model combining wavelet transform with an optimized ResNet, where time-frequency signals are used for training and validated for optimal model selection. Tang et al. [13] employed transfer learning in a CNN to boost diagnostic accuracy, while Xu and Wang [14] introduced the FB (Fusion Bidirectional)-LSTM ResNet network model with promising outcomes. Zhang et al. [15] refined ResNet with a hybrid attention mechanism, weighting wavelet transform coefficients during signal preprocessing and testing the model on a gearbox fault dataset. To address noise interference that hinders effective fault feature extraction, Zhao et al. [16] incorporated a soft threshold function into ResNet, creating the deep residual shrinkage network (DRSN).
These methods have advanced fault diagnosis and classification, yet challenges remain. For time-series data, the local perception and parameter sharing traits of convolution networks limit their effectiveness with long-duration sequences, such as in aerospace bearings. Furthermore, the soft threshold function in DRSN can filter out critical fault features during signal filtering, potentially reducing classification accuracy.
To address these challenges, this paper introduces the LSTM-IDRSN (improved deep residual shrinkage network) model for aerospace-bearing fault diagnosis, which integrates LSTM with an improved soft threshold function in the residual shrinkage network (RSN). First, the LSTM module extracts initial features from the raw signal, reducing redundancy for subsequent processing. These extracted features undergo convolution operations before entering the improved residual shrinkage module, where an improved semi-soft threshold function (ISSTF) performs further denoising to complete deep feature extraction. Finally, a fully connected layer conducts the fault classification.
Research Methods
A. Long Short-Term Memory Network
LSTM [17] is a specialized type of recurrent neural network (RNN), as shown in Fig. 1. It integrates both current and past information to predict future states, effectively addressing gradient explosion and vanishing gradient issues that often affect traditional RNN.
LSTM uses three gating functions to control the cell state. The forget gate determines how much of the previous cell state is retained, and the cell state is then updated using the input-gate quantities defined in (2): \begin{align*}\begin{cases} \displaystyle \boldsymbol {f}_{t} = \sigma (\boldsymbol {W}_{f} \cdot [\boldsymbol {h}_{t-1},\boldsymbol {X}_{t} ]+\boldsymbol {b}_{f} ) \\ \displaystyle \boldsymbol {C}_{t} = \boldsymbol {f}_{t} \otimes \boldsymbol {C}_{t-1} +\boldsymbol {i}_{t} \otimes \tilde {C}_{t} \\ \end{cases} \tag {1}\end{align*}
The input gate and the candidate cell state are computed as \begin{align*} \begin{cases} \displaystyle \boldsymbol {i}_{t} = \sigma (\boldsymbol {W}_{i} \cdot [\boldsymbol {h}_{t-1},\boldsymbol {X}_{t} ]+\boldsymbol {b}_{i} ) \\ \displaystyle \tilde {C}_{t} = \tanh (\boldsymbol {W}_{C} \cdot [\boldsymbol {h}_{t-1},\boldsymbol {X}_{t} ]+\boldsymbol {b}_{C} ) \\ \end{cases} \tag {2}\end{align*}
The output gate controls how much of the cell state is exposed as the hidden state: \begin{align*} \begin{cases} \displaystyle \boldsymbol {O}_{t} = \sigma (\boldsymbol {W}_{O} \cdot [\boldsymbol {h}_{t-1},\boldsymbol {X}_{t} ]+\boldsymbol {b}_{O} ) \\ \displaystyle \boldsymbol {h}_{t} = \boldsymbol {O}_{t} \otimes \tanh (\boldsymbol {C}_{t} ) \\ \end{cases} \tag {3}\end{align*} Here, σ(·) is the sigmoid function, the W and b terms denote the corresponding weight matrices and bias vectors, and ⊗ is the element-wise product.
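The gate equations (1)–(3) can be sketched for a scalar hidden state; the weights and inputs below are illustrative placeholders, not the trained parameters of the proposed model.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step for a scalar state, following Eqs. (1)-(3).
    W[g] = (w_h, w_x) and b[g] are the weights/bias of gate g."""
    z = {g: W[g][0] * h_prev + W[g][1] * x_t + b[g] for g in W}
    f_t = sigmoid(z["f"])                 # forget gate, Eq. (1)
    i_t = sigmoid(z["i"])                 # input gate, Eq. (2)
    c_tilde = math.tanh(z["c"])           # candidate cell state, Eq. (2)
    c_t = f_t * c_prev + i_t * c_tilde    # cell state update, Eq. (1)
    o_t = sigmoid(z["o"])                 # output gate, Eq. (3)
    h_t = o_t * math.tanh(c_t)            # hidden state, Eq. (3)
    return h_t, c_t
```

With all weights and biases zero, each gate evaluates to 0.5 and the candidate state to 0, so the cell state simply halves at every step, illustrating how the forget gate alone governs state decay.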
Compared with other time-series analysis techniques, LSTM efficiently manages long-term dependencies in extended sequences. Applied to multi-dimensional inputs, it adaptively captures intricate nonlinear patterns and fits highly fluctuating or complex data well, without elaborate data transformations or feature engineering. Moreover, LSTM makes no specific assumptions about the data distribution and learns patterns directly from the data, which makes it broadly applicable, particularly where datasets violate the assumptions of conventional methods. Incorporating LSTM in this research therefore enhances the model's generalization capability.
B. Improved Deep Residual Shrinkage Network
1) Deep Residual Network
ResNet [18] mainly consists of residual block units (RBUs), as shown in Fig. 2, featuring batch normalization, rectified linear unit (ReLU) activation functions, convolution layers (Conv), and identity shortcuts. The key innovation is the identity shortcut, which enables the input to bypass intermediate layers and reach deeper layers directly, helping prevent the gradient explosion and vanishing gradient issues common in deep neural networks.
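The identity-shortcut idea reduces to one line: the block's output is its learned transform plus the unmodified input. A minimal sketch (the `transform` callable stands in for the block's Conv/BN/ReLU stack):

```python
def residual_block(x, transform):
    """Residual connection: output = transform(x) + x (element-wise).
    If the transform learns to output zeros, the block becomes an
    identity map, which is what lets very deep stacks train without
    degradation."""
    return [ti + xi for ti, xi in zip(transform(x), x)]
```

For example, `residual_block(x, lambda v: [0.0] * len(v))` passes its input through unchanged, which is exactly the identity-mapping fallback the text describes.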
The deep ResNet is primarily composed of multiple residual modules, as shown in Fig. 3.
2) Deep Residual Shrinkage Network
The core component of DRSN is the residual shrinkage block unit (RSBU), as shown in Fig. 4. This enhanced residual shrinkage block integrates a subnet that adaptively adjusts the threshold based on input data variations, improving the network’s transmission efficiency and learning capability through dynamic parameter tuning. This design strengthens the network’s expressive power, adaptability, and generalization across diverse datasets.
The DRSN substitutes the RBU with the RSBU, preserving the benefits of the original deep ResNet structure while enhancing noise reduction and irrelevant feature filtering. Its structure is illustrated in Fig. 5.
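In the DRSN literature, the channel threshold is commonly obtained by scaling the mean absolute feature value with a sigmoid-bounded coefficient produced by a small fully connected subnet. The sketch below assumes that scheme, with fixed (untrained) weights for illustration only:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def adaptive_threshold(features, fc_weight=0.5, fc_bias=0.0):
    """Sketch of the RSBU threshold subnet (assumed scheme, untrained
    weights): the global average of |x| is scaled by a coefficient in
    (0, 1), so the threshold adapts to input amplitude and stays
    positive without exceeding the feature scale."""
    avg_abs = sum(abs(v) for v in features) / len(features)
    alpha = sigmoid(fc_weight * avg_abs + fc_bias)  # bounded in (0, 1)
    return alpha * avg_abs
```

Because the coefficient is bounded in (0, 1), the threshold can never zero out the entire feature map, which is what makes the shrinkage adaptive rather than destructive.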
3) Improved Semi-Soft Threshold Function
The soft threshold function is commonly used in signal denoising; it sets values within the threshold range [−τ, τ] to zero and shrinks values outside that range toward zero by a constant bias τ: \begin{align*} y=\begin{cases} \displaystyle {x-\tau } & {x\gt \tau } \\ \displaystyle 0 & {-\tau \le x\le \tau } \\ \displaystyle {x+\tau } & {x\lt -\tau } \\ \end{cases} \tag {4}\end{align*}
The improved semi-soft threshold function (ISSTF) proposed in this paper replaces the constant bias with an exponentially decaying one, so that large-amplitude components are shrunk less (the negative branch is written in its odd-symmetric form):\begin{align*}y=\begin{cases} \displaystyle {x-\frac {\tau }{\textrm {e}^{x-\tau }}} & {x\gt \tau } \\ \displaystyle 0 & {-\tau \le x\le \tau } \\ \displaystyle {x+\frac {\tau }{\textrm {e}^{-(x+\tau)}}} & {x\lt -\tau } \\ \end{cases} \tag {5}\end{align*}
Fig. 6 compares the ISSTF with the soft threshold function. As illustrated, when the signal falls within the threshold range [−τ, τ], both functions output zero; outside that range, the ISSTF output approaches the input much faster than the soft threshold output, because its shrinkage bias decays exponentially with distance from the threshold. Large-amplitude fault features are therefore preserved with less distortion.
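Both threshold functions are a few lines of code. In this sketch the ISSTF negative branch is written in its odd-symmetric form, x + τ·e^{x+τ}, so that isstf(−x) = −isstf(x):

```python
import math

def soft_threshold(x, tau):
    """Soft threshold function, Eq. (4): constant shrinkage bias tau."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0

def isstf(x, tau):
    """Improved semi-soft threshold function, Eq. (5): the shrinkage
    bias decays exponentially as |x| moves away from the threshold,
    so large-amplitude features pass through almost unchanged."""
    if x > tau:
        return x - tau / math.exp(x - tau)
    if x < -tau:
        return x + tau * math.exp(x + tau)
    return 0.0
```

For τ = 1, the soft threshold maps x = 2 to 1 (a fixed loss of τ), while the ISSTF maps it to 2 − e⁻¹ ≈ 1.63, retaining more of the original amplitude.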
4) Improved Deep Residual Shrinkage Network
The IDRSN introduced in this paper replaces the original soft threshold function in the DRSN with the ISSTF to denoise one-dimensional vibration signals, thereby enhancing deep feature extraction. The improved residual shrinkage block unit is illustrated in Fig. 7; substituting it for the original module improves overall network performance.
C. Network Model Classification Evaluation
To quantitatively assess the fault diagnosis classification performance of the proposed model, precision is used as the evaluation metric. It is calculated as in (6), where TP and FP denote the numbers of true positives and false positives, respectively:\begin{equation*} \textrm {p}_{\textrm {re}} = \frac {\textrm {TP}}{\textrm {TP}+\textrm {FP}} \tag {6}\end{equation*}
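A concrete instance of (6) for one fault class, assuming label lists as input (the helper name and signature are illustrative, not from the paper):

```python
def precision(y_true, y_pred, cls):
    """Precision for fault class `cls` per Eq. (6): TP / (TP + FP),
    i.e. the fraction of samples predicted as `cls` that truly
    belong to `cls`."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == cls and t == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == cls and t != cls)
    return tp / (tp + fp) if tp + fp else 0.0
```

Averaging this per-class precision over all seven bearing conditions gives a single summary score comparable across models.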
D. The Model Presented in This Paper: LSTM-IDRSN
The LSTM-IDRSN structure, as shown in Fig. 8, consists of the following components from input to output: LSTM module, Conv, multi-layer improved RSBU, and fully connected output layer. The process is as follows.
First, the raw one-dimensional vibration signals are labeled and randomly split into training and testing datasets. The training dataset is then input into the LSTM module to extract temporal features. These features undergo convolution operations and are passed into the RSBU module for deeper feature extraction. The resulting features are processed through a fully connected layer and the SoftMax activation function for fault classification. Finally, the model is validated with different fault testing datasets, and its performance is evaluated under varying noise signal intensities to assess its robustness in noisy environments.
Data Experiments and Validation Analysis
A. Aircraft Bearing Dataset
The dataset used in this study was collected from aircraft bearings by the DIRG Laboratory at the Department of Mechanical and Aerospace Engineering, Politecnico di Torino, Italy [20]. An overview of the experimental setup is provided in Fig. 9(a). Data were collected across various damage categories, severities, and sensor positions, with a sampling frequency of 51,200 Hz. The test assembly comprises three roller bearings: B1, B2, and B3, as shown in Fig. 9(c). The data were captured using triaxial IEPE accelerometers mounted on the bearing and spindle support at positions A1 and A2, as illustrated in Fig. 9(b). These accelerometers, as shown in Fig. 9(d), have a frequency range of 1–12,000 Hz (amplitude ±5%, phase ±10°), a nominal resonant frequency of 55 kHz, and a nominal sensitivity of 1 mV/(m/s²).
Experimental platform for bearing data collection: (a) the test bench overview, (b) installation position of accelerometer, (c) experimental bearing, (d) three axis IEPE accelerometer.
The dataset in this study includes six fault types and healthy conditions, recorded at various positions and severities under a 100 Hz rotational speed and no-load operation, as shown in Table 1. Specifically, 0A indicates the healthy bearing condition, 1A-3A corresponds to varying damage levels in the raceway fault, and 4A-6A corresponds to varying damage sizes in the bearing’s rolling elements.
B. Data Preprocessing
Data collection uses a sliding window sampling method, as shown in Fig. 10. Since the window size is critical for capturing high-frequency components and detailed features of vibration signals, selecting an appropriately sized window is essential. If the window width is too small, the algorithm becomes more complex, and computational speed is negatively affected. Conversely, if the window width is too large, it becomes difficult to effectively analyze the overall distribution of the data. A 2048-point sliding window offers a better balance between high-frequency resolution and computational efficiency compared to larger or smaller window sizes [21].
Consequently, the study adopts a 2048-point sliding window for time-series data sampling, with the data categorized based on different fault types. Multiple samples are generated for each fault category. For each fault type, 80% of the samples are randomly selected for the training set, while the remaining 20% are reserved for the test set.
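The sliding-window sampling and 80/20 split described above can be sketched as follows. The window step equal to the window width (non-overlapping windows) is an assumption, since the paper does not state the overlap:

```python
import random

def sliding_window_samples(signal, width=2048, step=2048):
    """Cut a 1-D vibration signal into fixed-width windows.
    step == width gives non-overlapping samples (assumed here)."""
    return [signal[i:i + width]
            for i in range(0, len(signal) - width + 1, step)]

def train_test_split(samples, train_frac=0.8, seed=0):
    """Randomly assign 80% of samples to training, 20% to testing."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    cut = int(len(samples) * train_frac)
    return ([samples[i] for i in idx[:cut]],
            [samples[i] for i in idx[cut:]])
```

Applied per fault class, this yields the 200-training / 50-testing sample counts used in the experiments.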
C. Experimental Setup and Results Analysis
The experiment is conducted on a computer with a Windows OS, an NVIDIA GeForce RTX 2080 Ti GPU, an Intel Core i9-9900 processor, and 32 GB RAM. Python 3.6 is used for programming, with the PyTorch 1.9.0 deep learning framework. Each fault type has a total of 250 samples, with 200 samples allocated to the training set and 50 samples to the test set. The Adam optimizer is employed to accelerate model convergence.
The number of LSTM layers significantly influences the extraction of bearing fault features [22], while the learning rate governs the update of model parameters [23]. To comprehensively assess the impact of both the LSTM layer count and learning rate on the experimental results, the experiment was repeated 10 times. For each repetition, the LSTM layer counts were set to 1, 2, and 3, and the learning rates were set to 0.01, 0.001, and 0.0005 for each corresponding layer count. The control variable method was used to investigate the effects of these hyperparameters on model performance, thereby identifying the optimal combination. Since the accuracy of the test set reflects network performance, the average test set results were used as the final evaluation criterion. The classification accuracy results are provided in Table 2.
Table 2 shows that the highest classification accuracy is achieved when the initial LSTM extraction layer is set to 2 and the learning rate is set to 0.001. This setting is selected as the optimal configuration for the network model. The corresponding classification accuracy curve for the aerospace bearing test set is illustrated in Fig. 11.
The accuracy curve in Fig. 11 shows that the model begins to converge at a higher classification accuracy when epoch = 45. To provide a clearer visualization of fault classification, the accuracy confusion matrix in Fig. 12 is presented, where [0-6A] denotes different bearing fault types and sizes. The horizontal axis represents predicted values, and the vertical axis represents actual values. The darker colors in the matrix indicate higher classification accuracy.
In order to assess the computational complexity of the proposed model, the single run time of the proposed model is compared with that of traditional models. The experimental results are presented in Table 3.
As shown in Table 3, although IDRSN incurs a somewhat longer single-run time than the traditional DRSN model, the proposed model reduces single-run time by 60.98% relative to DRSN and 66.39% relative to IDRSN, demonstrating its significant advantages in computational efficiency and resource utilization. By leveraging the long short-term memory capability of LSTM and the optimization of the IDRSN architecture, the LSTM-IDRSN model not only reduces redundant computations but also enhances the speed and accuracy of the model when processing time-series data, providing a significant real-time performance advantage.
The proposed model is compared with several others, including the traditional DRSN [5], ResNet18, IDRSN, LSTM, ResNet, LSTM-ResNet, a deep convolutional neural network with wide first-layer kernels (WDCNN), and multilayer perceptron neural network (MLP). To minimize experimental variability, each experiment is conducted 10 times, and the average result is reported. Table 4 shows the fault classification accuracy for each model.
As shown in Table 4, the LSTM-IDRSN model achieves the highest classification accuracy, improving by 4.98% over the traditional DRSN model, reaching 96.43%. This indicates that the proposed model more effectively extracts and identifies fault features. Compared to IDRSN and ResNet18, the accuracy of LSTM-IDRSN improves by 2.56% and 8.19%, respectively. This suggests that for tasks like aerospace bearing fault diagnosis, which involve strong temporal characteristics, the LSTM-IDRSN model is better suited to handle complex temporal data and enhance temporal feature extraction. Compared to LSTM-ResNet, the LSTM-IDRSN model improves by 4.42%, further highlighting the advantages of the enhanced deep residual shrinkage network in feature extraction and noise reduction. The LSTM-IDRSN model also shows improvements of 11.3% and 55.18% over WDCNN and MLP, respectively. This suggests that the LSTM model can more accurately identify and classify fault types when dealing with complex fault patterns.
To assess the model’s robustness against noise interference, noise of different intensities was added to the acquired signals, which were then processed into the network for bearing fault classification. Given the unpredictable and complex working conditions of aerospace bearings, Gaussian white noise [24] was introduced to the bearing dataset, with noise intensity controlled by the signal-to-noise ratio (SNR). The SNR is defined as the ratio of signal power to noise power, typically measured in decibels (dB), and calculated as 10 times the logarithm of the signal-to-noise power ratio. The formula is given by (7):\begin{equation*} \textrm {SNR(dB) = 10log}_{10} \left ({{\frac {P_{s}}{P_{n}}}}\right ) \tag {7}\end{equation*}
To evaluate the proposed model’s performance on aerospace bearings under various noise conditions, noise levels of −4 dB, −2 dB, 0 dB, 2 dB, and 4 dB were added to the original bearing samples. Lower SNR values represent higher noise levels, making fault identification more challenging. To minimize experimental variability, the average accuracy over 10 runs was used as the evaluation metric. For instance, Fig. 13 illustrates the original fault-free sample data and the corresponding data after adding noise at an SNR of −4 dB.
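Gaussian white noise at a target SNR per (7) can be generated by scaling the noise so the power ratio matches. A sketch (the helper name is illustrative, not from the paper):

```python
import math
import random

def add_noise_at_snr(signal, snr_db, seed=0):
    """Add Gaussian white noise scaled so that
    10 * log10(Ps / Pn) = snr_db, per Eq. (7)."""
    rng = random.Random(seed)
    p_signal = sum(v * v for v in signal) / len(signal)   # signal power Ps
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))        # required Pn
    sigma = math.sqrt(p_noise)                            # noise std dev
    return [v + rng.gauss(0.0, sigma) for v in signal]
```

At SNR = −4 dB the noise power is about 2.5 times the signal power, which is why fault identification becomes markedly harder at the lower SNR settings.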
Fig. 14 compares the average fault classification accuracy of each model at various SNR levels. The results indicate that the classification accuracy of LSTM-IDRSN, IDRSN, and DRSN improves with increasing SNR, while models without denoising capabilities exhibit lower and more variable diagnostic accuracy under different noise conditions. Notably, the LSTM-IDRSN model maintains an accuracy above 80% even at the highest noise levels and exceeds 90% when the SNR is above 0 dB. These results indicate the proposed model's strong classification performance and robustness, with minimal fluctuation across noise environments.
Conclusion
In this study, an LSTM-IDRSN fault diagnosis model is proposed for bearing fault classification. The model exploits its unique recurrent structure and gating mechanisms to effectively incorporate both past and current information from bearing time-series data. This allows the model to capture long-term dependencies more flexibly and effectively, while also performing initial feature extraction for the rest of the network, thereby minimizing redundant work in subsequent modules. Additionally, the model's noise resistance and diagnostic performance are enhanced through modifications to the soft threshold function. The model was validated using an aviation bearing fault dataset, yielding the following conclusions:
Compared to traditional DRSN, LSTM-IDRSN achieved a 4.98% improvement in classification accuracy, reaching 96.43%. Moreover, when compared to five other models—IDRSN, ResNet18, LSTM-ResNet, WDCNN, and MLP—the classification accuracy was also similarly improved.
In a comparison of single run execution times, the LSTM-IDRSN model demonstrated a runtime of 2.06 s, significantly reducing execution time compared to 5.28 s for DRSN and 6.13 s for IDRSN, thereby resulting in a substantial improvement in efficiency. Additionally, the model exhibited strong robustness under various noise conditions, maintaining high fault classification accuracy even when the signal-to-noise ratio was low.
With an SNR below 0, the model still shows potential for further accuracy improvements. Future work will focus on enhancing the algorithm’s performance through signal enhancement techniques.
Although the LSTM-IDRSN model performed well in the experiments, it has certain limitations. The current research primarily focuses on fault diagnosis based on vibration signals. Future research could explore the integration of vibration signals with other modal data to enhance the accuracy and reliability of fault diagnosis. Multimodal data fusion enables the comprehensive utilization of information from diverse data sources, compensating for the limitations of a single data source. This approach facilitates improved fault identification and classification, providing a more holistic solution for fault diagnosis in complex mechanical systems.