An Intelligent Hybrid Scheme for Identification of Faults in Industrial Ball Screw Linear Motion Systems

Reliability of high-precision linear motion systems is one of the main concerns in industrial and military applications. The performance and repeatability of these systems are influenced by their linear drives and load bearings, and a fault in either of these members severely affects the safe working of the overall system. This paper presents a reliable intelligent approach, based on deep learning methods, to detect and classify faults in linear motion systems. Accurate fault identification depends strongly on good feature extraction. For this purpose, a novel Residual Twin CNN (ResT-CNN) is proposed that trains a 1-D and a 2-D CNN in parallel to improve feature extraction, followed by a knowledge base-Remnant-PCA (Kb-Rem-PCA) architecture in combination with a multi-class support vector machine (Mc-SVM). This hybrid combination proved very effective for accurate fault identification. The proposed methodology was also validated on the IMS-UC (Intelligent Maintenance Systems – University of Cincinnati) public bearing dataset. The results confirm the effectiveness of the proposed scheme in comparison with existing state-of-the-art techniques.


I. INTRODUCTION
Linear motion systems (LMS) are the most common choice in precision motion applications, especially where high-speed repeatability is desired under load. LMS cover a wide range of applications, including precision machining centers, industrial robots, automatic guided vehicles and surface actuation in the aerospace industry. An LMS incorporates various subsystems, including the power and control drive, the actuation drive and the linear motion structure. The actuation drive provides linear motion while carrying thrust loading. The ball screw (BS) is the most suitable linear actuation drive currently in practice due to its high transmission efficiency, low friction and low backlash. Any degradation in the ball screw drive leads to failure of the entire system. These linear drives carry big research potential for the automation and aerospace industries and have gained popularity among researchers working with precision motion control systems. Because BS linear drives have to accomplish critical tasks in high-precision applications, their use also requires knowledge of the possible failure modes and an effective way to investigate them.
The associate editor coordinating the review of this manuscript and approving it for publication was Xinyu Du.
BS linear drives include a servo motor with the required speed reduction gear set and the necessary bearings, along with the ball screw arrangement, as shown in Figure 1. For high-speed actuation drives, the ball screw and load bearings need to be considered as key components, and the typical fault modes of these elements should therefore be examined. Critical fault modes for these systems reported in the literature [1] include friction due to inadequate lubrication in the bearings and ball screw, backlash and channel jamming in the ball screw, wear or spall at the bearing and ball screw surfaces, and other structural faults. Accurate identification of these faults is of great industrial concern to ensure system availability and has been considered by many researchers in the past [1], [2], [4], [5]. For this purpose, dedicated monitoring setups are required, which demand expertise in electro-mechanics, controls and intelligent systems and therefore make this a resource-intensive solution.
Fault data monitoring for BS linear systems has been an area of primary concern for various researchers in the past [2], [3]. For this purpose, system input data was monitored after installing the linear motion drive on the actual system to ensure that reliable data was monitored [4]. Data monitoring was also performed by collecting fault signals through suitable experimentation [5]. More specifically, BS linear drives were explored in many analytical, simulation and experimental research studies using different approaches.
Mechanical fault identification can be treated as a pattern recognition problem, in which appropriately extracted features of the non-stationary fault signal represent the characteristics of multiple failure patterns. The signal pattern must include fault-sensitive features, because the performance of an intelligent monitoring system is greatly influenced by these features. Accurate feature extraction is sensitive to the dynamics of the mechanical system and requires a high degree of domain expertise. For these reasons, the design of an appropriate feature extraction technique is vital for an intelligent fault diagnosis system.

II. BRIEF DETAILS OF PREVIOUS WORK
A number of techniques have been developed that give model-based engineering solutions and numerical approaches to solve system diagnosis problems [6]. These include analytical and experimental approaches [7] to solve system level problems [8] and diagnostics [9]. Although these solutions give reliable results, their application is limited because a customized model is required for each system. Other methodologies measure vibration and temperature signals with suitable sensors and apply different feature extraction techniques to identify faults in critical elements [10], [11]. Health assessment of BS drives based on experimental and systematic studies was also performed for early fault diagnosis [12], [13]. Recently, the application of machine learning and deep learning based intelligent techniques to mechanical system fault diagnosis has increased rapidly [14], [15], and different approaches have been successfully applied to rotating machinery fault detection [16], bearing fault diagnosis [17] and gear fault classification [18]. These intelligent techniques work automatically by collecting and processing fault signals to identify the condition of the system. An intelligent feature extraction technique for combinations of faults in rotating machinery was proposed on the basis of empirical mode decomposition (EMD), and a Bayesian classifier was applied to diagnose the faults [19]. Other contributions include fault classification using a support vector machine (SVM) for vibration signal data [20] and a nearest neighbor classifier [21], [22]. In comparison with the above-mentioned machine learning techniques, deep learning techniques achieve better results, although their implementation in this area is still developing [23]. For BS drive data monitoring, a deep learning technique based on a deep belief network (DBN) was applied using multisensor vibration signal data [23].
Motor faults have already been studied extensively [24] and are not considered here. Among deep learning methods, the Convolutional Neural Network (CNN) has emerged as one of the leading techniques, especially when dealing with more complex features [25], [26].
The reliable monitoring of BS drives is affected by insufficient and inappropriate data. Fault signal data from a BS drive system is more difficult to collect than data from other systems. For normal operation of linear motion systems, data can be collected by different methods; for the different fault modes, however, sufficient data is not available. This unbalanced data distribution affects the performance of the algorithm and leads to unsatisfactory results for the minor data set [17]. Additionally, various fault detection techniques are in practice and have been used successfully in different combinations [27]-[29]; however, fault classifiers like the Support Vector Machine (SVM) [30], [34] and the Back Propagation Neural Network (BPNN) may sometimes misclassify samples because of low domain adaptability.
In order to deal with aforementioned concerns, this paper proposes data monitoring of BS linear drive using significant signal features from position error measurements. These remarkable features represent changes in system dynamics due to any upcoming failure. The focus of this work is to improve signal representation for improved feature extraction and classification. A novel combination of improved deep learning and knowledge base systems is proposed that gives better feature extraction and accurate faults classification for BS linear drive system. This hybrid arrangement gives superior performance over other techniques.
The contributions of this paper are summarized as follows:
a) Position measurement data was collected for fault-free and faulty conditions. Faults were induced in the BS drive and the load bearing to observe their combined effects on the collected signals.
b) Improved feature extraction with better domain adaptability was achieved using the Residual Twin-CNN (ResT-CNN) structure, which trains 1-D and 2-D CNNs in parallel and fuses the extracted features. It is followed by the knowledge-based deep PCA (Kb-Rem-PCA) architecture, which provides distinct feature reduction and results in high classification accuracy using a Multi-class Support Vector Machine (Mc-SVM).
c) This novel hybrid scheme was successfully used to detect different faults in the BS LMS, with improved results in comparison with state-of-the-art techniques.
The remaining sections of this paper are organized as follows: Section III describes the details of the proposed hybrid methodology. Section IV gives the experimentation details, including a description of the testing setup and the fault cases. Section V presents the detailed results of the technique, followed by a comprehensive discussion in Section VI.
VOLUME 9, 2021

III. PROPOSED HYBRID SYSTEM
Mechanical faults in the BS LMS, like those stated in Section I, disturb the entire system process and degrade its performance to a great extent. In high-precision systems, the LMS continuously undergoes acceleration and deceleration in each travel cycle between two extremes. This produces variation in the position measurements, which is affected by system dynamics, axial and thrust loading, and the behavior of the control system. These deviations in position behavior can be analyzed for different operation modes. Variation in these signal features indicates changed mechanical behavior of the actuation drive due to a possible fault. Observed measurements for different fault modes, including ball screw (BS) friction, BS backlash, load bearing (LB) friction and LB roller damage, are shown in Figure 2. Position error measurement was performed for different load conditions using a sinusoidal motion profile to assess the capability of the developed technique. The sinusoidal profile gives a smooth motion transition and was tested with 20 mm linear travel, which was completed in 4 s with 1 s waiting time at each position change. Three load scenarios were tested, i.e. 600 N, 750 N and -750 N. The tests were performed for both fault and no-fault (normal) conditions. To obtain a significant number of observations, the sequence of motion profiles was repeated 3 times for each test. The dataset was generated for each test with 12 repetitions for each case under study.
Position measurement was conducted for normal function as well as for different faults as described in Section IV. The acquired dataset was transformed into 2-D gray image using CWT technique. This gives an opportunity to utilize two-dimensional features from collected signal data, and therefore time-signal can be considered as image feature set [31].
A. 1-D TO 2-D SIGNAL TRANSFORMATION USING CWT
Different techniques have been used for signal-to-image transformation, including the Hilbert-Huang transform [44], recurrence plots that transform a 1-D signal into a 2-D texture image [45], the wavelet transform [49], [50], and signal to gray image conversion using energy values [51]. The wavelet transform provides time-frequency analysis by decomposing the input signal into a family of wavelet components, each with a resolution that corresponds to its scale. This gives high frequency resolution (low time resolution) in the low-frequency region and, conversely, high time resolution (low frequency resolution) in the high-frequency region. The continuous wavelet transform uses wavelet functions generated from the mother wavelet $\psi(t)$ by translation with parameter $q$ and scaling with factor $p$. The scaled, translated and normalized wavelet is given by

$$\psi_{q,p}(t) = \frac{1}{\sqrt{p}}\,\psi\!\left(\frac{t-q}{p}\right) \tag{1}$$

The continuous wavelet transform (CWT) of an observed function $f(t)$, where $f \in R$, is given by [52]

$$CW_f(q,p) = \int_{-\infty}^{\infty} f(t)\,\bar{\psi}_{q,p}(t)\,dt \tag{2}$$

where $\bar{\psi}_{q,p}(t)$ is the complex conjugate of the wavelet $\psi_{q,p}(t)$. It should be noted that the CWT is sensitive to local transitions, which means that CWT coefficients distant in time remain unaffected by a local transition. Among the different mother wavelet functions, the Morlet wavelet has proved effective for non-stationary signal data because of its similarity to transient impulses [53], [55]. The time-domain Morlet wavelet is defined by

$$\psi(t) = \exp\!\left(-\frac{\beta^{2}t^{2}}{2}\right)\cos(\pi t) \tag{3}$$

where $\beta$ is the shape factor of the mother wavelet. Increasing $\beta$ increases the time-domain resolution. The Morlet CWT represents the signal characteristics well and gives better time-frequency resolution. For different scaling parameters, the CWT generates different coefficients for the signal segments. Using these coefficients, the signal can be expressed as a 2-D image. The gray image matrix $P_{trans}$, obtained from the matrix $P$ of wavelet coefficients, can be represented as

$$P_{trans} = \frac{P - Min_K}{Max_K - Min_K} \times 255 \tag{4}$$

where $Max_K$ and $Min_K$ are the maximum and minimum elements of the matrix $P$. $P_{trans}$ gives gray values from 0 to 255.
This gives CWT gray image for original signal as shown in Figure 3.
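The transformation described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the real-valued Morlet form exp(-β²t²/2)·cos(πt), the truncation of the wavelet support, the scale range, the synthetic test signal and the function names `morlet`, `cwt_morlet` and `to_gray` are all assumptions made for the example.

```python
import numpy as np

def morlet(t, beta=1.0):
    """Real Morlet wavelet with shape factor beta (assumed form)."""
    return np.exp(-(beta ** 2) * t ** 2 / 2.0) * np.cos(np.pi * t)

def cwt_morlet(signal, scales, beta=1.0):
    """Return CWT coefficients: one row per scale p, one column per shift q."""
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    coeffs = np.empty((len(scales), n))
    for row, p in enumerate(scales):
        # Truncate the wavelet where its Gaussian envelope is negligible.
        half = int(np.ceil(4.0 * p / beta))
        t = np.arange(-half, half + 1) / p
        kernel = morlet(t, beta) / np.sqrt(p)
        # The wavelet is real and even, so convolution equals correlation.
        full = np.convolve(signal, kernel, mode="full")
        coeffs[row] = full[half:half + n]
    return coeffs

def to_gray(P):
    """Map a coefficient matrix linearly onto gray values 0..255."""
    lo, hi = P.min(), P.max()
    if hi == lo:
        return np.zeros(P.shape, dtype=np.uint8)
    return np.round((P - lo) / (hi - lo) * 255.0).astype(np.uint8)

# Synthetic non-stationary test signal (hypothetical, not measured data).
time = np.linspace(0.0, 1.0, 512)
sig = np.sin(2 * np.pi * 5 * time) + (time > 0.5) * np.sin(2 * np.pi * 40 * time)
scales = np.arange(1, 33)
image = to_gray(cwt_morlet(sig, scales))
print(image.shape, image.dtype)
```

Each row of `image` corresponds to one scale and each column to one time shift, so a change in the signal content shows up as a change in the gray pattern along the horizontal axis.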

B. PROPOSED ResT-CNN ARCHITECTURE
At present, convolutional neural networks (CNNs) are extensively used for fault identification and image recognition problems. A CNN comprises a combination of multiple filtering and classification layers that extract significant features from the data. The filtering stage consists of a number of convolutional and pooling layers. A convolutional layer contains multiple weights (kernels), each sensitive to certain features, that produce new feature maps when the input feature is convolved with them. The function of the pooling layer is to reduce the size of each feature map without changing the number of feature maps. The classification stage contains a fully connected layer, a multi-stage perceptron, in which high-level features are extracted and given as input to the final layer, where the output of the CNN model is generated. The convolution process can be expressed mathematically as shown in Eq. (5).

$$y^{l(i,j)} = K_i^{l} * m^{l(n_j)} = \sum_{j'=0}^{W-1} K_i^{l}(j')\, m^{l}(j+j') \tag{5}$$

where $K_i^{l}$ denotes the weights of the $i$-th kernel in layer $l$, $m^{l(n_j)}$ denotes the $j$-th region in the $l$-th convolutional layer, $W$ is the width of the kernel and $K_i^{l}(j')$ is the $j'$-th weight of the kernel. The output of general model [46] can be given as in Eq. (6): . . .
where $b$ is the bias of each layer. More details from the literature are included in [46]. This CNN model has been used in various combinations to solve engineering problems. For deeper networks the computation time increases and over-fitting problems arise, which decrease network accuracy. These problems were addressed by He et al. [26] using residual blocks in deep networks, as shown in Figure 4. This paper proposes a novel Residual Twin CNN (ResT-CNN) architecture inspired by the residual learning scheme. The proposed network combines the benefits of two networks trained in parallel: a 1-D network with raw signal data and a 2-D network with transformed images. Both networks include residual connections that improve system performance and extract strong fault-related features. Additionally, these connections construct an input-output identity mapping, which improves network speed by enabling information flow across stacked layers. The remaining structure comprises a first convolutional kernel of size 1 × 12 followed by four successive kernels of size 1 × 3. The input signal must be mapped into the interval [0, 1], which was done using max-min normalization according to the following relation.

$$\hat{x}_l = \frac{x_l - \min(x)}{\max(x) - \min(x)}$$

where $x_l$ is the $l$-th sample of the dataset ($l = 0, 1, \ldots, N-1$), and $\max(x)$ and $\min(x)$ are the maximum and minimum sample values.
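As a small illustration of the max-min mapping into [0, 1] described above, the following sketch normalizes a short record. The function name `min_max_normalize` and the sample values are hypothetical, and the guard for a constant signal is an added assumption.

```python
import numpy as np

def min_max_normalize(x):
    """Scale a 1-D signal linearly so that its values lie in [0, 1]."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        # A constant signal carries no range information; map it to zeros.
        return np.zeros_like(x)
    return (x - lo) / (hi - lo)

# Hypothetical position-error samples, used only to exercise the function.
raw = np.array([-0.12, 0.03, 0.20, 0.08])
scaled = min_max_normalize(raw)
print(scaled)
```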
The forward propagation in the 1-D network from any convolutional layer $k-1$ to the input of layer $k$ is given by

$$x_l^{k} = b_l^{k} + \sum_{j} \mathrm{conv1D}\!\left(n_{jl}^{k-1},\, m_j^{k-1}\right)$$

where $b_l^{k}$ is the bias of the $l$-th neuron at layer $k$, $m_j^{k-1}$ is the output of the $j$-th neuron at layer $k-1$, and $n_{jl}^{k-1}$ is the kernel from the $j$-th neuron at layer $k-1$ to the $l$-th neuron at layer $k$, which together compute the $k$-th layer input $x_l^{k}$. The output feature for weight layer $K_j$ is obtained by sliding a convolutional kernel of size 1 × l over the input feature signal $K_i$; this defines the output feature signal $y$ at the $j$-th node and, in turn, the output of the residual learning block.
An improved 2-D CNN structure is proposed that includes two residual connections, each comprising a single 3 × 3 convolutional layer between a pair of 1 × 1 convolutional layers. To minimize the effects of high-frequency noise, a wide 9 × 9 first convolution layer is applied. To make the network deeper and improve the representation of the input data, 3 × 3 kernels are selected for the remaining convolution layers. At the end of the residual connections, a max pooling layer is used to avoid over-smoothing of the image data, which minimizes errors in the extraction of the desired features. Moreover, two Dropout and FC layers are used for better structure adaptability. This composition makes the CNN structure deeper than previously implemented CNN models. The output $O_{res}$ of the residual blocks is expressed using $f_r$, the ReLU non-linear activation function. $O_{res1}$ and $O_{res2}$ are the feature outputs of the first and second residual structures, obtained by sliding the kernel with a single stride. The identity mapping $m(x)$ was added to the first residual block and the output was passed through the ReLU activation function. The second filter weight $W_2$ was determined from the output of the first weight layer, $O_{res1}$, and gives the final output of the residual block structure. Figure 6 shows the basic architecture of the proposed residual structure. For a data series $x \in \{x_1, x_2, \ldots, x_m\}$, with $m$ being the signal length, $x$ is the output of the signal feature data, from which the network learns the residual function Res(x) more easily and without deterioration, which provides better CNN model learning and improves system performance. The architecture of the developed 2-D CNN is shown in Figure 7.
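The idea of learning a residual function Res(x) on top of an identity shortcut can be illustrated with a generic 1-D residual block in the sense of [26]. The sketch below is not the ResT-CNN layer configuration: the two-layer layout, the zero padding, the kernel values and the names `conv1d_same`, `relu` and `residual_block` are assumptions chosen to keep the example small.

```python
import numpy as np

def conv1d_same(x, kernel):
    """1-D cross-correlation with zero padding; output keeps the input length."""
    pad = len(kernel) // 2
    padded = np.pad(x, (pad, pad))
    return np.array([np.dot(padded[i:i + len(kernel)], kernel)
                     for i in range(len(x))])

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Two weight layers plus an identity shortcut: relu(Res(x) + x)."""
    hidden = relu(conv1d_same(x, w1))   # first weight layer and activation
    res = conv1d_same(hidden, w2)       # second weight layer gives Res(x)
    return relu(res + x)                # add the shortcut, then activate

x = np.array([1.0, -2.0, 3.0, 0.5])    # hypothetical input feature signal
zero = np.zeros(3)                      # kernels that learn nothing
ident = np.array([0.0, 1.0, 0.0])       # kernels that pass the input through

out_zero = residual_block(x, zero, zero)
out_identity = residual_block(x, ident, ident)
print(out_zero, out_identity)
```

With all-zero kernels the block reduces to `relu(x)`, which shows why the shortcut lets information flow across stacked layers even before the weight layers have learned anything useful.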
The 1-D and 2-D CNNs work independently for feature extraction. Their features are concatenated before classification for fault identification. For the 1-D CNN structure, the raw signal data was used to train the model. For the 2-D CNN, the signal data was transformed into a gray image using CWT conversion. If the 1D-CNN features are denoted $C_{1f}$ and the 2D-CNN features $C_{2f}$, the features of the concatenation layer ($C_f$) can be expressed as

$$C_f = \left[\,C_{1f},\; C_{2f}\,\right]$$

These final features were given to the proposed deep PCA for dimensionality reduction. The fused residual twin CNN (ResT-CNN) architecture is shown in Figure 8.
To reduce the high dimensionality of the features extracted by ResT-CNN, a deep knowledge-based Remnant PCA (Kb-Rem-PCA) is proposed that takes advantage of deep residual connections in PCA, followed by a knowledge-based feature selection technique based on improvements to existing Rough set theory [36]. This scheme improves classification accuracy and reduces computation cost.
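The fusion and reduction steps can be sketched as follows. Note the substitution: ordinary single-layer PCA (computed via SVD) stands in for the proposed Kb-Rem-PCA, whose residual layers and rough-set selection are not reproduced here. The feature sizes, the random feature values and the name `pca_reduce` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
c1f = rng.normal(size=(20, 8))     # hypothetical 1-D CNN features: 20 samples x 8
c2f = rng.normal(size=(20, 12))    # hypothetical 2-D CNN features: 20 samples x 12

# Concatenation layer: place both feature vectors side by side per sample.
cf = np.concatenate([c1f, c2f], axis=1)

def pca_reduce(features, n_components):
    """Plain PCA: project centered features onto the leading principal axes."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

reduced = pca_reduce(cf, n_components=5)
print(cf.shape, reduced.shape)
```

The reduced matrix keeps one row per sample, so it can be passed directly to a classifier in place of the higher-dimensional fused features.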
Existing state-of-the-art PCA techniques perform unsupervised dimensionality reduction of extracted features and are employed in different applications in different combinations [37], [38]. However, these techniques have limitations that can make them inadequate, especially for extracting rich features.
The proposed PCA architecture overcomes these limitations by using multiple deep feature extraction layers that effectively transform high-level features to low-level features with improved algorithm speed, as shown in Figure 9. Deep PCA, however, loses information in each processing layer, which gives undesirable results. To counter these effects, residual connections are added between successive layers of the deep PCA structure.
Mathematically, for N training set features $I \in \{i_1, i_2, \ldots, i_x\}$ in $R^N$, the dimension reduction of the extracted features occurs in successive learning layers of PCA, $J \in \{j_1, j_2, \ldots, j_y\}$, with projection matrix $P$ and non-linear feature matrix $N_t$. The layer output is denoted $F$, with layers indexed in [0, L-1]. The non-linear feature matrix $N_t$ is introduced to improve PCA performance by adding a non-linear function to the traditional architecture; it is obtained with a non-linear mapping function $\phi(\cdot)$. The output $F_0$ of the first layer is given by Eq. 16. For layers [1, L-1], the output additionally involves $C_{rem}$, the output of the remnant connections, in which $G(\cdot)$ is the activation function applied while stacking successive PCA layers.
In order to collect more compact set of features and to improve PCA performance, we have integrated a knowledge-based feature selection technique with the output of PCA layers. This technique is based on some improvements in existing Rough set theory which was originally proposed by Z. Pawlak and has successfully been utilized to eliminate imprecise, redundant and uncertain features [36], [39], [40].
A rough set can be defined as a knowledge base $K_b[O, S]$, where O and S represent the overall feature set and the selected feature set respectively. The probabilistic rough set model considers a variable precision probability, which is more robust against noisy data. A probabilistic precision threshold δ is added, which depends on the noise magnitude. Based on this threshold, three distinct feature regions (FR) are defined as follows.
FR1: δ-True Region:
Using the feature regions, the correlation degree between the target and the equivalent class can be found by,

D. MULTI-CLASS SVM (MC-SVM) FAULT CLASSIFIER
For classification of faults, a strong multi-class support vector machine (Mc-SVM) classifier based on the Gaussian radial kernel function was applied, which gives efficient fault classification [41]. The Gaussian radial kernel function has already proved its ability and performance in handling non-linear fault classification. Mathematically, this function can be given by Eq. 19 [41].

$$R_G(u_m, u_n) = \exp\!\left(-\frac{\lVert u_m - u_n \rVert^{2}}{2\sigma^{2}}\right) \tag{19}$$

where $R_G(u_m, u_n)$ is the radial kernel function, $u_m$ and $u_n$ are the input features and $\sigma$ is a hyperparameter selected on the basis of the effective kernel width.
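A Gaussian radial kernel of this kind can be evaluated for whole feature matrices at once. The sketch below assumes the common form exp(-‖u_m - u_n‖² / (2σ²)); the function name `gaussian_rbf_kernel`, the value of σ and the two feature vectors are hypothetical.

```python
import numpy as np

def gaussian_rbf_kernel(U, V, sigma=1.0):
    """Kernel matrix with entries exp(-||u_m - v_n||^2 / (2 * sigma^2))."""
    U = np.atleast_2d(np.asarray(U, dtype=float))
    V = np.atleast_2d(np.asarray(V, dtype=float))
    # Squared Euclidean distances between every row of U and every row of V.
    sq_dist = (np.sum(U ** 2, axis=1)[:, None]
               + np.sum(V ** 2, axis=1)[None, :]
               - 2.0 * U @ V.T)
    sq_dist = np.maximum(sq_dist, 0.0)   # guard against tiny negative round-off
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

# Two hypothetical feature vectors at Euclidean distance 5 from each other.
X = np.array([[0.0, 0.0], [3.0, 4.0]])
K = gaussian_rbf_kernel(X, X, sigma=5.0)
print(K)
```

A larger σ widens the kernel, so distant feature vectors are treated as more similar; a smaller σ makes the decision boundary more local.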

E. PROPOSED OVERALL HYBRID STRUCTURE
A basic outline of our proposed hybrid framework is shown in Figure 10. The flowchart shows the sequential working of model for faults classification in BS-LMS as summarized below.

IV. EXPERIMENTATION DETAILS
To evaluate the performance of the proposed algorithm, an instrumented setup was developed. The testing setup is based on a BS linear motion mechanism in which the BS shaft provides the channel for the ball bearings, which are retained inside a matching ball nut. A number of distinct faults were induced in the BS drive and the load bearing under varying external load conditions, both to study the effect of the faults on the measured system parameter and to acquire signal data for detecting these faults.

A. TESTING SETUP DETAILS
The testing setup was developed using available mechanical components, including a precision rolled miniature BS drive (consisting of a BS threaded shaft fitted with a ball nut). The ball nut is provided with a threaded attachment collar for assembly integration. Other components required for the motion system include a motor in combination with the necessary gear train and bearings, miniature linear guides and a position feedback device. The experimental setup was also equipped with different sensors for data acquisition. The resolution, bandwidth and repeatability of the sensors should be well analyzed for reliable data acquisition. A linear transducer with 0.05 % linearity error, a Hall-effect current sensor with 32.7 mV rated sensitivity at 12 V DC and a tension-compression load cell with 0.02 % rated output error were also integrated in the setup for data monitoring. The developed experimental setup provides 100 mm of linear movement with a load capacity of 1000 N (in both directions). Figure 11 illustrates some details of the experimental setup.

B. BS LINEAR DRIVE FAULT CASES
The testing setup was initially operated with no induced fault to obtain significant signal information and to observe LMS behavior under different load and motion profiles. This data was labeled as reference data representing acceptable behavior of the mechanical system. In the next phase, different faults were induced in the LMS and the system was run to collect position measurement data. This data was used to evaluate the performance of the developed monitoring scheme. The faults were selected for their repetitive nature and high criticality, as established from previous history. The faults were induced in the ball screw (BS) and the load bearing (LB) for testing and include the following.
→ BS Friction (insufficient lubrication) fault, developed by removing the lubricant and pressing the ball nut seals firmly (as shown in Figure 12).
→ BS Backlash fault, induced by substituting the original balls with undersize recirculating balls (as shown in Figure 13).
→ LB Roller Damage, developed by creating a surface flaw on the load bearing roller (as shown in Figure 14).
Since natural failure takes considerable time, the above-mentioned faults were induced to simulate system behavior and to assess signal features and fault classification. In the case of the BS inadequate lubrication fault, the lubricant carried by the ball nut was initially removed. Signal data was observed and recorded as a partial friction fault. Since the BS drive has a low characteristic friction coefficient, major signal changes were not observed. Friction severity was increased in the next phase by tightening the sealing locks provided at the end of the ball drive nut, as depicted in Figure 12. BS backlash was studied by substituting the original balls with undersize balls in the ball nut channel. The original recirculating balls were 3 mm in diameter and were replaced with 2.8 mm and then 2.4 mm balls successively. LB friction was analyzed by removing the synthetic-based protective grease from the bearing. The LB roller damage fault was simulated in multiple phases, starting with a small flaw, nearly 1 mm wide, at the contact surface of the roller (low level), as shown in Figure 14.
In the next stage, the width of surface defect was increased to 2mm (medium level), simulating fault propagation. In the last stage (high level), the flaw size was increased to 3mm wide with additional surface marks on multiple rollers as shown in Figure 14.
The above faults were tested for the BS linear drive against different external loadings to acquire signal measurements and observe the performance of the developed scheme. One motion cycle of the BS linear drive under consideration is completed in 6 s, including 1 s dwell time at direction reversals, and gives 80 mm linear movement. The loads applied include the rated load (600 N), the positive peak load (750 N) and the negative peak load (-750 N). Initially, a number of motion cycles were run to bring the system to steady state (to normalize thermal and mechanical effects). Position measurements were then observed for both normal and faulty conditions.

C. DATASET DESCRIPTION
1) BS LINEAR DRIVE DATA SET
The position measurements for the mentioned fault cases were obtained using the experimentation described above. Signal data augmentation was performed, since it is required for improved feature extraction accuracy and higher performance [35]. Augmentation increases the number of signal samples and therefore improves the generalization of the CNN model [35]. A large amount of training data can be generated from the observed signal by segmenting it into samples that overlap (Figure 3). The original signal data with 40000 points can produce 100 samples, which can be extended to 800 training samples with a stride length of 360.
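Overlapping segmentation of this kind can be sketched as below. The window length of 400 points (which reproduces the 100 non-overlapping samples mentioned above) and the stride of 50 points are assumptions chosen for illustration, not settings taken from the experiments, and `segment_with_overlap` is a hypothetical helper name.

```python
import numpy as np

def segment_with_overlap(signal, window, stride):
    """Cut a 1-D record into windows of equal length, shifted by `stride` points."""
    signal = np.asarray(signal)
    count = (signal.size - window) // stride + 1
    return np.stack([signal[i * stride:i * stride + window]
                     for i in range(count)])

# Placeholder for a 40000-point record; real position data would go here.
record = np.arange(40000, dtype=float)

no_overlap = segment_with_overlap(record, window=400, stride=400)
overlapped = segment_with_overlap(record, window=400, stride=50)
print(no_overlap.shape, overlapped.shape)
```

The number of samples follows `(N - window) // stride + 1`, so shrinking the stride is what multiplies the amount of training data obtained from a fixed record.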
Dataset for four fault categories i.e., BS friction, BS backlash, LB friction and LB roller damage was generated with different severity levels under rated load. The experimental dataset details are given in Table 1. The faults are designated by BF (ball screw friction), BB (ball screw backlash), LF (load bearing friction), LR (load bearing roller damage) and ND (no defect). The intensity levels of these faults are given as H (high), M (medium) and L (low). For 1-D signal to 2-D conversion, as performed in section III-A, 224 × 224 gray scale image was generated using 50176 signal sampling points. Gray images for different faults using CWT conversion are shown in Figure 15. It can be seen that each fault shows distinct gray image pattern which can be easily distinguished.

2) IMS-UC PUBLISHED BEARING DATASET
The published bearing dataset of IMS-UC (Intelligent Maintenance Systems - University of Cincinnati), USA, was also considered for validation of the proposed scheme [55], since it is publicly available and used by many researchers. The IMS-UC dataset provides RTF (run to failure) maintenance testing results, with vibration signal data recorded at 20 kHz with 20,480 input data points. Four accelerometers were attached to acquire vibration signals for bearing faults that include outer race (O R ), inner race (I R ) and rolling element (R E ) faults. The data packet includes three sets. Random samples covering all fault types were selected from each set.

D. IMPLEMENTATION DETAILS
The ResT-CNN model was implemented in Python with TensorFlow on a desktop system with an Intel Core i7 CPU, a GTX 1070 GPU and 12 GB memory. The 1-D and 2-D CNNs were trained for 80 epochs and 150 epochs respectively. The Adam optimization algorithm was adopted with a decaying learning rate that starts at 0.001 and is reduced to one-half after 80 iterations. The dataset was split into training, validation and testing data. The augmented dataset was used for network training, since it provides a large number of training samples, which helps avoid over-fitting and improves model performance. The initial weights of the network model were generated randomly. Random samples of different sizes were selected from the defined testing dataset to monitor the effect of sample size on network performance.
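One way to read the decaying learning rate is as a step schedule. The sketch below assumes that the rate is halved every 80 iterations (the text could also mean a single halving), and the function name `step_decay` and its parameters are hypothetical.

```python
def step_decay(iteration, base_lr=1e-3, drop=0.5, every=80):
    """Learning rate after `iteration` steps, assuming a halving every `every` steps."""
    return base_lr * drop ** (iteration // every)

# Learning rate at a few hypothetical iteration counts.
schedule = [step_decay(i) for i in (0, 79, 80, 160)]
print(schedule)
```

A function of this shape can be evaluated once per iteration and the result handed to the optimizer before each update.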
The feature vectors obtained from 1-D and 2-D CNN's (C 1f and C 2f ) give the characteristics of different faults. The information of both feature vectors can be combined, as shown in section III-B, by concatenating the two types to form a new feature vector C f that carries strong and abundant information as compared with individual feature vectors. However, C f contains high dimension features which not only require extra processing power, but a strong SVM classifier as well, for accurate classification. For this purpose, Kb-Rem-PCA was proposed and utilized to reduce redundant data from extracted features. Experiments have shown that the proposed combination gives low dimension feature mapping which gives better classification along with reduced memory and computation cost.

E. PERFORMANCE PARAMETERS
Two performance parameters commonly used to evaluate system performance are classification accuracy and precision. Accuracy gives a comprehensive measure of the classification performance of the overall system. Mathematically, it can be expressed as

$$\text{Accuracy} = \frac{T_P + T_N}{T_P + T_N + F_P + F_N}$$

where $F_P$ and $T_P$ are the false positive and true positive samples, and $F_N$ and $T_N$ are the false negative and true negative samples. False positives are samples that were incorrectly classified as positive, and false negatives are samples that were incorrectly classified as negative. True positives and true negatives are the correspondingly correct classifications. Precision measures the fraction of samples classified as positive that are actually positive. Mathematically, precision is given as

$$\text{Precision} = \frac{T_P}{T_P + F_P}$$
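The two measures can be computed directly from the four counts. The counts used below are hypothetical and serve only to exercise the functions; the zero-denominator guards are an added assumption.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that were classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0

def precision(tp, fp):
    """Fraction of samples predicted positive that are truly positive."""
    predicted_positive = tp + fp
    return tp / predicted_positive if predicted_positive else 0.0

# Hypothetical counts for one fault class treated as the positive class.
acc = accuracy(tp=45, tn=50, fp=2, fn=3)
prec = precision(tp=45, fp=2)
print(round(acc, 3), round(prec, 3))
```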

V. RESULTS
The suggested hybrid scheme was successfully trained to learn features from BS linear drive dataset as well as IMS-UC dataset for each fault case. The model was then tested and validated. Figure 16 (a) shows the curves for training and validation results for BS dataset. It can be seen that the proposed hybrid combination gives close to 100% accuracy. Figure 16 (b) gives precision results for BS linear drive dataset. The average precision result for each fault is shown. The precision ratio for BS friction, LB friction and LB roller damage fault is higher than BS backlash fault. The overall results indicate that the proposed scheme successfully deals with combination of critical fault in BS linear drives.

A. PERFORMANCE COMPARISON WITH MAINSTREAM TECHNIQUES
The developed scheme was also compared with a few selected state-of-the-art techniques to evaluate classification performance. These include LeNet-5 [47]; adaptive deep CNN [56], which converts the 1-D signal to 2-D images and applies an adaptive deep CNN structure; RBF-SVM [21], trained on feature parameters of the time-domain signal data; and our proposed hybrid scheme with Mc-SVM replaced by Softmax [48]. Performance results are given in Table 2. It can be observed that RBF-SVM and LeNet-5 give lower accuracy than the adaptive CNN. The proposed scheme with Softmax in place of Mc-SVM (Hybrid scheme with Softmax) gives better results still. The developed hybrid architecture gives much improved performance, achieving a higher accuracy of 99.3%. This demonstrates the superior performance of the developed hybrid methodology for high-level fault classification.

B. CONFUSION MATRIX FOR BS LINEAR DRIVE FAULTS
The confusion matrix for the BS linear drive faults, computed using the trained network, is given in Figure 17. The matrix shows the fault accuracies for the different predicted classes. It can be observed that the network model classifies the no defect (ND) class perfectly, with 100% identification accuracy. BS friction (BF), LB friction (LF) and LB roller damage (LR) show higher classification accuracy than BS backlash (BB). In general, BB faults are slightly more likely to be confused with other fault modes. This implies that BB faults are more complex to analyze and therefore need additional feature information to avoid misclassification or false identification.
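The way per-class accuracy and the most frequent confusion are read from such a matrix can be sketched as follows. The sketch assumes that rows are true classes and columns are predicted classes; the counts are made up for illustration and are not the values of Figure 17, and the function names are hypothetical.

```python
def per_class_accuracy(matrix):
    """Fraction of each true class (row) that was predicted correctly."""
    result = []
    for i, row in enumerate(matrix):
        total = sum(row)
        result.append(row[i] / total if total else 0.0)
    return result

def most_confused_pair(matrix, labels):
    """Return (true_label, predicted_label, count) for the largest
    off-diagonal cell, i.e. the most frequent misclassification."""
    best = (None, None, 0)
    for i, row in enumerate(matrix):
        for j, count in enumerate(row):
            if i != j and count > best[2]:
                best = (labels[i], labels[j], count)
    return best

labels = ["ND", "BF", "BB", "LF", "LR"]
# Made-up counts, 10 samples per class; NOT the values of Figure 17.
toy_matrix = [
    [10, 0, 0, 0, 0],   # ND
    [0, 9, 1, 0, 0],    # BF
    [0, 2, 8, 0, 0],    # BB
    [0, 0, 0, 10, 0],   # LF
    [0, 0, 1, 0, 9],    # LR
]
class_acc = dict(zip(labels, per_class_accuracy(toy_matrix)))
worst_pair = most_confused_pair(toy_matrix, labels)
```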

C. MODEL VALIDATION WITH IMS-UC BEARING DATASET
In addition to the BS dataset, the developed network was also trained and validated on the IMS-UC (Intelligent Maintenance Systems – University of Cincinnati) bearing dataset, which includes inner race (IR), outer race (OR) and rolling element (RE) fault data. The dataset was collected at a 20 kHz sampling rate and 2000 RPM, which gives 600 points per rotation. Each continuous recording contains more than 20K data points. In total, 1360 data samples of 300 input points each were used for model training and testing. The rolling element bearing considered for testing has 16 rollers with a roller diameter of 8.4 mm, a pitch diameter of 71.5 mm, a contact angle of 0.265 radians and a rotational speed of 33.75 Hz. Four accelerometers were used to acquire the vibration data. The same dataset was used by Jagath et al. [57] and Eren et al. [58] to validate their fault detection techniques. Results are compared in Figure 18. The work of Jagath et al. [57] provides early, real-time fault identification for bearings using an SVM trained on time-domain and frequency-domain features. Eren et al. [58] apply their proposed adaptive 1-D CNN technique with 1224 training samples and 5440 testing samples (from 4 classes, including the no-fault class). The proposed hybrid scheme was also trained and tested with similar data samples from each class.
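The relation between sampling rate, shaft speed and points per rotation, and the cutting of a long recording into fixed-length input samples, can be sketched as below. The record length of 20,480 points and the use of non-overlapping windows are assumptions made for illustration; the text above states only that each recording exceeds 20K points and that samples have 300 input points. Function names are hypothetical.

```python
def points_per_rotation(sampling_rate_hz, shaft_speed_rpm):
    """Number of samples acquired during one shaft revolution."""
    return sampling_rate_hz * 60.0 / shaft_speed_rpm

def segment(signal, window):
    """Cut a recording into consecutive non-overlapping windows,
    dropping any incomplete window at the end."""
    count = len(signal) // window
    return [signal[i * window:(i + 1) * window] for i in range(count)]

ppr = points_per_rotation(20_000, 2000)  # 20 kHz at 2000 RPM

# Placeholder recording; the length of 20,480 points is an assumption.
record = [0.0] * 20_480
samples = segment(record, 300)  # each sample spans half a rotation
```

With these assumed settings, a window of 300 points covers half of one shaft rotation.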
The results for each fault, including the inner race, outer race and rolling element faults, are compared in Figure 18. The classification accuracy results indicate more reliable and improved performance with the developed model than with the other two techniques.

VI. DISCUSSIONS
The combined hybrid model achieved higher classification accuracy with improved feature extraction for the desired diagnostic requirement. Characteristic aspects of the developed framework are discussed below.

A. SUPERIORITY OF RESIDUAL CONNECTIONS IN ResT CNN (1-D & 2-D CNN's)
The use of residual connections significantly improved model accuracy and network training speed through identity mapping of the input data. The stacked residual connections effectively reduced the training difficulties of the deep network and improved system performance. For ResT CNN, model performance with and without residual connections is compared in Figure 19. Training accuracy increased by an average of 5% when residual connections were used in the 1-D and 2-D CNNs, which indicates better learning by the proposed architecture. Validation accuracy also improved, by 7%. This shows that the overall performance of the developed network is greatly improved by the residual connections.

B. EFFECTIVENESS OF Kb-Rem-PCA ARCHITECTURE
The effectiveness of the proposed knowledge-based PCA architecture was evaluated by computing average accuracies for the proposed Kb-Rem-PCA with non-linear mapping, residual PCA and deep PCA structures, using typical parameters and the G (·) activation function. The BS linear drive dataset was used for the accuracy comparison. 250 random samples were selected (50 samples per class) from 5 classes that include high [...]. It was observed that Kb-Rem-PCA offers better feature selection and quite stable behavior even when the input parameter settings vary. Although deep PCA shows good feature discrimination, information is lost in successive layers, which reduces accuracy, especially when handling non-linear data. Adding residual connections to deep PCA gives more effective feature extraction than deep PCA alone. Adding a knowledge-base rough set in combination with residual blocks significantly improves classification accuracy by eliminating uncertain information from the features. Kb-Rem-PCA therefore gives more promising classification results by removing data redundancy and uncertainty. Figure 20 gives the outcome of this comparative analysis for 250 random sample points.
The average accuracy of Kb-Rem-PCA is nearly 6% higher than that of the residual PCA structure. It can therefore be concluded that non-linear mapping with the knowledge-base structure minimizes redundancy and inaccuracies in the PCA structure.

C. SYSTEM PERFORMANCE AGAINST LOAD PROFILE VARIATIONS
In addition to performance evaluation at the rated load (600 N), the model was also studied under load variation in magnitude and direction. These variations include a positive peak load (750 N) and a negative peak load (-750 N) applied at the attachment section of the BS LMS under consideration. Multiple experiments were performed with these 3 load cycles in repetition to evaluate how well the system retains stability across different load domains. In total, 12 experiments were performed for the 3 load cases (4 tests per load cycle), with each load cycle repeated 4 times in sequence. These load cycle reversals validate the model's stability and its ability to adapt to domain variations in the load profile. This is very helpful for practical engineering problems, since the system classifies correctly at the same computation time under variable load domains. The average accuracy results in Figure 21 show that, in all experiments, the model retains a high level of accuracy with an average variance of approximately ±0.0025. The average computation time for classifying each sample was 0.96 ms. This indicates good network stability under varying load cycles with acceptable testing time, which makes the proposed system suitable for real-time monitoring and fault detection of engineering systems.

VII. CONCLUSION
This paper presents a reliable and effective novel hybrid combination of techniques, based on improvements in deep residual connections, to identify and classify BS LMS faults. The major contributions include the application of position measurement data for accurate detection of different BS linear drive faults, including BS friction, BS backlash, LB friction and LB roller damage. A ResT CNN structure was developed that comprises parallel learning of improved 1-D and 2-D deep residual CNNs, followed by the knowledge-base Kb-Rem-PCA architecture for reduction of redundant data and a multi-class SVM classifier. This composition greatly improves system accuracy for feature extraction and non-linear fault identification. The proposed methodology was also validated on the published IMS-UC dataset. The results show that the developed scheme not only extracts characteristic features effectively but also yields more systematic classification of faults with different severities and load variations. The accuracy results also show the superiority of the proposed algorithm compared with state-of-the-art techniques. This research can be extended to include other failure modes of BS LMS elements and to explore combined failure scenarios for more accurate system diagnosis. A comparison can also be made between fault signals from real failure cases and induced faults. This would help to predict the performance of BS drives with the proposed scheme under the complexities of an actual framework.
NAVEED RIAZ was born in Pakistan. He received the bachelor's and master's degrees in engineering from the National University of Sciences And Technology (NUST), Islamabad. He has more than 11 years of vast experience in the research industry and academics. His research area includes robotics, deep learning, medical image analysis, computer vision, artificial intelligence, robot controls, and aerial robotics. He has a lot of conference and journal research publications in this research area.
SYED IRTIZA ALI SHAH was born in Pakistan. He graduated from the Georgia Institute of Technology, Atlanta, GA, USA, and received the B.E. degree from NED University and the M.S. degree in aerospace engineering from NUST in fluid dynamics, and the M.S. degree in flight mechanics and controls, the M.S. degree in machine vision and robot controls, and the Ph.D. degree in aerial robotics from GeorgiaTech, Atlanta, GA, USA.
FAISAL REHMAN was born in Islamabad, Pakistan. He received the B.S., M.S., and Ph.D. degrees in engineering from NUST Islamabad. He worked in both industry and academics. He also has research experience in various research organizations as a junior and senior researcher. His research area is deep learning, medical image analysis, and artificial intelligence. He has a lot of conference and journal research publications in his research field.
MUHAMMAD JAWAD KHAN received the B.E. and M.S. degrees in mechatronics engineering from Air University, Pakistan, and the Ph.D. degree in mechanical engineering from Pusan National University, South Korea, in 2018. He worked on brain-robot interfaces and control of haptic device during his Ph.D., where he published over 40 papers. His research interests include hybrid brain-computer interfaces, brain signal processing and control, AI, machine learning, computer vision, and rehabilitation.