Introduction
The rolling bearing is one of the key components of mechanical equipment, and rolling bearing fault diagnosis is essential to ensure the long-term efficient and stable operation of mechanical equipment [1]. Traditional rolling bearing fault diagnosis methods based on signal processing have been widely used, such as enhanced singular spectrum decomposition [2], frequency phase space empirical wavelet transform [3], adaptive generalized demodulation [4], high-order synchrosqueezing transform [5], recycling variational mode decomposition [6], and resonance-based sparse signal decomposition [7]. These methods can effectively diagnose rolling bearing faults when the time-frequency features of the vibration signals are obvious. However, they are limited when dealing with complex vibration signals that are noisy and whose features are not obvious.
In recent years, with the rapid development of machine learning and deep learning, a growing number of data-driven rolling bearing fault diagnosis methods have been proposed, such as the naive Bayes algorithm [8], least square support vector machine [9], iterative random forest [10], BP neural network [11], one-dimensional convolutional neural network [12], two-dimensional convolutional neural network [13], LSTM recurrent neural network [14], deep belief network [15], generative adversarial network [16], deep residual network [17], and transfer learning [18]. These studies mainly focus on improving the accuracy, generalization, anti-noise ability, and adaptability of the rolling bearing fault diagnosis model. They provide effective ways to mine the underlying fault features from complex rolling bearing vibration signals and thereby establish an effective mapping between the vibration signals and the outputs of the fault diagnosis model. However, fault diagnosis models based on machine learning and deep learning are generally complex and take a long time to train. Especially in the big data environment, training on massive sample sets incurs a huge computational cost. The above studies seldom consider how to improve the training efficiency and diagnosis efficiency of the rolling bearing fault diagnosis model in the big data environment.
With the rapid development of big data technology, many scholars have carried out extensive research on fault diagnosis in industrial big data scenarios [19]–[25]. For example, some studies combine big data technology and data-driven fault diagnosis methods to diagnose faults of mobile robots [21], sulfur hexafluoride electrical equipment [22], power grid equipment [23], wind turbine gearboxes [24], and reciprocating air compressors [25]. Most of these studies use MapReduce [26] or Spark [27] to parallelize the fault diagnosis models and thus improve the performance of industrial equipment fault diagnosis in the big data environment. Compared with MapReduce, Spark introduces the resilient distributed dataset (RDD) and implements an efficient directed acyclic graph execution engine, so it processes data faster and is more suitable for efficient fault diagnosis in the big data environment.
Because the many-core GPU offers high performance, low power consumption, and low cost, some recent work has explored how to combine Spark and GPUs to accelerate domain-specific applications, such as urban traffic vehicle recognition [28], magnetic resonance imaging [29], and remote sensing image processing [30]. The experimental results from [28]–[30] show that combining Spark and GPUs can significantly improve the performance of these applications, but their implementations are complicated and difficult to port to other fields. The newly released Spark 3.0 supports accelerator-aware scheduling, allowing users to discover and request GPU computing resources at the Executor, Driver, and Task levels, which simplifies the development of applications based on Spark and GPUs.
The authors’ previous work [31] proposed a rolling bearing fault diagnosis method based on QPSO-BPNN and Dempster-Shafer evidence theory, which can effectively and accurately diagnose different types of rolling bearing faults under different working conditions. With the expansion of industrial production scale and the increase in the complexity of mechanical equipment, the vibration data of rolling bearing collected by multiple sensors in real time are growing rapidly in the actual production environment. It is difficult to efficiently perform model training and fault diagnosis for large-scale rolling bearing vibration data using the serial QPSO-BPNN proposed in the previous work. For the massive rolling bearing vibration data, how to improve the training efficiency and diagnosis efficiency of rolling bearing fault diagnosis model is an urgent problem to be solved.
Another previous work by the authors [32] proposed a rolling bearing fault diagnosis method based on Spark and the ACO-K-Means clustering algorithm. The ACO-K-Means clustering algorithm was successfully parallelized on the Spark platform, so that clustering analysis can be carried out efficiently in parallel on massive rolling bearing vibration data. That method focuses on improving the model training efficiency and fault diagnosis efficiency by fully utilizing all available CPU and memory resources on a Spark cluster. Compared with the Spark platform, the Spark-GPU platform has a stronger distributed parallel computing ability and is thus more helpful for improving the model training efficiency and fault diagnosis efficiency. Compared with the ACO-K-Means clustering algorithm, QPSO-BPNN, with its strong nonlinear mapping ability and high self-learning and adaptive abilities, can obtain a higher and more stable fault diagnosis accuracy.
Therefore, a rolling bearing fault diagnosis method based on parallel QPSO-BPNN under Spark-GPU platform is proposed, which aims to fully exploit the powerful distributed parallel computing capabilities provided by the Spark-GPU platform and take advantage of QPSO-BPNN with low computational complexity and high diagnosis accuracy to achieve more efficient and accurate fault diagnosis of rolling bearing in the big data environment. However, it is still a challenge to efficiently implement QPSO-BPNN on a Spark-GPU platform. The current work focuses on how to efficiently perform model training and fault diagnosis for large-scale rolling bearing vibration data using the parallel QPSO-BPNN on Spark-GPU platforms.
The main contributions of this paper are summarized as follows.
The distributed parallelization of QPSO-BPNN model based on Spark-GPU platform is realized, which significantly improves the training efficiency and diagnosis efficiency of rolling bearing fault diagnosis model based on QPSO-BPNN in the big data environment.
A parameter update strategy suitable for the distributed parallel training of QPSO-BPNN model is proposed. At each iteration during training, the local parameters of each worker node are collected to the master node, and the global parameters are updated according to the weights and synchronized to each worker node, which improves the convergence speed of rolling bearing fault diagnosis model in the distributed parallel environment.
A combination strategy of multiple QPSO-BPNN models based on ensemble learning is proposed. The output results of QPSO-BPNN models respectively corresponding to the base end, drive end, and fan end of rolling bearing are combined by weighted voting to obtain the best fault diagnosis result of a sample, which improves the fault diagnosis accuracy of rolling bearing to a certain extent.
The effectiveness of the proposed rolling bearing fault diagnosis method is verified by a large number of experiments. Experimental results show that this method can not only make full use of the computing resources of the Spark-GPU platform to quickly perform model training and fault diagnosis on the massive rolling bearing vibration data, but also obtain a higher fault diagnosis accuracy.
The rest of this paper is organized as follows. The QPSO-BPNN model is introduced in Section II. The rolling bearing fault diagnosis method based on parallel QPSO-BPNN under Spark-GPU platform is described in Section III. The experimental results and analysis are presented in Section IV. The conclusions and future work are given in Section V.
The Previous QPSO-BPNN Model
BPNN [33] is a classic multi-layer feedforward neural network, which is characterized by forward propagation of signals and backward propagation of errors. Because it has a simple network structure and a strong nonlinear mapping ability, it is widely used for classification problems. The training of BPNN is divided into two stages: signal propagation, and the update of weights and thresholds. In the first stage, the signals are first propagated from the input layer to the hidden layer, then propagated layer by layer to the output layer according to the weights and activation function of each neuron, and finally the output results are obtained. In the second stage, the errors between the output results and the targets are calculated and propagated backward, and then the weights and thresholds of each neuron in each layer are corrected according to the errors. These two stages are executed iteratively to complete the training of BPNN. However, BPNN converges slowly and is easily trapped in local minima, mainly because it randomly initializes the weights and thresholds of each neuron. As a result, some researchers have recently adopted intelligent optimization algorithms to improve the initialization of the weights and thresholds of BPNN, such as the genetic algorithm (GA) [34], the differential evolution (DE) algorithm [35], and the particle swarm optimization (PSO) algorithm [36].
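The two training stages described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the single hidden layer, the sigmoid activation, the learning rate of 0.5, the logical-OR toy data, and the helper names `init_layer`, `forward`, and `train_step` are all assumptions made for illustration, and thresholds are treated as additive bias terms.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in, n_out, rng):
    """Random weights (n_out rows of n_in values) and one threshold per neuron."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    thresholds = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return weights, thresholds


def forward(x, layer):
    """Stage 1: propagate a signal through one fully connected layer."""
    weights, thresholds = layer
    return [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, thresholds)]


def train_step(x, target, hidden, output, lr=0.5):
    """One forward pass and one backward correction; returns the squared error."""
    h = forward(x, hidden)
    y = forward(h, output)
    # Stage 2: error terms of the output layer (sigmoid derivative is y * (1 - y)).
    d_out = [(yk - tk) * yk * (1.0 - yk) for yk, tk in zip(y, target)]
    # Propagate the error terms backward before any weight is changed.
    d_hid = [hj * (1.0 - hj) * sum(d * output[0][k][j] for k, d in enumerate(d_out))
             for j, hj in enumerate(h)]
    for layer, deltas, inputs in ((output, d_out, h), (hidden, d_hid, x)):
        for k, d in enumerate(deltas):
            for j, v in enumerate(inputs):
                layer[0][k][j] -= lr * d * v
            layer[1][k] -= lr * d
    return 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, target))


rng = random.Random(0)
hidden, output = init_layer(2, 4, rng), init_layer(4, 1, rng)
samples = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
           ([1.0, 0.0], [1.0]), ([1.0, 1.0], [1.0])]   # logical OR as toy data
errors = [sum(train_step(x, t, hidden, output) for x, t in samples)
          for _ in range(500)]
```

Here `errors` holds the summed squared error of each pass over the toy data. The random initialization in `init_layer` is the step that an optimizer such as GA, DE, or PSO would replace.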
The classic PSO algorithm [37] is a swarm intelligence random search algorithm that iteratively searches the solution space for the optimal solution, guided by the best particle. However, because the PSO algorithm easily falls into a local optimum, several improved PSO algorithms have been developed, such as the adaptive particle swarm optimization (APSO) algorithm [38], the selective particle swarm optimization (SPSO) algorithm [39], and the QPSO algorithm [40]. The APSO algorithm adopts a nonlinear function to dynamically adjust the inertia weights and the contribution of each particle to avoid falling into a local optimum. The SPSO algorithm changes the search space from a real-valued space into a set of selected values, which can reduce the computational cost of the fitness values. The QPSO algorithm improves the PSO algorithm in two ways. First, the position of each particle is updated according to the average best position of the quantum particle swarm, and the velocity of the particle is no longer considered, which increases the randomness of the particle movement and helps avoid local optima. Second, only the shrinkage factor that controls the update of the particle position needs to be tuned, which simplifies performance tuning and enhances the global convergence ability. Compared with the APSO and SPSO algorithms, the QPSO algorithm converges faster at a lower computational cost, and it is more likely to find the optimal initial weights and thresholds of BPNN because it has a stronger global convergence ability. Therefore, the QPSO algorithm is more suitable for optimizing the initial weights and thresholds of BPNN.
The authors’ previous work [31] adopted the QPSO algorithm to optimize the initial weights and thresholds of BPNN, as shown in Fig. 1. First, the random initial weights and thresholds of BPNN are obtained and the quantum particle swarm is initialized, including the number of particles, the dimension of particles, and the initial position of each particle (i.e., the initial weights and thresholds of BPNN). Second, the position of each particle is used as the weights and thresholds of BPNN to train the BPNN model once. Third, the fitness value and best position of each particle are calculated, where the fitness value of each particle is the error of the BPNN model training. Fourth, the global best fitness value and the global best position of the quantum particle swarm are calculated. Fifth, the average best position of the quantum particle swarm is calculated. Sixth, the position of each particle is updated according to the best position of each particle, the global best position and average best position of the quantum particle swarm, and the shrinkage factor. Finally, it is determined whether the maximum number of iterations has been reached or a satisfactory solution has been obtained. If so, the optimal initial weights and thresholds of BPNN are obtained; otherwise, the next iteration is executed.
The Proposed Fault Diagnosis Method of Rolling Bearing
A. Process of Rolling Bearing Fault Diagnosis Based on Parallel QPSO-BPNN
The overall process of rolling bearing fault diagnosis based on parallel QPSO-BPNN under Spark-GPU platform is shown in Fig. 2, which includes the following four stages: data preprocessing, data storage, model training and testing, and fault diagnosis.
Fig. 2. Process of rolling bearing fault diagnosis based on parallel QPSO-BPNN under Spark-GPU platform.
In the data preprocessing stage, first, the abnormal data contained in the original vibration signals collected by the sensors deployed on the base end, drive end, and fan end of rolling bearing are eliminated. Second, the cleaned data are divided into several samples. Third, each sample is standardized. Finally, the wavelet packet decomposition is performed on each sample to obtain the eigenvectors of different running states of rolling bearing.
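The standardization and wavelet packet steps can be sketched in pure Python as follows. The text does not state which wavelet, decomposition depth, or feature definition is used, so this sketch assumes a Haar wavelet, three decomposition levels, and the relative energy of each of the 2^3 = 8 terminal nodes as the eigenvector; the function names are hypothetical.

```python
import math


def standardize(sample):
    """Scale a sample to zero mean and unit variance."""
    mean = sum(sample) / len(sample)
    var = sum((v - mean) ** 2 for v in sample) / len(sample)
    std = math.sqrt(var) or 1.0          # a constant sample is left unscaled
    return [(v - mean) / std for v in sample]


def haar_split(signal):
    """One Haar analysis step: (approximation, detail), each half as long."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal) - 1, 2)]
    return approx, detail


def wavelet_packet_energies(sample, levels=3):
    """Relative energy of the 2**levels terminal wavelet packet nodes.

    Unlike a plain wavelet transform, both the approximation and the detail
    band are split again at every level. Nodes are returned in natural
    (not frequency) order; the sample length should be a multiple of 2**levels.
    """
    nodes = [sample]
    for _ in range(levels):
        nodes = [band for node in nodes for band in haar_split(node)]
    energies = [sum(v * v for v in node) for node in nodes]
    total = sum(energies) or 1.0
    return [e / total for e in energies]


# Toy vibration sample: two superimposed sinusoids, 64 points.
sample = [math.sin(0.3 * k) + 0.5 * math.sin(2.1 * k) for k in range(64)]
clean = standardize(sample)
eigenvector = wavelet_packet_energies(clean)
```

With three levels the resulting `eigenvector` has eight components, which happens to match the eight input nodes used later for the BPNN, although the source does not confirm that the features are computed this way.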
In the data storage stage, all eigenvectors are stored in Hadoop Distributed File System (HDFS), and the eigenvectors used for model training and testing are divided into training set and test set.
In the model training and testing stage, first, the network structures and training parameters of the three QPSO-BPNN models corresponding to the base end, drive end, and fan end are determined. Second, the training samples corresponding to the base end, drive end, and fan end are read from HDFS and used as the input of the respective models, and the distributed parallel training of the three models is performed on the Spark-GPU platform. Finally, the test set from HDFS is used to test the three models.
In the fault diagnosis stage, first, the data to be diagnosed are read from HDFS. Second, each of the three trained QPSO-BPNN models is used to diagnose these data on the Spark-GPU platform. Finally, the weighted voting method is adopted to combine the output results of the three models to obtain the final fault diagnosis results.
B. Parallel Design of QPSO-BPNN Model Based on Spark-GPU Platform
1) Overall Distributed Parallel Design Scheme
According to the idea of data parallelism, the overall distributed parallel design scheme of QPSO-BPNN model based on Spark-GPU platform is proposed, as shown in Fig. 3.
Fig. 3. Overall distributed parallel design scheme of QPSO-BPNN model based on Spark-GPU platform.
On a Spark-GPU platform, the master node is responsible for the task scheduling and resource management of the entire cluster, and each worker node can use one or more Spark executors to train one or more QPSO-BPNN models. Each Spark executor can exploit the RAPIDS library [41] developed by means of CUDA to call GPU computing resources to accelerate the training of QPSO-BPNN model. When starting the QPSO-BPNN model training program on Spark-GPU platform, firstly, a SparkContext object is initialized on the master node; secondly, the rolling bearing training set is read from HDFS to create an RDD; thirdly, each Spark executor reads the data of an RDD partition to train a QPSO-BPNN model. The distributed parallel training of QPSO-BPNN model based on Spark-GPU platform mainly includes the following two stages.
In the first stage, the QPSO algorithm is executed in parallel to optimize the initial weights and thresholds of BPNN. In each iteration of the QPSO algorithm, firstly, the fitness value and best position of each particle are calculated. Because the computational tasks of different particles are independent of each other, they can be distributed across the Spark executors, and multiple Spark executors can process different particles in parallel. Secondly, the fitness value and best position of each particle are collected to calculate the global best fitness value, global best position, and average best position of the quantum particle swarm. Finally, the position of each particle is updated according to the best position of each particle, the average best position of the quantum particle swarm, and the shrinkage factor. Because the updates of different particles are independent of each other, the position of each particle can also be updated in parallel.
In the second stage, the QPSO-BPNN model is trained in parallel. After obtaining the optimal initial weights and thresholds of BPNN, the data-parallel strategy is adopted to realize the parallel training of the QPSO-BPNN model. A large-scale training set is divided into $n$ training subsets, and $n$ QPSO-BPNN models, one per training subset, are trained in parallel on the Spark executors.
Both the calculation of the fitness value of each particle in the first stage and the training of each QPSO-BPNN model in the second stage involve a large number of matrix operations, so GPUs are well suited to speeding up the training of the entire model.
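The independence of the per-particle tasks in the first stage can be shown with a small sketch. A standard-library thread pool stands in for the Spark executors here, and `fitness` is a hypothetical placeholder (a sum of squares) for the BPNN training error of a particle; neither reflects the authors' code.

```python
from concurrent.futures import ThreadPoolExecutor


def fitness(position):
    """Hypothetical stand-in for the BPNN training error of one particle."""
    return sum(v * v for v in position)


def evaluate_swarm(positions, workers=4):
    """Evaluate every particle independently; results keep the particle order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fitness, positions))


positions = [[0.5, -0.5], [1.0, 2.0], [0.0, 0.1]]
fitness_values = evaluate_swarm(positions)
```

Because no task reads the result of another, the same map pattern carries over to a distributed setting, where each executor would evaluate the particles of its own partition and only the fitness values and best positions would be collected centrally.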
2) Design of Parameter Update Strategy
In the distributed parallel training of the QPSO-BPNN model, the idea of the parameter server architecture [42] is adopted. The master node serves as the parameter server node and collects the weights and thresholds of the QPSO-BPNN model from each worker node on the Spark-GPU platform. The global weights and thresholds are then updated according to the weight of each model, and the updated global weights and thresholds are synchronized to each worker node.
In the $t$-th iterative training, the parameters are updated as follows.
Step 1.
On the $k$ worker nodes, the $n$ QPSO-BPNN models corresponding to $n$ training subsets are trained in parallel according to the current weights and thresholds $(g_{1}^{t},g_{2}^{t},\ldots,g_{n}^{t})$, and the new weights and thresholds $(g_{1.temp}^{t},g_{2.temp}^{t},\ldots,g_{n.temp}^{t})$ and the losses $(loss_{1}^{t},loss_{2}^{t},\ldots,loss_{n}^{t})$ of the $n$ QPSO-BPNN models are obtained and collected to the parameter server node, where $g_{i}^{t}$ represents the weights and thresholds used for training the $i$-th QPSO-BPNN model and $1\leq i\leq n$.
Step 2.
On the parameter server node, firstly, the losses of all QPSO-BPNN models are normalized by the Min-Max normalization method. Secondly, according to the normalized losses $(l_{1}^{t},l_{2}^{t},\ldots,l_{n}^{t})$, the weight of each model in the global parameter update is calculated by
\begin{equation*} \eta _{i}=(1-l_{i}^{t})/\sum _{j=1}^{n}(1-l_{j}^{t}),\tag{1}\end{equation*}
where $\eta _{i}$ represents the weight of the $i$-th QPSO-BPNN model and $l_{i}^{t}$ represents the normalized result of the loss obtained after the $t$-th iterative training of the $i$-th QPSO-BPNN model. Thirdly, according to the weight $\eta _{i}$, the global weights and thresholds are updated by
\begin{equation*} G^{t+1}=G^{t}+\sum _{i=1}^{n}\eta _{i}\left ({g_{i.temp}^{t}-G^{t}}\right),\tag{2}\end{equation*}
where $G^{t}$ denotes the current global weights and thresholds.
Step 3.
The parameter server node broadcasts the new global weights and thresholds $G^{t+1}$ to all worker nodes, and the weights and thresholds of all QPSO-BPNN models on each worker node are updated synchronously, i.e., $g_{1}^{t+1}=g_{2}^{t+1}=\cdots =g_{n}^{t+1}=G^{t+1}$.
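The update on the parameter server node, i.e., Min-Max normalization of the losses followed by (1) and (2), can be written as a short pure-Python sketch. The function names and the toy numbers are hypothetical, and the handling of the case where all losses are equal (every model then receives the same weight) is an assumption, since the text does not cover it.

```python
def min_max(values):
    """Min-Max normalization to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: identical losses are all mapped to 0, giving equal weights.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def model_weights(losses):
    """Eq. (1): eta_i = (1 - l_i) / sum_j (1 - l_j) on the normalized losses."""
    normalized = min_max(losses)
    total = sum(1.0 - l for l in normalized)
    return [(1.0 - l) / total for l in normalized]


def update_global(G, local_params, losses):
    """Eq. (2): G_next = G + sum_i eta_i * (g_i_temp - G), per component."""
    eta = model_weights(losses)
    return [g + sum(e * (p[d] - g) for e, p in zip(eta, local_params))
            for d, g in enumerate(G)]


G = [0.0, 0.0]                                        # current global parameters
local_params = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # g_temp of three models
losses = [0.1, 0.2, 0.3]
eta = model_weights(losses)
G_next = update_global(G, local_params, losses)
```

Two properties follow directly from (1) and (2): the weights sum to one, so the new global parameters are a convex combination of the local ones, and the model with the largest loss always receives weight zero after Min-Max normalization.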
C. Parallel Implementation of QPSO-BPNN Model Based on Spark-GPU Platform
According to the above-mentioned distributed parallel design scheme of QPSO-BPNN model based on Spark-GPU platform, this subsection describes the parallel implementation of QPSO-BPNN model based on Spark-GPU platform. The flowchart of parallel implementation of QPSO-BPNN model based on Spark-GPU platform is shown in Fig. 4 and the pseudo-code of that is described in Algorithm 1, which mainly includes the following two stages.
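Algorithm 1 Parallel Implementation of QPSO-BPNN Model Based on Spark-GPU Platform
Output: The trained QPSO-BPNN model
Initialize the weights and thresholds $G^{1}$ of BPNN and the quantum particle swarm on the master node
Broadcast $G^{1}$ and the initial parameters of the quantum particle swarm to all worker nodes
Read the training set with $m$ eigenvectors from HDFS to create tRDD with $n$ partitions
for $i=1$ to the max iterations of QPSO do
  for all Spark executors on GPUs in parallel do
    if $i=1$ then
      Initialize the positions: $(P_{1}^{1},P_{2}^{1},\ldots,P_{n}^{1})$
    else
      Update the positions $(P_{1}^{i},P_{2}^{i},\ldots,P_{n}^{i})$ by (3)
    end if
    Create or update the key-value pair RDD:
    pRDD = $(\langle P_{x}^{i},E_{(x-1)m/n+1}\rangle,\ldots,\langle P_{x}^{i},E_{xm/n}\rangle)$, $1\leq x\leq n$
    Calculate the fitness values of all particles: $(f_{1}^{i},f_{2}^{i},\ldots,f_{n}^{i})$
    Calculate the best positions of all particles: $(P_{1.best}^{i},P_{2.best}^{i},\ldots,P_{n.best}^{i})$
  end for
  Calculate the global best fitness value on the master node: $f_{best}^{i}$
  Calculate the global best position on the master node: $P_{best}^{i}$
  if $i\geq 2$ and $f_{best}^{i-1}\leq f_{best}^{i}$ then
    $P_{best}^{i}=P_{best}^{i-1}$, $f_{best}^{i}=f_{best}^{i-1}$
  end if
  Calculate the average best position on the master node: $M_{best}^{i}$
  Broadcast $P_{best}^{i}$ and $M_{best}^{i}$ to all worker nodes
  if $f_{best}^{i}$ is lower than the error goal then
    break
  end if
end for
Get the best initial weights and thresholds: $G^{1}=P_{best}^{i}$
for $i=1$ to the max iterations of model training do
  for all Spark executors on GPUs in parallel do
    Initialize or update the weights and thresholds of all QPSO-BPNN models: $(g_{1}^{i},g_{2}^{i},\ldots,g_{n}^{i})$
    Create or update the key-value pair RDD: gRDD = $(\langle g_{x}^{i},E_{(x-1)m/n+1}\rangle,\ldots,\langle g_{x}^{i},E_{xm/n}\rangle)$, $1\leq x\leq n$
    Train each QPSO-BPNN model once to obtain $(g_{1.temp}^{i},g_{2.temp}^{i},\ldots,g_{n.temp}^{i})$
  end for
  Broadcast $G^{i+1}$ calculated by (1) and (2) on the master node
end for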
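The bookkeeping behind the setup lines of Algorithm 1 (particle dimension, contiguous partitions, and key-value pairing) can be sketched in plain Python. This is only an illustration: lists stand in for RDD partitions, the helper names are hypothetical, the 8-20-12-4 layer sizes are the ones given in the first-stage description, and the training set size is assumed to be a multiple of the number of partitions.

```python
def parameter_count(layer_sizes):
    """Number of weights plus thresholds in a fully connected network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def split_evenly(eigenvectors, n):
    """Split m eigenvectors into n contiguous partitions of m/n items each."""
    m = len(eigenvectors)
    if m % n != 0:
        raise ValueError("this sketch assumes m is a multiple of n")
    size = m // n
    # Partition x (1-based) holds E_{(x-1)m/n+1} ... E_{xm/n}.
    return [eigenvectors[(x - 1) * size:x * size] for x in range(1, n + 1)]


def key_by_position(partitions, positions):
    """Pair every eigenvector in partition x with the position of particle x."""
    return [[(tuple(p), e) for e in part]
            for part, p in zip(partitions, positions)]


dimension = parameter_count([8, 20, 12, 4])
eigenvectors = [[float(k)] * 8 for k in range(1, 13)]    # toy data, m = 12
partitions = split_evenly(eigenvectors, 3)               # n = 3
initial = [0.0] * dimension                              # stand-in for G^1
pairs = key_by_position(partitions, [initial] * 3)
```

In this toy setting the second partition holds eigenvectors 5 to 8, matching the index range $(x-1)m/n+1$ to $xm/n$ for $x=2$, and every key has as many components as the network has weights and thresholds.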
The first stage is the parallel implementation of QPSO algorithm for optimizing the initial weights and thresholds of BPNN based on Spark-GPU platform, including the following steps.
Step 1.
Initialize BPNN and quantum particle swarm on the master node. Firstly, the network structure of BPNN needs to be determined. The number of the input layer nodes is set to 8 which is the dimension of an eigenvector. The number of the hidden layers is set to 2, and the number of the first and second hidden layer nodes are set to 20 and 12, respectively, which are determined by Hoeffding's inequality and [43]. The number of the output layer nodes is set to 4 which is the number of classes of rolling bearing running states. Secondly, the weights and thresholds $G^{1}$ of BPNN are randomly initialized, and the particle number $n$ of quantum particle swarm and the dimension of particle are initialized. The dimension of particle is determined by the number of weights and thresholds of the input layer, hidden layer, and output layer of BPNN, i.e., $(8\times 20+20)+(20\times 12+12)+(12\times 4+4)=484$. Finally, the random initial weights and thresholds of BPNN and initial parameters of quantum particle swarm are broadcasted to all worker nodes.
Step 2.
Read the training set from HDFS and use multiple Spark executors to call GPU computing resources to create an RDD tRDD in parallel, where each element of tRDD is an eigenvector. tRDD can be equally divided into $n$ RDD partitions according to the particle number $n$. If tRDD contains $m$ eigenvectors, then the $x$-th partition of tRDD can be denoted by $(E_{(x-1)m/n+1},E_{(x-1)m/n+2},\ldots,E_{xm/n})$, where $1\leq x\leq n$.
Step 3.
Use multiple Spark executors to call GPU computing resources to initialize the positions of all particles and create a new RDD in parallel. Firstly, $G^{1}$ is adopted to initialize the positions $(P_{1}^{1},P_{2}^{1},\ldots,P_{n}^{1})$ of all particles, i.e., $P_{1}^{1}=P_{2}^{1}=\cdots =P_{n}^{1}=G^{1}$. Secondly, a key-value pair RDD pRDD is constructed by taking the position of each particle as a key and each eigenvector of tRDD as a value, and the $x$-th partition of pRDD can be represented by $(\langle P_{x}^{1},E_{(x-1)m/n+1}\rangle,\langle P_{x}^{1},E_{(x-1)m/n+2}\rangle,\ldots,\langle P_{x}^{1},E_{xm/n}\rangle)$, where $1\leq x\leq n$.
Step 4.
Use multiple Spark executors to call GPU computing resources to calculate the fitness values and best positions of all particles in parallel. Firstly, the data of each partition in pRDD is used to train a BPNN model respectively, and each model is trained once to obtain the fitness values $(f_{1}^{i},f_{2}^{i},\ldots,f_{n}^{i})$, where $f_{x}^{i}$ denotes the fitness value obtained by the $x$-th particle in the $i$-th iteration. Secondly, the best positions $(P_{1.best}^{i},P_{2.best}^{i},\ldots,P_{n.best}^{i})$ of all particles are determined by comparing the fitness values $(f_{1}^{i},f_{2}^{i},\ldots,f_{n}^{i})$ obtained by all particles in the current iteration with the fitness values $(f_{1}^{i-1},f_{2}^{i-1},\ldots,f_{n}^{i-1})$ obtained by all particles in the previous iteration. If $i\geq 2$ and $f_{x}^{i-1}\leq f_{x}^{i}$, then $P_{x.best}^{i}=P_{x}^{i-1}$; otherwise, $P_{x.best}^{i}=P_{x}^{i}$, where $1\leq x\leq n$.
Step 5.
Calculate and broadcast the global best position and average best position of quantum particle swarm on the master node. Firstly, the master node collects the fitness values and best positions of all particles. Secondly, the global best fitness value $f_{best}^{i}=\min (f_{1}^{i},f_{2}^{i},\ldots,f_{n}^{i})$ and the global best position $P_{best}^{i}=P_{(\mathop {\arg \min }\limits _{x\in \{1,2,\ldots,n\}}f_{x}^{i}).best}^{i}$ of quantum particle swarm are obtained by comparing the fitness values of all particles in the $i$-th iteration. Thirdly, the global best position of quantum particle swarm is updated by comparing the global best fitness value $f_{best}^{i}$ obtained in the current iteration with the global best fitness value $f_{best}^{i-1}$ obtained in the previous iteration. If $i\geq 2$ and $f_{best}^{i-1}\leq f_{best}^{i}$, then $P_{best}^{i}=P_{best}^{i-1}$ and $f_{best}^{i}=f_{best}^{i-1}$; otherwise, there is no need to update the global best position. Fourthly, the average best position of quantum particle swarm is calculated as $M_{best}^{i}=\sum \limits _{x=1}^{n}P_{x.best}^{i}/n$. Finally, $P_{best}^{i}$ and $M_{best}^{i}$ are broadcasted to all worker nodes.
Step 6.
Determine whether the current iteration number reaches the max iterations or whether the global best fitness value is lower than the error goal. If so, the iteration is terminated and the optimal initial weights and thresholds of BPNN are returned, i.e., $G^{1}=P_{best}^{i}$; otherwise, the positions of all particles are updated and the next iteration will be continued by going back to Step 4. The process of using multiple Spark executors to call GPU computing resources to update the positions of all particles and pRDD in parallel is as follows. Firstly, according to the best position of all particles and the global best position and average best position of quantum particle swarm, the latest positions $(P_{1}^{i+1},P_{2}^{i+1},\ldots,P_{n}^{i+1})$ of all particles are calculated by
\begin{equation*} P_{x}^{i+1}=\alpha P_{x.best}^{i}+(1-\alpha)P_{best}^{i}\pm \varphi \left |{M_{best}^{i}-P_{x}^{i}}\right |\ln \frac {1}{\beta },\tag{3}\end{equation*}
where $P_{x}^{i+1}$ represents the latest position of the $x$-th particle in the $(i+1)$-th iteration, $\alpha$ and $\beta$ are uniformly distributed on (0, 1), and $\varphi$ represents the shrinkage factor [44]. To increase the randomness of the particle movement, ± is used before the absolute term that is the distance between the average best position of quantum particle swarm and the latest position of the particle. Secondly, Min-Max normalization is performed on the latest positions of all particles, and the key of each key-value pair in pRDD is updated accordingly.
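One QPSO iteration, covering the best-position rules of Steps 4 and 5 and the position update of (3), can be sketched as follows. Several details are assumptions because the text leaves them open: $\alpha$, $\beta$, and the sign of the ± term are drawn independently for every dimension, the sign is chosen with probability 0.5, and the toy positions and fitness values are invented. The function names are hypothetical.

```python
import math
import random


def personal_bests(prev_pos, prev_fit, cur_pos, cur_fit):
    """Step 4 rule: keep the previous position when it was at least as fit."""
    return [pp if pf <= cf else cp
            for pp, pf, cp, cf in zip(prev_pos, prev_fit, cur_pos, cur_fit)]


def global_and_mean_best(best_pos, fit):
    """Step 5: lowest fitness, its particle's best position, and the mean best."""
    winner = min(range(len(fit)), key=fit.__getitem__)
    dim = len(best_pos[0])
    mean_best = [sum(p[d] for p in best_pos) / len(best_pos) for d in range(dim)]
    return fit[winner], best_pos[winner], mean_best


def update_position(pos, own_best, global_best, mean_best, phi, rng):
    """Eq. (3), applied per dimension with fresh random draws (an assumption)."""
    new = []
    for p, pb, gb, mb in zip(pos, own_best, global_best, mean_best):
        alpha = rng.random()
        beta = 1.0 - rng.random()        # lies in (0, 1], so ln(1 / beta) is finite
        sign = 1.0 if rng.random() < 0.5 else -1.0
        new.append(alpha * pb + (1.0 - alpha) * gb
                   + sign * phi * abs(mb - p) * math.log(1.0 / beta))
    return new


rng = random.Random(1)
prev_pos, prev_fit = [[0.2, 0.4], [0.9, 0.1]], [0.3, 0.5]
cur_pos, cur_fit = [[0.3, 0.3], [0.8, 0.6]], [0.4, 0.25]

best = personal_bests(prev_pos, prev_fit, cur_pos, cur_fit)
f_best, p_best, m_best = global_and_mean_best(best, cur_fit)
next_pos = [update_position(p, b, p_best, m_best, 0.8, rng)
            for p, b in zip(cur_pos, best)]
```

In the toy data the first particle got worse and therefore keeps its previous position, while the second improved and becomes the global best. The Min-Max normalization that follows (3) is left out of the sketch because the text does not say over which axis it is applied.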
The second stage is the parallel implementation of QPSO-BPNN model training based on Spark-GPU platform, including the following steps.
Step 1.
Use multiple Spark executors to call GPU computing resources to initialize the weights and thresholds of all QPSO-BPNN models and create a new RDD in parallel. Firstly, the optimal initial weights and thresholds $G^{1}$ are used as the initial weights and thresholds $(g_{1}^{1},g_{2}^{1},\ldots,g_{n}^{1})$ of all QPSO-BPNN models, i.e., $g_{1}^{1}=g_{2}^{1}=\cdots =g_{n}^{1}=G^{1}$. Secondly, a key-value pair RDD gRDD is constructed by taking the weights and thresholds of each QPSO-BPNN model as a key and each eigenvector of tRDD as a value, and the $x$-th partition of gRDD can be denoted by $(\langle g_{x}^{1},E_{(x-1)m/n+1}\rangle,\langle g_{x}^{1},E_{(x-1)m/n+2}\rangle,\ldots,\langle g_{x}^{1},E_{xm/n}\rangle)$, where $1\leq x\leq n$.
Step 2.
Use multiple Spark executors to call GPU computing resources to train all QPSO-BPNN models in parallel. The data of each partition in gRDD is used to train a QPSO-BPNN model respectively, and each model is trained once to obtain the latest weights and thresholds $(g_{1.temp}^{i},g_{2.temp}^{i},\ldots,g_{n.temp}^{i})$, where $g_{x.temp}^{i}$ denotes the latest weights and thresholds of the $x$-th QPSO-BPNN model obtained in the $i$-th iteration.
Step 3.
Calculate and broadcast the global weights and thresholds according to the weight of each QPSO-BPNN model on the master node. Firstly, the master node collects the latest weights and thresholds of all QPSO-BPNN models. Secondly, the global weights and thresholds $G^{i+1}$ are updated by (1) and (2) and broadcasted to all worker nodes.
Step 4.
Determine whether the current iteration number reaches the max iterations. If so, the iteration is terminated and the final QPSO-BPNN model is obtained; otherwise, the weights and thresholds of all QPSO-BPNN models are updated and the next iteration will be continued by going back to Step 2. The process of using multiple Spark executors to call GPU computing resources to update the weights and thresholds and gRDD in parallel is as follows. Firstly, the weights and thresholds $(g_{1}^{i+1},g_{2}^{i+1},\ldots,g_{n}^{i+1})$ of all QPSO-BPNN models are updated according to $G^{i+1}$, i.e., $g_{1}^{i+1}=g_{2}^{i+1}=\cdots =g_{n}^{i+1}=G^{i+1}$. Secondly, the key of each key-value pair in gRDD is updated as the new weights and thresholds.
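The synchronous loop of the second stage can be simulated on a toy problem to show how the four steps interact. Everything model-specific here is an assumption made for illustration: a single scalar weight fitted to $y=2x$ replaces the QPSO-BPNN model, one gradient step replaces "trained once", the workers run serially, and the function names are hypothetical. The aggregation follows (1) and (2).

```python
def train_once(w, partition, lr=0.2):
    """One gradient step of y ~ w * x on one partition; returns (new_w, loss)."""
    grad = sum(2.0 * (w * x - y) * x for x, y in partition) / len(partition)
    new_w = w - lr * grad
    loss = sum((new_w * x - y) ** 2 for x, y in partition) / len(partition)
    return new_w, loss


def aggregate(G, local_ws, losses):
    """Eq. (1) and (2) for a scalar parameter, with Min-Max normalized losses."""
    lo, hi = min(losses), max(losses)
    norm = [0.0 if hi == lo else (l - lo) / (hi - lo) for l in losses]
    total = sum(1.0 - l for l in norm)
    return G + sum((1.0 - l) / total * (w - G) for l, w in zip(norm, local_ws))


data = [(x / 10.0, 2.0 * x / 10.0) for x in range(1, 13)]   # y = 2x, m = 12
partitions = [data[0:4], data[4:8], data[8:12]]             # n = 3 subsets
G = 0.0                                                     # stand-in for G^1

for _ in range(200):
    # Steps 1 and 2: every model starts from the same G and trains once.
    results = [train_once(G, part) for part in partitions]
    # Step 3: collect, weight by loss, and update the global parameter.
    G = aggregate(G, [w for w, _ in results], [l for _, l in results])
    # Step 4: the new G is what every model would receive by broadcast.
```

After the loop `G` is close to the true slope of 2. Because every local model restarts from the broadcast value, the models never drift apart by more than one local training pass.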
D. Combination Strategy of Multiple QPSO-BPNN Models Based on Ensemble Learning
The proposed rolling bearing fault diagnosis model is composed of QPSO-BPNN models respectively corresponding to the base end, drive end, and fan end, which can reduce the risk of misdiagnosis caused by the wrong classification of a single QPSO-BPNN model, and can improve the fault diagnosis accuracy to a certain extent. The common combination strategies include Dempster-Shafer (DS) evidence theory [45] and ensemble learning [46]. Considering that ensemble learning can avoid the explosive growth of the exponential function and the problem of more parameters required for calculating the basic probability distribution function in DS evidence theory, a combination strategy of multiple QPSO-BPNN models based on ensemble learning is proposed, as shown in Fig. 5.
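In the combination of multiple QPSO-BPNN models based on ensemble learning, the weighted voting method is used to combine the classification results of multiple basic classifiers (i.e., QPSO-BPNN models) to obtain the best fault diagnosis result of a sample. The classification results of multiple basic classifiers for sample $x$ are combined by
\begin{equation*} H(x)=\mathop {\arg \max }\limits _{y\in \{1,2,\ldots,j\}}\sum _{i=1}^{s}\omega _{i}^{y}h_{i}^{y}(x),\tag{4}\end{equation*}
where the weight $\omega _{i}^{y}$ of the $i$-th basic classifier for class $y$ is calculated by
\begin{equation*} \omega _{i}^{y}=Acc_{i}^{y}/\sum _{k=1}^{s}Acc_{k}^{y}.\tag{5}\end{equation*}
Here $s$ is the number of basic classifiers, $j$ is the number of classes, $h_{i}^{y}(x)$ is the output of the $i$-th basic classifier for class $y$ on sample $x$, and $Acc_{i}^{y}$ is the accuracy of the $i$-th basic classifier on class $y$.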
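Weighted voting as in (4) and (5) can be made concrete with a small sketch. It assumes that each basic classifier casts a one-hot vote and that $Acc_{i}^{y}$ is a per-class accuracy; the accuracy values below are invented for illustration and are not results from the paper, and class indices are 0-based in the code.

```python
def voting_weights(accuracies):
    """Eq. (5): accuracies[i][y] -> weights[i][y], normalized over classifiers."""
    n_classes = len(accuracies[0])
    totals = [sum(acc[y] for acc in accuracies) for y in range(n_classes)]
    return [[acc[y] / totals[y] for y in range(n_classes)] for acc in accuracies]


def weighted_vote(outputs, weights):
    """Eq. (4): class index with the largest weighted sum of classifier outputs."""
    n_classes = len(outputs[0])
    scores = [sum(w[y] * h[y] for w, h in zip(weights, outputs))
              for y in range(n_classes)]
    return max(range(n_classes), key=scores.__getitem__), scores


# Hypothetical per-class accuracies of the base-end, drive-end, and fan-end models.
accuracies = [[0.90, 0.80, 0.70, 0.60],
              [0.95, 0.90, 0.85, 0.80],
              [0.85, 0.70, 0.75, 0.80]]
# One-hot votes for one sample: base end and fan end say class 1, drive end class 2.
outputs = [[0, 1, 0, 0],
           [0, 0, 1, 0],
           [0, 1, 0, 0]]

weights = voting_weights(accuracies)
label, scores = weighted_vote(outputs, weights)
```

In this example class 1 wins because two models vote for it and their combined weight for that class exceeds the weight of the single drive-end vote for class 2. With per-class weights, a model that is strong on one fault type can still be outvoted on another.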
Experimental Results and Analysis
A. Experimental Setup
The experimental platform used in this paper is a distributed cluster. The hardware environment of the cluster is shown in Table. 1, and the software environment of the cluster is shown in Table. 2. In order to compare and analyze the impact of using GPU and not using GPU on the fault diagnosis accuracy, model training efficiency, and fault diagnosis efficiency, a series of experiments are carried out on the experimental platform with GPU (called Spark-GPU platform) and the experimental platform without GPU (called Spark platform).
The experimental data used in this paper are the vibration data of rolling bearings in different running states provided by the Case Western Reserve University Bearing Data Center [47]. They are collected by sensors deployed on the base end, drive end, and fan end of the rolling bearing under different working conditions. Because a large-scale data set is more helpful for verifying the effectiveness of the proposed fault diagnosis method, the sliding window method [48] is first adopted to enhance the original vibration data, the enhanced data are then preprocessed (see Section III-A), and finally three data sets of different sizes composed of eigenvectors are obtained. Table 3 describes the rolling bearing data sets. Each data set includes monitoring data for the following running states of the rolling bearing: normal state, inner race fault, ball fault, and outer race fault. Each data set is randomly divided into a training set and a test set at a ratio of 8:2. When the rolling bearing fault diagnosis model is trained on the Spark platform or the Spark-GPU platform, the size of the training set that can be used depends not only on the hardware resource limitations of the cluster but also on the model training efficiency. If a larger-scale rolling bearing data set is used, more worker nodes are required or the hardware configuration of each worker node needs to be enhanced.
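The data enhancement and the 8:2 split described above can be sketched as follows. This is only an illustration under stated assumptions: the window length, the step size, the synthetic signal, and the function names are hypothetical, and the preprocessing of Section III-A is not reproduced.

```python
import random
from typing import List, Sequence, Tuple


def sliding_windows(signal: Sequence[float], window: int, step: int) -> List[List[float]]:
    """Cut overlapping segments of length `window`, advancing `step` samples each time."""
    return [list(signal[i:i + window])
            for i in range(0, len(signal) - window + 1, step)]


def train_test_split(samples: List, train_ratio: float = 0.8,
                     seed: int = 0) -> Tuple[List, List]:
    """Shuffle a copy of the samples and split it at the given ratio."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Stand-in for one vibration record (hypothetical values).
signal = [float(i) for i in range(1000)]

# A step smaller than the window gives overlapping segments, which is what
# multiplies the number of samples obtained from one record.
segments = sliding_windows(signal, window=200, step=50)
train, test = train_test_split(segments)
print(len(segments), len(train), len(test))
```

A smaller step yields more (and more strongly overlapping) segments from the same record, so the step size controls how much the data set is enlarged.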
B. Analysis of Fault Diagnosis Accuracy
In this experiment, for DataSet 1, DataSet 2, and DataSet 3, BPNN implemented with Spark (Spark-BPNN), QPSO-BPNN implemented with Spark (Spark-QPSO-BPNN), BPNN implemented with Spark-GPU (Spark-GPU-BPNN), and QPSO-BPNN implemented with Spark-GPU (Spark-GPU-QPSO-BPNN) are used for training and testing the rolling bearing fault diagnosis models, respectively. In the training of these fault diagnosis models, the key parameter settings of QPSO and BPNN are as follows.
QPSO: The number of particles is set to 100, the shrinkage factor is set to 0.8, the maximum number of iterations is set to 50, and the error goal is set to 0.001.
BPNN: The learning rate is set to 0.003, the momentum is set to 0.9, and the maximum number of iterations is set to 50.
The number of particles is one of the most important parameters of the QPSO algorithm: too many particles increase the computational cost, whereas too few particles weaken the optimization effect. The shrinkage factor affects the convergence speed of the QPSO algorithm. If it is set too small, the convergence speed will be very slow; if it is set too large, the algorithm may fail to converge to an optimal solution. The learning rate is one of the most important parameters of BPNN. It directly affects the convergence performance of BPNN and is usually set between 0.001 and 0.01. The momentum also affects the convergence speed of BPNN, and a larger momentum value generally increases the convergence speed.
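To make the role of these QPSO parameters concrete, the following sketch applies a commonly used form of the QPSO position update with the settings listed above (100 particles, factor 0.8, 50 iterations, error goal 0.001). It is not the implementation used in this paper: the shrinkage factor is assumed to play the role of the contraction-expansion coefficient `beta`, the sphere function is a hypothetical stand-in for the BPNN training error, and all names are illustrative.

```python
import math
import random
from typing import Callable, List, Tuple


def qpso(fitness: Callable[[List[float]], float], dim: int,
         n_particles: int = 100, beta: float = 0.8, max_iter: int = 50,
         goal: float = 1e-3, bound: float = 1.0,
         seed: int = 0) -> Tuple[List[float], float, List[float]]:
    """Minimise `fitness`; returns (best position, best value, best-value history)."""
    rng = random.Random(seed)
    x = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(n_particles)]
    pbest = [p[:] for p in x]
    pbest_val = [fitness(p) for p in x]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = [gbest_val]

    for _ in range(max_iter):
        if gbest_val <= goal:  # stop early once the error goal is reached
            break
        # Mean best position: the average of all personal best positions.
        mbest = [sum(p[d] for p in pbest) / n_particles for d in range(dim)]
        for i in range(n_particles):
            for d in range(dim):
                phi = rng.random()
                u = 1.0 - rng.random()  # lies in (0, 1], so log(1/u) is finite
                attractor = phi * pbest[i][d] + (1.0 - phi) * gbest[d]
                jump = beta * abs(mbest[d] - x[i][d]) * math.log(1.0 / u)
                x[i][d] = attractor + jump if rng.random() < 0.5 else attractor - jump
            val = fitness(x[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = x[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = x[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history


def sphere(w: List[float]) -> float:
    """Hypothetical stand-in for the BPNN training error of a weight vector."""
    return sum(v * v for v in w)


best, best_val, history = qpso(sphere, dim=5)
print(len(history) - 1, best_val)
```

In this form the cost per iteration grows linearly with the number of particles, since every particle requires one fitness evaluation, which matches the trade-off described above. In the proposed method, the best position found would serve as the initial weights and thresholds of BPNN.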
Fig. 6 shows the diagnosis accuracies achieved using four different fault diagnosis methods and three data sets of different sizes on the cluster described in Table 1. As depicted in Fig. 6, the fault diagnosis accuracy achieved with Spark-QPSO-BPNN is on average 2.40% higher than that achieved with Spark-BPNN, and the fault diagnosis accuracy achieved with Spark-GPU-QPSO-BPNN is on average 2.41% higher than that achieved with Spark-GPU-BPNN. The results show that the QPSO algorithm can effectively optimize the initial weights and thresholds of BPNN, thereby yielding a higher fault diagnosis accuracy.
Fig. 6. Diagnosis accuracies achieved using different fault diagnosis methods and different size data sets.
It can be seen from Fig. 6 that the diagnosis accuracies achieved with Spark-GPU-QPSO-BPNN reach 98.66%, 98.70%, and 98.73% for DataSet 1, DataSet 2, and DataSet 3, respectively, which shows that the fault diagnosis accuracy improves slightly as the data set size increases. This is because a larger rolling bearing data set provides training samples that contain more fault features, which helps to improve the fault diagnosis accuracy.
It can also be seen from Fig. 6 that the fault diagnosis accuracy achieved with Spark-BPNN and that achieved with Spark-GPU-BPNN are almost the same, and the fault diagnosis accuracy achieved with Spark-QPSO-BPNN and that achieved with Spark-GPU-QPSO-BPNN are also almost the same. The results show that the use of GPU will not affect the fault diagnosis accuracy of rolling bearing. The use of GPU in the proposed fault diagnosis method is mainly to improve the training efficiency and diagnosis efficiency of rolling bearing fault diagnosis model.
Fig. 7 presents the loss curves of four different fault diagnosis methods for DataSet 3. As shown in Fig. 7, the loss curves of the four methods all decrease rapidly in the first 10 iterations, then decrease slowly as the iterations continue, and gradually become stable after the 40th iteration. The results show that the fault diagnosis models are well trained: the weights and thresholds of BPNN are continuously optimized during the training period, and the optimal weights and thresholds are obtained at the end of training. It can also be found from Fig. 7 that the loss values of Spark-QPSO-BPNN and Spark-GPU-QPSO-BPNN are smaller than those of Spark-BPNN and Spark-GPU-BPNN. This is because the initial weights and thresholds of BPNN are effectively optimized by the QPSO algorithm.
C. Performance Analysis of Model Training and Fault Diagnosis Under Different Size Data Sets
To analyze the performance of model training and fault diagnosis achieved with the proposed fault diagnosis method under data sets of different sizes, Local-QPSO-BPNN, Spark-QPSO-BPNN, and Spark-GPU-QPSO-BPNN are used to train rolling bearing fault diagnosis models on the three data sets, and the trained models are then used for fault diagnosis. In this experiment, Local-QPSO-BPNN uses one CPU core of a single worker node to perform model training and fault diagnosis in local mode, whereas Spark-QPSO-BPNN and Spark-GPU-QPSO-BPNN perform model training and fault diagnosis on the cluster with 4 worker nodes. To better analyze the fault diagnosis performance of the proposed method on massive data, all the data in each data set are diagnosed, namely 8 GB, 16 GB, and 32 GB of data, respectively. The time spent on model training and fault diagnosis using Local-QPSO-BPNN, Spark-QPSO-BPNN, and Spark-GPU-QPSO-BPNN for the three data sets is shown in Table 4.
Fig. 8 shows the speedups of Spark-GPU-QPSO-BPNN over Local-QPSO-BPNN under different size data sets. The speedup is the ratio of the model training time or fault diagnosis time achieved with Local-QPSO-BPNN to the model training time or fault diagnosis time achieved with Spark-GPU-QPSO-BPNN. As seen from Fig. 8, the proposed Spark-GPU-QPSO-BPNN achieves a significant performance improvement compared with Local-QPSO-BPNN. For DataSet 1, DataSet 2, and DataSet 3, Spark-GPU-QPSO-BPNN obtains the speedups of
As can also be seen from Fig. 8, the speedups obtained for model training and fault diagnosis increase gradually with the data set size. This is because, when Spark-GPU-QPSO-BPNN is used for model training and fault diagnosis on the cluster with 4 worker nodes, a larger data set raises the utilization of the GPU computing resources in each worker node, and the parallel efficiencies of model training and fault diagnosis increase accordingly. Thus, the proposed fault diagnosis method is better suited to large-scale data sets.
Fig. 9 presents the speedups of Spark-GPU-QPSO-BPNN over Spark-QPSO-BPNN under different size data sets. The speedup is the ratio of the model training time or fault diagnosis time achieved with Spark-QPSO-BPNN to the model training time or fault diagnosis time achieved with Spark-GPU-QPSO-BPNN. As shown in Fig. 9, compared with Spark-QPSO-BPNN, the performance of model training and fault diagnosis achieved with Spark-GPU-QPSO-BPNN is significantly improved for different size data sets. For DataSet 1, DataSet 2, and DataSet 3, Spark-GPU-QPSO-BPNN obtains the speedups of
D. Performance Analysis of Model Training and Fault Diagnosis Under Different Size Clusters
In order to analyze the performance of model training and fault diagnosis of the proposed rolling bearing fault diagnosis method under different size clusters, Spark-QPSO-BPNN and Spark-GPU-QPSO-BPNN are adopted to perform model training and fault diagnosis respectively for DataSet 3 on the clusters with different numbers of worker nodes.
Table 5 presents the model training time and fault diagnosis time obtained under clusters of different sizes. As seen in Table 5, as the number of worker nodes in the cluster increases, the model training time and fault diagnosis time achieved with Spark-QPSO-BPNN and Spark-GPU-QPSO-BPNN are gradually reduced. Compared with the cluster with a single worker node, on the clusters with 2, 3, and 4 worker nodes, the model training time achieved with Spark-GPU-QPSO-BPNN is reduced by 46.80%, 63.43%, and 72.53%, respectively, and the fault diagnosis time achieved with Spark-GPU-QPSO-BPNN is reduced by 22.22%, 43.83%, and 59.88%, respectively. The results show that increasing the cluster size can effectively improve the performance of model training and fault diagnosis for the proposed rolling bearing fault diagnosis method. Moreover, compared with Spark-QPSO-BPNN, the model training time and fault diagnosis time achieved with Spark-GPU-QPSO-BPNN are significantly reduced under all cluster sizes, which again indicates that the use of GPU can significantly improve the speed of model training and fault diagnosis.
Fig. 10 shows the speedups achieved with Spark-GPU-QPSO-BPNN for model training. The speedup is the ratio of the model training time achieved with a single worker node to the model training time achieved with multiple worker nodes. As shown in Fig. 10, the speedups achieved with Spark-GPU-QPSO-BPNN are increased with the increase of the number of worker nodes in the cluster. On the clusters with 2, 3, and 4 worker nodes, the speedups achieved with Spark-GPU-QPSO-BPNN are
Fig. 11 presents the parallel efficiencies achieved with Spark-GPU-QPSO-BPNN for model training. The parallel efficiency is the ratio of the speedup obtained for model training to the number of worker nodes in the cluster. As shown in Fig. 11, when the numbers of worker nodes in the cluster are 2, 3, and 4, the parallel efficiencies achieved with Spark-GPU-QPSO-BPNN reach 93.99%, 91.15%, and 91.00%, respectively, which shows that the computing resources of the Spark-GPU platform are well utilized. However, the parallel efficiency decreases gradually as the number of worker nodes increases. This is because a larger cluster incurs more communication overhead and task scheduling overhead between nodes, which affects the performance of model training.
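The relation between the training-time reductions, the speedup, and the parallel efficiency can be checked with a short calculation. The sketch below uses only the percentages quoted in this subsection and the two definitions given above; the function names are illustrative, and small differences from the reported efficiencies are expected because the quoted reductions are rounded.

```python
def speedup_from_reduction(reduction_pct: float) -> float:
    """Speedup T_1 / T_n implied by a time reduction relative to a single worker node."""
    return 1.0 / (1.0 - reduction_pct / 100.0)


def parallel_efficiency(speedup: float, nodes: int) -> float:
    """Parallel efficiency: speedup divided by the number of worker nodes."""
    return speedup / nodes


# Training-time reductions of Spark-GPU-QPSO-BPNN on 2, 3, and 4 worker nodes.
reductions = {2: 46.80, 3: 63.43, 4: 72.53}
# Parallel efficiencies reported for the same cluster sizes (percent).
reported = {2: 93.99, 3: 91.15, 4: 91.00}

derived = {}
for nodes, red in reductions.items():
    s = speedup_from_reduction(red)
    derived[nodes] = 100.0 * parallel_efficiency(s, nodes)
    print(f"{nodes} nodes: speedup {s:.2f}, efficiency {derived[nodes]:.2f}% "
          f"(reported {reported[nodes]:.2f}%)")
```

The derived efficiencies agree with the reported ones to within rounding, which confirms that the two sets of figures are mutually consistent.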
E. Analysis of the Impact of QPSO on the Performance of Model Training
In order to analyze the impact of QPSO on the performance of model training, Spark-BPNN, Spark-QPSO-BPNN, Spark-GPU-BPNN, and Spark-GPU-QPSO-BPNN are used for training and testing the fault diagnosis models respectively for DataSet 3 on the cluster with 4 worker nodes.
Fig. 12 compares the model training time achieved with and without QPSO. The model training time achieved with Spark-QPSO-BPNN is 84.14% longer than that achieved with Spark-BPNN, and the model training time achieved with Spark-GPU-QPSO-BPNN is 78.29% longer than that achieved with Spark-GPU-BPNN. The main reason for the increase is that the QPSO algorithm is used to optimize the initial weights and thresholds of BPNN in Spark-QPSO-BPNN and Spark-GPU-QPSO-BPNN, which costs more computation than the random initialization of the weights and thresholds used in Spark-BPNN and Spark-GPU-BPNN. Although optimizing the initial weights and thresholds of BPNN takes considerable time, once the optimized initial weights and thresholds are obtained, Spark-QPSO-BPNN and Spark-GPU-QPSO-BPNN can converge to the global optimal weights and thresholds faster than Spark-BPNN and Spark-GPU-BPNN. Thus, although using the QPSO algorithm to optimize the initial weights and thresholds of BPNN reduces the model training efficiency, it can significantly improve the fault diagnosis accuracy. As shown in Fig. 6, the fault diagnosis accuracy achieved with Spark-GPU-QPSO-BPNN is 2.11% higher than that achieved with Spark-GPU-BPNN for DataSet 3.
F. Analysis of Combination Effect of Multiple QPSO-BPNN Models
In order to verify the effectiveness of the proposed combination strategy of multiple QPSO-BPNN models based on ensemble learning, four different classifiers are adopted for training and testing the fault diagnosis models respectively for DataSet 3, including the QPSO-BPNN model of base end (QPSO-BPNN-BA), the QPSO-BPNN model of drive end (QPSO-BPNN-DE), the QPSO-BPNN model of fan end (QPSO-BPNN-FE), and the ensemble classifier composed of the above three different QPSO-BPNN models based on ensemble learning (QPSO-BPNN-EL).
Fig. 13 presents the fault diagnosis accuracies achieved with different classifiers. Compared with QPSO-BPNN-BA, QPSO-BPNN-DE, and QPSO-BPNN-FE, the fault diagnosis accuracy achieved with QPSO-BPNN-EL is increased by 1.79%, 0.85%, and 0.67% respectively. The results show that the ensemble classifier can effectively improve the fault diagnosis accuracy.
To further demonstrate the effectiveness of the proposed combination strategy, a sample whose real state is the inner race fault but that is misdiagnosed by one basic classifier is selected. Table 6 presents the diagnosis results of the three basic classifiers and the ensemble classifier for this sample. As seen in Table 6, the diagnosis results achieved with QPSO-BPNN-BA, QPSO-BPNN-DE, and QPSO-BPNN-FE are the inner race fault, inner race fault, and ball fault, respectively, and the diagnosis result achieved with QPSO-BPNN-EL is the inner race fault. The results show that when the diagnosis results of different basic classifiers are inconsistent for a sample, the ensemble classifier can obtain the best diagnosis result according to the output results of each basic classifier.
G. Comparison With Other Intelligent Optimization Algorithms
In order to better evaluate the optimization effect of QPSO algorithm, GA [34], APSO algorithm [38], and SPSO algorithm [39] are also adopted to optimize the initial weights and thresholds of BPNN. Spark-GPU-GA-BPNN, Spark-GPU-APSO-BPNN, Spark-GPU-SPSO-BPNN, and Spark-GPU-QPSO-BPNN are respectively used to perform model training on the cluster with 4 worker nodes for DataSet 3. During the training period, the max iterations of GA, APSO, SPSO, and QPSO are set to 50, and the other key parameter settings are as follows.
GA: The size of population is set to 100, the mutation probability is set to 0.2, and the crossover probability is set to 0.5.
APSO: The number of particles is set to 100 and two acceleration constants are set to 1.4945.
SPSO: The number of particles is set to 100, two acceleration constants are set to 2, the initial weight value is set to 0.9, and the final weight value is set to 0.4.
QPSO: See Section IV-B.
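For orientation, the SPSO settings listed above can be related to the usual PSO velocity update. The sketch below assumes that the initial and final weight values (0.9 and 0.4) define an inertia weight that decreases linearly over the iterations, which is a common convention but is not stated explicitly here; the function names are hypothetical and the code is not taken from [39].

```python
import random
from typing import List


def inertia_weight(t: int, max_iter: int, w_start: float = 0.9,
                   w_end: float = 0.4) -> float:
    """Assumed linear schedule: w_start at iteration 0, w_end at the last iteration."""
    return w_start - (w_start - w_end) * t / (max_iter - 1)


def pso_velocity(v: List[float], x: List[float], pbest: List[float],
                 gbest: List[float], w: float, c1: float = 2.0,
                 c2: float = 2.0, rng: random.Random = random.Random(0)) -> List[float]:
    """Standard PSO velocity update with inertia weight w and acceleration constants c1, c2."""
    return [w * vd + c1 * rng.random() * (pb - xd) + c2 * rng.random() * (gb - xd)
            for vd, xd, pb, gb in zip(v, x, pbest, gbest)]


# Inertia weight at the first, middle, and last of 50 iterations.
schedule = [inertia_weight(t, 50) for t in (0, 25, 49)]
print(schedule)
```

Under this reading, a large inertia weight early in the run favors global exploration, and the smaller weight at the end favors local refinement around the best positions found.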
Table 7 presents the fault diagnosis accuracies and model training time of different fault diagnosis methods. The fault diagnosis accuracy achieved with Spark-GPU-QPSO-BPNN is 1.65%, 0.26%, and 0.58% higher than that achieved with Spark-GPU-GA-BPNN, Spark-GPU-APSO-BPNN, and Spark-GPU-SPSO-BPNN, respectively. This is mainly because the QPSO algorithm introduces the average best position of the quantum particle swarm, which gives it better randomness and stronger global optimization ability than the other three algorithms when optimizing the initial weights and thresholds of BPNN.
As seen in Table 7, the model training speed of Spark-GPU-QPSO-BPNN is
H. Comparison With Other Fault Diagnosis Methods
To further verify the effectiveness of the proposed rolling bearing fault diagnosis method, the following four different methods are used for model training and fault diagnosis on the cluster with 4 worker nodes: AlexNet [49] implemented with Spark-GPU (Spark-GPU-AlexNet), VGG-19 [50] implemented with Spark-GPU (Spark-GPU-VGG-19), ResNet-18 [51] implemented with Spark-GPU (Spark-GPU-ResNet-18), and the proposed Spark-GPU-QPSO-BPNN. For Spark-GPU-AlexNet, Spark-GPU-VGG-19, and Spark-GPU-ResNet-18, the original vibration data are converted into
The network structures and hyper-parameter settings of Spark-GPU-AlexNet, Spark-GPU-VGG-19, and Spark-GPU-ResNet-18 are listed in Table 8. Spark-GPU-AlexNet, Spark-GPU-VGG-19, and Spark-GPU-ResNet-18 contain 11, 25, and 23 neural network layers, respectively. The detailed network structures of AlexNet, VGG-19, and ResNet-18 can be found in [49], [50], and [51]. The batch sizes of Spark-GPU-AlexNet, Spark-GPU-VGG-19, and Spark-GPU-ResNet-18 are set to 128, 128, and 64, respectively, which achieves a better fault diagnosis accuracy within the limit of the available GPU memory. The learning rates of Spark-GPU-AlexNet, Spark-GPU-VGG-19, and Spark-GPU-ResNet-18 are set to 0.008, 0.005, and 0.003, respectively, which provides a better convergence performance and avoids fluctuations in model training. The momentum values of Spark-GPU-AlexNet, Spark-GPU-VGG-19, and Spark-GPU-ResNet-18 are all set to 0.9, which increases the convergence speed. The numbers of epochs of Spark-GPU-AlexNet, Spark-GPU-VGG-19, and Spark-GPU-ResNet-18 are all set to 50, which not only ensures a higher fault diagnosis accuracy but also prevents the model training time from becoming too long.
Table 9 presents the diagnosis accuracies, model training time, and fault diagnosis time of four different rolling bearing fault diagnosis methods based on the Spark-GPU platform. The fault diagnosis accuracy achieved with Spark-GPU-QPSO-BPNN is 1.11%, 1.16%, and 1.19% lower than that achieved with Spark-GPU-AlexNet, Spark-GPU-VGG-19, and Spark-GPU-ResNet-18, respectively. However, the model training speed of Spark-GPU-QPSO-BPNN is
Conclusion
To perform fast and accurate rolling bearing fault diagnosis in the big data environment, a rolling bearing fault diagnosis method based on parallel QPSO-BPNN under Spark-GPU platform is proposed. According to the idea of data parallelism, the distributed parallelization of QPSO-BPNN model is effectively realized on Spark-GPU platform, which significantly improves the performance of model training and fault diagnosis under large-scale rolling bearing data sets. In the distributed parallel training of QPSO-BPNN model, the master node collects the local parameters of each worker node and updates the global parameters according to the weights, and the updated global parameters are synchronized to each worker node, which effectively improves the convergence speed of the model. The combination strategy based on ensemble learning is adopted, and the output results of QPSO-BPNN models respectively corresponding to the base end, drive end, and fan end of rolling bearing are combined according to the weighted voting method to obtain the best fault diagnosis result of a sample. The effectiveness of the proposed fault diagnosis method is verified through experiments. The results illustrate that the proposed method can not only make full use of the computing resources of a Spark-GPU platform to efficiently perform model training and fault diagnosis but also obtain a higher fault diagnosis accuracy for the massive rolling bearing vibration data.
In actual industrial production environments, the sensors deployed on rolling bearings can continuously collect vibration data. When a large amount of rolling bearing vibration data is collected in real time, rapid and accurate online fault diagnosis can effectively ensure the safe operation of mechanical equipment and reduce the maintenance cost. In future work, an online fault diagnosis method of rolling bearing based on parallel QPSO-BPNN in the big data environment will be explored.