
Rolling Bearing Fault Diagnosis Method Based on Parallel QPSO-BPNN Under Spark-GPU Platform

Flowchart of parallel implementation of QPSO-BPNN model based on Spark-GPU platform.


Abstract:

Improving the training efficiency, diagnosis efficiency, and diagnosis accuracy of rolling bearing fault diagnosis models is a challenge when facing massive rolling bearing vibration data. The Spark-GPU platform provides powerful distributed parallel computing capabilities, and a back propagation neural network (BPNN) optimized by the quantum particle swarm optimization (QPSO) algorithm offers low computational complexity and high diagnosis accuracy. Building on both, this paper proposes a rolling bearing fault diagnosis method based on parallel QPSO-BPNN under the Spark-GPU platform. First, the QPSO-BPNN model is parallelized in a distributed manner on the Spark-GPU platform, which improves the training efficiency and diagnosis efficiency of the fault diagnosis model in the big data environment. Second, to improve the convergence speed of the fault diagnosis model, a parameter update strategy suited to the distributed parallel training of QPSO-BPNN is designed: at each training iteration, the local parameters of each worker node are collected at the master node, and the global parameters are updated according to per-model weights and synchronized back to each worker node. Third, an ensemble-learning strategy for combining multiple QPSO-BPNN models is proposed, in which the outputs of different QPSO-BPNN models are combined by weighted voting to obtain the final fault diagnosis result for each sample; this improves the fault diagnosis accuracy to a certain extent. Experimental results show that the proposed method performs model training and fault diagnosis quickly on large-scale rolling bearing vibration data, and its fault diagnosis accuracy reaches 98.73%.


Published in: IEEE Access ( Volume: 9)

Page(s): 56786 - 56801

Date of Publication: 12 April 2021

Electronic ISSN: 2169-3536

DOI: 10.1109/ACCESS.2021.3072596


CC BY: IEEE is not the copyright holder of this material. This article is distributed under the Creative Commons Attribution 4.0 license (https://creativecommons.org/licenses/by/4.0/).

SECTION I.

Introduction

Rolling bearings are key components of mechanical equipment, and their fault diagnosis is essential to ensure the long-term, efficient, and stable operation of that equipment [1]. Traditional rolling bearing fault diagnosis methods based on signal processing have been widely used, such as enhanced singular spectrum decomposition [2], frequency phase space empirical wavelet transform [3], adaptive generalized demodulation [4], high-order synchrosqueezing transform [5], recycling variational mode decomposition [6], and resonance-based sparse signal decomposition [7]. These methods can effectively diagnose rolling bearing faults when the time-frequency features of the vibration signals are distinct. However, they are limited when dealing with complex, noisy vibration signals whose fault features are not obvious.

In recent years, with the rapid development of machine learning and deep learning, a growing number of data-driven rolling bearing fault diagnosis methods have been proposed, such as the naive Bayes algorithm [8], least squares support vector machine [9], iterative random forest [10], BP neural network [11], one-dimensional convolutional neural network [12], two-dimensional convolutional neural network [13], LSTM recurrent neural network [14], deep belief network [15], generative adversarial network [16], deep residual network [17], and transfer learning [18]. These studies mainly focus on improving the accuracy, generalization, noise robustness, and adaptability of rolling bearing fault diagnosis models. They provide effective ways to mine the underlying fault features of complex rolling bearing vibration signals and thus establish an effective mapping between those signals and the outputs of the diagnosis model. However, fault diagnosis models based on machine learning and deep learning are generally complex and take a long time to train; in the big data environment in particular, using massive numbers of training samples incurs a huge computational cost. These studies seldom consider how to improve the training efficiency and diagnosis efficiency of rolling bearing fault diagnosis models in the big data environment.

With the rapid development of big data technology, many scholars have studied fault diagnosis in industrial big data scenarios [19]–[25]. For example, some studies combine big data technology with data-driven fault diagnosis methods to diagnose faults in mobile robots [21], sulfur hexafluoride electrical equipment [22], power grid equipment [23], wind turbine gearboxes [24], and reciprocating air compressors [25]. Most of these studies use MapReduce [26] or Spark [27] to parallelize the fault diagnosis models and thereby improve the performance of industrial equipment fault diagnosis in the big data environment. Compared with MapReduce, Spark introduces the resilient distributed dataset (RDD) and an efficient directed acyclic graph (DAG) execution engine, which give it a faster processing speed; it is therefore better suited to efficient fault diagnosis in the big data environment.

Because many-core GPUs offer high performance, low power consumption, and low cost, recent work has explored combining Spark and GPUs to accelerate domain-specific applications, such as urban traffic vehicle recognition [28], magnetic resonance imaging [29], and remote sensing image processing [30]. The experimental results in [28]–[30] show that combining Spark and GPUs can significantly improve the performance of these applications, but their implementations are complicated and difficult to port to other fields. The newly released Spark 3.0 supports accelerator-aware scheduling, which allows users to discover and request GPU resources at the executor, driver, and task levels and thus simplifies the development of applications based on Spark and GPUs.
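To make accelerator-aware scheduling concrete, the sketch below lists the Spark 3.x properties that request GPUs for executors and tasks. The property names are Spark configuration keys, but the values, the discovery-script path, and the helper `task_slots_per_executor` are illustrative assumptions, not the configuration used in this work. The helper reflects our reading of the Spark documentation: an executor runs as many concurrent tasks as its scarcest resource allows, and a fractional per-task GPU amount lets `floor(1/amount)` tasks share one GPU.

```python
from fractions import Fraction
from math import floor

# Spark 3.x resource-scheduling keys; the values are illustrative only.
conf = {
    "spark.executor.cores": "8",
    "spark.task.cpus": "1",
    "spark.executor.resource.gpu.amount": "1",
    "spark.task.resource.gpu.amount": "0.25",
    # Hypothetical path to a script that prints the executor's GPU addresses.
    "spark.executor.resource.gpu.discoveryScript": "/opt/spark/getGpusResources.sh",
}


def task_slots_per_executor(conf: dict) -> int:
    """Estimate concurrent tasks per executor: the scarcest resource wins."""
    cpu_slots = int(conf["spark.executor.cores"]) // int(conf["spark.task.cpus"])
    exec_gpus = Fraction(conf["spark.executor.resource.gpu.amount"])
    task_gpu = Fraction(conf["spark.task.resource.gpu.amount"])
    if task_gpu < 1:
        # Fractional requests share a GPU; slots per GPU are floored.
        gpu_slots = int(exec_gpus) * floor(1 / task_gpu)
    else:
        gpu_slots = floor(exec_gpus / task_gpu)
    return min(cpu_slots, gpu_slots)


print(task_slots_per_executor(conf))  # 4
```

With 8 cores but one GPU split four ways, the GPU is the limiting resource, so at most four tasks run concurrently on the executor.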

The authors’ previous work [31] proposed a rolling bearing fault diagnosis method based on QPSO-BPNN and Dempster-Shafer evidence theory, which can effectively and accurately diagnose different types of rolling bearing faults under different working conditions. With the expansion of industrial production and the growing complexity of mechanical equipment, the rolling bearing vibration data collected in real time by multiple sensors grow rapidly in actual production environments. The serial QPSO-BPNN proposed in the previous work cannot efficiently perform model training and fault diagnosis on such large-scale vibration data. For massive rolling bearing vibration data, improving the training efficiency and diagnosis efficiency of the fault diagnosis model is therefore an urgent problem.

Another previous work by the authors [32] proposed a rolling bearing fault diagnosis method based on Spark and the ACO-K-Means clustering algorithm. There, ACO-K-Means was parallelized on the Spark platform to cluster massive rolling bearing vibration data efficiently, and the method improved model training efficiency and fault diagnosis efficiency by fully utilizing all available CPU and memory resources of a Spark cluster. Compared with the Spark platform, the Spark-GPU platform has stronger distributed parallel computing capability and is therefore more helpful for improving model training and fault diagnosis efficiency. Compared with ACO-K-Means clustering, QPSO-BPNN, with its strong nonlinear mapping ability and high self-learning and adaptive abilities, can obtain higher and more stable fault diagnosis accuracy.

Therefore, this paper proposes a rolling bearing fault diagnosis method based on parallel QPSO-BPNN under the Spark-GPU platform. It aims to fully exploit the distributed parallel computing capabilities of the Spark-GPU platform and the low computational complexity and high diagnosis accuracy of QPSO-BPNN to achieve more efficient and accurate rolling bearing fault diagnosis in the big data environment. Implementing QPSO-BPNN efficiently on a Spark-GPU platform is nevertheless challenging, and this work focuses on how to use parallel QPSO-BPNN on Spark-GPU platforms to efficiently perform model training and fault diagnosis on large-scale rolling bearing vibration data.

The main contributions of this paper are summarized as follows.

The distributed parallelization of the QPSO-BPNN model on the Spark-GPU platform is realized, which significantly improves the training efficiency and diagnosis efficiency of the QPSO-BPNN-based rolling bearing fault diagnosis model in the big data environment.
A parameter update strategy suited to the distributed parallel training of the QPSO-BPNN model is proposed. At each training iteration, the local parameters of each worker node are collected at the master node, and the global parameters are updated according to per-model weights and synchronized to each worker node, which improves the convergence speed of the fault diagnosis model in the distributed parallel environment.
An ensemble-learning strategy for combining multiple QPSO-BPNN models is proposed. The outputs of the QPSO-BPNN models corresponding to the base end, drive end, and fan end of the rolling bearing are combined by weighted voting to obtain the final fault diagnosis result for each sample, which improves the fault diagnosis accuracy to a certain extent.
The effectiveness of the proposed method is verified by extensive experiments. The results show that the method not only makes full use of the computing resources of the Spark-GPU platform to quickly perform model training and fault diagnosis on massive rolling bearing vibration data, but also achieves higher fault diagnosis accuracy.

The rest of this paper is organized as follows. The QPSO-BPNN model is introduced in Section II. The rolling bearing fault diagnosis method based on parallel QPSO-BPNN under Spark-GPU platform is described in Section III. The experimental results and analysis are presented in Section IV. The conclusions and future work are given in Section V.

SECTION II.

The Previous QPSO-BPNN Model

BPNN [33] is a classic multilayer feedforward neural network characterized by the forward propagation of signals and the backward propagation of errors. Because of its simple network structure and strong nonlinear mapping ability, it is widely used for classification problems. BPNN training consists of two stages: signal propagation, and the update of weights and thresholds. In the first stage, the signals are propagated from the input layer through the hidden layers to the output layer, each neuron transforming its weighted inputs with its activation function, and the output results are obtained. In the second stage, the errors between the outputs and the targets are calculated and propagated backward, and the weights and thresholds of the neurons in each layer are corrected according to these errors. The two stages are executed iteratively to complete BPNN training. However, BPNN converges slowly and easily becomes trapped in local minima, mainly because it initializes the weights and thresholds of each neuron randomly. Consequently, some researchers have recently adopted intelligent optimization algorithms to improve the initialization of BPNN weights and thresholds, such as the genetic algorithm (GA) [34], differential evolution (DE) algorithm [35], and particle swarm optimization (PSO) algorithm [36].
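The two training stages can be illustrated with a deliberately tiny BPNN in plain Python. This is a minimal sketch, not the authors' implementation: it assumes a single hidden layer, sigmoid activations, a squared-error loss, per-sample gradient descent, and thresholds modelled as additive biases; the class and function names are hypothetical. The random initialization in `__init__` is the step that QPSO is later used to replace.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class TinyBPNN:
    """One-hidden-layer BPNN; each neuron's threshold is modelled as an additive bias."""

    def __init__(self, n_in: int, n_hidden: int, n_out: int, seed: int = 0):
        rng = random.Random(seed)

        def rand() -> float:
            # Random initialization: the weakness that QPSO is meant to fix.
            return rng.uniform(-0.5, 0.5)

        self.w1 = [[rand() for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [rand() for _ in range(n_hidden)]
        self.w2 = [[rand() for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [rand() for _ in range(n_out)]

    def forward(self, x):
        """Stage 1: propagate the signal input -> hidden -> output."""
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(self.w1, self.b1)]
        y = [sigmoid(sum(w * hj for w, hj in zip(row, h)) + b) for row, b in zip(self.w2, self.b2)]
        return h, y

    def train_step(self, x, target, lr: float = 0.5) -> float:
        """Stage 2: back-propagate the error and correct weights and thresholds."""
        h, y = self.forward(x)
        # Output deltas for squared error with sigmoid activations.
        d_out = [(yk - tk) * yk * (1 - yk) for yk, tk in zip(y, target)]
        # Hidden deltas use the output weights before they are corrected.
        d_hid = [hj * (1 - hj) * sum(dk * self.w2[k][j] for k, dk in enumerate(d_out))
                 for j, hj in enumerate(h)]
        for k, dk in enumerate(d_out):
            self.w2[k] = [w - lr * dk * hj for w, hj in zip(self.w2[k], h)]
            self.b2[k] -= lr * dk
        for j, dj in enumerate(d_hid):
            self.w1[j] = [w - lr * dj * xi for w, xi in zip(self.w1[j], x)]
            self.b1[j] -= lr * dj
        return 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, target))


def total_error(net, data) -> float:
    return sum(0.5 * sum((yk - tk) ** 2 for yk, tk in zip(net.forward(x)[1], t)) for x, t in data)


# Usage: learn logical OR, a linearly separable toy problem.
data = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [1])]
net = TinyBPNN(2, 3, 1)
before = total_error(net, data)
for _ in range(1000):  # the two stages are repeated iteratively
    for x, t in data:
        net.train_step(x, t)
after = total_error(net, data)
```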

The classic PSO algorithm [37] is a stochastic swarm intelligence search algorithm that iteratively searches the solution space for the optimal solution, guided by the best particle found so far. However, because PSO easily falls into local optima, several improved PSO algorithms have been developed, such as the adaptive particle swarm optimization (APSO) algorithm [38], selective particle swarm optimization (SPSO) algorithm [39], and QPSO algorithm [40]. APSO uses a nonlinear function to dynamically adjust the inertia weight and the contribution of each particle to avoid falling into local optima. SPSO changes the search space from a real-valued space into a set of selected values, which reduces the cost of computing fitness values. QPSO improves on PSO in two ways. First, the position of each particle is updated according to the average best position of the quantum particle swarm, and particle velocity is no longer considered; this increases the randomness of particle movement and helps avoid local optima. Second, only the shrinkage factor that controls the position update needs to be tuned, which simplifies performance tuning and enhances the global convergence ability. Compared with APSO and SPSO, QPSO can converge faster with less computational cost, and because of its stronger global convergence ability it is more likely to obtain the optimal initial weights and thresholds of BPNN. Therefore, QPSO is better suited to optimizing the initial weights and thresholds of BPNN.

The authors’ previous work [31] adopted the QPSO algorithm to optimize the initial weights and thresholds of BPNN, as shown in Fig. 1. First, random initial weights and thresholds of BPNN are obtained and the quantum particle swarm is initialized, including the number of particles, the dimension of the particles, and the initial position of each particle (i.e., the initial weights and thresholds of BPNN). Second, the position of each particle is used as the weights and thresholds of BPNN to train the BPNN model once. Third, the fitness value and best position of each particle are calculated, where the fitness value of a particle is the training error of the BPNN model. Fourth, the global best fitness value and global best position of the quantum particle swarm are calculated. Fifth, the average best position of the quantum particle swarm is calculated. Sixth, the position of each particle is updated according to its best position, the global best position and average best position of the swarm, and the shrinkage factor. Finally, if the maximum number of iterations is reached or a satisfactory solution is obtained, the optimal initial weights and thresholds of BPNN are returned; otherwise, the next iteration is executed.
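The loop in Fig. 1 can be sketched as a serial QPSO in plain Python. This is an illustrative version under stated assumptions: a generic `fitness` callable stands in for "train BPNN once and return its error", a sphere function replaces the BPNN so the example runs on its own, personal bests are tracked in the standard way (against the best fitness seen so far), and `phi`, the particle count, and the initialization range are arbitrary. Particles start at random positions here because, in the QPSO position update (Eq. (3) in Section III-C), a particle whose position, personal best, global best, and average best all coincide receives a zero step and stays where it is.

```python
import math
import random


def qpso_minimize(fitness, dim, n_particles=30, max_iter=300, err_goal=1e-12,
                  phi=0.75, init_range=(-1.0, 1.0), seed=0):
    """Serial QPSO following the steps of Fig. 1; returns (best position, best fitness)."""
    rng = random.Random(seed)
    lo, hi = init_range
    # Step 1: initialize the swarm (random positions are an assumption of this sketch).
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [math.inf] * n_particles
    gbest, gbest_f = pos[0][:], math.inf
    for _ in range(max_iter):
        # Steps 2-3: evaluate each particle (for QPSO-BPNN: train once, fitness = error).
        for x, p in enumerate(pos):
            f = fitness(p)
            if f < pbest_f[x]:
                pbest[x], pbest_f[x] = p[:], f
        # Step 4: global best fitness value and global best position.
        g = min(range(n_particles), key=pbest_f.__getitem__)
        if pbest_f[g] < gbest_f:
            gbest, gbest_f = pbest[g][:], pbest_f[g]
        # Termination: satisfactory solution reached.
        if gbest_f < err_goal:
            break
        # Step 5: average best position of the swarm.
        mbest = [sum(p[d] for p in pbest) / n_particles for d in range(dim)]
        # Step 6: move every particle around its local attractor.
        for p, pb in zip(pos, pbest):
            for d in range(dim):
                a = rng.random()
                b = 1.0 - rng.random()  # in (0, 1], keeps ln(1/b) finite
                attractor = a * pb[d] + (1 - a) * gbest[d]
                step = phi * abs(mbest[d] - p[d]) * math.log(1 / b)
                p[d] = attractor + step if rng.random() < 0.5 else attractor - step
    return gbest, gbest_f


def sphere(v):
    return sum(t * t for t in v)


best, best_f = qpso_minimize(sphere, dim=3)
```

For QPSO-BPNN, `fitness` would unpack the particle into BPNN weights and thresholds, train once, and return the training error.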

FIGURE 1.

Process of optimizing the initial weights and thresholds of BPNN with QPSO algorithm.

SECTION III.

The Proposed Fault Diagnosis Method of Rolling Bearing

A. Process of Rolling Bearing Fault Diagnosis Based on Parallel QPSO-BPNN

The overall process of rolling bearing fault diagnosis based on parallel QPSO-BPNN under Spark-GPU platform is shown in Fig. 2, which includes the following four stages: data preprocessing, data storage, model training and testing, and fault diagnosis.

FIGURE 2.

Process of rolling bearing fault diagnosis based on parallel QPSO-BPNN under Spark-GPU platform.

In the data preprocessing stage, abnormal data are first removed from the original vibration signals collected by the sensors deployed at the base end, drive end, and fan end of the rolling bearing. Second, the cleaned data are divided into samples. Third, each sample is standardized. Finally, wavelet packet decomposition is performed on each sample to obtain the eigenvectors (feature vectors) of the different running states of the rolling bearing.
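A minimal sketch of the segmentation, standardization, and feature-extraction chain follows. It assumes details this section does not specify: fixed-length non-overlapping segmentation, z-score standardization, a 3-level Haar wavelet packet decomposition (2^3 = 8 terminal nodes, matching the 8 input nodes used in Section III-C), and relative node energies as the eigenvector. The function names are hypothetical and abnormal-data removal is omitted. Because the orthonormal Haar split preserves energy, the relative energies sum to one.

```python
import math


def segment(signal, length):
    """Split a cleaned signal into non-overlapping samples of `length` points."""
    return [signal[i:i + length] for i in range(0, len(signal) - length + 1, length)]


def standardize(x):
    """Z-score standardization (population standard deviation)."""
    mu = sum(x) / len(x)
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x)) or 1.0
    return [(v - mu) / sd for v in x]


def haar_split(x):
    """One orthonormal Haar step: (approximation, detail) halves."""
    s = 1 / math.sqrt(2)
    a = [(x[i] + x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    d = [(x[i] - x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    return a, d


def wpd_energy_features(x, level=3):
    """Relative energies of the 2**level terminal wavelet-packet nodes (natural order)."""
    nodes = [x]
    for _ in range(level):
        nodes = [half for node in nodes for half in haar_split(node)]
    energies = [sum(v * v for v in node) for node in nodes]
    total = sum(energies) or 1.0
    return [e / total for e in energies]


# Usage on a synthetic signal: four samples of 1024 points, each giving an 8-D eigenvector.
signal = [math.sin(0.3 * t) + 0.2 * math.sin(2.5 * t) for t in range(4096)]
samples = [standardize(s) for s in segment(signal, 1024)]
features = [wpd_energy_features(s) for s in samples]
```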

In the data storage stage, all eigenvectors are stored in the Hadoop Distributed File System (HDFS), and the eigenvectors used for model training and testing are divided into a training set and a test set.

In the model training and testing stage, first, the network structures and training parameters of the three QPSO-BPNN models corresponding to the base end, drive end, and fan end are determined. Second, the training samples of the base end, drive end, and fan end are read from HDFS as the inputs of the respective models, and the three models are trained in a distributed parallel manner on the Spark-GPU platform. Finally, the three models are tested with the test set from HDFS.

In the fault diagnosis stage, first, the data to be diagnosed are read from HDFS. Second, each of the three trained QPSO-BPNN models diagnoses these data on the Spark-GPU platform. Finally, the weighted voting method is adopted to combine the outputs of the three models into the final fault diagnosis results.
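The combination step can be sketched as weighted voting over predicted class labels. How the voting weights are derived is not specified in this subsection, so the weights below are hypothetical placeholders (for example, they could reflect each model's test accuracy), and labels are assumed to be integer indices of the four running states; the function name is also hypothetical.

```python
from collections import defaultdict


def weighted_vote(labels, weights):
    """Combine the class labels predicted by several models by weighted voting."""
    score = defaultdict(float)
    for label, w in zip(labels, weights):
        score[label] += w
    # The label with the largest accumulated weight wins; ties go to the first label seen.
    return max(score, key=score.get)


# Hypothetical weights for the base-end, drive-end, and fan-end models.
weights = [0.30, 0.40, 0.30]
print(weighted_vote([1, 2, 1], weights))  # 1
```

Here the base-end and fan-end models agree on class 1 with a combined weight of 0.6, which outweighs the drive-end vote of 0.4 for class 2.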

B. Parallel Design of QPSO-BPNN Model Based on Spark-GPU Platform

1) Overall Distributed Parallel Design Scheme

Following the idea of data parallelism, the overall distributed parallel design scheme of the QPSO-BPNN model on the Spark-GPU platform is proposed, as shown in Fig. 3.

FIGURE 3.

Overall distributed parallel design scheme of QPSO-BPNN model based on Spark-GPU platform.

On a Spark-GPU platform, the master node is responsible for task scheduling and resource management for the entire cluster, and each worker node can use one or more Spark executors to train one or more QPSO-BPNN models. Each Spark executor can use the CUDA-based RAPIDS library [41] to access GPU computing resources and accelerate QPSO-BPNN training. When the QPSO-BPNN training program starts on the Spark-GPU platform, a SparkContext object is first initialized on the master node; next, the rolling bearing training set is read from HDFS to create an RDD; finally, each Spark executor reads the data of one RDD partition to train a QPSO-BPNN model. The distributed parallel training of the QPSO-BPNN model on the Spark-GPU platform consists of the following two stages.

In the first stage, the QPSO algorithm is executed in parallel to optimize the initial weights and thresholds of BPNN. In each QPSO iteration, the fitness value and best position of each particle are first calculated. Because the computational tasks of different particles are independent, they can be distributed across the Spark executors and performed in parallel. Next, the fitness values and best positions of all particles are collected to calculate the global best fitness value, global best position, and average best position of the quantum particle swarm. Finally, the position of each particle is updated according to its best position, the global best position and average best position of the swarm, and the shrinkage factor. Because the updates of different particles are independent, all positions can be updated in parallel.
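The map/collect structure of one QPSO iteration can be sketched locally. Here a `ThreadPoolExecutor` is only a stand-in for Spark executors (in Spark the same pattern would be a transformation over particle partitions followed by a collect on the driver), and for pure-Python fitness functions the threads illustrate the data flow rather than a speed-up. The function names are hypothetical, and the usage shows the first iteration, where each particle's best position is its current position.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_swarm(positions, fitness, workers=4):
    """Worker side: particles are independent, so their fitness values are computed in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fitness, positions))


def master_reduce(pbest, pbest_f):
    """Master side: global best fitness, global best position, and average best position."""
    g = min(range(len(pbest_f)), key=pbest_f.__getitem__)
    n, dim = len(pbest), len(pbest[0])
    mbest = [sum(p[d] for p in pbest) / n for d in range(dim)]
    return pbest_f[g], pbest[g], mbest


positions = [[1.0, 2.0], [0.0, 1.0], [3.0, 0.0]]
fits = evaluate_swarm(positions, lambda p: sum(v * v for v in p))
f_best, p_best, m_best = master_reduce(positions, fits)
```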

In the second stage, the QPSO-BPNN model is trained in parallel. After the optimal initial weights and thresholds of BPNN are obtained, a data-parallel strategy is adopted: the large-scale training set is divided into $n$ smaller training subsets, each of which trains one QPSO-BPNN model. Because the models are trained independently, multiple Spark executors can perform the training tasks in parallel. In each training iteration, each Spark executor first replaces the weights and thresholds of its current model with the new global weights and thresholds; next, it trains the model once on its own training subset to obtain new weights and thresholds; finally, the weights and thresholds of all models are collected to update the global weights and thresholds.
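The data-parallel split can be illustrated with plain lists standing in for RDD partitions. The sketch assumes, as the index formulas in Section III-C do, that the number of subsets $n$ divides the number of eigenvectors $m$, and pairs each key (a particle position or a model's weights and thresholds) with every eigenvector of its subset; the helper names are hypothetical.

```python
def partition_indices(m, n):
    """1-based eigenvector indices of partition x: (x-1)m/n + 1, ..., xm/n."""
    if m % n:
        raise ValueError("this sketch assumes n divides m evenly")
    size = m // n
    return [list(range((x - 1) * size + 1, x * size + 1)) for x in range(1, n + 1)]


def key_value_partitions(keys, eigenvectors):
    """Pair key x (a particle position or model parameters) with every vector of partition x."""
    parts = partition_indices(len(eigenvectors), len(keys))
    return [[(keys[x], eigenvectors[i - 1]) for i in part] for x, part in enumerate(parts)]


print(partition_indices(6, 3))  # [[1, 2], [3, 4], [5, 6]]
```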

Both the fitness calculation of each particle in the first stage and the training of each QPSO-BPNN model in the second stage involve a large number of matrix operations, so GPUs are well suited to accelerating the training of the entire model.
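Before those matrix operations can run, a flat parameter vector (a particle position, $G$, or a model's $g_{x}$) has to be reshaped into per-layer weight matrices and threshold vectors. The sketch below does this for the 8-20-12-4 structure used in Section III-C, which gives 484 parameters. The row-major, weights-then-thresholds layout and the function names are assumptions, since the memory layout is not specified.

```python
def param_count(layers):
    """Weights plus thresholds of a fully connected net, e.g. [8, 20, 12, 4]."""
    return sum(a * b + b for a, b in zip(layers, layers[1:]))


def unpack(vector, layers):
    """Split a flat particle / parameter vector into per-layer (W, theta) pairs."""
    if len(vector) != param_count(layers):
        raise ValueError("vector length does not match the network structure")
    out, k = [], 0
    for a, b in zip(layers, layers[1:]):
        W = [vector[k + r * a:k + (r + 1) * a] for r in range(b)]  # b rows of a inputs
        k += a * b
        theta = vector[k:k + b]
        k += b
        out.append((W, theta))
    return out


def pack(params):
    """Inverse of unpack: flatten (W, theta) pairs back into one vector."""
    flat = []
    for W, theta in params:
        for row in W:
            flat.extend(row)
        flat.extend(theta)
    return flat


print(param_count([8, 20, 12, 4]))  # 484
```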

2) Design of Parameter Update Strategy

In the distributed parallel training of the QPSO-BPNN model, following the parameter server architecture [42], the master node acts as the parameter server: it collects the weights and thresholds of the QPSO-BPNN models on each worker node of the Spark-GPU platform, updates the global weights and thresholds according to the weight of each model, and synchronizes the updated global weights and thresholds to each worker node.

In the $t$-th training iteration of the QPSO-BPNN model, the parameter update consists of the following steps.

Step 1.
On the $k$ worker nodes, the $n$ QPSO-BPNN models corresponding to the $n$ training subsets are trained in parallel from the current weights and thresholds $(g_{1}^{t},g_{2}^{t},\ldots,g_{n}^{t})$, and the resulting new weights and thresholds $(g_{1.temp}^{t},g_{2.temp}^{t},\ldots,g_{n.temp}^{t})$ and losses $(loss_{1}^{t},loss_{2}^{t},\ldots,loss_{n}^{t})$ of the $n$ models are sent to the parameter server node, where $g_{i}^{t}$ represents the weights and thresholds used to train the $i$-th QPSO-BPNN model and $1\leq i\leq n$.
Step 2.
On the parameter server node, firstly, the losses of all QPSO-BPNN models are normalized by Min-Max normalization method. Secondly, according to the normalized losses $(l_{1}^{t},l_{2}^{t},\ldots,l_{n}^{t})$ , the weight of each model in the global parameter update is calculated by
$\begin{equation*} \eta_{i}=\frac{1-l_{i}^{t}}{\sum_{j=1}^{n}\left(1-l_{j}^{t}\right)},\tag{1}\end{equation*}$ where $\eta_{i}$ represents the weight of the $i$-th QPSO-BPNN model and $l_{i}^{t}$ represents the normalized loss obtained after the $t$-th training iteration of the $i$-th QPSO-BPNN model. Thirdly, according to the weights $\eta_{i}$, the global weights and thresholds are updated by $\begin{equation*} G^{t+1}=G^{t}+\sum_{i=1}^{n}\eta_{i}\left(g_{i.temp}^{t}-G^{t}\right),\tag{2}\end{equation*}$ where $G^{t}$ denotes the current global weights and thresholds.
Step 3.
The parameter server node broadcasts the new global weights and thresholds $G^{t+1}$ to all worker nodes, and the weights and thresholds of all QPSO-BPNN models on each worker node are updated synchronously, i.e., $g_{1}^{t+1}=g_{2}^{t+1}=\cdots =g_{n}^{t+1}=G^{t+1}$ .
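Equations (1) and (2) can be sketched element-wise in plain Python. Two consequences follow directly from the definitions: because the weights $\eta_i$ sum to one, (2) simplifies to $G^{t+1}=\sum_{i=1}^{n}\eta_{i}g_{i.temp}^{t}$, a loss-weighted average of the local parameters; and when the losses are not all equal, Min-Max normalization gives the model with the largest loss $l_i^t=1$ and therefore zero weight. The sketch treats each parameter set as a flat list; the function names, and the uniform-weight handling of the degenerate case where all losses are equal (where Min-Max normalization is undefined), are assumptions.

```python
def minmax(values):
    """Min-Max normalization; equal values map to 0 (an assumption, giving uniform weights)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def model_weights(losses):
    """Eq. (1): eta_i = (1 - l_i) / sum_j (1 - l_j) over the normalized losses l."""
    l = minmax(losses)
    denom = sum(1.0 - lj for lj in l)  # at least 1, since the smallest loss maps to 0
    return [(1.0 - li) / denom for li in l]


def update_global(G, local_params, losses):
    """Eq. (2): G^{t+1} = G^t + sum_i eta_i * (g_i.temp - G^t), element-wise."""
    eta = model_weights(losses)
    return [Gk + sum(e * (g[k] - Gk) for e, g in zip(eta, local_params))
            for k, Gk in enumerate(G)]


# Three workers' local parameters and losses; the worst model (loss 0.3) gets zero weight.
G_next = update_global([0.0, 0.0], [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], [0.1, 0.2, 0.3])
```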

C. Parallel Implementation of QPSO-BPNN Model Based on Spark-GPU Platform

Based on the distributed parallel design scheme above, this subsection describes the parallel implementation of the QPSO-BPNN model on the Spark-GPU platform. Its flowchart is shown in Fig. 4 and its pseudo-code is given in Algorithm 1; the implementation consists of the following two stages.

FIGURE 4.

Flowchart of parallel implementation of QPSO-BPNN model based on Spark-GPU platform.

Algorithm 1 Parallel Implementation of QPSO-BPNN Model Based on Spark-GPU Platform

Require: $m$ eigenvectors; the number of particles $n$; the maximum iterations maxIterQPSO and error goal errGoal of QPSO; the maximum iterations maxIterBPNN of BPNN
Ensure: The trained QPSO-BPNN model

Initialize the weights and thresholds $G^{1}$ of BPNN and the quantum particle swarm on the master node;
Broadcast $G^{1}$ and the initial parameters of the quantum particle swarm to each worker node;
Read the training set with $m$ eigenvectors from HDFS to create an RDD in parallel: tRDD = $(E_{1},E_{2},\ldots,E_{m})$;
for $i\gets 1$ to maxIterQPSO do
    for all Spark executors on GPUs in parallel do
        if $i = 1$ then
            Initialize the positions: $(P_{1}^{1},P_{2}^{1},\ldots,P_{n}^{1}) \gets G^{1}$;
        else
            Update the positions $(P_{1}^{i},P_{2}^{i},\ldots,P_{n}^{i})$ by (3);
        end if
        Create or update the key-value pair RDD: pRDD = $(\langle P_{1}^{i},E_{1}\rangle,\langle P_{1}^{i},E_{2}\rangle,\ldots,\langle P_{n}^{i},E_{m}\rangle)$;
        Calculate the fitness values of all particles: $(f_{1}^{i},f_{2}^{i},\ldots,f_{n}^{i}) \gets$ BPNN($(P_{1}^{i},P_{2}^{i},\ldots,P_{n}^{i})$, pRDD);
        Calculate the best positions of all particles: $(P_{1.best}^{i},P_{2.best}^{i},\ldots,P_{n.best}^{i}) \gets$ cmp($(f_{1}^{i},f_{2}^{i},\ldots,f_{n}^{i})$, $(f_{1}^{i-1},f_{2}^{i-1},\ldots,f_{n}^{i-1})$);
    end for
    Calculate the global best fitness value on the master node: $f_{best}^{i} \gets \min(f_{1}^{i},f_{2}^{i},\ldots,f_{n}^{i})$;
    Calculate the global best position on the master node: $P_{best}^{i} \gets P_{x^{*}.best}^{i}$, where $x^{*}=\arg\min_{x\in\{1,2,\ldots,n\}}f_{x}^{i}$;
    if $i\geq 2$ and $f_{best}^{i-1}\leq f_{best}^{i}$ then
        $P_{best}^{i} \gets P_{best}^{i-1}$, $f_{best}^{i} \gets f_{best}^{i-1}$;
    end if
    Calculate the average best position on the master node: $M_{best}^{i} \gets \sum_{x=1}^{n}P_{x.best}^{i}/n$;
    Broadcast $P_{best}^{i}$ and $M_{best}^{i}$ to all worker nodes;
    if $f_{best}^{i} <$ errGoal then break; end if
end for
Get the best initial weights and thresholds: $G^{1} \gets P_{best}^{i}$;
for $i\gets 1$ to maxIterBPNN do
    for all Spark executors on GPUs in parallel do
        Initialize or update the weights and thresholds of the $n$ QPSO-BPNN models: $(g_{1}^{i},g_{2}^{i},\ldots,g_{n}^{i}) \gets G^{i}$;
        Create or update the key-value pair RDD: gRDD = $(\langle g_{1}^{i},E_{1}\rangle,\langle g_{1}^{i},E_{2}\rangle,\ldots,\langle g_{n}^{i},E_{m}\rangle)$;
        Train the $n$ QPSO-BPNN models: $(g_{1.temp}^{i},g_{2.temp}^{i},\ldots,g_{n.temp}^{i}) \gets$ QPSO_BPNN($(g_{1}^{i},g_{2}^{i},\ldots,g_{n}^{i})$, gRDD);
    end for
    Update the global weights and thresholds $G^{i+1}$ by (1) and (2) on the master node;
    Broadcast $G^{i+1}$ to all worker nodes;
end for

The first stage is the parallel implementation of QPSO algorithm for optimizing the initial weights and thresholds of BPNN based on Spark-GPU platform, including the following steps.

Step 1.
Initialize BPNN and the quantum particle swarm on the master node. First, the network structure of BPNN is determined. The number of input layer nodes is set to 8, the dimension of an eigenvector. The number of hidden layers is set to 2, with 20 and 12 nodes in the first and second hidden layers, respectively, determined using the Hoeffding inequality and [43]. The number of output layer nodes is set to 4, the number of rolling bearing running-state classes. Second, the weights and thresholds $G^{1}$ of BPNN are randomly initialized, and the particle number $n$ and the particle dimension of the quantum particle swarm are initialized. The particle dimension equals the total number of weights and thresholds connecting the input layer, hidden layers, and output layer of BPNN, i.e., $(8\times 20+20)+(20\times 12+12)+(12\times 4+4)=484$. Finally, the random initial weights and thresholds of BPNN and the initial parameters of the quantum particle swarm are broadcast to all worker nodes.
Step 2.
Read the training set from HDFS and use multiple Spark executors to call GPU computing resources to create an RDD tRDD in parallel, where each element of tRDD is an eigenvector. tRDD can be equally divided into $n$ RDD partitions according to the particle number $n$ , if tRDD contains $m$ eigenvectors, then the $x$ -th partition of tRDD can be denoted by $(E_{(x-1)m/n+1},E_{(x-1)m/n+2},\ldots,E_{xm/n})$ , where $1\leq x\leq n$ .
Step 3.
Use multiple Spark executors to call GPU computing resources to initialize the positions of all particles and create a new RDD in parallel. Firstly, $G^{1}$ is adopted to initialize the positions $(P_{1}^{1},P_{2}^{1},\ldots,P_{n}^{1})$ of all particles, i.e., $P_{1}^{1}=P_{2}^{1}=\cdots =P_{n}^{1}=G^{1}$ . Secondly, a key-value pair RDD pRDD is constructed by taking the position of each particle as a key and each eigenvector of tRDD as a value, and the $x$ -th partition of pRDD can be represented by $(\langle P_{x}^{1},E_{(x-1)m/n+1}\rangle,\langle P_{x}^{1},E_{(x-1)m/n+2}\rangle,\ldots,\langle P_{x}^{1},E_{xm/n}\rangle)$ , where $1\leq x\leq n$ .
Step 4.
Use multiple Spark executors to call GPU computing resources to calculate the fitness values and best positions of all particles in parallel. Firstly, the data of each partition in pRDD is used to train a BPNN model respectively, and each model is trained once to obtain the fitness values $(f_{1}^{i},f_{2}^{i},\ldots,f_{n}^{i})$ , where $f_{x}^{i}$ denotes the fitness value obtained by the $x$ -th particle in the $i$ -th iteration. Secondly, the best positions $(P_{1.best}^{i},P_{2.best}^{i},\ldots,P_{n.best}^{i})$ of all particles are determined by comparing the fitness values $(f_{1}^{i},f_{2}^{i},\ldots,f_{n}^{i})$ obtained by all particles in the current iteration with the fitness values $(f_{1}^{i-1},f_{2}^{i-1},\ldots,f_{n}^{i-1})$ obtained by all particles in the previous iteration. If $i\geq 2$ and $f_{x}^{i-1}\leq f_{x}^{i}$ , then $P_{x.best}^{i}=P_{x}^{i-1}$ ; otherwise, $P_{x.best}^{i}=P_{x}^{i}$ , where $1\leq x\leq n$ .
Step 5.
Calculate and broadcast the global best position and average best position of the quantum particle swarm on the master node. First, the master node collects the fitness values and best positions of all particles. Second, the global best fitness value $f_{best}^{i}=\min(f_{1}^{i},f_{2}^{i},\ldots,f_{n}^{i})$ and the global best position $P_{best}^{i}=P_{x^{*}.best}^{i}$, where $x^{*}=\arg\min_{x\in\{1,2,\ldots,n\}}f_{x}^{i}$, are obtained by comparing the fitness values of all particles in the $i$-th iteration. Third, the global best position is updated by comparing the global best fitness value $f_{best}^{i}$ of the current iteration with the value $f_{best}^{i-1}$ of the previous iteration: if $i\geq 2$ and $f_{best}^{i-1}\leq f_{best}^{i}$, then $P_{best}^{i}=P_{best}^{i-1}$ and $f_{best}^{i}=f_{best}^{i-1}$; otherwise, the values obtained in the current iteration are kept. Fourth, the average best position is calculated as $M_{best}^{i}=\sum_{x=1}^{n}P_{x.best}^{i}/n$. Finally, $P_{best}^{i}$ and $M_{best}^{i}$ are broadcast to all worker nodes.
Step 6.
Determine whether the current iteration number reaches the max iterations or whether the global best fitness value is lower than the error goal. If so, the iteration is terminated and the optimal initial weights and thresholds of BPNN are returned, i.e., $G^{1}=P_{best}^{i}$ ; otherwise, the positions of all particles are updated and the next iteration will be continued by going back to Step 4. The process of using multiple Spark executors to call GPU computing resources to update the positions of all particles and pRDD in parallel is as follows. Firstly, according to the best position of all particles and the global best position and average best position of quantum particle swarm, the latest positions $(P_{1}^{i+1},P_{2}^{i+1},\ldots,P_{n}^{i+1})$ of all particles are calculated by
$\begin{equation*} P_{x}^{i+1}=\alpha P_{x.best}^{i}+(1-\alpha)P_{best}^{i}\pm \varphi \left |{M_{best}^{i}-P_{x}^{i}}\right |\ln \frac {1}{\beta },\tag{3}\end{equation*}$

where $P_{x}^{i+1}$ represents the new position of the $x$-th particle in the $(i+1)$-th iteration, $\alpha$ and $\beta$ are random numbers drawn from the uniform distribution on (0, 1), and $\varphi$ represents the shrinkage factor [44]. To increase the randomness of particle movement, the sign ± is placed before the absolute-value term, which is the distance between the average best position of the quantum particle swarm and the current position of the particle. Secondly, Min-Max normalization is performed on the new positions of all particles, and the key of each key-value pair in pRDD is updated accordingly.
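Eq. (3) and the subsequent normalization can be sketched per particle as below. The helper names are hypothetical; $\alpha$, $\beta$, and the sign of the ± term are assumed to be drawn independently for every dimension (the text does not say whether they are drawn per dimension or per particle), and Min-Max normalization is assumed to act on each particle's position vector.

```python
import math
import random
from typing import List, Sequence


def qpso_update(
    position: Sequence[float],
    personal_best: Sequence[float],
    global_best: Sequence[float],
    mean_best: Sequence[float],
    phi: float,
    rng: random.Random,
) -> List[float]:
    """Apply Eq. (3) to one particle, dimension by dimension."""
    new_pos = []
    for d in range(len(position)):
        alpha = rng.random()
        beta = 1.0 - rng.random()  # in (0, 1], so ln(1/beta) is finite
        # Local attractor between the personal best and the global best.
        attractor = alpha * personal_best[d] + (1.0 - alpha) * global_best[d]
        step = phi * abs(mean_best[d] - position[d]) * math.log(1.0 / beta)
        sign = 1.0 if rng.random() < 0.5 else -1.0  # the "±" in Eq. (3)
        new_pos.append(attractor + sign * step)
    return new_pos


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Rescale a vector to [0, 1]; a constant vector maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

Because the step length is proportional to $|M_{best}^{i}-P_{x}^{i}|$, a particle sitting at the average best position moves only to its attractor, while distant particles take larger random jumps.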

The second stage is the parallel implementation of QPSO-BPNN model training based on Spark-GPU platform, including the following steps.

Step 1.
Use multiple Spark executors to call GPU computing resources to initialize the weights and thresholds of all QPSO-BPNN models and create a new RDD in parallel. Firstly, the optimal initial weights and thresholds $G^{1}$ are used as the initial weights and thresholds $(g_{1}^{1},g_{2}^{1},\ldots,g_{n}^{1})$ of all QPSO-BPNN models, i.e., $g_{1}^{1}=g_{2}^{1}=\cdots =g_{n}^{1}=G^{1}$. Secondly, a key-value pair RDD, gRDD, is constructed by taking the weights and thresholds of each QPSO-BPNN model as the key and each eigenvector of tRDD as the value; the $x$-th partition of gRDD can be denoted by $(\langle g_{x}^{1},E_{(x-1)m/n+1}\rangle,\langle g_{x}^{1},E_{(x-1)m/n+2}\rangle,\ldots,\langle g_{x}^{1},E_{xm/n}\rangle)$, where $1\leq x\leq n$.
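The layout of gRDD can be mimicked without Spark by splitting the $m$ eigenvectors into $n$ contiguous chunks that all carry the same key $G^{1}$. The sketch below is a hypothetical stand-in for the RDD construction (plain Python lists instead of RDD partitions, weights and thresholds flattened into a tuple) and assumes that $m$ is divisible by $n$, as the partition indices in the text imply.

```python
from typing import List, Sequence, Tuple, TypeVar

E = TypeVar("E")
Key = Tuple[float, ...]


def build_grdd_partitions(g1: Key, eigenvectors: Sequence[E], n: int) -> List[List[Tuple[Key, E]]]:
    """Split m eigenvectors into n contiguous partitions keyed by g_x^1 = G^1."""
    m = len(eigenvectors)
    if m % n != 0:
        raise ValueError("this sketch assumes m is divisible by n")
    size = m // n
    partitions = []
    for x in range(1, n + 1):
        g_x = g1  # every model starts from the same optimal G^1
        # Eigenvectors E_{(x-1)m/n+1} ... E_{xm/n} (1-based in the text).
        chunk = eigenvectors[(x - 1) * size : x * size]
        partitions.append([(g_x, e) for e in chunk])
    return partitions
```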
Step 2.
Use multiple Spark executors to call GPU computing resources to train all QPSO-BPNN models in parallel. The data of each partition in gRDD are used to train one QPSO-BPNN model, and each model performs one training iteration to obtain the latest weights and thresholds $(g_{1.temp}^{i},g_{2.temp}^{i},\ldots,g_{n.temp}^{i})$, where $g_{x.temp}^{i}$ denotes the latest weights and thresholds of the $x$-th QPSO-BPNN model obtained in the $i$-th iteration.
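As an illustration of what one training iteration on a partition could look like, the sketch below runs a single backpropagation pass with momentum over one partition. The one-hidden-layer sigmoid network, the squared-error loss, the weight initialization range, and all helper names are assumptions; only the default learning rate (0.003) and momentum (0.9) come from Section IV-B. It uses plain Python lists rather than GPU kernels.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Params = Dict[str, List[List[float]]]
Sample = Tuple[Sequence[float], Sequence[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def init_params(d: int, h: int, k: int, rng: random.Random) -> Params:
    # Each row holds one neuron's weights followed by its threshold (bias).
    return {
        "W1": [[rng.uniform(-0.5, 0.5) for _ in range(d + 1)] for _ in range(h)],
        "W2": [[rng.uniform(-0.5, 0.5) for _ in range(h + 1)] for _ in range(k)],
    }


def zeros_like(p: Params) -> Params:
    return {name: [[0.0] * len(row) for row in rows] for name, rows in p.items()}


def forward(p: Params, x: Sequence[float]) -> Tuple[List[float], List[float]]:
    xb = list(x) + [1.0]  # constant input that multiplies the threshold
    hid = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in p["W1"]]
    hb = hid + [1.0]
    out = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in p["W2"]]
    return hid, out


def local_epoch(p: Params, vel: Params, data: Sequence[Sample],
                lr: float = 0.003, momentum: float = 0.9) -> float:
    """One backpropagation pass with momentum; updates p and vel in place.

    Returns the mean squared error measured during the pass.
    """
    total = 0.0
    for x, t in data:
        hid, out = forward(p, x)
        # Output deltas for a squared-error loss with sigmoid units.
        d_out = [(o - ti) * o * (1.0 - o) for o, ti in zip(out, t)]
        # Hidden deltas back-propagated through W2 (threshold column excluded).
        d_hid = [
            hj * (1.0 - hj) * sum(dk * p["W2"][k][j] for k, dk in enumerate(d_out))
            for j, hj in enumerate(hid)
        ]
        grads = {
            "W1": [[dj * v for v in list(x) + [1.0]] for dj in d_hid],
            "W2": [[dk * v for v in hid + [1.0]] for dk in d_out],
        }
        # Momentum update: v <- momentum * v - lr * grad, then w <- w + v.
        for name, rows in p.items():
            for r, row in enumerate(rows):
                for c in range(len(row)):
                    vel[name][r][c] = momentum * vel[name][r][c] - lr * grads[name][r][c]
                    row[c] += vel[name][r][c]
        total += sum((o - ti) ** 2 for o, ti in zip(out, t)) / len(t)
    return total / len(data)
```

In the parallel scheme, each worker would run such a pass on its own partition, and the resulting parameters play the role of $g_{x.temp}^{i}$.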
Step 3.
Calculate and broadcast the global weights and thresholds on the master node according to the weight of each QPSO-BPNN model. Firstly, the master node collects the latest weights and thresholds of all QPSO-BPNN models. Secondly, the global weights and thresholds $G^{i+1}$ are updated by (1) and (2) and broadcast to all worker nodes.
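Eqs. (1) and (2) are not reproduced in this section, so the master-node update below is only a hypothetical stand-in: it assumes that $G^{i+1}$ is a weighted average of the local parameters $g_{x.temp}^{i}$ with non-negative model weights normalized to sum to one. The function name and the flattening of weights and thresholds into one vector are also assumptions.

```python
from typing import List, Sequence


def aggregate_global(local_params: Sequence[Sequence[float]],
                     model_weights: Sequence[float]) -> List[float]:
    """Assumed form of the global update: a normalized weighted average."""
    total = sum(model_weights)
    if total <= 0:
        raise ValueError("model weights must have a positive sum")
    coeffs = [w / total for w in model_weights]  # normalize so the weights sum to 1
    dim = len(local_params[0])
    return [sum(c * g[d] for c, g in zip(coeffs, local_params)) for d in range(dim)]
```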
Step 4.
Determine whether the current iteration number has reached the maximum number of iterations. If so, the iteration terminates and the final QPSO-BPNN model is obtained; otherwise, the weights and thresholds of all QPSO-BPNN models are updated and the next iteration continues from Step 2. Multiple Spark executors call GPU computing resources to update the weights and thresholds and gRDD in parallel as follows. Firstly, the weights and thresholds $(g_{1}^{i+1},g_{2}^{i+1},\ldots,g_{n}^{i+1})$ of all QPSO-BPNN models are set to $G^{i+1}$, i.e., $g_{1}^{i+1}=g_{2}^{i+1}=\cdots =g_{n}^{i+1}=G^{i+1}$. Secondly, the key of each key-value pair in gRDD is updated to the new weights and thresholds.
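Steps 2 to 4 form a synchronous loop: broadcast, local training, aggregation, repeat. The single-process sketch below shows that control flow under stated assumptions: `train_stage_two` and `local_step` are hypothetical names, `local_step` is a placeholder for one QPSO-BPNN training iteration on a partition, and the aggregation is assumed to be a normalized weighted average standing in for Eqs. (1) and (2).

```python
from typing import Callable, List, Sequence, TypeVar

P = TypeVar("P")
Vector = List[float]


def train_stage_two(
    g1: Vector,
    partitions: Sequence[P],
    local_step: Callable[[Vector, P], Vector],
    model_weights: Sequence[float],
    max_iterations: int = 50,
) -> Vector:
    global_params = list(g1)
    total = sum(model_weights)
    for _ in range(max_iterations):
        # Step 1 / Step 4: every model starts the iteration from the broadcast G^i.
        local_models = [list(global_params) for _ in partitions]
        # Step 2: one local training iteration per partition (run on executors in Spark).
        updated = [local_step(g, part) for g, part in zip(local_models, partitions)]
        # Step 3: the master combines the local parameters (assumed weighted average).
        global_params = [
            sum(w * g[d] for w, g in zip(model_weights, updated)) / total
            for d in range(len(global_params))
        ]
    return global_params
```

With a toy `local_step` that moves each model halfway toward a partition-specific target, the global parameters converge to the weighted mean of the targets, which makes the effect of the per-iteration synchronization easy to check.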

D. Combination Strategy of Multiple QPSO-BPNN Models Based on Ensemble Learning

The proposed rolling bearing fault diagnosis model is composed of three QPSO-BPNN models corresponding to the base end, drive end, and fan end, respectively. Combining them reduces the risk of misdiagnosis caused by a wrong classification from a single QPSO-BPNN model and can improve the fault diagnosis accuracy to a certain extent. Common combination strategies include Dempster-Shafer (DS) evidence theory [45] and ensemble learning [46]. Ensemble learning avoids two drawbacks of DS evidence theory: the explosive, exponential growth of computation and the larger number of parameters required to calculate the basic probability assignment function. Therefore, a combination strategy of multiple QPSO-BPNN models based on ensemble learning is proposed, as shown in Fig. 5.

FIGURE 5.

Combination of three QPSO-BPNN models based on ensemble learning.

In the combination of multiple QPSO-BPNN models based on ensemble learning, the weighted voting method is used to combine the classification results of multiple basic classifiers (i.e., QPSO-BPNN models) to obtain the best fault diagnosis result of a sample. The classification results of multiple basic classifiers for sample $x$ are combined by weighted voting according to

$\begin{equation*} H(x)=\mathop {\arg \max }\limits _{y\in \{1,2,\ldots,j\}}\sum _{i=1}^{s}\omega _{i}^{y}h_{i}^{y}(x),\tag{4}\end{equation*}$

and

$\begin{equation*} \omega _{i}^{y}=Acc_{i}^{y}/\sum _{k=1}^{s}Acc_{k}^{y},\tag{5}\end{equation*}$

where $j$ represents the number of classes, $s$ represents the number of basic classifiers, $\omega _{i}^{y}$ represents the weight of the $i$-th basic classifier when classifying the sample $x$ into class $y$, $h_{i}^{y}(x)$ denotes the probability that the $i$-th basic classifier assigns the sample $x$ to class $y$, and $Acc_{i}^{y}$ denotes the accuracy with which the $i$-th basic classifier assigns samples whose true class is $y$ to class $y$. During fault diagnosis, the running states of the rolling bearing include the normal state, inner race fault, ball fault, and outer race fault, so $j=4$; three basic classifiers (i.e., the QPSO-BPNN models corresponding to the base end, drive end, and fan end) are used, so $s=3$.
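Eqs. (4) and (5) can be sketched as follows. The function names are hypothetical, the per-class accuracies and probabilities in the usage lines are made-up illustrative values (rows index the $s$ basic classifiers, columns the $j$ classes), and classes are indexed from 0 rather than 1.

```python
from typing import List, Sequence


def class_weights(acc: Sequence[Sequence[float]]) -> List[List[float]]:
    """Eq. (5): acc[i][y] is Acc_i^y; returns omega[i][y]."""
    s, j = len(acc), len(acc[0])
    col_sums = [sum(acc[k][y] for k in range(s)) for y in range(j)]
    return [[acc[i][y] / col_sums[y] for y in range(j)] for i in range(s)]


def weighted_vote(probs: Sequence[Sequence[float]], omega: Sequence[Sequence[float]]) -> int:
    """Eq. (4): probs[i][y] is h_i^y(x); returns the 0-based index of the winning class."""
    j = len(probs[0])
    scores = [sum(omega[i][y] * probs[i][y] for i in range(len(probs))) for y in range(j)]
    return max(range(j), key=lambda y: scores[y])


# Hypothetical class order: 0 normal, 1 inner race, 2 ball, 3 outer race.
acc = [[0.97, 0.96, 0.95, 0.97], [0.98, 0.97, 0.96, 0.98], [0.98, 0.98, 0.97, 0.99]]
probs = [[0.05, 0.80, 0.10, 0.05], [0.02, 0.70, 0.20, 0.08], [0.03, 0.35, 0.55, 0.07]]
print(weighted_vote(probs, class_weights(acc)))  # → 1 (the third classifier is outvoted)
```

Normalizing each column of accuracies makes the weights for a given class sum to one, so no single classifier can dominate merely because all of its accuracies are high.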

SECTION IV.

Experimental Results and Analysis

A. Experimental Setup

The experimental platform used in this paper is a distributed cluster. The hardware environment of the cluster is shown in Table 1, and the software environment is shown in Table 2. To compare the fault diagnosis accuracy, model training efficiency, and fault diagnosis efficiency obtained with and without GPUs, a series of experiments is carried out on the experimental platform with GPUs (called the Spark-GPU platform) and the experimental platform without GPUs (called the Spark platform).

TABLE 1 Hardware Environment of the Cluster

TABLE 2 Software Environment of the Cluster

The experimental data used in this paper are the vibration data of rolling bearings in different running states provided by the Case Western Reserve University Bearing Data Center [47]. They are collected by sensors deployed on the base end, drive end, and fan end of the rolling bearing under different working conditions. Because a large-scale data set is more helpful for verifying the effectiveness of the proposed fault diagnosis method, the sliding window method [48] is first adopted to augment the original vibration data, the augmented data are then preprocessed (see Section III-A), and finally three data sets of different sizes composed of eigenvectors are obtained. Table 3 describes the rolling bearing data sets. Each data set includes monitoring data of the following running states of the rolling bearing: normal state, inner race fault, ball fault, and outer race fault. Each data set is randomly divided into a training set and a test set at a ratio of 8:2. When training the rolling bearing fault diagnosis model on the Spark platform or the Spark-GPU platform, the usable training set size is bounded both by the hardware resources of the cluster and by the model training efficiency. A larger-scale rolling bearing data set would require more worker nodes or a better hardware configuration for each worker node.
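The augmentation and split described above can be sketched as follows. The function names, window length, and stride are hypothetical (the text does not give the settings of the sliding window method [48]), and the 8:2 split is a plain seeded random shuffle without stratification.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def sliding_windows(signal: Sequence[float], window: int, stride: int) -> List[List[float]]:
    """Cut overlapping segments; a stride smaller than the window yields more samples."""
    return [list(signal[s : s + window]) for s in range(0, len(signal) - window + 1, stride)]


def split_80_20(samples: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T]]:
    """Random 8:2 split into training and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.8 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]
```

Because neighbouring windows overlap, a purely random split can place near-duplicate segments in both sets; splitting by recording before windowing is one way to avoid this.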

TABLE 3 Description of the Rolling Bearing Data Set

B. Analysis of Fault Diagnosis Accuracy

In this experiment, for DataSet 1, DataSet 2, and DataSet 3, BPNN implemented with Spark (Spark-BPNN), QPSO-BPNN implemented with Spark (Spark-QPSO-BPNN), BPNN implemented with Spark-GPU (Spark-GPU-BPNN), and QPSO-BPNN implemented with Spark-GPU (Spark-GPU-QPSO-BPNN) are used for training and testing the rolling bearing fault diagnosis models, respectively. In the training of these fault diagnosis models, the key parameter settings of QPSO and BPNN are as follows.

QPSO: The number of particles is set to 100, the shrinkage factor is set to 0.8, the maximum number of iterations is set to 50, and the error goal is set to 0.001.
BPNN: The learning rate is set to 0.003, the momentum is set to 0.9, and the maximum number of iterations is set to 50.

The number of particles is one of the most important parameters of the QPSO algorithm: too many particles increase the computational cost, whereas too few particles weaken the optimization effect. The shrinkage factor affects the convergence speed of the QPSO algorithm: if it is set too small, convergence is very slow; if it is set too large, the algorithm may fail to converge to an optimal solution. The learning rate is one of the most important parameters of BPNN; it directly affects the convergence performance of BPNN and is usually between 0.001 and 0.01. The momentum also affects the convergence speed of BPNN, and a larger momentum generally increases the convergence speed.
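The parameter settings above can be collected into a small configuration object, sketched below with hypothetical class names. The defaults are the values listed for QPSO and BPNN; the learning-rate check only warns, because the 0.001 to 0.01 range is described as usual rather than mandatory.

```python
import warnings
from dataclasses import dataclass


@dataclass(frozen=True)
class QPSOConfig:
    n_particles: int = 100
    shrinkage_factor: float = 0.8  # phi in Eq. (3)
    max_iterations: int = 50
    error_goal: float = 0.001


@dataclass(frozen=True)
class BPNNConfig:
    learning_rate: float = 0.003
    momentum: float = 0.9
    max_iterations: int = 50

    def __post_init__(self) -> None:
        # The text notes that the learning rate is usually between 0.001 and 0.01.
        if not 0.001 <= self.learning_rate <= 0.01:
            warnings.warn("learning rate is outside the usual 0.001-0.01 range")
        # With momentum >= 1 the accumulated update no longer decays.
        if not 0.0 <= self.momentum < 1.0:
            raise ValueError("momentum must be in [0, 1)")
```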

Fig. 6 shows the diagnosis accuracies achieved using four different fault diagnosis methods and three data sets of different sizes on the cluster described in Table 1. As depicted in Fig. 6, the fault diagnosis accuracy achieved with Spark-QPSO-BPNN is on average 2.40% higher than that achieved with Spark-BPNN, and the fault diagnosis accuracy achieved with Spark-GPU-QPSO-BPNN is on average 2.41% higher than that achieved with Spark-GPU-BPNN. The results show that the QPSO algorithm can effectively optimize the initial weights and thresholds of BPNN, thereby obtaining a higher fault diagnosis accuracy.

FIGURE 6.

Diagnosis accuracies achieved using different fault diagnosis methods and different size data sets.

It can be seen from Fig. 6 that the diagnosis accuracies achieved with Spark-GPU-QPSO-BPNN reach 98.66%, 98.70%, and 98.73% for DataSet 1, DataSet 2, and DataSet 3, respectively, which shows that the fault diagnosis accuracy increases slightly with the data set size. This is because a larger rolling bearing data set provides training samples that contain more fault features, which helps to improve the fault diagnosis accuracy.

It can also be seen from Fig. 6 that Spark-BPNN and Spark-GPU-BPNN achieve almost the same fault diagnosis accuracy, as do Spark-QPSO-BPNN and Spark-GPU-QPSO-BPNN. The results show that using GPUs does not affect the fault diagnosis accuracy of rolling bearings; in the proposed fault diagnosis method, GPUs are used mainly to improve the training efficiency and diagnosis efficiency of the rolling bearing fault diagnosis model.

Fig. 7 presents the loss curves of the four fault diagnosis methods for DataSet 3. As shown in Fig. 7, the loss curves of all four methods decrease rapidly in the first 10 iterations, then decrease slowly as the iterations increase, and gradually become stable after the 40th iteration. The results show that the fault diagnosis models are well trained: the weights and thresholds of BPNN are continuously optimized during training, and the optimal weights and thresholds are obtained at the end of training. Fig. 7 also shows that the loss values of Spark-QPSO-BPNN and Spark-GPU-QPSO-BPNN are smaller than those of Spark-BPNN and Spark-GPU-BPNN, because the initial weights and thresholds of BPNN are effectively optimized by the QPSO algorithm.

FIGURE 7.

Loss curves of different fault diagnosis methods for DataSet 3.

C. Performance Analysis of Model Training and Fault Diagnosis Under Different Size Data Sets

To analyze the model training and fault diagnosis performance of the proposed fault diagnosis method on data sets of different sizes, Local-QPSO-BPNN, Spark-QPSO-BPNN, and Spark-GPU-QPSO-BPNN are used to train rolling bearing fault diagnosis models on the three data sets, and the trained models are then used for fault diagnosis. In this experiment, Local-QPSO-BPNN uses one CPU core of a single worker node to perform model training and fault diagnosis in local mode, whereas Spark-QPSO-BPNN and Spark-GPU-QPSO-BPNN perform model training and fault diagnosis on the cluster with 4 worker nodes. To better analyze the fault diagnosis performance of the proposed method on massive data, all the data in each data set are diagnosed, i.e., 8 GB, 16 GB, and 32 GB of data, respectively. Table 4 lists the time spent on model training and fault diagnosis by Local-QPSO-BPNN, Spark-QPSO-BPNN, and Spark-GPU-QPSO-BPNN for the three data sets.

TABLE 4 Model Training Time and Fault Diagnosis Time Obtained Under Different Size Data Sets

Fig. 8 shows the speedups of Spark-GPU-QPSO-BPNN over Local-QPSO-BPNN on data sets of different sizes. The speedup is the ratio of the model training time (or fault diagnosis time) of Local-QPSO-BPNN to that of Spark-GPU-QPSO-BPNN. As seen from Fig. 8, Spark-GPU-QPSO-BPNN achieves a significant performance improvement over Local-QPSO-BPNN. For DataSet 1, DataSet 2, and DataSet 3, Spark-GPU-QPSO-BPNN obtains speedups of $324.13\times$, $334.91\times$, and $355.88\times$ over Local-QPSO-BPNN for model training and $110.48\times$, $124.03\times$, and $143.02\times$ for fault diagnosis, respectively. This is mainly because Spark-GPU-QPSO-BPNN can fully utilize the many-core GPUs of multiple worker nodes to perform model training and fault diagnosis in parallel on the Spark-GPU platform, which is based on in-memory computing; this greatly improves the performance of model training and fault diagnosis on large-scale data sets. Moreover, the speedup obtained for model training is higher than that obtained for fault diagnosis, because Spark and GPUs can fully exploit their computational advantages in model training, which involves a large number of iterative computations.

FIGURE 8.

Speedups of Spark-GPU-QPSO-BPNN over Local-QPSO-BPNN.

As can also be seen from Fig. 8, the speedups obtained for model training and fault diagnosis increase gradually with the data set size. This is because, when Spark-GPU-QPSO-BPNN runs on the cluster with 4 worker nodes, a larger data set increases the utilization of the GPU computing resources in each worker node and thus the parallel efficiency of model training and fault diagnosis. Therefore, the proposed fault diagnosis method is better suited to large-scale data sets.

Fig. 9 presents the speedups of Spark-GPU-QPSO-BPNN over Spark-QPSO-BPNN on data sets of different sizes. The speedup is the ratio of the model training time (or fault diagnosis time) of Spark-QPSO-BPNN to that of Spark-GPU-QPSO-BPNN. As shown in Fig. 9, compared with Spark-QPSO-BPNN, Spark-GPU-QPSO-BPNN significantly improves the performance of model training and fault diagnosis for all data set sizes. For DataSet 1, DataSet 2, and DataSet 3, Spark-GPU-QPSO-BPNN obtains speedups of $15.88\times$, $16.09\times$, and $16.78\times$ over Spark-QPSO-BPNN for model training and $11.24\times$, $11.66\times$, and $13.04\times$ for fault diagnosis, respectively. The results show that using GPUs can greatly speed up model training and fault diagnosis. This is mainly because most of the computations in BPNN are matrix operations, and many-core GPUs are better suited than multi-core CPUs to parallel operations on large-scale matrices.

FIGURE 9.

Speedups of Spark-GPU-QPSO-BPNN over Spark-QPSO-BPNN.

D. Performance Analysis of Model Training and Fault Diagnosis Under Different Size Clusters

In order to analyze the performance of model training and fault diagnosis of the proposed rolling bearing fault diagnosis method under different size clusters, Spark-QPSO-BPNN and Spark-GPU-QPSO-BPNN are adopted to perform model training and fault diagnosis respectively for DataSet 3 on the clusters with different numbers of worker nodes.

Table 5 presents the model training time and fault diagnosis time obtained on clusters of different sizes. As seen in Table 5, as the number of worker nodes in the cluster increases, the model training time and fault diagnosis time of both Spark-QPSO-BPNN and Spark-GPU-QPSO-BPNN decrease gradually. Compared with the cluster with a single worker node, on the clusters with 2, 3, and 4 worker nodes the model training time of Spark-GPU-QPSO-BPNN is reduced by 46.80%, 63.43%, and 72.53%, respectively, and its fault diagnosis time is reduced by 22.22%, 43.83%, and 59.88%, respectively. The results show that increasing the cluster size can effectively improve the model training and fault diagnosis performance of the proposed rolling bearing fault diagnosis method. Moreover, compared with Spark-QPSO-BPNN, Spark-GPU-QPSO-BPNN significantly reduces the model training time and fault diagnosis time on clusters of all sizes, which again shows that using GPUs can significantly speed up model training and fault diagnosis.

TABLE 5 Model Training Time and Fault Diagnosis Time Obtained Under Different Size Clusters

Fig. 10 shows the speedups achieved with Spark-GPU-QPSO-BPNN for model training. The speedup is the ratio of the model training time on a single worker node to the model training time on multiple worker nodes. As shown in Fig. 10, the speedup of Spark-GPU-QPSO-BPNN increases with the number of worker nodes in the cluster. On the clusters with 2, 3, and 4 worker nodes, the speedups achieved with Spark-GPU-QPSO-BPNN are $1.88\times$, $2.73\times$, and $3.64\times$, respectively, which shows that the QPSO-BPNN model is well distributed and parallelized on the Spark-GPU platform.
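These speedups follow from the training-time reductions reported for Table 5: if the time on $N$ nodes is $T_N = T_1(1-r)$, the speedup is $S_N = T_1/T_N = 1/(1-r)$. The short check below (a sketch; the function name is hypothetical) recovers the reported $1.88\times$, $2.73\times$, and $3.64\times$ from the reductions of 46.80%, 63.43%, and 72.53%.

```python
from typing import Dict


def speedup_from_reduction(reduction_pct: float) -> float:
    """Speedup T_1 / T_N given that T_N is reduction_pct percent shorter than T_1."""
    if not 0.0 <= reduction_pct < 100.0:
        raise ValueError("reduction must be in [0, 100)")
    return 1.0 / (1.0 - reduction_pct / 100.0)


# Training-time reductions of Spark-GPU-QPSO-BPNN relative to one worker node.
training_reductions: Dict[int, float] = {2: 46.80, 3: 63.43, 4: 72.53}

for nodes, r in training_reductions.items():
    print(f"{nodes} nodes: speedup {speedup_from_reduction(r):.2f}x")
```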

FIGURE 10.

Speedups achieved with Spark-GPU-QPSO-BPNN for model training.

Fig. 11 presents the parallel efficiencies achieved with Spark-GPU-QPSO-BPNN for model training. The parallel efficiency is the ratio of the speedup obtained for model training to the number of worker nodes in the cluster. As shown in Fig. 11, with 2, 3, and 4 worker nodes the parallel efficiencies of Spark-GPU-QPSO-BPNN reach 93.99%, 91.15%, and 91.00%, respectively, which shows that the computing resources of the Spark-GPU platform are well utilized. However, the parallel efficiency decreases gradually as the number of worker nodes increases, because a larger cluster incurs more inter-node communication and task scheduling overhead, which affects the performance of model training.
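The efficiency definition $E_N = S_N/N$ can be checked against the reported speedups with the sketch below (hypothetical function name). Using the two-decimal speedups from Fig. 10 gives about 94%, 91%, and 91%, which agrees with the reported 93.99%, 91.15%, and 91.00% up to the rounding of the speedups.

```python
def parallel_efficiency(speedup: float, workers: int) -> float:
    """Parallel efficiency = speedup / number of worker nodes."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    return speedup / workers


# Speedups reported for model training with 2, 3, and 4 worker nodes.
reported = {2: 1.88, 3: 2.73, 4: 3.64}
for n, s in reported.items():
    print(f"{n} nodes: efficiency {parallel_efficiency(s, n):.2%}")
```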

FIGURE 11.

Parallel efficiencies achieved with Spark-GPU-QPSO-BPNN for model training.

E. Analysis of the Impact of QPSO on the Performance of Model Training

In order to analyze the impact of QPSO on the performance of model training, Spark-BPNN, Spark-QPSO-BPNN, Spark-GPU-BPNN, and Spark-GPU-QPSO-BPNN are used for training and testing the fault diagnosis models respectively for DataSet 3 on the cluster with 4 worker nodes.

Fig. 12 compares the model training time with and without QPSO. The model training time of Spark-QPSO-BPNN is 84.14% longer than that of Spark-BPNN, and the model training time of Spark-GPU-QPSO-BPNN is 78.29% longer than that of Spark-GPU-BPNN. The increase arises mainly because Spark-QPSO-BPNN and Spark-GPU-QPSO-BPNN use the QPSO algorithm to optimize the initial weights and thresholds of BPNN, which costs more computation than the random initialization of weights and thresholds used in Spark-BPNN and Spark-GPU-BPNN. Although optimizing the initial weights and thresholds takes considerable time, once the optimal initial weights and thresholds are obtained, Spark-QPSO-BPNN and Spark-GPU-QPSO-BPNN converge to the global optimal weights and thresholds faster than Spark-BPNN and Spark-GPU-BPNN. Thus, although using the QPSO algorithm to optimize the initial weights and thresholds of BPNN reduces the model training efficiency, it can significantly improve the fault diagnosis accuracy: as shown in Fig. 6, the fault diagnosis accuracy of Spark-GPU-QPSO-BPNN is 2.11% higher than that of Spark-GPU-BPNN for DataSet 3.

FIGURE 12.

Comparison of the model training time achieved using QPSO and without QPSO.

F. Analysis of Combination Effect of Multiple QPSO-BPNN Models

In order to verify the effectiveness of the proposed combination strategy of multiple QPSO-BPNN models based on ensemble learning, four different classifiers are adopted for training and testing the fault diagnosis models respectively for DataSet 3, including the QPSO-BPNN model of base end (QPSO-BPNN-BA), the QPSO-BPNN model of drive end (QPSO-BPNN-DE), the QPSO-BPNN model of fan end (QPSO-BPNN-FE), and the ensemble classifier composed of the above three different QPSO-BPNN models based on ensemble learning (QPSO-BPNN-EL).

Fig. 13 presents the fault diagnosis accuracies achieved with different classifiers. Compared with QPSO-BPNN-BA, QPSO-BPNN-DE, and QPSO-BPNN-FE, the fault diagnosis accuracy achieved with QPSO-BPNN-EL is increased by 1.79%, 0.85%, and 0.67% respectively. The results show that the ensemble classifier can effectively improve the fault diagnosis accuracy.

FIGURE 13.

Fault diagnosis accuracies achieved with different classifiers.

To further demonstrate the effectiveness of the proposed combination strategy, a sample whose true state is inner race fault but which is misdiagnosed by one of the basic classifiers is selected. Table 6 presents the diagnosis results of the three basic classifiers and the ensemble classifier for this sample. As seen in Table 6, QPSO-BPNN-BA, QPSO-BPNN-DE, and QPSO-BPNN-FE diagnose the sample as inner race fault, inner race fault, and ball fault, respectively, whereas QPSO-BPNN-EL diagnoses it as inner race fault. The results show that when the basic classifiers disagree on a sample, the ensemble classifier can still obtain the correct diagnosis result by combining the outputs of all basic classifiers.

TABLE 6 Diagnosis Results of Different Classifiers for a Sample

G. Comparison With Other Intelligent Optimization Algorithms

To better evaluate the optimization effect of the QPSO algorithm, GA [34], the APSO algorithm [38], and the SPSO algorithm [39] are also used to optimize the initial weights and thresholds of BPNN. Spark-GPU-GA-BPNN, Spark-GPU-APSO-BPNN, Spark-GPU-SPSO-BPNN, and Spark-GPU-QPSO-BPNN are each used to perform model training on the cluster with 4 worker nodes for DataSet 3. During training, the maximum number of iterations of GA, APSO, SPSO, and QPSO is set to 50, and the other key parameters are set as follows.

GA: The population size is set to 100, the mutation probability is set to 0.2, and the crossover probability is set to 0.5.
APSO: The number of particles is set to 100 and both acceleration constants are set to 1.4945.
SPSO: The number of particles is set to 100, both acceleration constants are set to 2, the initial inertia weight is set to 0.9, and the final inertia weight is set to 0.4.
QPSO: See Section IV-B.

Table 7 presents the fault diagnosis accuracies and model training time of the different fault diagnosis methods. The fault diagnosis accuracy of Spark-GPU-QPSO-BPNN is 1.65%, 0.26%, and 0.58% higher than that of Spark-GPU-GA-BPNN, Spark-GPU-APSO-BPNN, and Spark-GPU-SPSO-BPNN, respectively. This is mainly because the QPSO algorithm introduces the average best position of the quantum particle swarm, which gives it better randomness and stronger global optimization ability than the other three algorithms when optimizing the initial weights and thresholds of BPNN.

TABLE 7 Comparison of Different Intelligent Optimization Algorithms

As seen in Table 7, Spark-GPU-QPSO-BPNN trains $1.73\times$ and $1.12\times$ as fast as Spark-GPU-GA-BPNN and Spark-GPU-APSO-BPNN, respectively. This is mainly because the QPSO algorithm removes the velocity attribute of the particles, which greatly reduces the computational cost of optimizing the initial weights and thresholds of BPNN. In addition, the model training speeds of Spark-GPU-QPSO-BPNN and Spark-GPU-SPSO-BPNN are very close, mainly because the SPSO algorithm can also greatly reduce the computational cost by shrinking the search space.

H. Comparison With Other Fault Diagnosis Methods

To further verify the effectiveness of the proposed rolling bearing fault diagnosis method, the following four different methods are used for model training and fault diagnosis on the cluster with 4 worker nodes: AlexNet [49] implemented with Spark-GPU (Spark-GPU-AlexNet), VGG-19 [50] implemented with Spark-GPU (Spark-GPU-VGG-19), ResNet-18 [51] implemented with Spark-GPU (Spark-GPU-ResNet-18), and the proposed Spark-GPU-QPSO-BPNN. For Spark-GPU-AlexNet, Spark-GPU-VGG-19, and Spark-GPU-ResNet-18, the original vibration data are converted into $64\times 64$ pixel gray-scale images, and a gray-scale image data set with the same size as DataSet 3 is obtained. The data set is divided into training set and test set at the ratio of 8:2. For Spark-GPU-QPSO-BPNN, DataSet 3 is used as the experimental data set and divided into training set and test set at the ratio of 8:2. Moreover, Spark-GPU-QPSO-BPNN diagnoses all the data in DataSet 3, and the three deep learning methods diagnose all the data in the gray-scale image data set.
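The text does not specify how the vibration data are converted into $64\times 64$ gray-scale images. One common approach, assumed here, is to take 4096 consecutive samples, Min-Max scale them to the 0 to 255 range, and fill the image row by row; the function name is hypothetical.

```python
from typing import List, Sequence


def signal_to_gray_image(segment: Sequence[float], side: int = 64) -> List[List[int]]:
    """Map side*side consecutive samples to an 8-bit gray-scale image (row-major)."""
    if len(segment) != side * side:
        raise ValueError(f"expected {side * side} samples, got {len(segment)}")
    lo, hi = min(segment), max(segment)
    span = hi - lo or 1.0  # avoid division by zero for a constant segment
    pixels = [int(round(255 * (v - lo) / span)) for v in segment]
    return [pixels[r * side : (r + 1) * side] for r in range(side)]
```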

The network structures and hyper-parameter settings of Spark-GPU-AlexNet, Spark-GPU-VGG-19, and Spark-GPU-ResNet-18 are listed in Table 8. Spark-GPU-AlexNet, Spark-GPU-VGG-19, and Spark-GPU-ResNet-18 contain 11, 25, and 23 neural network layers, respectively; the detailed network structures of AlexNet, VGG-19, and ResNet-18 can be found in [49], [50], and [51]. The batch sizes of Spark-GPU-AlexNet, Spark-GPU-VGG-19, and Spark-GPU-ResNet-18 are set to 128, 128, and 64, respectively, which achieves good fault diagnosis accuracy within the available GPU memory. Their learning rates are set to 0.008, 0.005, and 0.003, respectively, which provides good convergence performance and avoids fluctuations during model training. The momentum of all three is set to 0.9 to increase the convergence speed. The number of epochs of all three is set to 50, which ensures a high fault diagnosis accuracy while preventing the model training time from becoming too long.

TABLE 8 Network Structures and Hyper-Parameter Settings of Different Fault Diagnosis Methods Based on Deep Learning

Table 9 presents the diagnosis accuracies, model training time, and fault diagnosis time of four different rolling bearing fault diagnosis methods based on the Spark-GPU platform. The fault diagnosis accuracy of Spark-GPU-QPSO-BPNN is 1.11%, 1.16%, and 1.19% lower than that of Spark-GPU-AlexNet, Spark-GPU-VGG-19, and Spark-GPU-ResNet-18, respectively. However, Spark-GPU-QPSO-BPNN trains $4.41\times$, $56.70\times$, and $17.85\times$ faster, and diagnoses faults $6.95\times$, $27.97\times$, and $10.91\times$ faster, than Spark-GPU-AlexNet, Spark-GPU-VGG-19, and Spark-GPU-ResNet-18, respectively. Compared with AlexNet, VGG-19, and ResNet-18, QPSO-BPNN has a simpler network structure and fewer parameters, which yields higher model training efficiency and fault diagnosis efficiency. Therefore, the proposed rolling bearing fault diagnosis method can not only efficiently perform model training and fault diagnosis on massive rolling bearing vibration data, but also achieve good diagnosis accuracy.

TABLE 9 Comparison of Different Rolling Bearing Fault Diagnosis Methods Based on Spark-GPU Platform

SECTION V.

Conclusion

To perform fast and accurate rolling bearing fault diagnosis in the big data environment, a rolling bearing fault diagnosis method based on parallel QPSO-BPNN under Spark-GPU platform is proposed. According to the idea of data parallelism, the distributed parallelization of QPSO-BPNN model is effectively realized on Spark-GPU platform, which significantly improves the performance of model training and fault diagnosis under large-scale rolling bearing data sets. In the distributed parallel training of QPSO-BPNN model, the master node collects the local parameters of each worker node and updates the global parameters according to the weights, and the updated global parameters are synchronized to each worker node, which effectively improves the convergence speed of the model. The combination strategy based on ensemble learning is adopted, and the output results of QPSO-BPNN models respectively corresponding to the base end, drive end, and fan end of rolling bearing are combined according to the weighted voting method to obtain the best fault diagnosis result of a sample. The effectiveness of the proposed fault diagnosis method is verified through experiments. The results illustrate that the proposed method can not only make full use of the computing resources of a Spark-GPU platform to efficiently perform model training and fault diagnosis but also obtain a higher fault diagnosis accuracy for the massive rolling bearing vibration data.

In actual industrial production environments, the sensors deployed on rolling bearings continuously collect vibration data. Given the large amount of vibration data collected in real time, rapid and accurate online fault diagnosis can effectively ensure the safe operation of mechanical equipment and reduce maintenance costs. In future work, an online rolling bearing fault diagnosis method based on parallel QPSO-BPNN in the big data environment will be explored.
