A Novel Fault Detection Scheme Using Improved Inherent Multiscale Fuzzy Entropy With Partly Ensemble Local Characteristic-Scale Decomposition

Multiscale fuzzy entropy has been shown to be an excellent measure of the complexity of dynamic time series. However, when applied to the short time series collected in practical applications, conventional multiscale fuzzy entropy may yield undefined or unreliable values. In this work, an improved multiscale fuzzy entropy, named moving-average based multiscale fuzzy entropy (MA_MFE), is first presented to characterize the complexity of short-term time series. The MA_MFE algorithm produces more template vectors and thereby overcomes the shortening of the samples that occurs in the coarse-graining procedure of existing approaches. Experiments on white noise and $1 / f$ noise signals show that the MA_MFE method is more effective for short-term datasets. Then, a novel fault detection scheme is developed. After a non-local means approach is used to reduce background noise, the non-stationary vibration signals are decomposed into several intrinsic scale components (ISCs) by a recently developed time-frequency signal analysis method, partly ensemble local characteristic-scale decomposition (PELCD). The ISCs with higher correlation coefficients are used to reconstruct a new signal, and the inherent MA_MFEs are extracted to quantify the complexity of the collected vibration signal. Finally, the multiSVM and the improved variable predictive model based class discrimination (VPMCD) are employed as small-sample classifiers to achieve fault detection. Two experiments have been conducted: one on rolling bearings, a vital component of rotating machinery, and one on a piston pump, a typical reciprocating machine in hydraulic systems. The comparison results show that the proposed fault detection scheme is more effective and reliable and is suitable for real-time online fault detection.

always are short-term time series. As a result, it is still a huge challenge to detect the running states and fault types effectively and quickly [1]- [5].
The validity of a data-driven condition monitoring and fault detection system depends greatly on the extracted fault features and on the performance of the classifier. The procedure mainly involves two key steps: feature extraction and class recognition [2], [4]- [6]. At present, various methods have been developed and applied to fault detection [2], [7]- [10]. In the feature extraction stage, entropy-based measures have been widely used to characterize the complexity of time series from dynamical systems. The commonly used entropy-based complexity indexes include sample entropy (SampEn) [11], fuzzy entropy (FuzzyEn) [12], [13], and permutation entropy (PE) [14], [15]. However, the results of most traditional single-scale entropy methods are not always consistent with the complexity of real-world signals [16]. For example, the SampEn of white noise is higher than that of 1/f noise, although the latter has a more complex structure [16]. To address this issue, Costa et al. proposed multiscale entropy (MSE) to represent the complexity of time series [16], [17]. In order to quantify the complexity at different time scales, the original time series is coarse-grained at each time scale and the entropy of each coarse-grained time series is then computed. In the multiscale technique, for a scale factor τ, the coarse-graining procedure calculates the arithmetic mean of τ neighboring values without overlapping. The original data of length N is divided into N/τ segments, giving a coarse-grained time series of length N/τ, which is used to compute the entropy value for scale factor τ. Building on the MSE concept, the multiscale fuzzy entropy (MFE) [18], [19] and the refined composite multiscale fuzzy entropy (RCMFE) were proposed and have been used successfully to analyze complex time series in different research fields [20], [21].
However, from the perspective of signal processing, the coarse-graining procedure averages the data within a window and down-samples the data by the scale factor. The coarse-grained time series is therefore shortened and may not be long enough to yield an accurate entropy estimate, so that undefined or unreliable values can be obtained [22]. Hence, for the short-term vibration signals from dynamic mechanical systems, it is difficult to obtain accurate multiscale entropies at large scale factors. To address this problem, Wu et al. introduced a moving-average approach to the coarse-graining procedure so that a large number of template vectors can be obtained and the MSE performance improved [22]. Inspired by this idea, Li et al. introduced the moving-average approach to improve multiscale fuzzy entropy for fault feature extraction [23].
At the same time, vibration signals from mechanical systems are always non-stationary and include superimposed trend oscillations that can influence the standard deviation of the estimated values. Hence, it is necessary to process the original signals with a data-driven signal decomposition method before feature extraction. Empirical mode decomposition (EMD) is a widely used time-frequency analysis technique for non-stationary time series [24]. However, EMD has shortcomings such as the mode-mixing phenomenon and the end-point effect [25]. Complementary ensemble empirical mode decomposition (CEEMD) is an improved EMD method that overcomes the mode-mixing problem to a certain degree [26], but it requires a large amount of computation. In [23], the local mean decomposition (LMD) method was also used as an improved EMD method to extract multiscale fuzzy entropy for rolling bearings. However, the LMD method also suffers from mode mixing, distorted components and long computing time [27]. Local characteristic-scale decomposition (LCD) was proposed by Cheng et al. in [27]. Compared with EMD-based methods, LCD shows much higher computational efficiency, but mode mixing and the end-point effect [28]- [30] still exist. In recent years, partly ensemble local characteristic-scale decomposition (PELCD) has been developed as an improved LCD method and applied successfully to fault diagnosis in [31], [32]. Accordingly, in this paper, PELCD is employed as the data-driven signal decomposition method to eliminate the superimposed trend oscillations. First, the original signal is decomposed into several intrinsic scale components (ISCs) by PELCD. Then, the inherent ISCs are selected using Spearman correlation coefficients to reconstruct a new signal. Finally, the MA_MFEs of the reconstructed signal, defined as inherent MA_MFEs, are extracted to evaluate the complexity of the vibration signal. Moreover, novel fault detection models are built.
It is noted that although deep learning networks and their improved variants have proved to be promising intelligent identification approaches and have been widely applied in the fault diagnosis field [4], [5], they generally require a large number of samples. In this paper, multiSVM and improved VPMCD [33] are used as small-sample multiclass recognition techniques to verify the proposed inherent MA_MFE method. At the end of this paper, the proposed scheme is applied both to a fault detection experiment for rolling bearings and to a multi-fault detection experiment for a hydraulic piston pump.
The remainder of this paper is organized as follows. In Section II, the MA_MFE algorithm is developed and verified with two simulation signals. A novel fault detection scheme is presented in Section III and applied in Section IV. The conclusions are given in Section V.

A. MULTISCALE FUZZY ENTROPY
Sample entropy and fuzzy entropy are the two most prevalent approaches and have been commonly used to quantify the dynamical complexity of time series. As an improvement on sample entropy, fuzzy entropy is more reliable and less dependent on the parameters and the data length [12], [13], [29]. The main concept is described as follows.
A time series of length $N$ is written as $y = [y_1, y_2, \cdots, y_i, \cdots, y_N]$. Given the embedding dimension $m$, a template vector is reconstructed as

$$y_i^m = \{y_i, y_{i+1}, \cdots, y_{i+m-1}\} - \frac{1}{m}\sum_{j=0}^{m-1} y_{i+j}, \quad i = 1, 2, \cdots, N-m+1 \tag{1}$$

Then, define the distance between the vectors $y_i^m$ and $y_j^m$ as the maximum absolute difference of their corresponding components,

$$d_{ij}^m = \max_{k}\left|\left(y_{i+k} - \bar{y}_i\right) - \left(y_{j+k} - \bar{y}_j\right)\right| \tag{2}$$

where $0 \le k \le m-1$, $i \ne j$, and $\bar{y}_i = \frac{1}{m}\sum_{j=0}^{m-1} y_{i+j}$ is the baseline removed in (1).

In the conventional sample entropy algorithm, a match occurs when $d_{ij}^m \le r$; accordingly, the similarity of the vectors $y_i^m$ and $y_j^m$ is defined by the Heaviside function. In fuzzy entropy, by contrast, a family of exponential functions is employed as the fuzzy function to calculate the similarity between the two vectors. That is,

$$D_{ij}^m = \exp\left(-\frac{(d_{ij}^m)^n}{r}\right) \tag{3}$$

For the vector $y_i^m$, the similarities of all its neighboring vectors are averaged as

$$\phi_i^m(n, r) = \frac{1}{N-m-1}\sum_{j=1,\, j\ne i}^{N-m} D_{ij}^m \tag{4}$$

and a function is defined as

$$\phi^m(n, r) = \frac{1}{N-m}\sum_{i=1}^{N-m}\phi_i^m(n, r) \tag{5}$$

Repeating the above procedure for dimension $m+1$ gives

$$\phi^{m+1}(n, r) = \frac{1}{N-m}\sum_{i=1}^{N-m}\phi_i^{m+1}(n, r) \tag{6}$$

The fuzzy entropy of the original time series of length $N$ can be estimated statistically as

$$\mathrm{FuzzyEn}(X, m, n, r) = -\ln\frac{\phi^{m+1}(n, r)}{\phi^{m}(n, r)} \tag{7}$$

In order to quantify the complexity at different time scales, the multiscale entropy (MSE) algorithm was developed in [16]. A coarse-graining procedure is used to extract subsequences of the original time series at different time scales. To obtain the coarse-grained time series, the original data $X = [x_1, x_2, \cdots, x_i, \cdots, x_N]$ is separated into segments of length $\tau$ and the data points in each segment are averaged. Each element $z_j^{\tau}$ of the coarse-grained time series $z^{\tau}$ is defined as

$$z_j^{\tau} = \frac{1}{\tau}\sum_{i=(j-1)\tau+1}^{j\tau} x_i, \quad 1 \le j \le \frac{N}{\tau} \tag{8}$$
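The definitions in Equations (1)-(8) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `fuzzy_entropy`, `coarse_grain` and `mfe` are hypothetical, the membership function is assumed to be exp(-d^n / r), the distance is assumed to be the maximum absolute component difference, and the tolerance is assumed to be fixed from the standard deviation of the original series before coarse-graining.

```python
import numpy as np


def fuzzy_entropy(x, m=2, n=2, r=0.15):
    """FuzzyEn sketch following Eqs. (1)-(7); r is an absolute tolerance."""
    x = np.asarray(x, dtype=float)
    count = x.size - m  # templates compared for both dimensions m and m + 1

    def phi(dim):
        # Build the templates and remove each template's own mean (Eq. (1)).
        idx = np.arange(count)[:, None] + np.arange(dim)[None, :]
        templates = x[idx]
        templates = templates - templates.mean(axis=1, keepdims=True)
        # Maximum absolute component difference for every pair (Eq. (2)).
        d = np.abs(templates[:, None, :] - templates[None, :, :]).max(axis=2)
        # Exponential fuzzy membership (Eq. (3)); self-matches are excluded.
        sim = np.exp(-(d ** n) / r)
        np.fill_diagonal(sim, 0.0)
        return sim.sum() / (count * (count - 1))  # Eqs. (4)-(6)

    return -np.log(phi(m + 1) / phi(m))  # Eq. (7)


def coarse_grain(x, tau):
    """Non-overlapping averaging of Eq. (8); leftover samples are dropped."""
    x = np.asarray(x, dtype=float)
    n_seg = x.size // tau
    return x[: n_seg * tau].reshape(n_seg, tau).mean(axis=1)


def mfe(x, tau_max=20, m=2, n=2, r_factor=0.15):
    """Conventional MFE: FuzzyEn of each coarse-grained series."""
    r = r_factor * np.std(x)  # assumption: tolerance taken from the original series
    return np.array([fuzzy_entropy(coarse_grain(x, tau), m, n, r)
                     for tau in range(1, tau_max + 1)])
```

With this sketch, a series of N = 2048 points is reduced to 102 coarse-grained points at τ = 20, which illustrates how quickly the number of available template vectors falls as the scale factor grows.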
Multiscale analysis was introduced into fuzzy entropy to produce the concept of multiscale fuzzy entropy (MFE) [19]. However, formula (8) shows that the coarse-grained time series $z^{\tau}$ is greatly shortened as the scale factor τ increases. Therefore, for short-term time series, both the MSE and the MFE can yield inaccurate or even undefined entropy values. It is worth noting that a refined composite multiscale fuzzy entropy (RCMFE) has recently been developed as an improved multiscale fuzzy entropy algorithm. The RCMFE uses different starting points to produce different coarse-grained time series at the same scale and obtains the final result by averaging the entropy values [20]; see [20], [21] for the details. The relevant research has shown that RCMFE is more reliable than the MFE technique, and it has been widely used in various fields [12], [21], [34]. However, the data-shortening problem still exists in the procedure. To address this issue of the MFE and the RCMFE, a moving-average based multiscale fuzzy entropy (MA_MFE) algorithm is proposed below.

B. MOVING-AVERAGE BASED MULTISCALE FUZZY ENTROPY
In this subsection, a moving-average based multiscale fuzzy entropy (MA_MFE) algorithm is proposed. It consists of the following steps.
Step 1: Conduct the coarse-graining procedure using the moving-average algorithm at a scale factor τ. The coarse-grained time series at scale factor τ is written as $z^{\tau} = \{z_j^{\tau}\}$, $j = 1, 2, \cdots, N-\tau+1$, and is constructed according to Equation (9):

$$z_j^{\tau} = \frac{1}{\tau}\sum_{i=j}^{j+\tau-1} x_i, \quad 1 \le j \le N-\tau+1 \tag{9}$$

A schematic illustration of the moving-average coarse-graining procedure is given in Fig. 1.
Step 2: For a given scale factor τ, embedding dimension m and tolerance r, $\phi_{\tau}^{m}(n, r)$ and $\phi_{\tau}^{m+1}(n, r)$ are calculated for each moving-average time series $z^{\tau}$ according to Equations (5)-(6).

Step 3: The MA_MFE value is then obtained as

$$\mathrm{MA\_MFE}(X, \tau, m, n, r) = -\ln\frac{\phi_{\tau}^{m+1}(n, r)}{\phi_{\tau}^{m}(n, r)} \tag{10}$$

From the description above, both the MFE and the RCMFE employ non-overlapping windows to construct the template vectors, while the MA_MFE method uses overlapping windows to obtain the coarse-grained time series. Thus, far more template vectors are available in the MA_MFE algorithm than in the MFE and the RCMFE, so the probability of obtaining an undefined entropy is greatly decreased even for large scale factors.

In the following, the MA_MFE is compared with the MSE, the MFE and the RCMFE to verify its advantages. According to the definitions of the various multiscale entropies, their calculation depends on the length N of the time series, the embedding dimension m, the similarity tolerance r and the gradient n. Generally speaking, the larger m is, the richer the information obtained from the time series. However, a larger m requires a larger data length N; as a rule, $N = 10^m$ to $30^m$. For a fair comparison in this paper, the embedding dimension is set to m = 2. The similarity tolerance r represents the boundary width of the comparison window and mainly controls the similarity of template matching. Hence, r directly affects how readily templates match and the accuracy of the statistical information in the calculation. Too large an r makes templates match too easily and discards detailed information, whereas too small an r makes the estimated value sensitive to noise and inaccurate. In practice, it is usually set to r = 0.15SD (SD is standard deviation of the analyzed time series). Regarding the gradient n, more detailed information is lost as n increases; therefore, a small value n = 2 is used.
Finally, because the data length N is limited in engineering practice, the maximum scale factor is set to $\tau_{\max} = 20$. In summary, the parameters are set to m = 2, r = 0.15SD, n = 2 and $\tau_{\max} = 20$.
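The three MA_MFE steps with the parameter choices above (m = 2, r = 0.15SD, n = 2, maximum scale 20) can be sketched as follows. This is an illustrative sketch, not the authors' code: the names `moving_average_coarse_grain`, `ma_mfe` and `_fuzzy_entropy` are hypothetical, the fuzzy membership is assumed to be exp(-d^n / r), and the tolerance is assumed to be computed once from the original series.

```python
import numpy as np


def _fuzzy_entropy(x, m, n, r):
    """Compact FuzzyEn helper (mean-removed templates, exp(-d**n / r))."""
    count = x.size - m

    def phi(dim):
        idx = np.arange(count)[:, None] + np.arange(dim)[None, :]
        t = x[idx] - x[idx].mean(axis=1, keepdims=True)
        d = np.abs(t[:, None, :] - t[None, :, :]).max(axis=2)
        sim = np.exp(-(d ** n) / r)
        np.fill_diagonal(sim, 0.0)  # exclude self-matches
        return sim.sum() / (count * (count - 1))

    return -np.log(phi(m + 1) / phi(m))


def moving_average_coarse_grain(x, tau):
    """Step 1: overlapping window means, giving N - tau + 1 points."""
    return np.convolve(x, np.ones(tau) / tau, mode="valid")


def ma_mfe(x, tau_max=20, m=2, n=2, r_factor=0.15):
    """Steps 2-3: FuzzyEn of each moving-average series, one value per scale."""
    x = np.asarray(x, dtype=float)
    r = r_factor * np.std(x)  # assumption: tolerance fixed from the original series
    return np.array([_fuzzy_entropy(moving_average_coarse_grain(x, tau), m, n, r)
                     for tau in range(1, tau_max + 1)])
```

For N = 2048 and τ = 20, the moving average keeps 2029 points, whereas non-overlapping coarse-graining keeps only 102. The pairwise distance matrix in this sketch grows quadratically with the series length, so a production implementation would likely process it in blocks.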
White noise and 1/f noise with data lengths from N = 500 to N = 3500 in steps of 500 are used as two synthetic noise signals to evaluate the advantages of the proposed MA_MFE method. In this simulation analysis, 200 independent noise samples are used to compare four multiscale entropy-based feature extraction techniques: MSE, MFE, RCMFE and MA_MFE.
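One possible way to generate such test signals is sketched below. The text does not say how the 1/f noise was synthesized, so the spectral-shaping approach, the function names `white_noise` and `pink_noise`, and the random seed are all assumptions made for illustration.

```python
import numpy as np


def white_noise(n, rng):
    """Gaussian white noise with unit variance."""
    return rng.standard_normal(n)


def pink_noise(n, rng):
    """Approximate 1/f noise: scale a white spectrum by 1/sqrt(f)."""
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n)
    scale = np.zeros_like(freqs)           # the DC component is dropped
    scale[1:] = 1.0 / np.sqrt(freqs[1:])   # amplitude ~ f^-1/2, power ~ 1/f
    x = np.fft.irfft(spectrum * scale, n=n)
    return x / x.std()                     # normalize to unit standard deviation


rng = np.random.default_rng(0)
lengths = range(500, 3501, 500)            # N = 500, 1000, ..., 3500
datasets = {N: (white_noise(N, rng), pink_noise(N, rng)) for N in lengths}
```

Repeating the draw 200 times per length would reproduce the size of the simulation described above; a single realization per length is generated here to keep the sketch short.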
The entropy curves of white noise and 1/f noise are shown in Fig. 2. The MSE curve of white noise decreases as the scale factor increases, whereas that of 1/f noise remains constant [17]. As mentioned above, fuzzy entropy is an improvement on sample entropy, so the results of MFE, RCMFE and MA_MFE show the same trend as the MSE values. The means and standard deviations for the two synthetic noise signals are given in Tables 1 and 2, respectively. Whichever noise signal is used, the MA_MFE has the smallest standard deviations (SDs), which indicates that its results are the most reliable. In addition, for 1/f noise, the MSE method yields undefined entropy values when the data length is smaller than 1500, whereas the MFE, the RCMFE and especially the MA_MFE still achieve reliable entropy values. In the last column of Table 1, the SDs of the mean values obtained with the MA_MFE method are smaller in both cases, which shows that the MA_MFE method is robust to the data length N. Therefore, the MA_MFE method is suitable for short-term complex signals such as 1/f noise.

C. PARTLY ENSEMBLE LOCAL CHARACTERISTIC-SCALE DECOMPOSITION
To improve the LCD method, and following the main idea of CEEMD, the authors of [31], [32] proposed a new noise-assisted data analysis method, termed partly ensemble local characteristic-scale decomposition (PELCD), whose procedure is summarized as follows.
Step 1: Add a pair of noises with opposite signs and amplitude $\alpha$ to the raw signal $S(t)$:

$$S_i^{+}(t) = S(t) + \alpha\, n_i(t), \qquad S_i^{-}(t) = S(t) - \alpha\, n_i(t) \tag{11}$$

where $i = 1, 2, \cdots, N_e$, $N_e$ is the number of noise pairs, and $n_i(t)$ is the added noise. Generally, $\alpha = 0.05k\,\mathrm{SD}$, $k = 1, 2, 3, \cdots, 8$, where SD is the standard deviation of the raw signal $S(t)$.

Step 2: Starting with $j = 1$, perform LCD on $S_i^{+}(t)$ and $S_i^{-}(t)$ respectively to obtain the components $I_{ij}^{+}(t)$ and $I_{ij}^{-}(t)$; $I_j(t)$ is then obtained by averaging over the ensemble:

$$I_j(t) = \frac{1}{2N_e}\sum_{i=1}^{N_e}\left[I_{ij}^{+}(t) + I_{ij}^{-}(t)\right] \tag{12}$$

Step 3: Calculate the permutation entropy (PE) value $E_{pj}$ of $I_j(t)$ according to [14], and check whether $E_{pj} > \theta_0$, where $\theta_0$ is the PE threshold, generally chosen in [0.4, 0.6]. If so, $I_j(t)$ is a high-frequency intermittent signal or noise; then let j = j + 2, return to Step 2, and repeat until $E_{pj} < \theta_0$.

Step 4: Separate the first $(p-1)$ components from the raw signal $S(t)$, where $I_p(t)$ is the first component whose PE value falls below $\theta_0$:

$$R(t) = S(t) - \sum_{j=1}^{p-1} I_j(t) \tag{13}$$

Step 5: Perform LCD on $R(t)$ to obtain $N$ intrinsic scale components (ISCs) and a residual component $r(t)$:

$$R(t) = \sum_{k=1}^{N}\mathrm{ISC}_k(t) + r(t) \tag{14}$$

PELCD adds white noise in pairs to reduce the noise residue and ensure the completeness of the decomposition. At the same time, it avoids unnecessary ensemble averaging by using PE to detect high-frequency intermittent signals or noise in time.
The key point of PELCD is the detection of the high-frequency intermittent signal or noise, which greatly influences the decomposition result. PE is used to check the dynamic changes of the time series; the relevant theory of PE can be found in [32]. Once the high-frequency intermittent signal or noise is eliminated, the normal LCD is applied to decompose the remaining signal, which not only constrains mode mixing to increase the analysis accuracy but also improves the computing speed.
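The PE check used in Step 3 can be illustrated with a short sketch of normalized permutation entropy. The text does not state the embedding order or the delay, so the values `order=6` and `delay=1`, the default threshold of 0.5 (a point inside the quoted range [0.4, 0.6]) and the function names are assumptions for illustration only.

```python
import math

import numpy as np


def permutation_entropy(x, order=6, delay=1):
    """Normalized permutation entropy in [0, 1] from ordinal pattern counts."""
    x = np.asarray(x, dtype=float)
    n_vec = x.size - (order - 1) * delay
    # Each row holds `order` samples spaced by `delay`.
    idx = np.arange(n_vec)[:, None] + delay * np.arange(order)[None, :]
    # The ordinal pattern of a row is the permutation that sorts it.
    patterns = np.argsort(x[idx], axis=1, kind="stable")
    _, counts = np.unique(patterns, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum() / math.log(math.factorial(order)))


def is_intermittent_or_noise(component, theta0=0.5, order=6, delay=1):
    """True when the component's PE exceeds the threshold theta0."""
    return permutation_entropy(component, order, delay) > theta0
```

Under these assumed settings, a white-noise component gives a value close to 1 and is flagged, while a smooth sinusoid gives a much lower value and is kept.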

D. PROPOSED FEATURE EXTRACTION METHOD
Although PELCD is an excellent signal processing technique, some irrelevant ISCs are inevitably produced by background noise or by the decomposition itself. Therefore, it is essential to select the fault-relevant ISCs, which contain rich fault information. A selection method for relevant intrinsic mode functions (IMFs) using the Spearman correlation coefficient was proposed in [35]. Recently, Zheng et al. proposed a quantified IMF selection method based on complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) and achieved promising results [36]. However, CEEMDAN has a high computational cost. Inspired by these works, a self-adaptive ISC selection technique is proposed in this paper. The details are described below.
Step 1: The false ISCs are eliminated. For a white noise signal, the product of the energy density of a mode component and its average period is constant [25], [37]. Consequently, this product can be used to identify whether an ISC is a noise component. The product of the energy density of the jth ISC and its average period is computed as

$$P_j = E_j \times \bar{T}_j, \qquad E_j = \frac{1}{N}\sum_{t=1}^{N}\left[\mathrm{ISC}_j(t)\right]^2, \qquad \bar{T}_j = \frac{2N}{O_j} \tag{15}$$

where $O_j$ is the number of extreme points in the ISC and $N$ is the length of the ISC. Then, a coefficient $RP_j$ is obtained as

$$RP_j = \left|\frac{P_j - \frac{1}{j-1}\sum_{i=1}^{j-1}P_i}{\frac{1}{j-1}\sum_{i=1}^{j-1}P_i}\right|, \quad j \ge 2 \tag{16}$$

If $RP_j \ge 1$, the first $(j-1)$ ISCs can be regarded as false components, because the products of their energy densities and average periods are constant. They should be removed together with the residue $r(t)$.

Step 2: The correlation coefficients between the remaining ISCs and the original signal are calculated, and the ISCs with high correlation coefficients are selected. Correlation analysis is a powerful tool for measuring the degree of correlation between two time series. Because the Spearman correlation coefficient is rank-based, and is therefore less sensitive to outliers and to monotonic nonlinear relationships than the commonly used Pearson correlation coefficient, it is used as the criterion to identify the ISCs that carry more information relevant to the original signal. Two variables $X = \{x_i\}$ and $Y = \{y_i\}$ of length $N$ are converted into ranks $g(x_i)$ and $g(y_i)$ respectively, and the Spearman correlation coefficient is then defined as [35]:

$$\rho = 1 - \frac{6\sum_{i=1}^{N} d_i^2}{N(N^2-1)} \tag{17}$$

where $d_i = g(x_i) - g(y_i)$. The greater the correlation coefficient, the more relevant the two variables are. Hence, if the correlation coefficient of an ISC is high, the ISC is highly relevant to the original signal. If the coefficient is low, the ISC is a false ISC produced by the signal decomposition procedure or by background noise. In the end, $m$ ISCs with high correlation coefficients are obtained.

Step 3: The selected $m$ ISCs are reconstructed into a new signal, written as

$$y(t) = \sum_{k=1}^{m}\mathrm{ISC}_k(t) \tag{18}$$

Step 4: Finally, the inherent MA_MFEs are extracted from the reconstructed signal according to Equation (10) as the features for intelligent recognition.
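The noise screening in Step 1 of the procedure above might be sketched as follows. The exact formulas are assumptions here: the energy density is taken as the mean squared amplitude, the average period as 2N divided by the number of local extrema, and RP_j as the relative deviation of the jth product from the mean of the preceding products. The function names are hypothetical.

```python
import numpy as np


def energy_period_product(isc):
    """Energy density times average period of one component."""
    isc = np.asarray(isc, dtype=float)
    n = isc.size
    energy_density = np.sum(isc ** 2) / n
    # A local extremum is a sign change between consecutive slopes.
    slope_sign = np.sign(np.diff(isc))
    n_extrema = np.count_nonzero(slope_sign[1:] * slope_sign[:-1] < 0)
    avg_period = 2.0 * n / max(n_extrema, 1)
    return energy_density * avg_period


def first_true_component(iscs):
    """0-based index of the first ISC with RP >= 1, or None if there is none.

    All components before the returned index are treated as noise-like.
    """
    p = np.array([energy_period_product(c) for c in iscs])
    for j in range(1, len(p)):
        baseline = p[:j].mean()
        if abs(p[j] - baseline) / baseline >= 1.0:
            return j
    return None
```

In this sketch, two components whose products are nearly equal are treated as noise-like, and the first component whose product departs from their mean by at least 100% marks the start of the retained ISCs.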

III. PROPOSED FAULT DETECTION SCHEME
The vibration signals generated by local faults of machinery can easily be converted to electrical signals and collected by sensors with a large bandwidth, and they contain extremely rich fault information. The intrinsic oscillations in the vibration signals change with the working state, which is reflected in their non-linear and non-stationary characteristics and in the energy distribution of the different frequency components of the spectrum. In other words, the complexity of the time series varies greatly between working states. The vibration signals unavoidably contain background noise and interference signals. The non-local means (NLM) denoising method has become an increasingly popular approach in image denoising due to its excellent performance for Gaussian additive and multiplicative noise [38], [39]. The NLM method can also be applied to vibration signals; hence, it serves as the preprocessing technique in this work. The vibration signals also include superimposed trend oscillations, which can influence the standard deviation of the estimated values. Therefore, following the NLM denoising, the vibration signal x(t) is decomposed into several ISCs and a residue r(t) by the PELCD technique in order to eliminate the superimposed trend oscillations. Next, the m ISCs with the higher Spearman correlation coefficients are selected to reconstruct a new signal y(t) according to Equation (18). The superimposed trend oscillations are eliminated by the reconstruction, and the new signal contains only the inherent ISCs. The inherent MA_MFEs can then be extracted from y(t) to evaluate the complexity as fault features.
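To make the NLM preprocessing step concrete, a minimal one-dimensional variant is sketched below. The text gives no NLM parameters, so the patch half-width, the search half-width, the bandwidth `h` and the function name `nlm_denoise_1d` are all assumptions, and the patch distance is taken as a plain mean squared difference rather than any specific weighting the authors may have used.

```python
import numpy as np


def nlm_denoise_1d(x, patch_half=5, search_half=50, h=0.5):
    """Minimal 1-D non-local means; h is the bandwidth in signal units."""
    x = np.asarray(x, dtype=float)
    n = x.size
    width = 2 * patch_half + 1
    padded = np.pad(x, patch_half, mode="reflect")
    # patches[i] is the neighbourhood centred on sample i.
    patches = np.lib.stride_tricks.sliding_window_view(padded, width)
    out = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - search_half), min(n, i + search_half + 1)
        # Mean squared difference between patch i and every patch in the window.
        dist2 = np.mean((patches[lo:hi] - patches[i]) ** 2, axis=1)
        weights = np.exp(-dist2 / h ** 2)
        # Each output sample is a similarity-weighted average of its window.
        out[i] = np.dot(weights, x[lo:hi]) / weights.sum()
    return out
```

Because each sample is averaged only with samples whose neighbourhoods look alike, repeated impact patterns reinforce one another while uncorrelated noise is attenuated. In practice the bandwidth would need tuning to the noise level of the measured signal.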
When a large scale factor is applied, the dimension of the feature vector is high. However, some features are irrelevant to the classes and reduce the computing speed or the classification accuracy. If these irrelevant features are removed and the information needed for effective class recognition is represented with fewer features, the classifier has a lower computational expense and a higher accuracy. Therefore, it is vital to obtain the optimal features as input variables. Here, the ReliefF [42] method is employed to obtain the optimal features according to their importance values. These optimal features are used to train the classifiers. Once the classifiers are trained, the fault features from the testing samples are input into the trained classifier to produce the detection results. The flowchart of the proposed fault detection scheme is shown in Fig. 3. In the following section, two fault diagnosis experiments are carried out. One is for rolling bearings, which are among the most important components in rotating machines and whose failure is one of the most frequent causes of machine breakdown. The other is for a piston pump, which is a typical reciprocating machine and plays a vital role in the hydraulic systems of some large-scale equipment. Moreover, multi-fault diagnosis is performed in the second experiment.

IV. APPLICATION TO FAULT DETECTION
A. EXPERIMENT 1
In this section, the experimental datasets shared by the Case Western Reserve University Bearing Data Center [40] are used to verify the fault detection scheme proposed in Section III. Single point faults were introduced to the drive-end bearings with fault diameters of 0.007 to 0.021 in. The sampling frequency is 12 kHz. The datasets include inner race fault (IRF), ball fault (BF), outer race fault (ORF) and the normal state. The motor load is 2 hp and the shaft rotation speed is $f_r$ = 1750 rpm. Ten classes of vibration signals are obtained. In this paper, the data of each class are divided into 55 segments of length N = 2048, which serve as samples. The details of the datasets are given in [41] and listed in Table 3.
To show the necessity of the denoising procedure, envelope spectrum analysis is performed for the inner race fault. The top-left panel of Fig. 4 shows that the vibration signal with an inner race fault contains plenty of background noise, and the top-right panel shows the denoised signal after applying the NLM approach. According to the theoretical calculation, the fault frequency of the inner race is $f_i = 5.415 f_r$ [41], which is about 158 Hz. As shown in the bottom panels, although the fault frequency can be found in the bottom-left panel, many other frequency components exist in the envelope spectrum of the original signal and can confuse the operator. In contrast, in the bottom-right panel, the fault frequency $f_i$ and its harmonic $2f_i$ stand out more clearly in the envelope spectrum of the denoised signal. Subsequently, each signal is decomposed into several ISCs, the correlation coefficients are computed, and the ISCs with the higher correlation coefficients are combined into a new signal in which the superimposed trend oscillations are eliminated. Finally, the inherent MA_MFEs of the signals are extracted as fault features. Given the length of the datasets, the maximum scale factor is set to 20 here. For comparison, the RCMFEs are computed at the same time. In this experiment, 550 samples (55 samples of each class) are used, and the results of the inherent MA_MFE and the RCMFE under the ten different states are shown in Fig. 5, from which some conclusions can be drawn. First of all, the inherent MA_MFE values decrease almost monotonically as the scale factor grows, which is consistent with the definition of multiscale entropy and the nature of the vibration signals of rolling bearings. However, the RCMFE values do not follow this basic law. Hence, the inherent MA_MFE method has more physical meaning than the RCMFE technique and is more suitable for analyzing the dynamic characteristics of rolling bearings.
Moreover, the inherent MA_MFE curves show that the values for the normal state are much larger than those for the faulty states, so the faulty states are easily distinguished. This can be explained by the fact that the self-similarity of the vibration signals increases when local faults occur, which causes the entropy to decrease correspondingly. The main reason is that the periodic impacts generated by the faults become more intense as the fault deepens, so the self-similarity becomes more evident. The vibration signals then present more regularity and smaller entropy values, which is consistent with the dynamics of rolling bearings.
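The envelope spectrum check described above for the inner race fault can be illustrated with a short sketch. The fault frequency follows from the stated values, $f_i = 5.415 \times 1750/60 \approx 157.9$ Hz. The test signal below is a synthetic stand-in, not the bearing data: a 3 kHz carrier amplitude-modulated at $f_i$ is assumed purely for illustration, and the function name `envelope_spectrum` is hypothetical.

```python
import numpy as np


def envelope_spectrum(x, fs):
    """Amplitude spectrum of the Hilbert envelope (FFT-based analytic signal)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # Keep DC (and Nyquist), double positive frequencies, zero negative ones.
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    envelope = np.abs(np.fft.ifft(np.fft.fft(x) * gain))
    envelope = envelope - envelope.mean()  # remove the DC offset
    amp = np.abs(np.fft.rfft(envelope)) / n
    return np.fft.rfftfreq(n, d=1.0 / fs), amp


fs = 12_000        # sampling frequency of the experiment, Hz
f_r = 1750 / 60    # shaft speed, Hz
f_i = 5.415 * f_r  # inner race fault frequency, about 157.9 Hz

# Synthetic stand-in (assumption): a 3 kHz resonance modulated at f_i.
t = np.arange(fs) / fs
x = (1 + 0.5 * np.cos(2 * np.pi * f_i * t)) * np.cos(2 * np.pi * 3000 * t)

freqs, amp = envelope_spectrum(x, fs)
peak = freqs[np.argmax(amp)]  # dominant envelope frequency, in Hz
```

With one second of data the frequency resolution is 1 Hz, so the dominant envelope line falls in the bin nearest to $f_i$.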
In this experiment, there are 10 classes with 55 samples in each class, as described in Table 3. The samples are randomly divided into 10 × 10 = 100 samples for training and 10 × 45 = 450 samples for testing. After feature extraction by the MA_MFE method, the initial fault features are obtained. The training sample matrix is denoted $\mathrm{Train}_{100\times19}$ and the test matrix $\mathrm{Test}_{450\times19}$. The features at different scale factors reveal fault information from different views. However, some features are less relevant to the fault type and are likely to increase the computation time and even reduce the accuracy. Here, the ReliefF technique is employed to obtain the optimal features and boost the performance of the classifiers. For the training matrix obtained above, ReliefF computes the importance values of all features and sorts them from low to high. The seven features with the highest importance values are selected as the optimal features. They are $F_{10}$, $F_{11}$, $F_9$, $F_5$, $F_{12}$, $F_2$ and $F_1$ ($F_i$ stands for the MA_MFE value at scale factor τ = i), which are more sensitive to the fault information in the vibration signals. An intelligent classifier is then needed to achieve fault detection. Here, two class discrimination methods are employed. One is the multiclass support vector machine (multiSVM), which is widely used for small samples. In this work, the one-vs-all technique is used to extend the basic SVM with a linear kernel to multiSVM. The other is an improved variable predictive model based class discrimination (VPMCD) [33]. Table 4 shows the fault detection results and the comparison.
Table 4 shows that the proposed multi-fault diagnosis model presents the best performance: the highest detection accuracy, the smallest standard deviation (Std) and the lowest time consumption. When the MA_MFEs are used as input variables, both the multiSVM classifier and the improved VPMCD classifier achieve a 100% fault detection rate; more importantly, the time cost is very short, which is vital for online fault detection. In comparison, the Stds of both the RCMFE and the MFE are larger and the time costs are longer, especially when using the multiSVM classifier. On the other hand, the comparison between multiSVM and VPMCD shows that the latter is faster and more robust, which indicates that it is more suitable for classification problems with small samples.
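The data handling in Experiment 1 (10 training and 45 test samples per class, followed by keeping the seven most important features) can be sketched as below. The feature matrix and the importance scores are random placeholders, not MA_MFE values or ReliefF output, and the helper name `stratified_split` is hypothetical.

```python
import numpy as np


def stratified_split(labels, n_train_per_class, rng):
    """Randomly pick n_train_per_class samples of every class for training."""
    labels = np.asarray(labels)
    train_idx = []
    for c in np.unique(labels):
        members = rng.permutation(np.flatnonzero(labels == c))
        train_idx.extend(members[:n_train_per_class])
    train_idx = np.sort(np.array(train_idx))
    test_idx = np.setdiff1d(np.arange(labels.size), train_idx)
    return train_idx, test_idx


rng = np.random.default_rng(0)
labels = np.repeat(np.arange(10), 55)      # 10 classes x 55 samples
features = rng.standard_normal((550, 19))  # placeholder for the 19 features
train_idx, test_idx = stratified_split(labels, 10, rng)
train, test = features[train_idx], features[test_idx]

# Keep the seven columns with the largest importance values. The scores are
# random placeholders standing in for importance values from ReliefF.
importance = rng.random(19)
top7 = np.argsort(importance)[::-1][:7]
train_opt, test_opt = train[:, top7], test[:, top7]
```

The reduced matrices `train_opt` and `test_opt` would then be passed to whichever classifier is used, with the importance scores computed on the training matrix only.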

B. EXPERIMENT 2
In Experiment 2, the proposed scheme is applied to incipient abrasion detection for a piston pump. The experimental datasets were collected from an A11VLO190 axial piston pump with nine pistons. The speed of the driving shaft was 1600 rpm and the pressure of the main hydraulic circuit was kept at 10 MPa. An accelerometer was mounted on the axial piston pump with a magnetic base, and the vibration signal was collected using an NI9233 data acquisition card with a sampling frequency of 10 kHz. The axial piston pump was tested under six different conditions, denoted class 1 to class 6: class 1, normal condition (Normal); class 2, one piston abrasion (OPA); class 3, swash plate abrasion (SPA); class 4, valve plate abrasion (VPA); class 5, both swash plate and valve plate abrasion (BSVA); class 6, counter-position pistons abrasion (CPPA). There are three single fault types and two compound fault types. The vibration signals collected under the different conditions are given in Fig. 6. In this experiment, 55 raw vibration signals were collected under each condition, 330 in all. The length of each vibration signal is N = 2500. The samples are divided into two groups, one for training and the other for testing.
Similarly, after the NLM denoising, the signal is decomposed into several ISCs by PELCD, some of which are shown in Fig. 7. Then, the correlation coefficients between the ISCs and the denoised signal are computed. In the experiment, the maximum correlation coefficient is 0.75 and the threshold is set to one tenth of the maximum correlation coefficient, that is, 0.075. The ISCs with correlation coefficients above the threshold are used to reconstruct a new signal, shown in Fig. 8, in which the superimposed trend oscillations are reduced. Finally, the fault features are obtained by the MA_MFE method with the maximum scale factor $\tau_{\max} = 20$. Fig. 9 shows the MA_MFE values. The multiSVM and the improved VPMCD are again employed for multi-fault detection. The first four features selected by the ReliefF technique, $F_8$, $F_{13}$, $F_7$ and $F_{10}$, are shown in Fig. 10 as the optimal features. The comparison results are given in Table 5. The detection rates of both the multiSVM and the VPMCD classifiers are 100%, and the time cost of VPMCD training and classification is 0.02 s. Therefore, the results again show that the proposed model is suitable for real-time online fault detection.
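The threshold rule above (keep the ISCs whose Spearman coefficient is at least one tenth of the maximum, for example 0.075 when the maximum is 0.75) can be sketched as follows. The function names are hypothetical, and the rank computation assumes there are no tied values; real data with ties would need average ranks.

```python
import numpy as np


def ranks(v):
    """Ranks 1..N of the values in v; assumes there are no ties."""
    r = np.empty(v.size)
    r[np.argsort(v)] = np.arange(1, v.size + 1)
    return r


def spearman(x, y):
    """Spearman coefficient: 1 - 6 * sum(d_i^2) / (N * (N^2 - 1))."""
    d = ranks(np.asarray(x, dtype=float)) - ranks(np.asarray(y, dtype=float))
    n = d.size
    return 1.0 - 6.0 * np.sum(d ** 2) / (n * (n ** 2 - 1))


def select_and_reconstruct(iscs, signal, ratio=0.1):
    """Keep ISCs with coefficient >= ratio * max coefficient and sum them."""
    rho = np.array([spearman(c, signal) for c in iscs])
    keep = rho >= ratio * rho.max()
    return keep, np.sum(np.asarray(iscs)[keep], axis=0)
```

For a maximum coefficient of 0.75, the default `ratio=0.1` reproduces the threshold of 0.075 used in the experiment.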

V. CONCLUSION
Multiscale entropy is a powerful tool for analyzing complex time series. However, it is difficult to characterize short-term datasets using the existing algorithms. Hence, an improved multiscale fuzzy entropy, MA_MFE, is developed, in which the moving-average algorithm is used to obtain the coarse-grained series for short-term complex time series. The results of the simulation analysis show that MA_MFE is more accurate and robust than the existing MFE and RCMFE techniques. In addition, by combining MA_MFE with a recent data-driven signal processing method, partly ensemble local characteristic-scale decomposition, an inherent MA_MFE feature extraction method is proposed. Furthermore, a novel fault detection scheme has been developed for mechanical systems. The comparison results on rolling bearings show that the proposed method works more effectively, reliably and quickly. In addition, the proposed model can also be successfully employed to identify incipient multiple abrasion of a piston pump, which shows that the proposed method can achieve incipient multi-fault diagnosis online even if only short-term samples and few training samples are available. Our further work will focus on real-time online fault diagnosis for large-scale machinery under variable working conditions using the proposed method.