Engine Working State Recognition Based on Optimized Variational Mode Decomposition and Expectation Maximization Algorithm

Increasingly energy and environmental crises put forward higher request on diesel engine. It promotes the development of diesel engine, while the complexity of structure is much higher, which leads to higher probability of faults. In order to recognize the states of engine in harsh environments effectively, variational mode decomposition (VMD) and expectation maximization (EM) are introduced into this paper to analyze multi-channel vibration signals. To select the decomposition level of VMD adaptively, a novel power spectrum segmentation based on scale-space representation is proposed for the optimization of VMD and results show this approach can discriminate different frequency components in high noise circumstance accurately and efficiently. To improve the adaptability and accuracy of EM, a feature selection approach based on genetic algorithm (GA) is introduced to preprocess original data and a cross validation method is used for selecting cluster number adaptively. Combined with these approaches, a diesel engine state recognition scheme based on multi-channel vibration signals using optimized VMD and EM is proposed. Compared with existing method, this scheme shows great advantages in accuracy and efficiency, and could be applied in actual engineering.


I. INTRODUCTION
As a frequently used power source, diesel engines are being widely employed in industry, agriculture and some special industries because of the advantages of low fuel consumption and large output torque [1]. With increasingly energy and environmental crises, a large number of laws and regulations for diesel engine have been enacted in many countries, which greatly promotes the development of new technologies such as turbo charging, high-pressure common-rail fuel system, electronically controlled fuel injection system, variable valve timing (VVT) and exhaust gas recirculation (EGR) and so on [2]. While the new technologies improve the performance of diesel engine significantly, complex structure increases the The associate editor coordinating the review of this manuscript and approving it for publication was Wei Wang . probability of failure. As one of the main parts in mechanical system, engine may lead serious security incidents under failure, so fault detection is an important research direction at present.
In order to ensure normal operation, fault detection system should recognize failures in early stage and issue a warning timely. So engine fault detection strategy develops from regular and disassembling diagnosis to real-time and non-disassembly diagnosis [3]. When developing the strategy, direct signal, such as rotational speed, temperature and pressure, is a traditional choice. For example, Xu et al used shaft instantaneous angular speed to extract characteristic parameters in order to detect misfire [4]. Although the direct signal is simple, it can hardly deal with other common faults such as mechanical failure and wear. To detecting various faults, some real-time acquiring indirect signal, such as noise VOLUME 8, 2020 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/ and vibration, is being a promising method, in which the vibration signal is a research focus because of easy measurement, low cost and strong robustness. For example, Jing et al analyzed vibration signals from cylinder head and detected valve clearance fault by it [5], Ftoutou et al used vibration signals to make research on fuel injection faults [6].
Owing to complex structure, so many vibration sources exist in engine that there is much noise in vibration signals. A number of signal processing algorithms have been researched to overcome this problem such as wavelet transform and empirical mode decomposition (EMD). Moosavian et al analyzed the performances of different mother wavelets and used wavelet transform to denoise vibration signals [7], Ma et al mixed wavelet transform and EMD to analyze vibration signals in order to diagnose abnormal combustion in engine [8], Li et al detected abnormal clearance between contacting components using EMD [9]. However, the wavelet transform used for signal decomposition is generally binary discrete wavelet transform (DWT), which just decomposes signals in Fourier spectrum mechanically. Moreover, wavelet basis has a great influence on decomposition while there is no definite conclusion on how to choose it. As an adaptable method, the EMD is built based on recursive model, which results in low robustness and mode mixing [10], [11]. In 2013, Gilles proposed empirical wavelet transform (EWT), which can build adaptable wavelet filter to extract components [12]. For the problem that meaningful modes should be pre-detected to provide necessary parameters for the wavelet filter, Gilles et al proposed a parameterless scale-space approach to overcome it [13], which improved the EWT and promoted its application in faults detection such as Wang et al used EWT analyzed the fault features of industrial bearing through vibration signals [14]. However, the problem of wavelet basis is still unresolved. Besides, the process of building wavelet filter is complex and the performance of filter is no guarantee of the best. In 2014, Dragomiretskiy et al proposed variational mode decomposition (VMD) based on variational model, which is an essentially adaptable Wiener filter bank and it is equating to the best filter under certain conditions [15]. The VMD has been used in faults detection for rotating machinery and showed great potential [16]. However, decomposition level in VMD should be determined manually, which increases the complexity and uncertainty. Fortunately, the parameterless scale-space approach also has ability to provide parameters for VMD. So the method is optimized to combine with VMD in order to build a better filter.
Although a lot of successes have been achieved in the diesel engine condition recognition and fault diagnosis based on time-frequency analysis, feature extraction and pattern recognition of single channel vibration signal, there are still some problems need solving. Firstly, in some harsh environments, unpredictable signal interference may lead to the misinterpretation of recognition rate. Secondly, every feature parameter has different sensitivity to different working states, which results in some features or feature combinations may not identify the working state effectively in certain case [17], [18]. For these reasons, multi-sensor information fusion technology is gradually attached the weight of researchers. Multi-sensors information resources can be fully and effectively utilized through data fusion method to gain the greatest resources of system under test in different view angles [19]. However, the increase of features has a negative impact, such as information redundancy and complex data processing on classification, and even reduce the recognition rate [20]. Accordingly, it is necessary to reduce the feature dimension by removing redundant and irrelevant features in order to improve classification efficiency and accuracy. Feature selection is a challenging task, especially for hundreds or thousands of features sub sets, in which genetic algorithm (GA) is a classic method to improve the searching efficiency. Stefano et al proposed an improved filter-based GA to search feature subsets with high discriminative power [21]. Mohammadi et al presented a methodology which can search and select a set of closure relationships for given experimental based on GA to minimize the error between measured and predicted pressure gradient [22]. Gangavarapu et al, used GA to optimize the subspace ensembling process of high-dimensional biomedical datasets feature subspaces [23]. The performance of GA highly depends on the selection of fitness function. However, various subsets of dataset may give best results with a different function [24].
Feature selection is a preprocessing technique for classification algorithm [25], [26]. X-means clustering algorithm is an extending K-means with efficient estimation of the number of clusters. It goes into action after each run of K-means, making local decision about which subset of current centroids should split themselves in order to better fit the data. The splitting decision is done by the Bayesian information criterion (BIC) [27], and the numbers of clusters are computed dynamically within preset lower and upper bound [28]. X-means algorithm has a good effect on processing low-dimensional dataset, while its performance on highdimensional dataset is unstable. Expectation maximization (EM) algorithm based on Gaussian mixture model has been being widely used due to its stability and simplicity [29]. Zhao et al proposed an improved EM algorithm for fault detection of air conditioning system [30]. Chen et al put forward an adaptive Gaussian mixture model to deal with the dynamic working process of rotating turbine engine disk, which improved the adaptability of fault detection model to turbine engine disk [31]. Tapana et al evaluated the performance of various clustering techniques and results showed that the EM clustering algorithm is the most effective and robust method for unlabeled data classification [32]. However, the conventional EM algorithm cannot choose the number of clusters adaptively and need to specify the number of clusters as an input parameter.
The paper is organized as follows: section 1 introduces the research background and significance, section 2 gives algorithms details, the experiment is described in section 3, the optimized VMD is proposed in section 4, analysis and contrast of experiment result by a novel approach based on EM are shown in section 5 and conclusion is given in section 6.

II. ALGORITHM THEORIES A. VMD THEORIES
The goal of VMD is to decompose input signal f into several modes, u k , i.e. intrinsic mode functions (IMFs), which are supposed to have specific sparsity properties and limited bandwidths. The IMFs are also assumed to be mostly compact around a center frequency, ω k [15].
Firstly, the u k is processed by Hilbert transform to get unilateral frequency spectrum: where p.v is Cauchy principal value, δ is Dirac distribution, * represents convolution and j 2 = −1. k ∈ [1, 2, . . . , K ], where K is the decomposition level in VMD, i.e. the number of IMFs.
Next, an estimated center frequency, e jω k t , is mixed to shift the unilateral frequency spectrum into baseband: Bandwidth can be computed by squared L 2 -norm of the gradient. Meanwhile, all of the IMFs can restructure the input signal f . Based on that, a constrained variational model can be obtained: where {u k } := {u 1 , u 2 , · · · , u K } and {ω k } := {ω 1 , ω 2 , · · · , ω K } are all IMFs and their center frequencies, respectively.
is the summation of all IMFs.
In order to solve the constrained variational model, Lagrangian multipliers, λ, and quadratic penalty α, are introduced to shift the model into unconstrained. λ and α are two classic ways to solve variational problem. The α is used to enforce constraints strictly and it can be ignored at low requirements for constraint. The α is used to balance the datafidelity constraint especially in high Gaussian noise condition and it is inversely proportional to the noise level in data based on Bayesian prior. The unconstrained variational model is: Based on Parseval / Plancherel Fourier isometry, it can be solved as: where n is the number of iterations, τ is time-step of the dual ascent and it is set as 0, i.e. taking no account of λ in this paper. Based on that, IMFs can be obtained by alternate direction method of multipliers (ADMM). The process of VMD is: x Initialize u 1 k , ω 1 k , λ 1 and n. y Compute u k , ω k and λ based on equation (5), (6) and (7) through ADMM.
z During the iteration in step y, results can be obtained when stopping condition k u

B. GENETIC ALGORITHM BASED FEATURES SELECTION
GA is a method to solve optimization inspired by biological processes of mutation, natural selection and genetic crossover, which is also a powerful feature selection tool, especially for feature set with large dimensions The basic genetic operators of GA are selection, crossover, and mutation [33].

1) SELECTION
Selection is based on individual fitness and influences its ability to reproduce into the next generation [34]. The probability of selecting individual h i is determined by (8): where F(h i ) is the fitness value of h i . The probability that an individual will be selected is proportional to its own fitness and is inversely proportional to the fitness of the other competing hypothesis in the current population.

2) CROSSOVER
Select two chromosomes than have high fitness values from the current population, exchange some bits of them and copy them into the new population. The location of these bits are random.

3) MUTATION
Select one chromosome that has a high fitness value from the current population, alter some bits of it and copy it into the new population [35]. VOLUME 8, 2020

C. EXPECTATION-MAXIMIZATION ALGORITHM FOR GAUSSIAN MIXTURES
Given a Gaussian mixture model, the goal is to maximize the likelihood function with respect to the parameters comprising the means and covariances of the components and the mixing coefficients. The main processes are shown as follows.
x Initialize means µ k , covariances k and mixing coefficients π k , and evaluate the initial value of log likelihood. y E step: Evaluate responsibilities based on current parameter values.
z M step: Re-estimate the parameters based on current responsibilities { Evaluate the log likelihood: Check for the convergence of parameters or the likelihood, and return to the Step y if the convergence criterion is not satisfied [36].
EM assigns probability distribution belonging to certain cluster for each instance.

III. EXPERIMENT
In order to research the approach of engine faults detection, author team built experiment system and designed experiment project. The experiment system consists of vibration testing and engine bench, as shown in Fig. 1. The engine bench system includes experimental engine and electrical dynamometer. The electrical dynamometer is manufactured by AVL List GmbH. The experimental engine is a turbo charged in-line 6-cylinder diesel engine and its parameters is listed in Table. 1.
The engine faults were designed and simulated on this bench test. Statistically, injection mechanism (27.0%), water leakage (17.3%) and valve and its seat are the most frequent faults in diesel engine [3]. Among these, water leakage generally leads to high water temperature, which can be easily   identified by instrumentations. So, the rest two faults are the main research objects in this paper. The inappropriate valve clearance is mainly manifested in valve and its seat fault caused by assembly error or abrasion, for which big and small valve clearances are researched. The injection mechanism fault is complex, and it is divided into injection timing and rail pressure in this paper.
Vibration testing system includes acceleration sensors, data acquisition and processing system. During engine operation, cylinder explosion pressure and valve bounce have a direct impact on cylinder head, so the acceleration sensors are placed on cylinder head cover to collected vibration signals with high signal-to-noise ratio (SNR) and rich information, as shown in Fig. 2. Main parameters of testing equipment are listed in Table 2.
Considering that the characteristic frequency of engine fault is below 10 kHz, sampling frequency is set as f s = 25.6kHz. The experiment simulates the failure of valve gap, different rail pressure parameters and different injection timing. Speed conditions are selected as 1600r/min and 2100r/min, load condition and other operating parameters setting are shown in Table 3.  Where the valve clearance '(0.4, 0.5)' means that the inlet valve clearance is 0.4 mm, the exhaust valve clearance is 0.5 mm, and so does the rest; tag 'N' means normal operating state parameter.

IV. SIGNAL DECOMPOSITION APPROACH BASED ON OPTIMIZED VMD
The decomposition level K in VMD should selected manually, which increases complexity and subjectivity. An unsuitable K could results in over or under decomposition and influences the accurate of subsequent analysis. To solve this problem, spectrum segmentation is introduced to optimized VMD.

A. OPTIMIZED VMD BASED ON FOURIER SPECTRUM SEGMENTATION
Gilles et al proposed a spectrum segmentation approach to detect meaningful modes for EWT [13]. VMD is a better filter and the selection of decomposition level K is essentially detecting meaningful modes. The spectrum segmentation approach is optimized to build VMD filter in this section.
Suppose the Fourier spectrum function of where * represents convolution. The number of minima with respect to x of L(x, t) is a decreasing function of the scale parameter t, and no new minima appear as t increases. Every minima corresponds to a curve in scale-space, and the length of curve is L i . When decomposing signal with VMD, selecting decomposition level can be transformed into finding boundaries delimiting consecutive modes that is finding local minima in Fourier spectrum. Based on scale-space representation, this problem is equivalent to find curves which meet a certain threshold.
The threshold is selected by the principle of histogram. Firstly, sort the curves of local minima in ascending order by their length and obtain L sort = {L 1 , L 2 , · · · , L M }, where M is the number of local minima. Next, suppose where n is a positive integer less than L M . Finally, decrease the n one by one until the proportion of L i ∈ [L threshold , L M ] to L sort is just less than 1/M , and the L threshold is taken as final threshold. Suppose the number of boundaries is N , the decomposition level of VMD is K = N + 1 through considering of frequency range (i.e. [0, fs/2]). Unlike EWT, VMD needs center frequencies, instead of the boundaries, to build filter. The frequency at peak value of every segment in Fourier spectrum is take as the center frequency, and the K center frequencies can be obtained: So the adjustment added to step x in VMD algorithm is: Analyze input signal by spectrum segmentation approach, set the decomposition level K , initialize the center frequency The rest processes remain unchanged and the desired VMD result can be obtained.

B. VERIFICATION WITH SIMULATED SIGNAL
To verify the advantages of the optimized VMD, a simulated signal is selected to be decomposed by the VMD and the EWT. The simulated signal is shown as equation (16).
where t ∈ [0, 0.01], f 1 = 6500, f 2 = 3000, f 3 = 10000, η, i.e. s 4 , represents Gauss white noise of 25dBw. Sampling frequency is set as 25.6 kHz. Based on that, the simulated signal and its Fourier spectrum are plotted as Fig. 3, in which the s 1 , s 2 and s 3 simulate periodic impact component, low frequency sinusoidal component and high frequency sinusoidal component,respectively.
Based on the scale-space segmentation approach, the Fourier spectrum is segmented as Fig. 4. Special to note is that the amplitude of Fourier spectrum is absolute value of single-sided fast Fourier transform (FFT) instead of the real value. The real value is calculated for the absolute value by linear transformation, which means analyzing the absolute value directly is a more efficient way.
As shown in Fig. 4, the Fourier spectrum segmentation approach has ability to divide components with different frequencies. Based on the segmentation result, VMD and EWT   filters can be built and their decomposition results are shown as Fig. 5 and Fig. 6, respectively. In the VMD decomposition, the quadratic penalty is selected as α = 2200 by data analysis.
As shown in Fig. 5 and Fig. 6, VMD obtains better result compared with EWT. For the result of VMD, the IMFs are near to desired narrow-bandwidth original signals. For the results of EWT, IMF1 and IMF2 contain much noise  compared with the s 2 and s 3 respectively, which are sinusoidal signals with single frequency (as shown in Fig. 3). To explain the advantages of VMD, traditional decomposing algorithms, ensemble empirical mode decomposition (EEMD) and local mean decomposition (LMD) are employed to decompose the simulated signal and the results are shown in Fig. 7 and Fig. 8. There are 9 IMFs in the result of EEMD and 7 IMFs in the results of LMD, and the first 5 IMFs in each result are shown for the rest ones are obvious illusive components.
As shown in Fig. 7 and Fig. 8, neither EEMD nor LMD could decompose high frequency components completely, which means s 1 and s 3 cannot be distinguished. The bandwidths of IMF1 and IMF2 are so large that much noise exists in them, which effects the recognition of faults badly. The reason for this shortcoming is that EEMD and LMD are built based on recursive model and have low robustness.
Because of low robustness, EWT, EEMD and LMD have no advantages in decomposing signals with complex frequency and high noise, such as engine vibration signals.  In order to verify it, the correlation coefficients between results and original components are computed as Table 4.
As shown in Table 4, the correlation coefficients between VMD results and original components are around 0.95. As for EWT, the correlation coefficients for s 2 and s 3 are 0.78 and 0.89 respectively, which are unacceptable values. Although the correlation coefficient between s 1 and IMF2 is 0.97, the result in VMD, i.e. 0.94, has very little influence on recognition. The result also verifies the VMD has better performance in filtering noise. As for EEMD and LMD, they cannot extract high frequency sinusoidal component (i.e. s 3 ). Besides, although the correlation coefficients of s 1 are high, the frequency domains show that the high frequency sinusoidal component (i.e. s 3 ) still exists (its energy is low so that has little effect on the computation of correlation coefficients). The correlation coefficients of s 2 are 0.66 and 0.73 respectively, which are much lower than 0.94. In conclusion, the optimized VMD based Fourier spectrum segmentation is a realistic and accurate approach.

C. OPTIMIZED VMD BASED ON POWER SPECTRUM SEGMENTATION
As shown in last section, the Fourier spectrum segmentation can provide accurate and stable parameters for building VMD in order to reduce subjectivity and complexity. However, with in-depth research, author team finds this approach is struggling in dividing approximating frequencies. The f 2 simulated  signal shown in equation (16) is shifted from 3000 to 4000 in this section, and the new simulated signal is also divided by Fourier spectrum segmentation. As shown in Fig. 9, only one boundary at 8.93 kHz is obtained, which divides the Fourier spectrum into 2 components.
The reason for that is noise and the approximating frequencies reduce the salience of characteristic components. In order to overcome this problem, a spectrum with similar waveform of Fourier spectrum and salient characteristics should be researched. Based on that, power spectrum is used to divide signal. The power spectrum computed as follows: Firstly, remove the mean of input signal s: where s mean is the mean of s. Next, compute the Fourier spectrum of s in , and the unilateral Fourier spectrum is F(s in ) = {f 1 , f 2 , · · · f m−1 , f m }. So the power spectrum (PS) is: where f n is the adjoint of f n , n is the length of input signal s. Finally, divide the input signal in scale-space by the power spectrum.
As shown in the description, power spectrum can remain the similar waveform as well as make characteristics stand out. In order to verify this approach, the new simulated signal with f 2 = 4000 is divided by it and shown in Fig. 10.  As shown in Fig. 10, the power spectrum can reduce the influence of noise on segmentation. Two desired boundaries of 4.48 kHz and 8.47 kHz are obtained, which proves this approach is better than Fourier spectrum segmentation.
To verify the applicability of power spectrum segmentation, a measured signal collected from Y-direction (the horizontal which is perpendicular to crankshaft) near the 3rd cylinder at 2100r/min and 100% load is used to test its performance. The duration of the measured signal is an engine cycle and the time domain is shown in Fig. 11.
For this signal, the segmentation result of Fourier spectrum is shown as Fig. 12, 6 boundaries of 1.43 kHz, 3.68 kHz, 4.51 kHz, 7.04 kHz, 9.22 kHz and 11.57 kHz are obtained.
As shown in Fig. 12, 6 boundaries divide the signal into 7 intervals. Among these, the 4th (between 4.51 kHz and 7.04 kHz) and 5th (between 7.04 kHz and 9.22 kHz) intervals is complex. For the 4th interval, it can be identified as a combustion component. According to the simulated signal, periodic impact has wide bandwidth and several extremely close peaks in spectrum, which means this interval is divided appropriately. However, for the 5th interval, there are 2 subsections in it obviously (the segmentation point is near 8 kHz), whose characteristics are different from periodic impact. So the power spectrum is used to divide the signal, and result is shown in Fig. 13.
As shown in Fig. 13, characteristic frequencies stand out significantly, and 7 boundaries of 1.   2 subsections and get more clear segmentation. The result shows the power spectrum has better performance in spectrum analysis and robustness than Fourier spectrum. Based on that, the VMD optimized by power spectrum is used to decompose the measured signal, and the result is shown as Fig. 14.
As shown in Fig. 14, the frequencies of IMFs correspond to the power spectrum segmentation accurately, and ideal narrow bandwidth components with little noise are obtained. Besides, all the IMFs cover the analysis frequency as well as no overlap between them. To explain the advantages of VMD optimized by power spectrum segmentation further, the signal is also decomposed by the VMD optimized by Fourier spectrum segmentation and result is shown in Fig. 15.
As shown in Fig. 15, the component between 8 kHz and 10 kHz is not extracted by the VMD optimized by Fourier spectrum, which means this algorithm has low ability in decomposing components with similar frequencies and may loss important information during decomposition. Overall, the VMD optimized by power spectrum segmentation is an accurate and advanced approach.

V. EXPERIMENTAL ANALYSIS BASED ON OPTIMIZED EXPECTAION-MAXIMIZATION A. DATA PREPARATION
The selected vibration signals are collected synchronously from three sensors, two of which are located on the wall of the first and the third cylinders and another one located on the cylinder head cover corresponding to the first cylinder.
The vibration data of each channel is divided into certain length depends on the diesel engine rotating speed and sampling rate. In order to include complete working cycle of diesel engine. The length of segment can be expressed as (19). L = 60 * fs * 2/r (19) where L is the length of segment, fs is sampling frequency, r is rotating speed, the sign '' '' represents round up operator. In order to obtain effective feature subset, original data is decomposed by optimized VMD and the characteristic components are restructured. Nine features are exacted from the restructured signals, the features are shown as follows: Considering that signals from 3 channels are analyzed, there are 27 features exacted, where Ch1, Ch2 and Ch3 indicate the first, second and third channels respectively.
The total of 27 features exacted form the signals of 3 channels are shown as follows:

B. OPTIMIZED EXPECTATION-MAXIMATION ALGORITHM
To improve the adaptability and accuracy of EM, a fitness function of correlation-based feature selection (CFS) approach for GA is introduced to preprocess original data and a cross validation method is used for EM algorithm to select cluster number adaptively.

1) FITNESS FUNCTION OF CORRELATION-BASED FEATURE SELECTION ALGORITHM
The fitness function is a criterion used by attribute subset valuator to measure the worth of an attribute subset. In this article, the CFS evaluator is used as fitness function for GA [33], the process is shown as Fig. 16.
CFS is a filter algorithm that evaluates feature subsets according to correlation based heuristic evaluation function. The bias of the evaluation function is toward feature subsets, which is highly correlated with the class and uncorrelated with each other. The acceptance of a feature will depend on the extent to which it predicts classes in areas of the instance space not already predicted by other features [37]. CFS's feature subset evaluation function is shown as (20): where M S is the worth of feature subset S containing k features, r cf is the mean feature-class correlation (f ∈ S), and r ff is the average feature-feature inter correlation. The numerator can be thought as providing an indication to predict the class of feature set, and the denominator indicates the redundancy among features. Equation (20) is the core of CFS and imposes a ranking on feature subsets in the search space of all possible feature subsets.

2) ESTIMATING THE NUMBER OF CLUSTERS USING CROSS-VALIDATION
EM clustering method needs specifying the number of clusters as an input parameter, but in some cases this value is unknown. Many solutions have been proposed to choose the number of clusters automatically, most of them rely on modeling assumptions, but main difficulty in choosing the numbers of clusters is that clustering is fundamentally an ''unsupervised'' learning problem, meaning that there is no obvious way to use ''prediction ability'' to drive the model selection [38]. Cross-validation (CV) method is a data-driven approach to estimate the number of clusters, and it is adaptive to the characteristics of data distribution [39]. Specially, in this article, 10-fold cross validation method was adopted, and the details are shown in following steps: x. The number of clusters is set as 1 y. The training set is split randomly into 10 folds. z. EM is performed 10 times using the 10 folds the usual CV way.
{. The log-likelihood is averaged over all 10 results. |. If the log-likelihood increases, the number of clusters is increased by 1 and the program continues at step y.
The number of folds is fixed to 10, as long as the number of instances in the training set is not smaller 10. If this is the case, the number of folds is set equal to the number of instances [40].

C. EXPERIMENTAL ANALYSIS
Experiments were carried out under the condition of different rail pressures, injection timings and valve clearances, respectively (for detailed parameter settings see TABLE  3). 40 samples at 1600r/min in different working state are selected respectively, and the optimized VMD algorithm was implemented to process the vibration signal acquired synchronously from three vibration sensors at first, then 27 features are extracted from them and preprocessed with feature selection method. Finally, the different working states are classified by EM. The detailed flowchart of data processing is shown as Fig 17. As shown in Fig 17, the power spectrum and scalespace representation are employed to optimize VMD in order to decompose signals adaptively. Then, the features are extracted from the reconstructed signals. These complex features lead to high computational cost and even reduce the recognition rate, which impels the using of feature selection algorithm. The CFS is selected and GA is used to optimize its performance. After all of these processes, EM optimized by cross validation is used to classify the different working states of engine, in which the cross validation is a method to make the EM selecting the number of clusters adaptively.
To verify the advantages of this scheme, two different unsupervised and adaptive clustering methods, X-means and EM with cross validation, are used to cluster the samples with and without feature selection processing, respectively. For clearly comparison, the classification accuracy is defined as (21): (21) where N accuracy is the number of correctly clustered samples; N original is the number of original samples in a certain class. In this experiment, the value of N original is 40.
In this paper, the result of cluster is shown in twodimensional chart and the classification accuracy is calculated if the number of clusters is correct, otherwise only the result of cluster is shown in two-dimensional chart. In particularly, only MS_Ch1 is shown in charts for limited space because it is a feature exits in every example. The results were shown as follows.

1) CLUSTERING RESULT OF DIFFERENT INJECTION TIMING
An optimal subset selected from the original features is shown blow, the number of selected features is 15: Four kinds of injection timing parameters are designed in the experiment, so the correct number of clusters is 4.  There are 40 samples for each working state. Experimental clustering result is shown as follows: As shown in Fig. 18 and Fig. 19, no matter feature selection is carried out or not, X-means cannot get the correct number of clusters. In Fig. 20, EM algorithm also cannot gain the right number of clusters without feature selection. In Fig. 21, after preprocessing of feature selection, the desired cluster result is obtained by EM algorithm.

2) CLUSTERING RESULT OF DIFFERENT RAIL PRESSURES
An optimal subset selected from the original features is shown blow, the number of selected features is 13: Four kinds of rail pressure parameters are designed in the experiment, and the correct number of clusters is 4. There are 40 samples for each working state. Experimental clustering result is shown as Fig. 22 -Fig. 25.
As shown in Fig. 22 and Fig. 24, without preprocessing of feature selection, both X-means and EM algorithm can get the correct number of clusters and the average classi-   fication accuracy of each algorithm is 73.75%. In Fig. 23, X-means cannot gain correct number of clusters with featureselection preprocessing. In Fig. 25, with feature-selection preprocessing, EM algorithm achieves better result compared with Fig. 22 and Fig. 24, and the average classification accuracy is 85.63%.

3) CLUSTERING RESULT OF DIFFERENT VALVE CLEARANCES
An optimal subset selected from the original features is shown blow, the number of selected features is 9:   The experiment design three group of different valve clearances, and the correct number of clusters is 3. There are 40 samples for each working state. Experimental clustering result is shown as follows: As shown in Fig. 26 and Fig. 27, no matter feature selection is carried out or not, X-means cannot distinguish three different states of valve clearance. In Fig. 28 and Fig. 29, three different states are correctly classified by EM algorithm without or with feature selection preprocessing.
It can be deduced from the results that: the EM clustering method with features selection achieves the accurate number of clusters and attains the average classification accuracy of 89.38% for different injection timing parameters. For different rail pressure parameters, X-means and EM both can achieve correct cluster number without features selection, and also achieve the same average classification accuracy of 73.50%. On the contrary, only EM method can get the correct number of clusters after features selection and attains high average classification accuracy of 85.63%. For different valve clearances, X-means method cannot get the right number of clusters no matter with or without features selection. However, EM achieve both correct number of clusters and high average classification accuracy of 97.50%. In summary, the analysis of three types of parameters adjustment experiments in diesel engine validates the EM based on feature selection optimized by genetic algorithm and cross validation has the ability to obtain cluster numbers adaptively and is effective for the classification of different diesel engine working states.

VI. CONCLUSION AND OUTLOOK
In order to recognize states of diesel engine, a novel optimization approach based VMD and EM using multi-channel feature fusion algorithm is proposed: (1) Segmentation based on scale-space representation is introduce to analyze the frequency components of original vibration signals. Considering low robustness of Fourier spectrum segmentation, power spectrum is used to discriminate frequency components because of its similarity to Fourier spectrum and good performance in noise suppression. Based on that, the decomposition level of VMD can be selected adaptively, which could provide accurate time-frequency analysis results for subsequent recognition.
(2) After extracting features from multi-channel vibration signals, the GA and CFS fitness function are used for features selection to reduce dimensions of data. The cross validation is introduced to optimized EM algorithm in order to cluster feature subset adaptively.
(3) A diesel engine state recognition scheme based on optimized VMD, feature selection and optimized EM using multi-channel vibration signal is proposed. To prove the advantages of it, classic X-means and the novel scheme are used to analyze different injection timing, rail pressure and valve clearance states. Comparison shows the proposed method has better stability and classification accuracy, and it has certain engineering and theoretical significance.
However, there are some further works need to do: (1) Quadratic penalty is another important parameter in VMD, which has a big effect on decomposition result, and it is selected as α = 2200 by data analysis in this paper. Although obtaining acceptable results, this parameter should be researched further in future work.
(2) There are three main feature selection methods: filter method, wrapper method and embedded method. The feature selection based on GA and CFS fitness function in this paper is one of the filter methods, whose advantage is high efficiency. The shortage is that it is independent of classifier, which results in the selected feature subset maybe not the best. Other feature selection methods will be researched in succeeding work.
XIAOBO BI received the master's degree in vehicle engineering from the Hebei University of Technology, Tianjin, China. He is currently pursuing the Ph.D. degree in power machine and engineering with the State Key Laboratory of Engines, Tianjin University, Tianjin. He is also a Senior Lecturer with the Hebei University of Technology. His research interests include multiinformation fusion fault diagnosis of ICE and NVH control of vehicle.
JIANSHENG LIN is currently a Ph.D. Supervisor and a Professor with the Tianjin University of Technology. His current research interests include multibody dynamics simulation and analysis and internal combustion engine working process research.
FENGRONG BI received the Ph.D. degree in power machinery and engineering from Tianjin University, Tianjin, China. He is currently a Ph.D. Supervisor and a Professor with Tianjin University. His current research interests include the control of vibration and noise and the fault diagnosis of vehicles and power machineries, with specific investigation on excitation mechanism, transfer characteristics, assessment, and control methods of NVH.