Impulse Feature Extraction of Bearing Faults Based on Convolutive Nonnegative Matrix Factorization

The detection of bearing faults is of great significance for the stable operation of rotating machinery. Apart from detection by analysing the vibration response, another powerful strategy is to extract the periodic impulse excitation directly induced by faults, which efficiently eliminates the influence of the transmission path and noise. Typical methods of this strategy include maximum correlated kurtosis deconvolution (MCKD) and multipoint optimal minimum entropy deconvolution (MOMEDA). However, these deconvolution methods based on maximizing a certain measurement index are still insufficient at finding the correct fault period directly because of the interference of noise components. To effectively extract the periodic impulse excitation from the vibration response, a new impulse feature extraction method from the vibration spectrogram based on convex hull convolutive nonnegative matrix factorization (CH-CNMF) is proposed. As the spectrogram intuitively reveals the time and frequency information of the impulse response generated by the fault excitation, according to the decomposition characteristics of CH-CNMF, the time-frequency structure of the impulse response is represented by the basis tensor, while the weight matrix corresponds to the impulse excitation. Meanwhile, autocorrelation is adopted to enhance the periodic impulse excitation. Finally, based on power spectral entropy and first-order correlated kurtosis, the optimal periodic pulse can be selected from the autocorrelation curves of the weight matrix. Both numerical simulation and experimental verifications on bearings indicate that the proposed method can eliminate the influence of random shock excitations and directly attain the periodic impulse for the source of the bearing fault, and that its extraction effectiveness outperforms MOMEDA.


I. INTRODUCTION
The rolling bearing is one of the most widely used universal parts in various rotating machines. When there is local damage on the surface of a bearing element, it will strike the surface of other elements during rotary motions, generating an impulse force. This impulse will arouse the high-frequency natural vibration of the components of the bearing system, resulting in a periodic impulse response due to the bearing The associate editor coordinating the review of this manuscript and approving it for publication was Donato Impedovo . rotation. Therefore, the period of the impulse excitation corresponds to the frequency of the fault feature. Extracting this period of impulse excitation is a key component of bearing fault diagnosis. Specifically, the impulse of incipient faults is relatively weak and often buried in heavy background noise. Moreover, an abnormal interference component might mislead the classic response band optimization techniques in the wrong direction, ignoring the information related to the actual fault [1]. Methods such as spectrum analysis [2], [3], cepstral analysis [4], [5], envelope analysis [6], [7], and timefrequency analysis [8], [9] usually focus on finding the period VOLUME 8, 2020 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ of the attenuation response, which is often obscured by the transmission path, noise and other factors. Essentially, the collected vibration signal is a convolution of the impulse excitation and the transfer function, so some scholars employ deconvolution as an effective way to extract the impulse excitation. Most of the existing deconvolution technologies are based on a certain measurement index, taking the maximum index as the optimization target and constructing an inverse filter to eliminate the convolution modulation, thus attaining impulse excitation. Endo et al. [10] combined the minimum entropy deconvolution technique (MED) [11] with an autoregressive model, and proposed the autoregressive minimum entropy deconvolution (ARMED) technique. Sawalhi et al. [3] extended ARMED to the identification of rolling bearing faults in combination with spectral kurtosis. However, the deconvoluted feature information appears as a single large-value impulse or irregular discrete impulse, which is a serious flaw in the judgement of the fault characteristics. Since then, McDonld et al. [12] proposed the maximum correlated kurtosis deconvolution (MCKD) technique to compensate for the shortcomings by absorbing the periodic pattern of the signal after deconvolution. However, the MCKD method is dependent on the period information of the impulse source being estimated in advance, and it often requires resetting the sampling manually according to the sampling process. Inspired by the MED technique, Miao et al. [13] proposed the sparse maximum harmonicsto-noise-ratio deconvolution technique, but this method still requires accurate input parameters to achieve an optimal result. To overcome the shortcomings of these methods, McDonald and Zhao [14] developed the multipoint optimal minimum entropy deconvolution (MOMEDA) technique that does not require the periodic information of the fault source. All of the above deconvolution algorithms, which use the statistical index as the objective function to optimize the solution, often determine an incorrect fault period because of noise interference.
Since the time-frequency distribution (TFD) can simultaneously exhibit the time and frequency distribution of the vibration response, it provides a favourable condition for the deconvolution of the impulse response. Although Wodecki et al. [15] and Liang et al. [16] both applied nonnegative matrix factorization (NMF) on TFD to extract fault features, the decomposed basis matrix does not consider the time-varying characteristic of the signal, and it is difficult to extract the impulse excitation of the fault source. However, in speech signal processing, Smaragdis [17] used a convolution model to further extend the NMF to a convolutive NMF (CNMF) and employed it to extract the local time-frequency basis. O'grady and Pearlmutter [18] also applied CNMF to audio spectra, successfully forming an over-complete basis for the speech signal and achieving supervised monophonic speech separation. Smaragdis and O'grady used the CNMF model to extract the variable frequency components of speech signals based on the deconvolution characteristic of CNMF. By virtue of the deconvolution process of the signal, the CNMF model can integrate the time information into the basis matrix. Thus, when applying CNMF on the TFD of a vibration response, the decomposed time-frequency bases represent the impulse response, and the weight vector is the impulse excitation of the fault source, which corresponds to the time location of each impulse response. Therefore, CNMF provides an effective solution to deconvolute the impulse source signal directly from the observed signal, and the susceptibility of the kurtosis statistical indicator to irregular impulses can be effectively eliminated.
Based on the deconvolution characteristics of CNMF, this paper proposes an impulse feature extraction method for bearing signals with localized faults: after decomposing the TFD of the signal by convex-hull convolutive NMF (CH-CNMF), the obtained basis tensor reflects the time-frequency bases of the impulse response, and the weight matrix contains the location of the corresponding impulse excitation. Autocorrelation processing is also implemented on the weight matrix to enhance the periodic components within the signal. Considering the periodicity of the impulse excitation caused by failures, the optimal impulse excitation can be selected by combining the power spectral entropy (PSE) and the firstorder correlated kurtosis (CK 1 ). The major innovation of this paper is the proposal of a new deconvolution strategy that directly extracts the time-frequency structure of the impulse from the TFD, without constructing an inverse filter for deconvolution based on a single index. This method fundamentally addresses the defect that a filter based on a single index is susceptible to interference, which enables obtaining more accurate results.
The rest of this paper is organized as follows. Section 2 gives a brief description of NMF and the characteristics of performing NMF on TFD, and then describes the CNMF algorithm and outlines the characteristics of applying CH-CNMF to deconvolute the impulse excitation. Section 3 introduces the basic principle and procedure of the proposed method. Section 4 verifies the effectiveness using a simulated signal and actual fault bearing data. Conclusions are drawn in Section 5.

A. NMF AND DECOMPOSITION ON TFD
The conventional basic nonnegative matrix factorization [19] decomposes a given nonnegative matrix V = v 1 v 2 · · · v n ∈ R m×n into the product of two nonnegative matrices W = w 1 w 2 · · · w r ∈ R m×r and H = h 1 h 2 · · · h r T ∈ R r×n , such that the decomposition is represented as: where W and H are called the basis and weight, respectively, and r is the low rank of the embedding space. Normally, (m + n)r < mn.
The traditional NMF algorithm has the shortcoming of slow convergence speed [20]. To improve the performance of NMF, Ding et al. [21] proposed the Convex NMF algorithm. The W in Convex NMF is obtained by the linear combination of the original matrix V. Set G as the combination coefficient, then: where g i = 1 and g i ≥ 0. Compared with traditional NMF, Convex NMF provides more practical results and is more conducive to finding a sparse solution [21].
Because Convex NMF does not consider the influence of the weight matrix H, Thurau et al. [22] imposed another constraint on this basis (named Convex-Hull NMF), and Equation (2) can be rewritten by: If X = VG represents all vertices on a convex hull C(V), there must be such an H such that V = XH, suggesting that the cost function J = V − XH 2 can be 0. As the amount of data must be much larger than the number of C(V) vertices, fewer convex hull vertices can achieve the optimization goal. Therefore, not only in the convergence speed but also in the result representation, the result of CH-NMF is superior to that of Convex NMF and traditional NMF [22].
As a traditional nonnegative matrix, the time-frequency distribution can be extracted by a short-time Fourier transform (STFT) where w(t − τ ) is a shifted window. Here, the time-frequency distribution can be represented as V = |STFT (t, f )|. When NMF is applied to V, each basis vector w i in W can be regarded as a cluster in the frequency domain, and each weight vector h i in H is a cluster in the time domain. Thus, the representation of V with w i and h i can be given as [23]: where E represents error or noise. Liang et al. [16] used two intermittent sinusoidal signals of different frequencies to illustrate the clustering and correspondence. The decomposition is shown in Fig.1. Both w 1 and h 1 describe the signal of frequency f 2 , and both w 2 and h 2 describe the signal of frequency f 1 . The peak value of w 1 corresponds to the frequency of 2200 Hz, and w 2 peaks at the frequency of 1000 Hz. As the distribution of the signal in the time domain, h 1 corresponds to the time envelope of the frequency f 2 , while h 2 corresponds to the time envelope of the frequency f 1 . Obviously, the frequency and time distribution of the vibration response in the TFD can be extracted by NMF. For the obtained basis matrix, because the basis vector only reflects the frequency and has no time structure, the variable frequency or the vibration response distribution of the timevarying signal is not reflected.

B. CNMF AND EXTRACTION OF TIME-FREQUENCY BASIS
To absorb the time information into the spectral basis matrix, it is necessary to consider the potential correlation among adjacent columns of the input time-frequency matrix V. Therefore, Smaragdis [17] employs the convolution model to extend the basic NMF to CNMF, and decomposes the input matrix into the sum of the continuous basis matrix sequence W t and the corresponding coefficient matrix H: where T is the length of the shifted sequence, i→ (•) is an operator that shifts the column to the right by i bits, and the left shift portion is padded with zeros.
Based on CNMF, Vaz et al. [24] proposed the Convex-Hull CNMF method (CH-CNMF), which attempts to find K time components in V at any time t, and the decomposition can be defined by where S ∈ R m×p are p vertices of C(V), and G ∈ R p×K ×T + is the combination coefficient. Each time series W (t) = SG(t) ∈ R m×K + is the basis matrix obtained by the decomposition of V, and t→ H ∈ R K ×n + is the weight matrix. To improve the performance of CH-CNMF, the sparse constraint is added to the weight matrix H, and the following VOLUME 8, 2020 cost function is constructed by: where λ is the sparsity-regularization parameter. The sparsity constraint improves the interpretability of the basis vector because the weight vector is closer to the sparse impulse signal. By updating G and H alternately by iteration until they reach the predetermined convergence result or iteration steps, the update rules are as follows: where  To illustrate the decomposition result of CH-CNMF, we select the periodic swept-frequency cosine (chirp) signal for analysis, and the input signal is a combination of two periodic chirp signals. The lower frequency signal increases from 100 Hz to 150 Hz, the higher frequency component decreases from 400 Hz to 300 Hz, and the interference component is added at 1.07 s. With K = 2 and T = 0.2 s, the decomposition result is presented in Fig.2. It can be seen that the basis W extracted by CH-CNMF can effectively reflect the local time-frequency basis in the non-stationary signal.
Inconsistent with NMF, the weight matrix extracted by CH-CNMF corresponds to the onset time of each variable frequency component, which indicates that CH-CNMF is a deconvolution process. Moreover, although w 1 contains an interference component, there is no clear performance in h 1 , which implies that the individual interference signals have less impact on the weight matrix structure.

C. DECONVOLUTION OF VIBRATION RESPONSE BY CNMF
When a local fault occurs in the rolling bearing, periodic impulse excitation h(t) will occur under the rotation of the bearing [25]. These excitations are convoluted by the response of the bearing system and are reflected in the vibration signal as a periodic attenuation waveform. Therefore, the impulse response x(t) can be expressed as: where f (t) represents the transfer function. Usually, where ζ r is the attenuation factor, A is the initial amplitude, ϕ is the initial phase, and f r denotes the natural frequency. The response mechanism of the rolling bearing can be expressed in a graphical form as shown in Fig.3. Then the observed signal y(t) can be expressed as: where e(t) denotes noise and other interference components. For the signal in Fig.3, the carrier frequency of the periodic attenuation response corresponds to the natural frequency of the bearing system, and the periodicity of the impulse corresponds to the frequency associated with the fault. Thus, the deconvolution of the vibration response is efficacious in extracting the impulse excitation h(t).
When we apply CH-CNMF to the periodic vibration response signal (as shown in Fig.4), the decomposed w 1 is the time-frequency distribution of a single impulse response, and the weight vector h 1 is the impulse excitation h(t), which corresponds to the time location of each impulse. These results suggest that CH-CNMF provides an effective method to deconvolute the impulse source h(t) directly from the observed signal y(t).

III. IMPULSE FEATURE EXTRACTION WITH DECONVOLUTION
Suppose that there is a time domain signal: (13) where A k is the amplitude of the impulse, δ(t) is the Dirac delta function, and T 0 is the period of the impulse. The Fourier transform of y(t) is:  where Y (f ), F(f ), and E(f ) are the Fourier transforms of y(t), f (t), and e(t), respectively. To extract this impulse signal, the following main steps are proposed.

A. DECOMPOSITION OF VIBRATION RESPONSE
With the determined decomposition dimension K and time T , the time-frequency distribution V = STFT y (t, f ) is decomposed by CH-CNMF. The parameter K , a common empirical value, has not been strictly defined in previous studies. However, to completely separate the fault signal, we introduce a limitation, a ≤ K ≤ a + 3, to reduce overlap among resonance bands, where a is the number of frequency resonance regions with larger amplitude or known frequency components. Appropriately increasing the value of K can separate the frequency components and avoid the aliasing of resonance peaks. It is difficult to extract the response signal with too small a value of T , and too large a value increases the spatial complexity. The time T should be guaranteed to contain at least most of the impulse attenuation time t s . According to the impulse response mechanism of the rolling bearing, if the time taken for the response amplitude to reach ε% of the initial amplitude is t s , then there will be: From Equation (15), we can obtain t s = − ln ε% ζ r . However, ζ r is difficult to determine or even unknown for more complex signals, so it is difficult to use t s to determine T . In fact, as long as T ≥ t s , the decomposition time dimension already contains at least one impulse attenuation response (as shown in Fig.5), and it is possible to extract the fault information. Although it is difficult to determine t s directly, it is feasible to predict T 0 , and T 0 must contain an impulse attenuation response, so T is set to be near T 0 in this paper.

B. ENHANCEMENT OF IMPULSE FEATURE
By virtue of the decomposition, the basis tensor W encompasses matrices w i that contain the time-frequency basis of the impulse response f (t). The corresponding weight vector h i will also have the impulse excitation N k=1 A k δ (t − kT 0 ). VOLUME 8, 2020 The weight vector h i can be expressed as: where B k represents the amplitude of the impulse, B k = A k , and n(t) represents noise.
To diminish noise and make h i (t) closer to h(t), bandpass filtering with an approximate passband ranging from 2f r to 2f h is adopted to remove the lower frequency component, especially the rotation frequency, and the high-frequency interference, where f r is the rotational frequency of the shaft and f h is the fault characteristic frequency. In case the fault type is unknown, it is recommended to set f h to the maximum characteristic value of the inner ring fault to contain more information of various fault types. Thus, the filtered signal is: As the regularization term in Equation (8) merely enhances the sparseness of the impulse source and does not account for the critical period characteristic of the fault, the autocorrelation of h i f (t) is adopted to further weaken the effect of aperiodic components: The autocorrelation has the benefit of removing the uncorrelated components of the signal, i.e. noise and random impulsive contents, that are unrelated to any specific bearing fault [26]. Furthermore, the periodic excitation in h is enhanced and higher periodic peaks in the autocorrelation suggest more pronounced periodicity of the signal.

C. OPTIMAL SELECTION OF IMPULSE FEATURE
Since impulse excitation usually generates vibrations with multiple natural frequencies, the decomposition dimension generally satisfies K > a to ensure complete decomposition, which implies that more than one of the K autocorrelation curves contains the impulse source information. Therefore, it is necessary to determine the optimal autocorrelation curve that best reflects the fault impulse signal. The optimal objective is to select the curve with the simplest structure and the most prominent impulse period T 0 .
First, the power spectral entropy (PSE) is used to evaluate the spectral structure of the signal. For the autocorrelation curve R i h (τ ) of each weight vector, its Fourier transform is Then its power spectrum is S = S 1 , S 2 , · · · S p is used as the division of the power spectral energy. The information entropy of the response, i.e., the power spectral entropy is defined as: where q i = S i sum(S) represents the percentage of the power spectrum of the i-th segment in the entire spectrum. Obviously, the power spectral entropy index characterizes the spectral structure of the autocorrelation. The more uniform the energy distribution in the whole frequency component is, the more complex the signal is, the greater the uncertainty is, and the larger the power spectral entropy is [27].
Second, the correlated spectral kurtosis is adopted to measure the impulse feature. In previous studies, spectral kurtosis was usually used to measure the impact signal. However, this kurtosis guidance strategy is susceptible to strong non-Gaussian noise and tends to highlight outliers that are not related to real faults. To improve this indicator, McDonald et al. [12] proposed the correlated kurtosis indicator. The correlation spectrum kurtosis of the first-order shift (CK 1 ) is where L is the length of the acquired signal sequence, y l = 0 if l = 1, 2, · · · , L, and T is the estimated fault characteristic frequency that has been set when performing CH-CNMF. CK 1 characterizes the maximum value of periodic impulse over a particular period. Compared with other indexes used in fault diagnosis such as the Gini index [28] and kurtosis [13], CK 1 takes into account the well-known periodic characteristics of fault signals. The higher shift CK indicates a larger sequence of continuous periodic pulses.
Considering the complexity of the signal, we expect that the ideal periodic impulse excitation should be as simple as possible and contain obvious periodicity. However, there is no index that can take both characteristics into account in the current research, so we combine PSE with CK 1 (T ) to make a judgement. The smaller spectral entropy indicates that the spectral structure of the signal is simpler, and the larger CK 1 (T ) indicates that the signal contains more impulse responses with the period T . Therefore, three weight vectors {h a , h b , h c } with the smaller PSE are chosen, and then the weight vector h opt with the largest CK 1 (T ) among them is taken as the optimal object:

D. ALGORITHM
Based on the main steps above, we propose an impulse extraction method (CNMFD) for bearing fault diagnosis with deconvolution characteristics via CH-CNMF. The procedure of the algorithm is as follows: Algorithm CNMFD Input: signal y, decomposition dimension K , time T Step1: Perform STFT on the input signal to obtain the timefrequency distribution V ; Step2: Determine the decomposition dimension K and time T , and perform CH-CNMF on V ; Step3: Perform bandpass filtering on each weight vector; Step4: Perform autocorrelation on each filtered weight vector; Step5: Combine the PSE and the CK 1 (T ) to select the optimal analysis object h i ; Step6: Perform spectrum analysis on the autocorrelation curve R i h (τ ) to obtain the spectrum H (f ); Step7: Find the characteristic frequency and its multiplier, and determine the location of the fault; Step8: Determine whether H (f ) is clean. If there are many interference components, skip to Step 4 and reduce the passband range according to the obtained characteristic frequency in Step7;

IV. EXPERIMENTAL VERIFICATION A. NUMERICAL EXPERIMENT
According to the vibration response mechanism of a rolling bearing, the following rolling bearing vibration model is given as [29]: . (25) where T α is the cycle of the failure pulses, T α = 1/f h , and f h is the fault characteristic frequency. T d describes the time interval between external impulses. Because of its unpredictable VOLUME 8, 2020  nature, it is a random variable. P k , f k , β k correspond to the amplitude, frequency and phase of the discrete harmonics generated from the shaft, gear, etc.
Assuming that a local fault occurs in the outer ring of the bearing, the formant is at the natural frequency of 2200 Hz, and the modulation frequency f h is 80 Hz. In addition, three nonperiodic impulse attenuation components are included. The carrier frequencies are 1000 Hz, 2600 Hz and 3000 Hz, the fractional bandwidths are 0.1, 0.2, 0.3, and the phase angle is set to 0. Finally, Gaussian white noise with a signal-to-noise ratio (SNR) = 0 is added, and the vibration response signal is sampled at 10 240 Hz. The model of the simulated signal is described in Fig.7 and its waveform and spectrum are shown in Fig.8. Regardless of the waveform or the spectrum, it is difficult to directly observe the impulse response and the three natural frequency regions due to strong background noise.
As four interference frequency bands are found, the decomposition dimension K is set to 5, and T is set to 1/f h = 12.5 ms to cover one impulse response interval. The decomposition result of the TFD is shown in Fig.9, and Fig.10 gives the PSE and CK 1 (T ) of each weight vector. It can be seen that the weight vectors with smaller PSE are h 1 , h 2 and h 5 . However, the dominant part of w 5 corresponding to h 5 corresponds to the random impulse at 1000 Hz and contains fewer frequency components at 2200 Hz in Fig.9. The vector with the maximum CK 1 (T ) is h 2 among the three weight vectors with smaller PSE, and the CK 1 (T ) of h 1 is very small. Therefore, we choose h 2 for subsequent analysis and list the result of h 5 for contrast.
According to the criterion that the filter band should be set to approximately 2f r -2f h , we round it to approximately 60 -200 Hz to facilitate the actual filter setting. The autocorrelation curve of the filtered weight vector and its spectrum are given in Fig.11. For h 2 , the adjacent peak interval is 12.5 ms in the autocorrelation curve, which clearly shows the location of the fault impulse, and its spectrum shows the obvious fault frequency f h . In comparison, although h 5 captures the frequency of fault impulses in the waveform and spectrum, it has a smaller amplitude and more interference components than h 2 because the main frequency of h 5 points to aperiodic pulses at 1000 Hz.
Using the NMF method proposed by Liang et al. [16] to decompose the spectrogram, the results obtained are shown in Fig.12. The basis vector w 5 strikes at the fault carrier frequency, that is, h 5 corresponds to the envelope of the fault signal in the time domain. Fig.13 shows the time-domain signals extracted by CH-CNMF and NMF respectively. Results extracted by these two methods are basically consistent with the position of the theoretical pulse. Due to the difference in the solution goal, the relative magnitudes of the two solutions are not exactly the same. For example, the impulse can be clearly observed by CH-CNMF at 0.038 s, but it is difficult to observe the existence of the impact envelope by NMF. Meanwhile, the envelope wave has difficulty directly observing the location of impulses due to noise interference. In comparison, the impulse features extracted by CH-CNMF are simpler and more intuitive. However, there are missing amplitudes or interference amplitudes in some places that are greatly affected by noise, which is the main reason for further optimization by autocorrelation analysis.
For comparison, the impulse source extracted using the MOMEDA algorithm is shown in Fig.14. The curve of multipoint kurtosis with samples is shown in Fig.14(a), and the filtered signal has the largest spectral kurtosis when samples = 2. The result extracted by samples = 2 is shown in Fig.14(b), (c). Due to the influence of interference noise, the maximum kurtosis does not point to the periodic pulse, and the obtained result does not reflect the simulated fault.
In fact, according to the fault frequency, the fault information should be best reflected by setting samples = 1/f h × f s = 128, and the extracted result is shown in Fig.15(a), (b).  Compared with the extracted impulse in Figs. 11(a) and (b), although the fault frequency is already known, the extracted impulse source signal by MOMEDA still contains more noise and interference components.

B. OUTER RACE FAULTS
The bearing life test data [30] provided by the Intelligent Maintenance System Center of Cincinnati University are used to verify the method, and illustrate the effect of the proposed method. As shown in Fig.16, four bearings are mounted on the same shaft and the two bearings in the middle are subjected to a radial pressure of 27 kN. The vibration data are acquired by an acceleration sensor at a sampling frequency of 20,000 Hz every 10 minutes. Here we choose the data of outer ring bearing failure on February 16, 2004. According to the bearing parameters, when the rotational frequency  The waveform and spectrum of the data are shown in Fig.17. Since the bearing is in the early stage of the fault, there is no periodic impulse in the time domain signal. In the spectrum shown in Fig. 17(b), not only are the low frequency components complex, there are also multi-resonance bands, and the main resonance band frequency is distributed between 3000 Hz and 5000 Hz.
Since each resonance band can correspond to the characteristics of the impulse response, it is not necessary to find out all the resonance bands. In Fig.17(b), the spectrum contains  four main resonance bands, so K is set to 5 to ensure effective separation. T is approximately 1/f o = 4.2 ms (approximately 85 points), so we set T = 85/f s = 4.25 ms to decompose the time-frequency distribution of the signal by CH-CNMF. Fig. 18 and 19 show the decomposed result and the PSE and CK 1 curve of each weight vector, respectively. In Fig.19, h 1 , h 2 , and h 3 have smaller power spectral entropy, that is, they have simple spectral structures. Among them, h 2 has the largest CK 1 , so we select h 2 as the analysis object.  In Fig.20, we compare the spectrograms, w 2 , h 2 , and h 4 . It can be seen that w 2 is prominent with an impulse response around 4480 Hz. Most peaks of h 2 reflect the location of each impulse corresponding to w 2 in the time domain. h 4 also has VOLUME 8, 2020  many peaks corresponding to this frequency range. Due to the complexity of the actual system structure, the shape of the defect, and noise, the actual impulse response waveform may have a certain change, which is difficult to keep consistent with the theoretical shape in Fig.4. Therefore, the impulse response corresponding to the same time-frequency basis will be decomposed into the same h i , and individual changed impulse responses will be decomposed into other weight vectors, for which the corresponding impulse excitation position in h i may be missing, and the positions can be restored by the correlation function.
To eliminate the frequency modulation and other component interference, the bandwidth of the bandpass filter is from 2f r to 2f o , and the parameters of the filter are adjusted to integer for simple calculation. Thus, since 2f r and 2f o are equal to 66.6 Hz and 472.8 Hz respectively, the cutoff frequencies of pass-band filter are set to 100 Hz and 500 Hz. Finally, the autocorrelation curve and its spectrum corresponding to h 2 are shown in Fig.21(a), (b). The autocorrelation curve has no other interference components, and the interval between adjacent peaks is approximately 4.2 ms, corresponding to a fault frequency of 236.4 Hz. The frequency component of fault impulses can be observed directly in the spectrum of the autocorrelation curve R 2 h (τ ). In addition, the MOMEDA algorithm is adopted to extract the fault impulse and the result is shown in Fig. 22. For Fig.22(a), the filtered signal has the largest spectral kurtosis when samples = 12, which is highly likely to be affected by interference noise. The result also shows that the filtered signal is independent of the fault characteristics. Considering the frequency of the fault feature, when samples = 1/f o ×f s = 85, the MOMEDA algorithm will obtain the best result which is shown in Fig.22(c). However, if the parameter is set manually, the value of blind deconvolution is lost. Meanwhile, there are more noise and interference components in the waveform. The result shows that the proposed method CNMFD can find the correct fault cycle, and the extracted impulse signal is cleaner, which has better deconvolution ability than that of MOMEDA.

C. INNER RACE FAULTS
The other experiment is carried out on the rolling bearing test bench. The test bench is mainly composed of a governor, a DC motor, a support box and a radial loading device. The basic structure is shown in Fig.23. A certain radial load is applied to the 6308 rolling bearing, and the accelerometer is installed in the vertical direction of the motor drive end bearing housing. In the experiment, the bearing with an inner ring scratch is used for analysis. The data are collected by the OR35 data acquisition system, and the sampling frequency is 12 800 Hz. When the rotation frequency is set to VOLUME 8, 2020    Fig.24. Due to the minor fault, it is difficult to observe the periodic impulse response directly from the waveform. The spectrum contains a large number of resonance peaks, and it is difficult to find the existence of the fault. As the spectrum contains approximately four main frequency peaks, K is set to 5 to ensure complete decomposition. According to the fault frequency, T = 1/f i = 11.6 ms is taken to obtain at least one time-frequency structure. The decomposition result of TFD is shown in Fig.25. To measure the spectral structure and correlated kurtosis, the PSE and CK 1 of each weight vector are calculated, which are presented in Fig.26. In Fig.26, h 1 , h 2 , and h 3 have smaller power spectral entropy and h 2 has the largest CK 1 among the three, which means that h 2 may be the closest to the fault impulse signal. Moreover, h 2 is analysed between 50 Hz and 180 Hz to eliminate the frequency modulation and other interference, and the autocorrelation of the filter signal and its spectrum are shown in Figs. 27(a) and (b). Although most of the adjacent peak intervals are 11.7 ms in the autocorrelation curve, which is consistent with the fault frequency, due to the influence of the interference component, the extracted impulse contains some irrelevant peaks. At the same time, the spectrum in Fig.27(b) contains some interference peaks, and it does not affect the judgement of the fault frequency.
The impulse source signal extracted by MOMEDA is also given in Fig.28 for comparison. If we attain results automatically from the original algorithm, the filtered signal has the largest spectral kurtosis when samples = 2, which is most likely affected by the interference noise. Based on the actual fault frequency, if we artificially set samples = 1/f o × f s = 148 by the fault characteristic frequency, this should achieve the best result. Although the correct result can be obtained by setting samples = 148, the extracted result contains a large amount of noise in Fig.28(c). The impulse signal extracted by accurate sampling length is not as clean as the result of the proposed method in the waveform, reflecting that the proposed method has a better deconvolution effect.

V. CONCLUSION
Directly extracting an impulse excitation from a vibration signal is an efficacious approach to bearing fault diagnosis. Existing methods, e.g., MED, MCKD, and MOMEDA. MCKD requires priori knowledge of the fault period, which considerably limits fault diagnosis. MOMEDA takes the kurtosis indicator to construct the filter and tends to find the wrong period for complex signals due to the existence of interference components and noise. Based on the time-frequency distribution of the impulse response signal, this paper proposes a deconvolution method for extracting impulse excitation using Convolution NMF. Time-frequency distributions can effectively characterize the localized features of the impulse response, including the resonance frequency band and the attenuation response. Convolution NMF is employed to decompose the impulse response. Meanwhile, an autocorrelation function is adopted to further enhance the impulse feature in the weight vectors. Compared with MOMEDA, the simulation results and practical experiments verify that the proposed method circumvents the incorrect orientation of the kurtosis indicator and successfully extracts pure and accurate impulse features.