Tapered Roller Bearing Failure Diagnosis Based on Improved Probability Box Model

Bearing failure often occurs in rotating machinery. The fault diagnosis method based on the vibration signals has been studied for many years. This paper proposed an improved probability box (ip-box) modeling method for diagnosing bearing faults. The major theoretical principles involved with the probability box (p-box) modeling methods and a projection method. Since a larger aggregated width results in the p-box not being conducive to a fault identification and diagnosis, the mean of the focal element interval and the amount of data fluctuation between the adjacent focal elements were used as additional information. Then, the additional information was added to the ip-box model by the cooperative optimization method. Finally, the experimental results showed that the classification performance of a support vector machine (SVM) trained with eight measured values from the ip-box was significantly improved.


I. INTRODUCTION
The fault diagnosis technology of bearings has become an important means and key technology to ensure the safety and stability of production systems in the development of a modern industry [1]. Data obtained in practical engineering contains uncertainty because of restrictions on the working environment or test cost [2]. The existence of the uncertainty limits an application of the traditional probability model for the fault diagnosis technology of bearings [3], [4]. An interval model is one of the common methods to describe the uncertainty of bearing signals, but it can obtain only the range of the bearing data, not its probability distribution, and the statistical information of the bearing data cannot be perfectly used.
Tang et al. [5] introduced the theory of the p-box into mechanical fault diagnosis to solve the above problems. The idea of p-boxes was originally put forward to express pure 'epistemic uncertainty' with 'interval' and has experienced cross research with fuzzy theory [6], evidence theory of Dempster structure [7], Boolean logic reasoning based on traditional probability theory [8], Kolmogorov method with sparse samples [9], etc. The p-box theory not only integrates The associate editor coordinating the review of this manuscript and approving it for publication was Filbert Juwono . random algorithms such as Bayesian reasoning and evidence theory, but also artificial intelligence algorithms such as neural network [10], expert system and fuzzy set theory [11]. For the p-box in an application of the bearing diagnosis, vibration signal is considered as a random variable, and its random distribution type is analyzed to establish the p-box model; then features of the p-box are obtained by measuring the aggregated uncertainty of the p-box to input into SVM model. However, the focal element interval of the p-box may not be the optimal width for each p-box model, which may increase the data crossover between p-boxes, thereby reducing the independence of input data in the SVM model. Because the interval model is the basis of the p-box theory, the width of the p-box can be significantly reduced by reducing the confidence, but the estimation of parameters will be seriously distorted. In the bearing fault diagnosis, the p-box must be discrete into focal elements. Therefore, there may be data redundancy in the focal elements from the same p-box. However, through nonlinear programming, the phenomenon of partial repetition between focal elements can be reduced. Based on a convolution quadrature method, Fahmy developed a novel numerical modeling optimization technique [12]. The interval evaluation of parametrized functions was defined by Marendet et al. [13] adapting numerical constraint programming techniques to quantified inequalities. A piecewise linear approximation method, a derivative algorithm, and a duality-based algorithm are proposed by Zhang and Li [14] for solving interval quadratic programming. Based on nonlinear interval programming and the nested optimization solving strategy, Wang et al. [15] proposed an interval uncertain optimization method for artillery structural dynamic responses considering robustness and interval economy. An improved spatial branchand-bound algorithm was proposed by Zhao et al. [16] to efficiently solve the reformulated bilinear programming problem. Besides, Zeng et al. [17] proposed a new multiple attribute decision-making method based on the nonlinear programming methodology. Based on the normal fluctuation range of each cyber-physical energy system state, a bilevel nonlinear programming model was proposed by Wang et al. [18]. More the nonlinear programming models in [19]- [22]. However, the treatment of the focus element interval is rarely found. With the deepening of research, the nonlinear programming process of the projection method is relatively simple in the interval values and satisfies the accuracy of programming for the p-box focus element.
For the focal element interval correction of the p-box, this paper discretizes the p-box into several focal element intervals and constructs the whole envelope interval of focal element interval sets. Then, the maxima and minima are searched in the focal element interval by an optimization method, and the complete search path is saved. By evaluating the relationship between the subinterval of the focal element and the search path, the crossover point along the negative gradient direction and the bound of the focal element interval is obtained as the initial point of the subinterval search path. Then, the extremum of the focal element interval is solved by the projection method. The method optimizes all the focal element interval sets in a collaborative way, avoiding the repeated calculation of an internal search, reducing the aggregated width of the p-box, and improving the overall calculation efficiency. The mean value of the interval and data fluctuation of the adjacent focal element interval is added as additional information to the p-box, which can further reduce the aggregated width to avoid the overlap between p-box models. Then, the area difference and bound difference between the p-box and ip-box are defined, and the improvement degree of the p-box is analyzed quantitatively. Finally, the validity of the proposed method is demonstrated based on the experiment data. It should be noted that the current method is used to optimize the width of the focal element interval for the p-box model, i.e., ip-box model, using the nonlinear programming process of the projection method.

A. BASIC NOTIONS OF P-BOX
A signal collected by an accelerometer varies with the time because of the measurement error of the sensor, the different position of the measurement, and the different of the working condition, then the signal can be considered as a random variable X . A cumulative probability distribution function (CDF) of the random variable X does not be expressed by a single curve, as the estimated value x of X is not a single scalar value [23]. Considering ith estimated value x i ∈ x i , x i , the upper and lower bounds of the CDF are given by: where F (x) and F (x) are the upper and lower bounds of the CDF, respectively; P denotes a lower probability measure.
is called as the p-box, then the random variable X with the uncertainty is limited in it. For a single scalar value, Eqs. (1a) and (1b) are equivalent, and the CDF can be expressed by a single curve; for an interval, the schematic diagram of CDF can be plotted in Fig. 1. In Fig. 1, the area of x corresponds to the lower probability measure P(X > x) of Eq. (1a); the area of { corresponds to F (x) = P(X ≤ x); the area of 1-x corresponds toF (x) = 1 − P(X > x), i.e., the sum of the areas of y, z and {. The difference between Eqs. (1a) and (1b) is the area of y and z, which is not equivalent.

B. P-BOX MODELING METHOD OF BEARING TIME-DOMAIN SIGNAL
The bearing signal has strong nonstationary randomness, which includes not only the irreducible uncertainty which cannot be reduced by further empirical study brought by the accuracy of the equipment (although the equipment may be better machined) but also the epistemic uncertainty brought by the operator during the collection of bearing data, it can generally be reduced by an additional empirical effort [23]. The p-box model can well achieve the fusion of the irreducible and epistemic uncertainty. According to the different forms of bearing signals, three p-box modeling methods were proposed in the previous work [5].
The probability and statistics toolbox of MATLAB is used to analyze whether the bearing data samples satisfy commonly used random distributions. If random distribution type is met, a distribution-type p-box modeling method (DTPMM) can be used. Normal distribution for the bearing data is considered. Then, the p-box can be determined by limiting the random distribution parameter in an interval mean µ ∈ [µ left , µ right ] and standard deviation σ ∈ [σ lower , σ upper ] with a mass m, which is given by And ''the averaging discretization method and the outer discretization method can be used to discretize a parameter of the p-box model in any case'' [5].
The strong nonstationary randomness of the bearing signal may cause the bearing data samples to not satisfy a random distribution type, so the DTPMM is inapplicable. A dimensionless p-box modeling method (DPMM) can obtain the dimensionless values from the raw bearing data, which avoids the data clutter and improves the regularity. The dimensionless values may satisfy the random distribution type. The skewness features and the kurtosis features are the third and fourth moments of the bearing signals, respectively, which can be expressed as: where SK i (i = 1, . . . , m)(k = 3) and SK i (i = 1, . . . , m)(k = 4) are the skewness feature and kurtosis features of the ith row vector of bearing signals, respectively, and α i is the mean of the ith row vector. Then, a vector of skewness features and a vector of kurtosis features can be obtained. If these vectors satisfy a random distribution type, the DTPMM can be applied.
The DPMM needs to extract the features from the raw bearing data, which results in information loss in the probability and statistics of the raw bearing data. To preserve the characteristic (i.e., the information of the probability and statistics), a raw-data p-box modeling method (RDPMM) is proposed. The RDPMM can directly establish the p-box model for the raw bearing data based on the definition of the p-box. First, the maxima and minima from each column vector of bearing signals can be obtained by the RDPMM. Then, a Dempster-Shafer structure is obtained based on the maxima and minima vectors and discretized according to the same basic probability assignment. Finally, the upper and lower bounds of the p-box can be approximated by discrete sampling.

III. IMPROVEMENT AND COMPARISON OF P-BOX MODELING METHODS
In this section, we mainly improve the influence of the overlapped interior region of the intervals on the p-box, and then the area difference and bound difference between the p-box and ip-box models are analyzed quantitatively.

A. IP-BOX MODELING METHOD
The aggregated width of the p-box model has a great influence on the result of pattern recognition. In some cases, a calculated p-box will also be the best possible result in the sense that the bounds can be no tighter without excluding some of the overlapped interior regions of the intervals.
When the wider p-box model is discretized into a large number of focal elements, there are overlap regions between the focal elements, which not only is detrimental to the correct classification rates but also reduces the calculation efficiency [24]. However, although these focal elements overlap each other, they are all included in a Dempster-Shafer structure. Using the gradient project method based on the characteristics, a collaborative optimization method can be proposed to improve the width of the p-box. According to the sampling frequency n, the p-box model can be discretized into a Dempster-Shafer structure with the following expression: Although there are overlapping regions between A i−1 , A i , and A i+1 , they are within the definition domain of a random variable X . Consider a union A ∪ is related to the focal element interval A by the following expression: Define the set of enveloping all A i as an interval envelope A envelop . Then, the A ∪ must be included in the interval envelope A envelop : where the minimum interval envelope A envelop of A ∪ can be expressed as: The interval envelope A envelop is obtained by the discretized p-box model, where a large number of overlap regions exist in the intervals [25]. By evaluating the overlap regions between A i and A envelop , the initial gradient value and gradient direction of the subregions can be determined to reduce the overlap regions and repeated calculation of the interval, as shown in Fig. 2.
Consider Fig. 2(a). Search the feasible direction in the bound x ≤ x ≤ x along the negative gradient direction of the distribution functions of p-box −∇F((x 1 , x 2 ) k ) based on min F X (x; ), where is a random distribution parameter and x satisfiesx ≤ x ≤ x [26]. When the bound x 2 = x 2 , −∇F((x 1 , x 2 ) k ) is projected on the bound to generate a new search direction d k . The modeling steps of the method can be expressed as follows: Step One: Consider x 1 as the initial feasible point, and set k = 1.
Step Two: Transform the bound x ≤ x ≤ x into the form of inequality Ax ≤ b, then A and b are respectively decomposed into Step Three: Define M = A 1 . If M is an empty set, let P = I , where I is the unit matrix; otherwise, P must satisfy the following expression: Step Four: Let d k = −P∇F(x k ). If d k = 0, go to step six; if d k = 0, go to step five.
Step Five: If M is an empty set, then stop the calculation, and obtain x k ; otherwise, obtain u by the following expression: where if u ≥ 0, then stop the calculation, and obtain a global best point x k ; if u contains a negative component, then selects a negative component. For example, A 1 is corrected by u j . Remove the row corresponding to u j in A 1 , and return to step three.
Step Six: Solve min F(x k + λd k ), s.t: 0 ≤ λ ≤ λ max , where λ max can be obtained by the following expression: where b and d can be expressed as follows: and F X 1 (x 1 ; 1 ) can be obtained by a search. Then, calculate x k+1 by the following expression: Set k = k + 1, and return step two. To understand the above modeling method, consider Fig. 2(b). A 1 , A 2 and A n are obtained from the discretized p-box model. In addition, x (1) is the initial feasible point.
Here, x (1) ∼ x (n) is the search path, where the direction is the negative gradient direction; x (1) ∼ x (n−1) is the search path inside bound, and x (n−1) ∼ x (n) is the search path after gradient projection. For A 1 , A 2 and A n , the partial search path of x (1) ∼ x (n) is also their internal search path because of the overlapped interior region of the intervals. For example, (4) is the internal search path of A 1 , and x (2) ∼ x (6) is the internal search path of A 2 . When A 1 , A 2 and A n are searched, filtering the repeated internal search paths and searching directly from different search points will avoid many repeated search calculations. The extreme point of A 1 can be calculated by the gradient project method. Point P A1 is the initial point to search directly, which avoids searching interior regions; for the same reason, search A 2 with the point P A2 .
The crossover point can be obtained between A i and the search path (x min 1 , x min 2 , . . . , x min k ) by calculating the minimum value of F(x) in the domain of definition of A envelop and saving (x min 1 , x min 2 , . . . , x min k ) and gradient information. The last crossover point P min Ai along the search direction is selected as the initial point. Calculate the optimal solution of min F X (x; ), s.t: x ≤ x ≤ x, and assign it to the upper bound of the p-box F i . Similarly, the optimal solution from the negative function −F(x) of F(x) can be obtained and assigned to the lower bound F i .
The bearing signals contain a large amount of statistical information, such as the mean, mode, and variance, which can be used [27]. Adding this information to the p-box model is conducive to the improvement of the area of the p-box model. The mean x average of the bearing data can express the amount of data fluctuation between the adjacent focal elements [28]. The additional information is related to the mean x average by the following expressions: where u and u are the supremum and infimum of the focal element interval. Then, a Dempster-Shafer structure is related to the additional information by the following expression: where u i ≤ u i and where u i = u i−1 whenever u i = u i−1 . Therefore, the upper bound and lower bound can be approximated by F and F, respectively.

B. ANALYZING P-BOX AND IP-BOX
The area difference between the p-box and ip-box can be calculated by defining the distance between the area and bound of the p-box and ip-box. Consider two random distribution parameters ψ 1 and ψ 2 , one from the p-box [29]. The area between the p-box upper bound F ψ1 (x) and the ip-box upper bound F ψ2 (x) can be calculated by the integral: where S is the area between the p-box upper bound F ψ1 (x) and the ip-box upper bound F ψ2 (x). Similarly, the area between the p-box lower bound F ψ1 (x) and the ip-box lower bound F ψ2 (x) can be calculated by the integral: where S is the area between the p-box lower bound F ψ1 (x) and the ip-box upper bound F ψ2 (x). The absolute area difference S is related to S and S by the following expression: The absolute area difference S can express the area difference between the p-box model and ip-box model as a whole, but the distance of the bounds cannot be expressed. Consider one point β in the cumulative probability range of 0 to 1. Draw a horizontal line parallel to the horizontal axis through β [30]. The horizontal line and p-box (or ip-box) will produce multiple crossover points, where the crossover points of and F ψ2 −1 (β), respectively. Then, the absolute bound difference L(β) can be defined as where L 1 (β) and L 2 (β) can be calculated by the following expressions: S and L(β) can quantify the difference between the p-box and the ip-box, and they are used to compare and analyze the results of the two modeling methods. For the same reason, can be defined as: can be defined as:

IV. MODELING AND DISCUSSION
A. DATA COLLECTION The vibration signals in Ref. [5] were used to verify the effectiveness of the improved method. The drive end bearing type is a 30305 SKF tapered roller bearing. The eignal acquisition system is NI PXI-1042Q high-performance acoustic vibration testing system. The sensor is a PCB M603C01 ICP acceleration transducer; the sampling frequency is 10240, and the motor speed is 800r/min; the sample numbers is then set to 60 [5]. The experiment platform, accelerometers arrangement and photograph of fault bearings are shown in Fig. 3, where the grooves with 0.5mm deep and 0.5mm wide were processed in the bearing inner raceway, outer raceway, and rolling element by using wire-cut electrical discharge machining technology. The vibration data and bearing fault cases can be expressed as follows: ''The power spectra of the bearing signals under eight conditions are shown in Fig. 4, where H stands for healthy bearing, IR stands for inner race fault, OR stands for outer race fault, RE stands for rolling element fault, IOR stands for inner and outer race faults, IRRE stands for the inner race and rolling element faults, ORRE stands for the outer race and rolling element faults, and IORRE stands for the inner race, outer race, and rolling element faults''; the power spectra were calculated by the MATLAB 2017b in which the Hanning window was used to record the data, then data of the power spectra were extracted for plotting power spectra using the OriginPro 2018.
H, IR, OR and RE correspond to single fault signals of the rolling element bearing, and IOR, IRRE, ORRE, and IORRE correspond to compound fault signals in Fig. 4. Based on the vibration spectrum, we found that ''the basic power spectra are not generally informative in Fig. 4, partly because they are in the presence of strong masking signals from other machine components but also because the angle of the load from the radial plane varies with the position of each rolling element in the bearing as the ratio of the local radial to the axial load changes'' [1], [31]. It should be noted that the uncertainty of the bearing signals is the main characterization to bearing signals, because it always exists and changes from the beginning of health signals to the formation of fault signals. Hence, collecting uncertainty of the bearing time-domain signals provides a new way for rolling element bearing diagnostics.  In a previous study, we showed that the p-box modeling method can suitably avoid the decoupling process for bearing signals [5].

B. COMPARISON BETWEEN P-BOX AND IP-BOX BASED ON DPMM
The random distribution of these features from the IORRE data can be verified to establish a p-box based on the DPMM, as shown in Fig. 5, where the results for the random distributions of the kurtosis and skewness features satisfy normal distributions, and their number of focal elements is 1000. The remaining bearing signals (i.e., H, IR, OR, RE, IOR, IRRE and ORRE) can also be verified to establish corresponding p-boxes based on the DPMM.
Kurtosis and skewness p-boxes of IORRE are the set of all nondecreasing functions from the reals within 0 to 1, as shown in Fig. 5, where the initial feasible points for the kurtosis and skewness of p-boxes is on the boundary.  kurtosis and skewness p-boxes of IORRE. When N=1000, the p-boxes and ip-boxes are compared as shown in Fig. 6.
The distance of bounds for the ip-box is smaller than that of the p-box, as shown in Fig. 6. This may be because the overlapped region of the focal element intervals is reduced by filtering the repeated internal search paths to obtain the optimal modeling path based on the ip-box modeling method. The iterations of the ip-boxes and p-boxes are listed in Table 1. From Table 1, we infer that the iterations of the skewness p-box are lower than the iterations of the kurtosis p-box because more time is required to calculate a greater-order moment. The iterations of the kurtosis ip-box account for 48.14% of the iterations of the kurtosis p-box; the iterations of the skewness ip-box account for 60.16% of the iterations of the skewness p-box. The difference in the relative efficiency is affected by the search regions of the optimized path, and the overlapping of search regions increases the relative efficiency of the iterations. From Fig. 5, we infer that there are many overlapping parts in search regions when solving the maximum and minimum kurtosis p-box and skewness p-box, but the overlapping regions of the kurtosis p-box are smaller than the overlapping regions of the skewness p-box, which reduces the relative efficiency of the kurtosis (ip-box vs. p-box). Similarly, the p-boxes of H, IR, OR, RE, IOR, IRRE, and ORRE based on the DPMM are analyzed, and they are similar to the p-box of IORRE.

C. EFFECT OF FOCUS ELEMENTS
The focal element intervals of the ip-box model are smaller than those of the p-box model; that is, the data of the ip-box model are closer to the mean value, as shown in Fig. 6. When the cumulative probability density of S is 0.6, the area S of the upper part is smaller than that of the lower part ( Fig. 5(a)). When the cumulative probability density of S is 0.4, the area S of the upper part is larger than that of the lower part. When the cumulative probability density is 0.5, S is approximately antisymmetric to S (Fig. 5(a)). When the cumulative probability density of S is 0.6, the area S of the upper part is smaller than that of the lower part ( Fig. 5(b)). When the cumulative probability density of S is 0.3, the area S of the upper part is larger than that of the lower part. Similarly, S is approximately antisymmetric to S (Fig. 5(b)). Using S + S, the curve of area difference between the ip-box and p-box based on the kurtosis and skewness of IORRE is obtained under the array N= [20, 30, ..., 1000], as shown in Fig. 7. As the number of focal elements increases, the area difference increases (Fig. 7). This is because Pl i and Bel i can be approximated by F and F in a stepwise, respectively. When the number of focal elements is less than 400, the growth rates of area differences for the kurtosis and the skewness are similar; when the number of focal elements is more than 400, the growth rates of area differences of the kurtosis are larger than that of the skewness. This is because the skewness is a smaller-order moment of a probability function than the kurtosis. From Fig. 5, we infer that ∇F Kurtosis > ∇F Skewness . Then, for min F X (x; ), s.t: x ≤ x ≤ x, as the number of focal elements increases, the kurtosis is more advantageous than the skewness. The number of focal elements is 1000, and the area difference between the p-box and the ip-box based on the kurtosis of IORRE is 0.22925; the area difference of the skewness is 0.19621 (Fig. 7). However, the distance of the bounds cannot be expressed by the area difference. To express the bound difference L(β), the p-box and the ip-box are truncated with β = [0.05, 0.5, 0.95], as shown in Fig. 6. The bound differences between the ip-box and p-box are larger on both sides and smaller in the middle. A relative area difference and a relative bound difference are calculated by Eqs. (19) and (20), as listed in Table 2, respectively, because the relative difference can better reflect the reliability of measurement than the absolute difference.
From the relative area differences in Table 2, we infer that the improvement degree of the kurtosis is better than that of the skewness. The relative boundary differences of kurtosis and skewness are close to 0 under β = 0.5, which is less than the values for β = 0.05 and β = 0.95. For β = 0.05 and β = 0.95, the relative boundary difference of kurtosis and skewness is approximately equal, which proves that S is approximately antisymmetric to S (Fig. 6(b)). This may be because the modeling data satisfy a normal distribution, where (µ left , σ lower ) must be antisymmetric to (µ right , σ upper ) under β = 0.5 and (µ left , σ upper ) must be antisymmetric to (µ right , σ lower ) under β = 0.5. However, the numerical calculation will produce errors, so there is only approximate antisymmetry for S and S. From Table 2, we infer that this small error can be ignored. For H, IR, OR, RE, IOR, IRRE, and ORRE, the relative area difference and relative bound difference based on the DPMM are listed in Table 3.
Comparing Tables 2 and 3, the improvement degree of the p-boxes for single fault signals of the rolling element bearing (i.e., the p-boxes of H, IR, OR and RE) is smaller than that of the p-boxes for compound fault signals (i.e., the p-boxes of IOR, IRRE, ORRE, and IORRE). This may be because the overlapping interior region of the Dempster-Shafer structure for the p-box of compound fault signals is larger than that of single fault signals. From Table 3, we infer that the improvement degrees of the kurtosis and skewness p-boxes for H are the smallest because the healthy bearing signals envelop a minimal amount of uncertainty, and the p-box of H could be tighter at the beginning.

D. COMPARISON BETWEEN P-BOX AND IP-BOX BASED ON RDPMM
It is worth emphasizing that the cooperative optimization method is inapplicable for the RDPMM because the p-boxes can be established by the RDPMM based on raw bearing data, which avoids the need to verify the suitability of a random distribution. Then, the additional information can be added directly into the p-box of the RDPMM (calculated with Eq. (13)), which reduces the amount of IORRE data fluctuation between the adjacent focal elements. The p-box of the RDPMM and the ip-box of the RDPMM based on the IORRE were established as shown in Fig. 8.
The interval range of each focal element interval is not equal (|x i − x i = |x i+k − x i+k ), which results in a different improvement degree for each focal element (Fig. 8). The larger the raw focal element interval x i , x i is, the larger the improvement degree is. The aggregated width of the ip-box is smaller than that of the p-box, as shown in Fig. 8. The ip-box and the p-box need 714301 iterations, respectively, indicating that the number of iterations for the ip-box has not decreased. This is because the modeling data of the p-box and the ip-box for the RDPMM are from the raw bearing data. In a comparison with the number of iterations 32670 for the kurtosis p-box of IORRE, the relative efficiency (the kurtosis p-box vs. the RDPMM p-box) is 4.57%. In a comparison with the number of iterations 25580 for the skewness p-box of IORRE, the relative efficiency (the skewness p-box vs. the RDPMM p-box) is 3.58%. Similarly, in a comparison with the number of iterations 15730 for the kurtosis ip-box of IORRE, the relative efficiency (the kurtosis ip-box vs. the RDPMM ip-box) is 2.20%; in a comparison with the number of iterations 15390 for the skewness ip-box of IORRE, the relative efficiency (the skewness ip-box vs. the RDPMM ip-box)  is 2.15%. The results for other bearing signals were similar to IORRE. Thus, although the aggregated width of the ip-box for the RDPMM is small, the number of iterations is not reduced.

V. INTELLIGENT DETECTION FOR ROLLING BEARING A. OVERLAPPED REGIONS BETWEEN DIFFERENT BEARING CONDITIONS
For the different bearing conditions, the p-boxes and the ip-boxes are established based on the skewness as shown in Fig. 9.   9 gives the upper and lower CDFs of the different bearing conditions. The p-boxes based on the skewness values of the different bearing conditions are not easy to be distinguished in Fig. 9(a) because of the overlapped regions between the different p-boxes; the overlapped regions are not conducive to the intelligent detection of machine learning. Based on the ip-box modeling method, the overlapped regions are reduced by filtering the repeated internal search paths to obtain the optimal modeling path as shown in Fig. 9(b). Based on the kurtosis and the RDPMM, the p-boxes and the ip-boxes were established to contrast the overlapped regions; the experimental results showed that the overlapped regions of the ip-boxes were smaller than that of the p-boxes. Reducing the overlapped regions can increase the discrimination of each ip-box, then improve the recognition rate of the ip-box. The effectiveness of the ip-box is demonstrated as follows.

B. FEATURE-BASED DEMPSTER-SHAFER STRUCTURE
In the pattern recognition system, the ip-box must be discrete into a Dempster-Shafer structure which is composed of the improved focal element intervals and mass function. The width of different focal element intervals can be obtained through the uncertainty measurement method. The weight of a single focal element interval can be obtained by multiplying the corresponding mass value because different focal element interval is independent of each other. A basic feature of ip-box can be obtained by accumulating every uncertainty probability of the Dempster-Shafer structure. The basic feature called the aggregated width can be expressed as: where α x i and α x i are the upper and lower bounds of the focal element interval respectively; m i is the mass value of the corresponding focal element interval. The aggregated widths can be obtained by measuring the aggregated uncertainty of different ip-boxes, as shown in Fig. 10. From Fig. 10, the improved aggregated width is smaller than that in Ref. [5], which proves that this paper method is effective. For the improved method, the aggregated width of the RDPMM ip-boxes is the smallest, and that of the skewness ip-boxes is the largest. It may mean that the RDPMM ip-boxes are more conducive to pattern recognition because  the improvement of the aggregated width can reduce the degree of overlap between ip-boxes. Similarly, in order to measure more ip-box information, more features can be expressed as: where ε 1 and ε 2 are the means of the probability statistics associated with the lower and upper bounds of the cp-box, respectively; ω i (i = 2, 3, 4) and ω 1 are similar, and are scalar values. is a conditional value similar to the degree of confidence, which satisfies 0.95 was used for in the study [30]. ω j (j = 5, 6) is the interval value of uncertainty measurement information of the ip-boxes, as listed in Table 4.

C. PATTERN RECOGNITION WITH DIFFERENT METHODS
''In the present approach, the influence of various loads and speeds on the ip-box leads to shifting lower and upper bounds of the ip-box to the negative or positive values on the abscissa axis, but the aggregated width of the ip-box is unaffected because the signal patterns remain unchanged. This is also one of the advantages that the ip-box can include the uncertainties of the bearing signals. Hence, the ip-box still has the advantages of improvement under different loads and speeds and has the robustness of anti-interference. Additionally, the diagnostic results are unaffected as long as the aggregated width remains unchanged. For simplicity reasons, we can ignore the influence of different loads and speeds on the ip-box to study the classification performance of different ip-box modeling methods.'' A total of 1600 p-boxes were used in this study, where 60% were training set, 20% were verification set and 20% were test set. The Python environment-based SVM classifier was configured as described in Ref. 5.
Based on the SVM, the results of the kurtosis ip-boxes were displayed in a two-dimensional confusion matrix [32], as listed in Table 5.
From Table 5 200, 200, 197, 196, 191, 195 and 193 cases. The ip-boxes obtained by the RDPMM have greater accuracy in correctly predicting the bearing condition with a combined bearing component fault than the DPMM. It may be because that the impact of the aggregated width. For the RDPMM and the DPMM, the detailed accuracy of each class is shown in Table 6. Table 6 provides information about the true positive (TP) rate, false positive (FP) rate, precision, recall, and F-measure values for the eight classes using the SVM. From Table 6, we refer that the mean of the RDPMM F-measure values is larger than the means of the kurtosis and skewness F-measure values, and the mean of the kurtosis F-measure values is larger than the means of the skewness F-measure values.
Teymourlouieet al. [33] and Yaddaden et al. [34] point out ''the basic quality measure offered by the error rate is no longer appropriate: errors are not simply present or absent; they come in different sizes''. Then, for the different ip-boxes modeling methods, the measurement errors in classification results are given in Table 7.
From Table 7, we infer that the RDPMM ip-box is the best according to all five metrics: it has the smallest value for each error measure and the largest Kappa statistic. The kurtosis ip-box is the second-best by all five metrics. The skewness ip-box is the third-best by all five metrics.
Employing the p-box modeling methods, the kurtosis, skewness, and RDPMM p-boxes were established, and its features were extracted for pattern recognition; the experimental results showed that the classification performances of the kurtosis, skewness, and RDPMM p-boxes were 83.4%, 81.5%, and 95.2%, respectively. Compared with the p-box, the improvement degree of the classification performance for the kurtosis ip-boxes is 9.91%, that of the skewness ip-boxes is 10.31%, and that of the RDPMM ip-boxes is 3.05%. It is because the overlapped regions between the ip-boxes are reduced.

D. COMPARISON OF DIFFERENT DATA PROCESSING METHODS
A comparison of different data processing studies should be undertaken to demonstrate the advantage of the method proposed by this paper. Composite multiscale fuzzy entropy is an effective method to analyze the complexity of time series in bearing fault diagnosis [35]. It can not only reflect the complexity characteristics of time series from multiple scales but also has the advantages of short data and good robustness. Fig. 11 presents the results of the composite multiscale fuzzy entropy for each bearing condition based on the current data, where some values were applicated as follows: Largest scale is 20, embedding dimension 2, a gradient of exponent function 2, and similarity tolerance 0.15SD (SD denotes the standard deviation of raw bearing data).
In Fig. 11, the fuzzy entropy of H is larger on the relatively large scale, and changes gently with the increase of the scale values; the curve of composite multiscale fuzzy entropy for other bearing conditions shows the obvious decreasing trend. In this contrastive study, the steps used in this method can be described as the follows: Firstly, Total 1600 samples were used in this study, i.e., there were 200 samples for each bearing condition; the feature set was obtained by calculating the values of composite multiscale fuzzy entropy for each bearing condition. Then, 60% of features were training set, 20% were verification set and 20% were test set. Finally, the correct classification of faults can be given by the SVM model, as listed in Table 8. For the composite multiscale fuzzy entropy, the experimental results showed that the total correct recognition rate is 83.5% in Table 8. However, compared to the correct recognition rates 93.31%, 91.81%, and 98.25% from the proposed methods in this paper, there is still room for improvement. It is because the method of composite multiscale fuzzy entropy requires additional empirical effort in the bearing fault diagnosis [35]. Additionally, Additionally, for the calculation time of finding fault, the proposed method in this paper consumes 55 seconds, which is 15% faster than the traditional method.

VI. CONCLUSION
This study presents a procedure for obtaining tighter p-boxes by reducing coincidence intervals using the cooperative optimization method and adding additional information. The machine learning method (i.e., SVM) gives the classification accuracy based on the ip-boxes.
For the iterations of the p-box and ip-box, the relative efficiencies of the DPMM and the RDPMM were calculated. The results showed that the relative efficiency, which denotes the RDPMM ip-box vs. the kurtosis ip-box, was 2.20%. For the skewness ip-box, the relative efficiency was 2.15%. Therefore, the computational efficiency of the skewness ip-box is best.
To quantitatively assess the difference between the p-box and ip-box, relative area difference and relative bound difference of the different cumulative probability density values were obtained based on the kurtosis and skewness. The experimental results showed that the aggregated width of the DPMM p-box was improved, and the improvement degree of the kurtosis p-box was larger than that of the skewness p-box. Consequently, the kurtosis ip-box is more conducive for bearing fault diagnosis than the skewness p-box without considering the efficiency of calculation.
To compare the modeling method of the p-box and the ip-box for the kurtosis, skewness, and the RDPMM, an SVM was used as the classification algorithm. The experimental results showed that the improvement degree of the classification performance for the kurtosis ip-boxes was 9.91%, that of the skewness ip-boxes was 10.31%, and that of the RDPMM ip-boxes was 3.05%. However, the total correct classification rates of the RDPMM ip-boxes were higher than those of the kurtosis and the skewness ip-boxes.