Acoustic-Based Detection Technique for Identifying Worn-Out Components in Large-Scale Industrial Machinery

This letter addresses the challenge of monitoring large-scale machine halls, particularly in the context of iron making processes. We propose an acoustic sound-based condition monitoring (ASCM) system to detect potential faults and damages in machinery. The letter focuses on selecting suitable audio features, integrating physical insights regarding the fault, and determining optimal window lengths for feature extraction. Our fault detection method uses outlier detection with a Gaussian mixture model trained on features extracted only from normal operating conditions. We compare conventional audio features with physically motivated features and conduct a window length analysis. The results demonstrate the effectiveness of our approach and the impact of physically motivated features on fault detection performance.


I. INTRODUCTION
In various industries, ensuring the optimal functionality of individual machines is critical for uninterrupted production processes and for minimizing financial losses. This is particularly true in industries such as blast furnace iron production, where a continuous production chain is essential. The preprocessing of raw materials plays a crucial role in blast furnace iron production, and any shutdown should be avoided due to the significant energy and time required for the start-up process of the blast furnace [1].
Manual monitoring alone is insufficient in industries where a continuous production chain is vital. Hence, condition monitoring systems are necessary to detect potential faults and damages in machinery [2]. Traditional condition monitoring systems, which often utilize vibration sensors, may not be suitable for large-scale machinery due to the requirement for a significant number of sensors. As an alternative, acoustic sound-based condition monitoring (ASCM) systems have been proposed [3]. These systems take advantage of the fact that experienced maintenance personnel can identify indications of damage events based on the acoustic signature of machinery. The broad applicability of ASCM to different types of machines has been explored in [2], and general research on sound scene analysis and the applicability of ASCM in large machine halls has been conducted in [4].
While acoustic condition monitoring has been widely used for single machines or small machine parts, such as rotating machines, roller bearings, and tool conditions [5], [6], [7], [8], [9], [10], [11], [12], [13], there is a need to address the challenges of monitoring large-scale machine halls. These halls, such as burden preparation halls for blast furnaces, are vast and highly reverberant environments [14]. Thus, the acoustic signals consist of machine sounds and reverberations, where the activation and deactivation of specific machines for material mixing introduce variations in the operational conditions. As a result, the normal acoustic signal exhibits a wide range of variations across the signal space, indicating its nonuniqueness. Analyzing these diverse audio signals requires specialized techniques.
Most methods proposed in the aforementioned works [5], [6], [7], [8], [9], [10], [11], [12], [13] leverage acoustic knowledge of the fault case; in particular, machine learning-based approaches rely on a large amount of data from faulty machinery. However, in our scenario only a very limited amount of audio data for fault cases is available. In addition, the fault signal is largely masked by the background sound, which makes this a weak-signal detection problem. Anomalous sound detection (ASD) methods have been proposed as effective solutions for detecting abnormal sounds in various applications, including small-scale industrial applications [15], [16], [17]. One key advantage of these methods is their ability to detect anomalies using only data from the normal state, employing outlier detection techniques.
In this research, we propose a fault detection system for large-scale industrial machinery based solely on data from the normal operating state. We place emphasis on the selection of appropriate audio features, using both standard audio features and physically motivated audio features. A comprehensive analysis was conducted to assess the suitability of different combinations of audio features for such a detection system. In addition, we determined the optimal window length for data framing during the preprocessing stage, and our findings indicate that physical knowledge about the fault can guide the choice of this window length.

II. MEASUREMENT SYSTEM AND DATA ACQUISITION
To perform audio measurements in a burden preparation hall, a microphone array was developed taking the challenging environmental conditions into consideration. The volume level in the hall was found to be consistently high, averaging around 95 dB SPL. In addition, the hall is subjected to high dust loads, which necessitated the use of a dust-protected setup and microphones with a high dynamic range.

A. Microphone Array
To conduct audio measurements in the burden preparation hall, we set up an eight-microphone array using Behringer ECM8000 measurement condenser microphones. These microphones were chosen for their high dynamic range and their flat frequency response from 20 to 20 000 Hz, which make them well suited for our application. The audio signals were recorded using an 8-channel Zoom F8n MultiTrack Field Audio Recorder with a sampling rate of 48 kHz. Over multiple weeks, a 20 s sample was stored every hour as a multichannel WAV file. To ensure autonomous operation, a measurement laptop with remote access was used to operate the system. The arrangement and orientation of the microphones can be seen in Fig. 1. A foam cover was used to protect the microphones from the high levels of dust present in the hall. The sensitive components of the system, such as the audio recorder and measurement laptop, were housed in a dust-protected box.

B. Measurements and Signal Analysis
The acoustic signal in the hall is a combination of multiple sound sources, including 13 sieves processing different materials. The operating states of these sieves, which are turned ON or OFF based on blast furnace requirements, result in various combinations with different acoustic signatures. A previous study [4] demonstrated that the acquired acoustic signals are short-time stationary signals, making power spectral density (PSD) analysis suitable for investigating the frequency characteristics. The study in [4] also showed that normal operating states exhibit different PSDs, indicating a wide coverage of the signal space and diverse acoustic characteristics within normal operation.
Obtaining audio files of actual fault cases is crucial for testing the proposed ASD system. However, such data are often not readily available, and despite our long-term automated measurement routine, we could not capture a fault case. To overcome this limitation, we conducted fault tests on the machines and recorded the resulting audio in an emulated setting. Common fault scenarios, such as spring ruptures, involve the collision of loose metal parts with the machine during operation, generating a distinctive knocking sound. To replicate this sound, fault case experiments were performed with the assistance of experienced personnel, where a 100 × 200 × 10 mm steel plate was placed against the machine during operation. Care was taken to ensure that the resulting sound closely resembled that of a real fault case. Fig. 2 illustrates the experimental setup and displays the PSDs of the normal operating condition just before the fault experiment and of the simulated fault. This approach facilitated the collection of multiple datasets containing simulated yet authentic-sounding fault cases.
As depicted in Fig. 2, the level increase due to the fault mainly appears in the frequency range between 300 and 10 000 Hz. Consequently, for all subsequent analyses and assessments, we applied bandpass filtering to the recorded data, exclusively focusing on this specific frequency range.
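The band limitation described above can be sketched as follows. The letter does not state the filter type or order, so the fourth-order Butterworth design, the zero-phase (forward-backward) filtering, and the function name `bandpass_300_10k` are assumptions made only for illustration.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 48_000                 # sampling rate of the recordings in Hz (from the text)
BAND = (300.0, 10_000.0)    # band in Hz where the fault raises the level (from the text)


def bandpass_300_10k(x, fs=FS, band=BAND, order=4):
    """Zero-phase Butterworth bandpass.

    Filter family and order are assumptions; the letter only states
    that the data were bandpass filtered to 300-10 000 Hz.
    """
    sos = butter(order, band, btype="bandpass", fs=fs, output="sos")
    # Forward-backward filtering avoids phase distortion of the knocking impulses.
    return sosfiltfilt(sos, np.asarray(x, dtype=float), axis=-1)
```

Applied to a multichannel recording of shape `(channels, samples)`, the function filters every channel along the last axis.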

C. Characteristics of Fault Case
The objective of analyzing audio data from fault cases is to extract relevant physical features and assess whether their incorporation enhances the effectiveness of fault detection of an ASD system. By identifying fundamental underlying physical properties, our aim is to integrate them into the fault detection process for improved performance.
Since the knocking in the fault data is expected to have periodic characteristics due to the periodic oscillation of the sieve, we conducted further analysis using the pitch detector [18]

$$T(\mathbf{x}) = \sum_{i} I(f_i)$$

where I(f_i) is the periodogram of the data vector x, M is the discrete period of the signal in samples, and f_i = i/M are the frequencies in the interval [0, 1/2] over which the sum is taken. Evaluating T(x) for different periods M and determining the maximum value gives access to the fundamental frequency if a periodic signal is present. The underlying principle of this approach is to sum the fundamental frequency and its higher harmonics in the spectrum, which improves the SNR for fundamental frequency estimation and detection. This test statistic can be derived by hypothesis testing, taking into account the structure of the PSD of a periodic signal. To detect the presence of a periodic signal, the test statistic is compared with a threshold γ, and a periodic signal is declared present if T(x) > γ. In scenarios such as ours, where the background noise varies significantly in an unknown manner, we suggest employing a constant false alarm rate detector to determine the threshold parameter γ [19]. For a more comprehensive treatment of pitch estimation and detection, see [18] and [20]. Comparing the estimated pitches between normal and fault cases, we observed a fundamental frequency corresponding to the sieve frequency in both scenarios. However, fault cases exhibited somewhat higher amplitudes, making the amplitude a potential indicator of damage. Nonetheless, relying solely on pitch for fault detection is insufficient because normal and fault cases overlap in the feature space. Analyzing the PSDs revealed that fault cases present as bandpass signals. Based on this analysis, bandpass energies, which can be extracted with a filterbank, suggest themselves as physically motivated features.
Combining these features with pitch estimates offers a promising approach for fault detection. Furthermore, our investigations into pitch detection and estimation also provided valuable insights into determining the optimal window size for feature extraction.
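As a rough illustration of the harmonic summation behind the pitch detector, the sketch below evaluates T(x) on a zero-padded FFT grid for a set of candidate periods and returns the maximizer. The function names, the zero-padding factor, the nearest-bin lookup, the exclusion of the DC term, and the candidate range are assumptions of this sketch and are not taken from [18].

```python
import numpy as np


def periodogram(x, nfft):
    """Periodogram I(f) on the grid f = k / nfft, k = 0 .. nfft / 2."""
    return np.abs(np.fft.rfft(x, nfft)) ** 2 / len(x)


def harmonic_sum(pgram, nfft, period):
    """T(x) for one candidate period M: sum of I(f_i) at f_i = i / M.

    The DC term (i = 0) is left out here, which is a choice of this sketch.
    """
    i = np.arange(1, period // 2 + 1)               # harmonics with f_i <= 1/2
    bins = np.rint(i * nfft / period).astype(int)   # nearest FFT bin to each f_i
    return pgram[bins].sum()


def estimate_period(x, candidates, zero_pad=8):
    """Return the candidate period (in samples) that maximizes T(x), and T(x)."""
    x = np.asarray(x, dtype=float)
    nfft = zero_pad * x.size                        # finer frequency grid (assumption)
    pgram = periodogram(x, nfft)
    scores = np.array([harmonic_sum(pgram, nfft, m) for m in candidates])
    best = int(np.argmax(scores))
    return candidates[best], float(scores[best])
```

The fundamental frequency follows as the sampling rate divided by the estimated period, and the maximum of T(x) is the value that would be compared with the threshold γ. Because T(x) sums more terms for longer periods, integer multiples of the true period can score similarly, so the candidate range in this sketch should span less than one octave around the expected sieve period.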

III. FAULT DETECTION METHOD
This section presents a comparative analysis in the context of an ASD-based approach, examining the effectiveness of different audio features and window lengths. A Gaussian mixture model (GMM) is trained on feature vectors extracted from a training set composed exclusively of data from normal cases. Thus, the GMM serves as a statistical representation of the normal operating conditions in the feature space. GMMs are known for their ability to capture complex decision boundaries [15], making them well suited for the ASD application. Such systems detect outliers by comparing the likelihood of a feature vector under the model with a threshold. Instead of employing a fixed threshold for outlier detection, we use a receiver operating characteristic (ROC) analysis to evaluate the system's overall performance across various threshold values, providing a comprehensive assessment of its fault detection capabilities. Such a system is schematically depicted in Fig. 3. We compared systems using conventional audio features with systems using the aforementioned physically motivated features. One limitation of using common features (CF) lies in the difficulty of determining which specific features should be used. In addition, appropriate parameters, such as the window length, have to be selected. Physically motivated features based on the properties of the fault signal therefore provide a more intuitive and straightforward approach.
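The outlier detection scheme described above can be sketched with scikit-learn. The number of mixture components, the full covariance type, and the helper names `fit_normal_model`, `anomaly_score`, and `detect` are assumptions chosen for illustration; the letter does not specify them.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def fit_normal_model(train_features, n_components=4, seed=0):
    """Fit a GMM to feature vectors taken from normal operation only.

    n_components and covariance_type are assumptions of this sketch.
    """
    gmm = GaussianMixture(
        n_components=n_components, covariance_type="full", random_state=seed
    )
    return gmm.fit(train_features)


def anomaly_score(gmm, features):
    """Negative log-likelihood under the normal-state model (larger = more anomalous)."""
    return -gmm.score_samples(features)


def detect(gmm, features, threshold):
    """Flag feature vectors whose anomaly score exceeds the threshold."""
    return anomaly_score(gmm, features) > threshold
```

Sweeping `threshold` over the range of observed scores and recording detection and false alarm rates yields the ROC used for evaluation, so no single operating point has to be fixed in advance.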

A. Window Length Analysis
The initial step in feature extraction involves framing the audio data using a suitable window function and length. According to [21], an antitruncation window is recommended. Therefore, we opted for a Hamming window. To determine the optimal window length, we analyzed the complete ASD system with different window sizes, using the area under the ROC curve (AUC) as the optimization criterion. The results are given in Table 1: the performance is similar across window lengths, but the best window length for our purpose corresponds to the period of the sieve vibration (80 ms), which can be estimated through pitch estimation.
For all subsequent analyses, we employ a fixed window length of 80 ms.
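A minimal framing sketch for this window length is given below. At the 48 kHz sampling rate, 80 ms corresponds to 3840 samples. The hop size is not stated in the letter; non-overlapping frames are assumed here, and the function name `frame_signal` is illustrative.

```python
import numpy as np

FS = 48_000        # sampling rate in Hz (from the text)
WINDOW_S = 0.080   # window length in seconds, one sieve period (from the text)


def frame_signal(x, fs=FS, window_s=WINDOW_S, hop_s=None):
    """Split a 1-D signal into Hamming-windowed frames.

    hop_s=None means non-overlapping frames, which is an assumption.
    Trailing samples that do not fill a whole frame are dropped.
    """
    x = np.asarray(x, dtype=float)
    win_len = int(round(window_s * fs))                 # 3840 samples at 48 kHz
    hop = win_len if hop_s is None else int(round(hop_s * fs))
    n_frames = max(0, 1 + (x.size - win_len) // hop)
    # Index matrix: row r holds the sample indices of frame r.
    idx = np.arange(win_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(win_len)
```

With these settings, one 20 s recording channel yields 250 frames of 3840 samples each.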

B. Performance Comparison of the ASD System With Different Audio Features
For the ASD model, we have selected a set of features commonly employed in audio signal processing [21]. In addition, we leverage the estimated pitch, its corresponding amplitude and bandpass energies as audio features in our analysis. A comprehensive list of the features utilized in our analysis can be found in Table 2.
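The bandpass energies could, for example, be computed per frame as shown below. This sketch uses a simple FFT-domain filterbank with rectangular bands instead of a time-domain filterbank, and the band edges are hypothetical; the letter does not list the bands that were used.

```python
import numpy as np


def band_energies(frame, fs, edges_hz):
    """Energy of one frame in consecutive bands [edges_hz[k], edges_hz[k + 1]).

    Rectangular FFT-domain bands are an assumption of this sketch.
    """
    frame = np.asarray(frame, dtype=float)
    spectrum = np.abs(np.fft.rfft(frame)) ** 2 / frame.size   # periodogram
    freqs = np.fft.rfftfreq(frame.size, d=1.0 / fs)           # bin frequencies in Hz
    energies = np.empty(len(edges_hz) - 1)
    for k, (lo, hi) in enumerate(zip(edges_hz[:-1], edges_hz[1:])):
        in_band = (freqs >= lo) & (freqs < hi)
        energies[k] = spectrum[in_band].sum()
    return energies


# Hypothetical band layout: five logarithmically spaced bands in 300-10 000 Hz.
EXAMPLE_EDGES_HZ = np.geomspace(300.0, 10_000.0, 6)
```

A physically motivated feature vector for a frame would then stack the estimated pitch, its amplitude, and these band energies.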
The audio features mentioned above were extracted from the dataset, specifically capturing sounds under normal operating conditions, and then divided into a training set (3000 windows of data) and a test set (3700 windows of data). In addition, extracted audio features from the fault case were added to the test set (1400 windows of data). The extracted features were utilized to construct feature vectors. Different feature combinations were used to construct distinct feature vectors, and for each combination, a GMM was fitted to the training set. This exploration aimed to assess the effectiveness of different feature types in the ASD approach. The testing dataset was then utilized to evaluate the fault detection performance of the trained GMMs. The performance evaluation included a detailed analysis using ROC, and the results are presented in Fig. 4. Furthermore, Table 3 gives the calculated AUC.
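The ROC and AUC evaluation on the test set can be sketched as follows, given anomaly scores for the normal test windows and for the fault windows. The function names are illustrative, and the sketch assumes that larger scores mean "more anomalous" and that no scores are tied.

```python
import numpy as np


def roc_curve_points(scores_normal, scores_fault):
    """False and true positive rates obtained by sweeping the threshold."""
    scores = np.concatenate([scores_normal, scores_fault]).astype(float)
    labels = np.concatenate(
        [np.zeros(len(scores_normal)), np.ones(len(scores_fault))]
    )
    order = np.argsort(-scores, kind="stable")   # most anomalous first
    labels = labels[order]
    # Lowering the threshold past each score adds one detection or one false alarm.
    tpr = np.concatenate([[0.0], np.cumsum(labels) / labels.sum()])
    fpr = np.concatenate([[0.0], np.cumsum(1 - labels) / (1 - labels).sum()])
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))
```

For example, normal scores `[0.1, 0.4]` and fault scores `[0.2, 0.5]` give an AUC of 0.75, since three of the four normal-fault pairs are ranked correctly.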
To interpret the AUC value, a rule of thumb according to [22] is that a value around 0.9 indicates excellent discrimination of the data. While the physically motivated features (PF + BE) showed good performance, the best results were obtained by using mel-frequency cepstral coefficients (MFCCs) as features for our specific task. This finding highlights the practical value of MFCCs in capturing the information needed for accurate fault detection in our context. Our investigations revealed that adding further feature types alongside MFCCs did not yield any notable improvement in the overall performance of the system. Nevertheless, the system using physically motivated features exhibits a high level of detectability and offers the advantage of a more interpretable representation of the data.

IV. CONCLUSION
In conclusion, our study reveals several findings. First, we found that the proposed ASD system is well suited for fault case detection in our specific application. Second, we observe that physically motivated features perform better than CF, confirming their effectiveness in capturing relevant information for fault detection. In addition, although the MFCCs show only a slight improvement over the physically motivated features, they still emerge as the top-performing features for our specific task.
The physically motivated features have better interpretability and are easier to calculate, making them practical for implementation. The pitch information helps determine the optimal window length for both CF and MFCCs. These findings improve feature selection and contribute to more understandable and efficient fault detection systems in industries. The proposed system is the starting point for a complete detection system. It has good detection rates but a notable false positive rate as well. To improve the system, we are working on using the continuous sound produced during failures and considering time-related information. This is an ongoing research focus.