Development of Automatic Wheeze Detection Algorithm for Children With Asthma

Asthma is a symptom of tracheal obstruction caused by bronchospasm, and it is among the most prevalent chronic obstructive pulmonary diseases. Auscultation is the most commonly used approach for the clinical diagnosis of asthma. However, recognizing wheezes through auscultation requires experienced physicians, and this approach is not sufficiently objective. Therefore, developing a method for recognizing wheezes objectively is crucial. Most studies have used the spectral features of lung sounds to detect wheezes; however, they have not achieved sufficiently high performance owing to the poor discrimination of spectral features. Several studies have attempted to extract wheezing features from lung sound spectrograms; however, their approaches were easily affected by variations in the wheezing frequency and background noise. The present study proposes a novel automatic wheeze detection algorithm for extracting lung sound features in the time–frequency domain and automatically detecting wheezes. The proposed algorithm applies canonical correlation analysis to successfully detect wheezing features in a lung sound spectrogram. Moreover, a neural network technique is used to effectively classify healthy and wheezing sounds. The experimental results indicated that the proposed algorithm showed excellent performance in detecting wheezing.


I. INTRODUCTION
Asthma frequently presents as airflow obstruction, shortness of breath, and intermittent wheezing during infancy or childhood [1]. It is a highly prevalent chronic obstructive lung disease that is associated with a heavy burden of healthcare costs, and it is among the top 20 chronic conditions globally for disability-adjusted life years in children [2]. From 1990 to 2015, the worldwide prevalence of asthma increased by 12.6% to 358.2 million individuals [3], and in 2014, approximately 334 million people had asthma worldwide [2]. In clinical practice, wheezing is defined as a type of continuously abnormal lung sound with a specific tone [4], and it can be considered indicative of the degree of airway obstruction [5]. Asthma typically presents with a high-pitched whistling (wheezing) sound. When asthma becomes severe, it may result in dyspnea, asphyxia, or other life-threatening situations [6]. Therefore, investigating methods for evaluating the asthma state is crucial. However, the diagnosis of the asthma state is generally based on the auscultation method that depends on the expertise of the physicians [7], and an objective judgment to evaluate the asthma state remains lacking [8].
The associate editor coordinating the review of this manuscript and approving it for publication was Vishal Srivastava.
Studies have proposed time-domain, frequency-domain, and spectrogram approaches for analyzing the abnormal lung sounds of patients with asthma. Regarding time-domain approaches, in 1977, Murphy et al. examined lung sounds by using a time-expanded waveform analysis and identified the time-domain waveform feature to distinguish different respiratory diseases [9]. Kiyokawa et al. obtained hourly nocturnal wheezing count (NWC) patterns by recording intermittent tracheal sounds to detect bronchoconstriction during sleep. They revealed that NWC was positively correlated with the severity of wheezing and could be used to evaluate the level of bronchoconstriction [10].
Concerning frequency-domain approaches, in 1984, Cohen and Landsberg obtained the linear prediction coefficients of the lung sound power spectrum and the ratio of the peak value to the root mean square value in the lung sound envelope to classify normal and abnormal breathing sounds [11]. In 1995, Malmberg et al. used the peak frequency and median frequency of the lung sound spectrum to investigate the lung characteristics of patients with chronic obstructive pulmonary disease (COPD) [12]. In 2004, Corbera et al. proposed a local adaptive wheeze detection algorithm for detecting wheezing features during forced exhalation from the lung sound spectrum [13]. However, the relatively low distinguishability between the peak and the median frequencies of normal and abnormal lung sounds usually reduces the efficiency of such approaches. In 1985, Fenton et al. estimated the spectral features of wheezing and used the ratio of the wheezing duration to the inspiration and expiration duration to evaluate the severity of bronchial obstruction [14]. However, the variation of wheezing features easily affected the reliability of the estimated wheezing duration. In 2011, Jin et al. extracted spectral derivative (SD) features from a lung sound spectrogram to estimate the duration and frequency of wheezing sounds [15]. However, SD features were easily affected by background noise, resulting in errors in distinguishing mild wheezing sounds from normal respiratory sounds. In 2020, Habukawa et al. developed a rule-based wheeze recognition algorithm for children. This algorithm monitors whether the local maximum fast Fourier transform values continue for more than 100 ms [16]. However, its results are easily affected upon inputting respiratory sounds with an unstable volume.
Regarding spectrogram approaches, in 2009, Riella et al. used digital image processing techniques to extract spectral projections from a lung sound spectrogram to identify wheezing sounds [17]. However, the use of two-dimensional convolution masks increased the background noise in the spectrogram and thus affected the correctness of the extracted features, especially in patients with mild asthma. Lin et al. extracted several features from a spectrogram and used them as inputs in a back-propagation neural network for automatic wheeze detection [18]. Their automatic wheeze detection system uses the order truncate average (OTA) method to suppress noise and strengthen wheezing signals. However, the OTA method has a very high computational cost. In 2019, Shi et al. proposed a deep learning approach that combines the VGGish network with a bidirectional gated recurrent unit neural network to recognize lung sounds [19]. In 2019, Demir et al. proposed an efficient convolutional neural network (CNN)-based approach for classifying lung diseases [20]. They combined the CNN with a support vector machine (SVM) classifier to classify the spectrogram of lung sounds. In 2020, Sadi and Hassan proposed a deep learning approach that combines a CNN model with the Mel-frequency cepstral coefficient to classify wheezes and crackles [21]. The aforementioned deep learning approaches [19]-[21] have great potential for classifying images created from visual representations of audio. However, they require large quantities of training data to increase their precision and overall accuracy. Insufficient data may produce unsatisfactory results [19]-[21].
To overcome these limitations, the present study proposes a novel automatic wheeze detection algorithm. To reduce the influence of noise and background lung sounds and the variation of wheezing features in the lung sound spectrogram, the canonical correlation analysis (CCA) technique, a statistical method for determining the underlying correlation between two datasets, is used to examine the continuity of wheezing features in the lung sound spectrogram to effectively extract wheezing information. Finally, a neural network is used to classify the wheezing and normal lung sounds from the extracted lung sound features. The experimental results revealed that the respiratory rate, sound index, breathing cycle period, inspiratory duration, expiratory duration, maximum peak frequency, wheezing duration, and wheezing frequency could exactly reflect the wheezing characteristics of children with asthma; moreover, the proposed system showed excellent performance in wheeze detection.

A. DESIGN OF LUNG SOUND RECORDER
In this study, a self-assembled lung sound recorder was designed and implemented to collect lung sounds. Fig. 1 presents the recorder's system architecture and photograph. It mainly comprises a stethoscope bell, microphone, and wireless signal acquisition module. The stethoscope bell (3M Littmann Master Classic II 2146, 3M, Maplewood, MN, USA) collects lung sounds, which are converted into electrical signals through the microphone (JL-0627C, Jeou Luen Technology Co., Taiwan). Next, the lung sound signals are amplified, filtered, and digitized using the wireless signal acquisition module, and they are then wirelessly transmitted to the back-end host system via Bluetooth (RN4678, Microchip Technology, Taiwan). In this study, the amplifier gain was set to 390 V/V with a frequency band of >150 Hz, and the sampling rate of the analog-to-digital converter built into the microprocessor (RX210, Renesas, Japan) in the module was set to 2 kHz. A lung sound monitoring program built into the back-end host system was designed to receive, display, and store the received lung sound signals in real time; it also enables the user to play the recorded lung sounds in real time.

B. AUTOMATIC WHEEZE DETECTION ALGORITHM
A wheezing sound is a continuous adventitious lung sound caused by airway obstruction, and it can be observed in patients with asthma or COPD. The wheezing feature has a time-varying narrow line pattern (from 250 to 800 Hz), and its duration may exceed 250 ms in a lung sound spectrogram [22], [23]. The proposed automatic wheeze detection algorithm (Fig. 2) was designed to extract wheezing features from a lung sound spectrogram and to detect wheezing sounds. Accordingly, in the algorithm, the breathing cycles must be segmented first. The raw lung sound is preprocessed using a bandpass filter (150-1000 Hz) to preserve meaningful lung sound features and to reduce the influence of the heart, blood, and muscle sounds. The sound is then rectified and integrated to obtain the envelope of the lung sound. Next, the local minima of the lung sound envelope are estimated, and the segment between two adjacent local minima is considered a breathing cycle. Next, the lung sound features, including the respiratory rate, breathing cycle period, inspiratory and expiratory durations, sound index, maximum peak frequency, and wheezing feature duration and frequency in the lung sound spectrogram, are extracted. In this study, the respiratory rate was defined as the number of breathing cycles within 1 min. The breathing cycle period was defined as the period between the onset point and the offset point in a breathing cycle. The sound index, related to the lung sound power, was defined as the sum of the absolute amplitudes of the lung sound within a breathing cycle.
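The segmentation steps described above (bandpass filtering, rectification, integration, and local-minimum search) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the implementation used in the study: the FFT-masking filter, the 0.5 s integration window, the 1 s minimum spacing between envelope minima, and all function names are hypothetical choices. Only the 2 kHz sampling rate, the 150-1000 Hz band, and the feature definitions come from the text.

```python
import numpy as np

FS = 2000  # sampling rate in Hz, as stated for the recorder

def bandpass_fft(x, fs=FS, lo=150.0, hi=1000.0):
    """Zero FFT bins outside [lo, hi] Hz (assumed stand-in for the bandpass filter)."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spec, n=len(x))

def envelope(x, fs=FS, win_s=0.5):
    """Rectify, then integrate with a moving average (window length is an assumption)."""
    win = max(1, int(win_s * fs))
    return np.convolve(np.abs(x), np.ones(win) / win, mode="same")

def local_minima(env, fs=FS, min_gap_s=1.0):
    """Indices where the envelope is the lowest value within +/- min_gap_s."""
    half = int(min_gap_s * fs)
    idx = []
    for i in range(half, len(env) - half):
        window = env[i - half:i + half + 1]
        if env[i] == window.min() and (not idx or i - idx[-1] > half):
            idx.append(i)
    return idx

def segment_cycles(x, fs=FS):
    """Return (start, end) sample indices between adjacent envelope minima."""
    env = envelope(bandpass_fft(x, fs), fs)
    minima = local_minima(env, fs)
    return list(zip(minima[:-1], minima[1:]))

def respiratory_rate(cycles, fs=FS):
    """Breathing cycles per minute, extrapolated from the mean cycle period."""
    periods = [(end - start) / fs for start, end in cycles]
    return 60.0 / (sum(periods) / len(periods))

def sound_index(x, cycle):
    """Sum of absolute amplitudes within one breathing cycle."""
    start, end = cycle
    return float(np.sum(np.abs(x[start:end])))
```

For a recording `x` sampled at 2 kHz, `segment_cycles(x)` yields the cycle boundaries, from which the respiratory rate, breathing cycle period, and sound index follow directly. The inspiratory and expiratory durations would need an additional onset/offset rule that the text does not specify.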
For wheezing feature detection in the algorithm, the CCA technique [24], [25] is used to estimate the continuity of the wheezing feature in the lung sound spectrogram. This technique affords a powerful feature analysis for determining and quantifying the maximum correlation between multichannel samples in the time-frequency domain. Before the short-time Fourier transform is used to obtain the lung sound spectrogram, additive white Gaussian noise is added to the raw lung sound to disrupt the continuity of the normal lung sound feature and to highlight the wheezing features in the spectrogram. Assume that the lung sound spectrogram $S_{ls} = (S_1, S_2, \ldots, S_k, \ldots, S_N)$ contains N power spectral density (PSD) vectors; here, $S_k$ denotes the PSD vector obtained from the kth time interval. The total covariance matrix $M_{tc\_S_k S_{k+1}}$ between $S_k$ and $S_{k+1}$ can be expressed as follows:

$$M_{tc\_S_k S_{k+1}} = \begin{bmatrix} M_{S_k S_k} & M_{S_k S_{k+1}} \\ M_{S_{k+1} S_k} & M_{S_{k+1} S_{k+1}} \end{bmatrix} \tag{1}$$

where $M_{S_k S_k}$ and $M_{S_{k+1} S_{k+1}}$ are the within-set covariance matrices of $S_k$ and $S_{k+1}$, respectively, and $M_{S_k S_{k+1}} = M_{S_{k+1} S_k}^{T}$ is the between-set covariance matrix. The canonical correlations between $S_k$ and $S_{k+1}$ can be obtained by solving the eigenvalue equations:

$$M_{S_k S_k}^{-1} M_{S_k S_{k+1}} M_{S_{k+1} S_{k+1}}^{-1} M_{S_{k+1} S_k} V_{S_k} = \lambda_k^2 V_{S_k}$$

$$M_{S_{k+1} S_{k+1}}^{-1} M_{S_{k+1} S_k} M_{S_k S_k}^{-1} M_{S_k S_{k+1}} V_{S_{k+1}} = \lambda_k^2 V_{S_{k+1}}$$

where the eigenvalues $\lambda_k^2$ are the squared canonical correlations and the eigenvectors $V_{S_k}$ and $V_{S_{k+1}}$ are the normalized canonical correlation basis vectors [25]. The maximum canonical correlation $\rho_k$ between $S_k$ and $S_{k+1}$ can then be defined as the maximum value of $\lambda_k$, and it should exhibit a relatively stable and high value for wheezing sounds owing to the continuity of the wheezing feature in a lung sound spectrogram, as displayed in Fig. 3. In Fig. 3(a), the wheezing sound presents a time-varying narrow line pattern in the lung sound spectrogram; its frequency range is 250-400 Hz, and its duration is approximately 750 ms. Moreover, the maximum canonical correlations in the wheezing segment are higher than those elsewhere.
If these correlations are continuously higher than the given threshold for over 100 ms, then they are considered to reflect a wheezing feature. In this study, the given threshold was defined as the sum of the average and standard deviation of all maximum canonical correlations within a breathing cycle. After the detection of the wheezing feature, its duration and frequency in the lung sound spectrogram could be estimated.
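The CCA step and the thresholding rule can be sketched as follows. This is a hedged illustration, not the study's code. The text does not specify how multichannel observations are formed from adjacent PSD vectors, so `X` and `Y` below are generic observation matrices (rows are observations). The function names, the small regularization term `reg`, and the frame hop `hop_s` are assumptions. The threshold (mean plus standard deviation of the maximum canonical correlations within a breathing cycle) and the 100 ms minimum duration follow the text.

```python
import numpy as np

def max_canonical_correlation(X, Y, reg=1e-8):
    """Largest canonical correlation between observation matrices X (n x p) and Y (n x q)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Within-set and between-set covariance matrices (reg keeps them invertible).
    Mxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Myy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Mxy = Xc.T @ Yc / (n - 1)
    # Eigenvalues of Mxx^-1 Mxy Myy^-1 Myx are the squared canonical correlations.
    A = np.linalg.solve(Mxx, Mxy) @ np.linalg.solve(Myy, Mxy.T)
    lam2 = np.linalg.eigvals(A).real.max()
    return float(np.sqrt(np.clip(lam2, 0.0, 1.0)))

def wheeze_segments(rho, hop_s, min_dur_s=0.1):
    """Return (start_s, end_s) runs where rho stays above mean + std for more than min_dur_s."""
    rho = np.asarray(rho, dtype=float)
    threshold = rho.mean() + rho.std()
    above = np.append(rho > threshold, False)  # sentinel closes a trailing run
    segments, start = [], None
    for i, flag in enumerate(above):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if (i - start) * hop_s > min_dur_s:
                segments.append((start * hop_s, i * hop_s))
            start = None
    return segments
```

Here `rho` would hold the maximum canonical correlations for the consecutive frame pairs of one breathing cycle. Each returned segment gives the estimated wheezing duration; the wheezing frequency could then be read from the spectrogram peak inside that segment.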
A radial basis function neural network (RBFNN) affords advantages such as a simple structure, fast training process, and excellent approximation capability. Accordingly, it is applied in the algorithm to classify normal lung sounds and wheezing sounds. The RBFNN structure contains three layers: input (N 0 neurons), hidden (N 1 neurons), and output (one neuron) layers, as illustrated in Fig. 4. In this study, a k-means clustering algorithm [26] and normalized least mean square algorithm [27] were used to train the center vectors in the hidden neurons and the weight vector between the hidden neurons and the output neuron, respectively. In the training procedure, the desired RBFNN outputs for wheezing and normal sounds were set to 1 and 0, respectively. If the RBFNN output was larger than the given threshold, it was classified into the wheezing sound group; otherwise, it was classified into the normal lung sound group.
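The training scheme described above (k-means for the hidden-layer centers, normalized least mean square updates for the output weights, and a threshold on the network output) can be sketched as follows. This is a minimal sketch under assumptions rather than the authors' implementation: the Gaussian basis function, its width `sigma`, the step size `mu`, the epoch count, and all function names are illustrative choices not given in the text.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain k-means; returns k center vectors for the hidden neurons."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's center unchanged
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def rbf_features(X, centers, sigma):
    """Gaussian hidden-layer activations (the Gaussian form is an assumption)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def train_nlms(Phi, y, mu=0.5, epochs=50, eps=1e-8):
    """Normalized LMS update of the hidden-to-output weight vector."""
    w = np.zeros(Phi.shape[1])
    for _ in range(epochs):
        for phi, target in zip(Phi, y):
            err = target - w @ phi
            w += mu * err * phi / (eps + phi @ phi)
    return w

def classify(Phi, w, threshold):
    """1.0 for wheezing (output above threshold), 0.0 for normal lung sound."""
    return (Phi @ w > threshold).astype(float)
```

With a feature matrix `X` (one row per recording) and targets `y` of 1 for wheezing and 0 for normal sounds, the pipeline is `centers = kmeans(X, k)`, `Phi = rbf_features(X, centers, sigma)`, `w = train_nlms(Phi, y)`, then `classify(Phi, w, threshold)`. In practice the input features would likely need scaling to comparable ranges before clustering.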

C. EXPERIMENTAL DESIGN
This study enrolled 95 children (aged 0 to 11 years) from Kaohsiung Chang Gung Memorial Hospital (CGMH) as participants. Of these children, 63 were healthy (40 boys and 23 girls) and 32 were patients with asthma (23 boys and 9 girls). In clinical diagnosis, the wheezing grade is usually used to evaluate the asthma state. The patients with asthma were classified into three groups according to their asthma states: level 1 (4 people), level 2 (17 people), and level 3 (11 people). The experimental protocol of this study was approved by the Institutional Review Board (IRB) of Kaohsiung CGMH, Taiwan (IRB number: 104-2415B). The parents of all participants provided informed consent. In this experiment, the self-assembled lung sound recorder was placed on the first intercostal space of the upper right anterior chest to collect lung sounds for 20 s. Analysis of variance was used to analyze the significance of differences, and statistically significant differences were defined as p < 0.05.

A. LUNG SOUND FEATURES CORRESPONDING TO DIFFERENT GROUPS
First, the segmentation performance for breathing cycles was evaluated. Accordingly, 500 breathing cycles were randomly selected from the database for testing. The segmentation accuracy for breathing cycles was approximately 99.6%. Subsequently, the lung sound features corresponding to the various study groups were investigated. The respiratory rate, sound index, breathing cycle period, inspiratory duration, expiratory duration, and maximum peak frequency for the various groups are presented in Figs. 5(a)-5(f). The respiratory rates and sound indices for the asthma groups were significantly higher than those for the healthy group, and the value of the sound index increased with the wheezing grade. The breathing cycle periods for all asthma groups were significantly shorter than those for the healthy group. The difference between the inspiratory durations of different groups was nonsignificant. The expiratory durations for all asthma groups were significantly shorter than those for the healthy group. The maximum peak frequencies in the lung sound spectra for all asthma groups were also significantly higher than that for the healthy group. However, the differences between all lung sound features of the level 3 asthma group and level 1 and 2 asthma groups were nonsignificant. Figs. 6(a) and 6(b) display the duration and frequency of the wheezing features in the lung sound spectrogram corresponding to different groups, respectively. The differences between the wheezing durations of the level 3 asthma group and the level 1 and 2 asthma groups were nonsignificant. The differences between the wheezing frequencies of the level 3 asthma group and the level 1 and 2 asthma groups were significant.
VOLUME 9, 2021

B. PERFORMANCE OF RBFNN FOR DETECTING WHEEZING SOUNDS
The performance of the RBFNN in classifying lung sounds in the asthma groups (levels 1, 2, and 3) and healthy group was investigated. Accordingly, significant lung sound features, including the respiratory rate, sound index, breathing cycle period, expiratory duration, maximum peak frequency, wheezing duration, and wheezing frequency, were used as the input of the RBFNN. The number of hidden neurons was set to 16, 32, 64, and 128, and the thresholds ranged from 0.1 to 0.9. To evaluate the performance of the RBFNN in classifying lung sounds, several binary classification parameters must be defined first: true positive (TP), meaning that the wheezing sound was correctly classified as a wheezing sound; false positive (FP), meaning that the normal lung sound was incorrectly classified as a wheezing sound; true negative (TN), meaning that the normal lung sound was correctly classified as a normal lung sound; and false negative (FN), meaning that the wheezing sound was incorrectly classified as a normal lung sound. The F-measure, which is the harmonic mean of precision (positive predictive value, PPV) and recall (sensitivity, also called TP rate, TPR), was used to determine the optimal threshold. It can be calculated as follows:

$$\text{PPV} = \frac{TP}{TP + FP}, \qquad \text{TPR} = \frac{TP}{TP + FN}, \qquad F\text{-measure} = \frac{2 \times \text{PPV} \times \text{TPR}}{\text{PPV} + \text{TPR}}$$

The experimental results revealed that the RBFNN exhibited optimal performance in classifying lung sounds when the hidden neuron number and threshold were set to 64 and 0.3, respectively: F-measure = 95.2%, PPV = 96.8%, sensitivity = 93.8%, and accuracy = 96.8% (Table 1). Fig. 7 shows the RBFNN outputs for the various groups. The difference between the RBFNN outputs for the healthy group and asthma groups was significant.
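As a worked check of the metrics defined above, the short sketch below computes PPV, sensitivity, F-measure, and accuracy from confusion-matrix counts. The function name is hypothetical, and the example counts (TP = 30, FP = 1, TN = 62, FN = 2) are an assumption: they are not stated in the text but are one set of counts for 32 asthma and 63 healthy participants that reproduces the reported percentages.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return PPV, sensitivity (TPR), F-measure, and accuracy from confusion counts."""
    ppv = tp / (tp + fp)
    tpr = tp / (tp + fn)
    f_measure = 2.0 * ppv * tpr / (ppv + tpr)  # harmonic mean of PPV and TPR
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"ppv": ppv, "tpr": tpr, "f_measure": f_measure, "accuracy": accuracy}

# Assumed counts, chosen only to be consistent with the reported figures.
metrics = classification_metrics(tp=30, fp=1, tn=62, fn=2)
for name, value in metrics.items():
    print(f"{name}: {100.0 * value:.1f}%")
```

With these assumed counts, PPV = 30/31, sensitivity = 30/32, F-measure = 60/63, and accuracy = 92/95, which agree with the reported values to one decimal place.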

IV. DISCUSSION
Asthma is caused by chronic inflammation of the bronchi and bronchioles. This leads to the increased contractility of the surrounding smooth muscles, in turn leading to airway stenosis and typical symptoms such as wheezing [28]. Asthma is usually associated with anxiety, dehydration, infection, and other disturbances [29], and it manifests through increased shortness of breath, cough, chest tightness, or some combination of these symptoms [30]. Therefore, breathlessness might be a major factor causing the higher respiratory rates and shorter breathing cycle periods for all asthma groups, as presented in Fig. 5. The inspiratory durations for all asthma groups were nonsignificantly higher than those for the healthy group; however, the expiratory durations for all asthma groups were shorter. This might be because patients with asthma easily encounter the problem of conspicuous inspiratory airflow limitation [31], [32] and require a sufficient inspiratory time and a shorter expiratory time to exchange sufficient fresh oxygen. Moreover, airway obstruction caused by narrowing bronchial airways leads to hyperventilation and a considerable increase in airflow sound intensity, as reflected by the increase in the sound index. Previous studies [33]-[36] have indicated that wheezing was unequivocally associated with airway obstruction, and the relationship between the severity of asthma and the breathing airflow was significant [37]-[40]. In the present study, the increase in breathing airflow and the narrowing trachea resulted in the higher maximum peak frequency in the asthma groups and even produced specific high-frequency tones for longer than 100 ms (Fig. 6) [41]. The wheezing frequency of the asthma level 3 group was significantly higher than those of the asthma level 1 and 2 groups. This reflects the relationship between asthma severity and the increasing airflow and narrowing trachea.
Previous studies have proposed several wheeze detection methods. Table 2 presents a comparison of the proposed algorithm with these other methods. In 2015, Wiśniewski et al. proposed an acoustic sound analysis approach for recognizing asthmatic wheezing sounds [42]. In that study, the changes in the audio spectral envelope (ASE) and tonality index (TI) were defined as the spectral features of lung sounds. Here, TI indicated the flatness of the lung sound spectrum, and it was used to quantize wheezes in the spectrum. In the approach proposed by Wiśniewski et al., an SVM with a polynomial kernel was used as the classifier, and its recognition rate (accuracy) was approximately 93%. However, the ASE feature for single-tone wheezes was easily affected by background breathing sounds. In 2016, Lozano et al. used ensemble empirical mode decomposition (EEMD) and the Hilbert transform to differentiate continuous adventitious sounds (wheezes and rhonchi) from respiratory sounds [43]. Specifically, they used EEMD to decompose the respiratory sounds into several signal components and the Hilbert transform to calculate the instantaneous phase of these signal components. Next, the instantaneous frequency and envelope were calculated from the obtained instantaneous phase to provide more detailed information about the sound vibrations. The instantaneous frequency and envelope were used as the SVM input, and the accuracy of the SVM was approximately 94.6%. In 2017, Oletic and Bilas used a hidden Markov model forward-backward algorithm to estimate the occurrence probabilities of wheezing in a lung sound spectrogram and then used a sequential hypothesis testing method to estimate the beginning and ending times of wheezing [44]. The accuracy of this method was nearly 93.4%. However, the sensitivity of the sequential hypothesis testing method to wheezes was easily affected by background breathing sounds.
In this study, each breathing cycle could be automatically segmented, and several lung sound features in the time and frequency domains and wheezing features in the lung sound spectrogram could be extracted. Moreover, in contrast to the aforementioned methods, the influence of background breathing sounds and environmental noise on the performance of the algorithm in extracting wheezing features from the lung sound spectrogram could be effectively reduced by using the CCA technique. The accuracy of the proposed automatic wheeze detection algorithm was over 95%.

V. CONCLUSION
In this study, a novel wheeze detection algorithm was developed for automatically extracting lung sound features and recognizing the wheezing sounds of children with asthma. Compared with other wheezing analysis methods, the CCA technique could effectively reduce the influence of environmental noise and background breathing sounds and detect wheezing features in a lung sound spectrogram. The experimental results indicate that except for the inspiratory duration, most of the lung sound features for all asthma groups, including the respiratory rate, sound index, breathing cycle period, expiratory duration, maximum peak frequency, wheezing duration, and wheezing frequency, were significantly different from those of the healthy group. Moreover, the RBFNN with extracted lung sound features exhibited excellent performance in classifying normal lung sounds from wheezing sounds (accuracy = 96.8%). Therefore, the proposed algorithm can enable efficient wheeze detection in children with asthma and has potential for use in evaluating the severity of wheezing in the future.

He is currently a Professor with the Department of Computer Science and Information Engineering, National Taipei University, Taiwan, where he is also currently the Director of the Computer and Information Center. His research interests include the areas of smart medicine, embedded systems, wearable systems, biomedical signal processing, biomedical image processing, and portable biomedical electronic system design.
YAN-DI WANG received the M.S. degree from the Institute of Imaging and Biomedical Photonics, National Chiao Tung University, Taiwan. His research interests include the areas of biomedical circuits and systems, biomedical signal processing, and biomedical optoelectronics.
He is also a fellow of the Institution of Engineering and Technology (IET), U.K. His current research interests include biomedical circuits and systems, biomedical signal processing, and biosensors.