Respiratory Sound Classification: From Fluid-Solid Coupling Analysis to Feature-Band Attention

Based on respiratory sound production mechanisms, we study the relationship between airflow characteristics in the bronchi and sound pressure spectrum curves to implement an end-to-end respiratory sound classification system with a feature-band attention module. First, we analyse fluid-solid coupling simulations of the bronchi and execute acoustic simulations to obtain the spectrum curves of the bronchi at the sound pressure level. Then, based on the spectrum characteristics of the bronchi, we propose an attention strategy to refine the acoustic features with adaptive weights. In addition, we introduce a feature-band attention module to ResNet-based networks with a squeeze-and-excitation block. Finally, we perform experiments on the ICBHI public database to classify respiratory sounds into one of four classes: normal, wheezes, crackles, and both (wheezes and crackles). The results show that our proposed system exhibits superior performance compared with the baseline system. This type of feature learning strategy is useful for exploring the distinct characteristics of different types of respiratory sounds.


I. INTRODUCTION
In recent years, chronic respiratory diseases have spread all over the world with high prevalence rates and recurrent attacks. Asthma and chronic obstructive pulmonary disease (COPD) are the main representative diseases [1]. Patients with asthma and COPD require long-term treatment and daily monitoring, which is challenging due to limited medical resources, as clinicians must auscultate for respiratory sounds during each visit at internal medicine departments. Fortunately, with the rapid development of machine learning technology, many automatic respiratory sound classification systems have been launched to pave a new way forward in aiding clinical diagnoses and treatment of respiratory diseases [2].
The commonly used framework for respiratory sound classification includes two parts: feature extraction for respiratory sound signals and a classification model. For feature extraction, many methods have been adopted for extracting acoustic features, including entropy-based features [3], Short-Time Fourier Transform (STFT) [4], spectrogram [5], Mel-Filter banks (FBank) [6], Wavelet analysis [7], Perceptual Linear Prediction (PLP) [8], and Mel-Frequency Cepstral Coefficient (MFCC) [9]. For the classification model, some discriminative models and generative models have been employed, such as the k-Nearest Neighbour (k-NN), Gaussian Mixture Model (GMM), Hidden Markov Model (HMM), and Support Vector Machine (SVM). As early as 1997, some experts began using autoregressive models for extracting spectrum features from respiratory sounds, and they established multiple k-NN classifiers to recognize respiratory sounds [10]. Improved performance was achieved by HMM-based and SVM-based systems [11]-[13].
The associate editor coordinating the review of this manuscript and approving it for publication was Orazio Gambino.
Recently, deep neural networks have achieved promising performance, and continuous optimizations have been made in algorithms and applications. In our previous work [14], we combined HMM and deep neural network architecture to build a classification model for respiratory sounds.
Some studies have focused on transfer learning strategies by utilizing the VGG16 model pretrained on image datasets [15]-[17]. More complex CNN-based systems with a model tuning strategy and a data augmentation technique were proposed to obtain better performance in the respiratory sound classification task [18]-[20]. However, the above-mentioned systems directly adopt acoustic parameters conventionally used in the speech classification field or spectrogram representations used for image classification. To improve the performance of respiratory sound classification systems, we aim to explore the innate characteristics of respiratory sounds by physically modelling the bronchi.
In the geometric modelling of bronchi, scholars started with the bronchi's geometric characteristics and modelled the bronchi through a variety of methods. For example, they studied airflow movement and particle deposition in the trachea. In 1982, Haselton and colleagues simulated the airflow state in the windpipe by using a symmetrical bifurcation bronchi model [21]. In 1989, based on the bifurcated pipe model, Snyder and Olson focused on the flow velocity distribution of the airflow near each bifurcation point. The research results show that strong shear stress may directly cause the airflow to produce secondary flow vortices [22]. Cheng et al. simulated particle deposition through oral and bronchi models [23].
After conducting geometric modelling of the bronchi, the airflow simulation process requires the related theory of fluid mechanics. In 1755, the Euler equation [24] was proposed to describe fluid motion under ideal conditions. Because the Euler equation assumes that the fluid is inviscid and ideal, its applications in practical engineering are limited. In the 19th century, Navier and Stokes considered the viscous fluid movement in the boundary layer and defined the Navier-Stokes equation, also referred to as the N-S equation [25], which is the theoretical cornerstone of modern fluid mechanics. However, solving the N-S equation is also a challenge. Experts recommend turbulence simulation methods, including direct numerical simulation and indirect numerical simulation. Examples of indirect simulation methods are the Large Eddy Simulation (LES) and Reynolds-averaged Navier-Stokes (RANS). The k-ω model and the k-ε model are two mainstream modelling solutions in the RANS method, in which k represents turbulent kinetic energy, ω is the frequency of turbulent decay processes, and ε denotes the turbulent energy dissipation rate. The above numerical simulation methods are commonly used to study the law of airflow movement in the bronchi. For example, Zhao and colleagues used the k-ω model to analyse airflow characteristics in the upper respiratory tract of the human body [26]. Mihaescu et al. simulated the established human airway model and found that the RANS method was not suitable for predicting anisotropic fluid movement. For the airflow simulation of the human bronchial airway, the LES method captures microscopic characteristics of airflow and fluid movement [27], [28].
However, most of the above numerical simulation studies on bronchial airflow patterns ignored the influence of bronchial tube deformation and fluid-solid coupling and directly set the bronchial wall structure as a rigid wall. In actual situations, even small movements of the bronchial wall have a great impact on airflow in the bronchial tubes. Therefore, it is beneficial to consider the influence of the wall on bronchial airflow under actual physiological conditions within the human body. To fill this research gap, we modelled the bronchi with fluid-solid coupling and simulated the airflow in the bronchi, which provides baseline data for subsequent acoustic modelling.
The acoustic characteristics of respiratory sound signals have a strong correlation with airflow characteristics in the lungs. However, most of the early studies on acoustic modelling of bronchi only analysed the relationship between respiratory sound production mechanisms and bronchial airflow but failed to map relationships between respiratory sound signal pressure levels and bronchial airflow. For example, Forgacs analysed the sound generation mechanisms of different types of respiratory sounds in the lungs, classified them based on corresponding airflow patterns in the tube, and found that there are three modes of bronchial airflow: laminar, turbulent, and vortex. The research results of Hardin et al. on vortex vocalisation in the respiratory system further support Forgacs' vortex theory, which states that the vortex phenomenon is generated when air flows from the small bronchi to the large bronchi [29]. The respiratory sound source is broadband noise, so the vortex is the main source of respiratory sounds.
In response to the problem of continuous wheezing, Vaz and Thakor proposed the airway tremor theory, which reveals the relationship between wheezing and airflow [30]. Xu et al. used the finite difference method to study the production model and further clarified the theoretical production mechanism of wheezes; wheezes are produced by the interaction between the airway wall and airflow during the movement of respiratory airflow [31]. In 2017, Messner et al. used linear predictive cepstral coefficients and polynomial regression to map the relationship between intrabronchial airflow and generated respiratory sounds [32], and they detected the respiratory phase of respiratory sound signals. It is assumed that the airway pathophysiology can be detected to diagnose respiratory diseases [33].
In the field of aeroacoustics, the Ffowcs Williams-Hawkings (FW-H) equation is usually used to determine the relationship between the sound pressure of sound signals and airflow movement [34]. Zhang et al. combined the LES and FW-H equations to predict the noise spectrum of the wings of an aircraft [35]. After using different turbulence calculation methods to study the influence of turbulence on an aircraft's wings, Li et al. used the FW-H method to investigate the influence of the thickness of the wings' trailing edges on noise characteristics [36]. These studies in the field of aeroacoustics inspired us to map the relationships between bronchial structures and respiratory sound acoustic characteristics. Without considering the bronchial wall, we mapped the relationships between bronchial structures and respiratory sounds and achieved some conclusions [37]. We plan to further explore the aeroacoustic phenomenon with the effect of bronchial walls.
In this paper, we study respiratory sound production mechanisms based on physical models of bronchi and airflow movement. We simulated airflow characteristics in the bronchi and the sound pressure spectrum. Based on the relationships between acoustic spectra and lung pathology, we further developed an end-to-end respiratory sound classification system with a feature-band attention module. This paper is organized as follows. Section II describes the related formulas for bronchial and acoustic modelling and the baseline system in this study. Section III describes the methods proposed in this study, including bronchial physical modelling and optimized automatic classification systems. Section IV illustrates the experimental settings and analyses the simulation results of physical and acoustic modelling. Section V discusses the performance of respiratory sound classification systems. Finally, Section VI concludes the paper.

II. RELATED WORKS
To make the mathematical content in this paper easier to follow, we define consistent notations. The subscript n denotes the direction outside the wall of the tube. The subscripts i, j, and k denote the directions of the Cartesian coordinate system. The vectors with the superscript b represent the parameters of the bronchial tube wall.

A. FLUID CONTROL EQUATION
Airflow in the bronchi is approximately a viscous flow, which is isothermal and incompressible. The continuity equation for respiratory airflow movement in the tube and the N-S control equation, (1) and (2), are described in [38]. In these equations, $u_i$ and $u_j$ are the velocities in directions $i$ and $j$, respectively, and $x_i$ and $x_j$ are the displacements in directions $i$ and $j$, respectively. $\rho$ is the airflow density and $\nu$ is the kinematic viscosity coefficient of airflow in the bronchi; they are related to the viscosity $\mu$ by $\mu = \rho \nu$. The viscosity of airflow $\mu$ is set to $1.89 \times 10^{-5}$ Pa·s. $p$ is the pressure per unit area produced by the flow field in the bronchi, $t$ indicates time, and $h$ represents the volume force, $h = \rho g$, with $g = 9.8$ N/kg.
In the flow field simulation in the bronchi, the standard k-ε model is adopted for steady simulation, and the LES simulation is employed to solve the N-S equation. In the LES simulation, the filtered continuity equation for airflow movement in the bronchi and the filtered N-S control equation, (3) and (4), are written in terms of filtered quantities, where $\bar{u}_i$ denotes the mean of $u_i$, and $\overline{u_i u_j}$ is written as $\overline{u_i u_j} = \bar{u}_i \bar{u}_j + (\overline{u_i u_j} - \bar{u}_i \bar{u}_j)$. Equations (3) and (4) are then combined into (5).
We also consider the influence of bronchial wall deformations on bronchial airflow. According to nonlinear continuum mechanics, the control equation for the solid bronchial wall, (6), is given in [38], where $\rho^b$ is the bronchial wall density, $u^b$ is the motion velocity of the bronchial wall, $Y^b$ is the stress tensor, $f^b$ denotes stress near the boundary of the bronchial tube wall, and $F$ is the deformation gradient of the bronchial wall, which is related to wall displacement.
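For orientation, the fluid equations referenced in this subsection are commonly written in the standard forms below. This is a sketch under stated assumptions, not a transcription of [38]: the conservative form of the convective term, the body-force term $h_i/\rho$ (with $h_i$ the component of the volume force $h$ in direction $i$), and the symbol $\tau_{ij}$ for the subgrid term are choices made here for illustration, and the authors' own equations may differ in notation.

```latex
% Incompressible continuity and N-S equations (standard textbook form)
\frac{\partial u_i}{\partial x_i} = 0, \qquad
\frac{\partial u_i}{\partial t} + \frac{\partial (u_i u_j)}{\partial x_j}
  = -\frac{1}{\rho}\frac{\partial p}{\partial x_i}
    + \nu \frac{\partial^2 u_i}{\partial x_j \partial x_j}
    + \frac{h_i}{\rho}

% Filtered (LES) counterparts
\frac{\partial \bar{u}_i}{\partial x_i} = 0, \qquad
\frac{\partial \bar{u}_i}{\partial t} + \frac{\partial \overline{u_i u_j}}{\partial x_j}
  = -\frac{1}{\rho}\frac{\partial \bar{p}}{\partial x_i}
    + \nu \frac{\partial^2 \bar{u}_i}{\partial x_j \partial x_j}
    + \frac{h_i}{\rho}

% Using \overline{u_i u_j} = \bar{u}_i \bar{u}_j + \tau_{ij},
% with \tau_{ij} = \overline{u_i u_j} - \bar{u}_i \bar{u}_j (label assumed here)
\frac{\partial \bar{u}_i}{\partial t} + \frac{\partial (\bar{u}_i \bar{u}_j)}{\partial x_j}
  = -\frac{1}{\rho}\frac{\partial \bar{p}}{\partial x_i}
    + \nu \frac{\partial^2 \bar{u}_i}{\partial x_j \partial x_j}
    - \frac{\partial \tau_{ij}}{\partial x_j}
    + \frac{h_i}{\rho}
```

The last line shows why the decomposition of $\overline{u_i u_j}$ matters: it isolates the resolved convective term from the unresolved part that an LES subgrid model has to supply.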

C. SPECTRUM SIMULATION OF SOUND PRESSURE LEVELS
We use the FW-H equation, referred to as (7), to simulate the sound pressure level spectrum. In (7), $a_0$ represents the sound velocity in the bronchi, $\delta(f)$ represents the Dirac delta function, $H(f)$ is the Heaviside step function, and $p'$ represents the sound pressure at an observation point in the bronchi, which can be arbitrarily designated. The right side of (7) represents the source terms generated by bronchial aerodynamic noise. In the monopole source term (8), $\rho_0$ is the airflow density value without perturbation. Since airflow in the bronchi can be regarded as incompressible viscous flow, in this paper, $\rho = \rho_0$. In the dipole source term $F_i$ (9), $n_j$ represents the unit normal vector pointing to the outer area of the flow field surface. The fluid velocity component perpendicular to the integration surface is represented by $u_n$, and $v_n$ represents the velocity component of integration surface movement. $Y_{ij}$ (10) represents the stress tensor of the fluid domain, where $\mu$ represents the viscosity of the airflow in the lungs, and $\bar{p}$ is the average sound pressure level. The quadrupole source term is $T_{ij}$ (11). Based on equation (7), the sound pressure $p'$ of a designated observation point of the bronchi is calculated. Then, the effective sound pressure $p_{(e)}$ is obtained [39]. We calculate the sound pressure level SPL for a designated observation point [39]:
$$\mathrm{SPL} = 20 \log_{10} \frac{p_{(e)}}{p_{(\mathrm{ref})}}$$
where $p_{(\mathrm{ref})}$ is the reference sound pressure and generally
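The last two steps of this subsection (root mean square of the instantaneous sound pressure, then conversion to a level in dB) can be sketched in a few lines. This is a minimal illustration, not the authors' simulation pipeline: the function names are hypothetical, the input is a synthetic sine rather than FW-H output, and the reference pressure of 2e-5 Pa is an assumption (the value conventionally used for sound in air), since the source sentence stating it is cut off.

```python
import math

# ASSUMPTION: 2e-5 Pa is the conventional reference pressure in air;
# the source text is truncated before giving its value.
P_REF = 2e-5

def effective_pressure(samples):
    """Root mean square of instantaneous sound pressure samples (Pa)."""
    return math.sqrt(sum(p * p for p in samples) / len(samples))

def sound_pressure_level(samples, p_ref=P_REF):
    """Sound pressure level in dB: 20 * log10(p_e / p_ref)."""
    return 20.0 * math.log10(effective_pressure(samples) / p_ref)

# Synthetic check signal: a sine with RMS equal to 10 * P_REF,
# sampled over an integer number of periods.
n = 4000
amplitude = math.sqrt(2.0) * 10.0 * P_REF
signal = [amplitude * math.sin(2.0 * math.pi * 50.0 * k / n) for k in range(n)]
spl = sound_pressure_level(signal)
```

A tenfold ratio between effective and reference pressure corresponds to 20 dB, which is a convenient sanity check for the logarithmic scale.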

D. DEEP NEURAL NETWORK ARCHITECTURE
We take ResNet [40] as the backbone network for the baseline respiratory sound classification system. The basic component of the ResNet network is a residual module, which uses a skip-connection structure (also known as identity mapping). Since the ResNet network is composed of many residual modules stacked together, it is easy to modify and expand the network structure.
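The idea of a residual module, and why stacking such modules is straightforward, can be illustrated with a small sketch. It is a toy in plain Python, not the network used in this paper: `residual_block`, `stack`, and the stand-in transforms are hypothetical names, and real residual modules use convolutional layers in place of these simple functions.

```python
def residual_block(x, transform):
    """Return transform(x) + x element-wise (identity skip connection)."""
    fx = transform(x)
    if len(fx) != len(x):
        raise ValueError("transform must preserve the input length")
    return [f + xi for f, xi in zip(fx, x)]

def stack(blocks, x):
    """Apply several residual blocks in sequence."""
    for transform in blocks:
        x = residual_block(x, transform)
    return x

# Hypothetical stand-ins for the learned layers inside a block.
def halve(v):
    return [0.5 * a for a in v]

def zero(v):
    return [0.0 for _ in v]

out_identity = residual_block([1.0, 2.0], zero)   # block output equals its input
out_stacked = stack([halve, halve], [4.0, 8.0])
```

When the learned transform outputs zeros, the block reduces to the identity, which is the property that makes deep stacks of such modules easy to extend.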

III. PROPOSED METHODS
To explore the relationship between sound generation on the bronchial structure and the feature representations of respiratory sound signals, we propose an end-to-end respiratory sound classification framework with a feature-band attention module, which is obtained from the fluid-solid coupling simulation of a bronchial model.
Fluid-solid coupling modelling of bronchi requires boundary conditions: kinematic conditions, fluid velocity conditions, and dynamic conditions. Specifically, on the fluid-solid coupling interface between the bronchial fluid domain and the solid domain, the conditions (a)-(c) given in [38] should be satisfied. In particular, corresponding mass points on the fluid boundary and in the solid domain (i.e. the wall of the bronchi) share a consistent displacement.
The iterative calculation flowchart for the fluid-solid coupling of the bronchial model is shown in Fig. 1. The standard k-ε model and the solid control equation are used to simulate the flow field in the bronchi, to calculate the flow field velocities $\bar{u}_i$ and $\bar{u}_j$, to determine the flow field displacements $x_i$ and $x_j$, and to find the pressure $\bar{p}$ per unit area of the flow field. When the flow field reaches a steady flow state, we select LES to simulate the flow field more precisely.

2) SOUND PRESSURE SPECTRUM CURVE FOR RESPIRATORY SIGNALS
To solve the FW-H equation, we substitute the pressure $\bar{p}$, the flow field velocities $\bar{u}_i$ and $\bar{u}_j$, and the flow field displacements $x_i$ and $x_j$ into equation (10) to obtain $Y_{ij}$. Then, we substitute $Y_{ij}$ into (9) and (11) to determine $F_i$ and $T_{ij}$, and we substitute $F_i$, $Y_{ij}$, and $T_{ij}$ into (7) to find $p'$. The effective sound pressure $p_{(e)}$ is obtained after the root mean square calculation of the instantaneous sound pressure $p'$ at the designated observation

B. RESPIRATORY SOUND CLASSIFICATION FRAMEWORK

1) SYSTEM STRUCTURE
We design end-to-end respiratory sound classification systems based on the ResNet network. In contrast to the existing systems, we do not copy the acoustic features from the speech domain or pretrained neural networks for image recognition. We introduce the feature acoustic attention mechanism to the ResNet-backbone structure to strengthen the representation of frequency band characteristics among different respiratory sound signals. Due to the feature attention module that is inferred by bronchial modelling and simulation, the spectrum characteristics of the respiratory sound signals are adaptively weighted to further improve the classification performance. The overall framework is shown in Fig. 2.

2) CHANNEL-WISE ATTENTION MODULE
The whole structure of the ResNet baseline includes two 1-dimensional convolutional layers (Conv1D), two ResNet blocks that operate on the frame level, a statistic pooling layer that calculates the mean and standard deviation of each sample along the time-axis, some fully-connected (FC) layers, and an output layer with four nodes for four classes. Based on the ResNet baseline, we adopt the Squeeze-and-Excitation (SE) module [41] to capture the dependency between different feature channels and adjust the weight of each feature channel adaptively. The output of the residual block flows through the SE block before the skip connection, where T denotes the sequence length, F is the feature dimension, C denotes the number of channels of the residual block, and parameter r is a reduction ratio for controlling the computational cost of the SE block.
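The statistic pooling layer mentioned above turns a variable-length sequence of frame-level vectors into one fixed-length vector. The sketch below shows that operation in plain Python; the function name is hypothetical, and the use of the population standard deviation (dividing by T) is an assumption, since the text does not say which estimator is used.

```python
import math

def statistics_pooling(frames):
    """Pool T frame vectors of size D into one vector of size 2*D.

    The result is the per-dimension mean followed by the per-dimension
    standard deviation, both computed along the time axis.
    ASSUMPTION: population standard deviation (divide by T).
    """
    t = len(frames)
    d = len(frames[0])
    means = [sum(f[i] for f in frames) / t for i in range(d)]
    stds = [
        math.sqrt(sum((f[i] - means[i]) ** 2 for f in frames) / t)
        for i in range(d)
    ]
    return means + stds

# Two frames with two features each.
pooled = statistics_pooling([[1.0, 10.0], [3.0, 10.0]])
```

Because the output length depends only on the feature dimension, respiratory cycles of different durations can feed the same fully-connected layers.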

3) FEATURE-BAND ATTENTION LEARNING
Based on the analysis of spectrum curves for respiratory sound pressure levels, we propose two types of feature-band attention modules to obtain important frequency band information in respiratory sound signals, namely, a feature-band attention module (FB attention module) and a feature-band with a Q-parameter band attention module (FBQ attention module). With the FB attention module, the weight distribution of frequency bands in respiratory sound signals is obtained by the network's training process, and the scale operator weights the acoustic feature vectors. In the FBQ attention module, the spectrum characteristics for respiratory sound pressure simulation analyses are considered to include a Q vector. Then, the scale operator conducts bandwise multiplication with the Q vector to reinforce informative features and suppress less useful features. The structures of these two feature-band attention module types are shown in Fig. 3, where $X$ represents the original acoustic features of respiratory sound signals, and $\tilde{X}$ represents scaled acoustic features with a feature-band attention module.
22022 VOLUME 10, 2022
Given $C$ bands of Mel-filter banks and $T$ frames for each sample, the acoustic feature vectors in each Mel-filter band are averaged through the global pooling layer:
$$z_c = \frac{1}{T} \sum_{t=1}^{T} x_{t,c}$$
where $x_{t,c}$ represents the value of the $c$-th filter band in the $t$-th frame from the original acoustic features, $X = [x_1, x_2, \cdots, x_t, \cdots, x_T]$, and $x_t = [x_{t,1}, \cdots, x_{t,c}, \cdots, x_{t,C}]$. $z_c$ represents the mean of the $c$-th filter band, and $Z = [z_1, z_2, \cdots, z_c, \cdots, z_C]$. The band-wise weight $S = [s_1, s_2, \cdots, s_c, \cdots, s_C]$ is calculated through a series of nonlinear transformations:
$$S = \sigma(W_2 \, \delta(W_1 Z))$$
where $\sigma(\cdot)$ represents the sigmoid function, $\delta(\cdot)$ is the ReLU function, and $W_1$ and $W_2$ are the network weights of the two fully-connected layers. For the FB attention module, the weight $S$ scales the original acoustic features $X$ band by band to obtain new scaled acoustic features $\tilde{X} = [\tilde{x}_1, \tilde{x}_2, \cdots, \tilde{x}_t, \cdots, \tilde{x}_T]$, $\tilde{x}_t = [\tilde{x}_{t,1}, \cdots, \tilde{x}_{t,c}, \cdots, \tilde{x}_{t,C}]$:
$$\tilde{x}_{t,c} = s_c \cdot x_{t,c}$$
In the FBQ attention module, the parameter $Q = [q_1, q_2, \ldots, q_c, \ldots, q_C]$ is used to control the weight $S$, which then scales the original acoustic features $X$ to obtain new scaled acoustic features $\tilde{X}$. We introduce these two types of feature-band attention modules to the SE-ResNet system and design the FB-SE-ResNet system and the FBQ-SE-ResNet system.
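The band-wise attention described in this subsection (global average per band, two fully-connected layers with ReLU and sigmoid, then band-wise scaling) can be sketched as follows. This is a minimal illustration in plain Python under several assumptions: the function names are hypothetical, the weights are fixed toy values rather than trained parameters, bias terms are omitted, and the FBQ variant is assumed to multiply each band weight by the corresponding entry of Q, which is one plausible reading of the text rather than a confirmed formula.

```python
import math

def matvec(w, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]

def feature_band_attention(x, w1, w2, q=None):
    """Scale T x C features band by band; return (scaled, band_weights)."""
    t, c = len(x), len(x[0])
    # Global average pooling over time: one mean per band.
    z = [sum(frame[b] for frame in x) / t for b in range(c)]
    # First fully-connected layer followed by ReLU.
    hidden = [max(0.0, h) for h in matvec(w1, z)]
    # Second fully-connected layer followed by sigmoid.
    s = [1.0 / (1.0 + math.exp(-a)) for a in matvec(w2, hidden)]
    if q is not None:
        # ASSUMPTION: the Q vector rescales the learned weights element-wise.
        s = [qc * sc for qc, sc in zip(q, s)]
    scaled = [[s[b] * frame[b] for b in range(c)] for frame in x]
    return scaled, s

# Toy input: 2 frames, 3 bands; zero weights make every sigmoid output 0.5.
x = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
w1 = [[0.0, 0.0, 0.0]]        # 1 hidden unit x 3 bands
w2 = [[0.0], [0.0], [0.0]]    # 3 bands x 1 hidden unit
scaled_fb, s_fb = feature_band_attention(x, w1, w2)
scaled_fbq, s_fbq = feature_band_attention(x, w1, w2, q=[1.3, 1.0, 1.0])
```

In this toy run the Q vector boosts only the first band, which mirrors the intent of emphasising frequency bands that the simulation identifies as informative.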

A. EXPERIMENTAL SETUP

1) BRONCHIAL MODELLING
Based on the airway tree model proposed by Weibel, we rendered the geometric structure of normal and asthmatic bronchi with SolidWorks modelling software. Since the vortex of the fluid movement in the large bronchi greatly contributes to creating respiration sounds in the bronchi, we select levels 0-3 of the bronchi in the Weibel model for geometric modelling, and the wall thickness is 1.65 mm [42]. The specific parameters of the bronchial geometric model are shown in Table 1. Fig. 4 (a) shows the normal bronchi geometric model. We also narrowed some specific bronchi to obtain the asthmatic bronchi structure, as shown in Fig. 4 (b), where the narrowed regions are highlighted by two red circles. The fluid domain and the solid domain of the bronchi are shown in Fig. 5, where the solid domain of the bronchi in Fig. 5 (b) is the shaded part outside the yellow area in Fig. 5 (a).
In the Fluent workbench, we set the inlet boundary condition of the bronchi so that airflow enters the bronchial inlet at a certain speed with the software option Velocity Inlet. The air inlet velocity was 1.2 m/s, and it was evenly distributed on the inlet surface with an airflow turbulence rate set to 10%. The outlet boundary conditions of the eight outlets of the bronchi are all set to Pressure Out (software option), and the relative pressure of the outlet was 0 [28]. We assumed that this fluid-structure interaction simulation experiment was carried out under the physiological condition of a human body temperature of 37 °C. The airflow density $\rho$ is set to 1.1 kg/m³, and the airflow viscosity $\mu$ is set to $1.89 \times 10^{-5}$ Pa·s. We used the standard k-ε model to calculate the airflow velocity until the bronchial fluid domain reached a steady flow state, and then we adopted the LES simulation to further calculate the flow field information with higher simulation accuracy.
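The fluid-domain settings listed above can be collected in one place, which also makes derived quantities easy to check. The sketch below is a hypothetical container written for illustration only; it is not an ANSYS or Fluent API, and the class and field names are assumptions. The kinematic viscosity follows from the relation between viscosity and density stated with the fluid control equation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FlowSettings:
    """Fluid-domain settings quoted in the text (names are hypothetical)."""
    inlet_velocity: float = 1.2             # m/s, uniform over the inlet
    turbulence_rate: float = 0.10           # 10 % at the inlet
    outlet_relative_pressure: float = 0.0   # Pa, at every outlet
    n_outlets: int = 8
    temperature_c: float = 37.0             # human body temperature
    density: float = 1.1                    # kg/m^3
    dynamic_viscosity: float = 1.89e-5      # Pa*s

    @property
    def kinematic_viscosity(self) -> float:
        """nu = mu / rho, in m^2/s."""
        return self.dynamic_viscosity / self.density

settings = FlowSettings()
nu = settings.kinematic_viscosity
```

With the quoted values the kinematic viscosity comes out at roughly 1.7e-5 m²/s, and the eight outlets match a symmetric tree that bifurcates three times between levels 0 and 3.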
In the Transient Structural module of ANSYS, we simulated the pressure and displacement of the bronchial wall during respiration. In the experiments, the density $\rho^b$ of the bronchial wall was set to 1,060 kg/m³, Poisson's ratio $\sigma$ was 0.4, and the elastic modulus was set to 0.9 MPa, which follows Hooke's law [43]. In the solid domain, the inner wall of the bronchi is set as a fluid-solid coupling surface. As shown in Fig. 6, the fluid-solid coupling surface is marked in yellow and indicated by a red arrow.
Finally, we utilized the System Coupling module in ANSYS software and linked the simulation data to the Fluid Flow module for the fluid domain and the Transient Structural module for the solid domain. To prevent extreme deformations in the grid or excessive distortion rates in the simulation process, we employed a smoothing method to control the grid of the fluid-solid coupling surface and to correct the deformed grid in real time.

3) ACOUSTIC SIMULATION SETTINGS
In the acoustic module, the acoustic field information is calculated according to the airflow velocities $\bar{u}_i$ and $\bar{u}_j$ of the bronchial fluid domain at each time step and the pressure intensity $\bar{p}$ of the flow field.
In clinical auscultation, the position where the auscultation head is placed roughly corresponds to the level 2 bronchi in the Weibel model. We chose position "1", as shown in Fig. 7, as the observation point for the sound pressure level. The time step is set to $2.5 \times 10^{-4}$ s in the simulation of sound pressure levels. When the simulation converged, the sound field simulation was completed. Then, we used the Fourier Transform module to obtain the sound pressure level spectrum curve.

1) AIRFLOW VELOCITY OF BRONCHI
Fig. 8 shows velocity nephograms (contour maps) of normal bronchi and asthmatic bronchi under inhalation and exhalation conditions, which were scanned in the plane on the central axis of the bronchi. As shown in Fig. 4, we segmented the bronchi into four regions, which were referred to as levels 0 through 3, and we found that airflow velocity increases as it moves through these ascending levels.
For normal bronchi, a flow rate imbalance occurs, wherein the airflow velocity on the side close to the bifurcation point is faster, and the airflow velocity at the central axis of the bronchi is higher than that on both sides.
For asthmatic bronchi, the overall velocity of airflow on the blocked side of the bronchus is greater than that on the unblocked side of the bronchus. For example, the maximum flow velocities of the blocked bronchus during inhalation and expiration are approximately 10.3 m/s and 9.32 m/s, respectively, whereas the maximum values of the normal bronchus flow velocity in the inhalation state and the expiration state are 6.74 m/s and 6.47 m/s, respectively.
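The size of the difference between the blocked and normal bronchi can be made explicit with a short calculation on the four maximum velocities quoted above. The helper name is hypothetical, and the percentages are derived here rather than reported in the source.

```python
def relative_increase(blocked, normal):
    """Fractional increase of the blocked-side value over the normal value."""
    return (blocked - normal) / normal

# Maximum flow velocities in m/s, as quoted in the text.
inhale = relative_increase(10.3, 6.74)
exhale = relative_increase(9.32, 6.47)

inhale_pct = round(inhale * 100, 1)
exhale_pct = round(exhale * 100, 1)
```

On these figures the narrowed bronchus shows a maximum velocity roughly half again as high as the normal one in both breathing phases.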

2) BRONCHIAL WALL PRESSURE
We compared the pressure nephograms of normal bronchi and asthmatic bronchi under inhalation and exhalation conditions, which were scanned in the plane on the central axis of the bronchi. As shown in Fig. 9, in the inhalation state, the maximum bronchial wall pressure is in the entrance region, and the wall pressure decreases as the airflow moves through levels 0-3 of the segmented bronchi model. However, in the exhalation state, the maximum wall pressure of the bronchi is at the outlet region, and the wall pressure increases as the airflow moves through levels 0-3 of the segmented bronchi model. Furthermore, we found that the wall pressure of asthmatic bronchi is more uneven than that of normal bronchi. The overall pressure range on the blocked side of the bronchus is greater than that on the unblocked side of the bronchus (as well as that of the normal bronchi), which is consistent with the observation that patients with asthma have more difficulty breathing.

3) SOUND PRESSURE LEVEL SPECTRUM CURVE
In the acoustic simulation process, we compared the sound pressure level spectrum curves of normal bronchi and asthmatic bronchi in both inhalation and expiration states, which were obtained from observation point 1. Fig. 10 shows that the spectrum curves of normal respiratory sounds and asthmatic sounds are mainly distributed below 2,000 Hz, but the sound pressure levels of asthmatic bronchi are higher than those of normal bronchi in both the exhalation and inhalation states, and the distributions of the spectrum peaks also differ. For example, the spectrum curves reach maximum peaks at 45 Hz (asthmatic bronchi) and 50 Hz (normal bronchi) under exhalation conditions. Under inhalation conditions, the sound pressure level spectrum curve of normal bronchi ranges from 0 to 1,000 Hz and thereafter declines until stabilizing at approximately 1,000 Hz. However, the sound pressure level spectrum curve of asthmatic bronchi still fluctuates with low pressure level values in the frequency range of 1,000 Hz to 2,000 Hz. These spectrum peak distributions are nearly identical to clinical statistics, which informs the design of the feature-band attention module proposed in Section III.
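The 2,000 Hz upper edge of these spectra is consistent with the time step of 2.5 × 10⁻⁴ s stated in the acoustic simulation settings, since that step corresponds to a 4,000 Hz sampling rate and a 2,000 Hz Nyquist limit. The sketch below checks this and locates a spectral peak with a naive discrete Fourier transform. It is an illustration only: the DFT helper is a stand-in for the software's Fourier Transform module, the function name is hypothetical, and the 50 Hz tone is synthetic rather than simulation output.

```python
import cmath
import math

DT = 2.5e-4            # s, time step quoted in the acoustic settings
fs = 1.0 / DT          # sampling rate implied by the time step
nyquist = fs / 2.0     # highest frequency the sampled signal can represent

def dft_magnitudes(samples):
    """Magnitudes of the one-sided DFT (naive O(n^2) implementation)."""
    n = len(samples)
    return [
        abs(sum(samples[k] * cmath.exp(-2j * math.pi * m * k / n)
                for k in range(n))) / n
        for m in range(n // 2 + 1)
    ]

# Synthetic 50 Hz tone over 0.1 s (400 samples, 10 Hz frequency resolution).
n = 400
tone_hz = 50.0
signal = [math.sin(2.0 * math.pi * tone_hz * k * DT) for k in range(n)]
mags = dft_magnitudes(signal)
peak_bin = max(range(len(mags)), key=mags.__getitem__)
peak_hz = peak_bin * fs / n
```

Finer frequency resolution, such as distinguishing a 45 Hz peak from a 50 Hz peak, requires a longer record, because the bin spacing equals the sampling rate divided by the number of samples.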

V. CLASSIFICATION RESULTS AND ANALYSIS

A. DATASET
To verify the proposed methods, we used the International Conference on Biomedical and Health Informatics (ICBHI'17) scientific challenge respiratory sound database [44]. The dataset contains 920 recordings from 126 patients, and a total of 6,898 respiratory cycles: 1,864 cycles are annotated by respiratory experts as crackles, 886 cycles are identified as wheezes, 506 cycles are both, and the rest are normal. According to ICBHI official standards, 60% of breathing cycles are marked as the training set, and 40% make up the test set. All samples are recorded with different equipment from hospitals in Portugal and Greece by two different research teams. Most audio samples of the database were acquired by the research team of the Respiratory Research and Rehabilitation Laboratory (Lab3R) of the School of Health Sciences, University of Aveiro (ESSUA), and the others were recorded by the research team of the Aristotle University of Thessaloniki (AUTH). A significant number of samples are noisy, which makes the dataset more challenging.
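The number of normal cycles is not stated directly, but it follows from the totals above. The short calculation below derives it and the resulting class proportions; the variable names are hypothetical and the derived figures are computed here, not quoted from the database documentation.

```python
total_cycles = 6898
abnormal = {"crackles": 1864, "wheezes": 886, "both": 506}

# "The rest are normal": subtract the three annotated abnormal classes.
normal = total_cycles - sum(abnormal.values())

counts = {"normal": normal, **abnormal}
fractions = {name: count / total_cycles for name, count in counts.items()}
```

The normal class accounts for a little over half of all cycles, which is the class imbalance referred to later when the sensitivity results are discussed.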

B. EVALUATION METRIC
In the ICBHI challenge, the official evaluation metrics for the four-class (normal (N), crackles (C), wheezes (W), and both (B)) classification problem are defined as follows [12]:
$$\mathrm{Specificity} = \frac{TN}{TN + FP} \tag{18}$$
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \tag{19}$$
$$\mathrm{Score} = \frac{\mathrm{Specificity} + \mathrm{Sensitivity}}{2} \tag{20}$$
where Specificity represents the specificity of the system, Sensitivity is the sensitivity of the system, and Score is the average of the two. TN is the number of normal respiratory sounds that are correctly detected, FP represents the number of samples that are misjudged as abnormal respiratory sounds, TP is the number of abnormal respiratory sounds that are correctly detected, and FN represents the number of samples that are misjudged as normal respiratory sounds.
... Res layer. FB-SE-ResNet applies the FB attention module on the SE-ResNet framework. FBQ-SE-ResNet scales the FB attention module of FB-SE-ResNet according to the simulation results of bronchial modelling. We compared them with public systems [12], [13], [16], [18]-[20] in terms of the official evaluation metrics, Specificity, Sensitivity, and Score, which were released by the organization. The details are shown in Table 4. Since the results in some references were reported to two decimal places and the others as whole integers, we rounded all results to whole integers.
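The three metrics can be computed directly from the four counts defined above, assuming the usual definitions Specificity = TN / (TN + FP), Sensitivity = TP / (TP + FN), and Score as their mean. The function name and the example counts below are hypothetical and chosen only to make the arithmetic easy to follow; they are not results from this paper.

```python
def icbhi_metrics(tn, fp, tp, fn):
    """Return (specificity, sensitivity, score) from the four counts."""
    specificity = tn / (tn + fp)   # share of normal cycles kept as normal
    sensitivity = tp / (tp + fn)   # share of abnormal cycles correctly detected
    score = (specificity + sensitivity) / 2.0
    return specificity, sensitivity, score

# Hypothetical counts: 100 normal cycles and 100 abnormal cycles.
sp, se, score = icbhi_metrics(tn=80, fp=20, tp=30, fn=70)
```

Because the score averages the two rates, a system that favours the majority normal class can reach high specificity while its score is still pulled down by low sensitivity.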

1) SE BLOCK
From Table 4, the ResNet-backbone system gained 9% relative improvement in score value compared to the Decision Tree method [13], which is the baseline for the ICBHI 2017 Challenge. Based on the ResNet-backbone system, we introduced SE blocks and obtained relatively better performance in which the specificity value increased by 25%. The SE block's characteristics render it more attuned to feature channels between Res layers, thereby contributing to such improvements. However, the sensitivity value decreased by 35%, which may be due to unbalanced sample numbers among the four classes; for example, the normal class contains as many samples as the total number from the other three classes.

2) FEATURE-BAND ATTENTION MODULE
We focus on the effects of the feature-band attention module. In terms of the score value, the FB-SE-ResNet and FBQ-SE-ResNet systems were both better than the SE-ResNet system, with improvements of 2% and 6%, respectively. The sensitivity values of the FB-SE-ResNet and FBQ-SE-ResNet systems were inferior to that of the ResNet-backbone but exceeded that of SE-ResNet. For specificity values, the reverse applied. We think this is due to the effects of the feature-band attention module, which extracts innate feature representations for abnormal respiratory sounds even with an imbalanced sample size. In addition, we analysed the classification performance for different values of q in the range of 1.1 to 2.0, as shown in Fig. 11. We found that the q value had a great influence on the performance improvement and chose the best system, with q = 1.3, to compare with the FB-SE-ResNet system.
Examples of FBank acoustic features, the scaled FBank features with FB attention learning, and the scaled FBank features with FBQ attention learning, which are extracted from the normal, crackle, and wheeze respiratory sounds, respectively, are shown in Figs. 12-14. The original FBank features of abnormal lung sounds, such as wheezes and crackles, show salient features. The scaled FBank features obtained with the feature-band attention module allow the systems to focus on important acoustic feature bands, thereby improving the classification performance. Table 4 also shows the performance obtained by the proposed FBQ-SE-ResNet framework and state-of-the-art published systems (where available), CNN-MoE and two-path VGG-16. We note that the FBQ-SE-ResNet framework, which uses only the low-complexity ResNet-9 network structure, ranks second on the official training/test split and is very competitive with the state-of-the-art systems.
TABLE 5. Comparisons between the proposed system and the state-of-the-art systems with the random data split (highest scores in bold).
For the experiment with 5-fold cross validation, we obtained a sensitivity value of 80%, a specificity value of 87%, and a score value of 83%. Compared to the CNN-MoE system, the proposed FBQ-SE-ResNet gained 18% relative improvement in sensitivity and 5% relative improvement in score, while there was a relative decrease in specificity of 3%. For the experiment with 10-fold cross validation, we achieved an outstanding performance with a sensitivity value of 93%, a specificity value of 84%, and a score value of 88%. The classification results are presented in the confusion matrices as shown in Fig. 15.
Take the 10-fold cross validation as an example. It can be observed that the true positive rates of normal, crackles, wheezes and both (crackles and wheezes) are 93.0%, 93.0%, 74.7% and 64.0%, respectively. In the case of the normal class, 4.9%, 1.4% and 0.5% of samples are falsely predicted as crackles, wheezes and both (crackles and wheezes), respectively. In the crackles class, 5.1%, 1.0% and 0.9% of samples are wrongly predicted as normal, wheezes and both (crackles and wheezes), respectively. In the wheezes class, 6.7%, 8.2% and 10.4% of samples are incorrectly identified as crackles, normal and both (crackles and wheezes), respectively. For the class of both (crackles and wheezes), only 3.6% of samples are falsely identified as normal, while 12.2% and 20.2% of samples are incorrectly classified as crackles and wheezes, respectively. Although it remains an interesting and challenging task to further improve the discriminant performance for the class of both (crackles and wheezes), the results demonstrate that the proposed system is a progressive and innovative approach to the diagnosis of normal and abnormal respiratory sounds.
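The percentages quoted for the 10-fold experiment can be arranged as a row-normalised confusion matrix, which makes it easy to check the rows and to see which class is hardest. The layout below (rows are true classes, columns are predicted classes, both in the order normal, crackles, wheezes, both) is an arrangement chosen here for illustration, and the variable names are hypothetical; the numbers themselves are the ones given in the text.

```python
CLASSES = ["normal", "crackles", "wheezes", "both"]

# Rows: true class. Columns: predicted class, in the order of CLASSES.
confusion_pct = {
    "normal":   [93.0, 4.9, 1.4, 0.5],
    "crackles": [5.1, 93.0, 1.0, 0.9],
    "wheezes":  [8.2, 6.7, 74.7, 10.4],
    "both":     [3.6, 12.2, 20.2, 64.0],
}

# Each row should sum to about 100 %, up to rounding in the source.
row_sums = {name: round(sum(row), 1) for name, row in confusion_pct.items()}

# Per-class rate of correct predictions (the diagonal of the matrix).
recalls = {name: confusion_pct[name][CLASSES.index(name)] for name in CLASSES}
hardest = min(recalls, key=recalls.get)
```

The normal row sums to 99.8% rather than 100%, which is consistent with rounding in the reported figures, and the combined class has the lowest rate of correct predictions.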

VI. CONCLUSION
We studied the physical modelling of lung bronchi to simulate sound pressure curves of normal breath sounds and wheezes. The simulation results show that respiratory sound signals in the bronchi are mainly distributed below 2,000 Hz, and the spectrum peaks differ among various types of respiratory sounds. Based on such respiratory sound characteristics, we designed a feature-band attention module to adaptively weight the spectrum characteristics of the respiratory sound signals; this feature-band attention module is then used as the input of the end-to-end respiratory sound classification system. Experimental results on public databases indicate that the proposed end-to-end respiratory sound classification systems with a feature-band attention module achieve promising performance. In the future, we intend to explore the impact of actual physiological conditions, such as increased airway mucus secretion and airway mucosal oedema. In addition, we plan to optimize the sensitivity value with data augmentation and the transfer learning technique.