A Blind Filtering Framework for Noisy Neonatal Chest Sounds

Chest sound, as the first and most commonly available vital signal for newborns, contains rich information about their cardiac and respiratory health. However, neonatal lung sound auscultation is currently challenging and often unreliable due to noise and interference, particularly for preterm infants. The noise often overlaps with the heart and lung contents in both time and frequency. Moreover, the frequency band of the useful components varies from one case to another, making them difficult to separate by fixed band-pass filtering. In this study, a single-channel Blind Source Separation (SCBSS) framework is proposed to separate newborns’ lung and heart sounds from noisy chest sounds recorded by a digital stethoscope. This method first decomposes the signal into a multi-resolution representation using a time-frequency transform, and then applies source separation algorithms to find proper ad hoc frequency filters. In the simulation scenario, two different time-frequency transforms are considered: Stationary Wavelet Transform (SWT) with dyadic bases, and Continuous Wavelet Transform (CWT) with redundant bases. The transforms are followed by three different source separation methods, namely Principal Component Analysis (PCA), Periodic Component Analysis (πCA), and Second Order Blind Identification (SOBI). The resulting combinations are applied to the chest sounds recorded from ninety-one preterm and full-term newborns. The results show that, compared to raw signals, fixed band-pass filtering and seven other separation methods, the heart and lung sounds extracted by the proposed methods have a higher quality index and also result in more reliable heart and respiratory rate estimation.


I. INTRODUCTION
Diagnosing cardiac and respiratory diseases in newborns has been the focus of many studies in recent years [1]-[3].
The associate editor coordinating the review of this manuscript and approving it for publication was Easter Selvan Suviseshamuthu.
VOLUME 10, 2022 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Timely diagnosis of cardiac issues in newborns tremendously helps physicians to arrange the necessary medical interventions and treatments to prevent irreversible consequences. To assist with diagnosis, there has been a growing interest in using digital stethoscopes (DS) and dedicated computer analysis, particularly in pediatrics [4]-[7]. The major advantages of DS include its accessibility, affordability, portability and compatibility with smartphones, which make it suitable for use in resource-constrained and underdeveloped countries. It also facilitates telemedicine applications for increasing specialist access, clinical decision support, crowdsourcing and home-based monitoring of cardio-respiratory conditions [7]. The feasibility of using DS for the neonatal population has recently been investigated in a few studies focusing on heart, lung or abdominal sound auscultation [8]-[13]. However, its application for newborns has been limited in current clinical practice, mainly due to the weakness of the sounds and high levels of interference and background noise [7]. The sounds recorded from newborns often have poor quality, and therefore denoising is required prior to further analysis and diagnosis. Generally, the chest sound is a mixture of heart and lung sounds, the two clinically relevant components in chest records, and noise from other internal and external sources. To date, most studies have used typical band-pass filtering to separate these components. However, the selected frequency bands have been inconsistent across studies, complicating the use of fixed band-pass filtering and the choice of proper cut-off frequencies. Tables 1 and 2 summarize some of the frequency bands used in previous studies, demonstrating their noticeable disagreement.
Choosing the cut-off frequencies can be even more challenging for neonatal chest sounds, where the frequency components of lung sounds are affected by the gestational age at birth, lung fluid, and differences in lung, airway and chest wall size and characteristics [2], [3], [12].
One of the potential solutions for separating the components of monophonic records is to use single-channel blind source separation (SCBSS) methods. These methods usually involve a decomposition procedure followed by a source separation algorithm to find the components of interest [35]. SCBSS methods have been widely utilised in the field of acoustics, including speech recognition [36] and music analysis [37]. They have also been used for biomedical audio signal processing such as fetal-maternal Phonocardiogram (PCG) separation [38].
The most prevalent heart-lung sound separation algorithms in the literature are based on non-negative matrix factorization (NMF) [39], empirical mode decomposition (EMD) [40]-[42], and singular spectral analysis (SSA) [43]. With regard to EMD-based methods, a practical issue is their reconstruction quality. The intrinsic decomposition in EMD does not imply any reconstruction optimality, and adding optimality constraints incurs additional computational cost, which contradicts the simplicity and intuitiveness of EMD [44].
To date, the existing methods in the lung-heart sound literature do not provide a frequency analysis along with their signal separation procedure. Also, to the best of our knowledge, the temporal structures of the sound sources, such as pseudo-periodicity, have not been utilized for the separation of lung and heart sounds. In addition, most of the existing methods have been evaluated only on synthetic data instead of real recordings acquired in real environments. Furthermore, prior research has focused on extracting only one of the sources (heart or lung sound), or has benefited from synchronous ECG records. While ECG monitoring is standard in newborn intensive care units (NICUs), it is typically not available in all clinical settings, especially in less affluent regions such as low- and middle-income countries (LMICs).
In this study, SCBSS methods are summarized and represented in a unified framework. SCBSS is further interpreted as a versatile approach for denoising, filtering and extracting the heart and lung sounds from noisy neonatal chest audio recordings. Overall, the single-channel signal is decomposed into its frequency sub-bands using a time-frequency decomposition. Then, a linear combination of the frequency channels is found by source separation, such that a measure of decorrelation or independence is maximized. In this way, the optimal frequency filters are adjusted specifically for each case using SCBSS methods. The framework may be adapted for different applications and conditions by defining suitable discrimination functions and optimizing them. In this study, based on the intrinsic features of the heart and lung sounds, three different separation measures are examined for extracting the sources: minimum correlation, independent temporal structures, and maximum pseudo-periodicity. The methods, using different time-frequency decompositions and independence measures, are applied to neonatal chest sounds. The results are compared to raw records, fixed band-pass filtered signals, and several state-of-the-art methods. They are evaluated in terms of an automated signal quality measure, heart rate and respiratory rate estimation errors, and computational time.
The remaining sections are arranged as follows: in Section II, the general concept of tuning the frequency filters for each case and each source is expressed mathematically, and some of its limitations and considerations are discussed. Section III provides details of the chest sounds, the data acquisition procedure, and some possible versions of the proposed framework and their implementation procedures. This is followed by the results of the proposed algorithms in Section IV. Finally, the findings are discussed and concluded in Section V.

A. DATA MODEL AND SCBSS PROBLEM
Suppose that $x(t) \in \mathbb{R}$ is a single-channel raw chest recording at time $t$, consisting of three major sources: the heart sound $x_H(t)$, the lung sound $x_L(t)$, and other undesired components and noise $e(t)$:

$$x(t) = x_H(t) + x_L(t) + e(t) \tag{1}$$

Separating these sources can be considered as a (blind) source separation problem. However, a solution is feasible only if the problem is determined or overdetermined, meaning that the number of observation channels (e.g., simultaneously recording sensors) is at least equal to the number of sources. This condition is not met when only a single-channel observation exists. Therefore, most SCBSS techniques involve a decomposition process before solving the source separation problem. Generally, SCBSS methods can be expressed within the framework shown in Fig. 1, where the single-channel observation is first decomposed into an $M$-channel real or complex signal $y(t) \in \mathbb{C}^M$ using a suitable time-frequency transformation, e.g., short-time Fourier transform, wavelet transforms, Hilbert-Huang transforms, etc. [45]. Each of the decomposed signals in $y(t)$ is therefore represented by a convolution of the observation and the bases:

$$y_j(t) = x(t) * v_j(t) \tag{2}$$

where $j = 1, \ldots, M$ and $v_j(t)$ denotes the transform basis in the $j$-th channel. Next, source separation methods are applied to separate the independent components. If we consider uncorrelatedness as a weaker measure of independence, we can use PCA as well as ICA methods (generally abbreviated as "CA" methods in the sequel) to find the sources. Therefore, assuming $B \in \mathbb{C}^{M \times M}$ as the matrix of unmixing coefficients, the sources are obtained as:

$$s(t) = B\, y(t) \tag{3}$$

where $s(t) = [s_1(t), \ldots, s_M(t)]^T \in \mathbb{C}^M$ denotes the source vector, and the rows of $B$ are the linear combinations of interest. Hence, denoting the $i$-th row of $B$ by $b_i^T$,

$$s_i(t) = b_i^T y(t) = \sum_{j=1}^{M} b_{ij}\, y_j(t) \tag{4}$$

For some CA methods the components (sources) are naturally sorted according to a measure, such as energy in PCA or periodicity in πCA. The sorting may help select the desired components, but it is commonly not useful for general ICA methods such as JADE or SOBI.
Moreover, the sorting can be unreliable for signals contaminated by harsh non-stationary noise. Therefore, the sources of interest generally need to be selected by a component selection procedure, which can be done empirically or by an automatic algorithm.
The transformed desired sources (denoted by $\hat{y}_H$ for heart sound and $\hat{y}_L$ for lung sound) can be estimated by mixing the selected components. Suppose that $A = B^{\dagger}$ is the mixing matrix in CA methods, and $\hat{s}_H$ and $\hat{s}_L$ are the vectors of the desired heart and lung components, with the same size as $s$ but in which the unselected components are set to zero. So,

$$\hat{y}_H(t) = A\, \hat{s}_H(t), \qquad \hat{y}_L(t) = A\, \hat{s}_L(t) \tag{5}$$

Then, the inverse decomposition transform of $\hat{y}$ yields the estimated filtered signals in the original domain, i.e., $\hat{x}_H(t)$ and $\hat{x}_L(t)$ for heart and lung sounds, respectively. All the steps of the procedure are summarized in Algorithm 1.
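The decompose, separate, select and reconstruct chain described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the decomposition is a hypothetical bank of non-overlapping FFT masks (not SWT or CWT), chosen because its inverse is simply the sum of the sub-bands; PCA stands in for the CA method; and the indices of the kept components are supplied by hand. The names `decompose`, `pca_unmixing` and `scbss_filter` are illustrative and do not come from the paper.

```python
import numpy as np

def decompose(x, n_bands):
    """Split x into n_bands sub-band signals that sum back to x.

    Hypothetical stand-in for SWT/CWT: non-overlapping FFT masks.
    """
    spectrum = np.fft.rfft(x)
    edges = np.linspace(0, len(spectrum), n_bands + 1).astype(int)
    y = np.zeros((n_bands, len(x)))
    for j in range(n_bands):
        masked = np.zeros_like(spectrum)
        masked[edges[j]:edges[j + 1]] = spectrum[edges[j]:edges[j + 1]]
        y[j] = np.fft.irfft(masked, n=len(x))
    return y

def pca_unmixing(y):
    """Unmixing matrix B: rows are covariance eigenvectors, largest variance first."""
    eigvals, eigvecs = np.linalg.eigh(np.cov(y))
    return eigvecs[:, np.argsort(eigvals)[::-1]].T

def scbss_filter(x, n_bands, keep):
    """Decompose, separate, keep the selected components, and reconstruct."""
    y = decompose(x, n_bands)            # 1-d record -> M-d signal
    B = pca_unmixing(y)                  # CA method (PCA in this sketch)
    s = B @ y                            # sources s(t) = B y(t)
    s_sel = np.zeros_like(s)
    keep = list(keep)
    s_sel[keep] = s[keep]                # unselected components are set to zero
    y_hat = np.linalg.pinv(B) @ s_sel    # mixing matrix A = pinv(B)
    return y_hat.sum(axis=0)             # inverse of this toy decomposition

# Toy input: two sinusoids plus noise, sampled at 4 kHz.
fs = 4000
t = np.arange(2 * fs) / fs
rng = np.random.default_rng(0)
x = (np.sin(2 * np.pi * 80 * t) + 0.5 * np.sin(2 * np.pi * 400 * t)
     + 0.1 * rng.standard_normal(t.size))

x_all = scbss_filter(x, n_bands=8, keep=range(8))  # keeping everything returns x
x_top = scbss_filter(x, n_bands=8, keep=[0])       # strongest principal component only
```

Keeping every component reproduces the input, which is a quick sanity check that the unmixing, mixing and inverse decomposition steps are mutually consistent.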

B. UNCORRELATED AND INDEPENDENT SOURCES SEPARATION
Principal Component Analysis (PCA) and Independent Component Analysis (ICA) are two key techniques for solving source separation problems. PCA seeks uncorrelated sources with maximum variance. The coefficient vectors b_i that satisfy the PCA constraint can be estimated by the eigenvectors corresponding to the largest eigenvalues of the observation's covariance matrix. However, if being uncorrelated is not a sufficiently strong constraint for separating the sources, ICA methods are the alternatives. They estimate the sources with higher degrees of independence, but at a higher computational cost, mostly arising from solving a non-linear optimization problem. Most ICA methods utilize PCA as a pre-whitening step, and then optimize an ICA contrast function to analyse signal statistics. Different contrast functions lead to different types of ICA methods. For example, the JADE algorithm uses a contrast based on fourth-order moments to find the independent sources [46]. Other methods such as AMUSE, πCA and SOBI rely only on second-order statistics, diagonalizing the covariance and time-lagged auto-covariance matrices of the observation [47]-[50]. Therefore, they can be more effective than JADE if the data has temporal structures.

Algorithm 1 SCBSS Framework
REQUIRED: the single-channel signal, the decomposition transform, the CA method, and the component selection method
OUTPUT: the filtered signal
PROCEDURE:
1: decompose the 1-d record to an M-d signal using the decomposition transform,
2: analyse the decomposed signals using the CA method and find the independent components,
3: for each of the aimed outputs (heart and lung sounds):
• using the component selection method, select the desired components and eliminate the rest,
• reconstruct the decomposed signals by applying the CA inverse transform on the desired components,
• merge the reconstructed M-d signals back to the original 1-d domain using the inverse of the decomposition transform.
Assume that $s(t)$ is a zero-mean source signal and $E_t\{\cdot\}$ denotes expectation over time. Defining the $\tau$ time-lagged covariance of $s(t)$ as

$$C_s(\tau) = E_t\{s(t+\tau)\, s(t)^H\} \tag{6}$$

it can be expressed using the observation's covariance matrix:

$$C_s(\tau) = B\, C_y(\tau)\, B^H \tag{7}$$

where

$$C_y(\tau) = E_t\{y(t+\tau)\, y(t)^H\} \tag{8}$$

All the second-order-statistics source separation methods attempt to jointly diagonalize the set of covariance matrices $\{C_y(\tau_k)\}$ for different time-lag values $\tau_k \in \{\tau_0 (= 0), \tau_1, \tau_2, \ldots, \tau_K\}$ by maximizing a general cost function (9) defined over this set. Accordingly, in PCA $C_y(\tau_k)$ is diagonalized just for $\tau_0 = 0$, in AMUSE two covariance matrices with $k = 0, 1$ are involved, and SOBI utilizes several time-lags, i.e., $k = 1, 2, 3, \ldots, K$, and attempts to approximately jointly diagonalize their corresponding auto-covariance matrices. Among the mentioned methods, πCA is the one that is by definition customized for extracting semi-periodic signals. Some semi-periodic bio-signals like ECG and PCG can be considered cyclostationary, meaning that their statistical characteristics repeat with a period $\tau$. In the wide-sense definition of a zero-mean cyclostationary signal, the statistics are reduced to the covariance, which is expected to repeat at integer multiples $\lambda\tau$ of the period, as expressed by the cyclostationarity criterion (10). This periodicity, as a temporal structure, can be a useful piece of prior information for steering the CA methods towards the semi-periodic sources. This objective is carried out in πCA by minimizing the following measure of periodicity for an extracted source $s(t) = b^T y(t)$ [48]:

$$\epsilon(b, \tau) = \frac{E_t\{|s(t+\tau) - s(t)|^2\}}{E_t\{|s(t)|^2\}} \tag{11}$$

It can be shown that the above cost function is equivalent to the following [48]:

$$\epsilon(b, \tau) = 2\left[1 - \frac{\mathrm{Re}\{b^T C_y(\tau)\, b^*\}}{b^T C_y(0)\, b^*}\right] \tag{12}$$

Minimizing this measure is consistent with the cyclostationarity criterion in (10) for $\lambda = 1$. Moreover, this is a Rayleigh-Ritz optimization problem: minimizing (11) and (12) amounts to maximizing the quotient in (12), so the minimizing $b$ is given by the eigenvector corresponding to the largest generalized eigenvalue of the matrix pair $(C_y(\tau), C_y(0))$.
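The generalized eigenvalue solution of πCA can be illustrated on synthetic data. The following is a minimal sketch, assuming a known period in samples, a toy three-channel mixture, and the sample covariance convention $C_y(\tau) = E_t\{y(t+\tau) y(t)^H\}$; the function names `lagged_cov` and `pica` are illustrative, and the lagged covariance is symmetrized so that the eigenvalues are real.

```python
import numpy as np
from scipy.linalg import eigh

def lagged_cov(y, lag):
    """Sample estimate of C_y(lag) = E_t{ y(t+lag) y(t)^H } for zero-mean rows."""
    n = y.shape[1] - lag
    return y[:, lag:] @ y[:, :n].conj().T / n

def pica(y, lag):
    """Return components sorted from most to least periodic at the given lag."""
    y = y - y.mean(axis=1, keepdims=True)
    c0 = lagged_cov(y, 0)
    c_tau = lagged_cov(y, lag)
    c_tau = 0.5 * (c_tau + c_tau.conj().T)   # symmetrize: real eigenvalues
    eigvals, eigvecs = eigh(c_tau, c0)       # generalized problem, ascending order
    order = np.argsort(eigvals)[::-1]        # largest eigenvalue first
    B = eigvecs[:, order].conj().T           # rows are the unmixing vectors
    return B @ y, eigvals[order]

# Toy mixture: one periodic source (period of 50 samples) and two noise sources.
rng = np.random.default_rng(1)
n_samples, period = 5000, 50
t = np.arange(n_samples)
s_true = np.vstack([np.sin(2 * np.pi * t / period),
                    rng.standard_normal(n_samples),
                    rng.standard_normal(n_samples)])
A = np.array([[1.0, 0.5, 0.3],
              [0.4, 1.0, 0.2],
              [0.3, 0.6, 1.0]])              # arbitrary invertible mixing matrix
y = A @ s_true

sources, eigvals = pica(y, lag=period)
corr = np.corrcoef(sources[0], s_true[0])[0, 1]
```

In this toy case the first extracted component should be strongly correlated (up to sign) with the sinusoid, while the noise sources end up with generalized eigenvalues close to zero.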

C. FREQUENCY FILTERING BY SCBSS
Considering (2) and (4), we can rewrite the $i$-th source as:

$$s_i(t) = \sum_{j=1}^{M} b_{ij}\, \big(x(t) * v_j(t)\big) = x(t) * h_i(t) \tag{13}$$

where $b_{ij}$ are the unmixing coefficients in the $i$-th row of $B$, and

$$h_i(t) = \sum_{j=1}^{M} b_{ij}\, v_j(t)$$

is a linear combination of the decomposition bases. Since the CA methods are linear transformations, if they are applied to time-frequency decomposed signals, the outputs are again frequency-filtered signals, but under the constraint of being uncorrelated or (statistically) independent. In other words, they find the coefficients of the frequency filters such that the filtered components are as uncorrelated or independent as possible. In the frequency domain, by Fourier transform of (13), we have

$$S_i(f) = X(f) \circ H_i(f) \tag{14}$$

in which $\circ$ indicates an element-wise multiplication. So, it is equivalent to frequency filtering of $x(t)$ using the filter obtained from a linear combination of the time-frequency transform bases. In other words, the coefficients calculated by the CA methods amplify the frequency bins that discriminate the sources, and attenuate the other correlated frequency bins.
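The equivalence between combining sub-band signals and filtering with a single combined filter follows from the linearity of convolution, and can be checked numerically. The sketch below uses random vectors as hypothetical bases and coefficients; none of the values come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(256)              # toy single-channel signal
bases = rng.standard_normal((4, 16))      # hypothetical bases v_j(t), one per row
b = np.array([0.5, -1.0, 2.0, 0.25])      # hypothetical unmixing coefficients

# Route 1: decompose into sub-bands, then combine them linearly.
y = np.array([np.convolve(x, v) for v in bases])
s_from_subbands = b @ y

# Route 2: combine the bases into one filter h, then filter once.
h = b @ bases
s_from_filter = np.convolve(x, h)

# Frequency domain: the spectrum of the source is the product X(f) H(f).
n = len(s_from_filter)
S = np.fft.fft(s_from_filter, n)
XH = np.fft.fft(x, n) * np.fft.fft(h, n)
```

Both routes give the same signal, and its spectrum equals the element-wise product of the zero-padded spectra of `x` and `h`.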

D. APPROPRIATE DECOMPOSITION TRANSFORMS
For the proposed framework, various choices of time-frequency expansions are possible. Some of them, such as the Discrete Wavelet Transform (DWT), benefit from orthonormal bases, providing no redundant components. However, some of the DWT specifications should be modified to match the proposed SCBSS framework. Firstly, the CA methods need the decomposed signals at each scale to have the same length, which is not the case for orthonormal expansions like DWT because of their decimated translation. This can be addressed using undecimated versions of DWT like the Stationary Wavelet Transform (SWT) [51] or the Maximal Overlap Discrete Wavelet Transform (MODWT) [52].
Although the undecimated translation causes the bases in the same scale to be redundant and not orthogonal, they remain mutually orthonormal in different scales. Therefore, the decomposed signals are supposed to be uncorrelated across levels, and no further decorrelation can be achieved by methods like PCA. For illustration, consider an orthonormal time-frequency transform followed by PCA. Regarding equation (13), the desired linear combination is estimated by PCA by finding the eigenvector corresponding to the largest eigenvalue of C_y, the covariance matrix of y(t). However, if orthogonal bases are used in the time-frequency decomposition, C_y becomes the identity matrix, so PCA does not add any advantage beyond the sub-band decomposition readily accomplished by DWT. Nevertheless, other CA methods using stronger independence constraints, like ICA techniques, can still discriminate the sources. Moreover, orthonormal expansion sets involve dyadic scaling, predefined frequency bands and constant Q-factor wavelet decomposition. These restrictions lead to less flexibility in tuning the frequency-domain filtering. To avoid these constraints and to achieve more degrees of freedom, one may employ more redundant transformations, such as the DWT with biorthogonal bases, the Rational Dilation Wavelet Transform (RADWT), or even more redundant expansions such as the Continuous Wavelet Transform (CWT), which benefits from highly localized time-frequency filters. Even more redundancy (viz. degrees of freedom) can be achieved by CWT with complex coefficients, which comprise phase information as well as amplitude. Note that these excess redundancies will be subsequently reduced by the CA methods, and impose no restriction other than computational cost. The advantages of utilizing redundant transforms have been studied in the context of heart and lung sound processing [53]-[56].

A. DATA ACQUISITION
In this work, the proposed methods were applied to single-channel chest sounds recorded from two groups of preterm and full-term infants. To record the sounds, the digital stethoscope (Clinicloud Stethoscope, Clinicloud Ltd Pty, Melbourne, Australia) was placed on the right anterior chest of the infant. Lung and heart sounds were then recorded for 60 s. The recordings were saved by Voice Recorder & Audio Editor [57], a commercially available smartphone application, in MP3 format with 16 kHz or 44 kHz sampling rates. The study was conducted at Monash Newborn, Monash Children's Hospital, a tertiary-level neonatal unit in Melbourne, Australia, and was approved by the Monash Health Human Research Ethics Committee (HREA/18/MonH/471). Recordings were made between 24 and 48 hours after birth, to avoid potential interference from lung fluid clearance in the first 24 hours. More information about the data is available in [12], [58].
Some of the recordings were excluded manually because of significant artifacts such as crying, recorder saturation, stethoscope slipping, and loud environmental noise, which made the lung and heart sounds impossible to recover. Since irregular breathing periods existed in most of the studied newborns, especially in the preterm cases, there were long non-breathing segments in the recordings. Therefore, 10 s segments containing both the heart and lung sounds were chosen manually by an expert. In total, 91 segments were considered for further analysis, 28 of which were accompanied by NICU Heart Rate (HR) and Respiration Rate (RR) information. In addition, to avoid unnecessarily high computational cost in later processing, the recordings were down-sampled to 4 kHz.
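As an illustration of the down-sampling step, the sketch below resamples a segment to 4 kHz with a polyphase filter. The use of `scipy.signal.resample_poly` and the helper name `downsample_to` are assumptions; the resampling routine actually used for the recordings is not specified here.

```python
from fractions import Fraction

import numpy as np
from scipy.signal import resample_poly

def downsample_to(x, fs_in, fs_out=4000):
    """Resample x from fs_in to fs_out (Hz) using an anti-aliased polyphase filter."""
    ratio = Fraction(fs_out, fs_in)          # e.g. 4000/16000 = 1/4
    return resample_poly(x, ratio.numerator, ratio.denominator)

# Two toy 10 s segments containing a 100 Hz tone, at the two nominal rates.
seg16 = np.sin(2 * np.pi * 100 * np.arange(10 * 16000) / 16000)
seg44 = np.sin(2 * np.pi * 100 * np.arange(10 * 44000) / 44000)

out16 = downsample_to(seg16, 16000)          # up 1, down 4
out44 = downsample_to(seg44, 44000)          # up 1, down 11
```

Both outputs contain 40000 samples, i.e. 10 s at 4 kHz.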

B. ALGORITHM DEVELOPMENT AND IMPLEMENTATION
Different choices for the decomposition transform and the CA method in the proposed framework of Fig. 1 provide a variety of SCBSS algorithms. In the present study, SWT and CWT in combination with PCA, SOBI and πCA methods provide four algorithms to separate lung and heart sounds from single-channel chest records. Their configurations and implementation details are described in the following.

1) SWT-PCA
As the first and simplest configuration, we adopted the biorthogonal basis function with one and three vanishing moments for the decomposition and reconstruction wavelets [59], [60]. Moreover, in order to keep the length of the signals unchanged, an undecimated version of DWT known as the Stationary Wavelet Transform (SWT) is utilised [51]. Considering the sampling frequency of the data (4 kHz), a 6-level SWT is used to decompose the signal into the dyadic sub-bands shown in Fig. 2. The six detail levels are given to PCA, and the heart and lung sounds are eventually selected among the components. The final desired heart and lung sounds can be obtained according to Step 3 in Algorithm 1.
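For orientation, the nominal dyadic sub-bands of a 6-level decomposition at 4 kHz can be computed directly. This is a small worked example that assumes ideal half-band filters, so the real wavelet filters overlap these nominal edges.

```python
fs = 4000      # sampling frequency after down-sampling (Hz)
levels = 6     # number of SWT detail levels

# Detail level j nominally covers fs / 2**(j + 1) to fs / 2**j.
bands = [(fs / 2 ** (j + 1), fs / 2 ** j) for j in range(1, levels + 1)]

for j, (low, high) in enumerate(bands, start=1):
    print(f"detail level {j}: {low:g}-{high:g} Hz")
```

Under this assumption, the six detail levels span 1000-2000, 500-1000, 250-500, 125-250, 62.5-125 and 31.25-62.5 Hz.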

2) CWT-PCA
Discrete Time Continuous Wavelet Transform (DT-CWT) is a redundant time-frequency expansion utilized as the decomposition transform in our framework. Translating the bases by one sample at a time gives decomposed signals of the same length as the original signal. On the other hand, since the bases are redundant in scales, we can skip some of the scales, considering a trade-off between useful redundancy and computational cost. In the present study, we use Analytical Morlet (Gabor) wavelets with 17 scales (four voices per octave) covering 50-1000 Hz. Then, as in the previous algorithm, we can expect PCA to rank the possible sources corresponding to heart sound, lung sound and noise. Therefore, based on visual inspection, the best components corresponding to the heart and lung sources are selected, and the desired sources are obtained after reconstructing and converting the coefficients into the original domain.

3) CWT-SOBI
SOBI utilizes several time-lags and attempts to jointly diagonalize the corresponding covariance matrices. The covariance matrix of the time-lag $\tau_k$ is calculated as:

$$C_y(\tau_k) = E_t\{y(t+\tau_k)\, y(t)^H\} \tag{15}$$

In combination with CWT, SOBI is supposed to extract the sources by finding linear combinations of the sub-bands which maximize their temporal dependency. In regular use of SOBI, the set of time-lags $\tau = \{\tau_1, \ldots, \tau_K\}$ might be chosen randomly or uniformly, just to sweep the time interval containing the expected temporal structures. Instead, in the present study, we propose a more suitable method to set the time-lags based on the available prior information about heart sounds.
First of all, the spacing between the time-lags can be inferred from the expected upper bound of the heart sound frequency components, denoted by $f_h$. Defining $\delta\tau = \tau_k - \tau_{k-1}$, we have $\delta\tau = \frac{1}{2 f_h}$. Although the maximum frequency is reported to be about 300 Hz for neonates, a slightly higher value can be chosen to ensure that all the frequency components are covered [14]. Therefore, taking $f_h \approx 330$ Hz, the time-lags are chosen uniformly with 1.5 ms spacing.
In addition, the autocorrelation of a cyclostationary signal is expected to have a higher envelope around the shifts equal to the recurring period of the heart sound. Therefore, if the average period of the heart sound is $T$, the shifts around $\lambda T$ for $\lambda = 0, 1, 2, \ldots$ are appropriate candidates to be used as SOBI's time-lags. Mathematically, let $\tau^p_\lambda$ be the time around $\lambda T$ at which the autocorrelation envelope has its local maximum, i.e.,

$$\tau^p_\lambda = \arg\max_{\lambda T - r \,\le\, \tau \,\le\, \lambda T + r} \left| R_x(\tau) + j\,\mathcal{H}\{R_x(\tau)\} \right|, \qquad \lambda = 0, 1, \ldots, \Lambda \tag{16}$$

where $R_x$ is the autocorrelation of $x$, $\mathcal{H}\{\cdot\}$ denotes the Hilbert transform, and $r$ is the neighbourhood radius around $\lambda T$ in which we search for the local maximum. So, from a wide set of available $\tau_k$, the proper time-lags can be found as

$$\{\tau_k : |\tau_k - \tau^p_\lambda| \le \Delta\tau,\ \lambda = 0, 1, \ldots, \Lambda\} \tag{17}$$

where $\Delta\tau$ is named the maximum effective lag and can be calculated as $\Delta\tau = \frac{1}{2 f_l}$, where $f_l$ is the expected lower frequency bound of the desired signal, here the heart sound. By assuming $f_l = 50$ Hz according to the literature in [19]-[22], the proper time-lags are supposed to be around the peaks of the autocorrelation envelope with at most $\Delta\tau = 10$ ms difference. The number of detectable peaks in the envelope of the autocorrelation is affected by the quality and regularity of the heart sound. In the present study, the time-lags are set to cover three heart beats, which means $\Lambda = 3$ in (16).
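The lag selection described above can be sketched as follows. This is an illustrative sketch only: the average period is assumed to be known, the search radius is set to a hypothetical 10 % of the period, the test signal is a synthetic train of 60 Hz bursts, and the function name `select_lags` is not from the paper.

```python
import numpy as np
from scipy.signal import correlate, hilbert

fs = 4000                                  # sampling rate (Hz)
f_h, f_l = 330.0, 50.0                     # assumed upper/lower heart sound bounds (Hz)
step = int(round(fs / (2 * f_h)))          # lag spacing: 6 samples = 1.5 ms
max_dist = int(round(fs / (2 * f_l)))      # maximum effective lag: 40 samples = 10 ms

def select_lags(x, period, n_beats=3, radius=None):
    """Return (envelope peak lags, selected time-lags), both in samples."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    radius = int(0.1 * period) if radius is None else radius   # hypothetical choice
    r = correlate(x, x, mode="full", method="fft")[len(x) - 1:]  # lags >= 0
    envelope = np.abs(hilbert(r))          # |R_x + j H{R_x}|
    peaks = []
    for lam in range(n_beats + 1):         # lambda = 0, 1, ..., n_beats
        lo = max(lam * period - radius, 0)
        hi = min(lam * period + radius, len(r) - 1)
        peaks.append(lo + int(np.argmax(envelope[lo:hi + 1])))
    peaks = np.array(peaks)
    candidates = np.arange(step, peaks[-1] + max_dist + 1, step)
    dist = np.abs(candidates[:, None] - peaks[None, :]).min(axis=1)
    return peaks, candidates[dist <= max_dist]

# Synthetic heart-like signal: 60 Hz bursts every 0.4 s (1600 samples) plus noise.
period = 1600
n = np.arange(4 * fs)
rng = np.random.default_rng(0)
x = 0.05 * rng.standard_normal(n.size)
for centre in range(300, n.size, period):
    x += np.exp(-0.5 * ((n - centre) / 80.0) ** 2) * np.cos(2 * np.pi * 60 * (n - centre) / fs)

peaks, lags = select_lags(x, period)
```

For this regular synthetic signal, the envelope peaks fall close to multiples of the period, and the selected lags are the 1.5 ms grid points lying within 10 ms of those peaks.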
The CWT configuration is the same as in the previous algorithm. After applying SOBI, the suitable components corresponding to the heart and lung sounds are visually selected from the delivered components, and the rest is performed according to Step 3 of the general framework in Algorithm 1.

4) CWT-πCA
As mentioned before, πCA, unlike SOBI, uses a specific time-lag obtained from prior information [48]. In the case of heart sound filtering, it can be related to an averaged or synchronized heart beat captured from an auxiliary signal, like ECG or HRV. However, in this study this reference time-lag is obtained from the chest sound itself, without using extra hardware or recording an auxiliary signal. The heart sound is the major component in the lower frequencies, and the other components are almost negligible compared with it. So, the signal decomposed by one of the CWT scales corresponding to low frequencies (75-90 Hz) is selected, and its envelope is calculated. Since the Analytical Morlet wavelet provides complex coefficients, the envelope is easily obtained from their absolute values. Then, a conventional peak detection algorithm can find the S1 peaks, and the S1-S1 intervals are used as the reference time-lag for πCA. The peak detection algorithm searches for each peak in the expected interval of [0.7 T, 1.3 T] after the previous peak, where T is the average heart sound period.
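The windowed peak search can be sketched as below. This is a simplified illustration under stated assumptions: the envelope and the average period are taken as given, the first peak is taken as the maximum within the first period, and the function name `detect_s1_peaks` and the synthetic envelope are hypothetical.

```python
import numpy as np

def detect_s1_peaks(envelope, period, first_peak=None):
    """Find one peak per beat, searching [0.7 T, 1.3 T] after the previous peak."""
    env = np.asarray(envelope, dtype=float)
    if first_peak is None:
        first_peak = int(np.argmax(env[:int(period)]))   # assumed initialisation
    peaks = [first_peak]
    while True:
        lo = peaks[-1] + int(round(0.7 * period))
        hi = peaks[-1] + int(round(1.3 * period))
        if hi >= len(env):               # search window would run past the end
            break
        peaks.append(lo + int(np.argmax(env[lo:hi + 1])))
    return np.array(peaks)

# Synthetic envelope: Gaussian bumps at slightly irregular beat positions (samples).
beat_positions = [500, 2100, 3650, 5300, 6900]
n = np.arange(8000)
envelope = sum(np.exp(-0.5 * ((n - c) / 100.0) ** 2) for c in beat_positions)

s1_peaks = detect_s1_peaks(envelope, period=1600)
s1_intervals = np.diff(s1_peaks)         # S1-S1 intervals in samples
```

On this synthetic envelope the detector recovers the five bump positions, and the S1-S1 intervals vary around the nominal period of 1600 samples.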
Because of variability in heart rate, the S1-S1 intervals are not equal, which can decrease the accuracy of calculating $C_y(\tau)$. Therefore, in order to make the beats more aligned, we use a simple linear phase warping and map each beat to $(-\pi, \pi)$. So, the covariance matrices in (12) are calculated in the phase domain as:

$$C_y(\tau) = \frac{1}{L} \sum_{t=1}^{L} y(t+\tau_t)\, y(t)^H \tag{18}$$

in which $L$ is the total number of samples, and

$$\tau_t = \min\{\tau \mid \phi(t+\tau) = \phi(t),\ \tau > 0\}$$

where $\phi(t)$ is the cardiac phase signal [48]. To ensure that the eigenvalues are real, we can make the covariance matrix symmetric by

$$\tilde{C}_y(\tau) = \frac{1}{2}\left(C_y(\tau) + C_y(\tau)^H\right) \tag{19}$$

By sorting the generalized eigenvalues of the pencil $(\tilde{C}_y(\tau), C_y(0))$ in descending order, the sources are ranked from high to low periodicity. Since the heart sound is the most periodic source, we can expect to see it among the first components, and the lung sound is found in other components. However, because of some occasional non-stationary noises in the recorded data, this order may be slightly inconsistent.
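The linear phase warping can be illustrated with a short sketch. It assumes that each beat runs from one detected S1 peak to the next and is mapped linearly onto $[-\pi, \pi)$, and that samples are matched to the nearest sample of equal phase in the following beat; the helper names and the toy peak positions are hypothetical.

```python
import numpy as np

def cardiac_phase(peaks, n_samples):
    """Linear phase from -pi to pi over each beat; NaN outside complete beats."""
    phase = np.full(n_samples, np.nan)
    for start, stop in zip(peaks[:-1], peaks[1:]):
        phase[start:stop] = np.linspace(-np.pi, np.pi, stop - start, endpoint=False)
    return phase

def phase_matched_index(peaks, n_samples):
    """Index t + tau_t of the equal-phase sample in the next beat (-1 if none)."""
    match = np.full(n_samples, -1, dtype=int)
    for p0, p1, p2 in zip(peaks[:-2], peaks[1:-1], peaks[2:]):
        t = np.arange(p0, p1)
        match[t] = p1 + np.round((t - p0) * (p2 - p1) / (p1 - p0)).astype(int)
    return match

def phase_lagged_covariance(y, match):
    """Symmetrized sample covariance between y(t + tau_t) and y(t)."""
    valid = match >= 0
    c = y[:, match[valid]] @ y[:, valid].conj().T / valid.sum()
    return 0.5 * (c + c.conj().T)

# Toy example: three beats of unequal length (100, 120 and 100 samples).
peaks = np.array([0, 100, 220, 320])
n_samples = 400
phase = cardiac_phase(peaks, n_samples)
match = phase_matched_index(peaks, n_samples)

rng = np.random.default_rng(0)
y = rng.standard_normal((3, n_samples))      # placeholder multi-channel signal
c_tau = phase_lagged_covariance(y, match)
```

In the toy example, the mid-beat sample at index 50 (phase 0) is matched to index 160, which is the mid-point of the longer second beat.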

C. COMPONENT SELECTION
The component selection can be carried out by classifying the components into three groups: heart sound, lung sound, or noise. The following features are used as inputs:
• peak frequency of the power spectral density,
• standard deviation of S1-S1 intervals, as a periodicity-related feature. The S1 detection algorithm is the same as the one used in the CWT-πCA algorithm,
• ratio of the second peak to the first peak of the envelope of the auto-correlation absolute values. As a periodicity-related feature, this value falls between 0 (least periodicity) and 1 (most periodicity). The procedure for finding the first and second peaks is similar to (16), with λ = 0, 1.
Fifteen chest records were utilised here to optimize the classifier, and were then excluded from the main dataset in the subsequent performance evaluation. In this way, 840 components were produced using the four proposed algorithms and labeled by an expert through listening and visual assessment, to be used as the training dataset. Five popular classifiers, namely K-Nearest Neighbours (KNN), Linear Discriminant Analysis (LDA), Support Vector Machine (SVM), Naive Bayes, and multi-class Binary Decision Tree (BDT), were applied and their parameters were optimized in a 10-fold cross validation. Then, their classification accuracy rates were obtained through a 10-by-10 cross validation, and KNN was chosen for its better result (see Table 3). The component selector may occasionally label none of the components as one or both of the desired sources. In the present study, this happened for the heart sound extraction of 3.4 % of the recordings, but for none of the lung sound extractions. In such cases of no labeling, in order to avoid empty outputs, the components with the power peak closest to 50 Hz and 400 Hz are returned as heart and lung sounds, respectively.
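The fallback rule for unlabeled sources can be sketched as below. The label strings, the function name `select_components` and the example values are hypothetical; only the 50 Hz and 400 Hz targets come from the text.

```python
import numpy as np

def select_components(labels, peak_freqs):
    """Indices of heart and lung components, with the peak-frequency fallback."""
    labels = np.asarray(labels)
    peak_freqs = np.asarray(peak_freqs, dtype=float)
    selection = {}
    for source, fallback_hz in (("heart", 50.0), ("lung", 400.0)):
        idx = np.flatnonzero(labels == source)
        if idx.size == 0:
            # No component was labeled as this source: take the component
            # whose power peak is closest to the fallback frequency.
            idx = np.array([int(np.argmin(np.abs(peak_freqs - fallback_hz)))])
        selection[source] = idx
    return selection

# Toy classifier output: no component was labeled as heart sound.
labels = ["noise", "lung", "noise", "noise"]
peak_freqs = [60.0, 380.0, 900.0, 30.0]      # Hz, one per component
selection = select_components(labels, peak_freqs)
```

Here the lung sound keeps the classifier's choice (component 1), while the heart sound falls back to component 0, whose 60 Hz peak is the closest to 50 Hz.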

D. PERFORMANCE EVALUATION
1) COMPARED METHODS
To assess the improvement of heart and lung sound quality by the proposed methods, they are compared to the raw sounds and the fixed band-pass filtered signals. The band-pass filters are 10th-order Butterworth filters passing 100-1000 Hz for the lung sounds and 50-250 Hz for the heart sounds. The cut-offs correspond to the dominant frequency bands of each signal based on the previous studies listed in Tables 1 and 2. In addition, seven state-of-the-art heart-lung separation methods are examined and compared to the proposed methods, including: Singular Spectrum Analysis (SSA) [43], Empirical Mode Decomposition (EMD) [40], Ensemble Empirical Mode Decomposition (EEMD) [41], Complete Ensemble Empirical Mode Decomposition (CEEMD) [42], Non-negative Matrix Factorization Clustering with Kullback-Leibler (KL) divergence cost function (NMFC-KL), Non-negative Matrix Factorization Clustering with L-2 cost function (NMFC-L2), and Non-negative Matrix Factorization Clustering with Sparse cost function (NMFC-Sparse) [39], [61], [62]. The parameters and settings of these methods are chosen based on the suggested values in the corresponding references. The utilised toolboxes and codes can be found in [62]-[66].

2) SIGNAL QUALITY ASSESSMENT
The measure of quality utilised here for both the heart and lung sound assessment is based on our previous work in [67]. Briefly, a linear regression model has been tuned in order to map a collection of audio features onto a 5-point scale measuring signal quality for heart and lung sounds (from 1 for the lowest to 5 for the highest quality). The quality scores were obtained from the annotation of 176 chest sound recordings (different from the ones involved in the present study) by 7 experts and used as the ground truth. A wide variety of features are used as predictors, including: several envelope-based and wavelet-based features, statistical characteristics of the signals like variance and kurtosis, mel-frequency cepstrum, linear predictive coding, and line spectral frequencies coefficients, power ratios of different frequency ranges, dominant frequencies, properties of different fractal dimensions and entropy measures, number of detectable heart sounds (S1 and S2) and breathing periods, and length of periods without detectable heart and lung sounds. The best feature set is selected using the Minimum Redundancy Maximum Relevance (MRMR) algorithm. Then, the regression model is trained on the set of best features, while different cost functions (SVM and least-squares) and different regularization methods (lasso and ridge) are examined to obtain the best test results based on mean squared error. More description of the features and the regression can be found in [67].

3) HR AND RR DETECTION
Estimating the vital signs like HR and RR is expected to be more reliable and accurate in well-filtered chest sounds. Therefore, two HR and BR estimation algorithms are applied on the filtered heart and lung sounds respectively, in addition to the raw signals. The results are then compared to the reference values recorded simultaneously via an NICU machine. The HR estimation algorithm introduced by Springer et. al., is initialized with HR determined from peak detection of envelope, and then uses it for heart segmentation to obtain overall heart rate estimate [68]. Breathing rate is determined from peak detection of 300-450Hz power envelope. More details of calculated heart and breathing rate are described in [58]. Their Root Mean Square Error (RMSE) compared to the NICU refernce values are shown in Table 5.  Table 4. For the KNN component selector utilised in the proposed methods, mean and standard deviation of the running time was 66±11 ms, which is included in the table values. Fig. 3 shows a 10 s sample chest record which are filtered by 4 different methods proposed in the present study and also band-pass filtering with fixed cut-offs. First of all, we can see that the heart and lung sounds obtained by the fixed band-pass filters are not filtered well, and both of them are contaminated with other undesired components. Clearly, the residual heart sound exists in the filtered lung sound, and vice versa. It to some extent happened for the heart and lung sounds filtered by SWT-PCA, but the amount of undesired components are less than fixed band-pass filtering. However, these residuals are pretty much reduced in the sources estimated by CWT-PCA, CWT-πCA and CWT-SOBI. The quality of the sources extracted by the proposed and compared methods is assessed by the methods described in III-D. Fig. 4 shows how much improvement is caused by each method compared to the raw signals for heart and lung sounds. 
All the proposed methods provide greater quality improvement for the heart and lung sounds than simple band-pass filtering. CWT-PCA performs better than SWT-PCA in lung sound extraction, which may indicate the advantage of using redundant bases instead of dyadic ones at high frequencies. The results of CWT-πCA were expected to be better than those of CWT-PCA, but they are not: πCA finds the proper components by diagonalizing just one delayed auto-covariance matrix, and its result is highly sensitive to the time-lag value. Misdetection of S1 in poor-quality sounds, or the presence of significant non-stationary noise, can drastically affect the estimated period of the heart sounds and consequently the time-lag. This drawback would likely be addressed if an external heartbeat detection source were used. Auxiliary signals such as ECG, recorded along with the chest sound, could be helpful in this regard; simultaneous ECG-PCG recording and the development of related devices have recently become an interesting trend [69], [70]. In contrast, CWT-SOBI is more robust against the mentioned noises, because it uses many time-lags and jointly diagonalizes more than two autocovariance matrices. Accordingly, the heart sound quality improvement achieved by CWT-SOBI is considerably better than that of the other methods.
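The dependence of πCA on a single time-lag can be illustrated with a minimal sketch. It assumes a generic multichannel input (in the proposed framework the channels would be the sub-bands of the time-frequency decomposition), a fixed lag in samples, and a full-rank zero-lag covariance matrix; the function name and its arguments are hypothetical and do not reproduce the implementation evaluated in this study.

```python
import numpy as np


def periodic_component_analysis(channels, lag):
    """Rank linear combinations of `channels` by their periodicity at `lag`.

    channels: array of shape (n_channels, n_samples).
    lag: assumed heart sound period in samples (hypothetical input).
    Returns (components, eigenvalues), most periodic component first.
    """
    x = np.asarray(channels, dtype=float)
    n = x.shape[1]
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and n_samples - 1")
    x = x - x.mean(axis=1, keepdims=True)

    # Whiten with the zero-lag covariance so that the generalized
    # eigenproblem (C_lag, C_0) becomes an ordinary symmetric one.
    # Assumes C_0 is full rank (no exactly redundant channels).
    c0 = x @ x.T / n
    eigvals, eigvecs = np.linalg.eigh(c0)
    whitener = eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T
    z = whitener @ x

    # One lagged covariance matrix, symmetrized before decomposition.
    c_lag = z[:, lag:] @ z[:, :-lag].T / (n - lag)
    c_lag = 0.5 * (c_lag + c_lag.T)

    vals, vecs = np.linalg.eigh(c_lag)
    order = np.argsort(vals)[::-1]  # largest lagged correlation first
    return vecs[:, order].T @ z, vals[order]
```

With a mixture of a sinusoid of period 50 samples and white noise, calling the function with `lag=50` ranks the sinusoid first, whereas `lag=25` (half a period, where the sinusoid is anti-correlated with itself) ranks it last. A mis-estimated heart period has the same kind of effect on the component ordering.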

IV. RESULTS AND DISCUSSION
Among the other compared algorithms, SSA outperforms the EMD-based and NMF-based methods. Its heart sound SQI is close to that of CWT-SOBI, but its lung sound SQI is not competitive with the proposed methods. The EMD-based and NMF-based methods rank next for both heart and lung sound SQIs. Occasional degradations are also observed for all the proposed and benchmark methods. This can be explained by two considerations: (a) the SQI estimation method used is not 100 % accurate, so occasional incorrect estimates are to be expected, although on average it is an appropriate metric for signal quality; (b) none of the proposed and benchmark algorithms performs perfectly, especially for low-quality and noisy chest sounds of newborns. Table 5 shows the effect of the different filtering methods on the accuracy of HR and RR estimation. CWT-πCA, CWT-PCA, and CWT-SOBI generally yield lower RMSE for HR and RR estimation than the benchmark methods, fixed band-pass filtering, and the raw signals. However, SWT-PCA shows no improvement in RR and HR estimation. This might be caused by its dyadic sub-bands, which lead to less flexibility in filter adjustment, especially at low frequencies.
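To make the Table 5 comparison concrete, the sketch below estimates a rate from the peaks of an envelope and computes the RMSE against reference values. It is a simplified stand-in rather than the Springer segmentation [68] or the procedure of [58]: the half-maximum threshold, the `min_interval_s` parameter, and all function names are assumptions introduced for this example.

```python
import math
from statistics import median


def find_peaks(envelope, min_distance, threshold):
    """Indices of local maxima above `threshold`, at least
    `min_distance` samples after the previously accepted peak."""
    peaks = []
    for i in range(1, len(envelope) - 1):
        is_max = envelope[i] > envelope[i - 1] and envelope[i] >= envelope[i + 1]
        if is_max and envelope[i] > threshold:
            if not peaks or i - peaks[-1] >= min_distance:
                peaks.append(i)
    return peaks


def rate_per_minute(envelope, fs, min_interval_s):
    """Rate (events per minute) from the median peak-to-peak interval."""
    threshold = 0.5 * max(envelope)  # assumed threshold, not from the paper
    peaks = find_peaks(envelope, int(min_interval_s * fs), threshold)
    if len(peaks) < 2:
        return float("nan")
    intervals = [(b - a) / fs for a, b in zip(peaks, peaks[1:])]
    return 60.0 / median(intervals)


def rmse(estimates, references):
    """Root mean square error between estimated and reference rates."""
    squared = [(e - r) ** 2 for e, r in zip(estimates, references)]
    return math.sqrt(sum(squared) / len(squared))
```

Here `rate_per_minute` would be applied once per recording to the heart or lung sound envelope, and `rmse` to the resulting list of estimates together with the simultaneously recorded reference rates.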
The computational time of the proposed methods for heart-lung sound extraction from a 10 s recording is shown in Table 4. CWT-SOBI requires significantly more time than the other proposed methods, because of its joint diagonalization of many covariance matrices. Even so, it seems feasible to use the algorithms in commercial, portable and affordable medical devices. Among the seven benchmark methods, SSA and NMFC-L2 also have relatively low computational time, while the most time-consuming method is CEEMD.
The filters of all the records are overlaid on each other in order to show their variation caused by the adaptability of the methods. The SWT-PCA filters have the least variation because of the lower frequency localization of SWT compared with CWT. The πCA filters seem unstable and erratic, which, as mentioned before, might be caused by an inappropriate setting of the time-lag.

V. CONCLUSION
In auscultation of chest recordings, usually either the heart or the lung sounds are the intended acoustic signals for examination. However, the recording is inevitably a mixture of both components together with various environmental and acquisition noises, which lowers agreement and accuracy in medical diagnosis. Therefore, filtering of the raw signals is a crucial step before any clinical auscultation. We have found this particularly challenging for newborns' chest auscultation, due to the closeness of the heart and the lungs, the noise, and the weakness of the sounds of interest.
Despite the common use of frequency filtering, the characteristics of an "optimal" frequency filter vary among cases. Specifically, defining suitable cut-off frequencies has not been straightforward. We addressed this challenge by customizing an SCBSS framework that combines different CA methods and time-frequency expansions to design ad hoc filters for each case. Its main advantage is that it finds adaptive filters for each case in a non-parametric procedure. In addition, the proposed framework extracts both heart and lung sounds simultaneously. In two of the methods suggested within the framework, the periodicity and temporal structure of the heart sound are used as the contrast function for tuning the frequency filters. In particular, a novel method is proposed to set the parameters of SOBI (namely, the proper time-lags of its lagged covariance matrices) according to known characteristics of the heart sound, such as its lower and upper frequency bounds. The results obtained on neonatal chest sounds showed that the proposed methods can improve the quality of both heart and lung sounds compared to seven state-of-the-art heart-lung sound separation algorithms, fixed band-pass filtering and the raw signals.
The proposed methods can potentially assist healthcare providers with more reliable auscultation using an inexpensive DS, applicable also in resource-limited settings. This is even more significant for potential applications in tele-health, where the sounds are recorded by non-experts for remote monitoring and diagnosis.
The present study could be made more comprehensive by the following directions for further work. Involving other decomposition and CA methods may provide higher source extraction performance. Evaluating the methods on older age groups, including children and adults, and on cases with pathological heart and lung sounds, potentially from the same group of subjects, could also be investigated in future studies.
ATUL MALHOTRA received the Ph.D. and M.D. degrees. He is a Senior Neonatologist at Monash Children's Hospital, and an Associate Professor (Research) and NHMRC Fellow at Monash University, Melbourne, Australia. He has a large research program, with interests in neonatal lung and brain injury, with more than $7 million in research funding. He has published more than 100 peer-reviewed articles, and four book chapters to date. Together with Dr. Fae Marzbanrad, his team researches digital health technologies to improve neonatal cardiorespiratory monitoring.
FAEZEH MARZBANRAD (Senior Member, IEEE) received the Ph.D. degree from The University of Melbourne, Australia, in 2016. She is currently a Lecturer and the Head of the Biomedical Signal Processing Laboratory, Department of Electrical and Computer Systems Engineering, Monash University, Australia. Her research interests include biomedical signal processing, machine learning, affordable medical technologies, and mobile-health. VOLUME 10, 2022