Introduction
Since December 2019, the COVID-19 pandemic has affected over 232 million individuals and is thought to be responsible for over 4.7 million deaths worldwide [1]. Healthcare systems have been challenged by an overwhelming number of patients with COVID-19 related complications. Moreover, it is believed that the numbers are even worse in developing countries and disadvantaged communities. Within the computing research community, there have been ongoing efforts to explore the potential of artificial intelligence (AI)-based diagnostic solutions in the battle against COVID-19. Nevertheless, few research efforts have been directed towards resource-aware solutions suited for settings with limited cloud and computational infrastructure. Such resource-aware AI solutions would be of great significance for disadvantaged communities and the developing world.
Within this context, this paper presents a novel method for the resource-aware identification of COVID-19 cough sounds using deep wavelet scattering networks (DWSN). To the best of our knowledge, this is the first exploration of the potential of DWSN in the diagnosis of respiratory-related conditions. Despite its simplicity, the proposed method has demonstrated exceptional performance compared to other resource-hungry deep learning-based approaches. Experiments were conducted to demonstrate the ability of the proposed method to differentiate among three types of coughs: COVID-19 related, Asthma-related, and healthy cases.
The rest of the paper is organized as follows: Section II presents an overview of the related work in the literature. Section III presents the details of the proposed methodology. Experiments and results are presented in Section IV. Finally, conclusions are drawn in Section V.
Related Work
Over the past decade, there has been a growing interest in the potential of cough sounds for computer-aided diagnosis of various respiratory-related medical conditions such as bronchitis, bronchiolitis, and pertussis [2]. Over the past year, this interest has extended to exploring the potential of AI in identifying COVID-19 cough sounds. In this section, we present a short survey of the research efforts most related to the work presented in this paper. In [3], the authors presented a method to classify cough sounds as COVID-19 positive and negative cases. The authors explored the use of the Short-Time Fourier Transform (STFT) and mel-frequency cepstral coefficients (MFCC) along with Support Vector Machines (SVM). The results are promising and comparable to accuracies achieved by more complicated learning approaches. Experiments were conducted using 121 samples collected from a local hospital, and an average accuracy of 98.6% was reported. In [4], the authors presented a method for detecting Chronic Obstructive Pulmonary Disease (COPD) using cough sounds. Based on a limited dataset of 39 coughs, the authors reported an average accuracy of 85.4% using statistical features and random forest classification techniques. In [5], the authors presented early results for a wearable cough detection system based on neural networks. An average specificity of 93.7% and sensitivity of 97.6% were reported. In [6], the authors presented a method for the diagnosis of Pertussis, Bronchitis, and COVID-19 conditions based on cough sounds. Mel-frequency cepstrum features were used along with transfer learning, and an overall average accuracy of 92.64% was reported. In [7], the authors evaluated the performance of a convolutional residual-based neural network (ResNet50) and a Long Short-Term Memory network (LSTM) in COVID-19 cough classification. The best result, an AUC of 98%, was obtained using ResNet50.
In [8], the authors explored the potential of automatic diagnosis of COVID-19 cough from crowdsourced, uncontrolled respiratory sound data. Preliminary results of 80% Area Under the Curve (AUC) were reported, and a rich dataset was presented. This dataset is unique in being crowdsourced, which makes it valuable for researchers interested in resource-aware solutions. In [9], the authors presented Virufy, an open-source dataset of COVID-19 cough sounds. In [10], the authors reported an accuracy of 97.5% for COVID-19 cough classification based on mel-frequency cepstrum spectrogram images and convolutional neural networks (CNN). In [11], the authors presented a more comprehensive review of the problem of COVID-19 diagnosis from respiratory cough sounds.
In [12], the authors presented a method for the diagnosis of COVID-19 from respiratory sound using a deep convolutional neural network with a multi-feature channel. The method was evaluated using Cambridge's crowdsourced cough sounds dataset [8]. The authors demonstrated its ability to classify positive vs. negative COVID-19 cases with an average accuracy of 95.45% and an F1-score of 96.96%. In [13], the authors proposed a method for the identification of Parkinson's speech using deep acoustic embeddings and decision-level fusion of VGGish, YAMNet, and OpenL3 embeddings. In [14], the authors presented a method for the identification of COVID-19 cough using limited training data by exploring self-supervised representation ensembles prior to the main classification task. In [15], the authors proposed a method for lung sound recognition using transfer learning. The proposed method combines the VGGish network with bidirectional gated recurrent unit neural networks (BiGRU).
Proposed Methodology
This section presents the details of the proposed resource-aware COVID-19 identification method. At the core of the proposed method, we explore the use of a deep wavelet scattering network as an alternative to other computationally intensive deep learning methods with a large number of hyperparameters to tune. Figure 1 shows the general architecture of the proposed method. The input cough signal is first resampled at 16 kHz. The signal is then trimmed by removing silence periods at the beginning and the end. A four-minute window is used as the input to the wavelet scattering network. The scattering embeddings are then used as input to the classification stage using a support vector machine and majority voting.
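The preprocessing steps (resampling to 16 kHz, then trimming leading and trailing silence) can be sketched as follows. The amplitude threshold used for silence detection is a hypothetical parameter, since the paper does not specify a detection rule:

```python
import numpy as np
from scipy.signal import resample_poly

TARGET_SR = 16_000  # resampling rate used in the paper

def preprocess_cough(signal: np.ndarray, sr: int,
                     silence_thresh: float = 0.01) -> np.ndarray:
    """Resample to 16 kHz and trim leading/trailing silence.

    `silence_thresh` is an assumed amplitude threshold; any
    energy-based voice-activity detector could be used instead.
    """
    # Resample to the target rate with a polyphase filter.
    if sr != TARGET_SR:
        g = np.gcd(sr, TARGET_SR)
        signal = resample_poly(signal, TARGET_SR // g, sr // g)
    # Trim silence: keep samples between the first and last point
    # whose magnitude exceeds the threshold.
    active = np.flatnonzero(np.abs(signal) > silence_thresh)
    if active.size == 0:
        return signal
    return signal[active[0]:active[-1] + 1]
```

The trimmed signal is then windowed before being fed to the scattering network.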
A. Deep Wavelet Scattering Networks
Deep wavelet scattering networks (DWSN) are capable of extracting robust features that are insensitive to translation and deformation [16]–[21]. Wavelet scattering networks share several desirable properties with traditional convolutional neural networks (CNN), including multi-scale representation, non-linearity, and sparse representations [17]. On the other hand, scattering networks use predefined wavelet and scaling filters, so no learning is needed for the filter weights. These are appealing properties in situations with scarce training data and limited computational capabilities, and we therefore use a DWSN at the core of the proposed resource-aware COVID-19 cough identification method.
A wavelet scattering network is constructed by iteratively repeating three steps: convolution with a wavelet filter, applying a non-linearity using the modulus operator, and an averaging stage using a scaling function. We use complex Morlet wavelets [20] to construct the deep wavelet scattering network.
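The three steps above can be illustrated with a minimal NumPy sketch of a single scattering iteration. The Morlet filter here is a simplified illustration rather than the exact filter bank of [20]; applying `scatter_once` again to the returned moduli produces the deeper layers:

```python
import numpy as np

def morlet(n: int, xi: float, sigma: float) -> np.ndarray:
    """Complex Morlet filter of length n with centre frequency xi
    (cycles/sample) and relative bandwidth sigma. Illustrative only."""
    t = np.arange(n) - n // 2
    gauss = np.exp(-0.5 * (t / (sigma * n)) ** 2)
    return gauss * np.exp(2j * np.pi * xi * t)

def scatter_once(x: np.ndarray, filters, phi: np.ndarray):
    """One scattering iteration: wavelet convolution, modulus
    non-linearity, then averaging with the scaling filter phi."""
    coeffs, moduli = [], []
    for psi in filters:
        u = np.abs(np.convolve(x, psi, mode="same"))     # |x * psi|
        moduli.append(u)                                 # fed to next layer
        coeffs.append(np.convolve(u, phi, mode="same"))  # averaged output
    return coeffs, moduli
```

In practice, a library such as Kymatio implements the full multi-layer transform efficiently; this sketch only makes the per-layer operations explicit.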
Following the notation in [22], the order-$m$ scattering coefficients are given by
\begin{gather*}
S_{m} f(t)=\left\{\left|\left|\left| f \ast \psi_{j_{1}}\right| \ast \psi_{j_{2}}\right| \ast \cdots \ast \psi_{j_{m}}\right| \ast \varphi_{J}(t)\right\}_{j_{i} \in \Lambda_{i}}, \tag{1}\\
i=1,2,\ldots,m
\end{gather*}
where $\psi_{j_{i}}$ are the wavelet filters, $\varphi_{J}$ is the scaling (averaging) filter, and $\Lambda_{i}$ is the set of wavelet indices at layer $i$.
The scattering network was constructed using two layers. The first layer has 8 wavelets per octave, and the second layer has 4 wavelets per octave. The invariance scale is set to 0.5 minute, and the input signal window is 4 minutes. The scattering matrix is then constructed by aggregating the scattering coefficients at all orders, as shown in Equation (2).
\begin{equation*}
S f(t)=\left\{S_{m} f(t)\right\}_{0 \leq m \leq l}\tag{2}\end{equation*}
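The aggregation in Equation (2) amounts to stacking the coefficients of all orders into a single (scattering paths × time windows) matrix; a minimal sketch, with illustrative array shapes:

```python
import numpy as np

def scattering_matrix(orders):
    """Stack the scattering coefficients of all orders (Equation 2)
    into one (paths x time-windows) matrix. Entry m of `orders` holds
    the order-m coefficients as an (n_paths_m, n_windows) array."""
    return np.vstack(orders)
```

The row counts of the stacked orders (one order-0 path, then the order-1 and order-2 paths) sum to the total number of scattering paths.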
B. Scattering Embeddings
The output of the wavelet scattering network is then critically sampled in time based on the bandwidth of the scaling function. This results in 32 time windows for each of the 1313 scattering paths. Thus, for each input cough signal, 32 scattering sequences are generated. These act as the embedding signature for the cough signal. Z-score normalization [22] is applied to the extracted embedding across each time window.
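The z-score step can be sketched as below. Normalizing "across each time window" is interpreted here as standardizing each of the 32 rows of the embedding, which is one plausible reading of the description above:

```python
import numpy as np

def zscore_embeddings(S: np.ndarray) -> np.ndarray:
    """Z-score each time window (row) of a 32 x 1313 scattering
    embedding: subtract the row mean and divide by the row std."""
    mu = S.mean(axis=1, keepdims=True)
    sd = S.std(axis=1, keepdims=True)
    return (S - mu) / (sd + 1e-12)   # epsilon guards constant rows
```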
C. Classification Stage
The scattering embedding generated from each signal has dimension 32 × 1313, where 32 is the number of time windows and 1313 is the number of scattering paths. A Support Vector Machine (SVM) classifier was used to classify each scattering sequence. The SVM is a supervised learning method that constructs the optimal hyperplane separating the two classes at hand [22]. Kernel functions extend its application to non-linear problems. The results reported in this paper use a quadratic kernel function. In the three-class classification experiment, the problem is formulated as multiple binary classification problems using the one-versus-one approach [22], [23]. Finally, majority voting is applied to generate the final classification for the cough signal.
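A sketch of the classification stage using scikit-learn. The quadratic kernel is approximated with a degree-2 polynomial kernel (`coef0=1` is an assumption, as the paper does not give the exact kernel parameters), and scikit-learn's `SVC` decomposes multi-class problems with the same one-versus-one scheme internally:

```python
import numpy as np
from collections import Counter
from sklearn.svm import SVC

# Quadratic-kernel SVM; coef0=1 is an assumed parameter.
clf = SVC(kernel="poly", degree=2, coef0=1.0)

def classify_cough(embedding: np.ndarray, clf: SVC):
    """Classify each of the 32 scattering sequences of one cough
    signal, then majority-vote to obtain a single label."""
    votes = clf.predict(embedding)             # one label per sequence
    return Counter(votes).most_common(1)[0][0]
```

During training, each scattering sequence inherits the label of its parent signal; at test time the per-sequence decisions are fused by the vote.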
Experiments and Results
Several experiments have been conducted to evaluate the proposed method. This section presents the details of the datasets used, the evaluation methodology, and the summary of the results.
A. Description of the Dataset
Evaluation of the proposed methodology has been conducted using two datasets: the Virufy dataset [24] and the COVID-19 Sounds crowdsourced dataset [8].
1) Virufy Clinical Dataset (DB1)
This dataset was made available to the research community by Virufy, a non-profit research organization that aims at promoting the use of AI for the early diagnosis of COVID-19 [24]. This dataset contains 121 cough samples from 16 patients. All samples are labeled as either COVID-19 positive or negative. The dataset was collected in a hospital with physicians following standard operating procedures. The data is preprocessed and labeled with COVID-19 status acquired from PCR tests. Throughout the rest of the paper, this dataset is referred to as DB1.
2) COVID-19 Sounds Crowd Sourced Dataset(DB2)
This dataset has been created by researchers from the University of Cambridge. It was collected through a browser and an app from users all over the world [8]. The dataset has a wealth of recordings featuring different classes: COVID-19 positive with cough, healthy cough, and non-COVID-19 Asthma with cough. The dataset also includes different modalities, namely cough, breath, and cough with breath. In our experiments, only the cough modality has been used to evaluate the capability of the proposed method to identify COVID-19 cough, Asthma cough, and healthy cough. This dataset is not open source and has been made available by the University of Cambridge.
Figure 2 shows samples of cough signals labeled as healthy, COVID-19, and Asthma (non-COVID-19).
B. Evaluation Protocol and Metrics
The proposed method has been evaluated using repeated K-fold cross-validation [22], [23]. This involves repeating a K-fold cross-validation process N times and reporting the average as the overall system performance. In general, this is thought to provide a better estimate of model performance. In our experiments, the reported results are the average of 3-fold cross-validation repeated 10 times. Standard evaluation metrics were reported for the different experiments, namely accuracy, specificity, sensitivity, and F1-score [22], [23].
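This protocol maps directly onto scikit-learn's `RepeatedKFold`. The sketch below only enumerates the 30 train/test splits and uses a placeholder per-fold score, since the actual metrics come from the trained classifier:

```python
import numpy as np
from sklearn.model_selection import RepeatedKFold

# 3-fold cross-validation repeated 10 times: 30 train/test splits,
# whose metrics are averaged into the reported figures.
rkf = RepeatedKFold(n_splits=3, n_repeats=10, random_state=0)

X = np.zeros((30, 4))            # placeholder feature matrix
fold_accuracies = []
for train_idx, test_idx in rkf.split(X):
    # fit on X[train_idx], evaluate on X[test_idx] ...
    fold_accuracies.append(1.0)  # placeholder per-fold accuracy
mean_accuracy = float(np.mean(fold_accuracies))
```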
1) Results
The proposed method has been evaluated through a number of classification tasks. These are namely:
Task I: Distinguishing COVID-19 positive vs Healthy coughs.
Task II: Distinguishing COVID-19 positive vs Asthma coughs.
Task III: Distinguishing COVID-19, Asthma, and healthy coughs.
Task IV: Distinguishing COVID-19 vs Non-COVID-19 cough sounds.
Tasks I, II, and III were conducted using the crowdsourced COVID-19 cough sounds dataset (DB2). Task IV was conducted using the Virufy clinically controlled dataset (DB1). Table I and Table II show the number of cough signals used in each classification task.
Table III summarizes the performance of the proposed method in the classification tasks under investigation. Despite its simplicity, the proposed method has demonstrated outstanding performance in all four tasks.
In classification task I, distinguishing between COVID-19 positive cases and healthy cough cases was achieved with an average accuracy of 98.64%, with 99.08% sensitivity and 98.10% specificity. The F1-score is also reported to provide a better picture of the system performance in terms of false positives and false negatives; the proposed method achieved an average F1-score of 98.63%.
In classification task II, the proposed method was able to distinguish between COVID-19 and Asthma-related cough sounds. An average accuracy of 95.34% was achieved, which improved to 99.48% using majority voting.
In task III, experiments were conducted to evaluate the capability of the proposed method to differentiate among three classes of cough sounds: COVID-19 cough, Asthma cough, and healthy cough. With no majority voting applied, an average accuracy of 96.46% was achieved, with 96.45% sensitivity and 98.23% specificity. Applying majority voting over the scattering sequences of each signal resulted in an improvement of about 3 percentage points, to an average accuracy of 99.62%.
In task IV, the proposed method has been evaluated using Virufy (DB1), a clinically controlled dataset. The proposed method achieved state-of-the-art performance of 96.46% accuracy with no majority voting applied. This was improved to 99.62% by applying majority voting over the classified scattering sequences.
2) Comparison with Other Methods in the Literature
A thorough comparison of different COVID-19 cough classification methods is challenging due to the lack of standardized datasets and evaluation protocols. Nevertheless, Table IV attempts to draw some comparisons between the proposed method and selected state-of-the-art methods in the literature. It is important to highlight that the methods listed in Table IV vary in terms of dataset size, the number of positive COVID-19 cases, and hyperparameter selections. From Table IV, it is evident that the proposed DWSN-based method has achieved accuracies that are comparable to and in line with related work in the literature, even though the proposed method is inherently much simpler than the other deep learning approaches listed in Table IV. This result is significant since it indicates the relative advantage of the proposed method in settings with limited resources.
Summary and Conclusion
This research has been motivated by the potential of cough sounds in providing early alerts of COVID-19 outbreaks in resource-limited settings. We presented a novel resource-aware method for the identification of COVID-19 cough sounds using deep wavelet scattering-based embeddings and support vector machines. We have demonstrated the ability of the proposed method to distinguish among COVID-19, Asthma, and healthy cough sounds. Compared to related work in the literature, the proposed method has demonstrated exceptional performance in identifying COVID-19 cough sounds, achieved at orders of magnitude less complexity than related CNN-based approaches.
The current COVID-19 pandemic has hit every corner of the world, including less advantaged communities. Thus, there is an urgent demand for innovative early diagnostic solutions that are suited for resource-limited settings. The future work of this paper would include testing the proposed methodology on larger datasets and exploring the potential of the proposed methodology as part of portable point-of-care early diagnostic solutions.
ACKNOWLEDGMENT
The authors wish to thank Prof. Cecilia Mascolo at the University of Cambridge for sharing the crowd-sourced cough sounds dataset.