Convolutional Sparse Support Estimator-Based COVID-19 Recognition From X-Ray Images

Coronavirus disease (COVID-19) has dominated the global agenda since it first emerged. X-ray imaging is a common and easily accessible tool that has great potential for COVID-19 diagnosis and prognosis. Deep learning techniques can generally provide state-of-the-art performance in many classification tasks when trained properly over large data sets. However, data scarcity can be a crucial obstacle when using them for COVID-19 detection. Alternative approaches such as representation-based classification [collaborative or sparse representation (SR)] might provide satisfactory performance with limited size data sets, but they generally fall short in performance or speed compared to the neural network (NN)-based methods. To address this deficiency, the convolutional support estimation network (CSEN) has recently been proposed as a bridge between representation-based and NN approaches by providing a noniterative real-time mapping from a query sample to the support of its (ideally sparse) representation coefficients, which is critical information for class decision in representation-based techniques. The main contributions of this study can be summarized as follows: 1) A benchmark X-ray data set, namely QaTa-Cov19, containing over 6200 X-ray images is created. The data set covers 462 X-ray images from COVID-19 patients along with three other classes: bacterial pneumonia, viral pneumonia, and normal. 2) The proposed CSEN-based classification scheme, equipped with feature extraction from a state-of-the-art deep NN solution for X-ray images, CheXNet, achieves over 98% sensitivity and over 95% specificity for COVID-19 recognition directly from raw X-ray images when the average performance of 5-fold cross validation over the QaTa-Cov19 data set is calculated. 3) With such assistive diagnosis performance, this study further provides evidence that COVID-19 induces a unique pattern in X-rays that can be discriminated with high accuracy.


I. INTRODUCTION
CORONAVIRUS disease 2019 (COVID-19) was declared a pandemic by the World Health Organization (WHO) a few months after its first appearance. It has infected more than 70 million people, caused a few million casualties, and has so far paralyzed mobility all around the world. The spreading rate of COVID-19 is so high that the number of cases is expected to double every three days if social distancing is not strictly observed to slow this growth [1]. Roughly half of the COVID-19 positive patients also exhibit a comorbidity [2], making it difficult to differentiate COVID-19 from other lung diseases. Automated and accurate COVID-19 diagnosis is critical for both saving lives and preventing its rapid spread in the community. Currently, reverse transcription-polymerase chain reaction (RT-PCR) and computed tomography (CT) are the common diagnostic techniques. RT-PCR results are ready after 24 h at the earliest for critical cases and generally take several days to reach a decision [3]. CT may be an alternative at initial presentation; however, it is expensive and not easily accessible [4]. The most common tool that medical experts use for both diagnosis and monitoring the course of the disease is X-ray imaging. Compared to an RT-PCR or CT test, acquiring an X-ray image is an extremely low-cost and fast process, usually taking only a few seconds. Recently, WHO reported that even RT-PCR may give false results in COVID-19 cases for several reasons, such as a poor-quality specimen from the patient, inappropriate processing of the specimen, or taking the specimen at an early or late stage of the disease [5]. For this reason, X-ray imaging has great potential as an alternative technological tool to be used along with the other tests for an accurate diagnosis.
In this study, we aim to differentiate X-ray images of COVID-19 patients from three other classes: bacterial pneumonia, viral pneumonia, and normal. For this work, a benchmark COVID-19 X-ray data set, QaTa-Cov19 (Qatar University and Tampere University COVID-19 Data set), which contains 462 X-ray images from COVID-19 patients, was collected. The images in the data set differ in quality, resolution, and SNR levels, as shown in Fig. 1. QaTa-Cov19 also contains many X-ray images from COVID-19 patients who are in the early stages; therefore, their X-ray images show mild or no sign of COVID-19 infection to the naked eye. Some sample images are shown in Fig. 2(b). Another fact that makes the diagnosis far more challenging is that interclass similarity can be very high for many X-ray images; some samples are shown in Fig. 2(a). Against such high interclass similarities and intraclass variations, we aim for a high robustness level in this study.
In numerous classification tasks, deep learning techniques have been shown to achieve state-of-the-art performance in terms of recognition accuracy, and their parallelizable computing structures play an important role, especially in real-time applications. Despite these advantages, achieving the desired performance level with a deep model usually requires proper training over a massive training data set. This is infeasible for the present problem since the available data is still rather limited.
An alternative supervised approach, which requires only a limited number of training samples to achieve satisfactory classification accuracy, is representation-based classification [6]-[8].
In representation-based classification systems, a dictionary is predefined whose columns are the training samples, stacked in such a way that each class corresponds to a known subset of the columns. A test sample is expected to be a linear combination of all points from the same class as the test sample. Therefore, given a predefined dictionary matrix, D, and a test sample y, we expect the solution $\hat{x}$ of y = Dx to carry enough information about the class of y. Overall, in this study, we design a convolutional support estimation network (CSEN) [9]-based solution pipeline, which fuses the representation-based classification scheme into a neural network (NN) body.
The rest of this article is organized as follows. In Section II, notations and mathematical preliminaries are given with emphasis on sparse representation (SR) and sparse support estimation (SE). Then in Section III, a literature review on deep learning models over X-ray images and representation-based classification is presented. The proposed CSEN-based COVID-19 recognition system is introduced in Section IV along with two recent alternative approaches that are used as the competing methods. The data collection is also explained in this section. Experimental setup and the main results are provided in Section V. Finally, Section VII concludes this article and suggests topics for future research.

A. Notations
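In this study, the $\ell_p$-norm of a vector $x \in \mathbb{R}^n$ is defined as $\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}$ for $p \geq 1$. On the other hand, the $\ell_0$-norm of the vector $x \in \mathbb{R}^n$ is defined as $\|x\|_0 = \lim_{p \to 0} \sum_{i=1}^{n} |x_i|^p = \#\{i : x_i \neq 0\}$. The sparse support set, or simply support set, $\Lambda \subset \{1, 2, 3, \ldots, n\}$ of a sparse signal x can be defined as the set of locations of the nonzero coefficients, i.e., $\Lambda := \{i : x_i \neq 0\}$.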
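The definitions above can be checked numerically. The following is a minimal NumPy sketch; the function names `lp_norm`, `l0_norm`, and `support` are illustrative choices and do not come from the original work. Indices are reported 1-based to match the convention $\Lambda \subset \{1, \ldots, n\}$.

```python
import numpy as np

def lp_norm(x, p):
    """l_p norm of x for p >= 1: (sum_i |x_i|^p)^(1/p)."""
    return float(np.sum(np.abs(x) ** p) ** (1.0 / p))

def l0_norm(x):
    """l_0 'norm': the number of nonzero entries of x."""
    return int(np.count_nonzero(x))

def support(x):
    """Support set as 1-based indices of the nonzero entries."""
    return set((np.flatnonzero(x) + 1).tolist())

# A 2-sparse toy vector in R^5 (values chosen only for illustration).
x = np.array([0.0, 3.0, 0.0, -4.0, 0.0])
l1, l2 = lp_norm(x, 1), lp_norm(x, 2)
k = l0_norm(x)
Lambda = support(x)
print(l1, l2, k, Lambda)
```

For this toy vector the support is {2, 4}, so the vector is 2-sparse regardless of the magnitudes and signs of its nonzero entries.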

B. Sparse Signal Representation
SR of a signal $s \in \mathbb{R}^d$ in a predefined set of waveforms, $\Phi \in \mathbb{R}^{d \times n}$, can be defined as representing s as a linear combination of only a small subset of atoms in the dictionary $\Phi$, i.e., $s = \Phi x$. Defining these sets, which dates back to Fourier's pioneering work [10], has been extensively studied in the literature. In the early approaches, these sets of waveforms were selected as a collection of linearly independent and generally orthogonal waveforms (which are called a complete dictionary or basis, i.e., d = n) such as the Fourier transform, DCT, and wavelet transform, until the pioneering work of Mallat [11] on overcomplete dictionaries ($n \gg d$). In the last decade, interest in SR research increased tremendously. Its wide range of applications includes denoising [12], classification [13], anomaly detection [14], [15], deep learning [16], and compressive sensing (CS) [17], [18].
With a possible dimensionality reduction that can be achieved via a compression matrix $A \in \mathbb{R}^{m \times d}$ ($m \ll d$), a sample can be obtained from s as
$$y = As = A\Phi x = Dx \tag{1}$$
where $D \in \mathbb{R}^{m \times n}$ can be called the equivalent dictionary. Because (1) describes an underdetermined system of linear equations, finding the representation coefficient vector x requires at least one more constraint to have a unique solution.
Using the prior information about sparsity, the following representation:
$$\min_{x} \|x\|_0 \quad \text{s.t.} \quad Dx = y \tag{2}$$
which is also an SR of x, has a unique solution provided that x is strictly sparse and D satisfies some required properties [19]. For instance, if $\|x\|_0 = k$, the minimum number of linearly dependent columns of D, spark(D), should be greater than 2k, i.e., spark(D) > 2k, so that $Dx' \neq Dx''$ for distinct k-sparse signals $x'$ and $x''$ [19]. However, the optimization problem in (2) is NP-hard. Fortunately, the following relaxation:
$$\min_{x} \|x\|_1 \quad \text{s.t.} \quad Dx = y \tag{3}$$
produces exactly the same solution as that of (2) provided that D obeys some criteria: the equivalence of the $\ell_0$-$\ell_1$ minimization problems can be guaranteed when D satisfies a notion of the null space property (NSP) [20], [21], not only for exactly sparse signals but also for approximately sparse ones. Furthermore, the query sample y can be corrupted with an additive noise pattern. In this case, the equality constraint in (3) can be further relaxed, as in basis pursuit denoising (BPDN) [22]: $\min_{x} \|x\|_1$ s.t. $\|y - Dx\|_2 \leq \epsilon$, where $\epsilon$ is a small constant that depends on the noise level. In this case, a stronger property known as the restricted isometry property (RIP) [23], [24] is frequently used, which covers the conditions for both exact recovery via basis pursuit (BP), i.e., (3), and stable recovery via BPDN, e.g., exact recovery of x from (3) is possible when D has RIP and $m > k \log(n/k)$.
We may refer to the sparse SE problem as finding the index set, $\Lambda$, of the nonzero elements of x [25], [26]. Indeed, in many applications, SE can be more important than finding the magnitudes and signs of x in addition to $\Lambda$, which is the task of sparse signal recovery (SSR) via a recovery technique such as (3). For example, in a sparse representation-based classification (SRC) system, a query sample y can be represented with a sparse coefficient vector, x, in the dictionary, D, in such a way that when we recover this representation coefficient from y = Dx, the solution vector $\hat{x}$ is expected to have a significant number of nonzero coefficients coming from the particular locations corresponding to the class of y.
Readers are referred to [9] for a more detailed literature review on SE and its applications. In the sequel, we briefly summarize the building blocks of the proposed approach.

A. CheXNet
In the proposed approach, we first use the pretrained deep network, CheXNet, to extract discriminative features from raw X-ray images. CheXNet was developed for pneumonia detection from chest X-ray images [27]. In [27], it was claimed that CheXNet can perform even better than expert radiologists in the pneumonia detection problem. This deep NN design is based on the previously proposed DenseNet [28], which consists of 121 layers. It is first pretrained over the ImageNet data set [29] and then adapted via transfer learning over 112120 frontal-view chest X-ray images in the ChestX-ray14 data set [30].

B. Representation-Based Classification
Consider that we are given a test sample y, which represents either the extracted features, s, or their dimensionally reduced version, i.e., y = As. In developing the dictionary, training samples are stacked in the dictionary D at particular locations in such a way that the optimal support for a given query y should be the set of all points coming from the same class as y. Therefore, a solution vector, $\hat{x}$, of y = Dx is supposed to carry enough information, i.e., the sparse support should be the set of location indices of the training samples from the same class as y. This strategy is generally known as representation-based classification. However, a typical solution $\hat{x}$ of y = Dx is not necessarily a sparse one, especially when its size grows with more training samples, which results in a highly underdetermined system of linear equations. Fortunately, if one estimates the representation coefficient vector with a sparse recovery design such as $\ell_1$-minimization as in (3), we can expect that the important nonzero entries of the solution, $\hat{x}$, are grouped in the particular locations that correspond to the locations of the training samples from the same class as y. This is a typical example of the scenarios where SE can be more valuable than the recovery of magnitudes and signs, as explained in Section II-B.
For instance, Wright et al. [8] proposed a systematic way of determining the identity of face images using $\ell_1$-minimization. The authors develop a three-step classification technique that includes: (i) normalization of all the atoms in D and y to have unit $\ell_2$-norm; (ii) estimating the representation coefficient vector via sparse recovery, i.e., $\hat{x} = \arg\min_{x} \|x\|_1$ s.t. $\|y - Dx\|_2 \leq \epsilon$; and (iii) finding the residuals corresponding to each class via $e_i = \|y - D_i \hat{x}_i\|_2$, where $\hat{x}_i$ is the group of the estimated coefficients, $\hat{x}$, that correspond to class i, and $D_i$ is the corresponding group of atoms; the class with the smallest residual is then assigned to y. This technique, which is known as SRC, and its variants have been applied to a wide range of applications in the literature [31], [32], e.g., human action recognition [33] and hyperspectral image classification [34], to name a few. Despite the good recognition accuracy of SRC systems, their main drawback is that their sparse recovery algorithms (e.g., $\ell_1$-minimization) are iterative and computationally costly, rendering them infeasible in real-time applications. Later, the authors of [6] introduced collaborative representation-based classification (CRC), which is similar to SRC except for the use of traditional $\ell_2$-minimization in the second step: $\hat{x} = \arg\min_{x} \left\{ \|y - Dx\|_2^2 + \lambda \|x\|_2^2 \right\}$. Thus, CRC does not require an iterative solution to obtain the representation coefficients, since $\ell_2$-minimization has a closed-form solution, $\hat{x} = \left( D^T D + \lambda I_{n \times n} \right)^{-1} D^T y$. Although the sparsity of $\hat{x}$ cannot be guaranteed, CRC has often been reported to achieve comparable classification performance, especially on small training data sets.
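The three-step procedure with the closed-form CRC solution can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `crc_classify`, the toy dictionary, the labels, and the value λ = 0.1 are all hypothetical and chosen so that the result is easy to verify by hand; they are not the authors' implementation or settings.

```python
import numpy as np

def crc_classify(D, labels, y, lam=0.1):
    """Collaborative representation-based classification (sketch).

    D      : (m, n) dictionary whose columns are training samples
    labels : (n,) class label of each column of D
    y      : (m,) query sample
    lam    : ridge regularization weight (illustrative value)
    """
    # Step (i): normalize atoms and query to unit l2-norm.
    D = D / np.linalg.norm(D, axis=0, keepdims=True)
    y = y / np.linalg.norm(y)
    n = D.shape[1]
    # Step (ii): closed-form solution x_hat = (D^T D + lam I)^(-1) D^T y.
    x_hat = np.linalg.solve(D.T @ D + lam * np.eye(n), D.T @ y)
    # Step (iii): class-wise residuals e_i = ||y - D_i x_hat_i||_2.
    classes = np.unique(labels)
    residuals = np.array([
        np.linalg.norm(y - D[:, labels == c] @ x_hat[labels == c])
        for c in classes
    ])
    return classes[int(np.argmin(residuals))], residuals

# Toy dictionary: class 0 atoms live in the first two coordinates,
# class 1 atoms in the last two (hypothetical data).
D = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 0.0, 1.0]])
labels = np.array([0, 0, 1, 1])
y = np.array([3.0, 4.0, 0.0, 0.0])  # lies in the span of class 0 atoms

pred, res = crc_classify(D, labels, y)
print(pred, res)
```

Because the query is orthogonal to every class 1 atom, the class 1 coefficients vanish and the class 1 residual equals the norm of the normalized query, whereas the class 0 residual is strictly smaller, so class 0 is selected. Replacing step (ii) with an iterative $\ell_1$ solver would turn the same skeleton into SRC.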

IV. PROPOSED APPROACH
For a computer-aided COVID-19 recognition system design, our primary objective is to achieve the highest sensitivity possible in the diagnosis of COVID-19 induced pneumonia with an acceptable false-alarm rate (e.g., specificity > 95%). In particular, the misdiagnosis of a COVID-19 X-ray image as a normal case should be minimized, whilst a small number of false positives (FPs) is tolerable.
Our interest in representation-based classification methods is that they perform well in classification tasks even in cases where training data is scarce. As mentioned, the two well-known representation-based classification methodologies are SRC [7] and CRC [6]. Among them, SRC provides slightly improved accuracy by solving an SR problem, i.e., producing a sparse solution $\hat{x}$ from y = Dx. Then, the locations of the nonzero elements of $\hat{x}$, which are also known as the support set, provide the class information of the query y. Despite improved recognition accuracy, SRC solutions are iterative and can be computationally demanding compared to CRC. In a recent work [9], a compact NN design that can be considered as a bridge between NN-based and representation-based methodologies was proposed. The so-called CSEN uses a predefined dictionary and learns a direct mapping, using a moderate or small training set, from query samples, y, directly to the support set of the representation coefficients, x (which should be purely sparse in the ideal case).
In this study, to address the data scarcity limitations in COVID-19 diagnosis from X-ray images, we propose a CSEN-based approach. Since a larger set of COVID-19 X-ray images than those in earlier studies is used, the proposed approach can be evaluated rigorously against a high level of diversity to obtain a reliable analysis. The general pipeline of the proposed CSEN-based recognition scheme is illustrated in Fig. 3. In order to obtain highly discriminative features, we use the recently proposed CheXNet [27], which is a version of the 121-layer Dense Convolutional Network (DenseNet-121) [28] fine-tuned using over 100 000 frontal-view X-ray images from 14 classes. Having the pretrained CheXNet for feature extraction, we develop two different strategies to obtain the classes of query X-ray images: 1) using CRC with proper preprocessing; 2) using a slightly modified version of our recently proposed convolutional support estimator network (CSEN) models. In the sequel, both techniques are explained in detail along with alternative solutions.

A. Benchmark Data Set: QaTa-Cov19
Several recent works [35]-[38] have been proposed for COVID-19 detection/classification from X-ray images. However, they use rather small data sets (the largest containing only a few hundred X-ray images), with only a few COVID-19 samples. This makes it difficult to generalize their results in practice. To address this deficiency and provide reliable results, in this study the researchers of Qatar University and Tampere University have compiled a benchmark COVID-19 data set, called QaTa-Cov19. Compared to the earlier benchmark data sets created in this domain, such as the COVID Chestxray Data set [39] or the COVID-19 DATA SET [40], QaTa-Cov19 has the following unique benchmarking properties. First, it is a larger data set, not only in terms of the number of images (more than 6200 images) but also in terms of its versatility, i.e., QaTa-Cov19 contains additional major pneumonia categories, such as viral and bacterial, along with the control (normal) class. Moreover, it is a diverse data set encapsulating X-ray images from several countries (e.g., Italy, Spain, China, etc.) produced by different X-ray machines.
COVID-19 chest X-ray images were gathered from different publicly available but scattered image sources. The major sources of COVID-19 images are the Italian Society of Medical and Interventional Radiology (SIRM) COVID-19 Database [40], Radiopaedia [41], Chest Imaging (Spain) at thread reader [42], and online articles and news portals [43]. The authors have carried out the task of collecting and indexing the X-ray images for COVID-19 positive cases reported in the published and preprint articles from China, South Korea, USA, Taiwan, Spain, and Italy, as well as online news portals (up to 20th April 2020). Therefore, these X-ray images represent different age groups, genders, ethnicities, and countries. The COVID-19 negative cases are normal, viral pneumonia, and bacterial pneumonia chest X-ray images collected from the Kaggle chest X-ray database, which contains 5863 chest X-ray images of normal, viral, and bacterial pneumonia with varying resolutions [44]. Out of these 5863 chest X-ray images, 1583 are normal images and the remaining are bacterial and viral pneumonia images. Sample X-ray images from the QaTa-Cov19 data set are shown in Fig. 4.

B. Feature Extraction
With their outstanding performance in image classification along with other inference tasks, deep NNs have become a dominant paradigm. However, these techniques usually necessitate a large number of training samples (e.g., several hundred thousand to millions depending on the network size) to achieve an adequate generalization capability. Nevertheless, we can still leverage their power by finding properly pretrained models for similar problems. To this end, we use a state-of-the-art pneumonia detection network, CheXNet, whose details are summarized in Section III-A. With the pretrained model, we extract 1024-long vectors right after the last average pooling layer. After data normalization (zero mean and unit variance), we obtain a feature vector $s \in \mathbb{R}^{d}$ with d = 1024.
PCA-based dimensionality reduction is applied to s in order to get the query sample, $y = As \in \mathbb{R}^{m}$, where $A \in \mathbb{R}^{m \times d}$ is the PCA matrix (m < d).
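The normalization and projection steps can be illustrated with a short NumPy sketch. Several things here are assumptions made only for the example: random vectors stand in for the CheXNet features, the dimensions are shrunk to d = 16 and m = 8 so the snippet runs instantly (the text uses d = 1024), the PCA matrix is obtained from an SVD of the training features, and the helper name `fit_pca_matrix` is hypothetical. The source does not state how the PCA matrix was computed in practice.

```python
import numpy as np

def fit_pca_matrix(S, m):
    """Return an (m, d) PCA matrix A from standardized features S of shape (N, d).

    Rows of A are the top-m principal directions (right singular vectors).
    """
    centered = S - S.mean(axis=0)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return Vt[:m]

rng = np.random.default_rng(0)
d, m, n_train = 16, 8, 100              # toy sizes, not the paper's
feats = rng.normal(size=(n_train, d))   # stand-in for extracted features

# Zero-mean, unit-variance normalization per feature dimension.
mu, sigma = feats.mean(axis=0), feats.std(axis=0)
S = (feats - mu) / sigma

A = fit_pca_matrix(S, m)                # A in R^{m x d}
y = A @ S[0]                            # query sample y = A s in R^m
print(A.shape, y.shape)
```

With this construction the rows of A are orthonormal, so the projection preserves as much variance of the training features as any m-dimensional linear map can.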

C. Proposed CSEN-Based Classification
Considering the limited amount of training data in our COVID-19 data set, a representation-based classification can be applied hereafter to obtain the class of y using the dictionary $\Phi$ (in the form of $D = A\Phi$), whose columns are stacked training samples with class-specific locations.
As discussed earlier, SRC is an SE problem, which is expected to be an easier task than an SSR problem. On the other hand, even if exact signal recovery is not possible in noisy cases or in cases where $\hat{x}$ is not exactly but approximately sparse (which is the case almost all the time in dictionary-based classification problems), it is still possible to recover the support set exactly [25], [38], [45], [46] or partially [46]-[48]. However, many works in the literature dealing with SE problems tend to first apply a sparse recovery technique on y to get $\hat{x}$, and then use simple thresholding over $\hat{x}$ to obtain a sparse SE, $\hat{\Lambda}$. SSR techniques such as $\ell_1$-minimization are rather slow, and their performance varies from one SSR tool to another [9]. In our previous work [9], we proposed an alternative to this iterative sparse recovery approach, which aims to learn a direct mapping from a test sample y to the corresponding support set $\hat{\Lambda}$. Along with its speed and stability compared to conventional SSR-based techniques and recent deep learning-based SSR solutions, CSEN has the crucial advantage of having a compact design that can achieve a good performance level even over scarce training data.
Mathematically speaking, an ideal CSEN is supposed to yield a binary mask $v \in \{0, 1\}^n$ which indicates the true support, i.e., $\Lambda = \{i \in \{1, 2, \ldots, n\} : v_i = 1\}$. In order to approximate this ideal case, a CSEN network, $P(y, D)$, produces a probability vector p which returns a measure of the probability of each index being in $\Lambda$, such that $p_i \in [0, 1]$. Having the estimated probability map, the support can easily be estimated via $\hat{\Lambda} = \{i \in \{1, 2, \ldots, n\} : p_i > \tau\}$, i.e., by thresholding p with a fixed threshold $\tau$.
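The thresholding step is simple enough to state directly in code. The sketch below is illustrative only: the probability values and the choice τ = 0.5 are assumptions, since the text only requires τ to be a fixed threshold, and the function name `estimate_support` is hypothetical.

```python
import numpy as np

def estimate_support(p, tau=0.5):
    """Estimated support: 1-based indices i with p_i > tau."""
    return set((np.flatnonzero(p > tau) + 1).tolist())

# Hypothetical network output for n = 5 locations.
p = np.array([0.05, 0.91, 0.40, 0.77, 0.10])
support_hat = estimate_support(p, tau=0.5)
print(support_hat)
```

Here only the second and fourth entries exceed the threshold, so the estimated support is {2, 4}.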
A CSEN is composed of fully convolutional layers, and as input it takes a proxy, $\tilde{x}$, of the sparse coefficient vector, which is a coarse estimation of x, i.e., $\tilde{x} = \left( D^T D + \lambda I \right)^{-1} D^T y$ or simply $\tilde{x} = D^T y$. Then, it yields the aforementioned probability-like vector p via fully convolutional layers. Using such a proxy of x, instead of making inference directly on y, has also been studied in a few more recent studies. For instance, in [49] and [50], the authors proposed reconstruction-free image classification from compressively sensed images. Alternatively, one may design a network to learn the proxy $\tilde{x}$ by fully connected dense layers [49]. However, this increases the computational complexity and may even result in over-fitting with scarce training data [9]. The input vector $\tilde{x}$ is reshaped to have a 2-D plane representation in order to use it with 2-D convolutional layers. This transformation is performed by reordering the indices of the atoms in such a way that the nonzero elements of the representation vector x for a specific class come together in the 2-D plane. A representative illustration of the proposed dictionary design compared to the traditional one is shown in Fig. 5.
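A small sketch may help make the proxy computation and the 2-D reordering concrete. The exact arrangement used in Fig. 5 is not recoverable from the text, so the layout below, in which each class occupies one square block of the plane, is an assumption for illustration. The helper names `proxy` and `block_index_map`, the toy sizes (4 classes with 4 atoms each on a 4 × 4 plane), and the random dictionary are likewise hypothetical.

```python
import numpy as np

def proxy(D, y, lam=None):
    """Coarse estimate of x: D^T y, or the regularized least-squares
    solution (D^T D + lam I)^(-1) D^T y when lam is given."""
    if lam is None:
        return D.T @ y
    n = D.shape[1]
    return np.linalg.solve(D.T @ D + lam * np.eye(n), D.T @ y)

def block_index_map(blocks_per_side, block):
    """plane[r, c] = index of the atom placed at (r, c), assuming atoms are
    stored class by class and each class fills one (block x block) square."""
    side = blocks_per_side * block
    plane = np.empty((side, side), dtype=int)
    per_class = block * block
    for cls in range(blocks_per_side ** 2):
        br, bc = divmod(cls, blocks_per_side)
        idx = np.arange(cls * per_class, (cls + 1) * per_class)
        plane[br * block:(br + 1) * block,
              bc * block:(bc + 1) * block] = idx.reshape(block, block)
    return plane

rng = np.random.default_rng(0)
D = rng.normal(size=(6, 16))      # toy equivalent dictionary, n = 16 atoms
y = rng.normal(size=6)            # toy query

x_tilde = proxy(D, y)             # simplest proxy, D^T y
plane = block_index_map(blocks_per_side=2, block=2)
X = x_tilde[plane]                # 2-D input for the convolutional layers
print(plane)
```

With this layout the coefficients of class 0 fill the top-left block, class 1 the top-right block, and so on, so a query from one class should light up one contiguous region of X.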
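Hereafter, the proxy $\tilde{x}$ is convolved with the weight kernels, connecting the input with the next layer with $N_1$ filters to yield the inputs of the next layer, with the biases $b^1$, as follows:
$$f^1 = \left\{ S_1\left( \mathrm{ReLu}\left( b_i^1 + w_i^1 * \tilde{x} \right) \right) \right\}_{i=1}^{N_1} \tag{4}$$
where $b^1$ is the bias, $S_1(\cdot)$ is either the identity or a subsampling operator predefined according to the network structure, and $\mathrm{ReLu}(x) = \max(0, x)$. For the other layers, i.e., $l \geq 2$, the kth feature map of layer l is defined as
$$f_k^l = S_l\left( \mathrm{ReLu}\left( b_k^l + \sum_{i=1}^{N_{l-1}} \mathrm{conv2D}\left( w_{ik}^l, f_i^{l-1} \right) \right) \right) \tag{5}$$
where $S_l(\cdot)$ is either the identity operator or one of the down- and up-sampling operations, and $N_l$ is the number of feature maps in the lth layer. Therefore, the trainable parameters of CSEN will be
$$\Theta_{\mathrm{CSEN}} = \left\{ \left\{ w_i^1, b_i^1 \right\}_{i=1}^{N_1}, \left\{ w_i^2, b_i^2 \right\}_{i=1}^{N_2}, \ldots, \left\{ w_i^L, b_i^L \right\}_{i=1}^{N_L} \right\} \tag{6}$$
for an L-layer CSEN design.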
In developing the dictionary that is to be used in the SRC, the training samples are stacked by grouping them according to their classes. Thus, instead of using the traditional $\ell_1$-minimization formulation as in (3), the following group $\ell_1$-minimization formulation may result in increased classification accuracy:
$$\min_{x} \left\{ \|Dx - y\|_2^2 + \lambda \sum_{i=1}^{c} \|x_{G_i}\|_2 \right\} \tag{7}$$
where $x_{G_i}$ is the group of coefficients from the ith class and c is the number of classes. In this manner, one possible cost function for an SE network would be
$$E(x) = \sum_{p} \left( P_{\Theta}(\tilde{x})_p - v_p \right)^2 + \lambda \sum_{i=1}^{c} \left\| P_{\Theta}(\tilde{x})_{G_i} \right\|_2 \tag{8}$$
where $P_{\Theta}(\tilde{x})_p$ is the network output at location p and $v_p$ is the ground-truth binary mask of the sparse code x. Due to its high computational complexity, we approximate the cost function in (8) with a simpler average pooling layer after the convolutional layer, which can directly produce the estimated class in our CSEN design. An illustration of the proposed CSEN-based COVID-19 recognition is shown in Fig. 3.
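The idea of replacing the group-wise penalty by average pooling can be sketched as follows. This is not the authors' network code: it only shows, in NumPy, how averaging a 2-D support probability map over class-specific regions gives one score per class. The layout (each class occupying one square block, classes ordered row by row), the toy 4 × 4 map, and the function name `class_scores` are assumptions for illustration.

```python
import numpy as np

def class_scores(P, blocks_per_side):
    """Average-pool a square probability map P over a grid of equal blocks.

    Returns one score per block, ordered row by row, under the assumption
    that each class occupies exactly one block of the plane.
    """
    side = P.shape[0]
    block = side // blocks_per_side
    pooled = P.reshape(blocks_per_side, block,
                       blocks_per_side, block).mean(axis=(1, 3))
    return pooled.ravel()

# Hypothetical output map: high support probability in the top-right block.
P = np.full((4, 4), 0.1)
P[0:2, 2:4] = 0.9

scores = class_scores(P, blocks_per_side=2)
predicted = int(np.argmax(scores))
print(scores, predicted)
```

The pooled scores here are 0.1, 0.9, 0.1, and 0.1, so the second class (index 1) is selected. In a trained network this pooling would sit after the last convolutional layer so that the class decision is produced in a single forward pass.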

D. Competing Methods
This section summarizes the competing methods that are selected among numerous alternatives due to their superior performance levels obtained in similar problems. For fair comparative evaluations, all classification methods have the same input feature vectors fed to the proposed CSENs.
1) Collaborative Representation-Based Classification: As a possible competing technique to the proposed CSEN-based technique, which is a hybrid method, CRC [6] is a direct, representation-based classification method that can be applied to this problem as shown in Fig. 6. It is a noniterative SE technique that achieves faster classification than SRC with comparable performance, while it is more stable than existing iterative sparse recovery tools, as shown in [9]. In the first step of CRC, the tradeoff parameter of the regularized least-squares solution is set as $\lambda = 2 \times 10^{-12}$. In order to obtain the best possible $\lambda$, a grid search was made in the range $[10^{-15}, 10^{-1}]$ with a log scale.
2) Multilayer Perceptron (MLP) Classification: The proposed COVID-19 recognition pipeline can be modified by replacing the CSEN or CRC part with another classifier. As one of the most common classifiers, a 4-hidden-layer multilayer perceptron (MLP) is used for this problem as shown in Fig. 7. For training, we used back-propagation (BP) with the Adam optimization technique [51]. The network and training hyperparameters are as follows: learning rate $\alpha = 10^{-4}$, moment updates $\beta_1 = 0.9$ and $\beta_2 = 0.999$, and 50 as the number of epochs. Fig. 8 illustrates the network configuration in detail. This network configuration achieved the best performance among the others (deeper and shallower), where the deeper configurations suffered from over-fitting while the shallower ones exhibited inferior learning performance.
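A log-scale grid search over the stated range can be sketched as below. The grid resolution and the selection criterion used by the authors are not given in the text, so both are assumptions here: the candidates are one per decade, and `toy_score` is a hypothetical stand-in for a validation score (in practice it would be, for example, the cross-validated accuracy of CRC for a given λ).

```python
import numpy as np

def grid_search(candidates, score_fn):
    """Return the candidate with the highest score."""
    scores = [score_fn(c) for c in candidates]
    return candidates[int(np.argmax(scores))]

# One candidate per decade in [1e-15, 1e-1] (assumed resolution).
lambdas = np.logspace(-15, -1, num=15)

def toy_score(lam):
    # Hypothetical score that peaks at lam = 1e-12; replace with a real
    # validation metric when applying this to data.
    return -abs(np.log10(lam) + 12.0)

best = grid_search(lambdas, toy_score)
print(best)
```

A value such as $2 \times 10^{-12}$ does not lie on a one-per-decade grid, which suggests a finer grid or a refinement step around the best decade; the sketch can be adapted by increasing `num` or by searching again in a narrower range.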
3) Support Vector Machines (SVMs): For a multiclass problem, the first objective is to select the SVM topology for ensemble learning: one-versus-one or one-versus-all. In order to find the optimal topology and the hyperparameters (e.g., kernel type and its parameters) we first performed a grid-search with the following variations and setting: kernel function with a log scale.

4) k-Nearest-Neighbor (k-NN): Finally, a traditional approach, k-nearest neighbor (k-NN), is used with PCA dimensionality reduction. In a similar fashion, the distance metric and the k-value are optimized by a prior grid search. The following distance metrics are evaluated: city-block, Chebyshev, correlation, cosine, Euclidean, Hamming, Jaccard, Mahalanobis, Minkowski, standardized Euclidean, and Spearman. The k-value is varied within the range [1, 4416] with a log scale.

A. Experimental Setup
We have performed our experiments over the QaTa-Cov19 data set, which consists of normal and three pneumonia classes: bacterial, viral, and COVID-19. The proposed approach is evaluated using a stratified fivefold cross-validation (CV) scheme with a ratio of 80% for training and 20% for the test (unseen folds) splits, respectively. Table II shows the number of X-ray images per class in the QaTa-Cov19 data set. Since the data set is unbalanced, we have applied data augmentation to the training set in order to balance the size of each class. Therefore, the X-ray images in the viral pneumonia, COVID-19 pneumonia, and normal classes are augmented up to the same number as the bacterial pneumonia class in the training set. We use the Image Data Generator by Keras to perform data augmentation by randomly rotating the X-ray images within a range of 10° and randomly shifting images both horizontally and vertically within the interval [−0.1, +0.1].
In each CV fold, we use a total of 8832 and 1257 images in the train and test (unseen in the fold) sets, respectively.
The experimental evaluations of SVM, k-NN, and CRC are performed using MATLAB version 2019a, running on a PC with an Intel® i7-8650U CPU and 32 GB system memory. On the other hand, the MLP and CSEN methods are implemented using the Tensorflow library [52] with Python on an NVidia® TITAN-X GPU card. For the CSEN training, the ADAM optimizer [51] is used with its proposed default learning parameters: learning rate $\alpha = 10^{-3}$ and moment updates $\beta_1 = 0.9$, $\beta_2 = 0.999$, with only 15 back-propagation epochs. Neither grid search nor any other parameter or configuration optimization was performed for CSEN.

B. Experimental Results
The same network configurations are used for CSEN as in [9]. Accordingly, we use two compact CSEN designs: CSEN1 and CSEN2. The first CSEN network consists of only two hidden convolutional layers; the first layer has 48 neurons and the second has 24. The ReLu activation function is used in the hidden layers and the filter size is 3×3. On the other hand, CSEN2 uses max-pooling and has one additional hidden layer with 24 neurons to perform transposed convolution. CSEN1 and CSEN2 are compared against the 6 competing methods under the same experimental setup.
For the dictionary construction in each CSEN design, 625 images from each class (taken from the augmented training samples per fold) are stacked in such a way that the representation coefficient in the 2-D plane, X, has size 50 × 50, as shown in Fig. 5. The rest of the images in the training set, i.e., 1583 samples from each class, are used to train each CSEN. We use the PCA dimensionality reduction matrix, A, with the compression ratio CR = (m/d) = 0.5. Therefore, we have a 512 × 2500 equivalent dictionary, D, which is used to obtain a coarse estimation of the representation (sparse in the ideal case) coefficients, $\tilde{x} \in \mathbb{R}^{n}$ with n = 2500. Hereafter, the CSEN networks are trained to obtain the class information of y from the input $\tilde{x}$ as illustrated in Fig. 3.
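The sizes quoted in this section follow from a few lines of arithmetic, which the following sketch verifies using only numbers stated in the text (d = 1024, CR = 0.5, four classes, 625 dictionary atoms per class, and 8832 training images per fold). The variable names are illustrative.

```python
import math

d = 1024                      # CheXNet feature length
cr = 0.5                      # compression ratio m / d
m = int(cr * d)               # reduced dimension after PCA

n_classes = 4                 # bacterial, viral, COVID-19, normal
atoms_per_class = 625         # dictionary samples per class
n = n_classes * atoms_per_class          # dictionary columns
side = math.isqrt(n)                     # side of the 2-D coefficient plane

train_total = 8832            # augmented training images per fold
train_per_class = train_total // n_classes
csen_train_per_class = train_per_class - atoms_per_class

print((m, n), side, train_per_class, csen_train_per_class)
```

The equivalent dictionary is therefore 512 × 2500, the coefficient vector reshapes exactly to a 50 × 50 plane, and each class contributes 2208 augmented training images per fold, of which 625 go into the dictionary and 1583 are left for training the CSEN.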
Due to the lack of other learning-based SE studies in the literature, we chose a deeper network compared to the CSEN designs to investigate the role of network depth in this problem. ReconNet [53] was proposed as a noniterative deep learning solution to the CS problem, i.e., $\hat{s} \leftarrow P(y)$, and it is one of the state-of-the-art methods in the compressively sensed image recognition task. It consists of six fully convolutional layers and one dense layer in front of the convolutional ones, which acts as the learned denoiser for the mapping from $y \in \mathbb{R}^{m}$ to $\tilde{s} \in \mathbb{R}^{d}$. Then, the convolutional layers are responsible for producing the reconstructed signal, $\hat{s}$, from $\tilde{s}$. Therefore, by replacing this dense layer with the denoiser matrix B, this network can be used as a competing method.
Both CSEN and the modified ReconNet use $\tilde{x}$ as an input, which is produced using an equivalent dictionary D and its pseudo-inverse matrix B.
In designing the dictionary of the CRC system, all training samples are stacked in the dictionary, $\Phi$, i.e., 2208 samples from each class. The same PCA matrix used in CSEN-based recognition, A, is applied to the features, $s \in \mathbb{R}^{d}$ with d = 1024. Therefore, a dictionary D of size 512 × 8832 and the corresponding denoiser matrix B of size 8832 × 512 are used in the CRC framework.
Overall, the confusion matrix elements are formed as follows: true positive (TP): the number of correctly detected positive class members, true negative (TN): the number of correctly detected negative class samples, false positive (FP): the number of misclassified negative class members as positive, and FN: the number of misclassified positive class samples as negative (i.e., missed positive cases). Then, the standard performance evaluation metrics are defined as follows: where sensitivity (or Recall) is the rate of correctly detected positive samples in the positive class Specificity = TN TN + FP (10) where specificity is the ratio of accurately detected negative class samples to all negative class Precision = TP TP + FP (11) where precision is the rate of correctly classified positive class samples among all the members classified as a positive sample where accuracy is the ratio of correctly classified elements among all the data where F-score is defined by the weighting parameter β. The F1-score is calculated with β = 1, which is the harmonic average of precision and sensitivity. The classification performance of the proposed CSEN-based approach and the competing methods is presented in Table I. As can be easily observed from Table I, the proposed approaches surpass all competing methods in COVID-19 recognition performance by achieving 98.5% sensitivity, and over 95% specificity. As shown in Table III, compared to MLP and ReconNet, the proposed CSEN designs are very compact and computationally efficient. This is evident in Table IV where the computational complexity (measured as total computation, time over the 1257 test images) is reported.
Finally, Table V presents the overall (cumulative) confusion matrix of the proposed CSEN-based COVID-19 recognition approach over the new QaTa-Cov19 data set. The most critical misclassifications are the false negatives, i.e., the COVID-19 X-ray images assigned to another class. The confusion matrix shows that the proposed approach has misclassified seven COVID-19 images (out of 462). Three of these seven misclassifications fall in the "viral pneumonia" category, which is an expected confusion given the viral nature of COVID-19. However, the other four cases are misclassified as "Normal," which is indeed a severe clinical misdiagnosis. A close look at these false negatives in Fig. 9 reveals that they are very similar to normal images, where typical COVID-19 patterns are hardly visible even to an expert's naked eye. It is possible that these images come from patients who were in the very early stages of COVID-19.
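As a quick consistency check, the counts quoted above can be turned into a sensitivity value. Assuming the seven misclassified COVID-19 images are the only missed positives in the cumulative confusion matrix (FN = 7, hence TP = 462 − 7 = 455), the sensitivity, defined as TP / (TP + FN), works out as:

```latex
\text{Sensitivity} = \frac{TP}{TP + FN} = \frac{455}{455 + 7} = \frac{455}{462} \approx 0.985
```

This agrees with the roughly 98.5% sensitivity reported for the proposed approach.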

A. CRC Versus CSEN
When compared against CRC in particular, CSEN-based classification has two advantages: computational efficiency and superior COVID-19 recognition performance. The computational efficiency comes from the fact that a larger dictionary matrix (of size 512 × 8832) is used in the CRC framework, whereas CSEN-based recognition uses a lighter dictionary (of size 512 × 2500). For further analysis, we also tested the CRC framework with the light dictionary used in CSEN-based recognition. We call this configuration CRC (light). As can be seen in Table VI, the performance of CRC is further reduced, and there is no significant improvement in computational cost. As for building deeper convolutional networks, such as the modified ReconNet, instead of using the CSEN designs, the results presented in Table I show that compact CSEN structures are indeed preferable, since they achieve superior classification performance compared to deeper networks.

B. Compact Versus Deep CSENs
Representation-based classification methods are known to provide satisfactory performance on limited-size data sets. On the other hand, deep artificial NNs usually require a large training set to achieve a satisfactory generalization capability.
In a representation-based (dictionary) classification scheme, the computational complexity of the method increases drastically as the dictionary grows (i.e., as the number of training samples increases). The proposed CSEN is an alternative approach that handles both moderate and scarce data sets by using NN structures that are as compact as possible for dictionary-based classification.
Since there is no learning-based SE method other than CSEN in the literature, we chose ReconNet as a possible competing algorithm for this problem, as explained in detail in Section V. ReconNet has six fully convolutional layers. As an ablation study, we also added more hidden layers to the proposed CSEN models for comparison: the CSEN3 and CSEN4 models were obtained by adding one and two hidden layers to CSEN2, respectively, after the transposed convolutional layer.

VII. CONCLUSION
The commonly used methods in COVID-19 diagnosis, namely RT-PCR and CT, have certain limitations and drawbacks such as long processing times and unacceptably high misdiagnosis rates. These drawbacks are also shared by most of the recent deep learning-based works in the literature because of the scarcity of data from COVID-19 cases. Although deep learning-based recognition techniques are dominant in computer vision, where they have achieved state-of-the-art performance, their performance degrades quickly under data scarcity, which is the reality of the problem at hand. This study aims to address such limitations by proposing a robust and highly accurate COVID-19 recognition approach that works directly on X-ray images. The proposed approach is based on the CSEN, which can be seen as a bridge between deep learning models and representation-based methods. CSEN uses both a dictionary and a set of training samples to learn a direct mapping from the query samples to the sparse support set of representation coefficients. With this unique ability and the advantage of a compact network, the proposed CSEN-based COVID-19 recognition systems surpass the competing methods and achieve over 98% sensitivity and over 95% specificity. Furthermore, they yield the most computationally efficient scheme in terms of speed and memory.

ACKNOWLEDGMENT
The authors would like to thank the following medical doctor team for their generous feedbacks and continuous