A Novel Fusion Model of Hand-Crafted Features With Deep Convolutional Neural Networks for Classification of Several Chest Diseases Using X-Ray Images

With the continuing global pandemic of coronavirus disease (COVID-19), it is critical to seek diagnostic approaches that are both effective and rapid in order to limit the number of people infected with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Recent research suggests that radiological images contain important information related to COVID-19 and other chest diseases. As a result, deep learning (DL) may prove useful for assisting in the automated diagnosis of chest diseases. In this study, we propose a novel fusion model of hand-crafted features with deep convolutional neural networks (DCNNs) for classifying chest X-rays (CXR) into ten classes: COVID-19, lung cancer (LC), atelectasis (ATE), consolidation lung (COL), tuberculosis (TB), pneumothorax (PNET), edema (EDE), pneumonia (PNEU), pleural thickening (PLT), and normal. The proposed method consists of three steps. First, the Info-MGAN network segments the raw CXR data to produce lung images for the ten classes. Second, the segmented lung images are fed into a novel pipeline that extracts discriminatory features using hand-crafted techniques such as SURF and ORB, and these extracted features are fused with the trained DCNNs. Finally, various machine learning (ML) models are used as the last layer of the DCNN models to classify the chest diseases. We compare the performance of the proposed architectures, all of which integrate DCNNs, key point extraction methods, and ML models. The VGG-19 model with a softmax layer in conjunction with the ORB technique attained a testing classification accuracy of 98.20%. The proposed method can be used to screen for COVID-19 and other lung ailments. The robustness of the model was further confirmed by statistical analyses of the datasets using McNemar's and ANOVA tests.


I. INTRODUCTION
Coronavirus disease (COVID-19) is caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus responsible for the ongoing pandemic. COVID-19 may be categorized as a respiratory disease, and its typical symptoms include a dry cough, myalgia, a sore throat, a headache, a fever, and chest discomfort [1]. Infected individuals often present their full complement of symptoms after around 14 days. As of July 2021, more than 200 countries and territories had reported a combined total of 190 million COVID-19 cases, resulting in around 4 million fatalities [2]. Consequently, the state of public health is a significant concern for the international community. The World Health Organization (WHO) classified the outbreak as a public health emergency of international concern (PHEIC) on January 30, 2020, and recognized it as a pandemic on March 11, 2020 [3]. Although a wide range of vaccines is now available, it will take considerable time for them to be distributed all over the world. The use of visual signals as an alternative method for the rapid screening of infected individuals is therefore a realistic option. Chest radiographs (CXR) and computed tomography (CT) scans are generally accepted to provide the most reliable visual indicators of a lung infection caused by this virus [3]. Lung infection is the most common symptom that patients experience as a result of exposure to this virus. Radiologists must manually examine these images for visual patterns that may indicate the presence of COVID-19 infection.
Although the accuracy of the traditional method of diagnosis has increased over time, it can still put the lives of medical workers in jeopardy. In addition, costs rise because diagnostic test kits are required for each patient. Screening can instead be performed with medical imaging techniques such as CXR and CT scans, which are significantly more time-efficient, substantially safer, and more easily accessible than other methods. CXR screening is preferred to CT screening for COVID-19 not only because it is more widely available, but also because it is less expensive to acquire [4], [5].
Manual diagnosis of the virus from a CXR image may take a long time. In addition to human error, readers often have limited prior experience, which can result in a significant number of inaccurate readings. There is therefore an immediate need to automate such procedures on a massive scale and to make this automation widely available, so that diagnosis becomes more efficient, accurate, and quick. Computer vision (CV) and artificial intelligence (AI) technologies, especially deep learning (DL) models, have been adopted in recent initiatives [6], [7]. In particular, convolutional neural networks (CNNs) are a helpful tool for medical image analysis and have recently been shown to be effective in assisting the diagnosis of pneumonia from a patient's CXR image [8], [9].
A substantial amount of research has been done on COVID-19 diagnosis, the bulk of which applies DL models to CXR images. Using CXR images and a DL network named DarkCovidNet, Ozturk et al. [10] developed an automated technique for diagnosing COVID-19. The model achieved an accuracy of 87.02% when classifying the data into three categories (COVID-19, normal, and pneumonia), and 98.08% when classifying into only two (COVID-19 and healthy). Hemdan et al. [11] built a CXR-based network that they referred to as COVIDX-Net. It was trained using seven different CNN models and validated on fifty CXR images (25 normal and 25 COVID-19 cases), producing an accuracy of 90.00%, which is fairly effective considering the small quantity of data used in testing. Wang et al. [12] developed the CNN model known as COVID-Net and reported an accuracy of 93.3% in testing. The study in [13] achieved an accuracy of 95.38% by combining ResNet-50 with a support vector machine (SVM) classifier on 50 samples (25 normal and 25 COVID-19 cases). Nayak et al. [15] devised a DL-based method to screen COVID-19 CXR images. They used transfer learning with eight pre-trained CNN models: MobileNet-V2, AlexNet, VGG-16, GoogleNet, ResNet-34, SqueezeNet, Inception-V3, and ResNet-50. The ResNet-34 model was the most successful.
Previous research on COVID-19 identification focused mostly on algorithm-specific aspects and paid little attention to the regions of interest (ROI) in CXR images, which reveal the distinctive patterns associated with the disease. This leaves an opportunity for research that uses CXR images restricted to the ROI. Even if the ROI is imperfect, such an approach has the potential to yield considerably more precise image classification because it is based on medically relevant regions. To the best of our knowledge, this is the only work that uses the ROI of CXR images to classify ten different chest disease classes: COVID-19, lung cancer (LC), atelectasis (ATE), consolidation lung (COL), tuberculosis (TB), pneumothorax (PNET), edema (EDE), pneumonia (PNEU), pleural thickening (PLT), and normal. This work is based on three steps: segmentation, feature extraction, and classification of the CXR images into their respective classes. The preprocessed input CXR images are used in the image segmentation process [16], and the information maximizing GAN (Info-MGAN) is used to retrieve the lung images from those preprocessed images. The feature extraction network combines deep neural networks (DNNs) with key point extraction algorithms, namely Oriented FAST and Rotated BRIEF (ORB) [17] and speeded-up robust features (SURF) [18], to compute discriminatory features of the lung images. The DNNs considered for feature extraction include a simple CNN model and seven transfer learning (TL) models: ResNet-101 and ResNet-50 [12], DenseNet-169 [19], Inception-v3 [16], DenseNet-201 [19], VGG-16, and VGG-19 [20]. Finally, several machine learning (ML) methods were applied to categorize the extracted features into the ten classes.
Combinations of the best-performing models are investigated, assessed, and compared to determine which configuration of the proposed method yields the best results. The contributions of the present work are as follows:
1. CXR images were segmented using the Info-MGAN network.
2. A feature extraction method was developed that integrates DCNN with ORB and SURF into a single coherent architecture.
3. TL and ML models were used to classify ten different lung diseases from CXR images.
4. Various combinations of feature extraction techniques and ML models were evaluated for classification performance.
The remainder of this work is organized as follows. Section II reviews recent literature. Section III describes the datasets and methods used for image segmentation, feature extraction, and classification. Section IV presents the experimental results. Section V concludes the work.

II. RELATED WORKS
The classification of respiratory system disorders using medical imaging modalities such as sonography, CXR, MRI, and CT scans is one of the most significant tasks in DL [14]. Several studies have suggested that CXR images could be used to detect COVID-19, which would save time and effort for medical workers [5], [16], [17], [18], [19], [20], [21], [22], [23]. In current studies, detecting COVID-19 at an early stage of the illness remains difficult. The following is a selection of the most significant and pertinent studies on the application of DL and ML models to the diagnosis of various chest infections. Table 1 compares the most recent studies on COVID-19 with those conducted on a variety of chest conditions.
The study in [21] proposes a custom CNN model as an automatic method for detecting COVID-19 from chest X-ray images. The proposed CNN model is intended to serve as a reliable diagnostic instrument that classifies the data into two categories: COVID and Normal. Several architectures, such as the pre-trained MobileNetv2 and ResNet-50 models, are evaluated on a COVID-19 dataset containing 13,824 X-ray images, and the accuracy of the proposed model is compared with that of existing COVID-19 detection techniques. According to their experiments, the proposed model can diagnose COVID-19 with an accuracy of 96.71% and an F1-score of 91.89%. Ieracitano et al. [22] offer a fuzzy logic-based DL approach to differentiate between CXR images of patients with pneumonia caused by COVID-19 and images of patients with interstitial pneumonia unrelated to COVID-19. Their model, referred to as CovNNet, extracts relevant features from CXR images paired with fuzzy images produced by a fuzzy edge detection technique. Experimental results show that combining CXR and fuzzy features within a DL approach, with the deep network's features fed to an MLP, yields higher classification performance (an accuracy of up to 81%).
Narin et al. [23] used five-fold cross-validation to address several distinct binary classification tasks. The pre-trained ResNet-50 model performed best, with an accuracy of 98%, a specificity of 100%, and a recall of 96%. Oh et al. [5] used a patch-based technique to develop CNN models, applying the approach to only a small number of datasets. The final decision was reached by majority voting over the patch-level classifiers. The experiment was conducted on a total of 15,043 images, comprising 8851 healthy individuals, 6012 patients diagnosed with pneumonia, and 180 patients who tested positive for COVID-19. The CNN achieved an accuracy of 88.9%, a precision of 83.4%, a recall of 85.9%, an F1-score of 84.4%, and a specificity of 84.4%. An overall accuracy of 96.4% is also reported for its predictions. Zhang et al. [24] suggested an 18-layer residual CNN applied to CXR images as a potential diagnostic tool. Their approach has three components: a CNN module that extracts features, a classification module, and an anomaly module that computes the detection score. Radiographs from a total of 1531 patients were analyzed, of which 100 were positive for COVID-19 and the other 1431 showed pneumonia infection.
They achieved a sensitivity of 96% for the detection of COVID-19 while maintaining a specificity of 70.65%. Apostolopoulos and Mpesiana [25] developed a CNN model and tested its efficacy on a limited number of datasets. The experiment was carried out on two databases. The first database had a total of 1427 images, of which 224 were positive for COVID-19, 700 depicted bacterial diseases, and the remainder belonged to healthy people. The second dataset contained the same number of COVID-19 CXR images, pneumonia-infected patients, and normal cases as the first. On the second dataset, the MobileNetV2 approach provided the best results of all the CNN models, with an accuracy of 0.966, a specificity of 0.964, and a sensitivity of 0.986.
VOLUME 11, 2023
Tsiknakis et al. [26] presented an automated COVID-19 identification technique built on the Inception-v3 DL model. The experiment was conducted on a total of 572 patients, of whom 122 had a positive COVID result, 150 were normal, and 150 had bacterial or viral diseases. Their classifier achieved an accuracy of 76%. Sethy et al. [27] applied nine different TL algorithms to extract features from a dataset of 381 CXR images and used an SVM to diagnose COVID-19 from the extracted features. ResNet-50 proved the most successful feature extractor: ResNet-50 combined with an SVM achieved an accuracy of 95.33% and an F1-score of 95.34%. Saha et al. [28] designed a COVID-19 detection method based on CXR images. Their system, named EMC Net, uses a simple CNN architecture to extract features from images, which are then fed into an ensemble of machine learning [29] classifiers to identify COVID-19-infected cases. EMC Net attained an accuracy of 98.91%. Mahmud et al. [30] developed automatic COVID identification algorithms using a DL model called CovXNet, which makes use of depth-wise convolution. CXR images of patients diagnosed with pneumonia and of patients with normal lung conditions were used.
They then evaluated their proposed model by classifying CXR images of COVID-19 and pneumonia. A stacking technique for automatic detection is integrated with gradient-based discriminative localization in the model. The CovXNet model achieved an accuracy of 97.4% when distinguishing normal cases from COVID-19 cases. However, in the multi-class setting covering COVID-19 cases, viral infections, and bacterial infections, it attained an accuracy of only 90.2%. To identify COVID-19, Horry et al. [31] applied four well-established TL classifiers to a total of 60,798 images taken from different datasets, comprising 60,361 normal cases, 322 pneumonia cases, and 115 COVID-19-positive cases. Of the four models, VGG-16 and VGG-19 produced the best and most consistent classification results. For correctly diagnosing COVID-19-positive cases, VGG-19 demonstrated an accuracy of 81%.
The study in [32] proposed classifying lungs as COVID-19 infected (+Ve) or non-infected. CXR-based COVID-19 classification must otherwise be performed by a radiology specialist, which takes considerable time and requires expertise. An automated testing technique is therefore worth investigating because it would save medical professionals a significant amount of time. The study analyzes both the theoretical construction and the practical use of a CNN method, and tunes the CNN hyperparameters with multi-objective adaptive differential evolution to produce superior results. Zhang et al. [33] used the ResNet-18 model as a feature extractor to obtain suitable feature representations from the CXR image. These features were then used as input to a multi-layer perceptron. The best accuracy, 96%, was obtained on a dataset of one hundred images from seventy different patients.
Ahmed et al. [34] worked on raw data and improved it by removing several confounding components. They observed, however, that well-trained models can exploit source-specific confounders to discriminate between COVID-19 and pneumonia, which prevents generalization to new data sources. Their models yield an AUC of 0.38 in the worst case and 1.00 in the best case across data sources. It is therefore evident that additional testing and development are necessary before clinical deployment on any significant scale. Michail et al. [35] established a pipeline for deep transfer learning in COVID-19 diagnosis. The pipeline is based on chest imaging with CXR and the diagnosis of pneumonia. Both of the models that make up their model add a layer of neural blocks to their architecture. The same technique may also prove effective in other cases, such as those in which two competing networks must satisfy additional performance requirements. Their network was analyzed on problems involving two classes (pneumonia versus healthy), three classes (including COVID-19), and four classes (including tuberculosis). Zhao et al. [36] note that CT scans are being developed as testing and screening procedures that are both quick and cost-effective. They built a COVID-CT dataset containing 275 positive COVID-19 scans and made it publicly available to encourage research on and development of deep learning algorithms that determine from a person's CT scans whether or not that person is infected with COVID-19. They trained a CNN on this dataset that achieves an F1 of 0.85, which is encouraging but leaves room for improvement.
Thakur and Kumar [37] built a method to automatically detect and identify the COVID-19 disease. Their CNN implements two classification schemes, binary and multiclass. The binary model was trained on a total of 3877 X-ray and CT images, 1917 of which were from COVID-19 cases, using a mix of both imaging modalities. The binary classification achieved an overall accuracy of 99.64%, a recall of 99%, a precision of 99.56%, an F1 score of 99.59%, and a ROC score of 100%. He et al. [38] observe that building accurate diagnosis models requires a substantial number of CT scans. They produced a public dataset made up of hundreds of COVID-19 CT scans and developed sample-efficient DL algorithms that can achieve high diagnostic accuracy for COVID-19 despite the limited number of CT images available. They proposed a Self-Trans method that combines contrastive self-supervised learning with transfer learning to learn robust and unbiased representations and reduce the risk of overfitting.

III. MATERIALS AND METHODS
This section describes the experimental procedure for the segmentation and classification of CXR images of 10 different chest diseases using DL models. The Info-MGAN network was used for segmentation, and the ORB and SURF methods were used to extract features. These extracted features were then used by DL models to classify lung diseases into their respective classes.
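One plausible way to combine key-point descriptors with deep features is sketched below. It is a minimal illustration rather than the authors' pipeline: the function names `pool_descriptors` and `fuse_features`, the mean pooling, the L2 normalization, and the concatenation are all assumptions, since the fusion operator is not specified in this section. The 32-byte descriptor width matches ORB's binary descriptor; the 512-dimensional deep feature vector is a placeholder.

```python
import numpy as np


def l2_normalize(vector: np.ndarray) -> np.ndarray:
    """Scale a vector to unit length (a zero vector stays zero)."""
    return vector / (np.linalg.norm(vector) + 1e-12)


def pool_descriptors(descriptors, dim: int) -> np.ndarray:
    """Collapse N key-point descriptors (N x dim) into one fixed-length vector.

    Mean pooling is an assumption made for this sketch; an image with no
    key points yields a zero vector.
    """
    if descriptors is None or len(descriptors) == 0:
        return np.zeros(dim, dtype=np.float32)
    return np.asarray(descriptors, dtype=np.float32).mean(axis=0)


def fuse_features(deep_features, descriptors, descriptor_dim: int = 32) -> np.ndarray:
    """Concatenate normalized deep features with pooled hand-crafted descriptors."""
    deep = l2_normalize(np.asarray(deep_features, dtype=np.float32).ravel())
    handcrafted = l2_normalize(pool_descriptors(descriptors, descriptor_dim))
    # Normalizing each part first keeps either source from dominating by scale.
    return np.concatenate([deep, handcrafted])


# Stand-in inputs: a hypothetical 512-d DCNN feature vector and 100 ORB-like
# descriptors (32 bytes each). Real inputs would come from a trained network
# and a key-point detector.
rng = np.random.default_rng(0)
deep_vector = rng.normal(size=512)
orb_like = rng.integers(0, 256, size=(100, 32)).astype(np.uint8)
fused = fuse_features(deep_vector, orb_like)
```

The fused vector could then be passed to any of the ML classifiers mentioned above; other pooling schemes (for example, a bag of visual words) would fit the same interface.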
A. DATASETS DESCRIPTIONS
CXR images were analyzed to distinguish between patients diagnosed with COVID-19, normal, LC, ATE, COL, TB, PNET, EDE, PNEU, and PLT. The images were collected from a variety of publicly accessible websites. The datasets used for segmentation and classification in this work are described as follows:

1) SEGMENTATION DATASET
We used CXR images for segmentation collected from the chest radiographs dataset [39], obtained from [40], to train the Info-MGAN. Another set of CXR images was obtained from [41], whose database contains a total of 200 chest radiographs. From the two datasets, a combined total of 447 frontal CXR images was obtained. The primary purpose of this database is to facilitate lung region segmentation. We divided the dataset into three subsets of CXR images for training, validation, and testing. The testing subset had 45 images, approximately 10% of the total number of images in the primary dataset. The remaining 403 CXR images were partitioned into 80% for training and 10% for validation. Because the number of images is relatively low, we used affine transformation [42] methods to increase it. Finally, the segmentation network used to generate lung masks from chest radiographs was trained, validated, and tested.
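The split and augmentation described above can be sketched as follows. This is an illustration under stated assumptions: the helper names, the fixed seed, the rounding rule, and the example rotation and translation values are hypothetical, and nearest-neighbor sampling is used only to keep the example small. For segmentation, the same affine matrix would need to be applied to an image and to its lung mask so that the pair stays aligned.

```python
import random

import numpy as np


def split_indices(n_images: int, test_frac: float = 0.10, val_frac: float = 0.10, seed: int = 0):
    """Shuffle image indices and split them into train/validation/test lists."""
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)
    n_test = round(n_images * test_frac)
    n_val = round(n_images * val_frac)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


def affine_matrix(angle_deg: float, scale: float, tx: float, ty: float) -> np.ndarray:
    """3x3 homogeneous matrix: rotation and scaling followed by translation."""
    theta = np.deg2rad(angle_deg)
    c, s = scale * np.cos(theta), scale * np.sin(theta)
    return np.array([[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]])


def warp_affine_nearest(image: np.ndarray, matrix: np.ndarray) -> np.ndarray:
    """Apply an affine map about the image center using inverse mapping.

    Each output pixel looks up its source pixel (nearest neighbor); pixels
    whose source falls outside the image are set to 0.
    """
    h, w = image.shape[:2]
    cy, cx = (h - 1) / 2, (w - 1) / 2
    ys, xs = np.mgrid[0:h, 0:w]
    out_coords = np.stack([xs.ravel() - cx, ys.ravel() - cy, np.ones(h * w)])
    src = np.linalg.inv(matrix) @ out_coords
    sx = np.rint(src[0] + cx).astype(int)
    sy = np.rint(src[1] + cy).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    flat = image.reshape(h * w, -1)
    warped = np.zeros_like(flat)
    warped[valid] = flat[sy[valid] * w + sx[valid]]
    return warped.reshape(image.shape)


# Example: split 447 images, then make one augmented copy of a dummy image.
train_ids, val_ids, test_ids = split_indices(447)
dummy = np.arange(12, dtype=float).reshape(3, 4)
augmented = warp_affine_nearest(dummy, affine_matrix(angle_deg=5.0, scale=1.0, tx=1.0, ty=0.0))
```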

2) CHEST INFECTION CLASSIFICATION DATASETS
A total of ten publicly available chest disease datasets were collected from a wide variety of sources to train and test the DL models. First, we obtained 930 COVID-19-infected CXR images from a GitHub repository set up by Cohen et al. [43]. This archive contains images obtained from a wide variety of hospitals and other public sources. Although the complete metadata is not presented here, the patients with COVID-19 infection were, on average, about 55 years old. A total of 2371 COVID-19-positive CXR images were obtained from the SIRM database [44], TCIA [45], radiopaedia.org [46], Mendeley [47], and a GitHub source [48]. The pneumonia dataset was obtained from the RSNA [49]. It includes a total of 5216 X-rays, 1349 of which are normal and 3867 of which depict pneumonia. The lung cancer dataset, which consists of CXR and CT scan images, was obtained from [39]. It contains around 20,000 chest X-rays and CT scans, from which a total of 5000 CXR images of people with lung cancer were extracted; the CT scans were not considered in this investigation. The CXR images of healthy people were obtained from the Kaggle archives [50]. A total of 22,060 CXR images were collected from NIH [51], which included 5789 images of atelectasis, 6543 images of consolidation lung, 2793 images of pneumothorax, 6331 images of edema, and 6046 images of pleural thickening. Finally, a total of 700 TB-infected CXR images were collected [52]. Sample CXR images of all these chest infections are presented in Figure 1.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.

3) IMAGE PRE-PROCESSING, ENHANCEMENT, AND AUGMENTATION
The literature reviewed above indicates that image preprocessing improves the training of DNNs [47]. The CXR images of chest infections were collected from multiple databases, and their resolutions varied from 660 to 720 pixels in width and 500 to 672 pixels in height. We therefore scaled the CXR images to a fixed resolution of 224 × 224 for consistency in this study. Before the segmentation network was built for lung masks, each image was manually reviewed to ensure that it met the required quality standards. CXR images found to be lacking in contrast were preprocessed using adaptive histogram equalization (AHE) [53] and a thresholding operation, as presented in Figure 2.
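A rough sketch of the resizing and contrast-enhancement steps is given below. It is a simplification under stated assumptions: `resize_nearest` uses nearest-neighbor sampling, and `adaptive_equalize` equalizes each tile independently without blending between tiles, whereas library implementations of adaptive equalization (for example, OpenCV's CLAHE) interpolate between neighboring tiles and limit contrast. The function names and the 8 × 8 tile grid are hypothetical, and the thresholding step is omitted because its parameters are not given.

```python
import numpy as np


def resize_nearest(image: np.ndarray, size=(224, 224)) -> np.ndarray:
    """Resize a 2-D grayscale image by nearest-neighbor sampling."""
    h, w = image.shape
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return image[rows[:, None], cols[None, :]]


def equalize_hist(tile: np.ndarray) -> np.ndarray:
    """Histogram-equalize one uint8 tile using its cumulative distribution."""
    hist = np.bincount(tile.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    span = cdf[-1] - cdf_min
    if span == 0:  # constant tile: nothing to stretch
        return tile.copy()
    lut = np.clip(np.round((cdf - cdf_min) / span * 255), 0, 255).astype(np.uint8)
    return lut[tile]


def adaptive_equalize(image: np.ndarray, tiles=(8, 8)) -> np.ndarray:
    """Simplified adaptive equalization: equalize each tile on its own.

    Assumes the image has at least as many rows/columns as tiles.
    """
    out = np.empty_like(image)
    for rows in np.array_split(np.arange(image.shape[0]), tiles[0]):
        for cols in np.array_split(np.arange(image.shape[1]), tiles[1]):
            r0, r1, c0, c1 = rows[0], rows[-1] + 1, cols[0], cols[-1] + 1
            out[r0:r1, c0:c1] = equalize_hist(image[r0:r1, c0:c1])
    return out


# Example with a synthetic low-contrast image of 672 x 720 pixels.
rng = np.random.default_rng(0)
raw = rng.integers(80, 120, size=(672, 720)).astype(np.uint8)
resized = resize_nearest(raw)
enhanced = adaptive_equalize(resized)
```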
Furthermore, the numbers of images in the chest infection classes are imbalanced. The Synthetic Minority Oversampling Technique (SMOTE) was used to balance the dataset and prevent the model from producing biased results. SMOTE generates synthetic data using a k-nearest neighbor (KNN) method. Its first step is to select a sample at random from a minority class (i.e., normal, LC, ATE, COL, TB, PNET, EDE, PNEU, and PLT) and then to determine the k nearest neighbors of the selected sample; synthetic samples are then created between the selected sample and its neighbors. After balancing, the datasets are ready to be fed into the DNN model for training: 80% of the total CXR images are used to train the DNN, 10% are used for validation, and the remaining 10% are reserved for testing. Table 2 provides a summary of the datasets used in this work.
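The core SMOTE step can be sketched in a few lines. This is a minimal illustration of the standard algorithm on generic feature vectors, not the exact configuration used here: the function name `smote`, the value k = 5, and the seed are assumptions, and whether oversampling is applied to raw pixels or to extracted features is not stated in this section. The brute-force distance matrix is only practical for small inputs.

```python
import numpy as np


def smote(minority, n_synthetic: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Generate synthetic minority samples by interpolating toward neighbors.

    For each synthetic sample: pick a random minority sample, pick one of its
    k nearest minority neighbors, and place a new point at a random position
    on the segment between the two.
    """
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    if n < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, n - 1)
    rng = np.random.default_rng(seed)

    # Pairwise squared distances; the diagonal is masked so that a sample
    # is never selected as its own neighbor.
    diff = minority[:, None, :] - minority[None, :, :]
    dist = (diff ** 2).sum(axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_synthetic)
    chosen = neighbours[base, rng.integers(0, k, size=n_synthetic)]
    gap = rng.random((n_synthetic, 1))
    return minority[base] + gap * (minority[chosen] - minority[base])


# Example: oversample a hypothetical minority class of 20 two-dimensional points.
rng = np.random.default_rng(1)
minority_points = rng.random((20, 2))
synthetic_points = smote(minority_points, n_synthetic=50)
```

Because every synthetic point is a convex combination of two existing minority samples, it always lies inside the bounding box of the minority class.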

B. LUNG IMAGES SEGMENTATION USING INFO-MGAN
The GAN models suggested by [54] have been widely used in image processing to translate an input image into its matching output image. A GAN is made up of two distinct networks: a generator that produces photorealistic images, and a discriminator that determines whether an image is real (taken from the training dataset) or fake. The generator aims to produce images that are as faithful to the source material as possible, and the discriminator then analyzes each image to decide whether it is fake or authentic. In a min-max game with the discriminator, the generator Gen transforms a set of noise samples Z from distribution $P_z$, which are loaded into the generator, into real-world data from distribution $P_{data}$.
When the discriminator network is trained, it attempts to differentiate between generated data samples Gen(Z) with distribution $P_{data}$ and actual data samples Y with probability distribution $P_Y$, which requires comparing the probability distributions of the two sets of samples. Equation (1) gives the mathematical formulation of the min-max GAN objective function, where A denotes expectation and log denotes the logarithm. Info-MGAN elaborates on the basic idea of GAN: it consists of a generator Gen that generates output data X based on actual data Y and a random noise vector Z, as expressed in Equation (2). The Info-MGAN discriminator B, on the other hand, attempts to differentiate between the synthesized data and the actual data (X and Y), both of which it receives as inputs. The objective function of the Info-MGAN is given in Equation (3). It is crucial to understand that deceiving the discriminator is not the only purpose of Gen: its output should, at the same time, correlate with the actual data in an F1 sense, as shown in Equations (4) and (5), where F1 denotes the F1 regularization weight and B represents the trained discriminator.
The segmentation of the CXR images of the ten chest diseases was carried out with Info-MGAN. We send the CXR to the generator and expect to receive a lung mask that fits the patient exactly. The actual CXR images of the ten chest diseases and their corresponding ground-truth lung masks make up the authentic pair for the discriminator. As can be seen in Figure 3, Gen and D are both trained in an adversarial fashion.
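For reference, the objectives described in this paragraph can be written out as follows. This is a sketch under assumptions, not a transcription of Equations (1)-(5): the first line is the standard min-max GAN objective of [54] in the notation used here (A for expectation, B for the discriminator), and the remaining lines assume that the conditional objective follows the pix-to-pix formulation mentioned in this section, with Y taken as the conditioning CXR, X as the target lung mask, the "F1" term taken to be an L1-type distance, and $\lambda_{F1}$ introduced here as a symbol for the F1 regularization weight.

```latex
\begin{align}
\min_{Gen}\,\max_{B}\; V(B, Gen)
  &= A_{Y \sim P_Y}\big[\log B(Y)\big]
   + A_{Z \sim P_z}\big[\log\big(1 - B(Gen(Z))\big)\big] \\
Gen &: \{Y, Z\} \rightarrow X \\
\mathcal{L}(Gen, B)
  &= A_{Y,X}\big[\log B(Y, X)\big]
   + A_{Y,Z}\big[\log\big(1 - B(Y, Gen(Y, Z))\big)\big] \\
\mathcal{L}_{F1}(Gen)
  &= A_{X,Y,Z}\big[\lVert X - Gen(Y, Z) \rVert_{1}\big] \\
Gen^{*}
  &= \arg\min_{Gen}\,\max_{B}\; \mathcal{L}(Gen, B) + \lambda_{F1}\,\mathcal{L}_{F1}(Gen)
\end{align}
```

Under this reading, the adversarial term pushes the generated masks toward realistic outputs, while the distance term keeps each mask close to its ground truth.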
To segment the different chest diseases in the CXR images, we use the Info-MGAN model presented by Chen et al. [55]. The generator (Gen) in Info-MGAN is an encoder-decoder network, while the discriminator (D) is a Patch-GAN. Figure 4 shows the architecture of the generator, discriminator, encoder, and decoder blocks used by Info-MGAN. In both Gen and D, each encoder block consists of a convolutional (Conv) layer, a batch normalization layer (BNL), and a ReLU. Likewise, each decoder block consists of a deconvolutional layer, a BNL, and a ReLU. Encoder blocks produce condensed representations of the data, whereas decoder blocks produce more detailed ones. Dropout is applied in the first three decoder blocks during both training and testing, which supplies the generator with the random noise it needs to function correctly. In addition, the generator network includes skip connections between the encoder and the decoder to reinforce patterns at several levels. The final block of the discriminator outputs a patch with dimensions of 29 × 29, and each pixel of this patch classifies a separate portion of the input image. As covered in a previous section of this article, CXR image segmentation is carried out by training an Info-MGAN, which at a more general level is effectively a U-net architecture [39]. The generator network was trained with the pix-to-pix approach, and the discriminator network was trained as a Patch-GAN-like classifier.
After training, the generator can produce lung masks for the CXR images fed into it. Figure 5 displays an input CXR together with its ground truth mask and the lung mask produced by the Info-MGAN.
A comparison of the two shows that the ground truth lung mask and the predicted lung mask are morphologically similar. The lung images are obtained by applying the masks generated from the input CXRs. Figure 6 illustrates the image masks used to isolate the lung regions in Posterior Anterior (PA) view CXRs. The resulting collection of segmented lung images is then used to train the proposed classification process.
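The masking step described above can be illustrated with a minimal sketch. It assumes the CXR is a 2D grid of intensities and the predicted mask is a binary grid of the same shape; the function name `apply_lung_mask` and the toy 3 × 3 data are hypothetical and not taken from the paper.

```python
def apply_lung_mask(image, mask):
    """Keep pixels where the mask is 1 and set every other pixel to 0."""
    same_rows = len(image) == len(mask)
    same_cols = all(len(r) == len(m) for r, m in zip(image, mask))
    if not (same_rows and same_cols):
        raise ValueError("image and mask must have the same shape")
    return [
        [pixel if keep else 0 for pixel, keep in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


# Toy 3 x 3 "CXR" and a binary lung mask (hypothetical values).
cxr = [[12, 40, 200], [90, 130, 60], [5, 77, 210]]
lung_mask = [[0, 1, 1], [1, 1, 0], [0, 0, 1]]

segmented = apply_lung_mask(cxr, lung_mask)
print(segmented)  # [[0, 40, 200], [90, 130, 0], [0, 0, 210]]
```

In practice the mask would come from the trained generator and might first need thresholding to become binary; that step is omitted here.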

C. EXTRACTION OF HIDDEN PATTERNS AND CLASSIFICATION OF CXR IMAGES OF CHEST DISEASES
In this section, we investigate the automatic extraction of crucial features from segmented lung images in order to diagnose ten different chest diseases. Feature extraction is one of the most critical steps in image classification [56]. The literature shows that conventional hand-crafted features can be useful for image classification, while DL techniques greatly improve classification accuracy over standard feature extraction approaches. The pipeline used in the present study therefore combines DL architectures with more conventional feature extraction algorithms [57]. The classification performance of the deep architectures used on their own is also analyzed in detail. Figure 7 shows the pipeline used for classification. It consists of three blocks: the first is the feature extraction layer, the second contains the MLP layer, and the last is the final classification layer.

D. EXTRACTING FEATURES FOR TRAINING PURPOSES
Two key components are used for feature extraction: deep CNN features and hand-crafted features. The process of extracting features from the segmented lung images is shown in Figure 8. A total of eight DL models were used in this study: simple CNN, ResNet-101, ResNet-50, DenseNet-169, DenseNet-201, Inception-v3, VGG-16, and VGG-19. The DenseNet models were chosen for the following advantages: all layers are directly connected, the size of the feature map is preserved, identity mapping is integrated naturally, both shallow and deep supervision are available, and previously computed features can be reused. The VGG models, with a depth of 16-19 weight layers and very small (3 × 3) convolutional kernels, showed a significant improvement over the state-of-the-art (SOTA) models in terms of classification accuracy and validation error [58]. Table 3 presents a detailed summary of the parameters used in implementing the DCNN models, and Figure 9 illustrates the architecture of these models. In addition, we designed a simple CNN model that contains five 2D convolutional layers (ConvL), five 2D max-pooling layers (MPL), and three flattened layers. The purpose of this research is to evaluate the effectiveness of the simple CNN and the TL models in extracting discriminatory characteristics from the segmented lung images. In contrast to the TL models, the simple CNN was trained from scratch. After a DL model has been trained, its fully connected layer (FCL) is used to compute the features output by the model. Because the FCL has 128 units, the computed feature vector has a dimension of 128 × 1. In the second stage of the feature extraction process, depicted in Figure 8, computer vision methods are applied to isolate the most significant key points of the image. These key points are often the elements of the image that have a blob-like local appearance.
ORB [50] and SURF [52] are two algorithms from the pipeline that we used individually to identify these properties in the segmented lung CXR.
ORB is a method used extensively in computer vision for key-point extraction and the construction of feature descriptors. The ORB algorithm [51], as described here, involves four steps. First, a difference-of-Gaussian function is applied to the complete CXR image to locate areas of interest across the whole image. The second step, known as ''key point localization,'' applies a precise model to determine the coordinates and scale of each candidate key point [56], [57], [58]; key points are then selected according to measures of their stability. In the third step, the directions of the local CXR image gradients are used to assign an orientation to each key point position [58]. In the fourth step, the local image gradients are computed at the selected scale around each key point. Figure 10 (a) shows the result of the ORB method applied to the masked lung CXR images.
SURF is a feature detection method [26] that is used to determine the relevance of features not only on a geographical scale but also at the pixel level. Fitting a quadratic function allows the key points to be located precisely and their dimensions to be determined in the continuous domain. The neighborhood surrounding each key point is described by a sampling pattern (points on scaled concentric rings). Figure 10 (b) shows an example of the output of the SURF algorithm for a masked lung X-ray image. The k-means clustering approach is then used to partition the key points extracted from the lung images into 64 clusters.
We then obtain a feature vector of size 64 × 1 by averaging the descriptor values within each cluster. Next, the 128 × 1 feature vector obtained from the FCL is concatenated with this 64 × 1 vector, so that the fused feature vector for each masked lung image has a size of 192 × 1. This process is depicted as a block diagram in Figure 8. Finally, as part of our primary pipeline, the fused feature vector is fed to a dense MLP model (see Figure 7).
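The fusion step can be sketched as follows, using only the Python standard library. This is a minimal illustration, not the authors' implementation: the plain Lloyd's k-means, the reduction of each cluster to a single mean over all of its descriptor entries, the descriptor length of 16, and the random stand-in data are all assumptions made for the example.

```python
import random


def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def handcrafted_vector(descriptors, k=64):
    """One value per cluster: the mean of all descriptor entries assigned to it."""
    labels = kmeans(descriptors, k)
    vector = []
    for c in range(k):
        values = [v for d, lab in zip(descriptors, labels) if lab == c for v in d]
        vector.append(sum(values) / len(values) if values else 0.0)
    return vector


def fuse(deep_features, descriptors, k=64):
    """Concatenate the deep features with the k cluster averages."""
    return list(deep_features) + handcrafted_vector(descriptors, k)


# Random stand-ins for ORB/SURF descriptors and the 128 FCL features.
rng = random.Random(1)
descriptors = [[rng.random() for _ in range(16)] for _ in range(200)]
deep = [rng.random() for _ in range(128)]

fused = fuse(deep, descriptors)
print(len(fused))  # 192
```

With real data, `descriptors` would hold the key-point descriptors of one segmented lung image and `deep` the 128 FCL activations for the same image.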

E. MLP MODEL
The MLP model is the next building block in this pipeline; it receives as input the fused features from the preceding stage. The MLP model is then trained on the labeled input features so that it can predict the correct labels of unseen samples. Its input is a 192-element vector composed of 128 features derived from the trained TL model and 64 features derived from the cluster means of the key-point features extracted with either the ORB or the SURF technique. The MLP model comprises seven layers and requires no additional hyper-parameters. ReLU activation functions are used in the first six layers, and a softmax activation function is used in the seventh and final layer. The structure of the MLP network is given in Table 4.
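A forward pass through an MLP with this shape (six ReLU layers followed by a softmax layer over ten classes) can be sketched in plain Python. The layer widths in `WIDTHS` are hypothetical placeholders, since the actual values are given in Table 4, and the weights are random rather than trained.

```python
import math
import random


def relu(v):
    return [max(0.0, x) for x in v]


def softmax(v):
    m = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def dense(v, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


# Hypothetical widths: 192 inputs, six hidden layers, 10 output classes.
WIDTHS = [192, 128, 96, 64, 48, 32, 16, 10]

rng = random.Random(0)
layers = [init_layer(a, b, rng) for a, b in zip(WIDTHS, WIDTHS[1:])]


def forward(x):
    for i, (weights, bias) in enumerate(layers):
        x = dense(x, weights, bias)
        # ReLU on the first six layers, softmax on the seventh.
        x = softmax(x) if i == len(layers) - 1 else relu(x)
    return x


probs = forward([rng.random() for _ in range(192)])
print(len(probs), round(sum(probs), 6))
```

The output is a probability vector over the ten classes; the predicted class is the index of its largest entry.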

F. LAST LAYERS FOR PREDICTING THE OUTPUT
In addition to the softmax layer, we also used a support vector machine with a quadratic kernel (Q-SVM) [57], AdaBoost [58], and random forest (RF) [59]. These ML classifiers have been used in a broad variety of studies [55], [56], [57], [58], [59] because of their efficiency and ease of implementation in health applications. They are therefore expected to be able to exploit the features obtained from the Dense layer (see Table 4) of the MLP network to classify lung images into the ten established classes.

G. PERFORMANCE EVALUATION
The classification performance of the deep feature extraction (DFE) approaches in identifying the 10 different chest diseases is determined using segmented CXR images. Once all of the models have been trained, the confusion matrix-based performance parameters are computed using data from each stage of the proposed approach. The methodology designed in this study integrates the hand-crafted and DFE methods. On the testing dataset, the identification performance of the DCNN models is evaluated in terms of accuracy (ACU) [60], recall (REC) [61], false positive rate (FPR), F1-measure, specificity (SPF), precision (PRE), negative predicted value (NPV) [62], and false negative rate (FNR) [63]. Equations (6)-(13) are used to measure these parameters.
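The eight measures listed above have standard definitions in terms of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). The sketch below computes them for one class of a multi-class confusion matrix in a one-vs-rest fashion. It uses the standard textbook definitions, which are assumed to match Equations (6)-(13); the 3 × 3 confusion matrix is a hypothetical example.

```python
def one_vs_rest_counts(cm, k):
    """TP, FP, FN, TN for class k, where cm[i][j] counts true class i predicted as j."""
    n = len(cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp
    fp = sum(cm[i][k] for i in range(n)) - tp
    tn = sum(map(sum, cm)) - tp - fn - fp
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Standard confusion-matrix measures (no guard against empty denominators)."""
    rec = tp / (tp + fn)
    pre = tp / (tp + fp)
    return {
        "ACU": (tp + tn) / (tp + tn + fp + fn),
        "REC": rec,
        "SPF": tn / (tn + fp),
        "PRE": pre,
        "NPV": tn / (tn + fn),
        "F1": 2 * pre * rec / (pre + rec),
        "FPR": fp / (fp + tn),
        "FNR": fn / (fn + tp),
    }


# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted).
cm = [[8, 1, 1], [2, 7, 1], [0, 2, 8]]

tp, fp, fn, tn = one_vs_rest_counts(cm, 0)
scores = metrics(tp, fp, fn, tn)
print(tp, fp, fn, tn)  # 8 2 2 18
```

Averaging each measure over all classes gives macro-averaged scores; note that FPR = 1 - SPF and FNR = 1 - REC by definition.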

IV. RESULTS AND DISCUSSIONS
In this section, we show that the proposed deep feature extraction approaches effectively identify ten different chest diseases from segmented CXR images. After the models have been trained, the performance parameters of the proposed method, which combines the hand-crafted and DFE methods, are computed from the confusion matrix.

A. EXPERIMENTAL SETUP
The proposed method was implemented with the assistance of the Keras library [64]. Python [65] was used as the programming language for the procedures that did not have a direct connection to the CNN. The experiment was carried out using a computer equipped with a Windows operating system, an NVIDIA GeForce GTX GPU with 11 GB of memory, and 32 GB of total RAM.

B. TRAINING OF DCNN MODELS
To build the DCNN, we first trained a segmentation network on previously acquired CXR images and then used the generator model [66], [67], [68], [69] to segment the new data. Image augmentation was then performed by applying affine transformations (such as rotations and shears) to the segmented CXR images, so that the deep classification models could be trained adequately on a large training dataset. Eight different DCNN models were investigated as DFE methods, and their performance was evaluated in combination with ORB- and SURF-based feature extraction algorithms. The entire dataset was analyzed with these eight DCNN models using 10-fold cross-validation. The final classification layer was implemented with several different ML methods, including Q-SVM, AdaBoost, and RF. The convergence of the loss function [70], [71], [72] for the DCNN models is displayed in Figure 11. The plots (see Figure 11) show that the six DCNN models converge more quickly than the simple CNN model trained from scratch. The present work reports and analyses the training-phase execution times of the DFE models (shown in Table 5). ... training time of 405 s, with an average training time per step of 69.7 ms.
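The 10-fold cross-validation protocol mentioned above can be sketched with the standard library alone. The function name `k_fold_indices`, the shuffling seed, and the sample count of 105 are hypothetical; the paper does not state how its folds were drawn (for example, whether they were stratified by class).

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding spreads any remainder so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


splits = list(k_fold_indices(105, k=10))
for fold_number, (train, test) in enumerate(splits, start=1):
    print(fold_number, len(train), len(test))
```

Each model would be trained on `train` and evaluated on `test` for every fold, and the reported scores would be averaged over the ten folds.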

C. RESULTS ANALYSIS OF DFE BY DCNN WITH SEVERAL ML MODELS
After feature extraction, the last layers of the eight DCNN models were replaced with softmax and with several ML models, namely Q-SVM, AdaBoost, and RF. To evaluate the classification performance of these models, ACU, REC, NPV, F1-measure, SPF, and PRE were considered for the diagnosis of the ten chest diseases. Table 6 presents the results of the DFE from the eight DCNN models, whose feature matrices were classified by the different ML models. Table 6 shows that the six DCNN models outperformed the simple CNN model in terms of ACU, REC, NPV, F1-measure, SPF, and PRE. Among the ML classifiers, AdaBoost achieved the highest classification performance. The best overall classification performance is obtained when the deep features extracted by the VGG-19 model are fed to the softmax layer; in this case, the values of ACU, REC, NPV, F1-measure, SPF, and PRE are 96.97%, 96.92%, 96.93%, 96.92%, 96.97%, and 96.97%, respectively. The simple CNN model with AdaBoost as the last layer achieved an ACU of 94.30%, a REC of 94.42%, an NPV of 94.26%, an F1-measure of 94.25%, an SPF of 94.29%, and a PRE of 94.28%. The performance of some other DCNN models is on par with that of the best-performing VGG-19 model.

D. RESULTS ANALYSIS OF DCNN MODELS WITH FEATURES EXTRACTED BY ORB AND SURF
Tables 7 and 8 give the performance parameters obtained after fusing the deep features with the hand-crafted features produced by the SURF and ORB algorithms, respectively. According to Table 7, incorporating the SURF-based features did not produce a significant shift in any of the classification performance parameters. With Q-SVM as the classification layer, the ACU, REC, NPV, F1-measure, SPF, and PRE scores were 97.11%, 96.81%, 96.48%, 96.23%, 96.51%, and 96.25%, respectively, when DFE from the DenseNet-201 model were fused with SURF-based features. Furthermore, while utilizing softmax ... Of all the models tested, the ORB VGG-19 model with a softmax layer achieved the highest classification accuracy. The detailed results are presented in Table 8. Figures 12-14 depict bar plots of the average FPR and FNR, which allow further investigation of the classification accuracy of the approach. Achieving higher classification performance requires significantly lower FPR and FNR values. Figure 12 shows that the lowest FPR value (3.85%) was achieved by VGG-16 with Q-SVM as the last layer. VGG-16 with softmax as the last layer produces an FNR that is 6.35% lower than any other. The FPR and FNR for VGG-19 with softmax are both significantly lower than average, at 1.81% and 4.33%, respectively. According to Figure 13, the combination of ResNet-101 and VGG-16 with SURF and Q-SVM in the final layer achieved the lowest FPR and FNR of 4.85% and 5.71%, respectively. Figure 14 shows that VGG-19 with ORB achieved the lowest FPR and FNR of 1.10% and 2.33%, respectively, when using softmax as the final layer, making it the best combination among the proposed feature extraction models. To further illustrate the class-specific performance of our proposed feature extraction models, we present the confusion matrices for a total of 7000 test CXR images in Figure 15.

E. STATISTICAL ANALYSIS
The feasibility of the suggested model was evaluated using McNemar's statistical test [67] and the analysis of variance (ANOVA) test [68]. The proposed model was compared with the base classifiers whose probability scores were used in developing it. Table 9 presents the results of the McNemar's and ANOVA tests carried out on the multi-chest disease CXR dataset used in the present analysis. To conclude that the alternative hypothesis is ... Table 9 show that the p values for all of the dataset samples fall below the 0.05 threshold. According to both statistical tests, the null hypothesis cannot be maintained. This demonstrates that the proposed model contains more information from the base classifiers and that its predictions are better, so that it is statistically distinct from the other contributing models.
In Table 10, we compare our best-performing model (ORB & VGG-19 with a softmax layer) with other existing state-of-the-art DCNN methods for classifying 10 different chest diseases using CXR images. ACU, REC, and SPF values of 95.11%, 93.15%, and 96.5%, respectively, were obtained in [56] by fusing features derived from local binary patterns (LBP) with the DFE of the Inception V3 architecture; these values were derived from images that had been filtered with a Gaussian filter. Features based on MobileNet-v2 yield an ACU of 93.33%, a REC of 90.66%, and an SPF of 95.22%. The study in [57] introduced a novel model for the classification of COVID-19 using CXR images and achieved an ACU, REC, and SPF of 95.72%, 93.59%, and 96.78%, respectively.
In contrast, our suggested framework integrating the VGG-19 model, the ORB feature extraction method, and the softmax classifier outperformed the existing methods for classifying the 10 different chest diseases including COVID-19, LC, ATE, COL, TB, PNET, EDE, PNEU, PLT, and normal (see Table 10). In addition, it is important to note that the proposed DFE framework classifies ten distinct chest disorders using segmented lung images as input. Unlike the existing approaches summarized in Table 10, which compute features directly from the CXR images, the proposed approach does not operate on the raw CXR images.
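As an illustration of the McNemar test used in the statistical analysis, the sketch below computes the chi-square statistic with continuity correction and its p value from the two discordant counts. The paper does not state which variant of the test was used, so the continuity-corrected form is an assumption, and the counts `b=40` and `c=15` are hypothetical.

```python
import math


def mcnemar(b, c):
    """McNemar test with continuity correction.

    b: samples the first model classified correctly and the second did not.
    c: samples the second model classified correctly and the first did not.
    Returns (chi-square statistic, p value) for 1 degree of freedom.
    """
    if b + c == 0:
        return 0.0, 1.0  # no disagreement, so no evidence of a difference
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


stat, p = mcnemar(b=40, c=15)
print(round(stat, 4), p < 0.05)  # 10.4727 True
```

A p value below 0.05 leads to rejecting the null hypothesis that the two models have the same error rate on the test set.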

G. DISCUSSIONS
In this work, we investigated all three stages that are necessary for accurate image classification, namely image segmentation, feature extraction, and classification [5], [6], [7], [8], [12], [16], [73], [74], [75], [76], [77]. The segmentation process began with fundamental supervised learning models, during which we investigated a variety of U-net designs. However, the Info-MGAN model produced the best results compared with the other supervised learning methods. Because a more extensive dataset of TB and COVID-19 CXR images was not readily available, the SMOTE technique was used to increase the number of images belonging to the minority classes. A feature extraction pipeline (see Figure 8) was developed by integrating hand-crafted feature [77] extraction algorithms with DCNN models. Key point descriptor techniques have been shown to be effective at extracting object intensities from a CXR image [64], [66], [71], [72], [73], [74], [75]. We therefore use the key point descriptors (namely SURF and ORB) to locate candidate key intensity points that contribute significantly to the classification of CXR segments. Every one of these models demonstrated impressive accuracy, along with good convergence and noticeably deeper coverage. A DL function (softmax) and other ML techniques (Q-SVM, AdaBoost, and RF) are used to classify the computed features in the last layer. Both the segmentation and the classification methods were fine-tuned using publicly available data [60], [61], [62], [63], [64], [65], [66], [67], [68], [69], [70], [71], [72]. We have provided a comprehensive evaluation of the proposed framework, examining its performance both with and without the key point features.
The experimental findings show that the DCNN models consistently perform better than the simple CNN model [78]. The results of this study strongly indicate the effectiveness of the suggested method (ORB & VGG-19 with softmax) in classifying ten different chest diseases using CXR images.
Radiologists can use this technique as a supplementary resource during the diagnostic process of patient cases. The fundamental purpose of this work is to establish an efficient and cost-effective diagnostic approach that can promptly identify COVID-19 and other chest disease patients based on their CXR images.

V. CONCLUSION
This study presents a model (ORB & VGG-19 with softmax) that combines lung image segmentation and classification to accurately classify ten chest conditions, i.e., COVID-19, LC, ATE, COL, TB, PNET, EDE, PNEU, PLT, and normal, using CXR images. With Info-MGAN, we efficiently trained the pix-to-pix algorithm used to segment the lung images, making use of the CXR images and their ground truth masks. The trained segmentation network, which is essentially the generator of the Info-MGAN, is then used to segment the preprocessed images. The segmented lung images were fed into the feature extraction network, which comprises DCNN models (ResNet-50, ResNet-101, VGG-19, simple CNN, VGG-16, Inception-v3, DenseNet-169, and DenseNet-201) as well as key point identification techniques such as SURF and ORB. The features extracted by the DCNN models were then classified using multiple ML approaches, including AdaBoost, Q-SVM, softmax, and RF, to diagnose ten distinct chest diseases. The classification accuracy was highest when the VGG-19 model was paired with the ORB key point extraction method and softmax was applied in the final layer. This model also yielded the lowest average FPR and FNR of all the architectures proposed in this study, 1.10% and 2.33%, respectively. The suggested method outperformed the state-of-the-art models in diagnosing ten chest diseases from CXR images. In the future, the suggested approach will be trained and tested on CT scan images for the identification of several chest diseases.
HASSAAN MALIK received the M.S. degree in computer science from the National College of Business Administration and Economics, Lahore, Pakistan. He is currently pursuing the Ph.D. degree in computer science with the University of Management and Technology, Lahore. He is a Lecturer with the Department of Computer Science, National College of Business Administration and Economics, Multan. He has seven years of professional experience in education and industry. His research interests include blockchain, federated learning, transformers, machine learning, and deep learning.
TAYYABA ANEES was born in Pakistan. She received the Ph.D. degree from the Vienna University of Technology, Vienna, Austria, in 2012. Her Ph.D. dissertation is in the area of service-oriented architecture and web services availability domain. For four years, she was a Project Assistant with the Vienna University of Technology. She is currently the Director of the software engineering program/an Assistant Professor with the Software Engineering Department, University of Management and Technology, Lahore. Her research interests include service-oriented architecture, web services, the Web of Things, deep learning, artificial intelligence, semantic web, cloud computing, the IoT, software availability, software safety, software fault tolerance, and real-time data warehousing.
MICHAŁ JASIŃSKI (Member, IEEE) received the M.S. and Ph.D. degrees in electrical engineering from the Wrocław University of Science and Technology, in 2016 and 2019, respectively. Since 2018, he has been with the Electrical Engineering Faculty, Wrocław University of Technology, where he is currently an Assistant Professor. He is the author and coauthor of more than 100 scientific publications. His research interests include big data in power systems, especially in point of power quality and optimization in multicarrier energy systems. Currently, he is a Guest Editor of special issues in Energies, Electronics, Sustainability, and Frontiers in Energy Research.