Introduction
In medical imaging, the automatic detection and classification of cancers such as skin cancer [1], [2], lung cancer [3], brain tumor [4], and stomach cancer [5] have been among the most important research topics of the last few decades [6], [7]. Among these, gastric and colon cancers are the most common. The most frequent stomach infections are ulcers, bleeding, and polyps, and these gastric infections have become a major cause of human deaths. A worldwide survey reports that colon cancer has caused 525,000 deaths and stomach cancer 765,000 deaths since 2017. In the United States, about 1.6 million people currently suffer from bowel infections, and 0.2 million new cases are reported every year [8]. In the developing countries of the world, 694,000 deaths occurred due to colorectal cancer, also known as bowel cancer [9]. According to the American Cancer Society, 132,000 new cases of bowel cancer were reported in 2015 [10]. Among the common cancers worldwide, esophageal cancer is at
WCE is a medical imaging technique for examining the gastrointestinal (GI) tract. It is extensively used in hospitals for the detection of gastric abnormalities such as ulcers and bleeding, as shown in Figure 1 [13], [14]. A recent report shows that about one million patients have been successfully treated with the help of WCE [15]. A small camera is used to capture images of the human gastrointestinal tract, which are then analyzed by gastroenterologists; this is a time-consuming procedure. More than 50,000 images are produced during a single WCE examination, a physician needs about two hours on average to analyze them, and a risk of false detection is also present [16].
Many image processing researchers have developed automated systems for the recognition of stomach infections from endoscopic images. These systems help in the early detection of stomach diseases; the survival rate can be improved by diagnosing gastric infections at an early stage. The fundamental steps of such automated detection systems are feature extraction, feature selection, and classification. The feature extraction methods utilized by researchers include point features [17], texture features [15], HOG features [18], and color features [19], [20]. Convolutional Neural Networks (CNNs) are also combined with handcrafted features to enhance system performance, and CNN models such as AlexNet [21], VGG-16 [22], and ResNet [23] are used for deep feature extraction. The most important step in this pipeline is to extract and select the best features for classification, as the most appropriate features produce highly accurate infection detection and classification results.
Related Work
Researchers have developed many automated detection and recognition systems. These are mainly supervised learning approaches that use handcrafted and CNN features for the detection and recognition of abnormalities in the gastrointestinal tract. An esophageal cancer detection method based on Gabor features and a Faster Region-Based CNN (Faster R-CNN) combines handcrafted Gabor features with CNN descriptors [11]. Gabor features become more effective when combined with CNN features [24], and various studies have shown the effectiveness of combining handcrafted Gabor and deep features [25]–[28]. A CNN-based model was developed for the recognition of ulcer, polyp, and erosion, in which CNN features were used together with SVM for the detection of gastric infections; this technique achieved 80% accuracy [29]. Another system utilized the fire modules of SqueezeNet, which reduces the network size, and achieved an accuracy of 88.90%. Billah et al. [30] combined color wavelet features with CNN features and used SVM in the classification phase. In [8], geometric features are extracted from the segmented region of GI images and combined with the features of VGG-19 and VGG-16, where the deep features of the two networks are fused using the Euclidean Fisher Vector method.
A color transformation based technique is presented in [31]. HSI and YIQ transformations are applied to RGB images to calculate the maximum and minimum pixel values. Then Local Binary Pattern (LBP) and Gray Level Co-occurrence Matrix (GLCM) features are extracted, fused with the color-based features, and the final vector is fed to a multi-layer perceptron to detect and classify stomach infections. A model for ulcer detection based on the YIQ color transformation is proposed in [32]; it utilizes the Y plane, with SVM in the classification phase. Suman et al. [19] developed a statistical color feature based technique for the automatic detection of gastric bleeding. A two-phase model was introduced for automated ulcer detection [33]: in the first step, a superpixel-based saliency method identifies the infected region; in the second step, a saliency-based max pooling (SMP) technique is combined with locality-constrained linear coding (LLC), achieving 92.65% classification accuracy. For the classification of bleeding, polyps, and ulcers, a K-means clustering technique was utilized and achieved 88.61% accuracy. A texture feature based method for the classification of ulcer and non-ulcer images fed the final feature vector to an SVM classifier and achieved 94.16% accuracy [10]. Fan et al. [16] introduced a stomach disease recognition system based on LBP and Scale-Invariant Feature Transform (SIFT) features.
Discrete Wavelet Transform (DWT), variance, and LBP features were extracted and classified using SVM for the detection of colon infections; texture information is calculated from these features, and the SVM classifier produces the classification results [34]. A Bag of Visual Words (BoW) was generated from features extracted from different color spaces and color histograms for bleeding detection [35]. Features of pre-trained networks such as Inception-V3 and VGGNet were extracted and fused with baseline features; this method achieved 96.1% classification accuracy with SVM [36]. A similar method utilizes ResNet50 features, feeding the feature vector to a logistic model tree (LMT) for classification and achieving 95.7% accuracy [37]. A technique based on color and statistical texture features was introduced for the detection of GI tract infections [38]. HSI and LAB color transformations, which are helpful in bleeding detection, were used to locate the bleeding area in WCE images, and the classification results were enhanced using a multilayer perceptron [39]. Shape, texture, and color features were utilized in different studies to detect abnormal regions [40], [41]. The Fisher scoring method was applied to select the most informative subset of features extracted from the HSV color transformation and texture descriptors, and the ulcer and bleeding images were classified using a multilayered neural network [42]. Two SVM classifiers based on the RGB and HSV color spaces were fused to build an automated detection system [43]; this classifier fusion technique achieved a classification accuracy of 95%.
Challenges and Contributions
From the techniques listed above, it is observed that most recent methods follow a fusion of handcrafted and CNN features. However, this process increases the overall execution time of the system. Moreover, classification accuracy decreases when features are extracted directly from raw images: for example, the pixel values of ulcer and healthy images are almost similar except in the infected region. A better solution is therefore to first extract the ulcer regions from the original frames and then extract features from them. Other challenges are the inconsistency of ulcer regions and the selection of irrelevant features, both of which hinder accurate infection classification. In this article, a new method is proposed for automated gastrointestinal infection recognition using the WCE imaging modality. The major contributions are:
A dark channel along with a de-correlation formulation based approach is designed to improve the pixel range of the ulcer region.
An optimized saliency based method, along with a few morphological operations, is adopted for ulcer detection.
A pre-trained deep learning model named VGG16 is re-trained and features are extracted using transfer learning. Features are computed from two sequential layers and fused using an array-based method.
The best features are selected through the PSO-GM meta-heuristic approach, and the selected features are classified using Cubic SVM. The results of both the fusion and selection processes are computed and analyzed in terms of confusion matrices and graphs.
Proposed Methodology
In this article, a new automated system is proposed for gastrointestinal infection recognition from the WCE imaging modality. The proposed system comprises several well-known steps: preprocessing of ulcer frames through dark channel prior and decorrelation based visibility improvement, segmentation of the ulcer using an optimized saliency based method along with morphological operations, deep learning feature extraction, selection of the best features, and finally classification of the selected features. These steps are illustrated in Figure 2, and the detail of each step is given below.
A. Data Acquisition
Image acquisition provides the images used for validation of the proposed method. In this work, the WCE imaging modality is employed for the detection and recognition of stomach infections. The images are obtained from the CUI Wah database [31], which includes a total of 6000 RGB WCE images; a few samples are illustrated in Figure 1. The images cover ulcer, bleeding, and normal classes, where each category includes 2000 images. In this dataset, the ulcer images are separated manually and further utilized for segmentation, while the bleeding and normal images are passed directly to the feature extraction step as shown in Figure 2.
B. Ulcer Detection
In the ulcer detection step, the following process is followed: i) dark channel based contrast enhancement; ii) application of the decorrelation formulation to the dark channel enhanced image; iii) application of an existing saliency method to the decorrelated image; and iv) morphological operations for final refinement. Improving the visibility of an image at the initial stage is an important step towards better detection and more relevant features of an infected region. As shown in Figure 3 (a), the original WCE images have dark effects on the ulcer regions, which means that the pixel range of the infected part tends towards 0. For this purpose, we implement a haze reduction based approach [44].
Dark channel enhancement and decorrelation formulation effects: a) original WCE image; b) dark channel enhanced image; c) decorrelation formulation effects.
Let the original WCE frame be denoted by $\tilde{\varphi}(f)$. Following the haze imaging model, the observed frame is expressed as:\begin{equation*} \tilde {\varphi }\left ({f }\right)=J\left ({f }\right)t\left ({f }\right)+A(1-t(f))\tag{1}\end{equation*} where $J(f)$ is the scene radiance, $t(f)$ is the transmission map, and $A$ is the atmospheric light. The enhanced (haze-free) frame is recovered as:\begin{equation*} J\left ({f }\right)=\frac {\left ({\tilde {\varphi }\left ({f }\right)-A }\right)}{\left ({\max \left ({t\left ({f }\right),t_{0} }\right) }\right)}+A\tag{2}\end{equation*} where $t_{0}$ is a lower bound that keeps the transmission away from zero.

Later, the decorrelation formulation is employed on the enhanced frame $J(f)$. Each input pixel vector $\alpha$ is transformed as:\begin{equation*} \beta =\tau \times \left ({\alpha -\mu }\right)+\mu _{t}\tag{3}\end{equation*} where $\mu$ is the channel mean, $\mu_{t}$ is the target mean, and $\tau$ is the stretch transformation. The correlation matrix is computed from the covariance matrix $Cov$ and the channel standard deviations $\sigma$ as:\begin{equation*} Cor=inv\left ({\sigma }\right)\times Cov\times inv~(\sigma)\tag{4}\end{equation*} Using the eigenvalues $\lambda$ and the eigenvector matrix $O_{m}$ of $Cor$, the scaling matrix and the stretch transformation are defined as:\begin{align*} S\left ({k,k }\right)=&\frac {1}{\sqrt {\lambda (k,k)}} \tag{5}\\ \tau=&\sigma _{t}O_{m}S{(O_{m})}^{\prime }\tag{6}\end{align*} where $\sigma_{t}$ is the target standard deviation. The complete decorrelation transformation thus becomes:\begin{equation*} \beta =\mu _{t}+\sigma _{t}O_{m}S\left ({O_{m} }\right)^{\prime }inv\left ({\sigma }\right)\times (\alpha -\mu)\tag{7}\end{equation*}

Afterwards, an existing saliency method is applied to the decorrelated frame. The saliency of a pixel $\beta_{q}$ is defined as its accumulated color distance from all other pixels:\begin{equation*} Sal\left ({\beta _{q} }\right)=\sum \nolimits _{\forall \beta _{i}\in \beta } {d(\beta _{q},\beta _{i})}\tag{8}\end{equation*} which is accelerated by clustering the colors, so that the saliency of a cluster $cl$ is computed against the $K$ cluster centers $c_{k}$ with occurrence probabilities $p_{k}$:\begin{equation*} Sal\left ({cl }\right)=\sum \nolimits _{k=1}^{K} {p_{k}d(cl,c_{k})}\tag{9}\end{equation*} Finally, the saliency map is binarized through a thresholding function:\begin{align*} Th\left ({x,y }\right)=\begin{cases} 1&if~Sal\left ({cl }\right)\ge Thr \\ 0&Otherwise \\ \end{cases}\tag{10}\end{align*} where $Thr$ denotes the selected threshold. The resulting binary mask is then refined through morphological operations.
Ulcer detection results: (a) decorrelation formulation output used as input; (b) saliency estimation; (c) binary image through the thresholding function; (d) effects of the refinement function and morphological operations; and (e) mapped results.
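To make these detection steps concrete, the following is a minimal Python sketch of Eqs. (1)–(10) using OpenCV and NumPy. It is an illustration under stated assumptions rather than the exact implementation of this work: the patch size, lower bound $t_{0}$, stretch target, saliency threshold, structuring element, and file path are all assumed values, and the pixel-wise saliency of Eq. (8) is approximated in linear time by each pixel's color distance from the image mean.

```python
import cv2
import numpy as np

def dark_channel_enhance(img, patch=15, t0=0.1, omega=0.95):
    """Recover J(f) from the haze model (Eqs. 1-2)."""
    kernel = np.ones((patch, patch), np.uint8)
    dark = cv2.erode(img.min(axis=2), kernel)
    # Atmospheric light A: mean color of the brightest dark-channel pixels.
    idx = np.argsort(dark.ravel())[-max(1, dark.size // 1000):]
    A = img.reshape(-1, 3)[idx].mean(axis=0)
    # Transmission estimate t(f), bounded below by t0 (Eq. 2).
    t = 1.0 - omega * cv2.erode((img / A).min(axis=2), kernel)
    t = np.maximum(t, t0)[..., None]
    return np.clip((img - A) / t + A, 0, 255)

def decorrelation_stretch(img, sigma_t=50.0):
    """Decorrelation stretch (Eqs. 3-7); the target mean is kept equal to
    the input mean and sigma_t is a scalar target deviation (assumption)."""
    x = img.reshape(-1, 3)
    mu = x.mean(axis=0)
    cov = np.cov(x, rowvar=False)
    inv_sigma = np.diag(1.0 / np.sqrt(np.diag(cov)))
    cor = inv_sigma @ cov @ inv_sigma                      # Eq. 4
    lam, Om = np.linalg.eigh(cor)
    S = np.diag(1.0 / np.sqrt(np.maximum(lam, 1e-12)))     # Eq. 5
    tau = sigma_t * Om @ S @ Om.T @ inv_sigma              # Eqs. 6-7
    beta = (x - mu) @ tau.T + mu                           # Eq. 3
    return np.clip(beta, 0, 255).reshape(img.shape)

def saliency_map(img):
    """Linear-time approximation of Eq. 8: distance from the mean color."""
    return np.linalg.norm(img - img.reshape(-1, 3).mean(axis=0), axis=2)

img = cv2.imread("wce_frame.png").astype(np.float64)   # hypothetical path
enhanced = dark_channel_enhance(img)
decorr = decorrelation_stretch(enhanced)
sal = saliency_map(decorr)
mask = (sal >= 0.7 * sal.max()).astype(np.uint8)       # Eq. 10, assumed Thr
mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, np.ones((7, 7), np.uint8))
```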
C. Deep Learning Features
Classification is a key challenge in machine learning, and its performance always depends on the nature of the input data, i.e., the features [47], [48]. The power of ML depends on the amount of training data; however, many samples are noisy and irrelevant, which generates noisy and inappropriate features [49]. Such features decrease the performance of a system, which is a key issue in this area [50], [51]. In medical imaging, the selection of the best features is especially important for classification [52]; the key challenge in the classification phase is how to select the most discriminant features. In this work, we use deep learning features. A pre-trained deep learning model named VGG16 is employed and re-trained with the help of transfer learning on the collected WCE dataset. Later, important texture features are concatenated with the deep features, and feature optimization is applied using the PSO-GM approach. The selected features are finally classified using the Cubic SVM classifier. A flow diagram is shown in Figure 5.
1) Pre-Trained Deep Learning Model
A pre-trained CNN model named VGG16 [22] is employed in this work for deep learning features. Originally, this model consists of five convolutional blocks and three fully connected (FC) layers, along with a Softmax layer for final classification; max pooling and ReLU layers lie between them, and a dropout layer with a rate of 0.5 is added between the FC layers. The model takes an input image of size $224\times 224\times 3$. The response of the $k$-th convolutional layer is formulated as:\begin{equation*} \lambda _{(t)}^{(k)}=\sigma \left ({\psi ^{\left ({k }\right)}\lambda ^{\left ({k-1 }\right)}\left ({t }\right)-\beta ^{(k)} }\right)\tag{11}\end{equation*} where $\psi^{(k)}$ and $\beta^{(k)}$ denote the layer weights and bias, and $\sigma$ is the activation function.
The ReLU activation used throughout the network is defined as:\begin{equation*} \sigma \left ({k }\right)=\max \left ({\mathrm {k,0} }\right),\quad k\in \mathbb {R}\tag{12}\end{equation*}
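As a small illustration, Eqs. (11) and (12) correspond to a convolution followed by the ReLU non-linearity; in PyTorch notation this is (channel counts here are illustrative, not VGG16-specific):

```python
import torch.nn as nn

# Eq. (11): convolution with weights psi and bias beta, followed by
# Eq. (12): the ReLU activation sigma(k) = max(k, 0).
block = nn.Sequential(nn.Conv2d(64, 128, kernel_size=3, padding=1),
                      nn.ReLU())
```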
2) Transfer Learning Based Feature Extraction
In this article, we employ transfer learning (TL) to re-train this model on WCE images. In TL, the parameters of the original model are reused to initialize the new model; the main purpose is to avoid the high cost of training a new model from scratch, so that a model can be trained in less computational time. After re-training on the WCE images, activations are extracted from the last two FC layers as features, giving a resultant vector of size $1\times 4096$ per layer.
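A hedged PyTorch sketch of this step is shown below; the paper's experiments use MATLAB/MatConvNet, so the layer indices, the 3-class head, and the omitted fine-tuning loop are illustrative assumptions only.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load ImageNet-pretrained VGG16 and replace the final FC layer with a
# 3-class head (ulcer, bleeding, normal) before fine-tuning on WCE images.
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
model.classifier[6] = nn.Linear(4096, 3)

# ... fine-tune on the WCE training set (optimizer and epochs omitted) ...
model.eval()

@torch.no_grad()
def extract_fc_features(batch):
    """Return activations of the two sequential FC layers (4096-D each)."""
    x = model.avgpool(model.features(batch)).flatten(1)
    fc6 = model.classifier[0](x)                 # first FC layer
    fc7 = model.classifier[3](torch.relu(fc6))   # second FC layer
    return fc6, fc7
```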
3) GRAY Level Difference Matrix (GLDM)
The GLDM features [53] represent the absolute differences between pairs of gray-level pixels of an image. In this method, three core parameters are required: difference, distance, and angle. Mathematically, it can be formulated as:\begin{align*} D_{v}=&(\nu _{k},\nu _{l}) \tag{13}\\ I_{n}^{\prime }\left ({k,l }\right)=&\left |{ I_{n}\left ({k,l }\right)-I_{n}(k+\nu _{k},l+\nu _{l}) }\right | \tag{14}\\ P\left ({k,D_{v} }\right)=&Prob\left ({I_{n}^{\prime }\left ({k,l }\right)=k }\right)\tag{15}\end{align*} where $D_{v}$ is the displacement vector and $P(k,D_{v})$ is the probability that the difference image $I_{n}^{\prime}$ takes the gray-level value $k$.
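A minimal NumPy sketch of Eqs. (13)–(15) is given below; the displacement $(\nu_{k},\nu_{l})=(0,1)$ and 256 gray levels are assumed values. Summary statistics of the resulting distribution (e.g., mean, contrast, entropy) typically serve as the final GLDM descriptors.

```python
import numpy as np

def gldm(gray, vk=0, vl=1, levels=256):
    """Probability of each absolute gray-level difference (Eqs. 13-15)
    for the displacement vector D_v = (vk, vl)."""
    h, w = gray.shape
    a = gray[max(0, -vk):h - max(0, vk), max(0, -vl):w - max(0, vl)]
    b = gray[max(0, vk):h - max(0, -vk), max(0, vl):w - max(0, -vl)]
    diff = np.abs(a.astype(np.int32) - b.astype(np.int32))   # Eq. (14)
    hist = np.bincount(diff.ravel(), minlength=levels)
    return hist / hist.sum()                                 # Eq. (15)
```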
D. Features Fusion and Selection
After feature extraction, a simple array based method is proposed to combine the extracted feature vectors into one matrix. The main purpose of the fusion process is to obtain a more informative feature vector for the best classification. Mathematically, this process is explained below:
Let $\Delta X_{1}$, $\Delta X_{2}$, and $\Delta X_{3}$ denote the three extracted feature vectors for $N$ images. The total fused dimension is:\begin{equation*} \sum \left ({\Delta }\right)=\sum \nolimits _{i=1}^{3} \sum \nolimits _{j=1}^{N} \left ({\Delta X_{i}^{j} }\right)\tag{16}\end{equation*} and the serial fusion stacks the vectors as:\begin{align*} \Delta X_{N}=&(\Delta X_{1},\Delta X_{2},\Delta X_{3}) \tag{17}\\ \Delta X_{N}=&\left ({{{\begin{array}{l} \Delta X_{1}\\ \Delta X_{2} \\ \Delta X_{3} \\ \end{array}}} }\right)_{N\times \sum \left ({\Delta }\right)}\tag{18}\end{align*}
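In code, this serial fusion reduces to a column-wise concatenation; the dimensions below are placeholders for illustration:

```python
import numpy as np

n = 100                                # number of images (placeholder)
fc6 = np.random.rand(n, 4096)          # first FC-layer features
fc7 = np.random.rand(n, 4096)          # second FC-layer features
texture = np.random.rand(n, 256)       # GLDM texture features
fused = np.concatenate([fc6, fc7, texture], axis=1)   # Eq. (18)
```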
For feature selection, a Particle Swarm Optimization (PSO) based approach is employed on the fused vector. At each iteration, the velocity of the $i$-th particle in the $j$-th dimension is updated as:\begin{equation*} {Vl}_{i,j}\!=\!{Vl}_{i,j}\!+\!c_{1}r_{1,j} \left ({\varphi _{i,j}^{pbt}\!-\!\varphi _{i,j} }\right)+c_{2}r_{2,j}\left ({\varphi _{j}^{gbt}-\varphi _{i,j} }\right)\tag{19}\end{equation*} where $c_{1}$ and $c_{2}$ are acceleration constants and $r_{1,j}$, $r_{2,j}$ are uniform random numbers. The position and velocity of the $i$-th particle are:\begin{align*} \varphi _{i}=&\left ({\varphi _{i,1},\varphi _{i,2},\ldots,\varphi _{i,n} }\right)^{T} \tag{20}\\ {Vl}_{i}=&\left ({{Vl}_{i,1},{Vl}_{i,2},\ldots,{Vl}_{i,n} }\right)^{T}\tag{21}\end{align*} and the personal best and global best positions are:\begin{align*} \varphi _{i}^{pbt}=&\left ({\varphi _{i,1}^{pbt},\varphi _{i,2}^{pbt},\ldots,\varphi _{i,n}^{pbt} }\right)^{T} \tag{22}\\ \varphi ^{gbt}=&\left ({\varphi _{1}^{gbt},\varphi _{2}^{gbt},\ldots,\varphi _{n}^{gbt} }\right)^{T}\tag{23}\end{align*} The fitness of a particle is evaluated through the separation between the $C$ class means $\mu_{i}$ and the overall mean $\mu_{0}$ of the selected features:\begin{equation*} Fitness=\sqrt {\sum \nolimits _{i=1}^{C} \left \{{ \left ({\mu _{i}-\mu _{0} }\right)^{t}\left ({\mu _{i}-\mu _{0} }\right) }\right \}}\tag{24}\end{equation*}
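The sketch below illustrates PSO-driven feature selection in the spirit of Eqs. (19)–(24); the swarm size, iteration count, acceleration constants, the 0.5 keep-threshold, and taking $\mu_{0}$ as the overall mean are assumptions, and the exact PSO-GM configuration of this work may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(mask, X, y):
    """Eq. (24), negated for minimization: separation between the class
    means and the overall mean mu_0 of the currently selected features."""
    if not mask.any():
        return np.inf
    Xs = X[:, mask]
    mu0 = Xs.mean(axis=0)
    score = sum(float(d @ d) for d in
                (Xs[y == c].mean(axis=0) - mu0 for c in np.unique(y)))
    return -np.sqrt(score)

def pso_select(X, y, n_particles=20, iters=50, c1=2.0, c2=2.0):
    n_feat = X.shape[1]
    pos = rng.random((n_particles, n_feat))    # particle positions, Eq. (20)
    vel = np.zeros((n_particles, n_feat))      # particle velocities, Eq. (21)
    pbest, pbest_fit = pos.copy(), np.full(n_particles, np.inf)   # Eq. (22)
    gbest, gbest_fit = pos[0].copy(), np.inf                      # Eq. (23)
    for _ in range(iters):
        for i in range(n_particles):
            f = fitness(pos[i] > 0.5, X, y)    # keep features above 0.5
            if f < pbest_fit[i]:
                pbest_fit[i], pbest[i] = f, pos[i].copy()
            if f < gbest_fit:
                gbest_fit, gbest = f, pos[i].copy()
        r1, r2 = rng.random((2, n_particles, n_feat))
        vel += c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)  # Eq. (19)
        pos = np.clip(pos + vel, 0.0, 1.0)
    return gbest > 0.5   # boolean mask of the selected features
```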
Results and Analysis
In the experimental process, a privately collected dataset is employed; a detailed description is provided in Section 4.1. The images in this dataset are challenging due to the low brightness of ulcer regions and the similarity of pixel values. In the evaluation step, the performance of Cubic SVM is compared with a few other classification techniques, as illustrated in Figure 7. A 10-fold cross validation is performed [55], with a 70:30 training-to-testing ratio. For performance analysis, standard parameters are employed: sensitivity (Sen), precision (Pre), F1 score (F1-S), area under the curve (AUC), false positive rate (FPR), and accuracy. All simulations were performed in MATLAB with the MatConvNet deep learning toolbox, which was employed for CNN feature extraction.
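For reference, the classification stage can be reproduced in a few lines; the sketch below uses scikit-learn's cubic-kernel SVM rather than the MATLAB toolchain of this work, and the regularization constant and placeholder data are assumptions.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_sel = np.random.rand(300, 512)       # placeholder selected features
y = np.random.randint(0, 3, 300)       # ulcer / bleeding / normal labels

# Cubic SVM = polynomial kernel of degree 3; the C value is assumed.
clf = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=3, C=1.0))
scores = cross_val_score(clf, X_sel, y, cv=10)
print(f"10-fold accuracy: {scores.mean():.4f} +/- {scores.std():.4f}")
```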
A. Numerical and Visual Results
The numerical results of this work are presented in this section. The results are acquired in two different steps. In the first step, the experiment was performed on the fused features, and the results are given in Table 1. The table shows that the recognition results of CSVM are the best in terms of the standard computed parameters: Sen (96.42%), Pre (96.20%), F1-S (96.31%), AUC (0.992), FPR (0.019), and accuracy (96.50%). In Figure 8, the performance of CSVM is confirmed by the true positive rates (TPR); the figure indicates that the normal class reaches a maximum TPR of 97%. Compared with the other classification techniques, Table 1 shows that the second best accuracy in the fusion experiment is 96%, achieved by MGSVM, while the lowest accuracy is 86.80%, obtained by the EBT classifier. The remaining classifiers LSVM, QSVM, Co-SVM, FKNN, EBT, and DT achieve accuracies of 93.40%, 94.92%, 90.14%, 91.40%, 86.80%, and 87.42%, respectively. Overall, the fusion of all computed features performed well.
In the second step, the best features selected by the PSO-GM approach are employed, and the results are given in Table 2. The table shows that the recognition results of CSVM are again the best: Sen (98.33%), Pre (98.36%), F1-S (98.34%), AUC (1.00), FPR (0.007), and accuracy (98.40%). In Figure 9, the performance of CSVM is confirmed by the true positive rates (TPR). Compared with the other classification techniques, Table 2 shows that the second best accuracy is 98.20%, achieved by MGSVM, while the lowest accuracy is 91.20%, obtained by the EBT classifier. The remaining classifiers LSVM, QSVM, Co-SVM, FKNN, EBT, and DT achieve accuracies of 96.60%, 97.90%, 93.80%, 93.00%, 91.20%, and 91.60%, respectively. Overall, the selection of the best features through the proposed approach yields sufficient accuracy on all listed classifiers.
B. Discussion and Comparison
First, the numerical results presented in Tables 1 and 2 are analyzed. From these tables, it is observed that the selection of the best features through the proposed PSO-GM based method gives better results than the fusion process. The comparison between fusion-based and selection-based accuracy is plotted in Figure 10, which shows that the selection accuracy is almost 3% to 4% higher than that of the fused vector. Furthermore, the visual effects of each step listed in Figure 2 are also shown: Figure 3 illustrates the effects of contrast improvement and the de-correlation formulation. Later, saliency based ulcer segmentation is performed and further refined through morphological operations, as shown in Figure 4. The final mapped output, rather than the whole ulcer image, is passed to the feature extraction step; the purpose is to obtain more relevant features that are dissimilar from normal image features. The overall feature based recognition architecture is shown in Figure 5. Overall, it is observed that recognition based on the selected best features gives better accuracy on CSVM than the other classifiers. Moreover, the results in Tables 1 and 2 show that the accuracy of the fused vector is lower than that of the selected vector.
In addition, we compare the feature fusion and selection results with a few deep learning models. In this comparison, we extract features from the original models and perform classification without any selection approach. The results are plotted in Figure 11, which shows that the feature selection process gives improved performance compared to the other deep learning models. This figure also justifies the choice of the pre-trained VGG16 model for deep feature extraction from WCE images.
Comparison of proposed results with existing deep learning models for WCE images.
The visualization of the selected deep features is illustrated in Figure 12. Finally, we compare the proposed fusion and selection results with a few recent techniques to further validate the performance of the proposed architecture. In [20], the authors achieved an accuracy of 97.89%. Liaqat et al. [14] attained an accuracy of 98.49% on the same dataset, while our method achieved an accuracy of 98.40% on an updated dataset containing more images than that of [14]. These results clearly illustrate the achievement of the proposed scheme.
Conclusion
A fully automated CAD system is proposed in this work for stomach infection diagnosis and classification using deep learning. In the proposed design, an ulcer is first detected through a saliency-based method, and then deep learning and GLDM features are extracted. An array based approach is employed to fuse these features, and the resultant vector is optimized using the PSO-GM evolutionary approach. The selected features are classified using a multi-class Cubic SVM, achieving an accuracy of 98.40%. Based on the results, it is concluded that detecting the ulcer region before feature extraction gives more discriminative features than computing features directly from the original images. However, it is also noted that this process increases the system's computational time, which is a key limitation of this work. Besides, it is concluded that reducing the fused features through a meta-heuristic approach returns more informative features, which helps achieve better recognition performance. The main limitations of this work are: (i) incorrect segmentation of ulcer regions leads to false training of the deep learning model and to the extraction of irrelevant features; (ii) feature selection using evolutionary techniques consumes more time than heuristic techniques. In future studies, our focus will be on minimizing the computational time.