Deep Convolution Neural Network for Big Data Medical Image Classification

Deep learning is one of the most remarkable machine learning techniques and is used in many applications such as image classification, image analysis, clinical archiving, and object recognition. With the extensive use of digital images as information in hospitals, archives of medical images are growing exponentially. Digital images play a vital role in predicting the intensity of a patient's disease, and medical images have many applications in diagnosis and investigation. Due to recent developments in imaging technology, automatic classification of medical images remains an open research problem in computer vision. Choosing the most suitable classifier is essential for assigning medical images to their relevant classes, and image classification makes it possible to predict the appropriate class or category of unknown images. The main drawbacks of low-level features are their limited discriminating ability and domain-specific categorization: a semantic gap exists between low-level features, as understood by machines, and the high-level perception of human understanding. In this research, a novel image representation method is proposed in which an algorithm is trained to classify medical images using deep learning. A pre-trained deep convolutional neural network is applied with fine-tuning of its last three layers. The experimental results show that our method is well suited to classifying medical images of various body organs. The approach can therefore generalize to other medical classification applications, supporting radiologists' efforts to improve diagnosis.


I. INTRODUCTION
Due to the increase in digital devices and advances in camera technology, the production of medical images has grown exponentially. Modern hospitals now use digital images to predict the intensity of a patient's disease. With this rapid growth, classifying large numbers of medical images has become a significant challenge. Classification methods are therefore required to assign each medical image to its most relevant class according to its similarity. In the medical image classification domain, there are images of different body organs, such as computed tomography (CT) scans, X-rays (electromagnetic waves), and positron emission tomography (PET) scans, an imaging test that helps reveal how the organs and tissues of the body are functioning. Magnetic resonance imaging (MRI) is another kind of scan that produces complete information and clearly defined pictures of parts of the body, including the brain. Due to the large volume of medical images, it is nearly impossible for a doctor or physician to classify them manually. Beyond the categories discussed above, there are images of other body parts, such as the chest, knee, bladder, eye, breast, and colon; examples of this scenario are shown in Figure 1.
(The associate editor coordinating the review of this manuscript and approving it for publication was Liangxiu Han.)
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In recent years, handcrafted feature representations have been widely used in the medical domain, and many techniques based on the fusion of low-level features have been reported. Low-level features were obtained and evaluated as single features, texture features, or colour-based features, and multiple features were combined into more powerful image representations. Classification performance therefore depends entirely on the feature representation derived from the images. These features are important for classification, but handcrafted features require awareness of prior knowledge. Creating handcrafted features is an arduous task and needs labelled data for training and tuning. A data-driven method for image representation and classification would be more robust to the diversity of image modalities and diseases, as it is less susceptible to human, domain-specific subjectivity.
In the work of [1], the authors used a boundary-weighted adaptive neural network for classification. The suggested method helped to reduce the ambiguity between the prostate and other structures and showed higher accuracy for prostate segmentation. However, the method was evaluated on small datasets, which limits its applicability. Qikui et al. [2] suggested particle swarm optimization with shape features for segmenting the outer and inner boundaries of the bladder wall; to separate weak and strong boundaries, the authors used gradient information. A segmentation method was also proposed to extract intra- and inter-slice data of the prostate using bidirectional convolutional recurrent layers [3].
The main objective of medical image classification research is to accurately classify medical images of different body organs into their relevant classes; training-testing methods are therefore required to solve such problems. For classification, the representation of images plays an important role, and feature representation is a crucial step in the medical domain. Our method is trained using a set of training images and features, while the output is reported based on values obtained from the test dataset. To represent the data, low-level visual features such as colour, shape, spatial layout, and texture are commonly used to encode images as feature vectors [4], [5]. Most low-level features are domain-specific, which results in a semantic gap between high-level visual concepts and their low-level representation. In the last two decades, significant research has been conducted to reduce this semantic gap, a major requirement for developing a reliable system closer to the human visual system. Machine learning techniques are applied to these low-level image visuals to perform image classification.
In this regard, research in computer vision and image classification has shifted to deep learning techniques, which are considered more reliable than traditional machine learning approaches for enhancing the performance of image classification and retrieval [6]. The working principle of the deep neural network is inspired by the human brain. The deep neural network architecture consists of multiple layers of neurons, and information is passed through these complex networks, which work like the human brain, with neurons as the processing units. Different deep learning techniques are applied in various applications for image and video classification [7], [8], such as visual tracking [9], speech recognition [10], and natural language processing [11]. Inspired by the efficient and reliable performance of deep learning frameworks, a Deep Convolutional Neural Network (DCNN) is deployed for image feature extraction in the medical domain as the applied area of this research. The main theme of this paper is to classify 2D images of a medical image dataset. The dataset used in this research consists of different classes, with labels assigned according to the image class. As the first step, pre-processing is applied to resize the images to 224 × 224 pixels, and different operations are performed to remove image noise. After pre-processing, all images are of the same size. The images are then divided into different classes depending on the body part or organ, as shown in Figure 1. The task is therefore to classify each image into its relevant class. The DCNN is applied to learn the features and the classifier from the given training data in an end-to-end learning mechanism; the DCNN framework then classifies the images into their relevant classes.
The main contributions of this research are as follows:
I. A dataset of different body organs is developed using multiple free online databases of medical images.
II. A novel deep-CNN algorithm is developed using a training-testing method for the medical image classification problem.
III. Images given as input to the classifier are accurately classified and labelled with the relevant class name.
IV. The features learned from the training images are further used to present a highly efficient medical image classification system based on a huge collection of medical data.
The remainder of this paper is organized as follows: Section II presents the literature review regarding deep learning, Section III describes the proposed method, Section IV presents the experimental results, and Section V concludes the paper.

II. LITERATURE REVIEW
In the past decades, two major approaches have commonly been used for image classification. In the first, features are extracted and traditional machine learning methods are applied for classification. In the second, features are extracted automatically by a deep CNN through its hidden layers [12], [13]. Traditional approaches are domain-specific, and their performance degrades if the domain changes or the number of classes increases [14]. Methods based on low-level features such as colour, texture, shape, and spatial layout are common examples of these approaches. In the past, significant research was performed using low-level visual features and traditional machine learning [14]. The authors of [15] applied the Bag of Visual Words (BOVW) method using Local Binary Patterns (LBP) and the Scale Invariant Feature Transform (SIFT) to classify medical images, and compared the fusion of LBP and SIFT features with traditional feature extraction and machine learning approaches.
Recent research trends for medical image classification have shifted to DCNNs [16]. The authors of [17] introduced a method based on hyper-dimensional computing that uses different binary vectors to represent objects. According to the work in [18], a high-dimensional feature vector based on a binary representation, despite the additional computation it requires, is beneficial for classification-oriented image representation.
In the work of [19], the authors proposed extracting handcrafted features to address the problem of complicated textures and their intensity distributions. They described resolving the segmentation problem with a local smoothness regularization technique that does not depend on morphological processing. In the work of [20], the authors proposed a clean and effective feature-fusion adversarial learning network to mine useful features and relieve over-fitting. First, they train a fully convolutional autoencoder network with unsupervised learning to mine useful feature maps from liver lesion data; second, these feature maps are transferred to an adversarial SENet network for liver lesion classification.
Supervised learning and unsupervised learning are the common paradigms applied to image classification problems. In supervised learning, the data is represented as D = {(x, y)}, where x is the input feature vector and y is the label. In the training phase, optimized parameter values are computed so that the best prediction ŷ can be made on the test data, with the quality of a prediction measured by a loss function L(y, ŷ). In unsupervised learning, unlabelled data is used for training and testing. Traditional neural networks use transfer functions such as the tangent hyperbolic function, and Multi-Layer Perceptrons (MLPs) are the canonical example of neural networks with multiple layers. The transfer function of a layer P can be expressed as

Z^(P) = tanh(W^(P) Z^(P-1) + b^(P)),

where Z^(P) is the output of network layer P, computed from the previous layer's output Z^(P-1), the weight matrix W^(P), and the bias b^(P). A Neural Network (NN) with many hidden layers is considered a Deep Neural Network (DNN).
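As a concrete illustration of the tangent hyperbolic transfer function of an MLP layer described above, the following NumPy sketch runs a tiny forward pass; the weights are random placeholders, not values from any trained network:

```python
import numpy as np

def mlp_layer(z_prev, W, b):
    """One MLP layer: affine transform followed by the
    tangent hyperbolic transfer function, Z^P = tanh(W Z^{P-1} + b)."""
    return np.tanh(W @ z_prev + b)

# Tiny forward pass through two hidden layers with illustrative weights.
rng = np.random.default_rng(0)
x = rng.normal(size=4)                       # input features
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)
h = mlp_layer(x, W1, b1)                     # hidden layer output
out = mlp_layer(h, W2, b2)                   # final layer output
print(out.shape)  # (3,)
```

Because tanh saturates in (-1, 1), every component of each layer's output stays strictly inside that range, which is the bounded behaviour expected of this transfer function.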
In a DCNN, activation functions determine the output of the network. For classification, the output can be expressed as

ŷ = softmax(θ Z^(P)),

where Z^(P) denotes the output of the last layer of the MLP and θ holds the weights of the output layer linked to the MLP layers. Maximum likelihood with stochastic gradient descent is applied to compute the optimal value of θ: small batches of samples from the dataset are used by stochastic gradient descent to estimate the gradient. The maximum likelihood objective minimizes the negative log likelihood of the target labels,

J(θ) = − Σ_i log p(y_i | x_i; θ).

According to [21], one of the main problems when using a DCNN is the high computational cost and the requirement of a large-scale image dataset for learning. A DNN with an optimal parameter setting and enough training samples can outperform existing handcrafted features. Figure 2 (a-e) represents different configurations of DNNs with their respective names/abbreviations.
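The maximum-likelihood training described above can be sketched in NumPy: a soft-max output, the negative log likelihood of one sample, and a stochastic-gradient update. The data, dimensions, and learning rate are illustrative only:

```python
import numpy as np

def softmax(z):
    z = z - z.max()                 # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum()

def nll(theta, x, y):
    """Negative log likelihood of the true class y for one sample x."""
    p = softmax(theta @ x)
    return -np.log(p[y])

def sgd_step(theta, x, y, lr=0.1):
    """One stochastic gradient descent update on the NLL; for the
    soft-max the gradient of the logits is (p - onehot(y))."""
    p = softmax(theta @ x)
    p[y] -= 1.0
    return theta - lr * np.outer(p, x)

theta = np.zeros((3, 4))            # 3 classes, 4 input features
x, y = np.array([1.0, 0.0, -1.0, 0.5]), 2
before = nll(theta, x, y)           # log(3) at the zero initialisation
for _ in range(50):
    theta = sgd_step(theta, x, y)
after = nll(theta, x, y)
print(before > after)  # True: the loss decreases under SGD
```

At the zero initialisation the predicted distribution is uniform, so the initial loss equals log(number of classes); repeated SGD steps on this convex objective drive it down.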

A. CONVOLUTIONAL NEURAL NETWORK (CNN)
In a CNN, the weight values of the network are set such that all operations on the images are performed by the network itself. For this reason, the framework does not require additional detectors when an object undergoes a spatial change within the image, and the number of parameters of the learning network is also reduced. Figure 2 represents a one-dimensional CNN with all network layers using different kernels. A convolutional layer with weights W and biases B can be expressed as

Y = f(W ∗ X + B),

where ∗ denotes convolution and f is the activation function. The feature vector is generated from these weights and biases, and the process is repeated in the same manner for each layer of the network. The CNN architecture differs from the MLP through the integration of pooling layers, which aggregate neighbouring pixel values with an invariant function such as the mean or the maximum. This scheme increases translation invariance and enlarges the receptive field of the convolutional layers. The last layer of a CNN is fully connected and maps to the output weight values. As in an MLP, a soft-max function activated in the last layer of the network is used for the classification problem.
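A minimal NumPy sketch of the one-dimensional case discussed above: a convolutional layer computed from weights and a bias, followed by max pooling for translation invariance. The kernel is an arbitrary edge-detecting example, not taken from any trained network:

```python
import numpy as np

def conv1d(x, w, b):
    """Valid 1-D convolution Y[i] = f(sum_j w[j] x[i+j] + b), with ReLU as f."""
    k = len(w)
    out = np.array([np.dot(x[i:i + k], w) + b for i in range(len(x) - k + 1)])
    return np.maximum(out, 0.0)      # ReLU activation

def max_pool1d(x, size=2):
    """Non-overlapping max pooling: keeps the largest response per window,
    the invariant aggregation that gives translation invariance."""
    n = len(x) // size
    return x[:n * size].reshape(n, size).max(axis=1)

signal = np.array([0.0, 1.0, 2.0, 1.0, 0.0, -1.0, 0.0, 1.0])
kernel = np.array([1.0, -1.0])       # simple difference (edge) kernel
feat = conv1d(signal, kernel, b=0.0)
pooled = max_pool1d(feat)
print(len(feat), len(pooled))  # 7 3
```

Note how the valid convolution shrinks the signal from 8 to 7 samples and pooling then halves it, matching the layer-by-layer size reduction described for CNNs.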

1) MIC USING DEEP CNN
AlexNet and LeNet [22] are considered popular architectures for deep CNN classification. Both are considered lightweight CNNs, with five and two convolutional layers respectively and different kernels for input and output. In AlexNet, hyperbolic functions are not used for activation; rectified linear units are used in the CNN layers instead. Later on, deeper networks with more hidden layers were introduced for image classification and analysis [23]. In these architectures, smaller kernels with corresponding parameters are used instead of larger ones. For this reason, these networks require less memory for storage and inference, and some of them can be used on devices with limited memory and computational power, such as hand-held devices. According to [23]-[25], VGG19 is an example of a deep network with 19 layers and small kernels in all layers. An efficient deep network is one that requires less training time and contains a reduced number of parameters. Christian Szegedy et al. proposed a 22-layer deep network named GoogleNet, also known as Inception [26].
Wei Shen et al. proposed a method that relies on convolutional layers of different sizes, with smaller kernels and fewer parameters, to classify multiple classes. In the work of [27], the ResNet architecture won the ImageNet challenge using ResNet blocks of different sizes; the learning of each layer depends on residual blocks that use the identity function rule. According to the literature [28]-[30], AlexNet and VGG are used in various medical image classification applications and have shown state-of-the-art classification results. The selection of an appropriate network for a specific medical image classification application is still an open research area. In deep network architectures, different layers of the network are joined at the terminating layers, and multiple tasks are divided among layers using a dual-pathway architecture [31]. The context of a digital image provides an important clue for category-wise classification. In deep networks, the number of patches increases with the available information about image context, and this operation requires more memory. Recent research in medical image classification has focused on enhancing classification accuracy using architectures with more hidden layers [31]-[33]. Some medical image processing techniques require 3D data, and computational complexity increases when a deep network deals with 3D images; a 3D image is processed in the network as slices that carry information through the layers of the deep network [34].

B. APPLICATIONS OF DEEP-LEARNING WITH MEDICAL IMAGES
Deep learning is widely used in the medical domain; the following subsections review its applications and the popular deep CNN architectures used for classification.

1) TEXT IMAGE/CLASSIFICATION
The first main contribution to the field of medical image analysis concerns the shape of the image. Depending on the application, single or multiple images are taken as input and the output is the diagnosis. In this setting, smaller datasets are used during training iterations, while in general computer vision a large amount of data, with millions of samples, is used. Deep convolutional neural networks are also trained in transfer learning mode, in which a pre-trained deep network is applied to a new set of images to perform classification tasks. There are two main types of such learning: 1) use of the pre-trained deep convolutional network as a feature extractor, and 2) fine-tuning of a trained deep network for the image classification problem. The key advantage of transfer learning is the low training cost of the network, as the network is pre-trained and only requires new samples according to the domain requirement. According to the literature [35], [36], feature extraction with fine-tuning can enhance the classification accuracy of a deep network. The authors of [35] reported results for a multi-class knee image dataset, while in [36] results are reported for cytopathology images in a medical image classification problem. According to the work of [30], a pre-trained GoogleNet v3 with fine-tuning can outperform other deep networks for medical image classification. Stacked autoencoders (SAEs) and Restricted Boltzmann Machines (RBMs) are examples of unsupervised techniques that were initially used for medical image datasets.
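The two transfer learning modes above, feature extraction with a frozen network versus fine-tuning, can be illustrated with a toy NumPy sketch in which random frozen weights stand in for a pre-trained network and only a new soft-max head is trained. All data, dimensions, and the learning rate are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical frozen "pre-trained" extractor: random weights stand in for
# weights learned on a large generic dataset; they are never updated.
W_frozen = rng.normal(size=(16, 8))

def extract_features(x):
    return np.maximum(W_frozen @ x, 0.0)   # fixed ReLU features

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy labelled data standing in for the new target dataset; labels are
# constructed to be learnable from the frozen features.
n_classes = 3
X = rng.normal(size=(30, 8))
y = np.array([int(np.argmax(extract_features(xi)[:n_classes])) for xi in X])

def mean_nll(W):
    return float(np.mean([-np.log(softmax(W @ extract_features(xi))[yi] + 1e-12)
                          for xi, yi in zip(X, y)]))

# Train only the new classification head with SGD (cheap: the body is frozen).
W_head = np.zeros((n_classes, 16))
loss_before = mean_nll(W_head)             # equals log(3) at zero init
for _ in range(20):
    for xi, yi in zip(X, y):
        f = extract_features(xi)
        p = softmax(W_head @ f)
        p[yi] -= 1.0                       # gradient of the NLL w.r.t. logits
        W_head -= 0.1 * np.outer(p, f)
loss_after = mean_nll(W_head)
print(loss_after < loss_before)
```

Only the head's parameters move, which is why transfer learning needs far fewer target-domain samples and far less compute than training the whole network; full fine-tuning would additionally update `W_frozen` with a small learning rate.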
Medical image classification research has been reported for neuroimaging: procedures based on DBNs and SAEs have been used with Magnetic Resonance Imaging (MRI) for patients suffering from Alzheimer's disease [37]-[40]. Recent trends in medical image classification have shifted to the use of CNNs [41], which have shown good results in many cases, including MRI and Computed Tomography (CT). 3D image slices have been used as input for diagnosing Alzheimer's disease, which damages the human brain, and MRI images are used with CNN frameworks for image classification problems [42], [43]. To enhance classification performance, new layers are added to the deep network architecture so that the network can extract features using edge-to-edge, edge-to-node, and node-to-graph layers. From the above discussion it can be concluded that CNNs can be explored further for image classification in the medical domain [44], [45].

2) OBJECT AND LESION CLASSIFICATION
For object classification, a medical image can be divided into two or more classes, for example brain image classification on CT-scan images. Local and global contextual information is extracted from the images to identify the position of a lesion, which is beneficial for image classification; this approach remains under-explored with deep learning algorithms. Many researchers have studied multi-scale analysis using multi-stream algorithms. Shen et al. [46] proposed a deep CNN that fuses patches from the first to the last layers of the feature hierarchy. Kawahara and Hamarneh [47] increased the number of CNN layers and addressed the problem of differing image sizes and resolutions. In [48], a hybrid approach combining CNNs and RNNs is applied to classify multi-level nuclear images; the combination is beneficial because contextual information from images of different sizes can be extracted through this fusion.
For medical image classification problems, 3D data must be supplied at the input of the CNN so that more information can be derived and higher output efficiency obtained. Different custom architectures have been proposed in the literature to process 3D data with CNNs [49]. In [49], a multi-level CNN architecture is applied to obtain the points of interest in chest CT-scan images; the researchers used nine patches in the convolutional layers and joined them in the fully connected layer just before the output layer. Research in the medical image classification domain has also been reported using RBMs [50], [51], SAEs [51], and CSAEs [52]; the main difference between these techniques lies in the unsupervised pre-training of the autoencoders. In [53], a hybrid approach is proposed based on both supervised and unsupervised feature training without the use of handcrafted features; the experimental results presented there further validate that the MIL framework outperformed handcrafted features. According to the literature, medical image annotation is still a challenging research task for the computer vision community. From the above discussion, it can be concluded that the use of a pre-trained CNN is a recommended framework for classification.

III. PROPOSED METHOD

A. DATASET
In the proposed method, we selected a publicly available dataset consisting of images of different classes of human body parts, such as chest, breast, etc. There are 12 classes in total; 11 classes were collected from a public dataset of cancer image archives, while the 12th class was collected from Messidor, an open-access website of knee images. Each class contains 300 images, so the whole dataset comprises 3600 images across 12 different classes. We used a training-testing framework and selected a random 70/30 percent train-test split; therefore, 2520 images are used for training and 1080 for testing, with no image shared between the training and testing processes. In the first step, all images were converted from DICOM (Digital Imaging and Communication in Medicine) to JPG format. Increased intensity variation can produce a more complex feature matrix for the neural network to classify accurately; to handle this issue, intensity normalization was applied before feeding the data to the convolutional neural network.
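The split arithmetic above can be checked directly:

```python
# Per-class and overall split used above: 12 classes x 300 images each,
# with a 70/30 train-test ratio and no image shared between the splits.
n_classes, per_class = 12, 300
total = n_classes * per_class
n_train = int(total * 0.70)
n_test = total - n_train
print(total, n_train, n_test)  # 3600 2520 1080
```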

B. PRE-PROCESSING
Deep learning, widely used in many applications nowadays, is based on artificial neural networks modelled on the human brain; feed-forward neural networks comprising many hidden layers are good examples of models with deep architecture. All images are resized so that they share the same physical scale, that is, the physical space represented by each pixel. After normalization, each image was converted to 224 × 224 pixels, and this 224 × 224 input is fed to the first convolutional layer. The first convolutional layer uses a 4 × 4 kernel with same padding, a stride of 1, and 8 filters, followed by the ReLU activation function. All images are resized to 224 × 224 pixels so that they can be used as input for the GoogleNet method, as shown in Figure 3.
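A simplified sketch of this pre-processing stage, assuming min-max intensity normalization and a nearest-neighbour resize as stand-ins for the actual operations used in the pipeline:

```python
import numpy as np

def normalize_intensity(img):
    """Min-max intensity normalisation to [0, 1], applied before resizing
    (an assumed form of the normalization described in the text)."""
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo) if hi > lo else np.zeros_like(img, dtype=float)

def resize_nearest(img, size=224):
    """Nearest-neighbour resize to size x size; a stand-in for the
    interpolation a real pipeline would use."""
    h, w = img.shape
    rows = np.arange(size) * h // size     # source row for each output row
    cols = np.arange(size) * w // size     # source column for each output column
    return img[np.ix_(rows, cols)]

img = np.arange(300 * 400, dtype=float).reshape(300, 400)  # fake pixel data
out = resize_nearest(normalize_intensity(img))
print(out.shape)  # (224, 224)
```

After this step every image is a 224 × 224 array with intensities in [0, 1], which is the uniform input shape the first convolutional layer expects.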

C. REUSE PRE-TRAINED NETWORK
Typically, DCNN image classification techniques consist of two phases: feature extraction and a classification module. Features are extracted from the training images, after which a soft-max-layer classifier is learned from the training image data in an end-to-end learning framework. The model used for training consists of multiple layers, of which five are convolutional layers and three are fully connected layers, as depicted in Figure 4. In contrast to handcrafted features, the deep learning algorithm learns low-level, mid-level, and abstract features directly from the images. A training set of images is used as input to the pre-trained GoogleNet method to extract deep features. In our case, we used 144 layers, including convolutional and fully connected layers, applied to a set of images of different modalities. We transferred the features of the GoogleNet method by applying both training and validation on our medical image dataset. The purpose of the soft-max function is to perform re-learning according to the 12 classes of the dataset. The learning rates used to start training the method's layers range from 0.01 to 0.0001 with stochastic gradient descent. GoogleNet is already trained for 1000 different semantic classes using features from a large-scale dataset. Transfer learning is a deep network technique through which we can re-train a network for new labels by fine-tuning its parameters; in the fine-tuning process, features are extracted from the given set of images. A max-pooling layer is applied to reduce the size of the input, using a 4 × 4 kernel, same padding, a stride of 1, and 8 filters; the output of this pooling layer is 112 × 112. The distance between the centres of neighbouring neurons in the kernel map is called the stride. The output of the first convolutional layer is fed to a nonlinearity, followed by the spatial max-pooling layer that summarizes neighbouring neurons.
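The spatial dimensions quoted above follow the standard output-size rule for convolution and pooling, sketched below; note that reducing 224 to 112 implies an effective pooling stride of 2, whatever the window size:

```python
def out_size(n, kernel, stride, pad_total=0):
    """Standard output-size rule for convolution/pooling layers:
    floor((n + pad_total - kernel) / stride) + 1."""
    return (n + pad_total - kernel) // stride + 1

# 4x4 kernel, stride 1, "same" padding (total padding = kernel - 1)
# preserves the 224-pixel width of the convolutional layer's output.
print(out_size(224, kernel=4, stride=1, pad_total=3))  # 224

# Halving 224 -> 112 requires a pooling stride of 2, e.g. a 2x2 window:
print(out_size(224, kernel=2, stride=2))               # 112
```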
A rectified linear unit (ReLU) provides the nonlinearity at the output of all fully connected layers. The training and validation results of the proposed DCNN model are shown in Figure 11.
For the pre-trained GoogleNet method, images are used as input and the last three layers, loss3-classifier, prob, and output, are replaced, with the aim of re-joining these layers with the rest of the network. In our case, we added a fully connected layer, a soft-max layer, and a classification output layer to the pre-trained GoogleNet method. The soft-max function maps an N-dimensional vector to values in the range (0, 1) that sum to 1, representing the probability of each class. The last fully connected layer has the same size as the number of classes in our dataset, which is 12. To enhance the learning process of GoogleNet, we increased the learning-rate factors of the fully connected layer when connecting the transferred layers with the remaining network, as shown in Figure 5. For the learning process, the stochastic gradient descent method is used with a low learning rate of 0.0001 for a maximum of 30 epochs. Stochastic gradient descent minimizes an objective function known as the negative log likelihood and is very efficient for learning discriminative linear classifiers under a convex loss function, as in SVM or logistic regression; this function is widely used for classification problems. In this way, the DCNN model is optimized and trained for classifying the medical images.

IV. EXPERIMENTAL RESULTS
In this manuscript, a popular and widely used deep learning technique has been used for developing and training the proposed deep convolutional neural network for classifying medical images. We performed image pre-processing as the first step and later performed fine-tuning to enhance classification accuracy with reduced training time; artifacts in the images are removed during pre-processing. Table 1 presents a short summary of the numbers of medical images used for training and validation. The training time in our experiments is somewhat high because we use a CPU-based framework, as shown in Figure 6. We used two epochs, and in each epoch the network performs 252 iterations; the method therefore takes 504 iterations to complete the training process. The validation process reflects the efficiency and accuracy of our proposed method.
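The iteration counts above imply a mini-batch size that can be recovered with simple arithmetic (the batch size itself is inferred here, not stated in the text):

```python
# 2520 training images and 252 iterations per epoch, as reported above,
# imply a (hypothetical, inferred) mini-batch size of 10.
n_train, iters_per_epoch, epochs = 2520, 252, 2
batch_size = n_train // iters_per_epoch
total_iters = iters_per_epoch * epochs
print(batch_size, total_iters)  # 10 504
```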
In the training phase of transfer learning mode, we used parameters that avoid over-fitting of the network, as the classification accuracy of a deep network decreases due to over-fitting [54]. Techniques such as re-scaling and rotation can be used to retrain a deep network; in our case, we used cropping only, in order to avoid over-fitting. In the training phase, a regularization layer was used within the first three layers to avoid the over-fitting problem. The experimental results show that the proposed method outperformed state-of-the-art methods in terms of classification accuracy, as shown in Figure 8. Table 3 presents a detailed comparison of the proposed method with state-of-the-art methods. Figure 7 shows the accuracy of the proposed method compared with state-of-the-art methods. Figure 9 shows the performance of the proposed method in terms of precision against other state-of-the-art methods. Figure 10 shows class-wise classification accuracy results obtained using the 70-30 training-testing ratio. Figures 12 and 13 show that the proposed method achieves the highest accuracy among the compared methods. Table 2 shows the class-wise classification accuracy values: we achieved 100% accuracy for the prostate class, 99.9% for the brain, 99.8% for the chest, breast, and pancreas, 99.6% for soft tissues, and 99.4% for the esophagus class. All of these results show that the proposed deep convolutional neural network method is more reliable and efficient than the state-of-the-art methods in terms of classification accuracy.
MATLAB is used to implement the proposed method with the following hardware and software specification: Windows 7, Intel Core i5, 1.60 GHz-2.30 GHz, with 4 GB RAM. The training process took 682 minutes and 10 seconds. Table 3 shows this comparison in detail.

V. CONCLUSION AND FUTURE WORK
A deep learning based framework for medical image classification, trained directly on the images, is proposed. Diagnosis is one of the main requirements of the present era, in which specific diseases must be investigated and examined. The use of computer-aided tools and reliable image analysis are the main factors that can increase the efficiency of doctors and physicians. It is a requirement of the current era to develop image processing methods that can help doctors in various fields of medical science; such methods can help save human lives, as diseases can be predicted before they affect the human body. For the last few decades, the computer vision research community has been trying to reduce this gap by developing automated systems that process medical images and support decision-making. We have proposed a novel deep convolutional network-based approach that assists doctors and physicians in making reasonable decisions. The results obtained from the proposed method outperform the state-of-the-art methods reported for the same dataset. In the future, we aim to explore large-scale image datasets for medical image classification and detection problems.

MUHAMMAD ASIF HABIB received the Ph.D. degree from JKU, Linz, Austria. He is currently an Associate Professor with the Department of Computer Science, National Textile University, Faisalabad, Pakistan. His research interests include information network security, authorization, role-based access control, the IoT, cloud and grid computing, association rule mining, recommender systems, wireless sensor networks, blockchain, and vehicular networks. He serves as a Technical Reviewer for top-tier journals and conferences.