Digital Diagnosis of Hand, Foot, and Mouth Disease Using Hybrid Deep Neural Networks

Hand, Foot and Mouth Disease (HFMD) is a highly contagious paediatric disease with symptoms such as fever, diarrhoea, oral ulcers, and rashes on the hands and feet and in the mouth. The disease has become epidemic, with several outbreaks in many Asian-Pacific countries and a basic reproduction number $R_{0} > 1$. Diagnosing HFMD is challenging because its lesion pattern can look quite similar to that of other diseases such as herpangina, aseptic meningitis, and poliomyelitis. Therefore, clinical symptoms, alongside the pattern and position of the skin lesions, are essential for a precise diagnosis. A deep learning-based HFMD detection system can play a significant role in the digital diagnosis of this disease. Various machine learning and deep learning architectures have been proposed for skin disease diagnosis and classification. However, these models are limited to the image classification problem, and diagnosing similar-appearing skin diseases from images alone may result in misclassification or misdiagnosis. Integrating clinical symptoms and images in parallel can improve diagnosis and classification performance, yet no deep learning architecture has been developed to diagnose HFMD from both images and clinical data. This paper proposes a novel Hybrid Deep Neural Networks architecture that integrates a Multi-Layer Perceptron (MLP) network and a Convolutional Neural Network (CNN) into a single framework to diagnose HFMD using integrated features from clinical and image data. The proposed Hybrid Deep Neural Networks is a multi-branched model comprising an MLP network in the first branch to extract clinical features and a modified pre-trained CNN architecture, MobileNet or NasNetMobile, in the second branch to extract features from skin lesion images.
The features learnt by both branches are merged into an integrated feature vector from the clinical data and images, which is fed to a subsequent classification network. We conducted several experiments using image data only, clinical data only, and both sources of data, comparing the performance of a typical MLP model and typical CNN models with our proposed Hybrid Deep Neural Networks. The novel approach improves on both the existing image classification models and the clinical symptom-based classification model (the MLP). In cross-validated experiments, the proposed Hybrid Deep Neural Networks diagnosed the disease with 99%-100% accuracy.

symptoms, like fever, diarrhoea, vomiting, and sore throat, together with the position of the rash and the patient's age, can improve diagnostic accuracy and robustness. Symptoms play a significant role in the diagnosis and prediction of this disease [15]. Smartphone-based skin disease prediction or detection from integrated clinical symptoms and images is challenging because of smartphones' limited resources and the lack of lightweight ML or DL architectures that can handle integrated (mixed) data. For example, researchers from Google developed a DL architecture for skin disease diagnosis from integrated data [16]; however, their solution is not designed for resource-constrained mobile devices and does not consider HFMD. We propose a lightweight, smartphone-friendly, novel Hybrid Deep Neural Networks architecture that can digitally diagnose HFMD using integrated clinical data and images. It integrates a Multi-Layer Perceptron (MLP) [17] and a modified pre-trained CNN model into a single framework to distinguish HFMD from other skin diseases using clinical data and images simultaneously. The Hybrid Deep Neural Networks is a multi-branched model composed of an MLP as the clinical branch and a modified pre-trained CNN model as the image-processing branch. The MLP extracts features from the clinical data, while the CNN extracts features from the disease images. In particular, we modified the pre-trained models MobileNet [18] and NasNetMobile [19] and used transfer learning to extract features from images. The learnt features are then concatenated to form integrated features from clinical data and images, which are fed to the subsequent classification network [20]. Most previous studies relied on a single source of data; an image classification-based approach, for instance, diagnoses skin diseases from images only.
We ran a set of experiments on the proposed Hybrid Deep Neural Networks, the image classification models, and the MLP. The image classification architectures were MobileNet and NasNetMobile. The proposed architecture used both clinical data and images, whereas the image classification architectures used only the images and the MLP used only the clinical symptoms dataset. The cross-validated evaluation results demonstrate that the proposed Hybrid Deep Neural Networks architecture can diagnose HFMD with accuracy in the range of 99%-100% and very high precision.
The rest of the paper is organised as follows. Section II presents related work on ML- and DL-based skin disease diagnosis. Section III presents the proposed Hybrid Deep Neural Networks-based digital diagnosis of HFMD, covering (i) data collection and preparation, (ii) the proposed model and the selection of a pre-trained model for feature extraction from images, (iii) the model tuning process, and (iv) the evaluation of the proposed architecture. Section IV presents and discusses the evaluation results, and Section V concludes the work.

II. RELATED WORKS
Skin disease detection or diagnosis is a challenging task in image processing and computer vision. Many studies have detected or diagnosed different skin diseases using AI-based image processing, including DL-based image processing. Alamdari et al. [10] applied k-means clustering with HSV-model segmentation, a Support Vector Machine (SVM), and fuzzy c-means clustering to acne classification, achieving accuracies of 70%, 66%, and 80%, respectively. Abdul-Rahman et al. [11] developed a prototype based on a Back Propagation Neural Network to assist dermatologists; using Correlation Feature Selection and Fast Correlation-based Filter feature selection, they reached an accuracy of 91.2%. Sae-Lim et al. [21] classified skin lesions using a CNN based on a customised MobileNet, achieving 83.93% accuracy on the HAM10000 skin cancer dataset. Rimi et al. [12] proposed a CNN architecture to detect six types of skin disease (hand dermatitis, subacute eczema, hand eczema, ulcers, lichen simplex, and stasis dermatitis) with a precision of 70.8%. Aryan et al. [13] performed several experiments combining image processing and recognition techniques to detect HFMD lesions. They found that pre-processing with colour-space conversion, followed by KMeans-Morphological segmentation and an SVM classifier, classified the lesions with the highest accuracy among the combinations tested. Other researchers [22], [23] have classified skin lesion images using traditional machine learning and deep learning to diagnose multiple skin diseases. Hameed et al. [22] proposed an intelligent multi-class multi-level (MCML) classification algorithm that uses both traditional machine learning and deep learning to classify multiple skin diseases, achieving an accuracy of 96.47%.
Hameed et al. [23] used image processing techniques and a Quadratic Support Vector Machine to classify skin lesions with an accuracy of 94.74%. Vakili et al. [24] classified HFMD against other skin diseases using several pre-trained models, including Inception v3, ResNet-34, and ResNet-50; ResNet-50 performed best, with an accuracy of 95.4%. As their experiment was limited to image data, some similar-appearing skin diseases were misdiagnosed. Researchers from Google [16] developed an integrated model to detect six skin diseases from skin images and metadata, using the pre-trained Inception-v4 model to classify images and a feature transformation technique to extract features from the metadata. This model classifies the six skin diseases with accuracies ranging from 69% to 94%. However, it is neither lightweight nor mobile-friendly, and it does not consider HFMD.
In most previous research, image processing techniques, Convolutional Neural Networks, or other classification algorithms have been used to detect and classify skin diseases from images. However, no DL architecture has yet been designed to learn from clinical symptoms and the associated lesion images simultaneously to diagnose HFMD. Figure 1 presents an overview of the proposed smartphone- and Hybrid Deep Neural Networks-based digital diagnosis of HFMD. The proposed architecture is lightweight enough to run on smartphones. The model was trained and validated on a high-performance workstation using HFMD/non-HFMD skin images and clinical symptoms. The trained model is then converted to the lightweight TensorFlow Lite format [25] so that it can be deployed on mobile devices. Skin lesion images and clinical symptoms captured on a smartphone serve as input to the deep learning model deployed in an app, which diagnoses the disease.

III. HYBRID DEEP NEURAL NETWORKS BASED DIGITAL DIAGNOSIS OF HFMD
In the following subsections, we briefly discuss the datasets used, data pre-processing, proposed model, model tuning and evaluation process of the model.

A. DATA COLLECTION AND PRE-PROCESSING
1) DATASET
The most crucial step in deep learning is collecting an appropriate dataset to train and validate the model. Although HFMD is one of the most common diseases in Asian-Pacific countries, datasets of HFMD clinical symptoms and associated images are not readily available. We therefore collected 1455 HFMD lesion images and 1800 skin images covering various diseases other than HFMD from the Internet [26], [27] for this experiment. We also collected clinical data from paediatric doctors for 410 HFMD-infected patients and 645 patients with other skin diseases. The clinical dataset has 13 features: Age, Fever, Sore throat, Diarrhoea, Vomiting, Mouth ulcer, Blister rash, Distressed, Trembling limbs, Staggering, Eyes rolled, Sweating, and Gender.

2) DATA PRE-PROCESSING
Deep learning requires large datasets to achieve high accuracy and avoid overfitting. One of the significant challenges in our experiment was obtaining sufficient HFMD lesion images and clinical data. We addressed this problem by generating data in two steps. First, because the clinical dataset provided by the doctors was significantly smaller than the number of images, we oversampled the clinical dataset to match the number of available images by generating synthetic records from the existing data. We used the Synthetic Minority Oversampling Technique (SMOTE) [28] to oversample both HFMD and non-HFMD cases. The clinical data contain numerical, Boolean, and categorical types. The numerical Age and Fever features were normalised using Min-Max normalisation [29], and the categorical Gender and 'position of rash' features were encoded using one-hot encoding [30].
After generating sufficient clinical data, the next step was to map each clinical record to an image so that both share the same class label. HFMD images were mapped to HFMD-related clinical symptoms; similarly, images of normal skin or non-HFMD diseases were mapped to clinical symptoms that do not appear in HFMD-infected patients. The rash position plays a significant role in diagnosing this disease and in distinguishing it from similar-appearing diseases, so we manually identified and labelled the rash position in each image. Figure 2 illustrates the final dataset prepared for our model, in which each image is associated with a set of clinical symptoms.
After oversampling and pre-processing the clinical data, the next step was to pre-process the images and generate integrated input batches with the corresponding output labels. Keras's ImageDataGenerator API [31] can augment and pre-process images in batches, but it generates input batches from images only, whereas the proposed model takes integrated clinical and image data. We therefore built a custom data generator using Keras's Sequence API that combines the clinical features with the associated images to generate integrated input batches. Within this custom generator, we used the ImageDataGenerator API's augmentation to generate and pre-process images in batches. Augmentation methods such as rotation by up to 40°, horizontal and vertical flipping, shearing, and zooming were applied to increase the number of training and validation images, as shown in Figure 3. Pixel values were then rescaled to the range [0, 1] to improve model performance. Finally, the generator combines each augmented image with its associated clinical symptoms and class label to produce a batch of integrated input for the model.
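The oversampling and encoding steps can be sketched in NumPy as follows. This is a minimal illustration, not the implementation used in this work: `smote`, `min_max`, and `one_hot` are hypothetical helper names, and `smote` is a simplified re-implementation of the interpolation idea behind SMOTE [28] (each new row lies on the segment between a sample and one of its k nearest same-class neighbours) rather than a call to a particular library.

```python
import numpy as np


def smote(X_minority, n_new, k=5, rng=None):
    """Create n_new synthetic rows by SMOTE-style interpolation.

    Each new row lies on the line segment between a randomly chosen sample
    and one of its k nearest neighbours from the same class.
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X_minority, dtype=float)
    # Pairwise Euclidean distances between samples of this class
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    k = min(k, len(X) - 1)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, len(X), size=n_new)                  # random seed samples
    partner = neighbours[base, rng.integers(0, k, size=n_new)]  # random neighbour of each
    gap = rng.random((n_new, 1))                                # position along the segment
    return X[base] + gap * (X[partner] - X[base])


def min_max(x):
    """Scale each column to [0, 1]; constant columns map to 0."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(axis=0), x.max(axis=0)
    return (x - lo) / np.where(hi > lo, hi - lo, 1.0)


def one_hot(values):
    """Return a one-hot matrix and the sorted list of categories."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    out = np.zeros((len(values), len(categories)))
    for row, v in enumerate(values):
        out[row, index[v]] = 1.0
    return out, categories
```

Because plain SMOTE interpolates every column, synthetic values of Boolean symptoms can fall between 0 and 1; rounding them, or using the SMOTE-NC variant intended for mixed nominal and continuous features, is one way to keep such columns binary.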
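Pairing each image with a clinical record of the same class, together with its manually labelled rash position, might look like the sketch below. The function `pair_images_with_records` and its tuple layouts are hypothetical; the sketch assumes each clinical record is used at most once and raises an error when a class runs out of records, which is why the clinical data were first oversampled to match the number of images.

```python
import random


def pair_images_with_records(images, records, seed=None):
    """Pair each image with a distinct clinical record of the same class.

    images:  list of (image_path, rash_position, label)
    records: list of (clinical_features_dict, label)
    Returns a list of (image_path, merged_features_dict, label).
    """
    rng = random.Random(seed)
    pools = {}
    for features, label in records:
        pools.setdefault(label, []).append(features)
    for pool in pools.values():
        rng.shuffle(pool)  # random assignment of records within each class
    next_free = {label: 0 for label in pools}
    pairs = []
    for path, rash_position, label in images:
        pool = pools.get(label, [])
        i = next_free.get(label, 0)
        if i >= len(pool):
            raise ValueError(f"not enough clinical records for class {label!r}")
        # Copy the record and add the manually labelled rash position
        merged = dict(pool[i], rash_position=rash_position)
        pairs.append((path, merged, label))
        next_free[label] = i + 1
    return pairs
```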
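A custom generator of this kind can be sketched as below. It is an illustration, not the generator used in this work: `HybridBatchGenerator` is a hypothetical name, it only follows the `__len__`/`__getitem__` protocol of `keras.utils.Sequence` (it does not subclass it, so it runs without TensorFlow), and its augmentation is limited to random horizontal and vertical flips. Rotation, shearing, and zooming are omitted; in a Keras implementation one could, for example, call `ImageDataGenerator.random_transform` on each image at the marked step. The label coding (0 = non-HFMD, 1 = HFMD) is also an assumption.

```python
import math

import numpy as np


class HybridBatchGenerator:
    """Serve ([image_batch, clinical_batch], one_hot_labels) batches.

    Follows the __len__/__getitem__ protocol of keras.utils.Sequence but does
    not subclass it, so this sketch runs without TensorFlow installed.
    """

    def __init__(self, images, clinical, labels, batch_size=32, augment=False, seed=None):
        self.images = np.asarray(images)                       # (N, H, W, 3) pixels in 0..255
        self.clinical = np.asarray(clinical, dtype="float32")  # (N, F) encoded symptoms
        self.labels = np.asarray(labels, dtype=int)            # (N,) 0 = non-HFMD, 1 = HFMD
        self.batch_size = batch_size
        self.augment = augment
        self.rng = np.random.default_rng(seed)

    def __len__(self):
        # Number of batches per epoch; the last batch may be smaller.
        return math.ceil(len(self.images) / self.batch_size)

    def __getitem__(self, idx):
        batch = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        x_img = self.images[batch].astype("float32") / 255.0  # rescale pixels to [0, 1]
        if self.augment:
            # Per-image augmentation step (flips only in this sketch)
            for i in range(len(x_img)):
                if self.rng.random() < 0.5:
                    x_img[i] = x_img[i, :, ::-1].copy()  # horizontal flip (width axis)
                if self.rng.random() < 0.5:
                    x_img[i] = x_img[i, ::-1].copy()     # vertical flip (height axis)
        y = np.eye(2, dtype="float32")[self.labels[batch]]    # one-hot class labels
        return [x_img, self.clinical[batch]], y
```

If the class subclassed `keras.utils.Sequence`, an instance could be passed directly to `model.fit` for a two-input model, since TensorFlow 2.x Keras accepts Sequence objects as training data.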

B. PROPOSED MODEL
In this paper, we propose a hybrid deep neural networks architecture to diagnose HFMD from clinical and image data. The proposed architecture is a multi-branched model with two input branches: (1) a clinical branch (MLP) and (2) an image-processing branch (see Figure 4). The clinical data are input to the MLP network (clinical branch; see section III-C), while the images are fed into the image-processing branch (see section III-D), which is built on a Convolutional Neural Network (CNN). We employed customised pre-trained CNN models, MobileNet [18] and NasNetMobile [19], in the image-processing branch. The clinical and image-processing branches extract features from the clinical and image data, respectively. To combine the learnt features, the last layers of both branches are joined in a concatenation layer using the Keras functional API. A classification network of two dense layers, with 4 and 2 neurons respectively, is added on top of the concatenation layer, so the final output layer has two neurons to classify HFMD and non-HFMD samples. The novelty of the proposed architecture lies in its multi-branched, lightweight, and mobile-friendly design for diagnosing HFMD from clinical and image data.
Let $C$ be the clinical input to the MLP network and $D$ the image input to the pre-trained CNN. The mappings from these inputs to the features learnt by the MLP and CNN branches are given by equations 1 and 2, respectively:

$$T_m = f_{\theta}(C) \tag{1}$$

$$T_c = g_{\theta}(D) \tag{2}$$

where $T_m$ and $T_c$ are the features learnt by the MLP and CNN networks respectively, $f$ and $g$ represent the MLP and CNN networks, and $\theta$ represents the model weights.
The integrated feature $Z$, obtained by concatenating the learnt features, is given by equation 3:

$$Z = \mathrm{Conc}(T_m, T_c) \tag{3}$$

where $\mathrm{Conc}$ denotes feature-wise concatenation. After concatenation, the integrated features are the input to the subsequent classification network $h_{\phi}$, where $\phi$ represents the weights of the classification network. The classification label $Y$ is obtained by equation 4:

$$Y = h_{\phi}(Z) \tag{4}$$

To summarise, the Hybrid Deep Neural Networks architecture combines the MLP function ($f_{\theta}$), the CNN function ($g_{\theta}$), the concatenation layer, and the classification network ($h_{\phi}$), which together form the function $F$. The output of the proposed model for clinical input $C$ and image input $D$ is therefore given by equation 5:

$$Y = F(C, D) = h_{\phi}\big(\mathrm{Conc}(f_{\theta}(C), g_{\theta}(D))\big) \tag{5}$$

The model weights $\theta$ and $\phi$ are optimised during training using the Adam optimiser and the categorical cross-entropy loss function.
clinical data. We selected the MobileNet and NasNetMobile pre-trained models to extract features from images in our proposed model because they are lightweight and efficient for mobile applications [33], [34].
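To make the composition $Y = h_{\phi}(\mathrm{Conc}(f_{\theta}(C), g_{\theta}(D)))$ concrete, the following NumPy sketch wires up the forward pass and the categorical cross-entropy loss. It is an illustration under stated assumptions, not the trained model: the input widths (16 clinical features, 1024 pooled CNN features) are hypothetical, the weights are random, and `g` receives pre-computed CNN features in place of running MobileNet or NasNetMobile. The layer widths follow the text (clinical branch 16-8-4, an image head of 50 units, a classifier of 4 then 2 units); ReLU on the hidden layers is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)


def dense(x, w, b, activation=None):
    """Fully connected layer with an optional ReLU or softmax activation."""
    z = x @ w + b
    if activation == "relu":
        return np.maximum(z, 0.0)
    if activation == "softmax":
        z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)
    return z


def init(n_in, n_out):
    """Small random weights and zero biases (hypothetical initialisation)."""
    return rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out)


N_CLINICAL, N_CNN = 16, 1024                               # hypothetical input widths
theta_f = [init(N_CLINICAL, 16), init(16, 8), init(8, 4)]  # clinical branch, 16-8-4
theta_g = init(N_CNN, 50)                                  # Dense(50) on the CNN features
phi = [init(4 + 50, 4), init(4, 2)]                        # classifier: 4 units, then 2


def f(C):
    """Eq. (1): T_m = f_theta(C)."""
    for w, b in theta_f:
        C = dense(C, w, b, "relu")
    return C


def g(D_features):
    """Eq. (2): T_c = g_theta(D), with pre-computed CNN features standing in for D."""
    w, b = theta_g
    return dense(D_features, w, b, "relu")


def h(Z):
    """Eq. (4): Y = h_phi(Z), a softmax over {non-HFMD, HFMD}."""
    (w1, b1), (w2, b2) = phi
    return dense(dense(Z, w1, b1, "relu"), w2, b2, "softmax")


def F(C, D_features):
    """Eq. (5): Y = h_phi(Conc(f_theta(C), g_theta(D)))."""
    Z = np.concatenate([f(C), g(D_features)], axis=1)  # Eq. (3): feature-wise Conc
    return h(Z)


def categorical_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean categorical cross-entropy over a batch of one-hot labels."""
    return float(-np.mean(np.sum(y_true * np.log(y_pred + eps), axis=1)))
```

In a Keras implementation the same wiring would use the functional API, with `Concatenate` joining the two branch outputs; training with Adam would then update both $\theta$ and $\phi$ through this single loss.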

1) MobileNet
We used a modified MobileNet [35] architecture in the second branch of the proposed architecture to extract features from images. MobileNet is built from depthwise separable convolutions, each consisting of a depth-wise convolution followed by a point-wise convolution. The depth-wise convolution applies a single filter to each input channel, and the point-wise convolution then applies a 1 × 1 convolution to combine the depth-wise outputs. Batch normalisation and a Rectified Linear Unit (ReLU) are applied after each convolution. Figure 5 illustrates the MobileNet architecture with its depth-wise and point-wise convolutions. To extract features from images, we modified the pre-trained CNN model by setting the parameter include_top=False, which removes the dense layers that act as the classifier [36]. We then added a dense layer with 50 neurons and a ReLU activation function to transform the image features learnt by the pre-trained model into N × 50-dimensional features, where N is the number of samples.
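As a rough illustration of why this factorisation keeps MobileNet lightweight, the sketch below counts the weights of a standard convolution and of its depth-wise plus point-wise replacement. The layer sizes are hypothetical, and biases and batch-normalisation parameters are ignored. The ratio of the two counts is $1/c_{out} + 1/k^2$, so for a 3 × 3 kernel the reduction approaches 9-fold as $c_{out}$ grows; for the example layer below it is about 7.9-fold.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights of a depth-wise k x k conv (one filter per input channel)
    followed by a 1 x 1 point-wise conv that mixes channels (biases ignored)."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Example layer: 3 x 3 kernel, 32 input channels, 64 output channels
print(standard_conv_params(3, 32, 64))        # 18432
print(depthwise_separable_params(3, 32, 64))  # 2336
```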

2) NasNetMobile
Secondly, we modified the NasNetMobile [37] architecture for our experiment. NASNet, developed by Google Brain, is a scalable CNN architecture built from basic building blocks (cells) whose configuration was found by Neural Architecture Search (NAS) using reinforcement learning. Each cell consists of only a few operations (several convolutions and pooling operations) and is replicated several times according to the required network capacity. The lighter version of this architecture, NasNetMobile, consists of 12 cells with 5.3 million parameters and 564 million multiply-accumulate operations. Figure 6 illustrates the reduced NASNet architecture derived by NAS on CIFAR-10. We relied on transfer learning for both models, using weights pre-trained on standard datasets such as CIFAR-10 and ImageNet. As with MobileNet, we modified the NasNetMobile architecture by excluding the classification layers and adding a dense layer of 50 neurons.
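The text does not state how the backbone's spatial feature maps are reduced before the 50-unit dense layer; the sketch below assumes global average pooling, which Keras can also apply through the `pooling="avg"` argument of the `MobileNet` and `NASNetMobile` application constructors. With 224 × 224 inputs and the classifier top removed, the backbones output 7 × 7 × 1024 (MobileNet) and 7 × 7 × 1056 (NASNetMobile) feature maps; the random arrays in the example only mimic those shapes, and `image_head` is a hypothetical name.

```python
import numpy as np


def image_head(feature_maps, w, b):
    """Global average pooling followed by Dense(50, ReLU).

    feature_maps: (N, H, W, C) backbone output without the classifier top
    w: (C, 50) weights, b: (50,) bias; returns (N, 50) image features T_c.
    """
    pooled = feature_maps.mean(axis=(1, 2))  # average each channel over H and W -> (N, C)
    return np.maximum(pooled @ w + b, 0.0)   # dense layer with ReLU -> (N, 50)


rng = np.random.default_rng(0)
mobilenet_maps = rng.random((2, 7, 7, 1024))   # stand-in for MobileNet output
nasnet_maps = rng.random((2, 7, 7, 1056))      # stand-in for NASNetMobile output
t_c = image_head(mobilenet_maps, rng.normal(size=(1024, 50)), np.zeros(50))
```

Whichever backbone is used, the head yields the same N × 50 feature shape, which is what lets the two variants share the same concatenation layer and classifier.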

E. MULTI-LAYER PERCEPTRON MODEL FOR CLINICAL SYMPTOM-BASED HFMD CLASSIFICATION
To classify the disease solely on the basis of clinical symptoms, we also created a separate Multi-Layer Perceptron (MLP) network [17]. The basic MLP architecture consists of three layers, as shown in Figure 7: an input layer, a hidden layer, and an output layer, although modern MLPs can have multiple hidden layers and dropout layers. We used three layers (input, hidden, and output) with 14, 8, and 2 neurons, respectively. In addition, a dropout rate of 0.25 and L2 weight regularisation were applied to regularise the model and avoid overfitting. We used the ReLU activation function for the input and hidden layers and the softmax activation function [38] for the output layer to perform the classification.
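The regularised MLP can be sketched in NumPy as below. This is an illustrative forward pass, not the trained network: the input width (16 in the test), the placement of dropout after each hidden activation, and the L2 coefficient `lam = 1e-3` are assumptions, since the text gives only the layer sizes, the 0.25 dropout rate, and the activations.

```python
import numpy as np


def mlp_forward(x, params, dropout_rate=0.25, training=False, rng=None):
    """Forward pass of a Dense(14)-Dense(8)-Dense(2) MLP.

    ReLU follows the first two layers and softmax the last. During training,
    inverted dropout zeroes each hidden unit with probability `dropout_rate`
    and rescales survivors by 1 / (1 - dropout_rate); at inference it is off.
    """
    rng = np.random.default_rng(rng)
    (w1, b1), (w2, b2), (w3, b3) = params
    h = x
    for w, b in ((w1, b1), (w2, b2)):
        h = np.maximum(h @ w + b, 0.0)
        if training and dropout_rate > 0:
            keep = rng.random(h.shape) >= dropout_rate
            h = h * keep / (1.0 - dropout_rate)
    z = h @ w3 + b3
    z = z - z.max(axis=1, keepdims=True)  # numerically stable softmax
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def l2_penalty(params, lam=1e-3):
    """L2 weight regularisation term lam * sum(w**2), added to the training loss."""
    return lam * sum(float(np.sum(w ** 2)) for w, _ in params)
```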

F. MODEL TUNING
We developed the model in TensorFlow and Keras and used the Adam optimiser to optimise the hybrid multi-branch model. The hyperparameters of the integrated model are the learning rate, the decay rate, and the initial weights; in addition, the clinical branch has two hyperparameters: the number of layers and the number of nodes in each hidden layer. Systematic experimentation is the most reliable way to configure these hyperparameters [39], so we applied a grid search to tune both the architecture of the clinical branch (the number of dense layers and of nodes per layer) and the training-related hyperparameters such as the learning rate and decay rate. For each experiment, the hyperparameters that minimised the loss function were chosen. Tuning resulted in a clinical branch of three layers with 16, 8, and 4 neurons, and a learning rate and decay rate for the hybrid deep neural networks model of 1e-3 and 1e-3/200, respectively. We then trained the model with these optimal hyperparameters, using an EarlyStopping callback to avoid overfitting.
VOLUME 9, 2021
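The tuning loop can be sketched as follows. `grid_search`, `time_based_lr`, and `early_stop_epoch` are hypothetical helpers, and `evaluate` stands in for training the model with a given configuration and returning its validation loss. The decay schedule assumes that "decay rate" refers to the legacy Keras time-based decay, $\mathrm{lr}_t = \mathrm{lr}_0 / (1 + \mathrm{decay} \cdot t)$ with $t$ counting optimiser steps; the early-stopping function mirrors the idea of Keras's `EarlyStopping` callback (stop after `patience` epochs without improvement in the monitored loss) rather than its exact implementation.

```python
import itertools


def time_based_lr(initial_lr, decay, step):
    """Assumed legacy Keras-style time-based decay: lr0 / (1 + decay * step)."""
    return initial_lr / (1.0 + decay * step)


def grid_search(evaluate, grid):
    """Return (best_config, best_loss), minimising evaluate(config) over the grid."""
    keys = sorted(grid)
    best_config, best_loss = None, float("inf")
    for values in itertools.product(*(grid[k] for k in keys)):
        config = dict(zip(keys, values))
        loss = evaluate(config)  # e.g. validation loss after training with config
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss


def early_stop_epoch(val_losses, patience=3):
    """Epoch index at which training stops after `patience` epochs without improvement."""
    best, wait = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses) - 1
```

Under this assumption, the reported values (learning rate 1e-3, decay 1e-3/200) would halve the learning rate after 200,000 optimiser steps.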

G. EVALUATION OF THE PROPOSED MODEL
We conducted several experiments to compare the performance of our proposed multi-branch model with that of image classification models and a clinical symptom-based HFMD classification model (MLP). The proposed model used both images and clinical data, the image classification models used images only, and the MLP used clinical symptom data only. In the first experiment, we retrained the pre-trained MobileNet and NasNetMobile models on images only and evaluated their performance. In the second experiment, we trained the MLP on the clinical dataset only. In the final experiment, we trained the proposed hybrid model on the integrated clinical and image data. Because this model consists of a clinical branch and an image-processing branch, we again took an experimental approach to select the best pre-trained image model for HFMD diagnosis: first, we trained MobileNet together with the clinical branch on the mixed input data; second, we replaced MobileNet with NasNetMobile and trained on the same dataset. For all experiments, we saved a checkpoint of the model with the highest accuracy so that it could be used for prediction. We evaluated the models using accuracy, sensitivity, specificity, and F1-score, and visualised their performance with confusion matrices.
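For reference, the evaluation metrics can be computed from binary predictions as in the sketch below (`binary_metrics` is a hypothetical helper, and treating HFMD as the positive class is an assumption). Sensitivity is the share of HFMD cases classified correctly and specificity the share of non-HFMD cases classified correctly, which correspond to the per-class percentages read off a row-normalised confusion matrix.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity and F1 for a two-class problem.

    `positive` marks the HFMD class; everything else counts as non-HFMD.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # share of HFMD cases detected
    specificity = tn / (tn + fp) if tn + fp else 0.0  # share of non-HFMD cases rejected
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        # rows = true class, columns = predicted class, order (non-HFMD, HFMD)
        "confusion": [[tn, fp], [fn, tp]],
    }
```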

IV. RESULTS AND DISCUSSION
We report results for three input settings: images only, clinical data only, and integrated clinical and image data. All evaluation results were obtained with k-fold cross-validation, with k between 5 and 7.
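A plain k-fold split over sample indices can be sketched as follows; `k_fold_indices` and `summarise` are hypothetical helpers, and the per-fold scores would come from training and evaluating a fresh model on each split. One point worth checking in a pipeline like this is that SMOTE-generated clinical records derived from the same original record could fall into both training and validation folds; oversampling only within each training fold avoids that kind of leakage.

```python
import random
import statistics


def k_fold_indices(n_samples, k, shuffle=True, seed=None):
    """Split range(n_samples) into k (train, validation) index lists.

    Every sample appears in exactly one validation fold; fold sizes differ
    by at most one when n_samples is not divisible by k.
    """
    indices = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, val))
        start += size
    return folds


def summarise(scores):
    """Mean and sample standard deviation of per-fold scores."""
    return statistics.mean(scores), statistics.stdev(scores)
```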

A. IMAGE CLASSIFICATION
In the first experiment, we classified HFMD using images only, retraining the pre-trained MobileNet and NasNetMobile models on the images. The results were produced by 5-fold cross-validation. The first two rows of Table 1 present the accuracy, sensitivity, and specificity of image classification. As shown in the table, MobileNet outperforms NasNetMobile in classifying HFMD images, with an accuracy of 88%. Figures 8 and 9 show the accuracy and loss of the MobileNet and NasNetMobile models on image data. For both pre-trained models, accuracy increases (Figures 8(a) and 9(a)) and loss decreases (Figures 8(b) and 9(b)) gradually over the epochs. This pattern suggests that both models learn useful features for predicting HFMD from images and can be used in our proposed hybrid deep neural networks model. Figure 10 presents a confusion matrix for each model on the validation dataset. As shown in Figure 10(a), MobileNet correctly classified 79% of HFMD images and 96% of non-HFMD images, while NasNetMobile (Figure 10(b)) correctly classified 85% of HFMD images and 87% of non-HFMD images. To further compare our model's performance with the image classification approach, we also trained a pre-trained ResNet50 model on our image dataset, following Vakili et al. [24], who reported it as their best-performing model; ResNet50 classified our dataset with an accuracy of 91.2% (Table 1, last row). The confusion matrices (Figure 10) show that the image-based classification models misclassified some skin lesions. We manually checked some false-positive results from the MobileNet model and found that similar-appearing lesions (e.g., herpangina and HFMD) were both classified as HFMD (see Figure 11).
This example illustrates a limitation of existing image-based HFMD diagnosis approaches: a non-HFMD image is misclassified as HFMD.

B. CLINICAL DATASET CLASSIFICATION
In the second experiment, we used only the clinical dataset to classify HFMD against other skin diseases with the MLP architecture. HFMD's clinical symptoms with very high accuracy (99%). Figures 13a and 13b illustrate the accuracy and loss on one of the validation sets over 50 epochs, and Figure 13c visualises the MLP model's performance on the validation dataset: it correctly classifies 100% of the HFMD clinical samples and 92% of the non-HFMD clinical samples. This result shows that HFMD can be predicted accurately from clinical symptoms. However, HFMD's clinical symptoms may overlap with those of other diseases [40], [41]; integrating clinical symptoms with images can reduce this conflict and help diagnose HFMD correctly.

C. HFMD DIAGNOSIS USING IMAGE AND CLINICAL DATA
The proposed hybrid deep neural networks architecture was tested in two settings on the integrated clinical symptoms and images: (i) the MLP with the pre-trained MobileNet model and (ii) the MLP with the pre-trained NasNetMobile model. The fourth and fifth rows of Table 1 present the 5-fold cross-validated evaluation results of the proposed models on the integrated data. As seen in the table, the hybrid deep neural networks using integrated data (clinical symptoms and images) outperform MobileNet, NasNetMobile, and the MLP; both variants classified HFMD and non-HFMD with 100% accuracy. This result held under 6-fold and 7-fold cross-validation of the hybrid deep neural networks, which gave very similar results (Table 1, sixth to ninth rows). Figures 13 and 14 present the training and validation accuracy and loss for the MobileNet- and NasNetMobile-based proposed models, respectively. For both pre-trained models, the hybrid networks' accuracy increases (Figures 13(a) and 14(a)) and loss decreases (Figures 13(b) and 14(b)) gradually over the epochs. Figure 15 compares the confusion matrices of the MobileNet- and NasNetMobile-based proposed models. The figure illustrates that both the models correctly (100%)
For HFMD diagnosis, the position of the lesion is essential, so it is important to check whether our model extracts significant features from the expected region of interest in the image during training. To interpret the feature extraction from images, we plotted heatmaps over validation images using Grad-CAM (Gradient-weighted Class Activation Mapping) [42]. For each model, a validation image was selected for prediction, and a heatmap was plotted over the image, as shown in Figure 16. These heatmaps suggest that the image-processing branch extracts features from the expected regions of the images.
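The Grad-CAM weighting [42] can be sketched in NumPy given the last convolutional layer's activations $A^k$ and the gradients of the class score with respect to them. Each channel weight $\alpha_k$ is the spatial mean of its gradient, and the heatmap is $\mathrm{ReLU}(\sum_k \alpha_k A^k)$. In TensorFlow the gradients would typically be obtained with `tf.GradientTape`; here they are passed in as plain arrays, `grad_cam` is a hypothetical name, and upsampling the map to the image size for the overlay is omitted.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Grad-CAM heatmap for one image and one class.

    activations, gradients: (H, W, K) arrays from the last conv layer, where
    gradients holds d(class score) / d(activations).
    """
    alpha = gradients.mean(axis=(0, 1))            # (K,) channel importance weights
    cam = np.maximum(activations @ alpha, 0.0)     # ReLU(sum_k alpha_k * A^k) -> (H, W)
    if cam.max() > 0:
        cam = cam / cam.max()                      # normalise to [0, 1] for display
    return cam
```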

D. LIMITATION
Deep learning requires large datasets to develop accurate and robust models. Although we gathered data from various sources, our dataset is still relatively small by deep learning standards. Even though HFMD is one of the most common diseases in Asian-Pacific countries, clinical data and images for the same patient were not readily available. Our dataset also has an uneven distribution of clinical data and images: the clinical records collected from doctors were significantly fewer than the images collected from the internet. Some low-resolution images were a further limitation. This research could be improved further using data from diverse ethnic groups.
The proposed approach integrates features from clinical data and images; however, we have not analysed the correlation and association between images and clinical symptoms. This work can be extended to analyse the correlation between image and clinical features and its impact on disease diagnosis.

V. CONCLUSION
In this paper, we proposed a lightweight and efficient Hybrid Deep Neural Networks architecture to detect or diagnose HFMD using clinical symptoms and image data. The architecture has two input branches, 1) a Multi-Layer Perceptron and 2) a modified pre-trained CNN model, whose features learnt from clinical symptoms and image data are integrated. We compared the performance of the proposed multi-branch Hybrid Deep Neural Networks for diagnosing HFMD with that of image classification models and a clinical symptom-based HFMD classification model (MLP). The image classification models MobileNet, NasNetMobile, and ResNet50 classified the skin lesions with accuracies of 88%, 85%, and 91.2%, respectively; however, this approach can misdiagnose similar-appearing skin lesions. In another experiment, the MLP model using the clinical dataset predicted HFMD with an accuracy of approximately 100%. Because HFMD presents with skin lesions, diagnosis based on clinical symptoms alone may not always be correct, as many other diseases (e.g., chickenpox) have similar symptoms. Using both images and clinical symptoms can therefore improve the diagnosis of this disease. Most previous studies have used only image classification techniques with traditional machine learning or deep learning architectures to diagnose skin diseases, and, to the best of our knowledge, no study has diagnosed HFMD from integrated features of image and clinical symptom data. The proposed multi-branch model overcomes these limitations and predicts the disease with 99%-100% accuracy using clinical symptoms and images. The learnt model is lightweight and efficient and can be deployed on a smartphone to build a mobile app that detects or diagnoses HFMD.
Many medical datasets contain images along with clinical data. The proposed Hybrid Deep Neural Networks architecture could therefore help diagnose other diseases from integrated image and clinical symptom data for the same patient. The model could also be extended to other diseases using complex radiological images, such as X-ray, CT, and MRI images, together with clinical data. Performance might be improved further by replacing the MobileNet layers with other image classification or image segmentation models such as U-Net, DenseNet, VGGNet, ResNet50, or AlexNet.