Internet of Things and Synergic Deep Learning Based Biomedical Tongue Color Image Analysis for Disease Diagnosis and Classification

In recent times, the Internet of Things (IoT) and wireless communication techniques have become widely used in the healthcare sector. Biomedical image processing is commonly employed to detect the existence of diseases using biomedical images. Tongue diagnosis is an efficient, non-invasive method for performing auxiliary diagnosis anytime and anywhere, which supports the global need for primary healthcare. Conventionally, medical practitioners examine tongue features based on expert knowledge gained from experience. To remove these qualitative aspects, tongue images can be examined quantitatively, offering an effective diagnostic process that minimizes physical harm to patients. Although numerous tongue image analysis approaches exist in the literature, automated deep learning (DL) models are still needed to diagnose diseases from tongue images. In this view, this paper designs an automated IoT and synergic deep learning based tongue color image (ASDL-TCI) analysis model for disease diagnosis and classification. The proposed ASDL-TCI model operates in five major stages, namely data acquisition, pre-processing, feature extraction, classification, and parameter optimization. First, IoT devices capture human tongue images and transmit them to the cloud for further analysis. Next, median filtering based image pre-processing and synergic deep learning (SDL) based feature extraction are employed. A deep neural network (DNN) based classifier is then applied to determine the existence of diseases. Lastly, enhanced black widow optimization (EBWO) based parameter tuning is carried out to enhance the diagnostic performance. To assess the performance of the ASDL-TCI model, a set of simulations was carried out on benchmark tongue images, and the results were examined under several measures. The simulation outcome verified the enhanced diagnostic performance of the ASDL-TCI model over the compared methods, with maximum precision, recall, and accuracy of 0.984, 0.973, and 0.983, respectively.


I. INTRODUCTION
The associate editor coordinating the review of this manuscript and approving it for publication was Chi-Hua Chen. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ (VOLUME 9, 2021).

The Internet of Things (IoT) and deep learning (DL) have a broad range of applications, and healthcare is one of them. Owing to advances in the internet, classical schemes for patient services have weakened and are being replaced by electronic healthcare systems. IoT technologies provide healthcare professionals and patients with the most modern medical device environment. At the same time, the worldwide demand for primary healthcare support and the development of new techniques enable point of care (POC) diagnosis. Despite current advances in automatic disease diagnosis equipment, the need for blood serum, reliability, use by non-professionals, detection time, accuracy, and the need for additional confirmatory testing remain difficulties to overcome [1]. Therefore, tongue diagnosis, temperature, facial expressions, retinopathy, surface, and skin color can be some of the major variables for forthcoming smartphone based medical expert systems that aim at immediacy, simplicity, automated analysis, and non-invasiveness. Tongue diagnosis is an efficient, non-invasive method for evaluating the state of a person's internal organs in oriental medicine, e.g., Japanese traditional herbal medicine, traditional Korean medicine (TKM), and traditional Chinese medicine (TCM) [2]. The human tongue has several features that are used for diagnosing disease, with colour features being especially significant. Conventionally, physicians examine these colour features based on extensive knowledge. However, subjectivity and ambiguity often accompany their diagnostic outcomes.
To remove these qualitative factors, tongue colour can be examined objectively through its colour features. This provides a novel method for diagnosing disease, one that reduces the physical harm caused to the patient compared with other medical examinations. The diagnostic procedure is based on the practitioner's visual examination of the tongue, including its form, substance, color, motion, and coating [3], [4]. Rather than identifying tongue abnormalities as such, conventional tongue diagnosis is oriented toward recognizing the underlying disease. For instance, a white greasy and a yellow dense tongue coating indicate cold and hot syndromes, respectively, which are connected with health states such as immune, inflammation, infection, endocrine, or stress disorders [5]-[8]. These are two parallel yet interrelated syndromes in TCM. Removing the dependence on experience and subjective analysis might significantly raise the possibility of using tongue diagnosis worldwide, including in Western medicine. Computerized tongue assessment, involving color correction, light estimation, image analysis, geometry analysis, tongue segmentation, and so on, can be an efficient tool for diagnosing disease that addresses this concern [9]-[11].
Although several models based on individual features have been presented and have attained effective outcomes, this kind of technique uses low-level features [12]-[14]. Multiple features are therefore useful for distinguishing abnormal from normal tongue images. One study utilized multiple features (an integration of shape, color, and texture) for identifying and matching tongue images. The work in [15] introduced a coating separation phase before feature extraction. The work in [16] presented colour texture operators named PDSLBP for handling the tongue image matching problem. In [17], a TCoM based on quantitative measurements that include textural and chromatic features was presented for diagnosing appendicitis. In [18], multi-label learning was utilized for classifying tongue images after extracting texture and colour features. However, the above-mentioned techniques use low-level features, whether individual or multiple, which cannot fully describe the characteristics of the tongue. An architecture that can build complete features from tongue images is required; therefore, high-level features are needed for CAD tongue analysis. Recent publications have defined and employed DL models to extract high-level representations for a wide range of vision analysis tasks such as object recognition, handwritten digit identification, and face recognition. However, there is little or no work on CAD tongue image analysis with DL models, in which CAD expert systems with objective and unambiguous tongue analysis outcomes are used to support diagnosis in both Western and TCM practice.
This paper designs an automated IoT and synergic deep learning based tongue color image (ASDL-TCI) analysis model for disease diagnosis and classification. At the initial stage, IoT devices capture human tongue images and transmit them to the cloud for further analysis. Median filtering (MF) based image pre-processing is then applied to enhance the quality of the tongue image. Next, an SDL based feature extraction model is used to derive a useful set of feature vectors. A deep neural network (DNN) based classifier is then applied to determine the existence of diseases. The DNN model is chosen for two reasons: it has a high capacity to perform feature engineering on its own, and it scans the data for correlated features and combines them to enable faster learning without being explicitly told to do so. Finally, an enhanced black widow optimization (EBWO) based parameter tuning process is used to boost the diagnostic results. To assess the performance of the ASDL-TCI model, a series of experiments was performed on benchmark tongue images, and the results were examined under several measures.

II. LITERATURE SURVEY
Zhou et al. [19] presented TongueNet, a precise and fast automated tongue segmentation scheme. U-Net is used as the segmentation backbone, trained on a small-scale image dataset, and a morphological layer is added at the last stage of the framework. Wen et al. [20] proposed a novel constitution recognition technique based on zero-shot learning with TCM knowledge. To enhance efficiency, a zero-shot learning technique is presented that combines features and learns discriminative latent attributes, which resolves the imbalance problem in constitution classification. Kamarudin et al. [21] proposed a two-phase multicolor tongue classification based on an SVM that is reduced by the proposed k-means clustering identifiers and a red colour range, in order to diagnose tongue colour accurately. Initially, k-means clustering is used to cluster a tongue image into four clusters: the image background (black), the deep red region, the red or light red region, and the transitional region. Next, red or light red tongue images are further categorized as red tongue or light red tongue according to the red colour range acquired in that study. Li et al. [22] proposed a lightweight framework based on encoder and decoder structures. The TIFE module is implemented to generate features with large receptive fields without sacrificing spatial resolution. The context module increases efficiency by combining multi-scale contextual information. The decoder is implemented as a simple yet effective feature upsampling module that fuses features of different depths and improves the segmentation outcomes along the tongue borders.
Li et al. [23] presented an effective and simple tongue image segmentation technique. The technique first extracts the initial tongue body region by thresholding the transformed hue component in the HSI color space. Next, an adaptively chosen threshold on the red component of the original tongue image is used to find the gap between the tongue root and the upper lip. Lastly, the initial tongue body region is refined by removing false object regions such as the upper lip to obtain the final tongue image segmentation. Zhang et al. [24] developed a diabetes diagnosis technique based on normalized tongue images using an SVM. The tongue coating and tongue body are separated using chrominance thresholding and division-merging methods. The colour and texture features of the tongue image are extracted as input parameters, and the SVM based diabetes diagnosis model is trained on them. By optimizing the combination of SVM kernel parameters and input parameters, the effect of this combination on the model is examined.
Fan et al. [25] examined various tongue features in patients with diabetes mellitus (DM) and gastric symptoms, using images collected by digital tongue imaging. In the feature extraction phase, texture and 4 TCM tongue features were detected: coating colour, slenderness, cracks, constitution colour, and plumpness. In the classification phase, two classification techniques, SVM and random forest (RF), were used to classify the TCM symptoms of DM and gastric disease. Wang et al. [26] presented an AI architecture based on a DCNN to recognize tooth-marked tongues. They first created a comparatively large dataset of 1548 tongue images taken with various devices, and then used the ResNet34 CNN framework to extract features and perform classification. Meng et al. [27] proposed a new feature extraction architecture named CHDNet to extract unbiased features and reduce the human labor of tongue diagnosis in TCM. Prior CNN models concentrate mainly on learning convolution filters and adapting the weights among them; however, these models have two main problems: redundancy and an inadequate ability to handle unbalanced sample distributions. The authors introduce high dispersal and local response normalization operations to address these problems.
In [34], a tongue color image analysis model is presented that is based on the shape of the human tongue, which is analyzed through geometry features using computerized methods. The work in [35] investigates the design of a new automated tongue diagnostic system for a mobile-enabled environment. Other tongue color image analysis techniques are available in the literature [24], [36]-[39].

III. THE PROPOSED ASDL-TCI MODEL
The proposed ASDL-TCI model operates in five major stages, namely data acquisition, median filtering (MF) based pre-processing, SDL based feature extraction, DNN based classification, and EBWO based parameter optimization. First, IoT devices are used to capture the human tongue images. With improvements in hardware, pre-processing requirements are steadily decreasing. The commercially available OV7670 camera module offers a resolution as high as 656 × 488 pixels at 30 frames per second (fps). The captured images are then transmitted to the cloud for further analysis. The remaining modules are detailed in the following subsections.

A. IMAGE PRE-PROCESSING
First, the MF technique is applied to pre-process the input images and enhance image quality. The MF is a non-linear, data-dependent signal processing approach. The noisy value of a pixel is replaced by the median value of its neighborhood (mask): the pixels of the mask are sorted by gray level, and the median of the set replaces the noisy value. The MF output is

$$g(x, y) = \operatorname{med}\{\, f(x - i,\; y - j) : (i, j) \in W \,\},$$

where $f(x, y)$ and $g(x, y)$ are the original and output images, respectively, and $W$ is the 2D neighbourhood. Because the MF is a non-linear filter, its mathematical analysis is comparatively difficult for an image with random noise [28]. For an image with zero-mean, normally distributed noise, the noise variance of the MF is given as

$$\sigma_{med}^{2} = \frac{1}{4 n f^{2}(\bar{n})} \approx \frac{\sigma_i^{2}}{n + \frac{\pi}{2} - 1} \cdot \frac{\pi}{2}, \tag{1}$$

where $\sigma_i^{2}$ represents the input noise power (variance), $n$ is the size of the MF neighborhood, and $f(\bar{n})$ is the noise density function. The noise variance of the average filter is

$$\sigma_0^{2} = \frac{1}{n}\,\sigma_i^{2}. \tag{2}$$

Comparing Eqs. (1) and (2), the MF properties depend on two things: the size of the mask and the distribution of the noise. The MF reduces random noise more effectively than the average filter, and it is most effective for impulse noise, particularly when narrow pulses are far apart and the pulse width is less than $n/2$. The MF performance improves further when the MF, combined with the average filtering technique, appropriately resizes the neighborhood according to the noise density.
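The median replacement described above can be illustrated with a minimal sketch. It assumes a 2D grayscale image stored as a NumPy array, a square mask, and replicated borders; the function name `median_filter` and the toy image are illustrative choices and not part of the original implementation.

```python
import numpy as np

def median_filter(image, size=3):
    """Apply a size x size median filter to a 2D grayscale image."""
    if size % 2 == 0:
        raise ValueError("size must be odd")
    pad = size // 2
    # Replicate border pixels so the output keeps the input shape (assumption).
    padded = np.pad(image, pad, mode="edge")
    # Gather every size x size neighborhood W as a sliding window view.
    windows = np.lib.stride_tricks.sliding_window_view(padded, (size, size))
    # Replace each pixel by the median gray level of its neighborhood.
    return np.median(windows, axis=(-2, -1)).astype(image.dtype)

# Toy image: a flat region with one impulse ("salt") pixel in the middle.
noisy = np.full((5, 5), 10, dtype=np.uint8)
noisy[2, 2] = 255
clean = median_filter(noisy, size=3)
```

In this toy case the isolated impulse is narrower than half the mask, so the median of every neighborhood is the background gray level and the impulse disappears.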

B. FEATURE EXTRACTION
During feature extraction, the pre-processed image is fed into the SDL model to derive a useful set of feature vectors [40], [41]. The SDL model, denoted $SDL^{k}$, has three major components: an input layer, $k$ DCNN components, and $C_k^2$ synergic networks (SN). Each DCNN component of the network learns an independent representation from the input data. Each SN consists of an FC structure that verifies whether the input pair belongs to the same class and provides corrective feedback when a synergic error occurs. The SDL technique is described below in three sub-modules. Fig. 1 shows the structure of the SDL network. Each DCNN component is trained to find the set of parameters $\theta$ that minimizes the cross entropy (CE) loss, where $n$ denotes the number of classes and $Z^{(a)} = F(x^{(a)}, \theta)$ represents the forward computation. The set of parameters obtained for DCNN-a is denoted $\theta^{a}$, and the parameters are not shared among the DCNN components.

2) SDL MODEL
A pair of DCNN components with a synergic label is used together with the input, embedding, and FC learning layers. Consider a data pair $(Z_A, Z_B)$ given as input to two DCNN components (DCNN-a, DCNN-b), respectively. The output of the last FC layer of each DCNN represents the deep data feature learned by that DCNN and is obtained by forward computation:

$$f_A = F(Z_A, \theta^{a}), \qquad f_B = F(Z_B, \theta^{b}).$$

The deep features of the data pair are then embedded as $f_{A \circ B}$, and the expected output is the synergic label

$$y_S(Z_A, Z_B) = \begin{cases} 1, & \text{if } y_A = y_B \\ 0, & \text{if } y_A \neq y_B, \end{cases}$$

where $y_A$ and $y_B$ are the class labels of $Z_A$ and $Z_B$. To overcome the problem of imbalanced data pairs, the percentage of same-class data pairs is kept high. The synergic signal is conveniently monitored by adding a sigmoid layer and using the binary CE loss

$$l^{S}(\theta^{S}) = -\left[\, y_S \log \hat{y}_S + (1 - y_S) \log (1 - \hat{y}_S) \,\right],$$

where $\theta^{S}$ indicates the SN parameters and $\hat{y}_S = F(f_{A \circ B}, \theta^{S})$ denotes the SN forward computation. The SN verifies whether the input data pair belongs to the same class and provides corrective feedback when a synergic error occurs.
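As an illustration of the synergic signal, the following sketch computes the synergic label of a pair and the binary CE loss of a synergic network. It rests on several assumptions that the text does not fix: the pair is embedded by concatenating the two deep feature vectors, and the SN is reduced to a single sigmoid unit with hypothetical parameters `weights` and `bias`. The function names are illustrative only.

```python
import numpy as np

def synergic_label(label_a, label_b):
    """Return 1.0 when both inputs share a class label, otherwise 0.0."""
    return 1.0 if label_a == label_b else 0.0

def synergic_bce(f_a, f_b, y_s, weights, bias):
    """Binary CE loss of a one-unit synergic network (illustrative stand-in)."""
    # Assumption: the pair embedding is a plain concatenation of both features.
    f_ab = np.concatenate([f_a, f_b])
    # Sigmoid output: estimated probability that the pair is same-class.
    y_hat = 1.0 / (1.0 + np.exp(-(weights @ f_ab + bias)))
    eps = 1e-12  # guards against log(0)
    loss = -(y_s * np.log(y_hat + eps) + (1.0 - y_s) * np.log(1.0 - y_hat + eps))
    return y_hat, loss

# Toy deep features for two images of the same class.
f_a = np.array([1.0, 0.0])
f_b = np.array([0.0, 1.0])
y_s = synergic_label(3, 3)
# Untrained SN (all-zero parameters) outputs 0.5 for any pair.
y_hat, loss = synergic_bce(f_a, f_b, y_s, weights=np.zeros(4), bias=0.0)
```

With untrained parameters the SN is maximally uncertain, so the loss equals log 2 regardless of the label; training would push `y_hat` toward the synergic label.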

3) TRAINING AND TESTING
During training, the parameters of the DCNN components and of the SNs are updated, where $\eta(z)$ denotes the learning rate, $S(a, b)$ denotes the SN between DCNN-a and DCNN-b, and $\lambda$ indicates the trade-off between the classification and synergic error sub-models. The training procedure of the $SDL^{2}$ model proceeds accordingly. When the trained $SDL^{k}$ is applied, the test data $x$ is classified by the DCNN components, each of which gives a prediction vector $P^{(a)}$.

C. DNN BASED CLASSIFICATION
The DL technique implemented in this study is a feed-forward ANN trained with stochastic gradient descent (SGD) using backpropagation (BP). Several layers of hidden units are placed between the inputs and outputs of the model. Each hidden unit $j$ typically uses the logistic function (the closely related hyperbolic tangent is also frequently used, and any function with a well-behaved derivative can be employed) to map its total input $x_j$ to its output $y_j$:

$$y_j = \frac{1}{1 + e^{-x_j}}.$$

For the multiclass classifier of the tongue diagnosis problem, the output unit $j$ converts its total input $x_j$ into a class probability $P_j$ by using a normalized exponential function called "softmax":

$$P_j = \frac{\exp(x_j)}{\sum_{h} \exp(x_h)},$$

where $h$ is an index over all classes. The DNNs are discriminatively trained by backpropagating the derivatives of a cost function that evaluates the discrepancy between the target outputs and the actual outputs produced for each training case [29]. When the softmax output function is used, the natural cost function $C$ is the CE between the target probabilities $d$ and the softmax outputs $P$:

$$C = -\sum_{j} d_j \log P_j,$$

where the target probabilities, usually taking values of 1 or 0, are the supervised data given for training the DNN technique.
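The softmax output and its cross-entropy cost can be checked numerically with a short sketch. The input vector and the one-hot target below are made-up values for illustration, and the subtraction of the maximum is a standard numerical-stability step rather than something stated in the text.

```python
import numpy as np

def softmax(x):
    """Convert total inputs x_j into class probabilities P_j."""
    shifted = x - np.max(x)  # stability shift; does not change the result
    e = np.exp(shifted)
    return e / e.sum()

def cross_entropy(d, p):
    """CE between target probabilities d and softmax outputs p."""
    return -np.sum(d * np.log(p + 1e-12))

x = np.array([2.0, 1.0, 0.1])   # total inputs of three output units
p = softmax(x)                  # class probabilities, summing to 1
d = np.array([1.0, 0.0, 0.0])   # one-hot target: class 0 is correct
c = cross_entropy(d, p)         # cost for this single training case
```

With a one-hot target the cost reduces to the negative log-probability assigned to the correct class, so it shrinks toward zero as that probability approaches 1.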

D. PARAMETER OPTIMIZATION
Finally, the parameters of the DNN model are optimized with the EBWO algorithm so that the classification performance can be increased. The western black widow spider (Latrodectus hesperus or L. hesperus) is a poisonous spider species found from western Canada to southern Mexico. The venom of the female black widow contains a potent neurotoxin that is active against a range of animals. The venom is also highly dangerous to humans, given that a single bite could lead to death. This spider feeds on insects such as butterflies, cockroaches, and beetles; it weaves its web among trees and inhabits forests and swamps. Males, which use sex pheromones to determine a female's mating status, are known to show no interest in mating with poorly fed or starving females, since such females can exhibit cannibalistic behavior. More detail regarding the black widow spider's behavior is given in [30].

2) PHEROMONES
Pheromones play a significant part in the courtship and mating of L. hesperus spiders. It has been demonstrated that the spider's diet changes its pheromone signals and affects the quantity and quality of its silk. Accordingly, well-fed female spiders generate more silk than hungry females. Male spiders are highly responsive to sex pheromones from well-fed females, since these females offer the advantage of greater fertility; above all, however, the males avoid the risk of a costly mating attempt with a possibly hungry, cannibalistic female. Specifically, male black widow spiders choose to avoid cannibalism rather than to search for more fertile females. Sex pheromones thus give males a cue to the recent feeding history of females, which may decrease the cost to males of making their selection in the region. Consequently, female spiders with lower pheromone rates are not chosen by male spiders. In this study, the pheromone rate of a black widow spider is determined by

$$pheromone(i) = \frac{fitness_{max} - fitness(i)}{fitness_{max} - fitness_{min}}, \tag{16}$$

where $fitness_{max}$ and $fitness_{min}$ indicate the worst and the best fitness values in the present generation, respectively, and $fitness(i)$ indicates the present fitness value of the $i$th search agent. The pheromone vector in Eq. (16) contains the fitness normalized to the interval from zero to one. For pheromone rates less than or equal to 0.3, Eq. (17) is employed. A low pheromone level in a female spider represents a hungry cannibal spider. If such spiders exist, they are not selected but are replaced by other ones:

$$x_i(t) = x_{*}(t) + \frac{1}{2}\left[\, x_{r_1}(t) - (-1)^{\sigma}\, x_{r_2}(t) \,\right], \tag{17}$$

where $x_i(t)$ represents the search agent (i.e., female spider) with a low pheromone rate that is to be updated; $r_1$ and $r_2$ indicate random integers generated in the interval from one to the maximum number of search agents (i.e., spiders), with $r_1 \neq r_2$; $x_{r_1}(t)$ and $x_{r_2}(t)$ denote the $r_1$th and $r_2$th search agents chosen; $x_{*}(t)$ represents the best search agent found in the prior iteration; and $\sigma$ indicates a randomly generated binary number, $\sigma \in \{0, 1\}$.
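The pheromone rate and the replacement of low-pheromone spiders can be sketched as follows. This is a minimal illustration under stated assumptions: the problem is a minimization (so the largest fitness is the worst), the population is a NumPy array with one row per spider, and the function names, the toy population, and the handling of a population with identical fitness values are hypothetical choices rather than the authors' implementation.

```python
import numpy as np

def pheromone(fitness):
    """Normalize fitness to [0, 1]; the best (smallest) fitness maps to 1."""
    f_max, f_min = fitness.max(), fitness.min()
    if f_max == f_min:
        # Assumption: identical fitness values give every spider a full rate.
        return np.ones_like(fitness, dtype=float)
    return (f_max - fitness) / (f_max - f_min)

def replace_low_pheromone(positions, fitness, best, rng, threshold=0.3):
    """Replace spiders whose pheromone rate is at or below the threshold."""
    new_positions = positions.copy()
    n = len(positions)
    rates = pheromone(fitness)
    for i in np.flatnonzero(rates <= threshold):
        # Two distinct random agents, r1 != r2.
        r1, r2 = rng.choice(n, size=2, replace=False)
        sigma = int(rng.integers(0, 2))  # random binary number in {0, 1}
        new_positions[i] = best + 0.5 * (
            positions[r1] - (-1) ** sigma * positions[r2]
        )
    return new_positions

rng = np.random.default_rng(0)
positions = rng.uniform(-1.0, 1.0, size=(4, 2))  # 4 spiders, 2 dimensions
fitness = np.array([1.0, 2.0, 5.0, 9.0])         # lower is better (assumed)
rates = pheromone(fitness)
best = positions[np.argmin(fitness)]
updated = replace_low_pheromone(positions, fitness, best, rng)
```

In this toy population only the last spider has a rate at or below 0.3, so it is the only one whose position is regenerated around the best agent.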
To avoid the local optimum problem of the BWO algorithm, the Levy flight concept is integrated to design the EBWO algorithm, which introduces robust stochasticity for improving the position update of the black widow spider. The Levy flight boosts the jumping capability of the black widow spiders and also extends the search area of the swarm.
A Levy flight is a random walk whose step lengths follow the Levy distribution. The step length is generally small; however, large jumps occur irregularly. The Levy flight can be defined using Eq. (18):

$$L(s) \sim |s|^{-1-\beta}, \qquad 0 < \beta \leq 2. \tag{18}$$

The Levy random step by Mantegna can be defined using Eq. (19):

$$s = \frac{\mu}{|v|^{1/\beta}}, \tag{19}$$

where $\beta = 1.5$, $\mu \sim N(0, \sigma_{\mu}^{2})$, and $v \sim N(0, \sigma_{v}^{2})$. The variance of the parameter can be defined using Eq. (20):

$$\sigma_{\mu} = \left[ \frac{\Gamma(1 + \beta)\, \sin(\pi \beta / 2)}{\Gamma\!\left(\frac{1 + \beta}{2}\right) \beta \, 2^{(\beta - 1)/2}} \right]^{1/\beta}, \qquad \sigma_{v} = 1, \tag{20}$$

where $\Gamma$ denotes the gamma function [31].
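A Levy random step in the sense of Mantegna can be generated with a few lines of code. The sketch below uses the widely used form of Mantegna's algorithm with a unit-variance denominator, which is an assumption about the exact variant used here; the function names are illustrative, and how the step is scaled and added to a spider's position in EBWO is not specified in the text, so it is left out.

```python
import math
import numpy as np

def mantegna_sigma(beta=1.5):
    """Standard deviation of the numerator variable in Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)

def levy_step(size, rng, beta=1.5):
    """Draw Levy-distributed step lengths of the given shape."""
    mu = rng.normal(0.0, mantegna_sigma(beta), size)
    v = rng.normal(0.0, 1.0, size)  # assumption: unit-variance denominator
    return mu / np.abs(v) ** (1 / beta)

rng = np.random.default_rng(42)
sigma_mu = mantegna_sigma(1.5)
steps = levy_step(1000, rng)
```

Most of the generated steps are small, while a few are much larger; these occasional long jumps are what allow the search to escape a local optimum.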

IV. EXPERIMENTAL VALIDATION
The proposed model is simulated on a PC with an i5-8600k processor, GeForce 1050Ti, 4GB RAM, 16GB OS storage, and 250GB SSD file storage. The simulation tool is Python 3.6.5 along with the packages tensorflow, keras, numpy, pickle, matplotlib, sklearn, pillow, and opencv-python. The proposed model is evaluated on a benchmark dataset comprising 936 images, with 78 images under each of 12 distinct class labels. The information related to the dataset is provided in Table 1 [36]. In addition, a 10-fold cross validation technique is employed. Fig. 3 illustrates some of the sample test images. The parameter setting of the DNN model is as follows: mini batch size: 200, dropout: 0.5, number of hidden layers: 3, number of hidden units: 1024, and activation function: softmax. Fig. 4 illustrates some pre-processed images from the applied dataset. The figure shows that the quality of the applied test images is increased.

Table 3 gives a comparative results analysis of the ASDL-TCI technique with other recent methods in terms of different measures [32], [33]. Fig. 8 shows the result analysis of the ASDL-TCI model in terms of prec., rec., and accuracy. From the obtained values, it is clear that the SIFT-DTCA model attained the least performance, with the lowest prec. of 0.834, rec. of 0.824, and accuracy of 0.824. Next, the SIFT-SVMA model attained slightly increased outcomes, with prec. of 0.842, rec. of 0.841, and accuracy of 0.841. The HOG-DTCA technique offered improved results over these two methods, with prec. of 0.898, rec. of 0.889, and accuracy of 0.889. Meanwhile, the HOG-SVMA model produced a moderate result, with prec. of 0.917, rec. of 0.911, and accuracy of 0.911. The VGG19-RF model gained somewhat enhanced performance, with prec. of 0.938, rec. of 0.937, and accuracy of 0.937. Furthermore, the VGG19-GNB approach showed competitive outcomes, with prec. of 0.945, rec. of 0.925, and accuracy of 0.925. However, the ASDL-TCI model gained the maximum results over the other methods, with prec. of 0.984, rec. of 0.973, and accuracy of 0.983.

Fig. 9 shows the result analysis of the ASDL-TCI model in terms of F-measure and kappa. From the obtained values, it is evident that the SIFT-DTCA model attained the least performance, with the lowest F-measure of 0.825 and kappa of 0.822. Next, the SIFT-SVMA algorithm attained slightly increased outcomes, with F-measure of 0.839 and kappa of 0.832. The HOG-DTCA method offered improved results over these two methods, with F-measure of 0.891 and kappa of 0.878. Meanwhile, the HOG-SVMA technique produced a moderate result, with F-measure of 0.911 and kappa of 0.910. The VGG19-RF model gained somewhat enhanced performance, with F-measure of 0.937 and kappa of 0.931. Furthermore, the VGG19-GNB approach showed competitive outcomes, with F-measure of 0.928 and kappa of 0.922. However, the ASDL-TCI methodology gained superior outcomes over the other approaches, with F-measure of 0.977 and kappa of 0.978.

Table 4 and Fig. 10 give a comparative outcomes analysis of the ASDL-TCI model with other existing methods [24], [34]-[39]. From the obtained values, it is evident that the VGG-SVM model attained the least performance, with the lowest acc. of 0.594. Next, the K-NN model attained an increased outcome, with an acc. of 0.734. Likewise, the Bayesian technique offered improved outcomes over these two methods, with an acc. of 0.75. Meanwhile, the Geometry features method produced a moderate result, with an acc. of 0.762. The SVM model gained somewhat enhanced performance, with an acc. of 0.765. The CADAE method obtained a moderate outcome, with an acc. of 0.77. Moreover, the GF-SRC technique accomplished reasonable outcomes, with an acc. of 0.792. Also, the GA-SVM approach showed competitive outcomes, with an acc. of 0.831. However, the ASDL-TCI model gained superior results over the other techniques, with an acc. of 0.983.
From the above mentioned results, it is evident that the proposed ASDL-TCI model has obtained improved performance over the other methods due to the inclusion of SDL based feature extraction and EBWO based parameter optimization.
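The measures reported above (precision, recall, accuracy, F-measure, and kappa) can all be derived from a confusion matrix, as the following sketch shows. The averaging scheme is not stated in the text, so macro-averaging over classes is assumed here, and the 2 × 2 confusion matrix is a made-up example rather than a result from the experiments.

```python
import numpy as np

def classification_metrics(cm):
    """Compute metrics from a confusion matrix (rows: true, cols: predicted)."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    tp = np.diag(cm)
    # Macro-averaged precision and recall (assumed averaging scheme).
    # Note: a class that is never predicted would cause a division by zero.
    precision = np.mean(tp / cm.sum(axis=0))
    recall = np.mean(tp / cm.sum(axis=1))
    accuracy = tp.sum() / total
    f_measure = 2 * precision * recall / (precision + recall)
    # Cohen's kappa: agreement corrected for chance agreement.
    expected = np.sum(cm.sum(axis=0) * cm.sum(axis=1)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return precision, recall, accuracy, f_measure, kappa

# Illustrative two-class confusion matrix (not data from the paper).
cm = [[8, 2],
      [1, 9]]
prec, rec, acc, f1, kappa = classification_metrics(cm)
```

Kappa is lower than accuracy in this example because half of the observed agreement would be expected by chance for two balanced classes.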

V. CONCLUSION
This paper has designed a new IoT enabled ASDL-TCI model for disease diagnosis and classification. The presented ASDL-TCI model aims to determine the existence of diseases using tongue color images. The model operates in five major stages, namely IoT based data acquisition, MF based pre-processing, SDL based feature extraction, DNN based classification, and EBWO based parameter optimization. Using the EBWO algorithm to tune the parameters of the DNN considerably enhances the classification results. To assess the performance of the ASDL-TCI model, a series of experiments was carried out on benchmark tongue images, and the results were examined under several measures. The obtained results confirmed the improved diagnostic performance of the ASDL-TCI model over the compared methods in terms of different measures. As part of future work, advanced DL architectures can be employed to further enhance the diagnostic performance.