Targeted Ensemble Machine Classification Approach for Supporting IoT-Enabled Skin Disease Detection

The fast development of the Internet of Things (IoT) is changing our lives in many areas, especially in the health domain. For example, remote disease diagnosis can be performed more efficiently with advanced IoT technologies, which include not only hardware but also smart IoT data processing and learning algorithms, e.g. image-based disease classification. In this paper, we work in the specific area of skin condition classification, aiming to provide an implementable solution for IoT-led remote skin disease diagnosis applications. The research output is threefold. The first contribution is a dynamic AI-model-configuration-supported IoT-Fog-Cloud remote diagnosis architecture with hardware examples. The second is an evaluation survey of the performance of machine learning models for skin disease detection, covering a variety of data processing methods and their combinations, and taking account of both training-testing and cross-testing validation on all seven conditions together and on each condition individually. The HAM10000 dataset is selected for the evaluation based on suitability comparisons with other relevant datasets. In the evaluation, we discuss earlier work on ANN, SVM and KNN models, but the process mainly focuses on six widely applied deep learning models: VGG16, Inception, Xception, MobileNet, ResNet50 and DenseNet161. The results show that, for each of the seven major skin conditions, one of the top four models outperforms the others on that specific condition. Based on this discovery, the third contribution is a novel classification approach, the Targeted Ensemble Machine Classification Model (TEMCM), which dynamically combines a suitable model in a two-phase detection process. The final evaluation result shows the proposed model can achieve better performance.


I. INTRODUCTION
With the increasing availability of advanced IoT devices, smart health applications have been integrated into our lives more easily than ever. Currently, these applications concentrate mostly on health monitoring, such as heartbeat, body temperature and room environment [1], and on online diagnosis platforms [2]. During the current COVID-19 pandemic, remote diagnosis has become more important than ever and demands cutting-edge AI technologies that assist doctors more accurately and efficiently from an IoT-data perspective. In our previous work, we investigated using trusted healthcare web data to assist disease prediction, based mostly on a causal reasoning logic framework [3]. However, some kinds of diseases can be predicted more efficiently with image processing, in particular diseases related to appearance and images, such as skin disease, infective rash and bone-related injuries. In this paper, we focus on the IoT-led system architecture and, more importantly, on AI-oriented skin disease prediction research. We provide a survey and evaluation of different machine learning models, especially CNN-based algorithms, to develop our novel two-phase TEMCM method.
(The associate editor coordinating the review of this manuscript and approving it for publication was Po Yang.)
Skin diseases can be minor issues caused by bacteria, allergies, viruses or fungal infections. However, they can become chronic and develop into serious conditions such as skin cancer. Early detection is therefore required to reduce patients' risk. Many different machine learning models have been proposed to support the development of automatic dermatological tools. However, these experiments have produced confusing conclusions because they use different image datasets, compare only a subset of learning models, and rely on unique data processing methods or pre-trained models. Thus, the main goals of this paper are to provide a fair and unified evaluation of possible solutions and to develop a novel approach on that basis.
The research methodology for reaching the model evaluation goal is displayed in figure 1. First, we review the existing methods proposed and implemented in the literature.
Secondly, we identify the classification algorithm applied in each paper so that the work can be downloaded or rebuilt in our local experimental evaluation space, and we group similar methods together.
Thirdly, we look in detail at the dataset used for each proposed method and analyse the features of the different datasets.
Finally, we define our evaluation parameters and provide the evaluated outcomes of the different classification models.
By analysing the evaluation results, we form the research hypothesis that a TEMCM-based two-phase method can provide better accuracy and confidence to assist doctors remotely, with skin images collected through remote cameras or other kinds of IoT devices. The overall architecture of the remote diagnosis system is presented in figure 3. The major challenge is that remote IoT devices may not be able to provide high-quality input, e.g. images, which means that a one-round examination with one specific model can show a higher level of bias on such low-quality inputs. The paper focuses on the skin disease classification problem rather than on hardware research. The structure of the paper is organised as follows: Section 2 proposes a remote disease diagnosis architecture focused on dynamic AI-model configuration, together with hardware possibilities, based on state-of-the-art IoT-Fog-Cloud technologies. Section 3 reviews the literature on different kinds of skin disease training datasets and their studies. Section 4 discusses existing published models and the limitations of their evaluations. Section 5 sets up our unified evaluation environment and presents the comparative findings on different CNN models. Section 6 introduces our novel TEMCM approach. Section 7 concludes and gives an overview of our future work.

II. IoT-LED REMOTE DIAGNOSIS FRAMEWORK AND DEVICES
IoT-led remote diagnosis has become more attractive, important, powerful and efficient through the recent fast development of advanced IoT devices and, importantly, the current COVID-19 conditions. In this context, a state-of-the-art research paper on the role of IoT-based applications in the current pandemic concluded that wearable devices, smartphones and digital measurement tools, including vision-enabled devices, provide good facilities for implementing remote early diagnosis, also called Phase 1 diagnosis [11]. Based on the early diagnosis, the GP has a good basis for deciding whether the patient needs to go to the hospital. The investigated architecture includes four major components:
• Sensor nodes (IoT devices)
• Communication devices (e.g. smartphones)
• Cloud services (data storage and computation)
• Authorisation interface
This architecture can provide a fast response to a health event, e.g. detecting COVID-19 symptoms and alerting local authorities. However, it leaves many open questions in the long term, such as cloud security issues, privacy issues and clarity of the computation methodology (run-time algorithm selection and accuracy measurement).
A more comprehensive cloud-centric IoT-based disease diagnosis healthcare framework is proposed in [12]. Its three-phase architecture is presented in Figure 2. Similar work focusing on a personalised healthcare system is investigated in [13].
• Phase 1: acquiring patient data from medical IoT devices. The obtained data are uploaded to the cloud subsystem through the Fog layer, which can be a gateway or local processing unit (LPU).
• Phase 2: applying the healthcare diagnosis application to perform disease measurements.
• Phase 3: generating advice messages to the patients or their caretakers, or emergency alerts to local healthcare organisations.
The proposed framework certainly gains strength in portability through LPU or gateway devices. However, data security remains a big issue when data are uploaded directly to the cloud. Besides, the cloud diagnosis layer is too general to indicate a high-quality result generation methodology for the detection of specific diseases. VOLUME 9, 2021
FIGURE 2. The three-phase and cloud-centric IoT-based disease diagnosis healthcare framework architecture [12].
A more medically specific and secured IoMT (Internet of Medical Things) architecture is proposed in [15]. The IoMT has three layers:
• WBSN (Wi-Fi-Bluetooth-GPS network) to connect smart and embedded devices.
• Field sensor networks, local services and gateways to store sensitive user data on the local edge side.
• Cloud services with high-performance computing facilities and data storage for non-sensitive data processing.
From the recent development of IoT-enabled disease detection, the clear trend is to integrate a Fog middle layer to solve the security issues of processing sensitive data generated by the Edge layer (IoT devices) [16]. Therefore, we propose a more detailed IoT-led remote diagnosis system architecture, presented in figure 3. The major fresh ideas are the event-triggered data transmission strategy and the detailed design of AI-based model management, communication and dynamic configuration inside the cloud layer.
We argue that patient monitoring data, including 3D data [14], should only be stored on the Fog nodes until a trigger event occurs that permits data transmission, authorised by the patient. The trigger event can be a patient's request, an IoT device alert or abnormal data being detected, which means that part of the intelligent computation should reside in the Fog layer as well.
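The event-triggered strategy can be sketched as a small Fog-node buffer that only releases data when a recognised trigger fires; the class name, event names and data shapes below are illustrative assumptions, not part of the implemented system:

```python
from dataclasses import dataclass, field

# Illustrative trigger types named in the text: a patient's request,
# an IoT device alert, or abnormal data detected at the Fog node.
TRIGGER_EVENTS = {"patient_request", "device_alert", "abnormal_data"}

@dataclass
class FogNode:
    """Hypothetical Fog-layer buffer: data stay local until a trigger fires."""
    buffer: list = field(default_factory=list)
    uploaded: list = field(default_factory=list)

    def record(self, reading):
        # Monitoring data are only stored locally by default.
        self.buffer.append(reading)

    def on_event(self, event):
        # Only a recognised trigger event permits transmission to the cloud.
        if event in TRIGGER_EVENTS:
            self.uploaded.extend(self.buffer)
            self.buffer.clear()
            return True
        return False
```

Routine readings thus never leave the node; only a trigger flushes the buffer upward.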
The cloud side only stores the data necessary for an event temporarily, just for the runtime response. The different AI diagnosis models need to be indexed with semantic annotations so that they can be dynamically selected in response to a runtime event request. Furthermore, different models should be used together to produce an integrated diagnosis output and increase accuracy. This is the second part of our research in this paper: detailing an approach that builds a skin disease classifier, from individual model generation to a model combination method.
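In its simplest form, the semantic-annotation index for dynamic model selection could look like the following sketch; the tag names and registry layout are our illustrative assumptions (the model names come from the evaluation in this paper):

```python
# Hypothetical registry mapping semantic tags to indexed diagnosis models.
MODEL_INDEX = {
    "skin_lesion": ["DenseNet161", "ResNet50", "Inception", "MobileNet"],
    "rash": ["VGG16"],
}

def select_models(tags):
    """Return the union of models whose semantic annotations match the
    runtime event request tags, preserving registry order."""
    selected = []
    for tag in tags:
        for name in MODEL_INDEX.get(tag, []):
            if name not in selected:
                selected.append(name)
    return selected
```

A runtime event tagged `skin_lesion` would thus pull in all four lesion models for the combined diagnosis step.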
The hardware devices supporting such an architecture in the device and Fog layers can reuse the output of the Doctor Hazel project [17], which targets skin cancer detection specifically. The hardware components are:
• A screen or monitor
In addition, we can also apply a lightweight deep-learning-based IoT hardware design in the Fog layer, which specifically supports low-cost and easy-to-access skin cancer detection [18].
Having clarified the architecture and hardware feasibility, the rest of this paper focuses on the skin disease data evaluation and the AI model generation.

III. DATASETS AND LEARNING MODELS
A. EARLY RESEARCH FOR SKIN DISEASE CLASSIFICATION
Research on machine learning and AI technologies to assist skin disease diagnosis already has a two-decade history. In 2001, researchers started experimenting with and comparing artificial neural networks (ANN), decision trees, logistic regression and support vector machines (SVMs), as well as the non-parametric k-nearest neighbours (KNN) method, for skin disease classification [4]. 107 morphometric features were extracted from the images using an adaptation of grey-level thresholding to the three-dimensional colour space (hue, saturation, value). The work had limitations in disease coverage (only three types of pigmented skin lesions), data size (1619 pictures in total), image quality and hardware support due to the aged technologies. However, two valuable conclusions still stand:
• ANN and SVM performed better than the other nonlinear methods.
• Logistic regression performed better than all nonlinear methods when applied to fewer data features.
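For illustration, the idea behind adapting grey-level thresholding to the HSV colour space can be approximated by a toy threshold on the value channel; the function name and the fixed threshold below are hypothetical simplifications of the adaptive scheme in [4], using only Python's standard `colorsys`:

```python
import colorsys

def hsv_threshold_mask(pixels, v_max=0.5):
    """Label pixels darker than a value threshold as lesion candidates.

    `pixels` is a sequence of (r, g, b) tuples in [0, 1]. Converting to
    HSV and thresholding the V (value) channel is a toy stand-in for the
    adaptive grey-level thresholding described in the 2001 study.
    """
    mask = []
    for r, g, b in pixels:
        _h, _s, v = colorsys.rgb_to_hsv(r, g, b)
        mask.append(v <= v_max)  # dark pixels -> lesion candidate
    return mask
```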
Five years later, a PhD thesis published new research results using ANN and SVM with Fourier-transform image processing [5], and four more skin conditions were added to the experimental research. However, the data size and image quality had not improved; the data size was even smaller (around 30 pictures per condition on average). The output tool, Skincheck, claimed that ANN outperformed SVM by around 10% in accuracy. An automatic early skin cancer detection study was published in 2009 [6], using an ANN combined with images de-noised by wavelet transforms and a statistical region merging (SRM) segmentation process. The work [6] applied a relatively bigger dataset and claimed two interesting points: SRM supports better performance for the ANN methods than the region-of-interest (ROI) segmentation algorithm, and it achieved a similar accuracy to the work described in [5]. These early research results showed that ANNs perform well (all above 85%) for skin condition classification and detection, but they were tested only on small numbers of test pictures. The major shortcoming is that models trained on such small datasets generalise poorly, producing significantly decreased performance in real scenarios. One of the important contributions of this paper is to evaluate the most recently developed methods, deep learning models, to see whether they can produce similar or even better performance with a much larger dataset and advanced data processing methods in a unified evaluation setting. Moreover, we carry out cross-testing evaluations, which had not been covered in existing research work.

B. SKIN DISEASE DATASETS
We identified three major datasets used in existing machine learning methods for automatic dermatological research.
The most used dataset, as indicated by the 'Use ratio' criterion in table 1, is the HAM10000 (Human Against Machine with 10000 training images) dataset [7]. Figure 4 shows examples of the images contained in the dataset. Of the 23 research papers we examined, 17 used it as their model development dataset. HAM10000 contains a large collection of multi-source images of common pigmented skin lesions, collected over 20 years.
There are also many independent, smaller dermoscopic datasets created for skin disease prediction, published from 2000 to 2016, ranging from 200 to 1097 images and from 2 to 6 condition classes [8]. Based on these previous datasets, a large dataset containing 198 types of skin diseases, SD-198, was created and can be accessed at [9].
The other useful large dataset related to skin conditions is the Skin Segmentation Dataset (SSD) [10]. Published in 2009, it contains 245057 instances, of which 50859 are skin samples. However, this dataset is mostly applied in skin segmentation research rather than skin condition classification. Although SD-198 contains more conditions, its per-condition image samples are relatively smaller than those of HAM10000. In addition, HAM10000 is used by most research as the development and testing benchmark. We investigated 25 different methods or implementations to produce the dataset comparison in table 1.
Based on the dataset analysis, we chose HAM10000 as the benchmark dataset for the evaluation of the deep learning models. Figure 5 shows image examples with annotations of disease name, age and location.

IV. RESEARCHED MACHINE LEARNING MODELS
Before the evaluation, we divided the existing skin classification learning models into three groups according to their algorithms and model architectures.

Group 1: SVM-based models
An SVM-based skin disease recognition method working on image colouring with texture features was proposed in 2018 [19] for three skin diseases. The results claimed an 85% recognition rate for herpes, 90% for dermatitis and 95% for psoriasis. These results are better than those of the KNN-based method proposed in [20] and the grey-level co-occurrence matrix method proposed in [21]. The method includes three parts: image rotation, image segmentation and SVM classification. The most important process is the image segmentation applied before the SVM classification, which combines texture featuring (contrast, correlation, entropy, uniformity and energy) with colour featuring. However, the dataset is very small, around 20 test images, and the image source is not openly accessible. Moreover, no implementation details are available that would allow us to rebuild the method for evaluation. This is further literature showing that KNN is not good enough compared with the other methods.
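The texture features named above (contrast, entropy and energy/uniformity) are conventionally derived from a grey-level co-occurrence matrix (GLCM), the same construct used by [21]. The following compact numpy sketch, not the authors' implementation, computes three of them from a single horizontal offset; real systems typically combine several offsets and angles:

```python
import numpy as np

def glcm_features(img, levels=8):
    """Contrast, entropy and energy from a horizontal co-occurrence matrix.

    `img` is a 2D array of integer grey levels in [0, levels).
    Each horizontally adjacent pixel pair (a, b) increments glcm[a, b].
    """
    img = np.asarray(img)
    glcm = np.zeros((levels, levels))
    for a, b in zip(img[:, :-1].ravel(), img[:, 1:].ravel()):
        glcm[a, b] += 1
    p = glcm / glcm.sum()               # normalise to joint probabilities
    i, j = np.indices(p.shape)
    contrast = np.sum(p * (i - j) ** 2) # large when neighbours differ
    nz = p[p > 0]
    entropy = -np.sum(nz * np.log2(nz)) # randomness of the texture
    energy = np.sum(p ** 2)             # a.k.a. uniformity / ASM
    return contrast, entropy, energy
```

A perfectly uniform patch yields zero contrast and entropy and maximal energy, which matches the intuition behind these descriptors.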
Group 2: ANN-based models
There is a new implementation of the ANN method based on the HAM10000 dataset, without image processing, invoking the Keras machine learning libraries with 50 epochs. The implementation achieves an accuracy of around 76.67% on the seven-class skin disease classification task [22].
Group 3: CNN-based methods
CNN is the specialist machine learning algorithm for image classification. We grouped the CNN algorithms into two subgroups: shallow and deep models.
• Sequentially built shallow CNN models with fewer than 5 convolutional layers. For example, [23] implements a CNN model using the Keras TensorFlow library that has two Conv2D layers with 32 filters followed by two Conv2D layers with 64 filters. The model achieves a validation accuracy of around 76.5%. A case study applying only a 2-layer, single-filter model, published in [24], produced only about a 51.8% validation accuracy. Similar experimental tests were explored in [25], incrementally adding more layers and increasing the number of feature maps (filters) from 32 to 128, concluding with a 79.3% validation accuracy at 4 layers and 25 epochs. Some other applications based on shallow CNN models achieved similar validation accuracy rates in Kaggle notebooks.
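For readers unfamiliar with the Conv2D layers these shallow models stack, the underlying operation is a sliding dot product between a learned filter and each image patch. A minimal single-channel numpy sketch (ignoring padding, strides and bias, and not the Keras implementation itself) is:

```python
import numpy as np

def conv2d_valid(x, k):
    """Single-channel 'valid' 2D convolution (strictly, cross-correlation,
    as Keras Conv2D computes it): each output pixel is the dot product of
    the kernel with the image patch under it."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out
```

A Conv2D layer with 32 filters simply runs 32 such kernels in parallel and stacks the results, which is why deeper stacks with more filters can capture richer lesion textures.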
• Deep CNN models: VGG16, ResNet, Inception, MobileNet, Xception and DenseNet. Many recent research papers have compared subsets of the deep CNN models for skin condition classification and detection. Research on early skin cancer detection (7 major conditions) [26] compared a custom 3-layer CNN, VGG-16 and Inception V3 (Google library) with ROI image processing, concluding that Inception V3 has the best validation (sensitivity) accuracy of 84% compared with the other models. Meanwhile, there are two different implementations of the individual models ResNet50 [27] and MobileNet [28] with the fastai library, with the image dataset reset so that the test and validation sets are built only from lesions with one associated image. With this data resetting, ResNet50 can achieve more than 85% accuracy and MobileNet 90%. Reviewing these experiments, there are some inconsistent and conflicting results due to the different settings of model epochs, image data refining, APIs and validation methodologies. To avoid such varied evaluation settings, we designed a unified evaluation environment in terms of the same dataset, model-building APIs, epochs, data processing steps and validation process.
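The ImageNet transfer-learning setting common to these comparisons keeps the pretrained backbone frozen and trains only a new classification head on its output features. A toy numpy sketch of that idea (a softmax head trained by gradient descent; not the actual Keras or fastai API) is:

```python
import numpy as np

def train_head(feats, labels, classes=7, lr=0.5, epochs=200, seed=0):
    """Train only a softmax classification head on frozen backbone
    features -- the essence of ImageNet transfer learning, in miniature.

    `feats` is an (n_samples, n_features) array of backbone outputs;
    `labels` holds integer class ids in [0, classes).
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(0, 0.01, (feats.shape[1], classes))
    onehot = np.eye(classes)[labels]
    for _ in range(epochs):
        logits = feats @ w
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)             # softmax probabilities
        w -= lr * feats.T @ (p - onehot) / len(feats) # cross-entropy gradient
    return w
```

Because only the head's weights move, training is cheap even when the frozen backbone is a large model such as ResNet50 or DenseNet161.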

V. EVALUATION OUTCOMES WITH UNIFIED EXPERIMENTAL SETTINGS
The evaluation settings are:
• Dataset: HAM10000
• Deep learning APIs: Keras with GPU
• Epochs: 13 (we first tried 20 epochs and recorded the epoch at which all six deep models showed the most significant learning-rate reduction; the optimised epoch selection is 13)
• Data processing steps: 1. no data processing; 2. colour featuring and transfer learning; 3. colour featuring, data balancing and transfer learning. The transfer learning models are based on ImageNet.
• Validation process: testing accuracy and cross-validation accuracy.
• Data split: we split the data into three groups of training, testing and validation at a 6:3:1 ratio.
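The 6:3:1 split can be reproduced with a few lines of standard-library Python; the seed and function name here are our illustrative choices, not the paper's exact procedure:

```python
import random

def split_6_3_1(items, seed=42):
    """Shuffle and split into train/test/validation at the 6:3:1 ratio
    from the evaluation settings above (integer arithmetic keeps the
    split sizes exact)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train, n_test = n * 6 // 10, n * 3 // 10
    return (items[:n_train],
            items[n_train:n_train + n_test],
            items[n_train + n_test:])
```

On HAM10000's 10015 images, this yields 6009 training, 3004 testing and 1002 validation images, with the last group held out entirely for cross-validation.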
Figure 6 shows an example of the image changes introduced by applying colour featuring (the original image is on the left). The test evaluation results are displayed in table 2, where we can see that DenseNet has the best testing performance compared with the other models. Table 2 only shows the test validation and does not reflect cross-testing. The second, real-case test (cross-validation) uses the 10% reserved images, which the training models have never seen, on top of all the data processing steps. Furthermore, we look at the detailed accuracy rates for the individual skin diseases, selecting the top 4 models and presenting the results in figures 7-10.
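The exact colour-featuring transform is not specified in the text, but a common colour-normalisation step of this kind is gray-world constancy, which rescales each channel so the three channel means agree. The sketch below is purely an illustrative stand-in, not the transform actually used:

```python
import numpy as np

def gray_world(img):
    """Gray-world colour constancy: rescale each RGB channel so all three
    channel means match the overall mean, removing a global colour cast.

    `img` is an (h, w, 3) array; an illustrative colour-featuring step."""
    img = np.asarray(img, dtype=float)
    means = img.mean(axis=(0, 1))        # per-channel mean
    return img * (means.mean() / means)  # broadcast scale over channels
```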

VI. TARGETING ENSEMBLE MACHINE CLASSIFY MODEL AND EVALUATION
Based on the evaluation, we propose a two-phase TEMCM (Targeted Ensemble Machine Classification Model) approach (see figure 11):
1) Initial voting (phase 1): take an image as input to all the different trained models and vote on the classification with equal weights. If all the models agree on one single classification result, the detection conclusion is provided.
2) Nomination (phase 2): if the classifications differ, the most popular one becomes the candidate result, and the matching pre-trained specialist CNN model is nominated to recheck its correctness. If the specialist confirms the candidate, the detection conclusion is provided; otherwise, a new IoT image is requested and the classification restarts from the beginning.
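The two phases above can be sketched as follows; the dictionaries and the `specialist_check` callback are illustrative stand-ins for re-running the nominated pre-trained model, not the actual implementation:

```python
from collections import Counter

def temcm_classify(predictions, specialists, specialist_check):
    """Two-phase TEMCM sketch, following the steps described above.

    predictions: {model_name: predicted_class} from the trained models.
    specialists: {class: best_model_for_that_class}, i.e. the top model
                 per condition found in the evaluation.
    specialist_check: callable(model_name, candidate_class) -> bool,
                 standing in for rechecking with the nominated model.
    """
    votes = Counter(predictions.values())
    candidate, count = votes.most_common(1)[0]
    # Phase 1: a unanimous vote is accepted immediately.
    if count == len(predictions):
        return candidate
    # Phase 2: nominate the specialist model for the most popular class.
    specialist = specialists.get(candidate)
    if specialist and specialist_check(specialist, candidate):
        return candidate
    return None  # request a fresh IoT image and restart
```

In the unanimous case the specialist is never consulted, so the common path stays as cheap as a single ensemble vote.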
We use the same validation data, unseen at training time, to test the accuracy of our model. Figure 12 shows the distribution of images across the seven skin disease categories.
Since the component models are all pre-trained, TEMCM is a type of transfer learning process and works very efficiently at runtime. The overall evaluation accuracy is about 98.48%, and figure 13 indicates a very high accuracy rate for all seven diseases.

VII. CONCLUSION AND FUTURE WORK
In this paper, we presented research that first discussed a dynamic AI-model configuration and secured IoT-Fog-Cloud architecture for remote disease diagnosis, especially as it relates to skin disease detection. Secondly, we provided a new classification process that produces better classification results in skin disease detection. To achieve this, we evaluated the existing machine learning models in a controlled and standard testing environment. Unlike other evaluation work, our evaluation environment applies the well-known and widely used HAM10000 dataset rather than a customised dataset. Besides, the evaluation uses only the Keras GPU APIs to test different combinations of three pre-processing methods on the condition images: colour featuring, model transfer and data balancing. Moreover, we evaluated not only the training-test accuracy but also performed a cross-validation analysis. In the end, the evaluation outcomes reinforced our research hypothesis that a two-phase classification process can produce a better result than using only one specific CNN model. Our future research has four directions:
• Evaluating the proposed two-phase classification process as a base transfer model for wider skin-related disease classification problems such as rash, allergy or bone-related conditions.
• Continuing the integration of image-based disease recognition models with IoT devices and data management systems to build a working application based on the presented framework in figure 3.
• Enhancing the capabilities to support personalised remote healthcare systems [29].
• Implementing the proposed architecture to support a real-world remote disease diagnosis application with our industry partners.