Internet of Medical Things—Based on Deep Learning Techniques for Segmentation of Lung and Stroke Regions in CT Scans

The classification and segmentation of pathologies through intelligent systems is a significant challenge for medical image analysis and computer vision systems. Diseases such as lung problems and strokes have a serious effect on human health worldwide. Lung diseases are among the leading causes of death worldwide, behind strokes, which in 2016 became the second leading cause of death from illness. Computed tomography (CT) is one of the main clinical diagnostic exams and is often linked to computer-aided diagnosis (CAD) systems, which are becoming established solutions in health technology. In this work, we propose a method based on the Health of Things for the classification and segmentation of CT images of the lung and hemorrhagic stroke. The system, called HTSCS - Medical Images: Health-of-Things System for the Classification and Segmentation of Medical Images, uses transfer learning between models based on deep learning combined with classical methods for fine-tuning. The proposed method obtained excellent results for the classification of hemorrhagic stroke and pulmonary regions, with accuracy values of up to 100%. The models also performed well in segmentation, with Accuracy above 99% and Dice coefficient above 97% in the best cases, and an average segmentation time between 0.095 and 1.7 seconds. To validate our approach, we compared our best models for the segmentation of lung and hemorrhagic stroke in CT images with related state-of-the-art works. Our method brings an innovative approach to classification and segmentation through the use of the Health of Things for different types of medical images, with promising results for the medical image analysis and computer vision fields.

T. Han et al.: Internet of Medical Things-Based on Deep Learning Techniques for Segmentation

COPD involves obstructive bronchiolitis, emphysema, or characteristics of both. The first causes a permanent state of inflammation of the airways, causing swelling inside the breathing tubes and interfering with the airflow capacity and efficiency of the lungs, while the second destroys the alveoli, the structures that promote gas exchange in the organ [4].
The condition is dangerous because, in addition to the potential of inhibiting breathing altogether, it decreases the circulation of oxygen in the blood and triggers other inflammatory responses throughout the body, causing the risk of heart attack and stroke to double. Patients may also suffer from muscle weakness, impaired reasoning and even become more subject to depression.
Although the majority of COPD cases are caused by tobacco smoke (85%) [5], other factors can contribute to COPD, such as heavy exposure to certain dusts at work, chemicals, indoor or outdoor air pollution (including wood smoke or biomass fuels), and inherited genetic factors. The first challenge in identifying and classifying COPD is to carry out an accurate segmentation, bearing in mind that this disease visually alters the appearance of the lung in the CT examination and partially alters the sharpness of the lung boundaries [6].
Like COPD, a Cerebral Vascular Accident (CVA), also known as a stroke, is a significant cause of mortality and disability on a global scale, with considerable economic costs for post-stroke care [7]. Strokes are a group of disorders that involve sudden interruption of the blood flow in the brain. Obstruction of the cerebral arteries may cause neurological deficits. Hemorrhagic and ischemic strokes are the two most common types of stroke. A hemorrhagic stroke occurs when an artery in the brain ruptures, causing wide and devastating bleeding in the brain, whereas an ischemic stroke occurs when a blood vessel that carries blood to the brain is blocked [8].
In 2016, stroke was the second leading cause of death globally (5.5 million). The number of women who died from a stroke was slightly lower than the number of men (2.6 million and 2.9 million, respectively). The number of deaths from ischemic stroke was 2.7 million, somewhat less than the number of deaths due to hemorrhagic stroke, which was 2.88 million [9]. In 2015, according to the American Heart Association (AHA) and the American Stroke Association (ASA), approximately 800,000 strokes occurred in the United States, and these were responsible for one in every 20 deaths [10]. In addition to the high mortality rate, most people who survive a stroke end up with some kind of disability in their basic activities, compromising their quality of life and life expectancy [11]. Disability varies according to the degree of neurological recovery, the location of the injury, the patient's pre-morbid status, and environmental support systems [8].
Currently, various medical areas carry out diagnoses using images [12]. Computed tomography (CT) stands out as the most important modality used to acquire these images [13] due to its availability in almost all emergency units and its fast acquisition of results. CT has also gained increasing importance because the diagnosis is less invasive than with some other systems and gives precise results [14]; in addition, it can be used to acquire images of the lungs, heart, brain, arteries, and bones, among others [15].
CT helps to evaluate the extent and distribution of COPD [16], estimated by visual quantification or by analyzing the distribution of lung density [17], providing a more accurate and objective assessment of the disease [18]. Unlike COPD, stroke is considered a medical emergency and needs to be diagnosed and treated promptly to minimize the implications that may occur [19]. Thus, CT presents itself as the most adequate and financially viable technique, due to its low cost and speed [20].
Diagnoses by CT exams can be improved through computer-aided diagnosis (CAD) systems. Thanks to the performance of CAD systems in improving the efficiency and accuracy of clinical diagnostics by detecting and/or automatically classifying abnormalities and/or diseases in radiological medical examinations, many commercial systems have been developed, with specialized systems for specific areas. For example, many CAD systems are aimed at detecting breast, lung, or colon cancer using X-rays, CT, or magnetic resonance imaging. To assist in the diagnosis, the first challenge of the system is to locate and segment the region of interest. Segmentation techniques are applied to find regions of interest in an image. In the case of medical images, it is common to use segmentation methods to demarcate organs and associate them with the study of pathologies. Given this, a significant number of studies are produced emphasizing the use of CAD systems in the diagnosis of diseases. Among the techniques developed to automate the task of segmentation of pulmonary regions, we have the traditional segmentation techniques based on region growing [21], region growing with local thresholds, and watershed [22] approaches. However, these traditional techniques do not obtain consistent segmentation for regions with low contrast parenchyma, leading to inappropriate results when applied on CT exams [23].
Recent works using Convolutional Neural Network techniques for classification [24], segmentation [25] and detection [26] of objects of interest in images have been proposed with a fair amount of success. However, applying these techniques to medical images is a challenge [27], because the efficient training of these deep models requires a large set of medical images [28]. To overcome this, a strategy called transfer learning [29] suggests that the features learned to solve a specific problem can be used to solve problems in other domains [30].
Despite this, several studies using deep learning methods for segmentation in medical images have been developed [31]. Hu, Qinhua et al. [32] proposed the Convolutional Neural Network (CNN) Mask R-CNN combined with supervised and unsupervised machine learning methods for automatic segmentation of the lungs in CT images. Medeiros et al. [33] proposed a new approach using the Mask R-CNN to segment the left-ventricle with success. Zhang, Rongzhao, et al. [34] proposed an automatic segmentation of acute ischemic stroke using fully convolutional DenseNets.
Several techniques have received a lot of attention from researchers. Techniques using fine-tuning have opened new possibilities in the fields of computer vision [35] and data science [36]. In parallel with the use of transfer learning, different monitoring systems in the medical area based on IoT principles [37] have been proposed. These systems, called Health of Things, work with the rapid exchange of medical information concerning the conditions of patients, medical images and diagnostics. They thereby optimize the work of health professionals and generate a significant improvement in the quality of medical treatments, in addition to decreasing medical costs per consultation, follow-up and diagnosis [38].
Motivated by the success of the Health of Things systems, we propose a system to aid medical diagnosis called HTSCS - Medical Images: Health-of-Things System for the Classification and Segmentation of Medical Images, using principles of IoT, transfer learning, deep learning and fine-tuning. Specifically, this work aims to:
• Extract deep features from Lung Image Databases and Stroke Databases using two different CNN models pre-trained on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC): Xception [39] and VGG16 [40].
• Apply a Support Vector Machine (SVM) for each extracted feature set.
• With the pre-trained classifiers, classify the CT images of the lung and hemorrhagic stroke, separating the images that contain the region of interest (RoI) from those that do not, ahead of the segmentation step.
• Having detected the pulmonary region, or hemorrhagic stroke in a CT image at the classification stage, segment the area of interest using the Detectron2 network.
• Apply three fine-tuning techniques based on Region Growth [41], K-means clustering [42], and Parzen Window [43] to improve the edges of the region considered as a region of interest.
• Compare the results between the proposed methods and other works in the literature through five evaluation metrics: Accuracy (ACC), Dice coefficient (Dice), Sensitivity (Sen), Specificity (Spe), and Time.

II. RELATED WORKS
In this section, we will present several works in the research areas covered by this article, as well as works related to the theme in different contexts.

A. MEDICAL IMAGE PROCESSING USAGE
Image processing techniques have become increasingly common in the medical field, especially methods using filters in computer vision with CAD systems. These techniques are in demand due to the fast results, effective segmentation and classification of medical images [44].
Understanding the importance of the medical applicability of digital image processing (DIP) systems, Bouchet et al. [45] proposed the use of fuzzy [46] mathematical morphology in the segmentation of branches in angiographic images. The results were visually superior to those of conventional DIP techniques for the same cases. However, the analysis was only visual, without comparison against a ground truth provided by a specialist doctor.
Aware of the power of the region growth technique, Duan et al. [47] proposed a method of segmenting the pulmonary vessels in CT images using computational operations to filter anisotropic diffusion and region growth. Their results obtained a sensitivity of 92.9% and specificity of 91.6%. However, the developed algorithm is only able to successfully segment the vessels of a healthy lung, in addition to the fact that the method had difficulties in finding the ideal filtering parameters, which consequently may have reduced the values of the evaluation metrics.
Raja et al. [48] proposed a method of segmenting MRI T1 images of sick brain and breast, using the integration of the Chaotic Bat algorithm, Tsallis-based threshold, and the region growth technique. After testing the algorithm in the BRATS Brain Tumor Segmentation Challenge, the method obtained a 97.5% Accuracy and 90.36% Dice. The method encountered difficulties only in segmenting some slices of the brain. However, the method is not automatic, requiring human input to adjust the parameters for each image.
Pei et al. [49] developed a technique based on the density of non-automatic CT images for pre-defining the number of clusters, aiming at an automatic clustering that helps in the segmentation of medical images. The authors used a simple threshold to separate the clustered region of interest. The results using different datasets, such as Ecoli and iris, obtained an average accuracy of 83.15%. It remains to be seen whether the proposed method is capable of effectively clustering brain CT images.

B. IoT SYSTEMS IN MEDICAL IMAGES
The medical monitoring systems called Health-IoT [50] perform a quick exchange of information about the conditions of patients, medical images and diagnoses, to optimize the work of professionals in the clinical area and generate a significant improvement in the quality of treatments [38]. Santos et al. [51] proposed an architecture that encompasses the use of different types of health managers and gateways. In addition to providing interoperability through the use of adopted standards, their architecture makes data exchange between machines much faster, thus making management much more effective. However, as it is a recent technique, the method still has some security flaws and interface problems. Following the same line of thinking, Al-Hamadi and Chen [52] proposed a communication protocol between Health IoT systems based on decision making. This protocol enables the creation of a collective knowledge database between the devices, which will later allow the scheduled devices to make decisions.
VOLUME 8, 2020
Hassana et al. [53] carried out a study on systems based on Health of Things [54] aimed at monitoring health with the aid of sensors and visualization devices. In their conclusions, the authors stressed the effectiveness of these systems in monitoring the health of patients from a distance, which reduces the need to go to the doctor's office, consequently reducing costs and providing a significant improvement in health monitoring.
Aware of the effectiveness of Health of Things-based systems in health monitoring, Ray [55] developed H3IoT, an architectural framework based on the Internet of Things Home Health Hub [56]. Its architecture focuses on monitoring the health of the elderly in their homes. The authors concluded that the system is light and easy to install, but that it needs some adjustments to work in a clinical setting.

C. COMPUTER VISION APPLIED TO IMAGES IN PULMONARY REGIONS
Knowing the importance of segmentation of the lungs for the detection of lung diseases, Hu et al. [32] proposed an automatic method using the Mask R-CNN network and Machine Learning [57] along with the aid of Transfer Learning to segment lung CT images. The method obtained excellent results (Accuracy of 97.68%) in comparison with the segmentation of a specialist doctor. The technique performed the segmentation in an average time of 11.2 seconds, which is considered a reasonable time for the segmentation of lung CTs when dealing with deep learning networks and automatic processes.
With the same purpose Wang et al. [58] used CNN [59] with Machine Learning for the segmentation of pulmonary nodules. The method obtained only 80% Accuracy, due to the difficulty of the method to segment small nodules, and the different sizes of the lung in the various CT images. These differences in size are due to the process of the respiratory system (breathing) during the performance of the exam.
Duraisamy and Duraisamy [25] developed an approach to segment MRI images of stroke and lung. Their technique involved the use of a CNN, fuzzy logic C, and K nearest neighbors, and their results were visually superior to those obtained using only a CNN in the segmentation. The method obtained 95% Accuracy. However, the dataset flow of lung images can influence the metric values. Transfer learning [60] has been increasingly used because it makes the model more robust and efficient while reducing the amount of data required for training.
Based on the above, Shin et al. [27] proposed a study with the most commonly used CNNs, such as CifarNet, AlexNet, GoogLeNet, and others. Although all of these networks are able to classify the most varied types of objects, they were trained to target thoracic conditions, such as thoraco-abdominal lymph nodes and interstitial lung diseases (ILDs). The study consisted of using transfer learning between these networks so that the segmentation process is optimized. However, the method only obtained 85% sensitivity. Also, the models proved to be limited as the weights were reused, and their sensitivity was surpassed by more modern methods.
Wang et al. [61] proposed an approach to the interactive segmentation of medical images using deep learning [31] with fine-tuning [62] at the end of the process. Even though the results are not robust and accurate enough for clinical use, probably due to the insufficient variation of the samples, the proposed method obtained cutting-edge results in the automatic segmentation of medical images.

D. COMPUTER VISION APPLIED TO IMAGES OF BRAIN REGIONS
Aware of the importance of segmenting the brain quickly and effectively, Havaei et al. [63] proposed a method for segmenting brain tumor MRI images with the help of a cascaded CNN architecture. The results obtained a Dice of 87% in comparison with the segmentation of the specialist doctor; the average segmentation time per image was 25 seconds. However, the method had difficulties in detecting small tumors in the brain.
Chen et al. [64] developed an architecture composed of the CNNs EDD Net and MUSCLE Net for the automatic segmentation of a database of 741 MRI-DWI images of ischemic stroke from different patients. The results achieved a detection rate of 94% and a Dice of 67% when compared to the gold standard. However, the authors admit that the method had limitations in the segmentation of small lesions.
Haan et al. [65] developed a method for the semi-automatic segmentation of CT and MRI images of ischemic stroke, based on clustering techniques. The results were encouraging, with a Jaccard index of 87%. However, as it is semi-automatic, the method requires operator intervention to configure the initial parameters. Sun et al. [66] proposed a 3D segmentation method for intracranial hemorrhage in CT images with the aid of a Supervoxel algorithm, which in turn is based on Simple Linear Iterative Clustering, and refined with the algorithm Graph. The proposed method obtained a True Positive Fraction of 97.94% and False Positive Fraction of 92.26%.
In the study by Rebouças et al. [43], the authors proposed a method of semi-automatic segmentation of cranial CT images through the use of a Parzen window. The method obtained an accuracy of 99.84% in comparison with the results of a radiologist (ground truth), and surpassed the results of the fuzzy C-means, watershed, and region growth methods that were used to validate it. However, the method is not fully automatic, since the user needs to select the initial point of growth in the injured region, from which the region grows up to its edge.
Aiming to compete with state-of-the-art works found in the literature, this study proposes a method based on the Health of Things to classify and segment lung computed tomography (CT) and hemorrhagic stroke images. The system is called HTSCS - Medical Images: Health-of-Things System for the Classification and Segmentation of Medical Images. The classification makes use of a VGG16 for the extraction of attributes and an SVM-RBF for the classification. Segmentation is based on transfer learning between deep learning models using Detectron2, combined with classical methods for fine-tuning.

A. DATASETS
This subsection presents the datasets used in the experiments of this study.
The CT lung image dataset contains 1,265 images in the Digital Imaging and Communications in Medicine (DICOM) format, with dimensions of 512 x 512 pixels and 16-bit depth, together with the gold standard. The images were acquired in partnership with the Walter Cantídio Hospital of the Federal University of Ceará (UFC), Brazil, and the study was approved by the Research Ethics Committee - COMEPE (Protocol No. 35/06).
The CT stroke dataset consists of 100 axial images of hemorrhagic stroke in different patients. The Heart Hospital of Fortaleza, Brazil, provided the images, which were generated by a GE MEDICAL SYSTEM CT model Optima CT660. The acquisition parameters were a slice thickness of 0.7 mm, a 230 mm field of view, 120 kV tube voltage, and 80 mA electrical current. The images are 512 pixels high by 512 pixels wide, with a voxel of 0.488 x 0.488 x 1.5 mm. The images are quantized in 16 bits and saved in the DICOM (Digital Imaging and Communications in Medicine) format.

B. DEEP EXTRACTION AND CLASSIFICATION
This subsection presents the deep extractors and classifiers responsible for extracting the attributes and classifying images of CT lung and CT stroke.
Deep attribute extractors are a transfer learning technique responsible for transforming the problem into a different domain space to increase the power of discrimination of a generic dataset. In the specific case of CNNs, a pre-trained model on a large data set is used to perform the extraction on an unseen data set. The last layer of the model is removed, and the model output will be of a size equivalent to the size of the last remaining layer remodeled into a one-dimensional vector [67], [68].
With the new extracted features dataset obtained via transfer learning, we can use the classic machine learning algorithms to carry out training and generalization of knowledge. The classification step is responsible for classifying the images and determining whether they contain an object of interest to segment or not.

1) DEEP FEATURE EXTRACTORS
Extreme Inception (Xception) is a CNN model proposed by François Chollet [39] that is built on the depthwise separable convolution layer. The model obtained good results using the JFT dataset [69], which contains 350 million images. Although Xception has roughly the same number of parameters as Inception V3, it uses those parameters more efficiently.
Visual Geometry Group (VGG) (Oxford University) [40] was runner-up in the 2014 ILSVRC challenge [70]. Its architecture consists of 16 weight layers (13 convolutional and 3 fully connected) with a uniform design. Factorized convolutions were the strategy used to increase depth without causing overfitting of the model.
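To illustrate why a depthwise separable convolution is parameter-efficient, the following minimal sketch compares its weight count with that of a standard convolution. The layer sizes are illustrative assumptions and are not taken from the Xception architecture; bias terms are omitted.

```python
def standard_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a standard kernel x kernel convolution (bias omitted)."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise separable convolution (bias omitted)."""
    depthwise = kernel * kernel * c_in  # one spatial filter per input channel
    pointwise = c_in * c_out            # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Illustrative layer sizes only (assumption, not an Xception layer).
    k, c_in, c_out = 3, 64, 128
    print("standard :", standard_conv_params(k, c_in, c_out))
    print("separable:", separable_conv_params(k, c_in, c_out))
```

For the same input and output channels, the separable layer needs far fewer weights, which is the sense in which the parameters are used more efficiently.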

2) SUPPORT VECTOR MACHINES (SVM) CLASSIFIER
SVM with the Radial Basis Function (RBF) Kernel, also known as the Gaussian kernel, emerged to correct other kernels that did not adapt well to a large number of samples. The idea of the method is simple; the model establishes a boundary at the most extreme points of each class to create separation, thereby providing support vectors to define each class.
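As a rough illustration of how an RBF-kernel SVM scores a feature vector, the sketch below implements the Gaussian kernel and the usual kernel decision function. The support vectors, coefficients, bias, and `gamma` value are hypothetical placeholders, not values learned in this work.

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float = 0.5) -> float:
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def svm_decision(
    x: Sequence[float],
    support_vectors: Sequence[Sequence[float]],
    dual_coefs: Sequence[float],
    bias: float = 0.0,
    gamma: float = 0.5,
) -> float:
    """Decision value: sum_i coef_i * K(sv_i, x) + bias.

    Each coefficient is the signed weight (alpha_i * y_i) of a support vector.
    A positive value means the positive class, a negative value the other class.
    """
    return bias + sum(
        coef * rbf_kernel(sv, x, gamma)
        for sv, coef in zip(support_vectors, dual_coefs)
    )


if __name__ == "__main__":
    # Hypothetical support vectors: one per class.
    svs = [[0.0, 0.0], [2.0, 2.0]]
    coefs = [-1.0, 1.0]
    print(svm_decision([1.9, 2.1], svs, coefs))  # close to the positive support vector
```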

C. DEEP LEARNING
This subsection presents the deep neural network Detectron2, a state-of-the-art Mask R-CNN proposed by Facebook Artificial Intelligence Research (FAIR) [71].

1) DETECTRON 2
The deep neural network Detectron2 comes from the Mask R-CNN framework [72], proposed by Facebook Artificial Intelligence Research (FAIR). In order to perform object detection and segmentation, Detectron2 requires, in addition to the set of images and their respective ground truths (GT) that the data set be specified in a list of annotations. The format of the annotations for Detectron2 follows the format adopted by the COCO dataset [73]. The annotations must contain all the individual objects of all the images in the data set. In general, these annotations contain a list of vertices of polygons around each object of interest, their respective bounding box, category, and area. Thus, to conduct the training of Detectron2 with the sets of medical images studied in this paper, a function was created to analyze and prepare the images for the standard format of Detectron2. The function receives the category, an image and its respective ground truth (GT) and generates bounding boxes and masks of the objects present in the image. During training, the optimizer adjusts the parameters so that the predictions of the model correspond to the desired GTs.
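The preparation function described above is not listed in the paper; the following is a minimal sketch of what such a step might look like, assuming a binary ground-truth mask that contains a single object. The function name and the returned keys are illustrative, the bounding box follows the COCO convention [x, y, width, height], and the polygon vertices are omitted for brevity.

```python
from typing import Dict, List, Optional


def mask_to_annotation(mask: List[List[int]], category_id: int) -> Optional[Dict]:
    """Build a COCO-style record from a binary mask (hypothetical helper).

    Assumes the mask holds one object; returns None when the mask is empty.
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for row in mask for c, value in enumerate(row) if value]
    x_min, x_max = min(cols), max(cols)
    y_min, y_max = rows[0], rows[-1]
    area = sum(1 for row in mask for value in row if value)
    return {
        "category_id": category_id,
        # COCO convention: [x, y, width, height] in pixels.
        "bbox": [x_min, y_min, x_max - x_min + 1, y_max - y_min + 1],
        "area": area,
    }


if __name__ == "__main__":
    gt = [
        [0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0],
    ]
    print(mask_to_annotation(gt, category_id=0))
```

A full record would also carry the polygon vertices of each object and would emit one entry per connected object when an image contains several.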
After training, the neural network receives a new image and performs the generalized detection step, demarcating the possible regions of interest through the bounding boxes. These boxes demarcate the regions with the largest number of pixels whose characteristics are quantitatively similar to the characteristics of the regions of interest learned during the training stage. The specialized detection step consists of scanning the regions inside the bounding boxes pixel by pixel, and then the pixels of the image are classified as belonging or not to the region of interest. The result of classification is a binary mask composed of white pixels whose coordinates coincide with the pixels classified as belonging to the region of interest [72].

D. FINE-TUNING
Fine-tuning is understood to be a set of techniques used on the already consolidated techniques of Machine and Deep Learning in order to increase the effectiveness of the method with respect to classification or segmentation [74].
The fine-tuning techniques used in this article are based on digital image processing: the mask generated by the Mask R-CNN has its edges re-processed by combining the Region Growth [41], K-means clustering [42] and Parzen Window [43] techniques, in order to improve the edges of the region considered to be the region of interest.
Region growth is a technique in which neighboring pixels are incorporated into a region according to a given membership rule based on gray levels [41]. The initial seed of the method is the mask produced by the network itself, so region growth can only increase the size of the region of interest.
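A minimal sketch of region growing seeded by an existing mask is shown below. The membership rule used here (gray level within a fixed tolerance of the mean seed gray level, 4-connected neighbors) is an assumption chosen for illustration; the rule used in [41] may differ.

```python
from collections import deque
from typing import List, Set, Tuple

Pixel = Tuple[int, int]  # (row, column)


def region_grow(image: List[List[int]], seeds: Set[Pixel], tolerance: int) -> Set[Pixel]:
    """Grow a region outward from the seed pixels.

    A 4-connected neighbor joins the region when its gray level is within
    `tolerance` of the mean gray level of the seeds (illustrative rule).
    """
    if not seeds:
        return set()
    height, width = len(image), len(image[0])
    reference = sum(image[r][c] for r, c in seeds) / len(seeds)
    region = set(seeds)
    queue = deque(seeds)
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < height and 0 <= nc < width
            if inside and (nr, nc) not in region:
                if abs(image[nr][nc] - reference) <= tolerance:
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region


if __name__ == "__main__":
    ct = [
        [10, 10, 10, 10],
        [10, 200, 205, 10],
        [10, 198, 202, 10],
        [10, 10, 10, 10],
    ]
    print(sorted(region_grow(ct, {(1, 1)}, tolerance=10)))
```

Because the seeds are always kept in the result, the output can only be equal to or larger than the initial mask.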
K-means clustering changes the gray levels of the image pixels, reducing their values to a pre-defined number K of average values [42]. This technique works by normalizing the pixel values of the exam, which can help other digital image processing (DIP) techniques.
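The gray-level reduction can be sketched with a one-dimensional K-means (Lloyd's algorithm). The evenly spaced initial centers and the fixed iteration count are assumptions made to keep the example deterministic.

```python
from typing import List


def kmeans_gray_levels(values: List[float], k: int, iterations: int = 20) -> List[float]:
    """Return k cluster centers for a list of gray levels (1-D Lloyd's algorithm)."""
    lo, hi = min(values), max(values)
    # Deterministic start: centers evenly spaced over the gray-level range.
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iterations):
        groups: List[List[float]] = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # An empty cluster keeps its previous center.
        centers = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
    return centers


def quantize(values: List[float], centers: List[float]) -> List[float]:
    """Replace each gray level by its nearest cluster center."""
    return [min(centers, key=lambda c: abs(v - c)) for v in values]


if __name__ == "__main__":
    pixels = [10, 12, 11, 200, 202, 198]
    centers = kmeans_gray_levels(pixels, k=2)
    print(centers, quantize(pixels, centers))
```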
Finally, the Parzen Window is an advanced technique for adjusting the edges of the region of interest based on the probability density convolved with a Gaussian structuring element [43]. The borders of the region of interest go through a process of expansion or contraction according to the likelihood that the neighboring pixels belong to the region to be segmented.
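The idea of deciding border membership by likelihood can be sketched with a Gaussian Parzen-window density estimate over gray levels. This is a simplified one-dimensional illustration under assumed sample sets and bandwidth, not the method of [43], which operates on the image with a Gaussian structuring element.

```python
import math
from typing import Sequence


def parzen_density(x: float, samples: Sequence[float], bandwidth: float) -> float:
    """Parzen-window estimate of p(x) using a Gaussian kernel."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)


def belongs_to_region(
    gray: float,
    region_samples: Sequence[float],
    background_samples: Sequence[float],
    bandwidth: float = 5.0,
) -> bool:
    """Assign a border pixel to the region when the region density is higher."""
    return parzen_density(gray, region_samples, bandwidth) > parzen_density(
        gray, background_samples, bandwidth
    )


if __name__ == "__main__":
    region = [200.0, 205.0, 198.0]    # hypothetical gray levels inside the mask
    background = [10.0, 12.0, 15.0]   # hypothetical gray levels outside the mask
    print(belongs_to_region(190.0, region, background))  # border pixel expands the region
    print(belongs_to_region(20.0, region, background))   # border pixel stays in the background
```

A border pixel that is more likely under the region's gray levels expands the region, and one that is more likely under the background contracts it.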

E. EVALUATION METRICS
Evaluation metrics are important tools to analyze the efficiency and effectiveness of segmentation. Since each metric evaluates a specific criterion of the segmentation, the set of metrics chosen covers the most important criteria. The main items to be evaluated are the background, the region of interest, the similarity with the segmentation of the specialist doctor, and the number of correct answers in relation to the total [75]. In the formulas below, TP, TN, FP, and FN denote the numbers of true positive, true negative, false positive, and false negative pixels, respectively, taking the segmentation of the specialist as the reference. The evaluation metrics adopted were:

Accuracy: Accuracy is a classification metric and is directly related to the number of pixels that were correctly segmented over the total number of pixels segmented in an image [76]. Its formula is given by:

$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (1)$$

Dice coefficient: The Dice coefficient is an overlap metric that analyzes the similarity between the region segmented by the algorithm and the region segmented by the specialist doctor, with a good segmentation being the one that most closely matches that of the specialist [77]. Its formula is given by:

$$\mathrm{Dice} = \frac{2\,TP}{2\,TP + FP + FN} \qquad (2)$$

Sensitivity: Sensitivity is a classification metric responsible for evaluating the number of pixels correctly segmented as belonging to the region of interest among all the pixels actually belonging to that group [78]. Its formula is given by:

$$\mathrm{Sen} = \frac{TP}{TP + FN} \qquad (3)$$

Specificity: Specificity is a classification metric responsible for indicating the number of pixels correctly segmented as belonging to the background region among all the pixels actually belonging to the background [78]. Its formula is given by:

$$\mathrm{Spe} = \frac{TN}{TN + FP} \qquad (4)$$

Statistical tests are important tools to compare metric values, considering that in some situations the values may differ visually but be equivalent from the statistical point of view. Using statistical tests, two sets of samples can be compared as being greater, lesser, or equal, but this conclusion can only be drawn after the proper statistical test has been carried out [79].
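The four segmentation metrics described in this subsection can be computed from pixel-wise counts, as in the sketch below. It assumes the standard confusion-matrix definitions, binary masks of equal size, and that both the region of interest and the background are present in the ground truth (otherwise a denominator would be zero).

```python
from typing import Dict, List

Mask = List[List[int]]


def confusion_counts(pred: Mask, truth: Mask) -> Dict[str, int]:
    """Count true/false positives and negatives pixel by pixel."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                counts["TP"] += 1
            elif not p and not t:
                counts["TN"] += 1
            elif p and not t:
                counts["FP"] += 1
            else:
                counts["FN"] += 1
    return counts


def segmentation_metrics(pred: Mask, truth: Mask) -> Dict[str, float]:
    """Accuracy, Dice, sensitivity and specificity of a predicted mask."""
    c = confusion_counts(pred, truth)
    tp, tn, fp, fn = c["TP"], c["TN"], c["FP"], c["FN"]
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "dice": 2 * tp / (2 * tp + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


if __name__ == "__main__":
    predicted = [[1, 1, 0, 0], [1, 0, 0, 0]]
    ground_truth = [[1, 1, 0, 0], [0, 0, 1, 0]]
    print(segmentation_metrics(predicted, ground_truth))
```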
The Kolmogorov-Smirnov statistical test is a nonparametric test that checks the equality between two distributions of continuous samples of the same size. The hypotheses of this test are: H0 - the two samples are statistically identical; H1 - the two samples are different, one of which may be larger or smaller than the other [80].
The reliability coefficient (α) is a determining factor in deciding which hypothesis holds for the data. The calculations performed on the samples produce a coefficient called the P-value, which is compared with the value of α. This comparison defines the accepted statistical hypothesis according to Equation 5 [81]:

$$\text{K-S Test} = \begin{cases} H_0, & \text{if } P\text{-value} > \alpha \\ H_1, & \text{if } P\text{-value} \le \alpha \end{cases} \qquad (5)$$

The value of the reliability coefficient (α) commonly adopted is 5%, and for more rigorous tests, a value of 3% [81]. To perform the test with the metric values, we adopted α with a value of 3%.
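The test procedure can be sketched as follows. The statistic is the largest gap between the two empirical distribution functions; the P-value uses a common asymptotic approximation of the Kolmogorov distribution, which is an assumption here because the paper does not state how the P-value was computed. The decision rule keeps H0 when the P-value exceeds α, with α = 0.03 as adopted in the text.

```python
import math
from bisect import bisect_right
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest absolute gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_p_value(d: float, n: int, m: int, terms: int = 100) -> float:
    """Asymptotic two-sided P-value (approximation, see lead-in)."""
    n_eff = n * m / (n + m)
    lam = (math.sqrt(n_eff) + 0.12 + 0.11 / math.sqrt(n_eff)) * d
    if lam < 0.2:
        return 1.0  # the series converges slowly here and the value is ~1
    total = sum(
        (-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
        for j in range(1, terms + 1)
    )
    return min(1.0, max(0.0, 2.0 * total))


def decide(p_value: float, alpha: float = 0.03) -> str:
    """Keep H0 when the P-value exceeds alpha, otherwise accept H1."""
    return "H0" if p_value > alpha else "H1"


if __name__ == "__main__":
    dice_a = [0.97, 0.96, 0.98, 0.97, 0.95]  # hypothetical metric samples
    dice_b = [0.90, 0.89, 0.91, 0.88, 0.92]
    d = ks_statistic(dice_a, dice_b)
    print(d, decide(ks_p_value(d, len(dice_a), len(dice_b))))
```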

IV. METHODOLOGY
In this section, we present our methodology, which consists of three steps:
Step 1: Classification of the input images in the model.
Step 2: Beginning of the segmentation process of the input images (lung or hemorrhagic stroke).
Step 3: Use of fine-tuning for segmentation (lung or hemorrhagic stroke) and the final result.

A. HEALTH OF THINGS
Health of Things systems, based on the Internet of Things (IoT), are being applied in the healthcare field to connect computer vision systems with end users, health experts, and even patients who can use the application. In the proposed study, the API of HTSCS - Medical Images: Health-of-Things System for the Classification and Segmentation of Medical Images communicates in the Representational State Transfer (REST) style, exchanging data in JavaScript Object Notation (JSON). In this way, web applications or other client applications can access the system, because the API was developed to allow interaction through IoT.
The model code is organized in nodes. The central control component is implemented in Java and manages the processing requests sent to the nodes implemented in Python, which are the extraction and classification APIs built with scikit-learn, TensorFlow, and other libraries.
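The paper does not list the message format of the API. Purely as an illustration of a JSON exchange between a client and a processing node, the sketch below builds a request and a response; every field name (`image_id`, `task`, `image`, `contains_roi`, `mask`) is a hypothetical placeholder.

```python
import json
from typing import Optional


def build_request(image_id: str, task: str, image_base64: str) -> str:
    """Serialize a hypothetical classification/segmentation request."""
    return json.dumps({"image_id": image_id, "task": task, "image": image_base64})


def build_response(image_id: str, contains_roi: bool, mask_base64: Optional[str] = None) -> str:
    """Serialize a hypothetical reply; the mask is present only when an RoI was found."""
    payload = {"image_id": image_id, "contains_roi": contains_roi}
    if contains_roi:
        payload["mask"] = mask_base64
    return json.dumps(payload)


if __name__ == "__main__":
    request = build_request("ct-001", "lung", "AAAA")  # placeholder image bytes
    print(request)
    print(build_response("ct-001", contains_roi=False))
```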

B. CLASSIFICATION
Step 1 corresponds to the detection of two problems in a binary way, one is related to the lung (has lung or does not have lung in the image) and another related to the brain (has Hemorrhagic Stroke or does not have Hemorrhagic Stroke in the image).
In order to develop the Health of Things tool, which corresponds to step 1, we used the computational solution tool Lapisco Image Interface for Application Development (LINDA) [28].
Step 1 covers the beginning of the model as shown in Figure 1. LINDA is a cloud application; any device with Internet access can use it.
In order to find the best extractor-classifier combination, we follow the flow indicated in Figure 2. Registration is required to use the tool. After the login screen, the user creates a project by defining the project name and parameters, the number of classes, and the desired action (extraction, classification). In the next stage, extractors and classifiers are chosen for training. The user then sets the number of classes and uploads the images in the Portable Network Graphics (PNG) format. The images were initially in the Digital Imaging and Communications in Medicine (DICOM) format, which is the standard for medical images, so the entire database was converted to PNG. Once the images are loaded, feature extraction starts, followed by classification. Finally, the best extractor-classifier combination is analyzed using graphs and confusion matrices.
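The conversion from 16-bit DICOM pixel data to an 8-bit image suitable for PNG export involves reducing the gray-level depth. The paper does not state which mapping was used; the sketch below assumes a simple min-max rescaling with integer arithmetic and leaves file reading and writing aside.

```python
from typing import List


def to_8bit(pixels: List[List[int]]) -> List[List[int]]:
    """Rescale gray levels to the range 0..255 (min-max mapping, an assumption)."""
    flat = [value for row in pixels for value in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant image has no contrast to preserve.
        return [[0 for _ in row] for row in pixels]
    return [[(value - lo) * 255 // (hi - lo) for value in row] for row in pixels]


if __name__ == "__main__":
    raw = [[0, 1000, 2000]]  # illustrative 16-bit values
    print(to_8bit(raw))
```

In practice a clinically chosen intensity window may be preferable to a global min-max mapping, since it preserves contrast in the tissue of interest.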
The first step in image pre-processing is to normalize the image size, avoiding very large or very small images, and to adjust the color depth of the images, especially for DICOM (DCM) images. Once the best combination has been defined, the trained model is encapsulated to predict new image data in the cloud, and the Health of Things model proceeds as shown in Figure 1. In Step 1 of Figure 1, item (1) is when the user uploads the image to be classified and segmented, after the model has been trained. In (1.a) the image is classified, and the Health of Things generates an output, shown in item (1.b), indicating whether or not the image contains the object of interest. If it does not, the computational model stops, as shown in Figure 1. If the object of interest is detected, the Health of Things model advances to Step 2 (segmentation of the object of interest), which in this study is the lung or hemorrhagic stroke segmentation process.
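The gating between Step 1 and Step 2 can be sketched as a small control-flow function. This is only an illustration under assumptions: `run_pipeline`, `toy_classifier`, and `toy_segmenter` are hypothetical stand-ins, not the trained extractor-classifier or the Detectron2 network.

```python
from typing import Callable, List, Optional

Image = List[List[int]]


def run_pipeline(image: Image,
                 classify: Callable[[Image], bool],
                 segment: Callable[[Image], Image]) -> Optional[Image]:
    """Step 1 gates Step 2: segment only when the classifier finds the object."""
    if not classify(image):   # item (1.b) answers NO: stop here
        return None
    return segment(image)     # Step 2: segmentation of the object of interest


# Stand-in models, purely illustrative (simple intensity thresholds).
def toy_classifier(image: Image) -> bool:
    return max(max(row) for row in image) > 100


def toy_segmenter(image: Image) -> Image:
    return [[1 if pixel > 100 else 0 for pixel in row] for row in image]


negative = run_pipeline([[0, 10], [5, 0]], toy_classifier, toy_segmenter)
positive = run_pipeline([[0, 200], [5, 0]], toy_classifier, toy_segmenter)
```

Passing the classifier and segmenter as callables keeps the gate independent of the models plugged into it.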

C. SEGMENTATION
The classification in Step 1 of Figure 1 identifies the object of interest (lung or hemorrhagic stroke) according to the images chosen for classification and segmentation at the start of the processing system, as shown in Figure 1, label (1). The result is given in item (1.b); if it is YES, the model moves on to Step 2 of the system.
Step 2 represents the first phase of segmentation, shown in item (2), which corresponds to the Detectron2 deep learning network explained in Section III-C. The Detectron2 network identifies the region belonging to the bounding boxes of the object of interest and generates a feature map of the lung or hemorrhagic stroke, as shown in item (2.a) of Figure 1. In the detection of item (2.a), the network detects the region or regions of interest in a generalized way through bounding boxes. These boxes mark the regions most similar to the region of interest learned in the training stage. In the lung dataset, the bounding boxes form around the lungs; in the hemorrhagic stroke dataset, they form around the lighter region of the brain, since this is possibly the region affected by leakage. The result of this detection stage through the Detectron2 network is seen in Figure 3, which presents results for the CT images of the lung and for the CT images of hemorrhagic stroke. Figure 3 shows the instance segmentation of Detectron2: the network detects and demarcates each distinct object of interest that appears in an image.
In this segmentation stage, called specialized segmentation, the pixels of the region demarcated by the bounding box are classified according to their attributes, and the network thereby builds the region of interest. These classified pixels are used to create a binary mask, represented in Figure 1, item (2.b); this mask marks the detected region. It is subsequently used to target the region of interest in stage (3) of the Health of Things model.
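The construction of a binary mask from the pixels classified inside a bounding box can be illustrated as follows. This is a minimal sketch, not the internal Detectron2 implementation: the function name, the box convention (exclusive upper bounds), and the 0.5 threshold are assumptions made for the example.

```python
def box_scores_to_mask(height, width, box, scores, threshold=0.5):
    """Paste thresholded per-pixel scores from a box into a full-size binary mask.

    box is (x0, y0, x1, y1) with exclusive upper bounds; scores[r][c] is the
    foreground score of pixel (y0 + r, x0 + c).
    """
    x0, y0, x1, y1 = box
    mask = [[0] * width for _ in range(height)]
    for r in range(y1 - y0):
        for c in range(x1 - x0):
            if scores[r][c] >= threshold:  # pixel classified as object
                mask[y0 + r][x0 + c] = 1
    return mask


# A 4x5 image with a 2x3 box whose scores are made up for illustration.
mask = box_scores_to_mask(
    4, 5, box=(1, 1, 4, 3),
    scores=[[0.9, 0.8, 0.2],
            [0.1, 0.7, 0.6]],
)
```

Pixels outside the box stay at 0, so the mask has the size of the input image and can be passed directly to the fine-tuning stage.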

D. FINE-TUNING
Fine-tuning is the last step of the Health of Things system and is used as a segmentation adjustment, in order to improve the segmentation efficiency of the region of interest.
FIGURE 1. Step 1 consists of the classification of the input image (detecting whether or not the lung or hemorrhagic stroke is present). In Step 2, the Detectron2 network starts the segmentation process of the lung or hemorrhagic stroke, generating feature maps through the bounding boxes. Step 3 continues the segmentation process using fine-tuning and finally presents the best result.
Three fine-tuning techniques were proposed together with the Detectron2 network, as shown in Figure 1. The first technique (Detectron-fλ) combines Detectron2 with the Parzen Window. The second technique, Detectron2 + Parzen Window + K-means clustering (Detectron-fδ), item (3.2), is the direct application of the Parzen Window technique over the segmented region. Here, the clustering acts to normalize the image and sharpens the boundaries of the region of interest relative to the background, which makes the Parzen Window more effective.
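This section does not detail how the Parzen Window is applied, so the following is only a generic sketch of the technique: a Gaussian-kernel Parzen estimate of the intensity density for foreground and background samples, with each pixel assigned to the class of higher density. The sample intensities and the bandwidth of 10 are assumptions made for illustration.

```python
import math


def parzen_density(x, samples, bandwidth):
    """Gaussian-kernel Parzen window estimate of the density at x."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)


def refine_labels(pixels, fg_samples, bg_samples, bandwidth=10.0):
    """Label an intensity 1 when the foreground density exceeds the background one."""
    labels = []
    for p in pixels:
        fg = parzen_density(p, fg_samples, bandwidth)
        bg = parzen_density(p, bg_samples, bandwidth)
        labels.append(1 if fg > bg else 0)
    return labels


fg = [30, 35, 40, 45]        # made-up intensities sampled inside the network mask
bg = [180, 190, 200, 210]    # made-up intensities sampled outside it
labels = refine_labels([38, 60, 150, 205], fg, bg)
```

In a scheme like this, the network mask supplies the training samples, and pixels near the boundary are relabeled according to which intensity distribution they fit better.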
The third technique, Detectron2 + Parzen Window + Region Growth + K-means clustering (Detectron-fµ), item (3.3), first reconstructs the region of interest by applying the region growth technique to the CT image clustered with K = 3, where the network result itself serves as the seed for growth, in order to adjust the edges.
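The idea of growing a region over a clustered image, seeded by the network mask, can be sketched as below. This is a simplified illustration under assumptions: it clusters scalar intensities with a plain Lloyd's k-means (K = 3), grows over 4-connected pixels of the same cluster, and omits the Parzen Window step; it is not the authors' implementation.

```python
from collections import deque


def kmeans_1d(values, k=3, iterations=20):
    """Cluster scalar intensities with Lloyd's algorithm (k >= 2); one label per value."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]  # evenly spaced start
    labels = [0] * len(values)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels


def grow_region(cluster_map, seed_mask):
    """Grow the seed mask over 4-connected pixels that share the current pixel's cluster."""
    h, w = len(cluster_map), len(cluster_map[0])
    grown = [row[:] for row in seed_mask]
    queue = deque((r, c) for r in range(h) for c in range(w) if seed_mask[r][c])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < h and 0 <= nc < w
            if inside and not grown[nr][nc] and cluster_map[nr][nc] == cluster_map[r][c]:
                grown[nr][nc] = 1
                queue.append((nr, nc))
    return grown


image = [[200, 200, 200, 200],   # made-up intensities: a dark region on a bright background
         [200,  20,  25, 200],
         [200,  22, 100, 200],
         [200, 200, 200, 200]]
flat = [p for row in image for p in row]
labels = kmeans_1d(flat, k=3)
cluster_map = [labels[r * 4:(r + 1) * 4] for r in range(4)]
seed = [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]  # stand-in network mask
grown = grow_region(cluster_map, seed)
```

The seed pixel expands to the neighboring dark pixels but stops at the intermediate cluster (intensity 100) and at the bright background, which is how the growth adjusts the edges in this toy case.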
In item (3.a) of Figure 1, the segmentation results are presented with the different models based on fine-tuning, including the direct results (without the fine-tuning process). In item (3.b) and (3.c), only the results of the models that obtained the best performance in the segmentation are presented. Both models work in parallel for each input image in the proposed system.

V. RESULTS AND DISCUSSION
This section presents and discusses the results of the proposed method, following the Methodology in Section IV. The model is based on the Health of Things and performs the classification and segmentation of the lung and of hemorrhagic stroke in computed tomography, using a deep learning network combined with fine-tuning methods.
The results are organized in three stages. The first stage presents the classification of the input images after training, using the SVM method with deep-learning-based extractors to determine whether there is a lung or a stroke in the CT images given as input to the network. The second stage is subdivided into two phases: pulmonary segmentation and hemorrhagic stroke segmentation. In the first phase, the metric values presented are for the segmentation of pulmonary CT; in the second phase, they are for the segmentation of hemorrhagic stroke. Finally, the third stage validates the method proposed in this study by comparing our best results for pulmonary and stroke segmentation with those of methods reported in the literature.

A. FIRST EXPERIMENTAL STAGE -CLASSIFICATION
All the datasets were pre-processed in the same way. Ten iterations were performed, in each of which the samples were randomly divided into two groups: 80% for training and the remaining 20% for testing. The training sets were normalized (zero mean and unit variance), and the test sets were normalized using the same statistics as the training sets. To find the best set of hyperparameters for the classifier, k-fold cross-validation with grid search was applied. The hyperparameters that reached the highest precision on the validation set were stored, and the values repeated most often across the ten iterations were chosen as the best hyperparameters.
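Two steps of this protocol are easy to get wrong in practice: the test set must be normalized with the training statistics, and the final hyperparameters are the ones selected most often across iterations. The minimal sketch below illustrates both; the data and the (C, γ) pairs are made-up values, and the function names are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def fit_scaler(train_column):
    """Return the mean and standard deviation computed on training data only."""
    return mean(train_column), pstdev(train_column)


def transform(column, mu, sigma):
    """Apply the training-set statistics to any column (train or test)."""
    return [(x - mu) / sigma for x in column]


def most_repeated(choices):
    """Pick the hyperparameter setting selected most often across iterations."""
    return Counter(choices).most_common(1)[0][0]


train, test = [2.0, 4.0, 6.0, 8.0], [5.0, 10.0]
mu, sigma = fit_scaler(train)
train_z = transform(train, mu, sigma)   # zero mean, unit variance
test_z = transform(test, mu, sigma)     # reuses mu and sigma from the training set

# Hypothetical (C, gamma) winners of four iterations.
best = most_repeated([(8, 0.5), (2, 0.5), (8, 0.5), (8, 0.125)])
```

Reusing `mu` and `sigma` for the test set avoids leaking test information into the model, which is the point of the normalization rule stated above.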
Grid search with 10-fold cross-validation was adopted to choose the hyperparameters C and γ of the SVM classifier with the RBF kernel. The range of C and γ varied between [2⁻⁵, …], […] that solves the proposed problem. However, for the classification of lung versus non-lung images, two feature descriptors stood out: Xception and VGG16, both with the SVM classifier configured with the RBF kernel. The Xception descriptor reached an accuracy of 96.63% with an average time of 14.7 ms, and the VGG16 descriptor reached 97.04% with an average time of 9.44 ms. The dataset was organized as follows: class 0 without lung and class 1 with lung. To detect whether or not there is a stroke in the brain, the Xception and VGG16 descriptors were also applied with the SVM-RBF classifier. Both descriptors combined with SVM-RBF reached 100% accuracy, differing only in the average extraction time per image: 16.10 ms for Xception and 10.88 ms for VGG16.
Table with the two best results of the extractor-classifier combination for the detection stage, whether there is a lung in the image or not. As can be seen, the VGG16 and Xception extractors, both with the SVM-RBF classifier, scored 97% in all evaluation metrics.
Table with the two best results of the extractor-classifier combination for the detection of hemorrhagic or non-hemorrhagic stroke in images of the skull. As can be seen, the best combination is the VGG16 extractor with the SVM-RBF classifier, which scored 100% in all evaluation metrics.

B. SECOND EXPERIMENT STAGE -SEGMENTATION
All segmentation experiments were conducted on Ubuntu 18.04 with 16 GB of RAM, an Intel Core i7 processor, and an NVIDIA GeForce GTX 1660 Ti GPU, used for neural network training and inference. The model used was Mask R-CNN R-50-FPN-3x, as can be seen in the overview of these models.¹ Since there were only just over 1,000 CT images of the lung and 80 of hemorrhagic stroke for training, we opted for pre-trained weights, which were used as the initial weights in training. The neural networks were trained for 2,000 epochs with a learning rate of 0.00025. The total training time was approximately 11 minutes for the CT hemorrhagic stroke dataset and 12 minutes for the CT lung images.

1) SECOND EXPERIMENT STAGE -LUNG SEGMENTATION
The results in this section are based on the dataset presented in Section III-A; the same dataset was used in the works of Qinhua Hu et al. [32] and Rebouças et al. [82]. Table 3 lists the different models proposed, based on the Health of Things, that use deep learning combined with the fine-tuning technique, and presents the results generated by this study for the segmentation of lung CT images. In this experiment, 36 images of the dataset that were also used in [82] and [32] were employed. The dataset contains the ground truth (GT). The first column presents the models Detectron2, Detectron-fλ, Detectron-fδ and Detectron-fµ.
TABLE 3. Results generated by the proposed method using deep learning and fine-tuning combinations for lung CT images. The same images were used in the experiments of Rebouças et al. [82] and Qinhua Hu [32].

FIGURE 4. Graphic illustration of
Model Detectron-fλ is based on fine-tuning using the Detectron2 combined with the Parzen Window technique.
Model Detectron-fδ is based on fine-tuning using the Detectron2 combined with the Parzen Window and Clustering.
Model Detectron-fµ is based on fine-tuning using the Detectron2 combined with the Parzen Window, Region Growth and Clustering.
The method was trained with 80% of the 1,265 lung CT images used by Qinhua Hu [32], all of which include the ground truth. This further validates our method, since the model was trained on the annotations of specialists.
The ACC column, which represents the Accuracy metric, shows the success of the Detectron2 network in locating and segmenting the pulmonary region effectively and accurately: Detectron2 achieved 99.00 ± 0.60. With the exception of Detectron-fλ, the fine-tuned models showed a slight improvement, an adjustment of 0.02 that brought both Detectron-fδ and Detectron-fµ to 99.02% Accuracy. This reflects the efficiency of the Detectron2 network in detecting the location of the lung in the image, segmenting the contour that belongs to the object, and predicting the true positives of the pulmonary region. The slight improvement appeared in the two models that adjusted the segmentation more precisely, Detectron-fδ and Detectron-fµ. Both use clustering techniques, which helped to capture in greater depth the relation between the variations of the pixels belonging to the lung in the CT image.
The Dice coefficient (DICE) column shows some positive differences compared with Detectron2 (the model without fine-tuning), except for Detectron-fλ, which presented lower values. Detectron-fλ did not surpass Detectron2 because the Parzen Window alone, as a fine-tuning step, had difficulty finding the edges at sharper angles, as shown in Figure 5. The models Detectron-fδ and Detectron-fµ had slightly better results than Detectron2 and behaved similarly to each other. In the DICE assessment, which measures the overlap, the results were relatively similar: 97% ± 0.03 and 97% ± 0.01, with a standard deviation of approximately 1.25%, against 96.98% ± 1.21 for Detectron2. These results show the efficiency of the models in making minor adjustments to the segmentation at the edge of the lung.
Regarding the Sensitivity metric (SEN) in Table 3, the Detectron2 and Detectron-fµ models are similar, with roughly equivalent values. In other words, the networks both with and without fine-tuning were able to correctly predict the pixels belonging to the bottom area of the lung and to classify them as pulmonary region. Although the difference between the models Detectron-fλ and Detectron-fδ is small, this variation demonstrates the effectiveness of the method for different types of lungs in a CT image. Lung size changes with each image acquired by the CT scanner, since the respiratory process inflates and deflates the lungs, which makes this variation in sensitivity somewhat complex. In addition, the variety of pathologies recorded by the CT shows up as specific points that can be characterized as abnormalities, as shown in Figure 5.
The Specificity metric in Table 3 shows an excellent performance for the models, above 99% with variations of 0.16%. The models performed very well in detecting the non-lung region, meaning that they are able to point out areas that are not part of the ROI, in agreement with the excellent ACC results for the region predicted as lung. In the graph of Figure 4, which represents Table 3, we can visually analyze the similarities between the models. Figure 5 presents the segmentation results of each model for different lung shapes. The models with fine-tuning brought a slight improvement in following the contour of the object, starting from the result of the first segmentation step performed in phase 2 of the proposed method.
Table 4 refers to the segmentation time and also highlights the success of the method, which segments the lung quickly and accurately. The average time of 1.7 seconds, together with the excellent performance, makes the model effective and robust. Visually, we can see the similarity between the results in Figure 1, also confirmed by the statistical test presented in Table 5. Figure 7 presents a box plot of the average segmentation time of the models for CT hemorrhagic stroke images and CT lung images. Table 3, along with the images in Figure 8, illustrates the average segmentation time per lung image, referenced in the graph with a blue bar. Analyzing Table 4 (segmentation time) together with Figure 5 (segmentation results), we can conclude that after the fine-tuning process the Detectron-fµ model performed relatively better in readjusting the shape of the object: it starts from the result generated by Detectron2, used as the seed for growth, and reaches the edges of the lung, mainly at curvature points where the ends are quite pronounced.
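For reference, the four metrics discussed above can be computed from a predicted mask and a ground-truth mask as in the following sketch. It uses the standard confusion-matrix definitions, which are assumed here to match those of the paper; the masks are made-up toy data, and empty masks (zero denominators) are not handled.

```python
def confusion_counts(pred, truth):
    """Count true/false positives and negatives over two binary masks."""
    tp = fp = tn = fn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif not p and t:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def segmentation_metrics(pred, truth):
    """Accuracy, Dice coefficient, Sensitivity and Specificity of a segmentation."""
    tp, fp, tn, fn = confusion_counts(pred, truth)
    return {
        "ACC": (tp + tn) / (tp + fp + tn + fn),
        "DICE": 2 * tp / (2 * tp + fp + fn),
        "SEN": tp / (tp + fn),   # share of object pixels that were found
        "SPE": tn / (tn + fp),   # share of background pixels left untouched
    }


truth = [[1, 1, 0, 0], [1, 1, 0, 0]]
pred = [[1, 1, 1, 0], [1, 0, 0, 0]]
metrics = segmentation_metrics(pred, truth)
```

Because the background usually dominates a CT slice, ACC and SPE tend to stay close to 100% even when the contour is imperfect, which is why DICE and SEN are the more discriminating columns in the tables.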
Table. Results of the metrics in the segmentation of the CT lung images, using the techniques: A) Detectron2, B) Detectron-fλ, C) Detectron-fδ, and D) Detectron-fµ.
The models are statistically equivalent. The contour of the Detectron-fµ model proved relatively flexible at some lung curvature points, which gives the model slight differences in the adjustment between the pulmonary and non-pulmonary regions. To demonstrate the equivalence between the models, the Kolmogorov-Smirnov test was performed on the set of samples: each sample was tested against every other, and together they generated the average metrics of Tables 4 and 5. Table 5 shows that, although the fine-tuning techniques applied to the result of Detectron2 slightly increased the metric values, from a statistical point of view these metrics remained similar.
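The statistic behind the two-sample Kolmogorov-Smirnov test is the largest gap between the empirical distribution functions of the two samples. The sketch below computes only this statistic (not the p-value) in plain Python; the sample values are made up and are not results from the paper.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:   # advance past all values <= x in a
            i += 1
        while j < len(b) and b[j] <= x:   # and in b
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


same = ks_statistic([0.95, 0.96, 0.97, 0.98], [0.97, 0.96, 0.98, 0.95])
shifted = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])
disjoint = ks_statistic([1, 2, 3], [4, 5, 6])
```

A statistic near 0 is consistent with two models producing the same distribution of a metric, while a value near 1 indicates clearly separated distributions; the decision itself depends on the sample sizes and the chosen significance level.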

2) SECOND EXPERIMENT STAGE -SEGMENTATION OF HEMORRHAGIC STROKE
This subsection presents the results of the hemorrhagic stroke segmentation. The dataset, also used in the study of [43], contains the ground truth, so it was possible to compare the results with the specialist's segmentation and validate the study, as was done in the first experiment with lung segmentation. Table 6 presents the results of the segmentation of 100 hemorrhagic stroke images. The results were surprising: the Detectron2 network surpassed the fine-tuned models in almost all metrics, with an Accuracy of 99.89 ± 0.05, DICE of 94.81 ± 2.11 and SEN of 92.79 ± 3.87. This means that Detectron2 was able to detect the hemorrhagic stroke and segment the brain region of interest relatively more accurately than the models with fine-tuning. The network was equivalent to the model Detectron-fµ in terms of Specificity, and both were successful in detecting the non-stroke regions. This is extremely important, considering that the Accuracy values, which represent the actual location of the region of interest, are similar, with all models close to 100% in the ACC results.
Comparison of the segmentation times between the Detectron2 model and the models using fine-tuning techniques. The results were generated from the CT hemorrhagic stroke images.
The main difference between the models was found in the Sensitivity (SEN) metric, which varied by 7% between the best and the worst case. This variation occurred only between the Detectron2 and Detectron-fλ models, where there was a minor difficulty in following the edge of the stroke, which was already close to the limit. Using only the Parzen Window technique to readjust to the edges of the stroke makes the readjustment more difficult. This can also be seen visually in the slightly deformed contour of the segmented image. In Figure 6, we can visually analyze the results generated by each model, which shows the evolution of each one and the similarity between the models.
The differences can also be seen in the graph in Figure 8, which presents the values as bars. This bar graph shows the great potential of the network in segmenting different objects. This is because the Detectron2 network has weights pre-trained on different forms of learning in its structure, which provides a kind of generalization among deep learning models. Table 8 demonstrates that the metric values of the fine-tuning technique Detectron-fλ are statistically different from those of Detectron2, so it can be said that this method was less effective in the segmentation of hemorrhagic stroke, as shown in Table 6 and illustrated in the graph in Figure 8. Figure 6 shows that the segmentation performed by Detectron2 visually approximates the ground truth segmentation. This is because, in this specific case of hemorrhagic strokes, Detectron2 obtained the best performance among the models.
Table. Results of the metrics used in the segmentation of CT images of the brain, using these techniques: A) Detectron2, B) Detectron-fλ, C) Detectron-fδ and D) Detectron-fµ.

3) THIRD STAGE -COMPARISON BETWEEN METHODS IN THE LITERATURE
In this section, to validate our method, we compare our best approach with other methods reported in the literature that used the same databases presented in Section III-A. Table 9 presents our best model in comparison with the work of Qinhua Hu et al. [32]. The experiment used the same CT lung database with different types of methods. Table 9 shows the results generated by our approach using the Detectron2 network against the Mask R-CNN network. The values are much higher than those of Mask R-CNN: the Detectron2 network used in our approach outperformed Mask R-CNN by more than 23% in DICE, and Detectron2 reached 99.00 ± 0.60 in Accuracy. In Table 9, our model Detectron-fµ performed better than all the models studied by Hu et al. [32], such as the Mask+bayes, Mask+SVM, Mask+K-means and Mask+EM models, in all metrics shown in the table. All the models used deep learning combined with fine-tuning techniques. The graph in Figure 9 clearly distinguishes the variation between the lung segmentation metrics reported in [32].
Table 11 compares some results of [32] and [43], with different automatic methods. The method of Hu [32] also used deep learning combined with fine-tuning to segment CT lung images. Our model Detectron-fδ presented the best results among those in the table and was equivalent to the HU method in the DICE metric, with 97%. However, our model was faster, with a difference of 9.54 seconds compared with the model based on Mask R-CNN combined with fine-tuning through machine learning. Our approach was better than the other methods, including in average lung segmentation time, where the compared methods range from 240 seconds in the worst case to 2 seconds in the best case. Our approach was also better than the CRAD model in the DICE metric, which scored 94% while our best model reached 97%, thus surpassing most renowned works in the state of the art.
Table for the method proposed in the segmentation step of the Health of Things system with the lung image dataset. The methods used for validation can be found in the article [32].
Table graphic illustration of Table 9.
[…] [43], with our best approach (Detectron2) for segmentation of hemorrhagic stroke.
Table of the method proposed in the segmentation stage of the Health of Things system for the lung image dataset. The methods used in the validation can be found in the articles (HU, GVF, VFC, SISDEP, OPS, CRAD) and [43].

FIGURE 9. Comparative
Tables 10 and 12 compare the results obtained by Rebouças et al. [43] with our best approach (Detectron2) for the segmentation of hemorrhagic stroke. Detectron2 reached 94% DICE against 84% for the best case, OPS Manhattan, a difference of 10 percentage points. The difference is even greater compared with the worst case, WS, with 17%. In other words, the Detectron2 model proved to be more effective in segmenting hemorrhagic stroke. It is also worth mentioning that our approach is a fully automatic method based on deep learning, with no human interference in the segmentation process, unlike the other methods presented in the table. The Detectron2 network obtained better results than all the works referred to in [43], including in the average image segmentation time. Although deep learning requires a higher computational cost than the classic methods (without deep learning), our approach was faster than the best case studied in [43] for segmenting hemorrhagic stroke: Detectron2 took 0.09 seconds against 1.76 seconds for the LSCPM method. There was a difference of 4.71 seconds compared with the worst case, the WS method, as illustrated in Figure 10, the chart comparing the models presented in the study of Rebouças et al. [43].
The proposed method proved to be superior as well as fully automatic for the classification and segmentation of lung and hemorrhagic stroke CT images. The comparisons in the first stage of results showed that the method obtained excellent results using different models and provided the best and most effective approach for pulmonary segmentation. The comparison in the second stage showed that the Detectron2 network was able to detect and segment the stroke region without the need for fine-tuning. This shows the exceptional ability of the deep learning network to approach the gold standard produced by a specialist, making it usable as a kind of pre-diagnosis for hemorrhagic stroke, with automatic segmentation in less than 1 second. The third stage provided an updated comparison with renowned methods reported in the literature, as well as the validation of our approach. Table 13 presents a summary comparing the advantages and disadvantages of the proposed work and the works used for comparison.

VI. CONCLUSION
This work aimed to develop an innovative medical pre-diagnostic method based on Health IoT through deep learning and fine-tuning. The method proposes that learning through fine-tuning can generalize to different types of CT images. The method was divided into two stages: classification and segmentation of the lung and of hemorrhagic stroke in CT images. In the first stage, a model (classifier) was developed to classify the existence of strokes in a CT image, with the option to also classify the existence of lungs in the CTs.
If the network identifies the object of interest in the CT, the process continues to segmentation, the second stage of the method. This second stage comprises four models to segment the pulmonary region and the hemorrhagic stroke lesion. The models presented in this study used deep learning combined with fine-tuning, and all obtained excellent results. The results were very satisfactory in both stages of the process. The best models obtained 97% Accuracy in classifying pulmonary versus non-pulmonary images and 100% in classifying hemorrhagic stroke versus non-injured images. For the segmentation of classified images containing regions of interest, our best model, Detectron-fµ, also obtained excellent results for pulmonary segmentation, with 99% Accuracy and an average time of 1.7 seconds, surpassing the works reported in the literature and showing the efficiency and robustness of our method. Our method was also very successful in the classification and segmentation of hemorrhagic stroke, reaching 94% DICE and 99% Accuracy with an average segmentation time of less than 1 second, surpassing the works reported in the literature presented in Section V-B.3.
To validate our approach, we compared our best models with lung and hemorrhagic stroke segmentation techniques from the literature that used the same databases. Our method was superior to all methods evaluated for pulmonary segmentation, surpassing the best model, HU [32], which reached 97% DICE with a segmentation time of 11.24 seconds. The second best model (CRAD), which reached 94% with a processing time of 2 seconds, was also surpassed. Our model for stroke segmentation was also superior to the best method of [43] (LSCPM), which reached 84% DICE, a difference of 10 percentage points relative to our best case. Furthermore, the LSCPM model took 1.76 seconds on average, while our Detectron2 model took only 0.09 seconds. Thus, we demonstrated the effectiveness of the method proposed in this study for the classification and segmentation of pulmonary images and hemorrhagic stroke on CT.
For future studies, we propose to test different image datasets to validate our method as a powerful tool to support medical diagnosis through Health IoT, for the classification and segmentation of different types of pathologies and exams, such as melanomas and mammograms, among others. We also propose, as future work, to make the system HTSCS - Medical Images: Health-of-Things System for the Classification and Segmentation of Medical Images available for testing by other researchers and health professionals.
VIRGÍNIA XAVIER NUNES received the degree in computer science from the Instituto Federal de Educação, Ciências e Tecnologia do Ceará, (IFCE), Maracanaú, Ceará, in 2017, where she is currently pursuing the M.Sc. degree in computer science. She is a member of the Laboratory of Image Processing, Signals, and Applied Computing (LAPISCO). She contributes to projects involving computer vision, machine learning, and deep learning. Her research field focuses on medical images.
LUÍS FABRÍCIO DE FREITAS SOUZA received the degree in computing from the Universidade Federal Rural de Pernambuco (UFRPE), in 2013. He is currently pursuing the M.Sc. degree in computer science with Instituto Federal de Educação, Ciências e Tecnologia do Ceará (IFCE). He is currently working as a University Professor and carries out research. He is a member of the Laboratory of Image Processing, Signals, and Applied Computing (LAPISCO). His current research interests include design of signal processing methods, machine learning, and pattern recognition. His most recent article, published in 2020, presents an effective approach for CT lung segmentation using mask region-based convolutional neural networks.
ADRIELL GOMES MARQUES was born in Fortaleza, Ceará, Brazil, in 1999. He received the Electronic Technician degree from the Federal Institute of Science and Technology of Ceará (IFCE) Campus Caucaia. He is currently pursuing the degree in mechatronics engineering from IFCE, Campus Fortaleza. He is currently a member of the Laboratory of Image Processing, Signals, and Applied Computing, Lapisco, where he is working on several research projects.
IÁGSON CARLOS LIMA SILVA is currently pursuing the degree in mechatronics engineering with the Federal Institute of Education, Science and Technology of Ceará (IFCE), Fortaleza, Ceará. He is a member of the Laboratory of Image Processing, Signals, and Applied Computing (LAPISCO), where he is also working on various research projects. He has experience in digital image and signal processing, and with data science.
MARCOS AURÉLIO ARAUJO FERREIRA JUNIOR is currently pursuing the degree in mechatronics engineering with the Instituto Federal de Educação, Ciências e Tecnologia do Ceará (IFCE), Fortaleza, Ceará. He is a member of the Laboratory of Image Processing, Signals, and Applied Computing (LAPISCO), with experience in applied computing and embedded systems. His current interests include artificial intelligence, IoT systems, and robotics process automation.