Machine Learning Tools for Long-Term Type 2 Diabetes Risk Prediction

A steady rise has been observed in the percentage of elderly people who want, and are still able, to contribute to society. Therefore, early retirement or exit from the labour market due to health-related issues poses a significant problem. Nowadays, thanks to technological advances and the availability of data from different populations, the investigation of risk factors and the screening of health issues are moving towards automation. In the context of this work, a worker-centric, IoT-enabled framework for the unobtrusive monitoring of users' health, well-being and functional ability, empowered with AI tools, is proposed. Diabetes is a high-prevalence chronic condition with harmful consequences for quality of life and a high mortality rate worldwide, in both developed and developing countries. Its severe impact on personal, social and working life can be considerably reduced if early detection is possible, but most research works in this field fail to provide a personalized approach in both the modeling and the prediction process. In this direction, our system addresses diabetes risk prediction, in which specific components of the Knowledge Discovery in Databases (KDD) process are applied, evaluated and incorporated. Specifically, dataset creation, feature selection and classification using different supervised Machine Learning (ML) models are considered. The ensemble WeightedVotingLRRFs ML model is proposed to improve the prediction of diabetes, scoring an Area Under the ROC Curve (AUC) of 0.884. Concerning the weighted voting, the optimal weights are estimated by a bi-objective genetic algorithm that targets the Sensitivity and AUC of the ML model. Also, a comparative study is presented among the Finnish Diabetes Risk Score (FINDRISC) and Leicester risk score systems and several ML models, using inductive and transductive learning.
The experiments were conducted using data extracted from the English Longitudinal Study of Ageing (ELSA) database.


I. INTRODUCTION
Diabetes, also known as diabetes mellitus (DM), is a chronic disorder characterized by high blood glucose levels, due to the inability of the pancreas to generate a sufficient quantity of insulin (Diabetes Mellitus Type-1 (T1DM)) or the failure of cells and tissues to utilize it (Diabetes Mellitus Type-2 (T2DM)) [1]. Apart from T1DM and T2DM, another type is gestational diabetes, which affects women and develops during pregnancy. Since the prevalence of T2DM in the ageing population (i.e., elderly people) is rising [2], [3], the analysis in the following sections focuses on this age group, which constitutes the participants in SmartWork. Some characteristic signs and symptoms of high glucose include itching, frequent fatigue, unexplained weight loss, excessive urination, dry mouth and increased hunger [4]. The prevention and/or early diagnosis of diabetes is of high importance in order to avoid or mitigate serious lifetime complications, including cardiovascular disease, stroke, kidney failure, foot ulcers and eye complications [5], [6]. In conventional healthcare, patient demographic data, case history, diagnostics and medication are managed and maintained manually, which may lead to human errors and affect patients suffering from chronic diseases. It is known that diabetes patients need to check their glucose level regularly, or even continuously, to make sure that their lifestyle (i.e., diet and physical activity) is appropriate for keeping glucose levels under control. Many medical devices allow patients to measure their glucose levels themselves.
The associate editor coordinating the review of this manuscript and approving it for publication was Firooz B. Saghezchi.
VOLUME 9, 2021. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Yet, recent technological advances in networking, namely mobile communications (e.g., 5G and beyond), Cloud Computing, the Internet of Things (IoT), Artificial Intelligence (AI) and Machine Learning, have increased the number of internet-connected smart devices, such as wearable sensors, and revolutionized the way the medical industry operates. In fact, they paved the way to robust, fast and smart systems, known as the Internet of Medical Things (IoMT), able to handle massive amounts of user data rapidly. IoMT, with smart sensors, smart devices and smart communication protocols, facilitated the development of various smart systems in the field of healthcare [7]-[9]. Such systems have become essential as they are expected to eliminate human intervention, thus significantly reducing human errors and assisting medical experts in diagnosing diseases easily, remotely and accurately, by combining various data collected from the monitoring devices over a sensor network with a decision support system. In [10], the authors conducted an extended literature review in different domains, such as clinical decision support systems, wireless body area networks, cloud computing and big data analytics, in which they identified a positive impact on mobile healthcare for diabetes mellitus. Recently, in [11], a smart healthcare framework for ambient assisted living using IoMT and big data analytics techniques was suggested.
In the special case of diabetes, smart devices measure the glucose level of the patients and make it available in real time to doctors through mobile or web applications. The authors in [12] suggest a personalized recommendation system to support the self-management of diabetes by American Indian patients. Some other remote monitoring systems for diabetic patients are mentioned in [13]. The monitoring of T2DM and other chronic diseases can be enhanced with the implementation of appropriate machine learning algorithms. Machine learning and data mining methods constitute a key approach to extracting knowledge in T2DM research. The severe social impact of T2DM renders it one of the main priorities in medical research, which unavoidably generates huge amounts of data. Hence, predictive analytics, machine learning and data mining approaches in T2DM are of major concern when it comes to diagnosis, management and other related clinical aspects.
Machine learning approaches can be categorized as supervised, semi-supervised and unsupervised learning. In the context of this work, our focus is on supervised machine learning methods with the aim to predict the risk of T2DM. Supervised ML algorithms, and especially classification algorithms, use a two-stage methodology for the pattern recognition task. The first stage is dedicated to the development/construction of the model using existing labeled training datasets, while the second stage involves the prediction for new or unseen input datasets. During the training phase, the annotated dataset, for which both the inputs (features) and the outputs (classes) are known, is partitioned into two sets (training and test), with the model being trained on the training set and tested on the test set, and the performance of the model being evaluated based on the correct predictions made.
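The two-stage methodology described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the pipeline used in this work: the helper names (train_test_split, fit_majority_class, accuracy), the 30% test fraction and the toy rows are hypothetical, and a trivial majority-class predictor stands in for a real classifier.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle labelled rows and partition them into training and test sets."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def fit_majority_class(train):
    """Toy stand-in for a classifier: always predict the most frequent label."""
    labels = [label for _, label in train]
    return max(set(labels), key=labels.count)

def accuracy(model_label, test):
    """Fraction of test instances whose label matches the prediction."""
    correct = sum(1 for _, label in test if label == model_label)
    return correct / len(test)

# Hypothetical annotated dataset: (features, class) pairs, 1 = diabetic.
rows = [([i], 1 if i % 4 == 0 else 0) for i in range(20)]
train, test = train_test_split(rows)
model = fit_majority_class(train)   # stage 1: build the model on the training set
print(len(train), len(test), accuracy(model, test))  # stage 2: evaluate on unseen data
```

The majority-class baseline is also a useful reminder that plain accuracy can look acceptable on imbalanced data even when no diabetic instance is identified.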
Predictive analytics [14], [15] is the process of learning from historical data in order to make predictions about future events. It is widely applicable to almost every domain, and enhanced by the increasing availability of large volumes of data. Statistical data analysis methods were the go-to choice in predictive analytics, but when it comes to pattern recognition in large data sets (e.g. dense time series), they are consistently outperformed by ML algorithms, both in terms of accuracy and scalability.
The individual risk of developing non-communicable chronic conditions is linked to controllable lifestyle behaviour. The quantification of this risk is an important goal of prediction analysis in healthcare [16], since it is not only linked to the long-term wellbeing of the individual, but is also beneficial to social care systems. Recent research [17], [18] has demonstrated that it is possible to use ML tools to predict the individual risk of hospitalization using only data related to socioeconomic features (age group, gender and race) and behavioural data, without requiring clinical risk factors [19]. A very large number of ML algorithms and variations exist, and there is no unique or widely applicable solution for a specific domain or problem. As such, each particular problem and prediction task requires performance evaluation of multiple algorithms in order to identify the best performing one [20].
Given that T2DM is a multifactorial chronic condition, preventing it requires adjustments in multiple aspects of a person's daily life. For instance, alterations in dietary habits and physical activity might be deemed necessary, depending on the person's data. A person's motivation is important for the engagement with, and success of, a personal digital health intervention. It is highly unlikely that people who are used to a sedentary lifestyle will suddenly adhere to guidelines regarding physical activity and dietary restrictions, even if the digital health intervention system dictates it. Also, people who do not need or want to change real-life behaviour will not use any application as intended. Therefore, the motivation of the individual to be healthy, during and outside working hours, is very relevant for the implementation of the SmartWork system. Previous studies performed in the context of the SmartWork project focused on assessing individual/group motivation to be healthy (e.g. in the physical activity domain) and various factors affecting office workers' performance (e.g. sleep quality) [21], which are out of the scope of the current work.
Motivated by the aforementioned challenges, the main contributions of this work are summarized as follows:
• Long-term predictive models and data mining techniques are implemented to provide probabilistic prediction of specific risk indicators, aiming at supporting decision making and intervention for T2DM, among other chronic conditions. A detailed description of the functional ability modeling components and the rules manager is given in Section III.
• Although a multitude of potential prediction tasks for several chronic diseases have been elaborated in the system, the analysis here only concerns long-term T2DM risk prediction. For this case, various ML algorithms are investigated for the selection of the best performing model to be integrated in the SmartWork system. To train the SmartWork prediction models for T2DM (and other chronic diseases), a subset of the ELSA longitudinal dataset is employed to train the supervised algorithms for the assessment of long-term T2DM risk. It is worth mentioning that the generated dataset may contribute to the prognosis of T2DM, as we choose to monitor the feature values of users who, in the reference waves, have not been diagnosed with diabetes. Note that the diabetic or non-diabetic class label is given by the follow-up assessment two years later, as explained in Section 3.1.2.
• A comparative analysis of the trained models is performed in relation to different performance metrics, such as AUC, Sensitivity (or Recall) and Specificity, to name a few. Note that the sensitivity of a model is particularly important when comparing classification models, since in the T2DM case it indicates the percentage of correctly identified instances of the diabetic class.
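The metrics named in the last contribution can be made concrete with a short Python sketch. It assumes binary labels with 1 denoting the diabetic class; the function names and the toy labels and scores are illustrative only, and the AUC is computed with the standard pairwise (Mann-Whitney) formulation rather than by any routine specific to this work.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, TN, FN) for a binary classification task."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def sensitivity(y_true, y_pred):
    """Share of truly diabetic instances that are predicted as diabetic."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)

def specificity(y_true, y_pred):
    """Share of truly non-diabetic instances that are predicted as non-diabetic."""
    _, fp, tn, _ = confusion_counts(y_true, y_pred)
    return tn / (tn + fp)

def auc(y_true, scores, positive=1):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: 4 diabetic and 6 non-diabetic instances with model scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.35, 0.6, 0.55]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(sensitivity(y_true, y_pred), specificity(y_true, y_pred), auc(y_true, scores))
```

Sensitivity and specificity depend on the chosen decision threshold (0.5 here), whereas the AUC summarises the ranking quality of the scores over all thresholds.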
The remainder of this paper is structured as follows.
In Section II, we overview previous related studies.
In Section III, we introduce the proposed system architecture.
In Section IV, the design of the T2DM risk assessment system is described in detail. In Section V, the system performance is evaluated. Finally, concluding remarks and plans for our future work are provided in Sections VI and VII.

II. RELATED WORK
As regards T2DM risk prediction, there are several representative works on the application of ML techniques, as well as proposals of derived risk scoring systems that can be adopted for the early prognosis of diabetes. Furthermore, a number of intelligent systems have been developed that enable remote (continuous) monitoring of diabetic patients, risk prediction and personalized health services, based on data collected from smart body sensors which are given as input to ML models.

A. RISK SCORING AND MACHINE LEARNING IN T2DM
To date, extensive research on diabetes detection has been conducted by the scientific community. To this end, several non-invasive risk score systems have been proposed, such as FINDRISC, Latin America FINDRISC (LA-FINDRISC) [22], the Australian Type 2 Diabetes Risk Assessment Tool (AUSDRISK) [23], the Risk Test from the American Diabetes Association (ADA) [24], the Leicester Practice Risk Score [25] and Test2Prevent, which have proved to be effective screening tools for assessing the risk of undiagnosed T2DM, especially in cases where confirmation test data are not available. However, a significant constraint is that most of them were developed for particular populations and their performance was not satisfactory when applied to other ones. When fasting plasma glucose (FPG), hemoglobin A1c (HbA1c) or oral glucose tolerance test (OGTT) data are available, the diagnostic accuracy of the aforementioned risk score systems can be verified [26]. Liu et al. [18] showed that risk scoring systems can be combined with other ML models, constructing ensemble learners, to improve prediction performance. Machine learning methods have gained popularity in the research community for automating the risk prediction process of T2DM, more accurately and with reduced medical cost. Artificial Neural Networks (ANNs), Logistic Regression (LR), Naive Bayes (NB), k-Nearest Neighbours (k-NN), Random Forests (RFs), Decision Trees (DTs) and Support Vector Machines (SVMs) [27], [28] are the most popular algorithms which can be utilized. Naz and Ahuja [29] explore several of these models on the PIMA Indians diabetes database, proposing a deep neural network (DNN) able to achieve an accuracy of 98.07%. The classifiers can be used either individually or as base classifiers for ensemble algorithms (namely, stacking, voting, bagging etc.) [30], [31]. Ensemble learning aims to reduce bias and variance and thus enhance the prediction performance.
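As an illustration of the voting family of ensembles mentioned above, the following Python sketch combines the class probabilities of several base classifiers by weighted soft voting. It is a generic sketch and not the specific ensemble evaluated in this work: the probabilities, the weights and the 0.5 threshold are assumed values chosen for the example.

```python
def weighted_soft_vote(probabilities, weights):
    """Weighted average of the base models' estimated P(diabetic)."""
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, probabilities)) / total

def predict(probabilities, weights, threshold=0.5):
    """Return 1 (diabetic) if the combined probability reaches the threshold."""
    return 1 if weighted_soft_vote(probabilities, weights) >= threshold else 0

# Hypothetical outputs of three base classifiers for one instance.
base_probabilities = [0.6, 0.3, 0.8]
model_weights = [2.0, 1.0, 1.0]      # assumed weights; larger = more trusted model
combined = weighted_soft_vote(base_probabilities, model_weights)
print(combined, predict(base_probabilities, model_weights))
```

In such a scheme the weights are free parameters; they can be set by hand, by validation performance, or by a search procedure that optimises the metrics of interest.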
The aforementioned models have been used in several decision support systems for medical applications demonstrating satisfactory predictive performance. The researchers, in order to automate in an intelligent and effective way the process of diabetes monitoring, resorted to solutions combining Information and Communication Technology (ICT) with biomedicine. Such solutions are presented in the following paragraphs.

B. SMART SYSTEMS IN DIABETES HEALTHCARE
In [32], an intelligent system consisting of smart devices, sensors and smartphones for monitoring diabetic patients by means of machine learning algorithms is elaborated. The smart system collects data from body sensors and makes a diabetes diagnosis using several classification models from supervised machine learning. As the experimental results show, the suggested algorithm, namely sequential minimal optimization (SMO), performs better in terms of classification accuracy, sensitivity and precision than other well-known algorithms (i.e., Naive Bayes, J48 [33], ZeroR, OneR, Logistic, Random Forests). Another intelligent system is suggested in [34] for the remote monitoring of diabetic patients' health through smartphones and other smart portable devices. The authors designed a small portable device capable of measuring the blood glucose level and body temperature of diabetics, which could be connected to a smartphone through a secure wireless mechanism.
Also, in [35] a smart health monitoring architecture is recommended for diabetic patients to monitor symptoms/signs regarding blood sugar level, heart pulse, food intake, sleep time and exercise. A sensor network continuously feeds the system with data, which are then used as input to a neural network. The health risk levels are low, medium, high and extreme, depending on the patient's profile and historical health data. Moreover, if a patient's health status is at high or extreme risk, an automatic notification (such as a phone call and/or SMS) is sent to his/her relative with information about his/her location. In addition, in case of very high risk, the system communicates with the hospital nearest to the patient.
The work in [36] suggests several new wearable devices, such as a smart neck band, a smart wrist band and a pair of smart socks, to continuously monitor the health status of diabetic patients. The sensors of these devices report the patient's food intake, heart rate, skin moisture, ambient temperature, walking patterns and weight gain/loss. With the help of controllers, these devices transmit sensor data via Bluetooth to the mobile app. Machine Learning is employed to predict variations in the patients' health status and alert them.
Moreover, there are many proposals for remote health monitoring of older persons [37]-[39]. Understanding and improving age-friendly living and working environments is an enormous challenge that today's societies have only just begun to approach. As the number of older people who are active members of society and want to live independently continues to rise, the importance of this research area constantly increases. The overall objective of the SmartWork system [40] is to help office workers remain professionally active as they grow old, in a holistic way, by designing, implementing and validating the system in real-world settings.

III. THE SMARTWORK
At the core of the system, a worker-centric AI module [41] supports the sustainability of work skills, combining unobtrusive and ubiquitous sensing with flexible, worker-status-aware job support. In addition, the careful and systematic monitoring of the personal health, lifestyle, cognitive and emotional state of the worker makes it possible to determine the likelihood of functional and cognitive decline. By combining all aspects of the older worker's profile, a decision support system will enable the triggering of personalized interventions in order to maintain the work ability of the user. More specifically, the automatic creation and maintenance of the personalized virtual user model considers two adaptation layers: initialization of the user profiles based on generic group modelling derived through the observation of common patterns and characteristics of populations (e.g. gender, age group, chronic conditions), and personalized models based on the monitored characteristics of a specific user (e.g. stress, emotions, activity, nutrition etc.). Based on the synchronous and asynchronous analysis of the data collected by the SmartWork sensing system, the initial user profiles evolve into personalized user models.
It should be pointed out here that the problem of interest in this work is Long-Term Health Risk Assessment related to diabetes, which statistically affects people older than fifty, who may suffer from hypertension, high cholesterol or heart disease as well.

A. SYSTEM ARCHITECTURE
Considering that the whole system dynamically captures the evolving state of the worker and the context of work and the working environment (e.g. work task resource requirements), the office worker profile aspects are constantly monitored and analyzed using various services and agents. In more detail, the AI software tools package consists of a set of modules (Figure 1) dedicated to initializing the first user profiles, matching them to lifestyle and behavioral patterns, continuously monitoring and self-adapting, and triggering interventions relevant for the work and health self-management of the office worker. In the following paragraphs, we briefly elaborate on the different modules whose results are fed to the module that performs Long-Term Health Risk Assessment in order to derive a predictive score reflecting the overall risk of the individual to develop T2DM, which may result in early exit from the labour market.
The User Profile Initialization process takes place at the user's first contact with the SmartWork system, and it concerns collection of data regarding socio-demographic characteristics and lifestyle attitudes of the user, such as age, gender, marital status, education level, physical activity frequency, drinking and smoking status, etc. The user's history of diagnosed chronic conditions, including diabetes, asthma, high blood pressure, cholesterol and cardiovascular diseases is also assessed. Once the profile is completed, based on this initial data provided by the user, the prediction models are used to initialize the Long-term Cognitive Capacity Assessment and the Long-Term Health Risks Assessment modules.
Another important module is the Rules Manager Service (RMS), which is the software package implementing the different sub-modules needed to systematically monitor and activate the triggering of the SmartWork interventions with respect to the primitive or derived virtual user model data. The SmartWork system continuously monitors a wide range of variables regarding the users' lifestyle, functional, cognitive and work ability status, which serve as input for the RMS, either in the form of original raw data or as processed information generated by the SmartWork pre-processing algorithms, statistical analysis tools and ML-based prediction models. Although a series of physiological parameters related to the user's health status are monitored, it is important to mention that SmartWork does not aim to provide any diagnostics, treatment or cure, but rather to provide the user with advice, guidance and suggestions that can lead to behavioural changes aiming to improve his/her overall health and work ability, in alignment with the principles of professionally active and healthy ageing. The basic sub-modules of the RMS are the Rules Manager Daemon, the Run-Time Expression Evaluator and the Rules Manager Control Interface. The Rules Manager Daemon (RMD) is the main micro-service around which the RMS is designed. In practice, the RMD acts as an integrated server that orchestrates the real-time monitoring and evaluation of the user model variables against specific rules in order to identify the fulfilment of conditions that may trigger associated interventions. At the core of the RMD micro-service algorithm, the Run-Time Expression Evaluator performs logical and arithmetic operations dynamically based on the virtual user profile variables, thus evaluating whether the triggering conditions in the defined rules are fulfilled, and providing the RMS with a higher level of abstraction and the ability to evaluate complex expressions based on the available input variables.
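To make the role of a run-time expression evaluator more tangible, the following Python sketch evaluates rule conditions, written as text expressions, against a dictionary of user-model variables and returns the interventions whose conditions hold. It is a hypothetical illustration and not the SmartWork implementation: the variable names (steps_today, sitting_hours, t2dm_risk), the rule format and the intervention identifiers are all assumptions.

```python
import ast
import operator

_BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
            ast.Mult: operator.mul, ast.Div: operator.truediv}
_CMP_OPS = {ast.Lt: operator.lt, ast.LtE: operator.le, ast.Gt: operator.gt,
            ast.GtE: operator.ge, ast.Eq: operator.eq, ast.NotEq: operator.ne}

def evaluate(expression, variables):
    """Evaluate an arithmetic/logical expression over user-model variables."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name):
            return variables[node.id]          # look up a user-model variable
        if isinstance(node, ast.BinOp):
            return _BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.BoolOp):
            values = [walk(v) for v in node.values]
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            return not walk(node.operand)
        if isinstance(node, ast.Compare):
            left = walk(node.left)
            for op, comparator in zip(node.ops, node.comparators):
                right = walk(comparator)
                if not _CMP_OPS[type(op)](left, right):
                    return False
                left = right
            return True
        raise ValueError(f"unsupported expression element: {type(node).__name__}")
    return walk(ast.parse(expression, mode="eval"))

def triggered_interventions(rules, variables):
    """Return the interventions whose triggering condition is fulfilled."""
    return [action for condition, action in rules if evaluate(condition, variables)]

# Hypothetical rule table and user-model snapshot.
rules = [("steps_today < 3000 and sitting_hours >= 6", "suggest_walk"),
         ("t2dm_risk > 0.5", "suggest_health_check")]
user_model = {"steps_today": 2500, "sitting_hours": 7, "t2dm_risk": 0.3}
print(triggered_interventions(rules, user_model))
```

Walking the parsed syntax tree and allowing only a fixed set of node types keeps rule expressions restricted to arithmetic, comparisons and Boolean logic, which is one possible way to avoid executing arbitrary code supplied through a rule editor.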
The Rules Manager Control Interface (RMCI) is a web application designed to provide a convenient solution for the generation and management of intervention triggering rule sets, which are then passed to the RMD micro-service to populate the Rules Table. It is a multi-user environment, able to administer different user privilege levels, each with specific access to the dynamic rule set settings of each virtual user profile. The RMS has a client-server architecture and the RMCI was built as a stand-alone client application which can be used by the end users through a web browser or as a desktop application.
The next sections provide the necessary background knowledge for the remainder of the paper. In what follows, useful definitions and notation are introduced as part of the problem definition and formulation, with emphasis on the dataset preparation and the Machine Learning components of the problem under investigation.

IV. LONG-TERM DIABETES RISK ASSESSMENT
A. PROBLEM DEFINITION
Chronic diseases are diseases that cannot be cured but can be controlled, and thus they require continuous monitoring and acute care to avoid critical conditions. Diabetes is a chronic disease that occurs when the pancreas is no longer able to produce insulin, or when the body cannot make good use of the insulin it produces. Insulin is a hormone that lets glucose from the consumed food pass from the blood stream into the cells to produce energy. Not being able to produce insulin or use it effectively leads to raised glucose levels in the blood, also known as hyperglycemia. Over the long term, high glucose levels are associated with damage to the body and failure of various organs and tissues. Although there is more than one type of diabetes (e.g. type 1 diabetes, type 2 diabetes, gestational diabetes), the prevalence of type 2 diabetes amongst older people is particularly high, both overall and in comparison with the prevalence of other types of diabetes [42]. T2DM usually affects adults, but it can begin at any time in a person's life. The main risk factors [43]-[45] that are correlated to the occurrence of T2DM include:
• Age: age is one of the most important risk factors for diabetes, as older people have a higher risk of developing type 2 diabetes.
• Obesity/ High Body Mass Index (BMI): increased BMI, and consequently obesity, is a top risk factor for type 2 diabetes.
• Impaired glucose tolerance, also known as prediabetes, is a milder form of type 2 diabetes, which is usually diagnosed with a simple blood test, and represents a high risk for the individual to develop T2DM.
• Ethnicity/Race: prevalence of diabetes is overall higher in the case of Hispanic/Latino Americans, African Americans, Native Americans, Asian-Americans, Pacific Islanders, and Alaska natives.
• Gender: male/female
• Gestational diabetes: this short-term condition, which may occur during pregnancy, raises a woman's chances of getting type 2 diabetes later in life.
• Polycystic ovary syndrome (PCOS): women with polycystic ovary syndrome have a higher risk to develop T2DM.
• Family history: if a parent/sibling has diabetes, then risk of getting type 2 diabetes is increased.
• Physical Activity: sedentary persons are at higher risk of developing T2DM.
• Smoking: smoking is associated with a higher risk of T2DM.
• High Blood Pressure (HBP): it is a high-risk factor for developing T2DM.
• Alcohol: although moderate drinking is associated with a lower risk of type 2 diabetes, excessive alcohol intake is associated with an increased risk.
Many studies have aimed at long-term risk prediction for diabetes, including different regression models for predicting glucose regulation for those already diagnosed with prediabetes or type 2 diabetes. However, the main goal of long-term diabetes risk prediction tools is to develop and validate a diabetes risk assessment score for healthy/undiagnosed participants based on the main risk factors, including socio-demographic data, lifestyle and simple anthropometric measures.
In SmartWork, a long-term risk prediction model for T2DM based on ML approaches is implemented. It takes into account a large number of risk factors that are usually employed by the screening tools used in medical practice, as well as some factors which have shown high correlation in our study with the ELSA dataset, as shown in Tables 1 and 9. In order to test our model, we selected the FINDRISC [46] and Leicester [25] diabetes risk scores and applied them in parallel to the training and test datasets. The Leicester Practice Risk Score was developed by researchers within the Diabetes Research Centre at the University of Leicester. The score identifies people who may be at high risk of developing diabetes in the future (e.g. in the next 10 years) or who currently have undiagnosed T2DM or prediabetes, taking into account the following risk factors: age, gender, BMI, ethnicity, family history of diabetes and diagnosis of high blood pressure or use of anti-hypertensive drugs. In order to compare the results of the FINDRISC and Leicester risk classification with the ML prediction models, we fit Logistic Regression models to our data and estimate the probability of an instance being classified as "Diabetics" ("Yes") or "Non Diabetics" ("No").
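The way a fitted Logistic Regression model turns risk factors into a class probability can be sketched as follows. The coefficients, the intercept and the feature names below are hypothetical placeholders chosen only to make the example run; they are not the fitted values of this study, nor the published FINDRISC or Leicester point scores.

```python
import math

def logistic_probability(features, coefficients, intercept):
    """P(diabetic) = 1 / (1 + exp(-(intercept + sum_j coef_j * x_j)))."""
    z = intercept + sum(coefficients[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def classify(probability, threshold=0.5):
    """Map the estimated probability to one of the two class labels."""
    return "Diabetics" if probability >= threshold else "Non Diabetics"

# Hypothetical coefficients (assumed, for illustration only).
coefficients = {"age": 0.03, "bmi": 0.08, "high_blood_pressure": 0.6}
intercept = -6.0

person_a = {"age": 55, "bmi": 24.0, "high_blood_pressure": 0}
person_b = {"age": 68, "bmi": 33.0, "high_blood_pressure": 1}
p_a = logistic_probability(person_a, coefficients, intercept)
p_b = logistic_probability(person_b, coefficients, intercept)
print(p_a, classify(p_a), p_b, classify(p_b))
```

Because the model outputs a probability rather than a hard label, the decision threshold can be lowered below 0.5 when higher sensitivity for the diabetic class is preferred over specificity.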
The English Longitudinal Study of Ageing [47], which is a rich resource of information on the dynamics of health, social wellbeing and economic circumstances in the English population aged 50 and older, has currently reached wave 9 of longitudinal data collection (i.e., covering a period of 18 years) and is designed to be used for the investigation of a broad set of topics relevant to understanding the ageing process. The database contains both objective and subjective data related to health, disability and healthy life expectancy, with specific data being assessed by a nurse every four years. For training the SmartWork prediction models, the waves at which nurse-collected data are available are of particular interest, as these include physical examination and performance data and blood tests (e.g. height and weight, waist and hip circumference, blood pressure, lung function, total and HDL-cholesterol, etc.). Note that these waves are considered reference waves in SmartWork.
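The construction of labelled instances from a reference wave and its follow-up assessment can be sketched in Python as follows. The record layout, the field name diabetes_diagnosed, the wave identifiers and the participant values are hypothetical and do not reflect the actual ELSA variable names; the sketch only illustrates the idea of keeping participants who are undiagnosed at the reference wave and taking the class label from the follow-up.

```python
def build_instances(waves, reference_wave, follow_up_wave):
    """Pair reference-wave features with the diabetes label at follow-up."""
    instances = []
    for pid, record in waves[reference_wave].items():
        if record["diabetes_diagnosed"]:
            continue                      # already diagnosed at the reference wave
        follow_up = waves[follow_up_wave].get(pid)
        if follow_up is None:
            continue                      # no follow-up assessment, so no label
        features = {k: v for k, v in record.items() if k != "diabetes_diagnosed"}
        label = 1 if follow_up["diabetes_diagnosed"] else 0
        instances.append((pid, features, label))
    return instances

# Hypothetical data: four participants at the reference wave, three at follow-up.
waves = {
    "reference": {
        "p1": {"age": 61, "bmi": 31.0, "diabetes_diagnosed": False},
        "p2": {"age": 58, "bmi": 24.5, "diabetes_diagnosed": False},
        "p3": {"age": 66, "bmi": 29.0, "diabetes_diagnosed": True},
        "p4": {"age": 70, "bmi": 27.0, "diabetes_diagnosed": False},
    },
    "follow_up": {
        "p1": {"diabetes_diagnosed": True},
        "p2": {"diabetes_diagnosed": False},
        "p3": {"diabetes_diagnosed": True},
    },
}
instances = build_instances(waves, "reference", "follow_up")
print([(pid, label) for pid, _, label in instances])
```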

B. METHODS
We assume a training set TR of size M, a test set TS of size N and a categorical variable c which captures the class label of an instance i in the ELSA database. For the problem under investigation, c has two possible states, i.e., c = "Diabetic" (or "1") or c = "NonDiabetic" (or "0"). The feature vector of an instance i is denoted as $\mathbf{f}_i = [f_{i1}, f_{i2}, \ldots, f_{id}]^{T}$, where $d$ is the number of features. Our aim is to achieve high sensitivity and Area Under Curve through supervised machine learning, meaning that the Diabetic class is predicted correctly. The proposed methodology for T2DM prediction consists of the following steps, which are explained in detail below.

1) DATA PREPROCESSING
The quality of the raw data may be degraded by missing values and/or noisy and inconsistent data, so the quality of the final predictions may be low as well. Therefore, preprocessing is necessary, including reduction of redundant values, feature selection and discretization, to make the data more appropriate for data mining and analysis.
In the proposed framework, instances with missing or null values were dropped, rather than imputed with the mean values of the attributes as in [48], only for the specific features that are used for fitting the FINDRISC [46] and Leicester [25] risk-inspired models (see Section V), since logistic regression cannot reasonably deal with missing values. For the ML models, however, all the remaining selected features were used as is, given that these models can handle missing values.
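The selective treatment of missing values described above can be sketched as a simple filter that drops an instance only when one of the features required by the risk-score-inspired logistic models is missing. The feature names and values are hypothetical, and missing entries are assumed to be encoded as None.

```python
def drop_incomplete(rows, required):
    """Keep only rows in which every required feature has a value."""
    return [row for row in rows if all(row.get(name) is not None for name in required)]

# Hypothetical instances; None marks a missing value.
rows = [
    {"age": 62, "bmi": 28.4, "waist_cm": 98.0, "sleep_hours": None},
    {"age": 57, "bmi": None, "waist_cm": 91.0, "sleep_hours": 7.0},
    {"age": 70, "bmi": 31.2, "waist_cm": 104.0, "sleep_hours": 6.5},
]
score_features = ["age", "bmi", "waist_cm"]   # assumed inputs of the risk-score model
complete = drop_incomplete(rows, score_features)
print(len(complete))
```

The first row is kept even though sleep_hours is missing, because that feature is not among the required ones; only the row with a missing bmi is dropped.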
Also, data is not always in an appropriate form to be fed into a machine learning algorithm; e.g. plain text feature values may cause problems during the learning process, or data may be represented on different scales. Hence, feature transformation from one format to another is necessary. Relevant techniques include standardization, or Z-score normalization, which re-scales each attribute to zero mean and unit variance. In this research work, several categorical and ordinal features are considered; further details concerning the ordering of the categories and the discretized values for each one are shown in Table 9. Another reason for applying feature transformation is to reduce the dimension of the feature space in order to speed up the training stage or improve the accuracy of a specific ML model.
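Z-score normalization replaces each value x of an attribute with (x − mean) / standard deviation. A minimal Python sketch, assuming the population standard deviation is used and that a constant attribute is mapped to zeros, is given below; the sample values are illustrative.

```python
import statistics

def standardize(values):
    """Re-scale a numeric attribute to zero mean and unit variance."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)       # population standard deviation
    if std == 0:
        return [0.0 for _ in values]      # constant attribute carries no information
    return [(v - mean) / std for v in values]

values = [2, 4, 4, 4, 5, 5, 7, 9]         # mean 5, population standard deviation 2
z_scores = standardize(values)
print(z_scores)
```

When a test set is involved, the mean and standard deviation should be estimated on the training set only and then reused to transform the test set.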

2) FEATURE SELECTION
It is well known that the accuracy of classifiers improves as the number of attributes increases, until the optimal number of features is reached. Adding more features to a training dataset of the same size can often lead to degradation of classifier performance, which is known as the curse of dimensionality. Ultimately, this indicates that the number of samples an ML model needs to achieve a given level of accuracy grows exponentially with the number of input features (i.e., dimensionality) if overfitting (inability to generalize) is to be avoided. Feature selection constitutes a core component in building accurate and reliable prediction models in machine learning, as it can highly impact the training of the selected model and thus its performance. Feature selection is defined as the process of identifying the most relevant features in a dataset, namely those which contribute most to the target variable, with the aim of improving the model accuracy. Such methods can be classified as Filter, Wrapper and Embedded [49].
The Filter category includes information gain, the chi-square test, the Fisher score, the correlation coefficient and the variance threshold. Among the traditional state-of-the-art filter methods, the Pearson coefficient was selected. Its values vary between −1 (perfect negative correlation) and 1 (perfect positive correlation) and indicate the linear dependency between two features. A coefficient value closer to 0 implies a weaker correlation, while a zero value implies no linear correlation. The Pearson coefficient [50], denoted as $p_c$, is defined as:

$$p_c = \frac{\sum_{i} (f_{im} - \bar{f}_{m})(f_{in} - \bar{f}_{n})}{\sqrt{\sum_{i} (f_{im} - \bar{f}_{m})^2}\,\sqrt{\sum_{i} (f_{in} - \bar{f}_{n})^2}}$$

where $f_{im}$ and $f_{in}$ denote the values of features $m$ and $n$ for instance $i$, and $\bar{f}_{m}$, $\bar{f}_{n}$ denote their mean values over the dataset, respectively.
Feature selection depends on a user-defined threshold on $p_c$. For example, in the diabetes case, glycated haemoglobin helps clinicians estimate the average blood sugar level over a period of weeks or months; thus, $p_c$ is expected to be close to 1, implying that it is highly correlated with blood glucose.
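The threshold-based filter described above can be sketched as follows. This is a minimal illustration that assumes each feature is correlated against a single reference variable; the function names, the threshold of 0.8 and all data values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson coefficient between two equally long numeric columns."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sy = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column has no linear relationship
    return cov / (sx * sy)

def filter_features(columns, reference, threshold):
    """Keep the features whose |p_c| with the reference reaches the threshold."""
    return [name for name, col in columns.items()
            if abs(pearson(col, reference)) >= threshold]

# Hypothetical values, for illustration only.
glucose = [4.8, 5.4, 6.1, 6.4, 7.2]
columns = {
    "hba1c": [5.0, 5.5, 6.0, 6.5, 7.0],  # tracks glucose closely
    "noise": [3.0, 1.0, 4.0, 1.0, 5.0],  # weakly related
}
selected = filter_features(columns, glucose, threshold=0.8)
```

With these toy values only the first column passes the threshold, mirroring the expectation that a strongly related measurement yields a coefficient close to 1.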
Among the Wrapper feature selection methods, a simple and frequently used one is forward/backward stepwise selection [51]. The former begins with an empty set of features, which are added one by one, while the latter works conversely, i.e., it begins with all features, which are removed gradually, one by one. From the Wrapper methods, backward stepwise selection with the Naive Bayes, Logistic Regression and Decision Tree ML models was investigated. Although this approach is more accurate than the Filter methods, it is computationally expensive, since it applies an iterative greedy search process.
Moreover, the Embedded methods include regularization-based techniques, with L1 regularization, or LASSO (Least Absolute Shrinkage and Selection Operator), and L2 regularization, or Ridge, being the most representative. These methods have built-in penalization functions to reduce overfitting, in contrast to Ordinary Least Squares (OLS), which is prone to overfitting the data [52]. From the Embedded methods, LASSO is applied in the experiments due to its simplicity (lower complexity) and better interpretability compared with Ridge. Note that the aim of feature selection is not only to improve the accuracy, but also to increase the interpretability and reduce the complexity and training time of the ML model.
The LASSO, or penalized least squares regression with an L1 penalty function, has the form

$$\min_{a_0, a_1, \ldots, a_F} \; \sum_{i} \Big( y_i - a_0 - \sum_{j=1}^{F} a_j f_{ij} \Big)^2 + \lambda \sum_{j=1}^{F} |a_j|$$

where $y$ is the output (target) variable for the prediction, $f_1, f_2, \ldots, f_F$ are the features that determine the value of $y$, $a_0$ is the bias, $a_1, a_2, \ldots, a_F$ are the weights attached to $f_1, f_2, \ldots, f_F$, respectively, and $\lambda$ is the regularization parameter that controls the significance of the regularization term; the index $i$ runs over the instances of the dataset.

The initial features considered for the training of the ML-based models included over 100 variables collected at the reference waves of the ELSA dataset. In addition, a group of variables related to the FINDRISC [53] and Leicester questionnaires was included, such as variables representing gender, age, race, physical activity (at least 30 min during the day) and fruit and vegetable consumption, as well as medical history, including the history of antihypertensive drug treatment and the history of high blood glucose levels. To evaluate the performance of the ML models, feature importance was established using some of the feature selection techniques discussed in Section IV-B2. Moreover, Tables 1 and 9 describe the variables considered in the various feature selection methods.
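To make the feature-selecting effect of the L1 penalty concrete, the sketch below minimises a residual sum of squares plus $\lambda \sum_j |a_j|$ with plain coordinate descent. The choice of solver, the function names and the toy data are assumptions for illustration; the source does not state how the LASSO was solved.

```python
def soft_threshold(rho, t):
    """Shrink rho towards zero by t; values within [-t, t] become exactly 0."""
    if rho > t:
        return rho - t
    if rho < -t:
        return rho + t
    return 0.0

def lasso_cd(F, y, lam, n_iter=100):
    """Coordinate descent for sum_i (y_i - a0 - sum_j a_j f_ij)^2 + lam * sum_j |a_j|."""
    n, p = len(F), len(F[0])
    a0, a = sum(y) / n, [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho, z = 0.0, 0.0
            for i in range(n):
                # Prediction without the contribution of feature j.
                pred = a0 + sum(a[k] * F[i][k] for k in range(p) if k != j)
                rho += F[i][j] * (y[i] - pred)
                z += F[i][j] ** 2
            a[j] = soft_threshold(rho, lam / 2) / z if z > 0 else 0.0
        # The unpenalised bias is the mean of the remaining residuals.
        a0 = sum(y[i] - sum(a[k] * F[i][k] for k in range(p)) for i in range(n)) / n
    return a0, a

# Hypothetical data: the first feature drives y, the second is nearly irrelevant.
F = [[-2.0, 1.0], [-1.0, -1.0], [0.0, 0.0], [1.0, -1.0], [2.0, 1.0]]
y = [-4.1, -1.9, 0.0, 2.1, 3.9]
bias, weights = lasso_cd(F, y, lam=2.0)
```

On this toy data the weight of the second feature is driven exactly to zero while the first is only shrunk, which is the behaviour that makes LASSO usable as an embedded feature selector.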
The features in relation to LASSO, Correlation and Greedy Stepwise with Backward Selection under three different classifiers are listed below:

All selected ML models were trained with the same features (i.e., risk factors) derived from the GSW-NB feature selection method, excluding the features fcntf and work, which are considered irrelevant in the literature. As feature selection is a highly empirical process, GSW-NB was selected because it shares the most variables with the remaining selection methods, which are also in line with the literature. In addition, we included the variables shlt, hlthlm, mobilb, lgmusa, grossa, finea, hearte, psyche, itot, cfoodo1m, estwt, hdl, dias, eatVegFru and Gender, as these capture risk factors or signs that are actually considered in diabetes detection in the literature. The resulting feature set consists of the above 34 features plus the ELSA-derived class feature rYdiabe, which indicates whether a subject is actually diabetic.

3) MACHINE LEARNING MODELS
Recall that, in the context of this work, we investigate the problem of T2DM prediction on the ELSA database with various machine learning models. As a first approach, the problem is tackled using single classifiers as independent entities. Then, ensemble learning based on majority voting, either weighted or not, and stacking is employed. All of them are compared in order to identify the most appropriate one for diabetes prediction.
Some well-known classification methods considered in this work are Naïve Bayes, Decision Trees [33], Random Forests [54] and Logistic Regression [17], [55]. Finally, two of them, with similar success according to the AUC and Sensitivity metrics, are utilized as base learners, and their outputs are combined to define the final prediction score under different ensemble learning approaches, namely majority voting, weighted voting and stacking. Here, it should be noted that the key difference between voting and stacking lies in the final aggregation: in voting, appropriate weights are utilized to combine the classifiers' predictions, whereas in stacking the aggregation is performed by a meta-classifier. In the following, the adopted models are described.
a: SINGLE LEARNING
1) Naive Bayes: It is a simple but powerful algorithm for classification, since it is based on conditional probability. It is an appropriate solution for unbalanced data and missing values. It uses Bayes' theorem to calculate the posterior probability [56] as:

$$P(c|f) = \frac{P(f|c)\,P(c)}{P(f)}$$

where c = 0, 1, P(c|f) is the Posterior Probability, P(c) is the class Prior Probability, P(f|c) is the Likelihood and P(f) is the Predictor Prior Probability.
2) Decision Trees: They build a classification model in the form of a tree structure by breaking the dataset into smaller subsets while simultaneously developing the associated decision tree. The decision tree is a top-down structure with one root node, whose branches are split in a parent-child relationship. The tree includes a root node, leaf nodes that represent the classes and internal nodes that represent test conditions.
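The posterior computation used by Naive Bayes can be illustrated with a short worked example. The sketch assumes conditionally independent features, so that P(f|c) is the product of per-feature likelihoods; the prior and likelihood values are hypothetical.

```python
def naive_bayes_posterior(prior, likelihoods):
    """Return {c: P(c|f)} from priors {c: P(c)} and per-feature likelihoods {c: [P(f_j|c), ...]}."""
    joint = {}
    for c, p_c in prior.items():
        p = p_c
        for lik in likelihoods[c]:
            p *= lik  # independence assumption: multiply per-feature likelihoods
        joint[c] = p
    evidence = sum(joint.values())  # P(f), the normalising constant
    return {c: joint[c] / evidence for c in joint}

# Hypothetical numbers: class 1 = "Diabetics", class 0 = "Non Diabetics".
prior = {0: 0.85, 1: 0.15}
likelihoods = {0: [0.30, 0.20], 1: [0.70, 0.60]}
posterior = naive_bayes_posterior(prior, likelihoods)
```

Here the joint scores are 0.051 for class 0 and 0.063 for class 1, so the observed feature values outweigh the low prior of class 1 and its posterior becomes the larger of the two.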

3) Random Forests: It constitutes a method that creates many decision trees on different instances to perform classification and regression. Each decision tree in RFs exports its own classification result as a vote, and the final output of the RFs is the class on which most trees agree. Moreover, it has a significant role in ensemble machine learning and is commonly applied in various research areas, such as biomedicine. The final output is computed as

$$\hat{C} = \operatorname{majority\ vote}\,\{\hat{C}_r(f)\}_{r=1}^{R}$$

where $\hat{C}$ stands for the final prediction, $\hat{C}_r(f)$ is the prediction of the $r$-th tree, $R$ is the total number of trees, $r$ represents the index of the current decision tree and $f$ is the training instance.
4) Logistic Regression: It is a classification algorithm used for categorical output variables, especially when the output is binary. The diabetes model has one binary output variable, in which p = P(Y = 1) denotes the probability that an instance belongs to the ''Diabetics'' class, so 1 − p = P(Y = 0) stands for the probability that an instance belongs to the ''Non Diabetics'' class. The linear relationship between the log-odds with base $b$ and the model parameters $\beta_i$ is as follows:

$$\log_b \frac{p}{1-p} = \beta_0 + \sum_{i} \beta_i f_i$$

b: ENSEMBLE LEARNING
1) Majority Voting: The majority voting scheme (Figure 2) can be outlined with the following equation:

$$\hat{y} = \arg\max_{c} \sum_{k=1}^{K} P_k(c|f)$$

where c = 0, 1 and $P_k(c|f)$ is the probability of class $c$ predicted by the $k$-th of the $K$ base classifiers. The classification, based on majority voting, can be approached as either hard or soft voting. The former (hard voting) sums the predictions for each class label and predicts the class label with the most votes. The latter (soft voting) sums the predicted probabilities for each class label and predicts the class label with the largest probability. Here, soft voting is adopted. Nonetheless, since the base classifiers in an ensemble may not perform equally well, it would be more efficient to weight the soft vote of each classifier. As will be seen next, weighted majority voting is compared with majority voting in terms of T2DM long-term risk prediction.
2) Weighted Majority Voting: Given $w_1, w_2, \ldots, w_K$, where $0 \le w_k \le 1$ for $k = 1, 2, \ldots, K$, that represent the weight with which the corresponding classification model contributes to the final output, the final prediction class for each test instance is the one with the highest weighted soft vote:

$$\hat{y} = \arg\max_{c} \sum_{k=1}^{K} w_k P_k(c|f)$$

where c = 0, 1 denotes the label of the corresponding class. The main issue in weighting schemes is how to appropriately determine the optimal weights of the classifiers, which can strongly influence the performance of the ensemble. In this study, the genetic algorithm NSGA-II for multi-objective optimization [57] is considered in order to determine the optimal weights and construct a prediction model with both high AUC and high Sensitivity.
3) Stacking: It is an ensemble learning technique that employs multiple classification ML models and combines them through a meta-classifier. The base models are trained on a complete training set; then, the meta-model is trained on the outputs of the base models as features. At the base level, different learning algorithms can be applied and, therefore, stacking ensembles are often heterogeneous. Such an approach is considered in this work. Specifically, the stacking ensemble consists of Random Forests and Logistic Regression as base classifiers, whose predictions are combined by Random Forests as a meta-classifier.
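The difference between plain and weighted soft voting can be sketched in a few lines. The function below is an illustrative assumption rather than the implementation used in the experiments, the two probability vectors are hypothetical, and the assignment of the reported weights to Logistic Regression and Random Forests (in that order) is also an assumption.

```python
def weighted_soft_vote(probas, weights=None):
    """Combine per-classifier class probabilities and return (label, scores).

    probas  : list over K classifiers of [P_k(c=0|f), P_k(c=1|f)]
    weights : list of K non-negative weights; equal weights if omitted
    """
    if weights is None:
        weights = [1.0] * len(probas)
    total = sum(weights)
    n_classes = len(probas[0])
    scores = [
        sum(w * p[c] for w, p in zip(weights, probas)) / total
        for c in range(n_classes)
    ]
    return scores.index(max(scores)), scores

# Hypothetical outputs of two base classifiers for one test instance.
p_lr = [0.70, 0.30]  # assumed Logistic Regression output
p_rf = [0.40, 0.60]  # assumed Random Forests output

label_plain, _ = weighted_soft_vote([p_lr, p_rf])
label_weighted, _ = weighted_soft_vote([p_lr, p_rf], weights=[0.2733, 0.7266])
```

With equal weights the instance is assigned to class 0, whereas giving the second classifier more weight flips the decision to class 1, which shows why the choice of weights can strongly influence the ensemble.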

A. TRAINING AND TEST DATASET
The training and test dataset for the T2DM risk prediction models is a subset of the ELSA database, which consists of reference waves 2, 4 and 6 as baseline and the respective waves 3, 5 and 7 for the 2-year follow-up assessment. Although the number of participants in the ELSA waves selected as reference (namely waves 2, 4 and 6) is very large, we initially exclude participants who already had diagnosed diabetes at the reference waves and participants who did not take the interview at both the reference and the corresponding follow-up wave. In Tables 2 and 3, the distributions per age group of the selected participants that satisfied the above criteria are presented. As shown in Table 2, however, the distributions of selected participants correspond to an unbalanced dataset, as they do not match the prevalence of diabetes for these age groups as reported at country level and at European level. The proportion of older people who have diabetes increases with age: 9% of people aged 45 to 54 have diabetes, but for those over 75 the percentage increases to approximately 24%. Taking these findings into account, we balanced the dataset using random undersampling [33] in order to reach 9%, 12%, 15%, 18%, 21% and 24% of participants with diabetes at the 2-year follow-up for the respective age groups.
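The balancing step can be illustrated with a minimal random undersampling sketch for a single age group. It assumes that the non-diabetic (majority) class is the one being undersampled, which the text does not state explicitly; the function name, record layout and counts are hypothetical.

```python
import random

def undersample_to_prevalence(rows, label_key, target, seed=0):
    """Randomly drop negatives until positives make up about `target` of the rows."""
    pos = [r for r in rows if r[label_key] == 1]
    neg = [r for r in rows if r[label_key] == 0]
    # Number of negatives needed so that len(pos) / total is close to target.
    n_neg = min(len(neg), round(len(pos) * (1 - target) / target))
    rng = random.Random(seed)
    balanced = pos + rng.sample(neg, n_neg)
    rng.shuffle(balanced)
    return balanced

# Hypothetical age group: 9 participants with diabetes at follow-up, 300 without.
group = [{"id": i, "diabetic": 1} for i in range(9)]
group += [{"id": 100 + i, "diabetic": 0} for i in range(300)]
balanced = undersample_to_prevalence(group, "diabetic", target=0.09)
```

In this toy group all 9 positive records are kept and 91 of the 300 negative records are sampled, giving the 9% target proportion.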
The demographics and some health-related characteristics of the participants per age group and gender in the balanced dataset are summarized in Table 4. In addition, independent group t-tests were run wherever applicable, comparing the mean scores between the different groups. Of the 2009 participants, 53.4% were women, of whom 13.8% were identified as diabetic in the follow-up; the same indicator in males was 18.6%. Note that 14.3% of participants had high education and just 11.8% had physical effort at work. Focusing on those who were diagnosed with diabetes in a follow-up wave, 29.2% were employed, 11.2% had physical effort at work, 79.8% stated that they were physically active and 64.0% were diagnosed with high blood glucose at least once. Moreover, concerning diabetics and irrespective of gender, they had an average BMI of 31.7 kg/m² and a waist size of 106.46 cm. The p-values for the difference between men and women were 0.93 for age and 0.69 for BMI, i.e., these differences were not statistically significant. The corresponding p-values were approximately zero for the variables cholesterol, drinker and waist, and 0.0022 and 0.001 for the food consumption outside home and income variables, respectively. In comparison with non-diabetics, diabetics had higher overall means for the age, BMI, waist and income characteristics, and the differences were statistically significant (p-values approximately equal to 0) for the variables age, BMI, food outside home, cholesterol, drinker and waist, with the exception of income.

B. T2DM MODELING AND RESULTS
The different single and ensemble classification models presented in the previous sections were compared in a series of exhaustive experiments in order to identify the most effective models for the classification of T2DM on the constructed dataset of 2009 instances, as depicted in Table 3. Moreover, the comparisons included the four benchmark models, i.e., the logistic regression models based on the corresponding works of the Leicester and FINDRISC score systems and two neural network models utilizing the architectures discussed in [29]. Furthermore, an optimized voting ensemble (WeightedVotingLRRFs) was also considered in the comparisons and is discussed in the last paragraphs of the section.
The experimentation methodology can be summarized by the following steps:
• Data preprocessing, as elaborated in Section IV-B.
• Division of the constructed dataset, based on the ELSA database, using the standard stratified train-test split procedure repeated 10 times with different random seeds, thus preserving the class proportions of the original dataset and ensuring that the sub-datasets are representative (random samples). Each time, 70% of the data is chosen as the training dataset and 30% as the testing dataset.
• Application of the selected ML models, single and ensemble, by either voting or stacking methods. These models use the selected features as independent variables and the diabetes risk status as the output variable.
• Performance measures estimation.
As regards the software tools that were employed for the implementation of the compared models, the Java Weka [58] library and the Python Statsmodels [59] package were considered, as they are both open-source, making it possible to integrate the implemented models in the deployable solution in the context of the SmartWork project.
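The repeated stratified 70/30 split in the steps above can be sketched as follows. This is a standard-library illustration, not the Weka or Statsmodels code used in the experiments; the function name, the label vector and the seed values 0 to 9 are assumptions.

```python
import random

def stratified_split(labels, test_fraction=0.30, seed=0):
    """Return (train_idx, test_idx) with class proportions preserved in both parts."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)  # random sample within each class
        n_test = round(len(idx) * test_fraction)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return sorted(train_idx), sorted(test_idx)

# Hypothetical label vector: 30 positive and 170 negative instances.
labels = [1] * 30 + [0] * 170

# Ten repetitions, each with a different seed.
splits = [stratified_split(labels, 0.30, seed=s) for s in range(10)]
```

Each repetition yields 140 training and 60 test instances, and the 15% share of positives is the same in both parts, so every sub-dataset remains representative of the original class proportions.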
A number of measures are recorded for evaluating the performance of the ML models. The most commonly used in the literature [60]-[62], which are also considered in our analysis, are the following:
Sensitivity (True Positive Rate) corresponds to the proportion of participants that have T2DM (i.e., positive data instances) that are correctly considered as positive, with respect to all positive participants.
Specificity (True Negative Rate) corresponds to the proportion of participants that do not have T2DM (i.e., negative data instances) that are correctly considered as negative, with respect to all negative participants.
Positive Predictive Value (+PV) corresponds to the proportion of participants that have T2DM (i.e., positive data instances) that are correctly considered as positive, with respect to all positively predicted participants.
Negative Predictive Value (−PV) corresponds to the proportion of participants that do not have T2DM (i.e., negative data instances) that are correctly considered as negative, with respect to all negatively predicted participants.
Positive Likelihood Ratio (+LR) is defined as the ratio of the true positive rate (sensitivity) to the false positive rate (1 − specificity):

$$+LR = \frac{\text{Sensitivity}}{1 - \text{Specificity}}$$

Negative Likelihood Ratio (−LR) is defined as the ratio of the false negative rate (1 − sensitivity) to the true negative rate (specificity):

$$-LR = \frac{1 - \text{Sensitivity}}{\text{Specificity}}$$

Another useful metric is the Area Under the Curve (AUC), which takes values in the range [0, 1]. The higher its value, the better the ML model performs in distinguishing positive (Diabetics) from negative (Non Diabetics) class instances. In the best (ideal) case, where AUC equals 1, the ML model can perfectly distinguish all positive (Diabetic) from negative (Non Diabetic) class instances. In the worst case, where AUC equals 0, the classifier predicts all negatives as positives and vice versa. Also, the Youden Index was considered in combination with Receiver Operating Characteristic (ROC) analysis. This metric summarises the performance of a diagnostic test; it is defined for all points of a ROC curve, and its maximum value may be used for the selection of the optimum cut-off point.
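The measures defined above can be computed directly from the confusion-matrix counts and the predicted scores. The sketch below is illustrative: the function names and the toy data are hypothetical, the Youden index is taken as Sensitivity + Specificity − 1, and the AUC is computed through its rank interpretation (the probability that a random positive instance scores higher than a random negative one).

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Threshold-dependent measures from confusion-matrix counts."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens, "specificity": spec,
        "ppv": tp / (tp + fp), "npv": tn / (tn + fn),
        "lr_pos": sens / (1 - spec), "lr_neg": (1 - sens) / spec,
        "youden": sens + spec - 1,
    }

def auc(scores, labels):
    """AUC as the share of positive-negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def youden_cutoff(scores, labels):
    """Return (cut-off, J) maximising J; predict positive when score >= cut-off."""
    n_pos, n_neg = sum(labels), len(labels) - sum(labels)
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, lab in zip(scores, labels) if s >= t and lab == 1)
        tn = sum(1 for s, lab in zip(scores, labels) if s < t and lab == 0)
        j = tp / n_pos + tn / n_neg - 1
        if best is None or j > best[1]:
            best = (t, j)
    return best

# Hypothetical counts and scores, for illustration only.
m = diagnostic_metrics(tp=80, fp=50, tn=150, fn=20)
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.8]
labels = [0, 0, 1, 0, 1, 1]
area = auc(scores, labels)
cutoff, j_max = youden_cutoff(scores, labels)
```

For these counts, sensitivity is 0.80 and specificity 0.75, giving +LR = 3.2 and a Youden index of 0.55; for the six toy scores the AUC is 8/9.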
The quantitative analysis of the two selected risk score systems showed that the best performing one, according to the AUC metric, is FindLogist, with an AUC of 0.821, which indicates that it performs better than LeicLogist, with an AUC of 0.788, by 0.033 (3.3 percentage points) on the constructed dataset. Although the sensitivity and specificity of the selected risk score systems were not considerably better than those of the other models (Table 5), these systems, if combined with other existing ones, may improve the performance of the ensemble methods.
Moreover, the use of single classifiers and ensemble methods, such as voting and stacking, could overcome the limitations of risk score systems in order to build a single or combined reliable T2DM risk assessment system. Figure 3 and Table 5 summarize the performance metric values for diabetes prediction according to the adopted ML models described in Section IV-B3. The same figure and table also record, for the same metrics, the results of the FindLogist and LeicLogist models. Note again that these systems apply Logistic Regression with fewer features than those considered in the ML models.
To further investigate the performance of the ML models, we compare the Youden indices and AUCs. The results revealed that the selected voting methods not only performed best but also performed considerably better than all the other ML models and the two selected score systems. Among the different combination methods, the superiority of the two voting methods over stacking was revealed. Voting typically works well if the base classifiers perform the same task and have comparable success, whereas stacking works well for different types of first-level classifiers. A comparison of sensitivities and specificities for the different ML models can be found in Table 5, while the exact hyperparameters of the models can be seen in Table 6.
The significance of the classifiers' AUCs was tested using the Wald test statistic [63]. In detail, the discrimination ability of each classifier is compared to that of a classifier with random chance discrimination ability (TPR = FPR, i.e., AUC = 0.5). The null hypothesis states that AUC = 0.5 and the alternative hypothesis that AUC ≠ 0.5. The calculated p-values for all the models were approximately equal to 0 (<0.05), thus clearly indicating that the calculated AUCs are significant at the level α = 0.05, with the lowest AUC recorded being equal to 0.727.
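A Wald-type test of an AUC against 0.5 can be sketched as below. The standard error uses the Hanley-McNeil approximation, which is an assumption made for illustration because the text does not specify how the standard error was obtained; the class counts in the example are also hypothetical.

```python
import math

def auc_wald_test(auc, n_pos, n_neg):
    """Two-sided Wald test of H0: AUC = 0.5; returns (z, p_value).

    The standard error follows the Hanley-McNeil approximation (assumed here).
    """
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    z = (auc - 0.5) / math.sqrt(var)
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Lowest AUC reported in the text, with hypothetical test-set class counts.
z, p = auc_wald_test(0.727, n_pos=97, n_neg=506)
```

Under these assumed counts the statistic is far from zero and the p-value is far below 0.05, which is consistent with the reported conclusion that even the weakest model discriminates better than chance.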
Additionally, the receiver operating characteristic (ROC) curves for the ML models and the score systems are summarized in Figure 3. Focusing on the combination methods, we again conclude that the voting algorithms with the selected single models produced the best performance (prediction result) compared with the stacking method. Here, it should be pointed out that the ROC curves produced by the voting algorithms are similar and are positioned above the curves of the remaining models.
As the results show, the Random Forests classifier is the best performing among the single classifiers, with Logistic Regression being the closest to it. This lies in the fact that Random Forests can learn a non-linear decision boundary and thus can achieve higher scores in all metrics. In other words, Logistic Regression separates the Diabetes and No Diabetes classes less effectively, while the Random Forests model learns a more flexible decision boundary for the discrimination of instances of the two classes [64].
Among the three different ensembling approaches, the weighted voting scheme boosts the performance of diabetes prediction. The optimal weights are calculated by running the NSGA-II algorithm on the constructed dataset. The optimization procedure aims to maximize both AUC and Sensitivity. The relevant Pareto Front behavior is depicted in Table 7. Note that the sensitivities reported in the first column of Table 7 were significantly lower than the final ones reported in the inductive results table, because the Youden criterion was not utilized during the optimization process and the default cut-off point of 0.5 probability was set. All the weight sets were applied in the inductive experimentation setup of WeightedVotingLRRFs using the Youden optimal cut-off criterion (displayed in the last two columns of Table 7), and the weight set [0.2733, 0.7266] was found to yield the best performance results in terms of AUC and Sensitivity; thus, its performance was recorded in Table 5.
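The notion of a Pareto front over the two objectives can be illustrated with a simple dominance filter. This sketch is not NSGA-II: it omits the genetic operators (selection, crossover, mutation) and crowding distance and only shows the non-domination test that such algorithms rely on. All candidate weights and scores are hypothetical.

```python
def pareto_front(candidates):
    """Keep the candidates not dominated in (AUC, Sensitivity), both maximised.

    candidates : list of (weights, auc, sensitivity) tuples
    """
    front = []
    for i, (w, a, s) in enumerate(candidates):
        dominated = any(
            (a2 >= a and s2 >= s) and (a2 > a or s2 > s)
            for j, (_, a2, s2) in enumerate(candidates) if j != i
        )
        if not dominated:
            front.append((w, a, s))
    return front

# Hypothetical evaluations of four weight sets.
candidates = [
    ((0.2, 0.8), 0.88, 0.60),
    ((0.5, 0.5), 0.87, 0.65),
    ((0.8, 0.2), 0.85, 0.62),  # worse than the second set on both objectives
    ((0.3, 0.7), 0.88, 0.58),  # same AUC as the first set, lower sensitivity
]
front = pareto_front(candidates)
```

Only the first two weight sets survive, since neither is better than the other on both objectives at once; a final choice among such non-dominated sets still requires an additional criterion, as done in the text with the Youden optimal cut-off.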
A more focused graphic analysis of the different evaluation metrics for WeightedVotingLRRFs is found in Figure 4, where its ROC curve, Sensitivity-Specificity and Distribution graphs are presented. In the first graph, the specific Youden optimal cut-off point is located on the ROC curve. In the second graph, the sensitivity and specificity curves are depicted, showing the tradeoff for the different selections of cut-off points. The last two graphs give a good overview of how well the Youden optimal cut-off point of 0.193 separates the two classes.
In addition to the inductive experiments, transductive learning [65] experiments were also conducted. The aim of this learning approach is to exploit patterns that are hidden in the test samples by utilizing them as unlabeled data in the training phase, thus taking advantage of the information embedded in the test set by augmenting the training set [66], [67]. During the transductive experimentation, the partitioning of the dataset was kept the same as in the inductive experimentation, while the unlabeled set was used under a common self-training wrapper algorithm with the different prediction models that were compared in the inductive experiments. The performance results are summarized in Table 8 and Figure 5, while the exact parameters utilized in the self-training scheme can be found in Table 6. Similarly to the work of Triguero et al. [68], the transductive self-training wrapper uses as base classifiers the compared models, which are initially trained using the labeled set and are then used to predict the labels of the unlabeled set in order to repeatedly enlarge the labeled set, while in each iteration the base model is retrained. A confidence probability threshold of 0.90 for the predicted labels is set to ensure that less confident predictions are not integrated in the retraining of the base model; moreover, the maximum number of iterations of the self-training scheme is limited to 10. By comparing the transductive AUCs against their inductive equivalents, it is concluded that the logistic models, while not significantly decreasing their performance, gain no benefit from the exploitation of the unlabeled data. The same holds true for the single classifiers, i.e., NB, DT and ANN. In contrast, the more complex models, such as the RFs, the DNN and the remaining ensembles, marginally improve their classification performance.
Specifically, the proposed WeightedVotingLRRFs model scores a transductive AUC of 0.888, which is the highest that was recorded, suggesting that strict selection of unlabeled data (due to voting) can lead to a possible performance increase of the model.
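The self-training wrapper described above, with its 0.90 confidence threshold and limit of 10 iterations, can be sketched generically as follows. The wrapper signature and the one-dimensional toy base classifier are hypothetical and serve only to make the loop runnable; they are not the models compared in the experiments.

```python
import math

def self_train(fit, predict_proba, X_lab, y_lab, X_unlab, threshold=0.90, max_iter=10):
    """Repeatedly add confidently predicted unlabeled instances and retrain."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    model = fit(X_lab, y_lab)
    for _ in range(max_iter):
        confident, rest = [], []
        for x in pool:
            p1 = predict_proba(model, x)  # probability of class 1
            if max(p1, 1 - p1) >= threshold:
                confident.append((x, 1 if p1 >= 0.5 else 0))
            else:
                rest.append(x)
        if not confident:
            break  # nothing passes the threshold; stop early
        for x, y in confident:
            X_lab.append(x)
            y_lab.append(y)
        pool = rest
        model = fit(X_lab, y_lab)  # retrain on the enlarged labeled set
    return model, X_lab, y_lab

# Toy base classifier (hypothetical): midpoint between the two class means.
def fit_midpoint(X, y):
    m0 = sum(x for x, lab in zip(X, y) if lab == 0) / y.count(0)
    m1 = sum(x for x, lab in zip(X, y) if lab == 1) / y.count(1)
    return (m0 + m1) / 2

def proba_midpoint(mid, x):
    return 1 / (1 + math.exp(-2.0 * (x - mid)))

model, X_final, y_final = self_train(
    fit_midpoint, proba_midpoint,
    X_lab=[0.0, 1.0, 5.0, 6.0], y_lab=[0, 0, 1, 1],
    X_unlab=[-1.0, 2.5, 3.2, 7.0],
)
```

In this toy run only the two unlabeled points far from the decision boundary are added to the labeled set; the two points near the boundary never reach the 0.90 threshold and remain unused, which illustrates the strict selection of unlabeled data.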

VI. DISCUSSION
In this research, several strengths and limitations are highlighted. In terms of the former, to our knowledge, this is the first work to assess various ML models and provide participants with personalized long-term risk prediction of T2DM occurrence and appropriate guidance regarding lifestyle interventions. Also, the research findings were derived from a longitudinal study on a representative English cohort (e.g., elderly office workers) with follow-up data; thus, we may identify causal and temporal associations between elderly lifestyle and T2DM.
Another positive aspect of this work is that, during the balanced dataset creation, we drew instances of the initially ''Non-Diabetics'' class from the reference waves, whose class label was finally defined in the follow-up waves. This approach may give us a view of the behaviour of the features for participants diagnosed with T2DM in the follow-up examination, contributing to T2DM prognosis. Moreover, our study revealed the importance of different risk factors in T2DM prediction for elderly persons. The results of the feature selection techniques coincided with the corresponding literature on T2DM risk factors. The features selected for the training and testing of the ML models are among the symptoms/factors that doctors consider for quantifying long-term risk or identifying the occurrence of T2DM. Feature-wise, all models were trained using the selected 35 features as described in Section IV-B2, except the LeicLogist and FindLogist models. Those two models were fitted on the constructed dataset using the feature sets of the original Leicester and FINDRISC score systems, excluding the feature that considers the family history of diabetes, as it was not available in the ELSA database. Both logistic models were significant at a level of 0.05, and their analysis (supplementary Figures S1 and S2) confirmed that almost all the features from the original research works were still significant on the constructed dataset. Unlike existing research [69], [70], family history of diabetes and gestational diabetes in women were excluded from the feature set used for the training of the ML models. This may be a limitation of this study, since these factors are among the important ones for T2DM risk prediction. Nevertheless, they were not available in the current dataset.
Moreover, contrary to the previous works [48], [71], [72], which use the Pima Indian Diabetes Dataset (PIDD) as a benchmark dataset for their experiments, in this study the ELSA dataset is utilized, consisting of elder office workers' data. Furthermore, Perveen et al. [33] examined the Canadian Primary Care Sentinel Surveillance Network (CPCSSN), while Dalakleidi et al. [72] evaluated the suggested models on the Hippokration dataset, which was granted by the General Hippokrateion Hospital of Athens.
As far as classification is concerned, k-NN, Decision Trees, Random Forests, Naive Bayes [73], ANN and DNN [74] are the models most frequently applied for long-term risk prediction of T2DM. The ANN and DNN topologies presented in [29] were kept identical in order to draw useful comparison results regarding the performance of neural networks on the constructed dataset, with the exception of the insertion of dropout [75] in the DNN topology to reduce overfitting. The results were promising for the DNN model in both experimentation setups, but it still fell short by approximately 3.7% in terms of inductive AUC due to significant underperformance in terms of specificity. Considering the performance results of LeicLogist and FindLogist, the compared metrics suggest a predictive ability similar to that of the remaining single classifiers (i.e., NB, DT, ANN). Although LeicLogist and FindLogist are based on logistic regression, they present far lower AUCs than the LR model trained using the 35 features, thus strengthening the argument that a more personalized approach to T2DM modeling and prediction can be significantly better.
More to the point, in contrast with [48], Adaptive Boosting (AdaBoost) and Extreme Gradient Boosting (XGBoost) are left for future experimentation on the constructed dataset. Also, in [48], the weighted ensembling of different ML models is proposed, where AUC is maximized during hyperparameter tuning using the grid search technique. In our analysis, however, a bi-objective genetic algorithm is applied; the optimal weights are estimated so as to maximize the AUC and Sensitivity of the ML-based models simultaneously under the weighted voting ensemble. To identify the best performing model, different performance metrics, such as sensitivity, specificity and the receiver operating characteristic curves, were analysed. The proposed WeightedVotingLRRFs model provides a mechanism for more confident prediction probabilities due to the ensembling of its base models. It is known that an ensemble such as the proposed one can produce consistently better predictive results than its counterparts under the condition that its base classifiers are accurate and diverse [76]. Both conditions hold true for the proposed model, while the experimentation results validate the assumption of increased predictive ability for WeightedVotingLRRFs.
To our knowledge, this is the first paper to assess T2DM risk prediction on an English cohort (namely, elder office workers and T2DM) from the ELSA database. Consequently, there is a lack of studies with which it can be fairly compared in terms of ML model performance. Previous works on the same dataset mainly focus on the analysis of diabetes risk factors. Specifically, in [77], the authors found that T2DM diagnosis in older adults did not motivate them to change their health behaviour, other than smoking. Moreover, Hacket et al. [78] demonstrated associations between sleep problems and daily cortisol levels in response to stress in a subset of people with T2DM from ELSA. Finally, the study in [79] aimed to build a predictive model using RFs, deep learning and linear models to accurately estimate health status based on sociodemographic characteristics in aging populations, using data from the ELSA database.
At this point, a limitation of this study is that the experiments have been conducted with a fixed-size dataset consisting of a limited number of subjects, amounting to 2009, as shown in Table 3. It is worth noting that the performance of an ML model generally improves as the number of training samples increases, as was also observed in the transductive experimentation on the current dataset. To tackle this limitation, we aim to conduct similar research from a big data viewpoint, focusing on more and different ML models and evaluating the impact of data volume on their performance in terms of T2DM risk prediction.

VII. CONCLUSION
In this study, we applied different ensemble algorithms to a dataset constructed based on the ELSA database, combining different families of ML models to predict the risk of T2DM, taking into account lifestyle variables of elder office workers. Moreover, an IoT-enabled framework [80] was developed that integrates the long-term T2DM risk prediction model. It aims to provide personalized interventions according to the user's needs. Our empirical study showed that all investigated ML algorithms could produce satisfactory prediction results that are significantly better than those of the existing simple score systems. In particular, the voting method could significantly increase the predictability in relation to any conventional risk score system.
It is worth noting that we chose a technique based on multi-objective optimization, since it is more robust than a single-objective one and constructs the classifier ensemble (WeightedVotingLRRFs) more efficiently, as it optimizes more than one classification quality measure, i.e., AUC and Sensitivity, simultaneously, resulting in the highest inductive AUC among the compared models, equal to 0.884.
To sum up, according to our experimental analysis and results, ensemble methods constitute a useful tool for predicting type 2 diabetes. The overall performance attained by the investigated techniques shows the effectiveness and superiority of the weighted voting ensemble method based on multi-objective optimization in relation to single classifiers and risk score systems, while the better learning ability of WeightedVotingLRRFs against its rivals was observed in both the inductive and transductive learning setups. Hence, by embedding it in the proposed system, lifestyle or medication interventions can be offered to participants at high risk in order to prevent and/or delay diabetes occurrence.
As future work, it would first be beneficial to apply different techniques for the handling of missing values, such as imputation using IRSSI [81], and to experiment with more feature selection techniques. Moreover, it would be interesting to evaluate the impact of dimensionality reduction with techniques such as principal component analysis [82] on T2DM prediction performance on the ELSA-based constructed dataset. In addition, the comparison of state-of-the-art techniques such as XGBoost, AdaBoost or DNNs with more layers would probably provide better insights regarding the predictive limitations of the constructed dataset. Finally, the exploitation of semi-supervised and unsupervised methodologies in the training process could also prove beneficial, as was also suggested by the AUC improvements observed during the transductive experimentation. This argument is strengthened by the fact that there are plenty of unlabeled instances in ELSA that could be incorporated in the constructed dataset.

During the latest years, he has been the coauthor of more than 240 articles in refereed journals, edited books, and international conferences. His main research interests include virtual, augmented, and mixed reality, 3-D geometry processing, haptics, virtual physiological human modeling, information visualization, physics-based simulations, computational geometry, computer vision, and stereoscopic image processing. He serves as a regular reviewer for several technical journals and has participated in more than 23 research and development projects funded by the EC and the Greek Secretariat of Research and Technology. He has also been a member of the organizing committee of several international conferences. He is a senior member of the IEEE Computer Society and a member of Eurographics. His research work has received several awards. He was the Coordinator of the GameCar H2020 Project and the Scientific Coordinator of the NoTremor FP7 Project. VOLUME 9, 2021