A non-invasive approach for total cholesterol level prediction using machine learning

Artificial intelligence techniques have been increasingly applied in healthcare to help in many areas, from assisting clinical diagnoses to preventing diseases. In this paper, a machine learning approach to predict cholesterol levels using non-invasive and easy-to-collect data is presented. Specifically, it uses clinical and anthropometric data gathered by nutritionists during weight loss intervention (dieting) periods. The prediction power analysis of different patient variables is aimed at improving both non-invasive diagnosis quality and screening of associated diseases. Moreover, a clustering analysis has been carried out to identify different groupings of patients that might share some characteristics that have so far remained inconspicuous but might contain valuable diagnostic or prognostic information for clinical experts. The experiments show a mean absolute percentage error (MAPE) of 4.39% in cholesterol estimation via regression, as well as clustering of patients within four profiles in which variable values share commonalities among cluster members.


I. INTRODUCTION
Finding correlations between anthropometric measurements (AMs) and laboratory findings is of great interest in the medical field [1], as it would lead to less invasive means of patient exploration. Examples of this can be found in the literature, ranging from atherogenic markers [2] and diabetes assessment [3] to cardiovascular risk [4]. On some occasions, AMs can correlate with other AMs of patients' bodies, which is useful for estimating weight and height from other measures, and helps in dose assessment for ICU patients [5].
Using a novel dataset that includes patient data from multimodal sources (anthropometric measurements, as well as body sampling, etc.), the aim of this paper is to estimate cholesterol levels accurately from these non-invasive means, as these can be more cost-effective (no laboratories, experts, or reagents required) and can be used for screening purposes, i.e. avoiding more costly tests when not required.
In this paper, cholesterol level prediction refers to predicting the total cholesterol level, expressed in milligrams per deciliter (mg/dl). That is, the aim is to minimize the prediction error of the total blood cholesterol as much as possible, using the available gathered patient data. The dataset collected contains data from initially overweight patients (regardless of cholesterol levels), taken at several points in time during a dietary intervention.
The remainder of this paper is structured as follows: next, related work in the literature is introduced in the motivation section (Sec. II); then, the materials and methods section (Sec. III) introduces how cholesterol level prediction works, as well as which data were collected from patients. First, a description of the dataset is provided (Sec. III-B); then, the regression-based prediction is introduced (Sec. III-C), as well as the experimental setup (Sec. III-E). After this, the results obtained so far are presented (Sec. IV). Finally, some conclusions are drawn (Sec. V), and future work on fully automated estimation is further explained.

II. MOTIVATION
In the medical field of obesity, indices such as body mass index (BMI), body adiposity index (BAI), waist circumference (WC) and waist-to-height ratio (WHtR) have been recognized as simple and effective measures to diagnose and account for a wide range of pathologies, including cardiovascular diseases, hypertension, metabolic syndrome and dyslipidemia [6]. The potential of these indices in clinical practice lies in their simplicity and effectiveness, which allow estimating the risk of obesity and the mechanisms involved in diverse pathologies associated with the chronic inflammatory response caused by obesity, such as high blood pressure, visceral fat, central obesity and cholesterol [7], [8]. In addition, compared to laboratory findings, these indices are not only less invasive but on some occasions also cheaper, since the medical devices used to process blood samples, experts' time, facilities, etc. have a higher impact on healthcare provision systems than the proposed AM-based alternatives.
Oftentimes, these types of devices have limited use for medical purposes; however, Jiang et al. [9] propose using RGB-D devices to calculate BMI from 3D captured data, by estimating the height of each individual along with an approximate weight extracted from the volume. Their method estimates the BMI within an error of 2.54 kg/m². Furthermore, Lu et al. [10] propose using Kinect devices for 3D body reconstruction in order to perform body composition analysis (fat, muscle, bone). According to their experiments, the reconstruction error is 2.048 mm (RMSE), with 82% accuracy for body composition analysis results.
Recent advancements in neural networks have loosened some prior constraints, making it possible to use RGB devices without depth sensors. An example of this is presented by Smith et al. [11], who introduce a method to recover 3D body data from 2D silhouettes from a pair of images. Another example is that of Trujillo-Jiménez et al. [12], who propose performing precise anthropometry from handheld devices using body2vec: a specially trained neural network that performs body segmentation (and background removal) prior to point cloud estimation and reconstruction from video. They match the estimated reconstructed models against two standards: a "silver" one consisting of LIDAR data, and a "gold" standard consisting of expert-provided AMs. However, although they can obtain "useful" AMs from the videos, they are unable to reconstruct the full body accurately.
Accurately retrieving the 3D body reconstruction of patients could open possibilities beyond the "classical" AMs that are taken today, as there could be other, more inconspicuous AMs (or combinations and ratios of AMs) that correlate better with certain health parameters that could otherwise only be obtained via invasive, more expensive means. Furthermore, the deployment of cameras has another benefit: once the hardware has been acquired and installed, further services (i.e. algorithms, that is, "software") can be developed and deployed with minimal or no change, which makes it possible to broaden the explorations or analyses that can be performed on patients from a single 3D capture session. In addition, a single capture of an individual allows the experts to take more measures of additional parts of the body that were not initially considered, which might be necessary for the refinement of algorithms at a later stage.
It is in this context that the Tech4Diet project aims to obtain 4D models of patients undergoing weight loss (WL) programmes, that is, by capturing the 3D body reconstruction of intervention participants along several sessions in time (fourth dimension). Once this digital model of the patient is obtained, other metrics of health assessment can be derived. Ideally, even some laboratory (blood) sampling findings can be highly correlated with digitally estimated AMs and other body composition variables.
Advanced anthropometry based on digital 3D body reconstruction, which is not limited to waist, wrist and hip measures but covers many more that can be obtained from a 3D scan of the body (e.g. arm, forearm, thigh, calf, ankle, etc.), could also show better correlations with body fat and muscle composition, which in turn could be related to total cholesterol level. Initial work in this regard has already been fruitful [13] (see Fig. 2), yielding accurate wrist, waist and hip measures compared to those manually measured by an expert. However, the current method does not include a means to automatically determine the exact area where each measurement is to be taken [14], as depicted in Fig. 1 (i.e. where in the forearm the wrist is located, that is, which diameter around the 3D reconstructed forearm structure should be considered to be the wrist, exactly). This is currently under development as one of the aims of the Tech4Diet project. When completed, full body composition, as well as several anthropometric measurements, will be automatically estimated, which will enable fully non-invasive, fully vision-based health assessment.
As part of these ongoing efforts, in this paper the focus is brought to the possible correlations that exist, and the regressions that can be made from manually annotated AMs, as well as other non-invasive variables taken from patients using a full body analysis, weight scale, total cholesterol level and triglycerides. Cholesterol has been considered the most important sterol synthesized by human cells [15]. High levels of cholesterol and triglycerides are usually associated with an increased intake of saturated fatty acids and BMI, which contributes to the risk of chronic and neurodegenerative diseases [16]. The mechanism involved in the increased risk of diseases is caused by deposition of fibrous tissues and fat in the arterial walls [17]. Traditional indices such as BMI, WC and WHpR can be used to determine and estimate some indicators of obesity, but they do not account for differences between sexes [18]. Ongoing research on human behaviour in obesity is investigating whether these indices can be used to diagnose different pathologies and the risk of complications in obesity.
FIGURE 1: Example of digital anthropometry from 3D reconstructed bodies using computer vision with depth-sensing devices. Red discs represent slicing points for different body part measurements (waist, hip, neck, wrist, etc.).

III. METHODOLOGY
The main objective of this research is to provide nutritionists with tools for better understanding of the relationship between patient variables and total cholesterol levels in patients.
One way this can be achieved is by helping predict the total blood cholesterol level using variables from patients in nutritional intervention. Another way is to examine the groups of patients that emerge from the data, to better determine commonalities among patients that have certain total cholesterol levels. That is, by clustering the patients into groups, some interesting common characteristics of this particular set of patients might be correlated with their total cholesterol outcomes, and these insights could be valuable to experts. This section introduces the materials and methods used for cholesterol prediction via regression and cholesterol profile clustering.

A. DATA COLLECTION
First, data collection was carried out to obtain nutritional and anthropometric information from 84 patients for 6 months. During this period, each patient was intervened on 4 times (4.38 median value). All study participants were informed (including their right to withdraw at any point, for any reason) and gave consent to take part. Furthermore, the study was conducted according to all ethics regulations regarding studies with human subjects from the University of Alicante. A total of 24 variables were taken in each session with a patient. These are summarized in Table 1, and are divided into five groups: anthropometric measures, body composition analysis, lifestyle metrics, capillary blood sampling, and blood pressure. More precisely, the anthropometric measures, i.e. wrist, waist and hip measures, were taken manually using a flexible measuring tape (0.1 cm precision; two measurements were taken and their mean was used).
Body composition analysis was performed with a Tanita® MC 780-P MA smart scale (Tanita Corp., Arlington Heights, IL, USA), and included weight (0.1 kg precision), fat and muscle percentages for the full body, as well as separately for each limb and the trunk, and percentage of visceral fat. Height (0.1 cm precision) was measured with a Seca® portable stadiometer 213 (Seca, Hamburg, Germany).
Lifestyle metrics consisted of a physical activity score, which was determined by using the International Physical Activity Questionnaire Short Version (IPAQ-SF). The IPAQ-SF [19] comprises 7 items assessing the frequency and duration of physical activity across three ranges of intensity: vigorous physical activity (VPA = 8.0 metabolic equivalents, or METs), moderate physical activity (MPA = 4.0 METs), and low physical activity (LPA = 3.3 METs). These are undertaken across a set of domains including leisure time, domestic and gardening (yard) activities, and work-related and transport-related activities during a week of one's life. The final scores assigned in our study can take three possible values (1.2, 1.4 and 1.6, which are used as a multiplier for total energy expenditure, see below) and indicate the level of physical activity performed by the patient on a daily basis: 1.2 refers to sedentary people, wheelchair users, etc.; 1.4 refers to people who do little physical activity; and 1.6 refers to people who practice sports daily on a semi-professional or professional basis.
For the capillary blood sampling measures, an Accutrend® Plus device (Roche Diagnostics GmbH, Mannheim, Germany) was used to obtain glucose, triglycerides and cholesterol levels from different fingers, using the Accuchek® Softclix® Pro lancing device (Roche Diagnostics GmbH, Mannheim, Germany).
Finally, blood pressure was obtained using an Omron® M3 blood pressure monitor (Omron Healthcare Europe, Hoofddorp, Netherlands) to get systolic and diastolic pressure values.
Regarding inclusion/exclusion criteria, the participants included 87 male and female Spanish volunteers with overweight and obesity recruited by advertisements on the website of the Tech4Diet project. The participants ranged in age from 22 to 63 years (x̄ = 47.14 years; σ = 9.22 years). The inclusion criteria were (i) having a BMI greater than 25 kg/m², (ii) being right-handed, (iii) being able to read and write fluently, and (iv) having Spanish as the mother tongue. The exclusion criteria were (i) currently being or having in the past year been on a dietary/nutritional treatment supervised by a nutritionist; (ii) the presence of endocrine-metabolic disorders including problems of the thyroid, pituitary gland,
VOLUME 4, 2016

B. DATASET PREPARATION
After the initial collection, data preprocessing was carried out in order to remove outliers. There were two sources of outliers: first, some patient measurement sessions (i.e. data points) had been created for testing purposes during earlier stages, to ensure correct system performance during the intervention period; second, statistical outliers were eliminated by applying a z-score criterion. Following this, data imputation was carried out, because some variables had not been collected for some participants in a given session and were missing, or had an implausible value. To perform the imputation, missing values were filled in using data associated with another session from the same participant. For this, a multivariate iterative imputer was used with k-nearest neighbours (k = 10), so that the data is completed using the most similar data points (likely from the same participant) around the data point with the missing value.
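The statistical outlier step can be illustrated with a minimal sketch in plain Python. The paper does not state the z-score cut-off, so the threshold of 3 standard deviations, the function name `zscore_filter` and the toy columns (waist, weight) are assumptions made purely for illustration; the imputation step is not covered here.

```python
from statistics import mean, pstdev

def zscore_filter(rows, threshold=3.0):
    """Keep only rows in which every column lies within `threshold`
    standard deviations of that column's mean (assumed cut-off)."""
    columns = list(zip(*rows))
    mus = [mean(col) for col in columns]
    sigmas = [pstdev(col) for col in columns]
    kept = []
    for row in rows:
        is_outlier = False
        for value, mu, sigma in zip(row, mus, sigmas):
            # A constant column has sigma == 0 and cannot flag outliers.
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                is_outlier = True
                break
        if not is_outlier:
            kept.append(row)
    return kept

# Toy sessions as (waist_cm, weight_kg); the last row has an implausible weight.
sessions = [(90.0 + i % 4, 80.0 + i % 5) for i in range(20)] + [(91.0, 800.0)]
clean = zscore_filter(sessions)
print(len(sessions), "->", len(clean))
```

Note that with very few data points a single extreme value inflates the standard deviation enough to hide itself, so a fixed cut-off of this kind only behaves sensibly on reasonably sized samples.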
At the end of this process, the resulting dataset contains a total of 359 data points (feature vectors), with 26 variables (dimensions) each.
An extended version of the dataset can also include derived measures such as the waist-to-hip ratio (WHR), for which all necessary measures are available:
$$\mathrm{WHR} = \frac{\text{waist circumference}}{\text{hip circumference}} \tag{1}$$
The basal metabolic rate (BMR), or basal energy expenditure (BEE), requires weight (w), height (h), gender, and age (a); its calculation from these variables is shown in Table 2. From the BMR, the total energy expenditure (TEE) can be calculated, using the activity score as a multiplier:
$$\mathrm{TEE} = \mathrm{BMR} \times \text{activity score} \tag{2}$$
When including these, each data point contains a total of 29 variables.
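As an illustration of the derived measures, the following sketch computes WHR and TEE from already available values. The function names and the kcal/day unit are assumptions; the BMR formula itself comes from Table 2, which is not reproduced here, so the BMR is taken as an input rather than computed.

```python
# Activity scores used in the study as TEE multipliers.
ACTIVITY_SCORES = (1.2, 1.4, 1.6)

def waist_to_hip_ratio(waist_cm, hip_cm):
    """WHR: waist circumference divided by hip circumference."""
    if hip_cm <= 0:
        raise ValueError("hip circumference must be positive")
    return waist_cm / hip_cm

def total_energy_expenditure(bmr, activity_score):
    """TEE: BMR multiplied by the activity score.
    `bmr` is assumed to be given (e.g. in kcal/day)."""
    if activity_score not in ACTIVITY_SCORES:
        raise ValueError("activity score must be 1.2, 1.4 or 1.6")
    return bmr * activity_score

whr = waist_to_hip_ratio(90.0, 100.0)
tee = total_energy_expenditure(1500.0, 1.4)
print(whr, tee)
```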

C. REGRESSION
After data preprocessing is done, the data is fed to 10 different regressors for training; these are shown in Table 3, along with their most relevant parameters. For reproducibility, the same random seed is used for all regressors. From this point onwards, regressors will be named using their acronyms, as per the table. The selection of the top four regressors in the table is based on the lowest prediction error rates. Each regressor is provided with 324 samples for training and 35 more for testing.
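A seeded split of the 359 data points into 324 training and 35 test samples could look like the sketch below. The paper does not say whether the split is random or grouped by participant, nor which seed is used; a plain random shuffle and the seed value 42 are assumptions.

```python
import random

def train_test_split_indices(n_samples, n_test, seed):
    """Shuffle sample indices with a fixed seed and hold out `n_test` of them."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return indices[n_test:], indices[:n_test]

# 359 data points in total: 324 for training, 35 for testing.
train_idx, test_idx = train_test_split_indices(359, 35, seed=42)
print(len(train_idx), len(test_idx))
```

Reusing the same seed yields the same partition on every run, which is what makes the error rates of different regressors comparable.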

D. CLUSTERING
As explained, another way in which cholesterol level prediction can be approached is by looking at cholesterol (and/or triglycerides) levels, and checking what other variables are relevant to the observed outcome, or in which way other variables correlate (or not) with the observed levels.
By looking at the patterns that emerge from grouping similar data together, it could be possible to gain valuable insight that can better assist nutritionists or other experts in estimating cholesterol levels by using indirect (non-invasive) means of analysis.
For this purpose, clustering techniques, which belong to the unsupervised machine learning family of methods, are used. Clustering consists of finding a number of groups or clusters into which data can be divided based on similarity, or lack thereof; the latter is sometimes also referred to as distance. Several clustering methods have been investigated, which are summarized in Table 4. For k-Means, the initialization step is performed using 'kmeans++', which selects initial cluster centres for k-Means clustering in a way that speeds up convergence. In HDBSCAN, min size refers to the smallest grouping size that is to be considered a cluster, and min samples refers to the number of samples in a neighbourhood for a point to be considered a core point.
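The idea behind 'kmeans++' initialization can be sketched in plain Python as follows. This is the standard seeding scheme (each new centre is sampled with probability proportional to its squared distance from the nearest centre already chosen), not the authors' code; the function names, the seed and the toy points are assumptions.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_pp_init(points, k, seed=0):
    """Pick k initial centres using k-means++ style seeding."""
    rng = random.Random(seed)
    centres = [rng.choice(points)]  # first centre: uniformly at random
    while len(centres) < k:
        # Squared distance from each point to its nearest chosen centre.
        d2 = [min(squared_distance(p, c) for c in centres) for p in points]
        # Far-away points are more likely to become the next centre;
        # already chosen points have weight 0 and cannot be picked again.
        centres.append(rng.choices(points, weights=d2, k=1)[0])
    return centres

# Four toy blobs in 2D (e.g. the first two principal components).
points = [(bx + dx, by + dy)
          for bx, by in [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
          for dx, dy in [(0.0, 0.0), (0.5, 0.2), (0.1, 0.6)]]
centres = kmeans_pp_init(points, k=4)
print(centres)
```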

E. EXPERIMENTAL SETUP
1) Regression experiments
Two sets of experiments were conducted to validate the presented approach. First, from all the variables available per participant, subsets were taken, including or excluding entire groups of variables depending on their type (anthropometric, body composition, etc.). The aim of this experiment was to determine which group or category of variables is most useful to determine total cholesterol levels (which could be interesting per se). In the second experiment, automated feature selection is applied to determine the subset of variables that yields the lowest error. In this case, the focus is on finding the variables that are the most discriminant for the task.
Regarding the first set of experiments, Table 5 introduces the different variable sets used, namely: anthropometric variables only (A); body composition only (BC); a combination of anthropometric and body composition (ABC); both of these plus blood pressure (ABCP); anthropometric and blood pressure only (AP); body composition and blood pressure only (BCP); and all variables (All), where only total cholesterol is excluded, as it is the ground truth and the value to regress. That is, in all experiments the X set of variables excludes the total cholesterol, which is the y value to calculate from the learned regression function f(X, Φ) = y by means of adjusting the set of parameters Φ of the regressor.
The anthropometric variables set (A) includes, apart from the direct measures, the waist-to-hip ratio (WHR) from Eqn. 1. Furthermore, the body composition set (BC) includes the BMR (Table 2), the TEE (Eqn. 2), as well as the gender and age of the participant. The activity score is used to obtain the TEE, but is not fed directly to any model.
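The composition of the variable sets described above can be sketched as a small lookup structure. All column names below are hypothetical placeholders (the real variables are those listed in Table 1), and the blood sample group label "S" is an assumption introduced only for this illustration.

```python
# Hypothetical column names, abbreviated for illustration.
GROUPS = {
    "A": ["wrist_cm", "waist_cm", "hip_cm", "whr"],
    "BC": ["fat_pct", "muscle_pct", "visceral_fat", "bmr", "tee", "gender", "age"],
    "P": ["systolic_bp", "diastolic_bp"],
    "S": ["glucose", "triglycerides"],  # capillary blood sample, only in "All"
}
VARIABLE_SETS = {
    "A": ["A"], "BC": ["BC"], "ABC": ["A", "BC"], "ABCP": ["A", "BC", "P"],
    "AP": ["A", "P"], "BCP": ["BC", "P"], "All": ["A", "BC", "P", "S"],
}
TARGET = "total_cholesterol"  # ground truth y, never part of X

def columns_for(set_name):
    """Flatten the groups of a variable set into a list of column names."""
    return [col for group in VARIABLE_SETS[set_name] for col in GROUPS[group]]

def split_xy(record, set_name):
    """Return (X, y) for one data point under the given variable set."""
    return [record[col] for col in columns_for(set_name)], record[TARGET]

record = {"wrist_cm": 17.0, "waist_cm": 95.0, "hip_cm": 105.0, "whr": 95.0 / 105.0,
          "fat_pct": 32.0, "muscle_pct": 64.0, "visceral_fat": 9.0, "bmr": 1600.0,
          "tee": 2240.0, "gender": 1, "age": 47, "systolic_bp": 125.0,
          "diastolic_bp": 80.0, "glucose": 95.0, "triglycerides": 140.0,
          "total_cholesterol": 190.0}
X_ap, y = split_xy(record, "AP")
print(columns_for("AP"), X_ap, y)
```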
[Table 5 column headers: Variable set | Anthropometric | Body composition | Blood pressure | Blood sample]

Regarding the second part of the experiments, with automated feature selection, three different feature selection schemes are applied. These schemes are based on the most relevant (discriminative) features from the random forest regressor using either recursive feature elimination (RFE), or the drop-column importance (DCI) of different variables, or the permutation feature importance (PFI). The RFE method works by recursively removing attributes from the dataset and building a model (Random Forest Regressor) on the attributes that remain. On each iteration, the least important feature, i.e. the feature with the lowest weight assigned by the estimator, is removed. In our case, the weight assigned is computed as the Gini importance. In contrast, PFI measures the importance of a feature by calculating the decrease in the model score after a single feature has been randomly shuffled (i.e. the values have been randomly reassigned to other individuals). This procedure breaks the relationship between the feature and the target; therefore, the drop in the model score is indicative of how much the model depends on the feature. Finally, the DCI method differs from PFI in that each feature is removed in each iteration instead of being randomly shuffled among individuals.
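Permutation feature importance can be sketched without any particular library, since it only needs a fitted predictor. In the sketch below the "model" is a hypothetical fixed function standing in for the trained random forest, and the importance is reported as the increase in mean absolute error after shuffling, which is equivalent to the drop in score described above; the function names, the number of repeats and the seed are assumptions.

```python
import random

def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Average error increase when one feature column is shuffled across rows."""
    rng = random.Random(seed)
    baseline = mean_absolute_error(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the feature-target relationship
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            error = mean_absolute_error(y, [predict(row) for row in X_perm])
            increases.append(error - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

def toy_model(row):
    # Stand-in for a fitted regressor: depends on feature 0, ignores feature 1.
    return 2.0 * row[0]

X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [2.0 * row[0] for row in X]
importances = permutation_importance(toy_model, X, y)
print(importances)
```

Drop-column importance would differ in that the model is refitted without the feature instead of being evaluated on shuffled values, which is why it is more expensive to compute.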
In both sets of experiments, the regressors' parameter values were either left at their defaults or optimized via grid search cross-validation and distributed asynchronous hyperparameter optimization, i.e. HyperOpt [20].
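The principle of grid search with cross-validation can be illustrated with the self-contained sketch below. It is not the authors' pipeline: a tiny one-dimensional k-nearest-neighbour regressor stands in for the regressors of Table 3, and the parameter grid, fold scheme and toy data are all assumptions. HyperOpt is not shown.

```python
from itertools import product

def mape(y_true, y_pred):
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def knn_predict(train_X, train_y, x, n_neighbors):
    # Toy 1D regressor: average the targets of the closest training points.
    order = sorted(range(len(train_X)), key=lambda i: abs(train_X[i] - x))
    nearest = order[:n_neighbors]
    return sum(train_y[i] for i in nearest) / n_neighbors

def cross_val_mape(X, y, n_folds, **params):
    """Mean MAPE over interleaved folds (sample i belongs to fold i % n_folds)."""
    scores = []
    for fold in range(n_folds):
        test_idx = list(range(fold, len(X), n_folds))
        train_idx = [i for i in range(len(X)) if i % n_folds != fold]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        preds = [knn_predict(train_X, train_y, X[i], **params) for i in test_idx]
        scores.append(mape([y[i] for i in test_idx], preds))
    return sum(scores) / n_folds

def grid_search(X, y, grid, n_folds=5):
    """Evaluate every parameter combination and keep the lowest CV error."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = cross_val_mape(X, y, n_folds, **params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

X = [float(i) for i in range(20)]
y = [3.0 * x + 100.0 for x in X]
best_params, best_score = grid_search(X, y, {"n_neighbors": [1, 2, 4]})
print(best_params, round(best_score, 3))
```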

2) Clustering experiments
Regarding clustering experiments, three different tests have been conducted to test which clustering method is best, as well as to check for data separability by performing a principal component analysis (PCA) beforehand.
One of the experiments entails applying each of the techniques in Table 4 to find possible clusters, and see how well each performs in terms of inter-cluster distances (cluster separation). For this, the two main principal components (from PCA) are used in 2D scatter plots. Furthermore, a t-test is provided to conclude whether the clusters are quantitatively different, i.e. with regards to the 10 most significant variables found during the regression experiments just introduced above.
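A two-sample t statistic for comparing one variable between two clusters can be computed as in the sketch below. The paper does not specify which variant of the t-test is used; Welch's unequal-variance form is assumed here, and only the statistic and degrees of freedom are computed, because turning them into a p-value requires the t distribution, which the Python standard library does not provide.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    va = variance(sample_a) / na  # squared standard error of mean A
    vb = variance(sample_b) / nb  # squared standard error of mean B
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Toy values of one variable in two hypothetical clusters.
cluster_a = [1.0, 2.0, 3.0, 4.0, 5.0]
cluster_b = [6.0, 7.0, 8.0, 9.0, 10.0]
t_stat, dof = welch_t(cluster_a, cluster_b)
print(t_stat, dof)
```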
Then, a series of variables are plotted against each other, to determine whether different cholesterol level grouping is present, showing a correlation with other metrics in the data.

IV. RESULTS
Following the same scheme presented in the experimental setup, this section is divided into regression and clustering, with each subsection showing the results for each of the experiments or tests performed. Table 6 shows the mean absolute percentage error (MAPE) scores for the first part of the regression experiments described above, i.e. those using manually picked subsets of variables, as shown in Table 5. From the results shown, it can be observed that body composition (BC) data alone is highly correlated with cholesterol outcomes; this is likely because fat mass percentages for each limb, the trunk, or the whole body are relatively important to assess dyslipidemia. For this set, the Random Forest regressor (RFR) is the best-performing, especially when parameters are optimized. This is even better than using all variables, which include triglycerides, which are correlated with cholesterol outcomes too. These results are highly relevant, as the BC set of variables does not contain any blood sampling; that is, a cholesterol level can be regressed from fully non-invasive means of participant exploration. In all cases, hyperparameter optimization seems to lead to better results, as should be expected.
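For reference, the MAPE is assumed here to follow its standard definition, where $y_i$ is the measured total cholesterol of test sample $i$, $\hat{y}_i$ is the regressor's prediction, and $n$ is the number of test samples (35 in the regression setup); the symbols are introduced only for this illustration.

```latex
\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|
```

Under this definition, a MAPE of 5% on a measured level of 200 mg/dl corresponds to an average deviation of about 10 mg/dl.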

A. REGRESSION RESULTS
Regarding the second set of regression experiments, Table 7 shows the results for the different feature selection schemes used, with the last row showing the results when parameters are optimized, and Table 8 lists the features selected by each scheme. From Table 7, it can be observed that the best result (lowest MAPE) without optimization (underlined) is obtained when using the top 10 features from the random forest regressor (RFR) with recursive feature elimination (RFE), and using the extra trees regressor (ETR) to estimate the cholesterol levels. The bottom-most row presents the result when hyperparameters are optimized for the underlined result. As can be seen, in this case the error is reduced by 0.85 percentage points, from 5.24% to 4.39%. This is a 16.2% relative reduction in error with respect to the non-optimized result, and the overall best result so far. An expected-predicted scatter plot is shown in Fig. 3 for this best case.
From Table 8, some insights can be drawn from the set of selected features: BMR seems to be a good predictor for cholesterol values, and WHR also tends to be in the top 10 features. TEE appears twice in the top 10, but not for the DCI method. Weight appears in the top 10, except for RFE. Systolic blood pressure (BP) appears in the top 10 in all cases, as do some fat measures (global, trunk, left leg), which makes sense, given the relationship between body fat makeup and blood lipid levels, as well as the metabolic syndrome (MS) in which high blood pressure, high cholesterol and high blood glucose are present. As expected, blood sample related features (glucose, triglycerides) are related to cholesterol, and therefore make it to the top 10 features as well, in most cases.
To end the regression part, Fig. 4 shows histogram plots for the top 10 most discriminative features of the random forest regressor (RFR) when using recursive feature elimination (RFE), which is the set of features yielding the best result for feature selection, and the best overall result presented in this paper.
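The reported improvement can be checked with a short worked calculation, using only the two MAPE values given in the text:

```latex
\Delta_{\text{abs}} = 5.24\% - 4.39\% = 0.85 \text{ percentage points}, \qquad
\Delta_{\text{rel}} = \frac{0.85}{5.24} \approx 0.162 = 16.2\%
```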

B. CLUSTERING RESULTS
With regard to clustering, PCA is first applied to obtain the principal components. Figure 5 shows the cumulative variance explained by each new principal component extracted from the data.
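The cumulative explained variance plotted for a PCA can be derived directly from the eigenvalues of the data covariance matrix, as in the minimal sketch below. The eigenvalues used are made-up illustrative numbers, not values from the study, and the function name is an assumption.

```python
from itertools import accumulate

def cumulative_explained_variance(eigenvalues):
    """Fraction of total variance explained by the first 1, 2, ... components."""
    total = sum(eigenvalues)
    ordered = sorted(eigenvalues, reverse=True)  # largest component first
    return [running / total for running in accumulate(ordered)]

# Hypothetical eigenvalues of a covariance matrix.
ratios = cumulative_explained_variance([4.0, 3.0, 2.0, 1.0])
print(ratios)
```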
Next, different clustering methods are tested to determine how well they perform (in terms of cluster separability, i.e. inter-cluster distances). Figure 6 shows the results of clustering with each tested algorithm using the first two principal components. At the top-left is k-Means clustering, showing four well-separated clusters; top-right is agglomerative clustering, in which clusters 1 and 4 fall in the same area; bottom-left is HDBSCAN, which identifies two clusters, but shows much mixture between elements of the two; finally, spectral clustering shows four clusters, some of which have clear boundaries, such as 1 and 2, whereas 3 and 4 overlap with each other and, to a minor extent, with cluster 1.
As can be observed, the best result is obtained when using k-Means, as the boundaries of each cluster are clearer and there seems to be very little or no overlap, which makes it possible to split the data into four different groups of patients. A deeper analysis of cluster composition is performed next. To do so, each of the sets of variables shown in Table 5 (A, BC, P) is taken to find differences in variable values that are relevant for each cluster.
The first cluster (cluster 1) is made up of 141 data points, making it the largest cluster. With respect to the variables of the A set, these individuals have the lowest waist-to-hip [...] (-3.38). However, these individuals are not the ones with the least amount of fat per body part (limbs/torso) or as a whole, but they are the ones with the least muscle mass. They are also the shortest (least height). All this makes this group also the group with the lowest mean weight of all clusters, and below the global average (-13.17). It also has the lowest BMR and TEE. Furthermore, regarding blood sample variables, this group has the lowest total cholesterol, and the same trend is observed with blood pressure.
In cluster 2, we find the smallest set of data points, with a total of 68. Despite having a hip circumference similar to the average, the value of waist circumference is elevated, the highest of all clusters. This makes this group also have the highest WHR. Observing BC variables, we find that individuals do not have a high fat percentage, but they do have the highest amount of muscle mass. This, as opposed to cluster 1, makes them the heaviest. They also have a high amount of visceral fat, BMR and TEE. Observing blood sampling values, this cluster stands out by the fact that its triglyceride scores are very low. Yet, blood pressure is the highest.
[Figure caption fragment: "... Table 8) along with a curve fitted to each distribution."]
In cluster 3, looking at the A set of variables, it can be highlighted that the average hip circumference is much lower than the global mean, and therefore, the lowest of all clusters too. Observing BC variables, this cluster is the one with the lowest global and limb/torso fat values. The values of cholesterol are the lowest, even below the global mean (-6.78), and the glucose values are the highest. In this cluster, the diastolic blood pressure values are the highest, and are 8.53 points above the mean.
From cluster 4, it can be observed that the mean hip circumference surpasses all other clusters, and is 20 centimetres higher than the next cluster in this regard. Individuals in this cluster have the largest amount of fat of all clusters. Although its muscle mass scores are average, the global muscle mass is the lowest of all clusters, i.e. 8.99 points below the global mean. This cluster has glucose and cholesterol levels that are close to the mean, but triglycerides are the highest.
Finally, to obtain a quantitative measurement of cluster disjointness, a t-test has been applied to pairs of the clusters obtained by k-Means. The test was performed on the 10 most discriminative variables used for regression before. The results from this test are shown in Table 9. The top five rows of each table show the best p-values, and the bottom five rows show the worst p-values, for each cluster pair. We conclude that, at the 5% significance level, the two clusters in each pair are significantly different from each other in terms of all 10 variables.

V. CONCLUSIONS
In this paper, we have presented a machine learning approach for total cholesterol estimation using regression from noninvasive patient variables consisting of anthropometric, body composition, blood pressure, lifestyle, and, optionally, capillary blood sampling. This information is useful for initial patient screening, resulting in reduced costs of operation for healthcare providers, when patients do not require further, more expensive tests. Additionally, clustering has been used to characterize the different groups or clusters of individuals that emerge from the data, which can provide valuable insights to clinical experts.
In the first set of regression experiments, variables have been manually divided into groups, according to their nature. Several subsets of feature groups have been taken and results calculated. It has been shown that, by using solely body composition features, an overall error of 4.58% in total cholesterol (TC) estimation can be achieved when applying hyperparameter optimization.
In the second set of regression experiments, automated feature selection has been performed. When using the top 10 best (most discriminative) features, the error can be further reduced to 4.39%, after applying hyperparameter optimization. However, it needs to be taken into account, that, due to the performed feature selection, the feature set contains some blood sampling features (glucose and triglycerides). Therefore, this is not fully non-invasive. Nonetheless, capillary blood sampling is less invasive than venous blood extraction, and uses fewer resources, as no laboratory equipment, experts, and expensive reagents are necessary.
Finally, four clustering methods have been compared, and using principal component analysis (PCA) it has been possible to split the data into four patient profiles, each showing characteristic ranges of values in their anthropometric (A), body composition (BC), blood pressure (P), or blood sampling variables.
As future research lines, our next steps in the short and medium term are aimed at reducing the error by including: a) digital anthropometry metrics, b) more participants in the study, and c) further variables, especially those that are part of an automated body composition analysis, given the good results of the first set of experiments conducted in the present study. More accurate cholesterol levels can also be obtained employing a venous blood sample analysis, rather than a capillary blood sample. This could improve the results, given the lower error range of laboratory blood analysis versus the capillary blood sample device used. Finally, in the long term, we are aiming for complete digital anthropometry from 4D models (3D + time) of the patient body to extract further measurements that reveal more information or have a higher correlation with other health indicators.

JORGE AZORÍN-LÓPEZ received a degree in Computer Engineering in 2001 and a Ph.D. degree in Computer Science at the University of Alicante (Spain) in 2007. Since 2001, he has been a faculty member of the Department of Computer Technology at the same university, where he is currently an Associate Professor and the Academic Secretary. His current research interests include 3D computer vision, computational intelligence, machine learning, deep learning, ambient intelligence, human activity analysis, and visual inspection. In these lines of research, Dr. Azorín has worked in 20 research projects (5 of them as coordinator) funded by national, regional, and local public and private entities. He has authored more than 100 contributions in several journals, conferences and book chapters.