Efficient Driver Drunk Detection by Sensors: A Manifold Learning-Based Anomaly Detector

This study presents an effective data-driven anomaly detection scheme for drunk driving detection. Specifically, the proposed approach combines the desirable features of t-distributed stochastic neighbor embedding (t-SNE) as a feature extractor with the Isolation Forest (iF) scheme to detect drivers' drunkenness status. We used the t-SNE model to exploit its capacity to reduce the dimensionality of nonlinear data while preserving the local and global structures of the input data in the feature space. The iF scheme, in turn, is an effective unsupervised tree-based approach for detecting anomalies in multivariate data. The approach employs only normal-event data to train the detection model, making it more attractive for detecting drunk drivers in practice. To verify the capacity of the proposed t-SNE-iF approach to reliably detect drivers with excess alcohol, we used publicly available data collected using a gas sensor, a temperature sensor, and a digital camera. The overall detection system achieved high detection performance with an AUC of around 95%, demonstrating the proposed approach's robustness and reliability. Furthermore, compared to the Principal Component Analysis (PCA), Incremental PCA (IPCA), Independent Component Analysis (ICA), Kernel PCA (kPCA), and Multi-dimensional Scaling (MDS)-based iF, Elliptical Envelope (EE), and Local Outlier Factor (LOF) detection schemes, the proposed t-SNE-based iF scheme offers superior detection of drunk driver status.


I. INTRODUCTION
The number of traffic accidents keeps increasing and causing more damage to society, even with advanced intelligent transportation systems. As reported by the World Health Organization (WHO), traffic accidents have been among the top 10 causes of death since 2016 [1]. Moreover, according to the WHO, about 1.3 million deaths each year are due to car crashes [2]. The risk of traffic accidents increases significantly when driving under the influence of alcohol or any psychoactive substance or drug. The WHO declared that approximately 40% of road traffic accidents are mainly caused by driving under the influence of alcohol [3], the fifth most common cause of death on the roads [4]. In addition, drunk driving not only causes road traffic injuries but also causes financial losses of up to $500 million per year worldwide [5]. Therefore, accurate detection of drunk drivers is vital to mitigate road traffic accidents.
The associate editor coordinating the review of this manuscript and approving it for publication was Chao Tong.
VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Automatically and accurately detecting car drivers with excess alcohol is essential for reducing road traffic accidents. Over the last decade, there has been increasing interest in developing advanced technologies for detecting drunk driving. Generally speaking, there are two categories of driver alcohol detection: obtrusive and unobtrusive detectors [6]. Obtrusive techniques detect drunk driving from changes in the physiological state of a driver, including blood alcohol concentration (BAC), breath alcohol concentration [7], electroencephalogram (EEG) signals [8], and electrocardiogram (ECG) signal changes [9]. However, acquiring these data types (e.g., EEG signals and heart rate) is difficult, particularly while driving. In addition, drivers may be disturbed by the intrusive equipment surrounding them. On the other hand, unobtrusive techniques for detecting drunk driving are based on vehicle-based features and driving behavior. Different vehicle-based measures are generally used to detect drunk driving, including vehicle speed, acceleration, steering wheel movements, and lateral position. Other unobtrusive techniques employ image-based features to monitor the driver's face and state [10], [11]. For instance, the authors in [12] introduced a breath-based alcohol detection system that controls the ignition of the engine if the driver has been drinking. This embedded system can be employed to prevent drunk driving and thus enhance traffic safety by reducing traffic accidents due to drunk driving. Similarly, in [13], an Internet of Things (IoT)-based drunk detection strategy is introduced to prevent traffic accidents due to drunk driving.
To this end, this IoT system is equipped with a set of sensors, including heartbeat rate, facial recognition, and alcohol concentration detection sensors. Driving with excess alcohol may result in severe traffic accidents and serious injury, even death, for the drivers and the public. Accurately detecting drunk driving is vital to improving traffic safety and helping avoid traffic incidents. Most of the approaches developed for drunk driving detection are designed using shallow supervised methods that require labeled data in training [14], [15], [16]. However, obtaining labeled data is not straightforward and is time-consuming. Thus, this study aims to design a semi-supervised data-driven detector for drunk driving detection that does not require labeled data. Unlike supervised algorithms, semi-supervised anomaly detection algorithms employ only the data of normal events to train the detection model, making them more attractive for detecting drunk drivers since it is not always easy to get accurately labeled data. The contributions of this study are summarized as follows.
• This study introduces an innovative approach for drunk driving detection by combining the advantages of the t-distributed stochastic neighbor embedding (t-SNE) model and the isolation forest (iF)-based anomaly detection scheme. We used the t-SNE model to exploit its capacity to reduce the dimensionality of nonlinear data while preserving the local and global structures of the input data in the feature space [17], [18]. Essentially, the original data are projected into a low-dimensional space via the t-SNE, and then the iF detector is applied to the extracted features to realize anomaly detection. The key characteristic of the iF-driven anomaly detection scheme is its capacity to uncover anomalies without considering any distance or density metrics, which reduces computational costs [19]. At first, the t-SNE-based iF detector is constructed based on training data (normal driving behaviors) and then used to detect drunk driving behaviors. We assessed the effectiveness of this approach by using experimental data provided in [15] for alcohol detection in drivers by sensors and computer vision (i.e., physiological, biological, and visual characteristics). Specifically, three sensors are used for driver data acquisition. An MQ-3 gas sensor, which is sensitive to different gases and quick to integrate into the system, is employed to sense the presence of ethanol. An MLX90621 temperature sensor is used to determine the facial thermal change of the driver. Also, the Raspberry Pi Camera is employed to compute the pupil ratio. The multivariate data thus contain the alcohol concentration and temperature in the car environment, the face temperature, and the pupil ratio.
The remainder of this paper is organized as follows. Section II reviews the related works. Section III briefly describes the preliminary materials, including the t-SNE and the iF anomaly detector. Section IV presents the proposed drunk driving detection approach. In Section V, we present the data used and the results obtained. Finally, we offer conclusions in Section VI.

II. RELATED WORKS
Driving with excess alcohol can result in severe road traffic crashes affecting drivers and the public. Over the last decade, many researchers and engineers have developed data-driven methods to improve drunk driving detection for intelligent transportation systems [14], [20]. For instance, the authors in [16] introduced an approach for drunk driving detection using a support vector machine (SVM) classifier. The SVM is applied to the extracted driving characteristics (i.e., lateral position and steering angle) to decide the driver state (normal or drunk). Driving with excess alcohol could influence the slopes of the steering angle and of the vehicle lateral position. This study is conducted using a fixed-base driving simulator. Results showed that the SVM classifier obtained an overall accuracy of 80% in discriminating drunk driving. In [21], principal component analysis (PCA) has been employed for feature selection, and SVM is applied to distinguish normal driving from drunk driving. The results showed that the SVM classifier achieved an accuracy of 70%, which leaves room for improvement. In [22], Random Forest (RF) is employed to detect drunk driving based on driving behavior data collected from a driving simulator. After selecting the important features using the RF algorithm, SVM, AdaBoost, linear discriminant analysis (LDA), and RF have been applied to detect drunk driving under different road conditions. Results showed that RF and AdaBoost achieved the best classification performance based on seven features. Specifically, the classification accuracy reached by RF and AdaBoost is slightly greater than 80%, while LDA and SVM achieved an accuracy of 75.93% and 74.07%, respectively.
The authors in [23] focused on developing a driver behavior state detection strategy to discriminate three driver states (normal, drowsy, and drunk driving) using vehicle-based measures. This study is conducted using a simulator, which enables obtaining data that are difficult to collect under real driving conditions, such as drowsy or drunk driving. An experiment with free-road driving is performed to get information about the drowsy and normal states, and another experiment is implemented under event-road driving to obtain information about drunk driving and normal driving. The data used for the detection are based on acceleration, velocity, yaw rate, and steering. Three models are constructed, each separating two states: the first model aims to separate drowsy behavior from the normal one; the second model is used to discriminate drunk from drowsy states using features from the free-road data; and the last one, constructed using event-road driving data, focuses on detecting abnormal events. The state identification is treated as a supervised classification problem using a machine learning model, namely Random Forest. In [24], a two-stage data-driven approach based on Markov models together with Recurrent Neural Networks is presented to detect drunk driving using onboard vehicle sensors. Specifically, several sensory data are collected and processed by Recurrent Neural Networks to predict the longitudinal acceleration in a supervised manner. This approach achieved an overall detection performance of 79%, which makes it promising for preventing drunk drivers from driving.
Recently, in [25], a two-stage deep learning approach is proposed to detect drunk driving using a Convolutional Neural Network (CNN). At first, a simplified VGG (Visual Geometry Group) network, a standard CNN, is applied to estimate the driver's age, and then a simplified Dense-Net is used to identify the facial features of drunk driving for alcohol test discrimination. An accuracy of about 86.36% is achieved in the age discrimination step, and an overall accuracy of 88.53% is obtained for the drunk driving detection stage. The authors in [26] address abnormal driving detection using a stacked sparse autoencoders approach (SdsAEs) to model driving behavior features, with a softmax layer used for the classification task. Results showed the superior performance of the SdsAEs approach in detecting abnormal driving behavior compared to softmax regression, SVM, and a back-propagation neural network. The authors in [15] and [27] proposed a strategy for in-vehicle drunk status detection based on two inputs: visual data obtained via image processing and sensor data. Specifically, the following input variables are used to distinguish normal driving from drunk driving status: the facial temperature of the driver, the pupil width, and the concentration of alcohol in the car environment. The problem of drunk detection is addressed via supervised classification techniques combined with feature selection, using machine learning models such as SVM, k-nearest neighbors (kNN), Decision Tree, and Neural Network. The authors in [28] introduced an approach to identify the driver state by using physiological sensors and a capacitive hand detection sensor. They use cellular neural networks for monitoring the driver's stress level. Results showed promising performance of this approach in recognizing the driver states (i.e., stress or no stress), with a detection accuracy of 92%.

III. MATERIALS AND METHODS
This section presents the materials needed to design the proposed drunk driving detection approach: the t-SNE and the isolation forest methods.

A. T-DISTRIBUTED STOCHASTIC NEIGHBOR EMBEDDING
The t-SNE is a nonlinear dimensionality reduction technique originally introduced by van der Maaten and Hinton in 2008 to visualize high-dimensional data in a lower-dimensional space [17]. It is characterized by its capacity to capture much of the local structure in the high-dimensional data while also retaining global structure. More explicitly, if the original data contain numerous clusters, the t-SNE reveals the presence of these clusters in the low-dimensional space. In recent years, the t-SNE has been widely employed in many research fields for visualizing high-dimensional features [29], [30], [31], [32], [33], [34], [35].
Let $D = \{d_1, d_2, \ldots, d_l\}$ denote a high-dimensional dataset, and $S = \{s_1, s_2, \ldots, s_l\}$ the corresponding low-dimensional (visual) space. At first, the t-SNE calculates the dissimilarity separating the observations in the input space. To this end, the similarity between sample data points $d_i$ and $d_j$ is quantified using the Gaussian distribution in Equation (1),

$$P(j|i) = \frac{\exp\left(-\|d_i - d_j\|^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\|d_i - d_k\|^2 / 2\sigma_i^2\right)}, \tag{1}$$

where $\sigma_i$ denotes the standard deviation of the Gaussian distribution centered on $d_i$. It is worth pointing out that in t-SNE, we set $P(i|i) = 0$ because only pairwise similarities between distinct data points are of interest. The joint probability of the high-dimensional points, which is a symmetrized version of the conditional similarity because it has the property that $P_{ij} = P_{ji}$ for all $i, j$, is expressed as:

$$P_{ij} = \frac{P(j|i) + P(i|j)}{2l}. \tag{2}$$

Using conditional or joint probabilities gives similar results, but optimizing the joint model is less computationally expensive [17]. For the lower-dimensional space, the Student-t probability distribution with one degree of freedom is employed to compute the similarity between sample data points $s_i$ and $s_j$, as in Equation (3):

$$Q_{ij} = \frac{\left(1 + \|s_i - s_j\|^2\right)^{-1}}{\sum_{k \neq m} \left(1 + \|s_k - s_m\|^2\right)^{-1}}. \tag{3}$$

Indeed, the Student-t distribution has heavier tails than the Gaussian distribution, making it more suitable for separating crowded points. Crucially, the Student-t distribution is appropriate for representing dissimilar points in the input space by a larger distance in the low-dimensional space. Then, the Kullback-Leibler (KL) divergence is applied to quantify the distance between the distributions of the data in the original space and in the low-dimensional space. The KL divergence is minimized to get the coordinates of the data points in the lower-dimensional space. The objective function $L$ is defined as follows [36]:

$$L = KL(P \,\|\, Q) = \sum_{i} \sum_{j \neq i} P_{ij} \log \frac{P_{ij}}{Q_{ij}}, \tag{4}$$

where $P_{ij}$ represents the similarity between $d_i$ and $d_j$, while $Q_{ij}$ represents the similarity between $s_i$ and $s_j$. Here, $P$ is the distribution of the input data in the high-dimensional space (Equation (2)), while $Q$ is the distribution of the output data in the low-dimensional space (Equation (3)).
The cost function $L$ is minimized using a gradient descent algorithm; the gradient with respect to $s_i$ is given by:

$$\frac{\partial L}{\partial s_i} = 4 \sum_{j} \left(P_{ij} - Q_{ij}\right)\left(s_i - s_j\right)\left(1 + \|s_i - s_j\|^2\right)^{-1}. \tag{5}$$

After that, $s_i$ is updated by the following equation:

$$s_i^{t+1} = s_i^{t} - \eta \frac{\partial L}{\partial s_i} + \alpha(t)\left(s_i^{t} - s_i^{t-1}\right), \tag{6}$$

where $s_i^{t}$ represents the solution at iteration $t$, $\eta$ denotes the learning rate, and $\alpha(t)$ refers to the momentum at iteration $t$. The learning rate decides the step size used at each iteration to optimize the objective function $L$, while a relatively large momentum term can be introduced to accelerate the optimization procedure and avoid poor local minima.
Note that in the t-SNE approach, the most important hyper-parameter is the perplexity, which defines the effective number of neighbors. In other words, the t-SNE output depends on the selected values of its inputs, especially the perplexity parameter. The perplexity increases monotonically with $\sigma_i$, which means that a small value corresponds to a narrow Gaussian in which only points $d_j$ close to $d_i$ count as neighbors. The perplexity is expressed as:

$$\mathrm{Perp}(P_i) = 2^{E(P_i)}, \qquad E(P_i) = -\sum_{j} P(j|i) \log_2 P(j|i), \tag{7}$$

where $E(P_i)$ denotes the Shannon entropy of $P_i$ [17].
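To make the link between the Gaussian bandwidth and the perplexity concrete, the following minimal sketch evaluates the conditional similarities of Equation (1) for one point and the corresponding perplexity, assuming the standard definition of perplexity as 2 raised to the Shannon entropy (in bits). The function names and the synthetic data are illustrative assumptions and are not taken from the study.

```python
import numpy as np

def conditional_similarities(D, i, sigma):
    """Row i of P(j|i) for a fixed bandwidth sigma (Gaussian kernel, P(i|i) = 0)."""
    sq_dist = np.sum((D - D[i]) ** 2, axis=1)
    logits = -sq_dist / (2.0 * sigma ** 2)
    logits[i] = -np.inf              # enforces P(i|i) = 0
    logits -= np.max(logits)         # shift for numerical stability
    p = np.exp(logits)
    return p / p.sum()

def perplexity(p):
    """Perplexity 2**E(P_i), with E the Shannon entropy in bits."""
    nz = p[p > 0]
    entropy = -np.sum(nz * np.log2(nz))
    return 2.0 ** entropy

rng = np.random.default_rng(0)
D = rng.normal(size=(50, 5))         # synthetic stand-in data, not the study's dataset

perp_small = perplexity(conditional_similarities(D, 0, sigma=0.5))
perp_large = perplexity(conditional_similarities(D, 0, sigma=5.0))
print(perp_small, perp_large)
```

A narrow bandwidth concentrates the probability mass on a few close neighbors (low perplexity), whereas a wide bandwidth spreads it over almost all points. In practice, t-SNE implementations search for the bandwidth of each point that matches the user-specified perplexity.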
There is no automatic way to choose the optimal perplexity value. Larger perplexity values could eliminate small-scale structures in the manifold; however, smaller perplexity values could falsely generate several sub-manifolds by using a small number of nearest neighbors. The optimal value of the perplexity can be obtained by minimizing the cost function, L, with respect to the perplexity. The authors in [17] recommend choosing a perplexity value within the interval [5, 50]. The time complexity of the t-SNE model is O(N^2), where N denotes the number of data points [18]. In 2014, an improved t-SNE version, called Barnes-Hut-SNE, was developed to reduce the time complexity to O(N log N) [18]. More details about the t-SNE can be found in [17], [37].
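As an illustration, the following sketch shows how a t-SNE embedding with a chosen perplexity could be computed using scikit-learn's `TSNE` class, whose `method="barnes_hut"` option corresponds to the O(N log N) approximation. The use of scikit-learn and the synthetic input are assumptions made for this example; the text does not state which implementation was used in the study.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))            # synthetic stand-in for five sensor variables
X_norm = StandardScaler().fit_transform(X)

# Perplexity chosen inside the recommended [5, 50] interval.
tsne = TSNE(n_components=2, perplexity=30, method="barnes_hut", random_state=0)
T = tsne.fit_transform(X_norm)
print(T.shape)
```

Note that scikit-learn requires the perplexity to be smaller than the number of samples, which matters for small datasets.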

B. ISOLATION FOREST-BASED ANOMALY DETECTION
The Isolation Forest approach was originally designed by Liu in 2008 [19] and improved later in 2011 [38] to deal with anomaly detection problems where only normal observations are available. Importantly, it is an unsupervised anomaly detection approach since it is designed without the need for labeled data. The approach is founded on the principle of the Decision Tree algorithm, and it identifies anomalies by isolating outliers from the data [38]. Like the well-known Random Forest, the iF consists of a set (ensemble) of decision trees constructed during the training phase [39]. Isolation Forest can be considered an ensemble learning approach to deal with classification and regression problems [40], [41]. For instance, in [40], a similarity-measured isolation forest is considered to detect anomalies in machine monitoring data. In [41], a combined approach using principal component analysis with the iF algorithm is introduced for partial discharge detection: PCA is adopted to reduce the feature space to a 2-D space, and the iF is applied to discriminate multi-source partial discharge signals. Figure 1 illustrates the basic structure of the iF algorithm, which consists of building an ensemble of trees for a given dataset. Essentially, the iF algorithm recursively splits the data by constructing an ensemble of trees until all samples are isolated. Anomalies are characterized by a short average path length on the trees. In other words, shorter paths indicate potential anomalies because anomalies are few and different, so they need a smaller number of partitions to be isolated [19].
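The isolation principle can be illustrated with a toy one-dimensional example. The sketch below is a deliberately simplified stand-in for an isolation tree (no sub-sampling, a single feature, and only the branch containing the target is followed); the function names and data are hypothetical and only serve to show that an outlier is isolated after fewer random splits than a typical point.

```python
import random

def isolation_depth(points, target, rng, depth=0):
    """Number of random splits needed to isolate `target` within `points` (1-D)."""
    if len(points) <= 1:
        return depth
    lo, hi = min(points), max(points)
    if lo == hi:
        return depth
    split = rng.uniform(lo, hi)
    # Keep only the side of the split that contains the target.
    same_side = [p for p in points if (p < split) == (target < split)]
    return isolation_depth(same_side, target, rng, depth + 1)

rng = random.Random(0)
inliers = [rng.gauss(0.0, 1.0) for _ in range(200)]   # synthetic normal points
outlier = 8.0                                         # synthetic anomaly
data = inliers + [outlier]

def mean_depth(target, n_trees=200):
    """Average isolation depth of `target` over repeated random partitions."""
    return sum(isolation_depth(data, target, rng) for _ in range(n_trees)) / n_trees

depth_outlier = mean_depth(outlier)
depth_inlier = mean_depth(inliers[0])
print(depth_outlier, depth_inlier)
```

Averaged over many random partitions, the outlier has a markedly shorter path than the inlier, which is the property the anomaly score builds on.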
Implementing the iF-based anomaly detection approach requires specifying only two parameters: the number of trees and the size of the sub-samples used for the splitting operations to build the forest. In [19], it has been shown that the detection performance of the iF approach converges quickly with a small number of trees, and that it needs only a small sub-sampling size to reach high detection accuracy. In the iF approach, anomalies in a dataset are detected by analyzing the path lengths of the data points: the splitting process is short for anomalies, which means that anomalies require few splits in the isolation trees to be isolated [42]. Furthermore, the anomaly score is computed from the mean path length across all the isolation trees in the forest.
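For illustration, the two parameters map onto `n_estimators` and `max_samples` in scikit-learn's `IsolationForest`. The sketch below is an assumption-laden example on synthetic data (the library choice, parameter values, and data are not taken from the study). In scikit-learn, `score_samples` returns the negative of the anomaly score defined in the original paper, so the sign is flipped here.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))                  # anomaly-free training data (synthetic)
X_test = np.vstack([rng.normal(size=(5, 2)),         # normal-like test points
                    np.full((1, 2), 6.0)])           # one obvious outlier

# Number of trees and sub-sample size are the two main parameters.
forest = IsolationForest(n_estimators=100, max_samples=64, random_state=0)
forest.fit(X_train)

# Flip the sign so that higher values mean "more anomalous".
scores = -forest.score_samples(X_test)
print(scores)
```

The last test point lies far from the training cloud and should receive the highest score of the batch.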
In such an anomaly detection framework, anomalies are scored depending on the leaf depth and are isolated after a few splits in a tree; that is, anomalies are identified by fewer splits or shorter path lengths in the tree. Each data point is assigned an anomaly score that reflects its susceptibility to isolation. Therefore, high susceptibility (a high anomaly score) indicates a potential anomaly, while data points with low anomaly scores are considered normal observations or inliers. Note that the iF approach is trained in an unsupervised manner, and it performs better for anomaly detection when the training dataset does not contain anomalies [38].
Let $l(d)$ denote the path length of a given data point $d$, and $D$ a dataset composed of $N$ data points. The minimum depth of a decision tree is equal to $\log(N)$, while the maximum depth is $N - 1$. Essentially, the anomaly score is computed based on the path length of the trees within the forest. The anomaly score, $A$, can be computed using the following formula [19]:

$$A(d, N) = 2^{-\frac{E(l(d))}{\alpha(N)}}, \tag{8}$$

where $E(l(d))$ denotes the expected path length of a given data point $d$ from a collection of isolation trees, and $\alpha(N)$ is the average path length, expressed as [19]:

$$\alpha(N) = 2\lambda(N - 1) - \frac{2(N - 1)}{N}, \tag{9}$$

where $\lambda(i)$ is the harmonic number, which can be estimated as follows:

$$\lambda(i) = \ln(i) + \gamma, \tag{10}$$

where $\gamma$ is Euler's constant, i.e., $\gamma = 0.5772156649$. Overall, the anomaly score of $d$, $A(d, N)$, is obtained from the isolation trees built on the training data of $N$ samples, and the range of $A(d, N)$ is within [0, 1]. It is worth pointing out that the anomaly score is inversely related to the path length. The smaller the anomaly score, the greater the depth, which indicates a higher probability that the data point belongs to the normal points. Finally, the anomaly detection is performed as follows:

$$d \text{ is } \begin{cases} \text{an anomaly}, & \text{if } A(d, N) \text{ is close to } 1, \\ \text{normal}, & \text{if } A(d, N) \text{ is less than or close to } 0.5. \end{cases} \tag{11}$$

Notably, an anomaly is flagged if $A(d, N)$ is close to 1, while when $A(d, N)$ is less than 0.5, the data point is likely typical. In the final determination of drunk driving, when $A(d, N)$ is close to 0.5, the driver is considered under normal status.
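The score computation can be sketched in a few lines, assuming the standard isolation forest definitions: the score is 2 raised to the negative ratio of the expected path length to the average path length, with the harmonic number approximated by ln(i) plus Euler's constant. The function names are illustrative, and the special cases for N ≤ 2 follow a common implementation convention rather than anything stated in the text.

```python
import math

EULER_GAMMA = 0.5772156649

def harmonic(i):
    """Approximate harmonic number: ln(i) + Euler's constant."""
    return math.log(i) + EULER_GAMMA

def average_path_length(n):
    """Average path length alpha(N) used to normalize the expected path length."""
    if n <= 1:
        return 0.0          # convention (assumption): nothing to split
    if n == 2:
        return 1.0          # convention (assumption): one split separates two points
    return 2.0 * harmonic(n - 1) - 2.0 * (n - 1) / n

def anomaly_score(expected_path_length, n):
    """Anomaly score A(d, N) = 2 ** (-E(l(d)) / alpha(N))."""
    return 2.0 ** (-expected_path_length / average_path_length(n))

n = 256
print(anomaly_score(1.0, n))                       # very short path: score well above 0.5
print(anomaly_score(average_path_length(n), n))    # average path: score of 0.5
print(anomaly_score(2 * average_path_length(n), n))  # long path: score below 0.5
```

A point whose expected path length equals the average path length scores exactly 0.5, which is why 0.5 is the natural reference value separating likely anomalies from likely normal points.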
The iF is intuitive, computationally inexpensive, and sensitive to outliers in data, making it particularly suited for applications where low latency is necessary. The computational costs of the iF in training and testing are $O(t\psi \log \psi)$ and $O(nt \log \psi)$, respectively. Here, $\psi$ refers to the subsampling size of the dataset [43], $n$ denotes the size of the dataset, and $t$ is the number of trees in the forest. Interestingly, $\psi$ needs to be small and constant across distinct datasets to reach a more satisfactory detection performance.

IV. THE T-SNE-BASED ISOLATION FOREST APPROACH
This study addresses the problem of drunk driving detection as an anomaly detection problem. Specifically, the goal is to identify the state of the monitored driver (normal or drunk) based on the collected multivariate time series data. A data-driven approach for drunk driving detection is presented by combining the advantages of two unsupervised machine learning algorithms: manifold learning (i.e., t-SNE) and a decision-tree-based ensemble learning technique (i.e., Isolation Forest). The general framework of the proposed t-SNE-based iF detector is schematically illustrated in Figure 2.
At first, after the acquisition of driver data, the t-SNE is applied to project the normalized data onto a feature space with a lower dimension than the input space, usually 2D or 3D for visualization purposes. The input of the t-SNE is the normalized dataset $X$, which is transformed into the feature space as

$$T = \mathrm{tSNE}(X, \text{Components}, \text{Perplexity}).$$
The t-SNE features, T, are used as input to the Isolation Forest detector to identify the driver's drunk status. Note that the iF detector is trained based only on anomaly-free t-SNE features (i.e., data from a driver under normal status). Then, it is used to decide whether a new T is anomaly-free (no alcohol) or contains an anomaly (driver under the influence of alcohol).
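The two steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses scikit-learn, synthetic data, and a hypothetical helper `tsne_if_scores`. Because scikit-learn's `TSNE` provides no `transform` method for unseen points, the sketch embeds training and test data jointly and then fits the iF only on the rows that correspond to anomaly-free training data; how the study handles new observations is not specified in the text.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

def tsne_if_scores(X_train_normal, X_test, perplexity=30, random_state=0):
    """Return anomaly scores for X_test (higher = more anomalous)."""
    X_all = np.vstack([X_train_normal, X_test])
    # Normalize using statistics of the anomaly-free training data only.
    X_norm = StandardScaler().fit(X_train_normal).transform(X_all)
    # Assumption: joint embedding, since TSNE cannot project unseen points.
    T = TSNE(n_components=2, perplexity=perplexity,
             random_state=random_state).fit_transform(X_norm)
    n_train = len(X_train_normal)
    forest = IsolationForest(n_estimators=100, random_state=random_state)
    forest.fit(T[:n_train])                      # train on normal features only
    return -forest.score_samples(T[n_train:])    # sign flipped: high = anomalous

rng = np.random.default_rng(0)
normal_train = rng.normal(0.0, 1.0, size=(100, 5))          # synthetic "no alcohol" data
test = np.vstack([rng.normal(0.0, 1.0, size=(20, 5)),       # synthetic normal test rows
                  rng.normal(6.0, 1.0, size=(20, 5))])      # synthetic shifted rows
scores = tsne_if_scores(normal_train, test)
flags = scores > 0.5                                        # decision threshold of 0.5
print(flags.sum(), "of", len(flags), "test points flagged")
```

The threshold of 0.5 mirrors the decision rule described in this section; other thresholds could be selected on validation data.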
As mentioned above, the Isolation Forest training is performed based on transformed data without anomalies (no alcohol), so normal points lie deep in the isolation trees, whereas anomalies have shorter path lengths counted from the tree root. This structure of isolation trees is suitable for separating alcohol cases (anomalies) from normal cases during the testing phase. In the testing stage, the testing data transformed via the t-SNE are passed through the already built iF scheme. Specifically, the path depth is estimated to compute the anomaly score, which is then compared to a decision threshold for anomaly detection. If the computed anomaly score is greater than 0.5, an anomaly is declared (i.e., drunk driving); otherwise, the driver is under normal status (no alcohol). The proposed t-SNE-driven iF detection procedure is summarized in Algorithm 1.
In this study, five statistical scores are employed to quantify the performance of the studied methods: Accuracy, Precision, Recall, F1-score, and Area under curve (AUC) [44]. For a binary detection problem, the numbers of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN) in the 2 × 2 confusion matrix are used to compute the evaluation metrics.
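For reference, the four threshold-based metrics can be computed from the confusion matrix counts as in the sketch below, which uses their standard definitions. The function name and the example counts are made up for illustration and are not results from the study. The AUC is not included because it is computed from the ranking of anomaly scores rather than from a single confusion matrix.

```python
def detection_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1-score from confusion matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Made-up counts, for illustration only.
m = detection_metrics(tp=40, fp=10, fn=5, tn=45)
print(m)
```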

A. DATA DESCRIPTION
This part is devoted to assessing the efficiency of the proposed approach in detecting drunk driving. The experiments are carried out on actual data from a publicly available database provided in [15]. Three types of sensors are used to collect these data: a sensor of the concentration of alcohol in the environment (physiological), a sensor that measures the temperature at defined points on the driver's face (biological), and another one that identifies and measures the thickness of the pupil (visual characteristics). The dataset is relatively small, with 390 data points (217 for no alcohol presence and 173 for alcohol presence with different concentrations). Five variables are collected to decide between drunk and normal driving behaviors: alcohol concentration in the car environment in ml/L, car environment temperature in degrees Celsius, minimum face temperature in degrees Celsius, maximum face temperature in degrees Celsius, and pupil ratio. Figure 3 illustrates the distribution of the five considered attributes, which indicates that these variables are non-Gaussian distributed. Such data would challenge traditional dimensionality reduction methods, such as PCA and MDS, that typically assume linear and Gaussian distributions. Thus, nonlinear techniques designed without restricting the data distribution to be Gaussian, such as t-SNE and KPCA, could be promising.

B. EXPERIMENTS AND SETTINGS
Four main experiments are conducted in this study:
1) At first, we evaluate the standalone anomaly detection schemes, including iF, EE, and LOF, in detecting drunk driving.
2) Then, we evaluate the performance of the t-SNE-based iF approach in detecting drunk driving.
3) After that, we optimize the detection performance of the t-SNE-based iF approach based on different values of the perplexity parameter.
4) Finally, we compare the performance of the proposed approach with five commonly used dimensionality reduction-based approaches: PCA, ICA, IPCA, KPCA, and MDS-based anomaly detection.
In the first experiment, we applied three standalone anomaly detection methods: isolation Forest, Elliptical Envelope (EE) [45], and Local Outlier Factor (LOF) [46]. The parameter settings of these three detectors are listed in Table 1. We used the Grid Search approach to determine the optimal values of the hyper-parameters. The three anomaly detectors are applied to the original data without dimensionality reduction. In the LOF detector, an anomaly score is computed for each observation by measuring the local deviation of the density of a given sample compared to its neighbors. In this study, the number of neighbors used in LOF is 20. In the EE detector, which aims to fit an ellipse around the data using a minimum covariance determinant (MCD), the proportion of points to be included in the support of the raw MCD estimate is 0.05. The detection results of the three detectors (i.e., iF, EE, and LOF) are listed in Table 2. Results reveal that the iF detector outperformed the EE and LOF detectors by obtaining an AUC of 0.9452 and an F1-score of 0.9448. It is followed by the EE detector, which showed a satisfactory detection accuracy with an F1-score of 0.9375 and an AUC of 0.9377. The LOF gives the lowest detection performance with an AUC of 0.64.
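A setup of this kind could be sketched with scikit-learn as follows. The library choice, the synthetic data, and all parameter values other than the 20 LOF neighbors mentioned above are assumptions for illustration; the settings of Table 1 are not reproduced here. The LOF is created with `novelty=True` so that, like the other two detectors, it can be fitted on anomaly-free data and then applied to unseen observations.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 5))                      # synthetic anomaly-free data
X_test = np.vstack([rng.normal(size=(30, 5)),            # synthetic normal rows
                    rng.normal(5.0, 1.0, size=(30, 5))]) # synthetic anomalous rows

detectors = {
    "iF": IsolationForest(random_state=0),
    "EE": EllipticEnvelope(random_state=0),
    "LOF": LocalOutlierFactor(n_neighbors=20, novelty=True),
}

predictions = {}
for name, detector in detectors.items():
    detector.fit(X_train)                                # train on normal data only
    predictions[name] = detector.predict(X_test)         # +1 = inlier, -1 = outlier
    print(name, int((predictions[name] == -1).sum()), "points flagged")
```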
The second experiment is dedicated to verifying the performance of the proposed t-SNE-driven iF anomaly detection approach in detecting drunk driving. Detection results of the t-SNE-driven iF detector, under different perplexity values between 5 and 100, are listed in Table 3. To visually show the impact of the perplexity parameter on the final output of the t-SNE, Figure 4 provides visual results of the t-SNE applied to the alcohol dataset using different perplexity values. Results in Table 3 indicate that the t-SNE with a perplexity of 30 improves the alcohol detection using the iF detector by achieving a higher F1-score and AUC of 95.81% and 95.37%, respectively. It can also be observed that perplexity values of 10 and 20 yield an AUC above 0.9, which is a good result.
Detection results based on t-SNE-based LOF and EE schemes under different perplexity values are reported in Table 4 and Table 5, respectively. The results show that t-SNE-based LOF and EE schemes with a perplexity of 20 can satisfactorily identify drunk driving from normal driving with an AUC of 93.81% and 93.99%, respectively. These two approaches provide almost comparable detection results.
In the last experiment, as benchmark methods, we assessed the performance of five dimensionality reduction techniques, namely MDS, PCA, ICA, IPCA, and KPCA, in detecting drunk driving. These multivariate techniques are widely used in the literature to project multivariate data into a low-dimensional space, where most of the variability in the data can be maintained [47]. Generally speaking, linear techniques, including PCA, IPCA, MDS, and ICA, reduce data dimensionality by determining a linear combination of the original variables. They are suitable for handling data that is inherently linear. Nonlinear techniques, such as KPCA, permit modeling and revealing nonlinear relationships among multivariate data [47]. Similar to the t-SNE-based approach, we applied the considered linear and nonlinear dimensionality reduction techniques to the multivariate input data for feature extraction and applied the anomaly detection schemes (i.e., iF, EE, and LOF) to the extracted features for anomaly detection. These models are constructed using anomaly-free data and then used for anomaly detection. The values of the parameters of each model are listed in Table 1. Table 6 reports the detection performance achieved by PCA, IPCA, MDS, ICA, KPCA, and t-SNE-based iF, EE, and LOF detection methods when applied to detect drunk driving.
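The benchmark procedure (reduce, then detect) can be sketched as below for the reducers that provide an out-of-sample `transform` in scikit-learn. The library choice, component count, kernel, and synthetic data are assumptions for this example. MDS is omitted from the sketch because scikit-learn's `MDS` cannot project unseen points, so it would need a joint embedding of training and test data.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA, IncrementalPCA, KernelPCA
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_train = rng.normal(size=(150, 5))                       # synthetic anomaly-free data
X_test = np.vstack([rng.normal(size=(20, 5)),             # synthetic normal rows
                    rng.normal(4.0, 1.0, size=(20, 5))])  # synthetic anomalous rows

reducers = {
    "PCA": PCA(n_components=2),
    "IPCA": IncrementalPCA(n_components=2),
    "ICA": FastICA(n_components=2, random_state=0),
    "KPCA": KernelPCA(n_components=2, kernel="rbf"),
}

scores = {}
for name, reducer in reducers.items():
    F_train = reducer.fit_transform(X_train)   # features from anomaly-free data
    F_test = reducer.transform(X_test)         # project unseen observations
    forest = IsolationForest(random_state=0).fit(F_train)
    scores[name] = -forest.score_samples(F_test)   # higher = more anomalous
    print(name, round(float(scores[name].mean()), 3))
```

The EE and LOF detectors can be substituted for the isolation forest in the loop to cover the remaining reducer and detector combinations.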
The proposed t-SNE-based iF detector offers superior discrimination of the driver's drinking status by achieving an averaged accuracy of 0.9537, an F1-score of 0.9581, and an AUC value of 0.9537. This could be because the t-SNE preserves the local and global structures of the input data in the feature space. In addition, the t-SNE is an efficient nonlinear dimensionality reduction technique that embeds multivariate data in a two-dimensional plane. Results in Table 6 indicate that the coupled t-SNE-based iF scheme provides better performance than the standalone detectors (iF, EE, and LOF) for drunk driver detection. This confirms the benefit of using the t-SNE model in providing more relevant features. We observe that the KPCA-based EE detection scheme achieved the second-best result with an F1-score and AUC of 0.9466 and 0.9493, respectively. Linear dimensionality reduction-based detection schemes (PCA, MDS, ICA, and IPCA) follow it, as shown in Table 6. Figure 5 displays the barplot of AUC values to visually aid the comparison of the results achieved by the twenty-one considered detection schemes. Results show that the t-SNE-based iF detector obtains the most accurate drunk driving detection with an AUC of 95.37%. Overall, the detection accuracy is enhanced when using the t-SNE features compared with the original features. In other words, the t-SNE-based iF scheme outperformed the standalone iF, EE, and LOF anomaly detectors in detecting drunk driving. Furthermore, as observed in Figure 5, using the t-SNE for alcohol detection delivers improved detection performance, with an AUC of 95.37%, compared to the approaches using the other dimensionality reduction techniques for feature extraction; e.g., the PCA- and KPCA-based EE schemes achieved an AUC of 94.93%, and the MDS-EE scheme obtained an AUC of 93.20%.
This improvement could be attributed to the capacity of the t-SNE to capture nonlinear features in data and to the sensitivity of the iF detector in uncovering abnormal observations. In short, the obtained results demonstrate the promising performance of the t-SNE combined with the isolation forest in detecting drunk drivers. We observe that the PCA-based approach requires a lower runtime than the nonlinear dimensionality reduction methods. However, its simple structure cannot capture non-Gaussian and nonlinear features. The ICA-based iF scheme follows it, as ICA is a linear dimensionality reduction method that does not restrict the data distribution to be Gaussian. The linear methods (PCA, ICA, and IPCA) achieved lower computational costs than the nonlinear methods (KPCA and t-SNE), but they are unsuitable for nonlinear processes. MDS is computationally expensive.
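A minimal sketch of the t-SNE-then-iF idea is given below, assuming scikit-learn and synthetic data in place of the sensor measurements. Because scikit-learn's `TSNE` has no out-of-sample `transform`, the sketch embeds the anomaly-free and new samples jointly and fits the isolation forest only on the embedded anomaly-free points. This joint embedding is an assumption made for illustration and is not necessarily the procedure followed in this study.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): 5 hypothetical sensor channels.
X_normal = rng.normal(0.0, 1.0, size=(150, 5))      # anomaly-free reference data
X_new = np.vstack([
    rng.normal(0.0, 1.0, size=(25, 5)),              # normal new samples
    rng.normal(6.0, 1.0, size=(25, 5)),              # shifted, "anomalous" new samples
])

# Normalize with statistics from the anomaly-free data, then stack both sets.
scaler = StandardScaler().fit(X_normal)
X_all = scaler.transform(np.vstack([X_normal, X_new]))

# Joint two-dimensional embedding (workaround: TSNE cannot embed unseen points).
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
embedding = tsne.fit_transform(X_all)

Z_normal, Z_new = embedding[:len(X_normal)], embedding[len(X_normal):]

# Isolation forest fitted on the embedded anomaly-free points only.
detector = IsolationForest(random_state=0).fit(Z_normal)
labels = detector.predict(Z_new)                     # +1 = normal, -1 = anomaly
```

The default `method="barnes_hut"` in scikit-learn's `TSNE` is the approximate variant, so the runtime of this sketch is not representative of exact t-SNE.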
In summary, this study showed that drunk driving detection using the t-SNE-driven iF anomaly detection approach is feasible and effective. This could be attributed to the ability of the t-SNE technique to preserve the local geometry and global information of the multivariate data after dimensionality reduction, which is not the case with the linear dimensionality reduction techniques (i.e., PCA, MDS, ICA, and IPCA) that may not capture the nonlinear structure in the data. Thus, the detection accuracy of drunk driving using the t-SNE method is better than that of the PCA, ICA, IPCA, and MDS-based methods. This approach also outperformed the KPCA-based schemes in detecting drunk drivers. This is because the multivariate data collected to detect drunk driving is non-Gaussian and nonlinear. The t-SNE technique bypasses the data distribution problem by transforming the data distance problem into a probability distribution problem. Moreover, applying the iF anomaly detector, which is sensitive to anomalies in multivariate data, to the t-SNE features improved the drunk driving detection process. The results show that perplexity values within [5, 50] provide good recognition performance, which is in concordance with the literature. The best detection performance is obtained with a perplexity of 30, so there is no need to consider a large number of neighbors in the t-SNE. Furthermore, this study revealed the good detection capacity of the t-SNE-based iF approach when dealing with a relatively small-sized dataset.
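The statement that the t-SNE turns a distance problem into a probability distribution problem, and the role of the perplexity, can be made concrete with the standard t-SNE formulation. The symbols below ($x_i$ for input samples, $y_i$ for embedded points, $\sigma_i$ for the per-point kernel bandwidth, $n$ for the number of samples) follow the usual notation of the original t-SNE formulation and may differ from the notation used elsewhere in this paper.

```latex
% Pairwise distances in the input space become conditional probabilities
p_{j|i} = \frac{\exp\!\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\!\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}

% Similarities in the low-dimensional space use a Student-t kernel
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}

% The embedding minimizes the Kullback-Leibler divergence between P and Q
C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}

% Each sigma_i is chosen so that P_i has the user-specified perplexity
\mathrm{Perp}(P_i) = 2^{H(P_i)},
\qquad
H(P_i) = -\sum_{j} p_{j|i} \log_2 p_{j|i}
```

The perplexity can be read as a smooth measure of the effective number of neighbors of each point, which is why a moderate value such as 30 corresponds to a moderate neighborhood size.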

VI. CONCLUSION
Accurately detecting drunk driving is undoubtedly necessary for reducing traffic accidents and improving road safety. In this study, a data-driven methodology to detect drunk drivers is introduced. Importantly, to enhance drunk driving detection, this approach merges the capacity of the t-SNE nonlinear dimensionality reduction as a feature extractor with the discrimination ability of the iF in anomaly detection. After normalizing the input data, the t-SNE is employed to extract the characteristics of the collected multivariate data. Then, the iF detector is applied to the t-SNE features to detect potential drunk driving. The major advantages of this approach are that it makes no assumptions about the data distribution and needs no labeled data in its design to perform anomaly detection. The detection effectiveness is assessed on actual public data collected by sensors and a digital camera. We compared the proposed t-SNE-iF approach with several semi-supervised detection approaches: the t-SNE-based EE and LOF schemes; the PCA, MDS, ICA, and IPCA-based iF, EE, and LOF methods; and the standalone anomaly detection schemes (i.e., iF, EE, and LOF). Results demonstrated the superior detection performance of drunk driver status based on the proposed approach. Thus, this study revealed the promising performance of the t-SNE-based anomaly detection approach for alcohol detection in drivers.
Although the proposed approach achieves a detection performance above 95%, future work will aim to improve its capacity to discriminate drunk from normal driving by incorporating other sources of input, such as visual data (facial images) and driver behavior. The t-SNE-based model is relatively computationally demanding; hence, parallel computing could provide possible solutions. Notably, more computational resources are needed when a more complex model structure is adopted. A more computationally efficient t-SNE version, Barnes-Hut-SNE, was developed in [18]. Another potential improvement may rely on applying optimization techniques, such as Bayesian optimization, to select the optimal value of the perplexity during the training stage. A further direction of improvement consists of using data augmentation techniques to generate large-sized data, which would improve the construction of models and thus enhance the detection process. It will also be interesting to investigate the detection capability of this data-driven anomaly detection methodology in engineering applications, such as photovoltaic systems monitoring.
ABDELKADER DAIRI received the Engineering degree in computer science from the University of Oran 1 Ahmed Ben Bella, Algeria, the Magister degree in computer science from the National Polytechnic School of Oran, Algeria, and the Ph.D. degree in computer science from the University of Oran 1 Ahmed Ben Bella, in 2018. From 2007 to 2013, he was a Senior Oracle Database Administrator (DBA) and the Enterprise Resource Planning (ERP) Manager. He is currently an Assistant Professor in computer science at the University of Science and Technology of Oran-Mohamed Boudiaf. He has over 20 years of programming experience in different languages and environments. His research interests include programming languages, artificial intelligence, computer vision, machine learning, and deep learning.
Research Group, which works on developing statistical models and methods for complex data to address important environmental problems. She has made original contributions to environmental statistics, in particular in the areas of spatiotemporal statistics, functional data analysis, visualization, computational statistics, with an exceptionally broad array of applications. She received two prestigious awards: the Early Investigator Award in Environmental Statistics presented by the American Statistical Association and the Abdel El-Shaarawi Young Research Award from the International Environmetrics Society.