Assessment of Vehicle Handling Performance of Drivers Using Machine Learning Algorithms

Reliable assessment of an individual’s driving skills is very important. A growing population of unskilled drivers leads to disastrous and fatal road accidents. The current situation demands a better and more accurate automated technique for assessing driver performance. This research aims to develop a portable, independent assessment system that can be installed in any vehicle. The system monitors the driver’s skills using data from sensors and assesses their performance using machine learning algorithms. Machine learning based classifiers with more than 90% accuracy are developed for classifying driver skill. The proposed system guards against the possibility of breaching the formal process and obtaining a license without qualification, and it is suggested as a supplement to the conventional method of assessment. The system could help put better-skilled drivers on the road for a healthier road environment.


I. INTRODUCTION
The growing population and road traffic demand well-qualified drivers for the safety and wellbeing of everyone, including drivers, passengers and pedestrians. The ability to maneuver a vehicle safely normally comes with experience and practice. Young and inexperienced drivers are more likely to be involved in accidents [1]. India is reported to rank first in the world in the number of road accident casualties. According to a 2019 report by the Ministry of Road Transport and Highways, India, nearly 29% of all accidents are caused by inexperienced drivers in the young age group [2].
Drivers are normally tested for their driving performance and their ability to handle vehicles during the licensing phase. A suggested way of improving the safety and performance of new drivers is better and continuous assessment during their initial training stage [3]. Novice drivers are often anxious and nervous, which increases the probability of driving errors [4]. Isler et al. [5] report that young drivers who were subjected to higher-order assessment showed better improvement in vehicle handling on the road. Sundström [6] discussed various techniques for assessing driver skill. The most commonly used forms of assessment are Practical Driver Competence (PPDC) and Self-Assessment of Driving Skill (SADS). Lonero [7] noted that these techniques rely mostly on questionnaires, which makes them an unsystematic research approach. Research attention in this area is therefore important, and investment in systematic evaluation methods for better-qualified drivers is much needed. Driving simulators have become very common for training drivers in recent times, and they provide an easier way of assessing driver performance. However, the effect of a driving simulator on individual assessment is contradictory, as drivers react differently in a simulator under higher-order assessment [8]. Driver assessment through an on-road test is therefore better justified. In addition, the difference between simulator and on-road assessment shows a significant level of assessment error, especially for young drivers [9]. Sundström [10] summarized that driving skill is measured by judging the individual's skill in relation to specific aspects of driving, and that evidence of reliability and validity can therefore be obtained more easily for the subjective skill.
The associate editor coordinating the review of this manuscript and approving it for publication was Giovanni Pau.
Self-assessment of driving skill has been compared with performance in a simulator to test the accuracy of the self-assessment and to understand inconsistent drivers [11]. White et al. [12] suggested that accountability interventions can be helpful for less experienced young drivers.
Over the years, many researchers have worked on sensor-based infotainment systems for analyzing driving skills. Data from different sensors are normally collected and analyzed to obtain the performance of drivers [13], [14], [15], [16]. Vehicle-based measurements for driver performance assessment are categorized into three major groups [17], as listed below.
a. driver input to the vehicle (e.g. steering, braking),
b. vehicle response to driver input (e.g. velocity/acceleration, jerk),
c. vehicle state relative to the environment (e.g. headway distance, time to lane change).
The first two categories can be measured directly by sensors mounted inside the vehicle, whereas the third category requires information about the driving environment. Artificial intelligence and machine learning are playing a bigger role in solving complex problems. Assessing the behavior of a person through a machine learning approach would result in consistently accurate judgement. Driving skill data can be used to frame a pattern recognition problem, which can in turn be used for adapting vehicle parameters in future [18]. Elassad et al. [19] provided a detailed framework for analyzing drivers' behavior using machine learning. Sysoev et al. [20] developed a model to predict the influence of the user's environment and activity information on driving style in standard automotive environments. Aksjonov et al. [21] developed a driver model capable of predicting each individual driver's normal driving on a specific road segment with a reasonable degree of accuracy. Halim et al. [22] developed a machine learning model for profiling drivers based on their driving features. Chandrasiri et al. [23] and Kim et al. [24] used machine learning based models for predicting driver behavior patterns on curved roads and during lane changing, respectively. All of these works use complex information from multiple sensors, normally interfaced with the vehicle infotainment system, for predicting driver performance.
The aim of the current work is to develop a portable, independent system based on machine learning for assessing the vehicle handling skill of drivers using information from only two sensors. The MPU6050 gyro sensor and the ADXL335 tri-axial accelerometer are mounted on the steering wheel of the automobile. The data from these sensors are used to evaluate the steering and acceleration-deceleration behavior of the driver. Independent machine learning models are trained and tested for the steering assessment and acceleration-deceleration assessment modules. Assessment is based on predefined criteria: if an individual's skills meet the criteria, their profile is suggested for licensing along with an overall report of their activities. This approach is intended to remove uncertainty in assessing an individual's driving.

II. DRIVER PERFORMANCE ASSESSMENT SYSTEM
The driving performance of an individual is largely influenced by their ability to handle the vehicle in accordance with the conditions. The parameters that are normally monitored include driver commands to the vehicle such as steering, braking, accelerating and gear shifting; vehicle responses to the driver such as vehicle speed and jerk; and vehicle state relative to its surroundings such as headway and time to collision. In the current study, only two parameters are measured using sensors, namely the steering angle and the acceleration/deceleration of the vehicle. The data are acquired, logged and analyzed. Statistical features are extracted from these data, and machine learning models are developed based on these features. The overall methodology of the present work is shown in Fig. 1.

A. ASSESSMENT PARAMETERS
To determine the driving skill of a driver, two parameters are considered after in-depth selection: steering skill, and acceleration and deceleration skill. Steering skill is one of the most important attributes in driving, and a novice driver needs time and practice to acquire it. Steering skill is therefore an appropriate parameter for judging driving skill. Steering wheel data is collected throughout the trip. The model is built from the pattern identified in skilled drivers' data, and skill is determined by the deviation from that pattern.
Acceleration corresponds to the initial action of moving a vehicle. It is considered for assessment because it reflects the driver's power control skill. Similarly, deceleration corresponds to braking skill, that is, the ability to judge the stopping point. An individual's accelerating and starting skill can be determined from the acceleration force, and their braking skill from the deceleration force. Both acceleration and deceleration are directly related to harshness in driving.

B. HARDWARE SETUP
The hardware setup consists of a master controller, sub controllers, and separate sensors for steering assessment and for acceleration and deceleration assessment. Steering assessment is done by sensing angular rotational velocity using the MPU6050 sensor. The MEMS gyroscope in the MPU-6050 provides an angular rate range of ±2000 °/s, integrated with 16-bit ADCs and I2C communication. Acceleration and deceleration assessment uses the tri-axial accelerometer ADXL335. The accelerations in the x and y axes of the ADXL335, with a ±3.6 g range, are measured at a bandwidth of 1600 Hz. The sensors are mounted on the setup with a steering mounter, in the position shown in Fig. 2.
VOLUME 10, 2022
The components are interconnected as shown in Fig. 3. A Raspberry Pi is used as the master controller. The data from the sub controllers is stored on the Pi, and later processed and assessed with the predefined machine learning model. An Arduino Nano is used as the sub controller for each module, acting as the data acquisition system that receives data from the sensor at a constant rate without data loss. The data is collected at a rate of 100 samples per second using serial communication. The data collected from both modules is stored in a file on the master controller.
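As a rough illustration of how the logged readings map to physical units, the sketch below converts a signed 16-bit gyroscope count to degrees per second using the ±2000 °/s full-scale range quoted above. The accelerometer conversion assumes a 10-bit ADC, a 3.3 V supply that also serves as the ADC reference, a zero-g output at half the supply and a sensitivity of one tenth of the supply per g. These values and both function names are assumptions for illustration, not settings reported in this work.

```python
GYRO_FULL_SCALE_DPS = 2000.0  # +/-2000 deg/s range stated in the text
GYRO_HALF_SPAN = 2 ** 15      # a signed 16-bit word spans -32768..32767


def gyro_counts_to_dps(raw: int) -> float:
    """Convert a signed 16-bit gyro reading to degrees per second."""
    return raw * GYRO_FULL_SCALE_DPS / GYRO_HALF_SPAN


def accel_adc_to_g(adc: int, adc_bits: int = 10, supply_v: float = 3.3) -> float:
    """Convert an ADC reading of the analog accelerometer output to g.

    Assumed (not from the paper): ratiometric output with zero-g at
    supply/2 and a sensitivity of supply/10 volts per g.
    """
    volts = adc * supply_v / (2 ** adc_bits - 1)
    zero_g_v = supply_v / 2
    volts_per_g = supply_v / 10
    return (volts - zero_g_v) / volts_per_g
```

With these assumptions, a gyro count of 16384 corresponds to half of full scale. The electrical values should be checked against the actual wiring before any such conversion is relied on.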

C. TEST CONDITIONS
Real-time data is collected from the sensors while the vehicle is driven in a closed, controlled environment with all precautions. The vehicle is restricted to a speed limit of 40 km/h on a class A type road. The tests are conducted with drivers from the different categories listed in Table 1. All licensed drivers, belonging to groups A, B, C and D, are categorized as skilled drivers, while group E drivers are labelled as unskilled drivers. Every participant has contributed to all five test conditions with a minimum of 50 trips each.

III. MACHINE LEARNING CLASSIFIER DEVELOPMENT
A. FEATURE EXTRACTION
The first step in the development of the machine learning system is to extract features from sensor data. The following statistical features are extracted from the collected data.
• Mean - The mean is the average value of a collection of numbers. The mean x̄ of n data samples can be obtained by (1).
• Median - The median is the middle number in a list of numbers sorted in ascending or descending order. If n is odd, then the median is given by (2).
If n is even, then the median is given by (3).
• Mode - The mode is the value that occurs most often in a set. The mode of a dataset is defined by (4),
where L is the lower limit of the modal class, f_m is the frequency of the modal class, f_1 is the frequency of the class preceding the modal class, f_2 is the frequency of the class succeeding the modal class and h is the size of the class interval.
• Standard Deviation - Standard deviation σ is a measure of the dispersion of a set of data from its mean x̄, as given in (5).
• Variance - Variance is a statistical measurement of the spread between numbers in a data set. The variance σ² of n data samples is given by (6).
• Kurtosis - Kurtosis K is a measure of the combined weight of the tails of a distribution relative to its center, as given by (7).
• Skewness - Skewness is the degree of distortion from the symmetrical bell curve in a probability distribution. The skewness g of n data samples is given by (8).
• RMS - The root mean square is a measure of the magnitude of a set of data. The RMS value of a dataset is given by (9).
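The features listed above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this work: it assumes the population (divide by n) forms of variance, skewness and kurtosis, computes the mode from a histogram with a hypothetical class width `class_width`, and uses function and key names chosen here for illustration.

```python
import math
from collections import Counter


def grouped_mode(x, class_width):
    """Grouped-data mode: L + (f_m - f_1) / (2*f_m - f_1 - f_2) * h."""
    counts = Counter(math.floor(v / class_width) for v in x)
    k, f_m = max(counts.items(), key=lambda kv: kv[1])  # modal class and its frequency
    f_1 = counts.get(k - 1, 0)  # frequency of the preceding class
    f_2 = counts.get(k + 1, 0)  # frequency of the succeeding class
    lower = k * class_width     # lower limit L of the modal class
    denom = 2 * f_m - f_1 - f_2
    if denom == 0:              # flat neighbourhood: fall back to the class centre
        return lower + class_width / 2
    return lower + (f_m - f_1) / denom * class_width


def extract_features(x, class_width=1.0):
    """Return the statistical features of one signal x (a list of floats)."""
    n = len(x)
    s = sorted(x)
    mean = sum(x) / n
    median = s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2
    var = sum((v - mean) ** 2 for v in x) / n  # population form (assumption)
    std = math.sqrt(var)
    m3 = sum((v - mean) ** 3 for v in x) / n   # third central moment
    m4 = sum((v - mean) ** 4 for v in x) / n   # fourth central moment
    return {
        "mean": mean,
        "median": median,
        "mode": grouped_mode(x, class_width),
        "std": std,
        "variance": var,
        "kurtosis": m4 / var ** 2 if var else 0.0,
        "skewness": m3 / std ** 3 if std else 0.0,
        "rms": math.sqrt(sum(v * v for v in x) / n),
    }
```

Calling `extract_features` once per trip and per sensor axis would give one feature row per trip, which is the form the later feature selection and classification steps expect.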

B. FEATURE SELECTION
Recursive Feature Elimination (RFE) is a technique used for feature selection before training a model. RFE is a type of predictor selection that builds the model with a set of important predictors, computes a score, and then rebuilds the model after eliminating the least important predictors. The process starts with the elimination of highly correlated and zero-correlated variables. A decision tree is then used to rank the control variables in order of their contribution to the process variable. For the steering assessment features, RFE eliminated the highly correlated and zero-correlated features from the initial feature set; the overall feature elimination and ranking are given in Table 2. Fig. 5 shows the correlation between the features graphically.
The same process is used for acceleration and deceleration assessment feature selection; the resulting elimination and ranking are given in Table 3 and shown graphically as a heat map in Fig. 6. The RFE ranking orders the features according to their contribution to the process variable. The models are then trained by adding features in ascending order of rank.
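The two stages described above, a correlation pre-filter followed by recursive elimination, can be sketched as follows. This is an illustrative sketch under assumptions: the thresholds `high` and `low` are hypothetical, and the `importance` argument is a caller-supplied function standing in for the decision-tree ranking used in this work.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    sa = math.sqrt(sum((p - ma) ** 2 for p in a))
    sb = math.sqrt(sum((q - mb) ** 2 for q in b))
    return cov / (sa * sb) if sa and sb else 0.0


def prefilter(columns, labels, high=0.95, low=0.05):
    """Drop features uncorrelated with the label or nearly duplicating a kept one."""
    kept = []
    for name, col in columns.items():
        if abs(pearson(col, labels)) < low:
            continue  # (almost) no linear relation to the label
        if any(abs(pearson(col, columns[k])) > high for k in kept):
            continue  # highly correlated with a feature already kept
        kept.append(name)
    return kept


def rfe_rank(names, importance):
    """Repeatedly remove the least important feature; rank 1 is removed last.

    importance(remaining) must return a dict mapping each remaining
    feature name to a score computed by a model refitted on that subset.
    """
    remaining = list(names)
    eliminated = []
    while remaining:
        scores = importance(remaining)
        worst = min(remaining, key=lambda f: scores[f])
        remaining.remove(worst)
        eliminated.append(worst)
    return eliminated[::-1]
```

Refitting the scoring model on every pass is what makes the elimination recursive: a feature's importance can change once a competing feature has been removed.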
Based on the RFE analysis, the statistical features selected for developing the machine learning models for steering and acceleration/deceleration assessment are listed in Table 4. These features are used as the input parameters.
The Classification Learner app in MATLAB is used to find the best classification model for the acquired data in the proposed driver assessment system. The control variables are selected based on the feature ranking, and the model with the best accuracy over any number of features taken in rank order is used. Model accuracy is first checked with the highest-ranked feature alone, and then rechecked as each further feature is added in rank order. In this way, every combination of feature subset and model type in Classification Learner is checked for accuracy. The best model for steering assessment was obtained using the highest-ranked feature. The best models obtained for the steering and acceleration-deceleration assessment are then compared.
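The rank-ordered search described above amounts to evaluating the top-k features for k = 1, 2, ... and keeping the best subset. A minimal sketch follows; `best_top_k` and its `evaluate` callback, which would train a model on the given features and return its accuracy, are hypothetical names and not part of the MATLAB workflow.

```python
def best_top_k(ranked_features, evaluate):
    """Try the top-1, top-2, ... feature subsets and return the most accurate.

    ranked_features: feature names, best rank first.
    evaluate: callable taking a list of feature names and returning accuracy.
    """
    best_subset, best_acc = [], float("-inf")
    for k in range(1, len(ranked_features) + 1):
        subset = ranked_features[:k]
        acc = evaluate(subset)
        if acc > best_acc:  # strict: ties keep the smaller subset
            best_subset, best_acc = subset, acc
    return best_subset, best_acc
```

Because ties keep the smaller subset, a result such as the one reported for steering assessment, where the highest-ranked feature alone gives the best model, falls out naturally.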
Similar research on driver skill using machine learning approaches has been reviewed in [19], which found that 72% of the studies used Bayes, support vector machine, ensemble and neural network methods to form a decision-making algorithm. The purpose of the current study is different; considering the data that have been acquired, the best models from Classification Learner are developed and compared. This approach is used because Classification Learner allows all candidate models to be checked against our feature types and data.
The best models obtained from Classification Learner are compared and described. The top four model types for our features are:
• Naïve Bayes
• Support vector machine
• K-nearest neighbor
• Ensembles
Naïve Bayes learners are based on the probabilistic dependence of the class on the features; the model rests purely on Bayes' theorem of probability. There are various variants of this Bayesian method, such as Gaussian and kernel. In all naïve Bayes models, the input variables are assumed to be independent of each other. Several methods for relaxing the conditional independence assumption of naïve Bayes have been implemented. Gaussian naïve Bayes classification is a form of naïve Bayes classification based on the assumption of a Gaussian distribution [25]. The biggest difficulty in the naïve Bayes model is class-conditional probability estimation, and one solution is to extend the kernel density estimation method to handle unknown data [26].
The support vector machine (SVM) is a classification algorithm used to separate groups. It is one of the most popular classification algorithms, and it works by separating data points in feature space with a hyperplane. There are various types of support vector machine, distinguished by their kernel function. An SVM can be used to perform a two-dimensional classification of a collection of data that was previously one-dimensional. A kernel function, in general, projects data from a low-dimensional space to a higher-dimensional space.
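To make the Gaussian naïve Bayes idea concrete, the following is a minimal from-scratch sketch. It is not the MATLAB model used in this work; the class name, the small variance floor and the label strings are assumptions for illustration.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Per-class, per-feature Gaussian likelihoods combined with class priors."""

    def fit(self, X, y):
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        self.params = {}
        for label, rows in by_class.items():
            stats = []
            for col in zip(*rows):  # iterate over feature columns
                mu = sum(col) / len(col)
                var = sum((v - mu) ** 2 for v in col) / len(col) + 1e-9  # floor avoids /0
                stats.append((mu, var))
            self.params[label] = (math.log(len(rows) / len(X)), stats)
        return self

    def predict_one(self, row):
        def log_posterior(label):
            log_prior, stats = self.params[label]
            # Independence assumption: per-feature log-likelihoods simply add up.
            log_like = sum(
                -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
                for v, (mu, var) in zip(row, stats)
            )
            return log_prior + log_like
        return max(self.params, key=log_posterior)
```

For example, after fitting on feature rows labelled "skilled" and "unskilled", `predict_one` returns the label with the larger log posterior for a new trip.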
The K-nearest neighbor (KNN) classifier is a non-parametric classifier that works well when classifying new data points. The basic classifier assumes locally constant class-conditional probabilities and ignores the fact that a sample's closest neighbors should contribute more to the classification. KNN classifies a data point by considering its neighbors and assigning the class that dominates among them. There are many types of KNN, distinguished by their distance and weighting functions.
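The difference between plain and weighted KNN can be illustrated with a short sketch in which each of the k nearest neighbors votes with a weight that decays with distance. The value of k, the inverse squared distance weighting and the function name are assumptions here, not the configuration used in this work.

```python
import math
from collections import defaultdict


def knn_weighted_predict(X, y, query, k=3):
    """Classify query by a distance-weighted vote of its k nearest neighbours."""
    nearest = sorted((math.dist(row, query), label) for row, label in zip(X, y))[:k]
    votes = defaultdict(float)
    for distance, label in nearest:
        votes[label] += 1.0 / (distance ** 2 + 1e-12)  # closer neighbours count more
    return max(votes, key=votes.get)
```

With training points at 0 ("skilled"), 10 and 11 (both "unskilled") and a query at 1, an unweighted 3-neighbor majority vote would return "unskilled", whereas the weighted vote returns "skilled" because the single close neighbor dominates.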
The ensemble classifier is a hybrid type of classifier, formed by combining the results of two or more classifiers. Ensemble learning is a technique for generating multiple base classifiers from which a new classifier is derived that can outperform its constituent classifiers. Ensemble models are generally used for complex classification problems that cannot be solved to a great extent by any single type of classifier. There are many types of ensemble model, such as stacking, blending, bagging and boosting; these classifiers are capable of handling complex multi-dimensional data through a collective approach.
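As an illustration of the bagging variant mentioned above, the sketch below trains several simple base classifiers on bootstrap resamples and combines them by majority vote. To stay short it uses one-level decision stumps instead of full trees; the stump learner, the number of models and all names are assumptions and do not reproduce the bagged tree model of this work.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Single-feature threshold classifier chosen by training accuracy."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
            correct = left.count(l_lab) + right.count(r_lab)
            if best is None or correct > best[0]:
                best = (correct, j, t, l_lab, r_lab)
    _, j, t, l_lab, r_lab = best
    return lambda row: l_lab if row[j] <= t else r_lab


def fit_bagged(X, y, n_models=25, seed=0):
    """Bootstrap aggregation: resample with replacement, fit, then majority vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(X)) for _ in X]  # bootstrap sample of the same size
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return lambda row: Counter(m(row) for m in models).most_common(1)[0][0]
```

Each base model sees a slightly different resample, so individual errors tend to average out in the vote.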

D. 10-FOLD CROSS VALIDATION
Cross-validation is used to evaluate the performance of the machine learning models. The data set of the selected features listed in Table 4, for all trips, is randomly shuffled and then split equally into 10 groups. Each group in turn is held out: the machine learning models are trained on the remaining 9 groups and validated on the held-out group. This gives a less biased and less optimistic estimate of model performance than a single data split. Two separate machine learning classifiers are developed for assessing steering handling and vehicle acceleration and deceleration, based on the statistical data sets from the different sensors for different drivers and different test conditions.
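The shuffle, split and hold-out procedure can be sketched as below. The function names, the fixed seed and the convention that `fit` returns a prediction function are assumptions for illustration.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation (needs n >= k)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k groups of (nearly) equal size
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def cross_validate(X, y, fit, k=10):
    """Mean accuracy over k folds; fit(X_train, y_train) returns a predict(row) function."""
    accuracies = []
    for train, test in k_fold_indices(len(X), k):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(X[i]) == y[i] for i in test)
        accuracies.append(hits / len(test))
    return sum(accuracies) / k
```

Every sample is used for validation exactly once and for training k - 1 times, which is why the averaged score depends less on one particular split.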
The following parameters are used for comparing the performance of different ML algorithms.
• Accuracy - Accuracy is the most intuitive performance measure; it is simply the ratio of correctly predicted observations to the total observations.
• Precision - Precision is the ratio of correctly predicted positive observations to the total predicted positive observations.
• Recall (Sensitivity) - Recall is the ratio of correctly predicted positive observations to all observations in the actual positive class.
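These three measures follow directly from the counts of true and false positives and negatives. The short sketch below assumes that "skilled" is treated as the positive class, which is a choice made here for illustration; the function name is likewise hypothetical.

```python
def classification_metrics(y_true, y_pred, positive="skilled"):
    """Return (accuracy, precision, recall) for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of predicted positives that are right
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of actual positives that are found
    return accuracy, precision, recall
```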

A. STEERING ASSESSMENT MODEL
The features obtained from the steering sensor data are used to train eight machine learning models that classify drivers as skilled or unskilled in order to assess their steering ability. The ML algorithms trained are Gaussian Naïve Bayes (GNB), Kernel Naïve Bayes (KNB), Support Vector Machine - Linear (SVM-L), Support Vector Machine - Cubic (SVM-C), K-nearest neighbor - fine (KNN-F), K-nearest neighbor - weighted (KNN-W), Ensemble-Subspace (ES) and Ensemble-Bagged Tree (EBT). The classifier models are developed using different data sets for 10-fold validation of their performance. Box plots showing the distribution of classification accuracy, precision and recall over the 10-fold cross validation are shown in Fig. 7, Fig. 8 and Fig. 9 respectively.
The average performance metrics over all 10 trials of the 10-fold validation for the steering data are given in Table 5. The Gaussian naïve Bayes model assumes a Gaussian probability distribution for each feature. The kernel naïve Bayes model instead estimates the probability distribution with a weighted kernel function. The Gaussian distribution offers simplicity in model formation and reduces computational complexity. The naïve Bayes models for steering assessment give accuracies of 93.30% and 97% for the Gaussian and kernel types respectively.
The linear-kernel support vector machine classifier separates the classes with a linear hyperplane. The cubic support vector machine classifier uses a polynomial separating function. That the support vector machine classifier works well with a linear kernel indicates that the control variable contributes strongly to the process variable. The support vector machine models for steering assessment give accuracies of 96.70% and 95% for the linear and cubic types respectively.
The fine K-nearest neighbor model classifies on the basis of the distance to nearby data points. The weighted K-nearest neighbor model weights the contribution of each nearest neighbor rather than relying on distance alone. The K-nearest neighbor models for steering assessment give accuracies of 97.3% and 98.1% for the fine and weighted types respectively.
The ensemble bagged tree model works on the principle of bootstrap aggregation. The model aggregates the results of n classifiers before giving its prediction; such models are very effective at producing better accuracy than a single tree classifier. The ensemble subspace model trains each of its classifiers on a random subset of the features and combines their outputs. The ensemble models give accuracies of 96.7% and 98.3% for steering assessment with the subspace and bagged tree models respectively.
Comparing all the models, the Ensemble-Bagged Tree classifier gives the best accuracy for steering assessment, as shown in Table 5. Most of the models obtained for steering assessment have excellent precision, which implies that the models make efficient use of all the features. The best model for steering assessment, the ensemble bagged tree, achieves an accuracy of 98.3% with a precision of 98% and a recall of 98%. The Ensemble-Bagged Tree model is therefore selected as the best model for assessing the steering ability of an individual.

B. ACCELERATION-DECELERATION ASSESSMENT MODEL
The features obtained from the accelerometer sensor data (in both x and y axes) are used to train the machine learning models that classify drivers as skilled or unskilled in order to assess their vehicle handling ability. The same eight ML algorithms used for the steering assessment system are developed for the acceleration-deceleration assessment system. The classifier models are developed using different data sets for 10-fold validation of their performance. Box plots showing the distribution of classification accuracy, precision and recall over the 10-fold cross validation are shown in Fig. 10, Fig. 11 and Fig. 12 respectively. The average performance metrics over all 10 trials of the 10-fold validation for the acceleration-deceleration data are given in Table 6. The naïve Bayes models give accuracies of 78.7% and 83% for the Gaussian and kernel types respectively. The support vector machine models for acceleration-deceleration assessment give accuracies of 81.3% and 83.7% for the linear and cubic types respectively. The K-nearest neighbor models for acceleration and deceleration assessment give accuracies of 87.3% and 90.7% for the fine and weighted types respectively.
The ensemble models give accuracies of 88.7% and 90% for acceleration-deceleration assessment with the subspace and bagged tree models respectively.
Comparing all the models for acceleration and deceleration assessment, the KNN-Weighted model gives the highest accuracy of 90.7%, with a precision of 91% and a recall of about 90%, as shown in Table 6. This model is used in the driver assessment system to assess the acceleration and deceleration ability of an individual.

C. OVERALL INTEGRATION AND ANALYSIS
The overall driver performance assessment system includes two independent classifier models, one for steering assessment and one for acceleration-deceleration assessment. The Ensemble-Bagged Tree model, with an accuracy of 98.3%, is the best model for steering assessment, and the KNN-Weighted model, with an accuracy of 90.7%, is the best model for acceleration-deceleration assessment of a novice driver's skill. The model with the highest accuracy for each of the two assessments is used in the operation of the driver assessment system. In addition to the resulting models, customized linear input values are used to build a simple decision tree that marks the level of the driver.
Further analysis has been carried out to determine the influence of driver experience on prediction accuracy. As listed in Table 1, the drivers are classified by driving experience into five categories: A, B, C, D and E. For machine learning model development, the data from all licensed drivers, belonging to groups A, B, C and D, are categorized as skilled drivers, while group E drivers are labelled as unskilled drivers. The confusion matrices of the selected classifier models (Ensemble-Bagged Tree classifier for the steering assessment system and KNN-Weighted classifier for the acceleration-deceleration assessment system) for the different classes of drivers are shown in Fig. 13 and Fig. 14. It is inferred that the classification error is very low for data from highly experienced drivers compared to the least experienced drivers.
The influence of the data acquired under the different test conditions shown in Fig. 4 on the classification performance of the machine learning system is also analyzed. Fig. 15 and Fig. 16 show the classification accuracy for the different test conditions of the selected classifier models (Ensemble-Bagged Tree classifier for the steering assessment system and KNN-Weighted classifier for the acceleration-deceleration assessment system). The classification accuracy is better for straight road conditions than for curved roads. This supports the claim made by Chandrasiri et al. [23] that feature prediction algorithms must be improved for predictions on curved roads.
The driver performance assessment system using the developed ML classifiers is deployed as Python code on the Raspberry Pi master controller. A graphical user interface is created as a front end for better user interaction. The system provides assessment results after every trip of the driver. If an individual's result is above the preset threshold, their profile is suggested for licensing; otherwise they are advised to take more trips until their results reach that level. The system is useful both for self-assessment of driving performance during the learning stage and for authorities issuing driving licenses.
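The threshold rule described above could be sketched as follows. How per-trip results are aggregated is not specified here, so this sketch assumes the result is the share of trips classified as skilled by each module and that both shares must reach a hypothetical threshold of 0.8; the function name and return format are also illustrative.

```python
def recommend(steering_preds, accel_preds, threshold=0.8):
    """Combine per-trip predictions of both modules into a recommendation.

    steering_preds / accel_preds: per-trip labels, "skilled" or "unskilled".
    threshold: assumed minimum share of skilled trips required in each module.
    """
    def skilled_share(preds):
        return sum(p == "skilled" for p in preds) / len(preds)

    steering = skilled_share(steering_preds)
    accel = skilled_share(accel_preds)
    passed = steering >= threshold and accel >= threshold
    return {
        "steering": steering,
        "accel_decel": accel,
        "decision": "suggest for licensing" if passed else "more trips needed",
    }
```

Requiring both modules to pass reflects the two independent classifiers; a weighted combination would be an equally plausible design.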

V. CONCLUSION
In this paper, a portable system for assessing an individual's driving skill using machine learning algorithms has been presented. The assessment parameters and conditions are stated, and a similar approach can be used to develop systems for assessing other parameters under other conditions. This portable, independent system helps in assessing an individual's driving skill. The Ensemble-Bagged Tree model developed for assessing steering ability performs satisfactorily with an accuracy of 98.30%, and acceleration-deceleration ability is assessed by the KNN-Weighted model with an accuracy of 90.7%. The overall system is integrated into a portable setup that examines each individual's driving skills. This system will help put skilled drivers on the road. This research can be extended in future to assess other driving parameters.