Enhancing Road Safety Through Accurate Detection of Hazardous Driving Behaviors With Graph Convolutional Recurrent Networks

Car accidents remain a significant public safety issue worldwide, and most are attributed to driver error stemming from inadequate driving knowledge, non-compliance with regulations, and poor driving habits. To increase road safety, several studies have proposed Driving Behavior Detection (DBD) systems that differentiate between safe and unsafe driving behavior. Many of these studies built their models on sensor data retrieved from the Controller Area Network (CAN) bus. According to the existing literature, using only public sensors reduces the detection model's accuracy, while adding vendor-specific sensors increases it. However, reliance on non-public sensors limits the practical utility of the earlier techniques. This paper therefore presents a reliable DBD system based on Graph Convolutional Long Short-Term Memory networks that improves the detection model's precision and practical usability with public sensors. Non-public sensors were additionally used to assess the model's effectiveness. The proposed model achieved an accuracy of 97.5% for public sensors and an average accuracy of 98.1% for non-public sensors, which shows that it produces consistent and accurate results in both scenarios. The proposed DBD system is deployed on a Raspberry Pi at the network edge to analyze driving behavior locally. Drivers can access daily driving condition reports, sensor data, and prediction results through a monitoring dashboard, which also issues a voice warning when hazardous driving conditions are detected.


I. INTRODUCTION
Unsafe driving is an abnormal pattern of social behaviors, such as speeding, tailgating, rude gesturing, honking, improper lane changing, and distracted driving, which constitutes a serious threat to public safety [1]. According to data collected in 2019 by the AAA Foundation, approximately 80 percent of drivers in the United States engaged in unsafe driving behaviors at least once in the 30 days before the survey [2]. Evidently, driving behavior is one of the important factors in driving safety [3]. Many studies have sought to develop a driving behavior detection (DBD) system that can detect drivers' actions and help them drive safely. Most of them used machine learning or deep learning algorithms. Support vector machines (SVM), hidden Markov models (HMM), k-nearest neighbors (KNN), artificial neural networks (ANN), recurrent neural networks (RNN), and convolutional neural networks (CNN) are the algorithms most commonly used in previous work to build models that detect driving behavior accurately and efficiently.
The collection of driving behavior data is an important aspect of a DBD system. Currently available techniques for monitoring and collecting driving data rely on various technologies, such as Global Navigation Satellite Systems (GNSS), the Global Positioning System (GPS), in-vehicle sensor data extracted from the Controller Area Network (CAN) bus, vision systems, and, more recently, smartphone sensors [4]. We decided to build our IoT-enabled lightweight DBD system on in-vehicle sensor data extracted from the CAN bus in order to compare the accuracy of our model on OBD-II compliant and vendor-specific sensor data. OBD-II compliant sensors are used for diagnostics and are addressed using the parameter identifiers (PIDs) listed in the SAE J1979 standard. The OBD-II compliant PIDs are similar in all cars produced after 2001 [5], and diagnostic scan tools use them to extract the sensor data [6]. Other sensors, in contrast, are vendor-specific, and their protocol details are not publicly available. Extracting data from these sensors is costly [7], and the procedure may vary across car models. In this study, we refer to OBD-II compliant and vendor-specific sensor data as public and non-public sensor data, respectively.
[5] used SVM and ANN algorithms to classify driving behavior using in-vehicle sensor data. The authors found that the accuracy of the proposed algorithms decreased when only the public sensor data was used, whereas combining public and non-public signals increased the accuracy. [7] also evaluated the performance of different algorithms, such as Random Forest, Gradient Boosting, SVM, Decision Tree, and KNN, on both public and non-public sensor data. The results show that using only public data decreases the accuracy of the proposed classification models. Although using both types of data yields good accuracy, it limits the practical use of the existing methods in many applications [5], [7]. For this reason, we decided to build our DBD system on graph convolutional neural networks and long short-term memory networks (GConvLSTM) to increase the accuracy reported in [5] for public signals. Additionally, to compare the effectiveness of both strategies, we train and assess our model on the combination of public and non-public subsets of sensor data described in the same study. The main contributions of this study are:
• We propose a reliable IoT-enabled DBD system using GConvLSTM to provide a robust solution that can differentiate between safe and unsafe driving behavior based on in-vehicle sensor data extracted from the CAN bus through the OBD-II connector.
• We propose to enhance the accuracy of unsafe driving detection presented in [5] for public signals.
• We propose to deploy our DBD system at the edge to process and analyze driving behaviors locally.
• We propose to develop a dashboard that enables the driver to monitor the prediction results, in-vehicle sensor data, and daily reports of driving conditions. Additionally, the dashboard alerts drivers to unsafe driving conditions through voice notification.

A. Controller Area Network and On Board Diagnostics
Nowadays, cars are equipped with various electronic control units (ECUs). These ECUs are microcomputers connected to multiple sensors around the car to monitor, control, and optimize the car's systems. The ECUs communicate through the CAN, a standard communication bus introduced by Robert Bosch GmbH in the early 1980s [8]. When an ECU broadcasts data packets over the CAN bus, the other ECUs can check the broadcast packets and decide whether to ignore or receive them. Extracting data directly from the CAN bus for diagnostics and reporting is difficult because all sensitive data is transferred over the CAN bus, and interacting with it can be risky and very complicated. For this reason, higher-level protocols such as OBD (On Board Diagnostics) were introduced to the market [9] to serve as a self-diagnostic system in vehicles. OBD works on top of the CAN standard, and the generated OBD messages are encapsulated in CAN messages. In the automotive field, OBD is the main carrier of the information and logs exchanged among the ECUs. OBD-II is the second generation of the OBD protocol, which introduced the universal OBD-II connector for self-diagnostics, reporting, and analysis. An OBD scanner is attached to the OBD-II connector to interact with the ECUs by reading and writing OBD messages on the CAN bus. Although any car with an OBD-II connector provides access to the data packets generated by the ECUs, external diagnostic tools cannot monitor all of them. Users only have access to a subset of monitoring signals, such as car speed, engine load, fuel, air pressure, and engine revolution speed. The signals from OBD-II are identified by the PIDs (parameter identifiers) described in the SAE J1979 standard. The PIDs listed in SAE J1979 are public and common to most cars. Some car manufacturers specify additional PIDs for their cars to provide access to more sensors, such as steering angle, brake pressure, and wheel speeds. Although these signals provide more information about the car's performance, they may vary between cars, and their protocol details are not publicly available. Hence, using non-public data limits the practical use of DBD systems. In this study, we aim to increase the accuracy obtained from signals with public PIDs using graph neural networks (GNN), which can make our DBD system more reliable and efficient than previously proposed systems. In addition, we train and evaluate the proposed GNN model using both public and non-public signals to compare their impact on classification accuracy.

B. Deep Learning and Edge Computing
Delivering the data generated by IoT devices to cloud servers for processing is a common practice in the IoT space. However, any delay in exchanging data between the cloud servers and IoT devices can produce unacceptable results in latency-sensitive IoT applications such as autonomous driving, smart healthcare systems, robotics, and connected vehicles. Bandwidth cost, latency, and privacy are the challenges that make the cloud computing paradigm inefficient for processing and analyzing the data involved in IoT applications [10], [11]. Edge computing has been introduced to tackle these challenges efficiently [12]. Offloading computational tasks from cloud data centers to the network edge reduces latency, network traffic, and computational cost in IoT projects. Deep learning, in turn, is crucial in IoT applications. It has outperformed conventional machine learning algorithms in mining raw data generated by devices deployed in complex and noisy IoT environments [13]. Therefore, several studies have proposed approaches for implementing deep learning methods at the edge. [14] deployed a novel spatial-temporal dynamic network (STDN) for traffic forecasting at the edge server using CNN and Long Short-Term Memory (LSTM) networks. [15] implemented a neural network-based model at the edge server to analyze the energy consumption of a building, and [16] deployed a deep learning-based food recognition system at the network edge to improve the response time. According to these prior studies, an edge computing strategy can considerably boost the performance of IoT devices. To increase the efficiency of our project, we chose to implement the proposed GNN-based DBD system at the network edge.

III. RELATED WORK
Several approaches have been proposed to differentiate between safe and unsafe driving behaviors in order to improve road safety. This section highlights the methods used for the driving behavior detection task. [5] proposed an objective methodology to distinguish between safe and unsafe driving behaviors using the SVM and ANN algorithms on data extracted from the CAN bus. The SVM and a feedforward neural network were trained and evaluated on a publicly available dataset containing more than 26 hours of total driving by ten different drivers. Experimental results show an average accuracy of 90% for the proposed model in identifying unsafe driving conditions. [17] proposed an approach to detect distinct patterns of aggressive driving based on vehicle sensor data and the metadata of each trip. The KNN-based model discovered four different aggressive patterns in the dataset (stop-and-go driving, abruptly changing lanes, driving too fast for road conditions, and not properly maintaining the lane, or zigzagging) with an average accuracy of 81%. [18] utilized HMM and attention-based LSTM networks to predict aggressive driving behavior based on multivariate temporal feature data such as driver characteristics, environment, and in-vehicle sensor data. The proposed model achieved an average accuracy of 80%. [19] built a deep learning model from stacked denoising sparse autoencoders (SdsAEs) for abnormal driving detection based on normalized driving behavior data. The proposed SdsAEs model performed well in detecting abnormal driving, with an accuracy of 98.33%. [20] utilized CNN and LSTM networks to build driving behavior detection models and compared them on data extracted from two actual vehicles (V1 and V2). The driving behaviors were divided into three rankings (A, B, C), where A, B, and C indicate merit, pass, and fail, respectively. The prediction percentages achieved by CNN and LSTM for all driving behavior rankings are higher than 80%.
Some other studies analyzed driving behavior for driver identification based on in-vehicle sensor data. [21] sought to distinguish an impostor from a car owner using five classifiers provided by Weka: J48Consolidated, RandomTree, J48graft, RepTree, and J48. Based on the results, J48 and J48graft performed better in terms of precision and recall in identifying the correct class (owner or impostor) for the feature vectors of each driver extracted from CAN bus data. [7] compared the impact of using sensors with and without non-public PIDs on the performance of the driver identification task. The results indicate that the accuracy of models trained with public parameters decreased by nearly 15%, whereas the combination of public and non-public signals increased the accuracy for all algorithms. Furthermore, the random forest obtained higher accuracy than the decision tree, gradient boosting, SVM, and KNN for both public and non-public data. [22] used fully convolutional networks (FCNs) and LSTM for driver profiling and identification to enhance security in connected cars. The FCN-LSTM achieved higher accuracy than other models when tested on different datasets. Similarly, [23] implemented an end-to-end deep learning framework using CNNs and RNNs with additional attention mechanisms. The in-vehicle sensor data is fed into convolutional layers, whose outputs are used as input to recurrent layers with an attention mechanism for time series extraction. Finally, the output layer after the recurrent layers determines the class probability distribution for DBD. The DeepConvGRU-Attention and DeepConvLSTM-Attention models obtained higher accuracy than the existing DeepConvGRU and DeepConvLSTM models without attention units.
Most of these studies built a DBD system with good detection accuracy at the expense of the practical use of the proposed models. As mentioned earlier, using sensors with or without non-public PIDs affects both the accuracy and the practical usability of the classification models. In addition, previous studies show that deep learning methods hold considerable promise for implementing a DBD system based on time series data. In this study, we focus on enhancing the practical applicability of our deep learning model by increasing the accuracy obtained from sensors with public PIDs. Furthermore, we investigate to what extent the accuracy of our model can be increased by adding non-public data.

IV. METHODOLOGY
The purpose of this study is to design and develop an IoT system that can locally detect unsafe driving behavior from CAN bus data using edge computing and the GConvLSTM algorithm. The output of this study is a DBD system that can be deployed at the edge, where it processes and analyzes drivers' behavior locally to warn them of any unsafe driving conditions. Providing a reliable and efficient DBD system using in-vehicle sensors depends on the following factors:
Significant feature selection: Each sensor is a physical entity in our IoT-enabled DBD system and has a column in the driving dataset. The columns in the dataset are thus virtual entities, called the features of the dataset. The values of these features are used in analyzing driver behavior. Although all features can be used for DBD, selecting the most significant and relevant features improves the accuracy of classification models and reduces the training time [24]. We use three subsets of features to evaluate our deep learning model. Two subsets match the features selected in [5], and the third subset contains 30 features that are positively or negatively correlated.
Classification technique: To classify driving behaviors, we can use different approaches such as rule-based programming, statistical models, machine learning, or deep learning algorithms. Deep learning algorithms have achieved better performance in terms of accuracy and adaptation than other approaches [3]. In this study, we decided to use graph convolutional recurrent neural networks to classify driving behaviors into safe and unsafe classes, because they have recently achieved high accuracy in spatio-temporal prediction.
Deployment environment: As an end-to-end IoT solution, a DBD system could be deployed on different computing layers: cloud, fog, or edge. In cloud computing, the computation services and resources are centralized at large data centers, whereas fog and edge computing bring the computation units closer to the data sources. The edge computing approach processes data, such as sensor data, close to the logical edge of the network. Some studies, such as [22], [25], deployed the DBD system at the edge layer to improve response time, security, and privacy, and to save bandwidth. In this work, we also deploy our DBD system using an edge computing approach.

A. Dataset
We use the OCSLab driving dataset [26] in this study. This dataset contains 94,401 records with 51 features collected from in-vehicle sensors every second. Each record is labeled with a letter from "A" to "J". The total size of the dataset is 16.7 MB. The dataset authors extracted the driving data through the OBD-II connector of a KIA Motors Corporation car in South Korea while the drivers were driving the car. Ten drivers participated in the experiment.

Fig. 1. The safe and unsafe driving conditions divided using Equation 1 [5]. (Plot regions: safe driving conditions, unsafe driving conditions.)
The driving path is 23 km long and includes three types of road: motorway, city way, and parking space. All drivers completed two round trips for accurate and reliable classification.

B. Data Labeling
Every supervised classification algorithm requires properly labeled data for model training and evaluation. The records in the OCSLab driving dataset are labeled with letters from "A" to "J". However, our DBD system aims to differentiate between safe and unsafe driving behavior, so the dataset has to be relabeled. To label the records according to our system objectives, we followed the work presented in [5]. Equation 1 was used in that study to mark each time window as safe or unsafe. It describes a quadratic relationship between acceleration and speed, in which the tolerated acceleration decreases as speed increases [5]. Figure 1 shows how this function separates the safe and unsafe areas on the (V, |ā|) plane. Points above the curve are considered unsafe driving, and points below the curve represent safe driving. We use the same approach to label the records in the dataset using Equation 1. As a result, 69% of the data is labeled as "safe" and the remaining 31% as "unsafe".

C. Feature Normalization
The numerical features of the dataset have different ranges of values and need to be normalized. Normalization rescales the numerical features to a common scale so that all features are treated equally by the learning algorithm. In addition, it improves the performance and stability of the deep learning model. The method we chose is Min-Max scaling, which scales the data into the range [0, 1] based on the following equation:

$$x_i' = \frac{x_i - \min}{\max - \min}$$

where min is the minimum value and max is the maximum value of the feature $x_i$, respectively.
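As a minimal sketch of the Min-Max scaling described above, the following pure-Python function rescales one feature column into [0, 1]. The function name `min_max_scale`, the handling of a constant column, and the sample readings are illustrative assumptions, not part of the original system.

```python
def min_max_scale(column):
    """Scale a list of numbers into [0, 1] with Min-Max normalization."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant feature: map everything to 0.0 to avoid dividing by zero
        # (an assumed policy, not taken from the paper).
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical engine-speed readings (rpm), for illustration only.
rpm = [800.0, 1500.0, 2200.0, 3600.0]
scaled = min_max_scale(rpm)
print(scaled)
```

In a train/test setting, the minimum and maximum would normally be computed on the training split only and reused for the test split, so that no information from the test data leaks into the preprocessing.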

D. Graph Structured Data
The OCSLab driving dataset is organized in rows and columns. Rows are the observations recorded every second, and columns represent their features. Our DBD system, however, is built on graph neural networks, which only accept graph-structured data. Graphs are constructed from nodes and edges. We consider the sensors as nodes and the relationships between them as edges. To compute the relationships among the in-vehicle sensors, we use the approach presented in [27], in which each weighted graph is represented by a weighted adjacency matrix constructed from Pearson correlation coefficients (PCCs). The PCC between two sensors X and Y is formulated as

$$\rho_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$$

where $\sigma_X$ and $\sigma_Y$ are the standard deviations of X and Y, and Cov(X, Y) is the covariance of X and Y. The computed $\rho_{X,Y}$ is a value in the range [-1, 1] that represents the relationship between the two variables. The weighted adjacency matrix is then constructed from the computed coefficients.
Based on the learned function, the model can predict the label $\hat{y}_i \in \{0, 1\}$ for $S_i$ at certain time steps after $T$. The data at time step $t$ is defined as $x_t = (x_t^1, x_t^2, \ldots, x_t^D)^\top$, where $D$ is the dimensionality of $x_t$. Following the justification given in the preceding subsection, $x_t$ can be expressed as a weighted graph $G = (V, E, A)$, in which $V$ is a finite set of nodes with $|V| = n$, $E$ is the set of edges, and $A \in \mathbb{R}^{n \times n}$ is a weighted adjacency matrix. Consequently, $S_i$ is a sample of a multivariate time series in which each time step of the observations is a weighted graph associated with the label $y \in \{0, 1\}$. Figure 2 shows the relationship-based graph constructed from the observations at time $t$. We considered a time window of 10 seconds with 50% overlap for this study.
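The graph construction and windowing described above can be sketched in pure Python as follows. This is an illustration under assumptions: the helper names (`pearson`, `correlation_adjacency`, `sliding_windows`) are hypothetical, the diagonal is left at the self-correlation value of 1, and a constant signal is given a correlation of 0. The paper's exact adjacency rule is not reproduced here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumed convention for a constant signal
    # The 1/n factors of covariance and standard deviations cancel out.
    return cov / math.sqrt(var_x * var_y)


def correlation_adjacency(signals):
    """Build an n x n matrix whose entry (i, j) is the PCC of sensors i and j."""
    n = len(signals)
    return [[pearson(signals[i], signals[j]) for j in range(n)] for i in range(n)]


def sliding_windows(records, size=10, overlap=0.5):
    """Split per-second records into windows of `size` with fractional overlap."""
    step = max(1, int(size * (1 - overlap)))
    return [records[s:s + size] for s in range(0, len(records) - size + 1, step)]


# Hypothetical sensor traces, for illustration only.
speed = [10.0, 20.0, 30.0, 40.0]
rpm = [1000.0, 2000.0, 3000.0, 4000.0]
brake = [4.0, 3.0, 2.0, 1.0]
A = correlation_adjacency([speed, rpm, brake])
windows = sliding_windows(list(range(25)), size=10, overlap=0.5)
```

With a 10-second window and 50% overlap, consecutive windows start 5 seconds apart, so 25 seconds of data yield four windows.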
2) System Design and Data Flow: Our IoT system consists of two main layers: the data source layer and the edge layer. The following describes each layer and its functionality in detail.
Data source layer: This layer has a single component that is responsible for regenerating a subset of the in-vehicle sensor data selected from the dataset. The component is written in Python and has two modules. The first module generates sensor values every second. The second is an MQTT client that publishes the generated values to the MQTT broker, where the data analytics engine can consume them. In a real-world scenario, the data should be extracted in real time using an OBD-II scanner. Because of limitations in implementing this project, we only use this data generation layer to produce values every second in order to test the performance of our deep learning model and the monitoring dashboard features locally.
Edge layer: The edge layer contains three components with specific functionalities. The data processing component provides two services: collecting and preprocessing the sensor data. It consumes the sensor data from the MQTT broker by subscribing to a predetermined topic, and its preprocessing module then converts the collected data into graph-structured data. The second component is our DBD model, built on the graph convolutional long short-term memory network (GConvLSTM) algorithm. It receives the graph-structured sensor data for the previous ten seconds and classifies the driving behavior for the next ten seconds as "safe" or "unsafe". The third is the monitoring component, which displays the in-vehicle sensor data and is updated every second. In addition, the monitoring dashboard alerts drivers to unsafe driving behavior through a voice notification. Figure 3 illustrates the system architecture and data flow of our proposed IoT-enabled DBD system.
3) Spatio-temporal Prediction on Graph Data: To implement spatio-temporal modeling for classifying driving conditions, we decided to use GConvLSTM. This is the first study on unsafe driving behavior detection to use the GConvLSTM algorithm. The following sections describe this approach in detail.
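A minimal sketch of the replay module in the data source layer is given below. To keep it self-contained, the MQTT client is replaced by an injected `publish(topic, payload)` callable. The topic name `vehicle/sensors`, the field names, and the function name `replay_sensor_rows` are hypothetical and not taken from the original implementation.

```python
import json
import time


def replay_sensor_rows(rows, publish, topic="vehicle/sensors",
                       interval_s=1.0, sleep=time.sleep):
    """Publish one JSON-encoded sensor record per interval via publish(topic, payload)."""
    sent = 0
    for row in rows:
        publish(topic, json.dumps(row))
        sent += 1
        sleep(interval_s)  # one record per second by default
    return sent


# Stand-in for an MQTT client's publish call; a real client would be passed here.
outbox = []
rows = [
    {"vehicle_speed": 42.0, "engine_speed": 1800.0},
    {"vehicle_speed": 43.5, "engine_speed": 1850.0},
]
count = replay_sensor_rows(rows, lambda t, p: outbox.append((t, p)),
                           sleep=lambda s: None)
```

Passing the publisher and the sleep function as arguments keeps the replay logic testable without a running broker. In a deployment, `publish` would wrap the publish method of the chosen MQTT client library.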
Graphs: A graph is a structure that represents a finite set of data objects in which some objects are related to each other. Formally, a graph is expressed as $G = (V, E)$, where $V$ is a finite set of nodes (objects) and $E$ is a set of edges that represent the relationships between the nodes. Each edge is denoted as $(u, v) \in E$, which represents the relationship between node $u \in V$ and node $v \in V$. Edge $(u, v) \in E$ can also be written as $e_{uv}$, $uv$, or simply $e$ [28]. A convenient and common way to represent a graph is the adjacency matrix $A \in \mathbb{R}^{|V| \times |V|}$. To build it, the nodes of the graph are ordered so that each node indexes a particular row and column. If there is an edge between two nodes, or between a node and itself, the corresponding entry is 1, and 0 otherwise. For instance, the presence of an edge between $u$ and $v$ is expressed as $A[u, v] = 1$, and its absence as $A[u, v] = 0$. $A$ is symmetric if the graph is undirected, which is not required for a directed graph. If the graph contains weighted edges, the entries of the adjacency matrix are real values rather than 0 and 1. Figure 4 illustrates a simple weighted undirected graph with its adjacency matrix. Graph-structured data carries rich relational information among objects, which can be used to represent complex problems in various areas, including social science, natural science, knowledge graphs, and many other domains [29]. The great expressive power of graphs in representing complex problems motivates researchers to apply deep learning methods to graph data.
Deep Learning on Graphs: Conventional machine learning approaches require considerable effort to transform raw data into suitable feature vectors or representations using feature extraction algorithms before the learning algorithms can be applied [30]. Deep learning methods, in contrast, allow a machine to automatically discover representations from raw data through multiple processing layers that can use non-linear modules [31]. The representations are transformed from one layer to the next, and the composition of enough transformations can deal with complex data such as time series and graphs [32]. Recently, a substantial number of studies have applied deep learning methods to ubiquitous graph data. These methods can be categorized as graph convolutional networks, graph recurrent neural networks, graph reinforcement learning, graph encoders, and graph adversarial methods [33]. In this study, we focus on graph convolutional and recurrent neural networks to explore multivariate time series classification.
Graph Convolutional Networks: Graph convolutional networks (GCNs) can be considered convolutional neural networks (CNNs) for graph data. Multiple processing layers, local connections, shared weights, and pooling layers are the key ideas behind the CNN architecture [31], which allow a machine to extract and compose spatial features to construct meaningful representations from Euclidean data such as images. These operations are also essential when dealing with graph data, where neighboring nodes are inspected to derive the representations from one layer to the next. The difference between CNNs and GCNs is that CNNs are built for Euclidean, regularly structured data, while GCNs are designed for non-Euclidean structured data. Hence, convolution filters and pooling operations on graphs are not as well defined as on Euclidean data such as images [34]. Convolutional operations on graph-structured data are categorized as spatial-based GCNs and spectral-based GCNs. We use spectral-based GCNs to implement the proposed driving behavior detection model.
In spatial-based methods, graph convolutions are defined based on the spatial relations of nodes, analogous to the convolutional operations of CNN models on images. CNN models are designed with some implicit assumptions about the structure of the input data. For instance, images are grid-like data in which each pixel has a fixed number of neighbors, and the spatial order for scanning an image is naturally determined [34]. In arbitrary graph data, however, the number of neighbors of each node and the spatial order of the nodes vary. Several studies have proposed spatial-based GCNs for graph data [35], [36], [37].
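The adjacency-matrix representation described above can be illustrated with a short sketch. The helper names and edge weights below are illustrative assumptions. Each undirected edge is mirrored so that the resulting matrix is symmetric.

```python
def adjacency_matrix(num_nodes, weighted_edges):
    """Return the adjacency matrix of a weighted undirected graph."""
    A = [[0.0] * num_nodes for _ in range(num_nodes)]
    for u, v, w in weighted_edges:
        A[u][v] = w
        A[v][u] = w  # undirected graph: mirror the entry
    return A


def is_symmetric(A):
    """Check that A[i][j] equals A[j][i] for every pair of nodes."""
    n = len(A)
    return all(A[i][j] == A[j][i] for i in range(n) for j in range(n))


# Three nodes with illustrative edge weights.
edges = [(0, 1, 0.8), (1, 2, 0.3), (0, 2, 0.5)]
A = adjacency_matrix(3, edges)
print(A, is_symmetric(A))
```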
The receptive fields in the spatial approach are constructed directly on graphs, whereas convolution operations in the spectral domain rely on spectral graph theory, where graph signals are transformed into the spectral domain through the eigenvectors of the graph Laplacian. The first framework for spectral convolution on graph data, based on the graph Laplacian matrix, was proposed by [38]. The main issue with this approach is the eigenvalue decomposition, which has $O(n^3)$ computational complexity. [39] addressed this drawback by offering linear complexity $O(|E|)$ and providing a strictly localized filter. [40] developed a novel model for semi-supervised classification on graphs based on a first-order approximation of the spectral convolutional filters proposed by Defferrard. In this study, we use the model proposed by [41], which is based on the Defferrard technique and defines the spectral formulation of the convolution operation in the Fourier domain as

$$x *_{\mathcal{G}} y = U\big((U^\top x) \odot (U^\top y)\big)$$

where $\odot$ is the element-wise Hadamard product. It follows that a signal $x$ is filtered by $g_\theta$ as

$$y = g_\theta *_{\mathcal{G}} x = g_\theta(L)\,x = g_\theta(U \Lambda U^\top)\,x = U g_\theta(\Lambda) U^\top x$$

where $g_\theta(\Lambda) = \mathrm{diag}(\theta)$ is a non-parametric kernel and $\theta \in \mathbb{R}^n$ is a vector of Fourier coefficients. The normalized graph Laplacian is a matrix representation of a graph and is defined as $L = I_n - D^{-1/2} A D^{-1/2}$, in which $I_n$ is the identity matrix and $D \in \mathbb{R}^{n \times n}$ is the diagonal matrix of node degrees with $D_{ii} = \sum_j A_{ij}$ [41]. The normalized graph Laplacian can be diagonalized by an orthogonal matrix $U$ such that $L = U \Lambda U^\top$, where $U = [u_0, u_1, \ldots, u_{n-1}] \in \mathbb{R}^{n \times n}$ is the matrix of eigenvectors and $\Lambda \in \mathbb{R}^{n \times n}$ is the diagonal matrix of eigenvalues of the normalized graph Laplacian. The graph signal $x$ is filtered through the element-wise multiplication of $g_\theta$ and $U^\top x$, which is the graph Fourier transform of $x$. However, the learning complexity of a non-parametric filter is in $O(n)$, and computing the eigenvalue decomposition of $L$ for large graphs can be expensive. To overcome this issue, Defferrard parameterized the filter $g_\theta$ by Chebyshev polynomials $T_k$, expressed as

$$g_\theta(\Lambda) = \sum_{k=0}^{K-1} \theta_k T_k(\tilde{\Lambda})$$

where the parameter $\theta \in \mathbb{R}^K$ is a vector of Chebyshev coefficients and $T_k(\tilde{\Lambda}) \in \mathbb{R}^{n \times n}$ is the Chebyshev polynomial of order $k$ evaluated at $\tilde{\Lambda} = 2\Lambda/\lambda_{max} - I_n$, whose diagonal values lie in the range [-1, 1]. The graph filtering operation can then be defined as

$$y = g_\theta *_{\mathcal{G}} x = g_\theta(L)\,x = \sum_{k=0}^{K-1} \theta_k T_k(\tilde{L})\,x$$

where $T_k(\tilde{L}) \in \mathbb{R}^{n \times n}$ is the Chebyshev polynomial of order $k$ evaluated at the scaled Laplacian $\tilde{L} = 2L/\lambda_{max} - I_n$.
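The Chebyshev filtering just described can be sketched without any eigenvalue decomposition by using the recurrence $T_k(z) = 2zT_{k-1}(z) - T_{k-2}(z)$ with $T_0 = 1$ and $T_1 = z$. The sketch below is pure Python and rests on stated assumptions: edge weights are non-negative, at least one coefficient is given, and $\lambda_{max}$ defaults to 2 (an upper bound for the normalized Laplacian) rather than being computed. The function names are hypothetical.

```python
import math


def normalized_laplacian(A):
    """L = I - D^{-1/2} A D^{-1/2} for a weighted adjacency matrix A."""
    n = len(A)
    deg = [sum(row) for row in A]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    return [[(1.0 if i == j else 0.0) - inv_sqrt[i] * A[i][j] * inv_sqrt[j]
             for j in range(n)] for i in range(n)]


def matvec(M, x):
    """Matrix-vector product for list-based matrices."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def chebyshev_filter(A, x, theta, lambda_max=2.0):
    """Compute y = sum_k theta[k] * T_k(L_scaled) x via the Chebyshev recurrence."""
    n = len(A)
    L = normalized_laplacian(A)
    # Scaled Laplacian: 2 L / lambda_max - I
    Ls = [[2.0 * L[i][j] / lambda_max - (1.0 if i == j else 0.0)
           for j in range(n)] for i in range(n)]
    t_prev = list(x)                      # T_0 x = x
    y = [theta[0] * v for v in t_prev]
    if len(theta) == 1:
        return y
    t_curr = matvec(Ls, x)                # T_1 x = L_scaled x
    y = [a + theta[1] * b for a, b in zip(y, t_curr)]
    for k in range(2, len(theta)):
        # T_k x = 2 L_scaled T_{k-1} x - T_{k-2} x
        t_next = [2.0 * a - b for a, b in zip(matvec(Ls, t_curr), t_prev)]
        y = [a + theta[k] * b for a, b in zip(y, t_next)]
        t_prev, t_curr = t_curr, t_next
    return y


# Two nodes joined by a unit-weight edge; signal concentrated on node 0.
A = [[0.0, 1.0], [1.0, 0.0]]
y = chebyshev_filter(A, [1.0, 0.0], theta=[0.5, 0.25, 0.25])
```

Each additional Chebyshev order costs one more multiplication by the scaled Laplacian, which is where the complexity linear in the number of edges comes from when that matrix is stored sparsely. The dense lists used here are for readability only.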
Long Short-Term Memory Network (LSTM): The LSTM was introduced by [42] and is a special type of recurrent neural network (RNN) that is extensively used in time series forecasting. It is difficult for RNNs to carry information from earlier time steps to the current one, which leads to information loss. To preserve the influence of earlier inputs on the prediction, the LSTM adds a unit to the RNN that stores long-term states and is managed through gates. As a result, the LSTM helps to avoid the vanishing gradient problem. In this study, we follow the LSTM variation described by [43], in which an LSTM cell is formulated as

$$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + w_{ci} \odot C_{t-1} + b_i)$$
$$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + w_{cf} \odot C_{t-1} + b_f)$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)$$
$$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + w_{co} \odot C_t + b_o)$$
$$h_t = o_t \odot \tanh(C_t)$$

Each cell of the LSTM unit is considered a memory unit with a state $C_t$ at time $t$. Here $f$, $i$, and $o$ denote the forget gate, input gate, and output gate, respectively. Reading or modifying the memory unit is performed through these gates. In a nutshell, they learn and decide which information in a sequence is essential to keep or discard. At each time step, the LSTM cell receives inputs from two sources: the current input $x_t$ and the previous hidden state $h_{t-1}$. The weights are represented by $W_{x\cdot}$, $W_{h\cdot}$, and $w_{c\cdot}$, and the biases by $b_i$, $b_f$, $b_c$, and $b_o$. In the formula, $\odot$ is the Hadamard element-wise product and $\sigma(\cdot)$ denotes the sigmoid function. This LSTM model uses the peephole connections introduced by [44], which means that each gate has a peephole connection to the cell state.
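To make the gate mechanics concrete, the following pure-Python sketch performs one step of an LSTM cell with peephole connections. It assumes the commonly used peephole formulation, in which the input and forget gates see the previous cell state and the output gate sees the updated one. The parameter naming (`Wxi`, `wci`, and so on) and the helper `zero_params` are illustrative choices rather than the authors' code.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, x, U, h, b):
    """Return W x + U h + b for list-based matrices and vectors."""
    return [sum(w * v for w, v in zip(W[r], x))
            + sum(u * v for u, v in zip(U[r], h)) + b[r]
            for r in range(len(b))]


def peephole_lstm_step(x, h_prev, c_prev, p):
    """One LSTM step with peephole connections; `p` holds the parameters."""
    zi = affine(p["Wxi"], x, p["Whi"], h_prev, p["bi"])
    zf = affine(p["Wxf"], x, p["Whf"], h_prev, p["bf"])
    zc = affine(p["Wxc"], x, p["Whc"], h_prev, p["bc"])
    zo = affine(p["Wxo"], x, p["Who"], h_prev, p["bo"])
    # Input and forget gates peek at the previous cell state.
    i = [sigmoid(z + w * c) for z, w, c in zip(zi, p["wci"], c_prev)]
    f = [sigmoid(z + w * c) for z, w, c in zip(zf, p["wcf"], c_prev)]
    c = [fg * cp + ig * math.tanh(z) for fg, cp, ig, z in zip(f, c_prev, i, zc)]
    # The output gate peeks at the updated cell state.
    o = [sigmoid(z + w * cn) for z, w, cn in zip(zo, p["wco"], c)]
    h = [og * math.tanh(cn) for og, cn in zip(o, c)]
    return h, c


def zero_params(d_x, d_h):
    """All-zero parameters, useful only for checking the data flow."""
    p = {}
    for g in "ifco":
        p["Wx" + g] = [[0.0] * d_x for _ in range(d_h)]
        p["Wh" + g] = [[0.0] * d_h for _ in range(d_h)]
        p["b" + g] = [0.0] * d_h
    for g in "ifo":
        p["wc" + g] = [0.0] * d_h
    return p


h, c = peephole_lstm_step([0.3, -0.1], [0.0], [1.0], zero_params(2, 1))
```

With all-zero parameters every gate evaluates to 0.5, so the cell simply halves its previous state. This makes it easy to verify by hand that the forget gate scales the old memory and the output gate scales what is exposed as the hidden state.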
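Convolutional LSTM (ConvLSTM): ConvLSTM is a form of LSTM for spatio-temporal prediction introduced by [45], which replaces the internal matrix multiplications with convolution operations at the layer transitions. The key equations of ConvLSTM are as follows, where $*$ denotes the convolution operator and $\odot$ the Hadamard product:

$$i_t = \sigma(W_{xi} * x_t + W_{hi} * h_{t-1} + w_{ci} \odot C_{t-1} + b_i)$$
$$f_t = \sigma(W_{xf} * x_t + W_{hf} * h_{t-1} + w_{cf} \odot C_{t-1} + b_f)$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tanh(W_{xc} * x_t + W_{hc} * h_{t-1} + b_c)$$
$$o_t = \sigma(W_{xo} * x_t + W_{ho} * h_{t-1} + w_{co} \odot C_t + b_o)$$
$$h_t = o_t \odot \tanh(C_t)$$

Graph Convolutional LSTM (GConvLSTM): As previously stated, traditional convolutional operations are unsuitable for non-Euclidean structured data. We use the graph convolutional model proposed by [41] to implement multivariate time series classification on graph data. The authors generalized the ConvLSTM model presented by [45] to graph data by replacing the 2D convolution operation $*$ with the graph convolution operation $*_{\mathcal{G}}$:

$$i_t = \sigma(W_{xi} *_{\mathcal{G}} x_t + W_{hi} *_{\mathcal{G}} h_{t-1} + w_{ci} \odot C_{t-1} + b_i)$$
$$f_t = \sigma(W_{xf} *_{\mathcal{G}} x_t + W_{hf} *_{\mathcal{G}} h_{t-1} + w_{cf} \odot C_{t-1} + b_f)$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tanh(W_{xc} *_{\mathcal{G}} x_t + W_{hc} *_{\mathcal{G}} h_{t-1} + b_c)$$
$$o_t = \sigma(W_{xo} *_{\mathcal{G}} x_t + W_{ho} *_{\mathcal{G}} h_{t-1} + w_{co} \odot C_t + b_o)$$
$$h_t = o_t \odot \tanh(C_t)$$

This model is an implementation of GConvLSTM in which the graph convolutional kernels are defined using Chebyshev coefficients as $W_{x\cdot} \in \mathbb{R}^{K \times d_h \times d_x}$ and $W_{h\cdot} \in \mathbb{R}^{K \times d_h \times d_h}$, where $K$ determines the number of parameters, which is independent of the number of nodes $n$. The term $W_{xi} *_{\mathcal{G}} x_t$ performs the graph convolution operation on $x_t$ using $d_x d_h$ filters, which are functions of the graph Laplacian $L$ parameterized by $K$ Chebyshev coefficients, as described in Equations 7 and 8.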

A. GConvLSTM
We used GConvLSTM to implement unsafe driving behavior detection. The input of the GConvLSTM model is 10 time steps (10 seconds) of in-vehicle sensor data, and the output of the model is a label/class predicted for the next 10 time steps. The predicted label/class can be {0: "safe driving condition"} or {1: "unsafe driving condition"}. We considered two hidden layers for our GConvLSTM model, where the first hidden layer has 32 neurons and the second contains 16 neurons. Dropout is applied before the output layer with a probability of 0.5; it randomly sets elements of its input to zero with the specified probability, during training only, to reduce overfitting. The output layer labels the representations produced by the previous layer through the sigmoid activation function. The model is trained by minimizing the binary cross-entropy loss (BCELoss) with the Adam optimizer. The loss function evaluates a candidate set of weights, which leads the model to learn to decrease the prediction error. The BCELoss is expressed as
$$\mathrm{BCELoss} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right],$$
where $N$ is the number of samples, $y_i$ is the label (0 or 1), $p_i$ is the predicted probability of class 1, and $1 - p_i$ is the predicted probability of class 0. The Adam optimization method is an extension of stochastic gradient descent that updates the network weights iteratively during training. Adam is effective and efficient: it achieves good results and has low memory requirements on problems with large amounts of data. In this study, the learning rate for the Adam optimizer is 0.001 and the number of epochs to train the model is 20. Figure 5 shows the structure of the GConvLSTM model.
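As a small illustration of the loss described above, the following sketch computes the mean binary cross-entropy in plain Python. The function name `bce_loss` and the clamping constant `eps` are assumptions for illustration and are not taken from the study's implementation.

```python
import math


def bce_loss(labels, probs, eps=1e-12):
    """Mean binary cross-entropy for 0/1 labels and predicted P(class 1)."""
    if not labels or len(labels) != len(probs):
        raise ValueError("labels and probs must be non-empty and equally long")
    total = 0.0
    for y, p in zip(labels, probs):
        # Clamp the probability so that log(0) never occurs (assumed safeguard).
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(labels)
```

A confident correct prediction contributes a loss close to zero, while a confident wrong prediction is penalized heavily, which is what drives the weights toward lower prediction error.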

B. Evaluation
The training dataset contains 80% of the driving data, and the remaining 20% is assigned as the test dataset. We used the training dataset to train the model and the test dataset to validate it. The performance of the developed GConvLSTM model was evaluated through the accuracy, precision, recall, and F1-score metrics, expressed as
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{Precision} = \frac{TP}{TP + FP},$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad \mathrm{F1\text{-}score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}},$$
where $TP$ (true positive) is the number of positive samples classified correctly, $TN$ (true negative) is the number of negative samples classified correctly, $FP$ (false positive) is the number of negative samples classified as positive, and $FN$ (false negative) is the number of positive samples classified as negative.
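The four metrics can be computed directly from the confusion-matrix counts, as in the sketch below. The function name and the zero-division convention (returning 0.0) are assumptions, and the counts in the usage note are made-up values, not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, recall, and F1-score from confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

For example, `classification_metrics(tp=8, tn=9, fp=1, fn=2)` describes a hypothetical test set of 20 windows in which "unsafe" is the positive class.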
According to previous studies, using only sensors with public PIDs decreases the accuracy of the classification algorithms, whereas combining public and non-public signals increases it [5], [7]. Although utilizing the signals without public PIDs enhances the performance of classification models, it limits their practical usage. In this study, we focus on improving the practical applicability of our deep learning model by increasing its accuracy on sensors with public PIDs. Furthermore, we investigate the impact of the non-public data on our proposed model.

C. Edge Server
We developed our edge server using the Python Flask framework on a Raspberry Pi (RPI). The RPI used for this study is an RPI 4 Model B with a 64-bit quad-core Cortex-A72 processor and 8 GB of LPDDR4 RAM.

D. Results
The results show that the GConvLSTM model performs better than the previous study [5] in detecting unsafe driving behavior. We compare our results only to this study because we used the same data labeling approach and dataset. Table II compares the performance of our model with the SVM and NN proposed in [5] on subset A, which contains public signals. Our method, based on GConvLSTM, achieved the best results for all evaluation metrics.
Table III shows the performance comparison of the models on subset B, which contains both public and non-public signals. The results show the impact of the non-public signals on classification accuracy for all models. The performance of SVM and NN increased significantly, while there is only a slight improvement in our method. Therefore, GConvLSTM is more stable and accurate in driving behavior detection for both public and non-public sensor data. Moreover, we also evaluated our model's performance on subset C, which has 30 features. Subset C contains all positively and negatively correlated signals from the driving dataset. The comparison across subsets also uses the ROC (Receiver Operating Characteristic) curve and AUC (Area Under the Curve). Figure 6 shows the ROC curve for the "unsafe" class for subsets A, B, and C. The AUC scores are very close, and the strong overlap of the ROC curves implies that our model can produce stable results for both public and non-public in-vehicle sensor data.
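To make the AUC comparison concrete, the sketch below computes the AUC for the positive ("unsafe") class as the fraction of positive/negative pairs that the model ranks correctly, counting ties as half. This pairwise form is equivalent to the area under the ROC curve; the function name is hypothetical and the quadratic loop is chosen for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive sample outscores a negative one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correctly ranked pair
    return wins / (len(pos) * len(neg))
```

An AUC of 1.0 means every unsafe window received a higher score than every safe window, and 0.5 corresponds to random ranking.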

E. Monitoring Dashboard
Figure 7 shows the monitoring dashboard developed to track the performance of our DBD system. The dashboard has five components built using the Python Dash framework. The first component shows the status of the DBD system: if every part of the system works correctly, the state is shown as active; otherwise, it is shown as inactive. The second component shows the current date, time, and outside temperature. The third is the graph representation of the sensor data, updated every second. The user can touch or hover over each node to get more information about that sensor. The node positions do not reflect the physical locations of the sensors; they are assigned randomly. The edges are created based on the calculated correlation of the sensors. The fourth component shows the predicted label for the next 10 seconds. The fifth component is a live pie chart that shows the daily report of predicted labels. The dashboard also alerts the driver with a voice notification for any detected unsafe driving behavior.
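The correlation-based edges can be sketched as follows. This is an illustration under stated assumptions: the Pearson coefficient is used, an edge is kept when the absolute correlation reaches a hypothetical threshold of 0.5, and the function names and sensor names are made up; the study does not specify these details in this section.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equally long sequences (0.0 if constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def correlation_edges(signals, threshold=0.5):
    """Return (sensor_u, sensor_v, r) for every pair with |r| >= threshold."""
    names = sorted(signals)
    edges = []
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            r = pearson(signals[u], signals[v])
            if abs(r) >= threshold:
                edges.append((u, v, r))
    return edges
```

Using the absolute value keeps both positively and negatively correlated sensor pairs connected; a constant signal has no defined correlation and is left without edges here.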

VI. CONCLUSION AND FUTURE WORKS
In this study, we proposed an IoT system capable of detecting unsafe driving behavior using in-vehicle sensor data. We utilized the OCSLab driving dataset, which contains sensor data extracted from the CAN bus via an OBD-II connector, to evaluate our approach. Our proposed model, based on the GConvLSTM algorithm, improved the accuracy of unsafe driving behavior detection using publicly available sensor data, making our approach more practical than existing methods while achieving an accuracy of 97.5% for public sensors. Furthermore, we investigated the impact of non-public data on the GConvLSTM model, which resulted in a slight improvement. The average accuracy for the combination of public and non-public sensors was 98.1%, highlighting the reliability and efficiency of our proposed approach for detecting unsafe driving behavior in both scenarios.
We deployed our proposed DBD system on an RPI 4 Model B to enable the local detection of unsafe driving behavior. We also developed a monitoring dashboard that displays sensor data, prediction results, and daily reports on driving behavior conditions. The system alerts drivers via voice notifications of any detected unsafe driving behavior. As a lightweight DBD system, the developed approach can enhance road safety when implemented in vehicles.
For future research, we plan to create a graph dataset based on the dynamic nature of in-vehicle sensor networks and investigate how this can improve the accuracy and robustness of GConvLSTM in real-world driving behavior detection tasks. We also intend to incorporate additional features into the monitoring dashboard, such as personalized driving behavior recommendations based on individual driving performance, to assist drivers in improving their driving skills.
In conclusion, our proposed IoT system using the GConvLSTM algorithm can effectively detect unsafe driving behavior with high accuracy using publicly available sensor data. The system is lightweight, practical, and can be deployed locally in vehicles to improve road safety. Our research findings can contribute to the development of more reliable and efficient driving behavior detection systems, potentially reducing the incidence of accidents caused by driver errors.

E. Proposed Framework
1) Problem Formulation: This paper focuses on multivariate time series classification. Given samples of time series observations $S = \{S_1, S_2, \ldots, S_N\} \in \mathbb{R}^{N}$, in which $N$ is the number of time series samples, and corresponding labels $Y = \{y_1, y_2, \ldots, y_N\} \in \mathbb{R}^{N}$, the model aims to learn a function $f : S \to Y$ that maps $S$ to $Y$ based on the proposed model. Each sample $S_i$ contains time series measurements from in-vehicle sensors $x_t$ in a time window of size $T$ and is associated with a label $y \in \{0, 1\}$ (0: safe, 1: unsafe).
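One way to build such samples from a continuous recording is sketched below. It rests on assumptions that are not stated in the formulation: windows are non-overlapping (stride equal to $T$), per-time-step labels are available, and a sample's label is 1 when any time step in the following window is unsafe. The function name `make_samples` is hypothetical.

```python
def make_samples(series, step_labels, window=10, stride=None):
    """Pair each window of sensor readings with a label for the next window.

    series: list of per-time-step feature lists.
    step_labels: list of per-time-step labels (0 = safe, 1 = unsafe).
    """
    if len(series) != len(step_labels):
        raise ValueError("series and step_labels must have the same length")
    stride = window if stride is None else stride
    samples = []
    # Stop early enough that a full "next" window always exists.
    for start in range(0, len(series) - 2 * window + 1, stride):
        current = series[start:start + window]
        upcoming = step_labels[start + window:start + 2 * window]
        label = 1 if any(upcoming) else 0  # assumed aggregation rule
        samples.append((current, label))
    return samples
```

With `window=10` and one reading per second, each sample covers 10 seconds of sensor data and carries the label for the 10 seconds that follow.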

Fig. 2. Relationship-based graph for observations at time t.

Fig. 5. Overview of the implemented GConvLSTM model in the proposed DBD system.
The driving dataset [5] used for this study contains 51 signals. Some signals are not compliant with the OBD-II standard and have non-public PIDs. The public and non-public subsets used for training and evaluating the model are defined in Table I. Subset A contains public signals, and subset B has both public and non-public signals. Both subsets A and B are the same as the selected features used in [5]. In addition, we used subset C to check to what extent the accuracy of the GConvLSTM model is influenced by non-public signals. Subset C contains all public and non-public features that are positively and negatively correlated.
Engine speed, Engine load, Throttle position, Steering wheel angle, Brake pedal pressure
Subset C (Public+non-Public): Fuel consumption, Accelerator pedal value, Throttle-position-signal, Short term fuel trim bank1, Intake air pressure, Absolute throttle position, Engine speed, Engine torque after correction, Torque of friction, Flywheel torque (after torque interventions), Current spark timing, Engine coolant temperature, Engine idle target speed, Engine torque, Calculated load value, Flywheel torque, Torque converter speed, Engine coolant temperature.1, Wheel velocity front left-hand, Wheel velocity rear right-hand, Wheel velocity front right-hand, Wheel velocity rear left-hand, Torque converter turbine speed - Unfiltered, Vehicle-speed, Acceleration speed-longitudinal, Master cylinder pressure, Calculated road gradient, Acceleration speed-Lateral, Steering wheel speed, Steering wheel angle
Table IV shows the performance results on subset C. Compared to subset A, which contains public signals, the accuracy improved by 1.2%. This small gain implies that our model is stable and reliable on both public and non-public signals. Therefore, our model is more practical than previous models for signals with only public PIDs. Furthermore, we compared our model's performance on subsets A, B, and C using ROC (Receiver Operating Characteristic) curves.