Innovative Framework for Distracted-Driving Alert System Based on Deep Learning

Distracted driving is the most common cause of traffic accidents. According to a World Health Organization report, the number of traffic accidents has been increasing in recent years. To address this issue, distracted-driving recognition is an important area of traffic safety research. However, distracted behavior may be part of a driver's regular tasks. For example, a delivery person sometimes must use his/her phone while driving. The use of walkie-talkies is required for container-truck drivers because they improve unloading efficiency and reduce the time cargo ships spend in a port, resulting in cost savings. While driving on the highway, it is sometimes necessary to tune the radio to receive an update on road conditions. Furthermore, drinking water is permissible while waiting for a traffic signal for an extended period of time. Therefore, the driving scenario is important for a distracted-driving alert system. To address this issue, we present a novel framework that combines driving perception and driver behavior recognition to provide the driver with appropriate warnings. By combining the two, our proposed framework can reduce false alerts. We also study various behaviors and define different time-to-collision standards for making safety-level decisions, achieving humane and effective warnings. For both driving perception and driver behavior recognition, modified convolutional neural networks are used, which alert the driver immediately.


I. INTRODUCTION
Because distracted driving is the leading cause of traffic accidents, distracted-driving recognition is a popular research topic in the field of intelligent vehicles. According to a World Health Organization survey, traffic accidents are the eighth leading cause of death worldwide, killing ~1.3 million people each year. Between 20 and 50 million people suffer from nonfatal injuries each year [1]. According to statistics obtained from the National Highway Traffic Safety Administration (NHTSA), there were 23,000 deaths in the United States due to distracted driving between 2012 and 2019, accounting for ~10% of all deaths [2]. Texting on the phone, talking to people in the vehicle, drinking, eating, and adjusting the audio or navigation systems are among the common distracted-driving behaviors; these are classified as manual distractions and can cause traffic accidents. Therefore, concentration is essential for safe driving. The time-to-collision (TTC) standard is an important criterion for overcoming this problem. The TTC is a standard for calculating the relative collision time based on vehicle speed, relative object speed, vehicle position, and relative object position. Nevertheless, the TTC standard only considers the vehicle and its surroundings, ignoring the behavior of the driver. Conversely, distracted-driving recognition systems only consider the driver's behavior while ignoring the vehicle and its surroundings. Figs. 1(upper) and 1(lower) present situations considering only TTC and only distracted behavior, respectively. The TTC can be denoted as follows:

$$\mathrm{TTC} = \frac{X_L - X_F - L_{leader}}{V_F - V_L}. \tag{1}$$

Here, $X_L$ is the position of the leading vehicle, $X_F$ is the position of the following vehicle, $L_{leader}$ is the length of the leading vehicle, and $V_F$ and $V_L$ are the velocities of the following and leading vehicles, respectively. Fig. 2 shows the aforementioned positions and velocities. For calculating the TTC, the velocity and length of the leading vehicle must be known. The 2-s TTC is widely adopted, and the rule is known as the two-second rule [3]. This rule considers not only the safe braking distance but also the human reaction time. The driving state is safe under the two-second rule, as indicated in Fig. 1(upper). However, when the driver reaches behind the seat to retrieve something, he/she will require more time to return to the driving position. Thus, the TTC as well as the driving behavior must be considered in this situation. The system recognizes the behavior presented in Fig. 1(lower) as dangerous, but because the driver is waiting for pedestrians, the vehicle's velocity is 0 km/h. Drinking water is permissible in this situation. The consideration of both types of scenarios is important for developing safe-driving alert systems.
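To make equation (1) concrete, the following minimal Python sketch computes the TTC and applies the two-second rule; the function name, the example values, and the infinite-TTC convention for a non-closing gap are our own illustration, not part of the original system.

def time_to_collision(x_lead, x_follow, lead_length, v_follow, v_lead):
    """Compute TTC per Eq. (1): gap / closing speed. Units: m and m/s."""
    gap = x_lead - x_follow - lead_length      # bumper-to-bumper distance
    closing_speed = v_follow - v_lead          # > 0 when the gap is shrinking
    if closing_speed <= 0:                     # not closing: no collision ahead
        return float("inf")
    return gap / closing_speed

# Two-second rule: the driving state is safe when the TTC exceeds 2 s.
ttc = time_to_collision(x_lead=50.0, x_follow=20.0, lead_length=4.5,
                        v_follow=20.0, v_lead=15.0)   # -> 5.1 s, safe
is_safe = ttc > 2.0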
Distracted-driving research has focused on various topics. A frequent research issue is the impact of cell-phone use. Pushpa et al. [4] focused on phone use while driving, indicating that 87% of drivers receive phone calls, 41% read messages, and 95% listen to music while driving. Phone usage is common during driving when the driving scenario allows it. According to Katie et al. [5], ~43% of distracted drivers use mobile phones in a handheld manner, and 57% use them hands-free while driving. Oscar et al. [6] suggested that when drivers use mobile phones, the probability of an accident increases by 70%. According to Sergey et al. [7], texting while driving on government business has been banned for 2 million civilian workers. Neha et al. [8] suggested that cell-phone use causes 33% of car accidents among young people. Their study also indicates that 16.67% and 27% of drivers text while driving in Australia and the United States, respectively. Robert et al. [9] suggested that the use of electronic devices is a common distracted behavior during driving. Their report also indicates that device usage is most prevalent in the evening; at this time, 70.7% of drivers look away from the roadway while using an electronic device.
Another crucial component in driving safety is the interaction between the driver and passengers. Pnina et al. [10] presented statistics on various distracted-driving behaviors that have resulted in accidents, indicating that interaction with passengers often causes accidents among teenage drivers. Fangda et al. [11] investigated the influence of passengers on distracted driving, indicating that passengers aged 16-20 years have the highest negative impact on the driver. According to Ida et al.'s [12] report, 70% of parents have to feed their children while driving, and 40% of parents have to pick up their children's toys.
As mentioned before, distracted driving has a considerable impact on traffic safety. All the aforementioned studies consider the driver's state. However, the gap between the vehicle and obstacles, the relative object velocity, and the vehicle velocity are also important factors in driving safety.
To evaluate the factors contributing to safe driving, TTC is a valuable measure that considers the aforementioned factors. Previous studies suggest TTC thresholds ranging from 0.9 to 5 s for different driving scenarios. For urban driving, Maretzke and Jacob [13] propose a 5-s TTC standard, while other researchers suggest a 4-s TTC standard for the activation of a collision-avoidance system [14], [15]. In [16], a 3-s TTC threshold for rear-end collision-avoidance systems is proposed. Bella and Russo [17] define TTC standards of 2.5 and 3 s to distinguish between safe and unsafe car following to prevent rear-end collisions. In addition to the aforementioned research, several studies have suggested a 3-s TTC threshold [18], [19], [20], [21]. Some studies propose lower TTC thresholds: in [14], [22], and [23], a TTC below 1.5 s is regarded as critical.
As mentioned before, TTC is an important standard for preventing rear-end and other collisions and promoting a safe car-following distance. However, another crucial factor in driving-safety systems is the driver state. Considering these aspects, we propose a framework that combines TTC and driver state to differentiate between safety levels.
The following are our major contributions:
• To make the safety-level decision more humane and appropriate, we propose a framework that combines driver state and TTC to provide appropriate warnings for drivers.
• For autonomous vehicles, three-dimensional (3D) object detection should be both accurate and fast. To this end, we modify a keypoint feature pyramid network (KFPN) for real-time 3D-object detection.
• The identification of distracted driving should be both accurate and fast. To enhance the detection accuracy of MobileNetV3 for distracted-driving behavior recognition, we propose a residual weighted squeeze-and-excite (RWSE) module.
• It is difficult to collect data on both distracted drivers and the driving environment. To address this problem, we propose a simple method that combines the driver states from the State Farm dataset and 3D object information from the KITTI 3D object-detection dataset to simulate driving scenarios.

II. RELATED WORK
Various approaches have been applied to driving-safety warning systems as well as to the analysis of driving safety. Distraction recognition is an important factor in driving safety. Singh et al. [24] used textile sensors to analyze driver distraction. Ramírez et al. [25] used head-mounted inertial sensors for distracted-driver analysis. Furthermore, electroencephalography (EEG) approaches can be used to analyze driver distraction. Do et al. [26] used EEG signals from the driver's brain to detect distracted driving. Wang et al. [27] used EEG to analyze brain activity patterns for detecting distracted driving. To acquire different types of information from drivers, multisensor approaches are also popular for achieving good performance. Zhang et al. [28] used a long short-term memory (LSTM)-deep convolutional neural network (DCNN) complex model and combined EEG and electrooculography (EOG) features for distracted-driving detection. Liu et al. [29] used different sensors to detect vehicle dynamics, the driver's eye movement, and behavioral actions; they then used semisupervised learning combined with the Laplacian support vector machine (SVM) to discriminate driver states. Gjoreski et al. [30] combined different types of information from devices, including an eye tracker, a video camera, a thermal camera, and physiological sensors, to provide comprehensive data for driver-state description; they then used different classifiers, such as k-nearest neighbor (KNN), neural networks, and XGBoost, to provide a comprehensive analysis. Chan et al. [31] combined electrocardiography (ECG) and photoplethysmography (PPG) sensors, an infrared (IR) LED array, a light-dependent resistor, and a smartphone for a comprehensive distracted-driving analysis. Nevertheless, some sensing methods, such as ECG, EOG, PPG, and head-mounted inertial sensing, are not convenient for ordinary consumers. Thus, certain approaches use mobile-phone sensors for driver-distraction detection. Park et al. [32] conducted a driving-behavior analysis using the sensors built into a smartphone. Sun et al. [33] used smartphone sensors to analyze driver behavior and prevent accidents. Ahmed et al. [34] used a smartphone's accelerometer and gyroscope sensors to detect distracted-driving behavior during smartphone use. Their project had 16 participants, and tests on smartphone use during driving were conducted using a realistic driving simulator. They also tested different metrics using an accelerometer, a gyroscope, and an accelerometer combined with a gyroscope. Kashevnik et al. [35] conducted a driver-behavior analysis and proposed an accident-prevention method using a smartphone's camera and built-in sensors, such as an accelerometer, a gyroscope, a GPS, and a microphone. However, smartphone-based detection methods fail when drivers forget to bring their smartphones. When drivers use electronic devices or get distracted by scenery, they look away from the road. Accordingly, some studies focus on gaze tracking to estimate driver distraction. Alam et al. [36] used a DCNN facial-landmark detector to detect the positions of the eyes and mouth.
Then, they calculated the eye aspect ratio and the percentage of eyelid closure over the pupil over time for drowsiness detection, yawning frequency for fatigue detection, and gaze direction for distraction detection. Their study used facial landmarks to provide drowsiness, fatigue, and distraction information. Yao et al. [37] used an eye tracker and a driving simulator to analyze eye movement while performing secondary tasks, such as navigation, tuning the radio, responding to a text message, replying to a voice message, and making a phone call. Hirayama et al. [38] analyzed neutral and cognitive distraction based on peripheral vehicle behavior during the driver's gaze transition. Vicente et al. [39] used a camera and an IR illuminator to detect when the driver's eyes are off the road; the IR illuminator has better tracking ability in dark scenarios. Fan et al. [40] combined facial landmarks, head pose, and iris center to provide exhaustive gaze information for gaze tracking and driver-distraction estimation. However, gaze information alone may not be sufficient for detecting distracted driving. In addition to gaze monitoring, posture tracking is a key decision factor for detecting distracted drivers. Billah et al. [41] proposed a distracted-driver recognition method that tracks forehead, lip, and hand movements and then uses an SVM or KNN to recognize distracted-driving behaviors. Xing et al. [42] used a Kinect sensor to detect 42 body features and different classifiers to analyze distracted-driving behaviors. Zhang et al. [43] applied semantic segmentation for human parsing and further classified driving behavior using neural networks. Yan et al. [44] recognized driver behavior by tracking hand and body postures, which were detected using neural networks. However, some postures may appear identical, resulting in detection errors. For example, when a driver turns his/her head to check the rearview mirror, the posture may appear as if the driver is talking to a passenger. Besides posture analysis, distracted-driving detection can be regarded as a deep-learning task, such as classification or object detection. With the aim of recognizing driving behavior, distracted-driving behavior classification is an important research area in distracted-driver detection. ADNet [45] is a state-of-the-art classifier that uses a channel attention module and a spatial attention module to better extract features for improved classification; it achieves 98.42% accuracy on the AUC dataset [46]. Gumaei et al. [47] proposed a distracted-driver behavior detection technique that combines a Raspberry Pi 3 camera and cloud computing to realize real-time performance; their proposed neural network, named CDCNN, achieves 99.64% accuracy on the State Farm dataset [48] at 32 FPS. Aljasim et al. [49] proposed a method named E2DR that combines features from two different state-of-the-art classifiers; by combining features from ResNet50 [50] and VGG16 [51], E2DR achieves 92% accuracy on the State Farm dataset at 69.2 FPS. Alkinani et al. [52] proposed HSDDD, which uses features from AlexNet [53], VGG16, InceptionNetV3 [54], ResNet50, and HOG, processes them via principal component analysis to minimize the total feature size while keeping key features, and finally applies an SVM or a KNN to acquire the best detection result. Shahverdy et al.
[55] combined gravity, acceleration, engine revolutions per minute, vehicle speed, and vehicle throttle signals and converted the data into images, so that driver-behavior detection could be treated as an image-classification task. Ou et al. [56] proposed a novel strategy that uses a GAN to generate images of various scenarios to improve distracted-driver detection. Jegham et al. [57] proposed a novel LSTM-DCNN complex model that processes consecutive frames and different views to detect distracted-driver behaviors. Liu et al. [58] proposed a multitask learning framework for recognizing driver distraction that combines the raw image with positive and negative samples to acquire more features for discriminating different behaviors. MobileVGG [59] is a state-of-the-art classifier that uses separable convolutions to accomplish accurate, real-time classification. In Section IV, the aforementioned approaches that use the State Farm dataset as a benchmark are compared with our proposed framework.
In addition to the interior of the vehicle, the exterior is an essential factor of driving safety. TTC is the main criterion used for analyzing the vehicle status for safe driving. Kim et al. [60] proposed a collision-risk assessment algorithm based on probability models and TTC, while Das et al. [61] proposed a definition of TTC for non-lane-based traffic, providing valuable references for TTC-based warning systems. Weng et al. [62] used TTC to determine safe vehicle-merging behavior. Kilicarslan et al. [63] proposed a TTC prediction method using a single video camera. Wang et al. [64] proposed inverse TTC as a measure for determining the safety of automated vehicles in the longitudinal direction. TTC is also an important criterion for collision prevention in pedestrian safety. Wang et al. [65] used an attention LSTM to predict pedestrian crossing and TTC to estimate the time gap between pedestrians and vehicles for improving pedestrian crossing safety. Jiang et al. [66] combined TTC, video, and local cultural information for pedestrian midblock-crosswalk safety analysis. Zhang et al. [67] used TTC and video data for pedestrian safety analysis.
To analyze the driving situation more precisely, some approaches combine TTC and driver behavior, such as braking and accelerating. Our study likewise combines TTC and driver behavior, but we focus on distracted-driving behavior and the creation of a more humane warning system.

III. PROPOSED FRAMEWORK
Fig. 3 outlines the proposed framework. Our system considers both the driving scenario and driver behavior to remind drivers to drive safely. This section is divided into three parts: Section III-A introduces our modified 3D object-detection method, Section III-B introduces our modified MobileNetV3, and Section III-C introduces the decision algorithm.

A. MODIFIED KFPN
RTM3D [68] is a state-of-the-art monocular 3D object-detection method. Reference [69] used the same KFPN as RTM3D but operated on the LiDAR bird's-eye view (BEV), achieving a faster detection speed than RTM3D while maintaining high accuracy. We modified the KFPN architecture to further improve the performance of 3D-object detection while maintaining a real-time inference speed. Figs. 4(a) and 4(b) show the original KFPN and our modified KFPN, respectively. In the original KFPN, ResNet18 [50] was used as the backbone. We removed the max-pooling in ResNet18 and replaced it with a convolutional block because max-pooling retains only the maximum value in the feature map and loses other detailed features. To improve the performance of the original KFPN, we added a channelwise attention mechanism, called an effective squeeze-and-excite (eSE) module [70], to the convolution block (ConvBlock), which in ResNet18 has no channelwise attention mechanism. Our modified KFPN uses ConvBlock-eSE, which contains the eSE module. The comparison of the original ConvBlock and our modified ConvBlock-eSE is shown in Fig. 5. In addition to inserting the eSE module into ConvBlock, we replace the ReLU activation function with the PReLU [72] activation function. The ReLU activation function can be presented as follows:

$$\mathrm{ReLU}(x) = \max(0, x). \tag{2}$$

When the input x ≤ 0, the activation function's output is 0, which can prevent the parameters in the neural network from being updated. We use PReLU as the activation function to overcome this problem. PReLU can be written as follows:

$$\mathrm{PReLU}(x) = \max(0, x) + a \cdot \min(0, x). \tag{3}$$

In (3), a is a learnable parameter. When the input x ≤ 0, the activation function has a nonzero output, which keeps the parameters of the neural network updatable. After solving the nonupdated-parameters problem, we insert the eSE module into ConvBlock to enhance channelwise attention and improve performance. Fig. 6 shows the architectural difference between the SE and eSE modules. The eSE module has only one weight from a convolutional layer and does not compress the information, whereas the original squeeze-and-excite (SE) module [73] has two weights from two fully connected layers and compresses the information in the first fully connected layer. As shown in Fig. 6(a), the channel attention feature map $A_{SE} \in \mathbb{R}^{1\times1\times C}$ of the SE module can be presented as follows:

$$A_{SE}(X_i) = \sigma(W_C(\delta(W_{C/R}(F_{GAP}(X_i))))). \tag{4}$$
As shown in Fig. 6(b), the efficient channel attention feature map $A_{eSE} \in \mathbb{R}^{1\times1\times C}$ of the eSE module can be written as follows:

$$A_{eSE}(X_i) = \delta_P(W_C(F_{GAP}(X_i))). \tag{5}$$

In (4), σ is the sigmoid activation function and can be denoted as follows:

$$\sigma(x) = \frac{1}{1 + e^{-x}}. \tag{6}$$

Equation (6) limits the output value to 0-1. When the input x is a large negative number, the output will be close to zero. To solve this problem, we adopt the PReLU activation function $\delta_P$ in the eSE module. $W_C$ is a neural network layer with C output channels, δ is the ReLU activation function, $W_{C/R} \in \mathbb{R}^{1\times1\times C/R}$ is a neural network layer with C/R output channels, and R is the channel reduction rate, which is set to 4 here. $F_{GAP} \in \mathbb{R}^{1\times1\times C}$ is the global average pooling function and can be denoted as follows:

$$F_{GAP}(X) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} X_{i,j}. \tag{7}$$

Here, $X \in \mathbb{R}^{H\times W\times C}$ is the input feature map and $X_{i,j}$ is its element at spatial position (i, j). As shown in Fig. 6(a), the feature map with C channels is reduced to one with C/R channels in the SE module. This causes information loss and reduces the computing efficiency of the SE module. The output $X_{CWA}$ of the SE module can be obtained using the following equation:

$$X_{CWA} = X_i \otimes A_{SE}(X_i). \tag{8}$$

To solve these issues, we adopted the eSE module in our modified KFPN, for which the channelwise attention can be denoted as

$$X_{eCWA} = X_i \otimes A_{eSE}(X_i). \tag{9}$$
Here, $X_{eCWA} \in \mathbb{R}^{H\times W\times C}$ denotes the output of the eSE module and ⊗ is the Kronecker product. The eSE module performs well in channelwise attention. To further improve the feature map attention for 3D-object detection, we adopted the KFPN. In contrast to the original FPN [74], the KFPN uses a novel keypoint feature selection procedure, which can be denoted by the following function:

$$F = \sum_{s} \mathrm{softmax}(f_s) \odot f_s, \tag{10}$$

where ⊙ denotes the elementwise product, $f_s$ denotes the feature map from the s-th convolution block, and the softmax is computed across the scale dimension. As denoted in (10), the KFPN uses the softmax function to calculate the feature map scores; it then multiplies and adds all the corresponding features.
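A minimal PyTorch sketch of the eSE channel attention described by (5) and (9) is given below; the class name and the exact placement of the PReLU are our rendering of the text, so details may differ from the authors' implementation.

import torch
import torch.nn as nn

class ESEModule(nn.Module):
    """Effective squeeze-and-excite (Eqs. (5) and (9)): one 1x1 conv, no channel reduction."""
    def __init__(self, channels: int):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)          # F_GAP, Eq. (7)
        self.fc = nn.Conv2d(channels, channels, 1)  # W_C: single weight, C -> C
        self.act = nn.PReLU(channels)               # delta_P replaces the sigmoid

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attention = self.act(self.fc(self.gap(x)))  # A_eSE(X_i), shape (N, C, 1, 1)
        return x * attention                        # X_eCWA = X_i (x) A_eSE(X_i)

# Example: refine a 64-channel feature map inside ConvBlock-eSE.
features = torch.randn(1, 64, 32, 32)
refined = ESEModule(64)(features)   # same shape as the input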

B. MODIFIED MOBILENETV3
The MobileNet series [75], [76], [77] exhibits high performance in image recognition. Particularly, MobileNetV3 [77] shows high accuracy in pattern recognition while maintaining real-time inference speed, making it suitable for distracted-driver behavior recognition. To improve its accuracy, we propose an RWSE module for MobileNetV3.
In the RWSE module, the channelwise attention uses the Hsigmoid activation function $\sigma_H$, which can be written as follows:

$$\sigma_H(x) = \frac{\mathrm{ReLU6}(x + 3)}{6}. \tag{11}$$

Hsigmoid limits the output value to the range 0-1, like the sigmoid, but is faster to compute. As shown in Fig. 7, the efficient channel attention feature map $A_{eSEh} \in \mathbb{R}^{1\times1\times C}$ of the modified eSE module can be denoted as follows:

$$A_{eSEh}(X_i) = \sigma_H(W_C(F_{GAP}(X_i))). \tag{12}$$
The efficient channelwise attention feature map $X_{eCWAh} \in \mathbb{R}^{H\times W\times C}$ of the RWSE module can be denoted as follows:

$$X_{eCWAh} = X_i \otimes A_{eSEh}(X_i). \tag{13}$$

We add a spatial bias to compensate for the spatial feature; it comprises a 1 × 1 convolution and an Hsigmoid activation function and can be written as follows:

$$B(X_i) = \sigma_H(W_{1\times1}(X_i)). \tag{14}$$

The output of the RWSE module can be denoted as follows:

$$X_{RWSE} = X_{eCWAh} \oplus B(X_i), \tag{15}$$

where ⊕ denotes elementwise addition. Table I shows the configuration of MobileNetV3-small. As shown in Table I, the convolutional blocks have different settings. There are ten types of inverted residual blocks (IRBs) in the MobileNetV3 series. We only modified the convolutional blocks that use the SE module, and we do the same in MobileNetV3-large. We replace the original SE module with the RWSE module in the IRB, as shown in Fig. 8. As shown in Fig. 8, some of the convolutional blocks use Hswish as an activation function. Hswish can be written as follows:

$$\mathrm{Hswish}(x) = x \cdot \frac{\mathrm{ReLU6}(x + 3)}{6}. \tag{16}$$

Hswish is unbounded above and, compared with ReLU, provides a better nonlinear description at a higher computational cost.
Therefore, in the MobileNetV3 series, ReLU is used in the shallow layers and Hswish is used in the first layer and deeper layers.
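The following PyTorch sketch shows one way to realize the RWSE module as reconstructed in (12)-(15); the class name, the use of nn.Hardsigmoid for $\sigma_H$, and the exact form of the spatial-bias branch are our assumptions.

import torch
import torch.nn as nn

class RWSEModule(nn.Module):
    """Residual weighted squeeze-and-excite: Hsigmoid channel attention (Eqs. (12)-(13))
    plus a 1x1-conv spatial bias (Eq. (14)), combined by elementwise addition (Eq. (15))."""
    def __init__(self, channels: int):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)                 # F_GAP
        self.fc = nn.Conv2d(channels, channels, 1)         # W_C
        self.hsig = nn.Hardsigmoid()                       # sigma_H = ReLU6(x+3)/6
        self.spatial = nn.Conv2d(channels, channels, 1)    # W_1x1 for the spatial bias

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        channel_att = self.hsig(self.fc(self.gap(x)))      # A_eSEh(X_i), Eq. (12)
        x_cwa = x * channel_att                            # X_eCWAh, Eq. (13)
        bias = self.hsig(self.spatial(x))                  # B(X_i), Eq. (14)
        return x_cwa + bias                                # X_RWSE, Eq. (15)

# Drop-in replacement for the SE module inside an inverted residual block:
out = RWSEModule(40)(torch.randn(1, 40, 14, 14))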

C. DECISION ALGORITHM
TTC is an important indicator of driving safety. As mentioned before, driving behavior is also an important indicator. For calculating the TTC, the velocity and length of the leading vehicle must be known. Another factor that affects safe driving is distracted-driving behavior. To combine these two factors, we propose the following decision algorithm for safety-level classification.
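Since the algorithm listing itself was lost in extraction, the sketch below illustrates the kind of safety-level decision the text describes, combining the recognized behavior class with the TTC; the behavior grouping and the 2-s/4-s thresholds are illustrative assumptions, not the authors' exact values.

import math

# State Farm classes: C0 normal, C1-C4 texting/phoning, C5 radio, C6 drinking,
# C7 reaching behind, C8 hair and makeup, C9 talking to a passenger.
LONG_RECOVERY = {"C7", "C8"}          # assumed to need a stricter standard
SHORT_RECOVERY = {"C1", "C2", "C3", "C4", "C5", "C6", "C9"}

def safety_level(behavior: str, ttc_s: float) -> str:
    """Return 'safe driving', 'warning', or 'dangerous' (illustrative thresholds)."""
    if behavior == "C0":
        return "safe driving" if ttc_s > 2.0 else "warning"
    if math.isinf(ttc_s):             # vehicle stationary, e.g., waiting at a signal
        return "safe driving"         # e.g., drinking or talking is then allowed
    if behavior in LONG_RECOVERY:     # driver needs more time to resume control
        return "dangerous"
    # Distracted but recoverable: raise the TTC standard from 2 s to 4 s.
    return "warning" if ttc_s > 4.0 else "dangerous"

print(safety_level("C9", float("inf")))   # -> safe driving
print(safety_level("C7", 3.0))            # -> dangerous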

IV. EXPERIMENT
In the following sections (Sections IV-B to IV-E), we demonstrate our modified KFPN, evaluate our RWSE module, present our integrated dataset that incorporates the driver state and 3D-object detection, and demonstrate our safety-level decision algorithm, respectively.

A. SYSTEM SETUP
Our system was evaluated using a computer with an NVIDIA GeForce RTX 3070, CUDA 11.3, PyTorch 1.11, NVIDIA driver 495, Ubuntu 20.04, and Python 3.8. Both the 3D-object detection and distracted-driving behavior recognition models were trained with the Adam optimizer for 400 epochs.
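In PyTorch terms, the training setup described above amounts to the following minimal sketch; the stand-in model, synthetic data, and learning rate are our assumptions, since the text specifies only the optimizer and epoch count.

import torch
import torch.nn as nn

# Stand-in model and data; in our experiments these are the modified KFPN or
# RWSE-MobileNetV3 with the KITTI / State Farm loaders.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
loader = [(torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,)))]
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # lr assumed, not stated

for epoch in range(400):            # both tasks are trained for 400 epochs
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()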

B. 3D-OBJECT DETECTION FOR DRIVING SCENARIO
To achieve high-performance 3D-object detection while maintaining real-time inference speed, we modified the original KFPN. In particular, our KFPN achieves an inference speed of 90 FPS. To demonstrate the performance of the modified KFPN, we used the KITTI 3D object-detection dataset [78] as a benchmark and compared our approach with others. We also trained our model using this dataset. The dataset has three categories (cars, cyclists, and pedestrians) and comprises 6000 training samples, 418 validation samples, and 7518 testing samples. Fig. 9 shows the BEV results of the original and modified KFPN. The pedestrian presented in Fig. 9(a) was not detected; our modified KFPN solves this problem, as shown in Fig. 9(b). Hence, our modified KFPN performs slightly better than the original KFPN.

C. DISTRACTED-DRIVING RECOGNITION
As a training and testing benchmark for distracted-driving recognition, we used the State Farm dataset, which includes 10 categories (C0-C9): normal driving, right-hand texting, right-hand phoning, left-hand texting, left-hand phoning, tuning the radio, drinking, reaching behind, hair and makeup, and talking to a passenger. Our testing and training datasets were the same because Kaggle no longer provides the test labels after the distracted-driver detection competition. To test the robustness of our modified models, we adopt the same k-fold cross-validation method as the MOGA-optimized deep MKL-SVM [84] and set k to 10. As shown in Fig. 10, the entire dataset is split into 10 parts, and each part serves as the validation dataset in turn. Distracted-driving behavior recognition in real-time applications requires real-time inference speed and excellent accuracy. To achieve this, we used MobileNetV3 for recognition. To further improve the accuracy of MobileNetV3, we used our proposed RWSE module to enhance its channelwise attention. The eSE module improved the inference speed of MobileNetV3-S/L, and our modified eSE-MobileNetV3-S also improved the accuracy and inference speed of the original MobileNetV3-S. Furthermore, our proposed RWSE module improved the accuracy of both MobileNetV3-S and MobileNetV3-L. First, we tested the models with all the data. Subsequently, to test the robustness of all models, we used k-fold cross-validation. As shown in Table IV, our modified RWSE-MobileNetV3-L had good accuracy for C0, C4, and C5 in both tests. Table IV also shows that classifier categories can be chosen on the basis of the particular focus and the required detection speed. Fig. 11 shows that talking to a passenger is allowed in some situations, such as while waiting for a traffic signal; this is not distracted-driving behavior because the car is stationary. To address this issue, we offer an integrated dataset and a decision algorithm that consider the driving environment and driver state when issuing appropriate warnings.
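A minimal sketch of the 10-fold split described above, using scikit-learn's KFold; the shuffling and random seed are our assumptions.

from sklearn.model_selection import KFold
import numpy as np

num_images = 22424                  # State Farm training-set size
samples = np.arange(num_images)     # one index per training image
kfold = KFold(n_splits=10, shuffle=True, random_state=0)  # seed assumed

for fold, (train_idx, val_idx) in enumerate(kfold.split(samples)):
    # Train on 9 parts, validate on the held-out part; repeat for all 10 folds.
    print(f"fold {fold}: {len(train_idx)} train / {len(val_idx)} val")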

D. INTEGRATION DATASET FOR DRIVING SCENARIO AND DRIVER STATE
Both the driving scenario and the driver state should be considered by a driving-safety alert system. To integrate the situations inside and outside the vehicle, we propose an integrated dataset that includes the results of both our 3D-object detection and distracted-driving recognition approaches. Our proposed integrated dataset combines the driver behavior from the State Farm dataset and the results of KITTI 3D-object detection obtained via our modified KFPN. Our 3D-object detection results are classified by closest-object distance. The KITTI 3D object-detection testing dataset comprises 7518 samples. First, we separate the data in terms of distance. Samples with distances below 20 m are placed in a single group because drivers should keep a safe distance of more than 20 m, whereas samples with distances above 20 m are classified into different groups. After obtaining the results on the testing dataset, we manually selected 1771 samples, removing those in which objects were present but some of them were missed by the detector. Object distances, object categories, and BEV information are all included. For distracted-driving behavior recognition, we use all the data from the State Farm training dataset.
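The grouping by closest-object distance might look like the following sketch; the bin width above 20 m is an assumption, since the text only states that such samples fall into different groups.

def distance_group(closest_distance_m: float, bin_width_m: float = 10.0) -> str:
    """Group a detection result by its closest-object distance.

    Distances below 20 m share one group (an unsafe following distance);
    above 20 m, samples fall into separate bins (bin width assumed)."""
    if closest_distance_m < 20.0:
        return "below_20m"
    lower = 20.0 + bin_width_m * ((closest_distance_m - 20.0) // bin_width_m)
    return f"{lower:.0f}m_to_{lower + bin_width_m:.0f}m"

print(distance_group(12.3))   # -> below_20m
print(distance_group(34.0))   # -> 30m_to_40m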

E. SAFETY-LEVEL CLASSIFICATION WITH DECISION ALGORITHM
As discussed in Section I, considering both the driving scenario and the driver state is important for determining the driving safety level. We used our proposed integrated dataset described in Section IV-D for an experiment on differentiating driving safety levels. Our proposed algorithm was designed to comply with different situations. For calculating the TTC, the velocity and length of the leading vehicle are required. However, the KITTI 3D object-detection dataset does not contain vehicle motion or length information, and some of the objects may not be vehicles. Hence, we assumed that all the relative objects have a speed of 0 km/h and replaced the position and length of the leading vehicle with the obstacle distance D. Fig. 12 shows a selection of the experimental results. Our safety-level decision yields three results: safe driving, warning, and dangerous. Fig. 12(a) shows the result when talking to a passenger (C9). When the velocity of the vehicle is 0, the TTC is infinite, which means the driver is waiting for pedestrians to cross the road; talking to a passenger is then allowed. For some behaviors, we define stricter TTC standards to prevent collisions. As shown in Fig. 12(b), we raised the TTC standard to ensure driving safety; this situation is classified as a warning. The condition shown in Fig. 12(c) is considered dangerous even though it complies with the standard TTC.
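Under these assumptions, with all relative objects stationary and the obstacle distance D standing in for the leading vehicle's position and length, equation (1) reduces to the following form; this reduction is our reading of the text rather than an equation given by the authors:

$$\mathrm{TTC} = \frac{D}{V_F},$$

so a stationary vehicle ($V_F = 0$ km/h) yields an infinite TTC, which is why the situation in Fig. 12(a) is classified as safe.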

V. FUTURE WORK AND CONCLUSION
In this study, we propose a simple framework for a driving warning system that combines the vehicle state and driver behavior. In real applications, several additional aspects should be considered, the environment being an important one. For example, highways and urban streets have different speed limits that influence driving safety. Our system does not consider the driving environment, which can lead to misjudgment when the TTC and distracted behavior are compliant but the vehicle is speeding for its environment. The directions, lane-line positions, and conduct of nearby vehicles are also important for driving safety. Our system cannot predict the driving behavior of surrounding vehicles; accidents can easily occur when the TTC and distracted-driving behaviors are compliant but a surrounding vehicle suddenly brakes. The weather is another important factor that influences driving safety, which our system also does not consider. For example, when the road is covered with ice, the grip of the tires decreases. The TTC must also be adjusted for different distracted-driving behaviors. As mentioned in the introduction, the passenger is another influencing factor. We only consider drivers conversing with passengers, but other behaviors, such as taking care of children, must also be considered. A comprehensive driving warning system should consider a wide range of factors; the system proposed in this study considers only the TTC and driver behavior, which is insufficient. Overall, our TTC and driver-behavior definitions are quite rudimentary. It was difficult to find studies that define TTC thresholds during distracted-driving behavior. Therefore, the TTC restrictions were set according to our intuition, which may or may not be correct. A more stringent TTC limit should be defined for driving while engaged in a secondary task. In our experiment, we used the open State Farm dataset for distracted-driving behavior recognition. The State Farm dataset has only 10 categories; in real situations, drivers can engage in other behaviors. For distracted-driving behavior recognition, we should thus consider more types of distracted-driving behavior that influence driving safety.
Our proposed system only considers 10 distracted-driving behaviors. When a driver's behavior falls outside these 10 categories, our system may misjudge the situation. Our system also has some limitations in detecting 3D objects: if objects are not detected, it will easily make an erroneous scenario judgment. The vehicle speed should be below 81 km/h to ensure that our system, running on a powerful GPU, can detect objects in time; real-time detection on embedded systems remains a problem. At the same time, our proposed framework is simple to implement. We used two open datasets, i.e., the KITTI 3D object-detection dataset and the State Farm dataset, for 3D-object detection and distracted-driving behavior recognition, respectively. We expect that future research on driving warning systems will be more comprehensive and applicable to real-world driving scenarios.