Risky Traffic Situation Detection and Classification Using Smartphones

Behind many traffic accidents, there are more frequent minor incidents (risky traffic situations) that may lead to severe accidents. Analyzing such minor incidents effectively reduces accidents, but the challenge is to design a method to collect and analyze such incident information. In this paper, we propose a novel platform that aggregates behavioral data from pedestrians and drivers using their smartphones and recognizes risky traffic situations from the aggregated data. We design a two-stage approach where the smartphones of pedestrians and vehicles act as local anomaly detectors for triggering the event detector and classifier in the post-stage at the cloud server to suppress the processing and communication overhead. We also introduce an unsupervised learning system to cope with unseen risky situations enabled by joint utilization of the autoencoder-based anomaly detector and the risky situation classifier. The evaluation is conducted through both simulation and real experiments. The simulation result shows the risky situation detector achieves an F-measure of 0.89. We also collected real data at a car driving course to evaluate the risky situation classifier. From the results, we have confirmed that the proposed method succeeded in classifying three risky traffic situations involving pedestrians and/or vehicles with an accuracy of 89.3%.


I. INTRODUCTION
The review of this article was arranged by Associate Editor Margarida Coelho.
DUE TO the recent progress of safe-driving support technologies, the number of traffic accidents in Japan has decreased. However, approximately 300,000 traffic accidents still occur annually [1]. Behind the traffic accidents that are officially reported and recorded, it is well known that there are a number of unrecorded risky traffic situations, which are significant signs of future serious accidents in the same or similar situations [2]. In practice, the detection, collection, and analysis of such situations are not straightforward. Therefore, effort should be made to collect and analyze data on these situations, which would be useful for driver education (by informing drivers about typical risky situations), improving the visibility of traffic signs, deciding installation locations for road-side traffic mirrors, issuing warnings via car navigation systems, etc.
For example, in Japan, there exists a near-miss database [3] that collects videos from drive recorders installed in business vehicles such as taxis. Those drive recorders can detect unusual stops (e.g., severe deceleration) of vehicles by built-in acceleration sensors. After the detection, the video clips of several seconds before and after the events are stored in the local storage. Operators manually classify videos and store them in the database if recognized as risky traffic situations. However, it is reported that about 70% of such videos are false positives such as deceleration due to bumps. Therefore, a considerable amount of human resources is required in data selection.
Besides, drive recorders do not always capture all the scenes, as they record only the front views. Risky traffic situations often occur due to pedestrians' unusual behaviors (e.g., a sudden appearance of pedestrians from drivers' blind spots). However, from the recorded scenes, the pedestrians' trajectories are unknown, and a deep understanding of the cause behind the risky traffic situations is impossible. There are more complicated cases where multiple entities (vehicles, bikes, and pedestrians) relate to each other to cause risky traffic situations while the drive recorders may capture only a part of the scenes.
There is also a requirement for the communication infrastructure if we rely on crowdsourcing mechanisms to collect data. People tend to avoid consuming their communication resources due to battery limitations, monetary costs, and so on. Therefore, collecting video clips from crowds is not a promising solution. Also, most current consumer-level drive recorders do not have communication functions. To collect a certain amount of data widely from crowds, a more lightweight way that naturally encourages people to join the system is preferable.
We should consider not only risky traffic situations that have actually occurred but also the many traffic situations with potentially high risks. Examples of such situations are (i) pedestrians often aggressively jaywalk even in heavy traffic, and (ii) some drivers with bad manners drive through narrow community roads at high speeds to avoid traffic jams even if many children walk along such community roads to their schools. To realize a safe and secure traffic environment for pedestrians and vehicles, detecting and analyzing such traffic situations with potential risks is important. Surveillance cameras are widely used to monitor traffic, but they are limited in coverage [4]. Therefore, we leverage smartphones as ubiquitous sensors for wide coverage at low cost.
This paper introduces a smartphone-based risky traffic situation detection method and a convolutional neural network (CNN) model for situation classification. Our method enables understanding the various risky traffic situations based on continuous sensing via pedestrians' and drivers' smartphones. First, a particular behavior detected by a smartphone is used as a trigger for further analysis of the situation by the anomaly detection model using an autoencoder with surrounding smartphones' behavior and the location map (street and intersection structures). Then, we collect each smartphone's location information and inertial sensor data via cellular networks to a cloud server. If a risky traffic situation is detected by the anomaly detection model, the situation is classified into one of the predefined classes. As discussed, these situations have not been recognized unless they lead to accidents. We collected data from three risky traffic situations in the real world to build the classifier. Moreover, we intentionally reproduce risky traffic situations in a simulation environment to build a dataset containing many risky traffic situations so that we can detect a variety of risky situations. Thus, the proposed method enables us to quantitatively understand various traffic situations related to traffic safety and traffic manners.
We have collected data from a detailed traffic simulation and achieved an F-measure of 0.89 for anomaly detection. In addition, we have collected data in a dedicated field in Kobe with 20 volunteers and achieved an accuracy of 89.3% for the classification of three risky traffic situations.
Our contributions are summarized below.
• We design a novel architecture to recognize risky traffic situations. The smartphones of pedestrians and vehicles act as local anomaly detectors for triggering the event detector and classifier in the post-stage at the cloud server to suppress the processing and communication overhead. We also introduce an unsupervised learning system to cope with unseen risky situations. This is enabled by the joint utilization of the autoencoder-based anomaly detector and the risky situation classifier. This architecture enables us to recognize risky traffic situations which are difficult to understand by using video analysis.
• We have confirmed the effectiveness of the proposed method through both simulations and real experiments. Specifically, we have leveraged a realistic traffic simulator (VISSIM) to generate a training/testing dataset for the anomaly detector. Also, we conducted a real data collection experiment in a car driving school. Through the experiments using both datasets, we have shown that the detection and classification performance (F-measure) is around 0.89.

II. RELATED WORK

A. NEAR MISS DATABASE
To prevent or reduce traffic accidents, a huge effort has been made to collect various traffic information [5], [6].
For instance, the International Road Traffic and Accident Database (IRTAD) [7] provides road crash data collected in more than 40 countries. Also, the National Highway Traffic Safety Administration (NHTSA) [8], a part of the U.S. Department of Transportation, has published fatal accident statistics and accident cases. The Traffic Accident Analysis Center [9] analyzes the traffic accidents that occurred in Japan and their causes to reveal the background and causal association of traffic accidents. Also, Girotto et al. recommend collecting data on dangerous situations that involve no damage or injuries as well as data on traffic accidents [10]. Tokyo University of Agriculture and Technology has been building and analyzing a database of videos taken by drive recorders installed in taxis since 2005, and currently has classified over 140,000 videos into several categories [3].
The SAFETY MAP project [11] collects acceleration data from vehicles when hard braking, and derives a map based on the database by sharing locations where customers feel safe or dangerous. Such a near-miss database is expected to contribute to improving traffic safety awareness and reviews of road designs for preventing potential accidents. ICEBIKE, a bicycle manufacturer, encourages cyclists to ride safely by indicating the danger of a traffic accident [12]. Sanders [13] investigated how the experiences of near misses affect the perceived and actual traffic risk. They found the usefulness of improving safety awareness and suggested reviewing road design based on the near-miss database. Jia et al. [14] investigated the correlation between open-source point-of-interest (POI) data and traffic accidents. This study showed that the three attributes of hospitals, banks, and residential areas significantly affect traffic accidents. Hannah et al. [15] designed a model that provides a safe walking route by calculating the risk of each sidewalk based on traffic accident statistics and map information. They also demonstrated that the proposed model accurately estimates pedestrian casualties using real-world city data.
To reduce human labor for analyzing risky traffic situations, we leverage smartphones for data collection and provide a machine learning-based approach to automate the detection and analysis processes.

B. ABNORMAL DETECTION OF PEDESTRIAN AND VEHICLE BEHAVIORS
To prevent accidents, several methods have been proposed to detect abnormal behaviors that may lead to serious situations [16], [17]. Zhou et al. [18] proposed a method to detect abnormal behaviors in crowded scenes based on a trajectory model. Rasouli et al. [19] proposed a collision avoidance method that analyzes pedestrians' behavior observed by cameras when pedestrians walk at crosswalks. This method models interactions between drivers and pedestrians in several locations and under different weather conditions. It also identifies factors that influence pedestrian decision-making at the point of crossing. Li et al. [20] proposed a joint detector of temporal and spatial anomalies in crowded scenes. Zhang et al. [21] collected sensor data such as speed, engine rotation, and steering from more than 29,000 vehicles. They also made a driving model based on the data and estimated anomaly states which are not observed frequently. Tselentis and Papadimitriou [22] reviewed Artificial Intelligence and Machine Learning approaches for recognizing driver profiles and driving patterns such as aggressiveness, mainly based on speed and acceleration. Hu et al. [23] proposed a model to detect abnormal driving behavior due to fatigue, drunk driving, and reckless driving by analyzing normalized driving behavior. Yang et al. [24] proposed a framework for classifying the safety level of driving behavior in real time based on machine learning using data collected by a driving simulator. Aloul et al. [25] proposed a method that can detect collisions by acceleration sensors in a smartphone mounted in a car. It also enables notification of collisions to a nearby police station or hospital so that the response time for emergency calls is reduced. While acceleration sensors are useful for understanding driving behavior, they are susceptible to vehicle vibration, e.g., on bumpy roads. Therefore, Hamdy et al. [26] proposed a method to accurately distinguish between normal driving behavior and road abnormalities by using data obtained from smartphone sensors. However, because these methods collect data from either vehicles or pedestrians alone, some situations remain hard to identify.
Compared with the existing approaches, we focus on a more detailed analysis of risky traffic situations by aggregating sensor data from pedestrians and drivers via smartphones.This design enables the quantification of risks in traffic situations where the behaviors of pedestrians and vehicles influence each other.

C. CAMERA-BASED RISKY TRAFFIC SITUATION DETECTION
Several practical studies on risky traffic situation detection use drive recorders [27], [28]. Kodaira et al. [29] trained an image recognition model by deep learning based on the videos of the near-miss database collected by Tokyo University of Agriculture and Technology (TUAT), and estimated the risk of the risky traffic situations in the videos. Kataoka et al. [30] and Suzuki et al. [31] proposed a traffic accident detection and anticipation model and created a near-miss incident database. Chan et al. [32] also proposed accident anticipation methods using dashcam videos. They designed a Dynamic-Spatial-Attention (DSA) Recurrent Neural Network (RNN) and anticipated accidents about 2 seconds before their occurrence with 80% recall and 56.14% precision. By using CCTV (Closed-circuit Television), Zhang and Abdel-Aty [33] built a model to predict pedestrian conflict at intersections based on machine learning. They used PET (Post Encroachment Time) and TTC (Time To Collision) as conflict indicators. On the other hand, vision-based methods may fail to recognize the situation accurately due to bad weather or lack of lighting [34]. Moreover, visibility-related weather hazards significantly impact drivers because of decreased vision and increased crash risk [35], so these are exactly the conditions under which detection matters. Therefore, a risky traffic situation detection method that is not affected by weather or lighting is required. For this purpose, our method leverages smartphone sensors for the detection of risky traffic situations.

III. OVERVIEW AND SYSTEM DESIGN

A. SYSTEM DESIGN PRINCIPLE
The participants of our system are smartphones and a server. In the recent edge-computing paradigm, privacy-sensitive data should be processed on the edge side. On the other hand, risky traffic situation detection requires data from multiple users. Therefore, each smartphone sends the server a message to trigger detection. The server starts detection and classification of risky traffic situations when a trigger message arrives. A simple approach for this purpose is to build a single CNN. However, depending on user demands, we need to update (e.g., retrain, replace, etc.) the detection and classification models. Therefore, we design a local anomaly behavior detector, a risky traffic situation detector, and a risky situation classifier separately. The local anomaly behavior detector is simple and computationally lightweight, while the risky traffic situation detector detects anomalies from the aggregated data. We use the risky traffic situation detector in addition to the risky situation classifier to detect signs of unknown risky traffic situations that are not included (or not labeled) in the training data. Then, the detected anomaly data is fed into the risky situation classifier, which is built by supervised learning on risky traffic situations manually classified by humans.

B. SYSTEM OVERVIEW
The proposed system architecture is illustrated in Figure 1. We assume users (i.e., pedestrians and drivers) have their own smartphones. These smartphones collect the positions and inertial sensor data (i.e., 3-axis acceleration and angular velocity). We have developed a dedicated Android application that collects acceleration, angular velocity, geomagnetism, azimuth, and GPS location.
We assume a pedestrian carries a smartphone in his/her pants pocket. We also assume the phone is placed upside down with the screen facing the body, as shown in Fig. 2. For vehicles, we assume a smartphone is fixed so that one of the 3 axes of the accelerometer is aligned with the vehicle travel direction. For the axes of the gyro and the accelerometer, given the vehicle travel direction, we apply a transformation from the device coordinate system to the earth coordinate system as shown in Fig. 2. We note that people may place their smartphones at different positions, which leads to different phone orientations. In such cases, we may apply methods to estimate phone orientation by using inertial sensors [36]. Another possibility is to apply the same transformation from the device coordinate system to the earth coordinate system as for vehicles' smartphones, assuming that the phone's orientation remains stable. Since our goal is to design a novel concept to recognize risky traffic situations, we leave this problem out of the scope of this paper.
A situation is time-series data containing the coordinates, accelerations, speeds, and orientations of the smartphones that reside in a predefined cell. A situation is said to be risky if at least one smartphone detects its owner's abnormal behavior and if the situation is not seen in normal, compliant pedestrian/vehicle movement and locations (i.e., normal traffic). Risky situations are classified into multiple classes, such as "Pedestrians rush out into an intersection, and a vehicle stops suddenly" and "A vehicle turning at an intersection is about to collide with a vehicle going straight". Our goal is to detect every risky situation using smartphone sensor data and classify it into a corresponding class.
Our approach detects and classifies risky traffic situations as follows. Firstly, each smartphone continuously monitors its own sensor data to judge whether or not it is in a risky situation. This is done by a local anomaly behavior detector, which is implemented and run in the smartphone's application. If the smartphone, say i, is in a risky situation, it sends a notification to the cloud server to activate a risky situation detector. The detector communicates with the smartphones (each denoted by j) in the same cell as smartphone i. The risky situation detector takes the coordinates, accelerations, speeds, and orientations of the smartphones as inputs, and decides whether the situation is risky or not. Once the situation is regarded as risky, a risky situation classifier is activated to classify the situation into one of the predefined classes.
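The control flow of this two-stage design can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the function names `on_smartphone` and `on_server`, and the callables passed to them, are hypothetical placeholders for the local anomaly behavior detector, the risky situation detector, and the risky situation classifier.

```python
def on_smartphone(local_anomaly_detected, send_trigger):
    """Stage 1 (on device): notify the server only when a local anomaly is seen."""
    if local_anomaly_detected:
        send_trigger()  # hypothetical callback that sends the trigger message
        return True
    return False


def on_server(cell_window, is_risky, classify):
    """Stage 2 (cloud): run the detector first, then the classifier if needed.

    cell_window: data gathered from the smartphones in the triggering cell.
    is_risky:    placeholder for the risky situation detector.
    classify:    placeholder for the risky situation classifier.
    """
    if not is_risky(cell_window):
        return None  # the trigger was a false alarm; nothing to classify
    return classify(cell_window)
```

Because the server-side models are passed in as separate callables, either one could be retrained or replaced without touching the other, which mirrors the separation argued for in the design principle.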
Considering the resource-constrained environment on each smartphone, we model the local anomaly behavior detector as a decision tree using threshold values. The risky situation detector is implemented as an autoencoder since it is much easier to collect normal traffic data than risky situation data. We train the autoencoder using the normal traffic data obtained by a realistic traffic simulator. The risky situation classifier is modeled by using a CNN, trained with the data obtained by imitating risky situations in the real world. Since such data is hard to obtain, we only have a limited dataset collected in a driving school. The data may not be sufficient for distinguishing normal situations from risky ones. Nevertheless, we can still use it to classify risky situations into classes. This is the reason why we have a two-layer architecture of a detector and a classifier.

C. LOCAL ANOMALY BEHAVIOR DETECTOR
The local anomaly behavior detector can be realized using the data from various sensors. In this paper, we take a simple, rule-based approach using thresholds. This section explains how we build the rules for pedestrians and vehicles.
For pedestrians, there are many anomalous behaviors that may lead to risky situations. We use one of them, suddenly start running, to illustrate how we design the local anomaly behavior detector. Firstly, we identify the "primitive contexts" that constitute the behavior, namely stand still, walk, and run. A two-second segment of the sensor data is classified into one of these three primitive contexts based on the standard deviation $\sigma_y$ of the acceleration in the y-axis. The thresholds for the classification are defined empirically as below, referring to our dataset obtained in the field experiment.
\[
\begin{cases}
\text{stand still} & \sigma_y < 2.7925 \\
\text{walk} & 2.7925 \le \sigma_y < 30.5976 \\
\text{run} & 30.5976 \le \sigma_y
\end{cases}
\tag{1}
\]
We iterate the above classification every 0.5 seconds to obtain the sequence of the primitive contexts. Then, for the sequence of the primitive contexts in the last 3 seconds (6 primitives), we detect the local anomaly behavior, i.e., suddenly start running, if we find run after stand still or walk longer than 2 seconds. We perform the detection every second.
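The pedestrian rule can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions rather than the authors' code: the thresholds come from Eq. (1), but the use of the population standard deviation, the function names, and the reading of "longer than 2 seconds" as "at least four consecutive 0.5-second primitives" are our assumptions.

```python
from statistics import pstdev

# Thresholds on the y-axis standard deviation, taken from Eq. (1).
STAND_WALK = 2.7925
WALK_RUN = 30.5976


def primitive_context(accel_y):
    """Classify one two-second window of y-axis acceleration samples."""
    sigma_y = pstdev(accel_y)  # assumption: population standard deviation
    if sigma_y < STAND_WALK:
        return "stand still"
    if sigma_y < WALK_RUN:
        return "walk"
    return "run"


def suddenly_start_running(primitives, step_s=0.5, min_calm_s=2.0):
    """Return True if 'run' follows a long enough stand still / walk phase.

    primitives: primitive contexts of the last 3 seconds (6 entries at 0.5 s).
    Assumption: a calm phase counts once it spans min_calm_s seconds or more.
    """
    calm = 0  # consecutive non-running primitives seen so far
    for p in primitives:
        if p == "run":
            if calm * step_s >= min_calm_s:
                return True
            calm = 0
        else:
            calm += 1
    return False
```

For example, four walk primitives followed by two run primitives would raise the trigger, whereas a sequence that is already running throughout would not.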
For vehicles, we define their primitive contexts by the following three factors: the acceleration in the forward-backward direction (i.e., acceleration, stable, deceleration); horizontal behavior (i.e., left turn, straight, right turn); and speed (i.e., stop, low speed, high speed). The x-axis acceleration $a_x$ is used for the forward-backward direction, while the z-axis angular velocity $\omega_z$ is used for the horizontal behavior. For the speed classification, we use the speed $v_G$ reported by GPS. To obtain $a_x$, $\omega_z$, and $v_G$, we use the averages of the sensor readings over 2 seconds. Similarly to the primitive context recognition of pedestrians, we first recognize the above primitive contexts of vehicles as below.
\[
\begin{cases}
\text{sudden acceleration} & a_x < -0.5\ \mathrm{m/s^2} \\
\text{stable} & -0.5\ \mathrm{m/s^2} \le a_x < 0.5\ \mathrm{m/s^2} \\
\text{sudden deceleration} & 0.5\ \mathrm{m/s^2} \le a_x
\end{cases}
\tag{2}
\]
We iterate the above primitive context recognition every 0.5 seconds to obtain the time series of the primitive contexts. Finally, we detect local anomaly behavior of vehicles based on pre-defined rules for a time series of primitive contexts. In this paper, we define the local anomaly behaviors of vehicles as continuously fast, sudden stop, and sudden avoidance. The local anomaly behavior detector judges whether there is an anomaly or not every second by using the time series of the primitive contexts for a time window of 4 seconds. Continuously fast is detected if all of the primitive contexts are high speed. We detect sudden stop if sudden deceleration is observed, followed by the stop primitive context. Also, we detect sudden avoidance when a sudden right (left) turn occurs after a sudden left (right) turn.
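The vehicle rules can be sketched in the same style. The acceleration thresholds below follow Eq. (2) exactly as printed; the thresholds for the turn and speed primitives are not given in the text, so this sketch takes those primitives as already classified labels. All function names are hypothetical, and treating "followed by" and "after" as "anywhere later in the 4-second window" is our assumption.

```python
def accel_context(a_x):
    """Eq. (2) as printed; a_x is the 2-second mean x-axis acceleration in m/s^2."""
    if a_x < -0.5:
        return "sudden acceleration"
    if a_x < 0.5:
        return "stable"
    return "sudden deceleration"


def continuously_fast(speed_ctx):
    """All speed primitives in the window are 'high speed'."""
    return len(speed_ctx) > 0 and all(s == "high speed" for s in speed_ctx)


def sudden_stop(accel_ctx, speed_ctx):
    """A 'sudden deceleration' primitive is followed later by a 'stop' primitive."""
    for i, a in enumerate(accel_ctx):
        if a == "sudden deceleration" and "stop" in speed_ctx[i + 1:]:
            return True
    return False


def sudden_avoidance(turn_ctx):
    """A turn in one direction is followed by a turn in the opposite direction."""
    turns = [t for t in turn_ctx if t != "straight"]
    return any(a != b for a, b in zip(turns, turns[1:]))


def detect_vehicle_anomalies(accel_ctx, turn_ctx, speed_ctx):
    """Evaluate all three rules on one window (8 primitives at 0.5 s spacing)."""
    found = []
    if continuously_fast(speed_ctx):
        found.append("continuously fast")
    if sudden_stop(accel_ctx, speed_ctx):
        found.append("sudden stop")
    if sudden_avoidance(turn_ctx):
        found.append("sudden avoidance")
    return found
```

Keeping each rule as its own small predicate makes it straightforward to add further local anomaly behaviors later without changing the existing ones.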
We have confirmed through a real experiment that our local anomaly behavior detector successfully detects anomalous behaviors. Figure 3 shows an example of acceleration data measured by the smartphones of a pedestrian and a vehicle when the pedestrian starts running suddenly and the vehicle stops suddenly. In Figure 3(a), accel_x, accel_y and accel_z show acceleration values. The dashed red line in Figure 3 indicates that a risky traffic situation occurs at around 15:23:23, since a pedestrian started to rush out at 15:23:21 in this trial. There are significant changes in the acceleration on all axes in Figure 3(a). The standard deviations of the accelerations on all axes are relatively high. Accel_y is specifically affected by the movement of rushing out because it is on the axis along the user's body.
On the other hand, the vehicle, responding to the pedestrian rushing out, stops suddenly at around 15:23:22. At that time, accel_y and accel_z change significantly, as shown in Figure 3(b). When the vehicle decelerates, we observe around −1.0 G in the y-acceleration, which is along the direction of travel. After this deceleration, the vehicle came to a complete stop. As we can see from Fig. 3, there are clear changes in the acceleration that are easily detected by the local anomaly behavior detector.

D. RISKY SITUATION DETECTOR
When a local anomaly behavior is detected by a smartphone, the sensor data from the pedestrians and vehicles in the proximity of that smartphone is further analyzed by an autoencoder [37] to detect the risky traffic situation. The autoencoder is a neural network model for dimensionality reduction that uses the same data for input and output during training. The autoencoder network is trained to encode and decode the input data through a hidden layer with a reduced number of units. As a result, the autoencoder can successfully decode data that is similar to the training data, but when restoring other data, the error from the input data becomes large. A large error means that the data contains features not found in the training data. Based on this characteristic, the autoencoder is used for anomaly detection [38], [39]. Thus, we use the autoencoder for risky traffic situation detection.
In order to express the occurrence of the risky traffic situations in the surrounding space as a multidimensional array, our proposed system divides a field into 100 m square cells and collects data about pedestrians and vehicles for each cell. The vehicles send their positions, speeds, and accelerations. Similarly, the pedestrians send their positions and speeds. Each cell has an array of 13 feature values, including the number of pedestrians and vehicles and commonly used statistics of the speeds and accelerations of the pedestrians and vehicles. The above data collection process is conducted every second. A multidimensional array is generated from the arrays obtained in the latest 4 seconds in each cell. Furthermore, the multidimensional arrays in all cells are merged into another multidimensional array. This multidimensional array is given to the autoencoder as an input for the risky traffic situation detection. The autoencoder is composed of 13 layers, where layers 1-9 are the encoder part and layers 10-13 are the decoder part, as shown in Figure 4.
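The cell assignment and the 4-second window can be illustrated with a short sketch. It assumes positions are already expressed in metres in a local planar coordinate system, which the text does not state, and it does not attempt to reproduce the 13 feature values, since they are not listed individually; `cell_index`, `group_by_cell`, and `CellWindow` are hypothetical names.

```python
from collections import defaultdict, deque

CELL_SIZE_M = 100.0  # side length of a square cell
WINDOW_S = 4         # number of one-second feature vectors kept per cell


def cell_index(x_m, y_m):
    """Map a planar position in metres to the index of its 100 m cell."""
    return (int(x_m // CELL_SIZE_M), int(y_m // CELL_SIZE_M))


def group_by_cell(reports):
    """Group per-second reports, given as (x_m, y_m, payload), by cell."""
    cells = defaultdict(list)
    for x_m, y_m, payload in reports:
        cells[cell_index(x_m, y_m)].append(payload)
    return dict(cells)


class CellWindow:
    """Keeps the feature vectors of the latest WINDOW_S seconds for one cell."""

    def __init__(self):
        self._rows = deque(maxlen=WINDOW_S)  # the oldest row drops out automatically

    def push(self, features):
        """Append the feature vector computed for the current second."""
        self._rows.append(list(features))

    def ready(self):
        """True once a full 4-second window is available."""
        return len(self._rows) == WINDOW_S

    def as_array(self):
        """Return a WINDOW_S x len(features) nested list, oldest row first."""
        return [row[:] for row in self._rows]
```

With 13 features per second, `as_array()` yields a 4 × 13 array per cell; merging these arrays across all cells would give the input described above.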
The autoencoder encodes and decodes the observation data, and calculates the mean squared error (MSE) between its output and input data as an abnormality score. The risky traffic situation is detected when the score exceeds a predefined threshold. We can adjust the threshold depending on the application requirements. Obviously, there is a tradeoff between false positives and false negatives. For example, if a user focuses more on finding unknown risky situations, they may set a small threshold, allowing more false positives. In this paper, we define the threshold as the sum of the mean and standard deviation of the abnormality scores of safe environments.
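The scoring and thresholding step can be sketched independently of the network itself. In the sketch below, the reconstruction is passed in as a plain list, the function names are hypothetical, and the choice of the population standard deviation is our assumption, since the text does not say which estimator is used.

```python
from statistics import mean, pstdev


def abnormality_score(original, reconstructed):
    """Mean squared error between the input and the autoencoder's output."""
    return mean([(a - b) ** 2 for a, b in zip(original, reconstructed)])


def fit_threshold(safe_scores):
    """Threshold = mean + standard deviation of the scores of safe situations."""
    return mean(safe_scores) + pstdev(safe_scores)


def is_risky(score, threshold):
    """A situation is flagged when its score exceeds the threshold."""
    return score > threshold
```

Lowering the value returned by `fit_threshold` (for instance by dropping the standard deviation term) would flag more situations, trading extra false positives for fewer missed unknown situations.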

E. RISKY SITUATION CLASSIFIER
After the autoencoder detects a risky traffic situation, we use a CNN to classify the risky traffic situation. We collect the sensor data from pedestrians and vehicles related to the triggering smartphone to understand the situation further. In addition, we also collect the sensor data of the past few seconds so that we can analyze why the situation happened. The sensor data is composed of two vectors, the velocity and the angular velocity measured by smartphones. In our prototype, the 50 m square area around the triggering smartphone is divided into 5×5 grid cells. We collect the sensor data from pedestrians and vehicles in each cell and calculate statistical values, i.e., the mean, standard deviation, variance, minimum, maximum, median, skewness, norm, frequency, and amplitude, from the sensor data collected in the past 4 seconds. Figure 5 illustrates the CNN structure of the risky situation classifier. A 5×5 array generated from the cells is given to the CNN as input. The CNN uses the softmax function to estimate the traffic situations.
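The grid layout and part of the per-cell statistics can be sketched as follows. This is an illustration under assumptions: the triggering smartphone is taken to be at the centre of the 50 m area, offsets are in metres, and only a subset of the listed statistics is computed (skewness, frequency, and amplitude are omitted because their exact definitions are not given). The names `grid_cell` and `window_stats` are hypothetical.

```python
import math
from statistics import mean, median, pstdev, pvariance

AREA_M = 50.0            # side length of the area around the triggering phone
GRID = 5                 # the area is split into GRID x GRID cells
CELL_M = AREA_M / GRID   # each cell is 10 m x 10 m


def grid_cell(dx_m, dy_m):
    """Map an offset from the triggering smartphone to (row, col), or None.

    Assumption: the triggering smartphone sits at the centre of the area.
    """
    col = math.floor((dx_m + AREA_M / 2) / CELL_M)
    row = math.floor((dy_m + AREA_M / 2) / CELL_M)
    if 0 <= row < GRID and 0 <= col < GRID:
        return (row, col)
    return None  # the road user is outside the 50 m area


def window_stats(samples):
    """Subset of the listed statistics for one sensor channel over 4 seconds."""
    return {
        "mean": mean(samples),
        "std": pstdev(samples),        # assumption: population estimator
        "variance": pvariance(samples),
        "min": min(samples),
        "max": max(samples),
        "median": median(samples),
        "norm": math.sqrt(sum(s * s for s in samples)),
    }
```

The triggering smartphone itself falls into the centre cell, `grid_cell(0.0, 0.0) == (2, 2)`, and any road user more than 25 m away along either axis is ignored.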

IV. PROTOTYPE IMPLEMENTATION
Figure 6 illustrates the function layout of our prototype system. It consists of a smartphone application and a server application. In the prototype, we used a general-purpose server machine with Ubuntu OS. As shown in Figure 6, the server application consists of a sensor data storage, the risky situation detector, and the risky situation classifier.
The local anomaly behavior detector runs on each smartphone, monitoring anomalies from its own sensor data. When it detects an anomaly, it sends the server a message to trigger the risky situation detector, together with the sensor data, via cellular and/or Wi-Fi networks. The sensor data sent from a smartphone is received by Nginx/1.14.2 and delivered to Fluentd (td-agent 1.3.3). The sensor data received by Fluentd is sequentially stored in a MySQL database. For the implementation of the risky situation classifier, we used CUDA and PyTorch. When a risky traffic situation is detected, our system sends a caution to the corresponding users.
We note that our system does not necessarily need to operate in real time because our main goal is to understand various traffic situations with potential risks. Therefore, we can still collect sensor data from smartphones for offline analysis. To do so, smartphones need to record their sensor data in their local storage, which is later sent to the server for post analysis. This means that delays or packet drops in wireless communication do not affect the performance of anomaly detection and risky situation classification.

V. EVALUATION

A. DATA COLLECTION
Our goal is to detect and classify risky traffic situations, but collecting real data on such situations is difficult. Therefore, we have generated sensor data based on the movements of vehicles and pedestrians reproduced by a traffic simulator called VISSIM [40]. In addition, to demonstrate the performance of our risky situation classifier, we collected real data at a car driving course.

1) SIMULATION DATASET
Collecting a sufficient volume of data for various traffic situations solely through real-world field experiments is costly and challenging. Therefore, VISSIM [40], the detailed traffic simulator from PTV Group, is adopted to efficiently generate and collect equivalent data. If simulation data is used as training data, its difference from real-world data may influence the model's accuracy. For anomalies, the difference between simulation and the real world might not be negligible, since unusual behavior is generally difficult to model. A model trained on simulated anomalies therefore cannot be relied on to calculate the similarity to an unusual scenario. However, in an ideal environment where all road users strictly follow the traffic rules, the road users' behavior can be assumed to be sufficiently close to the behavior model in the simulated environment. Since the model can be applied to calculating differences from normal behavior, the simulation data can be used to train the anomaly detection model. The data collected from VISSIM is shown in Table 1. We assume that both pedestrians and vehicles have smartphones in the simulation. Through smartphones, we collect positions and speeds from pedestrians and positions, speeds, and accelerations from vehicles in the simulations.
VISSIM can create a traffic environment based on the aerial map of the actual road. Figure 7 illustrates an example of the intersection created by VISSIM. This intersection is designed with the arrangement and configuration settings of the road, sidewalk, traffic signal, traffic of the vehicles and pedestrians, and so on. We used several aerial images to model the intersections manually. In Figure 7(b), green and gray areas are the pedestrian-dedicated area and the roadway, respectively. The bar on the road, which is green or red, is the traffic signal. The white stripe on the road is the crosswalk. We suppose that vehicles drive only on roadways and pedestrians move only on pedestrian-dedicated areas and crosswalks.

2) SIMULATION SETTINGS
As a preliminary experiment, three traffic scenarios are defined: (i) safe traffic situation, (ii) a pedestrian rushes out into the road, and a vehicle stops suddenly (called a mixed traffic situation), and (iii) a vehicle invades a sidewalk in an intersection. A randomly selected 80% of the data from (i) safe traffic situation is used to train the anomaly detection model. The remaining 20% of the data from (i) safe traffic situation and all the data from the other two situations are used for the evaluation.
(i) Safe traffic situation: The training data for the anomaly detection model is generated by VISSIM. To collect the data for scenario (i), the ideal traffic situation is designed as shown in Figure 7, where all road users are well-mannered. 107,701 trials of sensor data were collected. The other situations are designed by modifying the settings of scenario (i).
(ii) Mixed traffic situation: 2,453 trials were conducted to reproduce scenario (ii). In this situation, pedestrians can also walk through the roadway outside the crosswalk, as shown in Figure 8. When pedestrians rush out into the roadway just in front of the vehicle, the vehicle decelerates suddenly and stops to avoid a collision. However, vehicles sometimes move through the pedestrians in the simulation because they cannot stop in front of them due to the short distance from the pedestrians. Such data is filtered out and not used.
(iii) Vehicle invades a sidewalk in an intersection: 2,374 trials were conducted for scenario (iii). The situation is designed as shown in Figure 9, where the vehicle invades a green area that is a pedestrian-dedicated area. Vehicles and pedestrians in this situation pass through each other if the collision occurs in the pedestrian-dedicated area.
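The data split described at the start of this subsection can be sketched as below. It is only an illustration: the trial objects, the fixed random seed, the function names, and the 0/1 label encoding are assumptions introduced here.

```python
import random


def split_safe_trials(trials, train_ratio=0.8, seed=0):
    """Randomly split the safe-traffic trials into training and hold-out parts."""
    shuffled = list(trials)
    random.Random(seed).shuffle(shuffled)  # fixed seed only for repeatability
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def build_eval_set(safe_holdout, mixed_trials, invasion_trials):
    """Evaluation set: held-out safe trials (label 0) plus all risky trials (label 1)."""
    labelled = [(t, 0) for t in safe_holdout]
    labelled += [(t, 1) for t in list(mixed_trials) + list(invasion_trials)]
    return labelled
```

Only the safe trials are ever used for training, which matches the unsupervised role of the autoencoder: the risky trials of scenarios (ii) and (iii) appear exclusively in the evaluation set.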

3) REAL-WORLD DATASET
An experiment was conducted at a car driving course as shown in Figure 10. In the experiment, two pedestrians and one vehicle were on the driving course. Each pedestrian carried a smartphone in his/her right pocket with the top of the phone facing down and the screen facing away from the body. The smartphone in the vehicle was fixed between the driver's seat and the passenger seat. The coordinate axes of the vehicle sensor data are calibrated based on the direction of gravity. The positive directions of the X-axis, Y-axis, and Z-axis are transformed into the vehicle's right, forward, and ceiling directions, respectively.
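One possible way to perform the gravity-based part of this calibration is sketched below. It is an assumption-laden illustration, not the procedure used in the experiment: it assumes that the averaged accelerometer reading of a stationary phone points toward the ceiling, and it only aligns the Z-axis. Gravity alone does not determine the heading, so separating right from forward would need extra information such as the direction of travel. The helper names are hypothetical.

```python
import math


def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply(r, v):
    """Apply a 3x3 rotation matrix to a 3-vector."""
    return [sum(r[i][k] * v[k] for k in range(3)) for i in range(3)]


def rotation_to_z_up(rest_reading):
    """Rotation that maps the at-rest accelerometer direction onto +Z (ceiling)."""
    norm = math.sqrt(sum(c * c for c in rest_reading))
    a = [c / norm for c in rest_reading]
    cos_t = a[2]  # cosine of the angle between a and (0, 0, 1)
    if cos_t < -1.0 + 1e-9:
        # Phone is exactly upside down: rotate 180 degrees about the X-axis.
        return [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]]
    v = (a[1], -a[0], 0.0)  # rotation axis: a x (0, 0, 1)
    k = [[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]]
    k2 = matmul(k, k)
    scale = 1.0 / (1.0 + cos_t)  # Rodrigues formula for aligning two unit vectors
    return [[(1.0 if i == j else 0.0) + k[i][j] + scale * k2[i][j]
             for j in range(3)] for i in range(3)]
```

Applying the returned matrix to every accelerometer sample expresses the data in a frame whose Z-axis points to the ceiling, regardless of how the phone is mounted.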
We collected data in four types of scenarios: a) a pedestrian starts running suddenly, b) a vehicle keeps driving at high speed, c) a pedestrian rushes out into the road and a vehicle stops suddenly (called a mixed traffic situation), and d) a pedestrian walks and a vehicle drives safely (called normal behavior). We note that scenarios a) and b) are risky traffic situations performed independently by either a pedestrian or a vehicle, while scenario c) represents a risky traffic situation with simultaneous behaviors performed by pedestrians and a vehicle in proximity. Figure 11 shows the data collection environment in scenario c). The ground truth of the occurrence of risky traffic situations is manually labeled by annotators based on the recorded video. The occurrence time of each situation is defined as the time at which the risky behavior of the pedestrians and vehicles is observed. We also collected data in safe situations as scenario d). We explain the details of each scenario below.

a) Pedestrian starts running suddenly: At first, a pedestrian confirmed that no vehicle was approaching and started walking normally from the sidewalk to the pedestrian crossing. Then, the pedestrian started running suddenly upon reaching the middle of the pedestrian crossing. After crossing the road, the pedestrian slowed down and resumed walking normally. After a while, the pedestrian stopped walking to finish the trial. We note that there was no vehicle around the pedestrian during each trial. In total, 155 trials were conducted in this scenario.

b) Vehicle keeps driving at high speed: A driver confirmed that there were no pedestrians nearby and kept the driving speed above 30 km/h. In this scenario, we recorded the sensor data continuously and divided the recorded data into 137 trials. We note that there was no pedestrian around the vehicle during the trials.

c) Mixed traffic situation: In this scenario, two pedestrians started walking from a sidewalk toward the roadway. When they approached the roadway, they suddenly started running into the roadway and tried to cross it, as shown in Figure 11. However, the pedestrians noticed that a vehicle was approaching them, and they quickly stopped near the center line of the roadway. At the same time, the driver also noticed that the pedestrians were rushing out and stopped the vehicle suddenly with hard braking. After that, the pedestrians confirmed that the vehicle had stopped and resumed crossing the roadway. After the driver confirmed that the pedestrians had finished crossing, the vehicle resumed driving. In this scenario, the pedestrians and the vehicle performed risky behaviors simultaneously. In total, 286 trials were conducted in this scenario.

d) Normal behavior: 231 trials were conducted for normal behavior. Pedestrians walked normally, and a vehicle drove below the speed limit.

B. EVALUATION FOR RISKY SITUATION DETECTION
The details of the data used for evaluation are shown in Table 2. We collected data from three scenarios: (i) the safe traffic situation, (ii) the mixed traffic situation, and (iii) a vehicle invading a sidewalk in an intersection. We used 80% of the data from (i) the safe traffic situation for training, and the remaining data, including the other scenarios, for evaluation.
The histogram of anomaly scores is shown in Figure 12. The scores are calculated by the risky situation detector using the autoencoder. The test data is extracted from the normal and anomaly data in Table 2. It consists of 4,000 normal samples from (i) the safe traffic situation and 2,000 risky-situation samples from (ii) the mixed traffic situation and (iii) the vehicle invading a sidewalk in an intersection. For better visualization, we exclude risky situations with anomaly scores of 1.2 or more from Figure 12. The histogram shows a clear difference between the anomaly score distributions of normal and anomaly data, which indicates that the proposed method can accurately detect risky situations. Based on the anomaly scores calculated by the autoencoder, anomaly data is detected using a threshold defined as the sum of the mean and the standard deviation of the anomaly scores of the training data.
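The thresholding rule above can be written down in a few lines. The sketch below is illustrative: the function names are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say whether the population or sample form is used.

```python
import statistics


def anomaly_threshold(train_scores):
    """Threshold = mean + standard deviation of the training anomaly scores."""
    # pstdev (population form) is an assumption; stdev would differ slightly.
    return statistics.mean(train_scores) + statistics.pstdev(train_scores)


def is_risky(score, threshold):
    """Flag a time slot as risky when its anomaly score exceeds the threshold."""
    return score > threshold
```

For example, a detector would compute the threshold once from the reconstruction errors of the training data and then apply `is_risky` to each new score.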
To evaluate the performance of the risky situation detector, the confusion matrix is shown in Table 3. The result shows that the false positive and false negative rates are 10% and 21%, respectively.
The anomaly detection rate in each situation is also shown in Table 4. The detection rate is the recall computed for the anomaly data and for the normal data. For this evaluation, we used all of the anomaly data in Table 2 and the 20% of the normal data that was not used for training. The result shows that the proposed method accurately detects the anomalies without requiring them to be defined in advance. The false positive rate is low, at 9.2%.
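For reference, the rates and the F-measure discussed above follow from the four confusion-matrix counts as sketched below. The counts in the usage note are made up for illustration and are not the values of Table 3; the function name is hypothetical.

```python
def detection_metrics(tp, fp, fn, tn):
    """Derive standard binary-detection metrics from confusion-matrix counts.

    tp: risky detected as risky      fn: risky detected as normal
    fp: normal detected as risky     tn: normal detected as normal
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # detection rate of the anomaly data
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": 2 * precision * recall / (precision + recall),
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
    }
```

With the illustrative counts `detection_metrics(tp=8, fp=2, fn=2, tn=8)`, precision, recall, and F-measure are all 0.8, and both error rates are 0.2.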

C. EVALUATION OF RISKY SITUATION CLASSIFICATION
We evaluated the performance of the risky situation classifier by using a dataset with 809 trials of the four traffic situations described in Section V-A.3: (a) start running suddenly for pedestrians, (b) keep driving at high speed for vehicles, (c) the mixed traffic situation, and (d) the normal behavior. Table 5 lists the number of trials for each situation. We used 80% of the dataset for training and the rest for evaluation.
Table 6 shows the confusion matrix of the risky situation classifier. The result shows that sudden stops of vehicles and sudden running starts of pedestrians are accurately classified. On the other hand, some cases of (c) are wrongly classified as other situations, such as (b). The CNN model classifies sensor data into one of the predefined traffic situations. In the classification, we analyze the sensor data from time t to t + 5 [sec.], where t is the time when the local anomaly behavior detector detects an anomaly. However, due to time differences between smartphones, the proposed method sometimes fails to associate the sensor data of pedestrians and vehicles within 5 seconds, which results in classification failures. Another cause is the GPS error after turning on the GPS module. GPS requires some time to acquire accurate position fixes after waking up. Thus, in the experiment, the local anomaly behavior detector sometimes failed to detect the anomalies, which led to failures in classifying the mixed traffic situation. Nevertheless, the proposed method succeeded in classifying the mixed traffic situation with an accuracy of 89.3%. From the above results, we have confirmed that the proposed method can accurately classify traffic situations of pedestrians, vehicles, and their combination.
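The 5-second association window, and the way a clock difference between phones can break it, can be illustrated with the following sketch. The data layout, the function names, and the `veh_offset` correction are assumptions made for this example; the text does not describe how timestamps are aligned.

```python
def window_samples(samples, t, length=5.0, clock_offset=0.0):
    """Return the values whose corrected timestamp lies in [t, t + length].

    samples: iterable of (timestamp_sec, value) pairs from one smartphone.
    clock_offset: hypothetical correction added to this phone's timestamps.
    """
    return [v for ts, v in samples if t <= ts + clock_offset <= t + length]


def associate(ped_samples, veh_samples, t, length=5.0, veh_offset=0.0):
    """Collect pedestrian and vehicle data for the window starting at trigger t."""
    ped = window_samples(ped_samples, t, length)
    veh = window_samples(veh_samples, t, length, veh_offset)
    # If either side is empty, the pair cannot be treated as a mixed situation.
    return (ped, veh) if ped and veh else None
```

In this sketch, a vehicle sample stamped 6 s after the trigger by a clock that runs 2 s ahead falls outside the window and the association fails, while passing `veh_offset=-2.0` recovers it.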

VI. DISCUSSION

A. INFLUENCING FACTORS
Several factors influence the performance of our method, as listed in Table 7. First, GPS error affects the performance since we rely on GPS positions to recognize the grid cell in which each phone is located. To see the effect of GPS error on the anomaly detection, we conducted a simulation. The result shows that the effect of GPS error is limited because phone positions are used only to determine the grid cells in which the phones are located. The details are described in Section VI-B. Phone placement is another factor, as we mentioned in Section III-B, which can be mitigated by phone coordinate transformation while phones are stable. The way the map is segmented may also affect the performance. In our method, we simply divide the map into square grid cells. However, map segmentation that considers road structures may improve the performance. Another factor is the length of the time slot, which should be carefully designed because an appropriate length is important for detecting and classifying risky situations.
Because the main scope of our paper is to propose a novel concept for recognizing risky traffic situations, we leave further investigation of these influencing factors as future work.
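The square-grid segmentation mentioned above can be sketched as a simple position-to-cell mapping. The example assumes that positions have already been projected into local planar coordinates; the cell size, origin, and function name are hypothetical and are not taken from the paper.

```python
import math


def grid_cell(x, y, cell_size, origin=(0.0, 0.0)):
    """Map a planar position to the (row, col) index of its square grid cell."""
    col = math.floor((x - origin[0]) / cell_size)
    row = math.floor((y - origin[1]) / cell_size)
    return row, col
```

For instance, `grid_cell(12.5, 3.0, 5.0)` returns `(0, 2)`. A position error changes the result only when it pushes the position across a cell boundary, which is consistent with the limited effect of GPS error noted above.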

B. EFFECT OF GPS ERROR
Given that the proposed method relies on the positions of pedestrians and vehicles, as obtained from GPS devices, to estimate risky situations, the accuracy of these estimates could be affected by GPS errors. However, our method can learn positions that include GPS errors if the positions in the training data contain such errors. We also remark that recent GPS positioning is quite accurate even in urban areas owing to quasi-zenith satellites [41]. In addition, the local anomaly behavior detector uses the speed reported by GPS, which is also affected by GPS error. However, we note that speeds estimated from inertial sensors, as proposed in [42], can be used instead. To see the effect of GPS errors on the risky situation detector, we conducted simulation experiments. In the simulation, following existing works [43], [44], we assume that GPS errors follow a normal distribution. We then derived the F-measures, which are depicted in Figure 13.
From these results, the F-measure decreases as the standard deviation increases. However, even when the standard deviation reaches 20, the F-measure maintains a value of approximately 0.8. Thus, our proposed framework demonstrates a robust ability to detect risky situations, even in the presence of GPS error.
We also conducted various simulation experiments with training and test datasets with different standard deviations, considering that GPS errors may fluctuate over time. The F-measures derived from these datasets are summarized in Table 8. As indicated in the table, similar to the previously discussed case, the F-measure decreases as the standard deviation increases. However, the differences between the training and test datasets have a limited impact, as these F-measures consistently vary around 0.8.
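The noise injection used in such a simulation can be sketched as follows. This is a minimal illustration under stated assumptions: the error is zero-mean and drawn independently for each axis, the standard deviation is expressed in the same unit as the coordinates, and the grid mapping and function names are hypothetical.

```python
import math
import random


def grid_cell(x, y, cell_size):
    """(row, col) of the square grid cell containing a planar position."""
    return math.floor(y / cell_size), math.floor(x / cell_size)


def add_gps_noise(positions, sigma, seed=None):
    """Add zero-mean Gaussian error with standard deviation sigma to each axis."""
    rng = random.Random(seed)
    return [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))
            for x, y in positions]


def cell_change_rate(positions, sigma, cell_size, seed=None):
    """Fraction of positions that land in a different grid cell after adding noise."""
    noisy = add_gps_noise(positions, sigma, seed)
    changed = sum(grid_cell(*p, cell_size) != grid_cell(*q, cell_size)
                  for p, q in zip(positions, noisy))
    return changed / len(positions)
```

Sweeping `sigma` separately for the training and test positions and re-running the detector on the perturbed data would reproduce the structure of the experiment summarized in Table 8.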

VII. CONCLUSION AND FUTURE WORK
In this paper, we proposed a novel platform that aggregates behavioral data from pedestrians and drivers using their smartphones to recognize risky traffic situations. We leverage smartphones as ubiquitous sensors for wide coverage at a low cost. We also use the local anomaly behavior detector running on smartphones to avoid continuous transmission of the sensor data. Our method combines an autoencoder for risky situation detection and a CNN model for risky traffic situation classification. Using data generated by a detailed traffic simulator, the autoencoder-based anomaly detector achieved an F-measure of 0.85 for binary classification. Also, through field experiments, we collected data in a dedicated field in Kobe with about 20 volunteers and showed that the proposed method achieves an F-measure of 0.89 for classification into three near-miss categories. Our future work includes further analysis of other scenarios to classify more complex situations. We are also planning to design a feedback mechanism for pedestrians and vehicles according to the classified traffic situations. We believe that the proposed method will be an enabler for the safety of pedestrians and vehicles by providing us with an understanding of hidden risky traffic situations.

FIGURE 3. The acceleration of a pedestrian and a vehicle when the pedestrian starts running suddenly and the vehicle stops suddenly.

FIGURE 5. CNN structure of the risky situation classifier.

FIGURE 6. Function-layout of the prototype.

FIGURE 9. Data collection: Vehicle invades sidewalk in an intersection.

FIGURE 10. Experiment environment: A car driving course.

FIGURE 13. F-measure of the anomaly detection with standard deviations of GPS error.

TABLE 8. F-measures of the anomaly detection derived from training and test datasets with different standard deviations of GPS error.