
Driver confusion status detection using recurrent neural networks



Abstract:

In this paper, we present a method for estimating the confusion level of a driver using a classifier trained on multimodal sensor data. Using the driver confusion status detector, a car navigation system can proactively support the driver when he/she is confused. A corpus of data was collected during on-road driving in traffic using a navigation system and a car instrumented with a variety of sensors. The data was manually annotated with the driver's confusion status and with multiple features representing the driver's behavior and the traffic conditions. We compared different types of classifiers trained on the data: logistic regression, a feed-forward neural network, a recurrent neural network, and a long short-term memory (LSTM)-based recurrent neural network. The accuracy was evaluated using F-max as well as precision/recall. We found that the LSTM outperformed the other models.
Date of Conference: 11-15 July 2016
Date Added to IEEE Xplore: 29 August 2016
Electronic ISSN: 1945-788X
Conference Location: Seattle, WA, USA
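
The abstract reports accuracy as F-max alongside precision/recall but does not define it in this excerpt. F-max is commonly taken to be the maximum F1 score over all decision thresholds of the classifier's output score; assuming that reading, a minimal sketch of the metric (all function names here are illustrative, not from the paper):

```python
import numpy as np

def precision_recall(scores, labels, threshold):
    """Precision and recall when predicting 'confused' for scores >= threshold."""
    preds = scores >= threshold
    tp = np.sum(preds & labels)
    fp = np.sum(preds & ~labels)
    fn = np.sum(~preds & labels)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def f_max(scores, labels):
    """Maximum F1 score over all decision thresholds present in the scores."""
    best = 0.0
    for t in np.unique(scores):
        p, r = precision_recall(scores, labels, t)
        if p + r:
            best = max(best, 2 * p * r / (p + r))
    return best
```

Sweeping the threshold makes the metric independent of any single operating point, which is useful when the confused/not-confused classes are imbalanced, as is likely for annotated driving data.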

1. Introduction

Human-machine interfaces (HMIs) for car information and entertainment systems are very important for safe driving and can offer a convenient interface to control navigation and other automotive functions. Speech interfaces are currently employed in car HMIs to reduce driving distraction. In practice, drivers need to handle complex situations inside and outside the car, such as difficult traffic conditions, unclear navigation instructions, and limited visibility. In such conditions, drivers may become confused because of a lack of information about how to proceed. Often, the needed information is available via the HMI, but the driver does not have enough time to retrieve that information using speech or manual interfaces. If the system can anticipate these situations, then it can proactively provide more helpful information. We propose to detect driver confusion in order to provide a more proactive interface.

There has been some prior work directed at detecting the driver's state, or likely actions, using sensor data available in the vehicle. Available data may include traffic conditions, navigation status, vehicle status, and information about the driver's behavior that can be extracted from sensors such as cameras and microphones. In prior work, corpora of such data have been recorded during driving and annotated according to driver status and driving conditions [1]–[3]. In these studies, data-driven approaches were used for prediction. For example, the driver's emotional state was detected using a Bayesian network obtained from multimodal data consisting of traffic conditions, driving conditions, and the driver's facial expressions [4]. Gaussian mixture models, estimated from speech signals [5], have been used for detection of driver stress. In addition, destination prediction and driver action prediction have been investigated using driving condition histories, obtained via the controller area network (CAN) bus, and the navigation system status [6].
All of these approaches employed classification without modeling the dynamics of the signals. However, it has been suggested, in the context of stress detection in speech, that the temporal dynamics of sensor data and the dependencies between multiple features are important [7]. Recently, neural network models such as feed-forward deep neural networks (DNNs), recurrent neural networks (RNNs) and related architectures such as long short-term memory (LSTM) RNNs, and convolutional neural networks (CNNs) have been shown to dramatically improve the performance of speech and image recognition. In addition, speaker emotion detection has been investigated in speech signals using RNN and LSTM models [8], [9]. The sensor data involved in driver state prediction is challenging due to its large variability and dynamic range. Deep network models may be more capable of modeling the dynamics and interdependencies in sensor data than previous approaches. In this study, therefore, we propose, as a proof of concept, to apply deep network architectures to the problem of predicting driver confusion. Since the complexity of the problem relative to the amount of data in our corpus is unknown, we compare performance using a variety of models: logistic regression (LR), DNNs, RNNs, and LSTM-RNNs.
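This excerpt does not specify the architectures in detail. As a rough sketch of how an LSTM-based detector could consume a sequence of per-frame multimodal feature vectors and emit a confusion probability, here is a minimal NumPy forward pass (the class name, layer sizes, and output layer are illustrative assumptions, not the paper's implementation, and training is omitted):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMClassifier:
    """Minimal single-layer LSTM forward pass over a feature sequence,
    with a logistic output layer on the final hidden state."""

    def __init__(self, n_in, n_hid, seed=0):
        rng = np.random.default_rng(seed)
        scale = 0.1
        # One stacked weight matrix for the four gates: input, forget, output, cell.
        self.W = rng.normal(0.0, scale, (4 * n_hid, n_in + n_hid))
        self.b = np.zeros(4 * n_hid)
        self.w_out = rng.normal(0.0, scale, n_hid)
        self.b_out = 0.0
        self.n_hid = n_hid

    def forward(self, xs):
        """xs: array of shape (T, n_in), one feature vector per time frame.
        Returns the estimated probability that the driver is confused."""
        h = np.zeros(self.n_hid)
        c = np.zeros(self.n_hid)
        for x in xs:
            z = self.W @ np.concatenate([x, h]) + self.b
            i, f, o, g = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
        return sigmoid(self.w_out @ h + self.b_out)
```

The recurrent cell state is what lets this model exploit the temporal dynamics that the static classifiers (LR, feed-forward DNN) cannot, which is the motivation the paragraph above gives for comparing these architectures.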
