IEEE Transactions on Multimedia

Issue 5 • August 2013

  • Table of contents

    Page(s): C1 - C4
    PDF (181 KB)
    Freely Available from IEEE
  • IEEE Transactions on Multimedia publication information

    Page(s): C2
    PDF (130 KB)
    Freely Available from IEEE
  • Guest Editorial for Special Section on Multimodal Biomedical Imaging: Algorithms and Applications

    Page(s): 973 - 974
    PDF (110 KB)
    Freely Available from IEEE
  • Multimodal Photoacoustic Tomography

    Page(s): 975 - 982
    PDF (805 KB) | HTML

    Currently available optical microscopic imaging techniques, such as confocal microscopy, multi-photon (also referred to as two-photon) microscopy, and optical coherence tomography, have revolutionized biological and medical research thanks to their strong optical contrast and high spatial resolution. Unfortunately, owing to unavoidable strong light scattering in biological tissues, such methods cannot maintain contrast and spatial resolution beyond one optical transport mean free path (~1 mm in tissues). Although model-based diffuse optical tomography is able to operate at greater depths, this technique fails to maintain spatial resolution. Photoacoustic tomography overcomes this fundamental penetration-depth problem and achieves high-resolution optical imaging in deep tissues by combining light and ultrasound. In this review article, the multimodal imaging capability of photoacoustic tomography integrated with existing imaging tools is examined, and the potential preclinical and clinical impacts of the combined systems are discussed.

  • A Review of Recent Advances in Registration Techniques Applied to Minimally Invasive Therapy

    Page(s): 983 - 1000
    PDF (1573 KB) | HTML

    Minimally invasive and less invasive procedures are becoming more and more common in medical therapy. Image guidance is an indispensable component of minimally invasive procedures: it provides critical information about the position of the target sites and the optimal manipulation of the devices when the field of view of the naked eye is limited by the small incision. Registration is one of the enabling technologies for computer-aided image guidance, bringing high-resolution pre-operative data into the operating room to provide more realistic information about the patient's anatomy. In this paper, we survey recent advances in registration techniques applied to minimally and/or less invasive therapy, covering a wide variety of therapies in surgery, endoscopy, interventional cardiology, interventional radiology, and hybrid procedures. The registration approaches are categorized into several groups, including projection-to-volume, slice-to-volume, video-to-volume, and volume-to-volume registration. The focus is on recent advances in registration techniques specifically developed for minimally and/or less invasive procedures in the following medical specialties: neuroradiology and neurosurgery, cardiac applications, and thoracic-abdominal interventions.

  • Integration of Multivariate Data Streams With Bandpower Signals

    Page(s): 1001 - 1013
    PDF (2120 KB) | HTML

    The urge to further our understanding of multimodal neural data has recently become an important topic due to the ever-increasing availability of simultaneously recorded data from different neural imaging modalities. In cases where EEG is one of the modalities, it is of interest to relate a nonlinear function of the raw EEG time-domain signal, say EEG band power, to another modality such as the hemodynamic response, as measured with NIRS or fMRI. In this work we tackle exactly this problem by defining a novel algorithm that we denote multimodal source power correlation analysis (mSPoC). The validity and high performance of the mSPoC framework are demonstrated on simulated and real-world multimodal data.

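    The core mSPoC idea of relating EEG band power to a slower modality can be illustrated with a rough sketch. The snippet below is not the authors' implementation: it band-passes a spatially filtered EEG signal, extracts its power envelope, and correlates it with a hemodynamic time course. The signals, the spatial filter `w`, and the alpha band edges are all illustrative assumptions.

    ```python
    # Minimal sketch (not the authors' mSPoC code): correlate the band-power
    # envelope of a spatially filtered EEG signal with a slower hemodynamic
    # time course. `eeg`, `hemo`, `w`, and the alpha band are synthetic.
    import numpy as np
    from scipy.signal import butter, filtfilt, hilbert, resample

    def bandpower_envelope(eeg, w, fs, band=(8.0, 12.0)):
        """eeg: (channels, samples); w: spatial filter (channels,)."""
        source = w @ eeg                                  # project to one source
        b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
        narrow = filtfilt(b, a, source)                   # band-pass, e.g. alpha
        return np.abs(hilbert(narrow)) ** 2               # instantaneous power

    rng = np.random.default_rng(0)
    fs_eeg, fs_hemo, n_sec = 250, 10, 60
    eeg = rng.standard_normal((32, fs_eeg * n_sec))
    hemo = rng.standard_normal(fs_hemo * n_sec)           # e.g. an NIRS channel

    w = rng.standard_normal(32)                           # placeholder spatial filter
    env = bandpower_envelope(eeg, w, fs_eeg)
    env_ds = resample(env, hemo.size)                     # match the hemodynamic rate
    print(f"band-power / hemodynamic correlation: {np.corrcoef(env_ds, hemo)[0, 1]:+.3f}")
    ```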
  • Directive Contrast Based Multimodal Medical Image Fusion in NSCT Domain

    Page(s): 1014 - 1024
    PDF (2309 KB) | HTML

    Multimodal medical image fusion, a powerful tool for clinical applications, has developed with the advent of various imaging modalities in medical imaging. The main motivation is to capture the most relevant information from the source images in a single output, which plays an important role in medical diagnosis. In this paper, a novel fusion framework is proposed for multimodal medical images based on the non-subsampled contourlet transform (NSCT). The source medical images are first transformed by the NSCT, and the low- and high-frequency components are then combined. Two different fusion rules based on phase congruency and directive contrast are proposed and used to fuse the low- and high-frequency coefficients, respectively. Finally, the fused image is constructed by the inverse NSCT with all composite coefficients. Experimental results and a comparative study show that the proposed fusion framework provides an effective way to enable more accurate analysis of multimodality images. Further, the applicability of the proposed framework is demonstrated on three clinical examples of patients affected by Alzheimer's disease, subacute stroke, and recurrent tumor.

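    As a rough illustration of transform-domain fusion, the sketch below uses a discrete wavelet transform as a stand-in for the NSCT (which has no standard Python implementation) and applies simple averaging and maximum-absolute rules instead of the paper's phase-congruency and directive-contrast rules. PyWavelets is assumed to be available, and the input images are synthetic placeholders.

    ```python
    # DWT stand-in for transform-domain fusion (not the paper's NSCT framework):
    # average the approximation coefficients and keep the larger-magnitude
    # detail coefficients. Requires PyWavelets (pywt).
    import numpy as np
    import pywt

    def dwt_fuse(img_a, img_b, wavelet="db2", levels=2):
        ca = pywt.wavedec2(img_a, wavelet, level=levels)
        cb = pywt.wavedec2(img_b, wavelet, level=levels)
        fused = [(ca[0] + cb[0]) / 2.0]                   # low-frequency: average
        for (ha, va, da), (hb, vb, db) in zip(ca[1:], cb[1:]):
            fused.append(tuple(np.where(np.abs(x) >= np.abs(y), x, y)
                               for x, y in ((ha, hb), (va, vb), (da, db))))
        return pywt.waverec2(fused, wavelet)

    mri = np.random.rand(128, 128)   # placeholders for co-registered source slices
    ct = np.random.rand(128, 128)
    print(dwt_fuse(mri, ct).shape)
    ```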
  • Monitoring of Tumor Response to Au Nanorod-Indocyanine Green Conjugates Mediated Therapy With Fluorescence Imaging and Positron Emission Tomography

    Page(s): 1025 - 1030
    PDF (1172 KB) | HTML

    Fluorescence imaging can track the expression of fluorescent proteins, and 18F-fluorodeoxyglucose based positron emission tomography ([18F]FDG-PET) can evaluate changes of [18F]FDG uptake in tumor cells during antitumor treatment. In this work, fluorescence imaging and [18F]FDG-PET were both employed to monitor tumor response to Au nanorod-indocyanine green (AuNR-ICG) conjugate-mediated therapy in a subcutaneous MDA-MB-231 mouse xenograft model. Serial fluorescence and [18F]FDG-PET images were obtained following the antitumor treatment, and quantitative analysis revealed significant decreases in fluorescence intensity and metabolic activity in tumors treated with AuNR-ICG conjugates under near-infrared laser irradiation. The results suggest that the combination of fluorescence and [18F]FDG-PET imaging can provide a noninvasive tool to assess tumor response to antitumor therapy on a molecular scale.

  • Fluorescence Tomography Reconstruction With Simultaneous Positron Emission Tomography Priors

    Page(s): 1031 - 1038
    PDF (1573 KB) | HTML

    In this paper, fluorescence molecular tomography (FMT) imaging guided by priors from simultaneous positron emission tomography (PET) was performed on a multi-modality imaging system combining PET and FMT. The target prior information from the PET images was incorporated into the FMT reconstruction procedure using the iteratively reweighted least-squares method. Numerical simulations and phantom experiments were performed to validate the proposed method. The results indicate that incorporating the PET prior information into the FMT reconstruction can potentially improve the spatial resolution of FMT.

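    A minimal sketch of iteratively reweighted least squares with a modality-derived prior weight is shown below. It is a generic illustration of the idea rather than the paper's reconstruction code; the sensitivity matrix, the target, and the coarse "PET mask" are all made up.

    ```python
    # Generic IRLS for a linear model y = A x with a diagonal prior weight that
    # relaxes the sparsity penalty inside a region suggested by a second modality.
    import numpy as np

    def irls_with_prior(A, y, prior_mask, lam=1e-2, n_iter=20, eps=1e-6):
        base_w = np.where(prior_mask, 0.1, 1.0)   # weaker penalty inside the prior
        x = np.zeros(A.shape[1])
        for _ in range(n_iter):
            w = base_w / np.sqrt(x ** 2 + eps)    # reweighting for an L1-like penalty
            x = np.linalg.solve(A.T @ A + lam * np.diag(w), A.T @ y)
        return x

    rng = np.random.default_rng(1)
    A = rng.standard_normal((80, 200))            # sensitivity (Jacobian) matrix
    x_true = np.zeros(200); x_true[90:100] = 1.0  # small fluorescent target
    y = A @ x_true + 0.01 * rng.standard_normal(80)
    mask = np.zeros(200, dtype=bool); mask[85:105] = True  # coarse PET-derived prior
    print(np.round(irls_with_prior(A, y, mask)[88:102], 2))
    ```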
  • Transcranial Ultrasound and Magnetic Resonance Image Fusion With Virtual Navigator

    Page(s): 1039 - 1048
    PDF (2047 KB) | HTML

    The Virtual Navigator (VN) technology was used to fuse transcranial ultrasound (US) and brain magnetic resonance images (MRI), with a repeatability error under 0.1 cm. The superimposition of the US onto the previously acquired MRI volume consisted of an external point-based registration that was subsequently refined with an image-based registration of internal brain structures. The common registration procedure, which uses external fiducial markers acquired with the two modalities, was improved by using facial anatomical landmarks, with a reduction of the residual shift of the internal targeted structures (maximum 0.7 cm in the cranio-caudal direction). This allowed the investigation of deep cerebral veins and dural sinuses insonated from the condyloid process of the mandible, a recently introduced US window. The fusion of these vessels with the MRI volume provided their anatomical position and helped exclude false Doppler signal sources.

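    The external point-based step described above can be illustrated with a standard landmark-based rigid registration (Kabsch/Procrustes) sketch. This is not the Virtual Navigator implementation; the landmark arrays below are hypothetical corresponding facial landmarks in the two coordinate systems.

    ```python
    # Landmark-based rigid registration sketch: recover R, t that map points in
    # one modality's frame onto corresponding points in the other.
    import numpy as np

    def rigid_register(src, dst):
        """Return R, t minimizing sum ||R @ src_i + t - dst_i||^2 (Kabsch)."""
        src_c, dst_c = src - src.mean(0), dst - dst.mean(0)
        U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # no reflection
        R = Vt.T @ D @ U.T
        return R, dst.mean(0) - R @ src.mean(0)

    rng = np.random.default_rng(2)
    mri_pts = rng.uniform(-5, 5, (6, 3))          # landmarks in MRI space (cm)
    theta = np.deg2rad(20)
    R_true = np.array([[np.cos(theta), -np.sin(theta), 0],
                       [np.sin(theta),  np.cos(theta), 0],
                       [0, 0, 1]])
    us_pts = (R_true @ mri_pts.T).T + np.array([1.0, -0.5, 2.0])  # same landmarks in US space
    R, t = rigid_register(mri_pts, us_pts)
    print(f"max residual: {np.linalg.norm((R @ mri_pts.T).T + t - us_pts, axis=1).max():.2e} cm")
    ```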
  • A Joint Multimodal Group Analysis Framework for Modeling Corticomuscular Activity

    Page(s): 1049 - 1059
    PDF (2452 KB) | HTML

    Corticomuscular coupling analysis based on multiple data sets such as electroencephalography (EEG) and electromyography (EMG) signals provides a useful tool for understanding human motor control systems. The two most popular methods are probably the pair-wise magnitude-squared coherence (MSC) between EEG and simultaneously recorded EMG signals, and partial least squares (PLS). Unfortunately, MSC and PLS generally deal with only two types of data sets at a time, while we may need to analyze more than two. Moreover, it is not straightforward to extend MSC to the group level for combining results across subjects. PLS can also suffer from an information mixing problem, since only the variations in one data set are used to predict the other data set. To address these concerns, we propose a joint multimodal analysis framework for corticomuscular coupling analysis. The proposed framework models multiple data spaces simultaneously in a multidirectional fashion. Furthermore, to address inter-subject variability in real-world medical applications, we extend the proposed framework from the individual subject level to the group level to obtain common corticomuscular coupling patterns across subjects. We apply the proposed framework to concurrent EEG, EMG and behavior data collected in a Parkinson's disease (PD) study. The results reveal several highly correlated temporal patterns among the three types of signals and their corresponding spatial activation patterns. In PD subjects, there are enhanced connections between the occipital region and other regions, which is consistent with previous medical findings. The proposed framework is a promising technique for performing multi-subject and multi-modal data analysis.

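    For reference, the baseline pair-wise magnitude-squared coherence mentioned above can be computed directly with SciPy. The snippet below uses synthetic EEG and EMG signals sharing a 20 Hz component; it illustrates the MSC baseline only, not the proposed joint multimodal framework.

    ```python
    # Magnitude-squared coherence between an EEG and an EMG channel (synthetic data).
    import numpy as np
    from scipy.signal import coherence

    fs, n_sec = 500, 30
    t = np.arange(fs * n_sec) / fs
    rng = np.random.default_rng(3)
    drive = np.sin(2 * np.pi * 20 * t)            # shared 20 Hz (beta-band) drive
    eeg = drive + rng.standard_normal(t.size)
    emg = 0.5 * drive + rng.standard_normal(t.size)

    f, msc = coherence(eeg, emg, fs=fs, nperseg=1024)
    print(f"peak corticomuscular coherence {msc.max():.2f} at {f[np.argmax(msc)]:.1f} Hz")
    ```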
  • Linking Brain Responses to Naturalistic Music Through Analysis of Ongoing EEG and Stimulus Features

    Page(s): 1060 - 1069
    PDF (1896 KB) | HTML

    This study proposes a novel approach for analyzing brain responses, in the form of ongoing EEG, elicited by a naturalistic and continuous music stimulus. The 512-second-long EEG data (recorded with 64 electrodes) are first decomposed into 64 components by independent component analysis (ICA) for each participant. The spatial maps showing dipolar brain activity are then selected in terms of the residual dipole variance of a single-dipole model in brain imaging, and clustered into a pre-defined number of clusters (estimated by the minimum description length). Subsequently, the temporal courses of the EEG theta and alpha oscillations of each component in each cluster are produced and correlated with the temporal courses of the tonal and rhythmic features of the music. Using this approach, we found that the extracted temporal courses of the theta and alpha oscillations along the central and occipital areas of the scalp in two of the selected clusters correlated significantly with the musical features representing progressions in the rhythmic content of the stimulus. We suggest that this demonstrates that, with the proposed approach, we have managed to discover what kinds of brain responses were elicited while a participant listened continuously to a long piece of naturalistic music.

  • Interactive Multiview Video System With Low Complexity 2D Look Around at Decoder

    Page(s): 1070 - 1082
    PDF (1980 KB) | HTML

    Multiview video with interactive 2D look around at the receiver is a challenging application with several issues in terms of effective use of storage and bandwidth resources, reactivity of the system, quality of the viewing experience, and system complexity. The impression of 3D immersion is highly dependent on the smoothness of the navigation and thus on the number of 2D viewpoints. The classical decoding system for generating virtual views first projects a reference or encoded frame to a given viewpoint and then fills in the holes due to potential occlusions. This last step remains a complex operation requiring specific software or hardware at the receiver and a certain amount of information from the neighboring frames to ensure consistency between the virtual images. In this work we propose a new approach that shifts most of the burden due to interactivity from the decoder to the encoder, by anticipating the navigation of the decoder and sending auxiliary information that guarantees temporal and inter-view consistency. This leads to an additional cost in terms of transmission rate and storage, which we minimize by using optimization techniques based on user behavior modeling. We show by experiments that the proposed system represents a valid solution for interactive multiview systems with classical decoders.

  • Fast Intra-Coding for H.264/AVC by Using Projection-Based Predicted Block Residuals

    Page(s): 1083 - 1093
    PDF (2769 KB) | HTML

    An efficient intra-prediction mode decision mechanism for H.264/AVC is presented in this research. A projection-based approach, which employs the reconstructed surrounding pixels and the block content to compute the predicted block residuals (PBR), can effectively eliminate less probable modes from the computation of rate-distortion optimization. Both the projected vectors and the corresponding predictors can be formed by shifting/adding the related data, so the proposed scheme is suitable for hardware implementation. According to the PBR and the coding information acquired during the encoding process, some prediction modes can also be skipped to further accelerate intra coding. The experimental results show that the proposed scheme can effectively reduce the encoding time with only slight video quality degradation and bit-rate increase.

  • Mode Decision-Based Algorithm for Complexity Control in H.264/AVC

    Page(s): 1094 - 1109
    PDF (3575 KB) | HTML

    The latest H.264/AVC video coding standard achieves high compression rates in exchange for high computational complexity. Nowadays, however, many application scenarios require the encoder to meet complexity constraints. This paper proposes a novel complexity control method that relies on a hypothesis test able to handle time-variant content and target complexities. Specifically, it is based on a binary hypothesis test that decides, on a macroblock basis, whether to use a low- or a high-complexity coding model. Gaussian statistics are assumed so that the probability density functions involved in the hypothesis test can be easily adapted. The decision threshold is also adapted according to the deviation between the actual and the target complexities. The proposed method is implemented on the H.264/AVC reference software JM10.2 and compared with a state-of-the-art method. Our experimental results show that the proposed method achieves a better trade-off between complexity control and coding efficiency. Furthermore, it leads to a lower deviation from the target complexity.

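    A schematic sketch of the kind of binary Gaussian hypothesis test with an adaptive threshold described above is given below. The per-macroblock feature, the Gaussian parameters, and the threshold adaptation rule are illustrative assumptions, not the paper's model.

    ```python
    # Per-macroblock choice between low- and high-complexity coding modes via a
    # Gaussian log-likelihood ratio, with a threshold that drifts toward a target
    # share of high-complexity decisions. All numbers are illustrative.
    class ComplexityController:
        def __init__(self, mu_low=2.0, mu_high=6.0, sigma=1.5,
                     target_ratio=0.5, gain=0.05):
            self.mu_low, self.mu_high, self.sigma = mu_low, mu_high, sigma
            self.target_ratio, self.gain = target_ratio, gain
            self.threshold = 0.0                  # decision threshold on the LLR
            self.high_count, self.total = 0, 0

        def _llr(self, x):
            # log N(x; mu_high, sigma) - log N(x; mu_low, sigma)
            return ((x - self.mu_low) ** 2 - (x - self.mu_high) ** 2) / (2 * self.sigma ** 2)

        def decide(self, feature):
            use_high = self._llr(feature) > self.threshold
            self.total += 1
            self.high_count += use_high
            # steer the share of high-complexity macroblocks toward the target
            self.threshold += self.gain * (self.high_count / self.total - self.target_ratio)
            return "high" if use_high else "low"

    ctrl = ComplexityController()
    for f in [1.2, 5.8, 4.1, 7.0, 2.5, 3.9]:      # made-up per-macroblock features
        print(f"feature {f:.1f} -> {ctrl.decide(f)}-complexity mode")
    ```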
  • Robust Part-Based Hand Gesture Recognition Using Kinect Sensor

    Page(s): 1110 - 1120
    PDF (1595 KB) | HTML

    Recently developed depth sensors, e.g., the Kinect sensor, have provided new opportunities for human-computer interaction (HCI). Although great progress has been made by leveraging the Kinect sensor, e.g., in human body tracking, face recognition and human action recognition, robust hand gesture recognition remains an open problem. Compared to the entire human body, the hand is a smaller object with more complex articulations and is more easily affected by segmentation errors, making hand gesture recognition a very challenging problem. This paper focuses on building a robust part-based hand gesture recognition system using the Kinect sensor. To handle the noisy hand shapes obtained from the Kinect sensor, we propose a novel distance metric, the Finger-Earth Mover's Distance (FEMD), to measure the dissimilarity between hand shapes. Because it matches only the finger parts rather than the whole hand, it can better distinguish hand gestures with slight differences. Extensive experiments demonstrate that our hand gesture recognition system is accurate (93.2% mean accuracy on a challenging 10-gesture dataset), efficient (0.0750 s per frame on average), robust to hand articulations, distortions and orientation or scale changes, and able to work in uncontrolled environments (cluttered backgrounds and lighting conditions). The superiority of our system is further demonstrated in two real-life HCI applications.

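    As a loose, simplified stand-in for comparing hand shapes (not the Finger-Earth Mover's Distance itself), the sketch below reduces a hand contour to a 1D radial "finger signature" and compares signatures with SciPy's 1D earth mover's (Wasserstein) distance. The contours are synthetic.

    ```python
    # Toy shape dissimilarity: radial distance-to-palm-center signatures compared
    # with the 1D Wasserstein distance (a crude stand-in, not FEMD).
    import numpy as np
    from scipy.stats import wasserstein_distance

    def finger_signature(contour, center, n_bins=64):
        """Scale-normalized radial profile of a contour around the palm center."""
        r = np.linalg.norm(contour - center, axis=1)
        idx = np.linspace(0, r.size - 1, n_bins).astype(int)
        return r[idx] / r.max()

    rng = np.random.default_rng(4)
    angles = np.linspace(0, 2 * np.pi, 360, endpoint=False)
    circle = np.c_[np.cos(angles), np.sin(angles)]
    open_hand = circle * (1 + 0.3 * np.abs(np.sin(5 * angles)))[:, None]   # five bumps
    fist = circle * (1 + 0.05 * rng.standard_normal(360))[:, None]

    sig_a = finger_signature(open_hand, np.zeros(2))
    sig_b = finger_signature(fist, np.zeros(2))
    print(f"signature EMD: {wasserstein_distance(sig_a, sig_b):.3f}")
    ```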
  • Crowdsourcing Multimedia QoE Evaluation: A Trusted Framework

    Page(s): 1121 - 1137
    PDF (1902 KB) | HTML

    Crowdsourcing has emerged in recent years as a potential strategy to enlist the general public to solve a wide variety of tasks. With the advent of ubiquitous Internet access, it is now feasible to ask an Internet crowd to conduct QoE (Quality of Experience) experiments on their personal computers in their own residences rather than in a laboratory. The considerable size of the Internet crowd allows researchers to crowdsource their experiments to a more diverse participant pool at a relatively low economic cost. However, as participants carry out experiments without supervision, the uncertain quality of their experiment results is a challenging problem. In this paper, we propose a crowdsourceable framework to quantify the QoE of multimedia content. To overcome the aforementioned quality problem, we employ a paired comparison method in our framework. The advantages of our framework are: 1) trustworthiness due to the support for cheat detection; 2) a simpler rating procedure than that of the commonly used but more difficult mean opinion score (MOS), which places less burden on participants; 3) economic feasibility, since reliable QoE measures can be acquired with less effort compared with MOS; and 4) generalizability across a variety of multimedia content. We demonstrate the effectiveness and efficiency of the proposed framework by a comparison with MOS. Moreover, the results of four case studies support our assertion that the framework can provide reliable QoE evaluation at a lower cost.

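    One common way to turn paired-comparison votes into quality scores is a Bradley-Terry model. The sketch below shows only that aggregation step on made-up vote counts; it does not reproduce the paper's framework or its cheat-detection mechanism.

    ```python
    # Bradley-Terry scores from a paired-comparison win matrix via the standard
    # minorization-maximization updates. wins[i, j] = times item i beat item j.
    import numpy as np

    def bradley_terry(wins, n_iter=200):
        n = wins.shape[0]
        p = np.ones(n)
        total = wins + wins.T                     # comparisons per pair
        for _ in range(n_iter):
            for i in range(n):
                p[i] = wins[i].sum() / np.sum(total[i] / (p[i] + p))
            p /= p.sum()
        return p

    # three hypothetical encodings judged by a crowd; rows beat columns
    wins = np.array([[0, 14, 18],
                     [6,  0, 12],
                     [2,  8,  0]], dtype=float)
    print(np.round(bradley_terry(wins), 3))       # higher = better perceived quality
    ```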
  • Learning to Photograph: A Compositional Perspective

    Page(s): 1138 - 1151
    PDF (2466 KB) | HTML

    In this paper, we present an intelligent photography system that can recommend the most user-favored view rectangle for arbitrary camera input, from a photographic compositional perspective. Automating this process is difficult due to the subjectivity of human aesthetic judgment and the large variation of image contents, where heuristic compositional rules lack generality. Motivated by the recent prevalence of photo-sharing websites, e.g., Flickr.com, we develop a learning-based framework that discovers the underlying aesthetic photographic compositional structures from a large set of user-favored online photographs and utilizes the knowledge implicitly shared among professional photographers for aesthetically optimal view recommendation. In particular, we propose an Omni-Range Context method that explicitly encodes the spatial and geometric distributions of various visual elements in the photograph, as well as the co-occurrence characteristics of visual element pairs, using generative mixture models. Searching for the optimal view rectangle is then formulated as maximum a posteriori estimation by imposing the trained prior distributions along with additional photographic constraints. The proposed system has the potential to operate in near real-time. Comprehensive user studies demonstrate the effectiveness of the proposed framework for aesthetically optimal view recommendation.

  • Message Passing Matching Dynamics for Overlapping Point Identification

    Page(s): 1152 - 1162
    PDF (2691 KB) | HTML

    Existing registration algorithms usually converge to a local minimum due to inaccurate evaluation of the tentative correspondences established. In this paper, we move a step further and instead estimate the extent to which a point lies in the overlapping area. To this end, we regard the registration problem as an exchange network and develop a matching dynamics to characterize the interaction inside it. We then propose a novel algorithm, based on the powerful message passing scheme derived from the matching dynamics, for the optimization of the overlapping point weights. The algorithm penalizes, in the process of deterministic annealing, those tentative correspondences that violate the properties of the matching dynamics. The rigid transformation that brings the two overlapping shapes into alignment is finally estimated in the weighted least-squares sense. Our experiments on both synthetic and real data show that the proposed algorithm is more likely to converge to the global minimum than four selected state-of-the-art algorithms, yielding more accurate and robust results.

  • Scalable Face Image Retrieval Using Attribute-Enhanced Sparse Codewords

    Page(s): 1163 - 1173
    PDF (1623 KB) | HTML

    Photos with people (e.g., family, friends, celebrities, etc.) are the major interest of users. Thus, with the exponential growth of photo collections, large-scale content-based face image retrieval is an enabling technology for many emerging applications. In this work, we aim to utilize automatically detected human attributes that contain semantic cues about the face photos to improve content-based face retrieval, by constructing semantic codewords for efficient large-scale face retrieval. By leveraging human attributes in a scalable and systematic framework, we propose two orthogonal methods, named attribute-enhanced sparse coding and attribute-embedded inverted indexing, to improve face retrieval in the offline and online stages, respectively. We investigate the effectiveness of different attributes and the vital factors essential for face retrieval. Experiments on two public datasets show that the proposed methods can achieve up to 43.5% relative improvement in MAP compared to existing methods.

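    A much-simplified sketch of an inverted index whose postings carry binary attributes, in the spirit of the attribute-embedded indexing described above, is given below. The codeword IDs and the attribute set are illustrative; the actual sparse-coding codewords are not reproduced.

    ```python
    # Inverted index with attribute bitmasks: candidates are ranked by codeword
    # matches, but only if their attributes roughly agree with the query's.
    from collections import defaultdict

    index = defaultdict(list)   # codeword id -> list of (image_id, attribute bitmask)

    def add_image(image_id, codewords, attributes):
        mask = sum(1 << i for i, a in enumerate(attributes) if a)
        for cw in set(codewords):
            index[cw].append((image_id, mask))

    def query(codewords, attributes, max_attr_diff=1):
        qmask = sum(1 << i for i, a in enumerate(attributes) if a)
        votes = defaultdict(int)
        for cw in set(codewords):
            for image_id, mask in index[cw]:
                if bin(mask ^ qmask).count("1") <= max_attr_diff:  # attribute filter
                    votes[image_id] += 1                           # codeword vote
        return sorted(votes, key=votes.get, reverse=True)

    # attributes: e.g. (is_male, wears_glasses, has_beard) -- illustrative only
    add_image("img_001", [3, 17, 42], (1, 0, 1))
    add_image("img_002", [3, 99, 42], (0, 1, 0))
    print(query([3, 42], (1, 0, 1)))                               # -> ['img_001']
    ```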
  • Shape Similarity Analysis by Self-Tuning Locally Constrained Mixed-Diffusion

    Page(s): 1174 - 1183
    PDF (1847 KB) | HTML

    Similarity analysis is a powerful tool for shape matching/retrieval and other computer vision tasks. In the literature, various shape (dis)similarity measures have been introduced, with different measures specializing in different aspects of the data. In this paper, we consider the problem of improving retrieval accuracy by systematically fusing several different measures. To this end, we propose the locally constrained mixed-diffusion method, which partly fuses the given measures into one and propagates on the resulting locally dense data space. Furthermore, we advocate the use of self-adaptive neighborhoods to automatically determine the appropriate neighborhood size in the diffusion process, with which the retrieval performance is comparable to the best manually tuned kNNs. The superiority of our approach is empirically demonstrated on both shape and image datasets. Our approach achieves a score of 100% in the bull's eye test on the MPEG-7 shape dataset, which is the best reported result to date.

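    The general idea of locally constrained diffusion on a similarity matrix can be sketched as follows: transitions are restricted to each item's k nearest neighbors and the similarities are propagated as W <- P W P^T. The paper's mixed-diffusion fusion and self-tuning neighborhood selection are not reproduced here, and the data are synthetic.

    ```python
    # Locally constrained diffusion on a Gaussian affinity matrix.
    import numpy as np

    def local_transition(W, k=5):
        P = np.zeros_like(W)
        for i, row in enumerate(W):
            nn = np.argsort(row)[::-1][:k]        # keep the k most similar items
            P[i, nn] = row[nn]
        return P / P.sum(axis=1, keepdims=True)   # row-stochastic

    def diffuse(W, k=5, n_iter=10):
        P = local_transition(W, k)
        A = W.copy()
        for _ in range(n_iter):
            A = P @ A @ P.T
        return A

    rng = np.random.default_rng(5)
    X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
    W = np.exp(-np.square(X[:, None] - X[None]).sum(-1))
    A = diffuse(W, k=5)
    print("within-cluster mean:", A[:20, :20].mean().round(4),
          "between-cluster mean:", A[:20, 20:].mean().round(6))
    ```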
  • Understanding the Characteristics of Internet Short Video Sharing: A YouTube-Based Measurement Study

    Page(s): 1184 - 1194
    PDF (1525 KB) | HTML

    Established in 2005, YouTube has become the most successful Internet website providing a new generation of short video sharing service. Today, YouTube alone consumes as much bandwidth as the entire Internet did in the year 2000. Understanding the features of YouTube and similar video sharing sites is thus crucial to their sustainable development and to network traffic engineering. In this paper, using traces crawled over a 1.5-year span (from February 2007 to September 2008), we present an in-depth and systematic measurement study of the characteristics of YouTube videos. We find that YouTube videos have noticeably different statistics compared to traditional streaming videos, ranging from length and access pattern to their active life span. The series of datasets also allows us to identify the growth trend of this fast-evolving Internet site, which has seldom been explored before. We also look closely at the social networking aspect of YouTube, as this is a key driving force behind its success. In particular, we find that the links to related videos generated by uploaders' choices form a small-world network. This suggests that the videos have strong correlations with each other, and it creates opportunities for developing novel caching and peer-to-peer distribution schemes to efficiently deliver videos to end users.

  • Network Coding Meets Multimedia: A Review

    Page(s): 1195 - 1212
    PDF (1247 KB) | HTML

    While every network node only relays messages in a traditional communication system, the recent network coding (NC) paradigm proposes to implement simple in-network processing with packet combinations in the nodes. NC extends the concept of “encoding” a message beyond source coding (for compression) and channel coding (for protection against errors and losses). It has been shown to increase network throughput compared to traditional network implementations, to reduce delay, and to provide robustness to transmission errors and network dynamics. These features are so appealing for multimedia applications that they have spurred a large research effort towards the development of multimedia-specific NC techniques. This paper reviews recent work in NC for multimedia applications and focuses on the techniques that fill the gap between NC theory and practical applications. It outlines the benefits of NC and presents the open challenges in this area. The paper initially focuses on multimedia-specific aspects of network coding, in particular delay, in-network error control, and media-specific error control. These aspects make it possible to handle varying network conditions as well as client heterogeneity, which are critical to the design and deployment of multimedia systems. After introducing these general concepts, the paper reviews in detail two applications that lend themselves naturally to NC via the cooperation and broadcast models, namely peer-to-peer multimedia streaming and wireless networking.

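    The packet-combination idea at the heart of network coding can be illustrated with a toy random linear code over GF(2): a node transmits random XOR combinations of the source packets, and a receiver decodes by Gaussian elimination once it has enough linearly independent combinations. This is generic random linear network coding, not a specific scheme from the survey.

    ```python
    # Toy random linear network coding over GF(2) with Gauss-Jordan decoding.
    import numpy as np

    def gf2_eliminate(M, ncols):
        """In-place Gauss-Jordan over GF(2) on the first ncols columns; returns rank."""
        rank = 0
        for col in range(ncols):
            piv = [r for r in range(rank, len(M)) if M[r, col]]
            if not piv:
                continue
            M[[rank, piv[0]]] = M[[piv[0], rank]]
            for r in range(len(M)):
                if r != rank and M[r, col]:
                    M[r] ^= M[rank]
            rank += 1
        return rank

    rng = np.random.default_rng(6)
    src = [np.frombuffer(s, dtype=np.uint8).copy() for s in (b"pkt-A", b"pkt-B", b"pkt-C")]
    n, coded = len(src), []

    while True:                                   # node keeps sending random combinations
        coeffs = rng.integers(0, 2, n, dtype=np.uint8)
        payload = np.zeros_like(src[0])
        for c, p in zip(coeffs, src):
            if c:
                payload ^= p
        coded.append(np.concatenate([coeffs, payload]))
        M = np.array(coded, dtype=np.uint8)
        if gf2_eliminate(M, n) == n:              # enough independent combinations received
            break

    print([bytes(row[n:]) for row in M[:n]])      # recovers pkt-A, pkt-B, pkt-C
    ```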
  • Modeling of Driver Behavior in Real World Scenarios Using Multiple Noninvasive Sensors

    Page(s): 1213 - 1225
    PDF (1381 KB) | HTML

    With the development of new in-vehicle technology, drivers are exposed to more sources of distraction, which can lead to unintentional accidents. Monitoring the driver's attention level has therefore become a relevant research problem, and it is the precise aim of this study. A database of 20 drivers was collected in real driving scenarios. The drivers were asked to perform common secondary tasks such as operating the radio, a phone, and a navigation system. The collected database comprises various noninvasive sensors, including the controller area network bus (CAN-Bus), video cameras, and microphone arrays. The study analyzes the effects on driver behavior induced by secondary tasks. The corpus is analyzed to identify multimodal features that can be used to discriminate between normal and task-driving conditions. Separate binary classifiers are trained to distinguish between normal driving and each of the secondary tasks, achieving an average accuracy of 77.2%. When a joint, multi-class classifier is trained, the system achieves an accuracy of 40.8%, which is significantly higher than chance (12.5%). We observed that the classifiers' accuracy varies across secondary tasks, suggesting that certain tasks are more distracting than others. Motivated by these results, the study builds statistical models in the form of Gaussian mixture models (GMMs) to quantify the actual deviations in driver behavior from the expected normal driving patterns. The study includes task-independent and task-dependent models. Building upon these results, a regression model is proposed to obtain a metric that characterizes the attention level of the driver. This metric can be used to trigger alarms, helping to prevent collisions and improving the overall driving experience.

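    The GMM-based deviation scoring described above can be sketched conceptually: fit a mixture model on features from normal driving and use the log-likelihood of new frames as a deviation score. The two features, their distributions, and the scikit-learn call below are assumptions for illustration, not the paper's feature set or trained models.

    ```python
    # Fit a GMM on "normal driving" features; lower log-likelihood on new frames
    # indicates a larger deviation from normal behavior. Requires scikit-learn.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(7)
    # hypothetical per-frame features, e.g. [steering-wheel variance, gaze offset]
    normal = rng.normal(loc=[0.2, 0.0], scale=[0.05, 0.1], size=(500, 2))
    distracted = rng.normal(loc=[0.5, 0.6], scale=[0.15, 0.2], size=(50, 2))

    gmm = GaussianMixture(n_components=3, random_state=0).fit(normal)
    print(f"normal driving avg log-lik:  {gmm.score_samples(normal).mean():.1f}")
    print(f"secondary-task avg log-lik: {gmm.score_samples(distracted).mean():.1f}  (lower = larger deviation)")
    ```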
  • EDICS categories for IEEE Transactions on Multimedia - January 2012

    Page(s): 1226
    PDF (333 KB)
    Freely Available from IEEE

Aims & Scope

The scope of the Periodical covers the various aspects of research in multimedia technology and applications of multimedia.


Meet Our Editors

Editor-in-Chief
Chang Wen Chen
State University of New York at Buffalo