
IEEE Journal of Selected Topics in Signal Processing

Issue 5 • September 2012

  • Table of contents

    Publication Year: 2012, Page(s): C1
    Freely Available from IEEE
  • IEEE Journal of Selected Topics in Signal Processing publication information

    Publication Year: 2012, Page(s): C2
    Freely Available from IEEE
  • Introduction to the Issue on Emerging Techniques in 3-D

    Publication Year: 2012, Page(s): 409 - 410
    Freely Available from IEEE
  • Consistent Stereo-Assisted Absolute Phase Unwrapping Methods for Structured Light Systems

    Publication Year: 2012, Page(s): 411 - 424
    Cited by: Papers (4)

    Phase-shifted sinusoidal patterns have proven effective in structured light systems, which typically consist of a camera and a projector. They offer low decoding complexity, require as few as three projection frames per reconstruction, and are well suited for capturing dynamic scenes. In these systems, depth is reconstructed by determining the phase projected onto each camera pixel and establishing correspondences between camera and projector pixels. Typically, multiple periods are projected within the set of sinusoidal patterns, so the phase image must be unwrapped before correspondences can be established. A second camera can be added to the structured light system to help with phase unwrapping. In this work, we present two consistent phase unwrapping methods for two-camera stereo structured light systems. The first method enforces viewpoint consistency by phase unwrapping in the projector domain. Loopy belief propagation is run over the graph of projector pixels to select pixel correspondences between the left and right camera that align in 3-D space and are spatially smooth in each 2-D image. The second method enforces temporal consistency by unwrapping across space and time. We combine a quality-guided phase unwrapping approach with absolute phase estimates from the stereo cameras to solve for the absolute phase of connected regions. We present results for both methods to show their effectiveness on real-world scenes.
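
    For context, a minimal sketch of the standard three-step phase-shift arithmetic these systems build on (Python/NumPy; names are illustrative). The paper's actual contribution, choosing the period index k consistently across views and time, is not shown; k is assumed given.

    ```python
    import numpy as np

    def wrapped_phase(i0, i1, i2):
        """Wrapped phase from three sinusoidal patterns shifted by 120 degrees."""
        return np.arctan2(np.sqrt(3.0) * (i0 - i2), 2.0 * i1 - i0 - i2)

    def absolute_phase(phi, k):
        """Absolute phase given the integer period index k per pixel; the paper
        estimates k via stereo consistency rather than assuming it known."""
        return phi + 2.0 * np.pi * k
    ```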

  • Real-Time Distance-Dependent Mapping for a Hybrid ToF Multi-Camera Rig

    Publication Year: 2012, Page(s): 425 - 436

    We propose a real-time mapping procedure for data matching to deal with data fusion in a hybrid time-of-flight (ToF) multi-camera rig. Our approach takes advantage of the depth information provided by the ToF camera to calculate the distance-dependent disparity between the two cameras that constitute the system. As a consequence, the non-co-centric binocular system behaves as a co-centric system whose sensors have collinear optical axes. The association between mapped and non-mapped image coordinates can be described by a set of look-up tables. This, in turn, reduces the complexity of the whole process to a simple indexing step and thus runs in real time. The experimental results show that, in addition to being straightforward and easy to compute, our proposed data matching approach is highly accurate, which facilitates further fusion operations.
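
    A minimal sketch of the distance-dependent disparity idea, assuming a rectified rig and hypothetical calibration values (the paper derives the exact mapping from its own calibration):

    ```python
    import numpy as np

    FOCAL_PX = 580.0    # hypothetical focal length in pixels
    BASELINE_M = 0.05   # hypothetical ToF-to-camera baseline in meters

    def disparity_from_depth(depth_m):
        """Disparity between the two views as a function of ToF depth."""
        return FOCAL_PX * BASELINE_M / depth_m

    # Precomputing disparities over quantized depth levels turns the per-pixel
    # mapping into a simple look-up, which is what enables real-time operation.
    depth_levels = np.linspace(0.5, 5.0, 451)
    lut = {float(z): disparity_from_depth(z) for z in depth_levels}
    ```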

  • Multimodal Stereo Vision System: 3D Data Extraction and Algorithm Evaluation

    Publication Year: 2012, Page(s): 437 - 446
    Cited by: Papers (1)

    This paper proposes an imaging system for computing sparse depth maps from multispectral images. A special stereo head consisting of an infrared camera and a color camera defines the proposed multimodal acquisition system. The cameras are rigidly attached so that their image planes are parallel. Details about the calibration and image rectification procedure are provided. Sparse disparity maps are obtained by the combined use of mutual information enriched with gradient information. The proposed approach is evaluated using a receiver operating characteristic (ROC) curve. Furthermore, a multispectral dataset of color and infrared images, together with their corresponding ground truth disparity maps, is generated and used as a test bed. Experimental results in real outdoor scenarios are provided, showing the viability of the approach and that it is not restricted to a specific domain.
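
    As a pointer to the matching cost involved, here is a minimal mutual information computation from a joint histogram (Python/NumPy); the paper additionally enriches this cost with gradient information, which is not shown:

    ```python
    import numpy as np

    def mutual_information(a, b, bins=32):
        """Mutual information between two co-located patches (e.g., IR and
        grayscale color), estimated from their joint histogram."""
        joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
        pxy = joint / joint.sum()
        px = pxy.sum(axis=1, keepdims=True)
        py = pxy.sum(axis=0, keepdims=True)
        nz = pxy > 0  # avoid log(0)
        return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))
    ```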

  • Temporal-Dense Dynamic 3-D Reconstruction With Low Frame Rate Cameras

    Publication Year: 2012, Page(s): 447 - 459

    Temporal-dense 3-D reconstruction of dynamic scenes is a challenging and important research topic in signal processing. Although dynamic scenes can be captured by multiple high frame rate cameras, their high price and large storage requirements remain problematic for practical applications. To address this problem, we propose a new method for temporal-densely capturing and reconstructing dynamic scenes with low frame rate cameras, which consists of spatio-temporal sampling, spatio-temporal interpolation, and spatio-temporal fusion. In spatio-temporal fusion, the dual-tree discrete wavelet transform and shape context are employed to compute positional constraints that drive a Poisson image editing framework to obtain unsampled images and hence realistic time-varying shapes. With this method, not only shapes but also textures are recovered. The method can be extended to temporally denser reconstruction by simply adding more cameras or using a few higher frame rate cameras. Experimental results show that our method achieves temporal-dense dynamic 3-D reconstruction with low frame rate cameras.
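
    The abstract does not spell out the sampling scheme; one natural reading, offered here purely as an illustrative assumption, is that the low frame rate cameras are triggered with staggered phase offsets so their union approximates a higher temporal sampling rate:

    ```python
    def trigger_offsets(num_cams, fps):
        """Hypothetical staggered triggers: num_cams cameras at fps frames/s,
        offset evenly in time, jointly sample at num_cams * fps."""
        period = 1.0 / fps
        return [c * period / num_cams for c in range(num_cams)]

    print(trigger_offsets(4, 15))  # 4 cameras at 15 fps -> 60 samples/s
    ```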

  • Toward Assessing and Improving the Quality of Stereo Images

    Publication Year: 2012, Page(s): 460 - 470
    Cited by: Papers (1)

    Imaging systems have incorporated numerous technological innovations, such as 3-D television and handheld devices. Despite these advances, such systems still require the human eyes to refocus until the observer achieves a sense of depth. The more time this takes, the more the eye muscles fatigue and the more the brain tires from conflicting cues, although the exact intricacies involved are far more complex. To alleviate these problems, we introduce a learning framework that aims to improve the quality of stereo images. Instead of attempting to cover all factors that affect the quality of stereo images, such as image resolution, monitor response, viewing glass response, viewing conditions, viewer differences, and compression artifacts, we first introduce a set of universally relevant geometric stereo features for anaglyph image analysis based on feature point correspondence across color channels. We then build a regression model that effectively captures the relationship between the stereo features and the quality of stereo images, and show that the model performs on par with the average human judge in our study. Finally, we demonstrate the value of the proposed quality model in two applications, where it is used to help enhance the quality of stereo images and to extract stereo key frames from a captured 2-D video.
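
    The abstract does not specify the regression model; as a hedged illustration, a simple least-squares fit from feature vectors to subjective scores could look like this (all names and data are stand-ins):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.random((100, 4))   # stand-in geometric stereo features per image
    y = rng.random(100)        # stand-in subjective quality scores

    Xb = np.hstack([X, np.ones((len(X), 1))])     # append a bias column
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)    # least-squares fit

    def predict_quality(features):
        """Predicted quality for one stereo image's feature vector."""
        return float(np.append(features, 1.0) @ w)
    ```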

  • Edge-Based Reduced-Reference Quality Metric for 3-D Video Compression and Transmission

    Publication Year: 2012, Page(s): 471 - 482
    Cited by: Papers (1)

    3-D video applications are delivered over a range of different transmission systems. In order to provide demanding customers with a better service over unreliable communication channels, compression and transmission system parameters can be changed “on the fly.” For interactive 3-D video services, video compression can be adapted (e.g., made more robust and/or rate adaptive) based on the quality measured at the receiver. It has been shown that measuring the 3-D video quality at the receiver side, and using this information as feedback to fine-tune the system parameters, improves performance in such systems. However, measuring 3-D video quality using full-reference (FR) quality metrics is not feasible, because the original 3-D video sequence would be needed at the receiver side for comparison. Therefore, this paper proposes a reduced-reference (RR) quality metric for color-plus-depth 3-D video compression and transmission, using the extracted edge information of color-plus-depth-map 3-D video. This work is motivated by the fact that the edges/contours of the depth map can represent different depth levels and can thus be used to measure structural degradation. Since depth map boundaries coincide with the corresponding color image object boundaries, edge information of the color image and of the depth map is compared to obtain a quality index (structural degradation) for the corresponding color image sequence. The performance of the method is evaluated for different compression ratios and network conditions. The proposed method achieves good results compared to its counterpart FR quality metric, with a lower overhead for side information.
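
    A minimal sketch of the kind of edge comparison such a metric relies on, with a gradient-magnitude detector standing in for whichever edge extraction the paper actually uses:

    ```python
    import numpy as np
    from scipy import ndimage

    def edge_map(img, frac=0.2):
        """Binary edge map: gradient magnitude above a fraction of its max."""
        gx = ndimage.sobel(img, axis=1)
        gy = ndimage.sobel(img, axis=0)
        mag = np.hypot(gx, gy)
        return mag > frac * mag.max()

    def edge_similarity(ref_edges, rec_edges):
        """Overlap between sender-side and receiver-side edge maps; only the
        (small) reference edge information needs to be transmitted."""
        inter = np.logical_and(ref_edges, rec_edges).sum()
        union = np.logical_or(ref_edges, rec_edges).sum()
        return inter / max(union, 1)
    ```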

  • Enhancement of Depth Maps With Alpha Channel Estimation for 3-D Video

    Publication Year: 2012, Page(s): 483 - 494

    Depth images are widely used for 3-D scene generation. In depth image acquisition, accurately estimating the depths of object boundaries, which has a critical impact on the visual quality of the generated 3-D scene, is very difficult, especially for objects with hairy regions. We aim to generate a dynamic 3-D scene without serious degradation in visual quality by developing solutions for the problems that occur in depth images obtained with an active depth sensor. A novel alpha channel estimation algorithm is proposed for seamless composition, along with a depth map improvement method for hairy objects. By utilizing additional depth or infrared (IR) information, the existing matting algorithm can be improved significantly. We further enhance the alpha estimation method in the temporal domain. The depth map is enhanced by filtering depth values over spatiotemporal neighborhoods, based on information provided by the color and alpha images. The proposed method is examined mainly using a time-of-flight (ToF) camera, and also with a Kinect sensor. The experimental results demonstrate that the proposed method can generate a 3-D scene with a greater degree of naturalness than other methods.
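
    For reference, the compositing model that alpha estimation serves, I = alpha*F + (1 - alpha)*B, in code form (a sketch; estimating alpha near hairy boundaries is the hard part the paper addresses):

    ```python
    import numpy as np

    def composite(fg, bg, alpha):
        """Seamless composition I = alpha*F + (1 - alpha)*B, with alpha in
        [0, 1] per pixel (2-D), broadcast over the color channels (3-D)."""
        a = alpha[..., None]
        return a * fg + (1.0 - a) * bg
    ```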

  • Hierarchical Hole-Filling For Depth-Based View Synthesis in FTV and 3D Video

    Publication Year: 2012, Page(s): 495 - 504
    Cited by: Papers (5)

    Three-dimensional television (3DTV) is widely believed to be the future of television broadcasting, replacing current 2-D HDTV technology. Future 3DTV will bring a more life-like and visually immersive home entertainment experience, in which users will have the freedom to navigate through the scene and choose a different viewpoint. A desired view can be synthesized at the receiver side using depth image-based rendering (DIBR). While this approach has many advantages, one of the key challenges in DIBR is how to fill the holes caused by disocclusion regions and wrong depth values. In this paper, we propose two new approaches for disocclusion removal in DIBR. Both approaches, namely hierarchical hole-filling (HHF) and depth-adaptive hierarchical hole-filling, eliminate the need for any smoothing or filtering of the depth map. Both techniques use a pyramid-like approach to estimate the hole pixels from lower resolution estimates of the 3-D warped image. The lower resolution estimates involve a pseudo zero canceling plus Gaussian filtering of the warped image. The depth-adaptive HHF incorporates depth information to produce a higher resolution rendering around previously occluded areas. Experimental results show that HHF and depth-adaptive HHF yield virtual images and stereoscopic videos that are free of geometric distortions, and that they deliver better rendering quality, both subjectively and objectively, than traditional hole-filling approaches.
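
    A hedged sketch of the pyramid ("pull-push") idea behind hierarchical hole-filling: average valid pixels downward until the holes vanish, then push estimates back up into the holes. The paper's pseudo zero canceling plus Gaussian filtering and the depth-adaptive variant are not reproduced; power-of-two image sides and at least one valid pixel are assumed.

    ```python
    import numpy as np

    def down(img, hole):
        """One pyramid level: mean of valid pixels in each 2x2 block."""
        h, w = img.shape
        i4 = img.reshape(h // 2, 2, w // 2, 2)
        v4 = (~hole).reshape(h // 2, 2, w // 2, 2)
        n = v4.sum(axis=(1, 3))
        s = (i4 * v4).sum(axis=(1, 3))
        return np.where(n > 0, s / np.maximum(n, 1), 0.0), n == 0

    def fill(img, hole):
        """Fill holes from lower-resolution estimates, coarsest level first."""
        if not hole.any():
            return img
        small, small_hole = down(img, hole)
        small = fill(small, small_hole)
        up = np.kron(small, np.ones((2, 2)))   # nearest-neighbor upsample
        return np.where(hole, up, img)
    ```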

  • Fusion of Geometry and Color Information for Scene Segmentation

    Publication Year: 2012, Page(s): 505 - 521
    Cited by: Papers (2)

    Scene segmentation is a well-known problem in computer vision, traditionally tackled by exploiting only the color information from a single scene view. Recent hardware and software developments make it possible to estimate scene geometry in real time, opening the way for new segmentation approaches based on the fusion of color and depth data. This paper follows this rationale and proposes a novel segmentation scheme in which multidimensional vectors are used to jointly represent color and depth data, and normalized-cuts spectral clustering is applied to them in order to segment the scene. The critical issue of how to balance the two sources of information is solved by an automatic procedure based on an unsupervised metric for segmentation quality. An extension of the proposed approach that exploits both images in stereo vision systems is also proposed. Different acquisition setups, including time-of-flight cameras, the Microsoft Kinect, and stereo vision systems, were used for the experimental validation. A comparison of the effectiveness of the different depth imaging systems for segmentation purposes is also presented. Experimental results show how the proposed algorithm outperforms scene segmentation algorithms based on geometry or color data alone, as well as other approaches that exploit both cues.
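
    A minimal sketch of the joint representation and clustering step (Python/scikit-learn); lam is the color/geometry balance weight, which the paper tunes automatically rather than fixing by hand:

    ```python
    import numpy as np
    from sklearn.cluster import SpectralClustering

    def segment(colors, points3d, lam, n_segments=5):
        """Cluster joint [color, lam * geometry] vectors with spectral
        clustering on a normalized graph Laplacian (normalized-cuts style)."""
        feats = np.hstack([colors, lam * points3d])   # one 6-D vector per pixel
        sc = SpectralClustering(n_clusters=n_segments,
                                affinity="nearest_neighbors")
        return sc.fit_predict(feats)
    ```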

  • Characterization of 3-D Volumetric Probabilistic Scenes for Object Recognition

    Publication Year: 2012, Page(s): 522 - 537
    Cited by: Papers (5)

    This paper presents a new volumetric representation for categorizing objects in large-scale 3-D scenes reconstructed from image sequences. This work uses a probabilistic volumetric model (PVM) that combines the ideas of background modeling and volumetric multi-view reconstruction to handle the uncertainty inherent in reconstructing 3-D structures from 2-D images. The advantages of probabilistic modeling have been demonstrated by recent applications of the PVM representation to video image registration, change detection, and classification of changes based on PVM context. These applications, however, operate on 2-D projections of the PVM. This paper presents the first work to characterize and use the local 3-D information in the scenes. Two approaches to local feature description are proposed and compared: 1) features derived from a principal component analysis (PCA) of model neighborhoods, and 2) features derived from the coefficients of a 3-D Taylor series expansion within each neighborhood. The resulting description is used in a bag-of-features approach to classify buildings, houses, cars, planes, and parking lots, learned from aerial imagery collected over Providence, RI. It is shown that both feature descriptions explain the data with similar accuracy, and their effectiveness for dense-feature categorization is compared across the different classes. Finally, 3-D extensions of the Harris corner detector and a Hessian-based detector are used to detect salient features. Both types of salient features are evaluated through object categorization experiments in which only features with maximal response are retained. For most saliency criteria tested, features based on the determinant of the Hessian achieved higher classification accuracy than Harris-based features.
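
    As a hedged illustration of the first descriptor family, eigenvalues from a PCA of a local neighborhood (exactly what is stacked per sample, e.g. occupancy-weighted coordinates, is an assumption here):

    ```python
    import numpy as np

    def pca_descriptor(samples):
        """Sorted, sum-normalized covariance eigenvalues of an (n, d) array of
        neighborhood samples: a compact summary of local shape."""
        c = samples - samples.mean(axis=0)
        cov = c.T @ c / max(len(samples) - 1, 1)
        ev = np.linalg.eigvalsh(cov)[::-1]       # descending order
        return ev / max(ev.sum(), 1e-12)
    ```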

  • Human Pose Estimation and Activity Recognition From Multi-View Videos: Comparative Explorations of Recent Developments

    Publication Year: 2012, Page(s): 538 - 552
    Cited by: Papers (2)

    This paper presents a review and comparative study of recent multi-view approaches for human 3-D pose estimation and activity recognition. We discuss the application domains of human pose estimation and activity recognition and the associated requirements, covering advanced human–computer interaction (HCI), assisted living, gesture-based interactive games, intelligent driver assistance systems, movies, 3-D TV and animation, physical therapy, autonomous mental development, smart environments, sport motion analysis, video surveillance, and video annotation. Next, we review and categorize recent approaches that have been proposed to comply with these requirements. We report a comparison of the most promising methods for multi-view human action recognition using two publicly available datasets: the INRIA Xmas Motion Acquisition Sequences (IXMAS) Multi-View Human Action Dataset and the i3DPost Multi-View Human Action and Interaction Dataset. To compare the proposed methods, we give a qualitative assessment of methods that cannot be compared quantitatively, and we analyze some prominent 3-D pose estimation techniques for applications where not only the performed action needs to be identified but also a more detailed description of the body pose and joint configuration is required. Finally, we discuss some of the shortcomings of multi-view camera setups and outline our thoughts on future directions of 3-D body pose estimation and human action recognition.

  • A Local 3-D Motion Descriptor for Multi-View Human Action Recognition from 4-D Spatio-Temporal Interest Points

    Publication Year: 2012, Page(s): 553 - 565
    Cited by: Papers (3)

    In this paper, we address the problem of human action recognition in reconstructed 3-D data acquired by multi-camera systems. We contribute to this field by introducing a novel 3-D action recognition approach based on the detection of 4-D (3-D space + time) spatio-temporal interest points (STIPs) and local description of 3-D motion features. STIPs are detected in multi-view images and extended to 4-D using 3-D reconstructions of the actors and pixel-to-vertex correspondences of the multi-camera setup. Local 3-D motion descriptors, histograms of optical 3-D flow (HOF3D), are extracted from estimated 3-D optical flow in the neighborhood of each 4-D STIP and made view-invariant. The local HOF3D descriptors are divided using 3-D spatial pyramids to capture and improve the discrimination between arm- and leg-based actions. Based on these pyramids of HOF3D descriptors, we build a bag-of-words (BoW) vocabulary of human actions, which is compressed and classified using the agglomerative information bottleneck (AIB) and support vector machines (SVMs), respectively. Experiments on the publicly available i3DPost and IXMAS datasets show promising state-of-the-art results and validate the performance and view-invariance of the approach.
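
    A toy version of a 3-D flow histogram, using azimuth/elevation bins as a stand-in for the paper's actual HOF3D binning (the spatial pyramids and the view-invariance step are omitted):

    ```python
    import numpy as np

    def hof3d_toy(flow, bins=8):
        """Magnitude-weighted histogram over the orientations of (n, 3) flow
        vectors in the neighborhood of one interest point."""
        mag = np.linalg.norm(flow, axis=1)
        az = np.arctan2(flow[:, 1], flow[:, 0])
        el = np.arcsin(np.clip(flow[:, 2] / np.maximum(mag, 1e-9), -1.0, 1.0))
        h, _, _ = np.histogram2d(az, el, bins=bins, weights=mag,
                                 range=[[-np.pi, np.pi], [-np.pi / 2, np.pi / 2]])
        h = h.ravel()
        return h / max(h.sum(), 1e-12)
    ```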

  • Noisy Depth Maps Fusion for Multiview Stereo Via Matrix Completion

    Publication Year: 2012, Page(s): 566 - 582
    Cited by: Papers (5)

    This paper introduces a general framework for fusing noisy point clouds from multiview images of the same object. We solve this classical vision problem using a newly emerging signal processing technique known as matrix completion. In this framework, we construct an initial incomplete matrix from the point clouds observed by all the cameras, with points invisible to a camera denoted as unknown entries. Observed points corresponding to the same object point are placed in the same row. When properly completed, the recovered matrix should have rank one, since all the columns describe the same object. Therefore, an intuitive approach to completing the matrix is to minimize its rank subject to consistency with the observed entries. In order to improve the fusion accuracy, we propose a general noisy matrix completion method called log-sum penalty completion (LPC), which is particularly effective in removing outliers. Based on the majorization-minimization (MM) algorithm, the non-convex LPC problem is effectively solved by a sequence of convex optimizations. Experimental results on both point cloud fusion and multiview stereo (MVS) reconstruction verify the effectiveness of the proposed framework and the LPC algorithm.
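
    A sketch of the majorization-minimization recipe for a log-sum penalty on singular values: each iteration shrinks singular values with weights 1/(sigma + eps), the linearization of log(sigma + eps), then re-imposes the observed entries. Parameters are illustrative, and the paper's noisy formulation relaxes the hard data constraint used here:

    ```python
    import numpy as np

    def lpc_complete(M, mask, n_iter=50, lam=1.0, eps=1e-2):
        """Log-sum penalty completion via iteratively reweighted singular
        value shrinkage; unknown entries of M are ignored via mask."""
        X = np.where(mask, M, 0.0)
        for _ in range(n_iter):
            U, s, Vt = np.linalg.svd(X, full_matrices=False)
            s = np.maximum(s - lam / (s + eps), 0.0)   # MM-weighted shrinkage
            X = (U * s) @ Vt
            X = np.where(mask, M, X)                   # keep observed entries
        return X
    ```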

  • A Framework for the Analysis and Optimization of Encoding Latency for Multiview Video

    Publication Year: 2012, Page(s): 583 - 596
    Cited by: Papers (1)

    We present a novel framework for the analysis and optimization of encoding latency for multiview video. First, we characterize the elements that influence encoding latency performance: 1) the multiview prediction structure and 2) the hardware encoder model. Then, we provide algorithms to find the encoding latency of any arbitrary multiview prediction structure. The proposed framework relies on the directed acyclic graph encoder latency (DAGEL) model, which provides an abstraction of the processing capacity of the encoder by considering an unbounded number of processors. Using graph-theoretic algorithms, the DAGEL model allows us to compute the encoding latency of a given prediction structure and to determine the contribution of the prediction dependencies to it. As an example of DAGEL application, we propose an algorithm that reduces the encoding latency of a given multiview prediction structure to a target value. In our approach, a minimum number of frame dependencies are pruned until the target latency is achieved, thus minimizing the degradation of rate-distortion performance due to the removal of prediction dependencies. Finally, we analyze the latency performance of the DAGEL-derived prediction structures in multiview encoders with limited processing capacity.
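
    As an illustration of the graph-theoretic core, under the unbounded-processor abstraction the encoding latency reduces to a longest path in the prediction DAG (hypothetical four-frame structure; unit encoding time per frame assumed):

    ```python
    import networkx as nx

    g = nx.DiGraph()
    # Edges point from a reference frame to the frame that predicts from it.
    g.add_edges_from([("I0", "P1"), ("I0", "B2"), ("P1", "B2"), ("P1", "P3")])

    chain = nx.dag_longest_path(g)               # a critical dependency chain
    latency = nx.dag_longest_path_length(g) + 1  # frames on that chain
    print(chain, latency)                        # here: a 3-frame chain
    ```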

  • Rendering 3-D High Dynamic Range Images: Subjective Evaluation of Tone-Mapping Methods and Preferred 3-D Image Attributes

    Publication Year: 2012, Page(s): 597 - 610
    Cited by: Papers (1)

    High dynamic range (HDR) images provide superior picture quality by allowing a larger range of brightness levels to be captured and reproduced than traditional 8-bit low dynamic range (LDR) images. Even with existing 8-bit displays, picture quality can be significantly improved if content is first captured in HDR format and then tone-mapped from HDR to LDR. Tone-mapping methods have been extensively studied for 2-D images. This paper addresses the problem of presenting stereoscopic tone-mapped HDR images on 3-D LDR displays and how it differs from the 2-D scenario. We first present a subjective psychophysical experiment that evaluates existing tone-mapping operators on 3-D HDR images. The results show that 3-D content derived using tone mapping is much preferred to content captured directly with a pair of LDR cameras. Global (spatially invariant) and local (spatially variant) tone-mapping methods produce similar 3-D effects. The second part of our study focuses on how the preferred level of brightness and the preferred amount of detail differ between 3-D and 2-D images, through another set of subjective experiments. Our results show that while people selected slightly brighter images in 3-D viewing than in 2-D, the difference is not statistically significant. However, compared to 2-D images, the subjects consistently preferred a greater amount of detail when watching 3-D. These results suggest that 3-D content should be prepared differently (sharper and possibly slightly brighter) from the same content intended for 2-D display, to achieve optimal appearance in each format. The complete database of the original HDR image pairs and their LDR counterparts is available online.
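
    For concreteness, one member of the global (spatially invariant) operator family evaluated in such studies, a Reinhard-style curve, applied identically to both views so the stereo pair stays consistent (a sketch, not the paper's implementation):

    ```python
    import numpy as np

    def tone_map_global(lum, key=0.18):
        """Scale luminance by its log-average, then compress with L/(1+L)."""
        log_avg = np.exp(np.mean(np.log(lum + 1e-6)))
        scaled = key * lum / log_avg
        return scaled / (1.0 + scaled)
    ```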

  • Information for Authors

    Publication Year: 2012, Page(s): 611 - 612
    Freely Available from IEEE
  • IEEE Signal Processing Society Information

    Publication Year: 2012, Page(s): C3
    Freely Available from IEEE
  • [Blank page - back cover]

    Publication Year: 2012, Page(s): C4
    Freely Available from IEEE

Aims & Scope

The IEEE Journal of Selected Topics in Signal Processing (J-STSP) solicits special issues on topics that cover the entire scope of the IEEE Signal Processing Society, including the theory and application of filtering, coding, transmitting, estimating, detecting, analyzing, recognizing, synthesizing, recording, and reproducing signals by digital or analog devices or techniques.


Meet Our Editors

Editor-in-Chief
Fernando Pereira
Instituto Superior Técnico