Multimedia, IEEE Transactions on

Issue 5 • Date Aug. 2008

  • Table of contents

    Publication Year: 2008 , Page(s): C1 - C4
    Freely Available from IEEE
  • IEEE Transactions on Multimedia publication information

    Publication Year: 2008 , Page(s): C2
    Freely Available from IEEE
  • Multimedia Applications in Mobile/Wireless Context

    Publication Year: 2008 , Page(s): 673 - 674
    Freely Available from IEEE
  • Reliable Event-Detection in Wireless Visual Sensor Networks Through Scalar Collaboration and Game-Theoretic Consideration

    Publication Year: 2008 , Page(s): 675 - 690
    Cited by:  Papers (11)

    In this work we consider an event-driven wireless visual sensor network (WVSN) comprised of untethered camera nodes and scalar sensors deployed in a hostile environment. In the event-driven paradigm, each camera node transmits a surveillance frame to the cluster-head only if an event of interest was captured in the frame, for energy and bandwidth conservation. We thus examine a simple image processing algorithm at the camera nodes based on difference frames and the chi-squared detector. We show that the test statistic of the chi-squared detector is equivalent to that of a robust (non-parametric) detector and that this simple algorithm performs well on indoor surveillance sequences and some, but not all, outdoor sequences. In outdoor sequences containing significant changes in background and lighting, this simple detector may produce a high probability of error and benefits from the inclusion of scalar sensor decisions. The scalar sensor decisions are, however, prone to attack and may exhibit errors that are arbitrarily frequent, pervasive throughout the network and difficult to predict. To achieve attack prediction and mitigation given an attacker whose actions are not known a priori, we employ game-theoretic analysis. We show that the scalar sensor error can be controlled through cluster-head checking and appropriate selection of cluster size n. Given this attack mitigation, we employ real-life sequences to determine the total probability of error when individual and combined decisions are utilized and we discuss the ensuing ramifications and performance issues.
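
    The difference-frame test mentioned in the abstract above can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes frames are flat lists of pixel intensities, that the noise variance of the difference frame is known, and that the threshold is supplied by the caller. The names `chi_squared_statistic` and `event_detected` are hypothetical.

```python
def chi_squared_statistic(frame, reference, noise_var):
    """Sum of squared pixel differences, normalized by the assumed
    noise variance of the difference frame."""
    if len(frame) != len(reference):
        raise ValueError("frames must have the same number of pixels")
    return sum((a - b) ** 2 for a, b in zip(frame, reference)) / noise_var


def event_detected(frame, reference, noise_var, threshold):
    """Declare an event when the statistic exceeds the caller's threshold."""
    return chi_squared_statistic(frame, reference, noise_var) > threshold


if __name__ == "__main__":
    reference = [10, 10, 10, 10]
    # 9.49 is roughly the 95% quantile of a chi-squared
    # distribution with 4 degrees of freedom.
    print(event_detected([10, 12, 11, 10], reference, 2.0, 9.49))
    print(event_detected([15, 10, 10, 10], reference, 2.0, 9.49))
```

    In this sketch a camera node would transmit the frame only when `event_detected` returns true, which is the event-driven behavior the abstract describes.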

  • Error-Resilient Video Encoding and Transmission in Multirate Wireless LANs

    Publication Year: 2008 , Page(s): 691 - 700
    Cited by:  Papers (9)

    In this paper, we present a cross-layer approach for video transmission in wireless LANs that employs joint source and application-layer channel coding, together with rate adaptation at the wireless physical layer (PHY). While the purpose of adopting PHY rate adaptation in modern wireless LANs like the IEEE 802.11a/b is to maximize the throughput, in this paper we exploit this feature to increase the robustness of wireless video. More specifically, we investigate the impact of adapting the PHY transmission rate, thus changing the throughput and packet loss channel characteristics, on the rate-distortion performance of a transmitted video sequence. To evaluate the video quality at the decoder, we develop a cross-layer modeling framework that jointly considers the effect of application-layer joint source-channel coding (JSCC), error concealment, and the PHY transmission rate. The resulting models are used by an optimization algorithm that calculates the optimal JSCC allocation for each video frame, and PHY transmission rate for each outgoing transport packet. The comprehensive simulation results obtained with the H.264/AVC codec demonstrate a considerable increase in the PSNR of the decoded video when compared with a system that employs JSCC and PHY rate adaptation separately. Furthermore, our performance analysis indicates that the optimal PHY transmission rate calculated by the proposed algorithm can be significantly different when compared with rate adaptation algorithms that target throughput improvement.

  • Cross-Layer Optimization for State Update in Mobile Gaming

    Publication Year: 2008 , Page(s): 701 - 710
    Cited by:  Papers (1)

    In a large-scale mobile gaming environment with limited wireless network bandwidth, efficient mechanisms for state update are crucial to allow graceful real-time interaction for a large number of players. By using the state updating threshold as a key parameter that bridges the resulting state distortion and the network traffic, we are able to study the fundamental traffic-distortion tradeoffs via both theoretical modeling and numerical analysis using real game traces. We consider a WiMAX link model, where the bandwidth allocation is driven by the underlying physical layer link quality as well as application layer gaming behaviors. Such a cross-layer optimization problem can be solved using standard convex programming techniques. By exploring the temporal locality of gaming behavior, we also propose a prediction method for on-line bandwidth adaptation. Using real data traces from a multiplayer driving game, TORCS, the proposed network-aware bandwidth allocation method (NABA) is able to achieve significant reduction in state distortion compared to two baselines: uniform and proportional policies.
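
    The traffic-distortion tradeoff governed by the state updating threshold can be illustrated with a toy sketch. It is an assumption-laden illustration, not the paper's model: the state is a single number per time step, an update is sent only when the true state drifts from the last transmitted value by more than the threshold, and distortion is the mean absolute error seen by the receiver. The function name `simulate_updates` is hypothetical.

```python
def simulate_updates(states, threshold):
    """Return (number of updates sent, mean absolute state error).

    The first state is always sent. Afterwards an update is sent only
    when the true state differs from the last sent value by more than
    `threshold`; otherwise the receiver keeps the stale value.
    """
    last_sent = states[0]
    updates = 1
    total_error = 0.0
    for state in states[1:]:
        if abs(state - last_sent) > threshold:
            last_sent = state
            updates += 1
        total_error += abs(state - last_sent)
    return updates, total_error / len(states)


if __name__ == "__main__":
    trace = [0, 1, 2, 3, 4, 5]
    for threshold in (0, 1.5, 10):
        print(threshold, simulate_updates(trace, threshold))
```

    Raising the threshold lowers the number of updates (traffic) and raises the error (distortion), which is the tradeoff the abstract refers to.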

  • Multimedia Clip Generation From Documents for Browsing on Mobile Devices

    Publication Year: 2008 , Page(s): 711 - 723
    Cited by:  Papers (4)  |  Patents (1)

    Small displays on mobile handheld devices, such as personal digital assistants (PDAs) and cellular phones, are the bottlenecks for usability of most content browsing applications. Generally, conventional content such as documents and Web pages needs to be modified for effective presentation on mobile devices. This paper proposes a novel visualization for documents, called multimedia thumbnails (MMNails), which consists of text and image content converted into playable multimedia clips. A multimedia thumbnail utilizes visual and audio channels of small portable devices as well as both spatial and time dimensions to communicate text and image information of a single document. The proposed algorithm for generating multimedia thumbnails includes 1) a semantic document analysis step, where salient content from a source document is extracted; 2) an optimization step, where a subset of this extracted content is selected based on time, display, and application constraints; and 3) a composition step, where the selected visual and audible document content is combined into a multimedia thumbnail. Scalability of MMNails that allows generation of multimedia clips of various lengths is also described. A user study is presented that evaluates the effectiveness of the proposed multimedia thumbnail visualization.

  • Segmentation-Based View-Dependent 3-D Graphics Model Transmission

    Publication Year: 2008 , Page(s): 724 - 734
    Cited by:  Papers (5)

    For wireless network based graphics applications, a key challenge is how to efficiently transmit complex 3-D models over bandwidth-limited wireless channels. Most existing 3-D mesh transmission systems do not consider view-dependent delivery, and thus transmit unnecessary portions of 3-D mesh models, which wastes precious wireless network bandwidth. In this paper, we propose a novel view-dependent 3-D model transmission scheme, where a 3-D model is partitioned into a number of segments, each segment is then independently coded using the MPEG-4 3DMC coding algorithm, and finally only the visible segments are selected and delivered to the client. Moreover, we also propose analytical models to find the optimal number of segments so as to minimize the average transmission size. Simulation results show that such a view-based 3-D model transmission is able to substantially save the transmission bandwidth and therefore has a significant impact on wireless graphics applications.

  • Efficient Deblocking With Coefficient Regularization, Shape-Adaptive Filtering, and Quantization Constraint

    Publication Year: 2008 , Page(s): 735 - 745
    Cited by:  Papers (14)

    We propose an effective deblocking scheme with extremely low computational complexity. The algorithm involves three parts: local AC coefficient regularization (ACR) of shifted blocks in the discrete cosine transform (DCT) domain, block-wise shape adaptive filtering (BSAF) in the spatial domain, and quantization constraint (QC) in the DCT domain. The DCT domain ACR suppresses the grid noise (blockiness) in monotone areas. The spatial-domain BSAF alleviates the staircase noise along the edge, and the ringing near the edge and the corner outliers. The narrow quantization constraint set is imposed to prevent possible oversmoothing and improve PSNR performance. Extensive simulation results and comparative studies are provided to justify the effectiveness and efficiency of the proposed deblocking algorithm.

  • An Efficient Watermarking Method Based on Significant Difference of Wavelet Coefficient Quantization

    Publication Year: 2008 , Page(s): 746 - 757
    Cited by:  Papers (40)

    This paper proposes a blind watermarking algorithm based on the significant difference of wavelet coefficient quantization for copyright protection. Every seven nonoverlapping wavelet coefficients of the host image are grouped into a block. The largest two coefficients in a block are called significant coefficients in this paper and their difference is called the significant difference. We quantize the local maximum wavelet coefficient in a block by comparing the significant difference value in a block with the average significant difference value in all blocks. The maximum wavelet coefficients are so quantized that their significant difference between watermark bit 0 and watermark bit 1 exhibits a large energy difference which can be used for watermark extraction. During the extraction, an adaptive threshold value is designed to extract the watermark from the watermarked image under different attacks. We compare the adaptive threshold value to the significant difference which was quantized in a block to determine the watermark bit. The experimental results show that the proposed method is quite effective against JPEG compression, low-pass filtering, and Gaussian noise; the PSNR value of a watermarked image is greater than 40 dB.
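
    The block grouping and the significant difference described above can be sketched as follows. This is only an illustration of those two definitions, not the paper's quantization or extraction rule. It assumes the wavelet coefficients are already available as a flat list, compares raw values rather than magnitudes, and ignores a trailing partial block. The function name `significant_differences` is hypothetical.

```python
def significant_differences(coeffs, block_size=7):
    """Group coefficients into nonoverlapping blocks and return, for each
    full block, the difference between its two largest coefficients."""
    diffs = []
    for start in range(0, len(coeffs) - block_size + 1, block_size):
        block = sorted(coeffs[start:start + block_size], reverse=True)
        diffs.append(block[0] - block[1])
    return diffs


if __name__ == "__main__":
    coeffs = [1, 9, 3, 4, 2, 6, 5, 10, 10, 1, 1, 1, 1, 1, 3]
    diffs = significant_differences(coeffs)
    average = sum(diffs) / len(diffs)
    print(diffs, average)
```

    The average over all blocks is the quantity the abstract says each block's significant difference is compared against.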

  • A Multilevel Asymmetric Scheme for Digital Fingerprinting

    Publication Year: 2008 , Page(s): 758 - 766
    Cited by:  Papers (4)

    The present paper proposes an asymmetric watermarking scheme suitable for fingerprinting and precision-critical applications. The method is based on linear algebra and is proved to be secure under projection attack. The problem of anonymous fingerprinting is also addressed, by allowing a client to get a watermarked image from a server without revealing her own identity. In particular, we consider the specific scenario where the client is a structured organization being trusted as a whole but involving possibly untrusted members. In such a context, where the watermarked copy can be made available to all members, but only authorized subgroups should be able to remove the watermark and recover a distortion-free image, a multilevel access to the embedding key is provided by applying Birkhoff polynomial interpolation. Extensive simulations demonstrate the robustness of the proposed method against standard image degradation operators.

  • Robust Audio-Visual Speech Recognition Based on Late Integration

    Publication Year: 2008 , Page(s): 767 - 779
    Cited by:  Papers (7)

    Audio-visual speech recognition (AVSR) using acoustic and visual signals of speech has received attention because of its robustness in noisy environments. In this paper, we present a late integration scheme-based AVSR system whose robustness under various noise conditions is improved by enhancing the performance of the three parts composing the system. First, we improve the performance of the visual subsystem by using the stochastic optimization method for the hidden Markov models as the speech recognizer. Second, we propose a new method of considering dynamic characteristics of speech for improved robustness of the acoustic subsystem. Third, the acoustic and the visual subsystems are effectively integrated to produce final robust recognition results by using neural networks. We demonstrate the performance of the proposed methods via speaker-independent isolated word recognition experiments. The results show that the proposed system improves robustness over the conventional system under various noise conditions without a priori knowledge about the noise contained in the speech.

  • A Cyclic Interface for the Presentation of Multiple Music Files

    Publication Year: 2008 , Page(s): 780 - 793
    Cited by:  Papers (1)

    This paper proposes a novel cyclic interface for browsing through a song database. The method, which sums multiple audio streams on a server and broadcasts only a single summed stream, allows the user to hear different parts of each audio stream by cycling through all available streams. Songs are summed into a single stream based on a combination of spectral entropy and local power of each song's waveform. Perceptual parameters of the system are determined based on experiments conducted on 20 users, for three, four, and five songs. Results illustrate that the proposed methodology requires less listening time than traditional list-based interfaces when the desired audio clip is among the audio streams. Applications of this methodology include any search system which returns multiple audio search results, including music query by example. The proposed methodology can be used for real-time searching with an ordinary internet browser.

  • Graph-Based Multiplayer Detection and Tracking in Broadcast Soccer Videos

    Publication Year: 2008 , Page(s): 794 - 805
    Cited by:  Papers (16)

    In this paper, we propose a graph-based approach for detecting and tracking multiple players in broadcast soccer videos. In the first stage, the position of the players in each frame is determined by removing the non-player regions. The remaining pixels are then grouped using a region growing algorithm to identify probable player candidates. A directed weighted graph is constructed, where probable player candidates correspond to the nodes of the graph while each edge links candidates in a frame with the candidates in the next two consecutive frames. Finally, dynamic programming is applied to find the trajectory of each player. Experiments with several sequences from broadcast videos of international soccer matches indicate that the proposed approach is able to track the players reasonably well even under varied illumination and ground conditions.

  • Semantic Coding by Supervised Dimensionality Reduction

    Publication Year: 2008 , Page(s): 806 - 818
    Cited by:  Papers (14)

    This paper addresses the problem of representing multimedia information in a compressed form that permits efficient classification. The semantic coding problem starts from a subspace method where dimensionality reduction is formulated as a matrix factorization problem. Data samples are jointly represented in a common subspace extracted from a redundant dictionary of basis functions. We first build on greedy pursuit algorithms for simultaneous sparse approximations to solve the dimensionality reduction problem. The method is extended into a supervised algorithm, which further encourages the class separability in the extraction of the most relevant features. The resulting supervised dimensionality reduction scheme provides an interesting tradeoff between approximation (or compression) and discriminant feature extraction (or classification). The algorithm provides a compressed signal representation that can directly be used for multimedia data mining. The application of the proposed algorithm to image recognition problems further demonstrates classification performance that is competitive with state-of-the-art solutions in handwritten digit or face recognition. Semantic coding certainly represents an interesting solution to the challenging problem of processing huge volumes of multidimensional data in modern multimedia systems, where compressed data have to be processed and analyzed with limited computational complexity.

  • Combining Topological and Geometrical Features for Global and Partial 3-D Shape Retrieval

    Publication Year: 2008 , Page(s): 819 - 831
    Cited by:  Papers (12)

    This paper presents a novel framework for 3-D object content-based search and retrieval, appropriate for both partial and global matching applications. The framework is based on a graph representation of a 3-D object which is enhanced by local geometric features. The 3-D object is decomposed into meaningful parts and an attributed graph is constructed based on the connectivity of the parts. Every 3-D part is approximated with a suitable superellipsoid and a novel 3-D shape descriptor, called a 3-D distance field descriptor, is computed and associated to the corresponding graph nodes. The matching process is based on an attributed graph matching algorithm appropriate for this application. The proposed method not only provides successful retrieval results in terms of geometric similarity but also is invariant to rotation, translation and scaling of an object as well as to the different poses of articulated objects. Finally, it can be effectively used for partial and global 3-D object retrieval.

  • Color-Based Image Salient Region Segmentation Using Novel Region Merging Strategy

    Publication Year: 2008 , Page(s): 832 - 845
    Cited by:  Papers (13)

    In this paper, we propose a novel unsupervised algorithm for the segmentation of salient regions in color images. There are three phases in this algorithm. In the first phase, we use nonparametric density estimation to extract candidates of dominant colors in an image, which are then used for the quantization of the image. The label map of the quantized image forms initial regions of segmentation. In the second phase, we define a salient region by two properties: conspicuous, and compact and complete. According to the definition, two new parameters are proposed. One is called "Importance index", which is used to measure the importance of a region, and the other is called "Merging likelihood", which is utilized to measure the suitability of region merging. Initial regions are merged based on the two new parameters. In the third phase, a similarity check is performed to further merge the surviving regions. Experimental results show that the proposed method achieves excellent segmentation performance for most of our test images. In addition, the computation is very efficient.

  • A Speech/Music Discriminator of Radio Recordings Based on Dynamic Programming and Bayesian Networks

    Publication Year: 2008 , Page(s): 846 - 857
    Cited by:  Papers (13)

    This paper presents a multistage system for speech/music discrimination which is based on a three-step procedure. The first step is a computationally efficient scheme consisting of a region growing technique and operates on a 1-D feature sequence, which is extracted from the raw audio stream. This scheme is used as a preprocessing stage and yields segments with high music and speech precision at the expense of leaving certain parts of the audio recording unclassified. The unclassified parts of the audio stream are then fed as input to a more computationally demanding scheme. The latter treats speech/music discrimination of radio recordings as a probabilistic segmentation task, where the solution is obtained by means of dynamic programming. The proposed scheme seeks the sequence of segments and respective class labels (i.e., speech/music) that maximize the product of posterior class probabilities, given the data that form the segments. To this end, a Bayesian Network combiner is embedded as a posterior probability estimator. At a final stage, an algorithm that performs boundary correction is applied to remove possible errors at the boundaries of the segments (speech or music) that have been previously generated. The proposed system has been tested on radio recordings from various sources. The overall system accuracy is approximately 96%. Performance results are also reported on a musical genre basis and a comparison with existing methods is given.

  • Distributed Collaboration for Enhanced Sender-Driven Video Streaming

    Publication Year: 2008 , Page(s): 858 - 870
    Cited by:  Papers (11)

    We propose a sender-driven system for adaptive streaming from multiple servers to a single receiver over separate network paths. The servers employ information in receiver feedback to estimate the available bandwidth on the paths and then compute appropriate transmission schedules for streaming media packets to the receiver based on the bandwidth estimates. An optimization framework is proposed that enables the senders to compute their transmission schedules in a distributed way, and yet to dynamically coordinate them over time such that the resulting video quality at the receiver is maximized. To reduce the computational complexity of the optimization framework, an alternative technique based on packet classification is proposed. The substantial reduction in online complexity due to the resulting packet partitioning makes the technique suitable for practical implementations of adaptive and efficient distributed streaming systems. Simulations with Internet network traces demonstrate that the proposed solution adapts effectively to bandwidth variations and packet loss. They show that the proposed streaming framework provides superior performance over a conventional distortion-agnostic scheme that performs proportional packet scheduling on the network paths according to their respective bandwidth values.
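
    The conventional baseline named at the end of the abstract, proportional packet scheduling by path bandwidth, can be sketched as below. This illustrates only that baseline under simple assumptions, not the proposed optimization framework: packets are interchangeable, bandwidth estimates are given as positive numbers, and rounding leftovers go to the paths with the largest fractional share. The function name `proportional_split` is hypothetical.

```python
def proportional_split(num_packets, bandwidths):
    """Split `num_packets` across paths in proportion to their bandwidths.

    Fractional shares are rounded down, then the remaining packets are
    handed to the paths with the largest fractional remainders.
    """
    total = sum(bandwidths)
    shares = [num_packets * b / total for b in bandwidths]
    counts = [int(share) for share in shares]
    leftover = num_packets - sum(counts)
    by_remainder = sorted(
        range(len(bandwidths)),
        key=lambda i: shares[i] - counts[i],
        reverse=True,
    )
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


if __name__ == "__main__":
    print(proportional_split(10, [300, 100, 100]))
    print(proportional_split(10, [1, 1, 1]))
```

    Such a split ignores which packets matter most for video quality, which is why the abstract calls the baseline distortion-agnostic.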

  • Optimized Periodic Broadcast of Nonlinear Media

    Publication Year: 2008 , Page(s): 871 - 884
    Cited by:  Papers (1)

    Conventional video consists of a single sequence of video frames. During a client's playback period, frames are viewed sequentially from some specified starting point. The fixed frame ordering of conventional video enables efficient scheduled broadcast delivery, as well as efficient near on-demand delivery to large numbers of concurrent clients through use of periodic broadcast protocols in which the video file is segmented and transmitted on multiple channels. This paper considers the problem of devising scalable protocols for near on-demand delivery of "nonlinear" media files whose content may have a tree or graph, rather than linear, structure. Such media allows personalization of the media playback according to individual client preferences. We formulate a mathematical model for determination of the optimal periodic broadcast protocol for nonlinear media with piecewise-linear structures. Our objective function allows differing weights to be placed on the startup delays required for differing paths through the media. By studying a number of simple nonlinear structures, we provide insight into the characteristics of the optimal solution. For cases in which the cost of solving the optimization model is prohibitive, we propose and evaluate an efficient approximation algorithm.

  • Content-Aware Playout and Packet Scheduling for Video Streaming Over Wireless Links

    Publication Year: 2008 , Page(s): 885 - 895
    Cited by:  Papers (34)

    Media streaming over wireless links is a challenging problem due to both the unreliable, time-varying nature of the wireless channel and the stringent delivery requirements of media traffic. In this paper, we use joint control of packet scheduling at the transmitter and content-aware playout at the receiver, so as to maximize the quality of media streaming over a wireless link. Our contributions are twofold. First, we formulate and study the problem of joint scheduling and playout control in the framework of Markov decision processes. Second, we propose a novel content-aware adaptive playout control that takes into account the content of a video sequence, and in particular the motion characteristics of different scenes. We find that the joint scheduling and playout control can significantly improve the quality of the received video, at the expense of only a small amount of playout slowdown. Furthermore, the content-aware adaptive playout places the slowdown preferentially in the low-motion scenes, where its perceived effect is lower.

  • Queuing-Based Dynamic Channel Selection for Heterogeneous Multimedia Applications Over Cognitive Radio Networks

    Publication Year: 2008 , Page(s): 896 - 909
    Cited by:  Papers (51)

    In this paper, we propose a dynamic channel-selection solution for autonomous wireless users transmitting delay-sensitive multimedia applications over cognitive radio networks. Unlike prior works that seldom consider the requirement of the application layer, our solution explicitly considers various rate requirements and delay deadlines of heterogeneous multimedia users. Note that the users usually possess private utility functions, application requirements, and distinct channel conditions in different frequency channels. To efficiently manage available spectrum resources in a decentralized manner, information exchange among users is necessary. Hence, we propose a novel priority virtual queue interface that determines the required information exchanges and evaluates the expected delays experienced by traffic of various priorities. Such expected delays are important for multimedia users because of their delay-sensitive nature. Based on the exchanged information, the interface evaluates the expected delays using priority queuing analysis that considers the wireless environment, traffic characteristics, and the competing users' behaviors in the same frequency channel. We propose a dynamic strategy learning (DSL) algorithm deployed at each user that exploits the expected delay and dynamically adapts the channel selection strategies to maximize the user's utility function. We simulate multiple video users sharing the cognitive radio network and show that our proposed solution significantly reduces the packet loss rate and outperforms the conventional single-channel dynamic resource allocation by almost 2 dB in terms of video quality.

  • Dynamic Resource Allocation for Robust Distributed Multi-Point Video Conferencing

    Publication Year: 2008 , Page(s): 910 - 925
    Cited by:  Papers (2)

    This paper proposes a distributed multi-point video conferencing system over packet erasure channels, where the aggregation of multiple video streams and resource allocation are performed in a distributed manner. Video stream combiners, which are located in different geographical areas and serve as portals for conferees, aggregate incoming streams supplied by local users with other streams aggregated from nearby video stream combiners. A packet-division multiple-access (PDMA)-based error protection scheme is proposed to be performed at each video stream combiner to minimize the maximal expected video distortion among aggregated streams. The proposed error protection scheme for multi-stream aggregation also supports user preference. In order to deliver video streams to end users with different preferred quality, a consensus algorithm is proposed to adaptively perform resource allocation based on user preference. Simulation results show that the proposed multi-stream aggregation and error protection scheme has significant gains over traditional multi-stream error protection schemes for a multi-point video conferencing system.

  • Simulating a Smartboard by Real-Time Gesture Detection in Lecture Videos

    Publication Year: 2008 , Page(s): 926 - 935
    Cited by:  Papers (2)  |  Patents (1)

    Gesture plays an important role in recognizing lecture activities in video content analysis. In this paper, we propose a real-time gesture detection algorithm that integrates visual, speech, and electronic-slide cues. In contrast to conventional "complete gesture" recognition, we emphasize detection by prediction from "incomplete gesture". Specifically, intentional gestures are predicted by a modified hidden Markov model (HMM) that can recognize incomplete gestures before the whole gesture paths are observed. The multimodal correspondence between speech and gesture is exploited to increase the accuracy and responsiveness of gesture detection. In lecture presentation, this algorithm enables on-the-fly editing of lecture slides by simulating appropriate camera motion to highlight the intention and flow of lecturing. We develop a real-time application, namely a simulated smartboard, and demonstrate the feasibility of our prediction algorithm using hand gesture and laser pen with a simple setup that does not involve expensive hardware.

  • Recognizing Human Emotional State From Audiovisual Signals*

    Publication Year: 2008 , Page(s): 936 - 946
    Cited by:  Papers (25)

    Machine recognition of human emotional state is an important component of efficient human-computer interaction. The majority of existing works address this problem by utilizing audio signals alone or visual information only. In this paper, we explore a systematic approach for recognition of human emotional state from audiovisual signals. The audio characteristics of emotional speech are represented by the extracted prosodic, Mel-frequency Cepstral Coefficient (MFCC), and formant frequency features. A face detection scheme based on the HSV color model is used to detect the face from the background. The visual information is represented by Gabor wavelet features. We perform feature selection by using a stepwise method based on Mahalanobis distance. The selected audiovisual features are used to classify the data into their corresponding emotions. Based on a comparative study of different classification algorithms and specific characteristics of individual emotions, a novel multiclassifier scheme is proposed to boost the recognition performance. The feasibility of the proposed system is tested over a database that incorporates human subjects from different languages and cultural backgrounds. Experimental results demonstrate the effectiveness of the proposed system. The multiclassifier scheme achieves the best overall recognition rate of 82.14%.


Aims & Scope

The scope of the Periodical is the various aspects of research in multimedia technology and applications of multimedia.

Meet Our Editors

Editor-in-Chief
Chang Wen Chen
State University of New York at Buffalo