Multimedia, 2007. ISM 2007. Ninth IEEE International Symposium on

Date 10-12 Dec. 2007

Displaying Results 1 - 25 of 51
  • Ninth IEEE International Symposium on Multimedia - Cover

    Publication Year: 2007 , Page(s): c1
    Freely Available from IEEE
  • Ninth IEEE International Symposium on Multimedia - Title page

    Publication Year: 2007 , Page(s): i - iii
    Freely Available from IEEE
  • Ninth IEEE International Symposium on Multimedia - Copyright

    Publication Year: 2007 , Page(s): iv
    Freely Available from IEEE
  • Ninth IEEE International Symposium on Multimedia - TOC

    Publication Year: 2007 , Page(s): v - viii
    Freely Available from IEEE
  • General Co-chairs' Foreword

    Publication Year: 2007 , Page(s): ix - x
    Freely Available from IEEE
  • Message from the Program Chairs

    Publication Year: 2007
    Freely Available from IEEE
  • Conference Organizing Committee

    Publication Year: 2007 , Page(s): xii - xiv
    Freely Available from IEEE
  • Technical Program Committee

    Publication Year: 2007 , Page(s): xv - xviii
    Freely Available from IEEE
  • The Design of a Multi-party VoIP Conferencing System over the Internet

    Publication Year: 2007 , Page(s): 3 - 10
    Cited by:  Papers (6)

    In this paper, we present the design of a VoIP conferencing system that enables voice communication among multiple users over the Internet. After studying the conversational dynamics in multi-party conferencing, we identify user-observable metrics that affect the perception of conversational quality, along with their trade-offs. Based on these dynamics and on the delay, jitter, and loss behavior of Internet traces collected on PlanetLab, we design the transmission topology and the schemes for loss concealment and play-out scheduling. Finally, we compare the performance of our system with that of Skype (version 3.5.0.214) using repeatable experiments that simulate human participants and network conditions in a multi-party conferencing scenario.
  • The Role of QoE on IPTV Services

    Publication Year: 2007 , Page(s): 11 - 13
    Cited by:  Papers (3)

    IPTV (Internet Protocol TV) is one of the hottest topics among emerging services. This new media service has significant potential, since a wide variety of content can be enjoyed in a variety of ways. We are living in a content-centric world. The flood of data, made possible by the evolution of hardware since the 60-year-old transistor technology, has become a potential problem. The user experience of this new medium is thought to be a key factor in the success of an IPTV service. Since a very early stage in the ITU-T Focus Group on IPTV, QoE (Quality of Experience) has been considered a most important factor. This subjective concept should be measurable in the same manner as QoS. The metadata function for personalized service in IPTV will also be described.
  • A New Image Compression Scheme Based on Locally Adaptive Coding

    Publication Year: 2007 , Page(s): 14 - 21

    Vector quantization (VQ) is a simple compression technology that is widely used in many applications. For image compression, VQ both provides a fixed compression ratio and maintains acceptable distortion. However, the performance of VQ can still be improved in terms of the quality of the compressed images and the size of the codebook used for encoding and decoding. In this paper, a new VQ-like image compression method is proposed that improves on traditional VQ by using the concept of locally adaptive coding. The experimental results confirm that the image quality of the compressed image offered by the proposed method is higher than 30 dB on average, and that the number of codewords in our codebook is smaller than that required by traditional VQ.
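As a point of reference for the abstract above, conventional VQ encoding can be sketched as a nearest-codeword search. This is a minimal illustration of plain VQ only, not the locally adaptive scheme the paper proposes; the function names, the 4-pixel blocks, and the three-entry codebook are assumptions made for the example.

```python
def nearest_codeword(block, codebook):
    """Return the index of the codeword closest to `block`
    under squared Euclidean distance."""
    best_index, best_dist = 0, float("inf")
    for i, codeword in enumerate(codebook):
        dist = sum((b - c) ** 2 for b, c in zip(block, codeword))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def vq_encode(blocks, codebook):
    """Encode each image block as the index of its nearest codeword."""
    return [nearest_codeword(block, codebook) for block in blocks]


def vq_decode(indices, codebook):
    """Reconstruct blocks by looking each index up in the codebook."""
    return [list(codebook[i]) for i in indices]


# Hypothetical codebook of three 2x2 blocks, flattened to 4 values each.
codebook = [(0, 0, 0, 0), (255, 255, 255, 255), (128, 128, 128, 128)]
blocks = [(10, 5, 0, 3), (250, 255, 240, 251), (120, 130, 125, 131)]

indices = vq_encode(blocks, codebook)
reconstructed = vq_decode(indices, codebook)
```

Each block is transmitted as a single index, so the compression ratio is fixed by the block size and the codebook size, which is why a smaller codebook matters.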
  • Making Sense of Ubiquitous Media

    Publication Year: 2007 , Page(s): 22 - 26

    In the emerging Post-PC era, more and more computers 'in the net' can see, hear, or feel. Since these computers are networked, they can cooperate in the interpretation of their 'sensation'. Cameras, camcorders, etc. will soon be wirelessly connected, doubling as mobile phones. In other words: multimedia goes ubiquitous. On the other hand, users leverage the wealth of text-based information present in the global Internet. However, the potential that lies in 'cooperative sensation' and in the use of global textual information is far from fully exploited: it is the past, present, and future grand challenge to enable computers to 'make more sense' of all this information. The talk will provide a unified model for both multimedia sense-making and textual-information sense-making, and propose fostering the confluence of these two threads. Based on this unified view, it will suggest steps towards improved sense-making in the world of ubiquitous computers.
  • Detection of Questions in Arabic Audio Monologues Using Prosodic Features

    Publication Year: 2007 , Page(s): 29 - 36
    Cited by:  Papers (2)

    Prosody has been widely used in many speech-related applications including speaker and word recognition, emotion and accent identification, topic and sentence segmentation, and text-to-speech applications. An important application we investigate is that of identifying question sentences in Arabic monologue lectures. Languages other than Arabic have received a lot of attention in this regard. We approach this problem by first segmenting the sentences from the continuous speech using intensity and duration features. Prosodic features are then extracted from each sentence. These features are used as input to decision trees that classify each sentence as either a question or a non-question sentence. Our results suggest that questions are cued by more than one type of prosodic feature in natural Arabic speech. We used C4.5 decision trees for classification and achieved 75.7% accuracy. Feature-specific analysis further reveals that energy and fundamental frequency features are mainly responsible for discriminating between question and non-question sentences.
  • Multi-stream Asynchrony Modeling for Audio-Visual Speech Recognition

    Publication Year: 2007 , Page(s): 37 - 44
    Cited by:  Papers (1)

    In this paper, two multi-stream asynchrony Dynamic Bayesian Network models (the MS-ADBN model and the MM-ADBN model) are proposed for audio-visual speech recognition (AVSR). The proposed models, which have different topologies, relax the asynchrony of the audio and visual streams to the word level. In the MS-ADBN model, in both the audio stream and the visual stream, each word is composed of its corresponding phones, and each phone is associated with an observation vector. The MM-ADBN model augments the MS-ADBN model: a level of hidden nodes, the state level, is added between the phone level and the observation node level to describe the dynamic process of phones. Essentially, the MS-ADBN model is a word model, while the MM-ADBN model is a phone model. Speech recognition experiments are carried out on a digit audio-visual (A-V) database as well as on a continuous A-V database. The results demonstrate that describing the asynchrony between the audio and visual streams is important for an AVSR system, and that the MM-ADBN model performs best on the task of continuous A-V speech recognition.
  • An Adaptive Audio Quantizer for VoIP Systems

    Publication Year: 2007 , Page(s): 45 - 55

    The evolution of the Internet has required the development of new technology to support real-time multimedia transmission such as images, database access, audio, and video. Such development needs new services and support such as voice over IP (VoIP), whose main motivation is low-cost communication and management. VoIP systems motivated this work, which proposes an adaptive audio quantizer named IQ (intervalar quantizer) to reduce the data dimensionality and consequently the entropy, which allows better audio compression. The quantizer is adaptive because it has an error tolerance parameter that can be varied according to the available network bandwidth, allowing the communication to adapt. After transmission, the audio is improved by a filter with complex poles in the Z-plane. This filter attenuates unimportant frequencies and favors those to which human hearing is sensitive. Results confirm that IQ and the filter offer good quality (measured using the mean opinion score metric) and a good compression ratio.
  • Complexity Reduction and Fast Algorithm for 2-D Integer Discrete Wavelet Transform Using Symmetric Mask-Based Scheme

    Publication Year: 2007 , Page(s): 57 - 64

    Wavelet coding has been shown to be better than the discrete cosine transform (DCT) in image/video processing. Moreover, it offers scalability, which is used in modern video standards. This work presents novel algorithms, namely the 2-D symmetric mask-based discrete wavelet transform (SMDWT), to address the critical issues of the 2-D lifting-based discrete wavelet transform (LDWT), thereby obtaining the benefits of low latency, high-speed operation, and low temporal memory. The SMDWT also has the advantages of high-performance embedded periodic extension boundary treatment, reduced complexity, regular signal coding, a short critical path, reduced latency, and independent subband coding. Moreover, the performance of the 2-D lifting-based DWT can easily be improved by exploiting the parallelism inherent in SMDWT. Compared with the normal 2-D 5/3 integer lifting-based DWT, the proposed method significantly improves the latency and complexity of the 2-D DWT without degrading image quality. The algorithm can be applied to real-time image/video applications, such as JPEG2000, MPEG-4 still texture object decoding, and wavelet-based Scalable Video Coding (SVC).
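For context, the baseline that the abstract compares against, the 5/3 integer lifting-based DWT, can be sketched in one dimension as follows. This is a generic sketch of the standard reversible 5/3 lifting steps with mirrored boundaries, assuming an even-length integer signal. It is not the SMDWT proposed in the paper, and the function names are hypothetical.

```python
def lifting_53_forward(x):
    """One level of the 1-D reversible 5/3 lifting transform.
    Returns (s, d): low-pass and high-pass integer coefficients."""
    n = len(x)
    if n < 2 or n % 2 != 0:
        raise ValueError("this sketch assumes an even-length signal")
    half = n // 2
    even, odd = x[0::2], x[1::2]

    # Predict step: each odd sample minus the floor-average of its even neighbours.
    d = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]  # mirror at right edge
        d.append(odd[i] - (even[i] + right) // 2)

    # Update step: each even sample plus a rounded quarter of adjacent details.
    s = []
    for i in range(half):
        left = d[i - 1] if i > 0 else d[0]  # mirror at left edge
        s.append(even[i] + (left + d[i] + 2) // 4)
    return s, d


def lifting_53_inverse(s, d):
    """Undo lifting_53_forward exactly by reversing the two steps."""
    half = len(s)
    even = []
    for i in range(half):
        left = d[i - 1] if i > 0 else d[0]
        even.append(s[i] - (left + d[i] + 2) // 4)

    x = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]
        x.append(even[i])
        x.append(d[i] + (even[i] + right) // 2)
    return x


signal = [10, 12, 14, 200, 7, 7, 0, 255]
low, high = lifting_53_forward(signal)
restored = lifting_53_inverse(low, high)
```

Because every step uses only integer additions and floor divisions that the inverse subtracts back out, the round trip is lossless. Note that the update step has to wait for the predict step, which is the kind of sequential dependency a mask-based formulation would avoid.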
  • Local Binary Patterns for Human Detection on Hexagonal Structure

    Publication Year: 2007 , Page(s): 65 - 71
    Cited by:  Papers (1)

    The local binary pattern (LBP) was designed for, and has been widely used in, efficient texture classification. LBP provides a simple and effective way to represent texture patterns. Uniform LBPs play an important role in LBP-based pattern/object recognition, as they account for the majority of LBPs. On the other hand, human detection based on the Mahalanobis distance map (MDM) recognizes human appearance based on geometrical structure. Each MDM shows a clear texture pattern that can be classified using LBPs. In this paper, we compute LBPs of MDMs on a hexagonal structure. The circular pixel arrangement in the hexagonal structure results in higher accuracy for LBP representation than on a square structure. The chi-square measure is used for human detection based on the uniform LBPs obtained. We show that our method, using LBPs built on MDMs, has a higher human detection rate and a lower false positive rate than the method based merely on MDMs. We also show, using experimental results, that LBPs on the hexagonal structure lead to more robust human classification.
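The building blocks named in the abstract above (LBP codes, uniform patterns, and a chi-square comparison of histograms) can be sketched on an ordinary square pixel grid. This is an illustrative baseline only: the paper's hexagonal sampling and its use of Mahalanobis distance maps are not reproduced, and the function names and the 8-neighbour layout are assumptions.

```python
# Offsets of the 8 neighbours, clockwise starting from the top-left pixel.
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                     (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, row, col):
    """8-bit LBP code of an interior pixel: bit k is set when the
    k-th neighbour is at least as bright as the centre."""
    center = image[row][col]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBOUR_OFFSETS):
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code


def is_uniform(code, bits=8):
    """A pattern is uniform if its circular bit string has at most
    two 0/1 transitions."""
    transitions = 0
    for i in range(bits):
        current = (code >> i) & 1
        following = (code >> ((i + 1) % bits)) & 1
        transitions += current != following
    return transitions <= 2


def chi_square(hist_a, hist_b):
    """Chi-square distance between two histograms; empty bins are skipped."""
    return sum((a - b) ** 2 / (a + b)
               for a, b in zip(hist_a, hist_b) if a + b > 0)


# A bright top row over a dark background gives an edge-like uniform pattern.
patch = [[9, 9, 9],
         [1, 5, 1],
         [1, 1, 1]]
code = lbp_code(patch, 1, 1)
```

A detector along these lines would build a histogram of uniform codes per window and compare it with reference histograms using `chi_square`, with smaller distances indicating a closer match.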
  • Adaptive Early Termination for Fast H.264 Video Coding

    Publication Year: 2007 , Page(s): 72 - 77
    Cited by:  Papers (2)

    The H.264 standard applies several powerful coding methods to obtain high compression efficiency. However, it requires a lot of computation, especially in variable block-size motion estimation. To reduce the motion estimation redundancy more effectively, an adaptive early termination algorithm is proposed in this paper. The proposed algorithm dynamically changes the thresholds for different coding modes according to the video content. With the proposed method, many zero-motion blocks can be predicted, the corresponding motion estimation can stop early, and the remaining computation can be omitted. Simulation results show that, on average, the proposed method can reduce the entire coding time by up to 14.38% and the motion estimation time by up to 21.82% at the price of negligible coding loss.
  • Spatial-Temporal Error Detection Scheme for Video Transmission over Noisy Channels

    Publication Year: 2007 , Page(s): 78 - 85
    Cited by:  Papers (1)

    Error detection plays an important role in an error-robust video decoder. In this paper, a spatial-temporal error detection scheme for a video decoder is proposed. By considering the inherent spatial and temporal similarities in video sequences, the visually corrected macroblocks in the decoded frames are detected by employing a set of error detection procedures, where one cross-boundary similarity index and one cross-frame similarity index are defined for spatial and temporal error detection, respectively. An adaptive threshold scheme is also proposed to make the error detection method suitable for different video sequences. After integration with an H.264 decoder with error concealment techniques, a video quality improvement of 0.5-2.4 dB in PSNR is achieved. This method can also be integrated with other video codecs to improve the decoded video quality over noisy channels.
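The gain quoted in the abstract above is expressed in PSNR, which is computed from the mean squared error between original and decoded samples. The sketch below uses the standard definition, assuming 8-bit samples (peak value 255) passed as flat lists; the function name and sample data are made up for illustration and are not results from the paper.

```python
import math


def psnr(original, decoded, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length
    sequences of samples. Identical inputs give infinity."""
    if len(original) != len(decoded) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / mse)


# Hypothetical example: halving the mean squared error.
reference = [0, 0, 0, 0]
without_detection = [10, 10, 10, 10]   # MSE = 100
with_detection = [10, 10, 0, 0]        # MSE = 50
gain_db = psnr(reference, with_detection) - psnr(reference, without_detection)
```

Since PSNR is logarithmic, halving the error corresponds to a gain of about 3 dB, which gives a sense of scale for the reported 0.5-2.4 dB range.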
  • Moving Region Detection by Transportation Problem Solving

    Publication Year: 2007 , Page(s): 86 - 91
    Cited by:  Papers (2)

    In this paper, we propose a novel moving region detection method from the viewpoint of solving the transportation problem. This method extracts the relations between regions as a solution to the transportation problem between pixels belonging to adjacent frames. Moving regions are detected by utilizing the properties of these relations. This method does not require any models, such as prior knowledge or particular assumptions about moving objects or backgrounds in a video. Since the method adaptively detects moving regions from input frame data, it can deal with fluctuations of moving objects or backgrounds. We demonstrate the effectiveness of the proposed method through several experiments conducted using actual videos.
  • Summarization of Wearable Videos Based on User Activity Analysis

    Publication Year: 2007 , Page(s): 92 - 99

    This paper presents a model for automatic summarization of videos recorded by wearable cameras. The proposed model detects various user activities by computing the transform of matching image features among video frames. Four basic types of user activities are proposed: "moving closer/farther", "panning", "making a turn", and "rotation". Different summarization techniques are provided for different activity types, and a wearable video sequence can be summarized as a compact set of panoramic images. The user activity analysis is based solely on the analysis of images, without resorting to information from other sensors. Experimental results on a 19-minute video sequence demonstrate the effectiveness of our proposed model.
  • Feature-Based Full-Frame Image Stabilization

    Publication Year: 2007 , Page(s): 100 - 106
    Cited by:  Papers (3)

    Digital image stabilization usually discards boundary pixels and outputs a smaller video. In this paper, we present a new digital image stabilization algorithm that preserves the frame size of output video by pixel filling. The proposed algorithm eliminates the accumulation error by directly estimating the global motions in a transformation chain with reference to a fixed frame. A feature matching method is adopted to save the computational cost of the global motion estimation and to handle large motions. The experimental results show that the proposed algorithm produces stabilized full-frame video sequences with better frame alignment.
  • Event-Based Segmentation of Sports Video Using Motion Entropy

    Publication Year: 2007 , Page(s): 107 - 111
    Cited by:  Papers (2)

    An event-based segmentation method for sports videos is presented. A motion entropy criterion is employed to characterize the level of intensity of relevant object motion in individual frames of a video sequence. The resulting motion entropy curve is then approximated with a piece-wise linear model using a time series change point detection algorithm based on a homoscedastic error model. It is observed that interesting sports events are correlated with specific patterns of the piece-wise linear model. A set of classification rules is then derived empirically from these observations. By applying these rules to the motion entropy curve, one is able to segment the corresponding video sequence into individual sections, each consisting of a semantically relevant event. The proposed method is tested on six hours of sports videos including basketball, soccer, and tennis. Excellent experimental results are observed.
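The abstract above does not define its motion entropy criterion. One plausible reading, offered here purely as an assumption, is the Shannon entropy of the distribution of quantized motion vectors within a frame. The sketch below follows that reading; the function name and the sample vectors are hypothetical.

```python
import math
from collections import Counter


def motion_entropy(motion_vectors):
    """Shannon entropy (in bits) of the empirical distribution of
    quantized motion vectors in one frame. This definition is an
    assumption, not taken from the paper."""
    total = len(motion_vectors)
    if total == 0:
        return 0.0
    counts = Counter(motion_vectors)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# A static or uniformly panning frame: every block moves the same way.
calm_frame = [(1, 0)] * 8
# A busy frame: blocks move in four equally likely directions.
busy_frame = [(1, 0), (0, 1), (-1, 0), (0, -1)] * 2

calm = motion_entropy(calm_frame)
busy = motion_entropy(busy_frame)
```

Evaluating such a measure frame by frame yields a curve over time, which is the kind of signal that a piece-wise linear change point model could then segment.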
  • RISA: A Real-Time Interactive Shadow Avatar

    Publication Year: 2007 , Page(s): 112 - 122

    As webcams become an important factor in the PC environment, many camera-based communication techniques have been developed. Among them, gesture-based communication is attracting attention. In this paper, we propose a real-time interactive shadow avatar (RISA) that expresses facial emotions by changing in response to the user's gestures. The avatar's shape is a virtual shadow constructed from a real-time sampled picture of the user's shape. Several predefined facial animations are overlaid on the face area of the virtual shadow, according to the type of hand gesture. We use the background subtraction method to separate the virtual shadow, and a simplified region-based tracking method is adopted for tracking hand positions and detecting hand gestures. To achieve a smooth change of emotions, we use a refined morphing method that uses many more frames than traditional dynamic emoticons. Through our experiments, we found that accuracy was higher when there was enough distance between the camera and the user than when they were very close. We have found RISA to be very useful in simple online chatting and PC game environments, and it was also highlighted in a real media art exhibition.
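The background subtraction step mentioned in the abstract above can be illustrated with a simple per-pixel threshold against a stored background frame. This is a generic sketch rather than the RISA implementation. It assumes grayscale frames given as nested lists, and the function name and the threshold of 30 are arbitrary choices for the example.

```python
def foreground_mask(background, frame, threshold=30):
    """Mark a pixel as foreground (1) when it differs from the stored
    background by more than `threshold`, otherwise background (0)."""
    return [
        [1 if abs(pixel - bg_pixel) > threshold else 0
         for pixel, bg_pixel in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


# Hypothetical 2x2 grayscale frames: one pixel is covered by the user.
background = [[10, 10],
              [10, 10]]
frame = [[10, 200],
         [12, 10]]

mask = foreground_mask(background, frame)
```

The resulting binary mask is the silhouette from which a shadow-like avatar could be drawn. Small differences such as sensor noise (10 versus 12 here) stay below the threshold and are ignored.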
  • A General Scheme for Extracting QR Code from a Non-uniform Background in Camera Phones and Applications

    Publication Year: 2007 , Page(s): 123 - 130
    Cited by:  Papers (11)  |  Patents (1)

    With the rapid advances in mobile communication technologies, QR codes on devices with embedded cameras have come into use as a new input interface. However, previous work on extracting a QR code from an image does not consider a non-uniform background. In this paper, we implement applications of QR codes and propose an efficient algorithm to extract a QR code from a non-uniform background. In contrast with prior work, our approach achieves higher accuracy for QR-code recognition and is more practical for use in a mobile information environment.