Multimedia and Expo, 2003. ICME '03. Proceedings. 2003 International Conference on

Date 6-9 July 2003

Displaying Results 1 - 25 of 161
  • Improved text overlay detection in videos using a fusion-based classifier

    Page(s): III - 473-6 vol.3

    In this paper, classifier fusion is adopted to demonstrate improved performance for our text overlay detection in the NIST TREC-2002 video retrieval benchmark. A normalized ensemble fusion is explored to combine two text overlay detection models. The fusion incorporates normalization of confidence scores, aggregation via a combiner function, and an optimized selection. The proposed fusion classifier performed best among the 11 detectors submitted to the NIST text overlay detection benchmark, and its average precision is 227% of that of the second-best detector in the benchmark.
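
The first two fusion steps named in this abstract (score normalization and aggregation via a combiner function) can be illustrated with a minimal sketch. The abstract does not say which normalization or combiner the authors used, so min-max scaling and the arithmetic mean are assumptions here, as are the function names and the sample scores. The optimized selection step is not modeled.

```python
from statistics import mean


def min_max_normalize(scores):
    """Rescale raw confidence scores to the range [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores are equal; return a neutral constant to avoid dividing by zero.
        return [0.5 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse(score_lists, combiner=mean):
    """Normalize each detector's scores, then combine them shot by shot."""
    normalized = [min_max_normalize(scores) for scores in score_lists]
    return [combiner(per_shot) for per_shot in zip(*normalized)]


# Hypothetical confidence scores from two detectors over the same four shots.
detector_a = [0.2, 0.9, 0.4, 0.7]
detector_b = [10.0, 30.0, 50.0, 20.0]

fused = fuse([detector_a, detector_b])
# Shot indices ordered from most to least confident after fusion.
ranking = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
```

Passing a different `combiner` (for example `max` or `min`) changes the aggregation rule without touching the normalization step, which is why the two are kept separate in this sketch.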
  • Enhanced access to digital video through visually rich interfaces

    Page(s): III - 21-4 vol.3

    An image-rich interface is presented, which emphasizes visual exploration of sets of images representing shots returned from a query or filter against a digital video corpus. This interface, a storyboard of keyframes for multiple video segments, maintains temporal layout, accommodates contextual cues and filtering, supports additional filtering through visual features, and provides a means of drilling down to synchronized points in the associated video. These features allow for effective information retrieval from a video collection, as evidenced by the success achieved in interactive query for the TREC 2002 Video Retrieval Track (TREC-V). This paper introduces TREC-V, discusses the design of the multi-segment storyboard interface, illustrates its use with respect to the TREC-V topics, and presents results and conclusions based on the TREC-V evaluation.
  • Practical real-time video codec for mobile devices

    Page(s): III - 509-12 vol.3

    Real-time software-based video codecs are widely used on PCs with relatively strong computing capability. However, mobile devices, such as pocket PCs and handheld PCs, still suffer from weak computational power, short battery lifetime and limited display capability. We developed a practical low-complexity real-time video codec for mobile devices. Several methods that can significantly reduce the computational cost are adopted in this codec and described in this paper, including a predictive algorithm for motion estimation, the integer discrete cosine transform (IntDCT), and a DCT/quantizer bypass technique. A real-time video communication implementation of the proposed codec is also introduced. Experiments show that substantial computation reduction is achieved while the loss in video quality is negligible. The proposed codec is well suited to scenarios where low-complexity computing is required.
  • Scrambling of engineering drawings

    Page(s): III - 85-8 vol.3

    Engineering drawings are ubiquitously used for capturing, conveying and archiving innovative engineering designs. Many engineering companies' core intellectual property resides in their proprietary engineering drawings. Therefore, protection of such vital data is extremely important. This paper provides a swap-transformation matrix based approach to scramble engineering drawings in order to enable confidentiality. An engineering drawing involves topological information and vertex information. We argue that the vertex information is more valuable than the topological information, since it primarily determines the content of the drawing: even if some topological information is lost, a drawing may be reconstructed from the vertex positions. We provide three keys to ensure the security of the drawing. The technique can facilitate digital rights management of engineering drawings. The advantages of our technique are that scrambling is computationally less intensive than encryption and that it allows for partial obfuscation.
  • Semantic video summarization in compressed domain MPEG video

    Page(s): III - 329-32 vol.3

    In this paper, we present a semantic summarization algorithm that interfaces with the metadata and that works in the compressed domain, in particular on MPEG-1 and MPEG-2 videos. In enabling a summarization algorithm through high-level semantic content, we try to address two major problems. First, we present the facility provided in the DVA system that allows the semi-automatic creation of this metadata. Second, we address the main point of this system, which is the utilization of this metadata to filter out frames, creating an abstract of a video. A summary quality survey indicates that the proposed method performs satisfactorily.
  • Extending MPEG-7 description scheme of moving regions by the semantic visual-spatio-temporal relationships

    Page(s): III - 269-72 vol.3

    The recent proliferation of multimedia content has led to the need for more effective and efficient content representation techniques to speed up retrieval. MPEG-7 is a new standard that aims at describing low-level (syntactic) and high-level (semantic) multimedia content. In MPEG-7, the relationships between moving regions (i.e. objects) in videos are represented by directional, spatial and temporal relationships. In this paper, we propose to extend the description scheme (DS) of moving objects to include rich sets of visual-spatio-temporal (VST) relationships that support semantic descriptions of the relationships between objects. We also propose XML-based DSs of the VST relationships and then present the bit format representations for the VST relationships. The VST relationships are more intuitive to users, thus simplifying the formulation of user queries.
  • Analysis-by-synthesis distortion computation for rate-distortion optimized multimedia streaming

    Page(s): III - 345-8 vol.3

    This paper presents an analysis-by-synthesis technique to evaluate the perceptual importance of multimedia packets for rate-distortion optimized streaming. The proposed technique, instead of relying on a priori information, computes the distortion that would be caused by the loss of each single packet, including the effects of error propagation and receiver-side error concealment. A rate-distortion optimized streaming algorithm is presented to compare the perceptual performance obtained using content-adaptive analysis-by-synthesis distortion values versus distortion values obtained using a priori knowledge of the statistical importance of the elements of the compressed multimedia bitstream. Simulations with video test sequences compressed with the MPEG-2 coding standard show that the proposed technique delivers substantial and consistent PSNR gains (1.2-2.8 dB) with respect to ideal frame type-driven a priori distortion evaluation for a wide range of channel conditions. Compared to distortion-agnostic streaming techniques such as SoftARQ, the gain is even more pronounced.
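
The idea of measuring each packet's importance by re-decoding the stream without it can be sketched with a toy model. Everything below is an assumption made for illustration: the decoder is a simple frame-delta accumulator rather than MPEG-2, concealment is plain frame repetition, and distortion is the summed mean squared error over all frames. The abstract does not specify the authors' decoder, concealment method, or distortion measure at this level of detail.

```python
def decode(packets, lost=None):
    """Toy decoder: each packet is a per-pixel delta added to the previous frame.

    A lost packet is concealed by repeating the previous frame (zero delta),
    so its error propagates to every later frame.
    """
    frame = [0.0] * len(packets[0])
    frames = []
    for index, delta in enumerate(packets):
        if index != lost:
            frame = [p + d for p, d in zip(frame, delta)]
        frames.append(frame)
    return frames


def mse(a, b):
    """Mean squared error between two frames of equal size."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def packet_distortions(packets):
    """Total MSE caused by losing each packet alone, measured by re-decoding."""
    reference = decode(packets)
    result = []
    for lost in range(len(packets)):
        degraded = decode(packets, lost=lost)
        result.append(sum(mse(r, d) for r, d in zip(reference, degraded)))
    return result


# Hypothetical stream: three packets, two pixels per frame.
packets = [[4.0, 4.0], [1.0, 0.0], [0.0, 2.0]]
distortions = packet_distortions(packets)
```

In this toy stream the first packet has by far the largest distortion value because its loss corrupts every subsequent frame, which is the kind of content-dependent ranking a streaming scheduler could use in place of a fixed per-frame-type priority.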
  • Registering electrophoresis images for bioinformatics study of protein

    Page(s): III - 465-8 vol.3

    Matching 2-D gel electrophoresis images is the first and a major step in the bioinformatics study of proteins. This paper presents an improved implementation of thin-plate spline image registration. The effectiveness of the algorithm is demonstrated on matching 2-D gel protein separation profiles.
  • Face tracking in video with hybrid of Lucas-Kanade and condensation algorithm

    Page(s): III - 293-6 vol.3

    In this paper, we present a robust face tracking system for video indexing and retrieval. Our face tracker is based on the condensation algorithm. The strength of our face tracker lies in the incorporation of the Lucas-Kanade feature tracker in the measurement stage of condensation. Skin color and facial feature points are used for tracking. The pros and cons of using color and facial feature points complement each other and ensure the effectiveness of our system. We also adopt a bi-directional tracking approach to enhance robustness. We demonstrate the efficacy of our technique in the challenging task of tracking faces in various video sources.
  • Modeling of the non-deterministic synchronization behaviors in SMIL2.0 documents

    Page(s): III - 265-8 vol.3

    A novel model, the extended real-time synchronization model (E-RTSM), is proposed in this paper for modeling SMIL2.0 synchronization behaviors. E-RTSM deals with schedule-based synchronization as well as event-based synchronization in SMIL2.0. The conversion of the temporal relationships of a SMIL2.0 document to E-RTSM is presented. Moreover, the design of an E-RTSM-based data-retrieving engine for SMIL2.0 presentations is also proposed. The data-retrieving engine estimates the worst-case playback time of each object at the parsing stage and applies an error compensation mechanism at run-time to adjust the estimated playback time as well as the schedule of the fetching requests for data retrieval.
  • Audio events detection based highlights extraction from baseball, golf and soccer games in a unified framework

    Page(s): III - 401-4 vol.3

    We developed a unified framework to extract highlights from three sports (baseball, golf and soccer) by detecting some of the common audio events that are directly indicative of highlights. We used MPEG-7 audio features and entropic prior hidden Markov models (HMM) as the audio features and classifier, respectively, to recognize these common audio events. Together with pre- and post-processing techniques using general sports knowledge, we have been able to generate promising results even when the audio track is dominated by audio mixtures and noisy background.
  • Robust buyer authentication scheme for multimedia object

    Page(s): III - 97-100 vol.3

    In this paper, we propose a buyer authentication scheme that ensures copyright protection of multimedia objects. We show that the authentication mark is robust and recoverable in spite of a number of intentional attacks intended to destroy it. The authentication mark is introduced in the wavelet space. The wavelet coefficients of the multimedia signal are randomly grouped into a number of disjoint subsets. However, the manipulation of the wavelet coefficients is such that the signal quality is always within the just noticeable distortion (jnd) limit. The buyer-specific key survives even when we assume that a potential forger knows the proposed authentication scheme and can use this knowledge in an attack.
  • Object tracking using adaptive block matching

    Page(s): III - 65-8 vol.3

    We propose an object-tracking algorithm that predicts the object contour using motion vector information. Tracking is achieved by predicting the object boundary using motion vectors, followed by a contour update using occlusion/disocclusion detection. An adaptive block-based approach is used for estimating motion between frames. An efficient modulation scheme is used to control the gap between frames used for object tracking. The algorithm for detecting occlusions proceeds in two steps. First, covered regions are estimated from the displaced frame difference. These covered regions are then classified into actual occlusions and false alarms using motion characteristics. Disocclusion detection is performed in a similar manner. The immediate applications of the proposed tracking algorithm are video compression using MPEG-4 and content retrieval based on standards like H.26L. Preliminary simulation results demonstrate the performance of the proposed algorithm.
  • Performance of MPEG-7 low level audio descriptors with compressed data

    Page(s): III - 273-6 vol.3

    This paper presents a detailed analysis of lossy compression effects on a set of the MPEG-7 low-level audio descriptors. The analysis results show that lossy compression has a detrimental effect on the integrity of practical search and retrieval schemes that utilize the low level audio descriptors. Methods are then proposed to reduce the detrimental effects of compression in searching schemes. These proposed methods include multi-frame searching and machine learning derived prediction. The proposed mechanisms greatly reduce the effect of compression on the set of MPEG-7 descriptors; however, future scope is identified to develop new audio descriptors that account for compression effects in their structure.
  • An improved parallel architecture for MPEG-4 motion estimation in 3G mobile applications

    Page(s): III - 441-4 vol.3

    A highly parallel VLSI core architecture for MPEG-4 motion estimation is proposed in this paper. It has low memory bandwidth and low clock rate requirements, and is thus aimed primarily at 3G mobile applications. Based on a one-dimensional tree architecture, it employs the dual-register/buffer technique to reduce the preload and alignment cycles. As an example, the full-search block matching algorithm has been mapped onto this architecture using a 16-PE array that can calculate the motion vectors of QCIF video sequences in real time at a 1 MHz clock rate and using 15.5 Mbytes/s of memory bandwidth.
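
For readers unfamiliar with the full-search block matching algorithm that this abstract maps onto hardware, the following is a minimal software sketch. It assumes the sum of absolute differences (SAD) as the matching cost, which is a common choice but is not stated in the abstract. The function names, frame contents and block parameters are hypothetical, and the sketch says nothing about the proposed VLSI architecture itself.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between an n x n block and its displaced match."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def full_search(cur, ref, bx, by, n, search_range):
    """Test every displacement in the window and return (best_vector, best_sad)."""
    height, width = len(ref), len(ref[0])
    best_vector, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates that fall outside the reference frame.
            if not (0 <= by + dy and by + dy + n <= height):
                continue
            if not (0 <= bx + dx and bx + dx + n <= width):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best_vector, best_cost = (dx, dy), cost
    return best_vector, best_cost


# Hypothetical 8 x 8 reference frame in which every pixel value is distinct.
size = 8
ref = [[10 * y + x for x in range(size)] for y in range(size)]
# Current frame: the reference content moved one pixel right and one pixel up.
cur = [
    [ref[y + 1][x - 1] if y + 1 < size and x - 1 >= 0 else 0 for x in range(size)]
    for y in range(size)
]

vector, cost = full_search(cur, ref, bx=2, by=2, n=2, search_range=2)
```

The two nested displacement loops evaluate (2 × search_range + 1)² candidates per block, each costing n² absolute differences. That regular, data-independent workload is what makes full search a natural fit for an array of processing elements.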
  • Merging results of distributed image libraries

    Page(s): III - 33-6 vol.3

    Exploitation of information repositories available on the Internet requires users to separately query each repository and manually gather retrieved results. This process could be simplified by using a centralized server that acts as a gateway between the user and the repositories: the centralized server forwards the user query to federated repositories and fuses retrieved documents for presentation to the user. To perform these tasks efficiently, the centralized server should perform two main functions: resource selection and data fusion. The former is required to forward the user query only to the repositories that are likely to contain relevant documents. The latter is used to gather all retrieved documents and conveniently arrange them for presentation to the user. In the case of image repositories, data fusion is particularly challenging owing to the difficulty of normalizing document scores returned by different repositories. In this paper a novel solution is presented for fusion of results returned by different image repositories. Experimental results are presented that show the potential of the proposed approach.
  • Multimodal summarization of meeting recordings

    Page(s): III - 25-8 vol.3

    Recorded meetings are useful only if people can find, access, and browse them easily. Key-frames and video skims are useful representations that can enable quick previewing of the content without actually watching a meeting recording from beginning to end. This paper proposes a new method for creating meeting video skims based on audio and visual activity analysis together with text analysis. Audio activity analysis is performed by analyzing sound directions (which indicate different speakers) and audio amplitude. Detection of important visual events in a meeting is achieved by analyzing the localized luminance variations, taking into account the omni-directional property of the video captured by our meeting recording system. Text analysis is based on the term frequency-inverse document frequency measure. The resulting video skims capture the important meeting content better than skims obtained by uniform sampling.
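
The term frequency-inverse document frequency (TF-IDF) measure mentioned in this abstract can be sketched as follows. TF-IDF has several variants and the abstract does not say which one the authors used, so the formula here (relative term frequency multiplied by the natural log of the inverse document frequency) is an assumption, as are the function name and the sample transcript segments.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(documents)
    # Document frequency: the number of documents that contain each term.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical transcript segments from a meeting, already tokenized.
segments = [
    ["budget", "review", "budget"],
    ["schedule", "review"],
    ["schedule", "demo", "review"],
]
weights = tf_idf(segments)
```

A term that appears in every segment, such as "review" here, receives a weight of zero, while a term concentrated in one segment, such as "budget", scores highest there. Segments with high-weight terms are therefore natural candidates for inclusion in a skim.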
  • Multidimensional humming transcription using a statistical approach for query by humming systems

    Page(s): III - 385-8 vol.3

    A new statistical pattern recognition approach applied to human humming transcription is proposed in this research. A music note has two important attributes, i.e. pitch and duration. The proposed algorithm generates multidimensional humming transcriptions, which contain both pitch and duration information. Query by humming provides a natural means for content-based retrieval from music databases, and this research provides a robust front-end for such applications. The segment of a note in the humming waveform is modeled by a hidden Markov model (HMM), while the pitch of the note is modeled by a pitch model using a Gaussian mixture model. Preliminary real-time recognition experiments are carried out with models trained on data obtained from eight human subjects, and an overall correct recognition rate of around 80% is demonstrated.
  • On designing end-user multicast for multiple video sources

    Page(s): III - 497-500 vol.3

    In this paper, we present a new application-level multicast protocol called Emma (end-user multicast for multi-party applications), suitable for communication systems where multiple video sources are exchanged in real time among end-hosts, such as video-conferencing. The primary features of Emma are that video sources given higher priority by users are favored over others in order to provide quality of service at the user level, and that all the operations in Emma are done in a distributed manner. Our experimental results show that Emma achieves reasonable performance on overlay networks with high user satisfaction.
  • Topic-based inter-video structuring of a large-scale news video corpus

    Page(s): III - 305-8 vol.3

    We propose a topic-based inter-video structuring method for a news video corpus and a visual interface to efficiently browse through the structured corpus. Such inter-video structuring has not been explored in depth in previous work. The topic-based structure is analyzed by closed-caption text analysis, namely topic segmentation and tracking. The visual interface provides the ability to 1) search and select a topic by query terms and 2) track a topic thread interactively by referring to the text analysis results. Although topic retrieval is somewhat similar to conventional video retrieval methods, the combination with topic tracking makes it remarkably easy to narrow down the results that match a user's interest and, moreover, to reveal underlying content-based structures, where the structure itself contains rich information.
  • Unsupervised discovery of multilevel statistical video structures using hierarchical hidden Markov models

    Page(s): III - 29-32 vol.3

    Structure elements in a time sequence (e.g. video) are repetitive segments with consistent deterministic or stochastic characteristics. While most existing work in detecting structures follows a supervised paradigm, we propose a fully unsupervised statistical solution in this paper. We present a unified approach to structure discovery from long video sequences as simultaneously finding the statistical descriptions of structure and locating segments that match the descriptions. We model the multilevel statistical structure as hierarchical hidden Markov models, and present efficient algorithms for learning both the parameters and the model structure. When tested on a specific domain, soccer video, the unsupervised learning scheme achieves very promising results: it automatically discovers the statistical descriptions of high-level structures, and at the same time achieves even slightly better accuracy in detecting discovered structures in unlabelled videos than a supervised approach designed with domain knowledge and trained with comparable hidden Markov models.
  • An evaluation of adaptive beamformer based on average speech spectrum for noisy speech recognition

    Page(s): III - 209-12 vol.3

    Distant-talking speech recognition in noisy environments is indispensable for self-moving robots or tele-conference systems. However, background noise and room reverberations seriously degrade the sound-capture quality in real acoustic environments. A microphone array is an ideal candidate as an effective method for capturing distant-talking speech. AMNOR (adaptive microphone-array for noise reduction) was proposed by Kaneda et al. as an adaptive beamformer for capturing the desired distant signals in noisy environments. Although AMNOR has been proven effective, it can be further improved if we know the spectrum characteristics of the desired distant signals in advance. Therefore, we regarded speech as the desired distant signal and designed an AMNOR based on the average speech spectrum. In this paper, we focus in particular on the performance of AMNOR based on the average speech spectrum for distant-talking speech capture and recognition. Evaluation experiments in real acoustic environments confirmed that ASR (automatic speech recognition) performance was improved by 5-10% by using AMNOR based on the average speech spectrum in noisy environments. In addition, the proposed AMNOR provides better noise reduction performance than conventional AMNOR.
  • On testing methods for biometric authentication

    Page(s): III - 241-4 vol.3

    The use of biometric data for user authentication and/or recognition is now a reality. On the other hand, there is still a strong need for new technologies to overcome the intrinsic limitations of already "established" techniques. This requires not only devising new algorithms but also determining the real potential and limitations of existing techniques. This is possible only by devising standard testing and assessment procedures based on statistical observations of the outputs of the system. In order to define a better standard evaluation process, a system based on space-variant iconic image matching is described and the validation procedure defined. It turns out that all methods based on the same biometric measurements have the same intrinsic limitations, which can only be overcome by the adoption of a multi-modal or multi-algorithmic approach.
  • A novel motion-based representation for video mining

    Page(s): III - 469-72 vol.3

    Analyzing video content for video mining tasks is challenging owing to the lack of an effective representation of video. As motion is a distinctive feature of a video sequence, representing video content based on motion is both reasonable and efficient. In this paper we propose a novel motion-based representation for video mining tasks, including a fast dominant motion extraction scheme, called integral template match (ITM), and a set of qualitative and quantitative description schemes. So far, we have applied it to video mining tasks such as shot boundary detection, camera motion segmentation and recognition. Experiments on the test collection of the TREC2002 video track demonstrate the efficiency and robustness of the proposed schemes.
  • Efficient buffering control for a software-only, high-level, high-profile, MPEG-2 decoder

    Page(s): III - 489-92 vol.3

    A high-quality MPEG-2 software decoder should scale well across a wide range of video formats, especially high-resolution MPEG-2 video (e.g., HDTV). However, it is found that the existing parallel decoder suffers significant performance degradation when decoding high-level MPEG-2 video with the full system configuration, due to inefficient management of the memory space in the decoder. We propose an efficient buffer management mechanism such that the memory requirement is reduced by 50%. This is achieved in two steps: first, we use an ST scheme to minimize the transmission buffer in a slave node by allowing dynamic sharing between frames in one group of pictures (GOP). Then we further reduce the buffer space by dynamic buffer allocation according to image type. The revised parallel decoder showed satisfactory scale-up performance when decoding the high-resolution video formats.