Intelligent Multimedia, Video and Speech Processing, 2001. Proceedings of 2001 International Symposium on

Date 4-4 May 2001

  • Proceedings of 2001 International Symposium on Intelligent Multimedia, Video and Speech Processing. ISIMP 2001 (IEEE Cat. No.01EX489)

    Publication Year: 2001
    Freely Available from IEEE
  • Author index

    Publication Year: 2001, Page(s):xx - xxii
    Freely Available from IEEE
  • Face recognition by wavelet domain associative memory

    Publication Year: 2001, Page(s):481 - 485
    Cited by:  Papers (3)

    We propose a face recognition scheme based on an auto-associative memory (AM) model. Two kinds of AM models are compared, namely the pseudo-inverse memory and the radial basis function (RBF) network, and we find that the RBF-based associative memory is much more efficient. To capture substantial facial features and reduce computational complexity, we use a wavelet transform (WT) to decompose face images and...
  • Enhanced CELP coding with discrete spectral modeling

    Publication Year: 2001, Page(s):111 - 113
    Cited by:  Papers (2)

    During the coding of speech with the code excited linear prediction (CELP) method, the linear prediction (LP) filter coefficients are usually calculated by standard autocorrelation or covariance methods. These methods minimize the mean squared error between the speech signal and the predicted value. The perceptual quality of the coded speech is indicated by the spectral distortion measured over a set ...
  • Modelling high level semantics for video data management

    Publication Year: 2001, Page(s):291 - 295

    In order to manage large collections of video content, we need appropriate video content models that can facilitate interaction with the content. The important issue for video applications is to accommodate different ways in which a video sequence can function semantically. The authors propose and illustrate a metamodel framework that allows users to develop and specify their own semantics using a...
  • Human face recognition using a spatially weighted modified Hausdorff distance

    Publication Year: 2001, Page(s):477 - 480
    Cited by:  Papers (3)

    The Hausdorff distance is an efficient measure of the similarity of two point sets. We propose a modified Hausdorff distance measure for human face recognition. This modified Hausdorff distance measure incorporates information about the location of important facial features when comparing the edge maps of two facial images. The distance measure is weighted according to a weighting function derived from...
  • A new two-stage scoring normalization approach to speaker verification

    Publication Year: 2001, Page(s):107 - 110
    Cited by:  Papers (1)

    In speaker verification, the cohort and world models have been separately used for scoring normalization. The authors embed the two models in elliptical basis function networks and propose a two-stage decision procedure for improving verification performance. The procedure begins with normalization of an utterance by a world model. If the difference between the resulting score and a world threshol...
  • Data management for visual information systems

    Publication Year: 2001, Page(s):287 - 290

    Visual information systems contain a substantial amount of non-alphanumeric information, and represent a radical departure from the largely text-based paradigm of conventional information systems. For the effective operation of an information system, data search and management mechanisms play a pivotal role, and a key requirement of such a system is the ability to search and locate information muc...
  • Interactive emotional response computation for scriptable multimedia actors

    Publication Year: 2001, Page(s):473 - 476
    Cited by:  Papers (1)

    Although modern computer graphics and animation are capable of producing near-realistic 3D images of virtual characters, the component of work that needs to be done by animators and artists is quite significant. A virtual actor framework developed by us aids animators in automating the modelling and animation of emotive virtual human heads with visual speech and gestures. We present the `situation p...
  • A performance comparison of robust speech analysis methods in noisy environments

    Publication Year: 2001, Page(s):103 - 106
    Cited by:  Papers (1)

    Various speech analysis methods are compared from the point of view of robustness against noise. Computer simulations reveal that the most robust method is the instrumental variable method involving an input estimation technique based on thresholding the amplitude of a modified prediction error sequence.
  • Image segmentation by edge pixel classification with maximum entropy

    Publication Year: 2001, Page(s):283 - 286
    Cited by:  Papers (2)  |  Patents (3)

    Image segmentation is a process to classify image pixels into different classes according to some pre-defined criterion. An entropy based image segmentation method is proposed to segment a gray-scale image. The method starts with an arbitrary template. An index called Gray-scale Image Entropy (GIE) is employed to measure the degree of resemblance between the template and the true scene that gives ...
  • Vision model based perceptual coding of digital images

    Publication Year: 2001, Page(s):87 - 91
    Cited by:  Papers (2)

    The paper presents a perceptual coder based on the Embedded Block Coding with Optimised Truncation (EBCOT) structure which visually outperforms JPEG2000 VM 8.0 utilising the mean-square-error (MSE) criterion. Furthermore, the proposed perceptual coder shows a performance comparable to or better than CVIS, the EBCOT implementation with visual masking. This performance gain is attributed to more advance...
  • PCR-based fair intelligent bandwidth allocation for rate adaptive video traffic

    Publication Year: 2001, Page(s):141 - 145

    In this paper, we propose a network bandwidth sharing algorithm, Peak Cell Rate (PCR)-based Fair Intelligent Bandwidth Allocation (PFIBA), for transporting rate-adaptive video traffic using feedback, and report on its performance under general PCR-based share policies. Through extensive simulations, we obtained the following results. The PFIBA algorithm is capable of allocating bandwidth fairly for t...
  • An efficient bandwidth management scheme for real-time Internet applications

    Publication Year: 2001, Page(s):469 - 472
    Cited by:  Papers (1)

    Differentiated services (DiffServ) has been proposed as a scalable solution for Internet QoS. Within the DiffServ architecture, premium services is a service class proposed for interactive real-time applications such as real-time voice and video over the Internet. In order to ensure the service quality of premium services, each DiffServ domain needs to appropriately negotiate a service lev...
  • Object recognition by combining viewpoint invariant Fourier descriptor and convex hull

    Publication Year: 2001, Page(s):401 - 404
    Cited by:  Papers (1)

    It is observed that the shape recognition process that uses global information would fail when dealing with occlusion. In this paper, an algorithm that combines the methods of viewpoint invariant Fourier descriptor and convex hull is presented for recognizing 3D planar objects by their contours. Invariants are calculated from a set of local segments extracted from the convex hull of a shape. Under...
  • Intention-based probabilistic phrase spotting for speech understanding

    Publication Year: 2001, Page(s):99 - 102
    Cited by:  Papers (2)  |  Patents (2)

    We present an approach towards probabilistic phrase spotting for evaluating a speech recognizer's utterance hypotheses for inferring the user's intention. The evaluation is done by mapping each word chain on each intention of the intention space. Therefore, we create an intention model for each intention as the basis for analysis. As the words of the speech recognizer's utterance hypotheses are as...
  • Efficient decoding algorithms for Mandarin connected digit speech recognition

    Publication Year: 2001, Page(s):555 - 558

    In this paper, a set of efficient decoding algorithms based on a speaker independent Mandarin connected digit speech recognition system is proposed. By simplifying the computation of observation probabilities and adopting an improved beam search pruning algorithm combined with duration information, the average decoding time is reduced from 0.92 seconds to 0.11 seconds per digit string while the reco...
  • Split vector quantization of signals in multiple transform domains with application to speech

    Publication Year: 2001, Page(s):186 - 188

    A novel multiple transform split vector quantization (MTSVQ) scheme is proposed in which each signal vector is projected into multiple transform domains. For a given transform domain representation, a codebook is designed for each equal energy subband. The coder selects codes from the domain that best represents the signal vector. Sample results using one dimensional speech signals confirm the super...
  • Constructing colour image databases using BTC for efficient storage and effective retrieval

    Publication Year: 2001, Page(s):364 - 367
    Cited by:  Patents (2)

    We introduce a simple image coding method, the block truncation coding (BTC) technique, as a novel approach to the construction of colour image databases. It is shown that BTC can not only be used to compress images, thus achieving storage efficiency, but the BTC codes can also be used directly to construct image features for effective image retrieval. From the BTC code we have developed an image ...
  • Half-quadratic regularization, preconditioning and applications

    Publication Year: 2001, Page(s):32 - 35
    Cited by:  Papers (1)

    The article addresses a wide class of image deconvolution or reconstruction situations where a sought image is recovered from a degraded observed image. The sought solution is defined to be the minimizer of an objective function combining a data-fidelity term and an edge-preserving, convex regularization term. Our objective is to speed up the calculation of the solution in a wide range of situations...
  • Page segmentation and content classification for automatic document image processing

    Publication Year: 2001, Page(s):279 - 282
    Cited by:  Papers (2)  |  Patents (3)

    Page segmentation and image content classification is an important step for automatic document image processing including mixed-type document image compression, form and check reading, and mail sorting. The authors first propose an enhanced background thinning based page segmentation approach. They then present a hierarchical approach for the classification of the segmented sub-images into one of ...
  • Robust speech recognition based on the second-order difference cochlear model

    Publication Year: 2001, Page(s):543 - 546

    MFCC (Mel-Frequency Cepstral Coefficients) is a traditional speech feature widely used in speech recognition. The error rate of the speech recognition algorithm using MFCC and CDHMM is known to be very low in a clean speech environment, but it increases greatly in a noisy environment, especially in a white-noise environment. We propose a new kind of speech feature called the auditory spe...
  • JPEG2000-based scalable reconstruction of image local regions

    Publication Year: 2001, Page(s):174 - 177
    Cited by:  Papers (2)  |  Patents (4)

    JPEG2000-based scalable reconstruction of image local regions is a method that makes use of a property of the wavelet transform. First, we compress and encode the image with the basic algorithm of JPEG2000. Then we arrange the compressed data stream according to the zero-tree structure. Each zero-tree corresponds to a mesh field of the original image. Contents in this mesh can be reconstructed with dis...
  • Representation of arbitrarily-shaped image segments using wavelet basis

    Publication Year: 2001, Page(s):83 - 86

    In object based image coding, an important issue is the efficient representation of arbitrarily shaped image segments. The paper describes a representation method using selected discrete wavelet basis. In this algorithm, the given image segment is successively approximated using 2D shape-independent discrete wavelet basis functions defined on a rectangle circumscribing the image segment. Simulatio...
  • An allocation algorithm for transporting compressed video

    Publication Year: 2001, Page(s):137 - 140

    In this paper, we propose a novel weight-based bandwidth allocation algorithm (WBA) for transporting compressed video traffic using feedback. Extensive simulation using a modified NIST simulator is conducted to evaluate its performance under a general weight-based share policy. Our results demonstrate that the WBA algorithm is capable of allocating bandwidth fairly for the minimum cell rate (MCR)...