Proceedings of the 2001 International Symposium on Intelligent Multimedia, Video and Speech Processing

Date: 4 May 2001

Entries 1-25 of 144
  • Proceedings of 2001 International Symposium on Intelligent Multimedia, Video and Speech Processing. ISIMP 2001 (IEEE Cat. No.01EX489)

    Publication Year: 2001
  • Author index

    Publication Year: 2001 , Page(s): xx - xxii
  • An integration of data mining and data warehousing for hierarchical multimedia information retrieval

    Publication Year: 2001 , Page(s): 373 - 376

    The paper presents a new approach to multimedia information retrieval with data warehousing techniques. To tackle key issues such as multimedia data representation, storage, integration, indexing, similarity measures, searching methods and query processing, the proposed algorithms: 1) extend the concepts of the conventional data warehouse and multimedia database to a multimedia data warehouse for effective data representation and storage; 2) develop a multimedia starflake schema that integrates multiple data streams for hierarchical data representation and indexing; 3) introduce a dynamic similarity measurement scheme based on statistical feature selection criteria; and 4) apply data aggregation techniques for decision support to speed up query processing and searching. We conclude that the proposed approach can be applied to general multimedia systems with effective data storage, retrieval and integration.

  • A novel scheme for fractal image coding

    Publication Year: 2001 , Page(s): 114 - 116

    In traditional fractal image coding schemes, domain blocks are constrained to be twice as large as range blocks to ensure the convergence of the iterative decoding stage. However, this constraint limits the fractal encoder's ability to exploit the self-similarity of the original image. To overcome this shortcoming, a novel scheme using range and domain blocks of the same size is proposed. Experimental results show improvements in both compression ratio and image quality.

  • Theory of discrete time SISO linear (L,M) shift invariant system

    Publication Year: 2001 , Page(s): 233 - 235

    We have characterized the discrete time single input single output (SISO) linear (L,M) shift invariant system by a two-dimensional kernel function and a filter bank structure. Based on this characterization, we have investigated the conditions for stability, invertibility, causality and the finite response property of a discrete time SISO linear (L,M) shift invariant system. The advantage of the analysis is that a linear time varying system can be analyzed and designed through a finite number of one-dimensional kernel functions and linear time invariant (LTI) filters. Hence, it facilitates the analysis and design of linear time varying systems such as the L/M rate changer used in digital image and video processing.

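    The (L,M) shift-invariance property can be checked numerically. The sketch below is a minimal Python illustration, assuming the textbook expander, causal FIR filter and decimator chain and a hypothetical linear-interpolation kernel `h`; it is not the authors' two-dimensional kernel or filter-bank characterization. Delaying the input of an L/M rate changer by M samples should delay its output by exactly L samples.

```python
def upsample(x, L):
    """Expander: insert L-1 zeros after every input sample."""
    y = [0.0] * (len(x) * L)
    for n, v in enumerate(x):
        y[n * L] = v
    return y


def fir_filter(x, h):
    """Causal FIR convolution y[n] = sum_k h[k] x[n-k], truncated to len(x)."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * x[n - k]
        y.append(acc)
    return y


def downsample(x, M):
    """Decimator: keep every M-th sample, starting with the first."""
    return x[::M]


def rate_change(x, L, M, h):
    """Naive L/M rate changer: expand by L, interpolate with h, decimate by M."""
    return downsample(fir_filter(upsample(x, L), h), M)


# Hypothetical example: 3/2 rate change with a linear-interpolation kernel.
h = [1 / 3, 2 / 3, 1.0, 2 / 3, 1 / 3]
x = [1.0, 2.0, 3.0, 4.0]
y = rate_change(x, 3, 2, h)
# (L, M) shift invariance: delaying x by M = 2 samples delays y by L = 3 samples.
y_delayed = rate_change([0.0, 0.0] + x, 3, 2, h)
print(y_delayed == [0.0, 0.0, 0.0] + y)  # True
```

    The same chain is not, in general, shift invariant in the ordinary sense: a one-sample input delay does not produce a shifted copy of the output.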
  • PCR-based fair intelligent bandwidth allocation for rate adaptive video traffic

    Publication Year: 2001 , Page(s): 141 - 145

    In this paper, we propose a network bandwidth sharing algorithm, Peak Cell Rate (PCR)-based Fair Intelligent Bandwidth Allocation (PFIBA), for transporting rate-adaptive video traffic using feedback, and report on its performance under general PCR-based sharing policies. Extensive simulations show that the PFIBA algorithm allocates bandwidth fairly among competing rate-adaptive video sources under the minimum cell rate (MCR) plus PCR-proportional fairness criterion, and that it reallocates bandwidth smoothly when some connections renegotiate their MCR or PCR, when a new connection is admitted, and when a connection is throttled at an earlier point along its path. Furthermore, we show that the algorithm prevents congestion, especially during the initial periods when buffer queues can build up significantly.

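    To make the "MCR plus PCR-proportional" fairness criterion concrete, the following sketch computes one possible target allocation: every connection keeps its MCR, the remaining link capacity is shared in proportion to each connection's PCR, and no connection receives more than its PCR. The capping and redistribution rule, the function name and the example rates are assumptions for illustration; this computes only the fair-share target, not the PFIBA feedback algorithm that converges to it.

```python
def mcr_plus_pcr_share(capacity, mcr, pcr):
    """Target allocation under an 'MCR plus PCR-proportional' reading:
    alloc[i] = min(pcr[i], mcr[i] + t * pcr[i]) with t chosen so that the
    capacity is used up (or every connection reaches its PCR).
    Computed by progressive filling: capacity that a capped connection
    cannot absorb is redistributed to the others in the next round."""
    if sum(mcr) > capacity:
        raise ValueError("capacity cannot cover the guaranteed MCRs")
    alloc = list(mcr)
    excess = capacity - sum(mcr)
    active = [i for i in range(len(mcr)) if alloc[i] < pcr[i]]
    while excess > 1e-9 and active:
        weight = sum(pcr[i] for i in active)
        still_active, given = [], 0.0
        for i in active:
            share = excess * pcr[i] / weight
            room = pcr[i] - alloc[i]
            if share >= room:  # connection would exceed its PCR: cap it
                alloc[i] = pcr[i]
                given += room
            else:
                alloc[i] += share
                given += share
                still_active.append(i)
        excess -= given
        active = still_active
    return alloc


# Hypothetical 10 Mb/s link shared by three video sources.
print(mcr_plus_pcr_share(10.0, mcr=[1.0, 1.0, 1.0], pcr=[2.0, 4.0, 6.0]))
# approximately [2.0, 3.4, 4.6]
```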
  • Face recognition by wavelet domain associative memory

    Publication Year: 2001 , Page(s): 481 - 485
    Cited by:  Papers (3)

    We propose a face recognition scheme based on an auto-associative memory (AM) model. Two kinds of AM models are compared, namely pseudo-inverse memory and a radial basis function (RBF) network, and we found that RBF-based associative memory is much more efficient. To capture substantial facial features and reduce computational complexity, we use a wavelet transform (WT) to decompose face images and choose the lowest-resolution subband coefficients for face representation. Results indicate that the modular scheme yields accurate recognition on the widely used XM2VTS and Olivetti Research Laboratory (ORL) face databases.

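    Choosing the lowest-resolution subband as the face representation can be sketched with a 2-D Haar transform, the simplest wavelet. The averaging normalisation, the two-level default and the toy input are assumptions; the abstract does not say which wavelet or how many levels were used. Each level halves both image sides, so two levels shrink the feature vector sixteen-fold before it reaches the associative memory.

```python
def haar_ll(image, levels):
    """Lowest-resolution (LL) subband after `levels` steps of a 2-D Haar
    decomposition. Uses the averaging normalisation, so each LL value is the
    mean of a 2x2 block (the orthonormal Haar LL is 2x larger per level)."""
    ll = [list(map(float, row)) for row in image]
    for _ in range(levels):
        rows, cols = len(ll), len(ll[0])
        if rows % 2 or cols % 2:
            raise ValueError("image sides must be even at every level")
        ll = [[(ll[r][c] + ll[r][c + 1] + ll[r + 1][c] + ll[r + 1][c + 1]) / 4.0
               for c in range(0, cols, 2)]
              for r in range(0, rows, 2)]
    return ll


def face_feature(image, levels=2):
    """Flatten the LL subband into a feature vector (16x fewer values for 2 levels)."""
    return [v for row in haar_ll(image, levels) for v in row]


toy = [[4 * r + c for c in range(4)] for r in range(4)]  # hypothetical 4x4 "image"
print(haar_ll(toy, 1))    # [[2.5, 4.5], [10.5, 12.5]]
print(face_feature(toy))  # [7.5]
```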
  • Dual channels noise cancelling system

    Publication Year: 2001 , Page(s): 445 - 448

    To realize speaker-free speech recognition with a DSP, a two-channel recording system was developed and used to build a speech library. Several filters, namely a high-pass filter, an LMS adaptive filter and a combination of the two, were used to separate a speaker's voice from a noisy background, and their noise cancellation effectiveness was evaluated. Notably, 1/f noise is shown to be an important factor in the effectiveness of an adaptive filter: when 1/f noise at frequencies below 1 Hz is cancelled beforehand, the output signal of the adaptive filter improves significantly. This result is of value not only for speech processing systems but also for other adaptive filter systems.

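    The two-channel arrangement above matches the classic adaptive noise canceller: an LMS filter learns to predict the noise in the primary (speech) channel from the reference (noise) channel, and the prediction error is the cleaned signal. A minimal sketch follows; the tap count, step size `mu`, the synthetic noise path and the test tone are hypothetical, and the high-pass pre-filtering of sub-1 Hz 1/f noise recommended in the abstract is omitted.

```python
import math
import random


def lms_cancel(primary, reference, taps=4, mu=0.01):
    """Two-channel adaptive noise canceller (Widrow-Hoff LMS).
    primary   = speech + noise reaching the main microphone
    reference = noise picked up by the second channel
    The filter learns to predict the primary-channel noise from the reference;
    the prediction error e is the cleaned output."""
    w = [0.0] * taps
    buf = [0.0] * taps  # reference history, newest sample first
    cleaned = []
    for d, x in zip(primary, reference):
        buf = [x] + buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, buf))            # noise estimate
        e = d - y                                             # cleaned sample
        w = [wi + 2 * mu * e * xi for wi, xi in zip(w, buf)]  # LMS update
        cleaned.append(e)
    return cleaned, w


def demo(n=5000, seed=0):
    """Synthetic test: a small tone buried in noise that reaches the primary
    channel through the hypothetical path 0.8*x[n] + 0.3*x[n-1]."""
    rng = random.Random(seed)
    ref = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    speech = [0.1 * math.sin(2 * math.pi * k / 50) for k in range(n)]
    primary = [speech[k] + 0.8 * ref[k] + (0.3 * ref[k - 1] if k else 0.0)
               for k in range(n)]
    cleaned, w = lms_cancel(primary, ref)
    tail = range(n - 1000, n)
    residual = sum((cleaned[k] - speech[k]) ** 2 for k in tail) / len(tail)
    return residual, w


residual, w = demo()
print(f"residual noise power {residual:.1e}, weights {[round(v, 2) for v in w]}")
```

    With these settings the weights should settle close to the true noise path [0.8, 0.3, 0, 0]; a larger `mu` converges faster but leaves more residual noise.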
  • Page segmentation and content classification for automatic document image processing

    Publication Year: 2001 , Page(s): 279 - 282
    Cited by:  Papers (1)  |  Patents (2)

    Page segmentation and image content classification are important steps in automatic document image processing, including mixed-type document image compression, form and check reading, and mail sorting. The authors first propose an enhanced page segmentation approach based on background thinning. They then present a hierarchical approach for classifying the segmented sub-images into one of two categories: text and picture. The approach combines a cross-correlation method, the Kolmogorov complexity measure (A.N. Kolmogorov, 1965), and a neural network classifier to achieve both efficiency and high accuracy. The approach has been tested on a number of mixed-type document images with good results.

  • Analysis of clustering techniques to detect hand signs

    Publication Year: 2001 , Page(s): 259 - 262
    Cited by:  Papers (1)  |  Patents (6)

    The term multimedia has a different meaning to different communities. The computer industry uses this term to refer to a system that can display audio and video clips. Generally speaking, a multimedia system supports multiple presentation modes to convey information. Humans have five senses: sight, hearing, touch, smell and taste. In theory, a system based on this generalized definition must be able to convey information in support of all senses. This would be a step towards virtual environments that facilitate total recall of an experience. This study builds on our previous work with audio and video servers and explores haptic data in support of touch and motor skills. It investigates the use of clustering techniques to recognize hand signs using haptic data. One application of these results is communication devices for the hearing impaired.

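    The abstract does not name the clustering techniques compared; k-means is one common baseline. The sketch below clusters hypothetical three-value haptic feature vectors (for example, finger-flexion readings from a data glove) into two sign groups. The feature layout, sample values and seeding rule are assumptions.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means. Centroids are seeded with the first k points so the
    result is deterministic; a real system would use a better initialisation."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(axis) / len(members) for axis in zip(*members)]
    return labels, centroids


# Hypothetical haptic readings (three flexion angles, degrees) for two signs.
samples = [(10, 12, 80), (82, 79, 12), (11, 9, 78),
           (79, 81, 9), (9, 11, 82), (80, 80, 11)]
labels, _ = kmeans(samples, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```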
  • An efficient bandwidth management scheme for real-time Internet applications

    Publication Year: 2001 , Page(s): 469 - 472
    Cited by:  Papers (1)

    Differentiated services (DiffServ) has been proposed as a scalable solution for Internet QoS. Within the DiffServ architecture, premium service is a service class proposed for interactive real-time applications such as real-time voice and video over the Internet. To ensure the quality of premium service, each DiffServ domain needs to negotiate an appropriate service level agreement (SLA) with its customers and neighboring domains. Because the resources for premium service are usually a small part of the total network bandwidth, dynamic SLA negotiation is preferred to maximize resource utilization. However, a completely dynamic SLA negotiation scheme introduces a scalability problem for the bandwidth broker (BB). We introduce the concept of the “pipe” as a viable solution that avoids the scalability problem while managing Internet bandwidth efficiently. A threshold-based updating scheme for the pipe minimizes the updating overhead for the BB while maintaining high utilization of the pipe.

  • Enhancement of fax documents using a binary angular representation

    Publication Year: 2001 , Page(s): 125 - 128

    In this paper, we explore a new approach to enhancing fax documents using a binary directional filter bank (DFB). Sending fax documents often introduces distortions that are visible as spurious point noise and ragged edges. The proposed approach remains in the binary domain for the entire process. Conventional directional filter banks provide representations that delineate the directional components of text characters, enabling edges and contours to be smoothed appropriately. Our binary DFB takes a binary input and outputs a binary image composed of directional components. With proper weighting of the subbands, the synthesis section can suppress artifacts in the output image. This paper describes the new binary DFB and its application to enhancing text that has been degraded by the faxing process.

  • JPEG2000-based scalable reconstruction of image local regions

    Publication Year: 2001 , Page(s): 174 - 177
    Cited by:  Papers (2)  |  Patents (3)

    JPEG2000-based scalable reconstruction of image local regions is a method that exploits the properties of the wavelet transform. First, we compress and encode the image with the basic JPEG2000 algorithm. Then we arrange the compressed data stream according to the zero-tree structure. Each zero-tree corresponds to a mesh field of the original image, and the contents of this mesh can be reconstructed from the discrete wavelet transform (DWT) coefficients in the zero-tree. We preview the lowest-resolution sub-image (LL sub-image) before searching. Once the object is found, we focus on it and reconstruct the contents of the target mesh level by level in both spatial resolution and signal-to-noise ratio (SNR). Because only part of the compressed data stream is used to reconstruct local image regions, considerable computation is saved. An experimental case of JPEG2000-based scalable reconstruction of image local regions is given at the end of the paper.

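    Reconstructing only a local region relies on the fact that each decomposition level halves the resolution, so a pixel rectangle maps to a predictable window of coefficients in every subband, and a zero-tree links each coefficient to a 2x2 block of children one level finer. The sketch below computes those windows coarsest level first; the function names and example region are hypothetical, and the additional margin required by the wavelet filter's support is ignored.

```python
def region_to_subband(x0, y0, x1, y1, level):
    """Map the half-open pixel rectangle [x0, x1) x [y0, y1) to the coefficient
    rectangle it covers in any subband at decomposition `level` (1 = finest).
    Coordinates are divided by 2**level, rounding outward; the margin needed
    for the wavelet filter's support is deliberately ignored."""
    s = 1 << level
    return (x0 // s, y0 // s, -(-x1 // s), -(-y1 // s))  # -(-a // s) is ceil(a / s)


def children(i, j):
    """Zero-tree children of coefficient (i, j): the 2x2 block one level finer."""
    return [(2 * i + di, 2 * j + dj) for di in (0, 1) for dj in (0, 1)]


def coarse_to_fine_plan(region, levels):
    """Coefficient windows to decode, coarsest level first, mirroring a
    'preview the LL image, then refine the region' workflow."""
    return [(lvl, region_to_subband(*region, lvl)) for lvl in range(levels, 0, -1)]


# Hypothetical 40x17-pixel region of interest, three decomposition levels.
for lvl, window in coarse_to_fine_plan((10, 20, 50, 37), 3):
    print(lvl, window)
# 3 (1, 2, 7, 5)
# 2 (2, 5, 13, 10)
# 1 (5, 10, 25, 19)
```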
  • Motion-based classification of cartoons

    Publication Year: 2001 , Page(s): 146 - 149
    Cited by:  Papers (5)  |  Patents (1)

    This paper describes a simple high-level classification of multimedia broadcast material into cartoon and non-cartoon. The input video sequences cover a broad range of material representative of entertainment viewing. Classifying this kind of high-level video genre is difficult because of its large inter-class variation. The task becomes harder still when classification is made over a short time span (tens of seconds), which introduces a great deal of intra-class variation. This paper presents a purely dynamics-based approach to content-based classification of video sequences in the form of a new global motion measure of foreground objects. Experiments are reported on a diverse database of 8 cartoon and 20 non-cartoon sequences. Results are shown as identification error rates against the length of sequence used for classification. The system achieves a best identification error rate of 3% on 66 separate decisions based on 23-second sequences, trained on a total of about 20 minutes of video.

  • Intention-based probabilistic phrase spotting for speech understanding

    Publication Year: 2001 , Page(s): 99 - 102
    Cited by:  Papers (2)  |  Patents (1)

    We present an approach to probabilistic phrase spotting that evaluates a speech recognizer's utterance hypotheses to infer the user's intention. The evaluation is done by mapping each word chain onto each intention of the intention space; to this end, we create an intention model for each intention as the basis for analysis. Because the words of the recognizer's utterance hypotheses are assigned confidence levels, we treat these inputs as uncertain observations. We use Bayesian belief networks as the mathematical foundation for intention modelling and probability theory for evaluating such word chains. The algorithm considers syntactic and semantic relations between the words within a phrase, evaluating each word with respect to the previously observed words of the current phrase.

  • Morphological filter based noise removal from vibration signals of fighter plane

    Publication Year: 2001 , Page(s): 251 - 254
    Cited by:  Papers (2)

    The paper solves the preprocessing problem for the vibration signal of a certain fighter plane's engine. The engine's vibration signal is mainly corrupted by noise from the plane's subsystems and from ground power supply instruments. When the engine operates in the throttling, maximum and accelerating states, the fundamental frequency of the vibration signal is slightly lower than the frequency of the interfering signal, so traditional linear filters such as low-pass filters cannot adequately remove the interference. We introduce a mixed nonlinear morphological filter combined with a moving average filter to remove the noise. This method is compared with a moving average filter and a median filter. Test results show that the proposed algorithm removes the noise with less distortion.

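    Morphological filters act on the shape and width of signal features rather than on their frequency, which is why they can help when noise and signal lie close together in frequency. A minimal 1-D sketch with a flat structuring element follows: the open-close and close-open outputs are averaged and then smoothed by a moving average. The structuring-element length, the averaging rule and the test signal are assumptions, not the paper's exact filter.

```python
def _window(x, i, r):
    """Samples within r of index i, truncated at the signal ends."""
    return x[max(0, i - r):i + r + 1]


def erode(x, k):
    # Running minimum over a flat window of length k.
    return [min(_window(x, i, k // 2)) for i in range(len(x))]


def dilate(x, k):
    # Running maximum over a flat window of length k.
    return [max(_window(x, i, k // 2)) for i in range(len(x))]


def opening(x, k):
    """Erosion then dilation: removes positive peaks narrower than k samples."""
    return dilate(erode(x, k), k)


def closing(x, k):
    """Dilation then erosion: fills negative dips narrower than k samples."""
    return erode(dilate(x, k), k)


def moving_average(x, k):
    return [sum(w) / len(w) for w in (_window(x, i, k // 2) for i in range(len(x)))]


def morph_denoise(x, k=3, avg=3):
    """Average the open-close and close-open filters (treating positive and
    negative impulses symmetrically), then smooth with a moving average."""
    oc = closing(opening(x, k), k)
    co = opening(closing(x, k), k)
    return moving_average([(a + b) / 2 for a, b in zip(oc, co)], avg)


# Hypothetical flat signal hit by one positive and one negative spike.
signal = [1.0] * 11
signal[3], signal[7] = 5.0, -3.0
print(morph_denoise(signal))  # eleven values of 1.0: both spikes removed
```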
  • Structure of the Internet?

    Publication Year: 2001 , Page(s): 449 - 452

    We consider a major component in the design of an Internet search engine, viz., how the relevance of a Web page can be determined. A number of methods are described. A number of design issues related to search engines are also discussed.

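    The abstract does not list the relevance methods it describes. One widely used link-based measure is PageRank, sketched below by power iteration as an illustration only; the damping factor 0.85, the tolerance and the three-page link graph are assumptions.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """PageRank by power iteration.
    links maps each page to the list of pages it links to. A page's rank is
    split evenly over its out-links; pages with no out-links ('dangling')
    spread their rank over all pages; (1 - damping) models random jumps."""
    pages = sorted(set(links) | {q for out in links.values() for q in out})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            out = links.get(p, [])
            targets = out if out else pages
            share = damping * rank[p] / len(targets)
            for q in targets:
                new[q] += share
        converged = sum(abs(new[p] - rank[p]) for p in pages) < tol
        rank = new
        if converged:
            break
    return rank


# Hypothetical three-page web: a -> b, a -> c, b -> c, c -> a.
ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
print(sorted(ranks, key=ranks.get, reverse=True))  # ['c', 'a', 'b']
```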
  • Image segmentation by edge pixel classification with maximum entropy

    Publication Year: 2001 , Page(s): 283 - 286
    Cited by:  Papers (1)  |  Patents (2)

    Image segmentation is a process that classifies image pixels into different classes according to some predefined criterion. An entropy-based image segmentation method is proposed to segment gray-scale images. The method starts with an arbitrary template. An index called Gray-scale Image Entropy (GIE) is employed to measure the degree of resemblance between the template and the true scene that gives rise to the gray-scale image. The classification status of the edge pixels in the template is modified so as to maximize the GIE. By repeatedly processing all the edge pixels until a termination condition is met, the template is changed into a configuration that closely resembles the true scene. This optimum template (in an entropy sense) is taken to be the desired segmented image. Results from a simulation study and from the segmentation of practical images demonstrate the feasibility of the proposed method.

  • Effect of channel quality estimation error on the performance of interactive mobile video system

    Publication Year: 2001 , Page(s): 312 - 315

    Adaptive modulation is an effective scheme for noticeably improving the performance of interactive video transmission over mobile wireless channels. However, its effectiveness depends strongly on the accuracy of channel quality estimation. Using two measures, the extra bit error rate cost (EBC) and the potential theoretical channel capacity loss (PTCCL), we analyze the effect of channel quality estimation error on the performance of an interactive mobile video system that uses adaptive modulation.

  • Retrieving faces using adaptive subspace self-organising map

    Publication Year: 2001 , Page(s): 377 - 380

    We present the adaptive manifold self-organising map (AMSOM) for a face retrieval system. Our experimental results show that it has excellent potential for face retrieval applications: compared with the more traditional subspace self-organising map, it gives better results in many cases.

  • Interactive emotional response computation for scriptable multimedia actors

    Publication Year: 2001 , Page(s): 473 - 476

    Although modern computer graphics and animation are capable of producing near-realistic 3D images of virtual characters, the share of the work that must be done by animators and artists remains significant. Our virtual actor framework helps animators automate the modelling and animation of emotive virtual human heads with visual speech and gestures. We present the 'situation processor' component of this system, which uses the OCC (Ortony, Clore and Collins) cognitive-emotional theory to intelligently query a user to determine the emotional response of a synthetic character in a movie scene. Using our system significantly reduces the workload of animators.

  • Theory and experiment analysis of disparity for stereoscopic image pairs

    Publication Year: 2001 , Page(s): 68 - 71

    Disparity is the geometrical difference between the two images of a stereoscopic pair, that is, a pair of images of the same scene acquired from slightly different perspectives. The authors give a comprehensive analysis of the statistical characteristics of disparity. Based on experiments, they discuss the relations between disparity, depth and object, the relation between block size and disparity estimation, and the influence of error criteria on disparity estimation.

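    The relation between block size, error criterion and disparity estimation can be explored with plain block matching on a rectified pair: for a block in the left image, try each horizontal shift in the right image and keep the one with the lowest sum of absolute differences (SAD) or sum of squared differences (SSD). A minimal sketch; the search range, block size and synthetic image pair are assumptions.

```python
def block_disparity(left, right, x, y, block=3, max_disp=8, cost="sad"):
    """Disparity of the block centred at (x, y) in the left image of a
    rectified stereo pair: try horizontal shifts d = 0..max_disp in the right
    image and return the shift with the lowest matching cost.
    cost="sad" sums absolute differences, cost="ssd" squared differences."""
    r = block // 2

    def err(a, b):
        return abs(a - b) if cost == "sad" else (a - b) ** 2

    best_d, best_cost = 0, float("inf")
    for d in range(max_disp + 1):
        if x - r - d < 0:
            break  # candidate block would fall off the right image
        c = sum(err(left[y + dy][x + dx], right[y + dy][x + dx - d])
                for dy in range(-r, r + 1) for dx in range(-r, r + 1))
        if c < best_cost:
            best_d, best_cost = d, c
    return best_d


def shifted_pair(width, height, shift):
    """Synthetic rectified pair: the right view is the left view moved `shift`
    pixels to the left (edge pixels repeated), so the true disparity is `shift`."""
    left = [[(7 * x + 13 * y) % 17 for x in range(width)] for y in range(height)]
    right = [[left[y][min(x + shift, width - 1)] for x in range(width)]
             for y in range(height)]
    return left, right


left, right = shifted_pair(20, 5, 3)
print(block_disparity(left, right, x=10, y=2))              # 3
print(block_disparity(left, right, x=10, y=2, cost="ssd"))  # 3
```

    Larger blocks generally make matches more robust to noise but blur disparity at depth discontinuities, so block size is a genuine trade-off.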
  • A self-organizing evolving algorithm combined with a transient chaotic neural network

    Publication Year: 2001 , Page(s): 239 - 242
    Cited by:  Papers (1)

    A new multi-stage self-organizing channel assignment algorithm combined with a transiently chaotic neural network (TCNN) is proposed. Performance is greatly improved by progressively initializing a mutual inhibition technique based on the mechanisms of bristle differentiation. Simulation results show that the proposed algorithm greatly improves performance, while the number of iterations needed for convergence is comparable to that of existing algorithms.

  • Object recognition by combining viewpoint invariant Fourier descriptor and convex hull

    Publication Year: 2001 , Page(s): 401 - 404
    Cited by:  Papers (1)

    Shape recognition that relies on global information is observed to fail under occlusion. In this paper, an algorithm that combines the viewpoint invariant Fourier descriptor with the convex hull is presented for recognizing 3D planar objects by their contours. Invariants are calculated from a set of local segments extracted from the convex hull of a shape. Under this approach, an object is represented by sets of invariant points rather than by a single point in the 2D parameter space of I1 and I2. The method is efficient and yields a high recognition rate for partially occluded objects. Classification can be carried out correctly even when the convex hull of the object has changed as a result of occlusion.

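    The invariants above are computed from local segments of a shape's convex hull. The hull-and-segments step can be sketched with Andrew's monotone chain algorithm, a standard O(n log n) method that is not necessarily the one the authors used; the viewpoint-invariant Fourier descriptors I1 and I2 are not reproduced here, and the contour points are hypothetical.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); > 0 means o -> a -> b turns left."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain. Returns hull vertices counter-clockwise,
    starting from the lowest x (then lowest y), without collinear points."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def hull_segments(hull):
    """Consecutive hull edges, closing the polygon."""
    return list(zip(hull, hull[1:] + hull[:1]))


# Hypothetical contour samples: a square plus one interior and one edge point.
contour = [(0, 0), (1, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
print(convex_hull(contour))  # [(0, 0), (2, 0), (2, 2), (0, 2)]
```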
  • Multi-rate hybrid codec and its performance evaluation for image compression

    Publication Year: 2001 , Page(s): 178 - 181

    A new multi-rate hybrid codec algorithm, MRHC, is proposed in this paper. It uses the embedded code stream of the wavelet-coded image and combines source coding and channel coding into one procedure. By applying a puncturing factor to a convolutional code, error correction coding can be carried out at the same time as image compression. This method can optimize the rate allocation and greatly enhance the resilience of the wavelet-coded image to channel errors.

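    Puncturing a convolutional code deletes some encoder outputs according to a fixed pattern, so one encoder can serve several code rates. The sketch below uses the common rate-1/2, constraint-length-3 code with generators 7 and 5 (octal) and a pattern that yields rate 2/3; these choices are illustrative assumptions, not MRHC's parameters. A decoder would treat the deleted positions as erasures.

```python
def conv_encode(bits, generators=(0b111, 0b101), k=3):
    """Rate-1/2 feed-forward convolutional encoder with constraint length k,
    flushed with k-1 zero tail bits. Returns one (out_1, out_2) pair per step."""
    mask = (1 << k) - 1
    state = 0
    pairs = []
    for b in list(bits) + [0] * (k - 1):
        state = ((state << 1) | b) & mask  # shift register, newest bit in LSB
        pairs.append(tuple(bin(state & g).count("1") % 2 for g in generators))
    return pairs


def puncture(pairs, pattern=((1, 1), (1, 0))):
    """Delete encoder outputs where pattern[j][t % period] == 0.
    The default pattern keeps 3 of every 4 bits, turning rate 1/2 into 2/3."""
    period = len(pattern[0])
    kept = []
    for t, pair in enumerate(pairs):
        for j, bit in enumerate(pair):
            if pattern[j][t % period]:
                kept.append(bit)
    return kept


coded = conv_encode([1, 0, 1, 1])
print(coded)            # [(1, 1), (1, 0), (0, 0), (0, 1), (0, 1), (1, 1)]
print(puncture(coded))  # [1, 1, 1, 0, 0, 0, 0, 1, 1]
```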