Intelligent Multimedia, Video and Speech Processing, 2004. Proceedings of 2004 International Symposium on

Date 20-22 Oct. 2004

Displaying Results 1 - 25 of 207
  • Fingerprint recognition with improved wavelet domain features

    Page(s): 33 - 36

    A fingerprint recognition algorithm based on the wavelet domain features of a fingerprint image is proposed. Critical wavelet coefficients are selected to form a feature vector of the fingerprint. As compared with a recently reported algorithm using a similar approach, the proposed algorithm is superior in terms of both recognition rate and computational complexity.

  • Tiling artifact reduction based on max-lift wavelet filter for Motion JPEG2000 at low bit-rate

    Page(s): 611 - 614

    Using a JPEG2000 image sequence at low bit rate as a video source, known as Motion JPEG2000, is very promising. However, JPEG2000 also brings higher implementation complexity. In practice, JPEG2000 coding systems segment an image into tiles and compress or transform each tile independently, which creates artifacts at the tile boundaries. At low bit rates, these tiling artifacts are annoying. This paper introduces a new post-processing method to reduce them, in which a max-lift wavelet subband decomposition is applied and the subband coefficients are then adaptively filtered and soft-thresholded. Experiments showed that the post-processing visibly enhanced the quality of JPEG2000 images at low bit rate, which helps improve the subjective quality of Motion JPEG2000 video.

  • Facial feature detection with structure constraints

    Page(s): 121 - 124

    Human facial feature extraction is an important step in many facial image interpretation tasks. In this paper, an algorithm for detecting facial features is proposed, in which corner-based facial feature detectors are coupled with a statistical model of the spatial arrangement of facial features to yield a robust performance. Experimental results show that the proposed method is robust and can extract facial features with good accuracy.

  • ICT fixed point error performance analysis

    Page(s): 294 - 297

    The DCT is the most popular transform in video coding. However, two recently developed video coding standards, H.264 and AVS, adopt order-4 and order-8 integer cosine transforms (ICT) respectively. The ICTs adopted in these standards are simpler to implement, yet their compaction ability is similar to that of the DCT. They are simpler because they can be implemented using multiplication by small integers, while the DCT requires multiplication by irrational numbers. In this paper, the fixed-point error performance of the DCT and of the ICT adopted by AVS is analyzed and compared. Both theoretical and experimental results show that the ICT system produces more accurate pixels than the DCT system when the transform coefficients are represented using the same number of bits.
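
    As a rough illustration of what "multiplication by small integers" means, the sketch below applies the widely documented order-4 forward core transform matrix of H.264 to a 4x4 block. The order-8 AVS transform analyzed in the paper is not reproduced here, the function names are hypothetical, and the floating-point inverse is only a convenience for checking the round trip; it is not the integer inverse defined by the standard.

```python
import numpy as np

# Forward core transform matrix of the H.264 order-4 integer cosine transform.
# Every entry is a small integer, so the transform needs only adds and shifts.
CF = np.array([
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
], dtype=np.int64)


def forward_ict_4x4(block):
    """Unscaled 2-D forward transform Y = CF * X * CF^T in integer arithmetic."""
    x = np.asarray(block, dtype=np.int64)
    return CF @ x @ CF.T


def inverse_ict_4x4(coeffs):
    """Illustrative floating-point inverse (assumption: not the standard's
    integer inverse). The rows of CF are orthogonal with squared norms
    4, 10, 4, 10, so dividing by those norms undoes the forward transform."""
    norms = np.array([4.0, 10.0, 4.0, 10.0])
    scaled = np.asarray(coeffs, dtype=float) / np.outer(norms, norms)
    return CF.T @ scaled @ CF
```

    In a real codec the row-norm scaling is usually folded into quantization rather than applied as a separate division.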

  • Temporal hidden Markov models

    Page(s): 137 - 140

    The hidden Markov model (HMM) is a doubly stochastic process. The observable process produces a sequence of observations and the hidden process is a Markov process. The HMM assumes that the occurrence of one observation is statistically independent of the occurrence of the others. To avoid this limitation, a temporal HMM is proposed. The hidden process in the temporal HMM is the same, but the observable process is now a Markov process. Each observation in the training sequence is assumed to be statistically dependent on its predecessor, and codewords or Gaussian components are used as states in the observable Markov process. Speaker identification experiments performed on 138 Gaussian mixture speaker models in the YOHO database show better performance for the temporal HMM than for the standard HMM.
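
    For reference, the sketch below scores an observation sequence under a standard discrete HMM with the forward algorithm. It shows where the independence assumption enters: the emission term `B[:, o]` depends only on the current hidden state and never on the previous observation. The temporal HMM itself is not implemented here, and the names `pi`, `A` and `B` are conventional notation assumed for this example rather than taken from the paper.

```python
import numpy as np


def forward_likelihood(pi, A, B, obs):
    """P(obs | model) for a standard discrete HMM via the forward algorithm.

    pi  : (S,)   initial state probabilities
    A   : (S, S) state transition matrix, A[i, j] = P(next = j | current = i)
    B   : (S, K) emission matrix, B[i, k] = P(symbol k | state i)
    obs : sequence of symbol indices
    """
    pi, A, B = (np.asarray(m, dtype=float) for m in (pi, A, B))
    # Initialisation: joint probability of the first symbol and each state.
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        # Propagate through the hidden chain, then weight by the emission.
        # The emission uses only the current state (the HMM assumption).
        alpha = (alpha @ A) * B[:, o]
    return float(alpha.sum())
```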

  • New approach to multiple regions of interest image coding using IWT and bitplane-classification shift

    Page(s): 390 - 393

    Region-of-interest (ROI) image coding gives priority to the regions of an image that are of higher importance than the background (BG). In this paper, a new approach for multiple ROI coding, called bitplane classification shift (BC-shift), is presented. In this method, all the bitplanes of the image coefficients are divided into three parts: most significant ROI bitplanes, general significant ROI and BG bitplanes, and least significant BG bitplanes. For a single ROI, the new method can encode the ROI at any quality by up-shifting the most significant ROI bitplanes and down-shifting the least significant BG bitplanes. When multiple regions of interest are encoded, it can encode and transmit a certain number of ROI bitplanes according to the different degrees of interest. The experiments show that the new method, in addition to having low complexity and bitrate overhead, can flexibly handle multiple ROIs of arbitrary shapes without coding shape information.

  • A knowledge-based mediator for dynamic integration of heterogeneous multimedia information sources

    Page(s): 467 - 470

    To facilitate the efficient and accurate access and operation of a huge amount of multimedia information from various data sources, we propose a mediator to integrate heterogeneous multimedia information sources and their corresponding data operation API dynamically. To achieve this purpose, a reflection technique is adopted to extract heterogeneous API features at run time; a classifier component is designed to obtain global API features; and an ontology mechanism is used for hiding information heterogeneity. Ultimately, by using the proposed mediator, a global knowledge-based multimedia information API is produced, which makes multimedia applications and Web applications "understand" and operate heterogeneous information automatically and transparently. The proposed mediator provides multimedia applications with a unified interface for information operation and facilitates communication and interoperation between disparate multimedia applications.

  • Texture segmentation using local morphological multifractal exponents

    Page(s): 438 - 441

    This paper deals with the problem of segmenting various textures. For this purpose, we have applied mathematical morphology to the multifractal analysis of images. The digital gray level image is treated as a 3D surface whose multifractal measures are calculated by performing dilations on this surface. By plotting the acquired measures against the size of the structuring element, the local morphological multifractal exponents can be estimated, and the unsupervised fuzzy C-means clustering method then uses them to segment a texture image into the desired number of classes. With 12 natural textures chosen at random from the Brodatz album, 66 mosaics of 2 textures and 495 mosaics of 4 textures are used to test the new segmentation approach and two other techniques in which the multifractal features are extracted by box-counting based methods. The comparison results demonstrate that the proposed approach can differentiate texture images more effectively and provide more robust segmentation results.

  • Automatic e-movie creation of 3D animation and video retrieval

    Page(s): 129 - 132

    This paper describes a software system called EMM (electronic moviemaker) designed to visualize the user's input screenplay words as a sound motion picture with the effects of real images, 3D animation, or their composition. A virtual director achieves the user's intentions by a knowledge-based (KB) approach through setting a scene, determining the corresponding shot types and shot sequence, and planning virtual camerawork dependent on the cinematic expertise stored in a KB domain, where real images are extracted from digital video by applying advanced content-based retrieval techniques. Animation generation is automated by interpreting the textual screenplay into the TVML language, and the resultant movie is shown on a TVML player.

  • An optimal key frame representation for video shot retrieval

    Page(s): 270 - 273

    In this paper, we propose an optimal representation scheme to construct an optimal key frame based on global statistics for video retrieval. Each pixel in this optimal key frame is constructed by considering the probability of occurrence of those pixels at the corresponding pixel position among the frames in a video shot. Therefore, this constructed key frame is called the "temporally maximum occurrence frame" (TMOF), which is an optimal representation of all the frames in a video shot. The retrieval performance of this representation scheme is further improved by considering the k pixel values with the largest probabilities of occurrence and the highest peaks at each pixel position for a video shot. The corresponding schemes are called k-TMOF and k-pTMOF, respectively. These key frame representation schemes are compared to other histogram-based techniques for video shot representation and retrieval. Experimental results show that our proposed representations outperform the alpha-trimmed average histogram for video retrieval.
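
    One plausible reading of the TMOF construction is sketched below: for every pixel position, count how often each value occurs across the frames of the shot and keep the most frequent one. The abstract does not say whether raw intensities or quantized values are counted, how ties are broken, or how k-TMOF and k-pTMOF combine several peaks, so the function name `tmof`, the `levels` parameter and the tie rule (smallest value wins) are assumptions made for this example.

```python
import numpy as np


def tmof(frames, levels=256):
    """Per-pixel most frequent value over a stack of single-channel frames.

    frames : integer array of shape (T, H, W) with values in [0, levels)
    Returns an (H, W) array holding, at each position, the value that
    occurs most often over time (ties go to the smallest value).
    """
    frames = np.asarray(frames)
    _, h, w = frames.shape
    # counts[v, i, j] = number of frames in which pixel (i, j) has value v.
    counts = np.zeros((levels, h, w), dtype=np.int64)
    rows, cols = np.indices((h, w))
    for frame in frames:
        np.add.at(counts, (frame, rows, cols), 1)
    return counts.argmax(axis=0).astype(frames.dtype)
```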

  • A new still image watermark embedding algorithm based on fractal transformation

    Page(s): 627 - 630

    Digital watermarking technology is an effective solution for protecting the copyright of multimedia data such as still images, video and audio. In this paper, a new still image watermark embedding algorithm based on fractal transformation is proposed. The watermark information is embedded into the host image by directly modifying the contrast factor, which is one of the fractal transformation parameters. The experimental results show that the algorithm is effective and that the watermark is robust to image processing such as JPEG compression, low-pass filtering, Gaussian noise, salt and pepper noise, cropping, etc.

  • A computerized plant species recognition system

    Page(s): 723 - 726

    In this paper, a computerized plant species recognition system (CPSRS) is presented. CPSRS is a Web-based application, which provides a familiar and efficient way to search and identify plant species in the field. It is built on the Java Web infrastructure to support platform-independent application. Java applets and servlets are adopted to balance the computing burden between client and server. The architecture of CPSRS is introduced to show how it is designed and how it works. Two types of plant species retrieval methods, text-based information retrieval and content-based leaf retrieval, are discussed. With the text-based information retrieval method, the exact information of plant species is retrieved from the database according to the input search criteria. For the content-based leaf retrieval, experimental results show that a recall rate of about 71.4% can be achieved when the top five returned images are considered.

  • Vulnerability of speaker verification to voice mimicking

    Page(s): 145 - 148

    We consider mimicry, a technically simple form of attack requiring a low level of expertise, to investigate whether a speaker recognition system is vulnerable to an impostor who mimics a voice without the assistance of any other technologies. Experiments on 138 speakers in the YOHO database and two people who played the role of imitators have shown that an impostor can attack the system if that impostor knows a registered speaker in the database whose voice is very similar to the impostor's own.

  • Region-based fractal coding of stereo video sequences with quadtree-based disparity compensation

    Page(s): 410 - 413

    In order to realize the efficient and economical transmission/storage of stereo video sequences and also the region-based functionality of MPEG-4, we propose a new stereo video sequence compression scheme which uses fractal coding and encodes each region independently by a prior image segmentation map (alpha plane) which is exactly the same as in MPEG-4. We encode the first n frames of the right video sequence as a "set" using the circular prediction mapping (CPM) method and encode the remaining frames using the non-contractive interframe mapping (NCIM) method. The CPM and NCIM methods accomplish the motion estimation/compensation, which can exploit the high temporal correlations between adjacent frames of the right sequence; at the same time, we exploit the spatial correlations between the left and right frames by searching for similar blocks using quadtree-based disparity estimation/compensation. The experimental results indicate that we get a high compression ratio and also good PSNR.

  • Speech enhancement for adaptive microphone arrays via a blocking matrix using leaky adaptive filters

    Page(s): 9 - 12

    This paper presents a new broadband adaptive beamformer structure specifically applicable to microphone arrays. The proposed beamformer structure contains a blocking matrix (BM) that uses leaky adaptive filters (LAF) to limit the attenuation of the noise signal at the output of the BM while maintaining high target signal cancellation in the look direction. Simulation results show that the interference suppression of this proposed beamformer can be as much as 10 dB better than the beamformer in Goh et al. (2003), and it is still superior by 5 dB when the signal-to-noise ratio (SNR) is as low as -5 dB.
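
    The abstract does not give the exact form of its leaky adaptive filters. As a minimal sketch of the general idea, the code below implements the textbook leaky LMS update, in which the coefficients are shrunk slightly at every step before the usual gradient correction. The function name and the parameters `mu` (step size) and `leak` (leakage factor) are assumptions made for this example, and no beamformer or blocking matrix is modeled.

```python
import numpy as np


def leaky_lms(x, d, num_taps, mu, leak):
    """Textbook leaky LMS: w <- (1 - mu * leak) * w + mu * e * u.

    x    : input signal
    d    : desired signal, same length as x
    leak : leakage factor; leak = 0 gives the ordinary LMS algorithm
    Returns the final coefficient vector and the error signal.
    """
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    w = np.zeros(num_taps)
    e = np.zeros(len(x))
    for n in range(num_taps - 1, len(x)):
        # Tap-delay-line contents, most recent sample first.
        u = x[n - num_taps + 1:n + 1][::-1]
        e[n] = d[n] - w @ u
        # The leakage term pulls the coefficients toward zero on every step.
        w = (1.0 - mu * leak) * w + mu * e[n] * u
    return w, e
```

    With `leak` set to zero the filter converges to the unconstrained solution; a positive `leak` biases the coefficients toward zero, which limits how strongly the filter can cancel its input.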

  • Voice conversion algorithm using phoneme Gaussian mixture model

    Page(s): 5 - 8

    This paper presents a new voice conversion algorithm which modifies the utterance of a source speaker to sound like speech from a target speaker. Our method uses speech models based on phoneme units of speech, which finds accurate alignments between source and target speaker utterances. Using the alignments, vocal tract and glottal excitation characteristics are mapped across speakers. Objective and subjective tests suggest that convincing voice conversion is achieved while maintaining high speech quality, which is comparable to other frame-based approaches.

  • ORAN: a basis for an Arabic OCR system

    Page(s): 703 - 706

    We present a system called ORAN (offline recognition of Arabic characters and numerals). This system is based on a method called modified MCR (minimum covering run) expression for document images. Using the correspondence between binary images and bipartite graphs, the MCR expression can be found by constructing a minimum covering or maximum matching in the corresponding graph. We use the structural information obtained from this expression to describe the character strokes according to some extracted features. These are obtained after a zoning scheme, where the baseline is detected and the line of text divided into four zones. Reference prototypes for the system are built according to a structural description of characters in some model documents. By this method, we overcome the problem of segmentation that is inherent to Arabic characters, even when they are machine printed or typed. Simple matching of the candidate characters to reference prototypes is performed. A recognition rate of more than 97% is achieved.

  • Fast motion estimation for wavelet-based video coding

    Page(s): 398 - 401

    Multi-resolution motion estimation (MRME) has to be done in the wavelet domain for wavelet-based video coding. MRME tries to use the motion vectors obtained in the highest resolution level as an initial estimate and performs refinement in the remaining subbands in the lower levels. Based on a simple analysis, we have found that pixel matching errors with similar magnitudes tend to appear in clusters for natural video sequences in the spatial domain and wavelet domain. In this paper, we report that the clustering property also exists in the wavelet domain, where it can significantly improve the efficiency of the partial distortion search (PDS) algorithm. This clustering characteristic appears in the hierarchical structure of the wavelet pyramid. A new wavelet-based motion estimation technique is proposed, which exploits the hierarchical structure of the wavelet pyramid in a new clustered pixel matching error partial distortion search (CPME-PDS). The computational time can be reduced substantially. The proposed algorithm is suitable for multi-resolution applications including DTV, HDTV and mobile video standards.
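
    For context, the basic partial distortion search that the abstract builds on can be sketched as follows: the matching error of a candidate block is accumulated row by row, and the candidate is abandoned as soon as the partial sum reaches the best error found so far. This is a plain spatial-domain version using the sum of absolute differences; the clustered ordering of CPME-PDS and the wavelet pyramid are not modeled, and all function and parameter names are hypothetical.

```python
import numpy as np


def sad_with_early_exit(block, candidate, best_so_far):
    """Row-wise sum of absolute differences with early termination.

    Returns the full SAD, or None once the partial sum reaches best_so_far.
    """
    partial = 0
    for row_b, row_c in zip(block, candidate):
        diff = row_b.astype(np.int64) - row_c.astype(np.int64)
        partial += int(np.abs(diff).sum())
        if partial >= best_so_far:
            return None  # cannot beat the current best, stop early
    return partial


def pds_full_search(block, ref, top, left, radius):
    """Search a (2 * radius + 1)^2 window in ref around (top, left)."""
    h, w = block.shape
    best_cost, best_mv = float("inf"), (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + h > ref.shape[0] or x + w > ref.shape[1]:
                continue
            cost = sad_with_early_exit(block, ref[y:y + h, x:x + w], best_cost)
            if cost is not None:
                best_cost, best_mv = cost, (dy, dx)
    return best_mv, best_cost
```

    The early exit never changes the minimum that is found; it only skips arithmetic for candidates that are already known to lose.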

  • Wavelet domain image restoration using adaptively regularized constrained total least squares

    Page(s): 567 - 570

    It is known that the adaptively regularized constrained total least squares (ARCTLS) method outperforms the constrained total least squares (CTLS) and the regularized CTLS (RCTLS) methods in image restoration. In this paper, a wavelet domain image restoration approach using ARCTLS is presented. The degraded image is first decomposed into subband images by using a wavelet transform. The ARCTLS algorithm is then applied to each of the subband images. The proposed restoration method is simulated and compared with the conventional RCTLS and ARCTLS techniques. It is shown that the proposed method yields a better improved signal-to-noise ratio (SNR).

  • A chaotic neural network for the attributed relational graph matching problem in pattern recognition

    Page(s): 695 - 698

    We propose a new algorithm based on a chaotic neural network to solve the attributed relational graph matching problem, which is an NP-hard problem of prominent importance in pattern recognition research. From some detailed analyses, we reach the conclusion that, unlike the conventional Hopfield neural networks for the attributed relational graph matching problem, the chaotic neural network can avoid getting stuck in local minima and thus yield excellent solutions. Experimental results also verify that this algorithm provides a more effective approach than many other heuristic algorithms for the attributed relational graph matching problem and thus has a profound application potential in pattern recognition.

  • Image analysis using anisotropic local contrast

    Page(s): 506 - 509

    This article describes a contrast measure inspired by findings from human vision. It takes into account the frequency and orientation selectivity of cortical cells in the mammalian visual system. The input image is decomposed into different spatial frequency bands and orientations using Gabor filters. Then, a local directional contrast measure is computed for each spatial frequency band by introducing a competition mechanism among the different orientations. The visual information conveyed by this new contrast measure, called local directional bandlimited contrast (LDBC), has been analyzed on real and synthetic images. Results based on a limited set of images are presented which confirm the consistency of this new measure with the subjective perception of contrasts in complex natural images.

  • Audio watermarking for copyright protection based on psychoacoustic model and adaptive wavelet packet decomposition

    Page(s): 282 - 285

    A novel audio watermarking scheme for copyright protection is proposed. By modifying the relationship between adjacent coefficients, the watermark message is embedded in the wavelet packet domain. The embedding procedure is controlled by a psychoacoustic model to guarantee inaudibility. The new method combines the psychoacoustic model with wavelet packets to choose the best basis of wavelet packet decomposition such that the decomposed subbands are closer to critical bands. Therefore, the masking threshold in the new approach is larger than those proposed previously. In this way, the proposed scheme enlarges the range for the watermark to resist attacks and it becomes more robust. Experimental results show that the proposed scheme can effectively resist a variety of signal manipulations and attacks, including MPEG compression, low-pass filtering, resampling, adding white or random noise, etc.

  • Multichannel lattice structure for adaptive noise cancellation [speech processing applications]

    Page(s): 326 - 329

    In this paper, a multichannel adaptive lattice structure (MCLS) for canceling the noise over a nonlinear transmission channel is presented. The presented structure applies to situations in which the reference signal and noisy primary signal are collected simultaneously. The coefficients of a multichannel multiple regression transversal filter are modified adaptively according to the backward prediction error vectors generated from the multichannel adaptive lattice predictors. This multichannel adaptive noise cancellation procedure involves the NLMS adaptive algorithm. The performance of MCLS for different types of transmission channels, different types of reference inputs, and different types of noise-free primary inputs is examined analytically. The new approach is experimentally shown to have better noise cancellation performance than the existing single-channel adaptive noise cancellation algorithm over a nonlinear transmission channel, especially in a low input SNR situation.
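
    For readers unfamiliar with it, the standard single-channel form of the NLMS update mentioned above is given below. The notation is an assumption made for this note: w(n) is the coefficient vector, u(n) the filter input vector (in the structure described by the abstract this role would be played by the backward prediction error vector), d(n) the primary signal, mu the step size and epsilon a small positive constant that avoids division by zero. The multichannel lattice itself is not shown.

```latex
\begin{aligned}
e(n) &= d(n) - \mathbf{w}^{T}(n)\,\mathbf{u}(n), \\
\mathbf{w}(n+1) &= \mathbf{w}(n) + \frac{\mu}{\varepsilon + \lVert \mathbf{u}(n) \rVert^{2}}\, e(n)\,\mathbf{u}(n).
\end{aligned}
```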

  • Smooth ergodic hidden Markov model and its applications in text to speech systems

    Page(s): 234 - 237

    In text-to-speech systems, the accuracy of information extraction from text is crucial in producing high quality synthesized speech. In this paper, a new scheme for converting text into its equivalent phonetic spelling is proposed and developed. This method has many advantages over its predecessors and it can complement many other text to speech converting systems in order to get improved performance.

  • Wavelet based speech enhancement using a new thresholding algorithm

    Page(s): 238 - 241

    In this paper, wavelet-based speech enhancement is considered. In this method, the noisy signal is transformed into the wavelet domain. A thresholding algorithm is then applied to improve the wavelet coefficients. Finally, the enhanced signal is obtained using the inverse wavelet transform. We present a new method for the thresholding algorithm and compare it with some of the most popular methods in several experimental tests. Both subjective and objective tests show better performance.
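
    The three-step pipeline described above (transform, threshold, inverse transform) can be sketched with a one-level Haar transform and ordinary soft thresholding. The paper's new thresholding rule is not described in the abstract, so soft thresholding stands in for it here; the Haar wavelet, the single decomposition level and all function names are likewise assumptions made for this example.

```python
import numpy as np


def haar_forward(x):
    """One-level orthonormal Haar transform of an even-length signal."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail


def haar_inverse(approx, detail):
    """Inverse of haar_forward."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x


def soft_threshold(coeffs, t):
    """Shrink coefficients toward zero by t; those below t in magnitude vanish."""
    coeffs = np.asarray(coeffs, dtype=float)
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - t, 0.0)


def denoise(x, threshold):
    """Transform, threshold the detail coefficients, and transform back."""
    approx, detail = haar_forward(x)
    return haar_inverse(approx, soft_threshold(detail, threshold))
```

    A threshold of zero returns the input unchanged, while a very large threshold removes all detail and leaves each pair of samples replaced by its mean.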
