An FPGA-Based Embedded Robust Speech Recognition System Designed by Combining Empirical Mode Decomposition and a Genetic Algorithm

Authors: Shing-Tai Pan (Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung, Taiwan); Xu-Yu Li

This paper focuses on a robust speech measurement and recognition system based on a field-programmable gate array (FPGA), with environmental noise as its main concern. To accelerate recognition on the FPGA-based system, a discrete hidden Markov model (HMM) is used to reduce the computational burden inherent in speech recognition. Empirical mode decomposition (EMD) is used to decompose the measured, noise-contaminated speech signal into several intrinsic mode functions (IMFs). The IMFs are then weighted and summed to reconstruct an estimate of the clean speech signal. Previous research selected IMFs by trial and error for specific applications. Here, in contrast, the weight of each IMF is designed by a genetic algorithm (GA) to obtain an optimal solution. The experimental results show that this method achieves a higher recognition rate for speech subjected to various environmental noises. The paper also explores the hardware realization of the designed speech measurement and recognition system on an FPGA-based embedded system with a System-on-a-Chip (SoC) architecture. The central processing unit (CPU) core adopted in the SoC has limited computational power. For this reason, the floating-point fast Fourier transform (FFT) is replaced with an integer FFT to speed up the extraction of mel-frequency cepstral coefficient (MFCC) speech features. This substantially reduces computation time without affecting the speech recognition rate. The experiments show that the performance of the implemented hardware is significantly better than that reported in existing research.
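The weighted IMF reconstruction with GA-designed weights can be illustrated with a minimal Python sketch. It makes several assumptions:

- The IMFs are already available. The toy components below are synthetic sinusoids, not the output of an EMD sifting process.
- A clean reference signal is known, so the output SNR can serve as the GA fitness.

The abstract does not state the paper's fitness function, chromosome encoding or GA operators. The helpers `reconstruct`, `snr_db` and `ga_optimize_weights`, along with all parameter values, are therefore hypothetical.

```python
import numpy as np


def reconstruct(imfs: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Weighted sum of IMFs: x_hat[n] = sum_k w_k * imf_k[n]."""
    return weights @ imfs


def snr_db(clean: np.ndarray, estimate: np.ndarray) -> float:
    """Output SNR against a known clean reference (an assumed fitness, not the paper's)."""
    residual = clean - estimate
    return float(10.0 * np.log10(np.sum(clean ** 2) / (np.sum(residual ** 2) + 1e-12)))


def ga_optimize_weights(imfs, clean, pop_size=40, generations=100,
                        mutation_std=0.1, seed=0):
    """Search IMF weights in [0, 1] with a simple real-coded GA (hypothetical operators).

    Selection keeps the better half of the population (elitism), crossover blends
    two elite parents gene by gene, and mutation adds Gaussian noise before clipping.
    """
    rng = np.random.default_rng(seed)
    k = imfs.shape[0]
    pop = rng.uniform(0.0, 1.0, size=(pop_size, k))

    def fitness(w):
        return snr_db(clean, reconstruct(imfs, w))

    for _ in range(generations):
        scores = np.array([fitness(w) for w in pop])
        elite = pop[np.argsort(scores)[::-1][: pop_size // 2]]
        children = []
        while len(children) < pop_size - len(elite):
            a, b = elite[rng.integers(len(elite), size=2)]   # pick two elite parents
            alpha = rng.uniform(size=k)                      # per-gene blend factor
            child = alpha * a + (1.0 - alpha) * b
            child += rng.normal(0.0, mutation_std, size=k)   # Gaussian mutation
            children.append(np.clip(child, 0.0, 1.0))        # keep weights in [0, 1]
        pop = np.vstack([elite, np.array(children)])

    scores = np.array([fitness(w) for w in pop])
    return pop[int(np.argmax(scores))]


# Toy stand-ins for IMFs (NOT produced by EMD): a high-frequency "noise" mode,
# a 440 Hz "speech" tone, and a slow drift, sampled at 8 kHz for 0.1 s.
fs, n = 8000, 800
t = np.arange(n) / fs
imfs = np.vstack([
    0.5 * np.sin(2 * np.pi * 2000 * t),   # noise-dominated mode
    np.sin(2 * np.pi * 440 * t),          # mode carrying the "clean" signal
    0.5 * np.sin(2 * np.pi * 10 * t),     # low-frequency drift
])
clean = imfs[1]

best = ga_optimize_weights(imfs, clean)
print("weights:", np.round(best, 3))
print("SNR, unweighted sum: %.1f dB" % snr_db(clean, imfs.sum(axis=0)))
print("SNR, GA weights:     %.1f dB" % snr_db(clean, reconstruct(imfs, best)))
```

Clipping after mutation keeps every weight in [0, 1]. A weight driven to exactly 0 discards that IMF, so the hard IMF selection of earlier trial-and-error approaches falls out as a special case of the weighted sum.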

Published in: IEEE Transactions on Instrumentation and Measurement, Volume 61, Issue 9