I. Introduction
Through decades of research and development, the accuracy of speech recognition technology has steadily improved. With the development of mobile computing architectures and the popularity of smartphones, people can enjoy the convenience of speech recognition applications such as Google Voice Search and Siri anywhere and anytime. Constrained by the computing capability and battery life of mobile phones, both applications adopt a distributed terminal-server solution: the user's terminal performs only some preprocessing, while most computational tasks, such as acoustic modeling, feature matching, and conversion to text, are performed on servers.