End-to-End Multi-Modal Speech Recognition on an Air and Bone Conducted Speech Corpus | IEEE Journals & Magazine | IEEE Xplore

End-to-End Multi-Modal Speech Recognition on an Air and Bone Conducted Speech Corpus


Abstract:

Automatic speech recognition (ASR) has improved significantly in recent years. However, most robust ASR systems are based on air-conducted (AC) speech, and their performance in low signal-to-noise-ratio (SNR) conditions is not satisfactory. Bone-conducted (BC) speech is intrinsically insensitive to environmental noise and can therefore be used as an auxiliary source to improve ASR performance at low SNR. In this paper, we first develop a multi-modal Mandarin corpus that contains air- and bone-conducted synchronized speech (ABCS). The multi-modal speech is recorded with a headset equipped with both AC and BC microphones. To our knowledge, it is by far the largest corpus for bone-conduction ASR research. Then, we propose a multi-modal conformer ASR system based on a novel multi-modal transducer (MMT). The proposed system extracts semantic embeddings from the AC and BC speech signals with a conformer-based encoder and a transformer-based truncated decoder. The MMT module fuses the semantic embeddings of the two speech sources dynamically with adaptive weights. Experimental results demonstrate that the proposed multi-modal system outperforms single-modal systems with either the AC or the BC modality, as well as a multi-modal baseline system, by a large margin at various SNR levels. The results also show that the two modalities complement each other and that our method can effectively utilize the complementary information of the different sources.
Page(s): 513 - 524
Date of Publication: 23 November 2022
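
The abstract describes fusing the AC and BC semantic embeddings "dynamically with adaptive weights". The sketch below only illustrates that general idea and is not the paper's MMT module: it assumes each modality comes with a scalar reliability score (in a trained system such scores would presumably be predicted by the network), normalizes the two scores with a softmax, and takes a weighted sum of the embeddings. The names `softmax`, `fuse_embeddings`, `ac_score`, and `bc_score` are hypothetical and do not come from the source.

```python
import math


def softmax(scores):
    """Turn arbitrary real-valued scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_embeddings(ac_emb, bc_emb, ac_score, bc_score):
    """Weighted sum of two equal-length embeddings.

    ac_score and bc_score are hypothetical reliability scores; this
    function is an illustration, not the MMT described in the paper.
    """
    if len(ac_emb) != len(bc_emb):
        raise ValueError("embeddings must have the same dimension")
    w_ac, w_bc = softmax([ac_score, bc_score])
    fused = [w_ac * a + w_bc * b for a, b in zip(ac_emb, bc_emb)]
    return fused, (w_ac, w_bc)


# Equal scores give equal weights, so the result is the plain average.
fused, weights = fuse_embeddings([1.0, 0.0], [0.0, 1.0], ac_score=0.0, bc_score=0.0)

# A higher BC score (for example, when the AC channel is noisy) shifts
# the fused embedding toward the BC embedding.
noisy_fused, noisy_weights = fuse_embeddings([1.0, 0.0], [0.0, 1.0], ac_score=-2.0, bc_score=2.0)
```

Under this assumption, a lower score for the AC channel at low SNR would let the BC embedding dominate the fused representation, which is one plausible reading of how two complementary modalities could be combined.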

School of Marine Science and Technology, Northwestern Polytechnical University, Xi'an, China
Mou Wang (Graduate Student Member, IEEE) received the B.S. degree in electronics and information engineering in 2016 from Northwestern Polytechnical University, Xi'an, China, where he is currently working toward the Ph.D. degree in information and communication engineering. From 2021 to 2022, he was a Visiting Ph.D. Student with the National University of Singapore, Singapore. His research interests include machine learning and speech signal processing. He was the recipient of the Excellent Paper Award from the International Conference on Ubi-Media Computing and Workshops in 2019. He was named an Outstanding Reviewer of IEEE Transactions on Multimedia in 2022.
School of Marine Science and Technology, Northwestern Polytechnical University, Xi'an, China
Junqi Chen (Graduate Student Member, IEEE) received the B.S. degree in detection guidance and control technology in 2020 from Northwestern Polytechnical University, Xi'an, China, where he is currently working toward the M.S. degree in signal and information processing. His research interests include deep learning and speech recognition.
School of Marine Science and Technology, Northwestern Polytechnical University, Xi'an, China
Xiao-Lei Zhang (Senior Member, IEEE) received the Ph.D. degree in information and communication engineering from Tsinghua University, Beijing, China, in 2012. He is currently a Full Professor with the School of Marine Science and Technology, Northwestern Polytechnical University, Xi'an, China. He was a Postdoctoral Researcher with Perception and Neurodynamics Laboratory, The Ohio State University, Columbus, OH, USA. His research interests include speech processing, machine learning, statistical signal processing, and artificial intelligence. He is a Member of SPS and ISCA.
School of Marine Science and Technology, Northwestern Polytechnical University, Xi'an, China
Singapore Institute of Technology, Singapore
Susanto Rahardja (Fellow, IEEE) received the Ph.D. degree in electrical and electronic engineering from Nanyang Technological University, Singapore. He is currently a Professor with the Singapore Institute of Technology, Singapore, and a Ph.D. Advisor with Northwestern Polytechnical University, Xi'an, China. He has also held Visiting Professor appointments with several universities, including the University of Malaya, Kuala Lumpur, Malaysia; the University of Eastern Finland, Kuopio, Finland; and Zhejiang University. He has more than 350 papers and 70 patents worldwide, of which 15 are U.S. patents. His research interests include multimedia, signal processing, wireless communications, discrete transforms, machine learning, and signal processing algorithms and implementation. He contributed to the development of a series of audio compression technologies, such as Audio Video Standards AVS-L, AVS-2 and ISO/IEC 14496-3:2005/Amd.2:2006, ISO/IEC 14496-3:2005/Amd.3:2006, some of which have been licensed worldwide. He was formerly an Associate Editor for IEEE Transactions on Audio, Speech and Language Processing and IEEE Transactions on Multimedia and a Senior Editor of the IEEE Journal of Selected Topics in Signal Processing, and is currently an Associate Editor for the Elsevier Journal of Visual Communication and Image Representation, IEEE Transactions on Multimedia and IEEE Transactions on Industrial Electronics. He was the Conference Chair of the 5th ACM SIGGRAPH ASIA in 2012, the APSIPA 2nd Summit and Conference in 2010 and 2018, and other ACM, SPIE and IEEE conferences.
Dr. Rahardja was the recipient of several honors, including the IEE Hartree Premium Award, the Tan Kah Kee Young Inventors' Open Category Gold award, the Singapore National Technology Award, the A*STAR Most Inspiring Mentor Award, Finalist of the 2010 World Technology & Summit Award, the Nokia Foundation Visiting Professor Award, the ACM Recognition of Service Award and the Thousand Talent Plan of the People's Republic of China under the Foreign Expert category. Professor Rahardja is a Fellow of the Academy of Engineering, Singapore.
