Microphone Array Signal Processing and Deep Learning for Speech Enhancement: Combining model-based and data-driven approaches to parameter estimation and filtering


Abstract:

Multichannel acoustic signal processing is a well-established and powerful tool to exploit the spatial diversity between a target signal and nontarget or noise sources for signal enhancement. However, the textbook solutions for optimal data-dependent spatial filtering rest on the knowledge of second-order statistical moments of the signals, which have traditionally been difficult to acquire. In this contribution, we compare model-based, purely data-driven, and hybrid approaches to parameter estimation and filtering, where the latter try to combine the benefits of model-based signal processing and data-driven deep learning to overcome their individual deficiencies. We illustrate the underlying design principles with examples from noise reduction, source separation, and dereverberation.
Published in: IEEE Signal Processing Magazine (Volume: 41, Issue: 6, November 2024)
Page(s): 12 - 23
Date of Publication: 01 January 2025

Paderborn University, Paderborn, Germany
Reinhold Haeb-Umbach (haeb@nt.uni-paderborn.de) received his Dr.-Ing. degree from RWTH Aachen University. He is a professor of communications engineering at Paderborn University, Paderborn 33098, Germany. He has more than 30 years of experience in speech research, which he acquired in both an industrial and an academic environment. He has coauthored more than 300 scientific publications, and his students have received several best student paper awards. His research interests include speech enhancement, acoustic beamforming, and source separation as well as automatic speech recognition and unsupervised learning from speech and audio. He is a Life Fellow of IEEE and a fellow of the International Speech Communication Association (ISCA).
NTT Communication Science Laboratories, NTT Corporation, Kyoto, Japan
Tomohiro Nakatani (tnak@ieee.org) received his Ph.D. degree from Kyoto University, Kyoto, Japan, in 2002. He is a senior distinguished researcher at NTT Communication Science Laboratories, NTT Corporation, Kyoto 619-0237, Japan. Since joining NTT in 1991, he has been investigating audio signal processing technologies for intelligent human-machine interfaces, including dereverberation, denoising, source separation, and robust automatic speech recognition. He was a member of the IEEE Signal Processing Society (SPS) Audio and Acoustics Technical Committee from 2009 to 2014 and a member of the SPS Speech and Language Processing Technical Committee from 2016 to 2021. He is a Fellow of IEEE.
NTT Communication Science Laboratories, Kyoto, Japan
Marc Delcroix (marc.delcroix@ieee.org) received his Ph.D. degree from Hokkaido University, Japan. He is a distinguished researcher at NTT Communication Science Laboratories, Kyoto 619-0237, Japan. He was a recipient of the 2006 Student Paper Award from the IEEE Kansai Section, the 2006 Sato Paper Award from the Acoustical Society of Japan, and the 2015 IEEE Automatic Speech Recognition and Understanding Workshop Best Paper Award honorable mention. His research interests include various aspects of speech and audio signal processing, including speech enhancement and robust speech recognition. He is a Senior Member of IEEE.
Paderborn University, Paderborn, Germany
Christoph Boeddeker (boeddeker@nt.upb.de) received his master’s degree in electrical engineering from Paderborn University. He is currently working toward his Ph.D. degree at Paderborn University, Paderborn 33098, Germany, under the supervision of Reinhold Haeb-Umbach. His research interests include multichannel speech separation, beamforming, and dereverberation as well as automatic speech recognition of meetings with a focus on combining statistical models and neural networks. In 2017 and 2022, he pursued research internships with Microsoft Research, Redmond, WA, USA, and MERL, Cambridge, MA, USA, respectively. He is a Graduate Student Member of IEEE.
NTT Communication Science Laboratories, Kyoto, Japan
Tsubasa Ochiai (tsubasa.ochiai.ah@hco.ntt.co.jp) received his Ph.D. degree from Doshisha University. He is a researcher at NTT Communication Science Laboratories, Kyoto 619-0237, Japan. He is a recipient of the 2014 Student Presentation Award from the Acoustical Society of Japan (ASJ), the 2015 Student Paper Award from the IEEE Kansai Section, the 2020 Awaya Prize Young Researcher Award from the ASJ, and the 2021 Itakura Prize Innovative Young Researcher Award from the ASJ. His research interests include speech enhancement, array signal processing, and robust automatic speech recognition. He is a Member of IEEE.
