
M-MoE: Mixture of Mixture-of-Expert Model for CTC-based Streaming Multilingual ASR


Abstract:

The Mixture-of-Expert (MoE) structure has been used effectively in multilingual ASR tasks; however, the potential of external language information remains underutilized. In this paper, we introduce the Mixture of MoE (M-MoE) structure, which features multiple language-specific MoEs and a language-unknown MoE. The language-unknown MoE reuses the experts of the language-specific MoEs. Inputs with language IDs are directed to the corresponding language-specific MoE, while inputs without IDs go to the language-unknown MoE. We also propose a two-stage training method for the M-MoE-based model. The resulting unified model is suitable for streaming ASR in both language-known and language-unknown scenarios. Experiments on a three-language dataset show that, compared to the Conformer baseline, our model achieves average relative improvements of 12% and 9% in the language-known and language-unknown scenarios, respectively. Compared to a strong MoE baseline, it achieves an average 5% relative improvement in the language-known scenario.
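
The routing idea described in the abstract can be illustrated with a minimal, hypothetical PyTorch sketch: language-specific expert groups each have their own gate, and a language-unknown gate routes over the same pooled experts rather than introducing new ones. All names and sizes here (MMoELayer, Expert, EXPERTS_PER_LANG, D_MODEL) are assumptions for illustration, not the authors' implementation; the Conformer/CTC backbone, streaming constraints, and two-stage training are omitted.

```python
# Hypothetical sketch of M-MoE routing (not the authors' code).
from typing import Optional

import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_LANGS = 3          # the paper evaluates on a three-language dataset
EXPERTS_PER_LANG = 4   # assumed number of experts per language-specific MoE
D_MODEL = 256          # assumed model dimension


class Expert(nn.Module):
    """A simple feed-forward expert."""
    def __init__(self, d_model: int):
        super().__init__()
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.ReLU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.ff(x)


class MMoELayer(nn.Module):
    """Mixture of MoEs: per-language expert groups plus a language-unknown
    gate that reuses the same experts instead of adding new ones."""
    def __init__(self, d_model: int = D_MODEL):
        super().__init__()
        # Language-specific experts, grouped by language.
        self.experts = nn.ModuleList(
            [nn.ModuleList([Expert(d_model) for _ in range(EXPERTS_PER_LANG)])
             for _ in range(NUM_LANGS)]
        )
        # One gate per language-specific MoE.
        self.lang_gates = nn.ModuleList(
            [nn.Linear(d_model, EXPERTS_PER_LANG) for _ in range(NUM_LANGS)]
        )
        # Language-unknown gate routes over ALL pooled experts.
        self.unk_gate = nn.Linear(d_model, NUM_LANGS * EXPERTS_PER_LANG)

    def forward(self, x: torch.Tensor, lang_id: Optional[int]) -> torch.Tensor:
        # x: (batch, time, d_model)
        if lang_id is not None:
            # Language-known path: route within that language's expert group.
            weights = F.softmax(self.lang_gates[lang_id](x), dim=-1)
            outs = torch.stack([e(x) for e in self.experts[lang_id]], dim=-1)
        else:
            # Language-unknown path: route over the shared, pooled experts.
            weights = F.softmax(self.unk_gate(x), dim=-1)
            all_experts = [e for group in self.experts for e in group]
            outs = torch.stack([e(x) for e in all_experts], dim=-1)
        # Weighted sum of expert outputs: (batch, time, d_model).
        return (outs * weights.unsqueeze(-2)).sum(dim=-1)


if __name__ == "__main__":
    layer = MMoELayer()
    frames = torch.randn(2, 50, D_MODEL)
    y_known = layer(frames, lang_id=1)       # language ID available
    y_unknown = layer(frames, lang_id=None)  # language ID unavailable
    print(y_known.shape, y_unknown.shape)
```

In this reading, the same unified model serves both scenarios at inference time: the choice of gate, not the set of parameters, is what changes when the language ID is missing.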
Date of Conference: 06-11 April 2025
Date Added to IEEE Xplore: 07 March 2025
Conference Location: Hyderabad, India
