Abstract:
End-to-end spoken language understanding requires speech data annotated with semantic information, and such annotated data is often scarce. Recent advances leverage unlabelled speech data to pre-train a speech encoder. However, it remains a challenge for the pre-trained speech encoder to encode semantic information. Existing works explore transferring knowledge from a pre-trained text model using different alignment losses at a fixed granularity. In this paper, we address variable granularity in transferring knowledge from text to speech representations via APLY, an auxiliary pooling layer that fuses global information with adaptively encoded local context. We demonstrate the effectiveness of APLY on three spoken language understanding benchmarks.
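The idea of fusing a global summary with pooled local context can be illustrated with a minimal sketch. This is not APLY itself: the abstract does not specify the layer's design, so the function names (`mean_pool`, `fuse_global_local`), the fixed `window` size, the use of mean pooling, and fusion by concatenation are all assumptions made for illustration. In particular, the adaptive choice of local context described in the paper is not reproduced here.

```python
from typing import List

Vector = List[float]


def mean_pool(frames: List[Vector]) -> Vector:
    """Average a non-empty list of equal-length frame vectors."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def fuse_global_local(frames: List[Vector], window: int) -> List[Vector]:
    """Hypothetical fusion: concatenate each local window mean with the global mean.

    `window` is fixed here for simplicity (an assumption); an adaptive
    variant would choose the span of each local context from the data.
    """
    if not frames or window < 1:
        raise ValueError("need at least one frame and a positive window")
    global_vec = mean_pool(frames)  # one summary of the whole utterance
    fused = []
    for start in range(0, len(frames), window):
        local_vec = mean_pool(frames[start:start + window])  # local context
        fused.append(local_vec + global_vec)  # list concatenation, not addition
    return fused


# Four toy 2-dimensional "speech frames", pooled in windows of two.
frames = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
pooled = fuse_global_local(frames, window=2)
```

Each output vector carries its own local average followed by the utterance-level average, so a downstream alignment loss could in principle compare it against a text representation at a coarser granularity than individual frames.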
Published in: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 04-10 June 2023
Date Added to IEEE Xplore: 05 May 2023
- IEEE Keywords
- Index Terms
- Pooling Layer
- Spoken Language Understanding
- Auxiliary Layer
- Local Context
- Knowledge Transfer
- Semantic Information
- Global Information
- Speech Coding
- Speech Representations
- Alignment Loss
- Pre-trained Encoder
- Transformer
- Local Information
- Emotion Recognition
- Average Pooling
- Language Model
- Self-supervised Learning
- Information Fusion
- Global Representation
- Local Pool
- Pre-trained Language Models
- Phase Alignment
- Adaptive Window
- Pooling Function