Accurate and Resource-Efficient Lipreading with EfficientNetV2 and Transformers | IEEE Conference Publication | IEEE Xplore


Abstract:

We present a novel resource-efficient end-to-end architecture for lipreading that achieves state-of-the-art results on a popular and challenging benchmark. In particular, we make the following contributions: First, inspired by the recent success of the EfficientNet architecture in image classification and by our earlier work on resource-efficient lipreading models (MobiLipNet), we introduce EfficientNets to the lipreading task. Second, we show that the 3D front-end currently most popular in the literature contains a max-pool layer that prevents networks from reaching superior performance, and we propose its removal. Finally, we improve the robustness of our system's back-end by including a Transformer encoder. We evaluate the proposed system on the “Lipreading In-The-Wild” (LRW) corpus, a database of short video segments from BBC TV broadcasts. The proposed network (T-variant) attains 88.53% word accuracy, a 0.17% absolute improvement over the current state-of-the-art, while being five times less computationally intensive. Further, an up-scaled version of our model (L-variant) achieves 89.52%, a new state-of-the-art result on the LRW corpus.
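
To illustrate what removing the max-pool layer from the 3D front-end changes, the following minimal sketch computes the feature-map size that reaches the 2D backbone with and without that layer. The abstract gives no layer hyperparameters, so every number here is an assumption: the kernel, stride, and padding values follow a configuration commonly used for LRW front-ends, the input is taken to be 29 frames of 88x88 mouth crops, and the helper names `conv_out` and `front_end_shape` are hypothetical.

```python
def conv_out(size, kernel, stride, padding):
    """Output length along one axis of a convolution or pooling layer
    (floor mode, dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def front_end_shape(frames, height, width, use_max_pool):
    """Return (frames, height, width) after the assumed 3D front-end."""
    # Assumed 3D convolution: kernel (5, 7, 7), stride (1, 2, 2), padding (2, 3, 3).
    t = conv_out(frames, 5, 1, 2)
    h = conv_out(height, 7, 2, 3)
    w = conv_out(width, 7, 2, 3)
    if use_max_pool:
        # Assumed max-pool: kernel (1, 3, 3), stride (1, 2, 2), padding (0, 1, 1).
        h = conv_out(h, 3, 2, 1)
        w = conv_out(w, 3, 2, 1)
    return t, h, w


# Assumed input: 29 frames of 88x88 grayscale mouth crops.
with_pool = front_end_shape(29, 88, 88, use_max_pool=True)
without_pool = front_end_shape(29, 88, 88, use_max_pool=False)

print("with max-pool:   ", with_pool)
print("without max-pool:", without_pool)
```

Under these assumptions, dropping the max-pool doubles the spatial resolution along each axis, so the backbone receives four times as many spatial positions per frame. This only shows the shape arithmetic; it does not reproduce the accuracy or compute figures reported in the abstract.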
Date of Conference: 23-27 May 2022
Date Added to IEEE Xplore: 27 April 2022
Conference Location: Singapore, Singapore
