
ZeUS: An Unified Training Framework for Constrained Neural Machine Translation

Abstract:

Unlike general translation, constrained translation necessitates the proper use of predefined restrictions, such as specific terminologies and entities, during the translation process. However, current neural machine translation (NMT) models exhibit proficient performance solely in the domains of general translation or constrained translation. In this work, the author introduces the zero-shot unified constrained translation training framework, which adopts a novel approach of transforming constraints into textual explanations, thereby harmonizing the tasks of constrained translation with general translation. Furthermore, the author discovers the pivotal role of constructing synthetic data for domain-specific constrained translation in enhancing the model’s performance on constrained translation tasks. To this end, the author utilizes large language models (LLMs) to generate domain-specific synthetic data for constrained translation. Experiments across four datasets and four translation directions, incorporating both general and constrained translations, demonstrate that models trained with the proposed framework and synthetic data achieve superior translation quality and constraint satisfaction rates, surpassing several baseline models in both general and constrained translation. Notably, ZeUS also exhibits significant advantages over multitask learning in constrained translation, with an average improvement of 7.25 percentage points in translation satisfaction rate (TSR) and 8.50 percentage points in translation completeness (TC).
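The core idea of "transforming constraints into textual explanations" can be illustrated with a minimal sketch. The function name and the exact hint wording below are assumptions for illustration only, not the paper's actual implementation: the point is that term constraints are serialized into plain text appended to the source sentence, so the same model that handles general translation can also consume constrained inputs.

```python
# Hypothetical sketch: render source-term -> target-term constraints
# as a plain-text hint appended to the source sentence, so a single
# NMT model can be trained on general and constrained inputs alike.
# The prompt format here is illustrative, not the one used in ZeUS.

def add_constraint_explanations(source: str, constraints: dict[str, str]) -> str:
    """Append each constraint as a textual explanation of how a
    source-side term must be rendered in the target language."""
    hints = "; ".join(
        f'translate "{src}" as "{tgt}"' for src, tgt in constraints.items()
    )
    return f"{source} [constraints: {hints}]"

# Example: enforcing a terminology constraint on one medical term.
augmented = add_constraint_explanations(
    "The patient received acetaminophen.",
    {"acetaminophen": "Paracetamol"},
)
print(augmented)
```

Unconstrained sentences simply pass through with an empty hint (or no suffix at all), which is what lets the framework unify the two tasks instead of training separate constrained and general models.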
The proposed method constructs in-domain constrained translation synthetic data from constrained words using LLMs. These synthetic examples are then used to enhance the p...
Published in: IEEE Access ( Volume: 12)
Page(s): 124695 - 124704
Date of Publication: 05 September 2024
Electronic ISSN: 2169-3536
