This paper presents a structural statistical machine translation (SSMT) model for translating Chinese into Taiwanese Sign Language (TSL), designed to address the data sparseness that results from the necessarily small parallel corpus. A parallel bilingual corpus was developed, and linguistic information from the Sinica Treebank was adopted for Chinese sentence analysis. A synchronous context-free grammar (SCFG) was adopted to convert a Chinese structure into the corresponding TSL structure and to extract a translation memory comprising the thematic relations between the grammar rules of the two structures. In structural translation, a statistical MT (SMT) approach was used to align the thematic roles in the grammar rules, and the translation memory provided reference templates for TSL structure translation. Finally, agreement information for TSL verbs was labeled to enrich the expressiveness of the translated TSL sequence. Several experiments were conducted to evaluate translation performance and communication effectiveness for the deaf. The evaluation results demonstrate that the proposed approach outperforms a baseline statistical MT system using the same small corpus, especially for long sentences.
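To make the idea of structure conversion with a synchronous grammar concrete, the following is a minimal sketch, not the authors' implementation. It assumes a parsed source tree whose children are labeled with thematic roles, and a table of paired rules that maps a source-side role sequence to a target-side ordering. The rule table `RULES`, the gloss table `LEXICON`, the function `translate`, and the reordering shown are all hypothetical and chosen only for illustration; they make no claim about actual TSL word order.

```python
from typing import Dict, List, Tuple

# A node is (label, body): body is a word (str) for a leaf,
# or a list of child nodes for an internal node.
Node = Tuple[str, object]

# Hypothetical synchronous rules: (parent label, source role sequence)
# maps to the order in which the children are emitted on the target side.
RULES: Dict[Tuple[str, Tuple[str, ...]], Tuple[int, ...]] = {
    ("S", ("agent", "verb", "theme")): (0, 2, 1),
}

# Hypothetical word-to-gloss table (illustrative entries only).
LEXICON: Dict[str, str] = {"wo": "I", "xihuan": "LIKE", "shu": "BOOK"}


def translate(node: Node,
              rules: Dict[Tuple[str, Tuple[str, ...]], Tuple[int, ...]],
              lexicon: Dict[str, str]) -> List[str]:
    """Emit the target-side gloss sequence for a source tree."""
    label, body = node
    if isinstance(body, str):
        # Leaf: look up the gloss, falling back to the source word.
        return [lexicon.get(body, body)]
    roles = tuple(child[0] for child in body)
    # If no paired rule matches, keep the source order unchanged.
    order = rules.get((label, roles), tuple(range(len(body))))
    output: List[str] = []
    for index in order:
        output.extend(translate(body[index], rules, lexicon))
    return output


tree: Node = ("S", [("agent", "wo"), ("verb", "xihuan"), ("theme", "shu")])
print(translate(tree, RULES, LEXICON))
```

In this sketch the rule table plays the part of the reference templates: once the thematic roles of a source rule are identified, the matching entry determines the target-side structure, and unmatched structures fall back to the source order.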