Abstract:
This paper presents an empirical study of two machine translation-based approaches to the Vietnamese diacritic restoration problem: phrase-based and neural-based machine translation models. It is the first work to apply a neural-based machine translation method to this problem, and it gives a thorough comparison with the phrase-based machine translation method, which is the current state of the art for this problem. On a large dataset, the phrase-based approach achieves an accuracy of 97.32%, while the neural-based approach achieves 96.15%. Although the neural-based method is slightly less accurate, it is about twice as fast as the phrase-based method at inference. Moreover, the neural-based method has much room for future improvement, such as incorporating pre-trained word embeddings and collecting more training data.
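To illustrate how diacritic restoration can be framed as translation, the sketch below builds a (source, target) sentence pair by stripping diacritics from a Vietnamese sentence. The abstract does not describe how the training data was prepared, so this is only an assumption about one plausible setup; the function names `strip_diacritics` and `make_pair` are hypothetical.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove Vietnamese diacritics, e.g. 'Tiếng Việt' becomes 'Tieng Viet'."""
    # 'đ' and 'Đ' have no canonical Unicode decomposition, so map them by hand.
    text = text.replace("đ", "d").replace("Đ", "D")
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (Unicode category Mn) and keep the base letters.
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", stripped)


def make_pair(sentence: str) -> tuple[str, str]:
    """Return (source, target): the text without diacritics and the original."""
    return strip_diacritics(sentence), sentence


if __name__ == "__main__":
    source, target = make_pair("Tiếng Việt")
    print(source, "->", target)
```

Under this framing, a translation model would be trained to map the stripped source side back to the fully accented target side.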
Date of Conference: 05-07 December 2017
Date Added to IEEE Xplore: 22 February 2018
- IEEE Keywords
- Index Terms
- Word Embedding
- Machine Translation
- Pre-trained Word Embeddings
- Training Set
- Support Vector Machine
- Training Time
- Conditional Probability
- Recurrent Neural Network
- Syllable
- Accuracy Scores
- Language Model
- Hidden State
- Target Language
- Words In Language
- Approach For Problem
- Sequence Of Words
- Conditional Random Field
- Source Language
- Words In The Lexicon
- Target Sentence
- Encoder Module
- Sentence Pairs
- Input Sentence
- Decoding