Abstract:
A parallel corpus is one of the main resources for training and evaluating machine translation systems. Parallel texts can be used to improve the translation quality of machine translation systems, which allow people to use different languages freely. Parallel corpora also play an important role in the efficiency of natural language processing applications such as search engines, sentiment analysis, and object recognition. Building such corpora involves several stages, one of which is alignment. Once the parallel texts are collected, they must be aligned at the paragraph, sentence, word, or phrase level to determine which segments in one language correspond to which segments in the other. Several aligner tools are currently available for this task. They automate the process by aligning segments and identifying translation equivalents with neural or statistical models. However, these tools are not equally effective across languages. This article describes the linguistic and software support of the Uzbek-English “Aligner” system, which aligns parallel texts in Uzbek and English, and outlines the stages of its creation.
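Sentence-level alignment is often illustrated with the classic length-based statistical approach of Gale and Church (1993). That approach uses dynamic programming over "beads" (1-1, 1-0, 0-1, 2-1, 1-2 groupings of sentences). Each bead is scored by how well the character lengths of the grouped sentences match.

The sketch below is a minimal illustration of that general technique only. It is not the method of the Uzbek-English “Aligner” system, whose algorithm the abstract does not specify. Several details are assumptions:

* The bead priors and the length parameters `C` (target characters per source character) and `S2` (variance) are the values Gale and Church reported for European language pairs. They are not tuned for Uzbek-English.
* The function names `align` and `match_cost` are hypothetical.

```python
import math
from typing import List, Optional, Tuple

# Assumed parameters (Gale and Church, 1993; not tuned for Uzbek-English).
C = 1.0   # expected target characters per source character
S2 = 6.8  # variance of that ratio per character

# Prior probability of each bead type: (source sentences, target sentences).
# Gale and Church give combined priors for 1-0/0-1 and 2-1/1-2; split evenly here.
BEAD_PRIORS = {
    (1, 1): 0.89,
    (1, 0): 0.0099 / 2,
    (0, 1): 0.0099 / 2,
    (2, 1): 0.089 / 2,
    (1, 2): 0.089 / 2,
}


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def match_cost(len_src: int, len_tgt: int, prior: float) -> float:
    """Negative log-probability that segments of these lengths are translations."""
    mean = (len_src + len_tgt / C) / 2.0
    delta = (len_tgt - len_src * C) / math.sqrt(max(mean, 1e-9) * S2)
    p = 2.0 * (1.0 - norm_cdf(abs(delta)))
    return -math.log(max(p, 1e-300)) - math.log(prior)


def align(src: List[str], tgt: List[str]) -> List[Tuple[List[int], List[int]]]:
    """Return beads as (source sentence indices, target sentence indices)."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order guarantees every predecessor cell is final before use.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), prior in BEAD_PRIORS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                ls = sum(len(s) for s in src[i:ni])
                lt = sum(len(t) for t in tgt[j:nj])
                c = cost[i][j] + match_cost(ls, lt, prior)
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Trace the cheapest path back from the end to recover the beads.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads
```

For example, `align(["Salom.", "Bugun havo juda yaxshi va quyoshli."], ["Hello.", "The weather is very nice and sunny today."])` pairs the sentences one-to-one. Length-based scoring ignores lexical content. For that reason, practical aligners often combine it with bilingual dictionaries or neural sentence embeddings, which matters for morphologically rich languages such as Uzbek.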
Date of Conference: 26-28 October 2024
Date Added to IEEE Xplore: 11 December 2024