Structural overview of the Extendable Neural Contextual Corrector (XNCC). The figure shows the model operating on an example input "I sea u #U". The English example is ch...
Abstract:
Neural-based sequence-to-sequence (Seq2Seq) methods have proven highly effective for context-sensitive Thai spelling correction. However, they also inherit the drawbacks of Seq2Seq models, such as a fixed vocabulary and large training-data requirements. Dictionary-based methods, meanwhile, are insufficiently robust to produce corrections with reduced error rates. These drawbacks inhibit the application of both families of methods in a broader range of use cases. In this paper, we provide a practical guide to building correction systems progressively and efficiently, with three main contributions. First, we present a process for efficiently and progressively producing training data for both neural-based and dictionary-based methods; our annotation process enables existing methods to be trained with only two percent of the data hand-annotated. Second, we propose the Extendable Neural Contextual Corrector (XNCC), a novel text correction approach that decouples the dictionary from the neural model, enabling the dictionary to be extended post-training. Finally, we compare text correction systems under various configurations to demonstrate how these systems can be used effectively to produce corrections. Our experiments show that 1) minor changes to dictionary-based methods can significantly improve correction performance, 2) neural-based correction systems can be trained using a fraction of the data, and 3) XNCC's dictionary can be extended to generalize to new data without retraining. Lastly, we provide recommendations for progressively building text correction systems at multiple levels of implementation effort based on our findings.
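To illustrate the core idea behind XNCC's second contribution, the following is a minimal, hypothetical sketch (not the paper's implementation): a corrector whose candidate vocabulary lives in an external, pluggable dictionary rather than being baked into the model, so new words can be added after training without retraining. The class name, methods, and the edit-distance stand-in scorer are all assumptions for illustration; the actual XNCC ranks candidates with a neural contextual model.

```python
import difflib


class DecoupledCorrector:
    """Toy corrector with a dictionary decoupled from the scoring component.

    Hypothetical sketch: in XNCC the scorer would be a trained neural model
    ranking dictionary candidates in context, not a string-similarity measure.
    """

    def __init__(self, dictionary):
        # The dictionary is external state, not part of any trained weights.
        self.dictionary = set(dictionary)

    def extend(self, words):
        # Post-training extension: just grow the lookup set, no retraining.
        self.dictionary.update(words)

    def correct(self, token):
        if token in self.dictionary:
            return token
        # Stand-in scorer: closest dictionary entry by edit similarity.
        matches = difflib.get_close_matches(
            token, self.dictionary, n=1, cutoff=0.0
        )
        return matches[0] if matches else token


# Usage: misspellings are mapped onto dictionary entries; extending the
# dictionary immediately changes what the corrector can output.
corrector = DecoupledCorrector(["see", "you", "sea"])
fixed = corrector.correct("yuo")
corrector.extend(["saw"])
```

Because the scorer only consumes the dictionary at inference time, extending coverage is a data operation rather than a training operation, which is the property the abstract highlights.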
Published in: IEEE Access (Volume 11)