Abstract:
Sindhi word segmentation is a challenging task due to space omission and insertion issues. The Sindhi script itself adds to this complexity: it is cursive and consists of characters with inherent joining and non-joining properties, independent of word boundaries. Existing Sindhi word segmentation methods rely on designing and combining hand-crafted features. However, these methods have limitations, such as difficulty handling out-of-vocabulary words, limited robustness for other languages, and inefficiency with large amounts of noisy or raw text. Neural network-based models, in contrast, can automatically capture word boundary information without requiring prior knowledge. In this paper, we propose a Subword-Guided Neural Word Segmenter (SGNWS) that addresses word segmentation as a sequence labeling task. The SGNWS model incorporates subword representation learning through a bidirectional long short-term memory encoder, position-aware self-attention, and a conditional random field. Our empirical results demonstrate that the SGNWS model achieves state-of-the-art performance in Sindhi word segmentation on six datasets.
The proposed SGNWS model includes three core parts: the encoder, self-attention, and decoder. The training details of the proposed model are then presented.
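To make the encoder/self-attention/decoder pipeline concrete, the following is a minimal sketch of a BiLSTM encoder with self-attention and a CRF decoder for sequence labeling in PyTorch. It assumes the third-party pytorch-crf package; the layer sizes, learned positional embeddings (as a stand-in for position-aware self-attention), and the four-tag boundary scheme are illustrative assumptions, not the authors' implementation.

```python
# Minimal BiLSTM + self-attention + CRF sequence labeller (sketch, not the paper's code).
import torch
import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf


class SubwordSegmenter(nn.Module):
    def __init__(self, vocab_size, num_tags, emb_dim=128, hidden=256,
                 heads=4, max_len=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # Learned positional embeddings approximate "position-aware" self-attention.
        self.pos = nn.Embedding(max_len, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden // 2, batch_first=True,
                               bidirectional=True)
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.proj = nn.Linear(hidden, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def _emissions(self, subword_ids):
        positions = torch.arange(subword_ids.size(1), device=subword_ids.device)
        x = self.embed(subword_ids) + self.pos(positions)
        h, _ = self.encoder(x)      # BiLSTM context encoding
        a, _ = self.attn(h, h, h)   # self-attention over the whole sequence
        return self.proj(a)         # per-position tag scores

    def loss(self, subword_ids, tags, mask):
        # Negative CRF log-likelihood as the training loss.
        return -self.crf(self._emissions(subword_ids), tags, mask=mask)

    def decode(self, subword_ids, mask):
        # Viterbi decoding of the most likely boundary-tag sequence.
        return self.crf.decode(self._emissions(subword_ids), mask=mask)


# Toy usage: 2 sequences of 10 subword positions, 4 boundary tags (e.g. B/M/E/S).
model = SubwordSegmenter(vocab_size=1000, num_tags=4)
ids = torch.randint(1, 1000, (2, 10))
tags = torch.randint(0, 4, (2, 10))
mask = torch.ones(2, 10, dtype=torch.bool)
print(model.loss(ids, tags, mask).item())
print(model.decode(ids, mask))
```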
Published in: IEEE Access (Volume 12)