Abstract:
Mobile devices and social media platforms have made communication faster than ever before, thanks to technologies such as automatic speech recognition (ASR). However, the speed of text-based communication leads to several mistakes that could be corrected. Two well-known ones are grammatical errors and omitted punctuation. The punctuation restoration task originates in the automatic speech recognition domain, where understanding and restoring the correct positions of punctuation marks is a challenging problem. However, no dataset exists for training a punctuation restoration model for the Turkish language. This paper focuses on restoring punctuation in Turkish texts and introduces a new Turkish dataset for punctuation restoration. Three transformer models, BERT, ELECTRA, and ConvBERT, are fine-tuned and tested on the newly created dataset for three distinct labels: PERIOD, COMMA, and QUESTION MARK. Because the class distribution is imbalanced, benchmark results are reported in terms of precision, recall, and F1 score. Although the models show similar performance, ELECTRA reaches an overall F1 score of 83.9%.
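To make the task framing concrete, the following is a minimal sketch of how punctuation restoration can be cast as per-token labeling and scored per class. It is not the authors' pipeline: the helper names `to_token_labels` and `per_class_scores`, the `O` label for "no punctuation follows", the underscore in `QUESTION_MARK`, and whitespace tokenization are all assumptions made for illustration.

```python
from collections import Counter

# Assumed label scheme: "O" marks a token that is not followed by punctuation.
PUNCT_TO_LABEL = {".": "PERIOD", ",": "COMMA", "?": "QUESTION_MARK"}
PUNCT_LABELS = ("PERIOD", "COMMA", "QUESTION_MARK")


def to_token_labels(text):
    """Turn punctuated text into (token, label) pairs.

    The label names the punctuation mark that follows the token, if any.
    """
    pairs = []
    for word in text.split():
        label = PUNCT_TO_LABEL.get(word[-1], "O")
        token = word[:-1] if label != "O" else word
        if token:  # skip stray punctuation with no word attached
            pairs.append((token, label))
    return pairs


def per_class_scores(gold, pred):
    """Return {label: (precision, recall, f1)} for the punctuation labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p where it does not belong
            fn[g] += 1  # missed the gold label g
    scores = {}
    for label in PUNCT_LABELS:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f1)
    return scores


pairs = to_token_labels("Merhaba, nasılsın? İyiyim.")
gold = ["O", "COMMA", "O", "PERIOD", "O", "QUESTION_MARK"]
pred = ["O", "O", "O", "PERIOD", "COMMA", "QUESTION_MARK"]
scores = per_class_scores(gold, pred)
```

Scoring each punctuation class separately, rather than reporting plain accuracy, keeps the dominant `O` class from masking errors on the rarer labels, which is why precision, recall, and F1 suit an imbalanced label distribution.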
Date of Conference: 13-15 September 2023
Date Added to IEEE Xplore: 24 October 2023