
Transliteration based Generative Pre-trained Transformer 2 Model for Tamil Text Summarization



Abstract:

Automatic text summarization plays a significant role in natural language processing by extracting useful information from the vast amount of online content. In everyday life, online news articles inform the public about current events around the world through various media. This research work addresses the problem of Tamil text summarization, creating summaries of online Tamil news articles. A transliteration-based Generative Pre-trained Transformer 2 (GPT-2) model is proposed to summarize online Tamil news articles by extracting a number of relevant features, namely sentence position, one-hot encoding, number of entities, term frequency, and inverse document frequency. We propose an attention-based LSTM-NMT transliteration model for performing transliteration between Tamil and English, and compare it with other transliteration models to analyze its performance. In addition, a comparative study is made between three pre-trained transformer models: the fine-tuned Generative Pre-trained Transformer 2 (GPT-2) model, the Text-To-Text Transfer Transformer (T5) model, and the Bidirectional Encoder Representations from Transformers (BERT) model. To evaluate the performance of all models, a Tamil-language dataset is created by gathering data from various online sources. Experiments evaluate the enhanced transliteration model using the Bilingual Evaluation Understudy (BLEU) score and assess the GPT-2-based text summarization model using the ROUGE evaluation measures. The performance analysis shows that the transliteration-based GPT-2 model achieves better summarization performance while reducing the occurrence of repetition.
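The abstract names sentence position, term frequency, and inverse document frequency among the extracted features. As an illustration only, here is a minimal sketch of such sentence-level scoring; the weighting and feature definitions are assumptions, not the authors' exact formulation, and the one-hot encoding and entity-count features are omitted:

```python
# Illustrative sketch of two of the features listed in the abstract:
# sentence position and TF-IDF. Not the paper's exact formulation.
import math
from collections import Counter

def sentence_features(sentences):
    """Score each sentence by position and the average TF-IDF of its words."""
    n = len(sentences)
    tokenized = [s.split() for s in sentences]
    # Document frequency: in how many sentences each word occurs.
    df = Counter(w for toks in tokenized for w in set(toks))
    feats = []
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)
        # Average TF-IDF over the words of the sentence.
        tfidf = sum(
            (tf[w] / len(toks)) * math.log(n / df[w]) for w in toks
        ) / max(len(toks), 1)
        position = 1.0 - i / n  # earlier sentences score higher
        feats.append({"position": position, "tfidf": tfidf})
    return feats
```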
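The abstract also states that the fine-tuned GPT-2 model reduces repetition in the generated summaries. Below is a hedged sketch of GPT-2 generation with standard repetition controls from Hugging Face Transformers; the checkpoint name, the "TL;DR:" prompt format, and the decoding parameters are assumptions rather than the authors' settings:

```python
# Sketch of summary generation with a (fine-tuned) GPT-2 checkpoint.
# "gpt2" is a placeholder; the paper fine-tunes its own model on
# transliterated (romanized) Tamil news text.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

article = "transliterated Tamil news text ..."  # romanized input (assumed)
inputs = tokenizer(article + " TL;DR:", return_tensors="pt")

# no_repeat_ngram_size and repetition_penalty are standard decoding
# controls that suppress repeated phrases in the generated summary.
output_ids = model.generate(
    **inputs,
    max_new_tokens=80,
    no_repeat_ngram_size=3,
    repetition_penalty=1.2,
    pad_token_id=tokenizer.eos_token_id,
)
summary = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(summary)
```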
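Finally, the two evaluation measures named in the abstract are BLEU (for the transliteration model) and ROUGE (for the summaries). A minimal sketch of both, using nltk and the rouge-score package; the example strings are invented:

```python
# BLEU for transliteration, ROUGE for summaries. Example data is invented.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer

# BLEU: compare a predicted romanization against a reference one.
reference = ["vanakkam", "ulagam"]
candidate = ["vanakkam", "ulagam"]
bleu = sentence_bleu([reference], candidate,
                     smoothing_function=SmoothingFunction().method1)

# ROUGE-1 / ROUGE-L: compare a generated summary against a reference summary.
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=False)
rouge = scorer.score("the reference summary", "the generated summary")

print(f"BLEU: {bleu:.3f}")
print(rouge["rouge1"].fmeasure, rouge["rougeL"].fmeasure)
```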
Date of Conference: 25-27 January 2022
Date Added to IEEE Xplore: 31 March 2022
Print on Demand(PoD) ISSN: 2329-7190
Conference Location: Coimbatore, India

