Abstract:
Plagiarism occurs when someone uses another person’s words or ideas without properly citing the original author. It counts as plagiarism whether or not the original author consents, and it covers all manuscripts, print works, and electronic works, published or unpublished. Bengali plagiarism detection has been the topic of several previous studies, and numerous scholars have attempted to detect and prevent plagiarism using various methodologies and models, each with its own limits and drawbacks. Even so, plagiarism detection tools for Bengali documents remain uncommon and are not readily available. This article therefore develops a more efficient approach that uses deep learning techniques to encode Bengali words and phrases as vectors with the word2vec and fastText methods. The proposed method was applied to a Bengali Wikipedia dataset together with original datasets collected by the authors, totaling 112,183 entries. In this study, the proposed model distinguishes plagiarized from original documents with a reported accuracy of 100% based on data similarity. In an extrinsic evaluation of the embeddings on a downstream NLP task, sentiment analysis, the word embeddings generated by fastText reach an accuracy of 91.4%. Cosine similarity outperforms Jaccard similarity in the proposed model.
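The comparison between cosine and Jaccard similarity can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tiny `TOY_EMBEDDINGS` table, the English placeholder tokens, and the mean-pooling step in `mean_vector` are assumptions made purely for illustration, standing in for vectors that a trained word2vec or fastText model would supply.

```python
import math

# Hypothetical 3-dimensional embeddings, invented for this example only.
# A trained word2vec or fastText model would provide the real vectors.
TOY_EMBEDDINGS = {
    "big": [0.90, 0.10, 0.00],
    "large": [0.88, 0.12, 0.00],
    "house": [0.10, 0.90, 0.10],
    "home": [0.12, 0.88, 0.10],
    "river": [0.00, 0.10, 0.95],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def jaccard_similarity(tokens_a, tokens_b):
    """Size of the token-set intersection divided by the size of the union."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def mean_vector(tokens, embeddings):
    """Average the vectors of all known tokens (an assumed pooling choice)."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return []
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


# Two token lists that share no surface words but use near-synonyms.
doc_a = ["big", "house"]
doc_b = ["large", "home"]

jaccard_score = jaccard_similarity(doc_a, doc_b)
cosine_score = cosine_similarity(
    mean_vector(doc_a, TOY_EMBEDDINGS),
    mean_vector(doc_b, TOY_EMBEDDINGS),
)
print(f"Jaccard: {jaccard_score:.2f}, cosine: {cosine_score:.2f}")
```

In this toy case the two token lists share no words, so the Jaccard score is zero, while the pooled embedding vectors point in nearly the same direction. This is one plausible reason an embedding-based cosine measure could catch paraphrased text that a token-overlap measure misses, though the abstract itself does not state why cosine similarity performed better.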
Published in: 2023 IEEE International Conference on Contemporary Computing and Communications (InC4)
Date of Conference: 21-22 April 2023
Date Added to IEEE Xplore: 29 September 2023