Plagiarism is the practice of claiming or implying original authorship of, or incorporating material from, someone else's written or creative work, in whole or in part, into one's own without adequate acknowledgement. Unlike forgery, in which the authenticity of the writing, document, or other object is itself in question, plagiarism concerns false attribution. Plagiarism has become a significant problem in the student community because digitized information is widely accessible on the World Wide Web, and catching offenders has become a difficult task for teachers and adjudicators. Many plagiarism detection tools exist, some Internet-based and some PC-based, and each kind has advantages and disadvantages. Detecting plagiarism requires measuring the extent of similarity between a pair of text documents, both to provide access to topically relevant documents and to identify document replication. This paper presents the details of a Rough Set based Document Ranking System (RSDRS) developed by the authors. The terms of the vocabulary are clustered so that terms associated with related concepts are grouped together into equivalence classes. The query passage and the documents are represented as rough sets using these equivalence classes of terms, and are further partitioned into families of rough sets in higher-level approximation spaces, which impose a partial ordering on the families of documents with reference to the query passage. Documents falling in the same family are then ordered by their similarity to the query to form the relevance ranking of the documents.
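
The representation described above can be illustrated with a minimal sketch. It is not the RSDRS implementation, whose details are not given here. It assumes a small hand-made partition of the vocabulary (`TERM_CLASSES`), uses the standard rough-set lower and upper approximations of a term set, and substitutes two simple stand-ins for the ranking steps: the family of a document is taken to be the number of concept classes it shares with the query, and similarity within a family is taken to be the Jaccard coefficient of the raw term sets. All names, the toy vocabulary, and both scoring choices are assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical equivalence classes of terms (a partition of a toy vocabulary).
# In the system described above these would come from clustering the vocabulary.
TERM_CLASSES: Dict[str, FrozenSet[str]] = {
    "vehicle": frozenset({"car", "automobile", "truck"}),
    "fuel": frozenset({"petrol", "gasoline", "diesel"}),
    "animal": frozenset({"dog", "cat"}),
}


def approximations(terms: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (lower, upper) approximations of a term set as sets of class names.

    lower: classes whose terms all occur in `terms`.
    upper: classes with at least one term occurring in `terms`.
    """
    lower = {name for name, cls in TERM_CLASSES.items() if cls <= terms}
    upper = {name for name, cls in TERM_CLASSES.items() if cls & terms}
    return lower, upper


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard coefficient of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank(query: Set[str], docs: Dict[str, Set[str]]) -> List[str]:
    """Rank document ids against a query.

    Assumed two-level ordering (a stand-in, not the authors' method):
      1. family: number of concept classes shared by the upper
         approximations of query and document (more is better);
      2. within a family: Jaccard similarity of the raw term sets.
    Ties are broken by document id to keep the result deterministic.
    """
    _, q_upper = approximations(query)
    scored = []
    for doc_id, terms in docs.items():
        _, d_upper = approximations(terms)
        family = len(q_upper & d_upper)
        scored.append((family, jaccard(query, terms), doc_id))
    scored.sort(key=lambda item: (-item[0], -item[1], item[2]))
    return [doc_id for _, _, doc_id in scored]


# Toy data: d1 shares no literal term with the query but covers the same concepts.
EXAMPLE_QUERY: Set[str] = {"car", "petrol"}
EXAMPLE_DOCS: Dict[str, Set[str]] = {
    "d1": {"automobile", "gasoline"},
    "d2": {"car", "dog"},
    "d3": {"cat"},
    "d4": {"car", "petrol", "truck"},
}

if __name__ == "__main__":
    print(rank(EXAMPLE_QUERY, EXAMPLE_DOCS))
```

In this toy data, `d1` has no word in common with the query yet falls in the same family as `d4`, because both cover the `vehicle` and `fuel` classes. This is the kind of match that plain term overlap would miss when a copied passage has been reworded with synonyms.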