This paper specifies a Case-Based Reasoning strategy for correctly classifying and storing electronic text material and for preventing duplication of effort. Preserving complete source documents in order to check the similarity between them poses daunting space and computational complexity for researchers in this area. The problem is partially solved by applying preprocessing steps that substantially reduce the volume of data to be handled. The volume of a text document is reduced by applying a stemming algorithm and by eliminating stop words, using text-mining measures such as TF-IDF. A third technique extracts keywords and stores them in a properly indexed base. These keywords then serve a dual purpose: they support Lazy Learning classification for automatic subject-wise archiving, and they form relevant word sequences for detecting plagiarism with Association Rule-mining techniques.
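The preprocessing steps named above (stop-word elimination, stemming, and TF-IDF-based keyword extraction) can be illustrated with a minimal sketch. It is not the paper's implementation. The stop-word list, the crude suffix-stripping `naive_stem` function (a stand-in for a real stemmer such as Porter's), and the names `preprocess` and `tfidf_keywords` are all assumptions introduced here for illustration.

```python
import math
import re
from collections import Counter

# Illustrative stop-word list (an assumption; a real system would use a fuller one).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "for", "on", "by", "with"}


def naive_stem(word):
    """Crude suffix stripping; a hypothetical stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters of the word remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize on letters, drop stop words, then stem each token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


def tfidf_keywords(docs, top_k=3):
    """Return the top_k terms of each document ranked by TF-IDF score."""
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    keywords = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        scores = {t: (c / total) * math.log(n_docs / df[t]) for t, c in tf.items()}
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords.append(ranked[:top_k])
    return keywords


docs = [
    "The cats are chasing the mice",
    "The dogs are chasing the cats",
    "Stemming reduces words",
]
top_keywords = tfidf_keywords(docs, top_k=1)
```

Only the short keyword lists would need to be stored and indexed, rather than the full source documents, which is the volume reduction the abstract describes. Terms shared across documents score lower, so the retained keywords are those that distinguish one document from the rest.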