Document copy detection is an important tool for protecting authors' intellectual property and improving the efficiency of digital libraries. It uses extracted text features to identify copying between documents, so the feature extraction method crucially affects the performance of a document copy detection system. This paper introduces a window-based feature extraction method and makes three contributions: it can identify any match of a certain length; it can produce information describing where overlap occurs between documents; and it can provide results at different levels of precision. We report experimental results that validate the behavior and properties of the proposed method.
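
To make the idea of window-based feature extraction concrete, the sketch below uses winnowing, a well-known window-based fingerprinting scheme, as a stand-in. It is not the method proposed in this paper, which the abstract does not specify. The function names and the parameters `k` (k-gram length) and `w` (window size) are assumptions made for illustration only. Under these assumptions, any shared run of at least `w + k - 1` normalized characters yields at least one common fingerprint, the stored positions indicate where the overlap occurs, and changing `k` and `w` trades fingerprint density against granularity.

```python
import hashlib


def normalize(text):
    """Lowercase the text and keep only letters and digits."""
    return "".join(ch.lower() for ch in text if ch.isalnum())


def kgram_hashes(norm, k):
    """Return (hash, position) for every k-character substring of norm."""
    result = []
    for i in range(len(norm) - k + 1):
        digest = hashlib.sha1(norm[i:i + k].encode("utf-8")).digest()
        result.append((int.from_bytes(digest[:8], "big"), i))
    return result


def winnow(text, k=5, w=4):
    """Pick the minimum hash from each window of w consecutive k-gram hashes.

    Texts with fewer than w k-grams produce no fingerprints in this sketch.
    """
    hashes = kgram_hashes(normalize(text), k)
    fingerprints = set()
    for start in range(max(len(hashes) - w + 1, 0)):
        window = hashes[start:start + w]
        # Ties are broken by taking the rightmost minimum, so identical
        # windows in two documents always select the same k-gram.
        fingerprints.add(min(window, key=lambda hp: (hp[0], -hp[1])))
    return fingerprints


def shared_positions(doc_a, doc_b, k=5, w=4):
    """Return sorted (pos_in_a, pos_in_b) pairs of matching fingerprints.

    Positions refer to offsets in the normalized text, not the raw text.
    """
    index_b = {}
    for h, pos in winnow(doc_b, k, w):
        index_b.setdefault(h, []).append(pos)
    return sorted(
        (pos_a, pos_b)
        for h, pos_a in winnow(doc_a, k, w)
        for pos_b in index_b.get(h, [])
    )
```

Calling `shared_positions(doc_a, doc_b)` on two documents returns offset pairs that can be mapped back to the passages where copying may have occurred. Smaller values of `k` and `w` detect shorter overlaps at the cost of more fingerprints and more chance matches.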