Detecting Text Similarity over Chinese Research Papers Using MapReduce

3 Author(s)
Fan Xu ; Sch. of Comput. Sci. & Technol., Soochow Univ., Suzhou, China ; Qiaoming Zhu ; Peifeng Li

This paper proposes a novel method to detect text similarity over Chinese research papers using the MapReduce paradigm. Our approach differs from the state-of-the-art methods in two aspects. First, we extract the key sentences from Chinese research papers using heuristic features and then generate 2-tuples, (document id, key phrase), as the representation of the documents. Second, we design a 2-phase MapReduce algorithm to verify the effectiveness of the generated 2-tuples. For evaluation, we compare the proposed method with other approaches on a synthetic corpus. Experimental results reveal that our method substantially outperforms the state-of-the-art ones in running time while preserving the Jaccard similarity coefficient.
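
The abstract does not give the algorithm's details, so the following is only a minimal sketch of how pairwise Jaccard similarity could be computed from (document id, key phrase) 2-tuples in two map/reduce passes. It is not the authors' implementation. The in-memory `run_mapreduce` helper, the function names, and the sample phrases are all assumptions made for illustration; a real job would run on a framework such as Hadoop, and the per-document phrase counts would come from a separate pass or side data.

```python
from collections import defaultdict
from itertools import combinations


def run_mapreduce(records, mapper, reducer):
    """Hypothetical in-memory stand-in for one MapReduce job:
    map every record, group values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


# Pass 1: invert the 2-tuples so documents are grouped by key phrase.
def map_phrase(record):
    doc_id, phrase = record
    yield phrase, doc_id


def reduce_pairs(phrase, doc_ids):
    # Every pair of documents sharing this phrase gets one "shared" vote.
    for a, b in combinations(sorted(set(doc_ids)), 2):
        yield (a, b), 1


# Pass 2: sum the votes per document pair and turn them into Jaccard scores.
def map_identity(record):
    yield record  # record is already ((doc_a, doc_b), 1)


def make_jaccard_reducer(phrase_counts):
    def reduce_jaccard(pair, ones):
        shared = sum(ones)
        a, b = pair
        union = phrase_counts[a] + phrase_counts[b] - shared
        yield pair, shared / union
    return reduce_jaccard


def pairwise_jaccard(tuples):
    """Return {(doc_a, doc_b): jaccard} for document pairs sharing a phrase."""
    tuples = set(tuples)  # drop duplicate (doc, phrase) tuples
    phrase_counts = defaultdict(int)  # assumed to be available as side data
    for doc_id, _ in tuples:
        phrase_counts[doc_id] += 1
    pairs = run_mapreduce(tuples, map_phrase, reduce_pairs)
    scored = run_mapreduce(pairs, map_identity, make_jaccard_reducer(phrase_counts))
    return dict(scored)


if __name__ == "__main__":
    sample = [
        ("d1", "mapreduce"), ("d1", "jaccard"), ("d1", "chinese"),
        ("d2", "mapreduce"), ("d2", "jaccard"),
        ("d3", "hadoop"),
    ]
    print(pairwise_jaccard(sample))
```

Document pairs that share no key phrase never appear in the output, which is what keeps the second pass small: only candidate pairs produced by the inverted index are scored.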

Published in:

Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), 2011 12th ACIS International Conference on

Date of Conference:

6-8 July 2011