Performance Enhanced Multiset Similarity Joins | IEEE Conference Publication | IEEE Xplore

Performance Enhanced Multiset Similarity Joins


Abstract:

The amount of data produced on a daily basis is growing at an exponential rate. One method of filtering through this data is the use of similarity joins, or methods that are used to identify similar data. Such algorithms are used for a variety of applications ranging from plagiarism detection to marketing. These methods are typically time-consuming and computationally expensive. This paper proposes an efficient three-stage MapReduce algorithm named Adept Similarity Join (ASJ) for multisets. The main novelty in ASJ is to integrate suffix filtering with positional filtering when performing similarity joins, in addition to incorporating prefix and size filtering. The proposed algorithm is compared to the state-of-the-art Strategic and Suave processing for performing similarity joins using MapReduce (SSS) algorithm, which it outperforms by lowering the number of redundant comparisons. Experimental results on a Twitter dataset demonstrate that the proposed ASJ algorithm provides about a 25% to 40% decrease in execution time and 100x reduction in memory usage compared to SSS.
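
As a rough illustration of why filtering lowers the number of redundant comparisons, the sketch below applies only a size filter before verifying candidate pairs with multiset Jaccard similarity. It is not the ASJ algorithm: it runs on a single machine rather than MapReduce, it omits prefix, positional, and suffix filtering, and the choice of Jaccard as the similarity measure is an assumption, since the abstract does not name one. The function names and sample records are hypothetical.

```python
from collections import Counter
from itertools import combinations


def multiset_jaccard(a: Counter, b: Counter) -> float:
    """Multiset Jaccard: sum of min counts over sum of max counts."""
    inter = sum((a & b).values())  # Counter & keeps the minimum count per element
    union = sum((a | b).values())  # Counter | keeps the maximum count per element
    return inter / union if union else 1.0


def passes_size_filter(a: Counter, b: Counter, threshold: float) -> bool:
    """Cheap necessary condition: Jaccard can never exceed small / large."""
    size_a, size_b = sum(a.values()), sum(b.values())
    small, large = min(size_a, size_b), max(size_a, size_b)
    return large == 0 or small >= threshold * large


def similarity_join(records: dict, threshold: float):
    """Return (similar pairs, number of pairs that needed full verification)."""
    results = []
    verified = 0
    for (id_a, a), (id_b, b) in combinations(records.items(), 2):
        if not passes_size_filter(a, b, threshold):
            continue  # pruned without computing the similarity
        verified += 1
        sim = multiset_jaccard(a, b)
        if sim >= threshold:
            results.append((id_a, id_b, sim))
    return results, verified


# Hypothetical records: each string is treated as a multiset of characters.
records = {
    "r1": Counter("aabbc"),
    "r2": Counter("aabbd"),
    "r3": Counter("a"),
}
pairs, verified = similarity_join(records, threshold=0.6)
print(pairs, verified)
```

In this toy input, the size filter discards both pairs involving `r3` before any similarity is computed, so only one of the three candidate pairs is verified. The filters named in the abstract aim at the same effect with tighter bounds.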
Date of Conference: 08-10 October 2016
Date Added to IEEE Xplore: 31 October 2016
Conference Location: Atlanta, GA, USA

