I. Introduction
A similarity join is an essential operation that, given a similarity function such as Jaccard similarity [18], finds all pairs of records from two data collections whose similarity scores are no less than a given threshold. Similarity joins are widely used in applications including data integration [6], data cleaning [7], duplicate detection [22], record linkage [20], and entity resolution [8].
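
To make the definition concrete, the following minimal Python sketch performs a brute-force similarity join under Jaccard similarity. It is an illustration only: the names `jaccard` and `naive_similarity_join`, the representation of each record as a set of tokens, and the convention that two empty records have similarity 1.0 are all assumptions introduced here, not part of the cited work.

```python
from typing import Hashable, List, Sequence, Set, Tuple


def jaccard(a: Set[Hashable], b: Set[Hashable]) -> float:
    """Jaccard similarity |a ∩ b| / |a ∪ b|.

    Assumption: two empty sets are treated as identical (similarity 1.0).
    """
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def naive_similarity_join(
    left: Sequence[Set[Hashable]],
    right: Sequence[Set[Hashable]],
    threshold: float,
) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) with jaccard(left[i], right[j]) >= threshold.

    Hypothetical helper: compares every pair, so it costs
    O(len(left) * len(right)) similarity computations.
    """
    return [
        (i, j)
        for i, r in enumerate(left)
        for j, s in enumerate(right)
        if jaccard(r, s) >= threshold
    ]


if __name__ == "__main__":
    left = [{"a", "b", "c"}, {"x", "y"}]
    right = [{"a", "b", "d"}, {"x", "y"}, {"c"}]
    print(naive_similarity_join(left, right, threshold=0.5))
```

Because this sketch evaluates the similarity function on every pair of records, its cost grows with the product of the two collection sizes, so it serves only as a statement of what the operation computes.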