SPIDER, developed at AT&T Labs-Research, is a system that efficiently supports flexible string matching against attribute values in large databases and is used extensively within AT&T. Its scoring methodology is based on tf.idf weighting and cosine similarity, and SPIDER maintains indexes containing string tokens and their weights for fast matching at query time. Because the weights stored in the indexes are "global" (they depend on statistics computed over the entire table), even a few updates to the underlying database tables would necessitate a (near-)complete recomputation of the indexes, which can be prohibitively expensive. In this paper, we explore novel techniques that considerably reduce the cost of propagating updates in SPIDER without significantly degrading answer accuracy or query performance. We present experimental evidence on real data sets that demonstrates the practical benefits of our techniques.
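The scoring described above can be illustrated with a small in-memory sketch. The abstract does not specify SPIDER's tokenizer, idf formula, or index layout, so the following are assumptions: attribute values are split into padded 3-grams, idf is computed as log(N/df), and the function names (`qgrams`, `build_index`, `search`) are hypothetical. This is not SPIDER's implementation; it only shows tf.idf weighting with cosine similarity and why the stored weights are "global".

```python
import math
from collections import Counter

Vector = dict[str, float]


def qgrams(s: str, q: int = 3) -> list[str]:
    """Split a string into overlapping q-grams, padding both ends (assumed tokenizer)."""
    padded = "#" * (q - 1) + s.lower() + "$" * (q - 1)
    return [padded[i:i + q] for i in range(len(padded) - q + 1)]


def tfidf_vector(tokens: list[str], idf: dict[str, float]) -> Vector:
    """Weight each token by tf * idf and normalize the vector to unit length.

    Tokens absent from the indexed column are dropped: they have no idf
    and cannot match any stored value.
    """
    tf = Counter(tokens)
    vec = {t: c * idf[t] for t, c in tf.items() if t in idf}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def build_index(values: list[str], q: int = 3) -> tuple[dict[str, float], list[Vector]]:
    """Compute idf = log(N / df) over the whole column, then one vector per value.

    N and df are statistics over the entire column, so every stored weight
    depends on them.
    """
    token_lists = [qgrams(v, q) for v in values]
    n = len(values)
    df = Counter(t for toks in token_lists for t in set(toks))
    idf = {t: math.log(n / d) for t, d in df.items()}
    return idf, [tfidf_vector(toks, idf) for toks in token_lists]


def cosine(a: Vector, b: Vector) -> float:
    # Both vectors are unit length, so cosine similarity reduces to a dot product.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def search(query: str, idf: dict[str, float], vectors: list[Vector],
           q: int = 3, k: int = 3) -> list[tuple[float, int]]:
    """Return the top-k (score, row id) pairs ranked by cosine similarity."""
    qv = tfidf_vector(qgrams(query, q), idf)
    scored = [(cosine(qv, dv), rid) for rid, dv in enumerate(vectors)]
    return sorted(scored, reverse=True)[:k]


if __name__ == "__main__":
    names = ["AT&T Labs Research", "ATT Labs-Research",
             "Bell Laboratories", "Shannon Labs"]
    idf, vectors = build_index(names)
    print(search("AT&T Research", idf, vectors, k=2))

    # Even a new row sharing no tokens with the existing ones changes N,
    # so every previously computed idf shifts by log(5/4).
    idf_after, _ = build_index(names + ["Murray Hill"])
    stale = [t for t in idf if not math.isclose(idf[t], idf_after[t])]
    print(f"{len(stale)} of {len(idf)} token idf values changed")
```

The final lines make the update problem concrete: inserting a single unrelated row still alters N, so every stored idf, and with it the stored token weights, would have to be recomputed under this naive scheme. That full recomputation is the cost the techniques in this paper aim to avoid.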
Date of Conference: 15-20 April 2007