Skip to Main Content
This paper presents and analyzes an effective and practical method of accomplishing partial-match retrieval on a computer file containing a large number of information records. In partial-match retrieval a subset of the records in the file is selected and retrieved by specifying a query set consisting of a small number of key values; the records selected and retrieved are those having a match to all the key values in the query set. Partial-match retrieval can be a powerful capability when used in information retrieval systems, and a stored file augmented with such a capability is equivalent to an associative or content-addressable store. The method presented in this paper is based upon the use of superimposed codes, a technique used previously with some success in notched-edge card filing systems in which card selection was accomplished mechanically with long metal needles. A new algorithm is presented for generating superimposed codes without the use of a stored code dictionary. The new algorithm executes rapidly on a digital computer and employs a hash function and a pseudo-random number generator; it allows the generation of binary codes having any desired width and weight. It is pointed out that the use of variableweight binary codes gives an advantage on files in which the key values occur with substantially different frequencies. It is shown that organizing the bits of the superimposed code words properly in storage leads to the property that only a small fraction of these bits need be retrieved and processed on each query; this substantially reduces the time required to execute a query. Also, because of the simplicity of the bit operations required in the processing of superimposed code words, a very fast query processor could be designed and built with currently available digital hardware devices and techniques. With the query processor implemented in such digital hardware, partial-match query rates as high as 100/s or even higher appear to be feasible. In order to demonstrate experimentally the power of the method of superimposed codes, the method was implemented in software on a file consisting of the Suffolk County, NY, telephone-directory listings.