Skip to Main Content
We survey a new way to get quick estimates of the values of simple statistks (like count, mean, standard deviation, maximum, median, and mode frequency) on a large data set. This approach is a comprehensive attempt (apparently the first) to estimate statistics without any sampling. Our "antisampling" techniques have analogies to those of sampling, and exhibit similar estimation accuracy, but can be done much faster than sampling with large computer databases. Antisampling exploits computer science ideas from database theory and expert systems, building an auxiliary structure called a "database abstract." We make detailed comparisons to several different kinds of sampling.