This work presents a method for accelerating common statistical operations, such as parameter estimation and sampling, on B-tree indexed data; it was carried out in the context of visualising large scientific data sets. The underlying idea is that the shape of a balanced data structure such as a B-tree encodes and reflects data semantics according to its balance criterion. For example, clusters in the index attribute are likely not only to appear at the data (leaf) level of the tree but also to propagate up into the interior levels. The paper also hints at the opportunities and limitations of this approach for the visualisation of large data sets. The advantages of the method are manifold. It enables advanced algorithms by boosting the performance of basic operations such as density estimation, and it builds on functionality that is already largely present in current RDBMSs. It is also fully dynamic and avoids redundancy: when the underlying source data change, the index, and therefore the estimates, adapt accordingly. Furthermore, we show that the sample quality is data-independent and that it can be modelled by a uniform sampling process if some basic prerequisites are ensured.
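The intuition that clusters propagate into the interior levels can be illustrated with a toy sketch. This is not the paper's algorithm: it assumes an idealised, bulk-loaded tree in which each interior level keeps the first key of every group of `fanout` keys from the level below, and the names `build_levels` and `histogram` are hypothetical helpers introduced only for this illustration.

```python
import random


def build_levels(sorted_keys, fanout):
    """Return tree levels, leaf level first.

    Assumption: each higher level keeps the first key of every group of
    `fanout` keys from the level below (an idealised bulk-loaded tree).
    """
    levels = [list(sorted_keys)]
    while len(levels[-1]) > fanout:
        levels.append(levels[-1][::fanout])
    return levels


def histogram(keys, lo, hi, bins):
    """Relative frequency of keys in `bins` equal-width bins over [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for k in keys:
        i = min(int((k - lo) / width), bins - 1)  # clamp k == hi into last bin
        counts[i] += 1
    return [c / len(keys) for c in counts]


# Two clusters in the index attribute: 800 keys near 5 and 200 keys near 95.
rng = random.Random(0)
keys = sorted(
    [rng.uniform(0, 10) for _ in range(800)]
    + [rng.uniform(90, 100) for _ in range(200)]
)

levels = build_levels(keys, fanout=8)
leaf_density = histogram(levels[0], 0.0, 100.0, bins=2)
interior_density = histogram(levels[1], 0.0, 100.0, bins=2)

print("level sizes:     ", [len(level) for level in levels])
print("leaf density:    ", leaf_density)
print("interior density:", interior_density)
```

In this idealised setting the first interior level holds one eighth of the keys yet reproduces the cluster proportions of the leaf level, so a density estimate can be read from far fewer entries. A real B-tree under inserts and deletes has variable node fill, so the evenly strided picture is a simplification.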
Date of Conference: 7-9 July 2004