We study the problem of choosing the "best" subset of k sensors to sample from among a sensor deployment of n > k sensors, in order to predict aggregate functions over all the sensor values. The sensor data being measured are assumed to be spatially correlated, in the sense that the values at two sensors can differ by at most a monotonically increasing, concave function of their distance. The goal is then to select a subset of sensors so as to minimize the prediction error, assuming that the actual values at unsampled sensors are worst-case subject to the constraints imposed by their distances from sampled sensors. Even selecting sensors for the optimal prediction of the mean, maximum or minimum is NP-hard; we present approximation algorithms to select near-optimal subsets of k sensors that minimize the worst-case prediction error. In general, we show that for any aggregate function satisfying certain concavity, symmetry and monotonicity conditions, the sensor selection problem can be modeled as a k-median clustering problem, and solved using efficient approximation algorithms designed for k-median clustering. Our theoretical results are complemented by experiments on two real-world sensor data sets; our experiments confirm that our algorithms lead to prediction errors that are usually less than the (normalized) standard deviation of the test data, using only around 10% of the sensors.
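To make the k-median view concrete, the following is a minimal sketch under stated assumptions, not the approximation algorithms of the paper. It assumes sensors are points in the plane, takes the square root of Euclidean distance as one example of a monotonically increasing, concave bound, and reads the objective as the sum, over all sensors, of that bound evaluated at the distance to the nearest sampled sensor. The function names (`bound`, `kmedian_cost`, `greedy_select`, `brute_force_select`) are hypothetical, and the greedy routine is a simple heuristic stand-in that carries no approximation guarantee.

```python
import math
from itertools import combinations


def dist(p, q):
    """Euclidean distance between two sensor positions (x, y)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def bound(d):
    """Assumed correlation bound: increasing and concave in the distance d."""
    return math.sqrt(d)


def kmedian_cost(points, chosen):
    """Sum over all sensors of the bound at the distance to the nearest chosen sensor."""
    return sum(min(bound(dist(p, points[c])) for c in chosen) for p in points)


def greedy_select(points, k):
    """Heuristic: repeatedly add the sensor that lowers the k-median cost the most."""
    chosen = []
    for _ in range(k):
        candidates = (i for i in range(len(points)) if i not in chosen)
        best = min(candidates, key=lambda i: kmedian_cost(points, chosen + [i]))
        chosen.append(best)
    return sorted(chosen)


def brute_force_select(points, k):
    """Exact optimum by enumerating all k-subsets; only feasible for tiny inputs."""
    return min(combinations(range(len(points)), k),
               key=lambda subset: kmedian_cost(points, subset))
```

On a small deployment, `greedy_select(points, k)` can be compared against `brute_force_select(points, k)` to see how far the heuristic is from the optimum; the exhaustive search grows combinatorially in n and k, which is consistent with the hardness result stated above.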