Probabilistic and stochastic models are routinely used in performance, dependability and security evaluation, and determining appropriate values for model parameters is a long-standing problem in the practical use of such models. With the increasing emphasis on human aspects and business considerations, collecting the data needed to estimate parameter values often becomes prohibitively expensive, since it may involve questionnaires, costly audits or additional monitoring and processing. In this paper we articulate a set of optimization problems related to data collection, and provide efficient algorithms to determine the optimal data collection strategy for a model. The main idea is to model the uncertainty of data sources and determine its influence on output accuracy by solving the model. This approach is particularly natural for data sources that rely on sampling, such as questionnaires or monitoring, since their uncertainty can be expressed using the central limit theorem. We pay special attention to the efficiency of our optimization algorithm, using ideas inspired by importance sampling to derive optimal strategies for a range of parameter values from a single set of experiments.
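The idea of expressing sampling uncertainty through the central limit theorem, and of reusing one set of model solutions via importance sampling, can be illustrated with a small sketch. This is not the authors' algorithm. Everything here is a hypothetical assumption chosen for illustration: a single questionnaire estimating a proportion `p0`, a toy model `model_output` standing in for an expensive model solution, cost assumed to grow linearly with the sample size `n`, and self-normalized importance sampling that reweights draws taken at the smallest candidate sample size to estimate output spread at the larger ones.

```python
import math
import random


def model_output(p: float, k: int = 20) -> float:
    """Hypothetical stand-in for an expensive model solution: probability that
    at least one of k users is compromised, each independently with probability p."""
    p = min(max(p, 0.0), 1.0)  # normal-approximation draws can leave [0, 1]; clamp them
    return 1.0 - (1.0 - p) ** k


def clt_sd(p: float, n: int) -> float:
    """CLT standard error of a proportion estimated from n sampled responses."""
    return math.sqrt(p * (1.0 - p) / n)


def density_ratio(x: float, mean: float, sd_target: float, sd_proposal: float) -> float:
    """N(x; mean, sd_target) / N(x; mean, sd_proposal), computed in a single exponent."""
    zt = (x - mean) / sd_target
    zp = (x - mean) / sd_proposal
    return (sd_proposal / sd_target) * math.exp(-0.5 * (zt * zt - zp * zp))


def output_sd_by_size(p0, sizes, n_draws=20000, seed=1):
    """Estimate the model-output standard deviation for every candidate sample
    size from ONE batch of model evaluations, via self-normalized importance sampling."""
    rng = random.Random(seed)
    sd_prop = clt_sd(p0, min(sizes))  # widest sampling distribution serves as the proposal
    draws = [rng.gauss(p0, sd_prop) for _ in range(n_draws)]
    outputs = [model_output(p) for p in draws]  # the only model solves performed
    sds = {}
    for n in sizes:
        # Reweight the shared draws toward the narrower sampling distribution for size n.
        w = [density_ratio(p, p0, clt_sd(p0, n), sd_prop) for p in draws]
        total = sum(w)
        mean = sum(wi * y for wi, y in zip(w, outputs)) / total
        var = sum(wi * (y - mean) ** 2 for wi, y in zip(w, outputs)) / total
        sds[n] = math.sqrt(var)
    return sds


def cheapest_size(p0, sizes, target_sd, **kw):
    """Smallest sample size whose estimated output SD meets the target (None if none does)."""
    sds = output_sd_by_size(p0, sizes, **kw)
    feasible = [n for n in sorted(sizes) if sds[n] <= target_sd]
    return (feasible[0] if feasible else None), sds


if __name__ == "__main__":
    sizes = [50, 100, 200, 400, 800]
    best, sds = cheapest_size(0.1, sizes, target_sd=0.05)
    for n in sizes:
        print(f"n={n:4d}  output sd ~ {sds[n]:.4f}")
    print("cheapest sample size meeting the target:", best)
```

Using the smallest sample size as the proposal is a deliberate choice in this sketch: its sampling distribution is the widest, so reweighting toward the narrower distributions of larger samples keeps the importance weights bounded. Sampling from a narrow distribution and reweighting toward a wide one would instead produce heavy-tailed weights and unreliable estimates.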