Large-scale scientific data analysis projects have catalyzed the development of service-based workflow management systems. We present an approach for integrating user preferences on completion time and workflow accuracy into a workflow composition system. Our workflow system exploits the relationship between workflow execution time and the accuracy of results. Specifically, it lets users define cost models for service completion time and for error propagation, which is prevalent in many scientific and data analysis applications. Using these models together with an ontology that describes Web service and data dependencies, our system plans service-based workflows to answer high-level queries. We evaluated the system in a real service-based environment against user constraints on time and accuracy and under varying network bandwidth. In the worst case in our experiments, we observed an average deviation of 14.3% below the desired time constraints, which suggests that our system is time-conservative. Under varying network bandwidth, the system can also meet time constraints through sampling, with an average deviation of only 12.4% below time expectations. We further show that, through negotiating with services' error models, our system can plan data reduction measures (e.g., sampling) directly within workflow plans to achieve the desired accuracy.
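As a rough illustration of how time and error cost models might drive a sampling decision, the following minimal sketch picks the largest sampling fraction whose predicted completion time fits a deadline. It is not the planner described in this paper: the `Service` class, the linear time model, the `1/sqrt(fraction)` error model, and all parameter values are assumptions made only for this example.

```python
from dataclasses import dataclass


@dataclass
class Service:
    """Hypothetical per-service time cost model (not taken from the paper)."""
    name: str
    fixed_s: float    # assumed fixed overhead per invocation, in seconds
    per_mb_s: float   # assumed processing time per MB of input, in seconds


def predicted_time(services, data_mb, bandwidth_mbps, fraction):
    """Predicted completion time when only `fraction` of the data is processed."""
    mb = data_mb * fraction
    transfer = mb * 8 / bandwidth_mbps  # seconds to move the sampled data once
    compute = sum(s.fixed_s + s.per_mb_s * mb for s in services)
    return transfer + compute


def predicted_error(fraction, base_error=0.02):
    """Assumed error model: error grows as 1/sqrt(fraction) when sampling."""
    return base_error / fraction ** 0.5


def plan_sampling(services, data_mb, bandwidth_mbps, deadline_s, steps=100):
    """Return the largest sampling fraction in (0, 1] that meets the deadline.

    Returns None when even the smallest candidate fraction is too slow.
    """
    for i in range(steps, 0, -1):
        fraction = i / steps
        if predicted_time(services, data_mb, bandwidth_mbps, fraction) <= deadline_s:
            return fraction
    return None


if __name__ == "__main__":
    workflow = [Service("extract", 1.0, 0.05), Service("classify", 2.0, 0.10)]
    fraction = plan_sampling(workflow, data_mb=1000, bandwidth_mbps=100, deadline_s=120)
    if fraction is None:
        print("no sampling fraction meets the deadline")
    else:
        print("sampling fraction:", fraction)
        print("predicted error:", predicted_error(fraction))
```

Scanning from the full data set downward keeps accuracy as high as the deadline allows under these assumed models; a lower bandwidth estimate raises the predicted transfer time and pushes the chosen fraction down.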
Published in: Services Computing, IEEE Transactions on (Volume: PP, Issue: 99)