Skip to Main Content
With a growing trend towards grid-based data repositories and data analysis services, scientific data analysis often involves accessing multiple data sources, and analyzing the data using a variety of analysis programs. One critical challenge in this, however, is that data sources often hold the same type of data in a number of different formats, and also, the formats expected and generated by various data analysis services are often distinct. We believe that the traditional approach for dealing with this problem, which is using hand-written wrappers, is not an effective and scalable solution for a grid environment. This paper presents a new approach, which involves generating wrappers automatically for enabling grid-based information integration and workflows. In this approach, a layout descriptor is used for describing the data format for each data source, as well as the input and output format for each tool or service. Efficient wrappers are then generated automatically for translation between any two data formats. Our design separates wrapper generation service from the wrapper execution. The wrapper generation service analyzes the layout descriptors and generates a WRAPINFO data structure. The wrapper comprises a set of application independent modules which take the WRAPINFO data structure as the input. We demonstrate our wrapper generation tool with two real case studies. Besides showing the effectiveness of our system, the experiments results from these two case studies show that the wrapper generation overhead is very small, automatically generated wrappers scale well to large datasets, and for the one case where this comparison was possible, the execution time of our wrapper was within 30% of that of a hand-written one.
Date of Conference: 13-14 Nov. 2005