Data-intensive science is a scientific discovery process driven by knowledge extracted from large volumes of data rather than by the traditional hypothesis-driven approach. One of its key challenges is the development of enabling technologies that allow researchers to use these large volumes of data effectively. This paper introduces the concept of “data prospecting” to address the challenges of data-intensive science. With data prospecting, we extend the familiar metaphor of data mining to describe an initial phase of data exploration used to determine promising areas for deeper analysis. Data prospecting enhances data selection through the use of interactive discovery engines. Interactive exploration enables a researcher to filter the data based on “first look” analytics, discover interesting and previously unknown patterns that can start new science investigations, verify the quality of the data, and check whether patterns in the data match existing science theories or mental models. This paper describes our initial evaluation of the value of “data prospecting” to Earth Science researchers as part of their research process. It also describes our prototype discovery engine, which supports data prospecting for specific data products, along with its current limitations. Finally, it presents example science investigations from three different researchers who used the prototype discovery engine to explore the Special Sensor Microwave/Imager and Sounder (SSM/I, SSMIS) data products.