In current distributed systems, such as Grids, Clouds, or P2P systems, the amount of information to be handled influences how the system is managed. In P2P systems containing large quantities of data, or in Grid systems containing a large number of (often heterogeneous) resources, information about data or resources must be spread through the system efficiently so that they can be found. An information discovery technique based on data summarization, via clustering, is presented. These summaries can be used to classify information, giving users greater insight into documents or computing resources than raw data provides. Meta-schedulers or brokers would also benefit from the proposed technique because they would have to deal with less data from resources, thus aiding the scalability of the system. An evaluation of the approach is subsequently provided to identify the impact of choosing particular parameters to be used as part of the summary.