The major challenges facing search techniques on Gnutella-like peer-to-peer networks are search efficiency and the quality of search results. In this paper, leveraging information retrieval (IR) algorithms such as the Vector Space Model (VSM) and relevance ranking algorithms, we present GES (Gnutella with Efficient Search) to improve search performance. The key idea is that GES uses a distributed topology adaptation algorithm to organize semantically relevant nodes into the same semantic groups by using the notion of a node vector. Given a query, GES employs an efficient search protocol to direct the query to the most relevant semantic groups for answers, thereby achieving high recall while probing only a small fraction of nodes. To the best of our knowledge, GES is the first to identify node vector size as an important factor in search performance and to show that the node vector size offers a good trade-off between search performance and bandwidth cost. Moreover, GES adopts automatic query expansion and local data clustering to improve search performance. We show that GES is efficient and even outperforms the centralized node clustering system SETS. For example, in the scenario where node capacity is heterogeneous, GES can achieve 73 percent recall when probing only 20 percent of the nodes, outperforming SETS by about 18 percent.
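To make the notion of a node vector concrete, the following is a minimal sketch and not the GES implementation. It assumes that a node vector is the sum of the term-frequency vectors of a node's documents, truncated to its heaviest terms, and that relevance is measured by cosine similarity as in the standard VSM. The function names (`term_vector`, `node_vector`, `cosine`, `rank_nodes`), the `size` parameter, and the sample data are all hypothetical; the actual weighting scheme, topology adaptation, and search protocol of GES are not reproduced here.

```python
import math
from collections import Counter

def term_vector(text):
    """Term-frequency vector for one document (plain whitespace tokenization)."""
    return Counter(text.lower().split())

def node_vector(documents, size):
    """Assumed construction: sum the document vectors held by a node and
    keep only the `size` heaviest terms (the node vector size)."""
    total = Counter()
    for doc in documents:
        total.update(term_vector(doc))
    return dict(total.most_common(size))

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_nodes(query, node_vectors):
    """Return node ids ordered from most to least similar to the query."""
    q = term_vector(query)
    return sorted(node_vectors, key=lambda n: cosine(q, node_vectors[n]), reverse=True)

# Hypothetical two-node network with made-up document titles.
nodes = {
    "A": node_vector(["jazz piano trio", "jazz saxophone solo"], size=3),
    "B": node_vector(["linux kernel patch", "kernel scheduler notes"], size=3),
}
ranking = rank_nodes("jazz piano", nodes)
```

In this sketch a smaller `size` means less data to exchange between nodes but a coarser summary of each node's content, which is one way to picture the trade-off between search performance and bandwidth cost mentioned above.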