Skip to Main Content
Complex ad hoc join queries over enterprise databases are commonly used by business data analysts to understand and analyze a variety of enterprise-wide processes. However, effectively formulating such queries is a challenging task for human users, especially over databases that have large, heterogeneous schemas. In this paper, we propose a novel approach to automatically create join query recommendations based on input-output specifications (i.e.,input tables on which selection conditions are imposed, and output tables whose attribute values must be in the result of the query).The recommended join query graph includes (i) "intermediate'' tables, and (ii) join conditions that connect the input and output tables via the intermediate tables. Our method is based on analyzing an existing query log over the enterprise database. Borrowing from program slicing techniques, which extract parts of a program that affect the value of a given variable, we first extract "query slices'' from each query in the log. Given a user specification, we then re-combine appropriate slices to create a new join query graph, which connects the sets of input and output tables via the intermediate tables. We propose and study several quality measures to enable choosing a good join query graph among the many possibilities. Each measure expresses an intuitive notion that there should be sufficient evidence in the log to support our recommendation of the join query graph. We conduct an extensive study using the log of an actual enterprise database system to demonstrate the viability of our novel approach for recommending join queries.