
2011 IEEE 27th International Conference on Data Engineering (ICDE)

Date: 11-16 April 2011


Displaying results 1-25 of 152
  • [Front cover]

    Page(s): c1
    PDF (1248 KB) | Freely Available from IEEE
  • Hub page

    Page(s): 1
    PDF (182 KB) | Freely Available from IEEE
  • Session list

    Page(s): 1
    PDF (24 KB) | Freely Available from IEEE
  • Table of contents

    Page(s): 1 - 35
    PDF (126 KB) | Freely Available from IEEE
  • Brief author index

    Page(s): 1 - 11
    PDF (53 KB) | Freely Available from IEEE
  • Detailed author index

    Page(s): 1 - 57
    PDF (173 KB) | Freely Available from IEEE
  • The end of indexes

    Page(s): 1
    PDF (126 KB) | Freely Available from IEEE
  • [PDF Reader FAQ and support]

    Page(s): 1
    PDF (126 KB) | Freely Available from IEEE
  • [PDF Reader FAQ and support]

    Page(s): 1 - 5
    PDF (455 KB) | Freely Available from IEEE
  • Message from the ICDE 2011 program chairs and general chairs

    Page(s): 1
    PDF (48 KB) | Freely Available from IEEE
  • ICDE 2011 Committees

    Page(s): 1
    PDF (67 KB) | Freely Available from IEEE
  • Program Committee

    Page(s): 1 - 6
    PDF (87 KB) | Freely Available from IEEE
  • Sponsors

    Page(s): 1
    PDF (38 KB) | Freely Available from IEEE
  • Embarrassingly scalable database systems

    Page(s): 1
    PDF (154 KB) | HTML

    Summary form only given. Database systems have long optimized for parallel execution; the research community has pursued parallel database machines since the early '80s, and several key ideas from that era underlie the design and success of commercial database engines today. Computer microarchitecture, however, has shifted drastically during the intervening decades. Until the end of the 20th century, Moore's Law translated into single-processor performance gains; today's constraints in semiconductor technology cause transistor integration to increase the number of processors per chip roughly every two years. Chip multiprocessor, or multicore, platforms are commodity; harvesting scalable performance from the available raw parallelism, however, is increasingly challenging for conventional database servers running business intelligence and transaction processing workloads. A careful analysis of database performance scaling trends on future chip multiprocessors [7] demonstrates that current parallelism methods are of bounded utility as the number of processors per chip increases exponentially. Common sense is often contradicted; for instance, increasing on-chip cache size or aggressively sharing data among processors is often detrimental to performance [6]. When designing high-performance database systems, there are tradeoffs between single-thread performance and scalability; as the number of hardware contexts grows, favoring scalability wins [5]. In order to transform a database storage manager from a single-threaded Atlas into a multi-threaded Lernaean Hydra which scales to infinity, substantial rethinking of fundamental constructs at all levels of the system is in order. Primitives such as the mechanism to access critical sections become crucial: spinning wastes cycles, while blocking incurs high overhead [3]. At the database processing level, converting concurrency into parallelism proves to be a challenging task, even for transactional workloads that are inherently concurrent. Typical obstacles are by-definition centralized operations, such as locking; we need to ensure consistency by decoupling transaction data access from process assignment, while adapting lessons from the first parallel database machines to the multicore platforms of the future [1][4]. Often, parallelism needs to be extracted from seemingly serial operations such as logging; extensive research in distributed systems proves to be very useful in this context [2]. At the query processing level, service-oriented architectures provide an excellent framework to exploit available parallelism. In this talk, I present lessons learned when trying to scale database workloads on chip multiprocessors. I discuss the tradeoffs and trends and outline the above research directions using examples from the StagedDB/CMP and ShoreMT projects at EPFL.

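    The talk's point about critical-section primitives [3] is concrete enough to sketch: pure spinning burns cycles while waiting, while pure blocking pays scheduling overhead even for short waits. A common compromise is to spin a bounded number of times and then park the thread. The Python below is a generic illustration of that tradeoff, not the mechanism from the talk.

    ```python
    import threading

    class SpinThenBlockLock:
        """Hybrid lock: spin briefly (cheap when critical sections are
        short), then fall back to blocking (avoids wasting cycles when
        the wait is long)."""

        def __init__(self, spin_limit=1000):
            self._lock = threading.Lock()
            self._spin_limit = spin_limit

        def acquire(self):
            # Phase 1: spin, hoping the current holder releases quickly.
            for _ in range(self._spin_limit):
                if self._lock.acquire(blocking=False):
                    return
            # Phase 2: give up spinning and block until woken.
            self._lock.acquire()

        def release(self):
            self._lock.release()
    ```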
  • Ontological queries: Rewriting and optimization

    Page(s): 2 - 13
    PDF (280 KB) | HTML

    Ontological queries are evaluated against an enterprise ontology rather than directly on a database. The evaluation and optimization of such queries is an intriguing new problem for database research. In this paper we discuss two important aspects of this problem: query rewriting and query optimization. Query rewriting consists of the compilation of an ontological query into an equivalent query against the underlying relational database. The focus here is on soundness and completeness. We review previous results and present a new rewriting algorithm for rather general types of ontological constraints (description logics). In particular, we show how a conjunctive query (CQ) against an enterprise ontology can be compiled into a union of conjunctive queries (UCQ) against the underlying database. Ontological query optimization, in this context, attempts to improve this process so as to produce a small and cost-effective output UCQ. We review existing optimization methods, and propose an effective new method that works for Linear Datalog±, an ontology language that encompasses well-known description logics of the DL-Lite family.

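    As a toy illustration of the rewriting step, suppose the only ontological constraints are atomic class inclusions (e.g., Professor ⊑ Teacher). A query over one concept then compiles into a union over that concept and all of its (transitive) subclasses, each disjunct evaluated directly on the stored data. The axioms below are hypothetical, and the paper's algorithm handles far more general constraints.

    ```python
    # Hypothetical ontology: subclass -> superclass inclusion axioms.
    AXIOMS = {
        "Professor": "Teacher",
        "Lecturer": "Teacher",
        "Teacher": "Employee",
    }

    def rewrite(concept):
        """Compile the query q(x) :- concept(x) into the set of concepts
        whose stored instances answer it; each one is a UCQ disjunct."""
        result = {concept}
        changed = True
        while changed:
            changed = False
            for sub, sup in AXIOMS.items():
                if sup in result and sub not in result:
                    result.add(sub)
                    changed = True
        return result

    print(rewrite("Employee"))
    # {'Employee', 'Teacher', 'Professor', 'Lecturer'}
    ```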
  • Playing games with databases

    Page(s): 14
    PDF (53 KB) | HTML

    Scalability is a fundamental problem in the development of computer games and massively multiplayer online games (MMOs). Players always demand more - more polygons, more physics particles, more interesting AI behavior, more monsters, more simultaneous players and interactions, and larger virtual worlds. Scalability has long been a focus of the database research community. However, the games community has done little to exploit database research. Most game developers think of databases as persistence solutions, designed to store and read game state. But the advantages of database systems go way beyond persistence. Database research has dealt with the beautiful question of efficiently processing declarative languages, and the resulting research questions have touched many areas of computer science. In this talk, I will show how the idea of declarative processing from databases can be applied to computer games and simulations. I will describe our journey from declarative to imperative scripting languages for computer games and simulations, introducing the state-effect pattern, a design pattern that enables developers to design games and simulations that can be programmed imperatively, but processed declaratively. I will then show how we can use database techniques to run scalable games and simulations cheaply in the cloud. I will then discuss how we can use ideas from optimistic concurrency control to scale interactions between players in MMOs, and I will describe how we can use these techniques to build elastic transaction processing infrastructure for the cloud.

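    A minimal sketch of the state-effect pattern as the abstract describes it (assumed semantics, simplified from the underlying papers): during a tick, scripts read the old state and only queue effects; the queued effects are combined and applied at the tick boundary, so scripts are free of read-write conflicts and can be processed declaratively or in parallel.

    ```python
    class Unit:
        def __init__(self, hp):
            self.hp = hp                 # state: read-only during a tick
            self._damage_effects = []    # effects queued within the tick

        def attack(self, target, damage):
            target._damage_effects.append(damage)   # queue, don't mutate

    def end_of_tick(units):
        # Apply all queued effects at once; here they combine by summing.
        for u in units:
            u.hp -= sum(u._damage_effects)
            u._damage_effects.clear()

    a, b = Unit(hp=100), Unit(hp=100)
    a.attack(b, 30)
    b.attack(a, 20)         # order of attacks within the tick is irrelevant
    end_of_tick([a, b])
    print(a.hp, b.hp)       # 80 70
    ```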
  • Interactive itinerary planning

    Page(s): 15 - 26
    PDF (937 KB) | HTML

    Planning an itinerary when traveling to a city involves substantial effort in choosing Points-of-Interest (POIs), deciding in which order to visit them, and accounting for the time it takes to visit each POI and transit between them. Several online services address different aspects of itinerary planning, but none of them provides an interactive interface where users give feedback and iteratively construct their itineraries based on personal interests and time budget. In this paper, we formalize interactive itinerary planning as an iterative process where, at each step: (1) the user provides feedback on POIs selected by the system, (2) the system recommends the best itineraries based on all feedback so far, and (3) the system further selects a new set of POIs, with optimal utility, to solicit feedback for at the next step. This iterative process stops when the user is satisfied with the recommended itinerary. We show that computing an itinerary is NP-complete even for simple itinerary scoring functions, and that POI selection is NP-complete. We develop heuristics and optimizations for a specific case where the score of an itinerary is proportional to the number of desired POIs it contains. Our extensive experiments show that our algorithms are efficient and return high-quality itineraries.

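    For intuition about the special case the paper optimizes (itinerary score proportional to the number of desired POIs), here is a naive greedy baseline under a time budget. It is not the paper's algorithm, and travel_time and visit_time are assumed to be caller-supplied functions.

    ```python
    def greedy_itinerary(desired_pois, start, budget, travel_time, visit_time):
        """Visit the nearest still-desired POI while the budget allows."""
        itinerary, here, remaining = [], start, set(desired_pois)
        while remaining:
            nxt = min(remaining, key=lambda p: travel_time(here, p))
            cost = travel_time(here, nxt) + visit_time(nxt)
            if cost > budget:
                break
            budget -= cost
            itinerary.append(nxt)
            remaining.remove(nxt)
            here = nxt
        return itinerary
    ```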
  • CubeLSI: An effective and efficient method for searching resources in social tagging systems

    Page(s): 27 - 38
    PDF (604 KB) | HTML

    In a social tagging system, resources (such as photos, videos and web pages) are associated with tags. These tags allow the resources to be effectively searched through tag-based keyword matching using traditional IR techniques. We note that in many such systems, tags of a resource are often assigned by a diverse audience of casual users (taggers). This leads to two issues that gravely affect the effectiveness of resource retrieval: (1) Noise: tags are picked from an uncontrolled vocabulary and are assigned by untrained taggers. The tags are thus noisy features in resource retrieval. (2) A multitude of aspects: different taggers focus on different aspects of a resource. Representing a resource using a flattened bag of tags ignores this important diversity of taggers. To improve the effectiveness of resource retrieval in social tagging systems, we propose CubeLSI - a technique that extends traditional LSI to include taggers as another dimension of the feature space of resources. We compare CubeLSI against a number of other tag-based retrieval models and show that it significantly outperforms the other models in terms of retrieval accuracy. We also prove two interesting theorems that allow CubeLSI to be computed very efficiently despite the much enlarged feature space it employs.

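    The gist of adding taggers as a third dimension can be imitated with plain linear algebra: view the data as a tagger x tag x resource cube, so each resource is a (tagger, tag) slice rather than a flat bag of tags, then smooth with a truncated SVD as in classical LSI. This toy version uses random data as a stand-in and ignores the paper's actual algorithm and its efficiency theorems.

    ```python
    import numpy as np

    T, G, R, K = 4, 6, 5, 2          # taggers, tags, resources, latent rank
    cube = np.random.rand(T, G, R)   # cube[t, g, r]: tagger t tagged r with g

    # Unfold: each resource becomes one column of (tagger, tag) features.
    X = cube.reshape(T * G, R)

    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    X_k = U[:, :K] @ np.diag(s[:K]) @ Vt[:K, :]   # rank-K denoised features

    # Rank resources by cosine similarity to a query in the same space,
    # e.g. "find resources similar to resource 0".
    query = X[:, 0]
    scores = X_k.T @ query / (np.linalg.norm(X_k, axis=0)
                              * np.linalg.norm(query))
    print(np.argsort(-scores))
    ```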
  • Adding regular expressions to graph reachability and pattern queries

    Page(s): 39 - 50
    PDF (526 KB) | HTML

    It is increasingly common to find graphs in which edges bear different types, indicating a variety of relationships. For such graphs we propose a class of reachability queries and a class of graph patterns, in which an edge is specified with a regular expression of a certain form, expressing the connectivity in a data graph via edges of various types. In addition, we define graph pattern matching based on a revised notion of graph simulation. On graphs in emerging applications such as social networks, we show that these queries are capable of finding more sensible information than their traditional counterparts. Better still, their increased expressive power does not come with extra complexity. Indeed, (1) we investigate their containment and minimization problems, and show that these fundamental problems are in quadratic time for reachability queries and are in cubic time for pattern queries. (2) We develop an algorithm for answering reachability queries, in quadratic time as for their traditional counterpart. (3) We provide two cubic-time algorithms for evaluating graph pattern queries based on extended graph simulation, as opposed to the NP-completeness of graph pattern matching via subgraph isomorphism. (4) The effectiveness, efficiency and scalability of these algorithms are experimentally verified using real-life data and synthetic data.

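    A standard way to evaluate such reachability queries (a generic illustration, not necessarily the paper's algorithm) is a BFS over the product of the data graph and a finite automaton for the regular expression:

    ```python
    from collections import deque

    def regex_reachable(edges, u, v, dfa, start, accepting):
        """edges: list of (src, edge_type, dst) triples;
        dfa: dict mapping (state, edge_type) -> next state."""
        seen = {(u, start)}
        queue = deque([(u, start)])
        while queue:
            node, state = queue.popleft()
            if node == v and state in accepting:
                return True
            for src, etype, dst in edges:
                if src == node and (state, etype) in dfa:
                    nxt = (dst, dfa[(state, etype)])
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
        return False

    # Does u reach v via one "fn" edge followed by any number of "fa" edges?
    edges = [("u", "fn", "w"), ("w", "fa", "x"), ("x", "fa", "v")]
    dfa = {(0, "fn"): 1, (1, "fa"): 1}     # automaton for fn fa*
    print(regex_reachable(edges, "u", "v", dfa, 0, {1}))   # True
    ```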
  • Efficient core decomposition in massive networks

    Page(s): 51 - 62
    PDF (342 KB) | HTML

    The k-core of a graph is the largest subgraph in which every vertex is connected to at least k other vertices within the subgraph. Core decomposition finds the k-core of the graph for every possible k. Past studies have shown important applications of core decomposition, such as in the study of the properties of large networks (e.g., sustainability, connectivity, centrality, etc.), for solving NP-hard problems efficiently in real networks (e.g., maximum clique finding, densest subgraph approximation, etc.), and for large-scale network fingerprinting and visualization. The k-core is a well-accepted concept partly because there exists a simple and efficient algorithm for core decomposition: recursively removing the lowest-degree vertices and their incident edges. However, this algorithm requires random access to the graph and hence assumes the entire graph can be kept in main memory. Meanwhile, real-world networks such as online social networks have become exceedingly large in recent years and still keep growing at a steady rate. In this paper, we propose the first external-memory algorithm for core decomposition in massive graphs. When the memory is large enough to hold the graph, our algorithm achieves performance comparable to the in-memory algorithm. When the graph is too large to be kept in memory, our algorithm requires only O(kmax) scans of the graph, where kmax is the largest core number of the graph. We demonstrate the efficiency of our algorithm on real networks with up to 52.9 million vertices and 1.65 billion edges.

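    The simple in-memory algorithm the abstract refers to fits in a few lines; the paper's contribution is an external-memory counterpart for graphs that do not fit in RAM. A sketch of the classical peeling baseline:

    ```python
    import heapq

    def core_decomposition(adj):
        """adj: dict vertex -> set of neighbours; returns vertex -> core number.
        Repeatedly remove a minimum-degree vertex; a vertex's core number is
        the largest degree threshold in force when it is removed."""
        degree = {v: len(ns) for v, ns in adj.items()}
        heap = [(d, v) for v, d in degree.items()]
        heapq.heapify(heap)
        core, removed, k = {}, set(), 0
        while heap:
            d, v = heapq.heappop(heap)
            if v in removed or d != degree[v]:
                continue                   # stale heap entry, skip
            k = max(k, d)
            core[v] = k
            removed.add(v)
            for u in adj[v]:
                if u not in removed:
                    degree[u] -= 1
                    heapq.heappush(heap, (degree[u], u))
        return core

    # A triangle with a pendant vertex: the triangle is the 2-core.
    print(core_decomposition({1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}))
    # {4: 1, 1: 2, 2: 2, 3: 2}
    ```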
  • T-verifier: Verifying truthfulness of fact statements

    Page(s): 63 - 74
    PDF (459 KB) | HTML

    The Web has become the most popular place for people to acquire information. Unfortunately, it is widely recognized that the Web contains a significant amount of untruthful information. As a result, good tools are needed to help Web users determine the truthfulness of certain information. In this paper, we propose a two-step method that aims to determine whether a given statement is truthful, and if it is not, to find the truthful statement most related to the given one. In the first step, we try to find a small number of alternative statements on the same topic as the given statement and make sure that one of these statements is truthful. In the second step, we identify the truthful statement from among the given statement and the alternatives. Both steps rely heavily on analyzing various features extracted from the search results returned by a popular search engine for appropriate queries. Our experimental results show that the best variation of the proposed method can achieve a precision of about 90%.

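    The two-step method reduces to a small pipeline skeleton. Every helper below (search, generate_alternatives, score) is a hypothetical placeholder; in the paper these roles are filled by features extracted from search-engine results for appropriate queries.

    ```python
    def verify(statement, search, generate_alternatives, score):
        """search(q) -> result snippets; score(stmt, results) -> truth score.
        Step 1: gather alternative statements on the same topic.
        Step 2: pick the statement the features rank as truthful."""
        candidates = [statement] + generate_alternatives(statement, search)
        best = max(candidates, key=lambda s: score(s, search(s)))
        return best == statement, best   # (input truthful?, truthful statement)
    ```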
  • Flexible use of cloud resources through profit maximization and price discrimination

    Page(s): 75 - 86
    PDF (346 KB) | HTML

    Modern frameworks, such as Hadoop, combined with the abundance of computing resources in the cloud, offer a significant opportunity to address long-standing challenges in distributed processing. Infrastructure-as-a-Service clouds reduce the investment cost of renting a large data center, while distributed processing frameworks are capable of efficiently harvesting the rented physical resources. Yet the performance users get out of these resources varies greatly, because the cloud hardware is shared by all users. The value for money that cloud consumers achieve makes resource-sharing policies a key factor in both cloud performance and user satisfaction. In this paper, we employ microeconomics to direct the allotment of cloud resources for consumption in highly scalable master-worker virtual infrastructures. Our approach is developed on two premises: the cloud consumer always has a budget, and cloud physical resources are limited. Using our approach, the cloud administration is able to maximize per-user financial profit. We show that there is an equilibrium point at which our method achieves resource sharing proportional to each user's budget. Ultimately, this approach allows us to answer the question of how many resources a consumer should request from the seemingly endless pool provided by the cloud.

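    The budget-proportional equilibrium mentioned in the abstract admits a one-line worked example under the simplest possible reading (the paper's pricing and profit model is richer): with total capacity R and budgets b_i, user i receives R * b_i / sum(b_j).

    ```python
    def proportional_shares(total_resources, budgets):
        """Split capacity in proportion to each user's budget."""
        total_budget = sum(budgets.values())
        return {user: total_resources * b / total_budget
                for user, b in budgets.items()}

    print(proportional_shares(100, {"alice": 30, "bob": 10}))
    # {'alice': 75.0, 'bob': 25.0}
    ```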
  • Intelligent management of virtualized resources for database systems in cloud environment

    Page(s): 87 - 98
    PDF (1078 KB) | HTML

    In a cloud computing environment, resources are shared among different clients. Intelligently managing and allocating resources among various clients is important for system providers, whose business model relies on managing the infrastructure resources in a cost-effective manner while satisfying the client service-level agreements (SLAs). In this paper, we address the issue of how to intelligently manage the resources in a shared cloud database system and present SmartSLA, a cost-aware resource management system. SmartSLA consists of two main components: the system modeling module and the resource allocation decision module. The system modeling module uses machine learning techniques to learn a model that describes the potential profit margins for each client under different resource allocations. Based on the learned model, the resource allocation decision module dynamically adjusts the resource allocations in order to achieve optimal profits. We evaluate SmartSLA using the TPC-W benchmark with workload characteristics derived from real-life systems. The performance results indicate that SmartSLA can successfully compute predictive models under different hardware resource allocations, such as CPU and memory, as well as database-specific resources, such as the number of replicas in the database systems. The experimental results also show that SmartSLA can provide intelligent service differentiation according to factors such as variable workloads, SLA levels, and resource costs, and can deliver improved profit margins.

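    The two modules map naturally onto a learn-then-search loop. The sketch below substitutes a linear least-squares fit for the paper's machine-learning model and a grid search for its decision module; all data and resource names are synthetic stand-ins.

    ```python
    from itertools import product
    import numpy as np

    # Observed (cpu_share, memory_gb) -> profit samples (synthetic).
    X = np.array([[0.2, 1], [0.4, 2], [0.6, 2], [0.8, 4]], dtype=float)
    y = np.array([1.0, 2.1, 2.7, 3.9])

    # System modeling module: fit profit ~ w1*cpu + w2*mem + w0.
    A = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)

    def predicted_profit(cpu, mem):
        return w @ np.array([cpu, mem, 1.0])

    # Decision module: search feasible allocations for the best prediction.
    grid = product(np.linspace(0.1, 1.0, 10), [1, 2, 4])
    best = max(grid, key=lambda alloc: predicted_profit(*alloc))
    print(best)
    ```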
  • Extensibility and Data Sharing in evolving multi-tenant databases

    Page(s): 99 - 110
    PDF (834 KB) | HTML

    Software-as-a-Service applications commonly consolidate multiple businesses into the same database to reduce costs. This practice makes it harder to implement several essential features of enterprise applications. The first is support for master data, which should be shared rather than replicated for each tenant. The second is application modification and extension, which applies both to the database schema and the master data it contains. The third is evolution of the schema and master data, which occurs as the application and its extensions are upgraded. These features cannot be easily implemented in a traditional DBMS and, to the extent that they are currently offered at all, they are generally implemented within the application layer. This approach reduces the DBMS to a 'dumb data repository' that only stores data rather than managing it. In addition, it complicates development of the application since many DBMS features have to be re-implemented. Instead, a next-generation multi-tenant DBMS should provide explicit support for Extensibility, Data Sharing and Evolution. As these three features are strongly related, they cannot be implemented independently from each other. Therefore, we propose FLEXSCHEME, which captures all three aspects in one integrated model. In this paper, we focus on efficient storage mechanisms for this model and present a novel versioning mechanism, called XOR Delta, which is based on XOR encoding and is optimized for main-memory DBMSs.

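    XOR encoding makes versioning symmetric: one and the same operation computes a delta and reconstructs a version from it, which suits main-memory DBMSs. A minimal byte-level sketch, assuming equal-length encoded records (the paper applies the idea to versioned schemas and shared master data):

    ```python
    def xor_delta(base: bytes, other: bytes) -> bytes:
        """XOR two equal-length byte strings; works as both diff and patch."""
        assert len(base) == len(other)
        return bytes(a ^ b for a, b in zip(base, other))

    base  = b"tenant default row"
    v2    = b"tenant custom  row"
    delta = xor_delta(base, v2)            # store this instead of v2
    assert xor_delta(base, delta) == v2    # the same call reconstructs v2
    ```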
  • Semantic stream query optimization exploiting dynamic metadata

    Page(s): 111 - 122
    PDF (651 KB) | HTML

    Data stream management systems (DSMS) processing long-running queries over large volumes of stream data must typically deliver time-critical responses. We propose the first semantic query optimization (SQO) approach that utilizes dynamic substream metadata at runtime to find a more efficient query plan than the one selected at compilation time. We identify four SQO techniques guaranteed to result in performance gains. Based on classic satisfiability theory, we then design a lightweight query optimization algorithm that efficiently detects SQO opportunities at runtime. At the logical level, our algorithm instantiates multiple concurrent SQO plans, each processing different, partially overlapping substreams. Our novel execution paradigm employs multi-modal operators to support the execution of these concurrent SQO logical plans in a single physical plan. This highly agile execution strategy reduces resource utilization while supporting lightweight adaptivity. Our extensive experimental study in the CAPE stream processing system, using both synthetic and real data, confirms that our optimization techniques reduce query execution times by up to 60% compared to the traditional approach.

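    The runtime opportunity is easy to illustrate: if a substream arrives with metadata guaranteeing a value range, a satisfiability check can prove a query filter always true (so the filter is dropped) or always false (so the substream is dropped). The interval test below is a simple stand-in for the paper's satisfiability-based reasoning.

    ```python
    def optimize_filter(meta_lo, meta_hi, pred_lo, pred_hi):
        """Metadata guarantees attr in [meta_lo, meta_hi]; the query keeps
        tuples with attr in [pred_lo, pred_hi]."""
        if meta_hi < pred_lo or meta_lo > pred_hi:
            return "drop substream"    # filter unsatisfiable on this substream
        if pred_lo <= meta_lo and meta_hi <= pred_hi:
            return "drop filter"       # filter implied by the metadata
        return "keep filter"           # must still be evaluated per tuple

    print(optimize_filter(0, 10, 20, 30))   # drop substream
    print(optimize_filter(5, 8, 0, 100))    # drop filter
    print(optimize_filter(5, 50, 0, 20))    # keep filter
    ```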