
2013 IEEE 29th International Conference on Data Engineering (ICDE)

Date: 8-12 April 2013


Showing results 1 - 25 of 158
  • [USB label]

    Page(s): 1
  • ICDE 2013 Conference [cover]

    Page(s): 1
  • Hub page

    Page(s): 1
  • Session list

    Page(s): 1
  • Table of contents

    Page(s): 1 - 16
  • Author index

    Page(s): 1 - 12
  • Detailed author index

    Page(s): 1 - 63
  • The end of indexes

    Page(s): 1
  • ICDE 2013 [Abstracts book]

    Page(s): 1 - 128

    Presents abstracts for the articles comprising the conference proceedings.

  • [PDF Reader FAQ and support]

    Page(s): 1
  • Frequently asked questions

    Page(s): 1 - 6
  • Message from the ICDE 2013 program committee and general chairs

    Page(s): i - ii
  • Panel: Big data for the public

    Page(s): iii

    Summary form only given. While data are now being produced and collected on unprecedented scales, most of the "big data" remain inaccessible or difficult for the public to use.

  • Proceedings [editors]

    Page(s): 1
  • Committees

    Page(s): iv - xi
  • Patrons and supporters

    Page(s): xii
  • Hardware killed the software star

    Page(s): 1 - 4

    Until relatively recently, the development of data processing applications took place largely ignoring the underlying hardware. Only in niche applications (supercomputing, embedded systems) or in special software (operating systems, database internals, language runtimes) did (some) programmers have to pay attention to the actual hardware where the software would run. In most cases, working atop the abstractions provided by either the operating system or by system libraries was good enough. The constant improvements in processor speed did the rest. The new millennium has radically changed the picture. Driven by multiple needs - e.g., scale, physical constraints, energy limitations, virtualization, business models - hardware architectures are changing at a speed and in ways that current development practices for data processing cannot accommodate. From now on, software will have to be developed paying close attention to the underlying hardware and following strict performance engineering principles. In this paper, several aspects of the ongoing hardware revolution and its impact on data processing are analysed, pointing to the need for new strategies to tackle the challenges ahead.

  • Recent progress towards an ecosystem of structured data on the Web

    Page(s): 5 - 8

    Google Fusion Tables aims to support an ecosystem of structured data on the Web by providing a tool for managing and visualizing data on the one hand, and for searching for and exploring data on the other. This paper describes a few recent developments in our efforts to further the ecosystem.

  • Re-thinking the performance of information processing systems

    Page(s): 9 - 13

    Recent advances in hardware and software technologies have enabled us to re-think how we architect databases to meet the demands of today's information systems. However, this makes existing performance evaluation metrics obsolete. In this paper, I describe SAP HANA, a novel, powerful database platform that leverages the availability of large main memory and massively parallel processors. Based on this, I propose a new, multi-dimensional performance metric that better reflects the value expected from today's complex information systems.

  • CPU and cache efficient management of memory-resident databases

    Page(s): 14 - 25

    Memory-Resident Database Management Systems (MRDBMS) have to be optimized for two resources: CPU cycles and memory bandwidth. To optimize for bandwidth in mixed OLTP/OLAP scenarios, the hybrid or Partially Decomposed Storage Model (PDSM) has been proposed. However, in current implementations, bandwidth savings achieved by partial decomposition come at increased CPU costs. To achieve the desired bandwidth savings without sacrificing CPU efficiency, we combine partially decomposed storage with Just-in-Time (JiT) compilation of queries, thus eliminating CPU-inefficient function calls. Since existing cost-based optimization components are not designed for JiT-compiled query execution, we also develop a novel approach to cost modeling and subsequent storage layout optimization. Our evaluation shows that the JiT-based processor maintains the bandwidth savings of previously presented hybrid query processors but outperforms them by two orders of magnitude due to increased CPU efficiency.

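    A minimal sketch, assuming a hand-picked column grouping, of what a partially decomposed storage layout means in practice (illustration only: the table, groups, and function names below are assumptions, and the paper's JiT query compilation and cost model are not shown).

      # Toy partially decomposed storage: columns accessed together are stored
      # together in one partition, so a scan touches only the partition that
      # holds the scanned column.
      # Logical table: orders(id, customer, status, amount, tax) (assumed example)
      layout = {
          "txn_group":  ["id", "customer", "status"],  # columns used by OLTP transactions
          "scan_group": ["amount", "tax"],             # columns scanned by analytics
      }
      storage = {group: [] for group in layout}

      def insert(row):
          """row: dict of column -> value; each partition keeps only its own columns."""
          for group, cols in layout.items():
              storage[group].append(tuple(row[c] for c in cols))

      def scan(group, col):
          """Read one column; only the partition holding it is touched."""
          idx = layout[group].index(col)
          return [rec[idx] for rec in storage[group]]

      insert({"id": 1, "customer": "acme", "status": "open", "amount": 90.0, "tax": 9.0})
      insert({"id": 2, "customer": "zenith", "status": "paid", "amount": 50.0, "tax": 5.0})
      print(sum(scan("scan_group", "amount")))  # analytic query reads only scan_group
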
  • Identifying hot and cold data in main-memory databases

    Page(s): 26 - 37

    Main memories are becoming sufficiently large that most OLTP databases can be stored entirely in main memory, but this may not be the best solution. OLTP workloads typically exhibit skewed access patterns where some records are hot (frequently accessed) but many records are cold (infrequently or never accessed). It is more economical to store the coldest records on secondary storage such as flash. As a first step towards managing cold data in databases optimized for main memory, we investigate how to efficiently identify hot and cold data. We propose to log record accesses - possibly only a sample to reduce overhead - and perform offline analysis to estimate record access frequencies. We present four estimation algorithms based on exponential smoothing and experimentally evaluate their efficiency and accuracy. We find that exponential smoothing produces very accurate estimates, leading to higher hit rates than the best caching techniques. Our most efficient algorithm is able to analyze a log of 1B accesses in sub-second time on a workstation-class machine.

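    A minimal sketch, assuming a single decay constant and a fixed hot-set fraction, of classifying records by exponentially smoothed access-frequency estimates computed offline from an access log (illustration only: ALPHA, hot_fraction, and the function names are assumptions, not the authors' four algorithms).

      from collections import defaultdict

      ALPHA = 0.05  # smoothing constant (assumed value)

      def estimate_frequencies(access_log):
          """access_log: record ids in access order (possibly only a sample).
          Returns {record_id: exponentially smoothed access-frequency estimate}."""
          est = defaultdict(float)
          last_seen = {}
          for t, rid in enumerate(access_log):
              if rid in last_seen:
                  # decay the old estimate over the slots since the previous access
                  est[rid] *= (1.0 - ALPHA) ** (t - last_seen[rid] - 1)
              est[rid] = ALPHA + (1.0 - ALPHA) * est[rid]  # smoothing update for this access
              last_seen[rid] = t
          for rid, t in last_seen.items():
              # bring every estimate forward to the end of the log before ranking
              est[rid] *= (1.0 - ALPHA) ** (len(access_log) - 1 - t)
          return dict(est)

      def split_hot_cold(estimates, hot_fraction=0.25):
          ranked = sorted(estimates, key=estimates.get, reverse=True)
          k = max(1, int(len(ranked) * hot_fraction))
          return set(ranked[:k]), set(ranked[k:])

      log = ['a', 'b', 'a', 'a', 'c', 'a', 'a', 'b', 'a']  # 'a' is clearly hot
      hot, cold = split_hot_cold(estimate_frequencies(log))
      print(hot, cold)  # 'a' lands in the hot set; 'b' and 'c' stay cold
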
  • The adaptive radix tree: ARTful indexing for main-memory databases

    Page(s): 38 - 49

    Main memory capacities have grown up to a point where most databases fit into RAM. For main-memory database systems, index structure performance is a critical bottleneck. Traditional in-memory data structures like balanced binary search trees are not efficient on modern hardware, because they do not optimally utilize on-CPU caches. Hash tables, also often used for main-memory indexes, are fast but only support point queries. To overcome these shortcomings, we present ART, an adaptive radix tree (trie) for efficient indexing in main memory. Its lookup performance surpasses highly tuned, read-only search trees, while supporting very efficient insertions and deletions as well. At the same time, ART is very space efficient and solves the problem of excessive worst-case space consumption, which plagues most radix trees, by adaptively choosing compact and efficient data structures for internal nodes. Even though ART's performance is comparable to hash tables, it maintains the data in sorted order, which enables additional operations like range scan and prefix lookup.

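    A minimal sketch, far simpler than ART itself, of the core idea of adaptively sized trie nodes: small fan-outs use a compact sorted array, large fan-outs switch to a direct-indexed 256-slot array (illustration only: SMALL_LIMIT and all names are assumptions; path compression, lazy leaf expansion, and the real Node4/16/48/256 layouts are omitted).

      SMALL_LIMIT = 4  # fan-out at which a node changes representation (assumed)

      class Node:
          def __init__(self):
              self.small = []    # compact form: sorted list of (byte, child) pairs
              self.big = None    # large form: 256-slot child array
              self.value = None  # payload for a key ending at this node

          def child(self, byte):
              if self.big is not None:
                  return self.big[byte]
              for b, c in self.small:
                  if b == byte:
                      return c
              return None

          def put_child(self, byte, child):
              if self.big is None and len(self.small) >= SMALL_LIMIT:
                  self.big = [None] * 256          # grow: switch to direct indexing
                  for b, c in self.small:
                      self.big[b] = c
                  self.small = None
              if self.big is not None:
                  self.big[byte] = child
              else:
                  self.small.append((byte, child))
                  self.small.sort()                # keep bytes ordered for range scans

      def insert(root, key: bytes, value):
          node = root
          for byte in key:
              nxt = node.child(byte)
              if nxt is None:
                  nxt = Node()
                  node.put_child(byte, nxt)
              node = nxt
          node.value = value

      def lookup(root, key: bytes):
          node = root
          for byte in key:
              node = node.child(byte)
              if node is None:
                  return None
          return node.value

      root = Node()
      insert(root, b"art", 1)
      insert(root, b"ark", 2)
      print(lookup(root, b"ark"))  # -> 2
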
  • Finding connected components in map-reduce in logarithmic rounds

    Page(s): 50 - 61

    Given a large graph G = (V, E) with millions of nodes and edges, how do we compute its connected components efficiently? Recent work addresses this problem in map-reduce, where a fundamental trade-off exists between the number of map-reduce rounds and the communication of each round. Denoting by d the diameter of the graph and by n the number of nodes in the largest component, all prior map-reduce techniques either require a linear, Θ(d), number of rounds, or quadratic, Θ(n|V| + |E|), communication per round. We propose here two efficient map-reduce algorithms: (i) Hash-Greater-to-Min, a randomized algorithm based on PRAM techniques that requires O(log n) rounds and O(|V| + |E|) communication per round, and (ii) Hash-to-Min, a novel algorithm that provably finishes in O(log n) iterations for path graphs. The proof technique used for Hash-to-Min is novel but not tight, and the algorithm is actually faster than Hash-Greater-to-Min in practice. We conjecture that it requires 2 log d rounds and 3(|V| + |E|) communication per round, as demonstrated in our experiments. Using secondary sorting, a standard map-reduce feature, we scale Hash-to-Min to graphs with very large connected components. Our techniques for connected components can be applied to clustering as well. We propose a novel algorithm for agglomerative single linkage clustering in map-reduce. This is the first map-reduce algorithm for clustering in at most O(log n) rounds, where n is the size of the largest cluster. We show the effectiveness of all our algorithms through detailed experiments on large synthetic as well as real-world datasets.

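    A minimal sketch, run on a single machine, of the Hash-to-Min rounds described in the abstract: every node keeps a cluster, sends the whole cluster to its minimum member and the minimum to everyone else, then unions what it receives (illustration only: the driver loop and names are assumptions; a real deployment would run each round as a map-reduce job).

      from collections import defaultdict

      def hash_to_min(edges):
          nodes = {v for e in edges for v in e}
          cluster = {v: {v} for v in nodes}            # C(v) starts as {v} plus neighbours
          for u, w in edges:
              cluster[u].add(w)
              cluster[w].add(u)

          while True:
              inbox = defaultdict(set)
              for v, c in cluster.items():             # "map" phase
                  m = min(c)
                  inbox[m] |= c                        # send the whole cluster to the min
                  for u in c:
                      if u != m:
                          inbox[u].add(m)              # send only the min to the others
              new_cluster = {v: inbox[v] | {v} for v in nodes}   # "reduce" phase
              if new_cluster == cluster:               # stop once nothing changes
                  break
              cluster = new_cluster

          # each component ends up held, in full, at its minimum node
          return [sorted(c) for v, c in cluster.items() if v == min(c)]

      print(hash_to_min([(1, 2), (2, 3), (5, 6)]))  # components [1, 2, 3] and [5, 6]
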
  • Enumerating subgraph instances using map-reduce

    Page(s): 62 - 73

    The theme of this paper is how to find all instances of a given “sample” graph in a larger “data graph,” using a single round of map-reduce. For the simplest sample graph, the triangle, we improve upon the best known such algorithm. We then examine the general case, considering both the communication cost between mappers and reducers and the total computation cost at the reducers. To minimize communication cost, we exploit the techniques of [1] for computing multiway joins (evaluating conjunctive queries) in a single map-reduce round. Several methods are shown for translating sample graphs into a union of conjunctive queries with as few queries as possible. We also address the matter of optimizing computation cost. Many serial algorithms are shown to be “convertible,” in the sense that it is possible to partition the data graph, explore each partition in a separate reducer, and have the total computation cost at the reducers be of the same order as the computation cost of the serial algorithm.

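    A minimal sketch, not the paper's improved algorithm, of the baseline single-round strategy it builds on: hash nodes into buckets, replicate each edge to every reducer responsible for a triple of buckets covering its endpoints, and let exactly one "home" reducer report each triangle (illustration only: the bucket count, hash, and names are assumptions).

      from itertools import combinations_with_replacement

      B = 3                              # number of node buckets (assumed)
      bucket = lambda v: hash(v) % B

      def map_edges(edges):
          """Replicate each edge to the reducers (bucket triples) that may need it."""
          shuffled = {key: [] for key in combinations_with_replacement(range(B), 3)}
          for u, v in edges:
              bu, bv = bucket(u), bucket(v)
              for key in shuffled:
                  if bu in key and bv in key:   # crude over-replication; correctness is unaffected
                      shuffled[key].append((u, v))
          return shuffled

      def reduce_triangles(key, part):
          adj = {}
          for u, v in part:
              adj.setdefault(u, set()).add(v)
              adj.setdefault(v, set()).add(u)
          found = set()
          for u, v in part:
              for w in adj[u] & adj[v]:
                  tri = tuple(sorted((u, v, w)))
                  # report a triangle only at its "home" reducer to avoid duplicates
                  if tuple(sorted(map(bucket, tri))) == key:
                      found.add(tri)
          return found

      edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
      triangles = set()
      for key, part in map_edges(edges).items():
          triangles |= reduce_triangles(key, part)
      print(triangles)  # -> {(1, 2, 3)}
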
  • Scalable maximum clique computation using MapReduce

    Page(s): 74 - 85

    We present a scalable and fault-tolerant solution for the maximum clique problem based on the MapReduce framework. The key contribution that enables us to effectively use MapReduce is a recursive partitioning method that partitions the graph into several subgraphs of similar size. After partitioning, the maximum cliques of the different partitions can be computed independently, and the computation is sped up using a branch and bound method. Our experiments show that our approach leads to good scalability, which is unachievable by other partitioning methods since they result in partitions of different sizes and hence lead to load imbalance. Our method is more scalable than an MPI algorithm, and is simpler and more fault tolerant.

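    A minimal sketch, not the paper's distributed method, of the kind of sequential branch-and-bound search that could run inside one partition after the recursive partitioning step (illustration only: the simple size bound and all names are assumptions).

      def max_clique(adj):
          """adj: {vertex: set of neighbours} (symmetric). Returns one maximum clique."""
          best = set()

          def expand(clique, candidates):
              nonlocal best
              if len(clique) > len(best):
                  best = set(clique)
              # bound: even adding every remaining candidate cannot beat `best`
              if len(clique) + len(candidates) <= len(best):
                  return
              for v in list(candidates):
                  candidates.remove(v)                        # later iterations explore cliques without v
                  expand(clique | {v}, candidates & adj[v])   # this call explores cliques with v

          expand(set(), set(adj))
          return best

      adj = {
          1: {2, 3, 4, 5},
          2: {1, 3, 4},
          3: {1, 2, 4},
          4: {1, 2, 3},
          5: {1},
      }
      print(max_clique(adj))  # -> {1, 2, 3, 4}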