2014 IEEE International Congress on Big Data

June 27-July 2, 2014

Entries 1-25 of 138
  • [Front cover]

    Publication Year: 2014, Page(s): C4
    PDF (1561 KB) | Freely Available from IEEE
  • [Title page i]

    Publication Year: 2014, Page(s): i
    PDF (26 KB) | Freely Available from IEEE
  • [Title page iii]

    Publication Year: 2014, Page(s): iii
    PDF (107 KB) | Freely Available from IEEE
  • [Copyright notice]

    Publication Year: 2014, Page(s): iv
    PDF (119 KB) | Freely Available from IEEE
  • Table of contents

    Publication Year: 2014, Page(s): v-xvi
    PDF (165 KB) | Freely Available from IEEE
  • Message from the Organizing Committee

    Publication Year: 2014, Page(s): xvii
    PDF (131 KB) | HTML | Freely Available from IEEE
  • Technical Program Committee

    Publication Year: 2014, Page(s): xviii-xxii
    PDF (147 KB) | Freely Available from IEEE
  • External reviewers

    Publication Year: 2014, Page(s): xxiii
    PDF (122 KB) | Freely Available from IEEE
  • Satellite Sessions Organizing Committees

    Publication Year: 2014, Page(s): xxiv-xxvii
    PDF (137 KB) | Freely Available from IEEE
  • IEEE Computer Society Technical Committee on Services Computing

    Publication Year: 2014, Page(s): xxviii
    PDF (280 KB) | Freely Available from IEEE
  • Service society

    Publication Year: 2014, Page(s): xxix
    PDF (150 KB) | Freely Available from IEEE
  • GraphLens: Mining Enterprise Storage Workloads Using Graph Analytics

    Publication Year: 2014, Page(s): 1-8
    Cited by: Papers (1)
    PDF (11730 KB) | HTML

    Conventional methods used to analyze storage workloads have been centered on relational database technology combined with attribute-based classification algorithms. This paper presents a novel analytic architecture, GraphLens, for mining and analyzing real-world storage traces. The design of our GraphLens system embodies three unique features. First, we model storage traces as heterogeneous trace...

  • FSM-H: Frequent Subgraph Mining Algorithm in Hadoop

    Publication Year: 2014, Page(s): 9-16
    PDF (495 KB) | HTML

    Frequent subgraph mining (FSM) is an important task for exploratory data analysis on graph data. Over the years, many algorithms have been proposed to solve this task. These algorithms assume that the data structure of the mining task is small enough to fit in the main memory of a computer. However, as the real-world graph data grows, both in size and quantity, such an assumption does not hold any...

  • Rectangle Counting in Large Bipartite Graphs

    Publication Year: 2014, Page(s): 17-24
    Cited by: Papers (1)
    PDF (532 KB) | HTML

    Rectangles are the smallest cycles (i.e., cycles of length 4) and most elementary sub-structures in a bipartite graph. Similar to triangle counting in uni-partite graphs, rectangle counting has many important applications where data is modeled as bipartite graphs. However, efficient algorithms for rectangle counting are lacking. We propose three different types of algorithms to cope with different...

  • A Parallel Spatial Co-location Mining Algorithm Based on MapReduce

    Publication Year: 2014, Page(s): 25-31
    Cited by: Papers (1)
    PDF (450 KB) | HTML

    Spatial association rule mining is a useful tool for discovering correlations and interesting relationships among spatial objects. Co-locations, or sets of spatial events which are frequently observed together in close proximity, are particularly useful for discovering their spatial dependencies. Although a number of spatial co-location mining algorithms have been developed, the computation of co-...

  • Energy-Aware Scheduling of MapReduce Jobs

    Publication Year: 2014, Page(s): 32-39
    Cited by: Papers (7)
    PDF (485 KB) | HTML

    The majority of large-scale data-intensive applications executed by data centers are based on MapReduce or its open-source implementation, Hadoop. Such applications are executed on large clusters requiring large amounts of energy, making the energy costs a large fraction of the data center's overall costs. Therefore, minimizing the energy consumption when executing MapReduce jobs is a critical conc...

  • Vigiles: Fine-Grained Access Control for MapReduce Systems

    Publication Year: 2014, Page(s): 40-47
    Cited by: Papers (7)
    PDF (495 KB) | HTML

    Security concerns surrounding the rise of Big Data systems have stimulated myriad new Big Data security models and implementations over the past few years. A significant disadvantage shared by most of these implementations is that they customize the underlying system source code to enforce new policies, making the customizations difficult to maintain as these layers evolve over time (e.g., over ve...

  • Denial-of-Service Threat to Hadoop/YARN Clusters with Multi-tenancy

    Publication Year: 2014, Page(s): 48-55
    Cited by: Papers (3)
    PDF (596 KB) | HTML

    This paper studies the vulnerability of unconstrained computing resources in Hadoop and the threat of denial-of-service to a Hadoop cluster with multitenancy. We model the problem of how many nodes in a Hadoop cluster can be invaded by a malicious user with given allocated capacity as a k-ping-pong balls to n-boxes problem, and solve the problem by simulation. We construct a discrete event simulat...

  • A Big Data Architecture for Large Scale Security Monitoring

    Publication Year: 2014, Page(s): 56-63
    Cited by: Papers (14)
    PDF (543 KB) | HTML

    Network traffic is a rich source of information for security monitoring. However, the increasing volume of data to treat raises issues, rendering holistic analysis of network traffic difficult. In this paper we propose a solution to cope with the tremendous amount of data to analyse for security monitoring perspectives. We introduce an architecture dedicated to security monitoring of local enterpri...

  • Contextual Anomaly Detection in Big Sensor Data

    Publication Year: 2014, Page(s): 64-71
    Cited by: Papers (5)
    PDF (1434 KB) | HTML

    Performing predictive modelling, such as anomaly detection, in Big Data is a difficult task. This problem is compounded as more and more sources of Big Data are generated from environmental sensors, logging applications, and the Internet of Things. Further, most current techniques for anomaly detection only consider the content of the data source, i.e., the data itself, without concern for the cont...

  • High-Performance Spatial Query Processing on Big Taxi Trip Data Using GPGPUs

    Publication Year: 2014, Page(s): 72-79
    Cited by: Papers (5)
    PDF (541 KB) | HTML

    City-wide GPS-recorded taxi trip data contains rich information for traffic and travel analysis to facilitate transportation planning and urban studies. However, traditional data management techniques are largely incapable of processing big taxi trip data at the scale of hundreds of millions. In this study, we aim at utilizing the General Purpose computing on Graphics Processing Units (GPGPUs) tec...

  • A Compatible LZMA ORC-Based Optimization for High Performance Big Data Load

    Publication Year: 2014, Page(s): 80-87
    Cited by: Papers (2)
    PDF (663 KB) | HTML

    This paper presents several efficient ways to improve data loading and storage optimization in a Hadoop cluster. We design a new method to leverage LZMA and ORC to gain a performance edge, and also improve the ORC implementation in HDFS to achieve a higher compression ratio and better IO throughput. A complete optimization strategy for efficient big data loading, including byte array-oriented, record split, less...

  • Storing a Collection of Differentially Compressed Files Recursively

    Publication Year: 2014, Page(s): 88-95
    PDF (640 KB) | HTML

    A collection of files can be compressed by storing each file in the collection as a delta file: one file refers to several other files. The copy instructions in a delta file could reference other files either in their encoded forms or in their (original) unencoded forms. Because files are stored compressed, the latter approach suffers from a blowout in the number of files that need to be decoded t...

  • XDB - A Novel Database Architecture for Data Analytics as a Service

    Publication Year: 2014, Page(s): 96-103
    PDF (941 KB) | HTML

    Parallel shared-nothing database systems are major platforms for efficiently analyzing large amounts of structured data. However, in order to offer SQL-like services for data analytics in the cloud, providers such as Amazon and Google do not use these systems as a basis. A major reason for this trend is that existing parallel shared-nothing database systems are expensive and that they do not fulfi...

  • DeltaDB: A Scalable Database Design for Time-Varying Schema-Free Data

    Publication Year: 2014, Page(s): 104-111
    PDF (486 KB) | HTML

    DeltaDB is a model for a database consisting of records with no fixed schema whose entire history is captured over time. It is designed to support efficient queries against the current state of the database, any point in the history of the database, and historical data aggregations over time. In this paper, we present the DeltaDB data model, the associated query algebra, and highlight the fundamen...

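The "Rectangle Counting in Large Bipartite Graphs" abstract defines a rectangle as a 4-cycle in a bipartite graph. Its truncated text does not describe the three algorithms the authors propose, so the following is only a minimal baseline sketch, not their method: for every pair of right-side vertices, count the left-side vertices adjacent to both; a pair with c common neighbors closes c(c-1)/2 rectangles. The function name `count_rectangles` and the edge-list input format are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations


def count_rectangles(edges):
    """Count 4-cycles (rectangles) in a bipartite graph.

    Hypothetical baseline: `edges` is an iterable of (u, v) pairs with u on the
    left side and v on the right side of the bipartite graph.
    """
    # Adjacency sets for left-side vertices.
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)

    # For each pair of right-side vertices, count the left-side vertices
    # adjacent to both of them (their common neighbors).
    common = defaultdict(int)
    for adj in neighbors.values():
        for pair in combinations(sorted(adj), 2):
            common[pair] += 1

    # Any two of the c common neighbors, together with the right-side pair,
    # form one rectangle, so each pair contributes c choose 2.
    return sum(c * (c - 1) // 2 for c in common.values())


if __name__ == "__main__":
    k33 = [(u, v) for u in "abc" for v in (1, 2, 3)]
    print(count_rectangles(k33))
```

This baseline enumerates d(d-1)/2 vertex pairs for every left vertex of degree d, so its cost grows with the sum of squared degrees; that kind of cost on large, skewed graphs is what motivates more efficient rectangle-counting algorithms.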
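The "Denial-of-Service Threat to Hadoop/YARN Clusters with Multi-tenancy" abstract models how many nodes a malicious tenant can reach as a k-balls-to-n-boxes problem solved by simulation. The exact "ping-pong" rules are not given in the truncated abstract, so this sketch assumes the simplest variant: each of k containers lands on one of n nodes independently and uniformly at random, and a node counts as invaded if it receives at least one container. Under that assumption the Monte Carlo estimate can be checked against the closed form n(1 - (1 - 1/n)^k). Both function names are hypothetical.

```python
import random


def simulate_invaded_nodes(k, n, trials=10_000, seed=0):
    """Monte Carlo estimate of the expected number of distinct nodes (boxes)
    that receive at least one of k containers (balls).

    Assumption: every container is placed on one of n nodes independently
    and uniformly at random. This is not necessarily the paper's model.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    total = 0
    for _ in range(trials):
        hit = {rng.randrange(n) for _ in range(k)}  # distinct nodes reached
        total += len(hit)
    return total / trials


def expected_invaded_nodes(k, n):
    """Closed-form expectation under the same uniform-placement assumption:
    each node is missed by all k containers with probability (1 - 1/n)**k."""
    return n * (1 - (1 - 1 / n) ** k)


if __name__ == "__main__":
    print(simulate_invaded_nodes(20, 50), expected_invaded_nodes(20, 50))
```

A closed form exists only for this idealized placement; a simulation becomes necessary once scheduler behavior, capacity limits, or rejected placements are modeled, which is presumably why the authors built a discrete event simulator.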
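The "Contextual Anomaly Detection in Big Sensor Data" abstract contrasts detectors that look only at the content of a reading with detectors that also consider its context. A generic illustration of that distinction (not the paper's technique, which the truncated abstract does not describe) is to compare each sensor reading only against readings that share its context, using a z-score threshold. The `(context, value)` input format, the function name, and the default threshold of 3 are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def contextual_anomalies(readings, threshold=3.0):
    """Return the indices of readings that are anomalous within their context.

    `readings` is a list of (context, value) pairs (hypothetical format). A
    reading is flagged when its value lies more than `threshold` population
    standard deviations from the mean of all readings sharing its context.
    """
    groups = defaultdict(list)
    for ctx, value in readings:
        groups[ctx].append(value)

    # Per-context mean and population standard deviation.
    stats = {ctx: (mean(vs), pstdev(vs)) for ctx, vs in groups.items()}

    flagged = []
    for i, (ctx, value) in enumerate(readings):
        mu, sigma = stats[ctx]
        # Skip contexts with no spread: every value equals the mean.
        if sigma > 0 and abs(value - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # 20.0 is a normal daytime value but unusual at night.
    data = [("day", 20.0)] * 10 + [("night", 1.0)] * 10 + [("night", 20.0)]
    print(contextual_anomalies(data))
```

Because each group's statistics include the reading being tested, a single outlier in a group of n readings can reach a z-score of at most sqrt(n - 1); small groups therefore need a lower threshold or a leave-one-out estimate.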
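The "Storing a Collection of Differentially Compressed Files Recursively" abstract describes delta files whose copy instructions may reference other files in their unencoded form, which forces those files to be decoded first. The sketch below models only that unencoded-reference case, with a hypothetical instruction format of `("add", data)` and `("copy", ref, offset, length)`, and assumes the reference graph is acyclic. The `cache` dictionary records every file that had to be decoded, which makes the recursive decoding cost visible.

```python
def decode(name, store, cache=None):
    """Reconstruct the original bytes of the file `name`.

    Hypothetical format: `store` maps each file name to its delta, a list of
      ("add", data)                  -> append literal bytes
      ("copy", ref, offset, length)  -> append bytes from the *unencoded* ref
    Assumes references never form a cycle. `cache` maps names of files that
    have already been decoded to their bytes.
    """
    if cache is None:
        cache = {}
    if name in cache:
        return cache[name]

    out = bytearray()
    for instr in store[name]:
        if instr[0] == "add":
            out += instr[1]
        elif instr[0] == "copy":
            _, ref, offset, length = instr
            # Copying from the unencoded form means `ref` must be decoded
            # first, which may recursively require decoding its own references.
            src = decode(ref, store, cache)
            out += src[offset:offset + length]
        else:
            raise ValueError(f"unknown instruction {instr[0]!r}")

    cache[name] = bytes(out)
    return cache[name]


if __name__ == "__main__":
    store = {
        "base": [("add", b"hello world")],
        "v1": [("copy", "base", 0, 6), ("add", b"there")],
        "v2": [("copy", "v1", 0, 11), ("add", b"!")],
    }
    decoded = {}
    print(decode("v2", store, decoded), sorted(decoded))
```

Requesting only `v2` still decodes `base` and `v1`; as chains of references grow, this is the blowout in the number of files to decode that the abstract attributes to referencing unencoded forms.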
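DeltaDB is described as keeping the entire history of schema-free records so that queries can target the current state or any past point in time. The following is a minimal in-memory illustration of such point-in-time lookups, assuming each version is stored with a timestamp; it is not DeltaDB's data model or query algebra, which the truncated abstract does not detail. It keeps a sorted timestamp list per key and uses binary search. The class and method names are hypothetical, and deletions are not modeled.

```python
import bisect


class VersionedStore:
    """Hypothetical in-memory store that keeps every version of each
    schema-free record (a dict with arbitrary fields)."""

    def __init__(self):
        self._times = {}   # key -> sorted list of version timestamps
        self._values = {}  # key -> records, aligned index-for-index with _times[key]

    def put(self, key, timestamp, record):
        """Store a new version of `key` at `timestamp`."""
        times = self._times.setdefault(key, [])
        values = self._values.setdefault(key, [])
        i = bisect.bisect_right(times, timestamp)  # keep versions sorted by time
        times.insert(i, timestamp)
        values.insert(i, dict(record))

    def as_of(self, key, timestamp):
        """Return a copy of `key` as it was at `timestamp`, or None if no
        version existed yet."""
        times = self._times.get(key, [])
        i = bisect.bisect_right(times, timestamp)  # versions with time <= timestamp
        return dict(self._values[key][i - 1]) if i else None

    def current(self, key):
        """Return a copy of the latest version of `key`, or None."""
        values = self._values.get(key)
        return dict(values[-1]) if values else None


if __name__ == "__main__":
    s = VersionedStore()
    s.put("user:1", 1, {"name": "ada"})
    s.put("user:1", 5, {"name": "ada", "role": "admin"})  # new field, no schema change
    print(s.as_of("user:1", 3), s.current("user:1"))
```

Because `as_of` uses `bisect_right`, a version written exactly at the queried timestamp is visible to that query; historical aggregations could be layered on top by calling `as_of` across many keys or timestamps.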