2015 IEEE International Conference on Big Data (Big Data)

Oct. 29 - Nov. 1, 2015

Displaying Results 1 - 25 of 410
  • [Front cover]

    Publication Year: 2015, Page(s): 1
    PDF (302 KB)
    Freely Available from IEEE
  • [Copyright notice]

    Publication Year: 2015, Page(s): 1
    PDF (54 KB)
    Freely Available from IEEE
  • Welcome message from the organizers

    Publication Year: 2015, Page(s): 1-2
    PDF (69 KB)
    Freely Available from IEEE
  • Organization

    Publication Year: 2015, Page(s): 1-2
    PDF (93 KB)
    Freely Available from IEEE
  • Main conference program committee members

    Publication Year: 2015, Page(s): 1-7
    PDF (118 KB)
    Freely Available from IEEE
  • How big data changes statistical machine learning

    Publication Year: 2015, Page(s): 1
    PDF (113 KB) | HTML
    Freely Available from IEEE
  • Moving past the "Wild West" era for Big Data

    Publication Year: 2015, Page(s): 2
    PDF (91 KB) | HTML
    Freely Available from IEEE
  • Conquering Big Data with Spark

    Publication Year: 2015, Page(s): 3
    PDF (90 KB) | HTML
    Freely Available from IEEE
  • Online and on-demand partitioning of streaming graphs

    Publication Year: 2015, Page(s): 4-13
    Cited by: Papers (1)
    PDF (707 KB) | HTML

    Many applications generate data that naturally leads to a graph representation for its modeling and analysis. A common approach to address the size and complexity of these graphs is to split them across a number of partitions, in a way that computations on them can be performed mostly locally and in parallel in the resulting partitions. In this work, we present a framework that enables partitionin...
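
    The abstract is truncated before it describes the framework, so the following is only an illustration of one-pass streaming vertex partitioning in general. It uses Linear Deterministic Greedy (LDG), a well-known baseline heuristic, and is not the method of this paper; the function name `ldg_partition`, the `(vertex, neighbors)` stream format, and the fixed per-partition `capacity` are assumptions made for the sketch.

```python
def ldg_partition(stream, num_parts, capacity):
    """One-pass Linear Deterministic Greedy (LDG) vertex partitioning.

    stream: iterable of (vertex, neighbors) pairs, each vertex seen once.
    capacity: maximum number of vertices per partition (assumed fixed).
    Returns a dict mapping each vertex to a partition index.
    """
    assignment = {}
    sizes = [0] * num_parts
    for v, neighbors in stream:
        best, best_key = None, None
        for p in range(num_parts):
            if sizes[p] >= capacity:
                continue  # partition is full
            # Neighbors already placed in p, discounted by how full p is.
            local = sum(1 for u in neighbors if assignment.get(u) == p)
            score = local * (1 - sizes[p] / capacity)
            # Ties go to the less loaded partition, then to the lower index.
            key = (score, -sizes[p])
            if best_key is None or key > best_key:
                best, best_key = p, key
        if best is None:
            raise ValueError("all partitions are full")
        assignment[v] = best
        sizes[best] += 1
    return assignment
```

    Each vertex is placed once and never moved, so the result depends on stream order: neighbors that have not yet arrived contribute nothing to the score.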
  • Learning to accurately COUNT with query-driven predictive analytics

    Publication Year: 2015, Page(s): 14-23
    PDF (274 KB)

    We study a novel solution to executing aggregation (and specifically COUNT) queries over large-scale data. The proposed solution is generally applicable, in the sense that it can be deployed in environments in which data owners may or may not restrict access to their data and allow only 'aggregation operators' to be executed over their data. For this, it is based on predictive analytics, driven by...
  • Practical message-passing framework for large-scale combinatorial optimization

    Publication Year: 2015, Page(s): 24-31
    PDF (300 KB) | HTML

    Graphical Model (GM) has provided a popular framework for big data analytics because it often lends itself to distributed and parallel processing by utilizing graph-based 'local' structures. It models correlated random variables where in particular, the max-product Belief Propagation (BP) is the most popular heuristic to compute the most-likely assignment in GMs. In the past years, it has been pro...
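
    As a minimal sketch of what max-product message passing computes, the example below finds the most-likely assignment of a chain-structured model, the special case where max-product BP is exact rather than heuristic. It is not the paper's framework; the function name `max_product_chain`, the list-of-lists potential format, and the single `pairwise` table shared by all edges are assumptions for illustration.

```python
def max_product_chain(unary, pairwise):
    """Most-likely assignment of a chain model x_0 - x_1 - ... - x_{n-1}.

    unary[i][s]: nonnegative potential for variable i in state s.
    pairwise[s][t]: nonnegative potential for (x_i = s, x_{i+1} = t),
    shared by every edge (an assumption made for brevity).
    """
    n, k = len(unary), len(unary[0])
    # msg[t]: best product of potentials over x_0..x_i given x_i = t.
    msg = list(unary[0])
    backptr = []
    for i in range(1, n):
        new_msg, arg = [], []
        for t in range(k):
            # Max-product step: maximize over the previous state s.
            s_best = max(range(k), key=lambda s: msg[s] * pairwise[s][t])
            new_msg.append(msg[s_best] * pairwise[s_best][t] * unary[i][t])
            arg.append(s_best)
        msg = new_msg
        backptr.append(arg)
    # Decode: pick the best last state, then follow back-pointers.
    x = [max(range(k), key=lambda t: msg[t])]
    for arg in reversed(backptr):
        x.append(arg[x[-1]])
    return x[::-1]
```

    On long chains the products underflow, so practical implementations usually work with sums of log-potentials (max-sum) instead; on graphs with cycles the same updates become the heuristic the abstract refers to.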
  • Rewriting complex SPARQL analytical queries for efficient cloud-based processing

    Publication Year: 2015, Page(s): 32-37
    PDF (513 KB) | HTML

    Many emerging Semantic Web applications combine and aggregate data across domains for analysis. Such analytical queries compute aggregates over multiple groupings of data, resulting in query plans with complex grouping-aggregation constraints. In the context of an RDF analytical query, each such grouping maps to a graph pattern subquery with multiple join operations, and related groups often resul...
  • Concept hierarchies and human navigation

    Publication Year: 2015, Page(s): 38-45
    PDF (716 KB) | HTML

    We are confronted with massive amounts of information at every turn. In order to efficiently reason about knowledge and information, humans have evolved efficient strategies for organizing complex concepts in order to form connections between and recall information. This behavior can be observed and codified when people search for objects within digital information networks. Current models of sear...
  • Iteratively refining SVMs using priors

    Publication Year: 2015, Page(s): 46-52
    PDF (395 KB) | HTML

    Research on scalable machine learning algorithms has gained a considerable amount of traction since the exponential growth in data assets during the past decades. Many Big Data applications resort to somewhat "simple" data modelling techniques due to the computational constraints associated with more complex models. Simple models, while being very efficient to estimate, often fail to capture some ...
  • Towards scalable quantile regression trees

    Publication Year: 2015, Page(s): 53-60
    PDF (1063 KB) | HTML

    We provide an algorithm to build quantile regression trees in O(N log N) time, where N is the number of instances in the training set. Quantile regression trees are regression trees that model conditional quantiles of the response variable, rather than the conditional expectation as in standard regression trees. We build quantile regression trees by using the quantile loss function in our node spl...
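
    The abstract stops before the O(N log N) construction, so the following is only a naive O(N^2) sketch of the ingredient it does name: scoring a candidate split on one feature by the quantile (pinball) loss, rho_tau(u) = max(tau * u, (tau - 1) * u) with u = y - q. The helper names and the lower-quantile convention in `node_quantile` are assumptions, not details from the paper.

```python
import math


def pinball_loss(ys, q, tau):
    """Quantile (pinball) loss of predicting the constant q for targets ys."""
    return sum(max(tau * (y - q), (tau - 1) * (y - q)) for y in ys)


def node_quantile(ys, tau):
    """Lower tau-quantile of ys, which minimizes pinball_loss over constants."""
    s = sorted(ys)
    k = max(0, math.ceil(tau * len(s)) - 1)  # 0-based rank of the quantile
    return s[k]


def best_split(xs, ys, tau):
    """Naive search for the threshold on one feature that minimizes the summed
    pinball loss of the two children. Returns (loss, threshold)."""
    pairs = sorted(zip(xs, ys))
    best = (float("inf"), None)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        loss = (pinball_loss(left, node_quantile(left, tau), tau)
                + pinball_loss(right, node_quantile(right, tau), tau))
        if loss < best[0]:
            best = (loss, (pairs[i - 1][0] + pairs[i][0]) / 2)
    return best
```

    Each leaf predicts the tau-quantile of its targets rather than their mean, which is what distinguishes a quantile regression tree from a standard regression tree.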
  • Super-CWC and super-LCC: Super fast feature selection algorithms

    Publication Year: 2015, Page(s): 61-67
    Cited by: Papers (3)
    PDF (477 KB) | HTML

    Feature selection is a useful tool for identifying which features, or attributes, of a dataset cause or explain phenomena, and improving the efficiency and accuracy of learning algorithms for discovering such phenomena. Consequently, feature selection has been studied intensively in machine learning research. However, advanced feature selection algorithms that can avoid redundant selection of feat...
  • Considerations and recommendations for data availability for data analytics for manufacturing

    Publication Year: 2015, Page(s): 68-75
    PDF (522 KB) | HTML

    Data analytics is increasingly becoming recognized as a valuable set of tools and techniques for improving performance in the manufacturing enterprise. However, data analytics requires data and a lack of useful and usable data has become an impediment to research in data analytics. In this paper, we describe issues that would help aid data availability including data quality, reliability, efficien...
  • ScaleGraph: A high-performance library for billion-scale graph analytics

    Publication Year: 2015, Page(s): 76-84
    PDF (645 KB) | HTML

    Recently, large-scale graph analytics has become a very popular topic owing to the emergence of gigantic graphs whose number of vertices and edges is in millions, billions or even trillions. Many graph analytics libraries and frameworks have been proposed with various computational models and programming languages to deal with such graphs. X10 programming language is a PGAS language that aims at b...
  • System and architecture level characterization of big data applications on big and little core server architectures

    Publication Year: 2015, Page(s): 85-94
    Cited by: Papers (6)
    PDF (608 KB) | HTML

    Emerging Big Data applications require a significant amount of server computational power. Big data analytics applications rely heavily on specific deep machine learning and data mining algorithms, and exhibit high computational intensity, memory intensity, I/O intensity and control intensity. Big data applications require computing resources that can efficiently scale to manage massive amounts of...
  • Data streaming algorithms for the Kolmogorov-Smirnov test

    Publication Year: 2015, Page(s): 95-104
    PDF (226 KB)

    We propose space-efficient algorithms for performing the Kolmogorov-Smirnov test on streaming data. The Kolmogorov-Smirnov test is a non-parametric test for measuring the strength of a hypothesis that some data is drawn from a fixed distribution (one-sample test), or that two sets of data are drawn from the same distribution (two-sample test). Unlike some other tests, Kolmogorov-Smirnov does not a...
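
    The abstract does not describe the streaming algorithms themselves, so as a reference point the sketch below computes the exact two-sample Kolmogorov-Smirnov statistic D = sup_x |F_a(x) - F_b(x)| offline, holding both samples in memory and sorting them. A space-efficient streaming method would approximate this same quantity without storing every element; the function name `ks_two_sample` is hypothetical.

```python
def ks_two_sample(a, b):
    """Exact two-sample KS statistic D = sup_x |F_a(x) - F_b(x)|.

    Offline version: sorts both samples, O((n + m) log(n + m)) time and
    O(n + m) memory. F_a and F_b are the empirical CDFs of a and b.
    """
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Step past every copy of x in both samples so ties are handled.
        while i < n and a[i] == x:
            i += 1
        while j < m and b[j] == x:
            j += 1
        # i / n and j / m are now F_a(x) and F_b(x).
        d = max(d, abs(i / n - j / m))
    return d
```

    Once either sample is exhausted the gap between the two empirical CDFs can only shrink, so the loop can stop there without missing the supremum.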
  • Techniques for fast and scalable time series traffic generation

    Publication Year: 2015, Page(s): 105-114
    PDF (438 KB) | HTML

    Many IoT applications ingest and process time series data with emphasis on 5Vs (Volume, Velocity, Variety, Value and Veracity). To design and test such systems, it is desirable to have a high-performance traffic generator specifically designed for time series data, preferably using archived data to create a truly realistic workload. However, most existing traffic generator tools either are designe...
  • Energy-efficient acceleration of big data analytics applications using FPGAs

    Publication Year: 2015, Page(s): 115-123
    Cited by: Papers (6)
    PDF (617 KB) | HTML

    A recent trend for big data analytics is to provide heterogeneous architectures to allow support for hardware specialization. Considering the time dedicated to create such hardware implementations, an analysis that estimates how much benefit we gain in terms of speed and energy efficiency, through offloading various functions to hardware would be necessary. This work analyzes data mining and machi...
  • Workload scheduling in distributed stream processors using graph partitioning

    Publication Year: 2015, Page(s): 124-133
    PDF (345 KB) | HTML

    With ever increasing data volumes, large compute clusters that process data in a distributed manner have become prevalent in industry. For distributed stream processing platforms (such as Storm) the question of how to distribute workload to available machines, has important implications for the overall performance of the system. We present a workload scheduling strategy that is based on a graph pa...
  • Evaluating different distributed-cyber-infrastructure for data and compute intensive scientific application

    Publication Year: 2015, Page(s): 134-143
    Cited by: Papers (1)
    PDF (162 KB) | HTML

    Scientists are increasingly using the current state of the art big data analytic software (e.g., Hadoop, Giraph, etc.) for their data-intensive applications over HPC environment. However, understanding and designing the hardware environment that these data- and compute-intensive applications require for good performance is challenging. With this motivation, we evaluated the performance of big data...
  • Scalejoin: A deterministic, disjoint-parallel and skew-resilient stream join

    Publication Year: 2015, Page(s): 144-153
    Cited by: Papers (1)
    PDF (315 KB)

    The inherently large and varying volumes of data generated to facilitate autonomous functionality in large scale cyber-physical systems demand near real-time processing of data streams, often as close to the sensing devices as possible. In this context, data streaming is imperative for data-intensive processing infrastructures. Stream joins, the streaming counterpart of database joins, compare tup...