2018 IEEE International Conference on Big Data (Big Data)

10-13 Dec. 2018

  • [Front cover]

    Publication Year: 2018, Page(s): c1
  • [Front cover]

    Publication Year: 2018, Page(s): c1 - c3
  • Program Committee

    Publication Year: 2018, Page(s): i - xiii
  • Committee Members

    Publication Year: 2018, Page(s): i - iii
  • Big Data 2018 Index

    Publication Year: 2018, Page(s): i - xxxvii
  • Decentralized Machine Learning

    Publication Year: 2018, Page(s): 1

    Summary form only given. In the past decade we have seen very rapid growth in two fields: cloud services and neural networks. These two are connected, in that logs from services are the fuel that has powered data-hungry deep learning algorithms. However, there are several forces on the other side of the coin, pushing neural capabilities onto the device and out of the cloud. These include: the dev...
  • Big Data for Speech and Language Processing

    Publication Year: 2018, Page(s): 2

    Amongst all creatures, the human species stands unique in Darwin's natural selection process. It is no exaggeration that speech and language helped to differentiate human intelligence from animal intelligence in the evolution process. The impact of big data and the cloud on speech and language evolution is foundational to realizing society's AI vision. This talk will review how Microsoft achieved hum...
  • Transformational Role of Big Data in Society 5.0

    Publication Year: 2018, Page(s): 3

    Japan is launching 'Society 5.0', the vision for a future smarter society. One of the fundamental pillars of Society 5.0 is to help society become smarter in a data-driven way. Through the advance of the Internet of Things (IoT), rapidly growing big data is substantially transforming our society, for example, through smarter commercial products and services. In this talk, we will focus on the ...
  • Three principles of data science: predictability, computability, and stability (PCS)

    Publication Year: 2018, Page(s): 4

    In this talk, I'd like to discuss the intertwined importance of, and connections among, the three principles of data science named in the title, and the PCS workflow that is built on them. The principles will be demonstrated in the context of two collaborative projects in neuroscience and genomics for interpretable data results and testable hypothesis generation.
  • On Metric Learning for Complex Data Analysis

    Publication Year: 2018, Page(s): 5

    Comparing and measuring similarities or distances between pairs of instances is a basic but important step toward the success of many data mining and machine learning approaches. In this talk, I will discuss how both linear and nonlinear metric learning can be approached to capture various important relationships for complex data sets and how the learned metrics can be used for complex data analysis...
  • Heuristics Significance of Neuro-Ensemble-based Time Series Classification

    Publication Year: 2018, Page(s): 6 - 15

    Ensemble learning is a popular paradigm for improving the predictive performance of individual classifiers. In this work, we approach the problem of ensemble learning from an optimization perspective applied to time series data. We propose Neuro-Ensemble, a classifier fusion model based on a shallow Multi-Layer Perceptron (MLP) meta-learner. The neural network learns the expertise of each classifi...
  • Best-Choice Edge Grafting for Efficient Structure Learning of Markov Random Fields

    Publication Year: 2018, Page(s): 16 - 25

    Incremental methods for structure learning of pairwise Markov random fields (MRFs), such as grafting, improve scalability by avoiding inference over the entire feature space in each optimization step. Instead, inference is performed over an incrementally grown active set of features. In this paper, we address key computational bottlenecks that current incremental techniques still suffer from by introdu...
  • Detecting Latent Structure Uncertainty with Structural Entropy

    Publication Year: 2018, Page(s): 26 - 35

    This paper proposes a new method for detecting the uncertainty of a latent structure. We consider the case where the latent structure of a dataset changes gradually over time, with the goal of selecting the optimal model at any given time. In selecting the optimal model, we use the minimum description length (MDL) principle, specifically the normalized maximum likelihood (NML), which is the optimal ...
  • Incorporating Prior Domain Knowledge into Deep Neural Networks

    Publication Year: 2018, Page(s): 36 - 45

    In recent years, the large amount of available labeled data has steered research toward using minimal domain knowledge, e.g., in deep neural network research. However, in many situations, data is limited and of poor quality. Can domain knowledge be useful in such a setting? In this paper, we propose domain adapted neural networks (DANN) to explore how domain knowledge can be integrated in...
  • Hybridization of Active Learning and Data Programming for Labeling Large Industrial Datasets

    Publication Year: 2018, Page(s): 46 - 55

    Modern machine learning (ML) models are being used heavily in business domains to build effective decision support systems. As a primary requirement, supervised ML models need large labeled datasets. However, obtaining a high volume of labeled training data is both expensive and time-consuming. Researchers have proposed several labeling approaches to avoid manual labeling efforts. Active learning ...
  • Semi-supervised Deep Representation Learning for Multi-View Problems

    Publication Year: 2018, Page(s): 56 - 64

    While neural networks for learning representations of multi-view data have been previously proposed as one of the state-of-the-art multi-view dimension reduction techniques, how to make the representation discriminative with only a small amount of labeled data is not well studied. We introduce a semi-supervised neural network model, named Multi-view Discriminative Neural Network (MDNN), for multi-v...
  • Linear Models with Many Cores and CPUs: A Stochastic Atomic Update Scheme

    Publication Year: 2018, Page(s): 65 - 73

    Linear models are fast to train and apply, and are still state of the art for sparse and high dimensional problems. Their computational efficiency makes them difficult to parallelize, with the standard multi-core approaches often diverging after more than 8 cores are added. We propose a Stochastic Atomic Update Scheme (SAUS) for training linear models on many-core machines. It is simple to implement, red...
  • Projection-SVM: Distributed Kernel Support Vector Machine for Big Data using Subspace Partitioning

    Publication Year: 2018, Page(s): 74 - 83

    Training a kernel support vector machine (SVM) is a computationally complex task for large datasets where the number of samples ranges in the millions. This is because the kernel matrix (in general not sparse) is both computationally expensive and memory intensive. Existing methods hardly achieve linear scaling and suffer from high approximation loss. We propose Projection-SVM, a distributed implementati...
  • DeepFP: A Deep Learning Framework For User Fingerprinting via Mobile Motion Sensors

    Publication Year: 2018, Page(s): 84 - 91

    In this paper, we propose a deep learning framework for user fingerprinting via mobile motion sensors, DeepFP, which can identify and track users based on their behavioral patterns while interacting with the smartphone. Existing machine learning techniques for user identification are classification-oriented and thus are not easily amenable to large-scale, real-world deployment. They need to be tra...
  • AdaDIF: Adaptive Diffusions for Efficient Semi-supervised Learning over Graphs

    Publication Year: 2018, Page(s): 92 - 99

    Diffusion-based classifiers, such as those relying on the Personalized PageRank and the Heat kernel, enjoy remarkable classification accuracy at modest computational requirements. Their performance, however, is affected by the extent to which the chosen diffusion captures a typically unknown label propagation mechanism that can be specific to the underlying graph and potentially different for each ...
  • Topological approaches to skin disease image analysis

    Publication Year: 2018, Page(s): 100 - 105

    Skin cancer is one of the most common cancers in the United States. As technological advancements are made, algorithmic diagnosis of skin lesions is becoming more important. In this paper, we develop algorithms for segmenting the actual diseased area of skin in a given image of a skin lesion, and for classifying different types of skin lesions pictured in a given image. The cores of the algorithms...
  • Scalable Bottom-up Subspace Clustering using FP-Trees for High Dimensional Data

    Publication Year: 2018, Page(s): 106 - 111

    Subspace clustering aims to find groups of similar objects (clusters) that exist in lower dimensional subspaces from a high dimensional dataset. It has a wide range of applications, such as analysing high dimensional sensor data or DNA sequences. However, existing algorithms have limitations in finding clusters in non-disjoint subspaces and scaling to large data, which impede their applicability ...
  • Securing Behavior-based Opinion Spam Detection

    Publication Year: 2018, Page(s): 112 - 117

    Review spam is prevalent in e-commerce, where it is used to maliciously manipulate product rankings and customers' decisions. While spam generated by simple spamming strategies can be detected effectively, hardened spammers can evade regular detectors via more advanced spamming strategies. Previous work gave more attention to evasion against text- and graph-based detectors, but evasions against behavior-based ...
  • Scaling up Inference in MLNs with Spark

    Publication Year: 2018, Page(s): 118 - 125

    Typically, inference algorithms for big data address non-relational data. However, much real-world data, such as social network data and healthcare data, is relational in nature. Therefore, we need more powerful techniques that can scale up richer inference algorithms on relational data. Markov Logic Networks (MLNs) are arguably one of the most popular statistical relational models ...
  • Representation Learning for Question Classification via Topic Sparse Autoencoder and Entity Embedding

    Publication Year: 2018, Page(s): 126 - 133

    Deep learning models have achieved great success in recent years. Word representation learning for question classification has been studied intensively. As questions are typically short texts, existing techniques are often not effective for extracting discriminative representations of questions just from a limited number of words. This motivates us to exploit additional information beyond words in ...