IEEE Transactions on Knowledge and Data Engineering

Issue 5 • 1 May 2019

  • A Cost Model for SPARK SQL

    Publication Year: 2019, Page(s):819 - 832

    In this paper, we propose a novel cost model for Spark SQL. The cost model covers the class of Generalized Projection, Selection, Join (GPSJ) queries. The cost model takes into account the network and IO costs as well as the most relevant CPU costs. The execution cost is computed starting from a physical plan produced by Spark. The set of operations adopted by Spark when executing a GPSJ query are...

  • A Survey on Network Embedding

    Publication Year: 2019, Page(s):833 - 852
    Cited by:  Papers (5)

    Network embedding assigns nodes in a network to low-dimensional representations and effectively preserves the network structure. Recently, significant progress has been made toward this emerging network analysis paradigm. In this survey, we focus on categorizing and then reviewing the current development of network embedding methods, and point out future research directions. We ...

  • CURE: Flexible Categorical Data Representation by Hierarchical Coupling Learning

    Publication Year: 2019, Page(s):853 - 866

    The representation of categorical data with hierarchical value coupling relationships (i.e., various value-to-value cluster interactions) is critical yet challenging for capturing complex data characteristics in learning tasks. This paper proposes a novel and flexible coupled unsupervised categorical data representation (CURE) framework, which not only captures the hierarchical couplings but ...

  • DLTA: A Framework for Dynamic Crowdsourcing Classification Tasks

    Publication Year: 2019, Page(s):867 - 879

    The increasing popularity of crowdsourcing markets enables the application of crowdsourcing classification tasks. How to conduct quality control in such an application to achieve accurate classification results from noisy workers is an important and challenging task, and has drawn broad research interest. However, most existing works do not exploit the label acquisition phase, which results in th...

  • Efficient Feature Selection via $\ell_{2,0}$-norm Constrained Sparse Regression

    Publication Year: 2019, Page(s):880 - 893

    Sparse-regression-based feature selection methods have been extensively investigated in recent years. However, because the problem has a non-convex constraint, i.e., the $\ell_{2,0}$-norm constraint, it is very hard to solve. In this paper, unlike most other methods, which only solve its slack version by forcibly introducing sparsity regularization into the objective function, a novel framework is p...

  • FROG: A Fast and Reliable Crowdsourcing Framework

    Publication Year: 2019, Page(s):894 - 908

    For decades, crowdsourcing, which outsources tasks to human workers, has gained much attention from both academia and industry. Existing crowdsourcing platforms include CrowdFlower, Amazon Mechanical Turk (AMT), and so on, in which workers can autonomously select tasks to do. However, due to the unreliability of workers or the difficulties of tasks, workers may sometimes ...

  • HyperX: A Scalable Hypergraph Framework

    Publication Year: 2019, Page(s):909 - 922

    Hypergraphs are generalizations of graphs where the (hyper)edges can connect any number of vertices. They are powerful tools for representing complex and non-pairwise relationships. However, existing graph computation frameworks cannot accommodate hypergraphs without converting them into graphs, because they do not offer APIs that support (hyper)edges directly. This graph conversion may create exc...

  • Latent Ability Model: A Generative Probabilistic Learning Framework for Workforce Analytics

    Publication Year: 2019, Page(s):923 - 937

    As more business workflow systems are being deployed in modern enterprises and organizations, more employee-activity log data are being collected and analyzed. In this paper, we develop a latent ability model (LAM) as a generative probabilistic learning framework for workforce analytics over employee-activity logs. The LAM development is novel in three aspects. First, we introduce the concept of l...

  • Learning Customer Behaviors for Effective Load Forecasting

    Publication Year: 2019, Page(s):938 - 951

    Load forecasting has been deeply studied because of its critical role in the Smart Grid. In the current Smart Grid, there are various types of customers with different energy consumption patterns. Customers’ energy consumption patterns are referred to as customer behaviors. It would significantly benefit load forecasting in a grid if customer behaviors could be taken into account. This paper proposes an i...

  • Near-accurate Multiset Reconciliation

    Publication Year: 2019, Page(s):952 - 964

    The mission of set reconciliation (also called set synchronization) is to identify those elements which appear in exactly one of two given sets. In this paper, we extend the set reconciliation problem into three design rationales: (i) multiset support; (ii) near 100 percent reconciliation accuracy; and (iii) communication-friendly and time-saving. These three rationales, if realized, will lea...

  • Privacy Engineering for the Smart Micro-Grid

    Publication Year: 2019, Page(s):965 - 980

    In developing countries, reliable electricity access is often undermined by the absence of supply from the national power grid and/or load shedding. To alleviate this problem, smart micro-grid (SMG) networks, which are small-scale distributed electricity provision networks composed of individual electricity providers and consumers, are being increasingly deployed. To ensure the reliable operation of...

  • PrivateGraph: Privacy-Preserving Spectral Analysis of Encrypted Graphs in the Cloud

    Publication Year: 2019, Page(s):981 - 995

    Big graphs, such as user interactions in social networks and customer rating matrices in collaborative filters, possess great value for both businesses and research. They are not only big but often keep evolving, which requires a large amount of computing resources to maintain. With the wide deployment of public cloud resources, owners of big graphs may want to use cloud resources to obtain stora...

  • Read, Watch, Listen, and Summarize: Multi-Modal Summarization for Asynchronous Text, Image, Audio and Video

    Publication Year: 2019, Page(s):996 - 1009

    Automatic text summarization is a fundamental natural language processing (NLP) application that aims to condense a source text into a shorter version. The rapid increase in multimedia data transmission over the Internet necessitates multi-modal summarization (MMS) from asynchronous collections of text, image, audio, and video. In this work, we propose an extractive MMS method that unites the tech...

  • Real-Time Change Point Detection with Application to Smart Home Time Series Data

    Publication Year: 2019, Page(s):1010 - 1023

    Change Point Detection (CPD) is the problem of discovering time points at which the behavior of a time series changes abruptly. In this paper, we present a novel real-time nonparametric change point detection algorithm called SEP, which uses Separation distance as a divergence measure to detect change points in high-dimensional time series. Through experiments on artificial and real-world datasets...

Aims & Scope

IEEE Transactions on Knowledge and Data Engineering (TKDE) informs researchers, developers, managers, strategic planners, users, and others interested in state-of-the-art and state-of-the-practice activities in the knowledge and data engineering area.

Meet Our Editors

Editor-in-Chief
Xuemin Lin
University of New South Wales

Associate Editor-in-Chief
Lei Chen
Hong Kong University of Science and Technology