• ### T-PCCE: Twitter Personality based Communicative Communities Extraction System for Big Data

Publication Year: 2019, Page(s): 1
The identification of social media communities has recently been of major concern, since users participating in such communities can contribute to viral marketing campaigns. In this work we focus on users' communication, considering personality as a key characteristic for identifying communicative networks, i.e., networks with high information flows. We describe the Twitter Personality based Communic...

• ### Active Online Learning for Social Media Analysis to Support Crisis Management

Publication Year: 2019, Page(s): 1
People use social media (SM) to describe and discuss different situations they are involved in, like crises. It is therefore worthwhile to exploit SM contents to support crisis management, in particular by revealing useful and unknown information about the crises in real-time. Hence, we propose a novel active online multiple-prototype classifier, called AOMPC. It identifies relevant data related t...

• ### Voice of Charity: Prospecting the Donation Recurrence & Donor Retention in Crowdfunding

Publication Year: 2019, Page(s): 1
Online donation-based crowdfunding has brought new life to charity by soliciting small monetary contributions from crowd donors to help others in trouble or with dreams. However, a crucial issue for crowdfunding platforms as well as traditional charities is the problem of high donor attrition, i.e., many donors donate only once or very few times within a rather short lifecycle and then leave. Thus...

• ### Translation-Based Sequential Recommendation for Complex Users on Sparse Data

Publication Year: 2019, Page(s): 1
Sequential recommendation is one of the main tasks in recommender systems, where the next action (e.g., purchase, visit, click) of the user is predicted based on his/her past sequence of actions. Translating Embeddings is a knowledge graph completion approach which was recently adapted to a translation-based sequential recommendation (TransRec) method. We observe a flaw of TransRec when handling c...

• ### Explainable Outfit Recommendation with Joint Outfit Matching and Comment Generation

Publication Year: 2019, Page(s): 1
Most previous work on outfit recommendation focuses on designing visual features to enhance recommendations. Existing work neglects user comments of fashion items, which have been proved to be effective in generating explanations along with better recommendation results. We propose a novel neural network framework, neural outfit recommendation (NOR), that simultaneously provides outfit recommendat...

• ### ROAM: A Fundamental Routing Query on Road Networks with Efficiency

Publication Year: 2019, Page(s): 1
Novel road-network applications often recommend interesting services or tasks to a moving object (e.g., a vehicle) on its way to a destination. A taxi-sharing system, for instance, suggests a new passenger to a taxi while it is serving another one. The traveling cost is then shared among these passengers. A fundamental query is: given two nodes $s$ and ...

• ### Cleaning Data with Forbidden Itemsets

Publication Year: 2019, Page(s): 1
Methods for cleaning dirty data typically employ additional information about the data, such as user-provided constraints specifying when data is dirty, e.g., domain restrictions, illegal value combinations, or logical rules. However, real-world scenarios usually only have dirty data available, without known constraints. In such settings, constraints are automatically discovered on dirty data and ...

• ### Model-Based Synthetic Sampling for Imbalanced Data

Publication Year: 2019, Page(s): 1
Imbalanced data is characterized by the severe difference in observation frequency between classes and has received a lot of attention in data mining research. The prediction performances usually deteriorate as classifiers learn from imbalanced data, as most classifiers assume the class distribution is balanced or the costs for different types of classification errors are equal. Although several m...

• ### Generative Adversarial Active Learning for Unsupervised Outlier Detection

Publication Year: 2019, Page(s): 1
Outlier detection is an important topic in machine learning and has been used in a wide range of applications. In this paper, we approach outlier detection as a binary-classification problem by sampling potential outliers from a uniform reference distribution. However, due to the sparsity of data in high-dimensional space, a limited number of potential outliers may fail to provide sufficient informa...

• ### Discovery and Recognition of Emerging Human Activities Using a Hierarchical Mixture of Directional Statistical Models

Publication Year: 2019, Page(s): 1
Human activity recognition plays a significant role in enabling pervasive applications as it abstracts low-level noisy sensor data into high-level human activities, which applications can respond to. With more and more activity-aware applications deployed in real-world environments, a research challenge emerges -- discovering and learning new activities that have not been pre-defined or observed i...

• ### Making Compiling Query Engines Practical

Publication Year: 2019, Page(s): 1
Compiling queries to machine code is a very efficient way of executing queries. One often overlooked problem with compilation is the time it takes to generate machine code. Even with fast compilation frameworks like LLVM, generating machine code for complex queries often takes hundreds of milliseconds. Such durations can be a major disadvantage for workloads that execute many complex, but quick q...

• ### Efficient Contour Computation of Group-based Skyline

Publication Year: 2019, Page(s): 1
Skyline, aiming at finding a Pareto optimal subset of points in a multi-dimensional dataset, has gained great interest due to its extensive use for multi-criteria analysis and decision making. However, conventional skyline queries, which return individual points, are inadequate in the group querying case, since optimal combinations are required. To address this gap, we study the skyline computation in ...

• ### Inferring Full Diffusion History from Partial Timestamps

Publication Year: 2019, Page(s): 1
Understanding diffusion processes in networks has emerged as an important research topic because of its wide range of applications. Analysis of diffusion traces can help us answer important questions such as the source(s) of diffusion and the role of each node during the diffusion process. However, in large-scale networks, due to cost and privacy concerns, it is almost impossible to monitor th...

• ### Blocking Self-avoiding Walks Stops Cyber-epidemics: A Scalable GPU-based Approach

Publication Year: 2019, Page(s): 1
Cyber-epidemics, the wide spread of fake news or propaganda through social media, can cause devastating economic and political consequences. A common countermeasure against cyber-epidemics is to disable a small subset of suspected social connections or accounts to effectively contain the epidemics. An example is the recent shutdown of 125,000 ISIS-related Twitter accounts. Despite many proposed met...

• ### Modeling and Computing Probabilistic Skyline on Incomplete Data

Publication Year: 2019, Page(s): 1
The skyline query is important in the database community. In recent years, research on incomplete data has received increasing attention, especially for the skyline query. However, the existing skyline definition on incomplete data cannot provide users with valuable references. In this paper, we propose a novel skyline definition utilizing a probabilistic model on incomplete data where each point ...

• ### Affinity Regularized Non-negative Matrix Factorization for Lifelong Topic Modeling

Publication Year: 2019, Page(s): 1
The lifelong topic model, an emerging paradigm for never-ending topic learning, aims to yield higher-quality topics as time passes through knowledge accumulated from the past yet learned for the future. In this paper, we propose a novel lifelong topic model (LTM) based on non-negative matrix factorization (NMF), called Affinity Regularized NMF for LTM (NMF-LTM), which to the best of our knowledge is disti...

• ### Evaluation of the Sample Clustering Process on Graphs

Publication Year: 2019, Page(s): 1
An increasing number of networks are becoming large-scale and continuously growing, such that clustering on them in their entirety could be intractable. A feasible way to overcome this problem is to sample a representative subgraph and exploit its clustering structure (namely, sample clustering process). However, there are two issues that we should address in current studies. One underlying questi...

• ### Signed Clique Search in Signed Networks: Concepts and Algorithms

Publication Year: 2019, Page(s): 1
Mining cohesive subgraphs from a network is a fundamental problem in network analysis. Most existing cohesive subgraph models are mainly tailored to unsigned networks. In this paper, we study the problem of seeking cohesive subgraphs in a signed network, in which each edge can be positive or negative, denoting friendship or conflict respectively. We propose a novel model, called maximal ...

• ### Haery: a Hadoop based Query System on Accumulative and High-dimensional Data Model for Big Data

Publication Year: 2019, Page(s): 1
Column-oriented stores, known for their scalability and flexibility, are a common NoSQL database implementation and are increasingly used in big data management. In column-oriented stores, a “full-scan” query strategy is inefficient, and the search space can be reduced if data is well partitioned or indexed; however, there is no pre-defined schema for building and maintaining partitions and indexes at...

• ### Multi-view Scaling Support Vector Machines for Classification and Feature Selection

Publication Year: 2019, Page(s): 1
With the explosive growth of data, multi-view data is widely used in many fields, such as data mining, machine learning, and computer vision. Because such data always has a complex structure, i.e., many categories, many perspectives of description, and high dimensionality, formulating an accurate and reliable framework for multi-view classification is a very challenging task. In this paper, we p...

• ### Complication Risk Profiling in Diabetes Care: A Bayesian Multi-Task and Feature Relationship Learning Approach

Publication Year: 2019, Page(s): 1
Diabetes is a chronic disease that often results in multiple complications. Risk prediction and profiling of diabetes complications is critical for improved outcomes. In this paper, focusing on Type 2 diabetes mellitus (T2DM), we study the risk of developing complications after the initial T2DM diagnosis from longitudinal patient records. We propose a novel multi-task learning approach to simultan...

• ### Learning Distilled Graph for Large-scale Social Network Data Clustering

Publication Year: 2019, Page(s): 1
Spectral analysis is critical in social network analysis. As a vital step of the spectral analysis, the graph construction in many existing works utilizes content data only. Unfortunately, the content data often consists of noisy, sparse, and redundant features, which makes the resulting graph unstable and unreliable. In practice, besides the content data, social networks also contain link informa...

• ### CVTM: a Content-Venue-Aware Topic Model for Group Event Recommendation

Publication Year: 2019, Page(s): 1
Event recommendation is essential to help people find attractive events to attend, but it intrinsically faces the cold-start problem. Previous studies exploit multiple contextual factors to overcome the cold-start problem in event recommendation. However, they do not consider the correlation among different contextual factors. Moreover, suggesting events for a group of users also has not been well...

• ### Top-k Dominating Queries on Skyline Groups

Publication Year: 2019, Page(s): 1
The top-$k$ dominating (TKD) query on skyline groups returns $k$ skyline groups that dominate the maximum number of points in a given data set. The TKD query combines the advantages of skyline groups and top-$k$ dominating queries, and thus has been frequently used in decision making, recommendation systems, and ...

• ### A Review of Judgment Analysis Algorithms for Crowdsourced Opinions

Publication Year: 2019, Page(s): 1
Crowd-powered systems have been shown to be highly successful in the current decade at managing the collective contributions of online workers to solve different complex tasks. They can also be used for soliciting opinions from a large set of people working in a distributed manner. Unfortunately, the online community of crowd workers might involve non-experts as opinion providers. As a result, such ...

