# IEEE Transactions on Knowledge and Data Engineering

## Issue 3 • 1 March 2019

• ### A General Theory of IR Evaluation Measures

Publication Year: 2019, Page(s):409 - 422
Interval scales are assumed by several basic descriptive statistics, such as mean and variance, and by many statistical significance tests which are daily used in IR to compare systems. Unfortunately, so far, there has not been any systematic and formal study to discover the actual scale properties of IR measures. Therefore, in this paper, we develop a theory of Information Retrieval (IR)...

• ### C2Net: A Network-Efficient Approach to Collision Counting LSH Similarity Join

Publication Year: 2019, Page(s):423 - 436
Similarity join of two datasets $P$ and $Q$ ...

• ### Comments Mining With TF-IDF: The Inherent Bias and Its Removal

Publication Year: 2019, Page(s):437 - 450
Text mining has gained great momentum in recent years, with user-generated content becoming widely available. One key use is comment mining, with much attention being given to sentiment analysis and opinion mining. An essential step in the process of comment mining is text pre-processing, a step in which each linguistic term is assigned a weight that commonly increases with its appearance in...

• ### Correlated Matrix Factorization for Recommendation with Implicit Feedback

Publication Year: 2019, Page(s):451 - 464
As a typical latent factor model, Matrix Factorization (MF) has demonstrated its great effectiveness in recommender systems. Users and items are represented in a shared low-dimensional space so that the user preference can be modeled by linearly combining the item factor vector $V$ ...

• ### Detecting Pickpocket Suspects from Large-Scale Public Transit Records

Publication Year: 2019, Page(s):465 - 478
Massive data collected by automated fare collection (AFC) systems provide opportunities for studying both personal traveling behaviors and collective mobility patterns in urban areas. Existing studies on AFC data have primarily focused on identifying passengers’ movement patterns. However, we creatively leveraged such data for identifying pickpocket suspects. Stopping pickpockets in the public tra...

• ### Nonintrusive Smartphone User Verification Using Anonymized Multimodal Data

Publication Year: 2019, Page(s):479 - 492
Smartphone user verification is important as personal daily activities are increasingly conducted on the phone and sensitive information is constantly logged. The commonly adopted user verification methods are typically active, i.e., they require a user's cooperative input of a security token to gain access permission. Though popular, these methods impose a heavy burden on smartphone users to memori...

• ### Nonnegative Matrix Factorization with Side Information for Time Series Recovery and Prediction

Publication Year: 2019, Page(s):493 - 506
Motivated by the recovery and prediction of electricity consumption time series, we extend Nonnegative Matrix Factorization to take into account external features as side information. We consider general linear measurement settings, and propose a framework which models non-linear relationships between external features and the response variable. We extend previous theoretical results to obtain a s...

• ### Privacy-Preserving Social Media Data Publishing for Personalized Ranking-Based Recommendation

Publication Year: 2019, Page(s):507 - 520
Personalized recommendation is crucial to help users find pertinent information. It often relies on a large collection of user data, in particular users’ online activity (e.g., tagging/rating/checking-in) on social media, to mine user preference. However, releasing such user activity data makes users vulnerable to inference attacks, as private data (e.g., gender) can often be inferred from the use...

• ### Progressive Approaches for Pareto Optimal Groups Computation

Publication Year: 2019, Page(s):521 - 534
Group skyline query is a powerful tool for optimal group analysis. Most of the existing group skyline queries select optimal groups by comparing the dominance relationship between aggregate-based points; this makes it difficult for users to specify an appropriate aggregate function. Besides, many significant groups that are highly attractive to users in practice may be overlooked. To ad...

• ### Representing Urban Forms: A Collective Learning Model with Heterogeneous Human Mobility Data

Publication Year: 2019, Page(s):535 - 548
Human mobility data refers to records of human movements, such as cellphone traces, vehicle GPS trajectories, geo-tagged posts, and photos. While successfully mining human mobility data can benefit many applications such as city planning, transportation, urban economics, and public safety, it is very challenging to model large-scale Heterogeneous Human Mobility Data (HHMD) that are generated from ...

• ### Robust Image Hashing with Tensor Decomposition

Publication Year: 2019, Page(s):549 - 560
This paper presents a new image hashing that is designed with tensor decomposition (TD), referred to as TD hashing, where image hash generation is viewed as deriving a compact representation from a tensor. Specifically, a stable three-order tensor is first constructed from the normalized image, so as to enhance the robustness of our TD hashing. A popular TD algorithm, called Tucker decomposition, ...

Publication Year: 2019, Page(s):561 - 574
Domain adaptation is the situation for supervised learning in which the training data are sampled from the source domain while the test data are sampled from the target domain that follows a different distribution. The key to solving such a problem is to reduce effects of the discrepancy between the training data and test data. Recently, deep learning methods that employ stacked denoising auto-enc...

• ### Towards Confidence Interval Estimation in Truth Discovery

Publication Year: 2019, Page(s):575 - 588
The demand for automatic extraction of true information (i.e., truths) from conflicting multi-source data has soared recently. A variety of truth discovery methods have witnessed great successes via jointly estimating source reliability and truths. All existing truth discovery methods focus on providing a point estimator for each object's truth, but in many real-world applications...

• ### Viral Cascade Probability Estimation and Maximization in Diffusion Networks

Publication Year: 2019, Page(s):589 - 600
People use social networks to share millions of stories every day, but these stories rarely become viral. Can we estimate the probability that a story becomes a viral cascade? If so, can we find a set of users that are more likely to trigger viral cascades? These estimation and maximization problems are very challenging since both rare-event nature of viral cascades and efficiency...

• ### Webpage Depth Viewability Prediction Using Deep Sequential Neural Networks

Publication Year: 2019, Page(s):601 - 614
Display advertising is the most important revenue source for publishers in the online publishing industry. The ad pricing standards are shifting to a new model in which ads are paid only if they are viewed. Consequently, an important problem for publishers is to predict the probability that an ad at a given page depth will be shown on a user's screen for a certain dwell time. This paper proposes d...

