IEEE Transactions on Knowledge and Data Engineering - new TOC
http://ieeexplore.ieee.org
TOC Alert for Publication #69 (June 23, 2016)

- "A Novel Recommendation Model Regularized with User Trust and Item Ratings" (vol. 28, no. 7, pp. 1607-1620)

- "Aggregating Crowdsourced Quantitative Claims: Additive and Multiplicative Models" (vol. 28, no. 7, pp. 1621-1634)
  Abstract (excerpt): ...categorical applications, such as image classification. They use the accuracy, i.e., the rate of exactly correct claims, to capture the reliability of participants. As a consequence, they are not effective for truth discovery in quantitative applications, such as percentage annotation and object counting, where similarity rather than exact matching between crowdsourced claims and latent truths should be considered. In this paper, we propose two unsupervised Quantitative Truth Finders (QTFs) for truth discovery in quantitative crowdsourcing applications. One QTF explores an additive model and the other a multiplicative model, capturing different relationships between crowdsourced claims and latent truths in different classes of quantitative tasks. These QTFs naturally incorporate the similarity between variables. Moreover, they use bias and confidence instead of accuracy to capture participants' abilities in quantity estimation, and are thus capable of accurately discovering quantitative truths in particular domains. Through extensive experiments, we demonstrate that these QTFs outperform other state-of-the-art approaches for truth discovery in quantitative crowdsourcing applications while also being quite efficient.

- "Aspect-Level Influence Discovery from Graphs" (vol. 28, no. 7, pp. 1635-1649)

- "CMiner: Opinion Extraction and Summarization for Chinese Microblogs" (vol. 28, no. 7, pp. 1650-1663)

- "Entropy Optimized Feature-Based Bag-of-Words Representation for Information Retrieval" (vol. 28, no. 7, pp. 1664-1677)

- "Improved Practical Matrix Sketching with Guarantees" (vol. 28, no. 7, pp. 1678-1690)
  Abstract (excerpt): ...FrequentDirections, under the size/error trade-off to match the performance of iSVD and retain its guarantees.
  We also demonstrate some adversarial datasets where iSVD performs quite poorly. In comparing techniques in the time/error trade-off, techniques based on hashing or sampling tend to perform better. In this setting, we modify the most studied sampling regime to retain its error guarantee while obtaining dramatic improvements in the time/error trade-off. Finally, we provide easy replication of our studies on APT, a new testbed which makes available not only code and datasets, but also a computing platform with fixed environmental settings.

- "Improving Construction of Conditional Probability Tables for Ranked Nodes in Bayesian Networks" (vol. 28, no. 7, pp. 1691-1705)

- "Inverted Linear Quadtree: Efficient Top K Spatial Keyword Search" (vol. 28, no. 7, pp. 1706-1721)
  Abstract (excerpt): ...spatio-textual objects collected in many applications such as location-based services and social networks, in which an object is described by its spatial location and a set of keywords (terms). Consequently, the study of spatial keyword search, which explores both the location and the textual description of objects, has attracted great attention from commercial organizations and research communities. In this paper, we study two fundamental problems in spatial keyword queries: top-k spatial keyword search (TOPK-SK) and batch top-k spatial keyword search (BTOPK-SK). Given a set of spatio-textual objects, a query location, and a set of query keywords, TOPK-SK retrieves the k closest objects that each contain all keywords in the query. BTOPK-SK is the batch processing of sets of TOPK-SK queries. Based on the inverted index and the linear quadtree, we propose a novel index structure, called the inverted linear quadtree (IL-Quadtree), which is carefully designed to exploit both spatial and keyword-based pruning techniques to effectively reduce the search space. An efficient algorithm is then developed to tackle top-k spatial keyword search. To further enhance the filtering capability of the signature of the linear quadtree, we propose a partition-based method. In addition, to deal with BTOPK-SK, we design a new computing paradigm that partitions the queries into groups based on both spatial proximity and textual relevance between queries. We show that the IL-Quadtree technique can also efficiently support BTOPK-SK. Comprehensive experiments on real and synthetic data clearly demonstrate the efficiency of our methods.

- "K-Subspaces Quantization for Approximate Nearest Neighbor Search" (vol. 28, no. 7, pp. 1722-1733)

- "Label Distribution Learning" (vol. 28, no. 7, pp. 1734-1748)
  Abstract (excerpt): ...label distribution learning (LDL) for such applications. The label distribution covers a certain number of labels, representing the degree to which each label describes the instance. LDL is a more general learning framework that includes both single-label and multi-label learning as special cases. This paper proposes six working LDL algorithms in three ways: problem transformation, algorithm adaptation, and specialized algorithm design. In order to compare the performance of the LDL algorithms, six representative and diverse evaluation measures are selected via a clustering analysis, and the first batch of label distribution datasets are collected and made publicly available. Experimental results on one artificial and 15 real-world datasets show clear advantages of the specialized algorithms, which indicates the importance of designing specifically for the characteristics of the LDL problem.

- "Large Margin Distribution Learning with Cost Interval and Unlabeled Data" (vol. 28, no. 7, pp. 1749-1763)

- "Learning to Find Topic Experts in Twitter via Different Relations" (vol. 28, no. 7, pp. 1764-1778)
  Abstract (excerpt): ...Twitter is an important problem because tweets from experts are valuable sources that carry rich information (e.g., trends) in various domains. However, previous methods cannot be directly applied to the Twitter expert finding problem.
  Recently, several attempts have used the relations among users and Twitter Lists for expert finding. Nevertheless, these approaches only partially utilize such relations. To this end, we develop a probabilistic method to jointly exploit three types of relations (i.e., follower relations, user-list relations, and list-list relations) for finding experts. Specifically, we propose a semi-supervised graph-based ranking approach to offline calculate the global authority of users, in which we employ a normalized Laplacian regularization term to jointly explore the three relations, subject to the supervised information derived from Twitter crowds. We then online compute the local relevance between users and the given query. By leveraging the global authority and local relevance of users, we rank all users and find the top-N users with the highest ranking scores. Experiments on real-world data demonstrate the effectiveness of our proposed approach for topic-specific expert finding in Twitter.

- "Microblog Dimensionality Reduction—A Deep Learning Approach" (vol. 28, no. 7, pp. 1779-1789)

- "Mining User-Aware Rare Sequential Topic Patterns in Document Streams" (vol. 28, no. 7, pp. 1790-1804)

- "Online Subgraph Skyline Analysis over Knowledge Graphs" (vol. 28, no. 7, pp. 1805-1819)
  Abstract (excerpt): ...to support more complicated analysis over graph data. Specifically, given a large graph G and a query graph q, we want to find all the subgraphs g in G such that g is graph-isomorphic to q and not dominated by any other subgraph. To improve efficiency, we devise a hybrid feature encoding incorporating both structural and numeric features based on a partitioning strategy, and discuss how to optimize the space partitioning. We also present a skylayer index to facilitate dynamic subgraph skyline computation. Moreover, an attribute cluster-based method is proposed to deal with the curse of dimensionality. Extensive experiments over real datasets confirm the effectiveness and efficiency of our algorithm.

- "Personalized Influential Topic Search via Social Network Summarization" (vol. 28, no. 7, pp. 1820-1834)
  Abstract (excerpt): ...personalized influential topic search, or PIT-Search, in a social network: given a keyword query issued by a user in a social network, a PIT-Search finds the top-k related topics that are most influential for the query user. The influence of a topic on a query user depends on the social connection between the query user and the social users containing the topic in the social network. To measure topics' influence at a similar granularity scale, we need to extract a social summarization of the social network regarding topics. To make effective topic-aware social summarizations, we propose two random-walk based approaches: random clustering and an L-length random walk. Based on the proposed approaches, we can find a small set of representative users with assigned influence scores to simulate the influence of the large number of topic users in the social network with regard to the topic. The selected representative users are denoted as the social summarization of topic-aware influence spread over the social network. We then verify the usefulness of the social summarization by applying it to the problem of personalized influential topic search. Finally, we evaluate the performance of our algorithms using real-world datasets, and show that the approach is efficient and effective in practice.

- "Proxies for Shortest Path and Distance Queries" (vol. 28, no. 7, pp. 1835-1850)
  Abstract (excerpt): ...challenging task today. This article investigates a lightweight data reduction technique for speeding up shortest path and distance queries on large graphs. To do this, we propose a notion of routing proxies (or simply proxies), each of which represents a small subgraph, referred to as a deterministic routing area (DRA). We first show that routing proxies hold good properties for speeding up shortest path and distance queries. Then, we design a linear-time algorithm to compute routing proxies and their corresponding DRAs. Finally, we experimentally verify, using real-life large graphs, that our solution is a general technique for reducing graph sizes and speeding up shortest path and distance queries.

- "Resolving Multi-Party Privacy Conflicts in Social Media" (vol. 28, no. 7, pp. 1851-1863)

- "Scalable Semi-Supervised Learning by Efficient Anchor Graph Regularization" (vol. 28, no. 7, pp. 1864-1877)

- "SEDEX: Scalable Entity Preserving Data Exchange" (vol. 28, no. 7, pp. 1878-1890)

- "Stock Selection with a Novel Sigmoid-Based Mixed Discrete-Continuous Differential Evolution Algorithm" (vol. 28, no. 7, pp. 1891-1904)

- "The Gamma Matrix to Summarize Dense and Sparse Data Sets for Big Data Analytics" (vol. 28, no. 7, pp. 1905-1918)

- "Using Hashtag Graph-Based Topic Model to Connect Semantically-Related Words Without Co-Occurrence in Microblogs" (vol. 28, no. 7, pp. 1919-1933)
  Abstract (excerpt): ...[1] and latent semantic analysis [2]) fail to learn high-quality topic structures. Tweets always show up with rich user-generated hashtags.
  The hashtags make tweets semi-structured internally and semantically related to each other. Since hashtags are used as keywords in tweets to mark messages or to form conversations, they provide an additional path to connect semantically related words. In this paper, treating tweets as semi-structured texts, we propose a novel topic model, denoted the Hashtag Graph-based Topic Model (HGTM), to discover topics in tweets. By utilizing hashtag relation information in hashtag graphs, HGTM is able to discover semantic relations between words even if the words do not co-occur within a specific tweet. With this method, HGTM successfully alleviates the sparsity problem. Our investigation illustrates that user-contributed hashtags can serve as weakly supervised information for topic modeling, and that relations between hashtags can reveal latent semantic relations between words. We evaluate the effectiveness of HGTM on tweet (hashtag) clustering and hashtag classification problems. Experiments on two real-world tweet datasets show that HGTM has a strong capability to handle the sparseness and noise in tweets. Furthermore, HGTM can discover more distinct and coherent topics than the state-of-the-art baselines.

- "Listwise Learning to Rank by Exploring Structure of Objects" (vol. 28, no. 7, pp. 1934-1939)

- "OMASS: One Memory Access Set Separation" (vol. 28, no. 7, pp. 1940-1943)
  Abstract (excerpt): ...belongs, if any. For example, in a router, this functionality can be used to determine the next hop of an incoming packet. This problem is generally known as set separation and has been widely studied. Most existing solutions make use of hash-based algorithms, particularly when a small percentage of false positives is allowed. A known approach is to use a collection of Bloom filters in parallel. Such schemes can require several memory accesses, a significant limitation for some implementations.
  We propose an approach using block Bloom filters, where each element is first hashed to a single memory block that stores a small Bloom filter tracking the element and the set or sets the element belongs to. In a naïve solution, when an element x in a set S is stored, it necessarily increases the false-positive probability of finding that x is in another set S'. In this paper, we introduce our One Memory Access Set Separation (OMASS) scheme to avoid this problem. OMASS is designed so that, for a given element x, the corresponding Bloom filter bits for each set map to different positions in the memory word. This ensures that the false-positive rates of the Bloom filters for element x under other sets are not affected. In addition, OMASS requires fewer hash functions than the naïve solution.

- "Correction to 'Inference of Regular Expressions for Text Extraction from Examples'" (vol. 28, no. 7, p. 1944)
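The OMASS abstract above describes set separation via block Bloom filters: each element hashes to one memory block, so a single memory access answers "which set(s) does this element belong to?" with a small false-positive rate. As a minimal illustrative sketch of that general idea only (this is the naïve block Bloom filter baseline the paper improves on, not the OMASS scheme itself; the class name and parameters are hypothetical):

```python
import hashlib

class BlockBloomSeparator:
    """Naive block Bloom filter for set separation (illustrative sketch).

    Each element hashes to a single block (one machine word here), and
    membership of the element in each set is recorded by k bit positions
    inside that block. A query reads just one word and tests the bits
    for every candidate set; false positives are possible, false
    negatives are not.
    """

    def __init__(self, num_blocks=1024, word_bits=64, k=3):
        self.num_blocks = num_blocks
        self.word_bits = word_bits
        self.k = k
        self.blocks = [0] * num_blocks  # each entry models one memory word

    def _hash(self, data: str) -> int:
        return int.from_bytes(hashlib.sha256(data.encode()).digest(), "big")

    def _block_index(self, element: str) -> int:
        # every lookup for `element` touches exactly this one block
        return self._hash("blk|" + element) % self.num_blocks

    def _bit_positions(self, element: str, set_id: int):
        # k bit positions inside the word, derived from (element, set)
        for i in range(self.k):
            yield self._hash(f"bit|{i}|{set_id}|{element}") % self.word_bits

    def insert(self, element: str, set_id: int) -> None:
        b = self._block_index(element)
        for pos in self._bit_positions(element, set_id):
            self.blocks[b] |= 1 << pos

    def query(self, element: str, set_ids) -> list:
        # one memory-block access: read a single word, test bits per set
        word = self.blocks[self._block_index(element)]
        return [s for s in set_ids
                if all(word >> p & 1 for p in self._bit_positions(element, s))]
```

The naïve weakness the paper targets is visible here: every set's bits for an element land in the same word, so storing the element for one set adds set bits that can raise the false-positive probability for the other sets.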