ALPOS: A Machine Learning Approach for Analyzing Microblogging Data

Authors: Dan Zhang (Dept. of Comput. Sci., Purdue Univ., West Lafayette, IN, USA); Yan Liu; Richard D. Lawrence; Vijil Chenthamarakshan

With the growth of the Internet, the increasing volume of information posted on microblogging sites like Twitter calls for efficient information filtering. In conventional text classification problems, it is assumed that the feature vectors extracted from the available documents are sufficient to learn good classifiers. However, this conventional approach is unlikely to work for Twitter because of the limited number of characters in each tweet. At a higher level, each tweet can be viewed as an abbreviated abstraction of a long document, of which we have only a partial observation. To solve the problem caused by partial observations, we introduce a novel domain adaptation/transfer learning approach called Assisted Learning for Partial Observation (ALPOS). The basic idea is to use a large number of multi-labeled examples (the source domain) to improve learning on the partial observations (the target domain). In particular, we learn a hidden, higher-level abstraction space that is meaningful for the multi-labeled examples in the source domain. This is done by simultaneously minimizing the document reconstruction error and the error of a classification model learned in the hidden space from the known labels of the source domain. The partial observations in the target domain are then mapped to the same hidden space for recovery and classification. We compare the performance of this method with existing approaches on synthetic data and the well-known Reuters-21578 dataset. We also present experimental results on Twitter classification.
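The abstract does not give the ALPOS objective or its optimization procedure, so the following is only a minimal sketch of the general idea (a hidden space learned by jointly minimizing reconstruction error and classification error, then reused for partial observations), not the authors' method. The linear factorization, the squared loss on labels, the ridge penalty, the alternating least squares solver, and every name in the code (`fit_hidden_space`, `map_partial`, `alpha`, `lam`, `k`) are assumptions made for illustration.

```python
import numpy as np


def joint_objective(X, Y, H, B, W, alpha, lam):
    """Reconstruction error + weighted label error + ridge penalty (assumed form)."""
    recon = np.sum((X - H @ B) ** 2)
    clf = np.sum((Y - H @ W) ** 2)
    reg = np.sum(H ** 2) + np.sum(B ** 2) + np.sum(W ** 2)
    return recon + alpha * clf + lam * reg


def fit_hidden_space(X, Y, k=5, alpha=1.0, lam=0.1, n_iter=20, seed=0):
    """Learn a k-dimensional hidden space from fully observed, multi-labeled data.

    X: (n, d) source documents, Y: (n, c) label matrix.
    Returns the decoder B (k, d), the label model W (k, c), and the
    objective value recorded before training and after each sweep.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    H = rng.normal(scale=0.1, size=(n, k))
    B = rng.normal(scale=0.1, size=(k, d))
    W = rng.normal(scale=0.1, size=(k, Y.shape[1]))
    ridge = lam * np.eye(k)
    Z = np.hstack([X, np.sqrt(alpha) * Y])  # stacked targets for the H update

    history = [joint_objective(X, Y, H, B, W, alpha, lam)]
    for _ in range(n_iter):
        # Each update is the exact ridge least-squares solution for one block.
        C = np.hstack([B, np.sqrt(alpha) * W])
        H = np.linalg.solve(C @ C.T + ridge, C @ Z.T).T
        G = H.T @ H
        B = np.linalg.solve(G + ridge, H.T @ X)
        W = np.linalg.solve(alpha * G + ridge, alpha * (H.T @ Y))
        history.append(joint_objective(X, Y, H, B, W, alpha, lam))
    return B, W, history


def map_partial(x, observed, B, W, lam=0.1):
    """Map a partially observed document into the hidden space.

    x: (d,) feature vector, observed: (d,) boolean mask of trusted entries.
    Returns the recovered full document and the label scores.
    """
    B_obs = B[:, observed]
    h = np.linalg.solve(B_obs @ B_obs.T + lam * np.eye(B.shape[0]),
                        B_obs @ x[observed])
    return h @ B, h @ W


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.random((50, 20))                          # synthetic source documents
    Y = (rng.random((50, 4)) > 0.5).astype(float)     # synthetic multi-labels
    B, W, history = fit_hidden_space(X, Y, k=5)
    mask = np.zeros(20, dtype=bool)
    mask[:6] = True                                   # only 6 of 20 features seen
    recovered, scores = map_partial(X[0], mask, B, W)
    print(recovered.shape, scores.shape)
```

In this sketch, `alpha` trades off reconstruction against label fit, and the target-side mapping uses only the observed features, which mirrors the "partial observation" view of a tweet described above. A real system would likely need a loss better suited to sparse text features and to multi-label outputs.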

Published in:

2010 IEEE International Conference on Data Mining Workshops

Date of Conference:

13 Dec. 2010