By Topic

Understanding text corpora with multiple facets

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$33 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

6 Author(s)
Lei Shi ; IBM Research - China, 19 Zhongguancun Software Park, Beijing 100193, China ; Furu Wei ; Shixia Liu ; Li Tan
more authors

Text visualization becomes an increasingly more important research topic as the need to understand massive-scale textual information is proven to be imperative for many people and businesses. However, it is still very challenging to design effective visual metaphors to represent large corpora of text due to the unstructured and high-dimensional nature of text. In this paper, we propose a data model that can be used to represent most of the text corpora. Such a data model contains four basic types of facets: time, category, content (unstructured), and structured facet. To understand the corpus with such a data model, we develop a hybrid visualization by combining the trend graph with tag-clouds. We encode the four types of data facets with four separate visual dimensions. To help people discover evolutionary and correlation patterns, we also develop several visual interaction methods that allow people to interactively analyze text by one or more facets. Finally, we present two case studies to demonstrate the effectiveness of our solution in support of multi-faceted visual analysis of text corpora.

Published in:

Visual Analytics Science and Technology (VAST), 2010 IEEE Symposium on

Date of Conference:

25-26 Oct. 2010