By Topic

The DIAsDEM framework for converting domain-specific texts into XML documents with data mining techniques

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$33 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

3 Author(s)
H. Graubitz ; Dept. of E-Business, Leipzig Graduate Sch. of Manage., Germany ; M. Spiliopoulou ; K. Winkler

Modern organizations are accumulating huge volumes of textual documents. To turn archives into valuable knowledge sources, textual content must become explicit and able to be queried. Semantic tagging with markup languages such as XML satisfies both requirements. We thus introduce the DIAsDEM* framework for extracting semantics from structural text units (e.g., sentences), assigning XML tags to them and deriving a flat XML DTD for the archive. DIAsDEM focuses on archives characterized by a peculiar terminology and by an implicit structure such as court filings and company reports. In the knowledge discovery phase, text units are iteratively clustered by similarity of their content. Each iteration outputs clusters satisfying a set of quality criteria. Text units contained in these clusters are tagged with semiautomatically determined cluster labels and XML tags respectively. Additionally, extracted named entities (e.g., persons) serve as attributes of XML tags. We apply the framework in a case study on the German Commercial Register

Published in:

Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on

Date of Conference: