Part-of-speech tagging for table of contents recognition

3 Author(s)
A. Belaid ; LORIA, CNRS, Vandoeuvre-les Nancy, France ; L. Pienon ; N. Valverde

A labeling approach to the automatic recognition of tables of contents (TOCs) is described. A prototype is used for the electronic consultation of scientific papers in a digital library system named Calliope. The method operates on a roughly structured ASCII file produced by OCR. Labeling is based on part-of-speech tagging. Tagging starts with a primary labeling of text components using specific dictionaries. Significant tags are then grouped into title and author strings and reduced to canonical forms according to contextual rules. Non-labeled tokens are assigned to one field or another either by applying contextual correction rules or by using a structure model generated from well-detected articles. The prototype performs satisfactorily across different TOC layouts and character recognition qualities. Without manual intervention, a 95.41% rate of correct segmentation and an 81.74% rate of correct field extraction were obtained on 38 journals comprising 2703 articles.
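
The pipeline summarized in the abstract (dictionary-based primary labeling, contextual resolution of unlabeled tokens, grouping into fields) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the tag names (`TI`, `AU`, `NUM`, `UNK`), the tiny dictionaries, and the single "inherit the neighbour's tag" rule are all assumptions made for illustration, since the abstract does not specify the actual dictionaries or contextual rules.

```python
import re
from itertools import groupby

# Hypothetical mini-dictionaries; the paper's real dictionaries are not given.
FIRST_NAMES = {"john", "mary", "alice"}
TITLE_WORDS = {"a", "the", "of", "for", "analysis", "document", "recognition", "tagging"}


def primary_label(token):
    """Assign a coarse tag from surface cues and dictionary lookup."""
    low = token.lower().strip(".,;:")
    if re.fullmatch(r"\d+", low):
        return "NUM"  # candidate page number
    if re.fullmatch(r"[A-Z]\.", token):
        return "AU"   # a single initial such as "J."
    if low in FIRST_NAMES:
        return "AU"
    if low in TITLE_WORDS:
        return "TI"
    return "UNK"      # left for the contextual step


def resolve_unknowns(tags):
    """Assumed contextual rule: an unlabeled token inherits the tag of its
    left neighbour if that is AU or TI, else the next labeled tag to its right."""
    resolved = list(tags)
    for i, tag in enumerate(resolved):
        if tag != "UNK":
            continue
        left = resolved[i - 1] if i > 0 else None
        right = next((t for t in tags[i + 1:] if t != "UNK"), None)
        if left in ("AU", "TI"):
            resolved[i] = left
        elif right in ("AU", "TI"):
            resolved[i] = right
    return resolved


def group_fields(tokens, tags):
    """Merge consecutive tokens that share a tag into one field string."""
    fields = []
    for tag, run in groupby(zip(tags, tokens), key=lambda pair: pair[0]):
        fields.append((tag, " ".join(tok for _, tok in run)))
    return fields


def label_line(line):
    """Run the three steps on one whitespace-tokenized TOC line."""
    tokens = line.split()
    tags = resolve_unknowns([primary_label(t) for t in tokens])
    return group_fields(tokens, tags)


if __name__ == "__main__":
    print(label_line("Document analysis for recognition J. Smith, M. Doe 123"))
```

On the sample line, the surnames "Smith," and "Doe" are in neither dictionary, so they start as `UNK` and are attached to the author field by the contextual rule. A real system would also need the structure model mentioned in the abstract to handle OCR noise and layouts where this simple left-to-right rule fails.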

Published in:

Pattern Recognition, 2000. Proceedings. 15th International Conference on (Volume: 4)

Date of Conference: