Punctuating speech for information extraction

6 Author(s)
Benoit Favre (ICSI, 1947 Center St, Suite 600, Berkeley, CA 94704, USA); Ralph Grishman; Dustin Hillard; Heng Ji

This paper studies the effect of automatic sentence boundary detection and comma prediction on entity and relation extraction in speech. We show that punctuating the machine-generated transcript to maximize the F-measure of period and comma annotation results in suboptimal information extraction (IE). Specifically, period and comma decision thresholds can be chosen to improve the entity value score and the relation value score by 4% relative. Error analysis shows that preventing noun-phrase splitting by generating longer sentences and fewer commas can be harmful for IE performance. Indeed, it seems that missed punctuation allows syntactic parsers to merge noun phrases, which prevents the extraction of correct information.
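The contrast the abstract draws, between picking a punctuation threshold that maximizes F-measure and picking one that maximizes a downstream extraction score, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the toy posteriors, and above all `toy_downstream_score`, which is an arbitrary stand-in and not the entity or relation value score used in the paper.

```python
from typing import Callable, List, Sequence


def apply_threshold(posteriors: Sequence[float], threshold: float) -> List[bool]:
    """Insert a punctuation mark wherever the posterior reaches the threshold."""
    return [p >= threshold for p in posteriors]


def f_measure(predicted: Sequence[bool], reference: Sequence[bool]) -> float:
    """Balanced F-measure of predicted punctuation marks against a reference."""
    tp = sum(p and r for p, r in zip(predicted, reference))
    fp = sum(p and not r for p, r in zip(predicted, reference))
    fn = sum(r and not p for p, r in zip(predicted, reference))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(
    posteriors: Sequence[float],
    thresholds: Sequence[float],
    score: Callable[[Sequence[bool]], float],
) -> float:
    """Return the candidate threshold whose punctuation maximizes `score`."""
    return max(thresholds, key=lambda t: score(apply_threshold(posteriors, t)))


# Toy data (hypothetical): one posterior per word boundary, plus reference marks.
POSTERIORS = [0.9, 0.2, 0.6, 0.4, 0.8]
REFERENCE = [True, False, False, True, True]
THRESHOLDS = [0.3, 0.5, 0.7]


def toy_downstream_score(predicted: Sequence[bool]) -> float:
    """Hypothetical stand-in for an extraction score; NOT the paper's metric.

    It weights spurious marks twice as heavily as correct ones, purely so that
    the two selection criteria can disagree on this toy data.
    """
    tp = sum(p and r for p, r in zip(predicted, REFERENCE))
    fp = sum(p and not r for p, r in zip(predicted, REFERENCE))
    return tp - 2 * fp


if __name__ == "__main__":
    by_f = best_threshold(POSTERIORS, THRESHOLDS, lambda pred: f_measure(pred, REFERENCE))
    by_task = best_threshold(POSTERIORS, THRESHOLDS, toy_downstream_score)
    print("threshold chosen by F-measure:", by_f)
    print("threshold chosen by toy downstream score:", by_task)
```

On this toy data the two criteria select different thresholds, which is the only point of the sketch. In a real setup the score passed to `best_threshold` would come from running the extraction system on the punctuated transcript.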

Published in:

2008 IEEE International Conference on Acoustics, Speech and Signal Processing

Date of Conference:

March 31-April 4, 2008