Automatic Word Spacing Using Probabilistic Models Based on Character n-grams

Authors: Do-Gil Lee (Korea Univ., Seoul); Hae-Chang Rim; Dongsuk Yook

On the Internet, information is largely in text form, which often includes errors such as spelling mistakes. These errors complicate natural language processing because most NLP applications aren't robust and assume that the input data is noise free. Preprocessing is necessary to deal with these errors and meet the growing need for automatic text processing. One such preprocessing step is automatic word spacing, which determines the correct boundaries between words in a sentence containing spacing errors, a type of spelling error. Except for some Asian languages such as Chinese and Japanese, most languages have explicit word spacing. In these languages, word spacing is crucial for readability and for accurately communicating a text's meaning. Automatic word spacing plays an important role not only as a spell-checker module but also as a preprocessor for a morphological analyzer, which is a fundamental tool for NLP applications. Furthermore, automatic word spacing can serve as a postprocessor for optical-character-recognition systems and speech recognition systems.
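The abstract does not describe the authors' model, so the following is only a toy sketch of the general idea of spacing text with character n-gram statistics. It assumes a bigram context (the characters immediately left and right of each gap), add-one smoothing, and an independent threshold decision per gap; the function names `train`, `space_probability`, and `respace` are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def train(sentences):
    """Count, for each adjacent character pair, how often a space separates it.

    Returns a dict mapping (left, right) -> [no_space_count, space_count].
    """
    counts = defaultdict(lambda: [0, 0])
    for sentence in sentences:
        words = sentence.split()
        chars = []        # the sentence with all spaces removed
        space_after = []  # space_after[i]: was chars[i] followed by a space?
        for w_idx, word in enumerate(words):
            for c_idx, ch in enumerate(word):
                chars.append(ch)
                ends_word = c_idx == len(word) - 1
                ends_sentence = w_idx == len(words) - 1
                space_after.append(ends_word and not ends_sentence)
        for i in range(len(chars) - 1):
            counts[(chars[i], chars[i + 1])][int(space_after[i])] += 1
    return counts


def space_probability(counts, left, right):
    """Estimate P(space | left, right) with add-one smoothing.

    Unseen pairs get 0.5 (assumption: no preference either way).
    """
    no_space, space = counts.get((left, right), (0, 0))
    return (space + 1) / (no_space + space + 2)


def respace(counts, text, threshold=0.5):
    """Drop existing whitespace, then insert a space at every gap whose
    estimated space probability exceeds the threshold."""
    chars = [ch for ch in text if not ch.isspace()]
    out = []
    for i, ch in enumerate(chars):
        out.append(ch)
        if i + 1 < len(chars) and space_probability(counts, ch, chars[i + 1]) > threshold:
            out.append(" ")
    return "".join(out)


model = train(["the cat sat", "the cat ran", "a cat sat"])
print(respace(model, "thecatsat"))
print(respace(model, "th ecat sat"))
```

Both calls restore the spacing seen in the tiny training set. A practical system would likely use longer n-gram contexts and decode the whole sentence jointly rather than deciding each gap in isolation; this sketch makes the simplest possible choice on both counts.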

Published in: Intelligent Systems, IEEE (Volume: 22, Issue: 1)