
Japanese text compression using word-based coding

Authors:
Morihara, T. (Fujitsu Labs. Ltd., Atsugi, Japan); Satoh, N.; Yahagi, H.; Yoshida, S.

Summary form only given. Because Japanese characters are encoded in 16 bits, compressing Japanese text with coding methods that sample 8-bit characters has been difficult. At DCC '97, Satoh et al. (1997) reported that adaptive arithmetic coding with 16-bit character sampling is effective in improving the compression ratio. However, adaptive compression does not work well on small documents, such as those produced in offices by groupware and e-mail. This paper studies a word-based, semi-adaptive compression method for Japanese text, with the aim of achieving good compression performance across a range of document sizes. The algorithm has two stages. The first stage converts the input string into word-index numbers (the intermediate data), each corresponding to the longest matching string in the dictionary. The second stage reduces the redundancy of the intermediate data. We adopted 16-bit word indices and, for entropy coding in the second stage, first-order-context PPMC2 with 16-bit sampling (16-bit PPM).

Published in:

Data Compression Conference, 1998. DCC '98. Proceedings

Date of Conference:

30 Mar-1 Apr 1998