Text mining of bilingual parallel corpora with a measure of semantic similarity

2 Author(s)
Chung-Hong Lee ; Dept. of Inf. Manage., Chang Jung Univ., Tainan, Taiwan ; Hsin-Chang Yang

The paper describes a new application of a text-mining algorithm to bilingual parallel corpora. The ultimate task, undertaken in the context of a Chinese-English machine translation project, is to develop a language-neutral method for discovering similar documents in multilingual text collections. Using a variant of automatic clustering based on a neural network approach, namely the self-organizing map (SOM), we conducted several experiments to uncover associated documents in Chinese-English bilingual parallel corpora and in a hybrid Chinese-English corpus. The experiments show some interesting results and suggest a couple of directions for future work on multilingual information discovery. In addition, to explore how this machine learning approach bears on the linguistic problem of mining meaningful linguistic elements from multilingual texts, we examined the resulting term associations and text associations from the viewpoint of cross-lingual text similarity. To evaluate the semantic relatedness of the mined bilingual texts, we applied a semantic similarity measure to the resulting bilingual document clusters and word clusters. The paper presents algorithms that enable multilingual text mining based on the SOM for automatically grouping similar multilingual texts (i.e., Chinese and English texts), along with a means of measuring their semantic similarity to resolve the difficulties of syntactic and semantic ambiguity in multilingual information access.
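
The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general technique it names: a small self-organizing map that groups document vectors, plus cosine similarity as one possible similarity measure. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names (`cosine`, `best_matching_unit`, `train_som`), the 2x2 map size, the linear decay schedules, and the toy term-count vectors over a hypothetical mixed Chinese-English vocabulary.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def best_matching_unit(weights, x):
    """Index of the map unit whose weight vector is closest to x (squared Euclidean)."""
    return min(
        range(len(weights)),
        key=lambda i: sum((w - a) ** 2 for w, a in zip(weights[i], x)),
    )


def train_som(docs, rows=2, cols=2, epochs=200, lr0=0.5, seed=0):
    """Train a rows x cols SOM on document vectors; returns one weight vector per unit.

    Assumed schedule (not from the paper): learning rate and neighbourhood
    radius both decay linearly over the epochs.
    """
    rng = random.Random(seed)
    dim = len(docs[0])
    coords = [(r, c) for r in range(rows) for c in range(cols)]
    weights = [[rng.random() for _ in range(dim)] for _ in coords]
    sigma0 = max(rows, cols) / 2.0
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1.0 - frac)
        sigma = max(sigma0 * (1.0 - frac), 0.1)
        # Present the documents in a fresh random order each epoch.
        for x in rng.sample(docs, len(docs)):
            br, bc = coords[best_matching_unit(weights, x)]
            for i, (r, c) in enumerate(coords):
                # Gaussian neighbourhood around the winning unit on the map grid.
                d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))
                weights[i] = [w + lr * h * (a - w) for w, a in zip(weights[i], x)]
    return weights


# Hypothetical vocabulary: ["market", "市場", "protein", "蛋白質"]
docs = [
    [2, 1, 0, 0],  # mostly English document about markets
    [1, 2, 0, 0],  # mostly Chinese document about markets
    [0, 0, 2, 1],  # mostly English document about proteins
]
som = train_som(docs)
clusters = [best_matching_unit(som, d) for d in docs]  # map unit assigned to each document
similarity = cosine(docs[0], docs[1])  # relatedness of the two market documents
```

In this sketch, documents that land on the same or neighbouring map units are treated as associated, and `cosine` gives a simple numeric check on how related two grouped documents are. The paper's own similarity measure may differ.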

Published in:

Systems, Man, and Cybernetics, 2001 IEEE International Conference on (Volume: 1)

Date of Conference: