A method of mining bilingual resources from Web Based on Maximum Frequent Sequential Pattern

3 Author(s)
Guiping Zhang (Knowledge Engineering Research Center, Shenyang Aerospace University, Shenyang, China); Yang Luo; Duo Ji

Bilingual resources are indispensable in NLP fields such as machine translation. The Internet holds a large amount of electronic information that can serve as a source for a large-scale multilingual corpus, so mining large volumes of authentic bilingual resources from the Web is a promising and feasible approach. This paper proposes a method of mining bilingual resources from the Web based on Maximum Frequent Sequential Patterns. The method uses a heuristic approach to search for and filter candidate bilingual web pages, mines patterns using maximum frequent sequential pattern mining, and uses a machine learning method to extend the pattern base and to verify bilingual resources according to the ratio of Japanese words to Chinese words. The experimental results indicate that the method extracts bilingual resources efficiently, with a precision of over 90%.
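The abstract does not spell out the authors' mining procedure. As a minimal sketch, assuming each candidate page has already been reduced to a sequence of tokens (the labels `ja`, `zh` and `img` below are hypothetical stand-ins for Japanese text blocks, Chinese text blocks and other page elements), the following Python code mines maximal frequent sequential patterns with a simple Apriori-style level-wise search. A pattern is frequent if it occurs as an ordered, possibly gapped, subsequence in at least `min_support` sequences, and maximal if no other frequent pattern contains it. All function names and the toy data are illustrative assumptions, not the paper's implementation; the Japanese-to-Chinese word-ratio verification step is not shown.

```python
from typing import List, Sequence, Set, Tuple

# A pattern is an ordered tuple of tokens; a sequence database is a list of token lists.
Pattern = Tuple[str, ...]


def is_subsequence(pattern: Sequence[str], seq: Sequence[str]) -> bool:
    """True if the items of `pattern` appear in `seq` in the same order (gaps allowed)."""
    it = iter(seq)
    # `item in it` advances the shared iterator past the match, preserving order.
    return all(item in it for item in pattern)


def support(pattern: Pattern, db: List[List[str]]) -> int:
    """Count the sequences in `db` that contain `pattern` as a subsequence."""
    return sum(1 for seq in db if is_subsequence(pattern, seq))


def frequent_patterns(db: List[List[str]], min_support: int) -> Set[Pattern]:
    """Apriori-style level-wise search: grow each frequent pattern by one item per level.

    Extending only frequent patterns is complete because every prefix of a
    frequent pattern is itself frequent.
    """
    if min_support < 1:
        raise ValueError("min_support must be at least 1")
    items = sorted({tok for seq in db for tok in seq})
    level = {(tok,) for tok in items if support((tok,), db) >= min_support}
    result: Set[Pattern] = set(level)
    while level:
        candidates = {p + (tok,) for p in level for tok in items}
        level = {c for c in candidates if support(c, db) >= min_support}
        result |= level
    return result


def maximal_patterns(db: List[List[str]], min_support: int) -> Set[Pattern]:
    """Keep frequent patterns that are not a subsequence of another frequent pattern."""
    freq = frequent_patterns(db, min_support)
    return {
        p for p in freq
        if not any(p != q and is_subsequence(p, q) for q in freq)
    }


# Toy data (hypothetical): each page reduced to a sequence of block labels,
# e.g. "ja" = Japanese text block, "zh" = Chinese text block, "img" = image.
db = [
    ["ja", "zh", "ja", "zh"],
    ["ja", "zh", "img", "ja", "zh"],
    ["ja", "zh", "ja"],
]

if __name__ == "__main__":
    print(maximal_patterns(db, min_support=3))  # {('ja', 'zh', 'ja')}
```

Brute-force support counting keeps the sketch short; dedicated sequential pattern miners such as PrefixSpan avoid rescanning the whole database for every candidate.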

Published in: 2010 International Conference on Natural Language Processing and Knowledge Engineering (NLP-KE)

Date of Conference: 21-23 Aug. 2010