Many cross-lingual information retrieval (CLIR) systems have been researched and built to date. These systems are generally built on top of an existing search engine so that they do not need their own crawler, indexer, and searcher components. As a result, they do not gather enough documents to identify pairs of documents with similar content in different languages, and they must exchange too much data with the search engine and web sites while processing user queries. This is a major disadvantage that makes such CLIR systems inefficient. In this paper, we introduce a model of a CLIR system for Vietnamese-English web sites that includes a crawler, an indexer, and a searcher, and we show how the gathered documents are processed to efficiently identify and retrieve similar documents across languages.