In this paper we propose a new word sense disambiguation method, called multi-engine collaborative bootstrapping (MCB), that combines different types of corpora and bootstraps across two languages. MCB uses bilingual bootstrapping as its kernel algorithm, which leads to incremental knowledge acquisition. The parameters of the base learner are trained with the EM algorithm, and the feature translation model is improved by semantic correlation estimation. In addition, we use multiple engines to produce qualified starting seeds from parallel corpora and monolingual corpora. Seeds generated by these unsupervised machine learning approaches can also ensure effective bootstrapping, as manually selected seeds do, despite the different selection mechanisms. Experimental results demonstrate the effectiveness of MCB. Because the EM algorithm is sensitive to its starting values, our experiments examine several factors, including the feature space and the number of starting seeds. We also consider the limitation of resources.
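To make the idea of incremental knowledge acquisition from a small set of starting seeds concrete, the following is a minimal sketch of a generic bootstrapping (self-training) loop for word sense disambiguation. It is not the MCB algorithm: it is monolingual, it uses a simple naive Bayes base learner fitted by counting rather than by EM, and it omits the feature translation model and the multi-engine seed generation. All names (`train_nb`, `posterior`, `bootstrap`), the confidence `threshold`, and the toy data for the ambiguous word "bank" are assumptions made purely for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(labeled, vocab, alpha=1.0):
    """Fit a multinomial naive Bayes model with add-alpha smoothing."""
    sense_counts = Counter(sense for _, sense in labeled)
    feat_counts = defaultdict(Counter)
    for feats, sense in labeled:
        feat_counts[sense].update(feats)
    total = sum(sense_counts.values())
    model = {}
    for sense, n in sense_counts.items():
        denom = sum(feat_counts[sense].values()) + alpha * len(vocab)
        log_like = {
            w: math.log((feat_counts[sense][w] + alpha) / denom) for w in vocab
        }
        model[sense] = (math.log(n / total), log_like)
    return model


def posterior(model, feats):
    """Return the normalized distribution P(sense | context features)."""
    scores = {
        sense: log_prior + sum(log_like[w] for w in feats)
        for sense, (log_prior, log_like) in model.items()
    }
    top = max(scores.values())
    unnorm = {sense: math.exp(s - top) for sense, s in scores.items()}
    z = sum(unnorm.values())
    return {sense: v / z for sense, v in unnorm.items()}


def bootstrap(seeds, unlabeled, threshold=0.6, max_iter=10):
    """Grow the labeled set by repeatedly adding confident predictions.

    Hypothetical simplification: a single language and a single learner.
    """
    labeled = list(seeds)
    pool = list(unlabeled)
    vocab = {w for feats, _ in labeled for w in feats}
    vocab |= {w for feats in pool for w in feats}
    for _ in range(max_iter):
        if not pool:
            break
        model = train_nb(labeled, vocab)
        newly_labeled, remaining = [], []
        for feats in pool:
            post = posterior(model, feats)
            sense = max(post, key=post.get)
            if post[sense] >= threshold:
                newly_labeled.append((feats, sense))
            else:
                remaining.append(feats)
        if not newly_labeled:
            break  # nothing confident enough: stop early
        labeled.extend(newly_labeled)
        pool = remaining
    return labeled, pool


# Toy contexts for the ambiguous word "bank" (illustrative data only).
seeds = [(["money", "loan"], "finance"), (["water", "shore"], "river")]
unlabeled = [
    ["loan", "interest"],
    ["shore", "fishing"],
    ["interest", "deposit"],
    ["fishing", "boat"],
]
labeled, leftover = bootstrap(seeds, unlabeled, threshold=0.6)
for feats, sense in labeled:
    print(feats, "->", sense)
```

In this toy run, the contexts that share a feature with a seed are labeled in the first round, and the features they contribute allow the remaining contexts to be labeled in the second round. The sketch also hints at why the choice and number of starting seeds matter: with different seeds, the first round may label nothing confidently and the loop stops early.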