We previously proposed a method for kana-to-kanji conversion of non-segmented kana sentences that uses a Markov chain model of words in a sentence. However, this method did not achieve a sufficient conversion accuracy rate. We attribute this to the fact that the total number of rules in the dictionary of Markov chain probabilities of words in a sentence is not saturated. By contrast, we note that the total number of rules is almost saturated in the dictionary of Markov chain probabilities of words in a bunsetsu. In this paper, we propose a new kana-to-kanji conversion method based on this bunsetsu-level Markov chain model. The proposed method simultaneously detects the boundaries of kana bunsetsu in a sentence and the boundaries of kana words in each bunsetsu by using a Markov chain model of kana words in a bunsetsu. It then converts the kana words to candidate kanji-kana words and selects the most likely candidate by using a Markov chain model of kanji-kana words in a bunsetsu. In experiments using statistical data from a daily Japanese newspaper, the previously proposed method (called Method-B1) and the newly proposed method (called Method-B2) are evaluated by their conversion accuracy rate. The results show that Method-B2 is superior to Method-B1 in conversion accuracy rate and is effective for kana-to-kanji conversion of non-segmented kana sentences.
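The simultaneous detection of bunsetsu and word boundaries described above can be illustrated with a small dynamic-programming sketch. This is not the authors' implementation: it assumes a first-order Markov chain (the abstract does not state the order), it covers only the boundary-detection stage and not the selection of kanji-kana candidates, and the markers `BOB`/`EOB`, the table `P`, and every probability in it are invented for illustration.

```python
import math

BOB, EOB = "<b>", "</b>"  # hypothetical bunsetsu start/end markers

# Toy table of P(next | previous) inside a bunsetsu. All values are invented.
P = {
    (BOB, "わたし"): 0.4,
    ("わたし", "は"): 0.6,
    ("は", EOB): 0.9,
    (BOB, "がくせい"): 0.3,
    ("がくせい", "です"): 0.7,
    ("です", EOB): 0.9,
    (BOB, "わ"): 0.01,
    ("わ", "たし"): 0.01,
    ("たし", "は"): 0.01,
}
WORDS = {w for pair in P for w in pair} - {BOB, EOB}


def segment(kana):
    """Return the most probable list of bunsetsu (each a list of kana words),
    or None if the toy table cannot cover the input."""
    n = len(kana)
    # best[pos][word] = (log probability, back pointer) for the best path
    # whose last word is `word` and ends at character offset `pos`.
    best = [dict() for _ in range(n + 1)]
    best[0][BOB] = (0.0, None)
    for pos in range(n):
        for prev, (score, _) in list(best[pos].items()):
            for w in WORDS:
                if not kana.startswith(w, pos):
                    continue
                moves = []
                if (prev, w) in P:
                    # Word boundary only: stay inside the current bunsetsu.
                    moves.append((math.log(P[prev, w]), False))
                if prev != BOB and (prev, EOB) in P and (BOB, w) in P:
                    # Bunsetsu boundary: close the current one, open a new one.
                    moves.append(
                        (math.log(P[prev, EOB]) + math.log(P[BOB, w]), True)
                    )
                end = pos + len(w)
                for logp, starts_bunsetsu in moves:
                    cand = score + logp
                    if w not in best[end] or cand > best[end][w][0]:
                        best[end][w] = (cand, (pos, prev, starts_bunsetsu))
    # The last word must be allowed to end a bunsetsu.
    finals = [
        (s + math.log(P[w, EOB]), w)
        for w, (s, _) in best[n].items()
        if (w, EOB) in P
    ]
    if not finals:
        return None
    _, w = max(finals)
    pos, result, current = n, [], []
    while w != BOB:  # walk the back pointers from the end of the sentence
        _, (prev_pos, prev, starts_bunsetsu) = best[pos][w]
        current.insert(0, w)
        if starts_bunsetsu or prev == BOB:
            result.insert(0, current)
            current = []
        pos, w = prev_pos, prev
    return result
```

With this toy table, `segment("わたしはがくせいです")` returns `[["わたし", "は"], ["がくせい", "です"]]`: the inner lists mark bunsetsu boundaries and their elements mark word boundaries, both found in a single pass. Because the chain restarts at every bunsetsu, the table only needs word transitions that occur inside a bunsetsu, which is consistent with the abstract's observation that this rule set is almost saturated.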
Date of Conference: 18-19 Oct. 2010