Chinese word segmentation is an important technique for Chinese Web data mining. After reviewing several current Chinese word segmentation methods, this paper proposes an improved algorithm. The algorithm updates its dictionary using a two-way Markov chain and performs word segmentation with an improved forward maximum matching method based on word frequency statistics. Simulation results show that the algorithm segments a given text quickly and accurately.
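As background, the following is a minimal sketch of the standard (unimproved) forward maximum matching method that the algorithm builds on. The function name, the `max_len` parameter and the toy dictionary are illustrative assumptions. The sketch does not implement the paper's word-frequency-based improvement or its two-way Markov chain dictionary update, since the abstract does not give their details.

```python
def forward_maximum_matching(text, dictionary, max_len=None):
    """Plain forward maximum matching (hypothetical helper, not the paper's improved variant).

    Scans the text left to right and, at each position, takes the longest
    dictionary word that starts there; unmatched characters become
    single-character words.
    """
    if max_len is None:
        # Window size defaults to the length of the longest dictionary word.
        max_len = max((len(w) for w in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking the window one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the scan always advances.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


if __name__ == "__main__":
    toy_dictionary = {"研究", "研究生", "生命", "起源"}  # illustrative dictionary only
    print(forward_maximum_matching("研究生命起源", toy_dictionary))
```

On this input the greedy scan yields `["研究生", "命", "起源"]` rather than the intended `["研究", "生命", "起源"]`. This is the classic ambiguity of plain forward maximum matching, and it is the kind of error that frequency-based refinements such as the one proposed here aim to reduce.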
Published in: Information Assurance and Security, 2009. IAS '09. Fifth International Conference on (Volume 1)
Date of Conference: 18-20 Aug. 2009