Web syndication technologies make it easy to aggregate daily news from diverse sources. However, the sheer volume of information makes it difficult to read, let alone digest, and to focus on the most important events. An efficient method for news extraction and mining is therefore needed. In this paper, we propose an unsupervised approach to multilingual concept discovery from daily online news extracts. First, key terms are extracted statistically from short news extracts. Second, similar term candidates are grouped into concrete concepts using unsupervised term clustering methods. Our goal is automatic news processing with minimal resources and no prior training. The experimental results show the potential of the proposed approach in terms of efficiency and effectiveness. Further investigation is needed to study the cross-lingual relations between the extracted concepts.
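The abstract does not say which term statistics or clustering algorithm the authors use, so the following is only a minimal Python sketch of the two-step pipeline under our own assumptions. Key terms are scored by TF-IDF summed over the extracts (an assumed statistic), and candidates are grouped by single-link clustering on the Jaccard similarity of the sets of extracts each term occurs in (an assumed similarity and linkage). The function names `extract_key_terms` and `cluster_terms`, the parameters `top_k`, `min_df`, and `threshold`, and the sample headlines are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: languages delimited by spaces/punctuation; CJK text would
    # need a word segmenter or character n-grams instead.
    return re.findall(r"\w+", text.lower())


def extract_key_terms(extracts, top_k=10, min_df=2):
    """Hypothetical step 1: rank terms by TF-IDF summed over all extracts."""
    docs = [tokenize(e) for e in extracts]
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency
    scores = Counter()
    for d in docs:
        for t, c in Counter(d).items():
            if df[t] >= min_df:  # ignore terms seen in only one extract
                scores[t] += (c / len(d)) * math.log(n / df[t])
    terms = [t for t, _ in scores.most_common(top_k)]
    return terms, [set(d) for d in docs]


def cluster_terms(terms, docs, threshold=0.5):
    """Hypothetical step 2: single-link clustering of terms whose sets of
    containing extracts have Jaccard similarity >= threshold."""
    occurs = {t: {i for i, d in enumerate(docs) if t in d} for t in terms}
    parent = {t: t for t in terms}  # union-find forest

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            union = len(occurs[a] | occurs[b])
            if union and len(occurs[a] & occurs[b]) / union >= threshold:
                parent[find(a)] = find(b)  # merge the two clusters

    groups = defaultdict(list)
    for t in terms:
        groups[find(t)].append(t)
    return sorted(sorted(g) for g in groups.values())


extracts = [
    "Volcano ash cloud closes airports in Europe",
    "Ash cloud from Iceland volcano grounds flights",
    "Airports reopen as volcano ash cloud drifts",
    "Election results announced in Britain",
    "Britain election produces hung parliament",
]
terms, docs = extract_key_terms(extracts)
print(cluster_terms(terms, docs))
# [['airports', 'ash', 'cloud', 'volcano'], ['britain', 'election'], ['in']]
```

In this toy run, the function word "in" survives as a singleton cluster. Removing it would require a stopword list, which is exactly the kind of language-specific resource the approach aims to minimize. Requiring no training data is what makes a purely statistical, clustering-based pipeline like this one attractive for the multilingual setting.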
Date of Conference: 23-26 May 2010