In this paper, we propose a document clustering algorithm based on formal concept analysis. Conventional clustering methods require numeric data and compare word or document vectors numerically, typically by cosine distance. However, clusters produced by conventional methods can contain documents that are not similar to one another. We therefore propose a novel clustering method that applies formal concept analysis. Formal concept analysis classifies documents into sets of documents that share the same features, and each such set can be selected individually. The resulting clustering is suited to expressing the themes of documents, because it is based on the words that the documents contain. We apply formal concept analysis to 100 English news articles selected from the Reuters-21578 database. Document clustering is then performed by selecting a concept on the concept lattice. The articles included in all concepts connected to the selected concept in lower layers are set as one cluster, and each cluster has a shared topic. In addition, selecting a concept in a higher layer merges the clusters of the connected lower layers into a single cluster. The proposed clustering technique can be applied to text classification and summarization.
World Automation Congress (WAC), 2010
Date of Conference: 19-23 Sept. 2010
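The idea described in the abstract, selecting a concept on the lattice and taking the documents of all concepts below it as one cluster, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy context, the word sets, and the function names (`extent`, `intent`, `formal_concepts`, `cluster_for`) are assumptions made for illustration, and the concepts are enumerated by brute force, which is only practical for very small inputs.

```python
from itertools import combinations

# Hypothetical toy context (not taken from Reuters-21578):
# document id -> set of index words occurring in that document.
context = {
    "d1": {"oil", "price"},
    "d2": {"oil", "price", "opec"},
    "d3": {"oil", "export"},
    "d4": {"wheat", "export"},
}

def extent(words, ctx):
    """Documents that contain every word in `words`."""
    return frozenset(d for d, ws in ctx.items() if set(words) <= ws)

def intent(docs, ctx):
    """Words shared by every document in `docs`."""
    all_words = set().union(*ctx.values())
    return frozenset(w for w in all_words if all(w in ctx[d] for d in docs))

def formal_concepts(ctx):
    """Enumerate all (extent, intent) pairs by closing every subset of documents.

    Brute force: exponential in the number of documents.
    """
    concepts = set()
    docs = list(ctx)
    for r in range(len(docs) + 1):
        for subset in combinations(docs, r):
            shared = intent(subset, ctx)
            concepts.add((extent(shared, ctx), shared))
    return concepts

def cluster_for(selected, concepts):
    """Union of the extents of the selected concept and all concepts below it."""
    sel_extent, _ = selected
    cluster = set()
    for ext, _ in concepts:
        if ext <= sel_extent:  # concept lies at or below the selected one
            cluster |= ext
    return frozenset(cluster)

concepts = formal_concepts(context)

# Select the concept whose shared feature is the word "oil".
oil_concept = (extent({"oil"}, context), intent(extent({"oil"}, context), context))
oil_cluster = cluster_for(oil_concept, concepts)
print(sorted(oil_cluster))
```

In this sketch, selecting a concept higher in the lattice (one with fewer shared words) yields a larger cluster that absorbs the clusters of the concepts below it, which mirrors the merging behavior the abstract describes.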