The process of summarization in the pre-processing stage in order to improve measurement of texts when clustering | IEEE Conference Publication | IEEE Xplore

Abstract:

This work introduces the Cassiopeia model, which supports knowledge discovery in textual bases and is used for text mining in distinct and/or antagonistic domains. Its most relevant contributions are the use of summarized texts as input to the pre-processing stage of clustering, language independence through the use of stop words, and the treatment of high dimensionality, a problem inherent to text mining. In the knowledge extraction stage, the texts are clustered and reclustered according to a similarity criterion. The study aims to show the impact of including summarization in the text clustering process. The experiments indicate that clustering summaries is in fact much more effective than clustering the texts in their entirety, as measured by the internal and external measures traditionally employed in the field of text clustering. Finally, the post-processing stage creates clusters of summarized texts with a high degree of informativity, a quality that is inherent to summarization. The resulting clusters are well characterized by their indexed words, because the process proposed by the Cassiopeia model yields strong similarity among the clustered texts. In the future, this similarity will allow for the creation of categories based on the word indices of each cluster.
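
The general idea of clustering summaries instead of full texts can be illustrated with a minimal sketch. This is not the Cassiopeia model: the stop-word list, the frequency-based extractive summarizer, the cosine similarity, the greedy single-pass clustering, and the threshold of 0.3 are all assumptions chosen for illustration, and every function name below is hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the paper's list).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "is", "to", "on", "are", "for"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def summarize(text, max_sentences=1):
    """Naive extractive summary: keep the sentences with the highest
    total word frequency, in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(tokenize(text))
    ranked = sorted(
        sentences,
        key=lambda s: sum(freq[w] for w in tokenize(s)),
        reverse=True,
    )
    keep = set(ranked[:max_sentences])
    return " ".join(s for s in sentences if s in keep)


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster(texts, threshold=0.3):
    """Greedy single-pass clustering over summaries: each text joins the
    first cluster whose first member is similar enough, else starts a new one."""
    vectors = [Counter(tokenize(summarize(t))) for t in texts]
    clusters = []  # each cluster is a list of indices into `texts`
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


if __name__ == "__main__":
    docs = [
        "Stocks fell sharply today. Investors sold stocks amid fears about stocks.",
        "Stocks rallied as investors bought shares. Weather was mild.",
        "The team won the football match. Football fans cheered the team.",
    ]
    print(cluster(docs))
```

Summarizing first shrinks each document's vocabulary before similarity is computed, which is one simple way to reduce the dimensionality of the representation. A reclustering step, as mentioned in the abstract, is not shown here.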
Date of Conference: 11-14 December 2011
Date Added to IEEE Xplore: 09 February 2012
Conference Location: Abu Dhabi, United Arab Emirates
