Theme network is a semantic network of document specific themes. So far Natural Language Processing (NLP) research patronized much of topic based summarizer system, unable to capture thematic semantic affinity of any text i.e. a news article containing the concepts, "gun," "convenience store," "demand money" and "make getaway" might suggest the topics "robbery" and "crime". In this paper the development of an opinion summarization system that works on Bengali News corpus has been described. The system identifies the sentiment information in each document, aggregates them and represents the summary information in text. The present system follows a topic-sentiment model for sentiment identification and aggregation. Topic-sentiment model is designed as discourse level theme identification and the topic-sentiment aggregation is achieved by theme clustering (k-means) and Document level Theme Relational Graph representation. The Document Level Theme Relational Graph is finally used for candidate summary sentence selection by standard page rank algorithms used in Information Retrieval (IR). As Bengali is a resource constraint language, the building of annotated gold standard corpus and acquisition of linguistics tools for lexico-syntactic, syntactic and discourse level features extraction are described in this paper. The reported accuracy of the Theme detection technique is 83.60% (precision), 76.44% (recall) and 79.85% (F-measure). The summarization system has been evaluated with Precision of 72.15%, Recall of 67.32% and F-measure of 69.65%.
Published in:
Social Computing (SocialCom), 2010 IEEE Second International Conference on
Date of Conference: 20-22 Aug. 2010