In this paper, we propose a one-step approach to rhetorical structure parsing, chunking, and extractive summarization that automatically generates meeting minutes from parliamentary speech using acoustic and lexical features. We investigate how to use lexical features extracted from imperfect ASR transcriptions, together with acoustic features extracted from the speech itself, to form extractive summaries with the structure of meeting minutes. Each business item in the minutes is modeled as a rhetorical chunk, which consists of smaller rhetorical units. Principal Component Analysis (PCA) graphs of both acoustic and lexical features in meeting speech show clear self-clustering of speech utterances according to the underlying rhetorical state; for example, acoustic and lexical feature vectors from the question and answer or the motion of a parliamentary speech are grouped together. We then propose a Conditional Random Fields (CRF)-based approach that performs both rhetorical structure modeling and extractive summarization in one step, by chunking, parsing, and extracting salient utterances. Extracted salient utterances are grouped under the labels of each rhetorical state, emulating meeting minutes, to yield summaries that are more easily understood by humans. We compare this approach with different machine learning methods. We show that our proposed CRF-based one-step minute generation system obtains the best summarization performance, both in terms of ROUGE-L F-measure, at 74.5%, and by human evaluation, at 77.5% on average.
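For readers unfamiliar with the ROUGE-L F-measure reported above, the following is a minimal sketch of how such a score can be computed from the longest common subsequence (LCS) between a candidate summary and a reference. It is an illustration only, not the evaluation setup used in this work: the function names `lcs_length` and `rouge_l_f`, the lowercased whitespace tokenization, and the choice of beta = 1 are all assumptions made here.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of token lists a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f(candidate, reference, beta=1.0):
    """Sentence-level ROUGE-L F-measure (hypothetical helper).

    Assumes lowercased whitespace tokenization; beta weights recall
    against precision, and beta = 1 gives their harmonic mean.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (recall + b2 * precision)


if __name__ == "__main__":
    # Made-up example sentences, not data from the paper.
    extracted = "the member asked a question"
    reference = "the member asked the minister a question"
    print(round(rouge_l_f(extracted, reference), 4))
```

Because the score is built on the LCS, it rewards extracted utterances that preserve the word order of the reference without requiring the matching words to be contiguous.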