Skip to Main Content
This paper describes an automatic topic extraction, categorization, and relevance ranking model for multi-lingual surveys and questions that exploits machine learning algorithms such as topic modeling and fuzzy clustering. Automatically generated question and survey categories are used to build question banks and category-specific survey templates. First, we describe different pre-processing steps we considered for removing noise in the multilingual survey text. Second, we explain our strategy to automatically extract survey categories from surveys based on topic models. Third, we describe different methods to cluster questions under survey categories and group them based on relevance. Last, we describe our experimental results on a large group of unique, real-world survey datasets from the German, Spanish, French, and Portuguese languages and our refining methods to determine meaningful and sensible categories for building question banks. We conclude this document with possible enhancements to the current system and impacts in the business domain.