Skip to Main Content
Software requirements documents (SRDs) are often authored in general-purpose rich-text editors, such as MS Word. SRDs contain instances of logical structures, such as use case, business rule, and functional requirement. Automated recognition and extraction of these instances enables advanced requirements management features, such as automated traceability, template conformance checking, guided editing, and interoperability with requirements management tools such as RequisitePro. The variability in content and physical representation of these instances poses challenges to their accurate recognition and extraction. To address these challenges, we present a framework allowing 1) the specification of logical structures in terms of their content, textual rendering, and variability and 2) the extraction of instances of such structures from rich-text documents. Our evaluation involves 36 different logical structures identified in 43 SRDs and shows that the intended content, style, and variability of these structures can be specified in the framework such that their instances can be extracted from the documents with high precision and recall, both close to 100%.