Automatic Glossary Term Extraction from Large-Scale Requirements Specifications | IEEE Conference Publication | IEEE Xplore

Abstract:

Creating glossaries for large corpora of requirements is an important but expensive task. Glossary term extraction methods often focus on achieving a high recall rate and, therefore, favor linguistic processing for extracting glossary term candidates while neglecting the benefits of reducing the number of candidates with statistical filter methods. However, especially for large datasets, reducing the correspondingly large number of candidates may be crucial. This paper demonstrates how to automatically extract relevant domain-specific glossary term candidates from a large body of requirements, the CrowdRE dataset. Our hybrid approach combines linguistic processing and statistical filtering for extracting and reducing glossary term candidates. In a twofold evaluation, we examine the impact of our approach on the quality and quantity of extracted terms. We provide a ground truth for a subset of the requirements and show that a substantial degree of recall can be achieved. Furthermore, we advocate requirements coverage as an additional quality metric to assess the term reduction that results from our statistical filters. Results indicate that with a careful combination of linguistic and statistical extraction methods, a fair balance between later manual effort and a high recall rate can be achieved.
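To make the hybrid idea concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' implementation. The linguistic stage is approximated by splitting each requirement at a hypothetical stopword list and emitting n-grams (a real system would use part-of-speech tagging and noun-phrase chunking), and the statistical filter is a simple document-frequency threshold with a hypothetical `min_df` parameter. `requirements_coverage` computes the share of requirements that contain at least one retained term, one plausible reading of the coverage metric described above, and `recall` compares retained terms against a ground truth. The toy requirements and ground truth are invented for illustration and are not taken from the CrowdRE dataset.

```python
import re
from collections import Counter

# Hypothetical stopword list standing in for real linguistic processing
# (e.g. part-of-speech tagging and noun-phrase chunking).
STOPWORDS = {
    "a", "an", "the", "i", "my", "me", "to", "of", "and", "or", "when",
    "if", "in", "on", "so", "that", "be", "is", "are", "can", "want",
    "should", "would", "it", "its", "for", "with", "at", "by", "from", "as",
}


def extract_candidates(requirement, max_len=3):
    """Linguistic stage (crude stand-in): every n-gram of up to max_len tokens
    inside a maximal run of non-stopword tokens becomes a candidate."""
    tokens = re.findall(r"[a-z]+", requirement.lower())
    candidates, run = set(), []
    for tok in tokens + [None]:  # None is a sentinel that flushes the last run
        if tok is not None and tok not in STOPWORDS:
            run.append(tok)
            continue
        for i in range(len(run)):
            for n in range(1, min(max_len, len(run) - i) + 1):
                candidates.add(" ".join(run[i:i + n]))
        run = []
    return candidates


def filter_by_document_frequency(per_req, min_df=2):
    """Statistical stage (hypothetical filter): keep candidates that occur
    in at least min_df distinct requirements."""
    df = Counter(term for cands in per_req for term in cands)
    return {term for term, count in df.items() if count >= min_df}


def requirements_coverage(per_req, terms):
    """Share of requirements containing at least one retained term."""
    covered = sum(1 for cands in per_req if cands & terms)
    return covered / len(per_req) if per_req else 0.0


def recall(terms, ground_truth):
    """Share of ground-truth glossary terms that survive extraction and filtering."""
    return len(terms & ground_truth) / len(ground_truth) if ground_truth else 0.0


# Toy, invented requirements (not CrowdRE data).
requirements = [
    "As a user I want my smart thermostat to lower the temperature when I leave",
    "My smart thermostat should report energy usage to my phone",
    "I want the washing machine to start when energy prices are low",
    "The smart fridge should order milk when it runs out",
]
per_req = [extract_candidates(r) for r in requirements]
all_candidates = set().union(*per_req)
terms = filter_by_document_frequency(per_req, min_df=2)
ground_truth = {"smart thermostat", "washing machine", "energy"}

print(len(all_candidates), "candidates ->", len(terms), "terms:", sorted(terms))
print("coverage:", requirements_coverage(per_req, terms))
print("recall:", recall(terms, ground_truth))
```

On this toy input the frequency filter shrinks 29 candidates to 4 while every requirement stays covered, but it also drops "washing machine", which appears in only one requirement, so recall falls to 2/3. This is a small-scale illustration of the balance between later manual effort and recall that the abstract describes.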
Date of Conference: 20-24 August 2018
Date Added to IEEE Xplore: 14 October 2018
Conference Location: Banff, AB, Canada