Skip to Main Content
Many software development tools that assist with tasks such as testing and maintenance are specific to a particular development language and require a parser for that language. Because a grammar is required to develop a parser, construction of these software development tools is dependent upon the availability of a grammar for the development language. However, a grammar is not always available for a language and, in these cases, acquiring a grammar is the most difficult, costly, and time-consuming phase of tool construction. In this paper, we describe a methodology for grammar recovery from a hard-coded parser. Our methodology is comprised of manual instrumentation of the parser, a technique for automatic grammar recovery from parse trees, and a semi-automatic metrics-guided approach to refactoring an iterative grammar to obtain a recursive grammar. We present the results of a case study in which we recover and refactor a grammar from version 4.0.0 of the GNU C++ parser and then refactor the recovered grammar using our metrics-guided approach. Additionally, we present an evaluation of the recovered and refactored grammar by comparing it to the ISO C++98 grammar.