Quality Evaluation of Structural Design in Software Reverse Engineering: A Focus On Cohesion

Software reverse engineering (SRE) plays a crucial role in contemporary software environments. Software developers may implement a system first and then use SRE tools to generate design content such as Unified Modeling Language (UML) diagrams. In the SRE literature, studies mostly focus on how precisely the conversion reflects the system; however, little or no research examines the quality of the converted results. Therefore, this paper presents an online knowledge-based ontological SRE system, OntRECoh, for quality evaluation of converted UML structural models. OntRECoh features a domain-specific knowledge base that focuses on cohesion design and a rule-based inference engine that computes the cohesion scores of implemented Java-based systems and provides improvement recommendations through its Web-based interface. Furthermore, OntRECoh includes both static and dynamic cohesion measures, covering both the design and the implementation aspects, so that the evaluation is more comprehensive and integrated in the SRE context.

Because of the highly uncertain and evolutionary nature of contemporary development environments, software projects are expected to be implemented in a shorter time so that users can see actual results rapidly and provide timely feedback for corrections [1], [2]. Therefore, software developers may adopt flexible and empirical process approaches (such as agile) that prioritize workable software over planning-based analysis and design. These approaches avoid the tedious documentation work that subsequent system revisions would otherwise require [1]-[3]. However, scholars have argued that the lack of system design documentation leads to adverse effects; numerous systems cannot undergo practical and affordable maintenance and reconstruction because the maintainers do not understand the systems sufficiently [4]-[6].
Given the importance of both system design and rapid development, a software project may implement a system first and then utilize software reverse engineering (SRE) to help trace back and generate the system's design-related content. In the context of a modern system design language such as UML [7], software projects may use SRE to generate graphical UML designs that explain the software. This use of SRE effectively reduces the time and cost of repeated documentation work [5], [8], [9]. Therefore, existing SRE studies mostly report how to automatically and precisely convert implemented code into specific UML diagrams. Commercial or open-source SRE tools (e.g., ModelGoon, Object-Aid, and easyUML) have also been developed to realize this feature.
The associate editor coordinating the review of this manuscript and approving it for publication was Ricardo Colomo-Palacios.
These studies and tools target specific object-oriented programming languages, such as Java, and generate UML diagrams that reflect the system's implemented contents. However, if the original code is of poor quality at the time of programming, the automatically generated UML graphical content can be inherently flawed. In the SRE literature, little or no research has addressed this challenge; a mechanism to assist in assessing the quality of the converted contents is lacking. Because software design verification (e.g., UML quality assessment) is a knowledge-intensive task [3], [10], [11], it requires continuous accumulation of organizational knowledge to support the quality assessment. In the knowledge engineering area, ontology is a crucial research topic and has been applied to various domains. The application of ontology to software development is receiving more attention because it facilitates communication and knowledge transfer among organization members in development tasks [12].
VOLUME 9, 2021 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Given this context, this paper aims to develop a system for online assessment of design quality in SRE. To achieve this goal, this paper applies ontology and focuses on the structural models of system design, e.g., UML class diagrams, for evaluating implemented systems written in Java. In the quality assessment of structural design, we focus on cohesion because it is a critical quality concern in system design that helps developers understand, maintain, and modify software more effectively [13]-[18].
The system prototype, named OntRECoh, has the following features. In terms of architecture and concept, OntRECoh comprises a cohesion-centric quality ontology model and a rule-based inference engine to evaluate the cohesion of a tested system and provide improvement recommendations. In terms of cohesion assessment, the proposed work combines static cohesion, which measures the structure of a system's classes, with dynamic cohesion, which focuses on the runtime behavior of the classes, to assess cohesion quality from both the design and the implementation aspects in the SRE context. OntRECoh also features a Web-based interface to facilitate real-time collaboration in a decentralized development environment. The remainder of this paper is organized as follows. Section II reviews related literature; Section III presents the design of the proposed work; Section IV demonstrates and validates the implemented results; Section V evaluates the proposed work; Section VI concludes the study.

II. LITERATURE REVIEW

A. SOFTWARE REVERSE ENGINEERING AND QUALITY
SRE entails analyzing a software system's implemented content to identify and establish a high-level abstraction of the system [19]. Traditional software development follows the process of planning, analysis, design, testing, and implementation. SRE reverses this traditional sequence by allowing developers to implement a software system first and then generate the associated design content from the implemented code [13]. SRE can effectively reduce the time and labor costs of a system's design documentation if the system has frequent updates or changes. For the abstract overview of system design, the Unified Modeling Language (UML) is a widely used graphical notation in the software industry. In object-oriented system development, SRE research has focused on transforming low-level system code into high-level UML diagrams. For example, Keschenau developed an SRE tool for converting Java bytecode into class diagrams that can reflect multiple associations and combined relationships in classes [20]. Shatnawi et al. proposed an approach for reengineering object-oriented APIs into component-based contents [21]. Sarkar et al. and Sunitha have developed methods that can automatically generate UML class diagrams for Java-based systems [22], [23]. SRE has gained increasing attention because of its usability in modern empirical and rapid software development. Nevertheless, a risk of SRE lies in the degree to which its output precisely reflects the code. In this regard, a few studies have investigated the quality of SRE outputs. For instance, Stepan et al. compared the quality of the SRE outputs from Java code or bytecode [24]. Gahalaut and Khandnor focused on Java code and integrated the perspectives of developers and analysts in defining SRE performance.
Although studies have investigated SRE performance and output quality, most of them have considered how to accurately convert and reflect the code in the form of produced UML diagrams. However, if the code itself is of poor quality, the poor quality can inherently affect the quality of the converted output. This challenge becomes critical in rapid software development when developers are encouraged to implement their systems rapidly without the guidance of methodical software and system design [25].

B. SOFTWARE DESIGN QUALITY: COHESION
In the design of software systems, cohesion is a major quality concern. It refers to the correlation between members in a software system [16], [18]. A high degree of cohesion maximizes the understandability and maintainability of the system [13], [17]. Cohesion is a critical indicator of the quality of software design, and it contributes to other software activities such as software failure prediction [15], [26], software modularization [14], the identification of reusable components, and numerous other activities. In object-oriented programming languages, the class is often the design unit, and the members of a class refer to the methods and attributes in the class. In this context, Al Dallal and Briand define cohesion by looking at three types of relationships among class members: (1) method-method, (2) method-attribute, and (3) attribute-attribute. A class exhibits a higher degree of cohesion when it has at least two of the above three relationships [18]. Many studies have developed metrics for measuring cohesion. For instance, Briand et al. suggested four quantitative properties for class cohesion measures in object-oriented systems [27]. In particular, the non-negativity and normalization properties denote that the cohesion measure must belong to a specific interval (e.g., [0, 1]); the monotonicity property means that a system's cohesion does not decrease when the cohesion increases in its modules; the property of cohesive modules means that the overall cohesion is not increased when two unrelated modules are combined into one. Al Dallal and Briand also proposed two cohesion measures, namely High-Level Design (HLD) and Low-Level Design (LLD), for different stages of measurement. In particular, HLD is used to identify potential cohesion problems in the initial design stage so that they can be addressed proactively and the development cost reduced.
The LLD cohesion metric can be used for refactoring after system implementation; LLD can produce more system information than HLD to identify less cohesive members for refactoring. Because this study handles implemented system contents in the SRE context, we focus on the LLD measurement [18], [28].

C. STATIC AND DYNAMIC MEASURE OF COHESION
The measurement of LLD cohesion has two categories: static and dynamic. Concerning static cohesion, most measures draw on two kinds of software information: (1) model-level information such as classes and their members (i.e., methods and attributes) and relationships (e.g., inheritance and association) between classes; and (2) code-level information regarding message passing, which can be obtained by looking into the code content of the model constructs. One of the earliest static measures, LCOM1 (Lack of Cohesion of Methods), can be traced back to Chidamber and Kemerer's work, which counts the number of method pairs without shared variables. Scholars have extended such work to develop LCOM-related measures [29]. For example, Chidamber and Kemerer proposed LCOM2, which measures the gap between the number of method pairs without shared variables and the number of method pairs with shared attributes [30]. LCOM3, proposed by Li and Henry, expresses LCOM in the form of a graph, in which each method is a vertex, and any instance of attribute sharing is an edge between methods [31]. LCOM4, proposed by Hitz and Montazeri, is similar to LCOM3, except that edges also represent the calls between methods [32]. Briand et al. proposed LCOM5, which only considers direct use between methods and attributes to calculate the number of methods used for each attribute [33].
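The pair-counting idea behind LCOM1 and LCOM2 can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `lcom1_lcom2`, the input format, and the three-method example class are hypothetical, and LCOM2 is taken in its commonly used form max(P - Q, 0), where P is the number of method pairs without shared attributes and Q the number of pairs with shared attributes.

```python
from itertools import combinations


def lcom1_lcom2(method_attrs):
    """Return (LCOM1, LCOM2) for one class.

    method_attrs maps each method name to the set of attributes it uses.
    P counts method pairs sharing no attribute; Q counts pairs sharing at least one.
    """
    p = q = 0
    for a, b in combinations(sorted(method_attrs), 2):
        if method_attrs[a] & method_attrs[b]:
            q += 1
        else:
            p += 1
    # LCOM1 is P itself; LCOM2 is the gap P - Q, floored at zero (assumed form).
    return p, max(p - q, 0)


# Hypothetical class: two methods share "balance", one uses only "owner".
example = {
    "deposit": {"balance"},
    "withdraw": {"balance"},
    "get_owner": {"owner"},
}
lcom1, lcom2 = lcom1_lcom2(example)
```

In this example, two of the three method pairs share no attribute and one pair shares `balance`, so a lower value would be obtained only by moving `get_owner` elsewhere or by making it use `balance`.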
The above cohesion measures focus mainly on the method-attribute relationship in a class. Bieman and Kang proposed Tight Class Cohesion (TCC) and Loose Class Cohesion (LCC), which compute the number of direct or indirect associations between attributes and pairwise methods in the calculation of cohesion [34]. Badri and Badri extended TCC and LCC to propose DC_D (Degree of Cohesion-Direct) [35]. Later, Al Dallal and Briand proposed LSCC (LLD Similarity-based Class Cohesion), which uses similarity analysis and comprehensively considers the four quantitative properties to compute the degree of cohesion of a class [28], [33].
Unlike static cohesion (which measures the structure of classes), dynamic cohesion measures the runtime behavior of classes (i.e., the activated cohesive linkages in classes during system execution) [37]. Dynamic cohesion is useful in various contexts. For example, it can empirically identify unnecessary linkages or calls among class members, and polymorphic methods can be measured only dynamically because they are resolved at runtime. The literature has several reports on dynamic cohesion measurement. For example, Mitchell and Power modified the static measurement LCOM into Dynamic LCOM (DLCOM) and Dynamic Call-weighted LCOM (DCLCOM) for measuring class cohesion under a dynamic execution environment [38]. Later, Gupta and Chhabra proposed Dynamic Class Cohesion (DCC). DCC identifies four different types of associations of class members with four corresponding dynamic measures: (1) DC_MAx (dynamic cohesion due to the 'read' dependency of methods on attributes); (2) DC_AMx (dynamic cohesion due to the 'write' dependency of attributes on methods); (3) DC_MMx (dynamic cohesion due to the 'call' dependency between methods); and (4) DC_AAx (dynamic cohesion due to the 'reference' dependency between attributes). Their study also experimentally showed that the accuracy of DCC is better than that of DLCOM and DCLCOM [37].

D. ONTOLOGY AND SOFTWARE ENGINEERING
Ontology is derived from the field of philosophy; it is used to describe entities or things that exist in the real world and infer their implicit associations. The definition of ontology varies, and it is widely known as a clear specification of conceptual models [39], [40]. Ontological modeling is suitable for describing complex knowledge management [41]. To establish an ontology in a specific domain, one must address the fundamental elements, including concepts, properties, instances, etc. Web Ontology Language (OWL) is a widely used language for implementing practical ontologies [42]. With the emergence of the semantic web, rule-based inference has drawn more attention in ontological applications. Among many rule-based languages, Semantic Web Rule Language (SWRL) is the major W3C rule-based inference language that can effectively make associations with ontological models implemented with OWL [41]. The inference rules in SWRL and SQWRL (Semantic Query-Enhanced Web Rule Language) comprise two parts: head and body. In particular, the head refers to the inference result, and the body is the premise of the inference. In describing the body of a rule with SWRL/SQWRL, the notation "?" refers to a variable and can represent an instance of a concept or class.
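To make the head/body structure concrete, the following Python sketch fires a single rule of the form body -> head over a small set of facts by naive forward chaining. It is not SWRL and does not use any ontology library; facts are plain (subject, property, object) triples, and the property names `has_Parent` and `has_Member`, the instance names, and the functions `match` and `apply_rule` are hypothetical names chosen for illustration.

```python
def match(pattern, fact, bindings):
    """Unify one triple pattern with one fact; return extended bindings or None."""
    result = dict(bindings)
    for term, value in zip(pattern, fact):
        if term.startswith("?"):  # "?" marks a variable, as in the rule notation
            if result.setdefault(term, value) != value:
                return None  # variable already bound to a different value
        elif term != value:
            return None  # constant does not match
    return result


def apply_rule(body, head, facts):
    """Return the facts plus everything derivable by repeatedly firing body -> head."""
    facts = set(facts)
    while True:
        solutions = [{}]
        for pattern in body:  # join the body atoms from left to right
            solutions = [b2 for b in solutions for f in facts
                         if (b2 := match(pattern, f, b)) is not None]
        derived = {tuple(b.get(t, t) for t in head) for b in solutions}
        if derived <= facts:  # fixpoint reached: nothing new was inferred
            return facts
        facts |= derived


# Body (premise): some class ?c has a parent ?p. Head (result): ?c is a descendant.
body = [("?c", "has_Parent", "?p")]
head = ("?c", "is_Descendant", "true")
facts = {
    ("ChildClass", "has_Parent", "ParentClass"),
    ("ParentClass", "has_Member", "run"),
}
inferred = apply_rule(body, head, facts)
```

Only `ChildClass` satisfies the body, so only one new fact is added; a production rule engine would index facts rather than rescan them on every pass.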
Ontology has been applied to various domains (including artificial intelligence, medical engineering, library science, and software engineering) because it helps formalize a conceptual model of a specific domain so that explicit and implicit knowledge in the field can be established and shared between people and machines [43], [44]. Software engineering and development is a complex and knowledge-intensive process, and research on applying ontology to the software engineering domain has drawn much attention for effectively sharing domain-related knowledge in the collaborative development of software [45]. Numerous software engineering ontologies have been developed. Borghini et al. divided existing software engineering ontologies into several categories: software process ontology, software document ontology, software maintenance ontology, and software quality ontology. Because this study applies ontology to SRE quality assessment, the following paragraph further reviews the ontological works in the software quality category [46].
Regarding software quality ontology, Kayed et al. surveyed 80 research reports related to software quality and extracted a list of terminologies of quality attributes to establish an ontology that provides a general semantic framework [47]. Chen and Tsai developed a requirement quality ontology (OWQFunc) to help identify hidden requirements that take experience and knowledge to unveil [48]. Ciancarini et al. established an ontological model (SQuAP-Ont) to define software quality relational factors for financial information systems. In summary, these works provide useful references for modeling the quality of knowledge products in various stages of software development [49]. For software development that applies SRE to generate UML-based design contents, which are highly subjective and knowledge-intensive [50], researchers can use ontology effectively to ensure the quality of the produced design.

III. RESEARCH METHODOLOGY

A. FRAMEWORK
This section presents the design of the proposed work, Online Ontological Cohesion Assessment for SRE Design Quality, abbreviated as OntRECoh. As a research prototype with a focus on cohesion, OntRECoh has the following features. First, the cohesion measurement comprises both the static and the dynamic aspects so that the quality assessment is comprehensive. Second, OntRECoh is designed to operate in a client-server Internet environment so that users can use it without temporal and geographical restrictions. Third, to enable knowledge sharing, OntRECoh features an ontology model with the code-first and cohesion design concepts and a rule-based inference engine for providing improvement recommendations for refactoring. Fourth, users can review the generated improvement recommendations and provide feedback for updating OntRECoh to increase the correctness and completeness of its knowledge body.
The design of OntRECoh comprises three main functions: (a) loading data, (b) instance creation, and (c) results demonstration. Fig. 1 illustrates the operational workflow associated with the three functions. Specifically, the function of loading data refers to the first three steps in the figure, including the collection of the static and dynamic cohesion data of a tested software system. That is, OntRECoh obtains (1) static cohesion data by converting the structural diagramming information produced by an existing SRE tool and (2) dynamic cohesion data from the local development environment. Then, instance creation refers to Step 4, which instantiates the ontological model based on the cohesion information captured above. Finally, results demonstration refers to Step 5, which consists of two parts: (i) the computation of static and dynamic cohesion scores and (ii) the recommendation for the structural reconstruction of the tested system to improve cohesion.

B. DATA COLLECTION: CAPTURE OF STATIC AND DYNAMIC COHESION INFORMATION
The first step of the proposed work is to collect the static and dynamic cohesion information in SRE contexts, after the implemented code of a system has been converted into UML structural design outputs. To keep the collected data consistent with the structural content of the cohesion assessment, this paper first divides the structure of to-be-assessed software systems into the following levels: project, package, class, and class member. The class member level further consists of attributes and methods. By following the three relationships suggested by Al Dallal and Briand [18], cohesion among class members is formed based on the following scenarios: method call, attribute call (e.g., a method retrieves the data from an attribute), and assignment between two attributes. The cohesion between methods and attributes further includes two subtypes: reading data from attributes and assigning data to attributes. With the structure mentioned above, static cohesion information is captured from the structural UML class diagrams generated by an SRE tool. The information includes the data regarding the members of a tested system as well as the data of method calls, attribute calls, and assignments between two attributes. The dynamic cohesion information is captured from the execution of the tested system. To do so, we apply AOP (Aspect-Oriented Programming) to record the following four system behaviors as pointcuts: method call, method execution, reading values from attributes, and writing values to attributes. For instance, as the simple program in Fig. 2 indicates, the a.display() method call in main_class represents a pointcut. Therefore, a set of dynamic cohesion information can be obtained: main_class.main() and Aspect.display().
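The recorded behaviors can be illustrated with a small tracing sketch. It is written in Python with a class decorator rather than with AspectJ, so it is an analogue of the pointcut idea and not the OntRECoh implementation; the names `traced`, `events`, and the class `A` are hypothetical, and method call and method execution are merged into a single "call" event for brevity.

```python
import functools

events = []  # recorded runtime events, in execution order


def _wrap(class_name, method_name, func):
    """Wrap a method so that each invocation is logged before it runs."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        events.append(("call", class_name, method_name))
        return func(*args, **kwargs)
    return wrapper


def traced(cls):
    """Class decorator: log method calls plus attribute reads and writes."""
    for name, member in list(vars(cls).items()):
        if callable(member) and not name.startswith("__"):
            setattr(cls, name, _wrap(cls.__name__, name, member))

    def __setattr__(self, attr, value):
        events.append(("write", cls.__name__, attr))
        object.__setattr__(self, attr, value)

    def __getattribute__(self, attr):
        value = object.__getattribute__(self, attr)
        # Only data attributes count as reads; method lookups are skipped.
        if not callable(value) and not attr.startswith("__"):
            events.append(("read", cls.__name__, attr))
        return value

    cls.__setattr__ = __setattr__
    cls.__getattribute__ = __getattribute__
    return cls


@traced
class A:
    def __init__(self):
        self.text = "hello"  # logged as a write to A.text

    def display(self):
        return self.text  # logged as a read of A.text


a = A()
a.display()
```

After the run, `events` holds one write, one call, and one read, which is the kind of trace from which method-attribute and method-method linkages can later be derived.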

C. MODEL CONSTRUCTION AND INSTANCE CREATION
After the server receives the data for assessment, OntRECoh creates the instance content for the tested system based on the established ontological cohesion model and then evaluates the design quality through rule-based inference. The establishment of the ontological model comprises two steps: building concepts and defining rules [40], [48]. After the concepts and rules are set, instances can be created accordingly. To facilitate the establishment process, this study utilized Protégé, an ontological model editor developed by Stanford University, and then used Jess, a plugin module associated with Protégé, for the development of the rule-based inference engine [3].
Specifically, in this study, four ontology concepts represent the static design structure of a system: Project, Package, Class, and Member. The Member concept further consists of two sub-concepts: Attribute and Method. Fig. 3 shows the structural relationships between these concepts and highlights the associated properties (the black dots in the figure) to be used in the proposed inference process. In particular, static cohesion can be identified based on a call between methods and attributes and the attribute reference. Moreover, because the classes of an object-oriented system may form an inheritance relationship, OntRECoh identifies the inheritance relationship through rule inference. The three inference properties is_Descendant, can_Refactoring, and is_Special are explained as follows. First, the is_Descendant property denotes whether the underlying class or the member in a class inherits from another class. Because it is difficult to know whether members inherit from a parent class (if any) when recording the members of a class, it is necessary to infer the inheritance feature of the members through the is_Descendant property. Second, the can_Refactoring property denotes whether the inferred method is relocatable. According to Tsantalis and Chatzigeorgiou's study, refactoring is not available to static or inherited methods [49]. Third, the is_Special property denotes whether the inferred method is of a special type (e.g., constructors and service methods) that should be excluded when computing cohesion [37].
As for the dynamic cohesion, the model defines the following properties for collecting dynamic cohesion information: Invoked (Read/Write), is_Executed, and Referenced. Specifically, the two properties, Invoked_Read and Invoked_Write, are used to identify if a method during the runtime reads data from or writes data to an attribute. The is_Executed property denotes the actual usage of a method to justify the necessity of the linkage of method-call. The Referenced property indicates whether two attributes are interconnected, including assigning the value of one variable to another. Such an interconnection between two attributes is an indirect relationship and is identifiable through the method in which the interconnection occurs. Hence, such a property is also an inferred property, and the inference starts with finding out the attributes used by the methods whose is_Executed property is true and then sets the Referenced property to be true for the paired attributes.
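The inference of the Referenced property can be sketched in a few lines of Python. This is a simplified illustration, not the rule set of the paper: it treats any two attributes used inside the same executed method as a referenced pair, and the function name `infer_referenced`, the input layout, and the method and attribute names are hypothetical.

```python
from itertools import combinations


def infer_referenced(methods):
    """Return the attribute pairs whose Referenced property would be set to true.

    methods maps a method name to a dict with:
      "is_Executed": bool, whether the method ran during the trace
      "attributes":  set of attribute names the method used at runtime
    """
    referenced = set()
    for info in methods.values():
        if not info["is_Executed"]:
            continue  # methods that never ran contribute no dynamic linkage
        for pair in combinations(sorted(info["attributes"]), 2):
            referenced.add(pair)
    return referenced


# Hypothetical trace: "credit" ran and touched two attributes; "debit" never ran.
trace = {
    "credit": {"is_Executed": True,
               "attributes": {"totalBalance", "availableBalance"}},
    "debit": {"is_Executed": False,
              "attributes": {"totalBalance", "pin"}},
}
pairs = infer_referenced(trace)
```

Because `debit` was not executed, the pair involving `pin` is not marked, which reflects the idea that dynamic cohesion counts only linkages that are activated at runtime.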
To perform the inference and to set the properties, a set of rules is defined for the ontology model to generate instances and then infer the values of the properties stated above. Through the ontological model and the associated rules, implicit knowledge hidden in the relationships can be obtained [50], [51]. Table 1 presents the rules in SWRL and the associated functionality for the ontological inference in this study.

D. RESULTS DEMONSTRATION: COHESION SCORES AND IMPROVEMENT RECOMMENDATION
This study evaluates both the static and dynamic aspects of the cohesion design of an implemented system. Besides the static measures that evaluate the structural design of a tested system, dynamic cohesion measures the performance of the design by looking at the real-time operation of the system. Hence, the work of evaluation in OntRECoh comprises the computation of static and dynamic measures and the generation of improvement suggestions based on the measurement.
The proposed static cohesion measurement is based on LSCC [28], so it not only satisfies the four quantitative measurement requirements but also considers the implicit relationships due to inheritance and the special types of methods [33]. However, because LSCC is limited to a single class, the static cohesion measure in this study is extended to measure the cohesion of the entire system.
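In particular, the measurement of static cohesion in this study begins by examining the attributes shared between the methods of a class. These are recorded in a Method-Attribute Reference (MAR) matrix, a binary matrix in which each row represents a method, each column represents an attribute, and an entry $m_{ix}$ is 1 if method i references attribute x and 0 otherwise [28]. In a MAR matrix, the similarity between two rows quantifies the cohesion between a pair of methods (i and j). Al Dallal and Briand (2012) define ns(i, j) as the normalized number of attributes shared by rows i and j. The degree of similarity is indicated in formula (2), where Y is the number of columns of the matrix:
$$ns(i, j) = \frac{1}{Y}\sum_{x=1}^{Y} \left(m_{ix} \wedge m_{jx}\right) \tag{2}$$
The degree of similarity is used here to highlight the cohesion between the two methods. The higher the degree of similarity, the greater the cohesion.
In LSCC, the cohesion of a class is the average similarity over all pairs of its methods. Moreover, the number of method pairs that share the attribute in the i-th column can be expressed as $x_i(x_i - 1)/2$, where $x_i$ represents the number of 1s in the i-th column of the MAR matrix; therefore, in the general case, LSCC equals $\sum_{i=1}^{l} x_i(x_i - 1)$ divided by $lk(k-1)$:
$$LSCC(C) = \begin{cases} 0 & \text{if } l = 0 \text{ and } k > 1, \\ 1 & \text{if } (l > 0 \text{ and } k = 0) \text{ or } k = 1, \\ \dfrac{\sum_{i=1}^{l} x_i (x_i - 1)}{l\,k\,(k-1)} & \text{otherwise.} \end{cases} \tag{3}$$
In addition, as indicated in formula (3), the value of LSCC is between 0 and 1, and k and l represent the number of methods and attributes in the tested class C, respectively. The higher the value, the stronger the correlation between methods and attributes in the class.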
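As a worked illustration of the static measure, the following Python sketch computes LSCC for one class in two ways: as the average pairwise similarity of methods, and through the column-sum shortcut with denominator lk(k-1). It assumes the LSCC definition of the cited work [28], with a binary MAR matrix whose rows are methods and whose columns are attributes; the function names and the 3x3 example matrix are hypothetical.

```python
from itertools import combinations


def ns(mar, i, j):
    """Normalized similarity: share of attributes referenced by both method i and method j."""
    columns = len(mar[0])
    return sum(a & b for a, b in zip(mar[i], mar[j])) / columns


def lscc_pairwise(mar):
    """LSCC as the mean similarity over all method pairs (general case only)."""
    pairs = list(combinations(range(len(mar)), 2))
    return sum(ns(mar, i, j) for i, j in pairs) / len(pairs)


def lscc(mar):
    """LSCC of one class from its MAR matrix, including the boundary cases."""
    k = len(mar)                      # number of methods
    l = len(mar[0]) if mar else 0     # number of attributes
    if l == 0 and k > 1:
        return 0.0                    # several methods but nothing to share
    if k <= 1:
        return 1.0                    # zero or one method: treated as fully cohesive
    # x_i is the number of 1s in column i, i.e. how many methods use attribute i.
    column_sums = [sum(row[i] for row in mar) for i in range(l)]
    return sum(x * (x - 1) for x in column_sums) / (l * k * (k - 1))


# Hypothetical class with three methods and three attributes.
mar = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
]
score = lscc(mar)
```

Here only the first two methods share an attribute, so the column sums are 2, 1, and 1, and the score is 2 / 18, the same value that the pairwise average produces.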
The dynamic measure is developed to capture the performance of the structural linkages and present a system's actual cohesion during its runtime. This study extends DCC to establish DOCx, an overall dynamic cohesion measure of a class that comprehensively considers the four cohesive properties of the data and method members in the class [37]. In particular, the first property refers to dynamic cohesion due to the write-in dependency of attributes on methods, namely DC_AMx. Suppose that m and n are the numbers of attributes and methods in a class. As indicated in formula (4), the parameter $r^{R}_{W}(e^{R}_{i}, e^{R}_{j})$ represents the attributes that are accessed in a method of class c by writing data to the attributes, and O is the collection of all classes in the system. The value of DC_AMx is between zero and one. If a class has methods but no attributes, or attributes but no methods, the value of this write-in property is zero.
The second property is dynamic cohesion due to the read dependency of methods on attributes, denoted as DC_MAx. As indicated in formula (5), the parameter $r^{R}_{R}(e^{R}_{i}, e^{R}_{j})$ represents the attributes in a class that have been read by a method. The value of DC_MAx is between zero and one. A value of zero denotes that no read behavior occurs during the system's execution.
The third property, dynamic cohesion due to the call dependency between methods, is denoted as DC_MMx and expressed in formula (6). The expression $r^{R}_{C}(e^{R}_{i}, e^{R}_{j})$ represents the methods in this class that are actually called. The value of DC_MMx ranges from 0 to 1, and the values of zero and one occur when the class has no methods and only one method, respectively.
The last property, dynamic cohesion due to the reference dependency between attributes, is denoted as DC_AAx. As indicated in formula (7), $r^{R}_{RF}(e^{R}_{i}, e^{R}_{j})$ represents the number of paired attributes that are referenced in the same method. In Java code, the linkage between two variables is realized through an assignment inside a method. If the class has no attributes, such a linkage cannot occur, and the value of this property is 0. If the class has only one attribute, the cohesion takes the maximum value, i.e., 1.
Taken together, formula (8) combines the four dependency properties described above, each with a weight value $w_i$ that allows different emphases on the properties. For example, if the designer prioritizes the information security of an entity class whose object data may be altered by its own methods, a larger weight can be assigned to the property DC_AMx to highlight this emphasis.
Because LSCC and DOC can only measure cohesion for individual classes, this study extends the measures with the Mean Absolute Deviation (MAD) and defines LSCC_MAD and DOC_MAD for a holistic understanding of the system's cohesion, as indicated in formulas (9) and (10), respectively. These values describe the degree of dispersion of the cohesion scores of the n classes in a system: the smaller the value of LSCC_MAD or DOC_MAD, the lower the degree of dispersion.
After computing the cohesion values, OntRECoh provides improvement recommendations for refactoring, i.e., changing the internal structure to improve cohesion [52]. To provide these recommendations, OntRECoh applies the suggestion of Al Dallal [53], that is, to identify candidate classes for remodeling and relocate certain members of those classes. When class members are considered for relocation, the method members with lower cohesion are selected. This identification process may repeat until LSCC_MAD and DOC_MAD meet the designated values.

IV. SYSTEM IMPLEMENTATION

A. OPERATING ENVIRONMENT
This section presents the implementation of OntRECoh. First of all, Fig. 4 shows the screenshots displayed when entering the system. The following content demonstrates the system by following the workflow in Fig. 1. Specifically, in the implementation of Step 1, this study uses Eclipse, a free IDE for Java software development. To capture the static information, i.e., Step 2 in the workflow, OntRECoh is associated with ModelGoon, an open-source SRE tool that is suitable for academic and research prototype development. To capture the dynamic information, the system utilizes the AspectJ package from Eclipse to record the execution traces when executing a tested software system. We also developed an Eclipse plugin tool named OntRECoh_plugin that operates in the local Java development environment. OntRECoh_plugin converts the extracted static and dynamic information into JavaScript Object Notation (JSON) format and then transfers the converted data to OntRECoh through HTTP. On the OntRECoh server side, this study implements Step 4 by utilizing Apache Jena to formalize the ontology model based on the loaded static and dynamic data and perform rule inference accordingly. Then, OntRECoh computes the cohesion scores and generates corresponding suggestions for improvement, i.e., Step 5 in the workflow. The subsequent sections further demonstrate the implementation of the proposed work.

B. EXTRACT THE STATIC AND DYNAMIC INFORMATION OF AN EXAMPLE CASE
To simplify the demonstration and focus on the work proposed, we use a simple case, the ATM system from [54]. For a better demonstration of the performance of OntRECoh, we modify the code in the original system case to decrease the cohesion quality of the system. Specifically, three methods, i.e., displayMenuOfAmounts(), debit(), and validatePIN(), are moved from Withdraw, BankDatabase, and Account to BankDatabase, BalanceInquiry, and CashDispenser, respectively. By doing so, we can test whether OntRECoh can identify the mistakes and provide expected improvement recommendations, i.e., suggesting moving the misplaced method members back to the original classes. Table 2 below shows the members of the ATM system case after the modification: After loading the tested system (the ATM case) into the local IDE (i.e., Eclipse), ModelGoon is invoked to generate the UML class diagram of the system. Then, the dynamic static information is collected by running the ATM system through the IDE environment, as illustrated in Fig. 5. Note that the tested system is executed according to the user requirements and the associated operating scenarios. After the completion of execution, both the static and dynamic information sets are transferred to OntRECoh for the next step of performing the SRE quality evaluation. Table 3 below is a screenshot of the computed static and dynamic cohesion scores. The second column in the screen refers to the static cohesion score (LSCC) of each class. The third column is the dynamic cohesion score (DOCx) of each class. The row at the bottom of the screen is the two MAD values representing the overall static and dynamic cohesion scores. The results shown on the screen are further verified. Specifically, for the static cohesion scores, they are inspected with formula (3) as follows:    6 is a screenshot of the refactoring recommendation result generated by OntRECoh. 
In particular, OntRECoh identifies three candidate methods (debit(), displayMenuOfAmounts(), and validatePIN()) for improvement and suggests moving them to the classes BankDatabase, Withdrawal, and Account, respectively, to increase the cohesion of the ATM system. This result is consistent with the original content of the system.
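The per-class static score can be illustrated with a short sketch. It assumes that the static measure follows the LSCC definition as commonly stated in the cohesion literature: for a class with k methods and l attributes, LSCC is 0 when l = 0 and k > 1, 1 when k = 1 or when l > 0 and k = 0, and otherwise the sum of x_i(x_i - 1) over all attributes divided by l·k·(k - 1), where x_i is the number of methods that reference attribute i. The `lscc` helper and the toy class below are hypothetical and are not taken from the ATM case.

```python
def lscc(methods, attributes):
    """LSCC of one class under the assumed standard definition.

    methods:    dict mapping a method name to the set of attributes it references
    attributes: set of all attribute names declared in the class
    """
    k = len(methods)
    l = len(attributes)
    if k == 1:
        return 1.0
    if k == 0:
        # Assumption: a class with attributes but no methods scores 1,
        # and a completely empty class scores 0.
        return 1.0 if l > 0 else 0.0
    if l == 0:
        return 0.0
    # x_i: number of methods that reference attribute i
    counts = [sum(1 for used in methods.values() if a in used) for a in attributes]
    return sum(x * (x - 1) for x in counts) / (l * k * (k - 1))


# Toy class: three methods sharing two attributes only partially.
partial = lscc({"m1": {"a"}, "m2": {"a", "b"}, "m3": {"b"}}, {"a", "b"})

# Every method references every attribute: maximal cohesion.
full = lscc({"m1": {"a", "b"}, "m2": {"a", "b"}, "m3": {"a", "b"}}, {"a", "b"})
```

In the toy class each attribute is shared by two of the three methods, so the score is 4/12; when every method uses every attribute the score reaches 1.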

V. EVALUATION AND RESULTS
In this section, the proposed work is evaluated. The evaluation in this section focuses on three parts, i.e., the design, the expected output, and the performance of the proposed work. Specifically, the assessment of the methodological design focuses on the proposed cohesion metric, the validation of the expected output is performed by walking through a sample case, and the performance evaluation uses several actual cases for statistically analyzing the performance of the work.

A. EVALUATION OF THE METRIC DESIGN
In evaluating the design of the measures, two approaches were employed, i.e., property-based evaluation and principal component analysis [18], [28], [55], [56]. In the property-based evaluation, we use the six property requirements for evaluating cohesion metrics [56]. Property 1 states that a usable metric should be calculable without complex mathematical functions. In OntRECoh, LSCCMAD and DOCMAD extend two existing metrics, i.e., LSCC and DOC, which are scoped to a single class, to compute the overall cohesion score for the entire system. The mathematical body of the metrics is inherited from these understandable metrics, and the extended part of the proposed metrics is not complicated. Property 2 states that the measurement should be language-independent. Although OntRECoh measures Java-based systems, it evaluates the cohesion of the systems by examining class diagrams, which are an abstract-level system representation and are language-independent. Properties 3 and 4 state that a measure should be developed on a proper scale and that the metrics in a metric suite should be consistent. In this study, the proposed metrics have a rational range from zero to one, and for both metrics a higher value indicates a better cohesion design of the tested system. Therefore, the two properties are satisfied. Property 5 states that the foundation of a metric should be explainable and visualizable. In the proposed work, LSCCMAD and DOCMAD are derived from LSCC and DOC, so the theoretical foundation of the computation is traceable and verifiable. For visualization, the computed scores of the two metrics are presented along with the cohesion values of the constituent classes of the tested system. Lastly, Property 6, i.e., the metric should give cohesion as a positive number, is satisfied since the values of the proposed metrics are positive or zero.
In addition, we performed principal component analysis (PCA), as it is commonly used to compare and verify newly developed metrics against existing ones [18], [28], [57]. PCA is a statistical analysis method that linearly transforms the observations of a set of possibly related variables (i.e., the compared metrics in this study) to analyze their main components [18], [28]. It can effectively reduce the complexity of the observed variables to reveal the data's internal structure and scrutinize the variability of the data [58]. Thus, in this study, we use PCA to observe the differences between the compared metrics and then identify the more representative measurement effect [18]. The analysis comprises two levels, i.e., the product level and the component level. At the product level, OntRECoh is the first measurement that comprehensively includes both the static and dynamic aspects in evaluating the cohesion of implemented systems in the context of SRE. At the component level, the static aspect of the proposed cohesion measurement is analyzed, and the existing metrics LCOM5, DCC, TCC, and SCOM [18], [59] are selected for the comparison since they are highly related to the proposed metric suite of OntRECoh. To report the PCA analysis, we continue to use the ATM case. The comparison is analyzed using a correlation matrix and a loading matrix [18]. Table 4 below shows the correlation matrix calculated with the nonparametric Spearman correlation coefficient, in which a p-value threshold of 0.05 is used to judge whether the components are highly correlated. The correlation values of DCC and LCOM5, i.e., 0.53 and 0.04, are closer to this threshold, meaning that the two metrics correlate with OntRECoh and are thus considered part of the same group. The correlation values of TCC and SCOM with OntRECoh are -0.33 and -0.42, indicating that they are not significantly related to OntRECoh.
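For readers unfamiliar with the Spearman coefficient used for the correlation matrix, the following is a minimal sketch: it ranks both series (ties receive their average rank) and then computes the Pearson correlation of the ranks. The two score lists are hypothetical per-class values for two metrics, not the data behind Table 4, and the helper names are illustrative.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-class scores for two metrics (illustration only).
metric_a = [0.0, 0.2, 0.33, 0.6, 0.67]
metric_b = [0.9, 0.7, 0.8, 0.4, 0.1]
rho = spearman(metric_a, metric_b)
```

Because only ranks enter the computation, the coefficient is insensitive to the different scales of the compared metrics, which is one reason a rank-based coefficient suits metrics with different ranges.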
Next, we use the results in Table 4 to compute the loading matrix of the PCA experiment. The loading matrix helps us group the observed components and identify the differences effectively. We obtain the principal components (PCs) by applying the varimax rotation technique [28], [60]. In this technique, the eigenvectors and eigenvalues are calculated and used to form the PCA loading matrix, whose weights express the correlations between each variable and the principal components and highlight the features of the proposed metric. Based on the analyses of three principal components (i.e., PC1 to PC3), Table 5 shows the results, in which each PC indicates the influential metrics contributing to the captured dimension. According to Hair [61], a coefficient value above 0.4 and below 1 in the loading matrix indicates a significant factor loading and requires verifying the commonality of the variables of the group, i.e., each column of Table 5. Specifically, for PC1, the values of OntRECoh, LCOM5, DCC, and TCC are 0.72, 0.8, 0.53, and 0.42, respectively; thus, they are classified as one group. These metrics similarly use the number of attributes or methods and their interactive calls in classes as the basis for measuring cohesion. For PC2, the values of OntRECoh and DCC are 0.64 and 0.41. The commonality of these two metrics is that both include dynamic cohesion for measuring the runtime behavior of classes. For PC3, the value of OntRECoh is 0.81 while the others are insignificant, indicating that OntRECoh is structurally independent of the other metrics. In this study, OntRECoh innovatively includes both static and dynamic cohesion measures, and this independence is identified and highlighted by PC3.
In other words, the measurement information of cohesion captured by OntRECoh is more comprehensive and is not accomplished by other cohesion metrics, thus confirming its innovativeness.
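The relation between eigenvalues, eigenvectors, and loadings can be made concrete with a minimal sketch. It extracts only the first, unrotated component of a correlation matrix by power iteration and scales the eigenvector by the square root of its eigenvalue; varimax rotation, which the study applies, is deliberately omitted. The 2×2 matrix is hypothetical and unrelated to Table 4, and `first_component` is an illustrative helper.

```python
import math


def first_component(corr, iterations=500):
    """Dominant eigenvalue and unrotated loadings of a correlation matrix.

    Uses plain power iteration. The start vector is arbitrary; the method
    would stall if it were exactly orthogonal to the dominant eigenvector.
    """
    n = len(corr)
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    # Rayleigh quotient of the unit-length vector gives the eigenvalue.
    eigenvalue = sum(
        v[i] * sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)
    )
    # Loading = eigenvector entry scaled by sqrt(eigenvalue); it equals the
    # correlation between the variable and the component.
    loadings = [c * math.sqrt(eigenvalue) for c in v]
    return eigenvalue, loadings


# Hypothetical correlation matrix of two metrics (illustration only).
corr = [
    [1.0, 0.6],
    [0.6, 1.0],
]
eigenvalue, loadings = first_component(corr)
```

For two variables with correlation r, the dominant eigenvalue is 1 + |r| and both loadings have magnitude sqrt((1 + |r|) / 2), which gives a quick sanity check. The sign of the loadings is arbitrary, as in any eigenvector-based method.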

B. EVALUATION OF THE SYSTEM AND THE OUTPUTS
In this section, the process and the output of the proposed work are evaluated. To do so, we use a walkthrough [60], [61], a well-known verification approach in software engineering that examines how well the execution and the output of a proposed system meet expectations. Specifically, we continue the scenario in Section IV on the same contextual basis to examine the cohesion scores and the associated suggestion output, i.e., moving the three methods back to the classes where they should correctly reside. Fig. 7 below shows the reproduced class diagram of the ATM case after it was improved according to the suggestion illustrated in Fig. 6. Fig. 8 is a screenshot of the evaluation results. Specifically, the values of LSCCMAD and DOCMAD increase (the static cohesion score from 0.7014 to 0.8723 and the dynamic cohesion score from 0.9725 to 0.9875), and the average values of LSCC and DOC increase from 0.4363 to 0.6113 and from 0.215 to 0.24, respectively, meaning that the dispersion of classes in the system improved. Regarding the cohesion of the individual member classes, the static cohesion scores of BankDatabase, Withdrawal, and Account increased from 0 to 0.6, from 0.2 to 0.67, and from 0.33 to 0.67, respectively, indicating that their static cohesion improved significantly. The dynamic cohesion of the subject classes also increased: BankDatabase from 0.18 to 0.23, Account from 0.21 to 0.3, and Withdrawal from 0.2 to 0.33.

C. EVALUATION OF ACCUMULATED SYSTEM PERFORMANCE
In this section, the system's performance is empirically evaluated. In software and reliability engineering, the experimental approach [62] is often used to examine statistically the performance of a proposed software work through a number of test cases. Therefore, we conducted an experiment with hypotheses to examine the performance of the proposed work. Six real software cases were used to examine the accumulated performance of OntRECoh. The cases were drawn from known open-source Java projects on GitHub and were selected for their different sizes in order to examine the empirical performance of OntRECoh. Notably, the largest case in the list, Elasticsearch, is one of the most popular Java projects on GitHub, with 53,000 GitHub stars and 18.4K GitHub forks. Elasticsearch is a distributed search engine built for the cloud. Table 7 shows the information of these cases. The analysis comprises two steps. In the first step, OntRECoh was run to compute the cohesion scores for the six cases and generate the corresponding improvement suggestions. We then modified the structural contents of each case according to the improvement suggestions, after which OntRECoh was run again on the modified cases to obtain the improved cohesion values. In the second step, the values generated by OntRECoh were manually inspected and recomputed. The verification and the results are illustrated in Fig. 8 and are discussed in the following.
As the figures show, cohesion is improved for all of the cases tested. Specifically, after refactoring according to the improvement suggestions, the cohesion values of LSCCMAD, LSCCAverage, DOCMAD, and DOCAverage increased by 28%, 44%, 21.5%, and 63%, respectively. To further verify this result, the following hypotheses were established:
H0: OntRECoh does not significantly help improve system cohesion
H1: OntRECoh significantly improves system cohesion
The paired-sample t-test was employed to test the hypotheses. The samples are pair-dependent, i.e., the effect of the proposed system is evaluated by comparing the cohesion scores of the same cases before and after using the system. Therefore, the paired-sample t-test is suitable; if the p-value is <0.05, the null hypothesis is rejected. According to the result of the test for static cohesion LSCC, the two-tailed p-values for the cases tested are 0.003, 0, 0.012, 0.005, 0.003, 0, and 0.023, which are all <0.05. For the dynamic cohesion DOC, the two-tailed p-values are 0, 0.001, 0.013, 0.03, 0, 0.011, and 0.017, which are also all <0.05. Therefore, the null hypothesis H0 is rejected and H1 is accepted, indicating that the system (OntRECoh) has a statistically significant effect on the improvement of software cohesion.
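The mechanics of the paired-sample t-test can be sketched with the standard library alone. The example reuses the before and after static scores of the three ATM classes reported in Section V-B purely as an illustration; it is not the test that was run on the open-source cases, and three pairs are far too few for a real analysis. The two-tailed p-value is obtained by integrating the Student t density numerically with Simpson's rule; the helper names are hypothetical.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def paired_t_test(before, after, steps=2000):
    """Return (t statistic, two-tailed p-value) for paired samples.

    steps must be even (Simpson's rule). Fails with a division by zero
    if all paired differences are identical.
    """
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    df = n - 1
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    # Simpson's rule for the area under the density on [0, |t|].
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return t, 1 - 2 * area


# Static scores of BankDatabase, Withdrawal, and Account (Section V-B).
before = [0.0, 0.2, 0.33]
after = [0.6, 0.67, 0.67]
t_stat, p_value = paired_t_test(before, after)
reject_h0 = p_value < 0.05
```

On these three pairs the sketch gives a t statistic of about 6.26 and a two-tailed p-value of roughly 0.025, so H0 would be rejected at the 0.05 level in this toy setting.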

D. VALIDITY ANALYSIS OF THE EVALUATION
Because the evaluation involved multiple cases, this paper addressed four types of validity threats [63]- [65] with the following treatments. Construct validity relates to the research design, i.e., how we create the measurement instruments to measure the effects [63], [66]. In the evaluation, the composition of the test cases was reviewed to help ensure the validity of the design. Specifically, the test cases had a comprehensive composition of implemented Java classes that realized the three principal structural associations for effective cohesion assessment, so that the examination could completely cover the scope of the proposed work. External validity is addressed in two aspects. First, when operating a new system, familiarity is critical to the stability of system performance. In the evaluation, the testers who operated the system were trained to be familiar with the selected SRE tool, the Java IDE, and the proposed system. If the system is to be widely used, we suggest providing a proper user manual that explains how the software components in OntRECoh work together, to minimize this threat. The second aspect refers to the generalizability of the system output [64], [65]. Although the evaluation demonstrated a good result, it was conducted with cases scoped to Java applications; this scoping limits the applicability of the performance result.

VI. CONCLUSION

A. RESEARCH CONTRIBUTION
This paper has presented a knowledge-based ontological system for online quality evaluation of the converted UML structural models in the SRE environment. Several research contributions are summarized as follows. First, this study contributes to the extant SRE literature by adding quality assurance to the automated reverse engineering process so that SRE work becomes more comprehensive. As existing SRE research mainly focuses on automating the conversion from software code to design contents, this study provides an online quality evaluation approach and an implemented system to ensure the design quality of the reversed contents. Second, in evaluating design quality in terms of cohesion, this research includes both static and dynamic cohesion and examines the structure of a system from not only the design (static) perspective but also the runtime performance (dynamic) aspect.
Third, in measuring cohesion, the proposed work extends existing metrics that focus on a single class to compute the degree of dispersion of the classes for the entire system. This shows the mutual relationship among classes when the system is considered as a whole, in an attempt to improve the overall performance of the system. Fourth, the assessment of software design quality is knowledge-intensive and requires continuous feedback and updates about newer experiences and knowledge. This study has provided an ontology for not only assimilating and exploiting an explicit knowledge model of cohesion design in the assessment of software design quality but also enabling case-based implicit knowledge inference for future cohesion improvement.

B. RESEARCH LIMITATIONS AND FUTURE DEVELOPMENT
Based on the current development of this study and the proposed system, we highlight several research limitations for future research and development. First, the dynamic cohesion metric in this study includes four parameters for observing the runtime activation of the linkages among class members. Future studies can allow users to customize the weights of these parameters. Second, some classes, such as abstract classes or interfaces, are inherently less cohesive, which may intrinsically affect the validity of the quality assessment result. In the future, the computation can be reformulated to allow users to assign weights to various classes, to avoid this potential confounding effect and to make the resulting cohesion scores more explanatory. Third, because the proposed work is a research prototype, it is evaluated with an experimental approach. Further development of the work is expected to include an empirical investigation. In addition, the improvement suggestions are text-based in the current version of the prototype; further research can present this information on the displayed UML diagram.
Lastly, the proposed design quality evaluation in the SRE context focuses on cohesion only, which is also a limitation of this study. As system design quality includes other factors, such as security, complexity, and coupling, future research can include these quality concerns and integrate them with the current work to establish a more comprehensive scheme of design quality evaluation in SRE.
[66] D. A. Broniatowski

His research interests include enhancement of information systems, such as software development and process improvement, project management, and quality management systems, such as data/information quality, software quality, and software engineering education.
KUANG-YEN TAI received the M.S. degree from the Department of Information Management, Tunghai University, Taiwan. He is currently pursuing the Ph.D. degree with the Department of Information Management, National Taiwan University, Taiwan. He is currently a Lecturer with the Center for General Education, Tunghai University, Taiwan. His main research interests include software engineering, artificial intelligence, information security, and parallel computing.
SIN-SIAN CHONG received the M.S. degree from the Department of Information Management, National Central University, Taiwan. He currently serves as a Software Engineer with TITANSOFT. His research interests include software engineering and project management. VOLUME 9, 2021