Revisiting Scenarios of Using Refactoring Techniques to Improve Software Systems Quality

Refactoring is one of the most widely used techniques in practice to improve the quality of existing software. However, refactoring does not consistently improve all software quality attributes. Recent studies have indicated that different refactoring techniques have significantly different, sometimes opposite and conflicting, effects on software quality attributes; in other words, the evidence on the benefits of refactoring is contradictory. As a result, developers face challenges in selecting appropriate refactoring techniques to improve software quality. To the best of our knowledge, no study has investigated the factors that may explain these inconsistent or diverging results concerning the effect of refactoring techniques on software quality. Therefore, in this study, one such factor, the scenario in which a refactoring technique is applied, has been identified, investigated, and thoroughly analyzed. Ten of the refactoring techniques most commonly used in practice were chosen and individually applied in seven case studies of varying sizes (small, medium, and large). The Quality Model for Object-Oriented Design (QMOOD) is used to assess how the refactoring techniques affect quality attributes. The findings provide strong evidence that this factor plays a significant role in producing the varied effects of refactoring techniques on quality attributes. These findings can help software developers understand how to use refactoring techniques to improve software quality while taking this factor into account. The best scenario for using each refactoring technique to improve software system quality has been identified, and the findings can serve as guidelines for software developers to improve the quality of software systems by applying refactoring techniques according to these best scenarios.


I. INTRODUCTION
Software systems continually undergo modifications to their code and associated documentation, either because of a problem or because of the need for improvement [1], [2]. Thus, software maintenance has become an integral component of software development and management [3], [4] and an essential activity for any software system [5]. The maintenance process includes the essential tasks that aim to preserve the integrity of the existing software system [6]. These modifications are incremental and aim either to update some functionalities or to correct design flaws and fix bugs [2], [6].
The associate editor coordinating the review of this manuscript and approving it for publication was Hailong Sun.
These software maintenance activities become more complex as the size of the system and the number of requirements increase over time [6]. In fact, software maintenance and evolution activities account for more than 80% of total software development costs [2], [7], [8], [9]. It has also been shown that software developers typically spend around 60% of their time understanding the code they are maintaining [9], [10]. Unstructured code (i.e., poor software design) increases complexity and is considered one of the major maintenance problems that significantly raise maintenance costs [2], [11]. It has been reported that poor software design costs more than USD 150 billion yearly in the United States and more than USD 500 billion yearly worldwide [12].
Fortunately, the cost of software maintenance and evolution activities can be significantly reduced by software refactoring [7], [13], [14]. Refactoring is considered one of the most significant practices of software maintenance and evolution [15], [16], and it has become a crucial part of software development practice, especially given the ever-changing landscape of IT and user requirements [10]. Software refactoring is defined as an approach that aims to improve software design quality by restructuring the internal design of a software system without changing its functionality. Fowler et al. proposed 68 original refactoring techniques, grouped into six categories [17], [18]. Given this definition, refactoring is closely related to software quality attributes [5]. In this regard, previous empirical studies have tested the effect of refactoring techniques on software quality attributes [16], [19]; that is, they investigated whether applying refactoring techniques improves internal and external quality attributes. Based on an analysis of the related literature, these studies reported mixed results:
• The refactoring techniques have positive effects on software quality [20], [21], [22], [23], [24].
• The refactoring techniques have negative effects on software quality [25], [26].
• The effect of the refactoring techniques on software quality is unclear [30], [31], [32], [33].
Although refactoring is one of the most widely used techniques in practice for improving the quality of existing software, it does not consistently improve all software quality attributes. Some refactoring techniques have shown different effects on software quality in different studies.
Therefore, there is no consensus among researchers regarding the effect of refactoring techniques on internal and external quality attributes. While some researchers confirmed that applying refactoring techniques has a positive effect on software quality, others have argued that this is not always true [13], [34]. In other words, there is conflicting evidence on the benefits of refactoring [35]. As a result, developers face challenges in selecting a suitable refactoring technique that can improve a given quality attribute by removing design flaws [36], [37], [38].
The rest of this paper is structured as follows: Section II discusses related work. Section III describes the experimental design, while Section IV includes the results and discussion. Section V discusses threats to validity. Section VI concludes and outlines our future research directions.

II. RELATED WORK
This section discusses the techniques used to identify opportunities to use refactoring techniques based on Machine Learning (ML) algorithms. Then, we discuss the various effects of refactoring techniques on software quality attributes.

A. MACHINE LEARNING-BASED REFACTORING PREDICTIONS
Refactoring prediction refers to identifying the locations (i.e., classes, methods) in a software system that need to be refactored. Machine learning (ML) techniques have been used to detect code smells and predict refactoring. Ratzinger et al. [39] conducted an empirical study on two open-source software projects to identify where refactoring can be applied. They used classification algorithms (J48, LMT, Rip, and NNge) to predict which parts of the code should be refactored, based on the project development history obtained from the Concurrent Versions System (CVS). Al Dallal [40] carried out an empirical study to predict opportunities for applying the Move Method refactoring technique in classes. Logistic regression models were used to build the prediction models on seven open-source projects of different sizes (one large-scale project, with the others medium and small).
Kumar and Sureka [41] discussed the need to create an automated tool to assist software developers in automatically identifying methods and classes that require refactoring. They proposed a technique for predicting the opportunities for applying refactoring techniques at the class level based on the application of ML methods. They used LSSVM as the learning algorithm, PCA for feature extraction, and SMOTE for dealing with imbalanced data. They ran a set of experiments on seven open-source software projects of different sizes (two large-scale projects and the others were medium and small projects), where the classes refactored were manually validated. In another study conducted by Kumar, Satapathy, and Krishna [42], LSSVM and SMOTE machine learning techniques were applied to five software projects to predict the opportunities for applying refactoring techniques at the method level.
Kumar, Satapathy, and Murthy [43] conducted research on five software projects of different sizes to identify methods that require refactoring. Ten ML techniques were considered, and 25 code metrics were used to predict refactoring opportunities at the method level. The results show that SMOTE and RUSBoost outperform the other data sampling techniques, while AdaBoost and ANN + GD outperform the other ML techniques.
VOLUME 11, 2023
Panigrahi, Kuanar, and Kumar [44] proposed a model for predicting opportunities for refactoring at the method level using three Naïve Bayes classifiers: Gaussian (GNB), Multinomial (MNB), and Bernoulli (BNB). The experimental results showed that the Bernoulli Naïve Bayes classifier outperforms the other two classifiers in terms of accuracy.
Aniche et al. [45] examined the efficacy of six ML methods (LR, NB, SVM, DT, RF, and NN) in predicting refactoring opportunities at the variable, method, and class levels. The results demonstrated that Random Forests (RF) was the best model for predicting. Alenezi, Akour, and Al Qasem [46] used the GRU algorithm to predict the opportunities for using refactoring at the class level on seven open-source projects of various sizes. The outcomes demonstrated promising performance results.
Sagar et al. [47] used LSTM models and several ML methods to predict refactoring opportunities at the class level in any project. According to the results, the random forest algorithm trained with software metrics produced a better accuracy rate. Nevertheless, there was a variation in the result obtained per class, indicating that some refactoring techniques are more difficult to detect than others.
Recently, Akour, Alenezi, and Alsghaier [48] investigated the efficacy of using a Support Vector Machine (SVM) in conjunction with two optimization algorithms (genetic and whale algorithms) in predicting refactoring opportunities at the class level on four open-source projects of different sizes. The obtained results outperformed the predictions of the other machine learning methods.
Nyamawe [49] proposed an ML technique trained on historical data of previously executed refactorings, which were identified using both conventional refactoring detectors and commit message analysis. The approach used a binary classifier to predict whether refactoring was needed and a multi-label classifier to recommend which refactorings to apply. The outcomes demonstrated that the method greatly outperformed the alternatives.
There is no automated tool that helps software developers identify methods and classes that require refactoring based on ML-based refactoring prediction techniques. These techniques identify opportunities for refactoring in software systems based on code metrics analysis. However, most of them do not identify which of the 68 refactoring techniques should be applied at a specific location in a software system. For example, either Move Method or Extract Method can be used to remove the feature envy code smell, but ML-based refactoring prediction techniques cannot determine which of the two would better eliminate it. When choosing appropriate refactoring techniques, it is necessary to investigate their effects on internal and external quality attributes, and ML-based refactoring prediction techniques are unable to analyze and predict whether a refactoring technique will have a positive effect on software system quality.
Therefore, this study focuses on analyzing the effect of refactoring techniques on internal and external quality attributes in order to enable software developers to select the best refactoring techniques that can improve the quality of a software system based on the best scenario for using them.

B. DIFFERENT EFFECTS OF REFACTORING TECHNIQUES
Researchers have pointed to several factors that may contribute to the different effects of refactoring techniques on quality attributes. Some of these factors are described as follows:

1) REFACTORING TOOLS
Kim et al. [35], [50] indicated that existing refactoring tools are error-prone and, therefore, may produce incorrectly refactored code. As a result, using these tools sometimes negatively affects code quality [34]. For example, Kaur and Kaur [51] used the Miner refactoring tool and found that Move Method increased complexity, whereas Chavez et al. [52] used the JDeodorant tool to apply the same refactoring and found that Move Method did not affect complexity. This suggests that applying refactoring techniques with different tools may produce different effects on quality attributes.

2) SOFTWARE SIZE
In object-oriented programming, the size of a software system can be determined by the number of classes it contains. Table 1 summarizes the size categories of software systems [13], [53], [54], [55]. Different software sizes (small, medium, and large) were used in previous studies. Therefore, applying refactoring techniques to software systems of different sizes may result in different effects of these techniques on quality attributes. According to Kaur and Singh [13], the use of software systems of different sizes may be one of the reasons behind inconsistent or divergent results regarding the effect of overall or individual refactoring on software quality. Elish and Alshayeb [36], [56] applied refactoring techniques to small-scale systems, and only at the class level, to categorize refactoring techniques based on certain software quality attributes. They therefore indicated that the use of small-scale systems is a concern when studying the effect of refactoring techniques at the system level [31], [36], [56], [57].
Kumari and Saha [58] applied refactoring techniques at the class level and indicated that the results may vary when implemented across the system. However, no empirical study has been presented in the literature to confirm or deny the effect of the software size factor.

3) SCENARIOS OF USING REFACTORING TECHNIQUES
Fowler et al. [17], [18] described the mechanics for applying each refactoring technique. However, for some refactoring techniques, there are different scenarios in which these mechanics can be applied. For example, Extract Method has different scenarios depending on the access modifier (i.e., public, private, protected) given to the extracted method. Move Method also has different scenarios: the method can be removed from the source class and placed in the target class, or it can be left in the source class as a delegating method when it has many references. According to Al Dallal and Abdin [34], several existing techniques can be used to apply some refactoring scenarios, and these techniques could potentially produce different refactored code. Oliveira et al. [59], [60] showed that the mechanics of a refactoring technique can yield different results depending on how IDE developers implement them. Kumari and Saha [58] indicated that the effects of refactoring on software quality may vary when the scenario of using the refactoring technique changes. Therefore, different scenarios for using a refactoring technique may have different effects on quality attributes. However, no scientific work has been found in the literature to support or refute the effect of scenarios for using refactoring techniques on quality attributes.
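To make these scenarios concrete, the hypothetical Java classes below (loosely in the spirit of Fowler's Account/AccountType example; none of them come from the case studies) show one "before" class and the two Move Method scenarios: (a) the method is removed from the source class, and (b) the source class keeps a delegating method. The moved method also contains an Extract Method result with private visibility. This is a minimal sketch: giving the extracted helper public or protected visibility instead would be a different Extract Method scenario, and because QMOOD's CIS metric counts public methods, a public helper would be expected to change CIS where a private one does not.

```java
// Hypothetical classes illustrating Move Method and Extract Method scenarios.

/** Before refactoring: the charge logic lives in the account but mostly uses AccountType data. */
class AccountBefore {
    private final AccountType type;
    private final int daysOverdrawn;

    AccountBefore(AccountType type, int daysOverdrawn) {
        this.type = type;
        this.daysOverdrawn = daysOverdrawn;
    }

    double overdraftCharge() {
        if (type.isPremium()) {
            double result = 10.0;
            if (daysOverdrawn > 7) {
                result += (daysOverdrawn - 7) * 0.85;
            }
            return result;
        }
        return daysOverdrawn * 1.75;
    }
}

/** Target class of Move Method. */
class AccountType {
    private final boolean premium;

    AccountType(boolean premium) {
        this.premium = premium;
    }

    boolean isPremium() {
        return premium;
    }

    /** Moved method: now defined in the class whose data it uses most. */
    double overdraftCharge(int daysOverdrawn) {
        if (premium) {
            return 10.0 + lateDaysCharge(daysOverdrawn);
        }
        return daysOverdrawn * 1.75;
    }

    /** Extract Method result, private in this scenario. */
    private double lateDaysCharge(int daysOverdrawn) {
        return daysOverdrawn > 7 ? (daysOverdrawn - 7) * 0.85 : 0.0;
    }
}

/** Scenario (a): the method is removed from the source class; callers ask AccountType directly. */
class AccountMoved {
    private final AccountType type;
    private final int daysOverdrawn;

    AccountMoved(AccountType type, int daysOverdrawn) {
        this.type = type;
        this.daysOverdrawn = daysOverdrawn;
    }

    AccountType getType() { return type; }

    int getDaysOverdrawn() { return daysOverdrawn; }
}

/** Scenario (b): the source class keeps a delegating method, so existing callers are unchanged. */
class AccountDelegating {
    private final AccountType type;
    private final int daysOverdrawn;

    AccountDelegating(AccountType type, int daysOverdrawn) {
        this.type = type;
        this.daysOverdrawn = daysOverdrawn;
    }

    double overdraftCharge() {
        return type.overdraftCharge(daysOverdrawn);
    }
}
```

Both scenarios preserve behavior, but they leave different class structures behind: scenario (b) keeps an extra method and a call from the source class to the target class, which is one plausible way two applications of "the same" refactoring could be measured differently.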
To the best of our knowledge, no study in the literature has investigated and analyzed the effect of the scenario-of-use factor on the quality of software systems. Therefore, in this study, we present an experimental study that investigates and thoroughly analyzes the effect of the scenarios in which refactoring techniques are applied, in order to understand how this factor contributes to the different effects of refactoring techniques on quality attributes.

III. METHODOLOGY
This section details the methodology followed in this study. As illustrated in Figure 1, the most commonly used refactoring techniques were chosen first, followed by the collection of source code from seven case studies of different sizes (small, medium, and large). The values of the object-oriented metrics were then collected, and the external quality attributes were computed before and after applying the refactoring techniques to the code. Following that, the individual effects of each refactoring technique on each internal and external quality attribute were thoroughly investigated, and the cumulative effect of refactoring was analyzed using multi-case analysis. Finally, the effect of each refactoring technique was identified by considering the investigated factor. The steps of this experimental design are discussed in detail in the following subsections.

A. SELECTING REFACTORING TECHNIQUES
Fowler et al. [17] proposed 68 original refactoring techniques. Ten of them were chosen for this study based on the findings of comprehensive literature reviews on commonly used refactoring techniques conducted by [13], [34], as well as survey findings on the refactoring techniques most commonly used by software practitioners, conducted by [61].
The ten refactoring techniques chosen are described as follows:
1. Extract Method (EM): This technique creates a new method from a long, complex method by extracting a set of statements that belong together into the new method.
2. Move Method (MM): This technique is used when a method is defined in one class but used more by another class. Consequently, the method is moved from the original class to the class that uses it most.
3. Introduce Parameter Object (IPO): When the same group of parameters frequently appears in multiple methods, this technique replaces the group with a single object that holds those parameters.
4. Remove Setting Method (RSM): This technique removes the setting method of a field that should be set only at creation time, preventing any later change to its value.
5. Pull Up Field (PUF): When the same field exists in two subclasses, this technique deletes the field from them and moves it to the superclass.
6. Pull Up Method (PUM): When subclasses contain methods that do similar work with identical results, this technique moves those methods to the superclass.
7. Push Down Field (PDF): When a field is used only in some subclasses, this technique moves the field from the superclass to the subclasses that use it.
8. Push Down Method (PDM): When a method in a superclass is used by only one or a few subclasses, this technique moves the method from the superclass into those subclasses.
9. Extract Subclass (ESb): When a class has fields or methods that are used only in specific cases, this technique creates a subclass that contains those fields and methods.
10. Extract Interface (EI): When multiple clients use the same part of a class interface, or when part of the interface is the same in two classes, this technique moves that common part into a separate interface.
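The hypothetical Java classes below sketch the refactored state for several of the techniques listed above: Introduce Parameter Object, Remove Setting Method, Pull Up Field, Pull Up Method, Push Down Field, and Extract Interface. The names DateRange, Payable, Employee, Salaried, and Contractor are invented for illustration and do not come from the case studies; comments note what each piece looked like before refactoring. This is a minimal sketch, not code produced in the study.

```java
// Hypothetical payroll-style classes illustrating several of the selected refactorings.

/** Introduce Parameter Object: replaces a recurring (startDay, endDay) parameter pair. */
final class DateRange {
    private final int startDay;
    private final int endDay;

    DateRange(int startDay, int endDay) {
        if (endDay < startDay) {
            throw new IllegalArgumentException("endDay must not precede startDay");
        }
        this.startDay = startDay;
        this.endDay = endDay;
    }

    /** Number of days in the range, counting both ends. */
    int lengthInDays() {
        return endDay - startDay + 1;
    }
}

/** Extract Interface: the part of the class interface that clients actually share. */
interface Payable {
    double payFor(DateRange period);
}

abstract class Employee implements Payable {
    // Pull Up Field: previously declared in both subclasses.
    // Remove Setting Method: final and no setter, so the rate is fixed at creation.
    protected final double dailyRate;

    Employee(double dailyRate) {
        this.dailyRate = dailyRate;
    }

    /** Pull Up Method: previously duplicated in each subclass as payFor(int startDay, int endDay). */
    @Override
    public double payFor(DateRange period) {
        return dailyRate * period.lengthInDays();
    }
}

final class Salaried extends Employee {
    Salaried(double dailyRate) {
        super(dailyRate);
    }
}

/** Push Down Field: only contractors have an agency, so the field lives in this subclass alone. */
final class Contractor extends Employee {
    private final String agency;

    Contractor(double dailyRate, String agency) {
        super(dailyRate);
        this.agency = agency;
    }

    String agency() {
        return agency;
    }
}
```

Each of these changes alters the class structure that QMOOD measures (number of classes, inheritance, public interface size, and so on) while, if done correctly, leaving the computed pay unchanged.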

B. SELECTING CASE STUDIES
Researchers in the field have conducted experiments on software projects of various sizes (small, medium, and large), because refactoring is a primary maintenance practice used by software practitioners in small, medium, and large companies [16], [19], [61], [62], [63]. Selecting software projects of different sizes makes it possible to explore the effects of refactoring techniques across varying software designs (ranging from simple to complex) and extends the benefits of refactoring to industry, as software practitioners can use refactoring to improve the quality of software systems regardless of whether these systems are small, medium, or large (according to the size of the companies in which they work). In this regard, Akour et al. [48] selected four open-source projects of different sizes. Kumar et al. [43] chose five software projects of different sizes. Kumar and Sureka [41] chose seven open-source software projects of different sizes (two large-scale projects, with the others medium and small). Al Dallal [40] chose seven open-source projects of differing sizes (one large-scale project, with the others medium and small).
Therefore, similar to previous studies [40], [41], [43], [48], [64], seven case studies from two different environments (academic and real-world) and of varying sizes (two large, two medium, and three small) were chosen for the experimental analysis in this study. The case studies examined are summarized in Table 2. Academic student projects were included because of their limited scale and the opportunity they provide to study poor design in the source code [65]. The jHotDraw, JGraphX, Xerces, and jEdit case studies were chosen because they are widely used in refactoring research, according to several systematic literature reviews [5], [13], [34], [53].
In addition, these case studies were selected with different sizes (small, medium, and large) in order to investigate the effect of the scenarios of using refactoring techniques on software quality across case studies of different sizes. The selected case studies are described as follows:
1. Payroll management system (PMS): PMS [66] is a small software system (12 classes) developed by three master's students in an Information Technology program. Its purpose is not only to automate all functionalities involved in managing employee payroll but also to provide a fully functional system that helps the management of an organization.
2. Library management system (LMS): LMS [67] is a small software system (19 classes) written in Java to manage a library. The system provides features to organize and manage library tasks. It uses MySQL as its database, which allows it to maintain records of newly entered books and of books that have been retrieved or issued, with their respective dates.
3. Bank management system (BMS): BMS [68] is a small (34 classes) computer-based system written in Java. It is designed to manage all the primary information required to calculate monthly statements of customers' accounts. It provides different types of customer services, fulfills the process requirements of a bank, and increases the bank's productivity.
4. jHotDraw [69]: An open-source, medium-sized project (250 classes). jHotDraw is a two-dimensional graphics framework for structured drawing editors. It defines a basic skeleton for a GUI-based editor with tools in a tool palette, different views, user-defined graphical figures, and support for saving, loading, and printing drawings.
5. JGraphX [70]: A Java Swing diagramming (graph visualization) library for graph drawing. It is open-source software of medium size (367 classes) and can be downloaded from GitHub. JGraphX allows the creation of Java Swing applications with interactive diagramming capabilities.
6. Xerces [71]: Apache's collection of software libraries for parsing, validating, serializing, and manipulating XML. Developed by the Apache Xerces project, it provides high-performance, fully compliant XML parsers. It is large-scale open-source software (1036 classes) and can be downloaded from GitHub.
7. jEdit [72]: A large-scale open-source project (1153 classes). jEdit is a cross-platform programmer's text editor written in Java that runs on any operating system with Java support. It has many features, such as a sophisticated plugin system, syntax highlighting for 130 languages, a built-in macro language, and extensive encoding support.

C. SELECTING QUALITY ATTRIBUTES
There are several metric suites that measure the quality of object-oriented systems. According to Kaur and Singh [13], Jabangwe et al. [73], and Pham et al. [74], the four most common metric suites are the Chidamber and Kemerer (C&K) suite, the Lorenz and Kidd (L&K) suite, Metrics for Object-Oriented Design (MOOD), and the Quality Model for Object-Oriented Design (QMOOD) suite.
Chidamber and Kemerer [75] proposed six metrics, widely known as the C&K metrics suite, to measure the internal properties of object-oriented software systems: LCOM, WMC, RFC, CBO, DIT, and NOC. These metrics measure cohesion (LCOM), complexity (WMC and RFC), coupling (CBO), and inheritance (DIT and NOC).
Lorenz and Kidd [76] proposed four metrics (NM, NF, NMI, AMS), commonly referred to as the L&K suite, to be used to evaluate the external attributes of the object-oriented software system using three internal properties of size, complexity, and inheritance. The proposed metrics measure the complexity by NM, the coupling by NF, the inheritance by NMI, and the size by AMS.
Abreu and Carapuça [77] proposed the MOOD metrics (CF, MHF, AHF, MIF, AIF, PF), which measure coupling (CF), encapsulation (MHF and AHF), inheritance (MIF and AIF), and polymorphism (PF).
Bansiya and Davis [78] proposed the QMOOD model, which was designed to take into account the unique internal properties of object-oriented designs. QMOOD was developed using a top-down approach. Importantly, its external quality attributes were defined on the basis of ISO/IEC 9126 and then linked to specific internal attributes built into object-oriented designs. The model includes 11 internal attributes and six external quality attributes. In addition, QMOOD provides 11 metrics for measuring the internal attributes and estimating the external quality attributes.
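For reference, QMOOD estimates each external attribute as a weighted sum of the internal design properties. The equations below reproduce the weights as they are commonly reported from Bansiya and Davis [78], written directly in terms of the metrics that measure each property (DSC for design size, NOH for hierarchies, ANA for abstraction, DAM for encapsulation, DCC for coupling, CAM for cohesion, MOA for composition, MFA for inheritance, NOP for polymorphism, CIS for messaging, NOM for complexity). They are given as a reading aid under that assumption; Table 4 remains the authoritative statement of the formulas used in this study. In the original model, metric values are typically normalized against a baseline design before the weights are applied.

```latex
\begin{aligned}
\text{Reusability}       &= -0.25\,\mathrm{DCC} + 0.25\,\mathrm{CAM} + 0.5\,\mathrm{CIS} + 0.5\,\mathrm{DSC}\\
\text{Flexibility}       &= 0.25\,\mathrm{DAM} - 0.25\,\mathrm{DCC} + 0.5\,\mathrm{MOA} + 0.5\,\mathrm{NOP}\\
\text{Understandability} &= -0.33\,\mathrm{ANA} + 0.33\,\mathrm{DAM} - 0.33\,\mathrm{DCC} + 0.33\,\mathrm{CAM}
                            - 0.33\,\mathrm{NOP} - 0.33\,\mathrm{NOM} - 0.33\,\mathrm{DSC}\\
\text{Functionality}     &= 0.12\,\mathrm{CAM} + 0.22\,\mathrm{NOP} + 0.22\,\mathrm{CIS} + 0.22\,\mathrm{DSC} + 0.22\,\mathrm{NOH}\\
\text{Extendibility}     &= 0.5\,\mathrm{ANA} - 0.5\,\mathrm{DCC} + 0.5\,\mathrm{MFA} + 0.5\,\mathrm{NOP}\\
\text{Effectiveness}     &= 0.2\,\mathrm{ANA} + 0.2\,\mathrm{DAM} + 0.2\,\mathrm{MOA} + 0.2\,\mathrm{MFA} + 0.2\,\mathrm{NOP}
\end{aligned}
```

The negative weights on DCC (coupling) and NOM (complexity) reflect that higher values of these properties are considered detrimental to quality.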
To accomplish the aims of this study, a quality model is needed that can evaluate the quality of object-oriented software systems and assess the effect of refactoring techniques on the quality of those systems. To do so, the quality model must be able to measure internal quality attributes and quantitatively estimate external quality attributes. QMOOD [78] is therefore more suitable for this study, as it is a comprehensive model that can assess the design quality of software [79].
QMOOD's coverage comes from its six external quality attributes and 11 internal design properties, which together provide a broader overview of software quality than other quality metrics for object-oriented design [80]. Its metrics are widely used to assess software designs at both the system and class levels [81], and they are highly effective in predicting software defects in both traditional and iterative software development processes [82], [83], [84]. Moreover, QMOOD can evaluate the overall quality of object-oriented systems. Therefore, all external quality attributes, internal quality attributes, and object-oriented metrics included in QMOOD were selected for this study. Table 3 describes the internal attributes with their associated metrics and how these metrics are used to measure the associated internal attributes. The six external attributes are described as follows:
1. Reusability: the degree to which a software component can be used in other software systems.
2. Flexibility: the ease with which a software component can be modified for use in environments other than the one for which it was specifically designed.
3. Effectiveness: the degree to which the design can achieve the desired functionality and behavior using object-oriented concepts.
4. Extendibility: the ease with which new requirements can be incorporated into the existing design, i.e., how readily a software component can be modified to enhance its functional capability.
5. Understandability: the design properties that make the software design easy to comprehend.
6. Functionality: the responsibilities assigned to the design classes, which the classes make available through their public interfaces.
Table 4 summarizes the mathematical formulas used to estimate the selected external quality attributes quantitatively from the related internal quality metrics. The QMOOD metrics were collected, and the external quality attributes and the Total Quality Index (TQI) were estimated using the QMOOD formulas, both before and after the refactoring techniques were applied, to measure their effect on the internal and external quality attributes. Whether a refactoring technique has a positive, negative, or no effect is determined by subtracting the quality measurement value before refactoring from the value after refactoring. If the difference is positive, the refactoring technique has a positive effect on the quality attribute; if it is negative, the technique has a negative effect. For coupling and complexity, where lower values indicate better quality, this interpretation is reversed: a positive difference indicates a negative effect, and a negative difference indicates a positive effect. If the difference is zero, the refactoring technique does not affect the internal quality attribute, the estimated external quality attribute, or the TQI. The Eclipse Metrics 1.3.8 tool [85] was used to collect the QMOOD metrics in this study because it is one of the most commonly used Java metrics tools in research and works on the most widely used platforms, including Windows, Mac, and Linux [86].
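The before/after comparison described above can be sketched as a small classifier. The names Effect, EffectClassifier, and the attribute labels "Coupling" and "Complexity" are hypothetical choices for this sketch; the only rule taken from the text is that the sign of (after minus before) decides the effect, reversed for coupling and complexity. The small tolerance for "no change" is an added assumption to absorb floating-point noise in computed attributes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Possible effects of a refactoring on a single quality attribute. */
enum Effect { POSITIVE, NEGATIVE, NONE }

/** Hypothetical helper that classifies effects from before/after measurements. */
final class EffectClassifier {
    // Attributes where a decrease means improvement, as stated in the text (coupling and complexity).
    private static final Set<String> LOWER_IS_BETTER = Set.of("Coupling", "Complexity");

    // Tolerance so that floating-point noise is treated as "no change" (an assumption of this sketch).
    private static final double EPSILON = 1e-9;

    private EffectClassifier() {}

    /** Uses delta = after - before and reverses its meaning for lower-is-better attributes. */
    static Effect classify(String attribute, double before, double after) {
        double delta = after - before;
        if (Math.abs(delta) <= EPSILON) {
            return Effect.NONE;
        }
        boolean increased = delta > 0;
        boolean improved = LOWER_IS_BETTER.contains(attribute) ? !increased : increased;
        return improved ? Effect.POSITIVE : Effect.NEGATIVE;
    }

    /** Classifies every attribute present in both maps, keeping the order of the "before" map. */
    static Map<String, Effect> classifyAll(Map<String, Double> before, Map<String, Double> after) {
        Map<String, Effect> effects = new LinkedHashMap<>();
        for (Map.Entry<String, Double> entry : before.entrySet()) {
            Double afterValue = after.get(entry.getKey());
            if (afterValue != null) {
                effects.put(entry.getKey(), classify(entry.getKey(), entry.getValue(), afterValue));
            }
        }
        return effects;
    }
}
```

Because attribute names are plain strings, the lower-is-better set must match the labels used for the measurements; every other attribute, including TQI, is treated as higher-is-better, which matches the rule stated above.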

D. APPLYING REFACTORING TECHNIQUES
Each selected refactoring technique was applied individually to identify its effect on the TQI, the external quality attributes, and the internal quality attributes. Fowler described the mechanics for applying each refactoring technique [17], [18]. These mechanics can be performed manually or, for a few refactoring techniques, with tool support. The JDeodorant tool [87] was used to perform two refactoring techniques (Move Method and Extract Method), as it is the most cited refactoring tool for applying specific refactorings [5]. In addition, the refactorings applied by the tool were manually validated to ensure that they followed the mechanics proposed by Fowler [17], [18].
The other refactoring techniques (Introduce Parameter Object, Remove Setting Method, Pull Up Field, Pull Up Method, Push Down Field, Push Down Method, Extract Subclass, and Extract Interface) were performed manually, following the mechanics proposed by Fowler [17], [18], because automated tools were not available for them. To ensure that the behavior of the software system was preserved after each refactoring technique was applied, two activities were carried out sequentially [13]: 1) compiling the source code and 2) checking the system outputs. In the compiling activity, after each refactoring technique was applied to the relevant version of the case study, the source code was compiled automatically using the Java compiler in the Eclipse IDE [88] to ensure that it was error-free. In the output-checking activity, the system was run and its outputs were checked through its interfaces to ensure that it behaved as it did before refactoring.
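In this study the outputs were checked manually through each system's interfaces. Where a component can be invoked programmatically, a comparable check could be automated as a differential (characterization) test: run the original and refactored versions on the same inputs and report any input on which they disagree. The sketch below is an illustration of that alternative, not the procedure used in the study; BehaviorCheck and mismatches are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

/** Hypothetical differential check comparing a component before and after refactoring. */
final class BehaviorCheck {
    private BehaviorCheck() {}

    /** Returns the inputs on which the refactored version's output differs from the original's. */
    static <I, O> List<I> mismatches(Function<I, O> original, Function<I, O> refactored, List<I> inputs) {
        List<I> diffs = new ArrayList<>();
        for (I input : inputs) {
            O expected = original.apply(input);
            O actual = refactored.apply(input);
            // Objects.equals handles null outputs and delegates to equals() otherwise.
            if (!Objects.equals(expected, actual)) {
                diffs.add(input);
            }
        }
        return diffs;
    }
}
```

An empty result only shows that the two versions agree on the sampled inputs; like manual output checking, it gives evidence of behavior preservation rather than proof.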

E. MULTI-CASE ANALYSIS FOR ANALYZING THE SCENARIOS OF USING THE REFACTORING TECHNIQUES FACTOR ON QUALITY ATTRIBUTES THROUGH THE SEVEN CASE STUDIES
Multi-case analysis is a useful method for determining the general mechanisms of complex phenomena or systems [89], [90]; it allows researchers to understand the theoretical constructs of new phenomena or systems. In this study, the main aim of the multi-case analysis is to analyze the effect of the identified factor (the scenario of using refactoring techniques) that causes the different effects of refactoring techniques. For this factor, the different scenarios in which each refactoring technique could be used, given the opportunities found across the multiple case studies, were identified and investigated. Because the case studies (software projects) have different designs, different scenarios for using the refactoring techniques were found. The effects of the refactoring techniques under these different scenarios on software quality attributes were then identified to determine whether using a refactoring technique in different scenarios produces different effects on software quality.

IV. RESULTS AND DISCUSSION
We used a set of seven case studies (LMS, BMS, PMS, JGraphX, JHotDraw, Xerces, and jEdit) to study the possible factor that causes different effects of refactoring techniques on quality attributes. First, we collected the QMOOD metrics and calculated the values of the quality attributes before applying the refactoring techniques. Table 5 shows the numerical values of the 11 object-oriented metrics (DSC, NOH, ANA, DAM, DCC, CAM, MOA, MFA, NOP, CIS, NOM) used to measure the internal quality attributes (design size, hierarchies, abstraction, encapsulation, coupling, cohesion, composition, inheritance, polymorphism, messaging, and complexity) that were collected before applying the ten refactoring techniques (EM, MM, IPO, RSM, PUF, PUM, PDF, PDM, ESb, EI) through the seven case studies. Table 5 also shows the numerical values of the external quality attributes (reusability, flexibility, effectiveness, extendibility, functionality, and understandability) and Total Quality Index (TQI) that were computed before applying the ten refactoring techniques through the seven case studies.
Second, we applied each refactoring technique individually in each case study to determine its effects on the internal and external quality attributes. Accordingly, we conducted 46 experiments across the seven case studies: 2 in LMS, 6 in BMS, 10 in PMS, 4 in JGraphX, 10 in JHotDraw, 4 in Xerces, and 10 in jEdit. An experiment investigates the effect of one refactoring technique, applied individually, on the quality attributes of one case study. The number of experiments conducted on a case study therefore equals the number of refactoring techniques applied in it, which depends on the opportunities found to use those techniques in that case study. For example, we found opportunities to use all ten refactoring techniques in jEdit, so we ran ten experiments in jEdit; similarly, we identified opportunities to use two refactoring techniques in LMS, so we ran two experiments in LMS. Table 6 shows descriptive statistics of the number of times each refactoring technique was used in the seven case studies. In total, the ten refactoring techniques were applied 657 times across the seven case studies.
VOLUME 11, 2023
Third, we collected the metrics and calculated the values of the external quality attributes and the TQI after applying each refactoring technique in the seven case studies. Tables 7 and 8 present the numerical values of the metrics and of the external quality attributes (including the TQI) after applying the refactoring techniques, respectively. Fourth, we identified the effects of each refactoring technique individually in the seven case studies. Tables 9 and 10 show the effects of the refactoring techniques on the object-oriented metrics and on the external quality attributes (including the TQI), respectively. In Tables 9 and 10, the symbol (↑) denotes that the refactoring technique improves the quality attribute, the symbol (↓) denotes that it impairs the quality attribute, and the symbol (−) denotes no change in quality; for DCC and NOM, where lower values are better, the arrows indicate the direction of change in the metric value, so (↑) denotes an impairment and (↓) an improvement. We found that the refactoring techniques have varying effects on the internal and external quality attributes. We then conducted the multi-case analysis using the seven case studies to determine the factor that causes these different effects. The findings show that each refactoring technique has different scenarios of use, depending on the various opportunities for applying it. Using case studies from different environments (academic and real-world) and of different sizes (small, medium, and large-scale) enabled us to explore different software system designs, which in turn created numerous opportunities to use the refactoring techniques. The following subsection discusses the scenarios of using each refactoring technique in detail.
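The symbol convention of Tables 9 and 10 can be expressed as a small rule. In this hypothetical Java sketch (the names Effect and EffectRule are illustrative, not from the study), the arrow follows the sign of the difference after minus before, while the quality judgement is inverted for DCC and NOM, where lower values are better.

```java
// Hypothetical sketch of the effect rule used to fill the result tables.
enum Effect { IMPROVES, IMPAIRS, NO_CHANGE }

final class EffectRule {
    // Arrow = direction of the change in the measured value (after - before).
    static char arrow(double before, double after) {
        int cmp = Double.compare(after, before);
        if (cmp > 0) return '\u2191'; // up arrow: value increased
        if (cmp < 0) return '\u2193'; // down arrow: value decreased
        return '\u2212';              // minus sign: no change
    }

    // Quality judgement: reversed for coupling (DCC) and complexity (NOM).
    static Effect effect(String metric, double before, double after) {
        int cmp = Double.compare(after, before);
        if (cmp == 0) return Effect.NO_CHANGE;
        boolean lowerIsBetter = metric.equals("DCC") || metric.equals("NOM");
        boolean improved = lowerIsBetter ? cmp < 0 : cmp > 0;
        return improved ? Effect.IMPROVES : Effect.IMPAIRS;
    }
}
```

For instance, a rise in DCC from 2 to 3 yields an up arrow but is classified as an impairment, whereas the same rise in CAM is an improvement.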

A. SCENARIOS OF USING REFACTORING TECHNIQUES FACTOR
Fowler et al. [17] and Fowler and Beck [18] proposed mechanics for each refactoring technique. These mechanics describe the steps that should be followed to perform the refactoring techniques correctly. However, the mechanics are applied to software systems with different internal designs, and they can therefore be performed in different scenarios. The internal components of the software are classes, methods, and attributes. The internal design of classes within the same system differs from class to class in terms of the number and types of attributes, the number and types of methods, and the relationships between them. In addition, the relationship of each class to the other classes varies within the same system: each class has its own degree of composition, coupling, and cohesion.
Therefore, the ten refactoring techniques (EM, MM, IPO, RSM, PUF, PUM, PDF, PDM, ESb, EI) are applied to classes with different internal designs; as a result, each can be used in different scenarios, distinguished by characteristics such as method access modifiers, variable access modifiers, data types, method types, method signatures, object references, and class interactions. The number of scenarios for each refactoring technique depends on its mechanics.
The different scenarios for using the ten refactoring techniques, based on their opportunities, have been investigated and analyzed. The findings show that using these refactoring techniques in different scenarios has different effects on quality attributes; in other words, the scenario in which a refactoring technique is used plays a role in producing its different effects on quality attributes. The subsections below discuss the different scenarios for each refactoring technique.

1) EXTRACT METHOD (EM)
There are two types of modifiers in Java: 1) access modifiers and 2) non-access modifiers. Access modifiers, such as public, private, and protected, specify the accessibility or scope of a method. Non-access modifiers, such as static, provide information about a method's behavior to the Java Virtual Machine (JVM). Therefore, there are three basic scenarios (S) for performing the Extract Method: 1) extracting public methods (S1), 2) extracting private/protected methods (S2), and 3) extracting static methods (S3). In addition, the extracted method of any scenario (Sn), where n is 1, 2, or 3, may be coupled (Cp) to other classes through attributes or methods of those classes, which constitutes a further scenario (Sn & Cp) derived from the three basic ones.
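To make the three basic scenarios concrete, the hypothetical classes below extract the same summation loop from a receipt() method in three ways. Only the modifiers of the extracted method differ: public in S1 (QMOOD's CIS counts public methods, so the class interface grows), private in S2, and static in S3. The Sn & Cp scenario would arise if the extracted method also used methods or attributes of another class. All class and method names are illustrative and are not taken from the case studies.

```java
// Hypothetical example: the same loop extracted from receipt() in three ways.

// Before Extract Method: the summation is inlined in receipt().
class ReceiptBefore {
    private final double[] amounts;

    ReceiptBefore(double[] amounts) { this.amounts = amounts; }

    String receipt() {
        double total = 0;
        for (double a : amounts) {
            total += a;
        }
        return "Total: " + total;
    }
}

// S1: the extracted method is public, so it joins the class interface.
class ReceiptS1 {
    private final double[] amounts;

    ReceiptS1(double[] amounts) { this.amounts = amounts; }

    String receipt() { return "Total: " + total(); }

    public double total() {
        double total = 0;
        for (double a : amounts) {
            total += a;
        }
        return total;
    }
}

// S2: the same extraction, but the new method stays private.
class ReceiptS2 {
    private final double[] amounts;

    ReceiptS2(double[] amounts) { this.amounts = amounts; }

    String receipt() { return "Total: " + total(); }

    private double total() {
        double total = 0;
        for (double a : amounts) {
            total += a;
        }
        return total;
    }
}

// S3: the extracted method is static and receives its data as a parameter.
class ReceiptS3 {
    private final double[] amounts;

    ReceiptS3(double[] amounts) { this.amounts = amounts; }

    String receipt() { return "Total: " + total(amounts); }

    static double total(double[] amounts) {
        double total = 0;
        for (double a : amounts) {
            total += a;
        }
        return total;
    }
}
```

All four versions return "Total: 4.0" for the amounts {1.5, 2.5}, which is the kind of output equivalence the behavior-preservation check relies on.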
All scenarios of EM were investigated whenever opportunities to apply EM were found in the case studies. The findings show that different scenarios have different effects on messaging (CIS), cohesion (CAM), coupling (DCC), complexity (NOM), reusability, flexibility, extendibility, functionality, understandability, and TQI. Table 11 shows the effect of EM in the four scenarios, where the symbol (↑) denotes a quality improvement (except for DCC and NOM), the symbol (↓) denotes a quality reduction (except for DCC and NOM), the symbol (−) denotes no change in quality, and Total Applied denotes the total number of times EM was used in each scenario across the five case studies.
The best and most common scenario for using EM is S1 (extracting public methods), in which EM improves the overall quality of the system as measured by TQI. S3 (extracting static methods) affects neither TQI nor the other quality attributes. The worst scenario is Sn & Cp, in which the extracted method is coupled to other classes; here the effect on CIS depends on the type of the extracted method (public, private, or static). The second worst scenario is S2, in which TQI is impaired. We recommend that software developers use EM only in the S1 scenario and avoid using it in the other scenarios.

2) MOVE METHOD (MM)
Fowler et al. [17] and Fowler and Beck [18] describe several scenarios in the mechanics of the Move Method, and the available opportunities determine the appropriate scenario for using MM. The scenarios are as follows: 1) Scenario 1 (S1): moving the target method from the source class to the target class; 2) Scenario 2 (S2): moving the target method and modifying it to work in its new home, which requires some private methods to be made public in the new home; 3) Scenario 3 (S3): when the target method has many references, leaving it as a delegating method in the source class; and 4) Scenario 4 (S4): passing a reference to the source object to the target method as a parameter when the method uses features of its source class. S4 always increases coupling, which in turn impairs the TQI. The effect of MM in each scenario has been identified, and the findings show that different scenarios for using MM have different effects on quality attributes. Table 12 shows the effect of MM in the four scenarios.
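The following hypothetical Java sketch shows where the scenarios differ. A charge calculation is moved from a source class (Account) to a target class (AccountType). The Account method is kept as a delegating method, as in S3 (in S1 it would be deleted and its callers redirected), and an extra method shows the S4 variant, in which the source object is passed as a parameter so that the target class acquires a dependency on the source class. The classes and charge rules are illustrative only.

```java
// Target class after Move Method (hypothetical example).
class AccountType {
    private final boolean premium;

    AccountType(boolean premium) { this.premium = premium; }

    // The moved method. In S2, any private helpers it relied on in its old
    // home would additionally have to be made accessible here.
    double overdraftCharge(int daysOverdrawn) {
        if (premium) {
            double result = 10;
            if (daysOverdrawn > 7) {
                result += (daysOverdrawn - 7) * 0.85;
            }
            return result;
        }
        return daysOverdrawn * 1.75;
    }

    // S4: the method needs features of its source, so the source object is
    // passed in; AccountType now depends on Account (coupling increases).
    double overdraftChargeFor(Account source) {
        return overdraftCharge(source.daysOverdrawn());
    }
}

// Source class.
class Account {
    private final AccountType type;
    private final int daysOverdrawn;

    Account(AccountType type, int daysOverdrawn) {
        this.type = type;
        this.daysOverdrawn = daysOverdrawn;
    }

    int daysOverdrawn() { return daysOverdrawn; }

    // S3: the method had many callers, so a delegating method is left behind.
    double overdraftCharge() {
        return type.overdraftCharge(daysOverdrawn);
    }
}
```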
We found that S1 and S4 deteriorate all internal and external quality attributes, reducing overall software system quality (TQI). The notation Sn in S4 indicates that the effect of MM on CIS, CAM, NOM, reusability, and functionality in this scenario follows its effect in S1, S2, or S3, depending on what happens to the target method (removed, removed and made public, or left as a delegating method). Using MM in S3 improves messaging (CIS), cohesion (CAM), reusability, functionality, and overall quality (TQI), but it increases complexity (NOM) and reduces understandability. In S2, MM improves all internal and external quality attributes as well as overall quality (TQI), which makes S2 the best scenario. As a result, we recommend that software developers limit their use of MM to the S2 scenario and avoid using it in the other scenarios.

3) INTRODUCE PARAMETER OBJECT (IPO)
IPO replaces a group of parameters with an object. These parameters occur in one of two scenarios (S): 1) scenario 1 (S1): the parameters occur in several methods within one class; and 2) scenario 2 (S2): the parameters occur in several methods distributed across several classes. The effect of IPO was identified in both scenarios, and the results show that using IPO in the two scenarios has different effects on TQI. Table 13 depicts the effects of IPO in the two scenarios.
In S1, coupling (DCC) increases only by one, because IPO creates a single data clump class that is coupled to only one class; therefore, flexibility is not affected and the TQI improves. In S2, the data clump class created by IPO is coupled to several classes, so coupling increases substantially, which in turn reduces the reusability, flexibility, and TQI of the system. Consequently, we recommend that software developers use IPO only in the S1 scenario and avoid using it in the S2 scenario.
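A minimal hypothetical Java sketch of the two scenarios follows. A recurring (low, high) parameter pair is replaced by a new parameter object, ValueRange. In S1 the pair occurred only in methods of one class, so only that class becomes coupled to ValueRange; the AlarmPolicy class stands in for S2, where every other class that carried the same pair also becomes coupled to the new class. All names are illustrative.

```java
// Hypothetical parameter object created by IPO for the (low, high) data clump.
final class ValueRange {
    final int low;
    final int high;

    ValueRange(int low, int high) {
        this.low = low;
        this.high = high;
    }

    boolean includes(int value) {
        return value >= low && value <= high;
    }
}

// Before IPO: the (low, high) pair is repeated in several methods of one class.
class ReadingsBefore {
    private final int[] values;

    ReadingsBefore(int[] values) { this.values = values; }

    int countBetween(int low, int high) {
        int count = 0;
        for (int v : values) {
            if (v >= low && v <= high) count++;
        }
        return count;
    }

    int sumBetween(int low, int high) {
        int sum = 0;
        for (int v : values) {
            if (v >= low && v <= high) sum += v;
        }
        return sum;
    }
}

// After IPO in S1: the clump occurred only here, so ValueRange is coupled to
// this one class.
class Readings {
    private final int[] values;

    Readings(int[] values) { this.values = values; }

    int countIn(ValueRange range) {
        int count = 0;
        for (int v : values) {
            if (range.includes(v)) count++;
        }
        return count;
    }

    int sumIn(ValueRange range) {
        int sum = 0;
        for (int v : values) {
            if (range.includes(v)) sum += v;
        }
        return sum;
    }
}

// S2 illustration: another class that carried the same pair now also depends
// on ValueRange, adding to coupling.
class AlarmPolicy {
    boolean isNormal(int reading, ValueRange range) {
        return range.includes(reading);
    }
}
```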

4) REMOVE SETTING METHOD (RSM)
Two scenarios for removing a setting method have been identified. In the first scenario (S1), the setting method is private or protected; in the second scenario (S2), it is public. The effect of RSM on quality attributes was identified in both scenarios, and the findings show that the two scenarios have different effects. Table 14 shows the effect of RSM on quality attributes in each scenario. In S1, removing the private/protected setting method leaves messaging (CIS) unchanged, increases cohesion (CAM), and reduces complexity (NOM); as a result, RSM improves the reusability, functionality, understandability, and TQI of the system. In S2, removing the public setting method reduces CIS and CAM, so RSM impairs reusability, functionality, and TQI. We therefore recommend that software developers use RSM only in the S1 scenario and avoid using it in the S2 scenario.
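The hypothetical classes below show the two starting points and the common result. In S1 the setter being removed is private, so the public interface (counted by CIS) is unchanged; in S2 it is public, so removing it shrinks the interface. In both cases, Remove Setting Method leaves a field that is assigned once in the constructor and can be declared final. The names are illustrative.

```java
// S1 before: a setter exists but is private, so it is not part of the interface.
class CustomerS1Before {
    private String id;

    CustomerS1Before(String id) { setId(id); }

    private void setId(String id) { this.id = id; }

    String getId() { return id; }
}

// S2 before: the same setter is public, so removing it reduces CIS.
class CustomerS2Before {
    private String id;

    CustomerS2Before(String id) { setId(id); }

    public void setId(String id) { this.id = id; }

    String getId() { return id; }
}

// After RSM (either scenario): the field is assigned once and made final.
class Customer {
    private final String id;

    Customer(String id) { this.id = id; }

    String getId() { return id; }
}
```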

5) PULL UP FIELD (PUF)
PUF moves a common attribute from the subclasses to the superclass. Attributes are of two kinds: Java data types (integer, float, string, double) or user-defined class types. Therefore, there are two scenarios for using PUF: 1) scenario 1 (S1): pulling up attributes of Java data types; and 2) scenario 2 (S2): pulling up attributes of user-defined class types. The effect of PUF on the quality attributes was identified in both scenarios, and the findings indicate that PUF has a different effect in each. Table 15 shows the effect of PUF in both scenarios. In S2, DCC increases because setter and getter methods are provided for the object attributes, which weakens extendibility. Composition (MOA) is also reduced in S2, because the common object attributes in the subclasses are merged into a single object attribute in the superclass; consequently, flexibility and effectiveness are impaired. The first scenario (S1) is the more common one, because most attributes defined in a system are of Java data types. In S1, messaging (CIS) increases while complexity (NOM) decreases, improving reusability, functionality, and overall quality (TQI). As a result, software developers are recommended to use PUF only in the S1 scenario and avoid using it in the S2 scenario.
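In the hypothetical hierarchy below, two subclasses each declare a String field (a Java data type, S1) and a field of the user-defined type Office (S2). After Pull Up Field, both are declared once in the superclass; for the Office field this also turns two composition declarations, which QMOOD's MOA counts, into one. The names are illustrative and the getters and setters mentioned in the text are omitted.

```java
// Hypothetical user-defined type that appears as an attribute.
class Office {
    final String city;

    Office(String city) { this.city = city; }
}

// Before PUF: both subclasses declare the same two fields.
abstract class StaffBefore { }

class SalesmanBefore extends StaffBefore {
    protected String name;   // Java data type
    protected Office office; // user-defined class attribute
}

class EngineerBefore extends StaffBefore {
    protected String name;
    protected Office office;
}

// After PUF: each common field is declared once, in the superclass.
abstract class Staff {
    protected String name;   // S1: a Java data type pulled up
    protected Office office; // S2: an object attribute pulled up; the two
                             // Office declarations (counted by MOA) become one

    Staff(String name, Office office) {
        this.name = name;
        this.office = office;
    }
}

class Salesman extends Staff {
    Salesman(String name, Office office) { super(name, office); }
}

class Engineer extends Staff {
    Engineer(String name, Office office) { super(name, office); }
}
```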

6) PULL UP METHOD (PUM)
The PUM moves a common method in subclasses and places it in a superclass. Common methods have either the same signatures and bodies or the same signatures and different bodies. Therefore, there are two scenarios for using the PUM, which are 1) scenario 1 (S1): pull up the methods with the same signatures and bodies, and 2) scenario 2 (S2): pull up the methods with the same signatures and different bodies. The effect of the PUM has been identified through different scenarios. Findings show that using the PUM in different scenarios produces different effects on the quality attributes. Table 16 shows the effect of the use of PUM in both scenarios.
Polymorphism (NOP) increases in S2 because the common method in subclasses becomes an abstract method in the superclass and has different implementations in the subclasses. Consequently, flexibility, effectiveness, extendibility, and the overall quality (TQI) are improved. The S2 scenario is the best. As a result, software developers should only use the PUM in the S2 scenario and avoid using it in the S1 scenario.
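Both PUM scenarios can be seen in one small hypothetical hierarchy. describe() had identical signatures and bodies in the subclasses, so its body is pulled up once (S1); area() had the same signature but different bodies, so only an abstract declaration is pulled up and each subclass keeps its own override (S2), which is the source of the polymorphism (NOP) increase described above. Class names are illustrative.

```java
// After Pull Up Method, for hypothetical shape classes.
abstract class Shape {
    private final String label;

    Shape(String label) { this.label = label; }

    // S1: the subclasses had identical copies of this method (same signature
    // and body), so the body now lives once in the superclass.
    String describe() {
        return label + " with area " + area();
    }

    // S2: the subclasses had the same signature but different bodies, so only
    // the declaration is pulled up, as an abstract method each subclass overrides.
    abstract double area();
}

class Rectangle extends Shape {
    private final double width;
    private final double height;

    Rectangle(double width, double height) {
        super("rectangle");
        this.width = width;
        this.height = height;
    }

    @Override
    double area() { return width * height; }
}

class Square extends Shape {
    private final double side;

    Square(double side) {
        super("square");
        this.side = side;
    }

    @Override
    double area() { return side * side; }
}
```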

7) PUSH DOWN METHOD (PDM)
PDM moves the target method from the superclass to the related subclasses. The target method is either an abstract method implemented only in one or some subclasses, or a normal method used only in one or some subclasses. Therefore, there are two scenarios for using PDM: 1) scenario 1 (S1): pushing the abstract method down so that it is implemented only by the related subclass(es); and 2) scenario 2 (S2): pushing the normal public method down into the related subclass(es). Both scenarios were investigated, and the findings show that using PDM in different scenarios has different effects on quality attributes. Table 17 shows the effect of PDM in both scenarios. In S1, polymorphism (NOP) decreases when the abstract method is removed from the superclass, because the target method is then implemented only by the related subclasses; CIS and NOM decrease as well. In S2, the normal method used by only one or some subclasses is pushed down to those subclasses, which increases CIS and CAM; consequently, reusability, functionality, and the TQI improve. The best scenario is therefore S2, and software developers should use PDM only in the S2 scenario and avoid using it in S1.
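The hypothetical hierarchy below illustrates both PDM scenarios in one place. Before the refactoring, the superclass declares an abstract quota() that only one subclass meaningfully implements (S1) and a normal public commission() that only that subclass uses (S2). After Push Down Method, both live only in the subclass that needs them. The names and the commission rate are illustrative.

```java
// Before PDM: the superclass carries methods only one subclass really uses.
abstract class WorkerBefore {
    abstract double quota();                 // S1 candidate: abstract method

    public double commission(double sales) { // S2 candidate: normal public method
        return sales * 0.05;
    }
}

class SalesPersonBefore extends WorkerBefore {
    @Override
    double quota() { return 1000; }
}

class TechnicianBefore extends WorkerBefore {
    @Override
    double quota() { return 0; } // placeholder forced by the abstract declaration
}

// After PDM in both scenarios.
abstract class Worker { }

class SalesPerson extends Worker {
    // S1: no longer overrides a superclass declaration, so it is no longer
    // polymorphic (NOP decreases).
    double quota() { return 1000; }

    // S2: the normal method now lives only where it is used.
    public double commission(double sales) { return sales * 0.05; }
}

class Technician extends Worker { }
```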

8) PUSH DOWN FIELD (PDF)
PDF moves a field from a superclass into the related subclass(es), and its effect depends on the internal design of the hierarchy. Two scenarios have been identified: 1) scenario 1 (S1): all the fields in the superclass and its subclasses have the same kind of access (either all private/protected or all public), and the superclass has more than one field; and 2) scenario 2 (S2): the superclass has only one private/protected field, regardless of the types and number of fields in its subclasses. Table 18 shows the effect of PDF on quality attributes in the two scenarios. DAM is calculated by dividing the number of private and protected fields by the total number of fields in the class, and the effect on encapsulation (DAM) is directly reflected in flexibility, effectiveness, understandability, and TQI. S1 is the most common scenario, whereas S2 is rarely found. Using PDF in S1 does not change the quality of the software system, so PDF can be used in S1 when necessary; software developers should not use PDF in the S2 scenario.
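The following worked arithmetic, written as a small hypothetical Java helper, shows why the two scenarios behave differently under the DAM definition above. Each class of a two-class hierarchy is described by a {hidden fields, total fields} pair. Two conventions are assumptions of this sketch, not of the study: system-level DAM is taken as the mean of the class values, and a class left with no fields is assigned a DAM of 0. In S2 the superclass always ends up with no fields, so the size, and even the direction, of the change depend on how the metrics tool handles that case.

```java
// Hypothetical worked example of DAM before and after Push Down Field.
final class DamExample {
    // DAM of one class: private/protected fields divided by all fields.
    // ASSUMPTION: a class with no fields is given DAM 0 in this sketch.
    static double classDam(int hiddenFields, int totalFields) {
        return totalFields == 0 ? 0.0 : (double) hiddenFields / totalFields;
    }

    // ASSUMPTION: system DAM is the mean of the per-class values.
    // Each row is {hiddenFields, totalFields} for one class.
    static double systemDam(int[][] classes) {
        double sum = 0;
        for (int[] c : classes) {
            sum += classDam(c[0], c[1]);
        }
        return sum / classes.length;
    }
}
```

In S1, moving one private field from a superclass with two private fields to a subclass with one gives a system DAM of 1.0 both before ({{2, 2}, {1, 1}}) and after ({{1, 1}, {2, 2}}). In S2, pushing the superclass's only private field into a subclass with two public fields changes the value from 0.5 ({{1, 1}, {0, 2}}) to about 0.167 ({{0, 0}, {1, 3}}) under the stated conventions.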

9) EXTRACT SUBCLASS (ESb)
ESb is used when a class has features (attributes and methods) that are used only in some instances. The target class has one of two design cases: a simple design or a complex design. In the Simple Design Case (SDC), the class has few features and is not linked to other classes; in the Complex Design Case (CDC), the class has many features and is strongly linked to other classes. Accordingly, there are two scenarios for using ESb: 1) scenario 1 (S1) for the SDC and 2) scenario 2 (S2) for the CDC. The effect of ESb on the quality attributes was identified in S1 and S2, and the findings are summarized in Table 19.
ESb improves the TQI of the system in both design cases (S1 and S2). In S1, the subclass is extracted independently from the target class, so messaging (CIS), composition (MOA), polymorphism (NOP), coupling (DCC), and complexity (NOM) are not affected. In S2, by contrast, the moved methods may be used in a different form, may be linked to other class(es), or may require a delegating method to be left behind, which increases CIS, NOM, MOA, and DCC. S2 is the best and most common scenario. As a result, software developers are recommended to use ESb in both scenarios, with a preference for S2.
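A minimal hypothetical Java sketch of the simple design case (S1) follows: a lab-only field and its accessor, unused by most Course instances, move into a new subclass, and no other class is involved. In S2 (the complex design case), the moved features would also reference other classes or leave a delegating method behind in the original class, which is where the increases in CIS, NOM, MOA, and DCC come from; that variant is not shown. The names are illustrative.

```java
// Before ESb: lab-only features live in Course and are unused by most instances.
class CourseBefore {
    private final String title;
    private final String labRoom; // null for courses without a lab

    CourseBefore(String title, String labRoom) {
        this.title = title;
        this.labRoom = labRoom;
    }

    String title() { return title; }

    String labRoom() { return labRoom; }
}

// After ESb in the simple design case (S1): the lab-only field and method move
// to a subclass that is extracted independently of other classes.
class Course {
    private final String title;

    Course(String title) { this.title = title; }

    String title() { return title; }
}

class LabCourse extends Course {
    private final String labRoom;

    LabCourse(String title, String labRoom) {
        super(title);
        this.labRoom = labRoom;
    }

    String labRoom() { return labRoom; }
}
```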

10) EXTRACT INTERFACE (EI)
EI is used when one or more classes have part of their methods in common. There are two scenarios for EI, based on the design of the common methods: 1) scenario 1 (S1): the common methods are not coupled to other classes; and 2) scenario 2 (S2): the common methods are coupled to other class(es). The effect of EI was identified in both scenarios, and the findings are shown in Table 20. EI improves the TQI of the system in both scenarios. In S1, EI does not affect coupling (DCC), because the common methods placed in the extracted interface are not coupled to other classes through attributes or method parameters; consequently, flexibility is not affected. In S2, EI increases coupling because one or more of the methods are coupled to other class(es) through attributes or method parameters, and flexibility is therefore weakened. The best scenario is thus S1, and software developers should limit their use of EI to the S1 scenario and avoid it in S2.
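The hypothetical sketch below contrasts the two EI scenarios side by side. Two classes share some methods; the common methods that use only Java types are extracted into Billable (S1), which is coupled to no other class, while a common method taking a Timesheet parameter is extracted into Invoicing (S2), which makes the interface coupled to Timesheet. Splitting them into two interfaces is done here only to separate the scenarios; all names are illustrative.

```java
// Hypothetical user-defined type taken as a parameter by one common method.
class Timesheet {
    final int hours;

    Timesheet(int hours) { this.hours = hours; }
}

// S1: the common methods use only Java types, so the extracted interface is
// not coupled to any other class.
interface Billable {
    double hourlyRate();

    boolean hasSpecialSkill();
}

// S2: a common method takes a Timesheet parameter, so the extracted interface
// is coupled to Timesheet.
interface Invoicing {
    double amountDue(Timesheet sheet);
}

class Employee implements Billable, Invoicing {
    private final double rate;

    Employee(double rate) { this.rate = rate; }

    @Override public double hourlyRate() { return rate; }

    @Override public boolean hasSpecialSkill() { return false; }

    @Override public double amountDue(Timesheet sheet) { return rate * sheet.hours; }
}

class Contractor implements Billable, Invoicing {
    private static final double CALL_OUT_FEE = 50.0;
    private final double rate;

    Contractor(double rate) { this.rate = rate; }

    @Override public double hourlyRate() { return rate; }

    @Override public boolean hasSpecialSkill() { return true; }

    @Override public double amountDue(Timesheet sheet) {
        return rate * sheet.hours + CALL_OUT_FEE;
    }
}
```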
Based on the multi-case analysis, Table 21 summarizes the best scenario for using each refactoring technique to improve the quality of software systems.
To improve the quality of software systems, software developers are recommended to use the ten refactoring techniques in accordance with their best scenarios.

V. THREATS TO VALIDITY
Threats to construct validity concern the selection of the refactoring techniques and of the quality measurement model (QMOOD). The ten refactoring techniques most used in the literature and in practice were selected to avoid a biased and subjective selection. This study also used a validated quality model (QMOOD) to measure the effect of refactoring techniques on the total quality index and on the external and internal quality attributes. Threats to conclusion validity concern the relation between the treatment and the outcome. In this study, 46 independent experiments were conducted across seven case studies of different sizes, which we consider sufficient to support the conclusions drawn.
Internal validity is the extent to which a study establishes a trustworthy cause-and-effect relationship between a treatment and an outcome. The case studies sampled for analysis have mostly been used in previous software refactoring research, and no treatment other than refactoring was applied to them, so that only the effect of the refactoring techniques was observed. The experiments started with the small case studies and then moved to the larger ones, allowing the researcher to gain experience as the experiments proceeded.
External validity refers to the generalizability of the findings. To increase the external validity of this study, experiments were carried out on various open-source and academic case studies from different application domains and of different sizes. This study focuses on refactoring techniques in Java projects, as these refactoring techniques were mainly proposed for Java systems. However, it cannot be asserted that the results generalize to other programming languages, where the refactoring techniques and tool support may differ.

VI. CONCLUSION AND FUTURE WORK
This paper presents an experimental study that investigates and thoroughly analyzes the factors that cause refactoring techniques to have different effects on quality attributes. The factor of scenarios of using refactoring techniques has been identified, investigated, and analyzed, with ten refactoring techniques applied individually in seven case studies.
The factor of scenarios of using refactoring techniques plays an important role in producing the different effects of the ten refactoring techniques (Extract Method, Move Method, Introduce Parameter Object, Remove Setting Method, Pull Up Field, Pull Up Method, Push Down Field, Push Down Method, Extract Subclass, Extract Interface). Each of these refactoring techniques has different scenarios of use, and each scenario has a different effect (positive, negative, or no change) on quality attributes. Based on the multi-case analysis, we identified the best scenario for using each of the ten refactoring techniques; software developers should therefore apply each refactoring technique in its best scenario to improve the quality of software systems.
These findings can help software practitioners understand how to use refactoring techniques to improve software quality while taking this factor into account. The findings can also serve as guidelines for software developers to improve the quality of software systems by applying refactoring techniques in their best scenarios. As future work, we intend to investigate the relationship between this factor and other common refactoring techniques, such as Add Parameter, Encapsulate Field, Extract Class, Extract Superclass, Inline Class, Inline Method, Move Field, Remove Parameter, and Hide Method. Other factors, such as refactoring tools, software system size, developers' programming skills, and quality measurement models, can also be investigated in future empirical studies.