Automatizing Software Cognitive Complexity Reduction

Software plays a central role in our lives nowadays. We use it almost anywhere, at any time, and for everything: to browse the Internet, to check our emails, and even to access critical services such as health monitoring and banking. Hence, its reliability and general quality are critical. As software increases in complexity, developers spend more time fixing bugs or making code work than designing or writing new code. Thus, improving software understandability and maintainability would translate into an economic relief in the total cost of a project. Different cognitive complexity measures have been proposed to quantify the understandability of a piece of code and, therefore, its maintainability. However, the cognitive complexity metric provided by SonarSource and integrated into SonarCloud and SonarQube is quickly spreading in the software industry due to the popularity of these well-known static code analysis tools for evaluating software quality. Although SonarQube suggests keeping a method's cognitive complexity no greater than 15, reducing a method's complexity is challenging for a human programmer, and there are no approaches to assist developers in this task. We model the cognitive complexity reduction of a method as an optimization problem whose search space contains all sequences of Extract Method refactoring opportunities. We then propose a novel approach that searches for feasible code extractions and allows developers to apply them, all in an automated way. This lets software developers make informed decisions while reducing the complexity of their code. We evaluated our approach on 10 open-source software projects, and it was able to fix 78% of the 1,050 existing cognitive complexity issues reported by SonarQube. We finally discuss the limitations of the proposed approach and provide interesting findings and guidelines for developers.


I. INTRODUCTION
MOST of the cost during software development is due to maintenance [1], [2]. In complex software systems, the time spent on validation can even exceed the development time. Previous studies showed that debugging errors can account for up to 50% of the total cost of software projects [3]. This is due to the fact that software maintenance tasks are usually performed by hand instead of using automated approaches.
Software metrics provide a quantitative basis for the development and validation of models of the software development process. Information gained from metrics can be used to manage the development process in order to improve the reliability and quality of software products [4]. Cognitive informatics plays an important role in understanding the fundamental characteristics of software, and cognitive complexity metrics are a good indicator of this [5]. A number of such measures have been proposed in the literature. However, there is no single metric which has the capability of measuring the complexity of a program based on multiple object-oriented concepts [6]. One of the most popular software metrics is Cyclomatic Complexity, proposed by Thomas McCabe in 1976 [7]. This metric quantifies the control flow complexity of a piece of code, and it has been extensively used in Object Oriented Programming (OOP) to compute the minimum number of test cases needed to cover a method. However, this metric is not adequate to quantify the understandability and maintainability of a piece of code and, therefore, its cognitive complexity. Recently, a novel cognitive complexity metric has been proposed and integrated in the well-known static code analysis tools SonarCloud and SonarQube, a code analysis service and an open-source platform, respectively, for continuous inspection of code quality, which are extensively used by developers and software factories today. This cognitive complexity metric, which we refer to as SonarSource Cognitive Complexity (SSCC), has been defined as a measure of how hard the control flow of a method is to understand and maintain [8]. It breaks from the practice of using mathematical models to assess software maintainability. It starts from the precedents set by Cyclomatic Complexity, but uses human judgment to assess how structures should be counted and to decide what should be added to the model as a whole.
The SSCC is given by a non-negative number that is increased every time a control flow statement appears. Nesting levels also contribute to the SSCC of a method. Note that SonarQube suggests keeping methods' cognitive complexity no greater than 15, although this threshold can be set by the user to a different value.
As an introductory example, let us compare the sumOfPrimes and getWords methods, both shown in Fig. 1. Although they have equal Cyclomatic Complexity (4), it is intuitively obvious that the control flow of sumOfPrimes is more difficult to understand than that of getWords, mainly due to the nested loop and the continue statement. Thus, the cognitive effort required by developers to understand and maintain both pieces of code is not the same, sumOfPrimes (SSCC=7) being much harder than getWords (SSCC=1).

FIGURE 1: Two methods with equal Cyclomatic Complexity (4) but different SSCC.
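For reference, the two methods can be reconstructed as follows. This is a sketch based on SonarSource's published example [8]; the SSCC annotations in the comments follow that white paper's counting rules, while the wrapper class `ComplexityExample` and the `main` driver are our additions:

```java
public class ComplexityExample {

    // Cyclomatic Complexity 4, SSCC 7: the nested loop and the
    // labeled continue penalize the deeper control flow.
    public static int sumOfPrimes(int max) {
        int total = 0;
        OUT:
        for (int i = 1; i <= max; ++i) {      // +1
            for (int j = 2; j < i; ++j) {     // +2 (nesting = 1)
                if (i % j == 0) {             // +3 (nesting = 2)
                    continue OUT;             // +1
                }
            }
            total += i;
        }
        return total;
    }

    // Cyclomatic Complexity 4, SSCC 1: a flat switch reads easily,
    // so the whole structure counts only +1.
    public static String getWords(int number) {
        switch (number) {                     // +1
            case 1:  return "one";
            case 2:  return "two";
            case 3:  return "three";
            default: return "> three";
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfPrimes(10)); // 18 (1 + 2 + 3 + 5 + 7)
        System.out.println(getWords(2));     // two
    }
}
```

Note that both methods have four linearly independent paths, yet only sumOfPrimes accumulates nesting penalties, which is exactly the distinction SSCC is designed to capture.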
Therefore, the cognitive complexity metric integrated in SonarQube yields method complexity scores that strike programmers as fairer relative assessments of understandability than were available with previous models [8]. This assessment of understandability is also valid at the class level, obtained by aggregating methods' cognitive complexity. Despite the fact that SSCC correlates with source code understandability in a meaningful way [9], software developers lack support to reduce the cognitive complexity of their code to a given threshold.
There are different refactoring operations to handle different tasks. However, Extract Method is the most versatile refactoring operation, serving 11 different purposes [10]. In addition, the identification of Extract Method refactoring opportunities for the decomposition of methods can be performed in an automatic way [11]. Due to its many uses [12], Extract Method has been recognized as the "Swiss army knife of refactorings" [10], [13]. It has also recently been used to reduce the complexity of code [14], [15], as we do in this paper. The main differences with existing approaches are the following: they performed a more limited experimental validation, they do not impose any threshold on the cognitive complexity of the methods, and, most importantly, they are not able to generate sequences of feasible extractions. For example, in the project Knowage-Core, one of the projects analyzed as part of the case study in this paper, there are more than 100 methods that require a sequence of code extractions to reduce their SSCC. However, previous approaches are not able to reduce the SSCC of those methods in a single execution.
We model the reduction of the SSCC to a given threshold as an optimization problem. The search space contains all feasible sequences of Extract Method refactoring opportunities. An optimal solution is one which reduces the SSCC to the chosen threshold while minimizing the number of method extractions. Note that the new extracted methods must be below the threshold too. We here propose an approach to reduce the SSCC of software projects in an automated way. We finally implement the proposed approach as a software tool for Java code and apply it to 10 open-source software projects to reduce their SSCC. The developed tool will be available as an open-source project in public repositories, as part of our replication package. To the best of our knowledge, the SSCC metric has not been properly validated as a cognitive complexity measure. We therefore additionally perform a theoretical validation of this metric, which we include as an appendix of this paper.
We thus make the following contributions:
• Modeling the SSCC reduction to a given threshold as an optimization problem.
• Providing a software tool to reduce the SSCC of Java projects in an automated way.
• Validating the proposed approach over 10 real world open-source applications.
• Defining best practices to improve software readability and maintainability while benefiting the SSCC reduction task.
• Performing a theoretical validation of the SSCC metric.
The remainder of this paper is organized as follows. Section II discusses related work. Section III formulates the SSCC reduction as an optimization problem. Section IV introduces our approach for reducing the SSCC to a given threshold. In Section V we present the case study and summarize the experimental setting for evaluating our proposal. Section VI provides the results of our experiments. Section VII discusses the limitations of our approach, reports interesting findings, and identifies open research gaps. Section VIII discusses the threats to the validity of our work. Finally, Section IX presents conclusions and future work.

II. RELATED WORK
Probably the oldest and most intuitively obvious notion of software complexity is the number of statements in a program, or the statement count, although a large number of software complexity measures have been proposed since. In the 70s, the number of program statements, McCabe's cyclomatic number [7], Halstead's programming effort [16], and the Knot measure [17] were the most frequently cited measures. In the 90s, Douce et al. introduced a set of metrics that help in calculating the complexity of a given system or program code based on object-oriented concepts such as objects and classes [18]. All those metrics were based on spatial abilities, measuring complexity by calculating the distances between program elements in the code.
In 2003, Shao and Wang proposed cognitive complexity as a new measure of the cognitive and psychological complexity of software by examining the cognitive weights of basic control structures (BCS) of software. Based on this approach, a new concept of Cognitive Functional Size (CFS) of software was developed [19]. Cognitive weights are the degree of difficulty, or the relative time and effort, required to comprehend a given piece of software. In 2006, Misra et al. proposed a modification of the CFS measure taking into account the total occurrence of operators and operands and all internal BCSs [20]. The same year, Misra proposed the Cognitive Weight Complexity Measure (CWCM), which is also based on cognitive weights [21]. Then, Kushwaha and Misra framed different cognitive complexity metrics with the goal of aiding in increasing the reliability of software products during the development lifecycle [4].
In 2007, Misra S. and Misra A. K. compared cognitive complexity measures in terms of nine properties [22]. Then, Misra proposed an improved cognitive complexity measure named the Cognitive Program Complexity Measure (CPCM), which establishes a relation between the total number of inputs and outputs, cognitive weights, and cognitive complexity [23]. The same year, Misra proposed an object-oriented complexity metric that calculates the complexity of a class at the method level [24]. Later, in 2008, Misra et al. proposed a metric that considers internal attributes which directly affect the complexity of software: number of lines, total occurrence of operators and operands, number of control structures, and function calls (coupling) [25]. The same year, Misra and Akman proposed a new complexity metric based on cognitive informatics for object-oriented code, covering the cognitive complexity of the system, method complexity, and complexity due to inheritance together [26].
A few years later, in 2011, Misra et al. proposed a cognitive complexity metric for evaluating the design of object-oriented code. The proposed metric is based on the inheritance feature of object-oriented systems. It calculates complexity at the method level considering the internal structure of methods, and also considers inheritance to calculate the complexity of class hierarchies [27]. In 2012, Misra et al. proposed a suite of cognitive metrics for evaluating the complexity of object-oriented code [28]. All the metrics were critically examined through theoretical and empirical validation processes. The same year, Misra et al. also proposed a framework for the evaluation and validation of software complexity measures. This framework is designed to analyse whether or not a software metric qualifies as a measure from different perspectives [29].
In 2016, Haas and Hummel addressed the problem of finding the most appropriate refactoring candidate for long methods written in Java. Their approach determines valid refactoring candidates and ranks them using a scoring function that aims to improve readability and reduce code complexity [14]. Later that year, Wijendra and Hewagamage proposed a cognitive complexity metric which determines the amount of information inside the software through cognitive weights and the way information is scattered in terms of Lines of Code (LOC) [30]. In this paper, the authors also analyzed how the proposed cognitive complexity calculation can be automated. The same year, Crasso et al. presented a software metric to assess cognitive complexity in object-oriented systems developed in the Java language [31]. The proposed metric is based on a characterization of basic control structures present in Java systems. The authors also provided several algorithms to compute the metric and introduced their materialization in the Eclipse IDE. Finally, the applicability of the tool was shown by illustrating the metric in the context of 10 real world Java projects.
In 2017, Rabani and Maheswaran discussed and analyzed classical and modern metrics of software cognitive complexity [5]. The same year, Misra et al. identified the features and advantages of the existing software cognitive complexity metrics [32]. They also performed a comparative analysis based on some selected criteria. The results showed that there is a similar trend in the output obtained from the different measures when they are applied to different examples.
In 2018, Misra et al. presented an updated suite of cognitive complexity metrics that can be used to evaluate object-oriented software projects [33]. The metrics suite was evaluated theoretically using measurement theory and Weyuker's properties, and practically using Kaner's framework [34]. The same year, SonarSource introduced cognitive complexity as a new metric for measuring the understandability of any given piece of code [8]. This paper investigated developers' reaction to the introduction of cognitive complexity in the static code analysis tool service SonarCloud. In an analysis of 22 open-source projects, they assessed whether a development team 'accepted' the proposed metric based on whether they fixed code areas of high cognitive complexity as reported by the tool. They found that the metric had a 77% acceptance rate among developers.
In 2019, Kaur and Mishra conducted an experimental analysis in which the software developer's level of difficulty in comprehending the software (its cognitive complexity) was theoretically computed and empirically evaluated for estimating its relevance to actual software change [35]. This study validated a cognitive complexity metric as a noteworthy measure of version-to-version source code change. Also in 2019, Alqadi proposed novel metrics to compute the cognitive complexity of code slices [36], with an empirical investigation into how cognitive complexity correlates with defects in the version histories of three open-source systems. The results showed that an increase in cognitive complexity significantly increases the number of defects in 93% of the cases. The same year, Hubert proposed an approach to fully automate the Extract Method refactoring task, ranking refactoring opportunities according to a scoring function that takes into account software cognitive complexity [15].
Recently, in 2020, Jayalath and Thelijjagoda proposed a new metric to evaluate the complexity of object-oriented programs based on the influence of previous object-oriented metrics and some disregarded factors in calculating the complexity [6]. The same year, Muñoz Barón et al. conducted a systematic literature search to obtain data sets from studies which measured code understandability and found that cognitive complexity integrated in the well-known static code analysis tool service SonarCloud positively correlates with comprehension time and subjective ratings of understandability [9].
Although existing approaches can indirectly reduce software cognitive complexity, they are not able to automatically reduce methods' cognitive complexity to a given threshold. Our proposal is novel because we (i) model the software cognitive complexity reduction to a given threshold as an optimization problem and (ii) provide a tool that generates and applies a sequence of feasible extractions to reduce the SonarSource Cognitive Complexity of Java projects.

III. PROBLEM DEFINITION AND MOTIVATION
SonarCloud and SonarQube compute the cognitive complexity of a method as the sum of two components that we call the inherent component and the nesting component. The inherent component depends on the presence of certain control flow structures and complex expressions (like decisions combining several conditional expressions). When a control flow structure or complex expression is found, it contributes +1 to the inherent component. The nesting component depends on the depth at which a certain control flow structure sits in the code with respect to the root node (e.g., a method declaration). This depth is its contribution to the nesting component. Let s_i and e_i be the start and end offset (in characters) of the ith sequence of sentences of a method in its source file. We consider that the ith sequence is nested in the jth sequence, denoted i → j, when [s_i, e_i] ⊂ [s_j, e_j]. We say that the ith sequence is in conflict with the jth sequence when neither is nested in the other and their ranges overlap, i.e., [s_i, e_i] ∩ [s_j, e_j] ≠ ∅.

The SSCC at the method level can be reduced to a threshold by applying Extract Method refactorings: extracting sequences of sentences (i.e., lines of code) as new methods in the same class. However, this task is not straightforward for software developers, for the following reasons:
1) The number of different Extract Method refactoring opportunities l is bounded by C(n,2) = n·(n−1)/2, where n is the number of sentences of the method.
2) Two code extractions cannot be applied simultaneously if they are in conflict.
3) Extract Method refactoring opportunities are not applicable when they introduce compilation errors or when the SSCC of the extracted code cannot be reduced to the threshold.
4) More than one Extract Method refactoring could be required to reduce the SSCC of a method.
Based on the above, we define the method cognitive complexity reduction task as an optimization problem which asks "What is the optimal sequence of Extract Method refactorings to apply in order to reduce the SSCC of the original method to or below a given threshold?". Thus, a solution to this problem is a sequence of code extractions, and the number of such sequences is bounded by 2^l (all possible combinations of Extract Method refactorings). Fig. 2 shows a running example to illustrate the difficulties developers face when reducing the SSCC of a method. Note that some code has been replaced by "..." due to space limitations, but the whole code is accessible online. This method has SSCC 46, and SonarQube suggests reducing it to 15 in order to improve its understandability and maintainability. As shown, there are 37 statements, so the upper bound of Extract Method refactoring opportunities is C(37,2) = 666. However, we have checked computationally that there are only 28 applicable code extractions. After analyzing the method, a developer facing this cognitive complexity reduction task could realize that the optimal solution is a sequence of three Extract Method refactorings. Therefore, one would need to evaluate all possible sequences of one, two, and three Extract Method operations, totaling C(28,1) + C(28,2) + C(28,3) = 28 + 378 + 3,276 = 3,682 solutions.

Although this number of solutions is much smaller than the theoretical upper bound of all possible sequences of extractions (2^28 ≈ 268 million solutions), it is still unmanageable for developers without an automated approach.
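These counts are easy to verify programmatically. The sketch below recomputes the bounds for the running example; the `choose` helper and the `ExtractionBounds` class are ours, not part of the proposed tool:

```java
public class ExtractionBounds {

    // Binomial coefficient n-choose-k via the multiplicative formula;
    // the running product stays integral at every step.
    public static long choose(int n, int k) {
        long r = 1;
        for (int i = 1; i <= k; i++) {
            r = r * (n - k + i) / i;
        }
        return r;
    }

    public static void main(String[] args) {
        // Upper bound on extraction opportunities for n = 37 statements.
        System.out.println(choose(37, 2));         // 666

        // Sequences of one, two, or three of the 28 applicable extractions.
        long total = choose(28, 1) + choose(28, 2) + choose(28, 3);
        System.out.println(total);                 // 3682

        // Theoretical bound: all subsets of the 28 applicable extractions.
        System.out.println(1L << 28);              // 268435456
    }
}
```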

IV. COGNITIVE COMPLEXITY REDUCER APPROACH
We propose an SSCC reducer approach consisting of a solver that implements an automatic algorithm taking as input the path to the software project to process and the cognitive complexity threshold (τ). Then, for each method with SSCC greater than τ, it searches for sequences of applicable Extract Method refactoring operations. Finally, it shows the changes to perform on each method and applies them all at once in an automated way.
In order to search for Extract Method refactoring opportunities in a method, our approach first generates its corresponding Abstract Syntax Tree (AST). Second, it parses the AST and annotates different properties in each node (these are later used to compute the SSCC of extracted methods): its contribution to the SSCC of the method, the accumulated value of the inherent component (ι), the accumulated value of the nesting component (ν), the number of elements contributing to the nesting component of the SSCC of the node (µ), and its absolute nesting level (λ). Note that λ is 0 when no nesting exists in the target piece of software. Third, the approach processes the annotated AST to compute the lists of consecutive sentences contributing to the SSCC of the method. This is done to obtain Extract Method refactoring opportunities. Although sentences contributing to the SSCC must be part of code extractions, it is also necessary to consider single statements even if they do not contribute to the value of this metric: the inclusion or exclusion of statements of this kind can determine whether an extraction is feasible. For example, when several arithmetic operations are needed to compute a result, a refactoring that does not include all of them is probably not possible, because only one variable can be returned from the extracted method. Once the approach identifies Extract Method refactoring opportunities, it checks whether the extractions are applicable. This is done with the help of refactoring tools which are able to check pre-conditions and post-conditions, and to apply the corresponding operation on the source code.
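The single-return-value constraint can be seen in a small example. The methods below are hypothetical, written for illustration only: extracting just the marked region of `averageOriginal` is infeasible, because two locals (`sum` and `count`) would have to flow back to the caller, while widening the region so that a single value leaves the new method makes the extraction feasible:

```java
public class ExtractionFeasibility {

    // Original: the marked region writes TWO locals (sum and count),
    // so extracting it alone would require two return values - infeasible in Java.
    public static double averageOriginal(int[] values) {
        int sum = 0;
        int count = 0;
        // ---- candidate region start ----
        for (int v : values) {
            if (v >= 0) {          // ignore negative readings
                sum += v;
                count++;
            }
        }
        // ---- candidate region end ----
        return count == 0 ? 0 : (double) sum / count;
    }

    // Feasible alternative: widen the extraction so only one value flows out.
    public static double averageRefactored(int[] values) {
        return averageOfNonNegatives(values);
    }

    public static double averageOfNonNegatives(int[] values) {
        int sum = 0, count = 0;
        for (int v : values) {
            if (v >= 0) {
                sum += v;
                count++;
            }
        }
        return count == 0 ? 0 : (double) sum / count;
    }

    public static void main(String[] args) {
        int[] readings = {1, 2, 3, -4};
        System.out.println(averageOriginal(readings));   // 2.0
        System.out.println(averageRefactored(readings)); // 2.0
    }
}
```

Both versions behave identically; the difference lies only in which regions a refactoring engine can legally lift into a new method.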

A. COGNITIVE COMPLEXITY REDUCER TOOL IMPLEMENTATION
We propose a Java cognitive complexity reducer tool as an Eclipse application. The goal is to provide the necessary means for generating an Eclipse product that can be run from the operating system command line as a standalone executable, without the need to open Eclipse. This is particularly useful if, for instance, one needs to integrate it into their current development workflow (e.g., using continuous integration). We got this idea from the JDeodorant project (https://github.com/tsantalis/JDeodorant), an Eclipse plug-in that detects design problems in Java software and recommends appropriate refactorings to resolve them.
The developed tool takes as input (i) a SonarQube server URL, (ii) the path to the software project to process, (iii) the cognitive complexity threshold (τ), and (iv) a stopping criterion (a number of Extract Method refactoring evaluations). Then, it runs SonarQube to perform an analysis of the project and obtain all existing cognitive complexity issues. Next, for each method with SSCC greater than τ, it searches for Extract Method refactoring opportunities. In order to do this, the tool first generates and processes the AST associated with the method declaration, as explained in the previous section. Then, it enumerates sequences of applicable Extract Method refactorings while the given stopping criterion is not met. The tool uses the Extract Method refactoring operation provided by the Java Development Tools (JDT) of Eclipse to test the feasibility of code extractions programmatically. Finally, the tool chooses the best sequence of method extractions found during the search: the one that reduces the SSCC to (or below) the threshold while minimizing the number of method extractions. If desired, the tool applies the required code extractions in an automated way using the Extract Method refactoring provided by JDT.
The tool internally generates, for each method under processing, what we name the conflicts graph. A conflicts graph is a directed graph where vertices are applicable extractions and edges represent nested sequences of statements (i.e., if an edge targets j from i, then i → j). The tool labels vertices in the conflicts graph as [s_i, e_i](CC_i, ι_i, ν_i, µ_i, λ_i), where s_i and e_i refer to the start and end offset (in characters in the source file) of the ith extraction. Red edges connect vertices in conflict; note that two vertices in conflict cannot both be selected for extraction in the same sequence. The root of a conflicts graph is a special vertex representing the whole body of the method. Conflicts graphs are used when searching for applicable Extract Method refactorings and to compute the impact of code extractions when reducing the SSCC of a method. Fig. 3 shows the conflicts graph of the running example whose source code is shown in Fig. 2. As shown, there are 28 extractable nodes plus the root node, which is located in the lower left corner. In addition, there are 35 black edges that represent nested sequences of statements and 44 red edges that represent 22 pairs of nodes in conflict.
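Under the offset-based definitions of Section III, the nesting and conflict relations used to build the conflicts graph reduce to simple interval checks. The following is a minimal sketch; the class and method names are ours:

```java
public class ConflictsGraphRelations {

    // Nesting: [s_i, e_i] strictly contained in [s_j, e_j].
    public static boolean nestedIn(int si, int ei, int sj, int ej) {
        return sj <= si && ei <= ej && (sj < si || ei < ej);
    }

    // Conflict: the ranges overlap, but neither is nested in the other.
    public static boolean inConflict(int si, int ei, int sj, int ej) {
        boolean overlap = si <= ej && sj <= ei;
        return overlap && !nestedIn(si, ei, sj, ej) && !nestedIn(sj, ej, si, ei);
    }

    public static void main(String[] args) {
        System.out.println(nestedIn(10, 20, 5, 30));    // true: fully contained
        System.out.println(inConflict(10, 20, 15, 30)); // true: partial overlap
        System.out.println(inConflict(10, 20, 25, 30)); // false: disjoint ranges
    }
}
```

In the tool itself the black edges of the conflicts graph correspond to pairs satisfying `nestedIn`, and the red edges to pairs satisfying `inConflict`.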

V. CASE STUDY
In this section we describe the study we conducted to evaluate the proposed approach when reducing the SSCC of 10 open-source projects. First, we detail the objects of study. Then, we report the experimental setup used to conduct the experiments.

A. OBJECTS OF STUDY
We used the GitHub REST API to retrieve repositories from GitHub satisfying two conditions: being Java applications and using Apache Maven for software project management. We chose Maven because it eases the execution of SonarQube analyses via a regular Maven goal. We ended up selecting a diverse set of 10 open-source projects: two popular frameworks for multi-objective optimization, five platform components to accelerate the development of smart solutions, and three popular open-source projects with more than 10,000 stars and forked more than 900 times. Table 1 shows these projects and some software metric values. In order to ease the replication of the study, for each software project we also show its abbreviated commit hash in GitHub. The simplest open-source project is QueryExecution: it contains 53 methods and six classes, summing up to 1,013 lines of code. Despite the low number of methods in comparison to the other open-source projects, 6 of the 53 methods (11%) of QueryExecution have SSCC greater than 15 (the default threshold). Although this project looks simple and, therefore, easy to maintain, reducing the SSCC of these six methods is not straightforward. For instance, for the method getDBIds, SonarQube suggests reducing its SSCC from 41 to 15; however, there are several refactoring opportunities that could be applied to get this done. Conversely, Knowage-Core is the most complex project in our case study: it contains 6,967 methods and 1,093 classes, summing up to 149,137 lines of code. Even for a senior developer, maintaining this ecosystem is complicated and error-prone. SonarQube reports 558 cognitive complexity issues for this project, i.e., 8% of the methods in the project have SSCC greater than 15. Reducing the SSCC of these 558 methods manually would be time-consuming and error-prone.
We validate the proposed cognitive complexity reduction tool over the 10 open-source projects shown in Table 1. In total, these projects have 1,050 cognitive complexity issues. The goal of the study is to validate whether the proposed approach is able to reduce the number of cognitive complexity issues existing in these projects. In addition, we want to uncover how many extractions are needed, how many lines of code are extracted, and how many parameters the new extracted methods have when reducing the SSCC of methods.

B. ALGORITHMS
We use exhaustive search as the resolution technique because it is conceptually simple and effective: it generates possible sequences of code extractions and guarantees the optimal one when all combinations can be generated. We want to keep the resolution technique simple to focus on the problem rather than on the resolution process. The algorithm first generates an exhaustive list of refactoring candidates. To get this list, the source code is transformed into a block structure which contains structural information. After that, the algorithm enumerates all possible code extractions recursively with the help of a stack structure. The way elements are introduced into the stack determines two variants of the algorithm: Exhaustive Search-Long Sequences First (ES-LSF) and Exhaustive Search-Short Sequences First (ES-SSF). The former explores as many consecutive statements as possible in a single extraction first; in contrast, the latter explores short sequences of statements first. We propose these two different ways of exploring the search space because we set a number of evaluations as the stopping criterion. If no stopping condition is set, both variants return an optimal solution.
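As an illustration only (not the authors' implementation), the following sketch enumerates conflict-free sets of candidate extractions recursively, where each candidate is a `[start, end]` offset pair; sorting candidates by length before the search mimics the distinction between ES-LSF and ES-SSF, which matters when the search is cut off early by a stopping criterion:

```java
import java.util.*;

public class ExhaustiveSearchSketch {

    // Enumerate all non-empty, conflict-free subsets of candidate extractions.
    // longFirst = true mimics ES-LSF (longest candidates explored first);
    // longFirst = false mimics ES-SSF.
    public static List<List<int[]>> enumerate(List<int[]> candidates, boolean longFirst) {
        candidates.sort(Comparator.comparingInt(
                (int[] c) -> (c[1] - c[0]) * (longFirst ? -1 : 1)));
        List<List<int[]>> solutions = new ArrayList<>();
        search(candidates, 0, new ArrayDeque<>(), solutions);
        return solutions;
    }

    private static void search(List<int[]> cs, int idx,
                               Deque<int[]> chosen, List<List<int[]>> out) {
        if (idx == cs.size()) {
            if (!chosen.isEmpty()) out.add(new ArrayList<>(chosen));
            return;
        }
        int[] c = cs.get(idx);
        // Include the candidate only if it conflicts with nothing chosen so far.
        if (chosen.stream().noneMatch(p -> inConflict(p, c))) {
            chosen.push(c);
            search(cs, idx + 1, chosen, out);
            chosen.pop();
        }
        search(cs, idx + 1, chosen, out);  // branch that skips the candidate
    }

    // Conflict: ranges overlap but neither is nested in the other.
    static boolean inConflict(int[] a, int[] b) {
        boolean overlap = a[0] <= b[1] && b[0] <= a[1];
        boolean aInB = b[0] <= a[0] && a[1] <= b[1];
        boolean bInA = a[0] <= b[0] && b[1] <= a[1];
        return overlap && !aInB && !bInA;
    }

    public static void main(String[] args) {
        List<int[]> cands = new ArrayList<>(List.of(
                new int[]{0, 10}, new int[]{5, 15}, new int[]{20, 30}));
        // [0,10] and [5,15] conflict, so only 5 of the 7 non-empty subsets are valid.
        System.out.println(enumerate(cands, true).size()); // 5
    }
}
```

The real tool additionally scores each conflict-free set by its effect on the method's SSCC and stops after a fixed number of Extract Method evaluations; this sketch only shows the enumeration skeleton.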

C. EXPERIMENTAL SETUP
We conducted the experiments on a Dell XPS 15 9560 laptop with a quad-core Intel Core i7-7700HQ CPU @ 2.80 GHz and 16 GiB of RAM, running Windows 10 Pro. We used SonarQube version 7.2 and the Eclipse IDE version 2020-06 (4.16.0). We set the cognitive complexity threshold to the default value proposed by SonarQube (τ = 15). AST processing and Extract Method refactorings were performed through JDT version 3.16.0. All graph generation in our tool was developed using the JGraphT library, a Java library of graph theory data structures and algorithms. In order to check whether the observed differences in the results of ES-LSF and ES-SSF are statistically significant, we applied the non-parametric Mann-Whitney-Wilcoxon test with a confidence level of 95% (p-value < 0.05).

Table 2 reports the number of cognitive complexity issues, the number (and percentage) of cognitive complexity issues fixed, and the number and percentage of cognitive complexity issues that remain unfixed, respectively, for the projects under study. As shown, the proposed approach is able to fix, on average, 78% of the cognitive complexity issues in these projects. Therefore, our approach fixes most cognitive complexity issues in most of the projects under study. However, 288 methods out of 1,050 (27%) have no solution, since there are no applicable Extract Method refactorings. The reason is that most of these methods use multiple return statements and loops containing multiple break or continue statements. These kinds of statements prevent the extraction of any piece of code contributing to the SSCC of the method. Table 3 summarizes the combined results of our experiments. The first column shows the names of the different variants of the exhaustive algorithm implemented in our tool. The second column indicates whether the found solutions are feasible or not. Note that a solution is a sequence of Extract Method refactorings.
We define a solution as feasible when the original method and all the new extracted methods have an SSCC no greater than τ; otherwise, the solution is infeasible. Note that the best solution is the one that minimizes the number of method extractions. The remaining columns show aggregated function values (min, max, mean, standard deviation, and sum) for different metrics, organized in four blocks. The first block (columns 4-11) is devoted to SSCC-related metrics: iniCC is the initial cognitive complexity of the original methods (always above the threshold), extrac is the number of Extract Method refactorings proposed by the best solution, reducCC is the reduction in the cognitive complexity of the original methods, minReduc is the minimum reduction for a single extraction, maxReduc is the maximum reduction for a single extraction, avgReduc is the mean (average) reduction considering all extractions of the best solution, totalReduc is the sum of the reductions of all extractions of the best solution, and finalCC is the SSCC of the original methods after applying the sequence of Extract Method refactorings determined by the best solution. The second block (columns 12-15) shows LOC metrics, the third block (columns 16-19) provides information about the parameters involved in the extracted methods, and, finally, the last block shows the execution time in milliseconds.

VI. RESULTS
ES-LSF and ES-SSF found 291 and 289 unfeasible solutions, respectively (i.e., they were able to reduce the SSCC of those methods, but not to τ or below). Interestingly, these algorithms found 759 and 761 feasible solutions for existing cognitive complexity issues, respectively. The slight difference between the two algorithms is due to the way they explore the search space.
Next, we focus on feasible solutions. Existing methods in the source code require, on average, more than one code extraction to reduce their SSCC. These code extractions reduce the SSCC of the original methods below τ and extract around 15-20 lines of code from their original location into new methods. ES-LSF prioritizes long code extractions during the search, while ES-SSF prefers short ones. Based on this, we expect that, on average, solutions found by ES-LSF extract portions of code with higher SSCC and more lines of code, and need more parameters for the new extracted methods, than solutions found by ES-SSF. This is supported by statistically significant differences, as shown in Table 4. Fig. 4 shows boxplots of different metrics for feasible solutions. In Fig. 4(a) we can see that there are solutions with a high number of extractions for the ES-SSF algorithm. In contrast, the ES-LSF algorithm obtains solutions with a lower number of code extractions. As noted previously, this is due to the way these algorithms explore the search space. Fig. 4(b) compares the final SSCC of the original methods after the cognitive complexity reduction. This boxplot confirms that ES-LSF tends to reduce SSCC more than ES-SSF; in fact, the differences between the two algorithms are significant for this metric. Figs. 4(c), 4(d), and 4(e) can be interpreted together, as the presented approaches affect them in a similar way. The results confirm our expectations again: the differences between the two algorithms are significant for the avgReduc and avgLOC metrics. Finally, Fig. 4(f) shows the execution time in seconds. Both algorithms took almost 20 hours to process all methods with cognitive complexity issues in the 10 software projects under study. Although the execution times look similar, there are some outlier solutions in the case of ES-SSF that take longer to meet the stopping criterion.
The outlier near 800 seconds of execution time corresponds to the only method for which ES-SSF obtains a solution but ES-LSF does not. Both algorithms took, on average, less than 70 seconds to reduce a method's cognitive complexity.
In order to analyze the overall performance of the proposed approach to automatize the cognitive complexity reduction task, Table 5 shows aggregated metrics (min, max, mean, and standard deviation) for each project.
As shown, all projects, except Fiware-Commons, require, on average, more than one code extraction to reduce the SSCC of their methods to 15. Moreover, five projects (CyberCaptor, FastJson, Jmetal, Knowage-core, and MOEA) required five or more code extractions to reduce the SSCC of some methods. In general, code extractions reduced SSCC, on average, by 12 units. Nevertheless, some code extractions reduced a method's SSCC by up to 72 units.

Analyzing not-so-typical solutions
Next, we explain some not-so-typical solutions obtained in this experiment, which can be detected by analyzing the results shown in Tables 3 and 5. The ternary operator is part of Java's conditional statements. As the name ternary suggests, it is the only operator in Java consisting of three operands, and it can be thought of as a simplified version of the if-else statement with a value to be returned. This kind of statement contributes to the SSCC of a method with its inherent component (ι = 1) plus the nesting component, which depends on the nesting level. However, most original parameters are needed in the extracted method.
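For illustration, consider the following hypothetical method (not taken from the projects under study), where each control flow structure is annotated with its SSCC contribution. The ternary on the innermost line pays its inherent increment (ι = 1) plus a nesting penalty of three:

```java
public class TernarySsccExample {

    // Hypothetical method; each SSCC increment is annotated as a comment.
    static String classify(int[] values, int i) {
        if (i >= 0) {                                    // +1: if (nesting level 0)
            for (int v : values) {                       // +2: for (+1) plus nesting level 1 (+1)
                if (v == values[i]) {                    // +3: if (+1) plus nesting level 2 (+2)
                    return v > 0 ? "pos" : "non-pos";    // +4: ternary (+1) plus nesting level 3 (+3)
                }
            }
        }
        return "none";                                   // total SSCC = 1 + 2 + 3 + 4 = 10
    }

    // The contribution of a single nesting-sensitive structure at nesting level n
    // is its inherent component (1) plus the nesting component (n).
    static int contribution(int nestingLevel) {
        return 1 + nestingLevel;
    }

    public static void main(String[] args) {
        System.out.println(classify(new int[]{3, -1}, 0)); // prints "pos"
        System.out.println(contribution(3));               // prints 4
    }
}
```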
Coming back to the running example
Following the running example introduced in Section III, Fig. 5 shows the addParametersToServiceUrl method after cognitive complexity reduction. The SSCC of this method has been reduced from 46 to 8 after applying three Extract Method refactoring operations. The number of lines of code has also been reduced from 65 to 24. Note that, although three Extract Method operations are applied, just one appears in the code (line 16). The reason is that the other two extracted methods are called from the extracted method extraction1. These two additional method extractions are required to reduce the SSCC of the first extracted method to τ or below.
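The effect of such a sequence of extractions can be sketched with a small hypothetical example (the method names below are illustrative and unrelated to the code in Fig. 5). Moving a deeply nested block into a new method removes its nesting penalties from the original method:

```java
public class ExtractMethodSketch {

    // Before: SSCC = 1 (if) + 2 (for) + 3 (inner for) + 4 (if) = 10.
    static int countMatchesBefore(int[][] rows, int target) {
        int count = 0;
        if (rows != null) {                        // +1
            for (int[] row : rows) {               // +2 (nesting level 1)
                for (int v : row) {                // +3 (nesting level 2)
                    if (v == target) {             // +4 (nesting level 3)
                        count++;
                    }
                }
            }
        }
        return count;
    }

    // After one Extract Method refactoring: the original method drops to
    // SSCC = 1 + 2 = 3, and the extracted method has SSCC = 1 + 2 = 3.
    static int countMatchesAfter(int[][] rows, int target) {
        int count = 0;
        if (rows != null) {                        // +1
            for (int[] row : rows) {               // +2
                count += countInRow(row, target);  // extracted call
            }
        }
        return count;
    }

    static int countInRow(int[] row, int target) {
        int count = 0;
        for (int v : row) {                        // +1
            if (v == target) {                     // +2
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[][] data = {{1, 2, 2}, {2, 3}};
        System.out.println(countMatchesBefore(data, 2)); // prints 3
        System.out.println(countMatchesAfter(data, 2));  // prints 3
    }
}
```

The extraction preserves behavior while splitting the complexity between the two methods, which is why a sequence of extractions can bring every resulting method to τ or below.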

VII. DISCUSSION
A number of cognitive complexity metrics have been proposed in the literature, measuring software cognitive complexity in different ways. However, software cognitive complexity has gained popularity in recent years due to the usage of SonarCloud and SonarQube as a service and platform, respectively, for continuous inspection of code quality. For this reason, in this work we used the cognitive complexity measure provided by these well-known static code tools, which we refer to as SSCC. In order to search for feasible refactoring opportunities to reduce the cognitive complexity of a method, the proposed approach generates its Abstract Syntax Tree (AST) and annotates its nodes with information about their contribution to the cognitive complexity of the method. Thus, other cognitive complexity measures whose computation takes into account the presence of control flow structures in source code could be integrated into our tool: it would only require adapting the way the approach obtains the list of existing cognitive complexity issues in a project and the computation of the properties annotated in the nodes of the ASTs of the methods. However, this is out of the scope of the paper: our main contribution is that software cognitive complexity reduction can be modeled as an optimization problem and that automatizing SSCC reduction is feasible, as empirically shown by our experiments.
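The annotation step can be sketched in a simplified, library-free form. The actual tool annotates JDT AST nodes; the Node class and its addsComplexity and addsNesting fields below are hypothetical names used only for illustration of the recursive traversal:

```java
import java.util.ArrayList;
import java.util.List;

public class AnnotatedAstSketch {

    // Hypothetical, simplified AST node; the real approach annotates JDT AST nodes.
    static class Node {
        final boolean addsComplexity;   // does this construct receive an inherent increment?
        final boolean addsNesting;      // does it raise the nesting level of its children?
        int contribution;               // annotated SSCC contribution of this node
        final List<Node> children = new ArrayList<>();

        Node(boolean addsComplexity, boolean addsNesting) {
            this.addsComplexity = addsComplexity;
            this.addsNesting = addsNesting;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // Annotates every node with its contribution (inherent +1 plus current nesting)
    // and returns the accumulated SSCC of the subtree.
    static int annotate(Node node, int nesting) {
        node.contribution = node.addsComplexity ? 1 + nesting : 0;
        int childNesting = nesting + (node.addsNesting ? 1 : 0);
        int total = node.contribution;
        for (Node child : node.children) {
            total += annotate(child, childNesting);
        }
        return total;
    }

    public static void main(String[] args) {
        // Method body containing: if { for { if } }  ->  SSCC = 1 + 2 + 3 = 6
        Node method = new Node(false, false)
                .add(new Node(true, true)                     // if  -> +1
                        .add(new Node(true, true)             // for -> +2
                                .add(new Node(true, true)))); // if  -> +3
        System.out.println(annotate(method, 0)); // prints 6
    }
}
```

With each node's contribution stored on the node itself, the search can evaluate how extracting any subtree would change the SSCC of the original method without recomputing the metric from scratch.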
Concerning the resolution process, a block of consecutive statements can be extracted if its pre-conditions and post-conditions are met and the extraction generates compilable code. The more blocks of consecutive statements that are extractable, the more opportunities there are to reduce cognitive complexity. However, developers do not know which sequences of statements are extractable, nor the impact of any code extraction on code cognitive complexity. Therefore, developers cannot make informed decisions concerning the cognitive complexity of a method when maintaining its code. The resolution techniques proposed here have a number of advantages. Among them, it stands out that they achieve optimal solutions quickly for most of the methods analyzed (78%) without using any heuristic or randomized operator. In contrast, they require a high number of evaluations to find feasible solutions in methods with a high number of extractable blocks of code. As the number of evaluations increases, better solutions are obtained. However, the minimum number of evaluations required to find optimal solutions is unknown beforehand. If execution time is a constraint when reducing the cognitive complexity of a method, search-based software engineering techniques could be applied instead of the exhaustive algorithms used in this paper (note that our tool is algorithm-independent when solving the cognitive complexity reduction problem). Nevertheless, the proposed approach and the implemented resolution techniques took, on average, less than 70 seconds to process each method of the 10 software projects in our case study. It therefore seems reasonable to integrate software cognitive complexity reduction into the development workflow (e.g., using continuous integration). Thus, the proposed approach could be automatically run at night or after developers commit new changes to software repositories.
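The extractability pre-condition can be sketched in a deliberately simplified form. The check below works on statements as raw strings and only screens for flow-breaking statements; the actual approach works on the AST and additionally verifies data-flow pre-/post-conditions and that the extracted code compiles:

```java
import java.util.List;

public class ExtractionPrecheck {

    // Hypothetical, highly simplified pre-condition: a sequence of statements is
    // only a candidate for Extract Method if it contains no statement that breaks
    // the execution flow of the enclosing method.
    static boolean isCandidate(List<String> statements) {
        for (String statement : statements) {
            String s = statement.trim();
            if (s.startsWith("return") || s.startsWith("break") || s.startsWith("continue")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isCandidate(List.of("total += v;", "count++;"))); // prints true
        System.out.println(isCandidate(List.of("total += v;", "break;")));   // prints false
    }
}
```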
In this work we decided to minimize the number of Extract Method refactoring operations when reducing the SSCC of a method. Therefore, all applicable sequences of extractions with minimum size are optimal. However, not all these sequences have the same characteristics, and they vary in most of the metrics studied in this paper: extracted lines of code, extracted SSCC, number of parameters, final SSCC of the original method, and many others. It is possible to model the cognitive complexity reduction problem as a multi-objective or many-objective optimization problem. In that case, different techniques could be applied to optimize several of these metrics at the same time. In addition, multiple-criteria decision making could be applied, allowing developers to decide which solution seems most appropriate for them based on their preferences.
Interestingly, we found that some coding practices could hinder the cognitive complexity reduction task. This usually happens when methods contain multiple return statements. Having too many return statements in a method decreases the method's essential understandability because the flow of execution is broken each time a return statement is encountered. This makes it harder to read and understand the logic of the method, but it could also prevent the extraction of code. Consequently, the use of multiple return statements in a method might make an instance of the cognitive complexity reduction problem unsolvable. This also holds for loops containing multiple break or continue statements, which also break the execution flow. Therefore, restricting the number of break and continue statements in a loop is in the interest of good structured programming. In summary, the use of multiple return, break, and continue statements in a method could hinder the cognitive complexity reduction task or even make it unsolvable.
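This effect can be illustrated with a hypothetical pair of methods. In the first version, the early return inside the loop prevents extracting the loop; the single-exit rewrite is semantically equivalent and makes the loop an extractable block:

```java
public class SingleExitRefactor {

    // Hypothetical example. The early return inside the loop breaks the execution
    // flow, so the loop cannot be moved into a new method by Extract Method.
    static int findIndexMultiReturn(int[] values, int target) {
        for (int i = 0; i < values.length; i++) {
            if (values[i] == target) {
                return i; // early exit prevents extracting the enclosing loop
            }
        }
        return -1;
    }

    // Semantically equivalent single-exit rewrite: the loop no longer contains a
    // return statement, so it becomes an extractable block of code.
    static int findIndexSingleExit(int[] values, int target) {
        int result = -1;
        for (int i = 0; i < values.length && result < 0; i++) {
            if (values[i] == target) {
                result = i;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] values = {5, 7, 7};
        System.out.println(findIndexMultiReturn(values, 7)); // prints 1
        System.out.println(findIndexSingleExit(values, 7));  // prints 1
    }
}
```

The rewrite is not free: the additional loop condition contributes its own small increment to the SSCC, but it unlocks an Extract Method opportunity that was previously unavailable.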
An aspect that is out of the scope of this article is the choice of the name for the new extracted methods. The name of new methods can influence the understanding of the resulting source code. Therefore, this is an important aspect we plan to address in the near future. Creating a dictionary with keywords in the original method and using natural language processing techniques with Transformers [37] could be a good starting point to handle this fact.

VIII. THREATS TO VALIDITY
This section discusses all threats that might have an impact on the validity of our study following common guidelines for empirical studies [38].
Threats to internal validity concern factors that could have influenced our results. A possible threat to internal validity is that we set a stopping criterion of 10,000 evaluations. This stopping condition might have influenced our results because the algorithms, in some cases, stop before all possible sequences of extractions are explored. However, to alleviate this issue, we have presented two completely different ways of exploring the search space. Another aspect that can influence the results is the choice of the cognitive complexity metric and the threshold used. A number of cognitive complexity measures have been proposed in the literature. However, there is no single metric capable of measuring the complexity of a program based on multiple object-oriented concepts [6]. We used the cognitive complexity metric integrated in the well-known static code tools SonarCloud and SonarQube because (i) this metric positively correlates with source code understandability [9] and has a 77% acceptance rate among developers [8], (ii) it is accessible via the SonarQube REST API, and (iii) a well-defined cognitive complexity threshold is suggested for it.
Threats to construct validity concern the relationship between theory and observation and the extent to which the measures represent real values. In our study, all the experiments were run on the same computer, and the metrics we collected are all consistent when analyzing the original and the resulting source code.
Threats to external validity concern the generalization of our findings. To reduce external validity threats, we selected a diverse set of 10 open-source projects for our case study. Aggregating all projects, we processed 1,050 methods with SSCC greater than 15. This high number of existing issues ensures that we have analyzed very diverse methods in terms of complexity and size. Thus, we expect our findings to generalize to other software projects.
Threats to conclusion validity concern the relationship between experimentation and outcome. We compared the results of two different variants of an exhaustive algorithm and performed a Mann-Whitney-Wilcoxon test to determine the statistical significance of the results. In addition, a large number of methods were analyzed and the algorithms had enough evaluations to find feasible solutions for most of the methods of the software projects under study.

IX. CONCLUSION
We formulated the reduction of the software cognitive complexity metric provided by SonarCloud and SonarQube to a given threshold as an optimization problem. We then proposed an approach to automatically reduce the cognitive complexity of methods in software projects to the chosen threshold using sequences of Extract Method refactorings. We conducted experiments on 10 open-source software projects, analyzing more than 1,000 methods with a cognitive complexity greater than the default threshold suggested by SonarQube (15). Our automated approach was able to reduce the cognitive complexity to or below the threshold in 78% of those methods.
We found that statements that break the execution flow of programs could prevent the extraction of code and, therefore, make a particular instance of the cognitive complexity reduction problem unsolvable. To help alleviate this issue, we propose as future work a semantically equivalent code transformation that increases the number of extractable blocks of code in a method by reducing the number of return, break, and continue statements. This transformation will not only indirectly improve the readability and maintainability of the code, but will also benefit the cognitive complexity reduction task.
Although our approach was able to reduce the cognitive complexity of most methods, we cannot ensure that no solution exists for the rest of the methods. The reason is that the cost of exploring all possible sequences of extractions might be unaffordable. However, we think it is preferable to maintain the simplicity of the approach to emphasize the benefits of providing an automated tool. As future work, we want to study the NP-hardness of the modeled cognitive complexity reduction problem. If proven, a different procedure (such as an ad-hoc metaheuristic) could be included in our approach to solve the problem. Last but not least, we plan to validate our approach with software developers in order to get their feedback and to analyze how to include our approach as part of the continuous integration practice.

X. APPENDIX
In the field of theoretical validation, a number of researchers have proposed different criteria to which software measures should adhere. Weyuker established a formal list of nine properties to estimate the accuracy of software metrics [39], which has been used to evaluate numerous existing software metrics. Next, we evaluate the cognitive complexity metric provided by SonarCloud and SonarQube (which we refer to as SSCC) against Weyuker's properties and validate it against measurement theory, as suggested in the framework proposed by Misra et al. [28]. Then, we perform a practical validation with Kaner's framework [40]. We finally conclude with a comparative analysis and a summary of the validation.

WEYUKER'S PROPERTIES
In the following, P, Q, and R are methods of a class. With |P| and (P; Q) we refer to the SSCC of method P and to the composition of methods P and Q, respectively.
Property 2: All projects have a finite number of classes and methods, and all methods have a finite number of statements. Because the SSCC of a method depends on its statements, there are only finitely many methods whose measure equals a given value c. The SSCC metric thus holds for this property.
Property 3: This property says that there can be multiple methods with the same SSCC value. Two methods without control flow structures will have equal complexity (0). Hence, this property is satisfied by this measure.
Property 4: This property states that, even though two methods compute the same function, it is the details of the implementation that determine the methods' complexity. Even if the functionalities of two methods are equal, their complexity depends on the number of control flow structures and their nesting level in the code. Because of that, the SSCC measure holds this property.
Property 5: This property states that the complexity values of two methods should be less than or equal to the complexity of the composition of the two methods. The SSCC measure mainly depends on the presence of control flow structures and complex expressions, which determine the inherent component of the complexity of a method. Thus, the complexity value of the combination of two methods is greater than or equal to the complexity values of these two methods. Hence, this property is satisfied by this measure.
Property 6: This property states that there are two methods P and Q with the same SSCC that, when separately combined with the same third method R, yield methods of different SSCC. For any two methods P and Q with the same SSCC, any combination of them with another method R will produce new methods with similar SSCC. Therefore, this measure does not satisfy this property.
Property 7: There are methods P and Q such that Q is formed by permuting the order of the statements of P and |P| ≠ |Q|.
Changing the order of the statements in a method, without changing its functionality, will not change its complexity value. Therefore, this measure does not satisfy this property.
Property 8: Renaming a method does not impact its SSCC. As a consequence, this property is satisfied by this measure.
Property 9: This property states that the sum of the complexities of two separate methods can be lower than the complexity of the method created by joining them. Since the SSCC of the combined method never falls below the sum of the individual complexities, but the strict inequality is not achieved, this condition is not fulfilled by this metric. However, the modified version of this property, (∃P)(∃Q)(|P| + |Q| ≤ |P; Q|) [33], is satisfied.
Table 6 gives a summary of the evaluation process through Weyuker's properties, where the satisfied properties are marked. According to the above analysis, the SSCC measure satisfies all of Weyuker's properties except the 6th, the 7th, and the original form of the 9th. Thus, we suggest that the SSCC measure can be regarded as a well-structured one.

MEASUREMENT THEORY
Here we validate the SSCC measure against measurement theory using the Briand et al. framework [41]. We base our assessment on the basic definitions and desirable properties that make up the framework.
Definition (Representation of Systems and Modules): "A system S is represented as a pair < E, R >, where E represents the set of elements of S, and R is a binary relation on E (R ⊆ E × E) representing the relationships between S's elements.".
For the SSCC metric, E can be defined as a method containing a set of code statements and R as the set of inherent components: complex expressions and control flows from one statement to another.
Definition (Complexity): "The complexity of a system S is a function Complexity(S) that is characterized by the following five properties: non-negativity, null value, symmetry, module monotonicity, and disjoint module additivity.".
• Non-negativity: "The complexity of a system S =< E, R > is non-negative if Complexity(S) ≥ 0.". SSCC values are always non-negative; this property is thus satisfied by this measure.
• Null value: "The complexity of a system S =< E, R > is null if R is empty.". If a method does not contain inherent components (certain control flow structures or complex expressions), then it will have null (0) complexity. Thus, this property is satisfied by the SSCC metric.
• Symmetry: "The complexity of a system S =< E, R > does not depend on the convention chosen to represent the relationships between its elements.". Changing the order or representation has no effect on the complexity values of the SSCC metric, because the contribution of control flow structures and their nesting level to the overall complexity of a method does not depend on the order or way of representation. Therefore, this property is satisfied by the SSCC measure.
• Module monotonicity: "The complexity of a system S =< E, R > is no less than the sum of the complexities of any two of its modules with no relationships in common.". For the SSCC metric, a module can be defined as a code segment in a method. For this property, if any method is partitioned into two methods, the sum of the complexity values of the partitioned methods will never be greater than that of the joined method. Therefore, this property holds for the metric.
• Disjoint Module Additivity: "The complexity of a system S =< E, R > composed of two disjoint modules m1 and m2 is equal to the sum of the complexities of the two modules.". The SSCC value of the method obtained by concatenating m1 and m2 is equal to the sum of their calculated complexity values. Thus, if two independent methods are combined into a single method, then the complexities of the individual methods are combined. Therefore, this property is satisfied by the SSCC measure.
By fulfilling these properties, one may say that the SSCC metric is on the ratio scale, which is the most desirable property of complexity measures from the point of view of measurement theory [33].

PRACTICAL VALIDATION WITH KANER'S FRAMEWORK
In addition to the theoretical validation using Weyuker's properties and measurement theory, the framework given by Kaner [40] can also be adopted for the evaluation of the SSCC metric. This approach is more practical than the formal approach of Weyuker's properties and measurement theory. The framework is based on providing answers to the following points.

Purpose of the measure
The purpose of the measure is to evaluate the complexity of methods in object-oriented programming languages.

Scope of the measure
Object-orientation is widely adopted nowadays in the development of software, from open-source to proprietary software. The SSCC measure can be used within and across these projects.

Identified attribute to measure
The SSCC metric is defined as a measure of how hard the control flow of a method is to understand and maintain. Thus, the identified attributes that the SSCC measure addresses are understandability and maintainability.

Natural scale of the attribute
The natural scales of the attributes cannot be defined, since they are subjective and require the development of a common view about them [33].
Natural variability of the attribute
The natural variability of the attributes cannot be defined either, because of their subjective nature. One could develop a sound approach to handle such an attribute, but it may not be complete, because other factors also exist that can affect the attribute's variability [33].

Definition of metric
The SSCC metric has been formally defined by Campbell [42] and briefly introduced in Section I.

Measuring instrument to perform the measurement
The SSCC measure was computed by SonarQube.

Natural scale for the metric
The SSCC measure is on the ratio scale, as mentioned earlier in this section.

Relationship between the attribute and the metric value
The SSCC metric contributes to determining the overall complexity of methods and classes in object-oriented programming languages. Higher SSCC values translate into code harder to understand and maintain.
Natural foreseeable side effects of using the instrument
There are no side effects of using SonarQube to measure the SSCC of software projects, because the computation of the metric is performed automatically by it. In addition, SonarQube is an open-source tool and its source code is available in public repositories.

COMPARATIVE ANALYSIS AND CONCLUSION OF THEORETICAL VALIDATION
The SSCC measure satisfied seven of the nine Weyuker's properties (counting the modified version of the ninth property). Although these results were convincing, we also turned to measurement theory, which comprises five properties, all of which were satisfied by the SSCC metric. This shows that the measure is additive and on the ratio scale. Finally, Kaner's framework was used to assess the usefulness of the SSCC measure by answering practical questions.