Object-Oriented Test Case Generation Using Teaching Learning-Based Optimization (TLBO) Algorithm

Researchers are currently seeking effective methods for automated software testing that reduce testing time, avoid redundant test cases, and produce comprehensive test cases covering paths, branches, conditions, and statements. Generating a minimum number of test cases that still covers all code paths is a challenge in automated test case generation. Optimization algorithms have therefore become a popular means of generating test cases that meet several goals at once. In this study, we used a teaching-learning-based optimization (TLBO) algorithm to generate the minimum number of test cases. The motive for using this algorithm is to optimize the number of test cases that cover all code paths in a unit test. We compared our results with those of other state-of-the-art methods based on path coverage for ten Java programs. The results indicate that the proposed algorithm generates the minimum number of test cases and covers all paths in the code at a full-coverage rate.


I. INTRODUCTION
Software testing is an important phase in the software development process because it reflects the quality of the software product [1], [2], [3], [4], [5], [6]. The primary goal of testing is to identify software flaws. This stage is considered even more important today because programs are becoming increasingly complex, safety-critical, and central to daily activities, and therefore demand higher quality [7], [8], [9]. According to [10] and [11], testing accounts for more than 50% of the cost of software development. Two methodologies are commonly used in software testing: black box and white box [12]. The former tests the functionality of the software under test without knowledge of its structure or implementation details. The latter tests with knowledge of the program's internal structure and code. Testers must therefore fully understand the source code and consider its behavior using testing coverage requirements (e.g., path coverage). White-box testing includes two types of testing: integration tests and unit tests. An integration test uses pairs of input and output files to test the overall function of the software. Each integration test is specified in a configuration file with one line per test. A unit test, on the other hand, targets the smallest testable portion of the software. The development team is responsible for this type of test: a developer performs it and must be well versed in the code design. Unit testing is the most basic sort of testing [9]. In this study, we focused on generating test cases for unit testing. Unit testing can be either manual or automated. The former involves writing test cases by hand and is more susceptible to human error, whereas the latter involves using tools to perform test cases, depending on user input.
The associate editor coordinating the review of this manuscript and approving it for publication was Roberto Nardone.
Automated software testing can significantly lower the cost of software development. The goal of automated test case generation is to identify a suitable number of test cases that cover all conceivable targets (paths, statements, and branches) [13]. Path coverage tests all paths in the code; its main goal is to ensure that every path is covered while avoiding redundant tests. Generating data for a unit test can be viewed as an optimization problem that can be handled using a search-based software testing (SBST) approach [14], [15], [16], [17]. SBST aims to move software engineering problems from human-based to machine-based search [18], [19]. The growth of SBST in recent years has been attributed to its significant contributions to software testing, such as reducing maintenance costs, prioritizing test cases, reducing human costs, verifying software models, and validating real-time properties. It also generates and minimizes software tests using meta-heuristic search algorithms, such as genetic algorithms, hill climbing, particle swarm optimization, and teaching-learning-based optimization [11]. The teaching-learning-based optimization (TLBO) algorithm was inspired by the teaching-learning process [19]. As an iterative learning algorithm, it has several characteristics that distinguish it from other evolutionary computation (EC) techniques. The algorithm simulates the capacity of a teacher and students to teach and learn in a classroom. TLBO has gained widespread support among optimization researchers because it requires no algorithm-specific parameters, only common control parameters such as population size [19], which in turn benefits the performance of the algorithm. TLBO has been used in SBST in two studies. In the first study, Shahabi et al. [7] proposed TLBO for generating test cases.
The algorithm was implemented in EvoSuite, a reference tool for search-based software testing. Empirical investigations on the SF110 dataset showed that TLBO provides competitive results, with 90.08% method coverage, when compared with standard and monotonic genetic algorithms. However, path coverage was not evaluated because EvoSuite does not support this criterion. In the second study, Kumar and Rajeev [20] suggested using TLBO to generate test cases based on branch coverage in procedural programming. The motivation for this study is therefore to implement TLBO for object-oriented programs in order to optimize the number of test cases based on the path coverage criterion. We then compare the results with those of other state-of-the-art methods. The paper is organized as follows. Section II reviews the relevant literature. Section III gives a preliminary description of the TLBO algorithm. Section IV describes the methodology of the proposed algorithm. Section V details the experimental results and compares them with those of other meta-heuristic algorithms. Section VI presents the conclusion, the theoretical and practical contributions of the study, and suggestions for future work.

II. RELATED WORK
The field of automated test data generation emerged in the early 1970s. Clarke's work (1976) [21] is considered the first research on automated test data generation. Prather (1987) [21] presented a new concept for test data generation called the path prefix method. Korel (1990) [21] brought a significant change by generating test data dynamically. During the 1990s, researchers turned to object-oriented programs, as in Lakhotia et al. [21]. Souza et al. [22] proposed a multi-objective optimization process based on particle swarm optimization (PSO) to optimize test case selection for functional tests. Various other meta-heuristic methods, including swarm intelligence and evolutionary algorithms, have also been employed for software test case generation. Khari et al. [23] created a tool with two primary automated software testing components: test-suite generation and test-suite optimization. The tool offers five test-suite generation methods: boundary value testing, robustness testing, worst-case testing, robust worst-case testing, and random testing. The generated test suite is then optimized to the desired fitness level using the artificial bee colony (ABC) algorithm or the cuckoo search algorithm (CSA). The two algorithms were applied to ten sample Java programs; the average path coverage was 90.3% for ABC and 75.4% for CSA. Hamad [24] developed an artificial bee colony (ABC) algorithm for test data generation in structural software testing and applied it to two programs. The results demonstrate the ability of the ABC algorithm to perform software path testing by determining the optimal fitness values. Saber et al. [25] proposed a hybrid method that combines a greedy algorithm to quickly find good solutions, a genetic algorithm to increase the search space covered, and a local search algorithm to refine the solutions. The proposed method was reported to be 178% better than the state-of-the-art algorithms.
Rani et al. [26] implemented an elitist genetic algorithm (GA) with an improved fitness function to expose the maximum number of faults while minimizing the cost of testing by generating fewer complex and asymmetric test cases. It uses a selective mutation strategy to create low-cost artificial faults that result in fewer redundant and equivalent mutants. The study used 14 Java programs of significant size to validate the efficacy of the approach against initial random tests and EvoSuite, an evolutionary framework widely used in academia. The approach yielded a significant improvement in test case optimization. Khari et al. [27] examined the performance of six meta-heuristic algorithms in their standard implementations: the hill-climbing algorithm (HCA), particle swarm optimization (PSO), firefly algorithm (FA), cuckoo search algorithm (CS), bat algorithm (BA), and artificial bee colony algorithm (ABC). The goal was to optimize the path coverage and branch coverage produced by the test data. Each algorithm was implemented to generate test cases, and its performance was evaluated on five Java programs. The algorithms were compared using process measures, such as average time, best time, and worst time, as well as product metrics, such as path coverage and the objective function values of the resulting test suites. BA was found to be the best-suited algorithm because it produced the most optimal test suites in the shortest time, and the average path coverage over all five programs was approximately 80%. FA was the slowest algorithm, while CS, PSO, and HCA fell in between. Bidgoli and Haghighi [28] adapted and improved ant colony optimization (ACO) to provide a test data generation strategy for covering prime paths. For comparison, they used test suites generated by EvoSuite, an automatic tool that uses a metaheuristic algorithm to generate test cases.
The results indicate that ACO achieved a 9% higher mutation score. Sharma et al. [29] developed a framework to optimize test cases using a cuckoo search algorithm. Geetha and Mala [14] proposed hybridizing the bat algorithm (BAT) with tabu search to select test cases, using code coverage as the comparison metric. The proposed BAT with tabu search yielded a 0.04875% improvement over the tabu algorithm. Anh [13] developed and enhanced a GA-based method for generating test cases for unit and integration testing. The algorithm was applied to two classes for the unit test and to six classes for the integration test. The results showed that the GA obtained the highest coverage when compared with state-of-the-art algorithms in terms of branch coverage and execution time. Damia and Esnaashari [31] combined the firefly algorithm (FA) with the asexual reproduction optimization (ARO) algorithm. FA is a bio-inspired algorithm that excels in exploitation and local search but struggles with exploration and is prone to becoming trapped in local optima, whereas ARO can escape local optima. They therefore incorporated ARO into the FA phases to increase population diversity. This combination was used to automatically generate test cases covering all finite paths of six tested programs. FA-ARO achieved 100% path coverage when compared with the traditional genetic algorithm (TGA), adaptive genetic algorithm (AGA), adaptive particle swarm optimization (APSO), hybrid genetic tabu search algorithm (HGATS), random search (RS), differential evolution (DE), hybrid cuckoo search, and genetic search. Jaiswal and Prajapati [32] proposed a particle swarm optimization (PSO)-based test case selection approach for basis path testing. They used an improved fitness function (IFF) that can direct the PSO-based optimization process toward optimal test case selection. They implemented the proposed algorithm using two programs.
The results suggest that, compared with the traditional fitness function, the proposed approach achieves better outcomes, reaching 100% control-flow graph coverage of all linearly independent paths. Esnaashari and Damia [33] presented a structure for generating test cases based on path coverage. They proposed a memetic algorithm that employs reinforcement learning as a local search within a genetic algorithm. Experiments showed that this method generates test data faster than the standard genetic algorithm, various genetic algorithm upgrades, random search, particle swarm optimization, the bee algorithm, ant colony optimization, simulated annealing, hill climbing, and tabu search. Furthermore, the algorithm provides 100% path coverage while requiring fewer evaluations. Lakshminarayana and Kumar [34] developed the cuckoo search and bee colony algorithm (CSBCA) to optimize automated test cases, using the ATM withdrawal procedure as an example. According to their experimental investigation, the CSBCA technique produced path coverage in 16.4 seconds. CSBCA also achieved a higher fitness function value, between 0.7 and 1.0, in 65% of test cases/test data than particle swarm optimization (PSO), the cuckoo search algorithm (CS), the firefly algorithm (FA), and the bee colony algorithm (BCA). Sahoo and Ray [35] proposed a new approach, the forest optimization algorithm (FOA) with metamorphic relations (MRs), to cover multiple paths in a single run. The initial test case is created with FOA, and subsequent test cases are created with metamorphic relations without requiring many runs. FOA was chosen because its search process is similar to that of branch/path coverage techniques. The algorithm was implemented in MATLAB, and its performance was assessed using six programs.
The results show that FOA based on metamorphic relations is more efficient than particle swarm optimization based on metamorphic relations in terms of time consumption and the number of paths covered. For instance, FOA-based metamorphic relations cover five paths, whereas PSO-based metamorphic relations cover four. Gupta and Goyal [36] conducted a systematic review of test case generation. The duration of the systematic review was 2010-2020, and it presented the studies that reported coverage standards, datasets, and testing levels from 2010 to 2021. Path and branch coverage are the criteria commonly used to generate test cases. Researchers have mostly focused on the genetic algorithm (GA), and some have used particle swarm optimization (PSO). The review also covered the studies that presented hybrid algorithms, and its results emphasize that only a few articles used hybrid algorithms or the ant colony optimization (ACO) algorithm. It further distinguished studies that used manual and automatic test case generation: only a few studies used manual test cases, and one study used both manual and automatic techniques. However, there remains a need to reduce the time complexity of software testing and the cost of automatically generating test cases.

III. TLBO ALGORITHM
The TLBO algorithm was inspired by the teaching-learning process [19]. It is based on the influence of a teacher on the output of learners in a class. The algorithm has two primary modes of learning: (1) learning from the teacher, known as the teacher phase, and (2) learning from other learners, known as the learner phase. In this optimization algorithm, a group of learners is considered a population, the different subjects offered to the learners are considered the design variables of the optimization problem, and a learner's result is analogous to the fitness value of the optimization problem [30].
This method requires only common control parameters, such as population size and number of generations, and no algorithm-specific control parameters. The teacher is considered the best solution in the entire population. The design variables are the parameters involved in the objective function of the given optimization problem, and the best value of the objective function is the optimum solution. Note: The TLBO algorithm was originally presented for continuous optimization problems. Shahabi et al. [7], however, adapted the algorithm to discrete search problems. We therefore used a modified version of the algorithm that solves the discrete search problem, as shown in Figure 1.

1. Initialization: The algorithm receives the number of individuals and the termination conditions as inputs. The process begins with a randomly generated initial population.

2. Teacher phase: This is the first stage of the algorithm, in which learners learn from the teacher. During this phase, the teacher seeks to raise the mean result of the class in each subject, depending on his or her ability. At any iteration $i$, assume that there are $m$ subjects and $n$ learners. $M_{j,i}$ is the mean result of the learners in a particular subject $j$ ($j = 1, 2, \ldots, m$). The best overall result $X_{total\text{-}kbest,i}$, obtained by considering all subjects together over the entire population of learners, identifies the best learner $kbest$. Because a teacher is usually regarded as a highly educated person who trains learners to achieve better results, the algorithm takes the best learner as the teacher. The difference between the existing mean result of each subject and the corresponding result of the teacher for that subject is given by
$$\mathrm{Difference\_Mean}_{j,k,i} = r_i \left( X_{j,kbest,i} - T_F \, M_{j,i} \right)$$
where $X_{j,kbest,i}$ is the result of the best learner in subject $j$ and $r_i$ is a random number in the range [0, 1]. $T_F$ is the teaching factor, which takes the value 1 or 2. Its value is decided randomly with equal probability as
$$T_F = \mathrm{round}\left[ 1 + \mathrm{rand}(0, 1)\,\{2 - 1\} \right]$$
In the teacher phase, the existing solution is updated based on $\mathrm{Difference\_Mean}_{j,k,i}$:
$$X'_{j,k,i} = X_{j,k,i} + \mathrm{Difference\_Mean}_{j,k,i}$$
where $X'_{j,k,i}$ is the updated value of $X_{j,k,i}$. $X'_{j,k,i}$ is accepted if it gives a better function value. The learner phase uses all the accepted function values from the teacher phase as input.

3. Learner phase: This is the second stage of the algorithm, in which learners interact with one another to increase their knowledge. A learner interacts with other learners at random and gains new knowledge when the other learner knows more than he or she does.

4. Termination: At the end of every iteration, the entire population is evaluated, and if a member meets the minimum requirement (i.e., a specific coverage percentage), the algorithm ends. If no such member exists, the algorithm chooses the best individual as the teacher and continues with its evolutionary iterations. A further stopping condition, such as a time limit, can also apply.
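
The four steps above can be sketched for a continuous minimization problem as follows. This is a minimal illustration of the standard TLBO loop, not the discrete variant used in this study. The function name `tlbo_minimize`, the box constraints, the default parameter values, and the fixed iteration count used as the termination condition are all assumptions made for the example, and one random number is drawn per subject rather than one per iteration.

```python
import random


def tlbo_minimize(objective, bounds, pop_size=20, iterations=100, seed=0):
    """Sketch of TLBO minimizing `objective` over box constraints `bounds`."""
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(x):
        # Keep every design variable inside its [low, high] interval.
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    # 1. Initialization: random learners and their results (fitness values).
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [objective(x) for x in pop]

    def accept_if_better(k, candidate):
        # Greedy acceptance: keep the update only if it improves learner k.
        f = objective(candidate)
        if f < fit[k]:
            pop[k], fit[k] = candidate, f

    for _ in range(iterations):  # 4. Termination: fixed budget (assumption).
        # 2. Teacher phase: the best learner acts as the teacher.
        teacher = pop[min(range(pop_size), key=fit.__getitem__)][:]
        mean = [sum(x[j] for x in pop) / pop_size for j in range(dim)]
        for k in range(pop_size):
            tf = rng.randint(1, 2)  # teaching factor T_F, 1 or 2
            accept_if_better(k, clip([
                pop[k][j] + rng.random() * (teacher[j] - tf * mean[j])
                for j in range(dim)
            ]))
        # 3. Learner phase: interact with one randomly chosen other learner.
        for k in range(pop_size):
            q = rng.choice([i for i in range(pop_size) if i != k])
            # Move away from a worse partner, toward a better one.
            sign = 1.0 if fit[k] < fit[q] else -1.0
            accept_if_better(k, clip([
                pop[k][j] + sign * rng.random() * (pop[k][j] - pop[q][j])
                for j in range(dim)
            ]))

    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]
```

Calling `tlbo_minimize(lambda x: sum(v * v for v in x), [(-5.0, 5.0)] * 3)` returns the best learner found and its objective value. Only the population size and the iteration budget need to be chosen, which reflects the absence of algorithm-specific parameters noted above.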

IV. METHODOLOGY
This section describes our proposed approach; more precisely, it describes our fitness function. We also describe the implementation of the proposed algorithm for generating and minimizing test cases.

A. OVERVIEW
Search-based software testing applies random or directed search techniques to problems in the software testing, verification, and validation domains. Figure 2 shows a search-based software test input generation approach.
Search-based techniques are becoming increasingly common in software testing and are particularly beneficial for generating test data [11]. In all SBST approaches, the solutions are first encoded using a suitable representation of the problem. The representation influences the type of search operator employed in a search-based algorithm. An even more important aspect of an SBST approach is the design of a fitness function that evaluates the quality of the candidate solutions.

B. DATASET
This study uses ten Java programs to generate the test cases. Table 1 lists the ten programs, named P1 through P10. The programs range in length from approximately 10 to 75 lines of code (LOC) and differ in their programming constructs and data types. P1 and P5 use nested if-else conditionals, while P2, P6, P7, P8, and P9 use basic if-else conditionals; P3 uses a switch-case conditional, P4 uses an if-else conditional, and P5 and P10 use nested loops.

C. PERFORMANCE STUDY
In this subsection, we explain the steps of the methodology using one of the tested programs, P2 (the greatest number problem), as an example. The steps involved in the execution of the TLBO algorithm are shown in Figure 3. Figure 4 shows the pseudocode for the greatest number program.

a: INDIVIDUAL ENCODER
A test case is constructed from a set of input values supplied to the program under test during execution. However, testing an object-oriented program requires additional information about the construction of objects, the use of supplemental methods for setting attributes, and the values of the parameters supplied to these methods. A test case is defined as a sequence of statements to be executed (separated by colons) together with the related parameter values (separated by commas). The encoding distinguishes four types of statements [13]: a constructor statement creates an instance of a given class; a field statement accesses a public attribute of an object (identified as $obj); a method statement invokes a method on an object; and an assignment statement assigns a value to a variable (indicated as $var) or to a public attribute of an object. Invocation parameters for methods and constructors can be of type integer, boolean, string, double, or array. The tested program P2 has three integer inputs (n1, n2, and n3). The input values were selected randomly from the range of the data type until all target paths were covered.
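
The random selection of inputs until every target path is covered can be illustrated with a short sketch. The function `greatest_path` is a hypothetical stand-in for P2 whose branch structure and path labels are assumed for illustration, since the actual source of P2 is not reproduced here. The narrow input range and the cap on attempts are likewise assumptions.

```python
import random


def greatest_path(n1, n2, n3):
    """Hypothetical stand-in for P2: returns (greatest value, path label)."""
    if n1 >= n2:
        if n1 >= n3:
            return n1, "n1>=n2,n1>=n3"
        return n3, "n1>=n2,n1<n3"
    if n2 >= n3:
        return n2, "n1<n2,n2>=n3"
    return n3, "n1<n2,n2<n3"


# Target paths of the stand-in program (assumed, one per leaf of the if-else tree).
TARGET_PATHS = {"n1>=n2,n1>=n3", "n1>=n2,n1<n3", "n1<n2,n2>=n3", "n1<n2,n2<n3"}


def random_inputs_until_covered(low=-100, high=100, seed=0, max_tries=10_000):
    """Draw random (n1, n2, n3) triples until every target path has a test case."""
    rng = random.Random(seed)
    covered = {}  # path label -> first input triple that exercised it
    tries = 0
    while len(covered) < len(TARGET_PATHS) and tries < max_tries:
        tries += 1
        case = tuple(rng.randint(low, high) for _ in range(3))
        _, path = greatest_path(*case)
        covered.setdefault(path, case)  # keep one test case per path
    return covered, tries
```

Keeping only the first input that reaches each path yields one test case per path, which is the kind of non-redundant suite the path coverage criterion aims for.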

b: CODE INSTRUMENTATION
We generate a control flow graph (CFG) from an abstract syntax tree (AST) built on a copy of the source code, using the JUnit and Open Java plugin in Eclipse. Figure 5 illustrates the control flow graph of the tested program P2 as an example. Table 2 shows all the paths of the tested program P2.
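
Once a CFG is available, its paths can be listed by a depth-first traversal from the entry node to the exit node. The sketch below uses a hypothetical CFG for a greatest-of-three program; the node numbering and edges are assumptions and are not taken from Figure 5 or Table 2. It also assumes an acyclic graph, so programs with loops would need a bound on loop iterations.

```python
def enumerate_paths(cfg, entry, exit_node):
    """Depth-first enumeration of all entry-to-exit paths in an acyclic CFG."""
    paths = []
    stack = [(entry, [entry])]
    while stack:
        node, path = stack.pop()
        if node == exit_node:
            paths.append(path)
            continue
        # Push successors in reverse so the first successor is explored first.
        for succ in reversed(cfg.get(node, [])):
            stack.append((succ, path + [succ]))
    return paths


# Hypothetical CFG: each key is a node, each value lists its successors.
CFG = {
    1: [2, 5],  # if n1 >= n2
    2: [3, 4],  # if n1 >= n3
    3: [8],     # greatest = n1
    4: [8],     # greatest = n3
    5: [6, 7],  # if n2 >= n3
    6: [8],     # greatest = n2
    7: [8],     # greatest = n3
    8: [],      # exit
}

ALL_PATHS = enumerate_paths(CFG, entry=1, exit_node=8)
```

Each resulting node sequence is one target path that a generated test case must exercise.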

2) THE TLBO ALGORITHM IS USED TO GENERATE THE TEST CASES
a: INITIAL POPULATION
The initial population of solutions is generated randomly from the domain of the input data types. For instance, the initial population for P2 contains 100 individuals.

b: FITNESS FUNCTION
We used Korel's fitness function [12] to compute the distance between the target path and the path under consideration. Table 3 shows the Korel function for P2.
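
A distance of this kind can be sketched as follows. The rules below follow a commonly used branch-distance formulation in the style of Korel's function; they are an assumption and are not copied from Table 3. The constant `K`, the predicate encoding as `(operator, left, right)` triples, and the function names are hypothetical.

```python
K = 1  # assumed penalty constant added when a predicate is not satisfied


def branch_distance(op, a, b, k=K):
    """Return 0 if `a op b` holds, otherwise a positive distance to satisfying it."""
    if op == "==":
        return 0 if a == b else abs(a - b) + k
    if op == "!=":
        return 0 if a != b else k
    if op == "<":
        return 0 if a < b else (a - b) + k
    if op == "<=":
        return 0 if a <= b else (a - b) + k
    if op == ">":
        return 0 if a > b else (b - a) + k
    if op == ">=":
        return 0 if a >= b else (b - a) + k
    raise ValueError(f"unsupported operator: {op}")


def path_distance(target_path, inputs):
    """Sum the branch distances of every predicate on the target path."""
    return sum(
        branch_distance(op, inputs[left], inputs[right])
        for op, left, right in target_path
    )


# Assumed target path of a greatest-of-three program: n1 >= n2 and n1 >= n3.
TARGET = [(">=", "n1", "n2"), (">=", "n1", "n3")]
```

For the inputs n1 = 5, n2 = 3, n3 = 9, the first predicate holds and the second misses by 9 - 5 + 1 = 5, so the distance is 5; any input that follows the target path has distance 0.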
Furthermore, the objective fitness value is evaluated as:

c: TEACH PHASE
Every path moves toward the teacher during the teaching phase. For this purpose, the mean of the decision variables is determined, and each path is updated as:

d: LEARN PHASE
Each participant is paired with a randomly chosen classmate, and their fitness values are compared. In the teaching phase, by contrast, the path is paired with the teacher in the movement operator, so it becomes more like the teacher throughout that phase. If the fitness value of the random classmate is greater, the path moves toward it using the following equation:

If the randomly chosen classmate has a lower score, the path moves away from it and closer to the teacher as:

e: TERMINATION
In our example P2, the entire population is examined at the end of each iteration, and if a member meets the target path, the algorithm ends. A second stopping condition is a time limit of 1000000 seconds.
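
The teach, learn, and termination steps can be put together in a small end-to-end sketch for integer inputs. Several assumptions are made: the standard TLBO update rules are used in place of the discrete adaptation of [7], candidates are rounded to the nearest integer and clamped to the input range, an iteration cap replaces the time limit, and `target_path_distance` is a hypothetical fitness for one assumed target path (n1 < n2 and n2 < n3). Here a lower distance is better, so a learner moves toward a classmate with a lower distance.

```python
import random


def target_path_distance(x):
    """Hypothetical fitness: 0 when the inputs follow the path n1 < n2 and n2 < n3."""
    n1, n2, n3 = x
    distance = 0
    if not n1 < n2:
        distance += n1 - n2 + 1
    if not n2 < n3:
        distance += n2 - n3 + 1
    return distance


def tlbo_test_input(fitness, low=-100, high=100, dim=3, pop_size=10,
                    max_iter=500, seed=0):
    """Search for an integer input with fitness 0; returns (input or None, iterations)."""
    rng = random.Random(seed)

    def to_int(v):
        # Assumption: round to the nearest integer and clamp to the input range.
        return max(low, min(high, int(round(v))))

    pop = [[rng.randint(low, high) for _ in range(dim)] for _ in range(pop_size)]
    fit = [fitness(x) for x in pop]

    def accept_if_better(k, candidate):
        f = fitness(candidate)
        if f < fit[k]:
            pop[k], fit[k] = candidate, f

    for iteration in range(max_iter):
        best = min(range(pop_size), key=fit.__getitem__)
        if fit[best] == 0:  # termination: a member follows the target path
            return pop[best], iteration
        teacher = pop[best][:]
        mean = [sum(x[j] for x in pop) / pop_size for j in range(dim)]
        for k in range(pop_size):  # teach phase
            tf = rng.randint(1, 2)
            accept_if_better(k, [
                to_int(pop[k][j] + rng.random() * (teacher[j] - tf * mean[j]))
                for j in range(dim)
            ])
        for k in range(pop_size):  # learn phase
            q = rng.choice([i for i in range(pop_size) if i != k])
            sign = 1 if fit[k] < fit[q] else -1
            accept_if_better(k, [
                to_int(pop[k][j] + sign * rng.random() * (pop[k][j] - pop[q][j]))
                for j in range(dim)
            ])

    best = min(range(pop_size), key=fit.__getitem__)
    return (pop[best], max_iter) if fit[best] == 0 else (None, max_iter)
```

Running the search once per target path and keeping the returned input gives one test case per path. The function returns `None` when the budget is exhausted without reaching the target path.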

V. RESULT
In this section, we present the results of implementing the TLBO-based test data generator. We then compare the results of the proposed algorithm with those of other state-of-the-art algorithms.

A. EXPERIMENT RESULTS
We implemented the algorithm for ten Java programs to generate test cases. The results indicate that the TLBO algorithm generates test cases with full (100%) path coverage. The objective fitness value is 80 for P1, 85 for P2, 92 for P3, 67 for P4, 83 for P5, 75 for P6, 67 for P7, 75 for P8, 75 for P9, and 66 for P10, as illustrated in the accompanying figure. We also measured the execution time for each program under test. For example, the execution time of P6 was 75.0 ms, which corresponds to 0.075 s, well under one second. Obtaining test cases that cover all code paths in less than one second indicates the efficiency of the proposed algorithm. Table 4 lists the objective fitness value and the execution time for each program.

B. COMPARISON WITH OTHER ALGORITHMS
Drawing on the results reported by Sahoo and Ray [35], Khari et al. [27], and Khari and Kumar [15], we compared hill climbing (HC), the bat algorithm (BA), the cuckoo algorithm (CS), the firefly algorithm (FA), the particle swarm algorithm (PSO), the artificial bee colony algorithm (ABC), and forest optimization with improved combined fitness (FOA-ICF) with TLBO based on the path coverage criterion. The global parameters of all compared algorithms are given in Table 5. Table 6 and Figure 8 show the results of the compared algorithms based on the path coverage criterion. For P1, TLBO achieves full path coverage, whereas CS obtains the lowest path coverage (20.0%). For P2, TLBO obtains the highest result with full coverage of 100%, whereas CS obtains the lowest path coverage (20.0%). For P3, TLBO obtains the highest result with full coverage, whereas HC obtains the lowest path coverage (42.15%). For P4, TLBO obtains full coverage of 100%, whereas CS obtains the lowest path coverage (25.0%). For P5 and P10, TLBO and FOA-ICF both achieve full path coverage of 100%. For P6, TLBO, PSO, ABC, and FA achieve full coverage. For P7, TLBO, PSO, ABC, and FA achieve full path coverage of 100%, whereas CS obtains the lowest path coverage (75.0%). For P8, TLBO, PSO, and ABC obtain the highest result with full path coverage of 100%, whereas FA obtains the lowest path coverage (80.0%). For P9, TLBO and FA both obtain full path coverage of 100%.

VI. CONCLUSION
The purpose of this study was to generate test cases for unit testing of object-oriented programs. We generated test cases for ten Java programs and covered all code paths using the TLBO algorithm. The experimental results indicate that the TLBO algorithm obtains the minimum number of test cases and full path coverage of 100% for all tested programs when compared with hill climbing, the bat algorithm, the cuckoo algorithm, the firefly algorithm, the particle swarm algorithm, and the artificial bee colony algorithm. The forest optimization algorithm, however, achieved the same full coverage of 100%. This study focused on object-oriented programs, but the proposed method can also be used in procedural programming. Likewise, although the proposed approach was adapted to generate test cases for unit testing, it can also be applied to integration testing of multiple classes.