Combinatorial Test Suites Generation Strategy Utilizing the Whale Optimization Algorithm

The potentially many combinations of software system inputs make exhaustive testing practically impossible. To address this issue, combinatorial t-way testing (where t is the interaction strength, i.e. the number of interacting input parameters) has been adopted to minimize the number of test cases. Complementary to existing testing techniques (e.g. boundary value analysis, equivalence partitioning, and cause-effect graphing), combinatorial testing helps to detect faults caused by faulty interactions between input parameters. Over the last 15 years, meta-heuristics used as the backbone of t-way test suite generation have shown promising results (e.g. Particle Swarm Optimization, Cuckoo Search, the Flower Pollination Algorithm, and High-Level Hyper-Heuristics (HHH), to name a few). In line with the No Free Lunch theorem, and with the potential to offer new insights into the whole process of t-way generation, this article proposes a new strategy with constraint support based on the Whale Optimization Algorithm (WOA). Our work is the first attempt to adopt WOA as part of a search-based software engineering (SBSE) initiative for t-way test suite generation with constraint support. The experimental results indicate that WOA produces competitive outcomes compared with selected single-solution-based and population-based meta-heuristic algorithms.


I. INTRODUCTION
Software testing ensures conformance to specifications and is often considered a determinant of software quality. In many situations, testers race against time to release software on schedule. In practice, however, exhaustive testing is impossible because of the time and resource constraints involved.
Combinatorial testing provides a convenient mechanism for minimizing the number of test cases by considering only a subset of the interactions between parameters. This approach is called t-way testing. The fundamental idea of t-way testing is that "a fault is usually caused by interactions of two or more system inputs (say, t number of parameters)" [1], [2]. Many applications of t-way testing have demonstrated encouraging results (e.g. at t = 6, almost 90 percent of faults can be triggered and detected). Nevertheless, combinatorial testing does not replace existing minimization strategies (such as boundary value analysis, equivalence partitioning, cause-effect graphing, and the like). Rather, it complements them.
The associate editor coordinating the review of this manuscript and approving it for publication was Seyedali Mirjalili.
To date, in line with the emergence of Search-Based Software Engineering (SBSE), a field that addresses optimization problems within the software engineering lifecycle, many works have adopted meta-heuristics for combinatorial t-way test suite generation. Examples include Particle Swarm Optimization (PSO) [3], Cuckoo Search (CS) [4], the Flower Pollination Algorithm (FPA) [5], the Ant Colony System (ACS) [6], and High-Level Hyper-Heuristics (HHH) [7].
The No Free Lunch theorem states that no single meta-heuristic is superior to all others across all optimization problems. In line with this idea, exploring new meta-heuristics is worthwhile. This article proposes a new strategy with constraint support for t-way test suite generation based on the Whale Optimization Algorithm (WOA), a recently developed algorithm based on the hunting behavior of humpback whales [8]. WOA has a strong global search capability owing to its distinctive optimization mechanism [9]. In addition, WOA depends on few parameters and is straightforward to implement [9]. It has therefore been applied in various domains, such as feature selection [10], clustering [11], flow shop scheduling [12], electronic engineering [13], energy [14], and electrical power [15], to name a few. It has shown competitive outcomes in these domains. Given its robust performance against many existing meta-heuristics, adopting WOA for combinatorial t-way test suite generation appears justified.
Complementing existing work on meta-heuristics for t-way testing, our contributions are two-fold. First, we present the first work that adopts WOA for t-way test suite generation. More precisely, we investigate the hypothesis that WOA is useful for SBSE applications involving constrained and unconstrained software test suite generation. Second, we extensively evaluate the performance of WOA on a set of benchmark test suites.
The remainder of this paper is organized as follows. Section II presents background on t-way testing through definitions and an example scenario, and Section III reviews related work. Sections IV and V introduce the Whale Optimization Algorithm (WOA) and its implementation for t-way testing, respectively. Section VI presents the experimental findings and discussion, and Section VII concludes the paper.

II. OVERVIEW OF T-WAY TESTING
To demonstrate t-way testing, let us consider the following hypothetical smart city planning example, as shown in Figure 1.
Smart city planning consists of five basic components (parameters): a transport system, e-service, smart traffic management, health cards, and water level monitoring. The transport system parameter takes three possible values (Transport System = Public Transport, e-hailing, Individual Vehicle). Each of the remaining parameters takes two possible values (e-Service = Wired, Wireless; Smart Traffic Management = Sensors, CCTV; Health Cards = Government Hospital, Private Hospital; and Water Level Monitoring = Tipping-bucket Rain Gauge, Hydrophone). Assuming an interaction strength of t = 3, Figure 2 shows the corresponding mixed covering array, $MCA(N; 3, 3^1 2^4)$. Exhaustive testing of this example requires 3 × 2 × 2 × 2 × 2 = 48 test cases to cover all smart city planning configurations. In contrast, when the meta-heuristic strategy (i.e. WOA) is used for 3-way testing, only 17 test cases are needed to cover all 3-way interactions. Mathematically, generating a test suite is a process of constructing an N × k array, where N is the number of test cases and k is the number of parameters. Every test case is a combination of k parameter values [1]. The test suite must include every combination of values of every set of t parameters, where the interaction strength t is the number of interacting parameters. Some important definitions of the terminology used are listed below:
• T-way testing is a combinatorial software testing method that examines the t-way interactions of every possible discrete combination of input parameters. Because it requires far fewer test cases, it can be done much faster than an exhaustive search of all combinations of all parameters.
• An interaction is a combination of two or more distinct parameters, each assigned a specific value.
• The Covering Array (CA) represents the test suite. It is denoted $CA(N; t, v^p)$ and is an N × p array, where p is the number of parameters (system configurations/user inputs), v is the number of values of each parameter (options of a system configuration/user input), t is the interaction strength, and N is the number of test cases generated. Every t-way interaction appears in at least one row. Minimizing the size of the test suite while retaining its fault detection capability is critical for coping with time and resource constraints while keeping fault detection effective [16].
• A constraint is expressed in terms of elements $C_{p_{no}, v_{no}}$, where C is the constraint, $p_{no}$ represents the parameter number in the t-tuple table, and $v_{no}$ represents the value number of that parameter in the t-tuple table. Section V further elaborates on these constraints.
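The definitions above can be checked mechanically. The following minimal Python sketch is our own illustration. The function names `t_tuples` and `is_covering_array` are hypothetical, and parameter values are encoded as indices 0, 1, 2 and so on. The sketch enumerates every t-way interaction and tests whether a suite covers them all. For the smart city example, it confirms the 48 exhaustive test cases and counts the 104 3-way interactions that any $MCA(N; 3, 3^1 2^4)$ must cover.

```python
from itertools import combinations, product


def t_tuples(values, t):
    """Yield every t-way interaction as a tuple of (parameter_index, value_index) pairs.

    values[i] is the number of values of parameter i.
    """
    for params in combinations(range(len(values)), t):
        for vals in product(*(range(values[p]) for p in params)):
            yield tuple(zip(params, vals))


def is_covering_array(suite, values, t):
    """True if every t-way interaction appears in at least one test case (row)."""
    return all(
        any(all(test[p] == v for p, v in tup) for test in suite)
        for tup in t_tuples(values, t)
    )


# Smart city example: one 3-valued parameter and four 2-valued parameters.
smart_city = [3, 2, 2, 2, 2]
exhaustive = list(product(*(range(v) for v in smart_city)))
print(len(exhaustive))                               # 48
print(sum(1 for _ in t_tuples(smart_city, 3)))       # 104
print(is_covering_array(exhaustive, smart_city, 3))  # True

# A 4-row pairwise covering array for three binary parameters, CA(4; 2, 2^3).
ca = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(is_covering_array(ca, [2, 2, 2], 2))           # True
print(is_covering_array(ca[:3], [2, 2, 2], 2))       # False: value pair (1, 1) of parameters 0 and 1 is lost
```

The check is brute force, because it scans the whole suite for every interaction. That is adequate for illustrating the definition but is not intended for large arrays.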

III. RELATED WORK
Many combinatorial interaction testing strategies use greedy test suite construction algorithms [17], in which every iteration aims to cover the maximum number of combinations. The test suite may be constructed either one parameter at a time (OPAT) or one test at a time (OTAT) [18]. OPAT approaches start by building a test suite for the first t parameters (the smallest t-combination). They then extend it horizontally, adding one parameter per iteration until all t-way requirements are satisfied. IPOG [19] and IPOG-D [20] are examples of this approach.
Unlike OPAT, OTAT approaches construct one complete test case (including all the parameters) per iteration, choosing it to cover the maximum number of uncovered combinations. The iterations continue until all the t-combinations are covered. Because of their good performance, OTAT methods have been applied in many studies, such as Jenny [21] and TConfig [22]. Several recent OTAT-based approaches use meta-heuristic algorithms to produce t-way test suites.
Meta-heuristic optimization algorithms produce adequate solutions within reasonable time for hard and complex problems in science and engineering. This explains the growing interest of researchers and scientists in this area. They solve optimization problems by imitating evolutionary behavior, swarm behavior, or the laws of physics [23].
To date, there is a fairly comprehensive literature on combinatorial testing, spanning various approaches. Nevertheless, these approaches share a common aspect: when combined with heuristics, they can harness random combinatorial search to construct t-strength covering arrays. Formulating the optimization problem involves two main decisions: the definition of the objective function, and the choice between a pure approach and a hybrid approach [39]. In terms of the objective function, the number of tuples covered by the candidate test case (i.e. its weight) is used as the fitness value. Alternatively, some strategies take the number of uncovered tuples as the cost of the candidate test case, which is to be minimized.
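The two objective formulations are closely related. Let U be the set of t-tuples not yet covered and τ a candidate test case (this notation is ours). Assume that the cost counts the tuples that would remain uncovered after adding τ. Then weight and cost are complementary, so maximizing one is equivalent to minimizing the other:

$$w(\tau) = \bigl|\{\, u \in U : \tau \text{ covers } u \,\}\bigr|, \qquad c(\tau) = |U| - w(\tau), \qquad \arg\max_{\tau} w(\tau) = \arg\min_{\tau} c(\tau)$$

For example, if 24 tuples are uncovered and a candidate covers 6 of them, then w(τ) = 6 and c(τ) = 18.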
As for the types of meta-heuristic algorithms applied in t-way combinatorial testing, Table 1 summarizes some of the algorithms introduced in the last five years, which are further explained below. Some of these algorithms were listed in [40]. We extend that list here with their more recent variants.
PSO was first applied to t-way testing in 2010 to generate a test suite. PSO imitates the behavior of flocks of birds searching for food. Each particle's new position is calculated from its current position and velocity. Each particle moves towards its own best position and towards the best global position (the optimal solution found so far) [3]. Owing to its fast convergence and low computational requirements, PSO has inspired many variant algorithms, including DPSO [42], SITG [43], and PSTG [51].
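For reference, below is the canonical inertia-weight form of the PSO update. This is a standard textbook formulation, and the exact variant used in [3] may differ. It moves particle i, with velocity $v_i$ and position $x_i$, towards its personal best $pbest_i$ and the global best $gbest$:

$$v_i(t+1) = \omega\, v_i(t) + c_1 r_1 \bigl(pbest_i - x_i(t)\bigr) + c_2 r_2 \bigl(gbest - x_i(t)\bigr), \qquad x_i(t+1) = x_i(t) + v_i(t+1)$$

Here ω is the inertia weight, $c_1$ and $c_2$ are acceleration coefficients, and $r_1$ and $r_2$ are random numbers in [0, 1].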
TCA [41] integrates greedy tabu search and a heuristic random walk. Greedy tabu search generates the initial test cases, and the heuristic search then extends the search to discover any uncovered interactions. Cuckoo Search (CS) [4] was applied to t-way combinatorial testing in 2015 and requires only a small number of control parameters. A variant of CS that enhances its search with Lévy flights has also been applied to t-way testing [31]. Also in 2015, the Flower Strategy (FS) [5] was introduced, based on the Flower Pollination Algorithm (FPA). The defining features of FPA include simplicity, flexibility, and low complexity.
The Ant Colony System (ACS) [6] is an AI-based strategy and a type of Ant Colony Optimization (ACO). ACS has been applied successfully to numerous combinatorial optimization problems. The ACS-based strategy supports all kinds of interactions, particularly input-output-based relations (IOR). Another strategy, the Bat-inspired Testing Strategy (BTS), uses the Bat Algorithm (BA) as its main search engine to obtain an optimal test suite size [44].
In 2017, several further strategies were applied to t-way testing, such as the Artificial Bee Colony (ABC) algorithm and the Teaching-Learning-Based Optimization (TLBO) algorithm. ABC imitates the foraging behavior of a honey bee colony, and several ABC variants have been applied to t-way testing. For instance, the Pairwise Artificial Bee Colony algorithm (PABC) [45] was applied to 2-way testing. The Artificial Bee Colony Strategy (ABCS) [46] was applied to interaction strengths of up to ten (i.e. t ≤ 10). TLBO mimics a classroom environment with two phases: a teacher phase (global search) and a learner phase (local search). TLBO was applied to pairwise (i.e. 2-way) testing to generate test suites [47]. A variant called Adaptive TLBO (ATLBO) was applied to t-way testing in another study [18]. ATLBO uses a Mamdani fuzzy inference system to improve the selection between global search and local search [18].
Two novel algorithms were applied to t-way testing in 2018: the firefly algorithm (FA) and the kidney algorithm (KA). FA is inspired by the flashing patterns of fireflies, which attract mates and scare away predators. FATG, a strategy based on FA, was introduced to minimize the test suite and reduce execution time [48]. KA emulates the role of the kidneys in the human body and involves two main procedures: filtration (local search) and reabsorption (global search). The Pairwise Kidney Strategy (PKS) was developed based on KA to generate smaller test suites [49]. The Improved Jaya Algorithm (IJA) [50] is a population-based algorithm developed to address constrained and unconstrained problems. The key idea behind the algorithm is that every candidate solution moves towards the best solution while simultaneously moving away from the worst solution. In t-way testing, IJA first updates the best and worst test cases. It then updates the current test case based on them. IJA introduces Lévy flights to improve diversity and solution quality, and a mutation operation to improve the convergence speed when generating a test suite [50].
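The best/worst idea behind Jaya can be written out explicitly. In the original Jaya algorithm, variable j of candidate k is updated as follows. This is the standard formulation; IJA's Lévy flight and mutation operators [50] are not reproduced here.

$$X'_{j,k} = X_{j,k} + r_{1,j}\bigl(X_{j,best} - |X_{j,k}|\bigr) - r_{2,j}\bigl(X_{j,worst} - |X_{j,k}|\bigr)$$

Here $r_{1,j}$ and $r_{2,j}$ are random numbers in [0, 1]. The first term pulls the candidate towards the best solution, and the second pushes it away from the worst.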
The Multiple Black Hole (MBH) algorithm [39] was applied to combinatorial testing in 2020. The Black Hole algorithm is a population-based meta-heuristic inspired by the black hole phenomenon and by the behavior of stars interacting with a black hole. The stars are the solutions (test cases), and the best star (test case) is selected as the black hole. All other solutions move towards it based on their current locations and a random number. MBH follows a multi-swarm principle, i.e. it maintains multiple black holes. Additionally, MBH introduces black hole energy to remove certain black hole swarms and to produce fresh ones [39]. Another algorithm introduced in 2020 is the Sine Cosine Algorithm (SCA) strategy [17]. It is a population-based algorithm that produces numerous initial random test cases and lets them fluctuate outwards from or towards the best test case using a sine- and cosine-based mathematical model. SCA was enhanced by combining linear and exponential magnitude updates for the search displacement [17].
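For reference, the standard update rules of the two base algorithms are given below, as in their original formulations; the variants in [39] and [17] modify them. In the Black Hole algorithm, star $x_i$ moves towards the black hole $x_{BH}$. In SCA, solution $X_i$ oscillates around the destination point $P_i$, which is the best solution so far:

$$x_i(t+1) = x_i(t) + \text{rand} \times \bigl(x_{BH} - x_i(t)\bigr)$$

$$X_i^{t+1} = \begin{cases} X_i^{t} + r_1 \sin(r_2)\,\bigl|r_3 P_i^{t} - X_i^{t}\bigr| & \text{if } r_4 < 0.5 \\ X_i^{t} + r_1 \cos(r_2)\,\bigl|r_3 P_i^{t} - X_i^{t}\bigr| & \text{if } r_4 \geq 0.5 \end{cases}$$

Here rand and $r_4$ are random numbers in [0, 1], $r_2 \in [0, 2\pi]$, and $r_3 \in [0, 2]$. The parameter $r_1$ is the magnitude of the displacement, which the original SCA decreases linearly over the iterations.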

IV. WHALE OPTIMIZATION ALGORITHM
Several meta-heuristic algorithms have been applied in the SBSE field, such as Greedy Search, Simulated Annealing, Genetic Algorithms, Tabu Search, and even the Whale Optimization Algorithm (WOA). However, WOA has been applied to regression testing via hybridization with an Artificial Neural Network (ANN) [52]. Harikarthik et al. [52] investigated the effectiveness of WOA in regression testing by using it to optimize the weights of an ANN. To the best of our knowledge, no study has yet used WOA for the t-way test suite generation problem.
Therefore, it appears that the SBSE research community has not fully explored the potential of WOA.
In 2016, Mirjalili and Lewis [8] introduced WOA, a modern nature-inspired AI-based algorithm that imitates the hunting behavior of humpback whales. Humpback whales are intelligent and hunt collaboratively in sophisticated ways. They use a special foraging technique known as bubble-net feeding, as shown in Figure 3. The whales create distinctive bubbles along a circular or '9'-shaped path and hunt near the surface, trapping their prey in a net of bubbles. WOA has two stages: exploitation and exploration. The exploitation stage uses the prey-encircling method and the spiral bubble-net attacking technique. Both techniques update the position of the current search agent using the location of the best search agent. However, the spiral bubble-net attacking technique also includes a random component (an exploratory element), as seen in Equation (6). In the exploration stage, a random search is conducted. The position of the current search agent is updated based on a randomly selected search agent, as illustrated in Algorithm 1. The mathematical model of WOA is specified below.

A. EXPLOITATION PHASE
The two mechanisms used in this phase are as follows.
1) Encircling Prey: Humpback whales can identify the position of their prey and then encircle it. In WOA, the target prey is assumed to be the current best candidate solution. Once the best search agent has been located, all other search agents attempt to move towards it. In other words, each agent updates its location around the prey according to the following mathematical model:

$$D = \left| C \cdot X^{*}(t) - X(t) \right| \tag{1}$$

$$X(t+1) = X^{*}(t) - A \cdot D \tag{2}$$

Here t represents the current iteration, $X^{*}$ represents the best solution obtained so far, and X is the current solution. A and C are coefficients computed using Equations (3) and (4), respectively:

$$A = 2a \cdot r - a \tag{3}$$

$$C = 2 \cdot r \tag{4}$$

Here r is a random number in [0, 1], drawn independently for A and C. The value of a is reduced linearly from 2 to 0 over the iterations, as shown in Equation (5):

$$a = 2 - t \cdot \frac{2}{t_{max}} \tag{5}$$

Here $t_{max}$ is the maximum number of iterations.
2) Bubble-net Attacking Technique: This method involves two mechanisms. The first is a shrinking encircling mechanism, achieved by decreasing the value of a in Equation (3). With |A| < 1, the new location of a search agent lies between its original location and the location of the current best agent. The second is a spiral updating position mechanism. It calculates the distance between the current solution (whale) and the best solution (prey) and moves the whale along a logarithmic spiral, as in Equation (6):

$$X(t+1) = D' \cdot e^{bl} \cdot \cos(2\pi l) + X^{*}(t), \qquad D' = \left| X^{*}(t) - X(t) \right| \tag{6}$$

Here D' is the distance between the whale and the prey, b is a constant defining the shape of the logarithmic spiral, and l is a random number in [−1, 1]. Humpback whales use both mechanisms simultaneously. To model this behavior, each mechanism is chosen with a 50% probability when updating the location of a whale during the search, as outlined by Equation (7):

$$X(t+1) = \begin{cases} X^{*}(t) - A \cdot D & \text{if } p < 0.5 \\ D' \cdot e^{bl} \cdot \cos(2\pi l) + X^{*}(t) & \text{if } p \geq 0.5 \end{cases} \tag{7}$$

Here p is a random number in [0, 1].

B. EXPLORATION PHASE
In the exploration phase, WOA performs a global search, in which the whales search randomly according to each other's positions. The location of a search agent is updated with respect to a randomly chosen search agent instead of the best search agent found so far. This mechanism is used when |A| > 1. It forces the search agent to move far away from the reference whale (here, the randomly chosen agent), which emphasizes global search and lets WOA perform exploration. The mathematical model for this step is outlined by Equations (8) and (9):

$$D = \left| C \cdot X_{rand} - X(t) \right| \tag{8}$$

$$X(t+1) = X_{rand} - A \cdot D \tag{9}$$

Here $X_{rand}$ is the position vector of a whale (search agent) chosen at random from the current population.
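To make the update rules concrete, the following minimal Python sketch implements Algorithm 1 with Equations (1)-(9) for a continuous minimization problem. It is not the implementation used in this study, and `woa_minimize` is a hypothetical name. It rests on our own assumptions:
- identical box bounds in every dimension;
- b = 1;
- out-of-range agents are clipped back to the bounds;
- the sphere function serves as a toy objective.

```python
import math
import random


def woa_minimize(f, dim, lower, upper, pop_size=30, max_iter=200, b=1.0, seed=0):
    """Minimal continuous WOA following Algorithm 1 and Eqs. (1)-(9).

    Assumptions (not from the paper): the same bounds [lower, upper] in every
    dimension, spiral constant b = 1, and out-of-range positions clipped to the bounds.
    """
    rng = random.Random(seed)

    def clip(v):
        return min(max(v, lower), upper)

    whales = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(pop_size)]
    best = min(whales, key=f)[:]
    best_f = f(best)

    for t in range(max_iter):
        a = 2 - t * (2 / max_iter)               # Eq. (5): a decreases linearly from 2 to 0
        for i, x in enumerate(whales):
            A = 2 * a * rng.random() - a         # Eq. (3)
            C = 2 * rng.random()                 # Eq. (4)
            p = rng.random()                     # mechanism selector of Eq. (7)
            ell = rng.uniform(-1, 1)             # l in Eq. (6)
            if p < 0.5:
                # |A| < 1: encircle the best whale (Eqs. 1-2);
                # otherwise explore around a randomly chosen whale (Eqs. 8-9).
                ref = best if abs(A) < 1 else whales[rng.randrange(pop_size)]
                new = [ref[j] - A * abs(C * ref[j] - x[j]) for j in range(dim)]
            else:
                # Spiral bubble-net update around the best whale (Eq. 6).
                new = [abs(best[j] - x[j]) * math.exp(b * ell) * math.cos(2 * math.pi * ell) + best[j]
                       for j in range(dim)]
            whales[i] = [clip(v) for v in new]   # amend agents that leave the search space
        for x in whales:                         # update X* if there is a better solution
            fx = f(x)
            if fx < best_f:
                best, best_f = x[:], fx
    return best, best_f


def sphere(x):
    return sum(v * v for v in x)


best, best_f = woa_minimize(sphere, dim=5, lower=-10.0, upper=10.0)
print(best_f)  # should be close to 0, the minimum of the sphere function
```

Because a decreases linearly, |A| < 1 holds for every agent in the second half of the run, so the later iterations only exploit around X*. In the early iterations, |A| can exceed 1, so the search also explores around random whales.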

V. IMPLEMENTATION OF WOA
The WOA-based approach automatically generates a test suite while minimizing the number of test cases. Figure 4 presents an overview of the WOA implementation for t-way testing, which consists of two phases:

A. T-TUPLE TABLE GENERATION
The outcome of this phase is the t-tuple table. It lists all the t-tuples (each an ordered list of t parameter values) that the test suite must cover.
Algorithm 1 Pseudo-Code of the WOA
Initialize the whale population X_i (i = 1, 2, ..., n)
Calculate the fitness of each search agent
X* = the best search agent
t = 0
while t < maximum number of iterations do
    for each search agent do
        Update a, A, C, l, and p
        if p < 0.5 then
            if |A| < 1 then
                Update the position of the current search agent using Eq (2)
            else (|A| ≥ 1)
                Select a random search agent X_rand
                Update the position of the current search agent using Eq (9)
            end if
        else (p ≥ 0.5)
            Update the position of the current search agent using Eq (6)
        end if
    end for
    Check if any search agent goes beyond the search space and amend it
    Calculate the fitness of each search agent
    Update X* if there is a better solution
    t = t + 1
end while
return X*

To generate the t-tuple table, four steps are taken, as illustrated in Figure 4. First, the system configuration or user input for the software under test is obtained. Second, the interaction strength (t) of the t-way testing is decided. Third, the parameter combinations are generated. For example, with 4 parameters (a, b, c, and d) and an interaction strength of t = 2, the 2-way combinations are ab, ac, ad, bc, bd, and cd. Finally, the t-tuple table is generated from these parameter combinations and the values of the parameters. In the previous example, if each of the 4 parameters (a, b, c, d) has 2 values (0, 1), the t-tuple table is as shown in Table 2. Each x is a don't-care position that is filled randomly with one of the parameter values (0 or 1) during the search.
In the presence of constraints, the forbidden combinations are obtained in the first step, together with the system configuration or user input for the software under test.
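The four steps can be sketched in Python as follows. This is a minimal illustration under our own assumptions. Parameter values are encoded as digits. Tuples are stored as strings with 'x' as the don't-care symbol, mirroring the notation of Table 2, and the row order may differ from that of Table 2. `t_tuple_table` is a hypothetical helper name.

```python
from itertools import combinations, product


def t_tuple_table(values, t):
    """Build the t-tuple table as strings such as '1xx0', where 'x' is a don't-care.

    values[i] is the number of values of parameter i (assumed < 10 so that each
    value fits in one character).
    """
    k = len(values)
    table = []
    for params in combinations(range(k), t):                      # step 3: parameter combinations
        for vals in product(*(range(values[p]) for p in params)):  # step 4: value combinations
            row = ['x'] * k
            for p, v in zip(params, vals):
                row[p] = str(v)
            table.append(''.join(row))
    return table


# Steps 1 and 2: four binary parameters (a, b, c, d) and interaction strength t = 2.
print([''.join(c) for c in combinations('abcd', 2)])  # ['ab', 'ac', 'ad', 'bc', 'bd', 'cd']
table = t_tuple_table([2, 2, 2, 2], t=2)
print(len(table))   # 24 = 6 parameter pairs x 4 value pairs
print(table[:4])    # ['00xx', '01xx', '10xx', '11xx']
```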

B. TEST SUITE GENERATION
The t-tuple table generated in the previous phase is the input to this phase, in which WOA attempts to cover all of its cells (interaction elements) with as few test cases as possible. As illustrated in Figure 4, WOA applies the four steps shown in that figure repeatedly until the t-tuple table becomes empty. Figure 5 shows the elimination process in the t-tuple table. WOA searches for the best test case based on its weight, i.e. the number of uncovered interactions in the t-tuple table that it covers. In Figure 5, the best test case has a weight of six, which means that it covers six interactions in the t-tuple table: 1xx0, x1x0, xx10, 11xx, 1x1x, and x11x (i.e. the test case is 1110). The covered interactions are then removed from the t-tuple table, and the best test case is added to the test suite array. This process continues until the t-tuple table becomes empty, in other words, until all the cells (i.e. interactions) in the t-tuple table are covered.
VOLUME 8, 2020
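The weight computation and elimination step can be illustrated with a small Python sketch on the same four-parameter example. This is our own illustration, not the authors' implementation. Tuples are strings with 'x' as the don't-care symbol, and the helper names `covers` and `weight` are hypothetical. In the repeat-until-empty loop, a brute-force scan over all 16 possible test cases stands in for the WOA search.

```python
from itertools import combinations, product


def covers(test, tup):
    """True if the test case matches every non-'x' position of the t-tuple."""
    return all(c in ('x', s) for c, s in zip(tup, test))


def weight(test, table):
    """Weight: the number of still-uncovered t-tuples the candidate test case covers."""
    return sum(covers(test, tup) for tup in table)


# 2-way t-tuple table for four binary parameters ('x' = don't care).
table = []
for params in combinations(range(4), 2):
    for vals in product('01', repeat=2):
        row = ['x'] * 4
        for p, v in zip(params, vals):
            row[p] = v
        table.append(''.join(row))
all_tuples = list(table)  # kept only to verify coverage at the end

# Figure 5 step: test case 1110 covers 1xx0, x1x0, xx10, 11xx, 1x1x and x11x.
suite = ['1110']
print(weight('1110', table))   # 6
table = [tup for tup in table if not covers('1110', tup)]
print(len(table))              # 18

# Repeat until the table is empty. A brute-force scan of all 16 test cases
# stands in for the WOA search here (a simplification for illustration).
candidates = [''.join(c) for c in product('01', repeat=4)]
while table:
    best = max(candidates, key=lambda c: weight(c, table))
    suite.append(best)
    table = [tup for tup in table if not covers(best, tup)]

print(all(any(covers(test, tup) for test in suite) for tup in all_tuples))  # True
```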
In the presence of constraints, each time WOA updates a solution (i.e. generates a new one), the new solution is checked against the forbidden combinations. This check ensures that the search does not converge to a forbidden combination.
Consider the example in Figure 6 of $CCA(N; 2, 2^4, F)$. This constrained covering array (CCA) consists of 4 parameters, each with 2 values, at an interaction strength of 2. The set of forbidden combinations, F, contains two constraints, each consisting of a pair of tuples. The first constraint is $(C_{p_1,v_2}, C_{p_2,v_1})$. Its first tuple, $C_{p_1,v_2}$, denotes parameter one with value two, and its second tuple, $C_{p_2,v_1}$, denotes parameter two with value one. Thus, the first forbidden combination is (10xx), as shown in Figure 6. Similarly, the second forbidden combination is (xx00), where x is a 'don't-care' value.
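The constraint check can be sketched as follows. The sketch assumes that value one is encoded as 0 and value two as 1, as the mapping of $C_{p_1,v_2}$ to 10xx implies. The names `FORBIDDEN`, `matches` and `is_forbidden` are hypothetical. The sketch also adds one step as an assumption of ours: t-tuples that no valid test case can cover (here 10xx and xx00 themselves) are excluded from the t-tuple table. Otherwise the table could never become empty. Enumerating all test cases to find them is feasible only for tiny examples such as this one.

```python
from itertools import combinations, product

# Figure 6 with value one encoded as 0 and value two as 1: (C_{p1,v2}, C_{p2,v1}) -> 10xx.
FORBIDDEN = ['10xx', 'xx00']


def matches(test, pattern):
    """True if the test case agrees with every non-'x' position of the pattern."""
    return all(c in ('x', s) for c, s in zip(pattern, test))


def is_forbidden(test):
    """A candidate test case is rejected if it contains any forbidden combination."""
    return any(matches(test, f) for f in FORBIDDEN)


print(is_forbidden('1011'))   # True: parameter 1 = 1 and parameter 2 = 0
print(is_forbidden('0100'))   # True: parameters 3 and 4 are both 0
print(is_forbidden('1110'))   # False

# Hypothetical extra step: drop t-tuples that no valid test case can cover.
all_tests = [''.join(c) for c in product('01', repeat=4)]
valid_tests = [tc for tc in all_tests if not is_forbidden(tc)]
table = []
for params in combinations(range(4), 2):
    for vals in product('01', repeat=2):
        row = ['x'] * 4
        for p, v in zip(params, vals):
            row[p] = v
        table.append(''.join(row))
coverable = [tup for tup in table if any(matches(tc, tup) for tc in valid_tests)]
print(len(valid_tests), len(table), len(coverable))   # 9 24 22
```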
For WOA itself, the generation process begins with a set of random solutions (the initial population), which are evaluated using a fitness function to find the best solution. The algorithm then repeatedly executes the following steps until the stopping criterion is met. First, the coefficients are updated. Second, depending on the random values of A and p, the position of each solution is updated using Equation (2), Equation (9), or Equation (6). Finally, WOA returns the best solution obtained.
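Putting the pieces together, the generation loop can be sketched end to end in Python. The paper does not specify how continuous whale positions are mapped to discrete test cases. This sketch therefore assumes that each coordinate is rounded and clipped to a value index. The following are also our own hypothetical choices rather than the authors' implementation:
- seeding one whale from an uncovered tuple;
- dropping forbidden tuples from the table;
- all names (`generate_suite`, `decode`, `fitness`).

```python
import math
import random
from itertools import combinations, product


def generate_suite(values, t, forbidden=(), pop_size=20, max_iter=30, b=1.0, seed=1):
    """Hypothetical WOA-driven one-test-at-a-time generator (a sketch, not the authors' code).

    Assumed encoding: a whale is a real vector decoded into a test case by rounding
    and clipping each coordinate to a valid value index. Fitness is the weight of
    the decoded test case; test cases containing a forbidden combination score -1.
    `forbidden` is a sequence of {parameter_index: value_index} dicts.
    """
    rng = random.Random(seed)
    k, hi = len(values), [v - 1 for v in values]

    def decode(x):
        return tuple(min(max(round(xj), 0), hi[j]) for j, xj in enumerate(x))

    def is_bad(assign):
        return any(all(assign.get(q) == v for q, v in f.items()) for f in forbidden)

    # t-tuple table without the tuples that are themselves forbidden.
    table = [tuple(zip(ps, vs)) for ps in combinations(range(k), t)
             for vs in product(*(range(values[q]) for q in ps))]
    table = [tup for tup in table if not is_bad(dict(tup))]

    def fitness(x):
        test = decode(x)
        if is_bad(dict(enumerate(test))):
            return -1
        return sum(all(test[q] == v for q, v in tup) for tup in table)

    suite = []
    while table:
        whales = [[rng.uniform(0, hi[j]) for j in range(k)] for _ in range(pop_size - 1)]
        # Seed one whale from the first uncovered tuple (random don't-cares) so that
        # the best whale covers at least one tuple whenever a valid filling is found.
        fixed = dict(table[0])
        for _ in range(100):
            seeded = [float(fixed.get(j, rng.randint(0, hi[j]))) for j in range(k)]
            if fitness(seeded) > 0:
                break
        whales.append(seeded)
        best = max(whales, key=fitness)[:]
        best_f = fitness(best)
        for it in range(max_iter):
            a = 2 - it * (2 / max_iter)                                   # Eq. (5)
            for i, x in enumerate(whales):
                A, C = 2 * a * rng.random() - a, 2 * rng.random()        # Eqs. (3), (4)
                p, ell = rng.random(), rng.uniform(-1, 1)
                if p < 0.5:
                    ref = best if abs(A) < 1 else whales[rng.randrange(pop_size)]
                    new = [ref[j] - A * abs(C * ref[j] - x[j]) for j in range(k)]  # Eq. (2) or (9)
                else:
                    new = [abs(best[j] - x[j]) * math.exp(b * ell) * math.cos(2 * math.pi * ell)
                           + best[j] for j in range(k)]                   # Eq. (6)
                whales[i] = [min(max(v, 0.0), hi[j]) for j, v in enumerate(new)]
                fit = fitness(whales[i])
                if fit > best_f:
                    best, best_f = whales[i][:], fit
        if best_f <= 0:
            raise RuntimeError("no valid test case covers the remaining t-tuples")
        test = decode(best)
        suite.append(test)
        table = [tup for tup in table if not all(test[q] == v for q, v in tup)]
    return suite


# Figure 6 example: CCA(N; 2, 2^4, F) with F = {10xx, xx00}.
print(generate_suite([2, 2, 2, 2], 2, forbidden=[{0: 1, 1: 0}, {2: 0, 3: 0}]))
```

Each call to the inner WOA returns one test case, so the outer loop follows the one-test-at-a-time (OTAT) scheme of Section III. The population size and iteration count here are small illustrative defaults, not the tuned values (180 and 100) reported in Section VI.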

VI. EXPERIMENT AND DISCUSSION
Our experiments aim to demonstrate the efficiency of WOA, measured by the size of the generated test suite. WOA is compared with other existing, well-known population-based meta-heuristic algorithms and with pure computational strategies.
To characterize the computational cost of our strategy, we analyzed its time complexity based on the structure of our implementation described in Section V, which is displayed in Figure 7. Assuming that all other operations take constant time, the time complexity of our strategy is $O(E \times B \times G) \approx O(n^3)$. Big-O notation describes how execution time grows with problem size. However, only a few studies report measured code execution times, and comparing the performance of meta-heuristic algorithms on execution time is subject to valid threats [7]. Implementations may differ in language (e.g. Java versus C versus MATLAB), data structures, system configuration, and running environment. For these reasons, a comparison of execution times is deemed unfair. Other researchers have made the same observation [18], [53].
We split our experiments into three parts. First, we systematically tuned the parameters. Second, we evaluated and compared the WOA strategy with existing population-based meta-heuristic algorithms. Lastly, we benchmarked our strategy against existing constraint-supporting strategies. In addition, we applied Wilcoxon's signed-rank test to all reported results.

A. PARAMETER TUNING
One of the advantages of WOA is that it has fewer parameters than other meta-heuristic algorithms, such as PSO, HS, and GA, to name a few. However, the population size and the maximum number of iterations still need to be tuned. Too many iterations waste computation once the search stops producing better solutions. Too few iterations may prevent the best candidate solution from being reached. Similarly, a large population size raises the computational cost, while a small one hinders a good solution from being obtained. Hence, the maximum number of iterations and the population size must be selected carefully together. The covering array $CA(N; 2, 5^7)$ was chosen as a case study for tuning the parameters because many AI-based approaches are tuned using the same covering array [54]-[56].
To tune the WOA parameters, the WOA strategy for $CA(N; 2, 5^7)$ was executed 20 times for each tested combination of population size and maximum number of iterations. First, the population size was fixed and the maximum number of iterations was varied (10, 25, 50, 75, 100, 125, 150, 175, and 200). Then, in the reverse experiments, the maximum number of iterations was fixed and the population size was varied (10, 30, 50, 70, 100, 120, 140, 160, 180, and 200). Table 3 reports the best test suite sizes and the best execution times. Table 4 reports the average test suite sizes and the average execution times. Execution times are given in seconds, and darkened cells indicate the smallest (optimal) size.
The results in Tables 3 and 4 suggest that a larger population size tends to yield better results, whereas a population size that is too small leads to worse results. However, the largest population size (i.e. 200) did not necessarily produce better results, while it increased the execution time. Likewise, the highest iteration value (i.e. 200) did not always yield the optimal size. In terms of the best size, the best results were obtained with population sizes between 70 and 200. The results generally improved as the number of iterations increased, with the best results obtained for 75 to 175 iterations. In terms of the best average size, the best results were obtained with population sizes between 120 and 200 and maximum numbers of iterations between 100 and 175.
In Table 4, the best average results are marked in bold. We highlight two of them. The first was achieved with a population size of 180 and a maximum of 100 iterations. The second was achieved with a population size of 180 and a maximum of 150 iterations. Because the execution time grows with both the population size and the maximum number of iterations, we chose the cheaper of these two settings: a population size of 180 and a maximum of 100 iterations.

B. BENCHMARKING WOA STRATEGY WITH EXISTING STRATEGIES
To assess the performance of WOA, we benchmarked it against other existing strategies in terms of CA size. We used two well-known benchmark configurations, varying the interaction strength t from 2 to 6 in both: 1) $CA(t, v^7)$, where the number of parameters is kept constant and their number of values is varied; and 2) $CA(t, 3^P)$, where the number of parameters is varied and their number of values is kept constant.
The experiments were run on a laptop with 64-bit Windows 10, a 2.71 GHz Intel Core i5 CPU, and 8 GB of RAM. The proposed strategy was implemented in Java. Table 5 shows the parameter settings of each meta-heuristic algorithm used in the comparison.
Table 6 reports the results for $CA(t, 3^P)$, where $2 \le t \le 6$, $3 \le P \le 12$, and v is fixed at 3. Results are given as both the best and the average test suite size over 30 repeated runs (for statistical significance) [17]. The results reveal that WOA outperformed all the pure computational strategies and most of the AI-based strategies, including GBGA, PSO, CS, and ABCVS. Moreover, WOA produced results competitive with those of the GS and APSO strategies, bearing in mind that we used the standard WOA without any modifications.
Table 6 also shows that, compared with the other AI-based strategies, WOA produced better results as the search space grew larger. We attribute this to WOA's strong exploration ability. Conversely, its weaker exploitation may limit its performance in small search spaces.
Table 7 reports the results for $CA(t, v^7)$, where $2 \le t \le 6$, $2 \le v \le 7$, and p is fixed at 7. The results show that WOA outperformed all the pure computational strategies and most of the AI-based strategies, including PSO, CS, and APSO. In addition, WOA yielded results competitive with those of the GS strategy, even though the standard WOA was used.
Similarly, Table 7 shows that WOA delivered better results than the other AI-based strategies in larger search spaces, where its strong exploration is an advantage.
TABLE 6. Test suite size performance for $CA(t, 3^P)$ where P was varied from 3 to 12 and t was varied from 2 to 6.
TABLE 7. Test suite size performance for $CA(t, v^7)$ where v was varied from 2 to 7 and t was varied from 2 to 6.
To verify whether the differences between WOA and the other existing strategies are statistically significant, the Wilcoxon signed-rank test was conducted. It is a nonparametric test for matched (paired) data that considers not only the signs of the paired differences but also their magnitudes, through the ranks of the absolute differences. The Wilcoxon signed-rank test was used because it indicates whether a significant difference exists between two sets of results.
The Wilcoxon signed-rank test produces two kinds of output. The first comprises the Asymp. Sig. (2-tailed) value and the Z statistic, which indicate whether the two groups differ. An Asymp. Sig. (2-tailed) value smaller than 0.05 implies a significant difference between the two groups. Although Z is not interpreted further in this study, it is reported for completeness. The second is the ranks, which count the cases in which one strategy's value is less than, equal to, or greater than the other's.
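The statistics named above can be reproduced with a short routine. The sketch below is a textbook Wilcoxon signed-rank test with the large-sample normal approximation (tie-corrected variance, no continuity correction), reporting Z from the smaller rank sum as statistics packages commonly do; it is not the tool the authors used, and the erf approximation (Abramowitz-Stegun 7.1.26) is our own choice. It uses a record, so it assumes Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WilcoxonSignedRank {

    /** Sum of positive ranks, sum of negative ranks, Z, and two-tailed p-value. */
    record Result(double wPlus, double wMinus, double z, double pTwoTailed) {}

    /** Paired test of x against y (e.g., x = WOA suite sizes, y = a rival strategy's sizes). */
    static Result test(double[] x, double[] y) {
        if (x.length != y.length) throw new IllegalArgumentException("samples must be paired");
        // 1. Paired differences; zero differences (tied pairs) are discarded.
        List<Double> d = new ArrayList<>();
        for (int k = 0; k < x.length; k++) {
            if (x[k] != y[k]) d.add(x[k] - y[k]);
        }
        int n = d.size();
        if (n == 0) return new Result(0, 0, 0, 1.0);

        // 2. Rank |d| in ascending order, giving tied magnitudes their average rank.
        Integer[] order = new Integer[n];
        for (int k = 0; k < n; k++) order[k] = k;
        Arrays.sort(order, (a, b) -> Double.compare(Math.abs(d.get(a)), Math.abs(d.get(b))));
        double[] rank = new double[n];
        double tieTerm = 0.0; // sum of (t^3 - t) over groups of tied magnitudes
        int i = 0;
        while (i < n) {
            int j = i;
            while (j + 1 < n && Math.abs(d.get(order[j + 1])) == Math.abs(d.get(order[i]))) j++;
            double avgRank = (i + j) / 2.0 + 1.0; // ranks are 1-based
            for (int k = i; k <= j; k++) rank[order[k]] = avgRank;
            double t = j - i + 1;
            tieTerm += t * t * t - t;
            i = j + 1;
        }

        // 3. Rank sums for positive and negative differences.
        double wPlus = 0.0, wMinus = 0.0;
        for (int k = 0; k < n; k++) {
            if (d.get(k) > 0) wPlus += rank[k]; else wMinus += rank[k];
        }

        // 4. Normal approximation with tie-corrected variance.
        double mean = n * (n + 1) / 4.0;
        double variance = n * (n + 1) * (2.0 * n + 1) / 24.0 - tieTerm / 48.0;
        double z = variance > 0 ? (Math.min(wPlus, wMinus) - mean) / Math.sqrt(variance) : 0.0;
        double p = Math.min(1.0, 2.0 * normalCdf(-Math.abs(z)));
        return new Result(wPlus, wMinus, z, p);
    }

    /** Standard normal CDF via an erf approximation (absolute error below 1.5e-7). */
    static double normalCdf(double z) {
        return 0.5 * (1.0 + erf(z / Math.sqrt(2.0)));
    }

    /** Abramowitz-Stegun 7.1.26 approximation, extended to negative x by odd symmetry. */
    static double erf(double x) {
        double sign = Math.signum(x);
        double ax = Math.abs(x);
        double t = 1.0 / (1.0 + 0.3275911 * ax);
        double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                + t * (-1.453152027 + t * 1.061405429))));
        return sign * (1.0 - poly * Math.exp(-ax * ax));
    }

    public static void main(String[] args) {
        // Hypothetical suite sizes for illustration only (not values from Tables 6 and 7).
        double[] woa   = {10, 12, 15, 9, 20};
        double[] rival = {12, 14, 18, 11, 25};
        Result r = test(woa, rival);
        System.out.printf("W+ = %.1f, W- = %.1f, Z = %.3f, p = %.4f%n",
                r.wPlus(), r.wMinus(), r.z(), r.pTwoTailed());
    }
}
```

In this toy example every difference favours WOA, so W+ is 0 and p falls below 0.05. With very few pairs the normal approximation is rough; an exact test would be preferable in that case.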
In all the tables presenting the statistical results, in the ranks part, ''WOA <'' indicates the number of cases in which the WOA strategy generated a smaller CA than the other strategy (i.e., a pure computational or AI-based strategy); in other words, it counts the times WOA generated better results. Similarly, ''WOA ='' indicates the number of times the results were the same, while ''WOA >'' represents the number of times WOA produced worse results. Table 8 presents the results of the Wilcoxon test applied to the results reported in Tables 6 and 7. Table 8 shows that the WOA strategy generated better outcomes than the pure computational strategies, confirming the superiority of WOA over these strategies. As for the AI-based strategies, WOA also produced significantly different outcomes compared to PSO, CS, and ABCVS. Meanwhile, WOA produced results statistically comparable to those of GBGA, GS, and APSO; note, however, that these strategies have been modified and enhanced, whereas the WOA used here was not.
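The ''WOA <'', ''WOA ='', and ''WOA >'' counts are simple paired comparisons of suite sizes, where smaller is better. The sketch below shows that tally; the class name and the sample sizes are hypothetical and are not taken from Tables 6 to 8.

```java
public class RankCounts {

    /** Returns {less, equal, greater}: how often WOA's suite size is <, ==, > the rival's. */
    static int[] count(int[] woa, int[] rival) {
        if (woa.length != rival.length) throw new IllegalArgumentException("samples must be paired");
        int[] c = new int[3];
        for (int i = 0; i < woa.length; i++) {
            int cmp = Integer.compare(woa[i], rival[i]);
            c[cmp < 0 ? 0 : (cmp == 0 ? 1 : 2)]++; // index 0 = WOA better, 2 = WOA worse
        }
        return c;
    }

    public static void main(String[] args) {
        int[] woa = {9, 12, 15, 30};   // hypothetical sizes
        int[] rival = {10, 12, 14, 33};
        int[] c = count(woa, rival);
        System.out.printf("WOA < : %d, WOA = : %d, WOA > : %d%n", c[0], c[1], c[2]);
        // prints "WOA < : 2, WOA = : 1, WOA > : 1"
    }
}
```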

C. BENCHMARKING WOA STRATEGY IN THE PRESENCE OF CONSTRAINTS AGAINST FIVE DIFFERENT ALGORITHMS
In this section, we present our experiments benchmarking WOA against five recent algorithms: the Sine-Cosine algorithm (SCA) [62], the Jaya algorithm [63], the Flower Pollination algorithm (FPA) [64], the Cuckoo Search algorithm (CS) [65], and the Late Acceptance Hill Climbing algorithm (LAHC) [66]. All the algorithms, including WOA, support constrained t-way testing. The settings of each algorithm are summarized in Table 9. We ran each algorithm 30 times and recorded the best and average test suite sizes over these 30 runs.
The performance of the algorithms was evaluated mainly in terms of test suite size. We compared the best and average test suite sizes obtained by the algorithms, reported in Table 10 and Table 11, respectively. Then, the Wilcoxon signed-rank test was applied to the results of the six algorithms.
We divided our experiments into three dataset groups and designed constraints (i.e., forbidden combinations) for each. The datasets are as follows:
1) WOA versus the five other algorithms using CCA(2, 3^p, F), where the number of parameters p was varied while the number of values (v = 3) and the interaction strength (t = 2) were kept constant. The number of constraints was varied between 3 and 5 pairs, as shown in Tables 10 and 11.
2) WOA versus the five other algorithms using CCA(2, v^7, F), where the number of parameters (p = 7) and the interaction strength (t = 2) were kept constant and the number of values v was varied. The number of constraints was varied between 3 and 5 pairs, as shown in Tables 10 and 11.
3) WOA versus the five other algorithms using CCA(t, 2^10, F), where the number of parameters and their values were kept constant (p = 10 and v = 2) while the interaction strength t was varied from 2 to 6. The number of constraints was varied between 1 and 3 pairs, as shown in Tables 10 and 11.
To evaluate the performance of the WOA strategy, we compared it against the five other t-way strategies, all of which we also implemented. The performance evaluation criterion was test suite size (i.e., the smallest possible test suite is sought) [50]. Table 10 shows the minimum (i.e., best) test suite size and Table 11 shows the average test suite size for each competing strategy. The best results are marked in bold.
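The forbidden combinations in CCA(t, v^p, F) affect generation in two ways: a candidate test case containing a forbidden pair is invalid, and forbidden tuples are removed from the set of tuples that must be covered. The sketch below illustrates both for the 2-way case. The representation of F as (parameter, value) pairs, the class and method names, and the sample F are all hypothetical. It does not detect implicit constraints (tuples that are individually allowed but cannot appear in any valid test case), which is exactly what the backtrack-freeness discussion below concerns.

```java
import java.util.List;

public class ForbiddenCombinations {

    /** Hypothetical 2-way constraint: parameter p1 = value v1 must not occur with p2 = v2 (p1 < p2). */
    record Forbidden(int p1, int v1, int p2, int v2) {}

    /** A test case assigns one value index per parameter; it is valid if it contains no forbidden pair. */
    static boolean isValid(int[] testCase, List<Forbidden> constraints) {
        for (Forbidden f : constraints) {
            if (testCase[f.p1()] == f.v1() && testCase[f.p2()] == f.v2()) return false;
        }
        return true;
    }

    /** True if the 2-way tuple (i = a, j = b), with i < j, is itself a forbidden combination. */
    static boolean isForbiddenTuple(int i, int a, int j, int b, List<Forbidden> constraints) {
        for (Forbidden f : constraints) {
            if (f.p1() == i && f.v1() == a && f.p2() == j && f.v2() == b) return true;
        }
        return false;
    }

    /** Number of 2-way tuples of CCA(2, v^p, F) that still have to be covered. */
    static int requiredPairs(int p, int v, List<Forbidden> constraints) {
        int count = 0;
        for (int i = 0; i < p; i++) {
            for (int j = i + 1; j < p; j++) {
                for (int a = 0; a < v; a++) {
                    for (int b = 0; b < v; b++) {
                        if (!isForbiddenTuple(i, a, j, b, constraints)) count++;
                    }
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical F for a CCA(2, 3^4, F); not one of the paper's datasets.
        List<Forbidden> f = List.of(new Forbidden(0, 0, 1, 1),
                                    new Forbidden(2, 2, 3, 0),
                                    new Forbidden(1, 2, 3, 2));
        System.out.println(requiredPairs(4, 3, f));            // 51 (6 * 9 - 3)
        System.out.println(isValid(new int[]{0, 1, 0, 0}, f)); // false: contains (p0 = 0, p1 = 1)
        System.out.println(isValid(new int[]{0, 0, 2, 1}, f)); // true
    }
}
```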
The results in Tables 10 and 11 show that WOA performed better than LAHC, FPA, and CS in terms of both best and average test suite size. We attribute this to WOA favoring exploration, which lets it search more widely, especially as the search space grows; CS converged more slowly, which limited its performance, while FPA lacks exploration. Compared with SCA, WOA obtained better average test suite sizes and competitive best test suite sizes, because SCA has good local search ability but weak global search ability. Lastly, WOA and Jaya produced results competitive with each other. In terms of average results, Jaya achieved better results more frequently than WOA, owing to its ability to balance global and local search as a parameter-free algorithm.
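The exploration and exploitation behaviour invoked above comes from the standard WOA update rules: the coefficient a decreases linearly from 2 to 0, A = 2a·r − a and C = 2r with r uniform in [0, 1]; when |A| ≥ 1 a whale moves relative to a random whale (exploration), otherwise it encircles the best solution or follows a logarithmic spiral around it (exploitation). The sketch below implements one such update for a single whale. The spiral constant b = 1, drawing l uniformly from [−1, 1], and the round-and-clamp mapping to value indices are our assumptions; this section does not show how the authors discretise positions.

```java
import java.util.Random;

public class WoaUpdate {

    /**
     * One standard WOA position update for a single whale.
     * x: current position; best: best solution so far; randomWhale: another population member
     * (chosen by the caller); iter/maxIter drive the linear decrease of a from 2 to 0.
     */
    static double[] update(double[] x, double[] best, double[] randomWhale,
                           int iter, int maxIter, Random rng) {
        double a = 2.0 - 2.0 * iter / (double) maxIter; // 2 -> 0 over the run
        double b = 1.0;                                 // spiral shape constant (assumed)
        double A = 2.0 * a * rng.nextDouble() - a;
        double C = 2.0 * rng.nextDouble();
        double p = rng.nextDouble();                    // chooses encircling/search vs. spiral
        double l = 2.0 * rng.nextDouble() - 1.0;        // l in [-1, 1]
        double[] next = new double[x.length];
        for (int d = 0; d < x.length; d++) {
            if (p < 0.5) {
                if (Math.abs(A) < 1) {
                    // Exploitation: shrink the circle around the best solution.
                    double dist = Math.abs(C * best[d] - x[d]);
                    next[d] = best[d] - A * dist;
                } else {
                    // Exploration: move relative to a randomly chosen whale.
                    double dist = Math.abs(C * randomWhale[d] - x[d]);
                    next[d] = randomWhale[d] - A * dist;
                }
            } else {
                // Exploitation: bubble-net spiral around the best solution.
                double dist = Math.abs(best[d] - x[d]);
                next[d] = dist * Math.exp(b * l) * Math.cos(2 * Math.PI * l) + best[d];
            }
        }
        return next;
    }

    /** Hypothetical discretisation: round and clamp each coordinate to a value index 0..v-1. */
    static int[] toTestCase(double[] pos, int v) {
        int[] tc = new int[pos.length];
        for (int d = 0; d < pos.length; d++) {
            tc[d] = (int) Math.max(0, Math.min(v - 1, Math.round(pos[d])));
        }
        return tc;
    }
}
```

Because |A| can only be 1 or more while a is still large, exploration is confined to the early iterations; this is consistent with the premature-convergence drawback noted in the conclusion.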
The Wilcoxon signed-rank test was then applied to the results in Table 10, and Table 12 presents its outcomes. WOA produced significantly better test suite sizes than LAHC, FPA, and CS, but not than Jaya and SCA. Nevertheless, against Jaya and SCA, WOA produced better results more often than worse ones, while most cases were ties.
Likewise, the Wilcoxon test was applied to the results in Table 11, with the outcomes shown in Table 13. WOA produced results significantly different from those of LAHC, SCA, and FPA, but not from those of Jaya and CS. We attribute this to WOA's exploration advantage, which LAHC, SCA, and FPA lack.
In terms of constraint handling, our approach is comparable to the Binary Decision Diagram (BDD) [67] and SATSolver [68] approaches. BDD exploits a decision diagram to turn restrictions into constraints. Although useful, the BDD approach is known to suffer from a state explosion problem, which can limit the size of the constrained configuration. SATSolver addresses this limitation of BDD, but at the expense of large overheads due to its extensive use of Conjunctive Normal Form (CNF) to represent the constraints. On a positive note, both approaches implicitly guarantee backtrack-freeness (i.e., no dead end is reached while satisfying the constraints during the configuration process). Our approach is simpler than both BDD and SATSolver, although backtrack-freeness must be checked explicitly every time a new solution is generated.

VII. CONCLUSION
A review of the most current approaches for t-way testing was presented. In addition, the recently developed Whale Optimization Algorithm (WOA) was adopted for constrained and unconstrained t-way test suite generation, and its implementation was explained step by step.
In terms of overall performance, WOA showed results competitive with those of well-known AI-based meta-heuristics from the literature, bearing in mind that we used the original WOA while the other methods had been modified. Additionally, we designed our own constraints on well-known CAs and implemented six recently developed AI-based algorithms, including WOA, to comprehensively compare and evaluate their performance. The results showed that WOA outperformed most of the AI-based strategies and all of the pure computational strategies. Moreover, WOA showed consistent overall performance (i.e., no outlier results).
In future work, given our promising results, we expect to extend our approach into a multi-objective optimization method for combinatorial testing. As in NSGA-II, four criteria will be considered to assess the optimality of the test suite: test suite size, test case priority, test case frequency, and test case constraints [69]. The aim is to create a test suite with reduced size and increased priority.
Additionally, we will enhance WOA, either by hybridization or by combining it with other meta-heuristics, because two main drawbacks of WOA were noted in the experiments: first, its adaptive parameters depend on random values; second, WOA suffers from premature convergence, as do other meta-heuristic (evolutionary and swarm) algorithms.

ACKNOWLEDGMENT
ALI ABDULLAH HASSAN would like to thank the Hadhramout Foundation, Yemen, for its support with his tuition fees.