Trend Application of Machine Learning in Test Case Prioritization: A Review on Techniques

Software quality is assured through software testing. However, the software testing process involves many phases, which consume considerable resources and time. One approach to reducing these costs is test case prioritization (TCP), and numerous works have indicated that TCP improves overall software testing performance. TCP comprises several kinds of techniques, each with its own strengths and weaknesses. The main objective of this review paper is to examine machine learning (ML) techniques in TCP in greater depth, guided by a set of research questions. The research method was designed in parallel with the research questions. Consequently, 110 primary studies were selected, of which 58 were journal articles, 50 were conference papers, and 2 were classified as other articles. Overall, ML techniques in TCP have been trending in recent years, yet there is still room for improvement. Multiple ML techniques are available, and each has specific potential value, advantages, and limitations. Notably, ML techniques have been discussed considerably in the TCP approach to software testing.


I. INTRODUCTION
Software engineering is not just about programming and software development. Software engineering itself is an implementation of engineering procedures in the development of any software in a systematic way [1]. Within the software development process, software testing consumes a long time for execution and can be the most expensive phase [2]. Software testing itself is normally carried out repetitively, even under time constraints and fixed resources. Software engineering groups are regularly compelled to end their testing activities because of financial and time requirements, which causes difficulties such as problems with software quality and client agreements.
Regression testing is an activity that confirms that new versions do not harm the previously functioning software [3], [4]. As the software evolves, the software test suite has the tendency to increase in size, which frequently makes it expensive to execute. Research shows that regression testing is an expensive process that may require more than 33% of the cumulative expenses of the software [5]. In the work of Yoo and Harman [6], various regression test approaches were examined to supplement the importance of the accumulated test suite in regression testing. Those studies were then classified into three domains: minimization, selection, and prioritization. Test case prioritization (TCP) aims to order a set of test cases to achieve early optimization based on preferred properties [1], [7]. It gives an approach the ability to execute first test cases that are highly significant according to some measure, and produce the desired outcome, such as revealing faults earlier and providing feedback to the testers. TCP also helps to find the ideal permutation of a series of test cases and can be executed accordingly [6].
Artificial intelligence (AI) techniques have been successfully used to reduce the effort required to carry out many software engineering activities [8]. In particular, ML techniques, which belong to a research field at the intersection of AI, computer science, and statistics, have been applied to automate various software engineering activities [9]. In TCP, ML techniques have been welcomed in recent years [9]-[11]. As software systems become more complex, some conventional TCP approaches may not scale well [12]. This snowballing complexity has solidified the need for ML techniques in TCP. Even though there have been numerous studies on ML techniques in TCP, there are no advanced literature reviews that illustrate the importance of recent ML techniques for TCP. Therefore, this review paper attempts to show the trends in the application of ML techniques in TCP.
The point of a review paper is not simply to summarize all current evidence based on research questions, but also to support the development of evidence-based research recommendations for researchers [13]. This paper is structured as follows: Section 2 considers previous studies related to TCP approaches. Section 3 describes the strategy adopted to conduct this review. Next, results and discussion based on the research questions are presented in Section 4. Research findings are then elaborated in Section 5. In Section 6, the validity threats of this paper are discussed. Finally, Section 7 presents conclusions for this review.

II. BACKGROUND STUDIES
This section discusses prior studies to relate the review paper to the application of ML techniques in TCP. It is apparent that there have been systematic reviews that covered most TCP approach domains. However, there have been no reviews focusing specifically on ML techniques within the TCP approach itself, even though ML has been trending in almost all other domains. Therefore, the authors have gathered three review studies and three mapping studies to determine the requirements of this review paper on ML techniques in TCP. A summary of the nominated studies is tabulated in Table I.
In Table I, the first-ranked review study was done by Khatibsyarbini et al. [1], and offered a systematic review of TCP specifically for the approaches available within the domain. This study reviewed 69 studies from 1999 to 2016. Of these 69 works, more than half were taken from high-impact journals, and the rest were from either conferences or symposiums. The review resulted in several findings, and the main finding was that there were many TCP approaches. Each TCP approach had specific potential value, advantages, and limitations. The review also found that search-based TCP using ML techniques showed the most improvement in TCP regression in several recent studies.
The second review paper, authored by Arora et al. [14], covered regression testing and ML over a time period from 2000 to 2016. The majority of the studies within the work were focused on agent-based approaches in regression testing. The findings were highly related to trends and the state of the art of agent-based approaches in regression testing. The paper explored 115 studies, but only 56 studies discussed agent-based software testing, which is partially related to our review study, as this paper focuses on ML in TCP software testing. To pinpoint the finest ML technique for TCP software testing, further reviews of ML in TCP are needed, as ML techniques have been trending in various domains.
The next review paper was done by Saeed et al. [15], and deals with ML and software testing. Again, as with previous papers, this work was done in 2016, covering a time span from 1975 to 2012. This work reviewed 72 primary studies, which mainly discuss ML in software testing. The work objectively studies the current state of the art of empirical experimentation with search-based techniques that focus on model-based testing. The results indicate that many works apply AI techniques in model-based testing to achieve functional and structural coverage. The paper also concluded that there was a need for an extensive systematic analysis of the taxonomy of search-based techniques to reveal the limitations and advantages of AI application.
The last review paper, by Mece et al. [9], discusses TCP with the application of ML. This work reviews only 15 primary studies, covering 2006 to 2018. The outcome of this paper gives a glimpse of some ML applications in TCP.
In addition to these three review studies, three mapping studies were selected so that the authors could better articulate relevant research questions for this new review paper. The first mapping was done in 2013 by Catal and Mishra [16], and focuses on TCP itself. This mapping presents an overview of trends in available TCP approaches and techniques. This work reviewed the greatest number of papers compared with the other review papers, collectively covering 120 primary studies from 2001 to 2011. The next mapping study was updated in 2019 by Durelli et al. [17], and focused mainly on ML in software testing. This mapping covered 48 studies from 1995 to 2018. It found that ML was widely used in test case generation and evaluation in software testing. However, the work did not touch on ML used in TCP, even though TCP is a crucial element in software testing after test case generation. Therefore, their work also concluded that there is a need to research how ML algorithms can be used to automate software testing with TCP. The final mapping paper focuses solely on continuous integration in TCP and discusses the approaches available in continuous integration environments. Its findings highlight testing complexity, time consumption, and test case volatility as major challenges for TCP in continuous environments.
To conclude the background study of prior works, Table II shows a summary of findings from related studies in comparison with this review paper. From Table II, two works stand out, Khatibsyarbini et al. [1] and Catal and Mishra [16], which discuss TCP approaches. As highlighted before, both works suggest that there is a need for an extensive analysis of search-based techniques in TCP, as the techniques have been trending in recent years. Therefore, to address this need, the authors carried out a review of the trend in the application of ML techniques used specifically in TCP testing.
As for the other three prior studies, all of them reviewed ML in software testing. However, none of them focused mainly on ML techniques within a TCP approach in software testing. In short, some uncovered findings will be revealed in this new review paper.

III. RESEARCH METHOD
A good review paper study requires a clean research method to search for and examine required prior works. With specific goals in mind, a design method as shown in Fig. 1 was systematically carried out to complete this review study. This method was inspired by Khatibsyarbini et al. [1] and Kitchenham [19].
Referring to Fig. 1, there are four main phases within the review protocol, itemized as follows: research questions, search strategy, study selection, and data synthesis and extraction. In the first phase, the research questions to be designed were based on the findings that were uncovered from the prior works discussed in Section 2. Seven main research questions were created to answer the uncovered findings. After the research questions were stated, a search strategy was employed that comprised specific search strings and search processes. The output of the search stage was then moved to the study selection phase. In this phase, the outcome of the search process was subject to inclusion and exclusion criteria to extract relevant studies. Quality assessments were then carried out to further evaluate the scrutinized studies. Finally, the last phase dealt with data synthesis and the extraction of primary studies that were utilized for this review study.
Details of the review protocol process, including who carried out each step and how much time it took, are tabulated in Table A4 in the Appendix.

A. RESEARCH QUESTIONS STAGE
This review study aims to gather and analyze recent experimental evidence regarding ML techniques in TCP regression testing, with attention to the most recent techniques for further investigation, as the end goal is to improve the capability of present techniques. At the same time, the authors wish to review the empirical evaluations used in each reviewed approach. To accomplish this goal, four main research questions with their respective motivations were articulated, as presented in Table III. These research questions are interrelated and were explored concurrently in order to frame the objective of this review study. The uncovered and extra findings from Table II that are covered by this paper will be answered by these research questions. To make this clearer, Table IV shows the mapping of the uncovered and extra findings to their corresponding research questions.
As shown in Table IV, each research question answers uncovered findings from previous works. The questions, which were designed based on the uncovered findings, also provide some extra findings that serve as added value to this review study. In short, the research questions carry significance that may be useful for future work on ML techniques in TCP-related domains.

B. SEARCH STRATEGY STAGE
A review study requires a sound search strategy, as it is the key to ensuring the breadth of the nominated studies. Generally, the value of a review paper depends on the primary studies nominated. The main strategy is to have a good search string and search process. For the search process to succeed, the first requirement is the search string itself, since a poor search string may lead to irrelevant outcomes. Therefore, the search strings formulated in this study followed a systematic method consisting of the following criteria:
a) Terms related to machine learning in the TCP approach.
b) Terms related to specific research questions.
c) Terms with equivalent words.
d) Usage of the Boolean 'OR' and 'AND' operators as links between terms.
Since the main focus of this paper is to examine ML techniques in the TCP area, some of the results from previous studies were used to handpick significant studies. "Machine learning" and "test case prioritization" are among the exact phrases used by the authors in most of the search queries. In addition, the search strings were formulated to connect directly to the respective research questions. Table V shows the search strings connected to their respective research questions.

TABLE V. Search strings connected to their respective research questions

Search strings:
"Classification" AND "test case prioritization" AND "advantages"
"Clustering" AND "test case prioritization" AND "advantages"
"reinforcement learning" AND "test case prioritization" AND "advantages"
"Classification" AND "test case prioritization" AND "limitations"
"Clustering" AND "test case prioritization" AND "limitations"
"reinforcement learning" AND "test case prioritization" AND "limitations"

RQ3
Terms: Classification technique, Clustering technique, Process flow, Test case prioritization
Search strings:
"Classification" AND "test case prioritization" AND "process flow"
"Clustering" AND "test case prioritization" AND "process flow"

RQ4
Terms: Test case prioritization, Evaluation metric, Study program, Dataset, Case study
Search strings:
"Dataset" AND "test case prioritization" AND "evaluation metric"
"Case study" AND "test case prioritization" AND "evaluation metric"
"Study program" AND "test case prioritization" AND "evaluation metric"

From Table V, different search strings were created for each research question. The authors identified specific related terms that are widely used to answer each of the research questions, and each research question has several related terms. It is also noticeable that the authors used the exact phrase "test case prioritization" in every search string, combined with other related terms. This prevents the search engine from returning unnecessary results unrelated to the TCP domain.
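The way the search strings in Table V are assembled can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `build_search_strings` and its arguments are hypothetical, and it covers only the 'AND' pattern visible in the table, not the 'OR' links for equivalent words.

```python
from itertools import product

# Exact phrase that, according to the text, appears in every search string.
CORE_PHRASE = "test case prioritization"


def build_search_strings(technique_terms, question_terms):
    """Join each technique term, the core phrase and each question term with AND.

    Hypothetical helper: the term lists are supplied by the caller.
    """
    strings = []
    for question, technique in product(question_terms, technique_terms):
        parts = [technique, CORE_PHRASE, question]
        strings.append(" AND ".join(f'"{part}"' for part in parts))
    return strings


if __name__ == "__main__":
    for query in build_search_strings(
        ["Classification", "Clustering", "reinforcement learning"],
        ["advantages", "limitations"],
    ):
        print(query)
```

Keeping the core phrase fixed in one place mirrors the point made above: every query stays anchored to the TCP domain regardless of which other terms are combined with it.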

C. STUDY SELECTION STAGE
As mentioned previously, a high-impact review paper must be conducted in an appropriate manner. Therefore, all the prospective papers gathered underwent a selection stage to determine the primary studies. This selection stage comprises two phases: the inclusion and exclusion criteria, and the quality assessment. The process of this stage is depicted in Figure 2. As shown in Figure 2, the selection of primary studies starts with the prospective papers passing through the inclusion and exclusion criteria phase. The output of that phase was then scrutinized again using the quality assessment, which led to the primary study selection. The inclusion and exclusion criteria used in this review study are tabulated in Table VI, and the quality assessment questions are tabulated in Table VII. The inclusion and exclusion criteria were applied to check whether a study met the terms related to the research questions, while the quality assessment was intended to ensure that each selected study could answer at least two to three research questions appropriately. After the inclusion and exclusion phase, the quality assessment was applied by scrutinizing whether the nominated studies were adequate to answer all the RQs.

TABLE VII. Quality assessment questions
1 Was the paper able to answer more than two research questions?
2 Was the paper based on a complete experiment?
3 Was the publication published in an appropriate manner?
4 Did the publication make a significant contribution?

The authors tabulated the four quality assessment questions shown in Table VII in order to evaluate the nominated papers. The results of the quality assessment are tabulated in Table A1 in the Appendix. Subsequently, some papers were rejected in this assessment phase. Upon completion of this selection stage, 110 studies were recognized as capable of answering all of the research questions derived earlier.

D. DATA SYNTHESIS AND EXTRACTION STAGE
The final stage of this research method is the data synthesis and extraction stage. The synthesis and extraction method was designed to correspond with the derived research questions. This strategy had already been applied in the search strings and search process, where searches were made with the specific aim of obtaining the data type required for each research question. Consequently, this process benefits the data extraction phase in answering each research question. The data collected for each research question are tabulated in Table VIII.

IV. RESULT AND DISCUSSION
This section outlines the results with respect to the research questions. A summary of the primary studies is presented first, followed by each research question, answered in a separate subsection. In total, 110 primary studies were nominated for this review: 58 journal articles, 50 conference papers, and 2 other articles. All the studies were then analysed and discussed under the research questions described previously. The percentages of the collated studies are shown in Figure 3, while a detailed overview of the selected studies is tabulated in Table A2 in the Appendix.
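As a quick check on these counts, the shares of each publication type can be recomputed from the numbers given in the text. This is a sketch only: the values are derived from the stated counts, and the rounding used in Figure 3 may differ.

```python
# Counts of primary studies as reported in the text.
counts = {"journal articles": 58, "conference papers": 50, "other articles": 2}


def shares(counts):
    """Return each category's percentage of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {kind: round(100 * n / total, 1) for kind, n in counts.items()}


print(sum(counts.values()))
print(shares(counts))
```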

B. WHAT IS THE RESEARCH TREND OF MACHINE LEARNING IN TCP? (RQ1.1)
As the search-based TCP approach has been quite popular in recent years [1], [20], [21], it was suggested that the application of AI in TCP be assessed in a comprehensive context. Since AI is too broad to be covered in a single review study, only the taxonomy of ML techniques in TCP is covered here. The first RQ is to find the taxonomy of ML in TCP. The first aspect of this research question examines the current publication trend regarding ML techniques in TCP studies. The trend of papers published per year is depicted in Figure 4. Over time, many new ML techniques have been introduced. All these ML techniques can be grouped into several categories [22]. The work by Durelli et al. [17] suggested that there are as many as five categories of ML. However, two of the five were supervised combinations under the semi-supervised category, supported by only one reference. Therefore, the authors agreed to use only three main categories of ML within the TCP approach to regression testing: supervised, unsupervised, and reinforcement learning. Figure 5 shows the taxonomy of ML in TCP with its respective techniques.

FIGURE 5. Overview of taxonomy of ML techniques in TCP
The first category is supervised ML, which can be divided into two types of algorithms: classification and regression. Classification algorithms attempt to estimate the mapping from input variables to discrete output variables [23]-[25]. The output category is the result that the mapping function predicts. A classification model tries to compute one or several outputs based on the input variables. The most popular classification algorithms are K-Nearest Neighbours and decision trees [26], [27]. Regression algorithms attempt to estimate the mapping from input variables to continuous output variables [25], [28]-[30]. Linear regression, regression trees, and Support Vector Regression (SVR) are examples of common regression algorithms.
The second category is unsupervised ML, which again can be divided into two types of algorithms: clustering and dimensionality reduction. Clustering algorithms attempt to group objects into clusters while making sure that objects from different clusters are not similar [31]-[33]. Defining the distance between objects is a crucial part of achieving a good clustering. Many clustering algorithms are available in the literature; K-Means is arguably the most popular among researchers as a benchmark [34], [35]. The last category is reinforcement learning. Reinforcement learning comprises goal-oriented algorithms that learn how to achieve a specific goal or to maximize the cumulative reward in an environment where a software agent takes actions [36]-[38]. Q-learning and neural networks are among the popular algorithms within reinforcement learning [39]-[41]. In short, each of these three categories presents a different learning process depending on the available dataset.
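The difference between a discrete and a continuous output can be illustrated with a small nearest-neighbour sketch. Everything in it is an assumption made for illustration: the two features (lines covered, past failures), the toy data, and the use of nearest-neighbour averaging for the regression case, which is chosen here only for symmetry with K-Nearest Neighbours and is not one of the regression algorithms named above.

```python
from collections import Counter
from math import dist


def nearest(train, query, k):
    """Return the k training rows closest to the query (Euclidean distance)."""
    return sorted(train, key=lambda row: dist(row[0], query))[:k]


def knn_classify(train, query, k=3):
    """Classification: predict a discrete label by majority vote."""
    votes = Counter(label for _, label in nearest(train, query, k))
    return votes.most_common(1)[0][0]


def knn_regress(train, query, k=3):
    """Regression: predict a continuous value as the mean of the neighbours."""
    values = [value for _, value in nearest(train, query, k)]
    return sum(values) / len(values)


# Hypothetical test case features: (lines covered, past failures).
labelled = [((10, 0), "pass"), ((12, 1), "pass"), ((40, 5), "fail"),
            ((42, 4), "fail"), ((38, 6), "fail")]
timed = [((10, 0), 1.0), ((12, 1), 1.5), ((40, 5), 6.0),
         ((42, 4), 7.0), ((38, 6), 5.0)]

print(knn_classify(labelled, (39, 5)))  # a label such as "pass" or "fail"
print(knn_regress(timed, (39, 5)))      # a real number
```

The same neighbours are used in both functions; only the way their outputs are combined changes, which is the distinction the paragraph draws.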
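The role of the distance measure in clustering can be seen in a minimal K-Means sketch. The deterministic choice of the first k points as starting centroids and the two-dimensional toy points are assumptions made to keep the example reproducible; practical implementations usually randomize the initialization.

```python
from math import dist


def k_means(points, k, iterations=10):
    """Plain K-Means: assign each point to its nearest centroid, then move
    each centroid to the mean of the points assigned to it."""
    centroids = [tuple(p) for p in points[:k]]  # deterministic start (assumption)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: the distance function decides cluster membership.
        labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: recompute each centroid from its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids


points = [(1, 1), (1, 2), (8, 8), (9, 8), (2, 1), (8, 9)]
labels, centroids = k_means(points, k=2)
print(labels)
```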

C. WHAT IS THE DISTRIBUTION OF ML TECHNIQUES IN TCP AND ITS REASONING? (RQ1.2)
As for the second aspect of the first research question, the RQ requires a discussion of which ML techniques were most utilized and why they were chosen. The distribution of each technique is illustrated in Figure 6. The list of prior works selected for each discovered ML technique in TCP is tabulated in Table A3 in the Appendix.

FIGURE 6. Percentage distribution of ML techniques
From Figure 6, the results show that classification is the most utilized ML technique among the selected studies, accounting for 38% of the collated studies. Classification falls under the supervised category, within which several algorithms can be used, including Bayesian Network [32], [42]-[44], Swarm Intelligence [45]-[49], Fuzzy [50], [51], and others [52]. Two observations are noted regarding the utilization of classification. First, classification requires training data, and in TCP the empirical data normally come with historic versions that can serve as training data [1], [17]. Second, classification aims to predict discrete values, which is highly compatible with the aim of TCP, which is ideally to find which test cases are faulty or not.
The second most utilized technique reported in the collated studies is clustering, with 32%, contributed by these notable works [34], [51], [53]-[57]. Clustering resembles classification in that it aims to group the inputs, but the two differ in their need for training and testing datasets. Clustering lies in the unsupervised category, as identified in the previous sub-section 4.2. Unsupervised clustering is far less complex than classification, which is considered to be the reason this technique has been selected. Apart from that, not needing training and testing datasets can reduce time and resources for more cost-effective TCP, which is another noted reason for the utilization of clustering [53], [58].
Reinforcement learning is the third most utilized technique reported in the collated studies, with a 17% share. The authors believe this technique reaches such a number because the researchers in [59]-[63] work on continuous integration, which is a particular setting in TCP. Apart from that, multi-objective TCP also plays a main role in the selection of reinforcement learning, as this technique helps maximize the cumulative reward in an environment where a software agent takes actions [36]-[38].
Regression and dimensionality reduction, with 6% and 7% shares respectively, trail far behind the dominant technique in their respective categories. Regression, which is categorized under supervised ML, has only 6% utilization [24], [48], [64]-[66], as the technique depends on numerical data, whereas classification depends on categorical data. Regression is oriented towards statistical analysis that reveals the relationship between independent and dependent variables [67]. Dimensionality reduction, with only a 7% share, does not appear to be widely known but still has its adopters [68]-[70]. The authors believe this may be because other techniques in TCP are superior and easier to access. However, the gap in this distribution is narrowing. Figure 7 shows the recent trend of ML techniques in TCP. On the other hand, the trend seems able to stay higher than the others for the final two years. As for the other techniques, the trend is still moving sideways.

D. WHAT ARE THE METAPHORS, STRENGTH, AND RESTRICTIONS OF EXISTING ML TECHNIQUES? (RQ2.1)
The second research question aims to identify the differences among ML techniques in TCP. As for the first aspect of the second research question, the metaphors for each ML technique illustrated in Figure 6 are tabulated in

E. HOW WERE ML TECHNIQUES APPLIED AND HOW DID THEY AFFECT TCP RESULTS? (RQ2.2)
To answer the second aspect of the second research question, the experimental setups and results of the selected studies were examined in greater depth. For each ML technique, the authors selected certain works to elaborate on, in order to give a glimpse of how the technique is applied and how it affects TCP results.

Supervised ML technique
Supervised ML is a technique that utilizes historical or training data for use in a later classification process [81]. In the TCP context, most of the available datasets or study programs come with previous versions, which can be used as training data for classification, which is far preferable to regression. All available previous data are analysed and used to train ML algorithms, which produce a hypothesis. This hypothesis is then used to classify the test cases of the current version that will undergo the TCP process. The work in [82] proposed a technique that utilizes the bug history of the software in order to predict defects in the system. The model is able to estimate fault-proneness in source code, which can then be used to classify test cases accordingly with a coverage-based TCP approach. Recent studies show that using appropriate history can significantly improve the coverage-based TCP approach [1], [82]-[85].
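The idea of combining bug history with coverage can be sketched as follows. This is not the model of [82]: the frequency estimate stands in for a trained fault-proneness model, and the data formats, file names, and function names are hypothetical.

```python
from collections import Counter


def fault_proneness(bug_history):
    """Estimate, per file, the fraction of past versions in which it was buggy.

    bug_history holds one list of buggy file names per past version
    (hypothetical format). A simple frequency replaces a learned model here.
    """
    counts = Counter(f for version in bug_history for f in set(version))
    return {f: c / len(bug_history) for f, c in counts.items()}


def prioritize_by_history(coverage, proneness):
    """Order tests by the total estimated fault-proneness of the files they cover."""
    def score(test):
        return sum(proneness.get(f, 0.0) for f in coverage[test])
    return sorted(coverage, key=score, reverse=True)


bug_history = [["parser.c"], ["parser.c", "io.c"], ["parser.c"], []]
coverage = {"t1": {"io.c"}, "t2": {"parser.c", "io.c"}, "t3": {"ui.c"}}
print(prioritize_by_history(coverage, fault_proneness(bug_history)))
```

Tests covering files that were buggy in many past versions move to the front, which is the intuition behind using history as training data for the current version.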

Unsupervised ML technique
Unsupervised ML is the technique reserved for cases where there is no historic information, or only incomplete information, regarding the study program. Unsupervised ML may also be chosen because it is claimed to be far less complex than supervised ML [71], [76]. Clustering is notable as the most popular unsupervised ML technique in TCP. The work by Chen [34] proposed an adaptive random sequence based on clustering techniques. Using black-box information, their clustering techniques manage to cluster test cases as diversely as possible. The experimental results show that the technique manages to reveal faults at an earlier stage with higher effectiveness. Recent studies also show that clustering may be highly efficient in terms of execution time, which leads to cost-effectiveness [58], [71].
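Once test cases have been clustered, one simple way to obtain a diverse ordering is to take one test from each cluster in turn. The round-robin sketch below only illustrates that idea; it is not the adaptive random sequence of [34], and the test identifiers and cluster labels are hypothetical inputs.

```python
from collections import defaultdict
from itertools import zip_longest


def interleave_clusters(test_ids, labels):
    """Pick one test from each cluster in turn so that early tests are diverse."""
    clusters = defaultdict(list)
    for test, label in zip(test_ids, labels):
        clusters[label].append(test)
    order = []
    # Each round takes the next unused test of every cluster that still has one.
    for round_ in zip_longest(*clusters.values()):
        order.extend(t for t in round_ if t is not None)
    return order


tests = ["t1", "t2", "t3", "t4", "t5", "t6"]
cluster_labels = [0, 0, 1, 1, 0, 2]
print(interleave_clusters(tests, cluster_labels))
```

No training data is involved at any point, which reflects why the text describes clustering as cheaper to apply than supervised techniques.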

Reinforcement Learning ML technique
As for the last ML technique, reinforcement learning, although it may not seem very popular in TCP, there are still some notable works [18], [38], [40], [86] that apply the technique. One of the reasons this technique has been chosen is continuous integration in TCP [18], [59], [86]. The work in [40] demonstrated reinforcement learning in TCP. The technique was introduced in order to reduce and save computing resources as integration is executed continuously. The experiment was executed using three datasets and showed that the reward function in reinforcement learning does have an effect on cost in TCP for continuous integration environments. However, it has also been reported that excessive conditions during the learning process may lead to reduced accuracy of the results [38], [59], [87].
In short, each of the ML techniques has advantages in different situations.
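The role of the reward function across continuous integration cycles can be sketched with a bandit-style incremental value update, which is a simplified, single-state form of Q-learning. The reward definition (1 for a failing test, 0 otherwise), the learning rate, the greedy selection, and all names are assumptions for illustration and are not taken from [40].

```python
def update_values(values, results, alpha=0.5):
    """Move each executed test's value towards its observed reward.

    results maps a test id to True if the test failed in this cycle.
    The reward is 1.0 for a failure (a fault was revealed), else 0.0.
    """
    for test, failed in results.items():
        reward = 1.0 if failed else 0.0
        old = values.get(test, 0.0)
        values[test] = old + alpha * (reward - old)
    return values


def select_tests(values, all_tests, budget):
    """Greedy policy: run the tests with the highest learned value first."""
    ranked = sorted(all_tests, key=lambda t: values.get(t, 0.0), reverse=True)
    return ranked[:budget]


values = {}
cycles = [
    {"t1": False, "t2": True, "t3": False},
    {"t1": False, "t2": True, "t3": True},
]
for results in cycles:
    update_values(values, results)
print(select_tests(values, ["t1", "t2", "t3"], budget=2))
```

Running only the top-ranked tests within a budget is where the cost saving comes from; a practical agent would also need some exploration so that low-valued tests are occasionally re-run.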

F. WHAT ARE THE PROCESSES INVOLVED IN ML TECHNIQUE IN TCP? (RQ3)
Engineering is an art of constructing something complex look more straightforward. In this case, software engineering also does extremely concern on how the process applied throughout the software development period. Therefore, authors took initiative to investigated this kind of research question. In order to have systematic complete experiment, every experiment should follow design process to make sure the solution is run at complete satisfactory. Some of the selected studies were inspected further regarding their experiment flow. As there are two most popular ML techniques in TCP, authors able to designed standard flow of both ML techniques illustrated as in Figure 8 and Figure 9. As shown in Figure 8, the standard flow process for clustering in TCP have five stage while in Figure 9, classification have extra four stage before classification of test cases take place. Both of the process may start with test suites generated then move to analyse the test case information. Even though no single work clearly described these two processes, we can agree that any experiment or research activity should identify an analysed their data information first. After available information analysed, the ML technique then can be applied either clustering or classification. However, for classification do have extra work before the process can be started. Works by these researchers [74], [75], [82], [83], demonstrated few steps before classification take place. The steps are known as training phases which learn from previous version of study program or any history data which the come out with specific hypothesis. This hypothesis then used to do the classification of test suites later on. As for clustering technique there is no required pre-trained data to do the clustering. The works by researchers [34], [71], [76], [78], clearly demonstrated there were no training data required where the process directly can be started after analysed current available information. 
This absence of a training phase can therefore be considered the main reason behind the claim that the clustering technique is highly efficient in terms of execution time, which leads to cost effectiveness [58], [71]. After the clustering or classification of test cases has been executed, both techniques follow similar steps toward the end of the process: the clustered or classified test cases are prioritized, and the prioritized test cases are then evaluated.
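The two flows described above can be contrasted in a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the reviewed studies: the two test features (coverage ratio, number of past failures), the nearest-centroid classifier standing in for the training phase, the small k-means routine standing in for clustering, and the round-robin rule used to prioritize clustered test cases. The final evaluation stage is omitted here.

```python
from itertools import zip_longest


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# --- Classification flow (supervised): a training phase comes first ---
def train(history):
    """history: list of (features, label). Returns one centroid per label."""
    by_label = {}
    for features, label in history:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(pts) for label, pts in by_label.items()}


def classify(model, features):
    """Assign the label whose centroid is closest to the features."""
    return min(model, key=lambda label: sq_dist(model[label], features))


def prioritize_by_classification(model, tests):
    """Tests predicted as 'fail' run first; sorted() keeps ties in input order."""
    return sorted(tests, key=lambda name: classify(model, tests[name]) != "fail")


# --- Clustering flow (unsupervised): no training data is needed ---
def kmeans(tests, k, rounds=10):
    """Tiny k-means, seeded deterministically with the first k tests."""
    names = list(tests)
    centres = [tests[n] for n in names[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(rounds):
        clusters = [[] for _ in range(k)]
        for n in names:
            idx = min(range(k), key=lambda i: sq_dist(centres[i], tests[n]))
            clusters[idx].append(n)
        centres = [centroid([tests[n] for n in c]) if c else centres[i]
                   for i, c in enumerate(clusters)]
    return clusters


def prioritize_by_clustering(tests, k=2):
    """Pick one test from each cluster in turn (round-robin)."""
    order = []
    for group in zip_longest(*kmeans(tests, k)):
        order.extend(n for n in group if n is not None)
    return order


# Hypothetical data: features are [coverage ratio, past failures].
history = [([0.1, 0], "pass"), ([0.2, 1], "pass"),
           ([0.9, 5], "fail"), ([0.7, 3], "fail")]
current = {"t1": [0.2, 0], "t2": [0.9, 4], "t3": [0.3, 1], "t4": [0.8, 5]}

model = train(history)                      # extra stage for classification
classified_order = prioritize_by_classification(model, current)
clustered_order = prioritize_by_clustering(current)
```

The classification path cannot produce an ordering until `train` has been run on historical data, whereas the clustering path works directly on the current test information, which mirrors the difference in stage counts between the two flows.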

G. WHAT AND WHICH SUBJECT STUDY TYPE USED RESPECTIVELY TO ML TECHNIQUES IN TCP? (RQ4.1)
The final research question aims to unveil the state of the art in the evaluation methods used for ML techniques in TCP, and its first aspect is to reveal the popular types of subject study utilized. Three types of subject study are normally used in experiments or research studies: open-source programs, lab programs, and industrial programs. The percentage of study programs utilized among the selected studies is depicted in Figure 10. From Figure 10, we can see that the most used programs were open-source programs with a 47% share, followed by lab programs with 31% and industrial programs with 22%. Some of the open-source programs are described in the work of Khatibsyarbini [1]. The authors purposely discuss only the types of programs used instead of listing every program, since most of them have already been listed and discussed in previous works [1], [14], [16], [17]. Open-source programs are the most utilized because they mostly come in numerous versions and in various sizes [34], [88]. As for industrial programs, the authors believe their availability is limited to institutions that have a direct connection with the industrial organization. The works in [23], [35], [61], [78] demonstrated evaluation on industrial programs where most of the information within the programs cannot be accessed owing to confidentiality issues. As for lab programs, institutions with an established lab and a good team can proceed with their own study programs. As with industrial programs, the confidentiality of program information may reduce the availability of these programs for use in other works [14], [57], [89], [90]. The distribution of the sizes of the study programs used is illustrated in Figure 11.
From Figure 11, open-source programs account for the largest number of studies at every program size, which is noted as the main reason they are the most utilized study program type for ML techniques in TCP. Apart from that, Figure 11 reveals that ML techniques in TCP prefer medium to large programs over small ones, as one of the purposes of ML itself is to improve efficiency in large-scale environments. Nevertheless, small-scale programs remain useful for proving the concept of an ML technique before moving toward larger study programs.

H. WHAT KIND OF EVALUATION METRICS USED IN ML TECHNIQUES IN TCP? (RQ4.2)
In any empirical study, the most important element in showing whether the study succeeded is its results, which can be determined using several evaluation metrics. Numerous evaluation metrics are used in the TCP approach. Figure 12 shows the hierarchy of evaluation methods for ML techniques in TCP.

FIGURE 12. Hierarchy of evaluation method
From Figure 12, there are three main evaluation types: statistical evaluation, performance evaluation, and outcome evaluation. The primary type is outcome evaluation, where the evaluation is made according to the main objective of the study. Within outcome evaluation, the average percentage of faults detected (APFD) and coverage metrics can be considered popular among researchers in the TCP domain [1], [6], [73].
The findings in [1] show that APFD was the most utilized evaluation metric across TCP approaches. APFD quantifies how rapidly a prioritized test suite detects faults and could be considered a compulsory evaluation metric in TCP [91], [92]. APFD values range from 0 to 1, where a higher value means a better fault detection rate. The equation for calculating the APFD value is shown below.

$$APFD = 1 - \frac{TF_1 + TF_2 + \cdots + TF_m}{n \times m} + \frac{1}{2n}$$
where $T$ is a test suite containing $n$ test cases and $F$ is the set of $m$ faults revealed by $T$.
$TF_i$ is the position of the first test case in the ordering of $T$ that reveals fault number $i$, and the APFD value is calculated using the equation.
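As a worked illustration of the APFD definition above, the following minimal sketch computes the metric for a given ordering. The function name, the data layout (a mapping from each fault to the set of test cases that reveal it), and the example data are assumptions made for illustration; the sketch also assumes that every fault is revealed by at least one test case in the ordering.

```python
def apfd(ordering, fault_matrix):
    """Average percentage of faults detected for one test ordering.

    ordering:     list of test case names in execution order (n tests).
    fault_matrix: dict mapping each fault to the set of test case names
                  that reveal it (m faults); hypothetical layout.
    Implements APFD = 1 - (TF_1 + ... + TF_m) / (n * m) + 1 / (2 * n),
    where TF_i is the 1-based position of the first test revealing fault i.
    """
    n = len(ordering)
    m = len(fault_matrix)
    position = {test: i for i, test in enumerate(ordering, start=1)}
    tf_sum = sum(min(position[t] for t in revealing)
                 for revealing in fault_matrix.values())
    return 1 - tf_sum / (n * m) + 1 / (2 * n)


# Hypothetical example: four test cases, two faults.
faults = {"f1": {"a"}, "f2": {"b"}}
early = apfd(["a", "b", "c", "d"], faults)   # fault-revealing tests run first
late = apfd(["d", "c", "b", "a"], faults)    # fault-revealing tests run last
```

Running the fault-revealing test cases first gives the higher value, which is the behaviour a prioritization technique aims for.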
After outcome evaluation, empirical experiments using ML techniques in the TCP domain typically highlight the performance of their techniques [39], [56], [93], [94]. This performance can be determined by the execution time of the algorithm and also by the cost involved. While the evaluation stage of an experiment could stop at performance evaluation, a few works continue with statistical evaluation, which is mainly used to verify the validity of the experimental outcome [59], [95]. In the end, it is the researcher's choice whether to run all available types of evaluation or simply to perform outcome evaluation only. The distribution of evaluation metrics used in ML techniques for TCP is depicted in Figure 13. From Figure 13, we can see that all technique categories utilized the APFD metric, as APFD itself is the main metric for TCP evaluation. The supervised and unsupervised techniques have a similar evaluation style: both focus on outcome-based evaluation and, for performance-based evaluation, on execution time. This is because both techniques follow a quite similar ML strategy that depends on data, whether supervised or unsupervised. For the reinforcement learning strategy in the TCP context, the evaluation focuses more on statistical evaluation and, for performance-based evaluation, on cost. The continuous learning nature of this category creates the need for statistical evaluation to assess the precision of the learning process.
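A performance evaluation based on execution time can be sketched as follows. This is only an illustration under stated assumptions: the prioritizer (ordering by past failure count), the number of repeated runs, and the use of the mean and standard deviation as summary values are all hypothetical choices, and the reviewed studies may measure time and apply statistical tests differently.

```python
import statistics
import time


def time_prioritizer(prioritize, tests, repeats=5):
    """Run a prioritizer several times and summarize its wall-clock time.

    Returns (ordering, mean_seconds, stdev_seconds). repeats must be at
    least 2 because the sample standard deviation needs two data points.
    """
    durations = []
    ordering = None
    for _ in range(repeats):
        start = time.perf_counter()
        ordering = prioritize(tests)
        durations.append(time.perf_counter() - start)
    return ordering, statistics.mean(durations), statistics.stdev(durations)


def by_past_failures(tests):
    """Hypothetical prioritizer: tests with more past failures run first."""
    return sorted(tests, key=lambda name: -tests[name])


# Hypothetical data: test case name -> number of past failures.
tests = {"t1": 0, "t2": 4, "t3": 1, "t4": 5}
ordering, mean_s, stdev_s = time_prioritizer(by_past_failures, tests)
```

Repeating the measurement gives a spread as well as an average, which is the kind of data a subsequent statistical evaluation would work from.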

V. RESEARCH FINDINGS
With the rise of machine learning in the TCP domain, knowledge of the current state of ML techniques in TCP is essential. Detailed knowledge of the ML techniques within TCP is vital for achieving optimized TCP results. Therefore, to highlight the impact of ML techniques in the TCP domain, the findings for each research question are emphasized here. The summary of the findings for the research questions is tabulated in Table XII. For the first research question, most of the selected studies were used to illustrate the taxonomy of ML techniques in TCP. The results show three main categories of ML technique, all of which remain open to improvement. The publication trend of ML techniques in TCP shows significant growth through the years, and new ML techniques using various kinds of algorithms are introduced consistently, almost every month. The results also show that classification was the most popular technique category, followed by clustering, with reinforcement learning the least preferred. Even so, each of these techniques has its own supporters, who are not much concerned with the popularity of the technique. This is shown by recent publications that successfully employ reinforcement learning [73], [96], [97] even though less literature is available regarding the strengths of the technique. The next research question, which was intended to reveal the differences among the main available ML techniques, led to the conclusion that there are noteworthy differences in how the ML techniques are executed. The most notable difference is the main objective of the selected ML technique. For a coverage-based objective, the classification technique would benefit the most [98]-[100]. For a performance-oriented objective, the clustering technique would do best [101]-[103]. Apart from that, the strengths and limitations of each technique were discussed, which can help future work to select the technique that suits the available resources.
In short, each technique has specific potential values, benefits, and drawbacks.
As for the research question that has no sub-aspects (RQ3), several studies were investigated in depth with regard to their experimental setup to give a glimpse of the standard process flow for ML techniques in TCP. Employing a standard process is essential for clean project execution. The results for this research question show that supervised ML techniques involve a training data process, while unsupervised techniques are more straightforward. This variation in the process helps project managers or researchers to select the technique that suits their available resources and project schedule.
For the last research question, the results show that the available subject studies play an important role in which ML technique is chosen in the first place. Medium to large open-source study programs were considered the most preferred owing to their availability and accessibility. However, industrial study programs would do better in proving the effectiveness of an ML technique in real-world applications [47], [104]. As for evaluation metrics, most previous review works have already revealed that APFD is the main evaluation metric in the TCP domain [1], [6], [9], [15], [16]. In this review, however, the last research question categorizes the evaluation metrics available for ML techniques in TCP into three categories. Of the three, outcome evaluation using the APFD metric is considered the primary evaluation in the TCP domain itself. ML works with a performance-oriented objective would proceed with performance evaluation metrics and may go on to statistical evaluation to verify the results.

VI. THREATS TO VALIDITY
As humans, the authors could not possibly produce a review study that is perfect in all aspects. Therefore, the weaknesses of this review that could threaten its validity are acknowledged. Flaws in selecting primary studies and uncovered related fields are the potential threats identified, both associated with human error.

A. SELECTION OF PRIMARY STUDIES
The primary studies for this review were selected with a view to answering the designed research questions. Section III presents the research method in detail and illustrates the process of selecting primary studies. However, during this selection it was hard for the authors to ensure that all accessible works related to TCP and ML techniques were reviewed. The most considerable issue is the enormous number of research works available with misleading keywords and research summaries, which could result in time wasted reading through whole works one by one. To counter this issue, the authors agreed to make the selection of primary studies depend on specific search strings connected to the respective research questions.

B. UNCOVERED RELATED FIELD
Within the TCP approach to testing, several notable techniques are available. However, this review focuses only on ML techniques in TCP, as ML has been trending in almost every other domain in recent years. The authors therefore took the initiative to investigate the state of the art of ML in TCP to encourage the development of ML techniques. In reviewing the ML techniques, some related fields were not included in this review paper. The most notable uncovered field is the list of algorithms used in each ML technique. The issue is that most algorithms nowadays can be tuned to fit different types of ML technique. For example, the work in [105] used a neural network algorithm in a classification technique, while the work in [103] tweaked the neural network to work in a clustering technique. Therefore, to avoid misleading information, the authors agreed not to list the algorithms available for each ML technique category, as an algorithm can be tweaked to fit the intended technique.

VII. CONCLUSION
As this paper comes to an end, the purpose of this review has been achieved by answering all of the designated research questions. The results were obtained through the review methodology, which required finding, categorizing, and evaluating the primary studies. All of this effort is intended to give other researchers a glimpse of the current state of ML techniques in TCP and subsequently to lead to improvements. As a result of this review, there were several notable findings that could guide future work. The notable findings were: