An Efficient-Assembler Whale Optimization Algorithm for DNA Fragment Assembly Problem: Analysis and Validations

The study of deoxyribonucleic acid (DNA) is crucial in many fields, including medicine, biology, zoology, agriculture, and forensics. Since reading a DNA sequence is onerous because of its massive length, it is common in many DNA analysis applications to divide DNA strands into small segments or fragments which, after analysis, must be reassembled. This reassembly, known as the DNA fragment assembly problem (DFAP), is NP-hard, so no polynomial-time exact algorithm is known for it. This paper proposes a new assembler for tackling the DFAP based on the overlap-layout-consensus (OLC) approach. The proposed assembler adapts a discrete whale optimization algorithm (DWOA) using standard operators adopted from evolutionary algorithms to simulate the strategy adopted by humpback whales when searching for prey. For the first time, we formulate the behaviors of whales so that they can be applied directly to any discrete optimization problem, based on three primary operations that perform the exploitation and exploration phases of the algorithm: a swap-based best-position operator, an ordered crossover operator, and the selection of a random whale. These operations were carefully designed to preserve the methodology of the original whale algorithm. DFAP is a multi-objective problem that seeks the optimal order of segments that maximizes the overlap score and minimizes the number of contigs (sets of overlapping DNA segments) to compose a one-contig DNA strand. Existing local search methods, such as the problem aware local search (PALS) variant with many non-conflicting movements (PALS2-many), suffer from being trapped in local optima. Hence, the integration of DWOA with PALS2-many improves the search capability for finding the optimal order of fragments. In addition, we propose a new variation of PALS2-many that achieves the two objectives of DFAP simultaneously.
Our proposed DWOA was compared with a number of the most recent robust assemblers: a hybrid crow search algorithm for solving the DFAP (CSA-P2M*Fit), P2M*Fit, and a hybrid genetic algorithm (GA-P2M*Fit). The experimental results and statistical analyses of the proposed DWOA on thirty benchmark instances show that DWOA significantly outperforms those algorithms in reaching fewer contigs, in addition to being competitive with CSA-P2M*Fit and superior to P2M*Fit and GA-P2M*Fit for the overlap score.


I. INTRODUCTION
The associate editor coordinating the review of this manuscript and approving it for publication was Shuai Liu.
Progress in the study of deoxyribonucleic acid (DNA) has allowed the early detection and prediction of an individual's exposure to many diseases, including cancer [1]- [3] and autoimmune [4] diseases. DNA analysis has been further extended to the fields of forensics and crime detection [5]- [8], genetic engineering and agriculture (i.e., improving the productivity of crops) [9]- [11]. Despite such widespread applications of DNA analysis, reading the complete DNA sequence is still an onerous task due to the massive length of DNA strands: human DNA is estimated to contain about 3.2 billion nucleotides [12], [13]. So, a standard procedure is to divide the DNA strands into small segments or fragments at random positions to facilitate the reading process. Once the analysis is complete, the DNA fragments must be combined back into the original DNA sequence; this process is referred to as the DNA fragment assembly problem (DFAP). In DFAP, the main objective is to find the optimal order of the fragments to reassemble the original DNA sequence; the layout phase is therefore considered the core of DFAP.
To ensure the accuracy of any reassembled DNA, the two main objectives of DFAP are to attain the optimal order of the fragments that combine to form the original DNA while maximizing the overlap score among these fragments and minimizing the number of contigs (sets of overlapping DNA segments). The traditional approaches to DFAP try all possible fragment combinations in order to detect the best one. For F fragments, there are $2^F F!$ possible combinations, so the solution time grows at least exponentially with F and it may take years to find the exact solution [12]. Due to the significant success of meta-heuristic algorithms in solving such problems in a reasonable time [14]- [21], this work was motivated to adapt those algorithms for solving DFAP in order to overcome the computational time and accuracy problems which plague traditional methods.
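To make the growth of this search space concrete, the following minimal sketch evaluates $2^F F!$ for a few values of F. The function name `layout_count` is hypothetical and serves only as an illustration.

```python
import math


def layout_count(num_fragments: int) -> int:
    """Number of candidate layouts, 2^F * F!, for F fragments."""
    return (2 ** num_fragments) * math.factorial(num_fragments)


# Print the count for a few small instance sizes.
for f in (5, 10, 20):
    print(f, layout_count(f))
```

Even for 20 fragments the count already exceeds 10^24, which is why exhaustive enumeration is impractical.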
Recently, a meta-heuristic algorithm [22] known as the whale optimization algorithm (WOA) has been proposed for tackling continuous optimization problems [23]- [25]. WOA has advantages over many other meta-heuristic algorithms that motivate investigating its use for the DNA fragment assembly problem:
• Having two exploitation capabilities allows the whales to move quickly toward the optimal solution.
• Terminating its exploration capability after the first half of the iterations accelerates convergence by reducing the diversity among the members of the population. Note that this is considered a disadvantage for problems with several local minima and a global minimum that is not easy to reach, such as the parameter extraction problem of double- and triple-diode models for solar photovoltaic systems [26].
• It is easy to understand and implement.
In addition, to the best of our knowledge, there is no research tackling this problem using WOA.
Recently, several variants of the WOA have been proposed for tackling various discrete optimization problems. In [27], a binary version of the WOA was proposed for tackling binary optimization problems, specifically three engineering optimization problems and a real-world travelling salesman problem. Mafarja [28] integrated the WOA with simulated annealing (SA) to address the feature selection problem; the SA was used to improve the quality of the best-so-far solution after a number of iterations to accelerate convergence toward the optimal solution. WOA was also integrated with the quantum theorem [29] to improve the diversification and intensification of the standard WOA for the feature selection problem. Further, WOA was improved by the Lévy flight strategy and the local search strategy (LSS) [30] for tackling the single and multidimensional 0-1 knapsack problems. WOA has also been suggested in [31] for tackling the clustering problem in data mining. Abdel-Basset [32] modified the WOA and hybridized this modified version with an LSS for scheduling multimedia data objects. Further, in [33] the green job scheduling problem was tackled by proposing a discrete version of the WOA. Jiang et al. [34] improved the WOA for tackling the energy-efficient scheduling problem; the improvement was based on dispatch rules, a nonlinear convergence factor, and a mutation operation (MO).
After reviewing the recent published variants of the WOA, we found that no variant has been proposed for addressing DFAP. Therefore, in this paper, we propose a new discrete variant based on adapting the behaviors of the WOA under relevant genetic operators for tackling DFAP.
At the outset, the standard WOA mapped using the largest position value (LPV) technique is applied to DFAP. LPV arranges the continuous positions of the whale in descending order. The largest position value is mapped to 1, the second-largest position value is mapped to 2, and so on. However, the performance of the WOA under this mapping technique is poor in comparison to some of the recent robust algorithms, as shown in the results section. Recently, a new trend has appeared in several reported works, such as the crow search algorithm [35] and Faris et al. [36], to convert continuous optimization algorithms into discrete versions by borrowing suitable genetic operators that simulate the nature of the standard algorithm for tackling combinatorial problems. This trend, in addition to the poor performance of the standard WOA, motivates us to propose a discrete version of the WOA (DWOA) for tackling the DFAP. Specifically, DWOA uses a number of genetic operators to mimic the behavior of the WOA discretely:
• the swap-based best position operator to mimic the action of encircling prey;
• the ordered crossover operator to mimic the bubble-net attacking method; and
• random positions to search for prey.
The DWOA was compared with the standard version and a number of well-known robust algorithms mapped using LPV, and the results of the comparison show the efficacy of the DWOA when solving the DFAP. Moreover, the DFAP is solved under two objectives: minimizing the number of contigs and maximizing the overlap score. Therefore, for the first subIter iterations, DWOA is integrated with the PALS2-many technique [37], which searches for the order of the fragments that minimizes the number of contigs within this number of iterations.
VOLUME 8, 2020
PALS2-many is summarized as follows:
1) The algorithm generates a neighborhood solution (NS) by a movement that reverses the sub-permutation between two different positions.
2) Next, the variations in the overlap score and the number of contigs between the current solution and the NS are calculated for only the affected fragments. If the movement reduces the number of contigs, or increases the overlap score while preserving the number of contigs, it is stored in a list.
3) Then, the algorithm selects many non-conflicting movements from the list that minimize the number of contigs and applies them to the current solution.
4) Finally, the previous three steps are repeated until no movement improves the current solution.
Then, for the remaining iterations, an improvement on PALS2-many, called PALS2-many-based fitness and contig (PMFC), is applied. The PMFC strategy applies to the current solution only the movements that increase the overlap while reducing or keeping the number of contigs. The variant of DWOA that uses a local search (LS) method to improve its quality in terms of the number of contigs and overlap score is abbreviated as DWOA-LS. In DWOA-LS, the DWOA maximizes the overlap score, which is treated as the main objective to be optimized in our research, as in most papers in the literature. Within the first subIter iterations, the LS searches for the order of the fragments that minimizes the number of contigs with an overlap score higher than that obtained by the DWOA. Afterwards, for the rest of the iterations, the LS searches for the highest overlap score while preserving or reducing the number of contigs.
In general, the LS is employed in the first subIter iterations to find the order that minimizes the number of contigs while preserving the overlap score. For the remainder of the iterations, the LS optimizes the overlap score while preserving the number of contigs.
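The two-phase role of the LS described above can be sketched as a single acceptance rule for a candidate movement. This is a minimal sketch, not the authors' implementation: the function name `accept_move`, the zero-based iteration counter, and the sign convention (positive `d_overlap` means a higher overlap score, negative `d_contigs` means fewer contigs) are assumptions made for illustration.

```python
def accept_move(iteration: int, sub_iter: int, d_overlap: int, d_contigs: int) -> bool:
    """Decide whether a candidate movement is kept by the local search.

    iteration  -- zero-based index of the current iteration (assumed)
    sub_iter   -- number of initial iterations that use PALS2-many
    d_overlap  -- change in overlap score caused by the movement
    d_contigs  -- change in the number of contigs caused by the movement
    """
    if iteration < sub_iter:
        # PALS2-many phase: fewer contigs, or the same number of contigs
        # with a higher overlap score.
        return d_contigs < 0 or (d_contigs == 0 and d_overlap > 0)
    # PMFC phase: the overlap must increase and the number of contigs
    # must not increase.
    return d_overlap > 0 and d_contigs <= 0
```

Under this reading, a movement that removes a contig but lowers the overlap is kept in the first phase and rejected in the second.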
Generally, the main contributions and novelty of this work are:
1) Development of a discrete WOA (DWOA) for solving DFAP that borrows some of the standard operators adopted from evolutionary algorithms to mimic the behaviors of humpback whales.
2) Incorporation of two advanced local search strategies (PALS2-many and an improved variant of PALS2-many) with DWOA (DWOA-LS) to boost the searching capability.
3) Conduct of a rigorous experimental analysis and comparison of the proposed DWOA against other existing assemblers. To do so, thirty instances were considered in terms of overlap score, the number of contigs, and a new evaluation function.
4) The investigation shows that DWOA outperforms all other algorithms used in the comparison when considering two objectives together: the first objective is to minimize the number of contigs, and the second objective is to maximize the overlap score.
The remainder of this paper is organized as follows. Section II summarizes the related work on the DFAP. Section III presents the DNA fragment assembly problem. Section IV overviews the standard WOA. Section V illustrates the proposed approach to adapt WOA to solve the DFAP. Section VI presents the discussion and the experimental results of the proposed method for addressing DFAP on three sets of standard benchmarks. Section VII draws conclusions about the proposed approach and highlights some potential future work.

II. RELATED WORK
Alba and Luque [38] presented a heuristic algorithm called problem aware local search (PALS) that can obtain near-optimal solutions for DFAP better than the existing assemblers including PMA [39] and available commercial packages such as CAP3 [40]. Nonetheless, their proposed heuristic algorithm was still trapped in local optima and consequently converged slowly towards the optimal solution, particularly for large-scale DFAPs. Therefore, researchers have been encouraged to find new and effective methods for problems involving larger numbers of fragments. The emergence of metaheuristic algorithms and their promising success in tackling many optimization problems [41]- [47] has attracted many researchers to look to use such techniques for solving DFAP. One of the first algorithms that solved DFAP was a genetic algorithm (GA) proposed by Parsons et al. [48] using sorted-order and traditional permutation representations. Their results demonstrated that the edge-recombination crossover works better for such a problem, so they incorporated the permutation representation and the edge-recombination operation in a GA which produced better results than a greedy algorithm.
Nebro et al. [49] proposed a GA and solved the DFAP with the aid of a computing grid comprising 150 computers, reducing computing time from days to hours. Further, Hughes et al. [50] presented three variations of GA based on ring species, island model, and recentering-restarting to maximize the overlap score and obtain high-quality orderings. They integrated two heuristics, 2-opt and Lin-Kernighan, since one heuristic alone can become trapped in local optima. They concluded that the recentering-restarting variation works better with their proposed heuristics. However, in most of the solved instances, the algorithm does not attain the optimum overlap.
Bucur [51] designed an advanced GA by employing segmented permutations for representing candidate solutions. Their algorithm was verified using only three instances of DFAP of medium sizes. More recently, Rathee et al. [52] incorporated the quantum computing concept with GA (QGFA) to carry out DNA fragment assembly with an overlap-layout-consensus process. The performance of QGFA was assessed against several well-known algorithms, with the results showing that QGFA performs better than these algorithms for both the number of contigs and the overlap score obtained. In [53], three efficient algorithms (GA, simulated annealing (SA), and PALS) were proposed for handling noisy and noiseless DFAP instances. Among those meta-heuristics, SA demonstrated the best results for noiseless DFAP instances, while GA showed better performance for DFAP in the presence of noise.
Particle swarm optimization (PSO) is another metaheuristic algorithm that is commonly used in solving DFAP. Some of the works done for tackling the DFAP based on PSO will be reviewed within this paragraph. Rajagopal and Sankareswaran [54] studied three variations of PSO, including constant and dynamic inertia weight, and adaptive PSO, with the adaptive PSO providing superior results. In [55], six variations of PSO were introduced using two seeding algorithms to generate the population and variable neighborhood search (VNS). The results showed that combining the SA, tabu search, and VNS with PSO is the best variant. Some authors have resorted to hybridization as a trend for tackling DFAP. Mallén-Fullerton and Fernandez-Anaya [56] combined PSO with the differential evolution (DE) algorithm. Further, Huang et al. [57] integrated PSO with the SA algorithm which was shown to outperform PSO, albeit with an increase in computational time.
Vidal and Olivera [58] presented a firefly algorithm (FA) [59] on a GPU (DFA-GPU) for DFAP. A local search (LS) is combined with DFA-GPU, which provides a parallel model for tackling different DFAP instances without degrading performance or increasing run time. For the crow search algorithm (CSA) suggested in [35], the fitness function only considered the overlap score, without considering the number of contigs, which, as a consequence, results in inferior performance. In addition, Ali [60] proposed a discrete particle swarm optimization (DPSO) based on a new updating rule known as probabilistic edge recombination (PER) for tackling the layout stage in the OLC DFAP. The PER operator creates a new permutation by considering the relative ordering of DNA fragments. Within the same research, Ali also created another variant of DPSO combined with PALS to improve the exploitation capability for reaching better outcomes; this variant was called quick-PALS.
Furthermore, the memetic gravitational search algorithm (MGSA) [61] was proposed for tackling the DFAP based on the OLC approach and used tabu search to initialize the population. Moreover, MGSA used time-varying maximum velocities to increase the diversity among the members of the population and so reduce the probability of becoming stuck in local minima. Finally, MGSA was integrated with the SA-based variable neighborhood search to improve the accuracy of the best solution obtained by MGSA. The MGSA was validated on 19 DFAP instances in an attempt to maximize the overlap score among the fragments of each sequence. However, the MGSA does not take the number of contigs into consideration, which is its main limitation.
Ülker [62] adapted the harmony search (HS) algorithm for DFAP. Because HS was proposed to address continuous optimization problems and the DFAP is a combinatorial one, the smallest position value was used to convert the continuous solutions generated by HS into permutations suitable for DFAP. The performance of HS was observed using three real DNA datasets. Indumathy [63] adapted the cuckoo search (CS) algorithm to reconstruct the original sequence of the segmented DNA as the first attempt to apply this algorithm to the DFAP. The CS algorithm was evaluated using nine instances and compared with a number of variants of the PSO. The experimental results show the superiority of the CS over those variants. Other metaheuristics developed for DFAP include the ant colony system algorithm [64] and the bee algorithm [65].
All the algorithms mentioned in the literature dealt with DFAP by improving one of the following two objectives: (1) minimizing the number of contigs only, or (2) maximizing the overlap score only. This is at odds with the nature of the problem, which requires achieving the two objectives simultaneously when looking for the best order of the DNA fragments while maximizing the overlap score among the fragments in this order. Specifically, the major problems that affect the algorithms developed in the literature are summarized as follows:
• Most approaches have difficulty in avoiding becoming trapped in local optima.
• The efficacy of most approaches has not been tested on a sufficient number of large-scale instances.
• Existing algorithms can obtain the optimal overlap for several cases, but the number of resulting contigs is too large.
The significance of quick, reliable DNA analysis and the shortcomings of existing approaches motivate us to suggest an efficient assembler based on DWOA for solving the DFAP. The efficient assembler DWOA-LS presented here works on finding the best order of the fragments while maximizing the overlap among them.

III. DNA FRAGMENT ASSEMBLY PROBLEM
Deoxyribonucleic acid (DNA) is the hereditary material that stores the information required to create all living organisms. DNA consists of four chemical bases, including Adenine (A), Cytosine (C), Thymine (T), and Guanine (G). The sequence of the four letters (A, G, C, and T) indicates the available information for building the living organism. The letters join in pairs, A with T, C with G, to create base pairs. Each pair is tied to a sugar molecule and a phosphate molecule to compose a nucleotide. A nucleotide resembles a ladder comprising of two long strands that make a spiral called a double helix [35], [66].
In the field of computational biology, these sequences are used to extract the function of information coded in DNA. The human genome consists of a vast number of bases (about 3.2 billion), and the current sequencing technologies cannot read more than 1000 bases. Accordingly, a shotgun sequencing strategy has been used to overcome that problem by breaking the long DNA sequence into smaller pieces called fragments or segments. These fragments are sequenced randomly by machine so that their original order and orientation are lost. The fragments have to be reassembled based on the overlap among them to regain the original order of the DNA. The assembler has to calculate the overlap between all possible pairs of fragments to reassemble the ones that have the highest similarity score to compose a contig, which consists of a set of contiguous and overlapping fragments. The problem is known as the DNA fragment assembly problem (DFAP) and the challenge is to reassemble the contigs successfully to retrieve the original DNA.
The traditional assembly approach involves three stages: overlap, layout, and consensus, which is known as the overlap-layout-consensus (OLC) approach. In the overlap step, the assembler calculates the overlap score between all possible pairs of fragments. Detecting the highest overlap score between the prefix of one fragment and the suffix of another is the objective of this step. The semi-global alignment method is adopted and is implemented using dynamic programming [3]- [5]. In the layout step, the fragments are ordered so as to maximize the sum of the overlaps between adjacent fragments, until the original DNA sequence is obtained. Finally, in the consensus step, the order of the fragments from the layout step is used to form the complete DNA. These phases are illustrated in Fig. 1.
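The overlap step can be illustrated with a small dynamic-programming sketch of a semi-global (suffix-prefix) alignment. This is a minimal sketch under stated assumptions: the function name `overlap_score` and the scoring values (match +1, mismatch -1, gap -1) are hypothetical, since the scoring scheme is not specified in this section.

```python
def overlap_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Best score of aligning some suffix of `a` with some prefix of `b`.

    Skipping a prefix of `a` and a suffix of `b` is free; everything in
    between is scored with the (assumed) match/mismatch/gap values.
    """
    cols = len(b) + 1
    # Row 0: no characters of `a` used yet, so consuming b[:j] costs j gaps.
    prev = [j * gap for j in range(cols)]
    for i in range(1, len(a) + 1):
        # Column 0 stays 0: the alignment may start at any position of `a`.
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    # The alignment must end at the end of `a`, but may stop anywhere in `b`.
    return max(prev)


print(overlap_score("ACGTAC", "TACGGA"))  # suffix "TAC" matches prefix "TAC"
```

Computing this score for every ordered pair of fragments yields the overlap matrix that the layout step then works with.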

IV. STANDARD WHALE OPTIMIZATION ALGORITHM: OVERVIEW
In WOA, Mirjalili and Lewis [22] mimicked the actions and behaviors of humpback whales, which use an astounding feeding method called the bubble-net approach when attacking their prey. They surround the prey in a spiral shape and then swim up to the surface in a shrinking circle. WOA mimics this hunting behavior by choosing, with a probability of 50% each, between a spiral model and a shrinking encircling mechanism to update the position of the whale. The mathematical model for the encircling mechanism is as follows:
$$\vec{Z}(t+1) = \vec{Z}^*(t) - \vec{A} \cdot \vec{D} \quad (1)$$
$$\vec{D} = |\vec{C} \cdot \vec{Z}^*(t) - \vec{Z}(t)|$$
$$\vec{A} = 2a \cdot rand - a$$
$$\vec{C} = 2 \cdot rand$$
$$a = 2 - 2\,\frac{t}{t_{maxIter}}$$
where $\vec{Z}$ is the position vector of the current whale, $t$ the current iteration, $\vec{Z}^*$ the position vector of the best whale in the population, $rand$ a random number in [0, 1], $t_{maxIter}$ the maximum number of iterations, and $a$ a distance control parameter linearly decreased from 2 to 0 [22].
The spiral model tries to mimic the helix-shaped movement of whales. The mathematical model of the spiral shape is as follows:
$$\vec{Z}(t+1) = \vec{Dist} \cdot e^{bl} \cdot \cos(2\pi l) + \vec{Z}^*(t) \quad (6)$$
$$\vec{Dist} = |\vec{Z}^*(t) - \vec{Z}(t)|$$
where $\vec{Dist}$ is a vector used to store the absolute distance between $\vec{Z}^*(t)$ and $\vec{Z}(t)$, $l$ a numerical value created randomly in [-1, 1], and $b$ a fixed value that defines the logarithmic spiral shape.
To search for the prey in another direction of the search space, WOA uses a random whale from the population to update the position of the current whale in the exploration phase. If $|\vec{A}|$ is greater than 1, then the current whale is updated according to a random whale from the population. The mathematical model of the search for the prey is as follows:
$$\vec{Z}(t+1) = \vec{Z}_{ind} - \vec{A} \cdot \vec{D} \quad (8)$$
$$\vec{D} = |\vec{C} \cdot \vec{Z}_{ind} - \vec{Z}(t)|$$
where $\vec{Z}_{ind}$ is a vector including the position of the whale with index $ind$ in the current population; this index is randomly selected between 1 and N to enable the WOA to explore other regions of the search space and avoid becoming trapped in local minima. Given the significant success achieved by WOA in solving many optimization problems, it is adapted in this work, for the first time, to address the DFAP, as discussed later.
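For readers who prefer code, one position update of the standard continuous WOA can be sketched as follows. This is a minimal sketch, not the authors' implementation: the function name `woa_step`, drawing A and C once per whale rather than per dimension, and the default b = 1 are assumptions made for illustration.

```python
import math
import random


def woa_step(z, best, population, t, max_iter, rng, b=1.0):
    """Return the updated position of one whale (continuous WOA).

    z          -- current position (list of floats)
    best       -- best position found so far
    population -- list of positions to draw a random whale from
    t          -- current iteration, max_iter -- total iterations
    rng        -- a random.Random instance
    """
    a = 2.0 - 2.0 * t / max_iter        # decreases linearly from 2 to 0
    A = 2.0 * a * rng.random() - a
    C = 2.0 * rng.random()
    if rng.random() < 0.5:
        # Shrinking encircling around the best whale when |A| < 1,
        # otherwise exploration around a randomly chosen whale.
        target = best if abs(A) < 1 else rng.choice(population)
        return [ti - A * abs(C * ti - zi) for ti, zi in zip(target, z)]
    # Spiral update around the best whale.
    l = rng.uniform(-1.0, 1.0)
    spiral = math.exp(b * l) * math.cos(2.0 * math.pi * l)
    return [abs(bi - zi) * spiral + bi for bi, zi in zip(best, z)]


rng = random.Random(42)
print(woa_step([0.2, 0.8], [1.0, 1.0], [[0.5, 0.5], [1.0, 1.0]], 3, 10, rng))
```

Every component of the result is a real number, which is why such an update cannot be applied directly to a permutation of fragment indices.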

V. THE PROPOSED APPROACH: DWOA-LS ALGORITHM
This section provides a full explanation and illustration of the proposed DWOA-LS algorithm. The fundamental components of this proposed solution approach are: initialization, fitness evaluation, the DWOA for DFAP, and PALS as a local search approach. Each of these components is described in the following subsections.

A. INITIALIZATION
DFAP is a combinatorial problem, seeking to increase the overlap among the adjacent fragments, while the ultimate goal is to produce a one-contig DNA. To provide an effective solution representation, a proper understanding of this problem is essential. The set of fragments is enumerated from 1 to N , where N is the maximum number of fragments. Then, a possible solution to DFAP is to rearrange the set of numbers from 1 to N . The identification of the optimal order of the fragments requires an examination of the permutations of the numbers assigned to those fragments. Now, to solve this DFAP by using the proposed DWOA-LS we assume that each whale carries a solution to the problem. We randomly initialize the population of M whales, while each whale is described by a position vector of size N containing a permutation of numbers assigned to the fragments, which represent a solution to DFAP. Each position in the whale position vector should have a different fragment number.
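The initialization described above can be sketched in a few lines. The function name `init_population` and the use of a seeded `random.Random` instance are illustrative assumptions.

```python
import random


def init_population(num_whales: int, num_fragments: int, rng: random.Random):
    """Create M whales, each a random permutation of fragment numbers 1..N."""
    fragments = list(range(1, num_fragments + 1))
    # rng.sample with k == len(fragments) returns a shuffled copy,
    # so every fragment number appears exactly once per whale.
    return [rng.sample(fragments, num_fragments) for _ in range(num_whales)]


population = init_population(num_whales=4, num_fragments=6, rng=random.Random(0))
for whale in population:
    print(whale)
```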

B. OBJECTIVE FUNCTION
An objective function plays an essential role in DWOA-LS to reach the best solution for a given problem. In this case, the objective function calculates the fitness value of each whale in the population. The whale that has the best fitness value is identified as the best solution. In DFAP, the main objective is to find the optimal arrangement of the fragments, which achieves the highest overlap score and reduces the number of contigs. In the proposed DWOA, the evaluation is based on the overlap score only, and the minimization of the number of contigs is considered as an objective for the two versions of the PALS2-many method. Therefore, the overlap score of each whale produced by the DWOA is calculated as follows, in order to find the one nearest to the optimal solution:
$$F(\vec{Z}(t)) = \sum_{d=1}^{N-1} w(f_d, f_{d+1})$$
where $F(\vec{Z}(t))$ represents the fitness value of the current whale $\vec{Z}(t)$, and $w(f_d, f_{d+1})$ the overlap score between two consecutive fragments. For each possible order estimated by an algorithm, this equation is used to estimate its quality by summing the count of the similar consecutive letters at the end of the first fragment $f_d$ with the letters at the beginning of the second one $f_{d+1}$, and the order with the highest overlap score is considered the best. For example, Fig. 1 includes three fragments, which need to be arranged to return the original DNA sequence. Therefore, the sum of the count of the similar letters between each two consecutive fragments is calculated, and the order that attains the highest score is considered the best, as depicted in this figure. As identified previously, the overlap score has been calculated using semi-global alignment, implemented by a dynamic programming approach.
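The fitness computation amounts to summing the pairwise overlap scores along the order encoded by a whale. In the following minimal sketch, the function name `fitness`, the dictionary layout of the overlap table, and the overlap values for three fragments are hypothetical; they are not taken from Fig. 1.

```python
from itertools import permutations


def fitness(order, w):
    """Sum of overlap scores between consecutive fragments in `order`.

    w maps an ordered pair (f_d, f_next) to its precomputed overlap score.
    """
    return sum(w[(a, b)] for a, b in zip(order, order[1:]))


# Hypothetical overlap scores for three fragments numbered 1..3.
w = {(1, 2): 4, (2, 3): 3, (2, 1): 0, (3, 2): 1, (1, 3): 2, (3, 1): 0}

best_order = max(permutations([1, 2, 3]), key=lambda order: fitness(order, w))
print(best_order, fitness(best_order, w))
```

Enumerating all orders is feasible only for such toy sizes; the DWOA replaces this enumeration with a guided search.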

C. THE DISCRETE WHALE OPTIMIZATION ALGORITHM (DWOA)
The standard WOA was designed to address continuous search-space problems and cannot, therefore, be used directly with the discrete search space of the DFAP. In the literature, authors suggest converting continuous optimization algorithms to discrete ones by using mapping methods, such as the smallest position value (SPV) or the LPV. SPV arranges the continuous positions of the whale in ascending order so that the smallest position value is mapped to 1, the second smallest position value is mapped to 2, and so on. Unlike SPV, LPV arranges the continuous positions of the whale in descending order. The largest position value is mapped to 1, the second-largest position value is mapped to 2, and so on. The experimental results presented here show that mapping a continuous search space into a discrete one is not an effective way to solve DFAP. The main disadvantages of using continuous algorithms mapped using LPV and SPV to solve combinatorial problems are as follows:
• In combinatorial problems, generally, updating all the positions of the vector together may deteriorate the quality of the solution, because only a small change may be needed to reach the optimum.
• SPV arranges the continuous values so that the smallest one is given a value of 1, the second smallest a value of 2, and so on. However, if two continuous values are equal, one of them is assigned its rank at random, which turns the algorithm into a random search when a significant number of the continuous values are equal.
Therefore, a new trend is to redesign the standard algorithm to deal with combinatorial problems. Examples include: in [35], the crow search algorithm was redesigned with the ordered crossover operation, which was more suitable for DFAP; and Faris et al. [36] converted the addition operation in the salp swarm algorithm into a crossover operation, providing improved results compared to selected standard algorithms. An essential stage in DWOA is the adaptation of Eqs. (1), (6), and (8) so that they can address discrete problems. In this work, the WOA is adapted using the following operators:
• The swap-based best position operator mimics the action of encircling prey.
• The ordered crossover operator mimics the bubble-net attacking method.
• Random positions are used to search for prey.
The adaptation of these three operations to simulate the actions performed by the whales is illustrated in detail in the next subsections.
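The SPV and LPV mappings discussed in this subsection can be sketched as rank computations. The function names `spv` and `lpv` are illustrative, and ties are broken here by position index (Python's sort is stable), whereas the text above describes a random tie-break.

```python
def _ranks(position, descending):
    """Assign rank 1..n to each entry of a continuous position vector."""
    order = sorted(range(len(position)), key=lambda k: position[k], reverse=descending)
    ranks = [0] * len(position)
    for rank, k in enumerate(order, start=1):
        ranks[k] = rank
    return ranks


def spv(position):
    """Smallest position value: the smallest entry is mapped to 1."""
    return _ranks(position, descending=False)


def lpv(position):
    """Largest position value: the largest entry is mapped to 1."""
    return _ranks(position, descending=True)


print(spv([0.3, 0.9, 0.1]))  # [2, 3, 1]
print(lpv([0.3, 0.9, 0.1]))  # [2, 1, 3]
```

Because every coordinate of the continuous vector can change in one update, the resulting permutation may be reshuffled completely, which relates to the first disadvantage listed above.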

D. ENCIRCLING PREY (EXPLOITATION PHASE)
In this phase, the whales encircle the prey as in Eq. (1). However, Eq. (1) is used for continuous problems only. Therefore, the swap-based best-position operator is used to imitate this action for the DFAP, that is, to enable the current whale to update its position towards the best position or the optimum prey $\vec{Z}^*(t)$. To update the position of the whale $\vec{Z}(t+1)$, the following steps are executed: • selecting a random position i from the best whale
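Only the first step of the swap-based operator is available above, so the following is a sketch of one common way to realize a swap toward the best solution, not a reconstruction of the authors' exact procedure. It assumes that the fragment found at a random position i of the best whale is moved to position i of the current whale by a single swap; the function name `swap_toward_best` is hypothetical.

```python
import random


def swap_toward_best(current, best, rng):
    """Move the current permutation one swap closer to the best one.

    Assumed behavior: pick a random position i, find where the value
    best[i] sits in the current whale, and swap it into position i.
    """
    new = list(current)
    i = rng.randrange(len(best))   # random position in the best whale
    k = new.index(best[i])         # where that fragment currently sits
    new[i], new[k] = new[k], new[i]
    return new


rng = random.Random(7)
print(swap_toward_best([3, 1, 2, 4], [1, 2, 3, 4], rng))
```

A single swap keeps the result a valid permutation and changes at most two positions, in contrast to the continuous update of Eq. (1), which moves every coordinate.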

E. SPIRAL BUBBLE-NET FEEDING METHOD (EXPLOITATION PHASE)
In this phase, the whales swim up towards the prey in a spiral shape as in Eq. (6). To fit this to discrete problems, we use an ordered crossover operator to simulate this action for DFAP. The ordered crossover operator is applied to the current whale $\vec{Z}(t)$ to modify its positions towards the best whale $\vec{Z}^*(t)$ found so far. To update the position of the whale $\vec{Z}(t+1)$, we follow these steps:
• Generate two random positions i and j within the current whale $\vec{Z}(t)$, such that i < j.
• Copy all the values between the two positions i and j from the best whale $\vec{Z}^*(t)$ to $\vec{Z}(t+1)$.
• Remove the previous values copied to $\vec{Z}(t+1)$ from the current whale and copy the remaining values, in the order they appear in $\vec{Z}(t)$, to $\vec{Z}(t+1)$ to fill the positions before i and then the positions after j.
Fig. 4 describes the ordered crossover (OC) operation between the best whale and the current one. We replace the original equation (Eq. (6)) with Eq. (11), which applies the OC operation to the best whale and the current one.

F. SEARCH FOR PREY (EXPLORATION PHASE)
In the standard WOA, the current whale $\vec{Z}(t)$ moves towards a random whale selected from the population to search for the prey. However, if we apply this procedure to DFAP, the variation in the population will be reduced. Therefore, to increase the diversity of the population, the current whale $\vec{Z}(t)$ is replaced by a solution generated randomly from the solution space of the problem, in order to explore promising solutions that were not discovered before. Based on the hunting behavior of the WOA, the spiral model and the shrinking encircling mechanism are each selected with a probability of 50% to update the position of the whale.
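The ordered crossover steps and the random re-generation used for exploration can be sketched as follows. The function names are hypothetical, positions are zero-based, and the segment between i and j is assumed to include both end positions.

```python
import random


def ordered_crossover(current, best, i, j):
    """Build a new whale from the best whale's segment [i, j] (inclusive)
    and the remaining values in the order they appear in `current`."""
    n = len(current)
    child = [None] * n
    child[i:j + 1] = best[i:j + 1]              # copy the segment from best
    copied = set(best[i:j + 1])
    rest = [v for v in current if v not in copied]
    # Fill the positions before i first, then the positions after j.
    free = list(range(0, i)) + list(range(j + 1, n))
    for pos, value in zip(free, rest):
        child[pos] = value
    return child


def random_whale(num_fragments, rng):
    """Exploration: a fresh random permutation of fragments 1..N."""
    return rng.sample(range(1, num_fragments + 1), num_fragments)


print(ordered_crossover([6, 5, 4, 3, 2, 1], [1, 2, 3, 4, 5, 6], 2, 3))
# [6, 5, 3, 4, 2, 1]
```

The child keeps the relative order of the current whale outside the copied segment, so it inherits structure from both parents while remaining a valid permutation.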

G. INTEGRATION OF DWOA WITH LOCAL SEARCH
In this section, we explain how to integrate DWOA with a heuristic method to boost its performance and improve the quality of the solutions. We used two variations of PALS [38]. First, we review the previous versions of PALS:
• The original PALS iteratively improves a random solution by producing its neighborhood solutions. A neighborhood solution (NS) is generated by a movement that reverses the sub-permutation between two positions in the solution. The algorithm then calculates the variation in the overlap (Δp) and in the number of contigs (ΔNC) between the current solution and the NS, considering only the affected fragments. PALS tries all the possible movements and stores them in a list, and then selects a single movement that reduces or maintains the number of contigs while not decreasing the overlap. When several movements have the same minimum ΔNC, PALS chooses the NS with the maximum Δp. The drawback of PALS is that a particular NS may appear again in later iterations and its calculations may be redone, which is time-consuming.
• In PALS2-many [37], the algorithm selects the movement that reduces NC; when several movements have the same minimum ΔNC, PALS2-many selects the NS with the lowest Δp. To speed up the algorithm, many non-conflicting movements are selected from the list, which reduces the number of calculations required. PALS2-many can produce a sub-optimal solution with minimum NC, but it cannot reach the optimal overlap in large instances.
• PALS2M*Fit [35] is concerned with the movements that increase p. It can obtain the optimal overlap, but the number of contigs is large, especially for large instances, which conflicts with the ultimate objective of achieving a single-contig DNA with optimal overlap.
The disadvantages of PALS2-many are summarized as follows: 1) it adds to the list the movements that minimize the number of contigs even if they reduce the overlap score; and 2) it adds to the list the movements that preserve the number of contigs while maximizing the overlap score. It should be noted that one case is not taken into consideration by PALS2-many and PALS2M*Fit: movements that maximize the overlap score while preserving or reducing the number of contigs. Movements that only minimize the number of contigs must be discarded, because one objective would be achieved but not the other. Therefore, to tackle this issue, this work proposes an improvement of PALS2-many called PALS2-many-based fitness and contig (PMFC). The PMFC strategy lists the movements that increase the overlap p while reducing or keeping NC. This improvement seeks the order that not only maximizes the overlap score among the fragments but also minimizes, or at least preserves, the number of contigs among them. This helps in reaching the near-optimal order of the fragments that correctly reconstructs the original DNA sequence.
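The difference between the candidate lists of PALS2-many and PMFC can be made concrete with a small filter over precomputed movements. The sketch below is illustrative only: the `Movement` record, the function names, and the sample values are hypothetical, and the PALS2-many rule is our reading of the two cases listed above rather than a transcription of its pseudo-code.

```python
from typing import Callable, List, NamedTuple


class Movement(NamedTuple):
    i: int         # first position of the reversed sub-permutation
    j: int         # last position of the reversed sub-permutation
    delta_p: int   # change in overlap score if the movement is applied
    delta_nc: int  # change in number of contigs if the movement is applied


def pals2_many_accepts(m: Movement) -> bool:
    # Assumed reading: fewer contigs is accepted even if the overlap drops;
    # an unchanged contig count is accepted only when the overlap grows.
    return m.delta_nc < 0 or (m.delta_nc == 0 and m.delta_p > 0)


def pmfc_accepts(m: Movement) -> bool:
    # PMFC: the overlap must increase while contigs shrink or stay equal.
    return m.delta_p > 0 and m.delta_nc <= 0


def candidate_list(movements: List[Movement],
                   accepts: Callable[[Movement], bool]) -> List[Movement]:
    """Keep only the movements allowed by the given acceptance rule."""
    return [m for m in movements if accepts(m)]


# Made-up movements covering the four interesting cases.
moves = [
    Movement(0, 3, delta_p=-12, delta_nc=-1),  # fewer contigs, lower overlap
    Movement(2, 5, delta_p=30, delta_nc=0),    # same contigs, higher overlap
    Movement(1, 4, delta_p=8, delta_nc=-1),    # both objectives improve
    Movement(4, 6, delta_p=50, delta_nc=1),    # higher overlap, more contigs
]
print(len(candidate_list(moves, pals2_many_accepts)))  # 3
print(len(candidate_list(moves, pmfc_accepts)))        # 2
```

Under these rules the only movement that PALS2-many keeps and PMFC rejects is the first one, which trades overlap for a lower contig count.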
The PALS approach tries to achieve the best contig and PALS2M*Fit tries to attain the optimal overlap, but neither attempts to achieve both objectives together. In this work, DWOA is incorporated with PALS2-many to exploit the search capability of DWOA and the improvement performed by PALS2-many to obtain better solutions. For a predetermined number of iterations (subIter), PALS2-many is applied to the new candidate whale $\vec{Z}(t+1)$ after the crossover operation, with the objective of minimizing the number of contigs in the hope of finding one contig. For the remaining iterations, the PMFC strategy is applied to maximize the overlap score while preserving or minimizing the number of contigs. The switch between PALS2-many and PMFC enables DWOA to reach the optimal order of fragments. Both PALS2-many and PMFC are performed only when a random value R in the range [0, 1] is smaller than the local search probability (LSP). Fig. 5 shows the flowchart of the proposed DWOA integrated with PALS2-many and PMFC as local search methods (DWOA-LS).
7: Update WOA parameters
8: if p < 0.5 then
9:   if |A| < 1 then
10:     Update $\vec{Z}(t+1)$ using swap best position operation;
11:   else
12:     Update $\vec{Z}(t+1)$ using random position;
13:   end if
14: else
15:   Update $\vec{Z}(t+1)$ using the modified ordered crossover operation;
16:   Generate R ∈ [0, 1];
17:   if R < LSP then
18:     if t < subIter then Apply PALS2-many to $\vec{Z}(t+1)$;
19:     else
20:       Apply PMFC to $\vec{Z}(t+1)$;
21:     end if
22:   end if
23: end if
24: end for
25: Check the feasibility of the whale $\vec{Z}(t+1)$;
26: Update $\vec{Z}(t+1)$ in the population, if better;
27: Update the best whale $\vec{Z}^{*}$ with $\vec{Z}(t+1)$ if better;
28: t ← t + 1;
29: end while
30: return the best whale $\vec{Z}^{*}$
Algorithm 1 illustrates the steps of solving DFAP using DWOA-LS improved with PALS2-many and PMFC. In the first step, the population is initialized randomly.
Then, the fitness value of each whale in the population is calculated, and the whale with the highest fitness value is designated as the best whale and stored in $\vec{Z}^{*}$. In the next step, from line 4 to line 25, the whales update their positions over a number of iterations using the swap-based best-position operation, the ordered crossover operation, and the random whale operation. In lines 15 to 23, the current whale is updated using the ordered crossover operation; PALS2-many and PMFC are then applied to the current whale with the local search probability (LSP) discussed in the parameter settings section. PALS2-many is applied for a specified number of iterations and PMFC is applied for the remaining iterations. The current whale is updated in the population if it is better, and the best whale is updated through the iterations. The algorithm stops after a predefined number of iterations and returns the best solution obtained.
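The branching in lines 8 to 22 of Algorithm 1 can be summarized as two small decision functions. This is a sketch of the control flow only, assuming the operators themselves are implemented elsewhere; the function names and the string labels are hypothetical and do not come from the authors' code.

```python
from typing import Optional


def choose_update(p: float, A: float) -> str:
    """Pick the position-update operator (lines 8 to 15 of Algorithm 1)."""
    if p < 0.5:
        # Shrinking encirclement when |A| < 1, otherwise exploration.
        return "swap_best_position" if abs(A) < 1 else "random_position"
    # Spiral phase, simulated by the ordered crossover.
    return "ordered_crossover"


def choose_local_search(update: str, R: float, LSP: float,
                        t: int, sub_iter: int) -> Optional[str]:
    """Pick the local search, if any (lines 16 to 22 of Algorithm 1)."""
    # Local search follows the crossover branch only, and only when R < LSP.
    if update != "ordered_crossover" or R >= LSP:
        return None
    # PALS2-many during the first sub_iter iterations, PMFC afterwards.
    return "PALS2-many" if t < sub_iter else "PMFC"


update = choose_update(p=0.8, A=0.3)
print(update)                                             # ordered_crossover
print(choose_local_search(update, 0.4, 0.7, 10, 100))     # PALS2-many
print(choose_local_search(update, 0.4, 0.7, 250, 100))    # PMFC
```

With the reported settings (LSP = 0.7 and subIter = 100), a whale updated by crossover would receive a local search in roughly 70% of cases, switching from PALS2-many to PMFC at iteration 100.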
To illustrate further, in DWOA-LS the DWOA strives to optimize the overlap score, which is the main objective for solving the DFAP in most of the literature, while the LS (PALS2-many and PMFC) is employed to optimize the number of contigs. In brief, during the first slice of iterations (those before subIter), the LS strives to minimize the number of contigs with an overlap score higher than that obtained by the DWOA. Within the rest of the iterations, the PMFC-based LS seeks to maximize the overlap score while preserving or reducing the number of contigs. Therefore, the DWOA works on maximizing the overlap score, while the LS also works on maximizing the overlap score under the constraint that solutions increasing the number of contigs are ignored, even if their overlap score is higher.
In Eq. (12), the time complexity of the proposed algorithm relies on the cost of the objective function and the number of fragments. The time complexity of the local search strategy is about $O(N^2)$ for one iteration, as described in the pseudo-code of PALS2-many. Since PALS2-many is applied with a probability in our proposed approach, the number of times this method is executed is not known in advance. Therefore, the worst case is considered, assuming that this method is applied in all iterations. Since PALS has the highest growth rate among the components, the time complexity of the proposed algorithm in the worst case is $O(t_{maxIter} N^2 M)$, which is quite significant. Time complexity is therefore one of the main limitations of our proposed approach and needs to be improved in future work.
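As a rough sketch of how the worst-case bound arises, suppose that updating and evaluating one whale costs $O(N)$ (an assumption made here for illustration, not a figure taken from the text) and that the local search costs $O(N^2)$ per application, as stated above. With $M$ whales, $N$ fragments, and $t_{maxIter}$ iterations, and the local search applied to every whale in every iteration, the quadratic term dominates:

```latex
T_{\text{worst}}
  = O\!\left(t_{maxIter} \cdot M \cdot \left(N + N^{2}\right)\right)
  = O\!\left(t_{maxIter}\, N^{2}\, M\right)
```

In practice the local search runs only in the crossover branch and only when R < LSP, so the expected cost is lower than this bound.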

VI. EXPERIMENTS AND DISCUSSION
Several experiments have been conducted to assess the efficacy of our proposed DWOA-LS algorithm. Thirty benchmark instances are chosen for testing the effectiveness of DWOA-LS. We perform all the experiments on a machine running 64-bit Windows 7 Ultimate with an Intel Core i3-2330M CPU @ 2.20 GHz and 1 GB of RAM. DWOA-LS is implemented in the Java programming language. Statistical analyses are also conducted to validate the results. This experimental section is organized as follows. Subsection VI-A describes the DFAP benchmark instances used in the experiments. Subsection VI-B describes the parameter settings of DWOA-LS. Subsection VI-C evaluates the performance of the proposed DWOA-LS. Subsection VI-D compares DWOA-LS with the three best recent assemblers (to our knowledge) proposed for solving DFAP. Subsection VI-E compares the proposed DWOA-LS with some other assemblers. Finally, Subsection VI-F summarizes the conclusions of our experiments.

A. DESCRIPTION OF THE BENCHMARK INSTANCES
We examine the performance of DWOA-LS on three benchmark collections taken from [67]: GenFrag consisting of ten instances; DNAgen containing six instances, and f-series containing fourteen instances. Table 1 presents a description of the thirty instances in terms of coverage, average fragment length (AFL), number of fragments (NF), and the original sequence length (OSL). Here, the coverage is the summation of the bases found in all fragments divided by the total length of the original DNA sequence [35], which can be calculated by using Equation 14.
$$\text{Coverage} = \frac{\sum_{j=1}^{NF} \text{length of fragment}_j}{\text{length of target fragment}} \tag{14}$$
The coverage value has to be greater than 1 to ensure that there is an overlap between the fragments to be used in the reassembling process. AFL ranges from 182 to 1003 bases; NF ranges from 25 to 1577 fragments. For simplicity, we provide an abbreviation for each instance, which is used in the remainder of the paper.
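For concreteness, the coverage of Eq. (14) can be computed as below. The function name and the fragment lengths are made up for the example and do not correspond to any of the benchmark instances.

```python
def coverage(fragment_lengths, target_length):
    """Total number of bases in all fragments divided by the target length."""
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    return sum(fragment_lengths) / target_length


# Four hypothetical fragments drawn from a 1000-base sequence.
print(coverage([400, 350, 450, 300], 1000))  # 1.5
```

A value of 1.5 means that each base of the original sequence is covered one and a half times on average, which leaves room for the overlaps that the assembler relies on.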

B. PARAMETER SETTINGS
Parameter settings may affect the performance of the algorithm, so several experiments were performed to find the best values for the parameters. Six instances, M15421(5), M15421(6), M15421(7), J02459(7), BX4, and BX7, are used for tuning the population size (M), subIter, and LSP, with the results presented in Tables 2, 3, and 4, respectively. For the population size, the values 5, 10, 20, and 30 are considered on these benchmark instances. Table 2 shows that a population size of 30 is the best choice because it enables the proposed algorithm to reach the optimal value in fewer iterations for the six instances. A population size of 5 is the worst.
For subIter, several experiments are conducted with subIter = 20, 50, 70, 100, 120, and 500, with the results presented in Table 3. At the outset, a value of 20 was selected randomly. With a cutoff value between the compared fragments equal to 50, the number of contigs was 8 for BX4 and 2 for BX7, while the fitness values were 227682 and 444839 for those two instances, respectively. For a value of 50, the number of contigs did not change, but the fitness values for BX4 and BX7 became 227878 and 445039, respectively. Because the changes in the overlap scores were quite significant, another value of 70 was selected and the changes in the overlap score were observed. Subsequently, three other values of 100, 120, and 500 were selected, and it was clear that the change in the overlap score (fitness value) is quite significant when subIter is equal to 100 and becomes nearly constant when subIter is equal to 120 and 500. Consequently, the best value for subIter is 100. Note that, in the remaining experiments, the cutoff value is reset to 10. Table 5 presents the parameter values of DWOA-LS and of the other algorithms used in the conducted experiments.
Regarding LSP, a value of 0.2 was initially selected at random, and the number of iterations needed under this value to reach the optimal solution was observed for the six instances mentioned previously. It was noted that the proposed algorithm needs a high number of iterations to reach the optimal value for those instances. Therefore, a value of 0.3 was used to determine the influence of this parameter on the performance of the algorithm; with this value, the optimal solution was reached in fewer iterations than with 0.2 for those instances. For a value of 0.5, the algorithm reached the optimal values for the same instances in fewer iterations than with the two previous values. For a value of 0.7, the proposed algorithm reached the optimal values for those instances in fewer iterations than with all the previous values. For a value of 1.0, the proposed approach reached the optimal values in a number of iterations similar to that for 0.7. Therefore, a value of 0.7 was used in our experiments instead of 1.0 to limit the running time, since a higher LSP triggers the local search more often.
Regarding the compared algorithms, five continuous WOA and DE variants and the sine-cosine algorithm, mapped using LPV, were compared with DWOA under the same number of iterations and population size given in Table 5:
• Improved Lévy flight whale optimization (LWOA) [68].
• Chaotic-based whale optimization algorithm (CWOA) [69].
• Sine-cosine optimization algorithm (SCA) [71].
Because those algorithms were proposed for continuous optimization problems rather than for the discrete nature of DFAP, their parameters were tuned to determine the best values for solving this problem. The standard WOA does not need any parameter tuning, with the exception of the parameter b that controls the spiral shape. To adjust this parameter for the different variants of WOA, several values (0, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5, 7, 8, and 10) are selected and tested within 30 independent runs to determine the best value. Based on our experimental results, which are depicted in Fig. 6(a), the best value for b is 5.
The performance of differential evolution depends on two factors: the scaling factor (F), which is adapted here as described in [70], and the crossover rate (CR). Separate experiments were therefore performed to find the best value of CR for three different variants of DE: DE based on the ''DE/rand/1'' scheme, DE based on the ''DE/current to best/1'' scheme, and the hybridized WOA and DE (DEWOA). For the DE variants based on ''DE/current to best/1'' and ''DE/rand/1'', the outcomes depicted in Fig. 6(b) show that 0.02 is the best value for CR. Similarly, as depicted in Fig. 6(c), the best value of CR for DEWOA is 0.9. Finally, SCA is a self-adaptive algorithm with one parameter, which controls the distance and was set as specified in the cited paper.

C. PERFORMANCE OF DIFFERENT WOA AND DE VARIANTS
The purpose of this section is to assess the performance of the proposed algorithm, DWOA-LS, through three main experiments:
• The first experiment compares the WOA integrated with the LPV technique (WOA-LPV) and DWOA without using the local search method.
• The second observes the performance of different improved WOA and DE variants.
• The third experiment studies the effect of adding local search to the proposed algorithm by comparing DWOA and the proposed algorithm DWOA-LS.

1) THE COMPARISON BETWEEN DWOA AND WOA-LPV
This first experiment investigates the performance of two algorithms. The first algorithm is the standard WOA that uses the LPV technique (WOA-LPV). The second algorithm is DWOA without the local search method. This comparison is made to show that the traditional mapping of continuous values used to adapt WOA (WOA-LPV) for solving DFAP is not effective. Table 6 presents the results obtained by the two algorithms based on the fitness function that uses the overlap score. The column opt shows the optimal overlap score for each benchmark instance obtained by the Lin-Kernighan heuristic (LKH) algorithm [72]. The best, average, and worst overlap score values over 30 runs of each algorithm are recorded in the table. The results show that DWOA attains much better results than WOA-LPV: it achieves higher overlap values for all the DFAP instances, and it converges to the optimal solution faster than WOA-LPV. This comparison demonstrates why WOA-LPV was not used and why the investigation used a discrete version of WOA (DWOA) that borrows some operators from evolutionary algorithms. Also, Fig. 7 (a) depicts a comparison between DWOA and WOA-LPV in terms of the percentage deviation (PD). PD is the difference between the optimal fitness value and the average fitness value found by an algorithm, divided by the optimal fitness value and expressed as a percentage:
$$PD = \frac{opt - avg\_fitness}{opt} \times 100$$
Fig. 7 (a) shows that DWOA is closer to the optimal solution than the standard version mapped using LPV. In LPV, there is no one-to-one mapping between the continuous solution and the permutation, because one permutation can be encoded by an infinite number of continuous numerical vectors. Because this disadvantage is removed in DWOA, it performs better than WOA-LPV.
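The percentage deviation defined in words above can be computed as in the following sketch. The sign convention (optimal minus average) is an assumption chosen because the overlap score is maximized, and the numbers are illustrative rather than taken from Table 6.

```python
def percentage_deviation(avg_fitness, opt):
    """Gap between the optimal and the average overlap score, in percent."""
    if opt == 0:
        raise ValueError("opt must be non-zero")
    return 100.0 * (opt - avg_fitness) / opt


# An average overlap of 950 against an optimum of 1000 is 5% off.
print(percentage_deviation(950, 1000))  # 5.0
```

A PD of 0 means the algorithm reaches the optimal overlap score on average, so lower values are better.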
However, fitness values based on the overlap score alone cannot be used to assess the quality of an algorithm, because the algorithm may achieve high fitness values while the obtained order of fragments contains a large number of contigs. Therefore, we also compare the two algorithms based on the number of contigs attained by each, as recorded in Table 7. It can be observed that DWOA outperforms WOA-LPV in the number of contigs for all instances. Based on these results, DWOA is better than WOA-LPV for solving the DFAP in terms of both the number of contigs and the overlap score. Fig. 8 (a) shows a comparison between DWOA and WOA-LPV based on the average number of contigs over all instances. In the best case, DWOA obtains on average 54.1 contigs over all the instances, whereas WOA-LPV obtains 172.6 contigs. As can be seen from Fig. 8 (a), DWOA achieves a lower number of contigs than WOA-LPV in the best, average, and worst cases.

2) COMPARISON OF DIFFERENT WOA AND DE VARIANTS
After the comparison between DWOA and WOA-LPV, this section compares DWOA with several robust WOA and DE variants, in addition to the SCA, to demonstrate its efficacy over these recent improved variants on the instances from X4 to BX7. First, each variant is executed for 30 independent runs and the average outcomes are recorded in Table 8. These outcomes show that DWOA is superior to the other algorithms, which can be attributed to replacing the mapping phase with genetic operators; this removes the infinite number of updated solutions that could represent the same permutation. Fig. 9 depicts the average of the outcomes recorded in Table 8 for each algorithm to show the superiority of DWOA more clearly. This figure shows that DWOA outperforms the other algorithms with a value of 55923, while Lshade-LPV performs worst with a value of 9719. Additionally, Fig. 10 shows the distribution of the outcomes based on five metrics: minimum, first quartile (Q1), median, third quartile (Q3), and maximum, for the instances X4, X5, X6, X7, and M5. Again, this figure shows the superiority of the proposed DWOA on the five observed instances over the five metrics.
Then, the CPU time required by each algorithm is computed and reported in Fig. 11. Finally, the Wilcoxon rank sum test [74] at a significance level of 5% is used to show the significance of DWOA over the other techniques. This test is based on two hypotheses: the null and the alternative. Under the null hypothesis, there is no difference between the outcomes of a pair of algorithms; it is retained when the p-value is greater than the significance level (0.05), in which case h = 0. Under the alternative hypothesis, there is a difference between the two outcomes; it is accepted when the p-value is less than the significance level (0.05), in which case h = 1. Table 9 compares DWOA-LS with the other algorithms on the instances X4-BX7, showing that the alternative hypothesis is accepted against all the algorithms over all the instances, which confirms the significance of DWOA over those algorithms.
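The decision rule described above (p-value against 0.05, reported as h) can be illustrated with a small pure-Python rank sum test. This is a sketch that uses the normal approximation without tie or continuity corrections; it is not necessarily the implementation behind Table 9, and the two samples are synthetic.

```python
import math


def rank_sum_test(x, y, alpha=0.05):
    """Two-sided Wilcoxon rank sum test using the normal approximation.

    Returns (p_value, h) where h = 1 if the null hypothesis is rejected.
    """
    # Pool both samples, remembering which sample each value came from.
    pooled = sorted((v, k) for k, sample in enumerate((x, y)) for v in sample)
    rank_sum_x = 0.0
    pos = 0
    while pos < len(pooled):
        end = pos
        # Extend over a run of tied values; they share the average rank.
        while end + 1 < len(pooled) and pooled[end + 1][0] == pooled[pos][0]:
            end += 1
        avg_rank = (pos + end) / 2 + 1  # ranks are 1-based
        for k in range(pos, end + 1):
            if pooled[k][1] == 0:
                rank_sum_x += avg_rank
        pos = end + 1
    n1, n2 = len(x), len(y)
    mean = n1 * (n1 + n2 + 1) / 2
    std = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (rank_sum_x - mean) / std
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return p_value, (1 if p_value < alpha else 0)


# Synthetic overlap scores from 30 runs of two hypothetical algorithms.
runs_a = list(range(100, 130))
runs_b = list(range(0, 30))
p, h = rank_sum_test(runs_a, runs_b)
print(h)  # 1
```

With 30 runs per algorithm, as in the experiments, the normal approximation is generally considered adequate; for very small samples an exact test would be preferable.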

3) COMPARISON OF DWOA AND DWOA-LS
The third experiment studies the effect of integrating the local search method with the proposed algorithm. We use the overlap score as the fitness function to compare DWOA without local search (DWOA) and DWOA combined with local search (DWOA-LS). Table 10 presents the results of the two algorithms for the best, average, and worst cases. Based on these results, the DWOA-LS assembler finds the optimal solution for 20 out of 30 instances and reaches one contig for 22 out of 30 DFAP instances. From this analysis, the proposed DWOA-LS outperforms DWOA for all the DFAP instances. DWOA-LS obtains substantially better results for the medium and large DFAP cases, where DWOA performs poorly. This superiority is a result of the PMFC, which enables DWOA to escape the local minima into which it may fall during the optimization process when the diversity among the members of the population decreases and, subsequently, the possibility of reaching other permutations that may improve the quality of the solutions is substantially reduced. In addition, the LS accelerates convergence toward the best-so-far solution: because it is applied after the OC, it refines updated solutions generated from the best-so-far position and the current one, which may increase convergence toward the optimal solution. Additionally, Fig. 7 (b) shows the percentage deviation of the two algorithms, from which it can be observed that DWOA-LS has the lowest percentage deviation for all DFAP instances.
In addition to the comparison of the overlap between DWOA-LS and DWOA, Table 11 provides a comparison based on the number of contigs. The proposed DWOA-LS outperforms DWOA in all instances. From Fig. 8 (b), we can see significant differences between the two algorithms in the average number of contigs over all DFAP instances. The average number of contigs in the best case is 1.433 for DWOA-LS and 54.17 for DWOA. Furthermore, the average number of contigs in the worst case is 1.533 for DWOA-LS and 68.5 for DWOA. It is clear that using the local search significantly affects the obtained number of contigs and the overlap score. Broadly speaking, in the first subIter iterations, the proposed approach uses the PALS2-many LS to minimize the number of contigs even if the overlap score is reduced. After these iterations, another LS, PMFC, replaces PALS2-many with the objective of maximizing the fitness value (overlap score) while preserving or minimizing the current number of contigs. The LS therefore plays a double role within the optimization process: first, it minimizes the number of contigs regardless of the overlap score within the first subIter iterations; second, it improves the overlap score while preserving or minimizing the number of contigs.
From the experiments conducted in this section, the following conclusions can be drawn: • Simulating the behaviors of the WOA by borrowing some genetic operators can significantly improve its performance as a result of utilizing effectively the whole optimization process and the individuals within the population by erasing the problems of the traditional mapping methods that may generate the same permutation by different real-value positions.
• The experiments show the superiority of DWOA over the recent robust algorithms mapped using the LPV.
• Then, to accelerate convergence, two-phase-based LS is integrated with the DWOA: in the first phase, an LS known as PALS2-many is applied within the first subIter of iterations focused only on minimizing the number of contigs, while the second phase is applied after subIter iterations to preserve or minimize the current number of contigs while maximizing the overlap score.
• The experiments show that PALS2-many and PMFC enhance the performance of DWOA.
• Applying PALS2-many in the first specified number of iterations to concentrate on achieving a one-contig solution, and then applying PMFC in the remaining iterations to focus on the maximum overlap while preserving or decreasing the number of contigs, helps produce more promising solutions that can attain the optimal overlap with a one-contig solution in most instances.

D. COMPARISON BETWEEN THE PROPOSED ASSEMBLER AND THREE RECENT ASSEMBLERS
This section investigates the superiority of DWOA-LS over other existing assemblers. The experiments evaluate the performance of our proposed assembler against the other assemblers based on three performance measures: 1) The fitness function using the overlap score.
2) The number of contigs.
3) An evaluation function called F + C.
The third experiment compares DWOA-LS with three other assemblers based on their outcomes in the published papers: CSA-P2M*Fit [35], P2M*Fit [35], and GA-P2M*Fit [35]. The best overlap score (fitness) and the minimum number of contigs (NC) are used as the two main performance measures in this experiment. Table 12 presents the results of the four algorithms. Although the two assemblers DWOA-LS and CSA-P2M*Fit both find the optimal fitness values for 20 out of 30 instances, DWOA-LS outperforms CSA-P2M*Fit. DWOA-LS obtains a one-contig solution that has the optimal overlap score in 17 DFAP datasets, whereas CSA-P2M*Fit and GA-P2M*Fit achieve this solution in only four DFAP datasets. In contrast to DWOA-LS, the number of contigs produced by the other assemblers grows dramatically as the DFAP instance becomes larger. The maximum number of contigs is seven for DWOA-LS, while it is 788 for CSA-P2M*Fit and GA-P2M*Fit and 810 for P2M*Fit.
Graphically, Fig. 12 compares the different assemblers based on the PD values to determine how close each algorithm is to the optimal overlap score. As can be seen from Fig. 12, the overlap score of CSA-P2M*Fit is slightly higher in some instances. However, this advantage in the overlap score does not mean that CSA-P2M*Fit is better, because in some cases a solution that has a better fitness value may contain a larger number of contigs. Therefore, the difference in the overlap score between our algorithm and CSA-P2M*Fit does not provide a useful evaluation of performance for solving DFAP. In this paper, another evaluation function is proposed to evaluate the performance of the assemblers, as illustrated below.
As mentioned earlier, the fitness based on the overlap score alone cannot be used to judge the quality of the assemblers because, in some situations, a solution that has a better fitness using the overlap score can have a large number of contigs. Therefore, the judgment has to be based on two factors: 1) The primary factor is minimizing the number of contigs with the target of reaching one contig. 2) The second factor is maximizing the overlap score. As a result, a new fitness function, called F+C, is proposed to evaluate the performance of the assemblers based on these two factors. In its formula, avg_fitness is the average overlap score obtained by a given algorithm, opt is the best-known overlap score, and NF and NC represent the number of fragments and contigs, respectively. Based on the results presented in Table 13, we can see the superiority of DWOA-LS in the F+C values for all instances. Our algorithm contributes significantly to reducing the number of contigs and to increasing the overlap score among the fragments compared with the other assemblers. DWOA-LS reaches the ultimate objective, a one-contig solution with the optimal overlap score, in 13 datasets. For the remaining datasets, DWOA-LS comes very close to obtaining the value 2. CSA-P2M*Fit and GA-P2M*Fit obtain the optimal solution in only three datasets, as shown in Table 13.
To measure the CPU time of DWOA-LS, CSA-P2M*Fit, and GA-P2M*Fit, the latter two algorithms were implemented so that the CPU time consumed by each could be compared fairly. The investigation of CPU time was divided into two experiments. The first computed the CPU time for the first benchmark until the optimal value was reached for each instance. After running each assembler (DWOA, GA-P2M*Fit, and CSA-P2M*Fit) for 30 independent runs, the average computational time needed for each of the datasets from X4 to BX7 was calculated. As can be seen in Fig. 13a, the proposed assembler is faster than CSA-P2M*Fit and GA-P2M*Fit in reaching the optimal solution for those datasets.
The second experiment computed the CPU time for the other datasets, which include some instances whose optimal solutions the algorithms could not reach; it therefore measures the CPU time consumed by each algorithm until the end of the optimization process. After running each algorithm for 30 independent runs on those datasets, the average CPU time is presented in Fig. 13b, which shows that CSA-P2M*Fit is faster than the proposed algorithm and GA-P2M*Fit. However, the CPU time of the proposed algorithm is very close to that of CSA-P2M*Fit, so the two are competitive in terms of CPU time.
To complete this experiment, a statistical test known as the statistical ranking color scheme (SRCS) [75] is used to compare the algorithms DWOA-LS, CSA-P2M*Fit, GA-P2M*Fit, and P2M*Fit. SRCS sets the ranking value of every algorithm to an initial value of 0. Then, the Kruskal-Wallis test is employed to detect whether there are any differences between the algorithms. If there is no difference, the algorithms keep their initial value of 0. If there is a significant difference among the algorithms, the Mann-Whitney-Wilcoxon + Holm test is applied to each possible pair of algorithms; the ranking value of the algorithm with the higher performance is incremented by 1, and that of the other is decreased by 1. If there is no difference within a pair, the ranking values are preserved. Hence, the top-ranked algorithm has the highest performance. Here, this test is applied to the proposed algorithm and the three other assemblers to illustrate the superiority of our proposed algorithm. Table 14 shows the ranking values of the four assemblers according to the average fitness value (the average overlap) and the average number of contigs obtained by each algorithm for each instance. Based on these ranking values, our proposed algorithm outperforms all other algorithms in the number of contigs in all instances. A value of 0 means that all the algorithms obtain the same results. A value of 1, 2, or 3 means that the algorithm outperforms one, two, or three algorithms, respectively. A value of −1, −2, or −3 means that one, two, or three algorithms, respectively, outperform this algorithm. DWOA-LS achieves a value of 3 in most of the instances for the number of contigs. Also, for the fitness value, DWOA-LS is equivalent to the other algorithms or outperforms one or more of the three algorithms.
CSA-P2M*Fit outperforms our proposed algorithm in 8 instances based on the average fitness value and so is slightly higher in fitness value. There is, however, a clear difference in the number of contigs between the two algorithms, as the proposed algorithm produces an output that is closer to the optimal order of the fragments.
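The SRCS ranking procedure described above can be sketched as follows. The statistical tests themselves are abstracted away: `any_difference` stands for the outcome of the Kruskal-Wallis test and `significantly_different` for the pairwise Mann-Whitney-Wilcoxon + Holm decision, both hypothetical inputs supplied by the caller. Deciding the winner of a pair by comparing means is also an assumption, and the algorithm names and values are placeholders rather than results from Table 14.

```python
from itertools import combinations


def srcs_ranks(means, any_difference, significantly_different,
               higher_is_better=True):
    """Return the SRCS ranking value of each algorithm for one instance."""
    ranks = {name: 0 for name in means}  # every algorithm starts at 0
    if not any_difference:
        return ranks  # omnibus test found nothing: all ranks stay 0
    for a, b in combinations(means, 2):
        if not significantly_different(a, b):
            continue  # no significant difference: ranks are preserved
        if higher_is_better:
            a_wins = means[a] > means[b]
        else:
            a_wins = means[a] < means[b]
        winner, loser = (a, b) if a_wins else (b, a)
        ranks[winner] += 1
        ranks[loser] -= 1
    return ranks


# Placeholder average contig counts (lower is better) for four algorithms.
contigs = {"A": 1.4, "B": 20.0, "C": 20.5, "D": 25.0}
not_different = {frozenset(("B", "C"))}  # B and C are statistically tied
ranks = srcs_ranks(
    contigs,
    any_difference=True,
    significantly_different=lambda a, b: frozenset((a, b)) not in not_different,
    higher_is_better=False,
)
print(ranks)  # {'A': 3, 'B': 0, 'C': 0, 'D': -3}
```

In this example, A beats all three competitors and receives the maximum value of 3, matching the interpretation of the ranking values given above.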

E. COMPARISON OF THE PROPOSED ASSEMBLER AND OTHER ASSEMBLERS
The fourth experiment compared the proposed assembler with other selected state-of-the-art assemblers: 1) Transposition restarting and recentering genetic algorithm with island model (Trans. RRGA+IM) [50]; 2) Problem aware local search (PALS) [53]; 3) Parallel hybrid particle swarm optimization and differential evolution (PPSO+DE) [56]; 4) Firefly algorithm (FF) [59]; 5) Genetic algorithm (GA) [64]; 6) Queen-bee evaluation based on genetic algorithm (QEGA) [76]; and 7) Simulated annealing (SA) [76]. The comparison is based on the best overlap scores obtained on the first two DFAP collections (GenFrag and DNAgen). Table 16 records the results of the nine algorithms, from which it can be seen that DWOA-LS outperforms all other algorithms, obtaining the optimal overlap score in eleven DFAP instances. SA is second best, reaching the optimal value on three datasets. The last columns of Table 16 present a comparison of the total average overlap among the algorithms over the first two DFAP collections (GenFrag and DNAgen). The average of the second column is the average of all the optimal values recorded for these two collections, which is 128328. DWOA-LS performs best with a value of 128318, which is very close to 128328. Trans. RRGA+IM performs second best with a value of 127875, and GA performs worst with a value of 119176. This experiment shows that the proposed algorithm is robust and successful in tackling DFAP.
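
To make "very close" concrete, the averages quoted above can be turned into relative gaps from the optimal average. The short sketch below uses only the three values stated in the text; the helper name `relative_gap` is ours. It gives gaps of roughly 0.008% for DWOA-LS, 0.35% for Trans. RRGA+IM, and 7.1% for GA.

```python
# Average overlap values quoted in the text for GenFrag and DNAgen.
optimal_avg = 128328
reported_avg = {"DWOA-LS": 128318, "Trans. RRGA+IM": 127875, "GA": 119176}


def relative_gap(value, optimum):
    """Shortfall from the optimum, as a percentage of the optimum."""
    return 100.0 * (optimum - value) / optimum


gaps = {name: relative_gap(v, optimal_avg) for name, v in reported_avg.items()}
for name, gap in gaps.items():
    print(f"{name}: {gap:.3f}% below the optimal average")
```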

F. SUMMARY OF OUR EXPERIMENTS
From the previous experiments, the proposed algorithm DWOA-LS has been shown to be an effective assembler for tackling DFAP compared to other existing assemblers. The proposed DWOA-LS is capable of obtaining the minimum number of contigs while increasing the overlap score. DWOA-LS is also shown to be more robust for solving DFAP, as it performs well on medium- and large-scale instances. A new evaluation function has been proposed to measure the performance of the different assemblers based on achieving a one-contig solution and attaining a high overlap score. This function is useful in situations where an algorithm attains a higher overlap score but a large number of contigs, so that the best algorithm is the one that balances the two objectives.

VII. CONCLUSION AND FUTURE WORK
In this paper, a WOA was adapted to solve the DNA fragment assembly problem (DFAP). To fit the WOA to discrete problems (DWOA), the swap-based best-position mutation operator was used to simulate the action of encircling the prey, which moves the whale around the prey within a shrinking circle. The ordered crossover operator was employed to simulate the spiral shape: DWOA selects a random block of positions from the prey, and this block is copied to the same locations in the current whale. Finally, to search for the prey, the whale positions were generated randomly from the fragment numbers instead of using a random whale, which prevents a loss of variation in the population. A local search approach called PALS2-many was also combined with the proposed DWOA, in a version abbreviated as DWOA-LS, to obtain a better order of fragments. The local search helps DWOA to minimize the number of contigs, in addition to maximizing the overlap score among the fragments. We also proposed a new evaluation function, F+C, to assess the quality of different assemblers. DWOA-LS was validated on 30 benchmark instances and compared with a number of recent, robust state-of-the-art algorithms for the DFAP in two experiments. In the first experiment, DWOA was compared with five WOA and DE variants, in addition to SCA, to demonstrate the advantage of DWOA in converting the continuous behaviors of the whale to discrete ones. The Wilcoxon rank sum test was used to show the significance of the improvement of DWOA over those algorithms. The second experiment was performed to show the superior performance of DWOA-LS over a number of recent, robust state-of-the-art assemblers suggested for the DFAP.
The experimental results and statistical analyses of this experiment show that DWOA-LS significantly outperforms the other assemblers in terms of the number of contigs, whilst being competitive with CSA-P2M*Fit and superior to P2M*Fit and GA-P2M*Fit for the overlap score. Overall, DWOA-LS is shown to be the best of the compared approaches. Despite its superiority, the proposed algorithm did not achieve better overlap scores than CSA-P2M*Fit on some instances, which, together with its time complexity, is a limitation of the proposed approach.
Future work aims to apply DWOA to other existing problems, such as the travelling salesman problem, task scheduling, and the knapsack problem. Additionally, a new evaluation function can be considered for tackling DFAP to judge and guide the solutions inside the search space. Parallelization of the proposed algorithm may also yield better results by exploiting the processing power of modern computers.
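
The three discrete operators summarized at the start of this section can be illustrated with a minimal sketch on permutations of fragment indices. It is written under stated assumptions rather than taken from the authors' implementation: the function names, the `cut` and `rng` parameters, the rule for filling the positions outside the copied block (the whale's remaining fragments, in their current order), and the reading of the swap-based operator as a single swap on a copy of the best whale are all our own choices.

```python
import random


def ordered_crossover(whale, prey, cut=None, rng=random):
    """Copy a block of the prey into the same positions of the whale.

    Positions outside the block are filled with the whale's remaining
    fragments in their current order, so the result stays a permutation.
    """
    n = len(whale)
    # Block prey[i:j]; chosen at random unless an explicit cut is given.
    i, j = cut if cut is not None else sorted(rng.sample(range(n + 1), 2))
    in_block = set(prey[i:j])
    rest = iter(f for f in whale if f not in in_block)
    return [prey[k] if i <= k < j else next(rest) for k in range(n)]


def swap_best_position(best, rng=random):
    """Assumed form of the swap-based operator: swap two positions of the best whale."""
    a, b = rng.sample(range(len(best)), 2)
    child = list(best)
    child[a], child[b] = child[b], child[a]
    return child


def random_whale(n, rng=random):
    """Exploration step: a fresh random order of the n fragment numbers."""
    return rng.sample(range(n), n)


whale = [0, 1, 2, 3, 4, 5]
prey = [5, 3, 1, 0, 2, 4]
child = ordered_crossover(whale, prey, cut=(1, 3))
print(child)  # positions 1 and 2 now hold the prey's fragments 3 and 1
```

Because every operator returns a permutation of the same fragment numbers, no repair step is needed before the overlap score or the number of contigs is evaluated.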

CONFLICT OF INTEREST
The authors declare that there is no conflict of interest regarding this research.

FUNDING
This research has no funding source.

ETHICAL APPROVAL
This article does not contain any studies with human participants or animals performed by any of the authors.
MOHAMED ABDEL-BASSET (Senior Member, IEEE) received the B.Sc., M.Sc., and Ph.D. degrees in operations research from the Faculty of Computers and Informatics, Zagazig University, Egypt. He is currently an Associate Professor with the Faculty of Computers and Informatics, Zagazig University. He has published more than 200 papers in international journals and conference proceedings. His current research interests are optimization, operations research, data mining, computational intelligence, applied statistics, decision support systems, robust optimization, engineering optimization, multi-objective optimization, swarm intelligence, evolutionary algorithms, and artificial neural networks. He is working on the application of multiobjective and robust metaheuristic optimization techniques. He is also an editor and reviewer for different international journals and conferences.
REDA MOHAMED received the B.Sc. degree from the Department of Computer Science, Faculty of Computers and Informatics, Zagazig University, Egypt. His research interests include robust optimization, multiobjective optimization, swarm intelligence, evolutionary algorithms, and artificial neural networks. He is working on the application of multiobjective and robust metaheuristic optimization techniques in computational intelligence.
KARAM M. SALLAM received the Ph.D. degree in computer science from the University of New South Wales at Canberra, Australian Defence Force Academy, Canberra, Australia, in 2018. He is currently a Lecturer at Zagazig University, Zagazig, Egypt. His current research interests include evolutionary algorithms and optimization, constraint-handling techniques for evolutionary algorithms, operations research, machine learning, deep learning, cybersecurity, and the IoT. He was the winner of the IEEE-CEC2020 Competition. He serves as an organizing committee member for different conferences in the evolutionary computation field and as a reviewer for several international journals.
RIPON K. CHAKRABORTTY (Member, IEEE) received the B.Sc. and M.Sc. degrees in industrial and production engineering from the Bangladesh University of Engineering and Technology, in 2009 and 2013, respectively, and the Ph.D. degree in computer science from the Bangladesh University of Engineering and Technology, in 2017. He is currently a Lecturer in system engineering and project management with the School of Engineering and Information Technology, The University of New South Wales (UNSW), Canberra, Australia. He has written two book chapters and over 50 technical journal and conference papers. His research interests include a wide range of topics in operations research, optimization problems, project management, supply chain management, and information systems management.
MICHAEL J. RYAN (Senior Member, IEEE) is currently the Director of the Capability Systems Centre, The University of New South Wales, Canberra. He lectures and regularly consults in a range of subjects, including communications systems, systems engineering, requirements engineering, and project management. He is the author/coauthor of twelve books, three book chapters, and over 250 technical articles and reports. He is a Fellow of Engineers Australia, a Fellow of the International Council on Systems Engineering, and a Fellow of the Institute of Managers and Leaders. He is the Co-Chair of the Requirements Working Group in the International Council on Systems Engineering (INCOSE). VOLUME 8, 2020