Local Alignment of DNA Sequence Based on Deep Reinforcement Learning

Goal: Over the decades, sequence alignment algorithms have improved significantly in aspects such as complexity and accuracy. However, human-designed algorithms are inherently limited by their handcrafted design. This paper introduces a novel local alignment method that obtains optimal sequence alignments using reinforcement learning. Methods: DQNalign is an existing algorithm that learns and performs sequence alignment through deep reinforcement learning. This paper proposes a DQN x-drop algorithm that performs local alignment without human intervention by combining the x-drop algorithm with DQNalign. The proposed algorithm performs local alignment by repeatedly observing subsequences and selecting the next alignment direction until the x-drop criterion terminates the DQNalign procedure. Compared with conventional local alignment algorithms, the proposed algorithm has the advantage of linear computational complexity. Results: For a fair comparison between the proposed DQN x-drop algorithm and the conventional greedy x-drop algorithm, we compare their alignment performance (coverage and identity) and complexity. First, we demonstrate the advantage of the proposed algorithm by comparing the computational complexity of the two algorithms through complexity analysis. We then test alignment performance on real HEV and E. coli sequence datasets. The proposed method shows identity and coverage comparable to those of the conventional alignment method while having linear complexity in the $X$ parameter.
Conclusions: This study confirms the feasibility of a new local alignment algorithm that minimizes computational complexity without human intervention.


I. INTRODUCTION
Sequence alignment is one of the most popular methods for analyzing relationships among biological sequences such as DNA, RNA, and protein by finding similarities between them. In the early stage of sequencing technology, conventional dynamic-programming-based alignment methods were sufficient to analyze sequences with few nucleotides [1], [2]. However, with the advancement of NGS technology, biological sequence data now range from millions to billions of base pairs [3]. This growth in sequence length has naturally driven the development of new sequence alignment methods.
There are many types of sequence alignment methods depending on the application, such as semi-global alignment for mapping NGS reads, as well as global, local, and multiple sequence alignment for comparing sequences. For each of these methodologies, various heuristic methods have been studied to improve alignment performance [8]-[16]. Although sequence alignment has been studied in many directions, it remains difficult to develop a method that combines linear complexity with high alignment performance.
Recently, there have been several efforts to address the problems of conventional alignment methods. The greedy x-drop algorithm is one of the best-known local alignment methods. It terminates the extension process when the current alignment score falls more than X below the best alignment score observed so far. Because its complexity grows linearly with the alignment length, the greedy x-drop algorithm appears to solve the complexity problem. However, its complexity is also proportional to the square of the X parameter: to align sequences with a large X, the greedy x-drop algorithm needs a massive number of alignment steps, on the order of $O(X^2)$. Therefore, in this paper, we use the DQNalign algorithm, which aligns a sequence pair with linear complexity and shows satisfactory performance using deep reinforcement learning [18].
Many attempts have recently been made to apply deep learning to sequence comparison. Early methods analyzed the results of sequence alignment to classify sequences or predict protein structure, or analyzed sequence characteristics with alignment-free features [30]-[34]. These methods use deep-learning-based regression or classification to predict the similarity values produced by sequence alignment or to predict characteristics of organisms. However, they do not align sequences directly; instead, their networks are trained on the outputs of conventional sequence alignment methods. Other researchers have tried to incorporate deep learning into the alignment process itself, for instance by selecting the alignment order in progressive multiple sequence alignment [35] or by using entire sequences as the input of a deep neural network [36]. However, these deep-learning-based approaches lacked a protocol for aligning sequences of arbitrary length. That is, because of their fixed input and network sizes, they could only align very short sequences of tens to hundreds of base pairs.
In a previous paper, the authors proposed DQNalign, a method that repeatedly selects the next alignment direction while sliding a window, using deep reinforcement learning. Deep reinforcement learning is a class of deep learning methods in which an agent based on a deep neural network selects the actions that maximize the total reward in a given environment [19]-[21]. It has been adopted in fields ranging from video games to healthcare. In the previous paper, the authors defined the state, action, and reward needed to cast sequence alignment as a reinforcement learning problem. DQNalign thereby opened the possibility of a fully linear-complexity sequence alignment method.
In this paper, we examine how well DQNalign adapts to other types of sequence alignment. DQNalign, which aligns sequences by sliding a window, is similar to the gapped extension step of conventional local alignment methods. We therefore combine DQNalign with a local alignment method to compare two long complete genomes. Specifically, we develop a novel local alignment algorithm by combining the x-drop algorithm with DQNalign. The proposed DQN x-drop algorithm has the advantage of fully linear complexity, which allows it to perform a broader search at lower complexity than the conventional greedy x-drop algorithm.
This paper presents a new local alignment method based on deep reinforcement learning. In Section II, we briefly introduce the conventional greedy x-drop algorithm and describe how the proposed algorithm differs from it. In Section III, we analyze the complexity advantage of the proposed algorithm, show that it aligns genome sequences properly with linear complexity, and compare its alignment performance with that of the conventional algorithm on real genome sequences. Finally, Section IV presents the conclusions and future research directions.

II. MATERIALS AND METHODS
In this paper, we propose a novel local alignment method that combines DQNalign with the x-drop algorithm. A detailed explanation and the code implementation of the proposed DQN x-drop algorithm are available at https://github.com/syjqkrtk/DQNalign.

A. Conventional greedy x-drop algorithm
We briefly introduce the conventional greedy x-drop algorithm, which is used in many sequence alignment tools to perform gapped extension. The x-drop algorithm terminates the alignment when the alignment score falls more than X below the highest score observed so far [6]. This reduces computation time and prevents two distant exact matches from being linked [7].
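To make the x-drop termination rule concrete, the following minimal Python sketch applies it to an ungapped extension. It is illustrative only and not BLAST's implementation; the function name `xdrop_ungapped` and the scores (match 1, mismatch -1) are assumptions. The example also shows the behavior noted above: a small X stops the extension inside a mismatched run, so two distant matching blocks are not linked, whereas a large X bridges them.

```python
def xdrop_ungapped(s1, s2, i=0, j=0, match=1, mismatch=-1, X=10):
    """Extend an ungapped alignment rightward from (i, j) using the x-drop rule.

    Returns (best_score, best_length); the extension is trimmed back to the
    position where the best score was reached.
    """
    score = best = best_len = 0
    k = 0
    while i + k < len(s1) and j + k < len(s2):
        score += match if s1[i + k] == s2[j + k] else mismatch
        k += 1
        if score > best:
            best, best_len = score, k
        elif best - score > X:
            # Score fell more than X below the best seen so far: stop extending.
            break
    return best, best_len


# Two matching blocks separated by a mismatched run of length 3.
left = "AAAA" + "CCC" + "AAAAAAAA"
right = "AAAA" + "GGG" + "AAAAAAAA"
print(xdrop_ungapped(left, right, X=2))   # -> (4, 4): stops inside the mismatched run
print(xdrop_ungapped(left, right, X=10))  # -> (9, 15): bridges the run
```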
In particular, the greedy x-drop algorithm combines dynamic-programming-based optimal alignment with the x-drop criterion. Alignment scores are calculated as in the Smith-Waterman algorithm, and the alignment ends when the score in every column has fallen by more than X. As a result, this method searches a region whose size is proportional to the square of X, so its complexity is proportional to $X^2$, and a large X cannot be used in practice with the conventional greedy x-drop algorithm.
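The following sketch illustrates why the explored region grows with X. It is a simplified row-by-row X-drop dynamic-programming extension with linear gap costs, in the spirit of the description above, and not the actual greedy algorithm used in BLAST; the function `xdrop_gapped`, its cell counter, and the example sequences are hypothetical. Cells scoring more than X below the best score are pruned, and the returned cell count approximates the size of the searched region.

```python
def xdrop_gapped(a, b, match=1, mismatch=-1, gap=-2, X=10):
    """X-drop gapped extension from (0, 0); returns (best, best_cell, cells_visited)."""
    best, best_ij, cells = 0, (0, 0), 0
    # Row 0: leading gaps in `a`, kept while within X of the best score.
    prev = {}
    j = 0
    while j <= len(b) and gap * j >= best - X:
        prev[j] = gap * j
        j += 1
    cells += len(prev)
    for i in range(1, len(a) + 1):
        if not prev:  # every cell of the previous row was dropped
            break
        cur = {}
        lo, hi = min(prev), max(prev)
        j = lo
        # Beyond hi + 1 only horizontal moves remain; stop once they die out.
        while j <= len(b) and (j <= hi + 1 or (j - 1) in cur):
            s = float("-inf")
            if j in prev:              # vertical move (gap in b)
                s = prev[j] + gap
            if (j - 1) in prev:        # diagonal move (match or mismatch)
                d = match if a[i - 1] == b[j - 1] else mismatch
                s = max(s, prev[j - 1] + d)
            if (j - 1) in cur:         # horizontal move (gap in a)
                s = max(s, cur[j - 1] + gap)
            cells += 1
            if s >= best - X:          # x-drop test: keep only near-best cells
                cur[j] = s
                if s > best:
                    best, best_ij = s, (i, j)
            j += 1
        prev = cur
    return best, best_ij, cells


query, subject = "ACGTTGCAGCATGCAAGT", "ACGTATGCAGCTTGCAAGT"
for x in (2, 5, 10, 20):
    print(x, xdrop_gapped(query, subject, X=x))
```

The number of visited cells is the quantity that grows with X; the proposed method replaces this region search with a single path of network decisions.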
In this paper, BLAST is used as a representative tool implementing the greedy x-drop algorithm. BLAST is considered one of the state-of-the-art tools for local alignment search [8]. We used the megablast parameter settings for BLAST. To show the performance difference between the conventional greedy x-drop algorithm and the proposed DQN x-drop algorithm, we changed some of the BLAST parameters as shown in Table S2.

B. Proposed DQN x-drop algorithm
1) Quick review of DQNalign: Before explaining the proposed DQN x-drop algorithm, we briefly describe the DQNalign algorithm. DQNalign aligns sequence pairs using deep reinforcement learning [18]. The trained deep Q-network (DQN) observes only a fixed-length part (the window size) of each sequence and repeatedly selects the optimal direction in which to proceed. A conceptual diagram of DQNalign is shown in Fig. 1(a). As shown in the figure, DQNalign only needs to determine the next direction from the current position. By learning which path through the sequences is optimal without any human intervention, DQNalign consistently showed high performance irrespective of the sequence pair's identity [18].
To enable reinforcement learning in the sequence alignment system, we defined the state, action, and reward as the subsequence pair, the alignment direction (forward, deletion, insertion), and the alignment scoring system (match, mismatch, and gap scores), respectively. First, the state is the pair of window-size subsequences taken from the two sequences. The alignment directions are treated as actions, and the alignment scores as rewards. Moreover, we used the ε-greedy exploration method, which randomly experiences various states and actions during training to prevent overfitting. We also used an experience buffer that stores records of states, actions, and rewards, from which the network is trained.
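A minimal sketch of these state, action, and reward definitions, together with ε-greedy action selection and an experience buffer, is given below. Class and function names are hypothetical, the mapping of "insertion" to advancing only the second sequence (and "deletion" to advancing only the first) is an assumed convention, and the Q-function is a placeholder for the trained network.

```python
import random
from collections import deque

FORWARD, INSERTION, DELETION = 0, 1, 2


class AlignEnv:
    """Alignment environment: state = window pair, action = direction, reward = score."""

    def __init__(self, s1, s2, window=10, match=1, mismatch=-1, gap=-2):
        self.s1, self.s2, self.window = s1, s2, window
        self.match, self.mismatch, self.gap = match, mismatch, gap
        self.x = self.y = 0

    def state(self):
        # The agent only sees window-size subsequences starting at (x, y).
        return (self.s1[self.x:self.x + self.window],
                self.s2[self.y:self.y + self.window])

    def done(self):
        return self.x >= len(self.s1) or self.y >= len(self.s2)

    def step(self, action):
        if action == FORWARD:       # align s1[x] with s2[y]
            reward = self.match if self.s1[self.x] == self.s2[self.y] else self.mismatch
            self.x += 1
            self.y += 1
        elif action == INSERTION:   # assumed convention: extra base in s2, advance y only
            reward = self.gap
            self.y += 1
        else:                       # DELETION, assumed convention: advance x only
            reward = self.gap
            self.x += 1
        return reward


def epsilon_greedy(q_values, epsilon, rng):
    """Random action with probability epsilon, otherwise the action with the highest Q."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def run_episode(env, q_fn, epsilon, rng, buffer):
    """Roll out one episode and store (state, action, reward) records."""
    while not env.done():
        s = env.state()
        a = epsilon_greedy(q_fn(s), epsilon, rng)
        r = env.step(a)
        buffer.append((s, a, r))
    return buffer


# Usage with a placeholder Q-function that always prefers FORWARD.
buffer = deque(maxlen=10_000)
run_episode(AlignEnv("ACGT", "ACGA"), lambda s: (1.0, 0.0, 0.0), 0.0,
            random.Random(0), buffer)
print([r for _, _, r in buffer])  # -> [1, 1, 1, -1]
```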
Here, we introduce two network structures optimized for the sequence alignment system: DDDQN and faster DDDQN. The detailed DQN structures are depicted in Fig. S1 and Fig. S2. The former focuses on alignment performance and the latter on speed; in this paper, we used the faster DDDQN structure to emphasize the speed advantage of our algorithm. In detail, the DDDQN structure uses only the existing methods described above. In contrast, the faster DDDQN uses separable convolutional layers, which have the same receptive field as conventional convolutional layers but much lower complexity. With the faster DDDQN structure, we reduced the number of operations to about 1/9 to 1/26 of that of the DDDQN structure.
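The text does not state which kind of separable convolution is used. Assuming depthwise separable convolution (a per-channel k × k filter followed by a 1 × 1 pointwise convolution), the following sketch compares multiplication counts per output position. The kernel and channel sizes are hypothetical, so the printed ratios are not meant to reproduce the reported 1/9 to 1/26 reduction.

```python
def standard_conv_mults(k, c_in, c_out):
    """Multiplications per output position for a standard k x k convolution."""
    return k * k * c_in * c_out


def separable_conv_mults(k, c_in, c_out):
    """Depthwise k x k conv (one filter per input channel) plus a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out


for k, c_in, c_out in [(3, 32, 64), (5, 64, 128)]:
    std = standard_conv_mults(k, c_in, c_out)
    sep = separable_conv_mults(k, c_in, c_out)
    # The ratio simplifies to 1/c_out + 1/k^2, independent of c_in.
    print(f"k={k}, {c_in}->{c_out}: {std} vs {sep} multiplications "
          f"(ratio {sep / std:.3f}, 1/c_out + 1/k^2 = {1 / c_out + 1 / k ** 2:.3f})")
```

Under this assumption the savings grow with kernel size and output channels, since the ratio is $1/C_{out} + 1/k^2$.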
2) Proposed meta-learning-based training procedure: In the original DQNalign, the alignment agent was trained in an environment of virtual sequence sets generated by the JC69 model [26]. In this paper, however, a model-agnostic meta-learning (MAML) method [23] is used to account for the properties of real sequences. MAML is a meta-learning approach that can be applied to any model trained by gradient descent. As shown in Fig. 1(b) and Algorithm 1, in the inner loop, the neural network is trained in environments with various SNP and indel probability parameters, shown in Table S4. The outer loop then updates the final network by evaluating the multiple networks learned in the inner loop. Finally, we fine-tuned the final DQNalign network using part of the real genome sequence DB to optimize it for real genomes [28], [29].
We sampled the environment parameters (SNP probability, indel probability, and maximum indel length) from uniform distributions within the ranges shown in Table S4. In each step of the outer loop, one environment is drawn from this environment distribution. In the inner loop, two sequences are generated from the environment. The agent is then trained by a general DQN process, and the trained network is tested on other sequences created in the same environment. All states, actions, and rewards are recorded in the episode buffer. Finally, the initial network in Fig. 1 is updated in the outer loop using these episode buffers. With this meta-training process, we can optimize the DQNalign network for various environments. Then, in test scenarios, we fine-tune the converged network using part of the given sequence DB. Through this process, we confirmed that alignment performance improved on the HEV sequence DB, as shown in Fig. S3.
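The environment sampling and sequence generation of Algorithm 1 (steps 5 to 7) can be sketched as follows. The parameter ranges are placeholders for Table S4, which is not reproduced here, and the mutation model (each indel is an insertion or a deletion with equal probability, with length uniform on 1..maxI) is an assumption rather than the authors' exact procedure.

```python
import random

BASES = "ACGT"


def sample_env(rng, snp_range=(0.0, 0.2), indel_range=(0.0, 0.05), max_indel_range=(1, 10)):
    """Draw (P_SNP, P_indel, maxI) uniformly; the ranges are hypothetical placeholders."""
    return (rng.uniform(*snp_range), rng.uniform(*indel_range),
            rng.randint(*max_indel_range))


def random_sequence(n, rng):
    return "".join(rng.choice(BASES) for _ in range(n))


def mutate(seq, p_snp, p_indel, max_indel, rng):
    """Apply SNPs and indels position by position (assumes p_snp + p_indel <= 1)."""
    out, i = [], 0
    while i < len(seq):
        u = rng.random()
        if u < p_indel:
            length = rng.randint(1, max_indel)
            if rng.random() < 0.5:
                # Insertion: add `length` random bases before seq[i].
                out.extend(rng.choice(BASES) for _ in range(length))
                out.append(seq[i])
                i += 1
            else:
                # Deletion: skip `length` bases of the original sequence.
                i += length
        elif u < p_indel + p_snp:
            # SNP: substitute a different base.
            out.append(rng.choice([b for b in BASES if b != seq[i]]))
            i += 1
        else:
            out.append(seq[i])
            i += 1
    return "".join(out)


def make_task(rng, length=200):
    """One outer-loop environment with a training pair and a test pair."""
    env = sample_env(rng)
    pairs = []
    for _ in range(2):
        s1 = random_sequence(length, rng)
        pairs.append((s1, mutate(s1, *env, rng)))
    return env, pairs


env, pairs = make_task(random.Random(1))
print(env, [len(s2) for _, s2 in pairs])
```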

3) Proposed DQN x-drop algorithm:
To perform local alignment using DQNalign, we applied the x-drop algorithm to DQNalign. Algorithm 2 describes the proposed DQN x-drop algorithm in detail. Here, $DQN_W()$ is a deep neural network that receives two subsequences of a given window size as input and outputs $Q_{for}$, $Q_{ins}$, and $Q_{del}$, the expected sums of future rewards in the forward, insertion, and deletion directions, respectively. We select the direction with the highest Q value and proceed with the alignment by repeating this DQNalign procedure. The x-drop criterion terminates the DQNalign procedure when the current alignment score falls more than X below the best alignment score. We complete the gapped extension by running the proposed DQN x-drop algorithm in both the upstream and downstream directions. To compare the proposed algorithm with the conventional algorithm, we use the preprocessing and seeding procedures of the REMiner II method, which reduces the number of candidate seeds with the n-hit method [9], as shown in Fig. 1(c). The detailed parameters of REMiner II are given in Table S1.

Algorithm 2: Proposed DQN X-Drop Algorithm.
1: Inputs: $S_1$ ← Query sequence; $S_2$ ← Subject sequence; (x, y) ← Start position of a given candidate seed
2: Initialize: …
…
8: Move position (x, y) by a given action a
9: Append (S …
…
12: if Score > Best then
13: Best ← Score
14: Align ← Path
15: end if
16: end while
17: return Align
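The main loop of the DQN x-drop extension can be sketched as below. The trained $DQN_W()$ is replaced by a hypothetical heuristic `toy_q` that scores each direction by position-weighted agreement of the windows; it exists only so that the example runs. Only the downstream direction is shown; the upstream extension could reuse the same function on reversed prefixes. The scoring values and the insertion/deletion convention are assumptions.

```python
def toy_q(w1, w2):
    """Hypothetical stand-in for DQN_W(): returns (Q_for, Q_ins, Q_del).

    Each direction is scored by the position-weighted agreement of the windows
    after taking that step (nearer positions weigh more). A trained network
    would replace this function.
    """
    def agree(u, v):
        return sum(0.5 ** k for k, (p1, p2) in enumerate(zip(u, v)) if p1 == p2)
    return agree(w1, w2), agree(w1, w2[1:]), agree(w1[1:], w2)


def dqn_xdrop(s1, s2, x=0, y=0, q_fn=toy_q, window=10, X=10,
              match=1, mismatch=-1, gap=-2):
    """Downstream DQN x-drop extension from seed position (x, y)."""
    score = best = 0
    path, align = [], []
    while x < len(s1) and y < len(s2):
        q = q_fn(s1[x:x + window], s2[y:y + window])
        action = max(range(3), key=lambda a: q[a])  # highest Q value (ties -> forward)
        if action == 0:    # forward
            score += match if s1[x] == s2[y] else mismatch
            x, y = x + 1, y + 1
        elif action == 1:  # insertion: advance in s2 only (assumed convention)
            score += gap
            y += 1
        else:              # deletion: advance in s1 only (assumed convention)
            score += gap
            x += 1
        path.append(action)
        if best - score > X:  # x-drop termination
            break
        if score > best:
            best, align = score, list(path)
    return best, align


# The subject has one extra base at index 4; the path takes a single insertion step.
print(dqn_xdrop("ACGTTGCAGC", "ACGTATGCAGC"))  # -> (8, [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0])
```

Each iteration costs one network call on a fixed-size window, which is why the step count, rather than a DP region, determines the running time.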

III. RESULTS AND DISCUSSION
To show the feasibility and performance of the proposed DQN x-drop algorithm, we designed the following three comparisons: 1. Complexity analysis, 2. Performance comparison according to various window sizes, 3. Alignment time comparison according to x-drop parameter and alignment length.Based on these metrics, we will show how the DQNalign method can improve the conventional greedy x-drop algorithm.Also, all of these alignment results are included in Supplementary material S2.

A. Complexity analysis
For the complexity analysis of the proposed method, we first derive the relationship between the window size and the x-drop parameter. To do so, we build on [18], [27] to analyze the step error probability of the local best-path selection method. In [18], the step error probability of DQNalign was derived and summarized in the following equation, based on the Gumbel distribution [24], [25].
Here, $P_{e,total}$ is the step error probability, expressed using the Gumbel distribution parameters $K$ and $\lambda$. In addition, $p_{indel}$ and $p_{SNP}$ denote the probabilities of indels and SNPs under the model of evolution, and $score_{match}$, $score_{mismatch}$, $score_{gap}$, and $score_{avg}$ denote the match, mismatch, and gap scores of the alignment and the average score of the sequence pair, respectively. $W$ denotes the window size. From (1), this step error probability converges to zero when the window size becomes sufficiently large.
Second, we derive the expected alignment length $L$ obtainable for a given x-drop parameter in the proposed algorithm. We assume that the indel length $l$ follows a Zipfian distribution [27]:

$P(l) = \dfrac{l^{-s}}{\zeta(s)}, \quad l = 1, 2, \ldots$ (2)
In the x-drop step of the proposed scheme, the alignment is unconditionally terminated once an indel of length $X/|score_{gap}|$ or longer occurs, because such an indel alone lowers the score by at least X. Hence the alignment length $L(X)$ is shorter than the expected length at which an indel longer than $X/|score_{gap}|$ occurs. For convenience, we write $X_g = X/|score_{gap}|$:

$L(X) < \mathbb{E}[L(l \geq X_g)]$ (3)

Here, (3) can be expressed in closed form using the Riemann zeta function $\zeta(s) = \sum_{n=1}^{\infty} 1/n^{s}$. Third, using this function $L(X)$, we can express the total error probability $P_e$ as follows.
Here, we denote the terms other than the window size by the constants $A$ and $B$ for convenience. For any $k < 1$, consider $W$ satisfying $X_g^{s} = e^{kBW}$. Then, (5) can be rewritten in the form of (6). Fourth, we derive the relationship between the window size and the X parameter. By L'Hôpital's rule, if the denominator in (6) grows faster than the numerator, $P_e$ converges to zero for sufficiently large window size and X. From (6), for every constant $k < 1$, $P_e$ converges to 0 when $X_g^{s} < e^{BW}$. Hence, as X and $W$ grow without bound, $P_e$ converges to 0 when $W > \frac{s}{B}\ln X_g$. Accordingly, when $P_e$ converges to 0, the relationship between the window size and the x-drop parameter is given as follows.
where $\alpha > 0$. From (7), the window size is proportional to $\ln X$:

$W \propto \ln X$ (8)
Fifth, using the relationship between the window size and the X parameter in (8), we derive the Big-O computational complexity of the proposed DQN x-drop algorithm. Consider the number of alignment steps shown in Fig. 2. From this figure, the proposed algorithm requires a number of alignment steps proportional to $L + 2X$. Moreover, the deep neural network operations have $O(W)$ complexity at each alignment step. Thus, the total complexity is $O(LW + 2XW)$, which is linear in the X parameter for a fixed window size. Furthermore, since the window size is proportional to $\ln X$ from (8), the proposed algorithm has complexity $O(L\ln X + 2X\ln X)$. In contrast, Fig. 2 shows that the conventional greedy x-drop algorithm requires a number of alignment steps proportional to $XL + X^2$ for alignment length $L$ and parameter X. Comparing $O(L\ln X + 2X\ln X)$ for the proposed algorithm with $O(LX + X^2)$ for the conventional algorithm, the proposed algorithm has a clear complexity advantage for large X.
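A small numerical sketch of this comparison follows. It takes the smallest integer window satisfying $W > \frac{s}{B}\ln X_g$, with a hypothetical ratio $s/B = 4$ and gap score -2, and counts $(L + 2X)\cdot W$ network operations against $XL + X^2$ dynamic-programming cells. Constant factors are ignored: one network evaluation costs far more than one DP cell, which is consistent with the observation in Section III.C that very large windows can make the proposed method slower in practice.

```python
import math


def min_window(X, s_over_B, gap=-2):
    """Smallest integer W satisfying W > (s/B) * ln(X_g), with X_g = X / |gap|."""
    x_g = X / abs(gap)
    return max(1, math.floor(s_over_B * math.log(x_g)) + 1)


def proposed_ops(L, X, s_over_B, gap=-2):
    """(L + 2X) alignment steps, each costing on the order of W network operations."""
    return (L + 2 * X) * min_window(X, s_over_B, gap)


def greedy_ops(L, X):
    """Greedy x-drop work, proportional to XL + X^2."""
    return X * L + X ** 2


L = 10_000
for X in (100, 1_000, 10_000):
    W = min_window(X, 4.0)
    print(f"X={X:>6}: W={W:>3}, proposed ~{proposed_ops(L, X, 4.0):>12,}, "
          f"greedy ~{greedy_ops(L, X):>12,}")
```

Multiplying X by 10 adds only a constant to W, so the proposed count grows roughly linearly in X while the greedy count grows quadratically.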

B. Performance comparison according to various window sizes
To show how performance depends on the window size, we used the HEV sequence set in Table S5. We grouped the HEV sequence pairs into intra-genotype pairs (same genotype) and inter-genotype pairs (different genotypes). The 47 sequences yield 1081 sequence pairs in total: 332 intra-genotype pairs and 749 inter-genotype pairs. We compared the coverage and identity of the conventional local alignment algorithms with those of the proposed DQN x-drop algorithm, as shown in Fig. 3. For fairness, all methods used the same scoring parameters (match, mismatch, gap) = (1, -1, -2) and x-drop parameter X = 100. The FASTA algorithm was also compared with the proposed DQN x-drop algorithm on the HEV sequence dataset. All other simulation parameters are listed in Table S1, Table S2, and Table S3.
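Coverage and identity are not defined in this section. Assuming common definitions (identity = matching columns / alignment columns; coverage = query bases covered by the alignment / query length), the following sketch computes them together with the alignment score under the (1, -1, -2) scoring scheme. The function name and example rows are hypothetical. It also checks the pair counts quoted above.

```python
from math import comb


def alignment_stats(row1, row2, query_len, match=1, mismatch=-1, gap=-2):
    """Score, identity, and coverage of one gapped alignment ('-' marks a gap).

    Assumed definitions: identity = matching columns / alignment columns;
    coverage = query bases inside the alignment / query length.
    """
    score = matches = 0
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            score += gap
        elif a == b:
            score += match
            matches += 1
        else:
            score += mismatch
    query_bases = sum(c != "-" for c in row1)
    return score, matches / len(row1), query_bases / query_len


print(alignment_stats("ACGT-ACGT", "ACGTTACAT", query_len=16))  # score 4, identity 7/9, coverage 0.5
# 47 HEV sequences give comb(47, 2) pairs, split into intra- and inter-genotype groups.
print(comb(47, 2), 332 + 749)
```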
As shown in Fig. 3, the alignment performance of the proposed method for window sizes of 10, 30, 50, and 100 showed markedly different trends for the intra-genotype and inter-genotype pairs. For the intra-genotype pairs, shown in Fig. 3(a) and Fig. 3(b), the proposed method performed similarly to the conventional algorithms for all window sizes. For the inter-genotype pairs, however, differences between the proposed algorithm and the conventional algorithms appeared, as shown in Fig. 3(c) and Fig. 3(d). In the inter-genotype case, the proposed DQN x-drop algorithm showed relatively low alignment performance for small window sizes. For large window sizes, however, its accuracy gradually converged to that of the conventional local alignment algorithms, while it retained its complexity advantage. This simulation shows that the performance of the proposed algorithm increases with the window size.

C. Alignment time comparison according to x-drop parameter and alignment lengths
Furthermore, we compared the execution time of the conventional greedy x-drop algorithm with that of the proposed DQN x-drop algorithm on two E. coli sequences (Escherichia coli O157 and Escherichia coli K-12) [29]. Fig. 4 shows a 3D plot of alignment time for various x-drop parameters and alignment lengths. Consistent with the complexity analysis, the alignment time of the proposed algorithm was proportional to $LW + 2XW$.
As the results show, increasing the window size generally lengthens the operation time. In the faster DDDQN structure, we replace the convolutional layers with separable convolutional layers. This reduces the computational cost of the convolutional layers, which grows in proportion to the window size. However, for small window sizes, the fully connected layer, whose cost is constant, takes more time than the convolutional layers.
We also compared the execution time of the proposed algorithm with that of the conventional algorithm, as shown in Fig. 5, using the same scoring and x-drop parameters for both methods for a fair comparison. The proposed algorithm ran faster than the conventional algorithm for sufficiently large X when the window size was 10 or 100. When the window size was 1000, however, the proposed algorithm required more execution time than the conventional algorithm, even though its complexity is linear in X. In this case, the execution time fluctuated when the X parameter became large, because more candidate seeds are merged during the extension process as X increases.
In detail, when the window size is 10, alignment performance is insufficient, so fewer exact matches are found, which increases the total number of network computations. As a result, the execution times for W=10 and W=100 are similar. On the other hand, when the window size is 1000, the complexity increases and the computation time rises rapidly. For these reasons, we consider 100 to be the best window size for the E. coli case, as it gives high accuracy and low complexity, as shown in Fig. 5.

IV. CONCLUSION AND FUTURE PROSPECTS
We analyzed the computational complexity of the proposed and conventional algorithms theoretically and verified the analysis through simulations on real E. coli genomes. We further confirmed that the alignment time increases linearly with X and with the alignment length.
Moreover, we showed that the proposed algorithm achieves alignment performance similar to that of the conventional algorithm with lower complexity, as verified on real HEV genomes. The accuracy of our method improved as the window size increased, and with a sufficiently large window, the proposed DQN x-drop method reached the same level of accuracy as the conventional method despite its substantially lower complexity.
In addition, we compared the total execution time as X varied and found that the proposed algorithm has a significant complexity advantage for large X in actual alignments as well.
At present, we use four small separable convolutional layers because of the complexity limitations of current deep learning technology. We expect that improvements in network operation speed, layer structure, optimizers, and learning strategies can improve both the performance and the complexity of the DQNalign methodology.
On the other hand, various MSA methods have been developed using dynamic programming, whose complexity grows exponentially with the number of sequences, and progressive methods, whose complexity is proportional to the square of the number of sequences. However, no MSA method with linear complexity has yet been proposed. In the future, we intend to develop multiple sequence alignment methods using DQNalign. To perform neural network computation on an arbitrary number of input sequences, we plan to use recurrent neural networks, which are widely used for time series data, to address the multiple sequence alignment problem.

V. SUPPLEMENTARY MATERIALS
The two supplementary materials are included in this submission.The detailed figures, simulation parameters, and the HEV sequences list are included in the Supplementary material S1.Moreover, the alignment results of the simulations are included in the Supplementary material S2.

Fig. 2. Alignment step comparison of two sequence alignment methods

Fig. 3. Histograms of coverage and identity values for HEV sequences: The results of the proposed DQN x-drop algorithm with several window sizes (10, 30, 50, 100) are compared with those of the conventional local alignment algorithms. As the window size increases, the performance of the proposed algorithm approaches that of the conventional local alignment algorithms.

Fig. 4. Alignment time distribution according to various X-drop parameters and alignment lengths.

Fig. 5. Execution time of local alignment algorithms versus the X parameter.

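Algorithm 1: MAML Based Training Procedure.
1: Inputs: $S_{SNP}$ ← Set of occurrence probabilities of SNPs; $S_{indel}$ ← Set of occurrence probabilities of indels; $S_{maxI}$ ← Set of maximum indel lengths
2: Initialize: …
…
4: … $P_{SNP} \in S_{SNP}$, $P_{indel} \in S_{indel}$, $maxI \in S_{maxI}$
5: Set Env ← ($P_{SNP}$, $P_{indel}$, $maxI$)
6: Generate two random sequences $S_1$ and $S_1'$
7: Mutate $S_1$ and $S_1'$ by Env to get $S_2$ and $S_2'$
8: Initialize a replay memory in the inner loop, $D_{inner}$
9: Copy $DQN^{outer}_W()$ into $DQN^{0}_W()$
10: (x, y) ← (0, 0)
11: for inner = 0, N − 1 do
…
16: Move position (x, y) by a given action $a_{inner}$
17: Get a reward $r_{inner}$ from the given action and state
18: Append ($s_{inner}$, $a_{inner}$, $r_{inner}$ …
…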