Trimming Gene Deletion Strategies for Growth-Coupled Production in Constraint-Based Metabolic Networks: TrimGdel

When simulating genome-scale metabolite production using constraint-based metabolic networks, it is often necessary to find gene deletion strategies which lead to growth-coupled production, which means that target metabolites are produced when cell growth is maximized. Existing methods are effective when the number of gene deletions is relatively small, but when the number of required gene deletions exceeds approximately 1% of whole genes, the time required for the calculation is often unfeasible. Therefore, a complementing algorithm that is effective even when the required number of gene deletions is approximately 1% to 5% of whole genes would be helpful because the number of deletable genes in a strain is increasing with advances in genetic engineering technology. In this study, the author developed an algorithm, TrimGdel, which first computes a strategy with many gene deletions that results in growth-coupled production and then gradually reduces the number of gene deletions while ensuring the original production rate and growth rate. The results of the computer experiments showed that TrimGdel can calculate stoichiometrically feasible gene deletion strategies, especially those whose sizes are 1 to 5% of whole genes, which lead to growth-coupled production of many target metabolites, which include useful vitamins such as biotin and pantothenate, for which existing methods could not.


INTRODUCTION
C OMPUTATIONAL strain design plays an important role in the production of substances through microbial metabolism [1], [2], [3], [4], [5], [6], [7], [8]. Such strain designs are often evaluated through simulations with constraint-based models, which can perform genome-scale simulations efficiently by restricting the analysis to steady states [9]. In constraint-based models, (1) the sum of incoming reaction rates is equal to the sum of outgoing reaction rates for each metabolite, (2) the ratio of the produced and consumed metabolites in each reaction satisfies the ratio in the chemical reaction equation, and (3) each reaction rate is given upper and lower bounds.
Each constraint-based model includes a virtual reaction that represents cell growth. The cell growth reaction was designed to match the results of the biological experiments. The most standard simulation in constraintbased models maximizes the cell growth rate (GR) because strains with high growth rates are more likely to remain in the culture during passaging. In contrast, the reaction that produces the desired metabolite is called a production reaction, and its production rate is denoted as PR. Therefore, designed strains are evaluated by PR at GR maximization in the simulations. The simultaneous occurrence of cell growth and target metabolite production is called growth-coupled production (See Fig. 1(A) and (B)). In this paper, we consider it growth-coupled production when the value of PR is 0.001 mmol/gDW/h or more at GR maximization.
Since only a limited number of metabolites are produced with growth coupling in the natural state of microorganisms, it is necessary to calculate gene deletion strategies for each target metabolite in such a case. However, the problem of calculating a gene deletion strategy that results in growth-coupled production is a computationally heavy task for genome-scale models, and many methods have been proposed to address this problem [10], [11], [12], [13], [14], [15], [16], [17].
One of the current best ideas for calculating gene deletion strategies for growth-coupled production is to find a minimal set of reactions, which is called a (constrained) elementary flux mode, where cell growth forces target metabolite production and then delete all reactions that are not included in that set. This idea was extended to calculate the minimal cut set (MCS), which is equivalent to gene deletion strategies because the MCS of the primary network is the elementary mode of the dual network [18], [19]. Examples of successful applications of the MCS-based method include the growth-coupled production of itaconic acid [20] and 2,3butanediol [21] by Escherichia coli and indigoidine by Pseudomonas putida [22].
Because a small number of reaction deletions is desirable in current metabolic engineering technology, the number of reaction deletions required can be reduced by combining the resulting minimal flows to create larger flows. On the other hand, it is often impossible to compute a feasible gene deletion strategy from the obtained reaction deletion strategies because the gene-proteinreaction (GPR) networks described by Boolean functions need to be considered [23], [24].
Recently, Tamura et al. developed gDel_minRN, which determines gene deletion strategies for the growth-coupled production of several vitamins in which the number of suppressed reactions is maximized [25]. Gene deletion strategies obtained by gDel_minRN are suitable for biological analysis of the resulting metabolic flow because it leads to growth-coupled production with as few reactions as possible. However, it is not suitable for current metabolic engineering technology because gDel_minRN deletes too many genes.
Therefore, it would be desirable to develop a method to result in growth-coupled production with a small number of gene deletions, using the large-scale gene deletion strategy obtained in gDel_minRN as a starting point. However, when the flows calculated by gDel_-minRN are combined, there is no corresponding feasible gene deletion strategy in most cases because of the conflict caused by GPR relations.
In this study, the author developed TrimGdel, which trims large-scale gene deletion strategies that lead to growth-coupled production while ensuring the original GR and PR. TrimGdel consists of three steps: (1) Step 1 employs gDel_minRN to obtain the large-scale gene deletion strategies that lead to growth-coupled production of target metabolites. (2) Step 2 minimizes the number of gene deletions so that the repressed reactions do not become unrepressed. (3) Step 3 reduces the number of genes to be deleted one by one under the condition that GR and PR do not fall below the original values when GR is maximized.
To evaluate the performance of TrimGdel and compare with other methods, TrimGdel, GDLS [10] and opt-Gene [26] were applied to iML1515 [27], iMM904 [28], and e_coli_core [29] in computational experiments: GDSL and optGene are ones of the most widely used software to derive gene deletion strategies and are available in COBRA Toolbox [30]; iML1515 and iMM904 are genome-scale constraint-based models of E. coli and S. cerevisiae, respectively; e_coli_core contains the only essential part of metabolism of E. coli.
When TrimGdel was applied to target metabolites in iML1515, iMM904, and e_coli_core, large-scale gene deletion strategies for growth-coupled production could be derived by gDel_minRn [25] for 39.9%, 13.9%, and 84.6% of the target metabolites, respectively. For each obtained large-scale gene deletion strategy, trimming was conducted to reduce the number of gene deletions. On average, TrimGdel succeeded in reducing 86.1%, 91.1%, and 81.6% of the number of gene deletions obtained by gDel_minRN while ensuring the original GR and PR. The size of the gene deletion strategies obtained by TrimGdel was less than 5% of whole genes in 86.9%, 91.7%, and 59.1%, and between 1% and 5% in 66.5%, 50.5%, and 56.8% of the gene deletion strategies. This includes important substances such as biotin and pantothenate, for which design methods for efficient production strains have not yet been established. Therefore, we conclude that TrimGdel effectively calculates gene deletion strategies with sizes between 1% and 5% of whole genes, which are difficult to calculate with existing methods but promising for designing production strains in the near future.
The remaining of this paper is organized as follows (See Fig. 1(C)): Section 2.1 describes the main problem of this study mathematically; Section 2.2 describes the developed algorithm TrimGdel mathematically; Section 2.3 illustrates the main problem and the application of TrimGdel to toy examples; Section 3.1 describes comprehensive computational experiments where TrimGdel was applied to iML1515, iMM904, and e_coli_core; Section 3.2 compares the performance of TrimGdel with The main objective of this study is to calculate gene deletion strategies that simultaneously lead to cell growth and target metabolite production. (B) Each gene deletion strategy is evaluated by the least value of the target metabolite production rate (PR) when maximizing the cell growth rate (GR). (C) When compared to the calculation of reaction deletion strategies, the calculation of gene deletion strategies that consider the gene-protein-reaction (GPR) network is more complex. TrimGdel, developed in this study, enables the calculation of gene deletion strategies with a size of 1% to 5% of whole genes, which is difficult to calculate using existing methods for many target metabolites.
GDLS and optGene; Section 3.3 describes the changes in metabolic flows when biotin is the target metabolite, and Section 4 analyzes the results of the computer experiments, evaluates the performance of TrimGdel and other methods, and discusses future work. The developed software is available online. 1

Definition
Let C 1 ¼ ðM; R; S; L; UÞ be a constraint-based metabolic network. M ¼ fm 1 ; . . . ; m a g and R ¼ fr 1 ; . . . ; r b g are sets of metabolites and reactions, respectively. S is a stoichiometry matrix, where S ij ¼ k means that r j produces k of m i per unit time. If k is a negative number, then m i is consumed. Let V ¼ fv 1 ; . . . ; v b g be a set of reaction rates per unit time (flux) of R. Let L ¼ fl 1 ; . . . ; l b g and U ¼ fu 1 ; . . . ; u b g be the sets of the lower and upper bounds of the reaction rates (speed) for V , respectively. R always includes one special virtual reaction r growth that represents cell growth, and the cell growth flux is represented by v growth , which is called the growth rate (GR). The rate of the reaction producing the target metabolite under the condition that cell growth is maximized is called the production rate (PR). Let r production be the target metabolite production and v production be its rate, that is, v production = PR.
For the gene deletion simulations, C 2 ¼ ðG; F; P Þ, which is the GPR network part of the constraint-based model, is also considered. G ¼ fg 1 ; . . . ; g c g and F ¼ ff 1 ; . . . ; f b g are sets of genes and Boolean functions, respectively. P ¼ fp 1 ; . . . ; p b g is the set of the outputs of F . If p j ¼ 0, then l j and u j are forced to be zero and r j is repressed. The constraint-based model C is defined as C ¼ ðC 1 ; C 2 Þ ¼ ðM; R; S; L; U; G; F; P Þ.
In the standard setting of growth coupling simulation using FBA with C 1 and C 2 , PR is evaluated when GR is maximized by the following linear programming (LP): . . . ; ag; j ¼ f1; . . . ; bg p; g 2 f0; 1g (evaluate the minimum PR= v production Þ Because LP does not always have a unique solution, the minimum PR at GR maximization is evaluated in this study.

Algorithm
TrimGdel aims to find the smallest gene deletion strategy for growth-coupled production by removing unnecessary gene deletions from the large-scale gene deletion strategy obtained by gDel_minRN for a given target metabolite and constraint-based model. TrimGdel consists of three steps: 1) Step 1 employs gDel_minRN to obtain a largescale gene deletions strategy, derives which reactions are repressed, and determines the initial GR and PR. Initial GR and PR values will be the minimum requirements that TrimGdel must satisfy.

2)
Step 2 minimizes the number of deleted genes while maintaining which reactions are repressed. 3) Step 3 trims unnecessary deleted genes while ensuring original GR and PR at the maximization of GR. In the following, TrimGdel is mathematically explained while an explanation with a small toy example can be found in Section 2.3.
Step 1: Let C and m target be the given constraint-based model and target metabolite. If C has a corresponding exchange reaction r to produce m target , r is treated as the target metabolite production reaction r target . If not, TrimGdel adds a virtual exchange reaction r target to produce m target to evaluate gene deletion strategies. For C and r target , TrimGdel employs gDel_minRN to obtain a large-scale gene deletion strategy D 1 , which results in the growth-coupled production of m target . If gDel_-minRN cannot find D 1 , TrimGdel stops and returns no solution.
According to D 1 and p j ¼ f j ðGÞ, which represents the GPR network for r j , TrimGdel determines the reactions that are repressed. Let P 0 and P 1 be the sets of repressed and non-repressed reactions, respectively. If p j ¼ 1, the original upper and lower bounds of the reaction rate are applied, l j v j u j . Because the solution of linear programming (LP) may not be uniquely determined, there may be multiple PR values at the time of GR maximization. The values of GR (=v growth ) and the minimum PR (=v target ) at the maximization of GR are set to GR threshold and PR threshold , respectively, using the flux variability analysis (FVA)-based calculation. It is to be noted that FVA can be easily performed by two LPs.
Step 2: TrimGdel minimizes the number of deleted genes while maintaining P 0 and P 1 . Because the set of repressed reactions does not change, GR and PR at the maximization of GR = v growth do not change either. D 2 , which is the set of genes assigned to 0, is the set of deleted genes at this time point.
Step 3: TrimGdel picks up one deleted gene g i in the ascendant order of i. For the provisional set of deleted genes D 2 À g i , TrimGdel maximizes v growth and checks whether v growth ! GR threshold and v production ! PR threshold are satisfied. If the conditions are satisfied, g i is deleted from D 2 and proceeds to the next deleted gene. If i reaches n, it goes back to 1. If no deleted gene can be trimmed for 1 i n, TrimGdel stops and returns D 2 .
The pseudo-code of TrimGdel is as follows.

Example
A toy example of the constraint-based model C ¼ ðC 1 ; C 2 Þ ¼ ðM; R; S; L; U; G; F; P Þ is shown in Fig. 2 and explained as follows.
Metabolic Network Part. M ¼ fm 1 ; . . . ; m 6 g and R ¼ fr 1 ; . . . ; r 9 g are sets of metabolites and reactions, respectively. The target metabolite is m 6 . The growth and target metabolite production reactions are r 8 and r 9 , respectively. The stoichiometry matrix is ½x; y denotes the lower and upper bounds for each reaction rate, that is, L ¼ ð0; 0; 0; 0; 0; 0; 0; 0; 0Þ and U ¼ ð10; 10; 5; 5; 10; 5; 5; 10; 10Þ. GPR Network Part. The set of genes, Boolean functions, and outputs of the Boolean functions are G ¼ fg 1 ; . . . ; g 5 g, F ¼ ff 1 ; . . . ; f 9 g, and P ¼ fp 1 ; . . . ; p 9 g, respectively, where For example, because the GPR relation for r 4 is given as f 4 : p 4 ¼ g 1^g3 , the reaction rate of r 4 , denoted as v 4 , is forced to be 0 if one of g 1 and g 3 is 0, while 0 v 4 5 is held if both g 1 and g 3 are 1. However, v 5 is forced to be 0 if both g 1 and g 3 are 0, while 0 v 5 10 is held if at least one of g 1 and g 3 is 1. 8 ; v 9 are forced to be 0 by genes.
Behavior of the Constraint-Based Model. In the original state, the maximum value of GR is v 8 ¼ 10. However, there are three paths to reach from the nutrient reaction r 1 to the growth reaction r 8 ; that is, ðr 1 ; r 2 ; r 5 ; r 8 Þ, ðr 1 ; r 3 ; r 6 ; r 8 Þ, ðr 1 ; r 4 ; r 7 ; r 8 Þ. If the first path is not used, GR = 10 and PR = 10 are obtained, as shown in ID 1 of Table 1. This is the most optimistic case regarding the value of PR. However, if only the first path is used, GR = 10 and PR = 0 are obtained as shown in ID 2 of Table 1. This is the most pessimistic case regarding the value of PR. In this study, we evaluate the most pessimistic value of PR when GR is maximized. Therefore, GR = 10 and PR = 0 are obtained in the original state. The behaviors of this constraint-based model are described for the cases in which the gene deletion strategies are f, fg 1 g, fg 4 g, fg 1 ; g 3 ; g 4 ; g 5 g, fg 1 ; g 3 ; g 4 g, and fg 3 ; g 4 g in Table 1. The other cases are omitted because they are not used in the following explanation. IDs 6, 8, 10, and 12 represent the cases that result in the growth-coupled production of m 6 . (GR,PR) = (10,10) is held for ID 6 and (GR,PR) = (5,5) is held for IDs 8, 10, and 12.
Behavior of TrimGdel. First TrimGdel employs gDel_-minRN [25]. Suppose that gDel_minRN outputs a largescale gene deletion strategy fg 1 ; g 3 ; g 4 ; g 5 g. Because GR = PR = 5 is obtained, GR threshold and PR threshold are set to 5. Therefore, TrimGdel must find a smaller gene deletion strategy that satisfies GR ! 5 and PR ! 5.
Step 2: The number of deleted genes is minimized while P 0 and P 1 are maintained using integer linear programming. Since trimming g 5 does not affect P 0 and P 1 , ðg 1 ; g 2 ; g 3 ; g 4 ; g 5 Þ ¼ ð0; 1; 0; 0; 1Þ is obtained. It is to be noted that GR and PR remain at 5.
Step 3: TrimGdel checks the effect of trimming g 1 from fg 1 ; g 3 ; g 4 g. When the gene deletion strategy is fg 3 ; g 4 g, GR = PR = 5 is still obtained. Next, the effect of trimming fg 3 g is checked. When the gene deletion strategy is fg 4 g, GR = PR = 10 is obtained. It is to be noted that GR and PR are allowed to be larger than GR threshold and PR threshold , respectively. However, when g 4 is trimmed from the gene deletion strategy, GR = 10 and PR = 0 are obtained since no gene is deleted. Then, TrimGdel stops and outputs fg 4 g.

COMPUTATIONAL EXPERIMENTS
First, to analyze the performance of the developed method, TrimGdel was applied to iML1515, iMM904, and e_coli_core. The results are described in Section 3.1 The number of genes, reactions, metabolites included in each model is described in Table 2.
Second, to compare the performance, GDLS and optGene were applied to iML1515, iMM904, and e_coli_core with the same problem setting as TrimGdel. Comparative analysis was conducted in Section 3.2.
All procedures in the computational experiments were implemented on a CentOS 7 machine with an AMD Ryzen Processor with 2.90 GHz 64 cores/128 threads, 128 GB memory, and 1 TB SSD. This workstation had CPLEX 12.10, COBRA Toolbox v3.0 and MATLAB R2021a installed and used for these analyses.
An auxiliary exchange reaction was temporarily added to the model to simulate target metabolite production if the target metabolite did not have a production reaction.

Comprehensive Experiments
For iML1515.
The model contains 1877 metabolites, but by solving a linear programming problem that maximizes PR, we can confirm that 785 of these metabolites cannot satisfy PR!0.001, which means that it is theoretically impossible to obtain the gene deletion strategies for growth-coupled production. In other words, for the remaining 1092 metabolites, there may be a gene deletion strategy that leads to growthcoupled production. See also Table 3

(A).
Because gDel_minRN succeeded in calculating largescale gene deletion strategies for 436 of these metabolites, trimming was conducted to these 436 large-scale gene deletion strategies.    for 86.9% of the target metabolites. For the remaining 13.1% of the target metabolites, the number of gene deletions ranged from 887 to 909. There were no target metabolites for which the number of gene deletions was between 54 and 886. The distribution of the number of gene deletions was divided into those 53 or less and 887 and more. In particular, the number of gene deletions was between 1% and 3.5% of whole genes for 66.5% of the target metabolites. Among the gene deletion strategies obtained by TrimGdel, those for the three vitamins shown in Table 3(D) represent typical cases. For biotin, the number of deleted genes was 978, 890, and 32 in Cases 1, 2, and 3, respectively. The total elapsed time was 18m54 s. Similarly, for pantothenate, the number of deleted genes was 954, 890, and 34 in Cases 1, 2, and 3, respectively. The total elapsed time was 17m33 s. For biotin and pantothenate, the final number of deleted genes was around 30 for both. However, for riboflavin, the number of deleted genes was 952, 890, and 3 in Cases 1, 2, and 3, respectively. The total elapsed time was 16m13 s.
The same framework of experiments as iML1515 was applied to iMM904. The results are summarized in Table 4.
As shown in Table 4(A), gDel_minRN succeeded in calculating large-scale gene deletion strategies for 109 of the 782 target metabolites. The average number of deleted genes for the 109 target metabolites was 613.39 for Case 1, 534.60 for Case 2, and 54.39 for Case 3. The maximum number of deleted genes was 630 in Case 1 and 547 in Cases 2 and 3, while the minimum number of deleted genes was 591 for Case 1, 518 for Case 2, and 1 for Case 3. The number of gene deletions was 36 or less for 91.7% of the target metabolites. For the remaining 8.3% of the target metabolites, the number of gene deletions ranged from 530 to 547. There were no target metabolites for which the number of gene deletions was between 37 and 529. The distribution of the number of gene deletions was divided into those 36 or less and 530 or more. The number of gene deletions was between 1% and 4% of whole genes for 50.5% of the target metabolites.
The same framework of experiments as iML1515 and iMM904 was applied to e_coli_core as well. The results are summarized in Table 5.
As shown in Table 5 The number of gene deletions was between 1% and 5% of whole genes for 56.8% of the target metabolites.

Comparison With Other Methods
GDLS and optGene were applied to iML1515, iMM904, and e_coli_core to compare the performance with TrimGdel. In the GDLS experiments, the minimum GR was set to 0.001. The time limit was set to 20 m for all methods. The results are summarized in Table 6.    For iML1515, the success ratio of GDLS and optGene were 0.2% and 0.64%, respectively. In the GDLS strategies, the maximum PR was 0.001 or more for all 1092 target metabolites. However, GR and the minimum PR at GR maximization were zero for 1078 and 1085 target metabolites, respectively. Therefore, only two gene deletion strategies resulted in growth-coupled production when the worst PR was evaluated at GR maximization. In the optGene strategies, the maximum GR was 0.001 or more for 900 cases. However, the worst PR at GR maximization was 0,001 or more only for 12 target metabolites. The average numbers of gene deletions in the successful cases were 1.342 and 4.556 for GDLS and optGene, respectively. The average elapsed times of TrimGdel, GDLS, and optGene for the successful cases were 17m42 s, 38 s, and 20m4s, respectively. For iMM904, the success ratio of GDLS and optGene were 0.12% and 4.6%, respectively. In the GDLS strategies, the maximum GR and PR at GR maximization were 0.001 or more for 125 and 670 target metabolites, respectively. However, the minimum PR was 0.001 or more only for one target metabolite. Therefore, only one gene deletion strategy resulted in growth-coupled production when the worst PR was evaluated at GR maximization. In the optGene strategies, the maximum GR was 0.001 or more for 664 cases. However, the worst PR at GR maximization was 0,001 or more only for 36 target metabolites. The average numbers of gene deletions in the successful cases were 1 and 6.69 for GDLS and optGene, respectively. The average elapsed times of TrimGdel, GDLS, and optGene for the successful cases were 2m34 s, 0.81 s, and 20m19 s, respectively. For e_coli_core, the success ratio of GDLS and optGene were 13.5% and 48.1%, respectively. In the GDLS strategies, GR, the minimum PR, and the maximum PR at GR maximization were 0.001 or more for 52, 9, and 17 target metabolites, respectively. In the optGene strategies, the maximum GR was 0.001 or more for 46 cases. The worst PR at GR maximization was 0,001 or more for 39 target metabolites. The average numbers of gene deletions in the successful cases were 3.22 and 7.12 for GDLS and optGene, respectively. The average elapsed times of TrimGdel, GDLS, and optGene for the successful cases were 1.305 s, 0.797 s, and 20m5s, respectively.

Gene Deletion Strategies for Biotin Production
The gene deletion strategies for biotin in iML1515 were analyzed to compare metabolic flows for different cases. The larger gene deletion strategy after Step 1 deleted 978 genes, while 32 genes were deleted after Step 3 for growth-coupled biotin production. Fig. 3 shows an overview of the area around the biotin production reaction in the main flux flows obtained in the computational experiments. There is almost a single path from L-homocysteine to biotin production. There are two paths to reach l-homocysteine: from malonyl CoA and from l-cysteine. Table 7 shows the reaction rates of the growth reaction, biotin synthase, Malonyl-CoA methyltranferase (MAL-COAMT), 0-succinylhomoserine lyase (SHSL1) and ATP synthase for the natural state (Case 0), Cases 1 and 3. It should be noted that the reaction rates are the same for Cases 1 and 2. The sums of the absolute values of all reactions are also shown.
In Case 0, the growth reaction rate was 0.8770, but the biotin synthase reaction rate was 1:7 Â 10 À6 , which is lower than the required lower bound 0.001. After deleting 978 genes determined by gDel_minRN (Case 1), GR decreased to 0.1493, but the biotin synthase rate increased to 0.1313. In Case 3, TrimGdel trimmed 946 deleted genes, and the strategy with 32 gene deletions increased both GR and biotin synthase to 0.5272 and 0.2629, respectively. The MAL-COAMT rates were the same as the biotin synthase rates in all cases. The SHSL1 rates may be linked to the biotin synthase rates since its value increases in the application of Steps 1 and 2 and the subsequent application of Step 3. The ATP synthase rate was almost the same in Case 0 and Case3, but it decreased only in Case 1. The sum of the absolute values of all reaction rates was the largest in Case 0, decreased significantly in Case 1, and took an intermediate value in Case 3.    (20 = 72-52) of the metabolites in iML1515, iMM904, e_coli_core, there is no gene deletion strategy for growth-coupled production, respectively. Therefore, it is reasonable to use the remaining 58.2% (1092), 63.8% (782), 72.2% (52) as the target metabolites to evaluate the performance of TrimGdel and other methods.
TrimGdel could calculate large-scale gene deletion strategies in iML1515, iMM904, e_coli_core for 39.9% (436), 13.9% (109), and 84.6% (44) of the target metabolites, respectively. However, these values depend on the performance of gDel_minRN because TrimGdel employs gDel_minRN in Step 1. If a method cannot obtain a large-scale gene deletion strategy, Steps 2 and 3 cannot be applied. Therefore, improving the performance of gDel_minRN is a necessary future work for the effective utilization of TrimGdel.
As shown in Table 3(B), the average number of deleted genes in Cases 2 and 3 was 93.3% (898.75) and 14.0% (134.81) of Case 1 (963.74), respectively, for iML1515. In other words, the number of gene deletions could only be reduced by 6.74% on average when whether each reaction was suppressed or not was maintained. On the other hand, when the constraint was only that GR and PR were not reduced, the number of gene deletions was reduced by 86.0% compared to Case 1. These trimming rates were 12.9% and 91.1% for iMM904, and 27.4% and 81.6% for e_coli_core as shown in Tables 4(B) and 5(B). In all three models, Step 3 trimmed much more genes than Step 2.
In Cases 1 and 2, the differences between the maximum and minimum numbers of gene deletions for iML1515 were small, 31 and 24, respectively; however, in Case 3, it was 902. These values were 39, 29, and 546 for iMM904, and 7, 11, and 48 for e_coli_core. This may be due to the fact that the number of genes that can be trimmed does not differ greatly until Case 2 but varies greatly in Case 3. Table 3(C) summarizes the distribution of the number of gene deletions in Case 3 for iML1515. 86.9% were 53 or less, 13.1% were 887 or more, and 0% were between 54 and 886.
The number of gene deletions could be reduced to 53 or less except for the 57 target metabolites for which the number of gene deletions was 887 or more. Although this number may be related to the distribution of the number of gene deletions in Case 2, the reason is unclear and remains to be clarified in future studies. As shown in Tables 4(C) and 5(C), similar discussion can be made for iMM904 and e_coli_core. 91.7% were 36 or less for iMM904, and 84.1% were 13 or less for e_coli_core.
The number of gene deletions was between 1% to 5% of whole genes for 66.5%, 50.5%, and 56.8% of the TrimGdel methods for iML1515, iMM904, and e_coli_core, respectively. Because existing methods focus on gene deletion strategies with sizes less than 1% of whole genes, TrimGdel can be considered an effective complement. However, the number of gene deletions obtained by TrimGdel is a local optimum, for which a smaller gene deletion strategy may exist. In particular, Step 3 of TrimGdel is very simple, and further improvement may lead to a smaller gene deletion strategy.
As shown in Table 3(D), biotin and pantothenate need around 30 gene deletions, while riboflavin needs only three gene deletions in the TrimGdel method for iML1515. It should be noted that although there are many known production strain strategies for riboflavin [31], there is no established design strategy for biotin or pantothenate. The elapsed time was 18m54 s, 17m33ss, and 16m13ss, respectively. Considering the number of combinations in the solution space, gene deletion strategies with a size of 1% or less of whole genes were already computable by existing methods. However, TrimGdel made it possible for the first time to compute gene deletion strategies for many target metabolites that require a gene deletion number greater than 1% of whole genes for genome-scale models.
Steps 1 and 2 of TrimGdel are exponential time algorithms since they solve mixed-integer linear programming, which is known to be NP-complete [32].
Step 3 is solvable in polynomial time since it employs linear programming at most Oðc 2 Þ times and linear programming is solvable in polynomial time [32], where c is the number of genes.
To evaluate the applicability of TrimGdel on larger-scale models, TrimGdel was applied to Recon3D [33], which is a three-dimensional genome-scale model of human. Recon3D has 2248 genes, 5835 metabolites, and 10600 reactions. As gDel_minRN could derive gene deletion strategies for only 20 target metabolites, TrimGdel was also applied to the 20 target metabolites. The average number of deleted genes was 2131.9 for Cases 1, 1208.4 for Case 2, and 187.3 for Case 3. The average number of deleted genes in Cases 2 and 3 was 56.7% and 8.8% of Case 1, respectively. The average elapsed time was 1h47m55 s. It can be considered that Step 1 (gDel_minRN) was not effective, but Steps 2 and 3 were effective for Recon3D. The reason for the low success ratio of gDel_minRN may be that Recon3D is more complex as it considers three-dimension. Steps 2 and 3 of TrimGdel can be applied to any large gene deletion strategies derived by any method if growth-coupled production is achieved. However, it seems that such a method is not available other than gDel_minRN at the moment.
Comparison With Other Methods. As shown in Table 6, TrimGdel had the highest success ratio among TrimGdel, GDSL, and optGene, for iML1515, iMM904, and e_coli_core. GDLS had the minimum elapsed time, but its success ratio was low. As described in Section 3.2, many GDLS strategies resulted in target metabolite production without growth or growth-coupled production when the best PR was evaluated. However, when the worst PR at GR maximization was evaluated, growth-coupled production was rarely achieved. As optGene returns the best solution at that point when the specified computation time runs out, designating a longer computation time may increase the success ratio.
TrimGdel, optGene, and GDLS are in descending order of the average number of deleted genes, which is consistent with the descending order of the success ratio. One reason for the high success ratio of TrimGdel is the success in searching for larger gene deletion strategies. The elapsed time of TrimGdel, which was less than 20 m for every case, even for genome-scale models, is acceptable for actual use. The average numbers of deleted genes for TrimGdel were 8.90%, 6.02%, and 9.04% of whole genes, but they were less than 5% for more than a half of the gene deletion strategies.
For TrimGdel, the number of gene deletions was between 1% and 5% of whole genes for 66.5%, 50.5%, and 56.8% of the successful gene deletion strategies for iML1515, iMM904, and e_coli_core, respectively. On the other hand, for GDLS, the number of gene deletions was 1% or less of whole genes for 100%, 100%, 33.3% of the successful gene deletion strategies for iML1515, iMM904, and e_coli_core, respectively. These numbers were 100%, 80.1%, and 4% for optGene, respectively. It was seen that GDLS and optGene could derive gene deletion strategies whose sizes are 1% or more in a small model. However, we can conclude that TrimGdel is much more suitable for deriving gene deletion strategies whose sizes are between 1% and 5% of whole genes for genome-scale models.
In all cases described in Table 7, the values of biotin synthase and MALCOAMT are identical, which may imply that malonyl CoA should be produced in abundance in biotin production. The sum of the absolute reaction rates of all reactions (total flux) is correlated with the number of gene deletions and the growth rate but not with the biotin production rate. Because more gene deletions lead to more repressed reactions, it results in a lower total flux and GR. However, there seem to be other factors that determine whether growth-coupled production of biotin occurs.
It is difficult to derive a clear conclusion from Table 7. However, one possible hypothesis is that a lower GR and a relatively high ATP synthase rate lead to biotin production. In genome-scale constraint-based models, it is not easy to manually and visually analyze how each gene deletion strategy modifies the metabolic flow distribution and leads to growth-coupled production. An important future task is to develop algorithms and software to automatically analyze and extract the key points of metabolic distribution changes.