Exploring Observability of Attractor Cycles in Boolean Networks for Biomarker Detection

Boolean Network (BN) is a simple and popular mathematical model that has attracted significant attention from systems biology due to its capacity to reveal genetic regulatory network behavior. In addition, observability, as an important network feature, plays a vital role in deciphering the underlying mechanisms driving a genetic regulatory network and has been widely investigated. Prior studies examined observability of BNs and other complex networks. That said, observability of attractor, which can serve as a biomarker for disease, has not been fully examined in the literature. In this study, we formulated a new definition for singleton or cyclic attractor observability in BNs and developed an effective methodology to resolve the captured problem. We also showed complexity is of O(Pmn), when the maximal period of cyclic attractor is P, the number of attractor is m and the number of genes is n. Importantly, we have confirmed our method can faithfully predict the expression pattern of segment polarity genes in Drosophila melanogaster and showed it can effectively and efficiently deal with the captured observability problem.


I. INTRODUCTION
Systems biology, refers to the study of biological systems component interactions and a field that has undergone rapid development in the post-genome era. Genetic regulatory networks (often simply referred to as genetic networks) have been intensively studied to better understand the interactions of various genes, molecules and proteins. Several formalisms have been employed to better model genetic regulatory processes. BNs [1]- [3] have received particular attention owing to their capacity to capture the highly dynamic behavior of genetic regulations.
The first interesting problem typically encountered is Boolean network topological structure, such as fixed points, cycles and attractors basin, etc [4]- [6]. Many studies dealing with the analysis of attractors of randomly given BNs have been performed [3], [7] in large part due to the fact that different attractors can be utilized to represent unique cell types. Several methods have been suggested [8]- [10] to enumerate and/or identify attractors in BNs. As examples, Devloo et al. proposed an effective method based on transforming to a constraint satisfaction problem [8], and Irons developed another method employing small subnetworks [9]. Zhang et al. [10] proposed algorithms to enumerate singleton and small attractors, and also, analyzed the time complexities of the average cases of these algorithms. Of note, it has been shown that singleton attractors (i.e., a fixed state) detection problem is NP-hard [11]. In addition, Akutsu et al. [12] developed algorithms with guaranteed "worst case" time complexity allowing singleton attractor detection in BNs of limited Boolean functions. Importantly, all these approaches dealt specifically with 2 n ×2 n matrices (where n represents the gene number in a BN), making applying them to large-scale BNs exceedingly challenging. Therefore, Akutsu et al. [13] developed an integer linear programming (ILPbased) approach to construct the attractor control problem of medium size BNs. Even though the authors in [13] have reported that the attractor control problem is ∑ 2 P -hard, an ILP-based methodology can be applied to moderate size BNs. Each of these studies, however, has simply examined randomly generated, simple BNs (i.e., one cell type). It is therefore critically necessary to develop strategies capable of analyzing multiple BNs (i.e., various cell types). In the genuine cellular context, there are various kinds of cells, and it is therefore simply more realistic to perform attractor analysis for multiple BNs. Motivated by this, Qiu et al. [14] recently examined this problem and employed the ILP-based method to solve it.
The control of Boolean networks is also a challenging problem. According to control theory, dynamic systems are controllable if they can be driven from any initial state to any desired final state with a suitable choice of inputs [15]. Several studies have been conducted on this problem [16], [17], and when a random Boolean network is examined, the principle interest lies on the stationary distribution of the system. The reachability problem as in control theory becomes a common concern for the deterministic network [18].
Observability, as a dual of controllability, is a significant concept which needs further exploration. Several studies have recently examined such observability problems. Cheng and Qi [19], [20] adopted a semi-tensor strategy to examine observability and related problems. While their design was effective for formulating and constructing the problems, the application of their method to large-scale BNs is also impractical. Of note, Liu et al. [21] proposed a novel view of complex networks observability. They showed the aim of attractors observability was to determine the minimum consecutive nodes necessary to discriminate distinct attractor cycles from each other in the system. This work provided a great opportunity for one to diagnose disease types since different attractors represent different cell types. That said, we suggest here that the problem proposed represents a particular case, binary alphabet of the minimal key problem examined in [12], [22], [23]. It has been showed that the minimal key problem is NP-hard even when dealing with the binary alphabet [23]. Cheng et al. [24] have proposed to apply the ILP-based method to solve the observability problem which was limited to singleton attractors. However, cyclic attractors exist in the real-world and the issue of observability for cyclic attractors has not yet been fully addressed. Qiu et al. [25] have proposed a method to study the observability of singleton and cyclic attractor. However, the analysis on the sensitivity of number of genes and attractors need to be further explored. In addition, the application of our method to the real model requires further discussion. Thus, the problem we try to address in this study is to identify the minimum set of contiguous nodes such that we can discriminate the singleton or cyclic attractors. Furthermore, we analyze the distribution of gene number and attractor number for distinguishing the cell types, respectively. We show that our proposed method is efficient and can effectively solve large-size problems in O(P m n) for the worst case, where P is the maximum number of period of cyclic attractor, m is the number of (singleton or cyclic) attractors and n is the number of genes (nodes) in a BN.
This paper has two main contributions. First, we formulate a new biological problem based on Boolean networks to discriminate the cell type with minimum consecutive nodes. Second, we propose an effective and efficient method, (works in O(P m n)) to solve the captured problem. By integrating the expression pattern of the segment polarity genes in Drosophila melanogaster, we have demonstrated that our approach can efficiently identify the minimum contiguous genes or proteins to discriminate the attractors. Furthermore, experimentation confirms that our proposed methodology is extremely efficient and can effectively solve the problem in seconds.
The remainder of the paper is structured as follows. In the "Problem formulation" section, we introduce some background and formulate the problem. "Methodology" presents a novel method for our captured problem and give two propositions. "Numerical Experiments" describes the materials and experimentation results to validate the efficiency and effectiveness of our method. And finally, conclusions and future directions are described in the last section.

II. PROBLEM FORMULATION
This section (1) introduces the BN model and the attractor detection problem which are closely related to attractor observability, and (2) then formulates the attractor observability problem in BNs.
A BN G(V, F) consists of a set of nodes V used to represent multiple genes V = {ν 1 , ν 2 , ⋯ , ν n } and a set of Boolean functions F = (f 1 , f 2 , ⋯ , f n ) where f i : {0, 1} n → {0, 1}. Each node (e.g., a gene) is assumed to take either 1 (active) or 0 (inactive) as its state value. And ν i (t) is denoted as the state of node i at time t, after which the state of node i at time t + 1 is given as follow: v i (t + 1) = f i (v(t)), i = 1, 2, …, n This indicates the gene state ν i at time t + 1 is determined by the h i gene states at previous time t, where h i represents input node number called indegree of ν i . Moreover, the maximum indegree of BN is denoted by H = max i {h i }. We note x ∨ y, x ∧ y, ¬x, x ⊕ y is logical OR of x and y, logical AND of x and y, logical NOT of x and exclusive OR of x and y, respectively. Overall gene expressions at time t in a BN is determined by and called the network Gene Activity Profile (GAP) at time t. Note that gap(t) ranges from [0, 0, . . . , 0] to [1, 1, . . . , 1], therefore we have a total of 2 n possible global states. The following details an example of a BN.
Each gene ν i is regulated by a Boolean function f i . A state transition diagram and the dynamics of a BN are shown in Figure 1. The gene state is determined by its associated regulatory function. Given the initial state of a network, a BN will ultimately enter into a certain set of global states (i.e., a directed cycle as depicted in state transition diagram). We denote such a set as an attractor. When an attractor corresponds to one global state (i.e., a single fixed point), it is classified as a singleton attractor. In all other events, it would be classified as a cyclic attractor, and we denote p as the period of cyclic attractor if it consists of p global states. We can see from Figure 1 that the network will eventually evolve into one of two attractors: either a singleton attractor, [0, 1, 0] or a cyclic attractor of period 2, [1,

A. OBSERVABILITY OF ATTRACTORS
Once attractors are detected, one can conduct an analysis of the BN attractor control problem [13], [14]. Observability, as a compliment of controllability, has been examined by Cheng and Qi [19]. They developed necessary and sufficient conditions for dealing with the captured observability problem. Cheng et al. [24] proposed an efficient and effective approach for solving singleton attractor observability in BNs. That said, the problem of observability for cyclic attractor has not yet been resolved. Thus, in this study, we propose a new methodology for addressing the observability problem of (singleton and cyclic) attractors, not restricted to singleton ones in BNs, and therefore representative of the realities of biology. Importantly, when a BN is given, attractors can be further detected. In brief, assume a set of singleton or cyclic attractors is given as S = {S 1 , S 2 , . . . , S m }, our method then attempts to identify a tandem gene set of minimum cardinality that can be used to distinguish distinct attractor cycles from each other in a system. For instance, we consider four singleton or cyclic attractors with seven genes which is given as S = {S 1 , S 2 , S 3 , S 4 }. where S 1 and S 2 are singleton attractors, S 3 and S 4 correspond to cyclic attractors of period 2. Note that there are four attractors in total, it would require at least two nodes to distinguish them. Next, if we observe the third and forth nodes only, we can identify which attractor a system belongs to (i.e., (1, 1), (0, 0), (0, 1), (1, 0) mean S 1 , S 2 , S 3 and S 4 , respectively). As such, the attractor observability problem is formulated as: Instance: List of (singleton or cyclic) attractors in a BN, Problem: Determine the minimum cardinality of consecutive nodes that can distinguish different attractor cycles from one another in a BN.

III. METHODOLOGY
In the following section, we describe our attractor observability problem approach.

A. PROPOSED ALGORITHM FOR AO
Assume a set of (singleton or cyclic) attractors are given as S = {S 1 , S 2 , . . . , S m }, where size of each S i , (i ∈ {1, 2, ⋯ , m}) is l i . We then have total of L = l 1 • l 2 ⋯ l N possible combinations and employ a matrix set A = {A 1 , A 2 , ⋯ , A L } to describe all possible case combinations. Each matrix A i , i ∈ {1, 2, ⋯ , L} is therefore of size m × n (m: attractor number; n: nodes (genes) number). Then an individual row A i corresponds to one global state from singleton attractor or cyclic attractor and individual columns represent gene states. We then develop our algorithm based on matrix A for AO. The major steps (procedures) for our algorithm are details below.
The Procedure-Step I: For i ∈ {1, 2, ⋯ , L}, assume the size of matrix A i is m × n, where m is attractor number and n is gene number. Repeat Step II-VIII.
Step II: Apply arithmetic ⊕ operator for A i to generate a new matrix biState i which is of size C m 2 × n. We then define recall that the arithmetic ⊕ is presented as follows: x ⊕ y = (x ∧ ¬y) ∨ ( ¬x ∧ y) .
Step III: For an indiviual row t ϵ C m 2 in matrix biState i , identify consecutive 0′s with length no less than ⌈log 2 (m)⌉ and place them in matrix B con (assume that total entry number in B con is r). Another matrix B index size r × 2 is then created to record indices of B con (e.g., B index (i, 1) and B index (i, 2) correspond to the first and last elements of row i of B con , i ∈ {1, ⋯ , r}, respectively).
Step IV: Generate the union of B con and denote it as B uni .
Step V: Let U represent a vector which ranges from 1 to n (i.e., U = {1, ⋯ , n}). If U \ B uni ≠ ∅ holds, then we can conclude this represents the desired minimal contiguous nodes and corresponding minimal number for A i is ⌈log 2 (m)⌉ and denoted as minL i . Therefore, the desired minimal number of consecutive nodes has been identified (i.e., ⌈log 2 (m)⌉ and stop. Otherwise, continue.
Step VI: Employ a sorting algorithm (such as bucket sort) to B index . Then rank vector B index (:, 1) in an ascending order and update the order in B index (:, 2) accordingly. As an example, . (2) Step VII: Compare last elements of rows with identical first element in B index . Discard rows with smaller last elements and keep the rows with the largest last element in a new matrix B ni of size l × 2 as the rows with smaller last elements are subsets of the row with the largest last element. Thus, B ni is Step VIII: The desired node number for A i is Proof: Let (ν 1 , ν 2 , . . . , ν n ) represent n consecutive nodes in a given BN. Assume that element number in B uni < n, at least one element in B uni c must exist which is denoted as ν x .
Then it suffices to show that there exists a pair of (x 1 , x 2 ), such that (ν x 1 , . . . , ν x , . . . , ν x 2 ) represent desired minimal consecutive nodes, where x 2 −x 1 + 1 = ⌈log 2 (m)⌉. In other words, we must show that there is no row with elements that are all 0 from column x 1 to x 2 in the matrix biState. For x-th column elements, there are two optional values (i.e., 0 or 1). One possible case is biState tx = 0 (for some t), where the length of consecutive 0's covering biState tx is < ⌈log 2 (m)⌉. Otherwise, x ∈ B uni . Therefore, this implies that no less than one 1 exists from column x 1 to x 2 . As another possible case, i.e., biState ik = 1, then it is obvious that no continuous 0's can exist from column x 1 to x 2 . Based on this, we conclude that (ν x 1 , . . . , ν x , . . . , ν x 2 ) is the desired minimal consecutive nodes for one attractor combination A i in the BN. □ then it suffices to demonstrate that in the sub-matrix biState(:, B ni (j + 1, 1) − 1 : B ni (j, 2) + 1), there is no row with all 0 elements. Note that biState t 1 ,B ni (j,2) corresponds to the last 0 in the t 1 row starting from column B ni (j, 1) to B ni (j, 2) (for some t 1 ), indicating that biState t 1 ,B ni (j,2)+1 value must be 1. However, if t ≠ t 1 , there are then two potential values of biState t,B ni (j,2)+1 (i.e., 0 or 1).
Case two: If min{B ni (j, 2) − B ni (j + 1, 1) + 3} < log 2 (m) then it is sufficient to consider this case (i.e., biState t 3 ,B ni (j+1,1)−1 is 0), due to the fact that for the other cases, min{B ni (j, 2) − B ni (j + 1, 1) + 3} is not less than ⌈log 2 (m)⌉. Thus, we extend indicated columns to guarantee containing column biState t 3 ,B ni (j+1,1)−1 and also guarantee the length of them is ⌈log 2 (m)⌉. Then columns of length ⌈log 2 (m)⌉ will produce the desired solution. Assume c = max{min{B ni (j, 2) − B ni (j + 1, 1) + 3}, log 2 (m) }, j ∈ {1, ⋯, l − 1} and since we have shown there exist node set that can differentiate different cell types, we can next prove c represents the optimal solution. Put another way, there is no set of nodes length c′ (c′ < c) that can distinguish A i attractors. Assume there exist a column set (c 1 , ⋯ , c t ) with length c′, then it is sufficient to show that c′ < c does not hold. If c′ is less than c, then a row exists such that the set of column from column c 1 to c t exactly represents a subset of column B ni (j, 1) to B ni (j, 2). This means that a row exists with elements all corresponding to 0's. Therefore, we can conclude that c′ does not exist such that c′ < c, and max {min{B ni (j, 2) − B ni (j + 1, 1) + 3}, ⌈log 2 (m)⌉}, j ∈ {1, ⋯ , l − 1} corresponds to the desired solution of A i . □ We give an example to illustrate the first case of our algorithm. . (5) and we can see that the matrix B con will be in the form It is noted that the union of B con is {[1 2 3], [5 6]} which is a subset of U. Thus any subset of the remaining elements of U \ B con which is of length ⌈log 2 (m)⌉ (i.e., 2) will be the desired set of nodes. Actually, the elements of U \ B con is 4, all the subset including 4 of length 2 is [3,4] or [4,5] which are exactly the desired minimal set of nodes necessary to discriminate attractors. Thus, [ν 3 , ν 4 ] or [ν 4 , ν 5 ] may be taken to represent the desired minimal consecutive nodes. However, if the union of B con is exactly equal to U, for instance, Similarly, we can obtain the matrix B con which is in this form: Since the union of vectors in B con ranges from 1 to n, it is necessary to apply Step VI. Thus, the matrix B index is sorted using bucket sort algorithm according to the first column and keep the row with the greatest last element in B index , then B ni is given by Then applying our proposed algorithm, the minimum set of columns is as follows: which is equal to 2. Furthermore, since the row number of A is 4, it requires at least ⌈log 2 (m)⌉ (i.e., 2) to distinguish the attractors, and this implies that it is impossible to find any less number for the other combination of attractor set. Thus, we conclude the length of the minimal set of nodes to distinguish the attractors is two. Based on this example, (ν 2 , ν 3 ) likely represents the desired minimal consecutive nodes.
In the case where a BN consists of singleton and cyclic attractors, we consider the following example to illustrate our method.
Example 3: We assume the steady states are as follows, By applying our algorithm, the minimal length of consecutive nodes for each combination A i (i = 1, 2, 3, 4) are 4, 6, 3 and 2, respectively. After comparing the above values, we conclude that the minimal cardinality of consecutive nodes necessary to distinguish different attractor cycles is 2. And the cyclic attractors are of period 2, thus the algorithm efficiently works in O(2 m n). Furthermore, it shall be noted that m is typically extremely small, and it is therefore reasonable to utilize our method to singleton or cyclic attractors with large-scale networks.

B. COMPLEXITY ANALYSIS
We have conducted a complexity analysis for our algorithm. The principle computational costs come from two parts: one comes from generating the matrix biState and the other comes from the sorting algorithm. The first part clearly requires O(C m 2 n) operations where Step VI, we have adopted bucket sort that works in O(n), and repeat the algorithm for P m time such that all possible combinations of singleton or cyclic attractors are considered. Thus, we conclude our algorithm is both effective and efficient for the problem of attractor observability and works in O(P m n).

IV. NUMERICAL EXPERIMENTS
In this section, we performed computational experiments to validate our methodoloy for attractor observability. Initially, we randomly generated a singleton attractor set and repeated the experiment ten times with different simulated attractors. Then we took the average value as the final result. Notations utilized are as follows.
• m: attractor number; • n: node number; • time: average time (in seconds) for each trial; • numNode: the desired minimal node number.
We utilize the above notations herein.
As seen in Table 2 our method was effective and efficient in solving the observability problem of singleton attractors for large-scale networks. Although the size of BN was up to 300 nodes, the average elapsed time was less than one second which is much faster than the ILP-based method [24]. Besides, the desired number of nodes to distinguish different attractor cycles was consistently small although the node number was large. For our next analysis, we randomly generated a cyclic attractor set, and set the period of cyclic attractors to two. Computational times are shown in Table 3 which illustrated the efficiency of our method. Running time was around 1 second even the number of network nodes was set to 1000. Therefore, our algorithm can be applied to cyclic attractors, instead of being confined to singleton attractors.
To examine the distribution of the desired number of nodes when the number of attractor was fixed (m), we fixed m as 20 and conducted analysis for different numbers of n ranging from 100 to 1000. The desired number of nodes are shown in Figure 2. Since most of the desired numbers are 6, this indicates that our method is insensitive to the node number in BNs with a fixed number of attractors. Therefore, n variation does not significantly effect desired node number for discriminating the attractor system.
We also examined the distribution of the desired node number when the number of nodes (n) was fixed to be 100. We adopted 10 as the step size for m in [10,100]. As illustrated in Figure 3, most of the desired numbers of consecutive nodes are quite small although m is large. Therefore, we can discriminate the set of attractors in large-scale network by identifying the small list of desired nodes.
To further verify our strategy, we applied our methodology to a popular genetic regulatory model in Drosophila melanogaster. Previously, in [26], Albert et al. proposed a model to describe embryonic pattern formation in the fruit fly Drosophila melanogaster. Their Boolean model consisted of 60 variables whose steady states were identified by manually solving a system of Boolean equations. After this, authors in [27] developed a strategy for efficiently identifying attractors of the network. To analyze the model, they first standardized their variables using the Boolean rules described in [26] by renaming them, i.e., SLP i or wg i to x 1 , ⋯ , x 60 . The variable x i and corresponding genes are summarized in Table 4. They next applied their proposed method and obtained the steady states shown in Table 5. Each row in Table 5 corresponds to a stable attractor and each column represents a gene (or protein). Attractors have been denoted as binary values with 1 representing a gene being expressed (or high protein concentration), and 0 representing a gene not being expressed (or low concentration). Based on the established given attractor set, we successfully applied our model to identify the minimum cardinality of contiguous nodes (i.e., 22) necessary to distinguish them. Specifically, (x 3 , ⋯ , x 24 ) are the desired minimum consecutive variables which implies we can use these 22 nodes to discriminate the attractors and the 22 variables correspond to a list of genes or proteins which are shown in Table 6.

V. CONCLUSION
In this work, we addressed a novel problem, observability of singleton or cyclic attractors, which is of value in distinguishing different attractor cycles. We have developed an effective and efficient methodology to solve the captured problem, and complexity analysis shows our methodology works in O(P m n) time. As the number of attractors (m) is typically small, our method can be employed to resolve large-scale networks in addition to medium-size ones.
We have also performed computational experiments verifying the efficiency and effectiveness of our novel method, and applied our algorithm to characterize a real-world scenario, i.e., segment polarity gene expression in Drosophila melanogaster. Our results suggest that our strategy may provide a novel tool for use in identifying useful genetic network biomarkers for the detection of disease. In the future work, further investigations will focus on developing more efficient methods to solve the captured problem.   Sensitivity analysis on the number of nodes in BNs.

SHAOBO TAN
Step size 100 is used for n in [100,1000] and m is fixed to be 20. Sensitivity analysis on the number of attractors in BNs.
Step size 10 is used for m in [10,100] and n is fixed to be 100.