On Quasi Cycles in Hypergraph Databases

The notion of hypergraph cyclicity is important in numerous applications of hypergraph theory in computer science and relational database theory. A database scheme or a query can be represented as a hypergraph, and the database scheme (or query) has a cycle if the corresponding hypergraph has a cycle. Acyclic databases have several desirable computational properties: they make query optimization easier and can be recognized in linear time. In this paper, we introduce a new type of cyclicity in hypergraphs via the notions of Quasi <inline-formula> <tex-math notation="LaTeX">$\alpha $ </tex-math></inline-formula>-cycle(s) and the set of <inline-formula> <tex-math notation="LaTeX">$\alpha $ </tex-math></inline-formula>-nodes in hypergraphs, which are based on the existence of <inline-formula> <tex-math notation="LaTeX">$\alpha $ </tex-math></inline-formula>-cycle(s). It is then proved that a hypergraph is acyclic if and only if it does not contain any <inline-formula> <tex-math notation="LaTeX">$\alpha $ </tex-math></inline-formula>-nodes. Moreover, a polynomial-time algorithm is proposed that detects the set of <inline-formula> <tex-math notation="LaTeX">$\alpha $ </tex-math></inline-formula>-nodes based on the existence of Quasi <inline-formula> <tex-math notation="LaTeX">$\alpha $ </tex-math></inline-formula>-cycle(s), or otherwise reports that the hypergraph is acyclic. Finally, a systematic discussion shows how to use the detected set of <inline-formula> <tex-math notation="LaTeX">$\alpha $ </tex-math></inline-formula>-nodes to convert a cyclic hypergraph into an acyclic one, if the conversion is possible. Acyclic databases and acyclic queries enjoy time- and/or space-efficient access paths for answering a query.


I. INTRODUCTION
In recent decades, a class of ''acyclic'' database schemes and different degrees of acyclicity have been introduced [1]. Codd [2] defined a relational database scheme as a collection of table skeletons (a set of subsets of the attributes, which are the column names of the database tables). These tables can be represented as hypergraphs. Each attribute of a database scheme R corresponds to a node in a hypergraph H, and each relation scheme in R corresponds to an edge in H [3], [4].
The class of conjunctive queries (CQs) is one of the most important and simplest classes of database queries [5]. A conjunctive query is a query built with the logical conjunction operator. Each CQ can also be represented as a hypergraph H. Acyclic CQs are efficiently solvable, i.e., all answers of an acyclic CQ can be computed in linear time [6].
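The correspondence between schemes (or queries) and hypergraphs can be illustrated with a minimal Python sketch. The helper names `scheme_to_hypergraph` and `nodes`, as well as the three-relation scheme, are assumptions made for this illustration and are not taken from the paper.

```python
from typing import Dict, FrozenSet, Iterable, Set

Hypergraph = Set[FrozenSet[str]]


def scheme_to_hypergraph(scheme: Dict[str, Iterable[str]]) -> Hypergraph:
    """Map each relation scheme (or query atom) to one hyperedge over its attributes."""
    return {frozenset(attributes) for attributes in scheme.values()}


def nodes(h: Hypergraph) -> Set[str]:
    """V(H): the union of all hyperedges of H."""
    return set().union(*h) if h else set()


# Hypothetical scheme, used only for illustration.
example_scheme = {
    "R1": ["A", "B"],
    "R2": ["B", "C"],
    "R3": ["C", "D"],
}
H = scheme_to_hypergraph(example_scheme)
```

Representing edges as `frozenset` objects keeps the hypergraph itself a plain set, so duplicate relation schemes collapse into a single edge.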
The associate editor coordinating the review of this manuscript and approving it for publication was Wei Wei.
α-acyclicity is the least restrictive degree of acyclicity of a hypergraph, and it has been studied more in the literature than the other two degrees (β- and γ-acyclicity). It has many applications in relational databases for query optimization [4] and in constraint programming [9].
An important function in any database system is query answering. A good database scheme design allows information to be retrieved easily and efficiently [10]. Acyclic databases are preferred because of the desirable computational properties they enjoy: query optimization is easier than for cyclic databases, and acyclicity can be recognized in linear time [11]-[17].
Owing to the importance of acyclic database schemes, Graham [18] and Yu and Ozsoyoglu [19] introduced a polynomial-time algorithm for detecting the acyclicity of hypergraphs, known in the literature as the Graham (or GYO) algorithm. The algorithm reports whether the hypergraph is acyclic or not, but in the cyclic case it does not detect the cycle(s) or the nodes that form them. This paper introduces the definitions of Quasi α-cycle(s) and the set of α-nodes in a hypergraph H, which are based on the existence of α-cycle(s). It is then proved that a hypergraph is acyclic if and only if it does not contain any α-nodes. Also, a polynomial-time algorithm is proposed that detects the set of α-nodes based on the existence of Quasi α-cycle(s), or otherwise reports that the input hypergraph is α-acyclic. Moreover, a systematic discussion shows how to use the detected set of α-nodes to convert a cyclic hypergraph into an acyclic one, if the conversion is possible. Acyclic hypergraph databases and acyclic queries enjoy time- and/or space-efficient access paths for answering a query.
The paper is organized as follows. Section II gives the basic definitions of hypergraphs and database schemes. Section III contains the related work. Section IV introduces a new formalization of the GYO algorithm, the notions of Quasi α-cycles and the set of α-nodes, and finally the proposed polynomial-time algorithm for detecting the set of α-nodes, if it exists. Section V presents the result of the proposed algorithm and a systematic discussion. Finally, Section VI gives the conclusion of the paper and suggestions for future work.

II. PRELIMINARIES
In this section, the basic definitions of hypergraphs, their properties, and database schemes are given.

Definition 1 (Hypergraph):
A hypergraph H is a finite set of non-empty finite sets, called its hyperedges (or simply edges). The set V(H) of nodes of a hypergraph H is defined to be the union of all its hyperedges [20].

Definition 2 (Size of a Hypergraph):
The size |H| of a hypergraph H is defined to be the number of edges in it [3]. For example, the size of the hypergraph H in Figure 1
Definition 3 (Trivial Set): A set S is said to be trivial if it contains fewer than two elements [4], i.e., |S| < 2.
Definition 4 (Size of an Edge): The size of an edge |e| is defined to be the number of nodes in it. For example, the size of any edge in the hypergraph of Figure 1 is 3.
Definition 5 (Singleton Edge): An edge e ∈ H is said to be a singleton, if |e| = 1, i.e., if it contains exactly one node [4].
Definition 6 (Subhypergraph): A hypergraph H′ is said to be a subhypergraph of a hypergraph H if H′ ⊆ H [20].
Note that the hypergraph H′, in H′ ⊆ H, is obtained from H by removing edges, and not by removing nodes from edges. For example, Figure 2 shows that $H' = \{e_1, e_2, e_3\}$ is a subhypergraph of the hypergraph in Figure 1, where V(H′) = V(H).
Notation: Let $H = \{e_1, \dots, e_m\}$ with $V(H) = \{v_1, \dots, v_n\}$, then $e_i - \{v_j\}$ denotes the edge $e_i$ after removing the node $v_j$ from it.
Definition 7: For a hypergraph H, Min(H) is the set of edges of H that are not proper subsets of any other edge of H, i.e., the set of all maximal edges in H (for inclusion) [20]. Definition 7 is equivalent to the definition of the reduction of a hypergraph, which was introduced by Fagin [4].
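As a small illustration of keeping only the maximal edges (for inclusion), the following Python sketch computes that set for a toy hypergraph. The function name `minimize` and the example edges are assumptions chosen for this sketch.

```python
from typing import FrozenSet, Set


def minimize(h: Set[FrozenSet[str]]) -> Set[FrozenSet[str]]:
    """Keep the edges of h that are not proper subsets of another edge of h."""
    return {e for e in h if not any(e < f for f in h)}


# Toy hypergraph: {A, B} is contained in {A, B, C}, and {D} in {C, D}.
H = {frozenset("ABC"), frozenset("AB"), frozenset("CD"), frozenset("D")}
min_H = minimize(H)

# H equals its own set of maximal edges exactly when no edge contains another.
is_minimized = (min_H == H)
```

The operator `<` on Python sets tests for a proper subset, so an edge is never discarded because of itself.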

Definition 8 (Minimized Hypergraph):
A hypergraph H is said to be minimized if H = Min(H), that is, if no edge in H is a subset of another edge in it [4].
The definition of the induced hypergraph is the same as the definition of the set of partial edges generated by a set M ⊆ V(H) of nodes of a hypergraph, which was introduced by Fagin [4]. Figure 3 shows an example of the induced hypergraph of the hypergraph of
Definition 10 (Y-Tuple): Let V be a finite set of attributes and let Y be a subset of V. A Y-tuple (or tuple) is a mapping that associates a value with each attribute in Y [4].
Definition 11 (Y-Relation State): A Y-relation state r (a relation state r over Y, or simply a relation state r) is a finite set of Y-tuples. If r is a Y-relation and X ⊆ Y, then r[X] is the projection of r onto X, which is the set of all tuples t[X], where t ∈ r [4].
Definition 12 (Database Scheme): A database scheme is a set $R = \{R_1, \dots, R_p\}$, where each relation scheme $R_i$ is a subset of the set of attributes V, i.e., $R_i = \{A_{i1}, A_{i2}, \dots, A_{in_i}\}$, $i = 1, \dots, p$, and $n_i$ is the number of attributes of the relation scheme $R_i$. Each attribute $A_{ij}$ is associated with a domain $D_{ij}$. This database scheme can be represented as a hypergraph H whose nodes are the attributes and whose edges are the relation schemes $R_1, \dots, R_p$.
Definition 16 (Path): A path from node u to node v, where u, v ∈ V(H), in a hypergraph H is a sequence of edges $(e_1, \dots, e_k)$ of length k ≥ 1 such that: (i) $u \in e_1$, (ii) $v \in e_k$, and (iii) $e_i \cap e_{i+1} \neq \emptyset$ for $1 \le i < k$. It is also said that the sequence of edges $(e_1, \dots, e_k)$ is a path from $e_1$ to $e_k$ if condition (iii) is satisfied. For example, the path from node A to node F in the hypergraph of Figure

Definition 18 (Connected Hypergraph):
A hypergraph H is connected if there is a path between each pair of hyperedges. Equivalently, a hypergraph is connected if it consists of only one component [21]. For example, the hypergraphs in Figures 1 and 2 are connected.
A node v of H is said to be isolated if v belongs to precisely one edge [4].
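Connectivity in this sense can be checked by grouping edges that are linked through chains of pairwise intersecting edges. The sketch below is one straightforward way to do so; the names `components` and `is_connected` and the toy hypergraphs are assumptions for illustration only.

```python
from typing import FrozenSet, List, Set

Edge = FrozenSet[str]


def components(h: Set[Edge]) -> List[Set[Edge]]:
    """Group edges so that two edges share a group iff a chain of
    pairwise intersecting edges links them."""
    remaining = set(h)
    result = []
    while remaining:
        start = remaining.pop()
        group, frontier = {start}, [start]
        while frontier:
            e = frontier.pop()
            # Edges not yet assigned that intersect the current edge.
            linked = {f for f in remaining if e & f}
            remaining -= linked
            group |= linked
            frontier.extend(linked)
        result.append(group)
    return result


def is_connected(h: Set[Edge]) -> bool:
    """True when h has at most one component (the empty hypergraph included)."""
    return len(components(h)) <= 1


chain = {frozenset("AB"), frozenset("BC"), frozenset("CD")}
split = {frozenset("AB"), frozenset("CD")}
```

Here `chain` forms a single component, while `split` falls into two because its edges share no node.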

III. RELATED WORK

A. α-ACYCLICITY
The terms α-acyclic hypergraph and acyclic hypergraph are synonymous.
Beeri et al. [11] introduced a special class of database schemes, called acyclic database schemes, which is based on the concept of an articulation set. Philippe and Samba [21] proposed a definition of α-cycles in hypergraphs that follows the same principle as cycles in graph theory. This definition does not depend on the concept of an articulation set; it depends instead on the following definitions:

Definition 22 (Sequence of Neighborhoods):
Let e and f be two properly intersecting hyperedges of a hypergraph H. A sequence (e = e 1 , . . . ,e p = f), such that p > 2 is called a sequence of neighborhoods between e and f, if e ∩ f e k ∩e k+1 , for k = 1, . . . , p − 1 [21]. For example, (e 1 , e 4 , e 2 ) in Figure 1 is a sequence of neighborhoods connecting e 1 and e 2 .
Definition : Let e and f be two properly intersecting hyperedges of a hypergraph H, the two edges e and f are α-neighboring if there is no sequence of neighborhoods between them [21]. For example, the two intersecting edges e 1 and e 2 of the hypergraph in Figure 2 are α-neighboring, because there is no sequence of neighborhoods between e 1 and e 2 , while the two intersecting edges e 1 and e 2 of the hypergraph in Figure 1 are not α-neighboring, because (e 1 ,e 4 , e 2 ) is a sequence of neighborhoods between e 1 and e 2 .
Theorem 1: A hypergraph H is α-acyclic if and only if it does not contain an α-cycle [21].

B. THE GRAHAM (OR GYO) ALGORITHM
The GYO algorithm [18] was introduced to determine the α-acyclicity of a hypergraph. The algorithm is applied to a hypergraph H and consists of two rules:
• Rule 1: If an edge e of H contains an isolated node, then delete this node from that edge.
• Rule 2: If an edge e of H is a subset of another edge f of H, then delete e from H.
These two rules are applied repeatedly until neither applies. If H has become empty, then H is acyclic; otherwise, H is cyclic.
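The reduction can be sketched in a few lines of Python. This is a minimal illustration, not the authors' formalization: it assumes that the second rule is the standard GYO step of deleting an edge contained in another edge, and that an edge left without nodes disappears. The names `gyo_reduce` and `is_alpha_acyclic` are hypothetical.

```python
from typing import FrozenSet, Iterable, Set


def gyo_reduce(edges: Iterable[Iterable[str]]) -> Set[FrozenSet[str]]:
    """Apply the two GYO rules until neither applies; return what is left."""
    h = {frozenset(e) for e in edges}
    h.discard(frozenset())
    changed = True
    while changed:
        changed = False
        # Rule 1: delete every node that occurs in exactly one edge.
        for e in list(h):
            isolated = {v for v in e if sum(v in f for f in h) == 1}
            if isolated:
                h.remove(e)
                if e - isolated:
                    h.add(e - isolated)
                changed = True
        # Rule 2 (assumed standard form): delete an edge that is a
        # proper subset of another edge.
        for e in list(h):
            if any(e < f for f in h):
                h.remove(e)
                changed = True
    return h


def is_alpha_acyclic(edges: Iterable[Iterable[str]]) -> bool:
    """The hypergraph is alpha-acyclic iff the reduction empties it."""
    return not gyo_reduce(edges)


triangle = [{"A", "B"}, {"B", "C"}, {"A", "C"}]
covered_triangle = triangle + [{"A", "B", "C"}]
```

The triangle is left untouched by both rules, whereas adding the covering edge {A, B, C} lets Rule 2 absorb the three small edges and Rule 1 clear the rest.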
Theorem 2: A hypergraph H is α-acyclic if and only if H becomes empty after applying the two rules of the GYO algorithm to H [4]. For example, the hypergraphs of Figure 2 and
Also, r is called globally consistent if there is a relation s over the attributes $V = R_1 \cup \dots \cup R_n$ such that $r_i = s[R_i]$ for each i, i.e., r is globally consistent if there is a ''universal relation'' s such that each $r_i$ is a projection of s [4].
Definition 28 (Join Expression): The join operator is used to combine related tuples from two or more relations into a single tuple and is denoted by either $r_1 \bowtie \dots \bowtie r_n$ or $\bowtie r$. A join expression θ of the relation state r evaluates to the set of all tuples t with attributes $R_1 \cup \dots \cup R_n$, such that $t[R_i]$ is in $r_i$, for each i. For example, if $R_1$, $R_2$, $R_3$, and $R_4$ are among the relation schemes, then $((R_2 \bowtie R_3) \bowtie (R_1 \bowtie R_4))$ is a join expression, which joins the $R_2$ and $R_3$ relations, joins the $R_1$ and $R_4$ relations, and then joins the two results together [4]. For example, consider the following join of the database of Figure 4: $(\text{EMP\_WORK} \bowtie_{\text{EMP\_WORK.DEPT} = \text{DEPT\_INFO.DEPT}} \text{DEPT\_INFO})$. This combines each employee tuple with the tuple of the department where he/she works into a single tuple.
Definition 29 (Sequential Join Expression): Let θ be a join expression over R of the form $(\dots((R_1 \bowtie R_2) \bowtie R_3) \dots \bowtie R_n)$, where $R_1, \dots, R_n$ is an ordering of the distinct members of R; then θ is called sequential. For example, in the sequential join expression $(\dots((R_1 \bowtie R_2) \bowtie R_3) \dots \bowtie R_n)$, the relations $R_1$ and $R_2$ are joined first, then the result is joined with the $R_3$ relation, and so on.
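A natural join and a projection can be sketched directly on sets of tuples. In the Python sketch below, each tuple is a set of (attribute, value) pairs; the attribute lists of EMP_WORK and DEPT_INFO, the department names, and all helper names are assumptions for illustration, since the instance of Figure 4 is not reproduced here.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Row = FrozenSet[Tuple[str, str]]
Relation = Set[Row]


def relation(rows: Iterable[Dict[str, str]]) -> Relation:
    """Build a relation state from attribute-to-value dictionaries."""
    return {frozenset(row.items()) for row in rows}


def project(r: Relation, attributes: Iterable[str]) -> Relation:
    """r[X]: restrict every tuple of r to the attributes in X."""
    keep = set(attributes)
    return {frozenset((a, v) for a, v in row if a in keep) for row in r}


def natural_join(r: Relation, s: Relation) -> Relation:
    """Combine every pair of tuples that agree on their shared attributes."""
    out = set()
    for t in r:
        for u in s:
            merged = dict(t)
            if all(merged.get(a, v) == v for a, v in u):
                merged.update(dict(u))
                out.add(frozenset(merged.items()))
    return out


# Hypothetical instance: attribute lists and department names are assumed.
emp_work = relation([{"EMP": "Lorin", "DEPT": "Sales"}])
dept_info = relation([
    {"DEPT": "Sales", "CITY": "Giza"},
    {"DEPT": "HR", "CITY": "Cairo"},
])

joined = natural_join(emp_work, dept_info)
emp_city = project(joined, ["EMP", "CITY"])
```

Projecting the join onto {EMP, CITY} yields the single pair (Lorin, Giza) for this instance.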
Notation: Let θ be a join expression whose relation schemes are all in R, and let r = {r 1 , . . . , r n } be a database over R. The relation that results by replacing each relation scheme R in θ by r where r ∈ r and r has attributes R will be denoted by θ (r) [4].
Definition 30 (Monotone Join Expression With Respect to r): Let θ be a join expression over the relation scheme $R = \{R_1, \dots, R_n\}$ and let $r = \{r_1, \dots, r_n\}$ be a database over R. Then θ is called monotone with respect to r if, for each subexpression $(\theta_1 \bowtie \theta_2)$ of θ, the relations $\theta_1(r)$ and $\theta_2(r)$ are consistent [4], i.e., no tuples are lost in taking the join of relations r and s if r and s are consistent.
Definition 31 (Monotone Join Expression): A join expression θ is called monotone if it is monotone with respect to every pairwise consistent database over R [4].
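Monotonicity of a sequential join with respect to one database can be checked step by step. The sketch below assumes the usual notion of consistency for two relations (they have the same projection onto their common attributes) and non-empty relations; all function names and the two toy databases are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Row = FrozenSet[Tuple[str, int]]
Relation = Set[Row]


def relation(rows: Iterable[Dict[str, int]]) -> Relation:
    return {frozenset(row.items()) for row in rows}


def attrs(r: Relation) -> Set[str]:
    # Attribute names are read off the tuples, so r is assumed non-empty.
    return {a for row in r for a, _ in row}


def project(r: Relation, keep: Set[str]) -> Relation:
    return {frozenset((a, v) for a, v in row if a in keep) for row in r}


def join(r: Relation, s: Relation) -> Relation:
    out = set()
    for t in r:
        for u in s:
            merged = dict(t)
            if all(merged.get(a, v) == v for a, v in u):
                out.add(frozenset({**merged, **dict(u)}.items()))
    return out


def consistent(r: Relation, s: Relation) -> bool:
    """Assumed notion: equal projections onto the common attributes."""
    common = attrs(r) & attrs(s)
    return project(r, common) == project(s, common)


def monotone_sequential(relations: List[Relation]) -> bool:
    """Check (...((r1 join r2) join r3)...) for consistency at every step."""
    acc = relations[0]
    for nxt in relations[1:]:
        if not consistent(acc, nxt):
            return False
        acc = join(acc, nxt)
    return True


r_ab = relation([{"A": 0, "B": 0}, {"A": 1, "B": 1}])
r_bc = relation([{"B": 0, "C": 0}, {"B": 1, "C": 1}])
# Pairwise consistent with r_ab and r_bc, yet it closes a cycle over {A, B, C}.
r_ac = relation([{"A": 0, "C": 1}, {"A": 1, "C": 0}])
```

For the acyclic scheme {A, B}, {B, C} the sequential join is monotone on this database, while for the cyclic scheme obtained by adding {A, C} the last step fails for this ordering even though the three relations are pairwise consistent, which is in line with Theorem 3.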
Theorem 3: Let $r = \{r_1, \dots, r_n\}$ be a database over the relation scheme R. Then the following are equivalent:

1) R is α-acyclic.
2) There is a monotone join expression over R.
3) There is a monotone, sequential join expression over R [1].

IV. MATERIALS AND METHODS
This section starts by reformulating the GYO algorithm and then introduces the definition of the Quasi α-cycle, denoted by $\alpha_Q$-cycle, and the definition of the set of α-nodes. Then, it is proved that a hypergraph is acyclic if and only if it does not contain any α-nodes. Finally, a polynomial-time algorithm is proposed to detect the set of α-nodes, based on the existence of Quasi α-cycle(s), in a given hypergraph H.

A. A NEW FORMALIZATION OF THE GYO ALGORITHM
The GYO algorithm can be reformulated using the star of each node in Rule 1 of the algorithm as given above.
For example, the sequence $(e_1, e_2, e_3)$ in Figure 2 is an $\alpha_Q$-cycle.
is called the set of α-nodes in the hypergraph H. Similarly, let Ḉ_{α_Q-cycle} = {C_Q : C_Q is an α_Q-cycle in H}. For a given cycle C_Q = (e_1, ..., e_q), q > 2, the set is called the set of α_Q-nodes of the α_Q-cycle C_Q in a hypergraph H, and the set,
Let e ∈ H, then from Remark 2, e contains at least two nodes, i.e., ∃ v_1, v_2 ∈ V(H) such that {v_1, v_2} ⊂ e, and ∃ e′, e″ ∈ H such that e ∩ e′ ≠ ∅ and e ∩ e″ ≠ ∅, where |e′| > 1 and |e″| > 1.
Setting h = e_4 and g = e_5, we get the sequence (e_1, e_2, e_3, e_4, e_5) with e_3 = e, where e_k and e_{k+1} are properly intersecting for k = 1, ..., 4, e_1 and e_k are properly intersecting, and 1 ≤ a < b < 4, Continuing in the same manner, we get that ∀ e ∈ H there exists a sequence (e_1, ..., e_k) such that e_i and e_{i+1} are properly intersecting, e_1 and e_k are properly intersecting, and
Proof: Let H be α-acyclic, then by Theorem 1, H does not contain any α-cycles and e_i ∩ e_{i+1} = ∅ and therefore H does not contain an α-cycle, therefore from Theorem 1, H is α-acyclic.

C. THE PROPOSED ALGORITHM
In this subsection, a polynomial-time algorithm is proposed to detect the set of α-nodes, based on the existence of Quasi α-cycle(s), in a given hypergraph H. The algorithm returns the set of α-nodes of H if the input hypergraph is α-cyclic, or ∅ if the input hypergraph is α-acyclic. Algorithm $A_0(H)$ uses the linear-time GYO algorithm; therefore, $A_0(H)$ is also a linear-time algorithm.

V. DISCUSSION
Detecting the set of α-nodes is important for converting a cyclic hypergraph into an acyclic one, if the conversion is possible. More precisely, it is enough to detect the set of α-nodes to perform this conversion; there is no need to detect the α-cycle(s) themselves, which, done by brute force, requires checking all permutations of the edges.
Ghaleb et al. [22] proposed an algorithm, $\alpha_{Rem}(H, n, K)$, to convert a cyclic hypergraph H, which corresponds to a database scheme R, into an acyclic one. The algorithm returns an acyclic hypergraph if the conversion is possible and returns failure otherwise. The input of this algorithm is:
1. An undirected hypergraph, H, corresponding to a cyclic database scheme in the third normal form. The third normal form guarantees that each non-key attribute A in the relation scheme R is fully functionally dependent on the primary key of R, and no non-key attribute of R is transitively dependent on the primary key.
2. The set of α-node(s), n, of the hypergraph H, as returned by algorithm $A_0(H)$. This set is non-empty because the hypergraph H is cyclic.
3. A set of keys, K, which is the set of all keys of the database scheme R.
Algorithm $\alpha_{Rem}(H, n, K)$ has two steps:
1. The first step renames only the α-node(s) that represent non-key attributes in the database scheme R, according to the name of the table each belongs to.
2. After renaming the first non-key attribute, algorithm $\alpha_{Rem}(H, n, K)$ calls algorithm $A_0(H)$ to determine whether the resulting hypergraph has become acyclic or is still cyclic. If it is cyclic, $A_0(H)$ returns the set of α-node(s).
These two steps are applied repeatedly until the hypergraph becomes acyclic, or until all the α-nodes of H are keys (which cannot be renamed), in which case the algorithm returns failure.
For instance, consider the hypergraph that corresponds to a database scheme: each node corresponds to an attribute in a table of the database, which can be either a key or a non-key in its table. It is desirable to study the possibility of renaming an attribute.
For example, after applying the proposed algorithm $A_0(H)$ to the hypergraph of Figure 5, corresponding to the database scheme of Figure 4, the following set of α-nodes is obtained: {EMP, DEPT, CITY}. This set has two key attributes, EMP and DEPT, and a non-key attribute, CITY. In the instance of the cyclic database of Figure 4, there are two distinct {EMP, CITY} relationships. The first one has the tuple (Lorin, Giza), which relates the employee Lorin to the city Giza where she works. This relationship is obtained by joining the EMP_WORK relation with the DEPT_INFO relation. The second one has the tuple (Lorin, Cairo), which relates the employee Lorin to the city Cairo where she lives, according to the EMP_HOME relation. Since the attribute CITY is a non-key, we can rename it to WORK_CITY in the DEPT_INFO relation scheme and to HOME_CITY in the EMP_HOME relation scheme. There is now a unique {EMP, HOME_CITY} relationship, which includes the tuple (Lorin, Cairo), and a unique {EMP, WORK_CITY} relationship, which includes the tuple (Lorin, Giza). This gives the acyclic hypergraph of Figure 7, which corresponds to the database scheme of Figure 6.
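The effect of the renaming can be checked with a small Python sketch. The attribute lists of the three relation schemes are assumptions inferred from the discussion above (Figures 4 to 7 are not reproduced here), and the check uses a plain GYO-style reduction rather than algorithm $A_0(H)$ itself; the names `gyo_residual`, `residual_nodes`, and `rename_attribute` are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set


def gyo_residual(edges: Iterable[Iterable[str]]) -> Set[FrozenSet[str]]:
    """GYO-style reduction: strip nodes that occur in one edge only and drop
    edges contained in another edge; return whatever cannot be removed."""
    h = {frozenset(e) for e in edges}
    h.discard(frozenset())
    changed = True
    while changed:
        changed = False
        for e in list(h):
            lonely = {v for v in e if sum(v in f for f in h) == 1}
            absorbed = any(e < f for f in h)
            if lonely or absorbed:
                h.remove(e)
                if not absorbed and e - lonely:
                    h.add(e - lonely)
                changed = True
    return h


def residual_nodes(scheme: Dict[str, Set[str]]) -> Set[str]:
    """Nodes that survive the reduction (empty for an acyclic scheme)."""
    return set().union(*gyo_residual(scheme.values()))


def rename_attribute(scheme, relation_name, old, new):
    """Return a copy of the scheme with one attribute renamed in one relation."""
    renamed = {name: set(attributes) for name, attributes in scheme.items()}
    renamed[relation_name].remove(old)
    renamed[relation_name].add(new)
    return renamed


# Assumed attribute lists for the three relation schemes discussed above.
cyclic_scheme = {
    "EMP_WORK": {"EMP", "DEPT"},
    "DEPT_INFO": {"DEPT", "CITY"},
    "EMP_HOME": {"EMP", "CITY"},
}
step = rename_attribute(cyclic_scheme, "DEPT_INFO", "CITY", "WORK_CITY")
acyclic_scheme = rename_attribute(step, "EMP_HOME", "CITY", "HOME_CITY")
```

Under these assumptions, the reduction leaves the nodes EMP, DEPT, and CITY for the original scheme and nothing for the renamed one. In this sketch a single renaming already breaks the cycle, because CITY then occurs in only one edge.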
It is important to note that unique relationships preserve the semantics of the database scheme [23] and give the system a great deal of flexibility in optimizing how to find the result of any query. It also follows from Theorem 3 that the system might be able to exploit the fact that, whichever relations in the database are joined together, the join expression is guaranteed to be monotone and is therefore efficient.
Note that when a relational database scheme R is γ-acyclic, there is a unique relationship among each set of attributes, for each consistent database over R [4], while in the case of α-acyclicity this uniqueness is not always guaranteed. That is because each subhypergraph of a γ-acyclic hypergraph is γ-acyclic, but not each subhypergraph of an α-acyclic hypergraph is α-acyclic.
Cyclic queries (those with a corresponding cyclic hypergraph) may require exponential time [24] to be computed, even for small outputs (just one tuple, or checking whether the answer of a query is non-empty). Therefore, many attempts have been made in the literature to specify how to suitably transform a cyclic CQ into an equivalent acyclic one [25]-[32]. In this framework, detecting the set of α-nodes will facilitate this transformation, or the search for acyclic approximations [33] of such a query, i.e., finding an approximate acyclic CQ' that is much faster to evaluate than CQ and whose output is close to the output of the original CQ on all databases.

VI. CONCLUSION AND FUTURE WORK
We introduced a new formalization of the GYO algorithm using the star of the hypergraph nodes. A new type of cyclicity in hypergraphs, called the Quasi α-cycle, is also introduced, together with the notion of the set of α-nodes in hypergraphs. The Quasi α-cycle and the set of α-nodes are based on the existence of α-cycles. A polynomial-time algorithm is also proposed that detects the set of α-nodes, if it exists, based on the existence of Quasi α-cycle(s), or otherwise reports that the input hypergraph is α-acyclic. The detected set of α-nodes is important for studying the possibility of converting a cyclic hypergraph into an acyclic one. More precisely, it is enough to detect the set of α-nodes to perform this conversion; there is no need to detect the α-cycle(s) themselves, which, done by brute force, requires checking all permutations of the edges. Acyclic databases are preferred because of the desirable computational properties they enjoy: query optimization is easier than for cyclic databases, and acyclicity can be recognized in linear time. Acyclic databases and acyclic queries enjoy time- and/or space-efficient access paths for answering a query. Detecting the set of α-nodes will facilitate transforming a cyclic query into an equivalent acyclic one, or finding acyclic approximations of such a query. For future work, more study of the various cases of the attributes (key and non-key) is needed to convert a given cyclic database scheme into an acyclic one. Also, we will extend our work to introduce the sets of β-nodes and γ-nodes, which correspond to β-cycles and γ-cycles, respectively.
FAYED F. M. GHALEB received the B.Sc. degree from Ain Shams University, in 1966, and the Ph.D. degree from Moscow University, in 1978. He is currently a Professor Emeritus with Ain Shams University. His research interests include foundations of information science, databases, data mining, image processing, and bioinformatics.
He is a member of the Mathematical Society Foundation Egyptian.
AZZA A. TAHA received the M.Sc. degree in computer science from Ain Shams University, Cairo, Egypt, in 1994, and the Ph.D. degree in computer science from the Graduate School of Informatics, Kyoto University, Japan, in 2001. She is currently working with Ain Shams University. Her research interests include foundation of information science, lambda calculus, type theory, natural language processing, and computational mathematics. She is a member of the Egyptian Mathematical Society.
MARYAM HAZMAN received the Ph.D. degree in computer science from the Faculty of Computers and Information, Cairo University, in 2009. She is currently a Senior Researcher with the Central Laboratory for Agricultural Experts Systems, Agricultural Research Centre, Egypt. Her research interests include data and text mining, multi-criteria decision making, knowledge engineering, knowledge discovery, and information management.
MAHMOUD M. ABD ELLATIF is currently a Professor of information systems with the Faculty of Computers and Information, Helwan University, Egypt. He is also a Contract with the College of Business, University of Jeddah, Saudi Arabia. His research interests include information systems focus on using data mining techniques and semantic web technologies to apply PIECES framework on software projects.
MONA ABBASS received the bachelor's and master's degrees in computer science. She is currently pursuing the Ph.D. degree with the Department of Mathematics, Faculty of Science, Ain Shams University. She works as a mathematics and statistics specialist at the Central Lab for Agricultural Experts Systems, Agricultural Research Centre, Egypt. Her research interests include databases, image processing, and graph theory.