An Efficient Prefix-based Labeling Scheme for XML Dynamic Updates using Hexagonal Pattern

To improve XML query processing, XML documents must be labeled efficiently for indexing, because labeling allows the structural relationships between XML nodes to be preserved without accessing the original document. However, XML data on the Web is updated over time, so dynamic updating is an issue that must be handled by an XML labeling scheme designed specifically for dynamic updates. Previous XML labeling schemes have limitations when updates take place: many node labels must be relabeled, many duplicate labels occur during this relabeling process, and the size and time costs of the updated labels are high. Therefore, this paper proposes an efficient prefix-based labeling scheme that uses a hexagonal pattern. The proposed labeling scheme has three main advantages: (i) it avoids node relabeling when XML documents are updated at random locations, (ii) it avoids duplicate labels by creating a new label for every inserted node, and (iii) it reduces the size and time costs of the updated labels. The proposed scheme is evaluated against the three most recent prefix-based labeling schemes in terms of the size and time costs of the updated labels. In addition, its ability to handle multiple updates (such as insertions) in XML documents is evaluated. The evaluations show that the proposed labeling scheme outperforms previously developed prefix-based labeling schemes in terms of both size and time costs, particularly for large-scale XML datasets, resulting in improved query processing performance. Moreover, the proposed scheme efficiently supports frequent updates at arbitrary positions. The paper concludes with several suggestions for further research.


I. INTRODUCTION
XML (Extensible Markup Language) has long been the Web's de facto standard for data exchange and representation [1][2][3]. Indeed, XML has gained broad acceptance and support from all of the major providers of databases, servers, and software tools [4,5]. XML is a hierarchical, self-describing, semi-structured data format with an easy-to-write and easy-to-parse syntax that simplifies data interchange across a variety of Web-based applications [5][6][7]. It enables communication across multiple computing systems, which was previously extremely difficult, if not impossible. Therefore, XML provides a global format for data interchange, regardless of the platforms and data models used by the applications.
Hence, the computing world now has a well-tested method for developing distributed applications, and XML's flexibility allows data to be exchanged across nearly any hardware, software, or operating system [2,4]. However, because XML is used to represent a large amount of semi-structured data on the Internet, handling these data in terms of storage, updating, and querying has become a major challenge [2,7,8,9,10].
One answer to this challenge is the labeling scheme, which assigns a number (label) to each node to represent its identifier or its absolute position in the document [11][12][13][14][15][16]. A labeling scheme maintains the structural relationships among nodes, such as parent-child (P-C), ancestor-descendant (A-D), and sibling relationships [1][2][3], which improves XML data storage, updating, and querying [7,17,18]. Adopting a labeling scheme improves the efficiency of XML data retrieval and indexing because it reduces query processing time [19,20]. In a labeling scheme, labels preserve the relationships between nodes in XML documents and facilitate querying [16,21,22,23,24]. Hence, the efficiency of responding to data queries, such as keyword searches and XML data queries, depends largely on XML labeling schemes [25,26]. When a labeling scheme is used, an XML query is similar to a relational database (RDB) query, which likewise depends on the indexing of nodes [1,27,28]. In addition, labeling schemes encode an ordered XML tree and its structural relationships into labels of minimal size [1,29]. Moreover, compared with alternative XML query approaches, an effective XML labeling scheme requires less storage space and offers more flexibility [30][31][32][33]. As can be seen, the labeling scheme is crucial in improving the retrieval and updating of XML document contents. This issue is even more important when the contents of XML documents are stored in an RDB. An RDB is composed of two-dimensional tables, where rows represent entities (e.g., persons) and columns represent their attributes (e.g., ID, name). The order of these rows and columns is not crucial in an RDB, but it is crucial for text-centric documents such as e-books, e-journals, and e-mails. To address this issue, many labeling schemes have been developed.
However, these schemes still offer poor support for the efficient dynamic updating of XML documents [1,3,29].
With regard to the support that labeling schemes provide for the dynamic updating of XML data, the challenges include duplicated labels and the relabeling of existing nodes [1,11,34,35,36,37]. Two other key challenges are the time needed by labeling schemes to relabel XML data and the size of the resulting labels [22,38,39]. A large label size can adversely affect query and update performance [40,41]. Two forms of skewed insertion can occur when new XML nodes are inserted. The first, order-skewed insertion, occurs when nodes are frequently inserted before or after a particular node. The second, random-skewed insertion, occurs when nodes are frequently inserted between two random nodes. In contrast, in uniform insertion, each node is added between a randomly chosen pair of consecutive nodes [42,43]. Some labeling schemes have been evaluated only with respect to skewed insertions [1,44,45]. Generally, labeling schemes consider four types of node insertion: insertion before the leftmost sibling, insertion after the rightmost sibling, insertion between two siblings, and insertion of a child into a leaf node [34,39,42,46,47,48,49,50].
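To make the three insertion patterns concrete, the following is a minimal Python sketch that only records where new nodes land among a list of siblings; it does not compute any labels. The function names (`order_skewed`, `random_skewed`, `uniform`) and the node names `new0`, `new1`, ... are hypothetical, and modeling random-skewed insertion as repeated insertion into one randomly chosen gap between consecutive siblings is an assumption about how such workloads are usually generated.

```python
import random

def order_skewed(siblings, anchor, k):
    """Order-skewed: every new node is inserted immediately after the same anchor.

    Each new node therefore lands between the anchor and the node that was
    inserted just before it.
    """
    seq = list(siblings)
    for i in range(k):
        seq.insert(seq.index(anchor) + 1, f"new{i}")
    return seq

def random_skewed(siblings, k, rng):
    """Random-skewed: choose one random gap, then keep inserting into it."""
    seq = list(siblings)
    gap = rng.randrange(1, len(seq))  # new nodes go right after seq[gap - 1]
    for i in range(k):
        seq.insert(gap, f"new{i}")
    return seq

def uniform(siblings, k, rng):
    """Uniform: each new node goes into an independently chosen random gap."""
    seq = list(siblings)
    for i in range(k):
        seq.insert(rng.randrange(1, len(seq)), f"new{i}")
    return seq

if __name__ == "__main__":
    print(order_skewed(["s1", "s2", "s3"], "s1", 3))
    # ['s1', 'new2', 'new1', 'new0', 's2', 's3']
    rng = random.Random(42)
    print(random_skewed(["s1", "s2", "s3", "s4"], 3, rng))
    print(uniform(["s1", "s2", "s3", "s4"], 3, rng))
```

Skewed patterns keep narrowing a single gap, which is exactly the situation in which fraction- or string-based labels tend to grow fastest.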
Regardless of its type, an efficient labeling scheme should possess four main properties. First, its labeling must be dynamic, i.e., it must avoid relabeling XML nodes when XML documents are updated. Second, the time taken to create labels, both initially and after skewed node insertions, must be as short as possible. Third, the labels must be small, i.e., the scheme must produce compact labels that reduce storage cost both for the initial labeling and after skewed node insertions [39,45,67,68]. Fourth, the labeling scheme must support all types of queries, because labels play an important role in efficient query processing in XML-enabled databases and native XML databases [2,20,69,70].
In this paper, an efficient prefix-based labeling scheme that uses a hexagonal pattern is proposed to support dynamic updates in XML documents. The design of the proposed scheme builds on previous prefix-based labeling schemes for generating labels [1,11,13,20,30,34,46,57]. The proposed scheme has three major advantages, which constitute its novelty: first, it prevents node relabeling when XML documents are frequently updated at random locations; second, it avoids duplicated labels by creating a new label for every inserted node; and third, it reduces the size and time costs of the updated labels. The proposed scheme is compared with various prefix-based labeling schemes in terms of the size of the produced labels, the time required for the labeling process, and its ability to handle many update cases.
The remainder of this paper is structured as follows. Section II presents the related work. Section III presents the basic operators that support XML updates. Section IV describes the proposed labeling scheme in detail. Section V describes the experimental design. Section VI analyzes and discusses the experimental results in terms of label size and labeling time. Finally, Section VII concludes with a summary and some recommendations for further work.

II. RELATED WORK
A. INTERVAL-BASED LABELING SCHEME
The interval-based labeling scheme (also known as the range-based labeling scheme) incorporates <start, end> values into each label [3,71]. These values mark where a node starts and ends, which determines its position in the XML tree [1,3,23]. Therefore, this labeling scheme efficiently preserves the P-C and A-D relationships among nodes [1,3]. However, it is difficult to determine the lowest common ancestor (LCA) with it [1]. In addition, it does not efficiently support dynamic XML documents [1,3,56], because inserting new nodes may require relabeling numerous nodes, which increases labeling time [1,52]. Hence, with this type of scheme, both labeling time and label size are high, especially for large XML documents [23,56].
Nevertheless, many interval-based labeling schemes for XML documents have been proposed. Zhang et al. [72] adopted a region-based approach in which each node in the XML document is assigned a pair of values (Begin, End). "Begin" is the sequence number assigned to the node when the parser visits it for the first time, i.e., when its tag is opened; "End" is the sequence number assigned when the parser visits it for the second time, i.e., when its tag is closed. If the region of a node u lies inside the region of another node v, then v is an ancestor of u; this is termed the containment property [50,56]. By comparing these values, the scheme can rapidly determine the structural and sequential relationships among nodes. However, when new nodes are inserted, other nodes must be relabeled, which reduces updating efficiency [29]. Moreover, because the scheme visits each XML node twice to generate its label, both the labeling time and the label size are high, especially as the XML document grows [56]. Figure 1 below illustrates how, in the interval-based labeling scheme, inserting a node a incurs the relabeling of other nodes (within the dotted box in the figure).
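The containment test and the cost of an insertion can be sketched in Python as follows. This is a minimal illustration of the (Begin, End) idea, not Zhang et al.'s implementation; the tree is assumed to be a dictionary mapping each node name to its ordered children, and the node names and helper functions are hypothetical.

```python
def region_labels(tree, root):
    """Assign (begin, end) labels in one depth-first pass.

    A single counter is incremented when a node's tag is opened (begin) and
    again when it is closed (end), so every node is visited twice.
    """
    labels, counter = {}, 0

    def visit(node):
        nonlocal counter
        counter += 1
        begin = counter
        for child in tree.get(node, []):
            visit(child)
        counter += 1
        labels[node] = (begin, counter)

    visit(root)
    return labels

def is_ancestor(labels, v, u):
    """Containment property: v is an ancestor of u iff u's region lies inside v's."""
    (bv, ev), (bu, eu) = labels[v], labels[u]
    return bv < bu and eu < ev

def relabeled(before, after):
    """Existing nodes whose (begin, end) pair changed after an update."""
    return {n for n in before if before[n] != after[n]}

tree = {"root": ["a", "b"], "a": ["a1"]}
old = region_labels(tree, "root")  # root=(1, 8), a=(2, 5), a1=(3, 4), b=(6, 7)
assert is_ancestor(old, "root", "a1") and not is_ancestor(old, "a", "b")

# Insert a hypothetical new child "x" of root between a and b.
tree["root"] = ["a", "x", "b"]
new = region_labels(tree, "root")
print(sorted(relabeled(old, new)))  # ['b', 'root']
```

Even in this tiny tree, one insertion changes the labels of the following sibling and of the enclosing ancestor, which is the relabeling cost described above.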
Li and Moon [52] presented another region-based approach in which each XML document node is assigned a pair of values (Order, Size). "Order" is the node's unique identifier in a preorder traversal of the document, and "Size" represents the number of descendant nodes. By reserving redundant space in "Size", this scheme can accommodate newly inserted nodes without re-encoding other nodes, which offers a partial solution to the insertion problem [29]. However, this scheme is not well suited to handling other information, such as sibling and level information, for retrieval [23].
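A minimal Python sketch of (Order, Size) labels and the ancestor test they support is given below. The `slack` parameter is a hypothetical way of modeling the reserved space in "Size"; the tree representation and function names are likewise assumptions for illustration. Once a reserved range is used up, existing nodes would again have to be relabeled, which is why this is only a partial solution.

```python
def order_size_labels(tree, root, slack=0):
    """Assign (order, size) labels in preorder.

    `slack` is a hypothetical number of order values reserved after each
    node's subtree for future insertions; with slack=0, size is exactly the
    number of descendants.
    """
    labels = {}

    def visit(node, order):
        nxt = order + 1
        for child in tree.get(node, []):
            nxt = visit(child, nxt)
        nxt += slack  # reserve room for future descendants of this node
        labels[node] = (order, nxt - order - 1)
        return nxt

    visit(root, 1)
    return labels

def is_ancestor(x, y):
    """x is an ancestor of y iff order(x) < order(y) <= order(x) + size(x)."""
    (ox, sx), (oy, _) = x, y
    return ox < oy <= ox + sx

tree = {"root": ["a", "b"], "a": ["a1"]}
exact = order_size_labels(tree, "root")
print(exact)  # {'a1': (3, 0), 'a': (2, 1), 'b': (4, 0), 'root': (1, 3)}

spare = order_size_labels(tree, "root", slack=2)
# a1=(3, 2) leaves orders 4 and 5 free, so a new child of a1 can take
# order 4 without changing any existing label.
new_child = (4, 0)
assert all(is_ancestor(spare[n], new_child) for n in ("root", "a", "a1"))
assert not is_ancestor(spare["b"], new_child)
```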

B. PREFIX-BASED LABELING SCHEME
The prefix-based labeling scheme is similar to the Dewey decimal classification traditionally used by libraries [57]. It can maintain different kinds of structural relationships between nodes, such as P-C, A-D, and sibling relationships [73]. In prefix-based labeling schemes, the parent node's label is encoded as a prefix of the node's own label. The labels are created by depth-first tree traversal, and their components are separated by a delimiter, either ',' or '.' [1,20,56,73]. A popular prefix-based scheme, Dewey encoding, was proposed in [20]; it is essentially a Dewey order labeling scheme.
The Dewey order labeling scheme combines two order labeling schemes: the global scheme and the local scheme [20]. As shown in Figure 2(a) below, in the global order labeling scheme each node is allocated a number that specifies its absolute position in the document. Dynamic updating is extremely difficult with this scheme because all nodes following the inserted node must be relabeled, and determining the P-C and A-D relationships is almost impossible. As illustrated in Figure 2(b) below, in the local order labeling scheme each node is allocated a number that denotes its position relative to its siblings. Under this scheme, a node's actual position in the document is determined by combining its position with those of its ancestors as a path vector. Because only the siblings of the new node need to be relabeled, an update in the local labeling scheme incurs less overhead than one in the global labeling scheme [37,74]. However, obtaining the P-C and A-D relationships remains extremely challenging.
In the Dewey order labeling scheme, by contrast, each node is assigned a label based on Dewey decimal classification that encodes the root-to-node path in the XML document. Consequently, each component of the label indicates the local order of an ancestor node, as illustrated in Figure 2(c) below. For example, label 1.2.5 denotes node #5 at level 3, whose parent is node #2 at level 2 and whose ancestor is the document's root. Hence, this scheme provides a straightforward way to derive the labels of a node's ancestors from its own label. When a new node is inserted, however, all sibling nodes to the right of the inserted node, as well as their descendants, must be relabeled. Because of its simplicity, the Dewey order labeling scheme [20] has become popular among indexing schemes [75][76][77][78], but it is unsuitable for dynamic XML data. For example, to insert a new node a between two sibling nodes in an XML tree, as shown in Figure 2(c), the Dewey order labeling scheme requires relabeling all right siblings of node a as well as their descendants (within the dotted box in the figure) [78]. Nevertheless, owing to its advantages, numerous labeling schemes have been built on the prefix-based approach. In [49], the ORDPATH scheme was designed on the basis of the prefix-based labeling scheme; it processes structural queries efficiently based on the preceding label and the current label of a node. For initial labeling, it assigns each node in the XML document a distinct label consisting of its ancestor and parent labels followed by an odd positive integer, with the integers separated by "." to distinguish between ancestor and descendant labels. For updates, i.e., insertions into an existing document, it uses even and negative integers. The drawback of ORDPATH is that it allows only a limited number of insertions and increases the complexity of decoding the inserted labels [30,34,78]. Overall, this has an adverse impact on the efficiency with which XML queries are processed [34,79].
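The prefix property and the relabeling problem of Dewey labels can be illustrated with a short Python sketch. Labels are stored as integer tuples and printed with '.' as the delimiter; the tree representation and helper names are illustrative assumptions, not part of the Dewey scheme's specification.

```python
def dewey_labels(tree, root):
    """Assign Dewey labels: each child appends its 1-based sibling position."""
    labels = {}

    def visit(node, label):
        labels[node] = label
        for pos, child in enumerate(tree.get(node, []), start=1):
            visit(child, label + (pos,))

    visit(root, (1,))
    return labels

def is_ancestor(a, d):
    """A-D: a is a proper prefix of d."""
    return len(a) < len(d) and d[:len(a)] == a

def is_parent(p, c):
    """P-C: p is a prefix of c and c has exactly one more component."""
    return len(c) == len(p) + 1 and c[:len(p)] == p

def fmt(label):
    return ".".join(map(str, label))

tree = {"root": ["x", "y", "z"], "y": ["y1"]}
old = dewey_labels(tree, "root")
print({n: fmt(l) for n, l in old.items()})
# {'root': '1', 'x': '1.1', 'y': '1.2', 'y1': '1.2.1', 'z': '1.3'}
assert is_parent(old["y"], old["y1"]) and is_ancestor(old["root"], old["y1"])

# Inserting a hypothetical node "a" between x and y shifts every right
# sibling and all of their descendants.
tree["root"] = ["x", "a", "y", "z"]
new = dewey_labels(tree, "root")
print(sorted(n for n in old if old[n] != new[n]))  # ['y', 'y1', 'z']
```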
Two further dynamic labeling schemes, dynamic Dewey encoding (DDE) and compact dynamic Dewey encoding (CDDE), were proposed in [34]. CDDE is an enhanced version of DDE designed especially to improve its performance for insertions. DDE and CDDE produce the same initial labels as the Dewey scheme [46]. To insert a node a between two consecutive siblings "1.2" and "1.3", DDE and CDDE add together the corresponding components of the two siblings, so node a is labeled "2.5", i.e., (1+1).(2+3), as shown in Figure 3(a) below. Compared with some previous works, DDE and CDDE show better performance for XML node insertion. However, because the values of DDE label components grow exponentially when nodes are frequently inserted between two consecutive siblings, as shown in Figure 3(b) below, the labels in DDE and CDDE are not compact, which commonly results in extra storage cost [1,46,59]. As Figure 3 shows, DDE and CDDE are also not appropriate for efficiently preserving structural relationships across numerous XML documents [1,46,73,78]. For example, given two labels "2.5" and "2.3", which would be known to be siblings under Dewey, this sibling relationship may not be maintained in DDE (see node a, labeled 2.5 and 2.3 in Figures 3(a) and 3(b), respectively). As a result, DDE requires an additional document identifier to distinguish labels across different XML documents.
Consequently, DDE is slower at determining structural relationships during XML query processing, and its storage costs rise [1].
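The component-wise addition used by DDE, and the label growth it causes under repeated insertions into one gap, can be sketched in Python. The ordering key, which compares each component divided by the first component, is an assumption about how DDE labels are compared; the helper names are hypothetical.

```python
from fractions import Fraction

def dde_between(left, right):
    """DDE insertion between two consecutive siblings: add the labels component-wise."""
    if len(left) != len(right):
        raise ValueError("sibling labels must have the same number of components")
    return tuple(l + r for l, r in zip(left, right))

def dde_order_key(label):
    """Document-order key (assumed): each component divided by the first one."""
    return tuple(Fraction(c, label[0]) for c in label)

def fmt(label):
    return ".".join(map(str, label))

a = dde_between((1, 2), (1, 3))
print(fmt(a))  # 2.5, i.e. (1+1).(2+3)

# Alternating insertions into the same gap: components grow like Fibonacci numbers.
left, right, grown = (1, 2), (1, 3), []
for i in range(6):
    mid = dde_between(left, right)
    grown.append(fmt(mid))
    if i % 2 == 0:
        left = mid
    else:
        right = mid
print(grown)  # ['2.5', '3.8', '5.13', '8.21', '13.34', '21.55']
```

Each new label stays between its neighbours in document order, but the component values grow geometrically with the number of insertions, so more bits are needed per label.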
Bao et al. [80] presented the Pathed-Dewey Order, a labeling scheme that captures both the path and position information of each node in an XML tree, together with a double-layered mapping structure for storing documents with flexible schemas in NoSQL databases such as HBase tables. Four optimization techniques were developed and implemented to reduce storage space and query response time. Extensive experiments on two well-known XML benchmarks show that this scheme outperforms three state-of-the-art methods. However, this scheme does not support data update and insertion operations, optimization of data mapping and storage efficiency, or compression of Pathed-Dewey Order labels to store more encodings in memory.
The dynamic float-point Dewey labeling (DFPD) scheme was proposed to improve the handling of dynamic queries in XML documents [46]. It generates initial labels based on the Dewey order labeling scheme and overcomes the limitation of previous schemes with respect to single insertions in XML data processing. It handles XML updates by considering the same three cases covered by the DDE scheme: insertion before the leftmost sibling, insertion after the rightmost sibling, and insertion of a child as a leaf node. However, because DFPD generates a floating-point number for a label placed between two consecutive siblings, both the label size and the time required to calculate the label increase [1,78]. In addition, the scheme consumes many numbers by assigning and calculating new labels for inserted nodes without reusing deleted labels [1].
The DFPD scheme is shown in Figure 4(a) below, where new nodes inserted into the XML tree are shown as dotted circles labeled with letters; the nodes are inserted in alphabetical order. Under DFPD, node a is inserted between "1.2" and "1.3" and is labeled 1.5/2, i.e., 1.(2 + 3)/(1 + 1). Node b is inserted between "1.5/2" and "1.3" and is labeled 1.8/3, i.e., 1.(5 + 3)/(2 + 1). Node c is inserted between "1.5/2" and "1.8/3" and is labeled 1.13/5, i.e., 1.(5 + 8)/(2 + 3). Node d is inserted between "1.13/5" and "1.8/3" and is labeled 1.21/8, i.e., 1.(13 + 8)/(5 + 3). When insertions and deletions occur alternately in the XML document, the values of the DFPD components grow rapidly. As a result, newly inserted DFPD labels become less compact, increasing storage requirements [1,78].
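The DFPD arithmetic above is the mediant of two fractions (numerators and denominators added separately), which always lies strictly between them when the denominators are positive. The following Python sketch reproduces the labels of nodes a to d; representing the last component as an unreduced (numerator, denominator) pair, and the helper names, are assumptions made for illustration.

```python
from fractions import Fraction

def dfpd_between(left, right):
    """Label between two siblings: add numerators and denominators separately.

    A plain Dewey component k is treated as the pair (k, 1); the pair is kept
    unreduced because later insertions operate on the stored numbers.
    """
    (ln, ld), (rn, rd) = left, right
    return (ln + rn, ld + rd)

def fmt(prefix, pair):
    n, d = pair
    return f"{prefix}.{n}" if d == 1 else f"{prefix}.{n}/{d}"

s2, s3 = (2, 1), (3, 1)    # siblings 1.2 and 1.3
a = dfpd_between(s2, s3)    # 1.5/2
b = dfpd_between(a, s3)     # 1.8/3
c = dfpd_between(a, b)      # 1.13/5
d = dfpd_between(c, b)      # 1.21/8
print([fmt(1, x) for x in (a, b, c, d)])

# Document order follows the fraction values: 1.2 < a < c < d < b < 1.3
order = sorted([s2, s3, a, b, c, d], key=lambda p: Fraction(*p))
print([fmt(1, x) for x in order])
# ['1.2', '1.5/2', '1.13/5', '1.21/8', '1.8/3', '1.3']
```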
An enhanced version of the DFPD scheme, the dynamic prefix-based labeling scheme (DPLS), was proposed in [1]. Initial labeling in DPLS is based on the Dewey order labeling scheme [20]. Each DPLS label is a series of non-zero components that describes the unique path from the root, which is labeled 1, to the node. Like DFPD, DPLS uses a fraction (i.e., a floating-point number) for the last component of the label, which is a drawback because it leads to limited precision [71,78]. However, DPLS has several advantages over DFPD: (i) it reduces query costs, (ii) it avoids relabeling under various scenarios, and (iii) it can reuse deleted node labels. DPLS handles four insertion cases: insert leftmost (node a), insert rightmost (node b), insert leaf (node c), and insert between siblings (nodes d, e, f, and g). Figure 4(b) below illustrates the DPLS insertion process, where new nodes inserted into the XML tree are shown as dashed circles and lines. Node a is inserted at the leftmost position (before the first child of the root), so it is labeled 1.0 (subtracting 1 from the local order of 1.1). Node b is inserted at the rightmost position (after the last child of the root), so it is labeled 1.4 (adding 1 to the local order of 1.3). Node c is inserted as a child of the node labeled 1.2, so it is labeled 1.2.1 (appending a new component with the value 1). Nodes d, e, f, and g are inserted in alphabetical order between 1.2 and 1.3, so they are labeled 1.5/2, 1.8/3, 1.13/5, and 1.21/8, respectively.
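The four DPLS insertion cases can be sketched in Python as follows. This is a minimal illustration rather than the implementation from [1]: labels are tuples whose last component may be an integer or an unreduced (numerator, denominator) pair, all function names are hypothetical, and nodes d to g are assumed to be inserted at the same relative positions as nodes a to d in the DFPD example.

```python
def _pair(c):
    """Treat an integer component k as the fraction k/1."""
    return c if isinstance(c, tuple) else (c, 1)

def insert_leftmost(first):
    """Before the first sibling (integer last component): subtract 1 (1.1 -> 1.0)."""
    return first[:-1] + (first[-1] - 1,)

def insert_rightmost(last):
    """After the last sibling (integer last component): add 1 (1.3 -> 1.4)."""
    return last[:-1] + (last[-1] + 1,)

def insert_leaf(parent):
    """First child of a leaf: append a new component 1 (1.2 -> 1.2.1)."""
    return parent + (1,)

def insert_between(left, right):
    """Between siblings: add numerators and denominators of the last components."""
    (ln, ld), (rn, rd) = _pair(left[-1]), _pair(right[-1])
    return left[:-1] + ((ln + rn, ld + rd),)

def fmt(label):
    return ".".join(f"{c[0]}/{c[1]}" if isinstance(c, tuple) else str(c) for c in label)

n_a = insert_leftmost((1, 1))
n_b = insert_rightmost((1, 3))
n_c = insert_leaf((1, 2))
n_d = insert_between((1, 2), (1, 3))
n_e = insert_between(n_d, (1, 3))
n_f = insert_between(n_d, n_e)
n_g = insert_between(n_f, n_e)
print([fmt(x) for x in (n_a, n_b, n_c, n_d, n_e, n_f, n_g)])
# ['1.0', '1.4', '1.2.1', '1.5/2', '1.8/3', '1.13/5', '1.21/8']
```

Only the new node receives a label in every case, which is how such schemes avoid relabeling; the cost shows up instead in the growing numerators and denominators of repeated between-sibling insertions.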

C. OTHER LABELING SCHEMES
Several other notable labeling schemes have also been proposed. For example, the dynamic prefix encoding scheme based on fraction (DPESF) was proposed in [14] to further improve performance. This scheme improves time and storage performance while supporting dynamic updates. In general, DPESF and DPLS are similar, except that DPESF represents numerators using a set of alphabet letters. This scheme involves the number sequence [0, 1, 2, 3, 4, 5, 6, 7, 8, 9].
Ko and Lee [79] proposed an improved binary string labeling (IBSL) scheme, which uses a string-based binary encoding in which each label consists of a sequence of the digits 0 and 1. When XML documents are updated, IBSL avoids relabeling by reusing a deleted label in the same position. However, frequent skewed insertions raise its storage cost [1,78]. Furthermore, IBSL was evaluated only on leaf node insertions. Ghaleb and Mohammed [13] proposed dynamic XDAS, which combines the original XDAS [81] with the IBSL scheme [79]. XDAS generates labels based on lexical order with masking techniques. For example, the label of node a is 1,001, where the first part denotes the level and 001 is the self-label of the first child. Node b, the sibling of node a, has the label 1,010, and so on. The label of the first child of node a is 2,01001, where 2 is the level, 01 is the self-label, and 001 is the label of its parent node. XDAS significantly reduces label size, but its updating time is not improved compared with that of IBSL [1,38]. A quaternary encoding scheme (QED) for dynamic XML documents was proposed in [47]. A QED label is a sequence of the quaternary digits "0", "1", "2", and "3", each stored in two bits.
The digit "0" is used as a separator in QED, and the digits "1", "2", and "3" are used in the label itself. In [20], a labeling scheme called Reuse was proposed based on QED; when XML updates occur, it reuses all deleted labels to curb label growth and enhance query performance. In addition, SCOOTER, proposed by O'Connor and Roantree [45], is a dynamic encoding scheme for handling updates in XML documents that employs the quaternary encoding used in QED and allows the reuse of shorter deleted node labels. However, under skewed insertions, the lengths of QED labels increase extensively [34]. As with QED, under skewed insertions the lengths of IBSL, enhanced binary string labeling (EBSL), dynamic XDAS, and SCOOTER labels increase rapidly. Furthermore, the IBSL, EBSL, dynamic XDAS, and QED schemes do not allow deleted labels to be reused, which has a negative impact on both storage and updating performance [1,34].
Dhanalekshmi and Krishna [82] proposed a lexicographic-based persistent labeling (LPLX) scheme based on the prefix-based labeling scheme. LPLX assigns each node in the XML tree a distinct label with three sections (Prefix, Level, Selfcode). The Prefix section holds the label of the parent node as a string of letters and digits. The Level section is an integer giving the node's depth from the root of the XML tree, and the Selfcode section specifies the node's self-label. Each node in LPLX is labeled using a combination of digits (0-9) and uppercase letters (A-Z). LPLX supports dynamic XML document updates without relabeling existing nodes, and structural relationships such as A-D, P-C, and siblings can be computed in constant time. In addition, experiments were carried out on XML datasets with varying node counts and depths. However, the lengths of LPLX labels vary greatly under frequent skewed insertions in large, deep XML documents, because LPLX appends a new letter to the end of the Selfcode section for insertions; for example, if the final letter of the Selfcode is "Z", it appends a new letter beginning with "A", and so on. Moreover, LPLX has not been evaluated against many existing labeling schemes.
Another labeling scheme, the prime-based middle fraction labeling scheme (PMFLS), was proposed in [29]. It is built on a set of algorithms that preserve structural relationships between nodes while supporting XML updates. PMFLS is a hybrid labeling scheme, i.e., it combines prefix-based and interval-based (region-based) labeling. It also supports order-sensitive updates without recalculation. However, when XML data is updated frequently by insertions, the size of its prefix labels may grow, producing overflow issues [78]. The ReLab scheme was designed based on the region-based labeling scheme [23]. In terms of the time required to create labels for each XML node, the experimental evaluation of ReLab showed that it outperformed other region-based schemes such as Dietz and region. ReLab is used not only for the unique identification of XML nodes but also for determining structural relationships, and it processes XML queries efficiently using a depth-first traversal approach. Its simplicity allows it to produce labels more rapidly than existing region-based schemes [65,78]. The drawback of ReLab is that it supports only static XML documents, not dynamic XML data [1,65,78]. In addition, the ME scheme is a robust hybrid scheme for dynamic updates in XML databases [53]; it determines and maintains structural relationships among XML nodes and supports dynamic updates without relabeling nodes.
The study in [83] proposed a compact, dynamic prefix-based labeling scheme that maintains the structural relationships among XML nodes to improve query processing. The scheme can handle both static and dynamic XML documents. Experiments evaluated its performance in terms of storage requirements, structural relationship computation, and update processing, and the results were compared with those of some existing labeling mechanisms.
As the above review shows, many recent labeling schemes have been proposed to support XML updates efficiently. However, these schemes have several limitations: (i) some do not support dynamic updates [20,23,25]; (ii) some have been evaluated only on a limited number of insertions [49]; (iii) some generate large labels, which increases storage and labeling time, especially when frequent skewed insertions occur between two siblings [25,34,45,48]; and (iv) others have a negative impact on the efficiency of XML query processing [1,45,49]. Table I summarizes the contributions and limitations of the existing labeling schemes reviewed above.
In light of the above, this paper proposes a novel prefix-based labeling scheme that prevents node relabeling when XML documents are frequently updated at arbitrary positions, avoids duplicated labels by creating a new label for every inserted node, reduces the size and time costs of the updated labels, and supports dynamic updates. In addition, the proposed scheme is implemented to maintain the structural information among nodes under four insertion cases: insertion before the leftmost sibling, insertion after the rightmost sibling, insertion between two siblings, and insertion of a child into a leaf node. The proposed scheme is also evaluated against the most recent labeling schemes in terms of label size, labeling time, and its ability to handle multiple insertions in XML documents. Furthermore, it is intended to provide fast computation under random skewed updates with a simple implementation.

TABLE I. Contributions and limitations of the reviewed labeling schemes
Scheme | Contribution(s) | Limitation(s)
Zhang et al. [72] | Determines the structural and sequential relationships among nodes rapidly. | Nodes must be relabeled on insertion, reducing updating efficiency. Labeling time and label size are both high, especially as the XML document grows.
Li and Moon [52] | Provides a partial solution to the insertion of new nodes. | Not well suited to handling other information, such as sibling and level information, for retrieval.
Dewey Order [20] | Popular among indexing schemes because of its simplicity. | Dynamic updating is extremely difficult, since right siblings and their descendants must be relabeled on insertion. Poor performance in obtaining the P-C and A-D relationships. Unsuitable for dynamic XML data.
ORDPATH [49] | Processes structural queries efficiently based on the preceding label and the current label of a node. | Allows only a limited number of insertions. Negatively affects the efficiency of XML query processing.
DDE & CDDE [34] | Perform better than some previous works for XML node insertion. | Labels are not compact, which increases storage cost. Not appropriate for efficiently preserving structural relationships across numerous XML documents.
DFPD [46] | Overcomes the limitation of previous schemes with respect to single insertions in XML data processing; handles XML updates. | Consumes many numbers by assigning and calculating new labels for inserted nodes without reusing deleted labels. Newly inserted labels become less compact, resulting in high storage cost.
DPLS [1] | Reduces query costs; avoids relabeling under various scenarios; can reuse deleted node labels; outperforms several state-of-the-art labeling schemes in label size and updating time because deleted labels are reused to encode newly inserted nodes. | Uses a fraction (i.e., a floating-point number) for the last label component, which leads to limited precision. Compared with DFPD, label size and update time are only slightly improved.
DPESF [14] | Improves time and storage performance while supporting dynamic updates. | Unclear whether it improves label size and updating time compared with ORDPATH.
IBSL & EBSL [79] | Avoid relabeling by reusing the deleted label in the same position when updating XML documents. | High storage cost under frequent skewed insertions. Evaluated only on leaf node insertions. Label lengths increase rapidly under skewed insertions. Do not allow deleted labels to be reused, which harms both storage and updating performance.
XDAS [13] | Generates labels based on lexical order with masking techniques; significantly reduces label size. | Updating time is not improved compared with IBSL. Dynamic XDAS label lengths increase rapidly under skewed insertions. Does not allow deleted labels to be reused, which harms both storage and updating performance.
QED [47] | Labels are sequences of the quaternary digits "0", "1", "2", and "3", each stored in two bits. | Label lengths increase extensively under skewed insertions. Does not allow deleted labels to be reused, which harms both storage and updating performance.
SCOOTER [45] | Handles updates in XML documents and allows the reuse of shorter deleted node labels. | Label lengths increase rapidly under skewed insertions.
LPLX [82] | Supports dynamic updates of XML documents without relabeling existing nodes; P-C, A-D, and sibling relationships are computed in constant time. | Label lengths vary greatly under frequent skewed insertions in large, deep XML documents. Not evaluated against many existing labeling schemes.
PMFLS [29] | Built on a set of algorithms that preserve structural relationships between nodes; supports XML updates. | Label size may grow, causing overflow issues, when XML data is updated frequently by insertions.
ReLab [23] | Outperforms other region-based schemes such as Dietz and region; maintains structural relationships; processes XML queries efficiently using depth-first traversal. | Supports only static XML documents, not dynamic XML data.

III. XML LABEL UPDATES
The issues associated with XML label updates are presented in this section. The main question that needs to be addressed is as follows: How can a labeling scheme maintain the structural relationships among nodes when XML updates occur? Updates on XML trees can be defined in terms of node deletion and insertion operations. For node insertions, the XML data is modeled as an ordered, directed tree X, and four primitive insertions are defined that keep the structural relationships among the nodes, with the aim of avoiding relabeling costs and reducing the size and time costs of the updated labels. These primitive insertions are insertion before the leftmost sibling, insertion after the rightmost sibling, insertion between two siblings, and insertion of a child into a leaf node. So, let M, P, and N be three nodes, let X be an ordered and directed XML tree, let L be the label of an inserted node c, given by parentLabel(c) || selfLabel(c), and let the primitive node insertions be given as follows: • insertSibling(X, M, P, N, L): insertion of a new node P between two sibling nodes M and N in X, with label L = parentLabel(P) || selfLabel(P).
The inserted nodes in Figure 5 are shown as dotted circles, which represent basic instances of primitive node insertions in a simple tree. For node deletions, consider the following four primitive deletions: (i) deletion of a node to the left of all existing children of a node, (ii) deletion of a node to the right of all existing children of a node, (iii) deletion of a leaf node, and (iv) deletion of a middle node between two siblings. None of these four deletions adds new nodes to the XML document; therefore, no new labels are required. Furthermore, because the labels before the deletions are distinct, a deletion has no effect on the relationships (or order) among the other labels, and the remaining labels stay distinct. In other words, with this encoding scheme, the structural relationship between nodes can still be identified by directly comparing their labels. As a result, deletions have no effect on the effectiveness of the labeling scheme [1], and the real challenge that the proposed labeling scheme must overcome is the efficient handling of insertions. Therefore, this study focuses on how to manage insertions effectively rather than on the processing of updates in general.

IV. PROPOSED LABELING SCHEME
In this section, we propose an efficient prefix-based labeling scheme that improves the label size and labeling time when updates occur in XML trees. The proposed labeling scheme is designed based on the hexagonal pattern theorem [84] for insertions when XML trees are updated. Hexagonal numbers count the objects that can be arranged in the shape of a regular hexagon (i.e., a hexagon whose sides all have the same length and whose interior angles are all equal). The nth hexagonal number h(n) is the number of distinct dots in a pattern formed by the outlines of regular hexagons with sides of up to n dots that share a common vertex [85]. The nth hexagonal number is given by

h(n) = n(2n - 1),   n = 1, 2, 3, 4, …   (1)
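As a quick check of Equation (1), the short Python sketch below lists the first hexagonal numbers, which are exactly the values the scheme keeps free for insertions. It is an illustration only; the function name `hexagonal` is ours, not the paper's.

```python
def hexagonal(n: int) -> int:
    """Equation (1): the nth hexagonal number, h(n) = n(2n - 1), for n >= 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return n * (2 * n - 1)


# The first ten hexagonal numbers.
first_ten = [hexagonal(n) for n in range(1, 11)]
print(first_ten)  # [1, 6, 15, 28, 45, 66, 91, 120, 153, 190]
```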
In the proposed scheme, the initial labeling of the generated nodes is based on the hexagonal pattern approach. Each node has a nodeID that contains a path (from the root to the last component) [25]. The selfLabel and prefixLabel of a node denote its last component and its prefix components, respectively. The proposed scheme based on the hexagonal pattern can be represented by (n, h), where n corresponds to the label used by the prefix-based labeling scheme (i.e., the original number) and h is the hexagonal function that generates the hexagonal number. An overview of the architecture of the proposed labeling scheme is shown in Figure 6. The initial labeling and the handling of updates in the proposed scheme are described in the following two subsections.
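Equation (2), the original conversion function used later by Algorithm 7, is referred to but not reproduced in this section. Assuming it is the inverse of Equation (1), it follows from solving the quadratic 2n^2 - n - h = 0 for its positive root:

```latex
h = n(2n - 1) \iff 2n^{2} - n - h = 0
\quad\Longrightarrow\quad
n = \frac{1 + \sqrt{1 + 8h}}{4}, \qquad n \ge 1
```

Under this assumption, a positive integer x is hexagonal exactly when 1 + 8x is a perfect square r^2 and (1 + r)/4 is an integer. For example, x = 66 gives 1 + 8x = 529 = 23^2 and n = (1 + 23)/4 = 6, whereas x = 4 gives 1 + 8x = 33, which is not a square, so 4 is not hexagonal.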

A. INITIAL LABELING
The initial labeling in the proposed scheme is based on the Dewey order labeling scheme. Each label is a series of numbers separated by dots that represents a distinct path from the root, which carries a non-zero label, to the node itself. However, the proposed scheme differs from Dewey order labeling in that the self-labels of child nodes are non-zero, non-hexagonal numbers. As shown in Figure 7, the root node is given the label "1" and acts as the parent label. Child node labels are series of numbers separated by dots. For example, if node M has the label m1.m2. … .mx in the XML tree, its children are assigned the labels m1.m2. … .mx.i, where i must be a non-hexagonal number; since 1 is a hexagonal number, i cannot start at 1. The label "1.2" is assigned to the first child of the root node, and the labels "1.3", "1.4", "1.5", and "1.7" are assigned to the next children, respectively ("1.6" is skipped because 6 is a hexagonal number). The label "1.3.2" is assigned to the first child of node "1.3", and so on. The aim is to reserve the hexagonal numbers so that they can be used to support XML updates, especially when new nodes are inserted; hence, the relabeling of nodes is avoided.
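The child-numbering rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: `is_hexagonal` tests membership via the inverse of Equation (1), and `child_labels` is a hypothetical helper that walks upward from 1 and keeps only non-hexagonal self-labels.

```python
import math


def is_hexagonal(x: int) -> bool:
    """True if x = n(2n - 1) for some integer n >= 1 (inverse of Equation (1))."""
    if x < 1:
        return False
    r = math.isqrt(8 * x + 1)
    return r * r == 8 * x + 1 and (1 + r) % 4 == 0


def child_labels(parent: str, count: int) -> list:
    """Initial labels for `count` children of `parent`; hexagonal self-labels
    (1, 6, 15, 28, ...) are skipped so they stay free for later insertions."""
    labels, candidate = [], 1
    while len(labels) < count:
        if not is_hexagonal(candidate):
            labels.append(f"{parent}.{candidate}")
        candidate += 1
    return labels


print(child_labels("1", 5))    # ['1.2', '1.3', '1.4', '1.5', '1.7']
print(child_labels("1.3", 1))  # ['1.3.2']
```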

Definition 1.
Suppose there are two node labels, M: m1.m2.m3. … .mx and N: n1.n2.n3. … .ny, and the subtree rooted at M contains N. If there is a connected path of nodes from M down to N, where M's self-label is mx, N's self-label is ny, x < y, and mi = ni for every i = 1, …, x, then M is an ancestor of N and N is a descendant of M.
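Definition 1 amounts to a component-wise prefix test, sketched below in Python under the assumption that labels are dotted strings of integers. The helper names `parse` and `is_ancestor` are hypothetical. Comparing integer components rather than raw strings matters: "1.3" is a string prefix of "1.31.2" but not an ancestor of it.

```python
def parse(label: str) -> tuple:
    """Split a dotted label such as "1.3.2" into integer components (1, 3, 2)."""
    return tuple(int(part) for part in label.split("."))


def is_ancestor(m: str, n: str) -> bool:
    """Definition 1: M is an ancestor of N when M's components form a proper
    prefix of N's components (x < y and m_i = n_i for i = 1..x)."""
    a, b = parse(m), parse(n)
    return len(a) < len(b) and b[: len(a)] == a


print(is_ancestor("1.3", "1.3.2.7"))  # True
print(is_ancestor("1.3", "1.31.2"))   # False: the second components differ
```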

Definition 2.
Suppose there are two node labels, M: m1.m2.m3. … .mx and N: n1.n2.n3. … .ny. If M and N are directly connected in an XML tree and M appears exactly one level above N (i.e., x = y - 1 and mi = ni for every i = 1, …, x), then M is the parent of N and N is a child of M.

Definition 3.
Suppose there are two sibling node labels, M: m1.m2.m3. … .mx and N: n1.n2.n3. … .ny. Both nodes are on the same level and have the same parent in the XML tree (i.e., x = y, mi = ni for every i = 1, …, x - 1, and mx ≠ ny). In an ordered XML tree, if node M appears to the left of N, M is called a pre-order (preceding) sibling of N, whereas N is called a post-order (following) sibling of M.
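The parent and sibling conditions of Definitions 2 and 3 can likewise be checked directly on labels. The Python sketch below is illustrative only (the names `parse`, `is_parent`, and `is_sibling` are hypothetical); it tests the relationships themselves and does not attempt to decide which of two siblings comes first.

```python
def parse(label: str) -> tuple:
    """Split a dotted label into its integer components."""
    return tuple(int(part) for part in label.split("."))


def is_parent(m: str, n: str) -> bool:
    """Definition 2: x = y - 1 and M's components equal the first x components of N."""
    a, b = parse(m), parse(n)
    return len(a) == len(b) - 1 and b[:-1] == a


def is_sibling(m: str, n: str) -> bool:
    """Definition 3: same length, same parent prefix, different self-labels."""
    a, b = parse(m), parse(n)
    return len(a) == len(b) and a[:-1] == b[:-1] and a[-1] != b[-1]


print(is_parent("1.3", "1.3.2"))   # True
print(is_sibling("1.2", "1.66"))   # True
print(is_sibling("1.2", "1.3.2"))  # False: different levels
```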
In the proposed scheme, Algorithm 1 is used to produce the initial labels. In its first two lines, the label "1" is assigned to the root. The first child is then assigned the label "1.2", formed as prefixLabel(M) || selfLabel(M). Because 1 is a hexagonal number, selfLabel(M) cannot begin with 1. The labels of the remaining child nodes are generated by incrementing the self-label, and hexagonal numbers are skipped at this stage.
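Algorithm 1 is not reproduced in this section, so the following Python function is only a sketch of the behavior described above, applied to a real XML document parsed with the standard `xml.etree.ElementTree` module. The function name `label_tree` and the label-to-tag dictionary it returns are our assumptions. It traverses the tree depth-first, gives the root "1", and gives each child its parent's label followed by the next non-hexagonal self-label.

```python
import math
import xml.etree.ElementTree as ET


def is_hexagonal(x: int) -> bool:
    """True if x = n(2n - 1) for some integer n >= 1."""
    if x < 1:
        return False
    r = math.isqrt(8 * x + 1)
    return r * r == 8 * x + 1 and (1 + r) % 4 == 0


def label_tree(root: ET.Element) -> dict:
    """Initial labeling in the spirit of Algorithm 1; returns {label: tag}."""
    labels = {}

    def visit(elem: ET.Element, label: str) -> None:
        labels[label] = elem.tag
        # Start at 1 (itself hexagonal and reserved), so the first child gets 2.
        self_label = 1
        for child in elem:
            self_label += 1
            while is_hexagonal(self_label):  # skip reserved hexagonal numbers
                self_label += 1
            visit(child, f"{label}.{self_label}")

    visit(root, "1")
    return labels


doc = ET.fromstring(
    "<book><title/><author><name/></author><year/><price/><isbn/></book>"
)
for label, tag in label_tree(doc).items():
    print(label, tag)
```

For this small document the sketch assigns 1 (book), 1.2 (title), 1.3 (author), 1.3.2 (name), 1.4 (year), 1.5 (price), and 1.7 (isbn), with 1.6 left free. Recursion keeps the sketch short; a very deep document would call for an explicit stack instead.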

B. HANDLING XML UPDATES
This section addresses how a dynamic labeling scheme can handle XML updates, especially insertions, without relabeling existing nodes; the proposed labeling scheme avoids relabeling existing nodes entirely. Labeling schemes generally consider four cases of node insertion [34,46,78]. These four cases (see Section III) are handled in the proposed labeling scheme as follows:
• insertLeftmost(X, M, N, L): inserting a new node M before the first (leftmost) sibling node N in X, with label L = parentLabel(M) || selfLabel(M).
• insertRightmost(X, N, M, L): inserting a new node M after the last (rightmost) sibling node N in X, with label L = parentLabel(M) || selfLabel(M).
• insertLeaf(X, M, N, L): inserting a new node M as a child of a leaf node N in X, with label L = parentLabel(M) || selfLabel(M).
• insertSibling(X, M, P, N, L): inserting a new node P between two sibling nodes M and N in X, with label L = parentLabel(P) || selfLabel(P).
The four cases of insertion above are implemented by the following algorithms. Let M and N be two nodes, where node M is labeled (m1.m2. … .mx) and node N is labeled (n1.n2. … .ny). Algorithm 2 inserts a new node (such as M) before the leftmost sibling (such as N). This is done by reducing the self-label of the leftmost sibling N by 1, as shown in line 02, and then applying Equation (2) to check whether the resulting candidate self-label of M is a hexagonal number; if it is, the self-label of N is reduced by 1 again. The parent label of M is then concatenated with the new self-label of M to generate the new label. For instance, if the leftmost sibling is labeled m1.m2. … .mx, the generated label is m1.m2. … .(mx-1); however, if (mx-1) is a hexagonal number, the generated label is m1.m2. … .(mx-2).
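Algorithm 2 itself is not shown here, so the Python function `insert_leftmost` below is a hypothetical sketch of the steps just described. It assumes that Equation (2) acts as a hexagonality test and that labels are dotted strings. The text does not say what happens when the self-label would drop below zero; this sketch does not handle that case, so repeated leftmost insertions would simply produce negative components.

```python
import math


def is_hexagonal(x: int) -> bool:
    """True if x = n(2n - 1) for some integer n >= 1 (inverse of Equation (1))."""
    if x < 1:
        return False
    r = math.isqrt(8 * x + 1)
    return r * r == 8 * x + 1 and (1 + r) % 4 == 0


def insert_leftmost(leftmost: str) -> str:
    """Sketch of Algorithm 2: label for a new node placed before the leftmost sibling."""
    *prefix, self_label = leftmost.split(".")
    candidate = int(self_label) - 1
    # Hexagonal numbers are reserved, so step down again if we land on one.
    while is_hexagonal(candidate):
        candidate -= 1
    # No special handling below zero (not specified in the text).
    return ".".join(prefix + [str(candidate)])


print(insert_leftmost("1.2"))   # 1.0  (2 - 1 = 1 is hexagonal)
print(insert_leftmost("1.16"))  # 1.14 (15 is hexagonal)
```

A `while` loop is used instead of a single extra decrement, but because no two non-negative integers in the sequence 1, 6, 15, 28, … are adjacent, it never steps down more than once, matching the text.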
Algorithm 3 inserts a new node (such as M) after the rightmost sibling (such as N). This is done by incrementing the self-label of the rightmost sibling N by 1, as shown in line 02, and then applying Equation (2) to check whether the resulting candidate self-label of M is a hexagonal number; if it is, the self-label of N is incremented by 1 again. The parent label of M is then concatenated with the new self-label of M to generate the new label. For instance, if the rightmost sibling is labeled m1.m2. … .mx, the generated label is m1.m2. … .(mx+1); however, if (mx+1) is a hexagonal number, the generated label is m1.m2. … .(mx+2).
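Algorithm 3 mirrors Algorithm 2 in the upward direction. The following Python sketch is again hypothetical (`insert_rightmost` is our name) and assumes Equation (2) is used as a hexagonality test.

```python
import math


def is_hexagonal(x: int) -> bool:
    """True if x = n(2n - 1) for some integer n >= 1 (inverse of Equation (1))."""
    if x < 1:
        return False
    r = math.isqrt(8 * x + 1)
    return r * r == 8 * x + 1 and (1 + r) % 4 == 0


def insert_rightmost(rightmost: str) -> str:
    """Sketch of Algorithm 3: label for a new node placed after the rightmost sibling."""
    *prefix, self_label = rightmost.split(".")
    candidate = int(self_label) + 1
    # Skip reserved hexagonal numbers.
    while is_hexagonal(candidate):
        candidate += 1
    return ".".join(prefix + [str(candidate)])


print(insert_rightmost("1.4"))  # 1.5
print(insert_rightmost("1.5"))  # 1.7 (6 is hexagonal)
```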
Algorithm 4 inserts a child into a leaf node, such as M; the label of the new child is produced by concatenating the label of M with the self-label "2". For instance, if M is labeled m1.m2. … .mx, the new child is labeled m1.m2. … .mx.2.
Algorithm 5 handles four sub-cases of inserting a node P between two siblings M and N. First, if the self-labels of M and N are both non-hexagonal numbers, the self-label of P is obtained by adding the self-label of M to the self-label of N and then applying the hexagonal conversion function (Algorithm 6 and Equation (1)) to the sum. The parent label of P is concatenated with the new self-label of P to generate the new label. For instance, suppose the left sibling is labeled M: m1.m2. … .mx and the right sibling is labeled N: n1.n2. … .ny, where M and N share the same parent label and mx and ny are non-hexagonal numbers; Algorithm 6 applies the hexagonal conversion function to (mx + ny) to obtain the self-label of P, and the new label is the parent label concatenated with this self-label.
Second, if the self-labels of M and N are both hexagonal numbers, the original conversion function (Algorithm 7 and Equation (2)) is applied to each of them, and the two resulting values are added together. This sum is converted by the hexagonal conversion function (Algorithm 6 and Equation (1)) to obtain the self-label of P, and the new label is produced by concatenating the parent label of P with this self-label. For instance, suppose the left sibling is labeled M: m1.m2. … .mx and the right sibling is labeled N: n1.n2. … .ny, where M and N share the same parent label and mx and ny are hexagonal numbers; Algorithm 7 converts mx and ny to their original values mx' and ny', and the hexagonal conversion function is applied to mx' + ny' to obtain the new self-label. The new label is then the parent's prefix label concatenated with the new self-label.
Third, if the self-label of M is a hexagonal number and the self-label of N is a non-hexagonal number, the original conversion function (Algorithm 7 and Equation (2)) is applied to the self-label of M, and the resulting value is added to the self-label of N. The hexagonal conversion function is then applied to the sum to produce the new self-label, and the parent label of P is concatenated with this self-label to generate the new label. For instance, suppose the left sibling is labeled M: m1.m2. … .mx and the right sibling is labeled N: n1.n2. … .ny, where M and N share the same parent label, mx is a hexagonal number, and ny is a non-hexagonal number; Algorithm 7 converts mx to its original value mx', and the hexagonal conversion function is applied to mx' + ny to obtain the new self-label. The new label is then the parent's prefix label concatenated with the new self-label.

Fourth, if the self-label of M is a non-hexagonal number and the self-label of N is a hexagonal number, the original conversion function (Algorithm 7 and Equation (2)) is applied to the self-label of N, and the resulting value is added to the self-label of M. The hexagonal conversion function is then applied to the sum to produce the new self-label, and the new label is produced by concatenating the parent label of P with this self-label. For example, given the left sibling M: m1.m2. … .mx and the right sibling N: n1.n2. … .ny, where M and N share the same parent label, mx is a non-hexagonal number, and ny is a hexagonal number, Algorithm 7 converts ny to its original value ny', and the hexagonal conversion function is applied to mx + ny' to obtain the new self-label. The new label is then produced by concatenating the parent's prefix label with the new self-label.
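All four sub-cases reduce to one rule: replace each hexagonal self-label by its original number, leave non-hexagonal self-labels unchanged, add the two values, and apply Equation (1). The Python sketch below expresses that rule; it assumes Equation (2) is the inverse of Equation (1), and the names `original`, `hexagonal`, and `insert_between` are hypothetical stand-ins for Algorithms 7, 6, and 5.

```python
import math
from typing import Optional


def original(x: int) -> Optional[int]:
    """Assumed Equation (2): return n if x = n(2n - 1) for some n >= 1, else None."""
    if x < 1:
        return None
    r = math.isqrt(8 * x + 1)
    if r * r == 8 * x + 1 and (1 + r) % 4 == 0:
        return (1 + r) // 4
    return None


def hexagonal(n: int) -> int:
    """Equation (1): h(n) = n(2n - 1)."""
    return n * (2 * n - 1)


def insert_between(left: str, right: str) -> str:
    """Sketch of Algorithm 5: label for a node P inserted between siblings M and N."""
    *prefix_m, m_self = left.split(".")
    *prefix_n, n_self = right.split(".")
    if prefix_m != prefix_n:
        raise ValueError("M and N must share the same parent label")
    total = 0
    for self_label in (int(m_self), int(n_self)):
        n = original(self_label)
        # Hexagonal self-label -> its original number (Algorithm 7);
        # non-hexagonal self-label -> used unchanged.
        total += n if n is not None else self_label
    # Hexagonal conversion of the sum (Algorithm 6, Equation (1)).
    return ".".join(prefix_m + [str(hexagonal(total))])


# Nodes b, c, and d of Figure 8.
print(insert_between("1.2", "1.4"))     # 1.66
print(insert_between("1.66", "1.4"))    # 1.190
print(insert_between("1.66", "1.190"))  # 1.496
```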
Figure 8 illustrates all four cases of node insertion in the proposed approach. New nodes inserted into the XML tree are represented by dashed circles and lines, and the inserted nodes are identified in alphabetical order by the letters inside the dashed circles. Node a is inserted before the leftmost child: the leftmost sibling is labeled 1.2, and since 2 - 1 = 1 is a hexagonal number, the new label is 1.0. Node b is inserted between the siblings 1.2 and 1.4, whose self-labels are both non-hexagonal, so its label is 1.h(2 + 4) = 1.h(6) = 1.66. Node c is inserted between 1.66 and 1.4, whose self-labels are hexagonal and non-hexagonal, respectively; the original number of 66 is 6, so the label of c is 1.h(6 + 4) = 1.h(10) = 1.190. Node d is inserted between two hexagonal self-labels, 1.66 and 1.190, whose original numbers are 6 and 10, so its label is 1.h(6 + 10) = 1.h(16) = 1.496. Node e is inserted after the rightmost sibling, which is labeled 1.4; its label is 1.5, since 4 + 1 = 5 is not a hexagonal number and therefore does not need to be incremented again. Node f is inserted as a child of node 1.2, and its label is 1.2.2.

A. EXPERIMENTAL SETUP
A set of experiments was carried out to evaluate the proposed labeling scheme by comparing it with the dynamic Dewey encoding (DDE) scheme, the dynamic prefix labeling scheme (DPLS), and the ORDPATH scheme. The experiments covered both the initial labeling and the handling of XML updates (in the case of insertions). The schemes were compared in terms of labeling time (in milliseconds) and label size (in KB). The proposed labeling scheme and the compared schemes were all implemented in Java and run on the same experimental platform: Eclipse IDE 4.18.0 with the Java platform (JDK 8). All the experiments were run on the same machine, with an Intel Core i7 processor, 8 GB of main memory, and a 64-bit Windows 10 operating system.

B. EXPERIMENTAL DATASETS
The experiments were conducted on different XML datasets [86] in order to test the schemes on a variety of XML trees that differ in number of nodes, file size, and depth; such variety is important for evaluating the scalability of the schemes. The following XML datasets were employed in this study: Digital Bibliography Library Project (DBLP), XMark, TreeBank, NASA, Sigmod, Ebay, and UWM.
The DBLP dataset is a massive XML file that contains data about computer science journals, conferences, series, and books. It is used by many XML database systems and was chosen for this study because it provides a wide range of sibling nodes, with a maximum breadth of 328,858 [87]. The XMark dataset is the best-known XML data management benchmark [88,89]. XMark is a scalable document database with a deep recursive ancestor structure and a large file size of 111 MB; its total number of descendant nodes is 25,500, with a depth of 12 and a varying breadth at each level. The TreeBank dataset was generated by the University of Pennsylvania's Department of Computer and Information Science; it has a size of 82 MB and a maximum breadth of 144,493 [86]. The NASA dataset, which was converted from a flat file format by a NASA XML project, provides reliable astronomical data; NASA's XML file is 23 MB in size and has a maximum breadth of 80,396. The other three datasets are the Sigmod dataset, which is commonly used to analyze and examine small XML documents [86]; the Ebay dataset, which contains auction data converted to XML from Web sources; and the UWM dataset, which stores data about university courses derived from university websites [86]. These XML datasets are summarized in Table II.

VI. EXPERIMENTAL RESULTS AND DISCUSSION
Five experiments were carried out to compare the performance of the proposed labeling scheme against that of the DDE, DPLS, and ORDPATH schemes. The first experiment measured the initial labeling process of each scheme in terms of labeling time and label size. Each scheme was executed on each dataset many times, because it has been recommended that a scheme be run at least 10 times [90,91], although at least 30 runs have also been suggested for more accurate results [92]. In these experiments, the first three runs were excluded so that warm-up effects, such as cold caches, would not distort the results.
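The run protocol described above (for the first experiment, 13 runs with the first three discarded) can be sketched as a small timing harness. The paper's implementation was in Java and its harness is not shown; the Python function `benchmark` below is hypothetical, and averaging the retained runs with the arithmetic mean is our assumption, since the text does not say how the remaining timings were aggregated.

```python
import statistics
import time
from typing import Callable


def benchmark(task: Callable[[], object], runs: int = 13, warmup: int = 3) -> float:
    """Run `task` `runs` times, drop the first `warmup` timings, and return
    the mean of the remaining ones in milliseconds."""
    if runs <= warmup:
        raise ValueError("runs must exceed warmup")
    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        task()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(timings_ms[warmup:])


print(f"{benchmark(lambda: sum(range(100_000))):.3f} ms")
```

With the defaults, ten timings contribute to the result; `runs=33` would give the 30 measured runs suggested in [92].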
The first experiment assessed the initial labeling time and label size. As shown in Figure 9, the proposed scheme outperformed the other schemes (i.e., DDE, DPLS, and ORDPATH) on all datasets (see Table II), because it took less time to label the documents, particularly as the XML documents became larger and deeper. As mentioned above, an adequate number of runs is necessary to obtain significant results; therefore, all schemes were run 13 times, and the first three runs were ignored when measuring the time needed to generate the initial labels for the different datasets. In terms of initial label size, the results in Figure 10 show no substantial difference between the proposed scheme, DPLS, and DDE. This is because DPLS and DDE use the same procedure to produce the initial labels, and the proposed scheme differs only in that it skips hexagonal numbers at this stage. In addition, label size is expected to grow as XML documents become larger. As a result, the initial label sizes of the proposed scheme, DPLS, and DDE are nearly the same for the DBLP, XMark, Treebank, NASA, UWM, Sigmod, and Ebay datasets, as shown in Figure 10. The results in the figure also indicate that the ORDPATH scheme produces the largest initial labels of all the schemes.
Because the effectiveness of dynamic labeling schemes shows up mainly during consecutive sibling insertions [34,42], the second experiment evaluated random skewed insertions, in which new nodes are frequently inserted between two consecutive sibling nodes. The proposed scheme was compared with the DDE, DPLS, and ORDPATH schemes in terms of labeling time and label size, because all the compared schemes are designed to support dynamic labeling of XML data. According to the labeling time measurements, the proposed scheme achieved the fastest labeling time when handling random skewed node insertions, as shown in Figure 11. The proposed scheme handled many insertions between two siblings efficiently, indicating that its performance is reliable under this workload. DPLS had the slowest labeling time when handling random skewed node insertions because it uses a fraction fragment (i.e., a floating-point number) to represent the last component of the label for each insertion. As regards label size, the proposed scheme also outperformed the other schemes, especially when a large number of skewed nodes were inserted at random positions, as shown in Figure 12. The label size of newly inserted nodes in DPLS was the largest among the compared schemes.
The remaining three experiments evaluated the performance of the proposed labeling scheme against the other schemes when handling XML updates in three different cases: the insertion of nodes before the leftmost sibling, the insertion of nodes after the rightmost sibling, and the insertion of leaf nodes. In these experiments, labeling time and label size were measured as the number of insertions increased. Each experiment was executed individually at least 10 times for each scheme [90,91], and the first three runs were excluded to discount warm-up effects such as cold caches. In terms of labeling time, the results in Figures 13, 15, and 17 show that the proposed scheme outperformed the other schemes (i.e., DDE, DPLS, and ORDPATH) in handling XML updates in all three cases, especially as the number of insertions increased in large XML documents with high depth. The other schemes had the highest labeling times when handling large numbers of insertions in these three cases, because they allow only a limited number of insertions and do not support dynamic updates efficiently [23,25,83].
Regarding label size, the results in Figures 14 and 16 show that the proposed scheme outperformed the other schemes in handling XML updates (i.e., as the number of insertions increased) in two cases: the insertion of nodes before the leftmost sibling and the insertion of nodes after the rightmost sibling, respectively. It consistently generated more compact labels than the other schemes, leading to reduced storage costs, especially as the number of insertions increased in large XML documents with high depth. The results also indicate that the DDE and DPLS schemes produced the same label sizes because they use the same procedure for updates. Moreover, ORDPATH produced the largest labels of all the schemes because it uses even and negative integers to label inserted nodes, which leads to large labels as the number of insertions increases in large XML documents with high depth. However, the results in Figure 18 show only a negligible difference between the labeling schemes for leaf node insertions, because they all use the same procedure to label new leaf nodes.

VII. CONCLUSION
In this study, an efficient prefix-based labeling scheme that uses a hexagonal pattern to support dynamic updates in XML documents was proposed. The proposed labeling scheme builds on prefix-based labeling, and according to the experimental results, it improves on the performance of the DDE scheme and DPLS. The proposed scheme prevents node relabeling when XML documents are updated at random locations, avoids duplicated labels by creating a new label for every inserted node, and reduces the size and time costs of the updated labels. In addition, the proposed labeling scheme was evaluated against other prefix-based labeling schemes in terms of label size and labeling time, and in terms of its ability to handle different types of updates. Five experiments were conducted to compare the effectiveness and scalability of the proposed scheme in handling initial labeling and XML updates (insertions) with those of the DDE scheme, DPLS, and the ORDPATH scheme.
The first experiment assessed labeling time and label size for XML datasets of different types and sizes. It indicated that the proposed scheme outperformed the other schemes on most of the XML datasets because it took less time to label the documents, particularly when the XML documents were larger and deeper, as in the DBLP, XMark, and Treebank datasets. The results also indicated that the initial labels produced by the proposed scheme, the DDE scheme, and DPLS were almost the same size. This is because the DDE scheme and DPLS use the same procedure to produce the initial labels, and the proposed scheme merely skips hexagonal numbers during the initial labeling stage.
The second experiment measured random skewed insertions between two consecutive sibling nodes. Its results indicated that the proposed scheme outperformed the other schemes in terms of both labeling time and label size: it consistently produced smaller labels than the other schemes, leading to reduced storage costs, and it continued to outperform them when a large number of randomly skewed nodes were inserted.
The remaining three experiments evaluated the performance of the proposed labeling scheme against the other schemes when handling XML updates in three different cases: the insertion of nodes before the leftmost sibling, the insertion of nodes after the rightmost sibling, and the insertion of leaf nodes. In terms of labeling time, the experiments indicated that the proposed scheme outperformed the other schemes in all three cases as the number of insertions increased. In terms of label size, the proposed scheme outperformed the other schemes in two cases, namely the insertion of nodes before the leftmost sibling and the insertion of nodes after the rightmost sibling, generating more compact labels and thus reducing storage costs, especially when the XML documents were larger and deeper. However, the results also show a negligible difference between the labeling schemes for the insertion of leaf nodes.
The contributions of the research presented in this paper can be summarized as follows:
1) Existing XML labeling schemes of different types, such as interval-based, prefix-based, and other labeling schemes, were reviewed and discussed with respect to handling XML updates.
2) The limitations of the existing XML labeling schemes in handling XML updates were presented and summarized in terms of labeling time and label size.
3) An efficient prefix-based labeling scheme that uses a hexagonal pattern to support dynamic updates in XML documents was proposed. The advantages of the proposed labeling scheme are that (i) it avoids node relabeling when XML documents are updated at random locations, (ii) it avoids duplicated labels by creating a new label for every inserted node, (iii) it reduces the size and time costs of the updated labels, and (iv) it supports dynamic updates.
4) The proposed labeling scheme was implemented to maintain the structural information among nodes for four different insertion cases, namely, insertion before the leftmost sibling, insertion after the rightmost sibling, insertion between two siblings, and insertion of a child into a leaf node.
5) The proposed labeling scheme was evaluated against existing labeling schemes in terms of label size, labeling time, and its ability to handle several insertions in XML documents for four different insertion cases.
Based on the findings of this study, it is hoped that the labeling scheme proposed in this paper will help commercial organizations exchange and manage their data on the Web efficiently in terms of both label size and labeling time. The scheme also serves as a basis upon which researchers can build to further improve the efficiency of labeling schemes that support dynamic XML updates and to extend them into new research areas. In future work, we aim to investigate the reuse of deleted labels. Another possible direction for future work is to consider the syntactic and semantic relatedness of node labels when designing new labeling schemes.