Reducing Multiple Occurrences of Meta-Mark Selection in Relational Data Watermarking

Contrary to multimedia data watermarking approaches, relational data watermarking techniques should not select the marks in the watermark, or the embedding locations in the protected digital asset, sequentially. Indeed, given that the elements of database relations, i.e., tuples and attributes, have no fixed order, watermark detection in techniques based on sequential processes can be easily compromised by subset reverse order attacks. As a result, attackers can obtain high-quality data free of ownership evidence, since the malicious operation succeeds without any data modification for mark removal. A standard solution to this problem has been pseudo-random selection, which often leads to choosing the same marks multiple times and ignoring others, thus compromising the embedding of the entire watermark. This work proposes an engine that contributes to controlling marks' recurrent selection, allowing marks excluded by previous approaches to be considered and detected with 100% accuracy. The experiments performed show a dramatic improvement of the embedded watermark quality when the proposed engine is included in the architecture of watermarking techniques. They also provide evidence that this proposal leads to higher resilience against common malicious operations such as subset and superset attacks.


I. INTRODUCTION
Digital watermarking has become popular for providing evidence of ownership [1]-[3] or data tampering [4], [5], among others, and it allows the benefits of data spreading on the internet while protecting intellectual property. First proposed as a field of information hiding techniques to secure multimedia data, watermarking approaches were also defined for relational data, considering the growth of web-based services that need to store and share large amounts of information. Two main processes characterize watermarking techniques [6]: watermark embedding and watermark verification. Verification is performed when piracy or false ownership claims over protected data, among other threats, are suspected.
The design and implementation of the watermark insertion and extraction actions highly depend on the nature of the protected data. For instance, multimedia data consist of a sequence of bits that offers high watermark embedding coverage. In contrast, relational databases do not present data redundancy, which decreases the chances for watermark coverage on database relations. In terms of update frequency, multimedia data are not meant to undergo frequent updates, except for special applications. For this reason, any hidden information, e.g., the watermark, on multimedia objects is expected to endure. Relational data, instead, are updated daily. Operations such as insertion, deletion, and update of tuples degrade the watermark signal over time. Also, multimedia data are meant to be perceived by human systems, such as the visual (HVS) or the auditory (HAS). The limitations of those systems offer a high coverage to hide information, e.g., changes caused by embedding marks in frequencies outside the 20 Hz to 20 kHz interval of an audio file are overlooked by the HAS. By contrast, data stored in relational databases are meant to be interpreted with the help of programming layers that implement business rules and present the generated information in a graphical user interface (GUI) or in digital reports. Therefore, any distortion caused in relational data will lead to decision-making based on distorted information. Finally, the binary stream of multimedia data cannot be reordered without affecting the perception of the multimedia object. This allows the use of sequential watermark embedding over the spatial domain of multimedia objects, i.e., the selection of positions to embed the marks following the order of the bits in the binary stream that constitutes the digital asset.
The associate editor coordinating the review of this manuscript and approving it for publication was Ahmed Farouk.
In comparison, attributes and tuples in a relation do not have a fixed order, so sequential embedding is not recommended. The order in which the database elements are presented serves purely human comprehension of the database structure. For that reason, reordering the tuples and/or attributes of a relation compromises neither the results of the queries performed over the data nor their usability. Therefore, relational data watermarking techniques implementing sequential synchronization compromise watermark detection if the protected data are reordered, e.g., hit by a subset reverse order attack [6], which requires no updates to the data. This happens because sequential watermark synchronization embeds and extracts the watermark signal following the order of the marks in the watermark source and of the embedding positions in the protected digital asset. In this way, the attacker can cast reasonable doubt on ownership claims while preserving data quality.
To avoid sequentiality in the watermarking of relational data, some techniques use pseudo-random selection for picking the marks from the generated watermark and for selecting their embedding place in the relation, e.g., Sardroudi & Ibrahim [7], Pérez Gort et al. [1], Hu et al. [8]. This approach successfully provides robustness against subset reverse order attacks, but once applied, another problem emerges: some marks can be recurrently selected during the embedding, while others are entirely ignored. Embedding the same mark multiple times helps to overcome the effects of benign updates and update-based attacks on the watermark signal, provided that majority voting is performed during the watermark extraction process. However, the recurrent selection of some marks comes at the expense of a partial embedding of the watermark. This compromises the insertion of a comprehensible watermark signal or results in the embedding of a weak signal, so that even low-degree attacks can compromise watermark detection before compromising data quality.
This work aims to make the number of times each mark is selected as uniform as possible, allowing previously excluded marks to be considered and improving the quality of the embedded watermark.

A. ARCHITECTURE OF RELATIONAL DATABASE WATERMARKING
In the following, we address the protection of a single relation in a relational database. Watermarking techniques for relational databases can be classified in several ways, for instance, by the type of watermark information to embed, by the type of attributes into which the marks are embedded, or by whether the watermark distorts the data or not [9]. In particular, techniques that modify the relation content for its protection are defined as distortion-based, whereas the others are defined as distortion-free. For distortion-based watermarking techniques, marks are first generated and then embedded into a relation with the condition that the embedding will not compromise data usability.
We introduce the definition of meta-mark, consisting of a bit extracted from the watermark source, which is used to generate the mark being embedded. Notice that previous works do not make a distinction between meta-marks and marks. This distinction is highly relevant, considering that the recurrent selection occurs with the meta-marks. Fig. 1 highlights the steps generally characterizing the embedding and extraction phases of a distortion-based watermarking technique featuring pseudo-random selection. The embedding phase is an iterative process. Given a relation $R$ in a relational database $DB$, a watermark source $S$, and a set of parameters $P$, each iteration $s$ pseudo-randomly selects the so-called meta-mark $m_k$ from $S$. The meta-mark $m_k$ is combined with a bit of the attribute value being watermarked in the database to generate the mark $m_l$ to be embedded.
VOLUME 10, 2022
Also, in the same iteration $s$, the location to embed the mark $m_l$ in $R$ is computed. Once $m_l$ is generated, it is inserted into the chosen location of $R$.¹ After the mark of stage $s$ is embedded, we enter stage $s + 1$, and the process proceeds with the generation of the next mark and the selection of the next embedding position. Thus, the embedding process can be expressed by the function $I(R, S, P) = R_W$, where $R_W$ denotes the resulting watermarked relation. The embedding ends when all attributes and tuples of $R$ have been considered according to the locomotion rules defined for the watermark synchronization. The locomotion rules establish how to traverse the tuples and attributes of $R$ to find places suitable for mark embedding or extraction, according to the parameter values used by the embedding and extraction processes. In this work, AHK algorithm principles are applied (cf. Section II).
The extraction phase is given by the function $E(R'_W, P') = S'$, where $R'_W$ denotes the copy of the protected relation that may have been distorted by benign updates or malicious attacks after the watermarked relation is deployed, $P'$ is the set of parameters used for the extraction phase, and $S'$ is the watermark source obtained from $R'_W$. For the watermark extraction, the number of parameters and their values must be the same as those used during the watermark insertion, i.e., $P' = P$. The extraction phase is an iterative process too. Indeed, a mark is detected and extracted at each iteration, and the corresponding meta-mark is generated. Several values for the same meta-mark can be extracted. After all the iterations are performed, majority voting is carried out for the watermark source reconstruction, and the final meta-mark of each position is produced. Once the meta-marks are collected, the watermark source is reconstructed by the ''Source Assembly'' step. The watermark source construction is the only sub-process of the watermark extraction that does not occur inside the iterations.
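The majority voting step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `majority_vote`, the representation of extracted values as `(position, bit)` pairs, and the choice to leave ties and never-extracted positions undecided (`None`) are all assumptions made for the example.

```python
from collections import Counter

def majority_vote(extracted, n):
    """Rebuild a watermark source of n meta-marks from extracted values.

    extracted: iterable of (k, bit) pairs, where k is the position of the
    meta-mark in S and bit is the value recovered at one embedding place.
    """
    votes = [Counter() for _ in range(n)]
    for k, bit in extracted:
        votes[k][bit] += 1

    source = []
    for counter in votes:
        if not counter:
            source.append(None)   # meta-mark never extracted
        elif counter[0] == counter[1]:
            source.append(None)   # tie: left undecided (assumption)
        else:
            source.append(0 if counter[0] > counter[1] else 1)
    return source

# Position 1 is recovered as 0 three times and as 1 once, so it is set to 0.
pairs = [(1, 0), (1, 0), (1, 1), (1, 0), (0, 1), (2, 1), (2, 1)]
reconstructed = majority_vote(pairs, n=4)
```

Position 3 is never extracted in this toy input, which mirrors the case of a meta-mark that the embedding ignored.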

B. PAPER CONTRIBUTION
This paper proposes the so-called Recurrence Reduction Engine (RRE) to limit the maximum number of times each meta-mark can be considered during watermark embedding (cf. Fig. 2) while preserving the pseudo-randomness property of the process. In particular, we add a ''Probability Box'' to generate transition probabilities for meta-marks selection, according to the following rule: at stage $s + 1$, neither the meta-mark selected by $s$ nor the one chosen by $s - 1$ can be used to generate the mark to embed. By adding this, we increase the chances of other meta-marks being selected. Furthermore, a ''Chaos Generator'' is included to guarantee the same order of meta-marks selection during the embedding and the extraction processes, so that watermark synchronization is not compromised. The functioning of our proposal does not depend on the watermarking technique.
¹ The iteration stage and the meta-mark identifier $k$ are not coincident since the number of meta-marks is constrained to the watermark source's length, and the number of generated embedding places gives stages (cf. Section II).
The experiments performed to validate our work show how the quality of the embedded watermark improves due to considering more marks in the embedding. Despite the reduction of recurrent selection, most marks are still embedded multiple times, which, combined with majority voting during the extraction process, contributes to the resilience of the watermark against update-based malicious operations. By avoiding mark exclusion during the embedding and using meaningful watermark sources, the watermark can be restored by applying enhancement algorithms.²

C. PAPER STRUCTURE
The rest of the paper is organized as follows. Section II introduces the preliminaries on relational database watermarking and details the problem statement. Section III presents the watermarking architecture of our proposal along with the mark Recurrence Reduction Engine (RRE). Section IV presents a theoretical analysis of our approach. Section V depicts the experimental results. Section VI presents details of the related work. Section VII concludes.

II. PROBLEM STATEMENT
Let $R[\nu, \eta]$ be a relation, where $\nu$ denotes the number of attributes and $\eta$ the number of tuples in $R$. According to the relational model [11], the corresponding relation schema is defined as $R(PK, A_0, \ldots, A_{\nu-1})$, where $PK$ is the relation's primary key, and $A_j$, with $j \in [0, \nu - 1]$, represents the rest of the attributes of $R$. Tuples in $R[\nu, \eta]$ are denoted by $T_i$ with $i \in [0, \eta - 1]$, and $T_i[A_j]$ is the database element in the $i$-th tuple corresponding to the $j$-th attribute. Similarly, $T_i[PK]$ is the database element in the $i$-th tuple corresponding to the primary key. Fig. 3 presents an example of a database relation structure.
² Note that watermark signals can be meaningless or meaningful. Meaningful watermark signals are generated from meaningful textual content, images, and audio, among other sources [6], [10]. For meaningless watermarks, the meta-mark concept does not apply.
Moreover, let $S$ be the source used to generate the watermark, i.e., a binary array of meta-marks, and $m_k$ the $k$-th meta-mark in $S$, with $k \in [0, n - 1]$, where $m_k \in \{0, 1\}$ and $n$ is the size of $S$. Also, let $WM$ denote a watermark signal, i.e., a binary array collecting the marks generated during the embedding process, and $m_l$ the $l$-th mark in $WM$, with $l \in [0, u - 1]$, where $m_l \in \{0, 1\}$ and $u$ is the size of $WM$ [13]. The watermark $WM$ being embedded into $R$ can be generated from sources of different data types. When the watermark source is an image, i.e., in image-based watermarking (IBW) techniques [14]-[16], we represent $S$ as a matrix whose dimensions are given by the height $H$ and width $W$ of the image $I$ used to generate the watermark. Watermarking techniques using binary images cause less distortion on $R$ while embedding $WM$, considering that each pixel value is a single bit, i.e., 0 for black and 1 for white.
To give an idea of the consequences arising from the embedding of marks generated from pseudo-randomly selected meta-marks, consider the source $S = [1, 0, 1, 1, 1, 0, 0, 0]$ and the relation $R[4, 10]$. We assume the second meta-mark, i.e., $m_1 = 0$, is used to generate the marks embedded in the [...] Regarding the positions for which $m_1$ was considered for the embedding, the mark value 0 has been extracted 3 times and the mark value 1 just once. Thus, by majority voting, the value 0 is assigned to $m_1$. Let $WM'$ be the watermark signal extracted from the suspicious copy of the relation (cf. Section I-A). The function $sim(WM, WM')$ computes the similarity level between the embedded and extracted signals. Suspicion of piracy is confirmed if $\varepsilon \leq sim(WM, WM')$, where $\varepsilon$ is the detection threshold for piracy assertion. Details about the design and implementation of the $sim(\cdot)$ function depend on the nature of the watermark source.
Note that majority voting allows overcoming minor inconsistencies, such as ignoring a mark whose value contradicts the values of the other marks extracted. Still, for it to contribute to resilience against benign updates and update-based attacks, the more often the same meta-mark is considered for the embedding, the better. However, while some meta-marks are considered multiple times, the chances for others to be chosen by the insertion process decrease, leading to the partial embedding of the watermark. This compromises the embedding of a comprehensible watermark signal or results in a signal so weak that even low-degree attacks can compromise the detection of $WM$ before affecting data quality.
FIGURE 3. Table structure of a database relation [12].

A. MOTIVATING EXAMPLE
The partial watermark embedding caused by the pseudo-random selection of meta-marks affects watermarking techniques generating $WM$ from images. In this case, missing pixels in the image reconstructed from the embedded watermark are a consequence of ignoring meta-marks.
To illustrate the problem, we use the IBW technique presented in [17] to watermark the numerical data set ''Forest Cover Type'', available in [18]. The technique uses a binary image as a watermark source, which allows the generation of a single mark per pixel, considering that each pixel value is one bit. In contrast, if colored images are selected as watermark sources, several marks are generated for each pixel to reconstruct the three-channel value, i.e., red, green, and blue, each between 0 and 255, causing higher distortion over $R$ compared to techniques using binary images.
We choose images of different sizes to generate the watermark (cf. Fig. 4) and later in this section we use red color to identify the positions where no pixels are selected. By using images with a different number of pixels, we can analyze the role of the source size n in pseudo-random selection when watermarking relations with the same number of attributes ν and tuples η. Each experiment is performed with only one image.
The technique in [17] selects for marking approximately $\eta/\gamma$ tuples out of $\eta$, where $\gamma \in [1, \eta]$ is defined as the Tuple Fraction (TF). Lower values of $\gamma$ therefore involve a higher number of tuples in the process. The number of attributes marked per tuple is controlled by the Attribute Fraction (AF), denoted as $\delta \in [1, \nu]$. All attributes of the selected tuples are marked if $\delta = 1$.
Algorithm 1: Selection of Embedding Positions [17].
The pseudo-random selection of tuples, attributes within chosen tuples, and bits within the available number ξ of least significant bits (lsb) of selected attributes is performed according to the value generated by using a one-way hash function H. As stated in [12], H takes as input the primary key PK identifying the tuples of R and the secret key SK given by the data owner according to (1), where • denotes the join operator.
The selection of embedding positions is carried out by combining the AHK algorithm [19] with the analysis of each attribute of the selected tuples, considering the value of the attribute fraction $\delta$ (cf. Algorithm 1). The maximum embedding capacity of the database depends on how many values in $R$ can be marked without compromising its usability. This is why the larger $R$ is, the better. The attributes analyzed are those belonging to tuples satisfying the condition $F(T[PK]) \bmod \gamma = 0$ (line 2).
For each attribute $A$ in the attribute list $AL$ of $R$, the Attribute Virtual Value (AVV), denoted as $a_v$, is generated from its $\beta$ most significant bits (msb). This guarantees watermark synchronization, since mark embedding, benign updates, and attacks are not expected to modify those bits if data usability is to be preserved. Next, $a_v$ is combined with $F(T_i[PK])$, and a new value $a_h$ is created to decide whether the attribute will be marked (line 5). Mark embedding proceeds only for attributes satisfying the condition $a_h \bmod \delta = 0$ of line 6. Finally, the lsb position $b$ in which to embed the mark generated from the selected $m_k$ is chosen, and the embedding proceeds for the attribute $A$ within the tuple $T$ (line 8).
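The selection steps described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not Algorithm 1 of [17]: the keyed hash `keyed_hash` (nested SHA-256) is a hypothetical stand-in for $F$, the way $a_v$ and $F(T_i[PK])$ are combined into $a_h$ is assumed, attributes are assumed to be non-negative 16-bit integers, and choosing $b$ as $a_h \bmod \xi$ is likewise an assumption.

```python
import hashlib

def keyed_hash(secret_key: str, value) -> int:
    # Hypothetical stand-in for F: nested SHA-256 over the secret key and value.
    inner = hashlib.sha256(f"{secret_key}|{value}".encode()).hexdigest()
    outer = hashlib.sha256(f"{secret_key}|{inner}".encode()).digest()
    return int.from_bytes(outer[:8], "big")

def msb_value(value: int, beta: int, width: int = 16) -> int:
    # The beta most significant bits of a width-bit non-negative integer.
    return (value & ((1 << width) - 1)) >> (width - beta)

def embedding_positions(relation, secret_key, gamma, delta, xi, beta):
    """relation: dict mapping primary key -> dict of attribute name -> int.
    Returns a list of (pk, attribute, lsb_index) embedding places."""
    positions = []
    for pk, attrs in relation.items():
        f = keyed_hash(secret_key, pk)
        if f % gamma != 0:            # tuple not selected
            continue
        for name, value in attrs.items():
            a_v = msb_value(value, beta)                  # attribute virtual value
            a_h = keyed_hash(secret_key, f"{f}|{a_v}")    # assumed combination
            if a_h % delta != 0:      # attribute not selected
                continue
            b = a_h % xi              # assumed choice among the xi lsb
            positions.append((pk, name, b))
    return positions

# Toy relation with 200 tuples and two integer attributes.
relation = {pk: {"A0": 1000 + 37 * pk, "A1": 52000 - 11 * pk} for pk in range(200)}
positions = embedding_positions(relation, "owner-key", gamma=3, delta=2, xi=2, beta=4)
```

Because every decision depends only on the secret key, the primary key, and the msb of the attribute, the same places are found regardless of the order in which tuples are visited, which is the property that defeats reordering attacks.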
Besides avoiding the downsides of sequential embedding stated in Section I, the selection of embedding places in $R$ is performed pseudo-randomly to make it difficult for attackers to guess the locations of marks in $R$, as well as to scatter the distortion caused by the embedding. Enhancing the selection of embedding places with pseudo-randomness positively affects the watermarking architecture. In contrast, when pseudo-randomness is applied to the selection of meta-marks from $S$, new challenges arise.
Considering that multiple marks can be embedded in one tuple, the generation of the pixel's coordinates depends on the attributes selected within the tuple according to (2) and (3), where w and h are the selected width (out of W) and height (out of H) of the image I respectively.
To avoid obtaining the same coordinate every time the same value of $a_h$ is generated, a different seed is applied for each of the equations. Ideally, the seeds and $a_h$ should be different every time, making the selection of new pixels possible. However, since the same values of $W$, $H$, and msb are used for generating $a_h$, this is very unlikely.
The seed generation in [17] was implemented by considering $\gamma$, $\delta$, and the msb values combined with $a_h$. Nevertheless, since both coordinates are generated for the same attribute, a constant element was added to differentiate seed1 from seed2.
To show the consequences of uncontrolled pseudo-random pixel selection, we performed several experiments, watermarking $R$ each time with a different value of $\gamma$ while keeping $\delta = 10$. The scatter of the meta-marks' selection is obtained by using the binary capacity $c_b$ introduced in [1] according to (4), where $m$ denotes the number of meta-marks selected during the embedding, out of the $n$ meta-marks composing $S$.³ The behavior of recurrent selection is described by the weight-based capacity metric $c_w$, also introduced in [1], according to (5), which relies on the number of times each meta-mark $m_k$ is selected. The weight-based capacity thus accumulates every selection of every meta-mark.
To obtain the scattering of the recurrent selection of meta-marks in $S$, we use the standard deviation $\sigma_w$ of the recurrent selection according to (6). The best results are those with values of $\sigma_w$ as close to zero as possible, meaning that the recurrent selection of meta-marks is uniform.
Considering $E$ as the set of selection counts obtained for all meta-marks $m_k \in S$, the function $\max(E)$ returns the set's maximum element, which lets us know the extreme recurrence. Finally, the general situation of the selection is depicted in (7), where the number of meta-marks considered from $S$ (given by $m$) is analyzed along with the scattering of the meta-marks' recurrent selection (given by $\sigma_w$). According to (4), the more meta-marks are considered in the process, the higher $m$. Uniform embedding is described by high values of the indicator in (7), which is proportional to $m$ and directly affected by $\sigma_w$. For cases with larger differences in the number of times each meta-mark is selected, $\sigma_w$ presents a high value, resulting in a low value of the indicator in (7). In short, the lower $\sigma_w$, the better.⁴
Tables 1, 2, and 3 show the results of the experiments carried out to illustrate the consequences of uncontrolled pseudo-random selection when different watermark sources are used (cf. Fig. 4). The results of each column correspond to the embedding performed using different parameters. In general, $\gamma$ varies whereas AF is kept the same ($\delta = 10$).
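The metrics discussed above can be illustrated with a short sketch. Since equations (4) to (6) are not reproduced here, the concrete forms below are assumptions: $c_b$ is taken as the fraction $m/n$, $c_w$ as the plain sum of the selection counts, and $\sigma_w$ as the population standard deviation of the counts over all $n$ positions. The function name `selection_metrics` and the toy selection sequences are hypothetical.

```python
from collections import Counter
from statistics import pstdev

def selection_metrics(selected_positions, n):
    """selected_positions: meta-mark positions chosen at each embedding stage."""
    counts = Counter(selected_positions)
    per_mark = [counts.get(k, 0) for k in range(n)]   # the set E, as a list
    m = sum(1 for c in per_mark if c > 0)             # meta-marks actually used
    return {
        "m": m,
        "c_b": m / n,                 # assumed form of the binary capacity
        "c_w": sum(per_mark),         # assumed form of the weight-based capacity
        "sigma_w": pstdev(per_mark),  # assumed: population std over all n counts
        "max_E": max(per_mark),
    }

# Twelve embeddings over a source of nine meta-marks (toy data).
uncontrolled = selection_metrics([2] * 6 + [3] * 4 + [5] * 2, n=9)
capped = selection_metrics([0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 4, 5], n=9)
```

In this toy comparison both runs embed the same number of marks, but the second one spreads them over six meta-marks instead of three and yields a lower `sigma_w`, which is the behavior the section argues for.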
The rest of the metrics are computed for each experiment, along with the quality of the images reconstructed from the watermark embedded in $R$. As expected, for values of $\gamma$ closer to 1, the number of red pixels in the reconstructed image is lower. Also, the values of $c_b$ decrease as the number of red pixels increases.
Regarding the comparison between the tables, reducing $n$ while marking the same number of tuples and attributes in $R$ has a direct effect on the metrics describing the spreading of meta-marks' selection. Table 3 shows better results for $c_b$ and $c_w$. Nevertheless, $\sigma_w$ and $\max(E)$ are also higher, which evidences the problem of meta-marks exclusion. The values given by these metrics make clear that the meta-mark exclusion problem can be solved without compromising embedding recurrence. Indeed, there are many more marks embedded than meta-marks considered for the generation of marks to embed, which means other meta-marks can be included in the process without drastically reducing recurrent selection.
⁴ By definition, when $\sigma_w = 0$, the value given by (7) is $\infty$.
Partial consideration of the meta-marks in $S$ for the generation of $WM$ directly affects the technique's robustness. In fact, when fewer meta-marks (out of $n$) are used, later updates on data will compromise a higher number of marks, and attackers will not need to update many tuples to compromise watermark detection, being able to perform malicious operations while preserving data usability.
[...] Tables 1, 2 and 3, respectively, when a tuple-deletion subset attack is performed. A different number of tuples is involved in each of the simulations, gradually increasing the attack's degree.
In the figures, the quality of the watermark is measured by using the Correction Factor (CF) and the Structural Similarity Index (SSIM). The Correction Factor, according to (8), compares the values of each pixel at the same position of the embedded image vs. the extracted one. The embedded image is denoted as $Img_{emb}$, whereas the extracted one is given by $Img_{ext}$. When CF is equal to 100, the two images are identical. On the other hand, CF = 0 means that the images are entirely different.
The SSIM is obtained according to (9) and returns an appreciation of the extracted watermark quality closer to human perception. For this case, multiple windows of size $N \times N$ are defined by $x$ and $y$. The domain of SSIM in this work is between 0 and 1, where 1 depicts perfect structural similarity between the two images and 0 indicates no structural similarity. In the equation, the symbols $\mu_x$ and $\mu_y$ represent the average of $x$ and $y$ respectively, $\sigma_x^2$ and $\sigma_y^2$ their respective variances, and $\sigma_{xy}$ their covariance. The elements $C_1$ and $C_2$ are two stabilization constants.

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)} \tag{9}$$
As depicted in the figures, the quality of the detected $WM$ is affected when the number of attacked tuples increases. Also, when the number of tuples marked is bigger (e.g., for $\gamma = 5$ or $\gamma = 10$), $WM$ is compromised when more tuples are involved in the attack. Nevertheless, for the case of the watermark generated from the Dào character (cf. Fig. 7), since its size is smaller compared to the ones generated from the UTM and WWF sources, higher resilience is depicted.
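The two quality metrics can be sketched for binary images as follows. This is a minimal illustration under assumptions: since (8) is not reproduced here, CF is taken as the percentage of pixel positions with equal values; SSIM is computed for a single window using the standard product form of the numerator; and the default constants `c1` and `c2` are the commonly used values for a dynamic range of 1, not values taken from this paper.

```python
def correction_factor(img_emb, img_ext):
    """Percentage of positions where the two images (lists of rows) agree."""
    total = sum(len(row) for row in img_emb)
    equal = sum(
        a == b
        for row_emb, row_ext in zip(img_emb, img_ext)
        for a, b in zip(row_emb, row_ext)
    )
    return 100.0 * equal / total

def ssim_window(x, y, c1=1e-4, c2=9e-4):
    """SSIM of two equally sized windows given as flat pixel sequences."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) * (a - mu_x) for a in x) / n
    var_y = sum((b - mu_y) * (b - mu_y) for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    return numerator / denominator

embedded = [[0, 1], [1, 0]]
extracted = [[0, 1], [1, 1]]   # one pixel out of four differs
cf = correction_factor(embedded, extracted)
ssim_same = ssim_window([0, 1, 1, 0], [0, 1, 1, 0])
ssim_diff = ssim_window([0, 1, 1, 0], [1, 1, 0, 0])
```

A full-image SSIM would average `ssim_window` over all $N \times N$ windows; that aggregation is omitted here to keep the sketch short.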
Despite the differences depicted in the previous figures, allowing recurrent selection while managing its scattering is expected to directly benefit the robustness of the technique, independently of the watermark source size.
The problem described in this section affects every technique implementing pseudo-random selection to prevent subset reverse order attack success. This is an inherent feature of the AHK algorithm, which is the base for a significant number of watermarking approaches for relational data [7], [19], [20].

III. PROPOSED APPROACH
The problem addressed in this work has two main roots. First, there are no limitations defined for the recurrence of pseudo-random selection. Second, a source featuring low-scattered data limits the generation of the seed. The second root is harder to face for blind watermarking techniques, considering that no external content should be required for the watermark extraction [6]. Because of this, once a position is chosen to embed a mark, the elements involved in the mark generation must depend only on the value being watermarked (e.g., $A$, $a_h$, $a_v$) and the parameters used for watermark synchronization (e.g., $SK$, source size, $\xi$, $\beta$).
Let us start by addressing the first root. The second root will be treated in Section III-B. The solution we propose addresses the first root with the help of the structures involved in the watermarking process. Our approach aims to restrict the selection of each meta-mark to a specific limit $\rho \in \mathbb{Z}^+$, to increase the chances for other meta-marks to be selected. Fig. 8 presents an example with a watermark source composed of nine meta-marks. In Fig. 8.a), each time a meta-mark is selected, a gray rectangle is drawn on top of it. Some meta-marks, such as $m_2$, are selected multiple times, while others are ignored, e.g., $m_1$, $m_4$, and $m_7$. If a boundary is defined to allow selecting the same meta-mark only three times, as depicted in Fig. 8.b), those meta-marks previously ignored have a chance to be considered. Fig. 9 depicts a case of the benefits of limiting recurrent selection, involving some of the metrics introduced in Section II. In this case, 12 marks are embedded. Fig. 9.a) shows the results of using traditional recurrence embedding, which considers only three meta-marks ($m = 3$), some of them being embedded too many times while others are ignored (e.g., $m_3$ vs. $m_2$). On the other hand, if the recurrence of the embedding is limited to 3 according to $\rho = 3$ (cf. Fig. 9.b)), then more meta-marks are selected, resulting in the embedding of a stronger watermark signal. This is reflected in the value of $\sigma_w$, which is lower for the case of the second figure, describing a more balanced embedding.
In Fig. 9.b), values in red color represent recurrence embedding reduction compared to traditional recurrent embedding approaches. Furthermore, meta-marks in green are those previously ignored and considered for the recurrence limitation scheme when the excessive recurrence of other marks is avoided.
The challenge is that recurrent selection limitation must also be implemented as a pseudo-random process since any sequentiality can compromise watermark synchronization if data ordering is redefined. In this scenario it is also important to guarantee the same selection order of the meta-marks during their embedding and extraction, to avoid watermark synchronization failures in case of subset reverse order attack.

A. DYNAMIC TRANSITION PROBABILITIES GENERATION FOR META-MARKS' SELECTION
The challenge is to design a transition probabilities generation for meta-marks' selection that is as blind as possible. For this aim, considering sequential selection is not recommended, we would like probabilities depending only on the number of the meta-marks in the watermark source S. Table 4 shows a matrix representing a generic watermark source. We assign sequential character values to the meta-marks instead of a single bit to better illustrate the approach.
To increase the chances of other meta-marks' selection, first we do not allow the meta-mark selected at stage $s$ to be considered in the next embedding stage. Let $M(s)$ be the function returning the meta-mark chosen for the embedding stage $s$. If $M(s) = B$, the rule $M(s + 1) \neq B$ is mandatory. Also, since previous selections are to be avoided, if $M(s - 1) = A$, another mandatory rule is $M(s + 1) \neq A$.
In this example, we use different values for each meta-mark to facilitate the reader's comprehension; nevertheless, different meta-marks can have equal values in the binary string composing $S$. For this reason, instead of defining the previous rules based on the marks' values, their positions must be used. Let $M(s) = m_k^s$. Considering the function $P(m_k^s)$ returning the position of the meta-mark selected in the stage $s$, the following statements formalize our approach.
The binary stream composed of the meta-marks in $S$ has a circular structure. Accordingly, the meta-mark immediately before the first one is the last one, and the meta-mark following the last one is the first one. At each stage, the current and immediately previous positions of meta-marks must be excluded from the pseudo-random selection in the next stage, given by $s + 1$. Considering $P_h(P(m_k^s))$ as the function that returns the horizontal probability for the position $P(m_k^s)$ of being selected in the stage $s + 1$ given the current one, then $P_h(P(m_k^s) - 1) = 0$ and $P_h(P(m_k^s)) = 0$, according to (10) and (11). The probabilities of the rest of the meta-marks' positions are assigned according to (13). The number of meta-marks considered by the selection is $n - 2$, since two of them already have zero probability assigned.
The position $P(m_k^s) + 1$ has a probability higher by $U_n$ than the following position, and the last position available for selection has probability $U_n$. Thus, probabilities strictly decrease along the circular order. Table 5 shows an example for the current selection of the meta-mark B (highlighted in red color and underlined), where the meta-marks' order in $S$ reflects the ordering of the weights of the assigned probabilities.
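The horizontal probabilities described above can be sketched as follows. Equations (12) and (13) are not reproduced here, so the sketch relies on an assumption: the $n - 2$ selectable positions receive the weights $(n-2)U_n, (n-3)U_n, \ldots, U_n$ in circular order starting right after the current position. Requiring these weights to sum to 1 then gives $U_n = 2 / ((n-2)(n-1))$, which is a derivation made for this example rather than a formula quoted from the paper. The function name `horizontal_probabilities` is hypothetical.

```python
from fractions import Fraction

def horizontal_probabilities(current, n):
    """Row of transition probabilities for the stage after `current` was chosen."""
    if n < 3:
        raise ValueError("at least three meta-marks are needed")
    unit = Fraction(2, (n - 2) * (n - 1))       # assumed value of U_n
    probs = [Fraction(0)] * n                   # current and previous stay at 0
    for step in range(1, n - 1):                # the n - 2 selectable positions
        probs[(current + step) % n] = (n - 1 - step) * unit
    return probs

# Nine meta-marks A..I, with B (position 1) as the current selection.
n = 9
row_b = horizontal_probabilities(1, n)

# One row per possible current selection gives a transition matrix.
transition = [horizontal_probabilities(k, n) for k in range(n)]
```

Exact fractions are used so that each row can be checked to sum to exactly 1 without floating-point rounding.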
The probabilities of the next meta-marks being selected should depend only on the current selection, so we can model them as a Markov chain according to P_h(X_{s+1} = x | X_s = x_s), where X_s denotes the current stage selection and X_{s+1} the selection for the next stage. The generation transition matrix for the example of Table 4 is obtained this way. Formally, the generation transition matrix is obtained according to (15).
Since the selection of the next meta-mark depends on the current selection and T, it is possible to increase chaos in the process by assigning zero to the probabilities of other meta-marks, thereby modifying the transition matrix. In this work, we present a symmetric matrix representing linear behavior to make our proposal easier to understand.
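As a hedged illustration of the Markov-chain view, the following Python sketch builds one possible transition matrix for a hypothetical source with five meta-marks (A to E). The function name transition_matrix and the linear weight profile are assumptions made for illustration; they do not reproduce (15).

```python
from fractions import Fraction


def transition_matrix(n):
    """Row c holds the probability of each next position given current position c.

    Assumption: linear weights n-2, n-3, ..., 1 are assigned to the positions
    following the current one; the current and the circularly previous
    positions are excluded (probability 0).
    """
    total = (n - 1) * (n - 2) // 2  # sum of the weights 1 .. n-2
    matrix = []
    for current in range(n):
        row = [Fraction(0)] * n
        for step in range(1, n - 1):
            row[(current + step) % n] = Fraction(n - 1 - step, total)
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    labels = "ABCDE"
    for label, row in zip(labels, transition_matrix(len(labels))):
        print(label, [str(p) for p in row])
```

In this sketch each row is the previous row shifted by one position, and every row sums to 1, so the next selection depends only on the current one.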
According to the current selection, the probability of selecting the element is also combined with another factor, i.e., the recurrent selection boundaries. In order to enforce our approach, we consider how the probability of selecting one element varies each time that element is chosen. In this case, the selection, instead of being conceived horizontally (from the perspective of the storing structure), is analyzed vertically (from the stack formed every time the same meta-mark is selected).
Every time the same meta-mark is chosen, the probability of its recurrent selection P_v, i.e., the vertical probability, is reduced by 1/ρ. For a given m_k, P_v is obtained according to (16), where the function of (s, k) returns the number of times the meta-mark in position k has been selected so far. Note that the differences in the meta-mark position notation in (13) and (16) are due to the different domains analyzed: when meta-marks are identified according to their spatial location in S, we use k, whereas we use P(m_k^s) when the temporal domain is used. The overall probability of selecting m_k, considering recurrent selection and the stage following the current one, is obtained according to (17). Table 6 shows the P_sel values for the watermark source of Table 4 when the meta-mark selected at stage s is B. Each probability depends on the number of times each meta-mark has been selected. The last column shows the values of P_v when each meta-mark may be selected no more than 6 times. Every time the same position is chosen, P_v is reduced by 1/6 until it equals 0. When a meta-mark has not been selected, P_v remains 1.
Values in light blue cells show the probability P_h considering that the meta-mark currently selected is B. For this reason, and according to (10) and (11), no matter the value of P_v, A and B will not be considered for the next stage. Accordingly, P_sel = 0 for all combinations involving meta-marks A and B (cf. columns 1 and 2). Even though those columns always store 0, they have been kept in the table to stress the concept of next-stage exclusion according to the current selection. The rest of the meta-marks are assigned a probability depending on how distant they are from the meta-mark currently selected and on the number of times they have been selected in previous stages. The final probability of selection P_sel is obtained according to (17) and depicted in light orange cells.
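A minimal Python sketch of how the vertical and horizontal factors could be combined is given below. It rests on two assumptions that are not taken from (16) and (17): P_v = 1 - c/ρ, where c is the number of times the meta-mark has already been selected, and P_sel is the product of P_h and P_v. The function names and the five-element example are hypothetical.

```python
from fractions import Fraction


def vertical_probability(times_selected, rho):
    """P_v drops by 1/rho per selection and never goes below 0 (assumed form)."""
    remaining = max(rho - times_selected, 0)
    return Fraction(remaining, rho)


def selection_weights(p_h, counts, rho):
    """Combine horizontal and vertical factors as a product (assumed form of P_sel)."""
    return [h * vertical_probability(c, rho) for h, c in zip(p_h, counts)]


if __name__ == "__main__":
    # Hypothetical source A..E, current selection B, at most rho = 6 selections.
    p_h = [0, 0, Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
    counts = [2, 0, 6, 3, 0]  # times each meta-mark has been selected so far
    print(selection_weights(p_h, counts, rho=6))
```

In the example, A and B stay at 0 regardless of P_v, C is exhausted (selected 6 times), and D is halved because it has used half of its allowed selections. The resulting values are weights; they would still need normalisation before being read as probabilities.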

B. THE CONTEXTUAL-CHAOTIC SEQUENCE GENERATOR
The challenge of controlling recurrent selection is that keeping track of the selection counts requires using the same sequence to embed and extract WM; otherwise, synchronization is compromised. For example, given the relation R[2,7], let us assume that the meta-mark m_D of a generic source S is selected 4 times to generate the marks being embedded into is too high, the final value assigned to the meta-mark will be wrong. Also, when using positions previously denied to marks generated with m_D, other meta-marks linked to marks using those places for the embedding will be affected (cf. Fig. 10). Fig. 10.a) shows the case when positions excluded from the ones selected to embed marks generated with m_D (represented with red arrows) are assigned to marks generated using m_E. On the other hand, Fig. 10.b) shows conflicting positions (highlighted in yellow) caused by not respecting the same sequence order during extraction. In particular, the position T_0[A_0] is excluded and wasted since it is never linked to any other meta-mark, whereas position T_6[A_1], which was previously excluded and assigned to a mark generated with m_E, is included.
As soon as a sequential nature is added to the processes, synchronization becomes vulnerable to subset reverse order attacks. Also, if the strategy for keeping track of the order of the positions consists of storing their sequence in auxiliary structures used as a reference for the extraction, the blindness requirement of the technique is affected.
To face this problem, we combine P_sel with a pseudo-randomly generated component, defined as g_I, that provides the same order for the embedding and extraction of WM. We define the region G_I of range λ_I containing the elements used to generate g_I. Since multiple g_I values are generated, we use the notation λ_I^D, G_I^D, g_I^D to denote the case of the D-th pseudo-random number (cf. Fig. 11). All g_I numbers are stored in the set C_I, and a broader area of range λ_II, identified as G_II, is used to define the position of its corresponding g_I in C_I. Thus, even if the same g_I values are generated, their place in C_I will depend on a majority appreciation of their context. Both regions, G_II and G_I, have the same center g_c, which is used as the reference to embed the mark generated with the selected meta-marks in G_I. According to this, G_I ⊆ G_II (cf. Fig. 12.a)).
The principle of majority contextual appreciation is similar to the majority voting described in Section I. Their main difference is that majority voting considers the candidate values for one meta-mark, whereas majority contextual appreciation analyzes the elements contained in the region G_II surrounding those contained in the region G_I. Given the context analysis requirement, G_II sets can intersect each other without major consequences for the outcome. Nevertheless, this is not allowed for G_I sets (cf. Fig. 12.b)).
The way elements are analyzed to be considered by every region depends on values that cannot be modified without compromising data quality for both data owners and attackers (e.g., msb of carriers, locked attributes). Locked attributes are attributes whose nature and relevance make them not prone to changes. This makes it possible for the context to persist despite small variations, overcoming impacts due to G_II intersection, benign updates and malicious operations, and guaranteeing the persistence of the C_I order.
T ≤ (G_II, G̃_II)    (18)
Let T be the threshold of allowed contextual differences, and let the function in (18) evaluate them considering all the G_II sets generated for embedding the watermark signal (given by G_II) and those generated for the watermark extraction (given by G̃_II). Then, with (18) we can obtain an accurate idea of the degree of change in the context and of its effects on C_I. If the contextual difference between G̃_II and G_II exceeds the threshold T, the watermark synchronization will fail.

C. WATERMARKING ARCHITECTURE
This work focuses on robust watermarking techniques created for the copyright protection of relational data. These approaches are based on the principles of generation and embedding of WM, and the WM extraction is carried out under suspicion of piracy or false ownership claims. The main architecture of our approach is consistent with the processes described in Section I-A.
We add the Recurrence Reduction Engine (RRE) to the IBW technique described in Section II. Our proposal inherits features from [17] such as the numerical cover type and multi-attribute embedding. As depicted in Fig. 2 of Section I-B, the reduction of recurrent selection first takes place in the selection sub-process of the watermark embedding. The core of RRE consists of combining the pseudo-random generation of g_I numbers (handled by the 'Chaos Generator') with the management of P_sel probabilities (performed in the 'Probability Box'). The watermark source S and the relation being watermarked R are the main inputs of the embedding process. Nevertheless, they have a particular link with RRE, since the 'Chaos Generator' also uses R to define C_I and the 'Probability Box' interacts with S to assign each meta-mark its probability for later selection, according to the current stage.
Meta-marks resulting from the selection sub-process containing RRE must guarantee that more marks are considered already at that early stage of the embedding, compared to techniques that do not implement RRE. Once the meta-marks have been selected, the marks are generated considering fixed elements from their embedding position (fixed elements are data that cannot be modified without compromising data usability). Once the mark is generated, its insertion into R may apply.
A low-level description of the embedding process deploying RRE with the meta-marks' selection is formalized by Algorithm 2. In our approach, the role of the watermark source structure is more important, since probabilities are assigned according to the meta-marks' positions in S. Considering that the positions of meta-marks are fixed, there is no risk of subset reverse order attacks over the watermark source, which supports the feasibility of implementing our proposal.
The array P_H stores the weight of selecting each meta-mark as a number representing the probability in scale. The values of this array represent P_h values and are rotated according to the current selection, following the rules introduced in Section III-A, by the function rotate_P_H(P_H, n, k) (line 22), where k represents the position in S of the meta-mark currently selected and n the length of the watermark source. Initial values are assigned to each position of P_H (line 6). The set of numbers must describe the same slope as the values obtained with equation (13). When decimal numbers are used, the original values obtained with (13) can be used. Fig. 13.b) depicts the projection with integer values of the original probability distribution generated for the Dào WM, which might be used with engines taking integer values as a virtual probability. By considering the image size and reducing by one the next value to assign in the corresponding P_H position, a distribution is generated with the same slope as the original probability distribution (cf. Fig. 13.a)). Fig. 13.c) shows the proportion between the values assigned to each meta-mark (original vs. projected ones), validating the projection with integers.
The array P_S represents the overall probability P_sel of equation (17). The values assigned to each position of the array are generated by considering the number of times each meta-mark is still available for selection, according to P_v defined in (16), i.e., by combining P_H with the counter C_H, C_H being a scaled value of P_v. Note that the values stored in P_S are computed in line 7. Once an embedding is performed and C_H and P_H are updated, P_S is updated as well (line 23).
The term P_acc (line 4) represents the accumulated probability value used to generate the random number inside the available set of probabilities (line 18). Each time P_S varies, P_acc is updated (lines 8 and 24). The generation of random numbers to increase the entropy of the meta-mark selection is performed according to the techniques discussed in Section III-B. In line 9, the function generate_C_I(λ_I, λ_II) takes the radii of regions G_I and G_II to generate the set C_I, which contains the g_I numbers in the order to be used for the meta-marks' selection.
Lines 10-15 detail the selection of elements being watermarked according to AHK and [17]. Line 16 computes the lsb position b ξ to store the mark, and line 17 computes a msb value b β out of the ones given by β. In line 18, the function getMarkPos(C I , T, A, P acc ) uses C I to extract the g I corresponding to position T[A] and uses it as seed of a random engine to select a position of the meta-mark according to the accumulated probabilities P acc .
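The seeded selection over accumulated probabilities can be sketched in Python as a roulette-wheel draw. This is not the getMarkPos function of Algorithm 2: the name get_mark_pos, its reduced signature (it receives the seed directly instead of looking up g_I in C_I), and the use of Python's random.Random as the random engine are assumptions made for illustration.

```python
import random


def get_mark_pos(seed, p_s):
    """Pick a position k with probability proportional to p_s[k].

    seed stands in for the g_I value linked to the embedding position;
    p_s holds non-negative integer weights (a scaled P_sel).
    Returns None when no meta-mark is selectable (accumulated weight is 0).
    """
    p_acc = sum(p_s)
    if p_acc == 0:
        return None
    rng = random.Random(seed)      # same seed -> same draw at extraction time
    draw = rng.randrange(p_acc)    # integer in [0, p_acc)
    running = 0
    for k, weight in enumerate(p_s):
        running += weight
        if draw < running:         # zero-weight positions can never match
            return k


if __name__ == "__main__":
    p_s = [0, 21, 18, 15, 12, 9, 6, 3, 0]
    print([get_mark_pos(seed, p_s) for seed in range(10)])
```

Because the draw depends only on the seed and the weights, repeating the call with the same inputs during extraction yields the same position, which is the property synchronization relies on.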
Once the position k is obtained, the meta-mark m k is extracted and the mark to embed is generated by x-oring m k and b β (the symbol ⊕ denotes the x-or operator) (line 19). Next, the mark is embedded in the lsb position b ξ of the value stored in T[A]. Strategies for reducing distortion caused by the mark embedding according to [7] and [17] are implemented by the mark(T, A, b ξ , m) function of line 20.
Finally, for the selected meta-mark, C H is decreased by one unit, P H is switched according to the current selection, P S and P acc are updated. The iteration continues until all values of R are analyzed.
The extraction process follows the same principle as the watermark embedding. In this case, the process is characterized by a flow of detected marks from R to S. For every stage, a meta-mark is extracted following AHK, and marked values are checked using the rules defined in [17]. If a value is stored in a position identified as a mark container, the meta-mark is extracted according to m = b_ξ ⊕ b_β, where b_ξ contains the mark stored during the embedding in the corresponding lsb. The position k to store the extracted meta-mark is obtained in the same way as for the embedding process.
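The XOR-based generation and recovery of a mark can be checked with a small Python sketch. It is a simplification under stated assumptions: values are non-negative integers, the function names are hypothetical, and the distortion-reduction strategies of [7] and [17] are not modelled.

```python
def embed_mark(value, lsb_pos, meta_mark, msb_bit):
    """Store (meta_mark XOR msb_bit) in bit lsb_pos of a non-negative integer."""
    mark = meta_mark ^ msb_bit
    cleared = value & ~(1 << lsb_pos)   # reset the target bit
    return cleared | (mark << lsb_pos)  # write the mark into it


def extract_meta_mark(value, lsb_pos, msb_bit):
    """Recover the meta-mark: stored bit XOR the same msb bit used for embedding."""
    stored = (value >> lsb_pos) & 1
    return stored ^ msb_bit


if __name__ == "__main__":
    original = 2596                      # hypothetical attribute value
    marked = embed_mark(original, lsb_pos=1, meta_mark=1, msb_bit=1)
    print(original, marked, extract_meta_mark(marked, lsb_pos=1, msb_bit=1))
```

The round trip works because XOR is its own inverse: as long as the msb bit and the lsb position are recomputed identically at extraction time, the original meta-mark bit is recovered, and only one low-order bit of the value can change.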
Finally, different meta-mark values can be extracted for the same position k. A middle layer stores all repeated values and, after the extraction is finished, a majority voting is performed in the same way as in previous techniques. After the majority voting takes place, the source S is built and compared with the original according to (8) and (9). The conclusions about false ownership claims or piracy are delivered depending on the results.
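The middle layer and the majority voting step can be sketched in Python as follows. The function name rebuild_source and the handling of ties and of positions with no extracted value (both reported as None) are assumptions made for illustration, not details stated by the technique.

```python
from collections import Counter, defaultdict


def rebuild_source(extracted, n):
    """Rebuild the binary source from (position, bit) pairs by majority voting.

    Positions with no votes, or with a tie, are reported as None (assumption).
    """
    votes = defaultdict(Counter)         # the "middle layer": one tally per position
    for k, bit in extracted:
        votes[k][bit] += 1
    source = []
    for k in range(n):
        zeros, ones = votes[k][0], votes[k][1]
        if zeros == ones:                # covers both "no votes" and ties
            source.append(None)
        else:
            source.append(1 if ones > zeros else 0)
    return source


if __name__ == "__main__":
    extracted = [(0, 1), (0, 1), (0, 0), (2, 0), (3, 1), (3, 0)]
    print(rebuild_source(extracted, n=4))
```

In the example, position 0 is recovered as 1 despite one wrong vote, position 1 was never extracted, and position 3 is undecided because of a tie.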

IV. ANALYSIS
Limiting the number of times the same meta-mark is selected for the embedding should be done carefully. Depending on the value assigned to ρ, the result can be higher robustness or, conversely, a weak protection of R.
There are several options for applying the engine proposed in this work, but all of them must take into account the number of times the marks are expected to be embedded into R, which in turn depends on the relation's dimensions ν and η and on the embedding parameters γ and δ.

A. THE PROBABILITY-BASED SYNCHRONIZATION
The probability-based embedding with pseudo-random seed generation and recurrence limitation has the advantages of a uniform selection of meta-marks and the scattering of embedding locations. Thanks to this, resilience against malicious attacks increases. Nevertheless, an important downside of this approach is that the detection is not 100% accurate. Majority voting can help to overcome this problem if ρ is not too low. On the other hand, the correct detection of neighboring meta-marks can help to correct false positives added to S. To proceed this way, correction algorithms must try to recover missed meta-marks as well as correct wrong meta-mark values.
To decide which action benefits majority voting while still allowing correction algorithms to be useful, it is important to monitor the performance of the watermark detection process. According to AHK, ω denotes the number of marked tuples. When only one mark per tuple is embedded, ω also gives the number of embedded marks and can be obtained according to ω ≈ η/γ. In a multi-attribute embedding setting, ω ≈ η/γ × ν/δ, and it reflects the number of embedded marks.
The false positive rate FP ∈ Q+, with 0 ≤ FP ≤ 1, is expressed according to (19), where ω̃ ∈ Z+ is the number of marks extracted with an incorrect value out of the number of embedded marks, given by ω ∈ Z+. FP = 1 means that none of the extracted marks corresponds to the expected ones, raising suspicion that malicious operations have been applied over R.
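The two approximations for ω can be evaluated with a very small Python sketch. The function name expected_marks is hypothetical, and the example values (30 000 tuples, 10 attributes, γ = 10, δ = 10) are taken from the experimental settings reported later in the paper.

```python
def expected_marks(eta, gamma, nu=1, delta=1):
    """Approximate number of embedded marks: (eta / gamma) * (nu / delta).

    With the defaults nu = delta = 1 this reduces to the single-mark-per-tuple
    case, omega ~ eta / gamma.
    """
    return (eta / gamma) * (nu / delta)


if __name__ == "__main__":
    print(expected_marks(30_000, 10))              # one mark per tuple
    print(expected_marks(30_000, 10, nu=10, delta=10))
    print(expected_marks(30_000, 5, nu=10, delta=2))
```

These are expected values only: the actual number of marked positions depends on the pseudo-random tuple and attribute selection.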
Another metric helping to measure the quality of the detection process is the detection accuracy A ∈ Q + : 0 ≤ A ≤ 1, defined in (20). In the equation, E m ∈ Z + is the number of meta-marks embedded and C m ∈ Z + the number of meta-marks having the correct value once the majority voting is performed and S is reconstructed.
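Both metrics can be sketched in Python. The ratios used here, FP as wrong marks over embedded marks and A as correct meta-marks over embedded meta-marks, are a reading of the verbal definitions rather than a transcription of (19) and (20); the function names are hypothetical.

```python
from fractions import Fraction


def false_positive_rate(wrong_marks, embedded_marks):
    """FP: share of embedded marks that were extracted with the wrong value."""
    if not 0 <= wrong_marks <= embedded_marks or embedded_marks == 0:
        raise ValueError("expected 0 <= wrong_marks <= embedded_marks, embedded_marks > 0")
    return Fraction(wrong_marks, embedded_marks)


def detection_accuracy(correct_meta_marks, embedded_meta_marks):
    """A: share of embedded meta-marks that are correct after majority voting."""
    if not 0 <= correct_meta_marks <= embedded_meta_marks or embedded_meta_marks == 0:
        raise ValueError("expected 0 <= correct <= embedded, embedded > 0")
    return Fraction(correct_meta_marks, embedded_meta_marks)


if __name__ == "__main__":
    # Hypothetical case: 9 meta-marks embedded 3 times each (27 marks);
    # 2 marks are read wrongly but are outvoted, so all 9 meta-marks are correct.
    print(false_positive_rate(2, 27), detection_accuracy(9, 9))
```

The example illustrates that the two metrics measure different layers: a non-zero FP at the level of individual marks can coexist with A = 1 once majority voting has reconstructed S.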
The best-case scenario of WM extraction with respect to accuracy is A = 1. This means that all meta-marks of S were correctly identified, even if some marks were extracted with the wrong value (i.e., FP > 0). This last situation is possible thanks to the role played by the majority voting step in the construction sub-process.
Fig. 14 and 15 allow us to compare the distribution of the meta-mark selection when RRE is applied and when it is not, respectively. Blue boxes and blue dots represent meta-marks selected in a particular recurrence level. Red boxes and red dots denote meta-marks ignored in the first recurrence level, or considered by a previous recurrence level and ignored at that point. Considering S as an image, the x axis represents the source's width, the y axis its height, and the z axis the number of times the meta-mark at position (x, y) has been selected.

B. VERTICAL DIGGING VS. HORIZONTAL SPREADING
Vertical digging and horizontal spreading are concepts that arise when using meaningful sources for the watermark generation and when performing recurrent embedding. When using meaningful sources, neighboring meta-marks of a contrasting value can be used to correct a wrong one. This can be performed by applying enhancement algorithms such as noise-reduction methods for images.
The probability of success of this type of operation increases when more meta-marks are considered for the embedding. We define this property as horizontal spreading of the meta-marks' selection (cf. Fig. 14), and it directly benefits from applying the RRE proposed in this work.
On the other hand, recurrent embedding helps to avoid the extraction of false-positive meta-marks, which also contributes to the technique's robustness. We define the recurrent embedding as vertical digging over S (cf. Fig. 15); even though it is a recommended feature, it should not be performed at the expense of horizontal spreading. For this reason, it is very important to exploit all possible locations from R for the embedding, as long as data quality is preserved.
In this case, there is no need to assign a very low value to ρ if the values of ν and η, considering γ and δ, guarantee the condition ω ≫ n, where the operator ≫ describes the relation "much greater than" (i.e., the number of embedded marks ω is much greater than the number of meta-marks n). Then vertical digging does not have to be compromised for the benefit of horizontal spreading.
Another way to assign a value for ρ that does not restrict the use of all embedding places that R offers is by knowing first the values of max(E) and m introduced in Section II. Nevertheless, this option is not optimal regarding performance, considering it requires first the embedding of the watermark without considering RRE to obtain the values of those metrics.
Another downside of selecting a low value of ρ, if n is relatively small, is that mark embedding is limited to just a portion of R even though all the meta-marks generating the marks were considered (cf. Fig. 16.a)). Indeed, in this scenario, all meta-marks are selected too few times, compromising the benefits expected from the majority voting in the extraction process. Then, a low-level update-based attack could affect the watermark recognition. This can be overcome by resetting the value of ρ once all meta-marks have been considered and P_acc = 0, which makes it possible to continue the meta-mark selection in a uniform way until all tuples and attributes of R are analyzed (cf. Fig. 16.b)).
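A rough way to anticipate this situation is to compare the expected number of marks ω with the number of selections one pass of the engine can offer, n × ρ. The Python sketch below does that; the function names are hypothetical, and the figure it returns is only a lower bound on the number of resets, since a pass may end before every meta-mark has been selected ρ times.

```python
def selection_budget(n, rho):
    """Maximum number of selections available before P_acc reaches 0."""
    return n * rho


def rounds_needed(omega, n, rho):
    """Lower bound on the passes (resets of rho) needed to cover omega marks."""
    budget = selection_budget(n, rho)
    return -(-omega // budget)  # integer ceiling division


if __name__ == "__main__":
    # Hypothetical small source: n = 9 meta-marks, rho = 3, about 3000 marks to embed.
    print(selection_budget(9, 3), rounds_needed(3000, 9, 3))
```

When the result is 1, a single pass is enough and no reset is required; larger values indicate that, without resetting ρ, only a fraction of R of roughly n × ρ / ω would carry marks.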

C. WATERMARK EMBEDDING EXAMPLE
Before experimentally validating our approach, we show the effect of performing the watermark embedding while considering a small binary image S. According to the pixels' values, S = ⟨1, 0, 1, 0, 0, 0, 1, 0, 1⟩. In this case, n = 9 and ρ = 3. Also, as previously stated, for s = 0 it is assumed that the first meta-mark is already selected. Taking this into consideration, P_H and P_S are initialized as P_H = ⟨0, 7, 6, 5, 4, 3, 2, 1, 0⟩ and P_S = ⟨0, 21, 18, 15, 12, 9, 6, 3, 0⟩. Fig. 17 depicts the arrays representing the watermark source S, the tracker of embedding times C_H, P_H, and P_S, aligned for each stage s. The red-lined rectangle represents the pseudo-random position selected for the next stage. Blue boxes represent items already selected once (i.e., for ρ = 3, C_H[i] = 2), orange boxes represent items selected twice (i.e., for ρ = 3, C_H[i] = 1), and yellow boxes represent items already selected three times (i.e., for ρ = 3, C_H[i] = 0).
On each stage, after updating C H , P H is rotated according to the position selected and P S is generated considering C H and P H according to Algorithm 2. Once C H [i] = 0 for a particular position, the same position will be assigned P S [i] = 0, taking that meta-mark out of the random-selection's consideration.
At every stage, the figure shows the accumulated probability P acc obtained by summing all items from P S . The process stops when P acc = 0 (stage s = 26).
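The stage-by-stage bookkeeping of this example can be reproduced with a compact Python simulation. It is a sketch under stated assumptions rather than an implementation of Algorithm 2: a single seeded random.Random replaces the per-position g_I seeds, P_S is computed as the element-wise product of P_H and C_H, and all function names are hypothetical. The sequence of selected positions depends on the seed, so it will not match Fig. 17.

```python
import random


def horizontal_weights(n, k):
    """Integer P_H: 0 at k and k-1 (circular), then n-2 down to 1 after k."""
    weights = [0] * n
    for step in range(1, n - 1):
        weights[(k + step) % n] = n - 1 - step
    return weights


def simulate_selection(n, rho, seed=0, first=0):
    """Run the selection until P_acc = 0; return the selections and final C_H."""
    rng = random.Random(seed)
    c_h = [rho] * n                 # remaining selections per meta-mark
    selected = [first]              # stage s = 0: first meta-mark already chosen
    c_h[first] -= 1
    while True:
        p_h = horizontal_weights(n, selected[-1])
        p_s = [h * c for h, c in zip(p_h, c_h)]
        p_acc = sum(p_s)
        if p_acc == 0:              # nothing selectable any more
            break
        draw = rng.randrange(p_acc)
        running = 0
        for k, weight in enumerate(p_s):
            running += weight
            if draw < running:
                break
        selected.append(k)
        c_h[k] -= 1
    return selected, c_h


if __name__ == "__main__":
    picks, remaining = simulate_selection(n=9, rho=3, seed=1)
    print(len(picks), picks, remaining)
```

Before the first draw, the sketch yields P_H = 0, 7, 6, 5, 4, 3, 2, 1, 0 and P_S = 0, 21, 18, 15, 12, 9, 6, 3, 0, matching the initial arrays of the example. No meta-mark is ever selected more than ρ times, and no stage repeats the current or the immediately previous position.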

V. EXPERIMENTAL EVALUATION
This section analyzes the components of the proposed watermarking architecture and the watermarking features directly affected by applying RRE, such as capacity and robustness. In addition, most of the metrics introduced in previous sections are used to evaluate and compare the results with those previously presented in Section II.

A. EXPERIMENTAL SETUP
We use watermark sources of different sizes to illustrate the effects of involving different values of n, similarly to Section II. The sources selected are the binary images introduced in Fig. 4.
The data used to embed the marks is the widely known numerical data set Forest Cover Type, available in [18]. To obtain results under the same conditions of previous works and allow a fair comparison, we perform the embedding in the first 30K tuples out of the 581K stored in the relation. Also, we consider only the first ten attributes (out of 54).
The implementation of our proposal follows a client-server architecture. The client layer was developed using Java 1.8 and the Eclipse Integrated Development Environment (IDE) 4.21. The database server engine was Oracle Database 19C. The IDE used for database management was Oracle SQL Developer 21.4. The runtime environment was a 4.20 GHz Intel i7-7700K PC with 32.0 GB of RAM running Windows 10 Pro.
Fig. 18 shows the values of P_h for each meta-mark of the Dào and WWF image sources at the initial stage. Since the watermark generated from the Dào image has fewer meta-marks, its probabilities drop faster than those related to WWF.

B. BENEFITS OBTAINED IN TERMS OF WATERMARK CAPACITY
When the number of meta-marks n is low, the differences among the values of P_h are high. This means that, for two neighboring meta-marks m_i and m_j (with i ≠ j), the difference between their P_h values will be higher for Dào than for WWF. Also, if the values of η and ν do not change, i.e., if the amount of data being watermarked does not vary, all meta-marks of a small source are reconsidered sooner than those belonging to a bigger source.
We used λ I = 1 and λ II = 3 for the generation of C I . Table 7 shows the quality of the embedded watermark generated from the UTM source for different values of γ . Given the high value of n and the reduction of tuples selected for marking when γ increases, the quality of the embedded watermark reduces. Nevertheless, compared to embedding without RRE (cf. Table 1), the quality improves thanks to the uniform meta-marks selection. The presence of red pixels in the extracted signal shows that all possible embedding places from R were considered, but not all meta-marks were selected ρ times.
The results shown in Table 8 present a general improvement compared to those obtained by using the same WWF source without applying RRE (cf. Table 2). Note that in this case, since n is lower than for UTM, the improvement in capacity is higher. Table 9 presents the results when the Dào image is used as the source. For this case, since n is very small, the quality of the image generated from the embedded watermark is higher regardless of the value of γ. Also, compared to the results of Table 3 and despite the simplicity of S, the capacity experiences the most significant improvement.
It is important to notice that, for the experiments shown in Tables 7, 8 and 9, c_b increases compared to techniques performing the embedding without RRE. On the other hand, c_w remains almost always the same, whereas σ_w decreases due to the uniform selection of meta-marks. Finally, when more meta-marks are considered, (E) increases, whereas a reduction of red pixels is observed in the images generated from the embedded watermark signal.
Tables 10, 11 and 12 present a more direct comparison of the quality of the signals embedded using different values of ρ against the signal embedded without RRE. Each table presents the case of one watermark source. The column with ''N/A'' (not applicable) as the ρ value denotes the case of watermark embedding without RRE. For all the other cases, i.e., when RRE is applied, the same values of δ and γ are used to show how ρ affects the quality of the embedded watermark signal without varying the expected number of values being watermarked from R.
Similarly to the results of Tables 7, 8 and 9, Tables 10, 11 and 12 depict the benefits of using a source with a low value of n in terms of the image reconstructed from the embedded watermark. Nevertheless, the case of WWF in Table 11 shows the most remarkable achievement, considering the number of red pixels recovered in comparison with the embedding without RRE.
The tables above confirm also the relevance of the metric (E) as an objective way to measure the benefits of RRE when the number of tuples and attributes considered for the embedding does not vary. From the data shown in these tables, the relationship (E) ∝ 1/ρ (where ∝ denotes the proportionality relationship) can be empirically derived.

C. DETECTABILITY EVALUATION
When no malicious operations or benign updates are performed over R, the watermarking architecture without RRE guarantees a perfect detection of all marks. Nevertheless, due to the exclusion of meta-marks, the embedding of S is often partial, compromising its recognition even when all embedded marks are correctly detected. On the other hand, the integration of RRE adds uniformity to the meta-mark selection. Still, since the process randomly selects the meta-marks based on their probability, the detection is not 100% accurate. However, the effects of wrong meta-mark values on the reconstructed S decrease when the number of tuples in R is high.
A way to measure the quality of detectability by spotting the number of false hits during the watermark extraction is to compute the false positive rate FP and the detection accuracy A introduced in Section IV, according to (19) and (20) respectively. Table 13 shows the quality of the detected signal for γ = 5 and γ = 10, when the embedded watermark was generated using each one of the sources introduced in Fig. 4. In the table, the larger image represents the S reconstructed from the detected watermark, whereas the smaller image depicts the part of S considered for the embedding.
The random engine used was based on the Random class of the java.util package contained in the Java Development Kit (JDK). The methods setSeed(long arg) and nextInt(int arg) were used to assign the seed and to generate the random numbers, respectively. Despite the randomness added by the engine, for the same seeds and without any operations performed over the protected data, the process guarantees a perfect detection in each case (i.e., FP = 0 and A = 1).

D. ROBUSTNESS ANALYSIS
As previously stated, the uniform selection of meta-marks contributes to embedding a stronger watermark signal. Then, it is expected to achieve higher robustness to malicious operations since the watermark quality is higher than embedding without RRE. Also, the generation of C I and RRE must be resilient against updates over R avoiding the computing of seeds for the watermark detection different from the ones used for the embedding.
Section V-C already tested the detection in the absence of updates. Here we test the proposal with respect to the quality of the detected signal while increasing the degree of the attacks.
Tuple deletion was used since it is the most aggressive form of subset attack. Other types of attacks, such as updates and insertions of new data, are therefore expected to be met with a higher resilience than the one observed in these experiments. Figures 19, 20, and 21 show the resilience of the watermark signal after different levels of tuple deletion attacks are performed over R. The comparisons are made with respect to the results of Fig. 5, 6, and 7, which show the signal remaining after performing the same attack.
The watermark generated from UTM and embedded considering RRE shows higher robustness against tuple deletion (see Fig. 19). For this case, ρ = 1, which does not provide good vertical digging; however, since not all meta-marks are considered due to the number of tuples in R, horizontal spreading gets the direct benefits, allowing enhancement algorithms to reconstruct the signal if needed.
For the case when the embedded watermark is generated from WWF (cf. Fig. 20), ρ = 2 was used. The value is low, but since some meta-marks are still not considered when no RRE is applied, we aim for a small recurrence while other meta-marks are included in the process. For this case, the quality of the remaining watermark signal drops faster than in the previous case, which suggests a better performance of our approach with sources of bigger sizes.
Finally, we perform the test with the watermark generated from Dào (cf. Fig. 21) by using ρ = 5. This case confirms the previous statement, but also reduces the relevance of certain parameters, such as γ, which helps to obtain the same results while performing the embedding without causing too much distortion over the data. This is a direct consequence of applying RRE over sources with low values of n.
Fig. 22, 23, and 24 depict, for each source, the difference between embedding the same watermark considering RRE and using uncontrolled recurrent selection. For each case, a bar means a positive difference, which indicates higher resilience when RRE is applied. When no bar is shown, the difference is negative, indicating higher resilience when recurrence is not controlled.
The reduction of the resilience follows directly from the reduction of n of S. Since the value of ρ is too low for those cases, it is important to achieve a compromise between vertical digging and horizontal spreading by using higher values of ρ.
Fig. 25, 26, and 27 show the accuracy of the detection process after tuple deletion attacks are performed. They reflect how the pseudo-random nature depends on the use of the same seeds, so that the detection is affected when some elements from R vary. On the other hand, these experiments confirm once more the reduction of the role played by γ, particularly when n is higher.

E. IMPACT OF RRE WITH RESPECT TO PERFORMANCE
It is essential to evaluate the additional cost of integrating RRE into the watermarking architecture. The performance of the ''Probability Box'' and the ''Chaos Generator'' must not compromise the time taken to perform the watermark synchronization. Furthermore, the integration of RRE must perform accordingly to the number of tuples in R. For its application to be feasible, the technique must be featured by a linear increment of the time required for watermark  synchronization when the number of tuples in R increases linearly.
The first experiment is meant to analyze the complexity of RRE regarding the generation of C I . Considering that this step is not performed when RRE is not applied, this should be considered as an additional performance cost. On the other hand, if there is no risk of primary key elimination, the values of columns used to identify each tuple can be used as C I , and there is no need for extra time consumption generating C I . Fig. 28 depicts the time required to generate C I for two cases, when λ I = 1 and λ I = 2. For each case, λ I keeps the same value whereas λ II increases. In the figure can be depicted the linear increment of time required to generate C I , directly proportional to λ II increment.
Another important feature to analyze is the quality of each version of C I for watermark synchronization. A key property to track is the number of repeated elements, since a high number of duplicates can compromise watermark synchronization. Nevertheless, with the parameters used to generate C I, no duplicate values of g I are obtained. Because of this, high watermark capacity is achieved. Tables 14 and 15 compare the standard deviation of each set of Fig. 28 with that of the set composed of the primary key values. Since the primary key values are assigned as serial integers, i.e., the value increases by one from tuple to tuple, the scatter of G I and G II is much higher. For each case of C I, no duplicate values were spotted.
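As an illustration of the two quality checks described above (scatter and duplicates), the following minimal Python sketch compares a serial primary key column with a set of derived identifiers. The function `derived_identifier` is a hypothetical stand-in based on a keyed SHA-256 digest; it is not the generation procedure for C I used in this work, and the names `secret_key`, `duplicate_count`, and `scatter_report` are assumptions introduced only for the example.

```python
import hashlib
import statistics


def derived_identifier(secret_key: str, primary_key: int, bits: int = 64) -> int:
    """Hypothetical stand-in for a C I value (assumption, not the paper's generator):
    a keyed SHA-256 digest of the primary key, truncated to `bits` bits."""
    digest = hashlib.sha256(f"{secret_key}|{primary_key}".encode("utf-8")).digest()
    return int.from_bytes(digest[: bits // 8], "big")


def duplicate_count(values) -> int:
    """Number of entries that repeat a value already present in the set."""
    values = list(values)
    return len(values) - len(set(values))


def scatter_report(values) -> dict:
    """Population standard deviation and duplicate count of an identifier set."""
    values = list(values)
    return {
        "std_dev": statistics.pstdev(values),
        "duplicates": duplicate_count(values),
    }


if __name__ == "__main__":
    n = 1000
    serial_keys = list(range(1, n + 1))  # serial integers, one step per tuple
    derived = [derived_identifier("owner-secret", pk) for pk in serial_keys]
    print("serial :", scatter_report(serial_keys))
    print("derived:", scatter_report(derived))
```

Under these assumptions, the serial keys are confined to a narrow range, whereas the derived identifiers spread over the whole 64-bit space, which is the kind of difference in scatter that the standard deviation comparison is meant to expose.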
Finally, we evaluate the time required to perform the embedding using RRE. We compared our approach with a technique not applying RRE. Figure 29 shows the case when the sets of primary keys are used as C I.
The watermarking was performed using the parameter values γ = 10, δ = 10, and ρ = 5. The image Dào was selected as the watermark source. The number of tuples considered was increased from 15 × 10^3 to 90 × 10^3 in steps of 15 × 10^3. As shown in the figure, when the primary keys are used as C I, the embedding applying RRE is performed in almost the same time as when RRE is not considered.
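A measurement of this kind can be organized as in the following minimal Python sketch, which is only an assumption about how such a timing experiment might be scripted. The relation sizes follow the values given above; `placeholder_embed` is a hypothetical stand-in for an embedding pass (a single linear scan), and `r_squared` is one simple way to check how close the measured times are to a straight line.

```python
import time


def tuple_counts(start=15_000, stop=90_000, step=15_000):
    """Relation sizes used in the experiment: 15 x 10^3 up to 90 x 10^3 tuples."""
    return list(range(start, stop + 1, step))


def time_embedding(embed, n_tuples, repeats=3):
    """Best-of-`repeats` wall-clock time of `embed` over a relation of n_tuples rows."""
    relation = list(range(n_tuples))
    best = float("inf")
    for _ in range(repeats):
        started = time.perf_counter()
        embed(relation)
        best = min(best, time.perf_counter() - started)
    return best


def r_squared(xs, ys):
    """Coefficient of determination of the least-squares line through (xs, ys).
    Assumes that neither xs nor ys is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)


def placeholder_embed(relation):
    """Hypothetical stand-in for an embedding pass: one linear scan of the relation."""
    return sum(value & 1 for value in relation)


if __name__ == "__main__":
    sizes = tuple_counts()
    times = [time_embedding(placeholder_embed, n) for n in sizes]
    for n, t in zip(sizes, times):
        print(f"{n:>6} tuples: {t:.6f} s")
    print("R^2 of linear fit:", r_squared(sizes, times))
```

A value of R^2 close to 1 would be consistent with a linear increase of the time with the number of tuples; replacing `placeholder_embed` with the embedding with and without RRE would allow the two curves to be compared on the same sizes.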
However, depending on the parameters used to generate C I, the process can be very costly. Figure 30 shows that, when λ I = 1 and λ II = 3, this cost is even higher than that of performing the embedding.
The main difference between steganography and watermarking is that the former does not require robustness, since the hidden message is transported by the digital asset while others are unaware of its presence [24], [30]. If secrecy is compromised, the hidden message can also be easily decoded. In contrast, watermarking techniques must implement the public system requirement following Kerckhoffs' principle [31], [32], which establishes that the security of a cryptosystem must rely on the secrecy of its parameter values (in particular, the cryptographic keys) and not on hiding its details. For this reason, watermarking techniques must guarantee robustness and security [33], [34], since protected digital assets are expected to be attacked, and the watermark embedded into them has to resist malicious operations that try to remove the marks or compromise the watermark detection [6], [35].
With the development of cybersecurity and the Internet, watermarking techniques have been applied to different data types [36], [37]. The first approaches were proposed to protect multimedia data. Among techniques for multimedia data, some have specialized in copyright protection and tamper detection for video [38]-[40], audio [41], [42], and images [43], [44]. Later, approaches oriented to protect documents [45], [46] or textual content stored in relational databases [1], [47] emerged. Other techniques have been created to protect source code and software [48], [49]. Their diversity regarding data types and protection intents is wide.
We are interested in watermarking techniques developed for relational data protection. The differences between multimedia and relational data are such that watermarking techniques created for multimedia data protection cannot be applied to relational data, especially when they implement sequential watermark embedding. Indeed, a watermarking technique for relational data that embeds the watermark sequentially might severely compromise its detection, especially when data are reordered as a consequence of either a subset reverse order attack or a benign update.
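The effect of reordering on sequential embedding can be illustrated with the following minimal Python sketch. It is a simplified model, not the embedding procedure of any cited technique: `sequential_index` assigns marks by storage position, whereas `keyed_index` (a hypothetical helper using a keyed SHA-256 digest of the tuple identifier) assigns them independently of the order. The secret key string and the relation sizes are assumptions chosen for the example.

```python
import hashlib


def sequential_index(tuples, n_marks):
    """Mark index assigned by storage position: the i-th tuple gets mark i mod n_marks."""
    return {pk: i % n_marks for i, pk in enumerate(tuples)}


def keyed_index(tuples, n_marks, secret_key):
    """Mark index derived from a keyed hash of the tuple identifier (order-independent)."""
    assigned = {}
    for pk in tuples:
        digest = hashlib.sha256(f"{secret_key}|{pk}".encode("utf-8")).digest()
        assigned[pk] = int.from_bytes(digest[:8], "big") % n_marks
    return assigned


def agreement(embedded, recomputed):
    """Fraction of surviving tuples whose recomputed mark index matches the embedded one."""
    common = [pk for pk in recomputed if pk in embedded]
    if not common:
        return 0.0
    return sum(embedded[pk] == recomputed[pk] for pk in common) / len(common)


if __name__ == "__main__":
    n_marks = 16
    original = list(range(100))                 # tuple identifiers in storage order
    attacked = list(reversed(original[10:90]))  # subset of the relation, in reverse order

    seq = agreement(sequential_index(original, n_marks),
                    sequential_index(attacked, n_marks))
    key = agreement(keyed_index(original, n_marks, "owner-secret"),
                    keyed_index(attacked, n_marks, "owner-secret"))
    print("sequential agreement:", seq)
    print("keyed agreement     :", key)
```

In this model, the attacker modifies no value at all, yet the position-based assignment no longer matches at detection time, while the identifier-based assignment is unaffected because it never depends on where a tuple is stored.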
The first relational data watermarking approach was proposed in 2002 in [19]. In that work, the authors introduced the AHK algorithm, defining how to analyze the values stored in a database relation for watermark embedding. Since then, many techniques have been proposed to protect relational data, and different classification criteria have been defined to facilitate the study of relational watermarking approaches. For instance, relational watermarking techniques can be classified as distortion-based and distortion-free [6]. Distortion-based watermarking approaches introduce small changes in the relation's content without affecting its usability. Among distortion-based techniques, there are schemes focused on returning the data to its original quality once the watermark is extracted. This subset of distortion-based techniques is defined as reversible [50]-[52]. On the contrary, distortion-free techniques aim to preserve the integrity of the protected data [53], [54]. Usually, distortion-free techniques are defined as fragile, while distortion-based techniques are defined as robust, to the extent that the embedded information survives malicious or accidental attempts to remove it [1], [55]. Relational watermarking techniques can also be classified by their (i) cover-type, i.e., the type of data of the attribute in R selected to embed the marks; (ii) intent, i.e., ownership protection [1]-[3], data tampering detection [4], [5], [56], traitor tracing [57]-[61], among others; and (iii) watermark source, which can be meaningless, such as a random binary stream [19], [31], [62], or meaningful, i.e., a source for watermark generation whose meaning does not depend on the watermarking technique [17], [63], [64]. Regarding the cover-type, some techniques embed the watermark into attributes storing categorical values [65]-[68], date or time information [69], or textual [1], [47] or numerical content [17], [70], [71].
Note that the relational watermarking approach we propose in this work is distortion-based and oriented to protect the ownership of the data to which it is applied. As presented in the sections above, it exploits meaningful information as the source to generate the watermark. Moreover, we validated our approach by embedding the watermark into numerical attributes; however, it is not limited to a particular cover-type and can be used to improve techniques marking different types of attributes.
Concerning IBW techniques, to avoid the consequences of sequentiality in watermarking approaches, sequential selection has been replaced by the pseudo-random selection of marks and embedding positions, as proposed in [1], [7]. A relational watermarking technique that relies on pseudo-randomness has been proved to be robust against subset reverse order attacks, but it introduces a new issue to address. Indeed, the random nature of pixel selection leads to the multiple selection of certain meta-marks, ignoring the others available in the source. On the one hand, these techniques can counter the effects of update-based actions (malicious or not) when majority voting is implemented during the watermark extraction process. On the other hand, they lead to the partial embedding of the watermark, exposing the protected data to leaks due to the watermark degradation. Our approach addresses this issue by monitoring the random component integrated in the embedding process, as presented in Section III.
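Both sides of this trade-off can be reproduced with the following minimal Python sketch, which models uncontrolled pseudo-random selection as uniform sampling with replacement over the mark indices. This is a simplifying assumption, and the sketch does not implement RRE; all function names and the values of `m`, `k`, and the seed are hypothetical. Under this model, the expected fraction of marks selected at least once after k embeddings over m marks is 1 - (1 - 1/m)^k.

```python
import random
from collections import Counter


def select_marks(n_marks: int, n_embeddings: int, seed: int) -> list:
    """Uncontrolled selection: each embedding picks a mark index uniformly, with replacement."""
    rng = random.Random(seed)
    return [rng.randrange(n_marks) for _ in range(n_embeddings)]


def coverage(selection, n_marks: int) -> float:
    """Fraction of the marks that were selected at least once."""
    return len(set(selection)) / n_marks


def expected_coverage(n_marks: int, n_embeddings: int) -> float:
    """Expected coverage under uniform selection with replacement: 1 - (1 - 1/m)^k."""
    return 1.0 - (1.0 - 1.0 / n_marks) ** n_embeddings


def majority_vote(bits):
    """Recover one mark from its redundant copies; None if it was never embedded or on a tie."""
    ones = sum(bits)
    zeros = len(bits) - ones
    if ones == zeros:
        return None
    return 1 if ones > zeros else 0


def extract(selection, extracted_bits, n_marks):
    """Group the extracted bits by mark index and vote per mark."""
    votes = {i: [] for i in range(n_marks)}
    for index, bit in zip(selection, extracted_bits):
        votes[index].append(bit)
    return [majority_vote(votes[i]) for i in range(n_marks)]


if __name__ == "__main__":
    m, k = 64, 64
    selection = select_marks(m, k, seed=7)
    counts = Counter(selection)
    print("observed coverage:", coverage(selection, m))
    print("expected coverage:", round(expected_coverage(m, k), 3))
    print("largest number of selections of one mark:", max(counts.values()))
```

Repeated copies of a mark give `majority_vote` something to vote on, which is what allows minor updates to be absorbed, whereas a mark that is never selected yields `None` and cannot be recovered regardless of how the other marks are voted.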

VII. CONCLUSION
Pseudo-random selection of meta-marks and embedding places in database relations is an effective way to overcome the vulnerability of watermarking techniques to subset reverse order attacks. Pseudo-random selection allows some meta-marks to be used multiple times (recurrent selection) for the generation and embedding of marks, which helps to overcome minor update attacks if, during the extraction process, majority voting is performed over the candidate values of each meta-mark. In addition, the chaotic nature of pseudo-random selection makes it more difficult for attackers to find and delete or overwrite the marks.
Nevertheless, one important downside of recurrent selection is that, while some meta-marks are selected excessively, others are ignored, resulting in a partial use of the watermark source. Consequently, even low-degree attacks can compromise the watermark signal while preserving data quality, since the embedded watermark signal is already weak. In this work, we proposed a recurrent meta-mark selection control engine (RRE) that limits the number of times a meta-mark is selected, increasing the opportunities for others to be considered and thus increasing the watermark capacity. The engine combines probabilistic and chaotic frameworks, making it hard for attackers to predict mark positions and diversifying the positions of the considered meta-marks.
Experimental results show an important increase of the watermark capacity, and also clarify the role played by the number of meta-marks considered according to the number of times the embedding is performed. In terms of complexity, the experiments show that the time required when applying RRE increases in direct relation to the amount of data being protected, with minor discrepancies with respect to techniques not considering RRE, which makes our proposal feasible for implementation.
As future work, there is room for further optimizations related to the generation of chaotic seeds without compromising the probabilistic features of the approach.

AGOSTINO CORTESI is currently a Full Professor in computer science at Ca' Foscari University of Venice, Venice, Italy. He has extensive experience in the area of static analysis and software verification techniques. In particular, he contributed to the design and practical evaluation of abstract domains within the Abstract Interpretation Framework. He coordinates the MAE Italy-India Project (2017-2020) "Formal Specification for Secured Software System."

VOLUME 10, 2022