Digital Watermark-Based Independent Individual Certification Scheme in WSNs

With the development of wireless sensor networks (WSNs), data security has gained considerable attention. Because sensor nodes are small, have limited storage space and operate in special environments, the data they transmit is highly vulnerable to attack. This paper proposes a watermarking scheme based on individual certification for sensor nodes. The data is grouped in a sensor node. A watermarking bit is generated by associating several other data items within a group, and is embedded in the group as a space. At the sink node, the extracted watermarking bit is compared with the watermarking bit generated in the same way as in the sensor node to determine whether the data item has been tampered with. The watermarking bit embedded in the proposed scheme is extremely difficult to detect during transmission. Meanwhile, it does not change the original data. The experimental results demonstrate that the proposed scheme outperforms other certification schemes in WSNs.


I. INTRODUCTION
A wireless sensor network (WSN) is usually composed of sensor, sink and management nodes. The sensor nodes generally collect specific data in certain environments. The data collected by a sensor node is then transmitted wirelessly to the sink node. WSNs can be applied in many fields because of the characteristics of sensor nodes and wireless connection. For example, in the military field, a WSN can be used to monitor equipment in enemy areas, track battlefield conditions in real time and locate targets. In industrial safety, sensor network technology can be used in dangerous work environments. For example, in coal mines and nuclear power plants, sensor nodes can be placed to monitor the safety status of the work environment and provide security for the staff.
However, the data can be easily obtained and modified by an attacker during transmission. False data would result in incorrect decisions. Therefore, the data should be verified at the sink node. Most approaches to data validation perform holistic validation of a group of data. If the watermark does not match, the whole set of data is discarded [1]-[4].
The associate editor coordinating the review of this manuscript and approving it for publication was Ting Yang.
For example, Kamel and Juma [1] introduced the FWC-D algorithm, which divides the data into groups of fixed size and embeds the watermark into the least significant bit (LSB). Two embedding methods can be adopted: embedding the watermarking bit into the preceding group or into the next group for transmission. Once the data is tampered with, the entire set of data is rejected by the receiver sensor. Zhang et al. [2] proposed a scheme that authenticates the identity of the data and the node using digital watermarking technology. First, the witness node sends the witness watermark $w_i$ and its own serial number of key p to the cluster head. Second, the base station verifies the authenticity of the received data. Sun et al. [3] found that there would be redundant space in data collection and storage. Therefore, the watermarking bit was embedded into the redundant space and transmitted to the base station with the data. Guan and Chen [4] proposed a solution that embeds the whole watermark into the data set. To validate the grouped data, only a change in the watermark is detected, and the grouped data is then discarded. In order to avoid destroying the original data, Kamel and Alkoky [5] proposed a watermark-based data sorting method, so that the watermarking bit is not embedded into the data. Zhou and Zhang [6] embedded the data and the watermark information together into the hash sequence, which avoided destroying the data transmitted by sensor nodes. The goal is to minimize corruption of data values. Shi and Xiao [7] proposed that, when validation of a data group fails, the data group could be divided again for double verification. Later, Shi [8] abandoned the concept of grouping and introduced the idea of a queue. As each datum passed through the queue, a watermarking bit was generated in the queue and embedded into the data. In this way, the watermark verification problem caused by grouping was avoided. Li et al. [9] proposed a new idea and considered the sensor nodes distributed in each environment as the distributed pixels of an image in order to generate distributed watermarks. This was a reversible solution in which the original data was calculated according to the difference vector. Lalem et al. [10] conducted simulation experiments on watermarking bit embedding using linear interpolation. Hameed et al. [11] proposed utilizing the characteristics of the data itself to generate and encrypt the watermark, and then transmitting the watermarking bit with the data to the authentication node. Baoyi et al. [12] proposed a WSN secure communication solution with less time overhead for electric transmission lines based on digital watermarking. Wang et al. [13] proposed an effective dual-chaining watermark scheme, called DCW.
In order to improve the verification of individual data, this paper proposes a new scheme. Compared with some current authentication schemes in WSNs, our main contribution is that the scheme not only makes the watermark embedding process more confidential, but also leaves the original data unchanged while providing individual authentication.
The remainder of this paper is organized as follows. Section II describes the watermark embedding and extraction processes of the proposed scheme. Section III presents the experimental scheme used for analysis. Section IV analyzes the experimental results. Section V presents the conclusions.

II. INDIVIDUAL DATA VERIFICATION SCHEME
For simplicity, assume that the data being transmitted is numeric. A continuous data stream is formed from the sensor node to the sink node. The data flow from the sensor node to the sink node is denoted as S, and each datum collected by the sensor node in the environment is denoted as a data item $s_i$. The sensed data and the collection time are included in the data item.
The scheme presented in this paper adopts part of the idea of Shi [14]. The basic process of the proposed scheme is as follows. The sensor node caches a data group with a packet length of N. In this group, M data items are selected for each watermark calculation. Each data item participates in the watermark calculation M times, and each resulting watermark value is folded into one bit. Thus, N watermarking bits are generated and embedded into each data group. At the sink node, the data is grouped in the same way, and the watermark is calculated and folded in the same manner as the embedded watermarking bits. Data tampering can be identified by changes in the watermark.

A. WATERMARK GENERATION AND EMBEDDING
In the sensor node, the collected data is processed. A data group is represented by D, includes N data items and can be expressed as $D = \{s_1, s_2, \ldots, s_N\}$.
Step 1: Generate an N × N matrix B according to the group length N. Ensure that each row and each column has M elements equal to 1 and that the remaining elements are 0. The matrix B ensures that each data item participates in M hash operations. For example, when N and M are equal to 4 and 2, respectively, B is a 4 × 4 binary matrix with exactly two 1s in every row and every column.
Step 2: Use D to calculate the watermark. The N items in D are multiplied by the matrix B, and a 1 × N matrix is obtained. The purpose is to ensure that each element of the 1 × N matrix is associated with M data items.
Step 3: Compute the hash value of each element of the 1 × N matrix using the secure hash function MD5, and denote the result as H. A bitwise XOR operation over the binary representation of H folds it into a one-bit watermark. The watermark value of the j-th element is denoted as $w_j$ and is calculated as
$$w_j = \mathrm{Folding}\big(\mathrm{Hash}\big((D \times B)_j\big)\big),$$
where j = 1, 2, ..., N. Using the above formula, N watermark bits are obtained.
Step 4: Convert the group of data to character form, $\{C(s_1), C(s_2), \ldots, C(s_N)\}$, where $C(s_i)$ represents the function that converts the double numeric data $s_i$ to character data. If the watermark value is '1', append a space to the character data of $s_i$. If the watermark value is '0', leave the data item as it is. This yields $\{Q(s_1), Q(s_2), \ldots, Q(s_N)\}$, where $Q(s_i)$ represents the function that embeds the watermark into the data item.
In wireless sensor networks, the data collected by sensor nodes in a certain time cycle is not immediately transmitted to the sink node, because the data must be converted to a character type before the watermark is embedded. Finally, at the sink node, the data is verified. The flow of the embedding model is shown in Figure 1.
In the proposed watermark embedding algorithm, zerosones(N, M) represents the N × N matrix B, in which each row and each column contains M elements equal to 1. The function Hash(data) generates a fixed-length hash value for the data. Folding(H) folds H into a one-bit watermark by a bitwise XOR operation. The MATLAB function strcat(a, b) concatenates strings horizontally, while sprintf(data) formats the data into a string. The algorithm is described in Table 1.
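The embedding steps can be sketched in Python as follows. This is only an illustrative sketch, not the MATLAB implementation summarized in Table 1. The circulant layout chosen for B, the use of `repr` as the conversion function C, the string that is hashed, and all function names are assumptions made for the example.

```python
import hashlib


def zeros_ones(n, m):
    """Build an n x n 0/1 matrix with exactly m ones in every row and column.

    Assumption: a circulant layout (row i has ones in columns i .. i+m-1 mod n);
    the paper only states the row/column constraint, not a construction.
    """
    return [[1 if (j - i) % n < m else 0 for j in range(n)] for i in range(n)]


def folding(digest):
    """Fold a hash digest into one bit by XOR-ing all of its bits (parity)."""
    return bin(int.from_bytes(digest, "big")).count("1") & 1


def generate_watermark(data, b):
    """Return one watermark bit per column of B (Steps 2 and 3)."""
    n = len(data)
    bits = []
    for j in range(n):
        # Element j of D x B: the sum of the M data items selected by column j.
        combined = sum(data[i] * b[i][j] for i in range(n))
        # Assumption: the value is hashed via its repr() string.
        digest = hashlib.md5(repr(combined).encode("ascii")).digest()
        bits.append(folding(digest))
    return bits


def embed(data, m):
    """Convert each item to text and append a space when its bit is 1 (Step 4)."""
    bits = generate_watermark(data, zeros_ones(len(data), m))
    return [repr(s) + (" " if w else "") for s, w in zip(data, bits)]


group = [19.98, 20.01, 20.05, 19.97]  # hypothetical temperature readings
print(embed(group, m=2))
```

Using `repr` for the text conversion keeps the numeric value recoverable exactly at the receiver, which matters because the sink must recompute the same hash inputs.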

B. WATERMARK EXTRACTION AND COMPARISON
After the data is received from the sensor nodes, it is grouped in the same way as in the watermark embedding process.
Step 1: Check for a space at the end of the character data. If there is one, the watermark '1' is extracted. Otherwise, the watermark '0' is extracted. Denote the result as $W_q$.
Step 2: Convert the character data to double type. The data set is denoted as D.
Step 3: Compute the watermark from the double data in the same way as in the watermark embedding process. The result is denoted as $W_h$.
Step 4: Compare $W_q$ with $W_h$ element by element. Where they are not equal, the corresponding entry of the Res array is marked as '1'; otherwise it is marked as '0'. Thus, N comparison results are obtained in the Res array.
The sink node receives the data from the sensor node. If an element in the Res array is '1', the corresponding data has been changed. The watermark extraction model is shown in Figure 2.
In the watermark extraction algorithm, Str(data) is a function that converts the character data to double data. Generation(data) means that the watermark is generated in the same way as in the watermark embedding process. The watermark extraction algorithm is shown in Table 2.
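A minimal Python sketch of the four extraction steps is given below, under the same assumptions as on the embedding side (circulant B, `repr`-based text conversion, hashing the `repr` string of each combined value). The function names are hypothetical and do not correspond to the MATLAB routines in Table 2.

```python
import hashlib


def zeros_ones(n, m):
    """n x n 0/1 matrix with m ones per row and column (assumed circulant)."""
    return [[1 if (j - i) % n < m else 0 for j in range(n)] for i in range(n)]


def watermark_bits(data, m):
    """Regenerate the N watermark bits exactly as on the embedding side."""
    n = len(data)
    b = zeros_ones(n, m)
    bits = []
    for j in range(n):
        combined = sum(data[i] * b[i][j] for i in range(n))
        digest = hashlib.md5(repr(combined).encode("ascii")).digest()
        # XOR-fold the digest to one bit (parity of all its bits).
        bits.append(bin(int.from_bytes(digest, "big")).count("1") & 1)
    return bits


def extract_and_compare(received, m):
    """Return the recovered numeric data and the Res array."""
    # Step 1: a trailing space encodes watermark bit 1, otherwise 0.
    w_q = [1 if item.endswith(" ") else 0 for item in received]
    # Step 2: convert the character data back to numeric values.
    data = [float(item) for item in received]
    # Step 3: recompute the watermark from the received values.
    w_h = watermark_bits(data, m)
    # Step 4: mark mismatching positions with 1.
    res = [int(q != h) for q, h in zip(w_q, w_h)]
    return data, res


data = [19.98, 20.01, 20.05, 19.97]  # hypothetical readings
sent = [repr(s) + (" " if w else "") for s, w in zip(data, watermark_bits(data, 2))]
print(extract_and_compare(sent, 2)[1])  # [0, 0, 0, 0] when nothing was altered
```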

III. EXPERIMENTAL SCHEME ANALYSIS
Two indicators are used to measure the performance of the proposed scheme. When the judgments of the scheme are compared with the real state of the data, some errors in judgment are expected. A false negative is the case where a data item has been tampered with but is judged to be unchanged, while a false positive is the case where a data item has not been tampered with but is judged to be tampered with.

A. THE THRESHOLD T
A group Q in the data stream cached by the sink node contains N data items $s_i, s_{i+1}, \ldots, s_{i+N-1}$. If data item $s_i$ has been tampered with, M of the N watermark bits generated by calculation in group Q are associated with $s_i$. Due to the folding operation on the hash values, each of these watermark bits changes with a probability of 1/2. The watermark value may change from '0' to '1' or from '1' to '0', and the changes in the M bits are independent and identically distributed.
Let Y be the number of flips in the M bits. The probability distribution of Y is
$$P(Y = k) = \binom{M}{k}\left(\frac{1}{2}\right)^{k}\left(\frac{1}{2}\right)^{M-k} = \binom{M}{k}\left(\frac{1}{2}\right)^{M},$$
where k = 0, 1, 2, ..., M. When k is close to 0, the probability P(Y = k) is small enough to be ignored. Define a threshold T as the maximum number of flips that is tolerated. If Y ≤ T, the data item $s_i$ is considered unchanged.
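To make the role of the threshold concrete, the following Python sketch evaluates the flip distribution and applies the threshold rule to a Res array. It is an illustration only: it assumes each associated bit flips independently with probability 1/2, takes the bits associated with item i to be the columns j where B[i][j] = 1, and uses hypothetical function names. Interference from other tampered items in the same group is ignored in the probability calculation.

```python
from math import comb


def flip_pmf(m, k):
    """P(Y = k) when each of m bits flips independently with probability 1/2."""
    return comb(m, k) / 2 ** m


def miss_probability(m, t):
    """P(Y <= t): chance that a tampered item produces at most t flips."""
    return sum(flip_pmf(m, k) for k in range(t + 1))


def judge_items(res, b, t):
    """Flag item i as tampered when more than t of its associated bits mismatch.

    res is the Res array (1 = mismatch) and b is the 0/1 matrix B.
    """
    n = len(res)
    return [sum(res[j] for j in range(n) if b[i][j]) > t for i in range(n)]


print(flip_pmf(4, 2))          # 0.375, i.e. 6/16
print(miss_probability(4, 0))  # 0.0625, i.e. 1/16
```

Under these assumptions, raising T makes `miss_probability` grow (more tampered items slip through) while fewer untampered items are flagged, which is the trade-off the threshold controls.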

B. FALSE NEGATIVE RATE
The false negative rate (FNR) of the experiment is the ratio of the tampered data items that are not detected to the total number of tampered data items. If the number of watermark flips associated with a tampered data item is less than T, the data item is not detected. Let X denote the number of misjudged data items and $X_{number}$ the total number of tampered data items. The false negative rate can be expressed as follows:
$$\mathrm{FNR} = \frac{X}{X_{number}} \tag{2}$$

C. FALSE POSITIVE RATE
A data item that has not been tampered with may be misjudged as a tampered data item, because the watermarking bits associated with it are affected by the tampered data. In that case, the M watermark bits associated with the data item each have a 1/2 probability of flipping. If the number of flips in the M watermarking bits that are associated with the data item is more than T, the data item will be judged as tampered data. Let $V_{number}$ denote the number of data items that have not been tampered with and V the number of misjudged data items. The false positive rate can be expressed as follows:
$$\mathrm{FPR} = \frac{V}{V_{number}}$$
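Both rates can be computed from a list of ground-truth labels and a list of verdicts, as in the short Python sketch below. The function name and the boolean-list representation are assumptions for illustration; the variable names mirror X, Xnumber, V and Vnumber from the text.

```python
def error_rates(tampered, flagged):
    """Return (FNR, FPR) for per-item ground truth and per-item verdicts.

    tampered[i] is True when item i was really modified;
    flagged[i] is True when the scheme judged item i as modified.
    """
    x_number = sum(tampered)                 # total tampered items
    v_number = len(tampered) - x_number      # total untampered items
    x = sum(1 for t, f in zip(tampered, flagged) if t and not f)  # missed
    v = sum(1 for t, f in zip(tampered, flagged) if not t and f)  # false alarms
    fnr = x / x_number if x_number else 0.0
    fpr = v / v_number if v_number else 0.0
    return fnr, fpr


truth = [True, True, False, False, False, False]
verdict = [True, False, True, False, False, False]
print(error_rates(truth, verdict))  # (0.5, 0.25)
```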

D. DETERMINATION OF M VALUE
In both watermark embedding and watermark extraction, the data must be multiplied by the matrix B before the watermark value is calculated. There are M elements equal to '1' in each row and each column of matrix B, which means that each value obtained by multiplying D by B is related to M data items. Therefore, the value of M should be chosen appropriately. If the watermark value is associated with more data items, the FPR will be high. If the watermark value is associated with fewer data items, the FNR will increase. An appropriate value is assigned to M to ensure a balance between FNR and FPR. The test for the M value is shown in Figure 3. The ratios of M to N (RMTN) are 35%, 50%, 65% and 80% for Figure 3 (a) to (d), respectively. Table 3 is obtained by integrating the data in the figure. Table 3 shows that the sum of FNR and FPR decreases gradually until the ratio of M to N is equal to 0.65. When the ratio of M to N reaches 0.8, the sum of FNR and FPR starts to increase. Therefore, it is appropriate to choose the ratio of M to N as 0.65.

IV. EXPERIMENTAL RESULTS AND ANALYSIS
In this section, the proposed scheme is demonstrated through MATLAB simulation experiments. The data stream used in the experiment comes from a real wireless sensor network at the Intel Berkeley Research Laboratory [15], in which the sensors are deployed to collect time-stamped information such as humidity, temperature and light intensity. In order to make the experiment simple and clear, a single feature (the temperature) is selected for the experiment. Some obvious errors are removed.
There are usually three ways to tamper with data: insert new data elements, delete data elements and modify data elements. Since the sensor node uses a fixed time interval sampling method, it can be identified whether data is inserted or deleted. Thus, all types of tampering can be seen as modification.
In order to calculate both FNR and FPR, the data stream D is modified at the tamper rate r, so that |D| × r data items in the data stream D are tampered with. The aim of the experiment in this section is to compare the proposed algorithm with the mainstream independent individual verification algorithms [8], [14]. Since the experimental results of the three schemes are almost the same under different tamper rates r, three other parameters, namely the length N of the data group, the ratio of M to N and the threshold T, are used to test the scheme proposed in this paper. Only the effect of one parameter is tested in each set of experiments. The algorithms proposed in [14] and [8] do not consider the LSB in the process of watermark calculation and embed the watermarking bit in the LSB. When the data is tampered with, the LSBs are also likely to be modified, resulting in the instability of the entire algorithm. In Figure 4, the ratios of tampered LSBs to the total tampered data (RTLSBTT) are 20%, 40%, 60% and 80% for Figure 4 (a) to (d), respectively. Figure 4 shows the trends of FNR and FPR of the three schemes with different packet lengths N. The abscissa is the grouping length of the data stream from 30 to 100, while the ordinate is the value of FNR or FPR. The figure shows that the FNR of the scheme proposed in this paper is better than that of the schemes proposed in [14] and [8] in all four cases. Although the FPR shows a small increase, the values of FNR and FPR of the proposed algorithm are relatively stable compared with those of the algorithms proposed in [14] and [8] when different proportions of LSB data are tampered with. For example, when the packet length is 30 and the RTLSBTT ranges from 20% to 80%, the values of FNR and FPR of the proposed algorithm remain around 0.1 and 0.2, respectively. On the contrary, the values of FNR and FPR of the algorithms proposed in [14] and [8] fluctuate greatly.
Meanwhile, the sum of FNR and FPR of the algorithm proposed in this paper is less than those of the algorithms proposed in [14] and [8]. Figure 5 shows the false negative rate and the false positive rate of the three algorithms under different thresholds T. The threshold value T is the key to determining whether a data item has been tampered with. When the number of flips in the M watermark bits related to the data item $s_i$ is less than the threshold value T, the data item is considered unchanged. Otherwise, the data item is considered changed. It can be seen from the figure that the FNRs of [14] and [8] increase substantially as the amount of data with tampered LSBs increases. However, the FNR and FPR of the algorithm proposed in this paper do not fluctuate significantly in this case. For example, when the threshold value T is equal to 3 and more and more data items have their LSBs tampered with, the values of FNR and FPR of the proposed algorithm remain around 0.05 and 0.4, respectively. However, the values of FNR and FPR of [14] range from 0.1 to 0.6 and from 0.4 to 0.1, respectively. Similarly, the corresponding values of [8] range from 0.1 to 0.7 and from 0.4 to 0.1, respectively.
In order to further illustrate the superiority of the proposed algorithm, the ratio of M to N is changed while the other parameters are kept unchanged. As the ratio of M to N changes, the FNR and FPR of the algorithm proposed in this paper remain relatively stable across different RTLSBTT values. Figure 6 shows the FNR and FPR of the three algorithms under different ratios of M to N. For example, when the ratio of M to N is equal to 0.2, the values of FNR and FPR of the proposed algorithm are around 0.7 and 0.1, respectively. However, the FNR and FPR of [14] range from 0.8 to 0.9 and from 0.1 to 0, respectively. The FNR of [8] ranges from 0.1 to 0.8, while its FPR is relatively stable.
It can be seen from Tables 4, 5 and 6 that the sum of FNR and FPR for the algorithm proposed in this paper remains relatively stable as the proportion of tampered LSBs increases. As shown in Table 4, the sum of FNR and FPR for the proposed algorithm remains around 4.0, while it rises from about 4.0 to 6.8 for the algorithm proposed in [14] and from about 3.6 to 6.6 for the algorithm proposed in [8]. Under the comparison of different thresholds T shown in Table 5, the sum of FNR and FPR for the algorithm presented in this paper is about 2.3, while it rises from about 2.0 to 4.0 for the algorithm proposed in [14] and from about 2.5 to 4.4 for the algorithm proposed in [8]. The data in Table 6 shows that the algorithm presented in this paper is more stable than the algorithms proposed in [14] and [8]. The sum of FNR and FPR for the proposed algorithm remains around 4.0, while it rises from about 4.0 to 7.0 for the algorithm proposed in [14] and from about 4.6 to 6.7 for the algorithm proposed in [8].
Table 7 provides the comparison of the algorithm proposed in this paper with other wireless sensor network data authentication schemes. The following aspects are mainly compared in the table.
1) Whether the object of verification is a group of data items or a single data item.
2) The object to be embedded in data group.
3) Whether the current data stream is grouped or not.
4) Watermark generation method.
5) The location where watermark is embedded.
In general, Table 7 shows that the proposed scheme is superior to the others. The proposed algorithm can verify an individual data item, while most of the other schemes can only achieve validation at the data group level and are unable to determine whether an individual data item has been tampered with during the data group authentication process. If only a small number of data items in a data group are modified, the entire data group is discarded and WSN resources are wasted. In addition, the watermarking technique based on LSB replacement is not used in the proposed algorithm, because embedding the watermark into the LSB of the data item not only affects the actual value of the data item, but also loses the basis for determining whether the data is modified, especially when the attacker modifies the LSB of the data item. In the proposed algorithm, the space is embedded at the end of the data item, which does not change the value of the data item. Therefore, the embedded watermark in the proposed algorithm is harder for attackers to notice than in the other algorithms. Finally, the other algorithms generate the watermark using the current data group and embed it into the next data group. However, the watermark generation and embedding in the proposed algorithm are achieved using only the current data group, which may improve the efficiency of the algorithm.

V. CONCLUSION
This paper proposes a verification scheme for individual data items. The watermark generated by associating other data items can accurately verify the individual data items. Several experiments are conducted on various aspects affecting the FNR and FPR. The obtained results demonstrate the superiority of the proposed algorithm. Furthermore, it is not easy for an attacker to detect the watermark in the proposed scheme. In addition, embedding the watermark in the proposed scheme does not change the value of the data item, which reduces the unreliability of the watermark comparison caused by tampered data.
YAN XIAO received the B.Eng. degree in information science and technology from Jiujiang University, Jiujiang, China, in 2017.
She is currently pursuing the M.Eng. degree with the College of Information Technology, Jiangxi University of Finance and Economics. Her research interests include computer image and video processing, information security technology, and data mining.
GUANGYONG GAO received the Ph.D. degree from the Nanjing University of Posts and Telecommunications, Nanjing, China, in 2012.
He is currently a Professor with the School of Computer and Software, Nanjing University of Information Science and Technology. His research interests include computer networks security, multimedia information security, and digital image processing. VOLUME 7, 2019