False Event Message Detection Robust to Burst Attacks in Wireless Sensor Networks

A sensor device can be used to detect target events at low cost. Moreover, there is a significant risk of sensor nodes being compromised or captured within large wireless sensor networks (WSNs). The transmission of valid event messages to users by a WSN can be hindered by network congestion due to false event messages in a compromised node. Several methods are available for detecting false event messages. However, they rely on long bits of authentication codes and hence do not provide a fundamental solution to prevent network congestion. In the proposed method, various hashing vectors, which are spaceefficient data structures that can determine whether the given data are an element of a set, are created based on the correct combination of authentication codes and placed in each node in advance. An event message contains an XOR of the authentication codes, and each node verifies it based on its hashing vector. If a node is acquired illegally, the information of the hashing vector and the XOR information of the authentication codes assigned to the correct event message is compromised, so we propose an algorithm to update the information securely. Compared to existing research, the number of hops required to detect a false event message increases by only about one hop, but the amount of traffic that a malicious node can generate can be reduced by about 60% or more. In other words, the proposed method effectively reduces the amount of traffic an attacker can generate with false event messages, which also reduces the overall network congestion.


I. INTRODUCTION
A S A CORE functionality, sensor devices identify and report events, e.g., trespassing or forest fires, and a wireless sensor network (WSN) is formed by connecting such sensor devices in a wireless network [1], [2], [3], [4]. Multihop wireless paths serve as the medium for the message in a WSN to reach its last recipient, the sink, following the detection of events of interest and transmission of the message by the sensor nodes (Fig. 1). Nonetheless, when sensor nodes are used in hostile environments, they become susceptible to attacks. An attacker can compromise and capture multiple sensors. Network congestion can occur, as such compromised nodes can generate numerous false event messages (hereinafter, false messages) following the obtention of the secret keys stored in these nodes (Fig. 2). This network congestion can prevent the transmission of a valid message to the sink, thereby obscuring the hacker's crime.
The rapid identification of false messages in-network is a concept explored in related studies. Network congestion can be partially addressed via the detection at an early stage. Many works tackle this challenging issue [5], [6], [7], [8].
Permitting each node to possess symmetric keys encapsulates the fundamental idea. Multiple message authentication codes (MACs) are contained in a report collectively produced by T proximal sensor nodes upon an event occurring. A node's agreement with the report is reflected by a MAC that is produced by a node using one of its symmetric keys. Further authentication information is incorporated into every message using multiple methods. The probabilistic validation of MACs contained in the report is enabled by every forwarding node within the process of a report being forwarded to the sink over several hops. Delivery does not occur when insufficient MACs are contained in a given report. As every message comprises significant quantities of authentication information, the quantity of traffic in every message is substantial despite their ability to identify false messages in-network. Therefore, the production of multiple false messages caused by a compromised node causes network congestion. Further, a significant proportion of the battery of sensor nodes is consumed due to legitimate messages. Consequently, the lifespan of a WSN diminishes as the sensor nodes function with a restricted battery. In addition, when more than T nodes are compromised, certain methods cause a total breakdown of security.
When more than T nodes are compromised, a streamlined method proposed in this study demonstrates a high likelihood of success in the detection of false messages innetwork. Although this method significantly decreases the probability of network congestion, as the message length can be substantially reduced, this contemporary, cutting-edge method generates almost the same total message hops before detecting a false message.
Before the deployment, the sensor nodes are preloaded with developed information regarding an exclusive OR code (EOC), which enhances false event detection in typical cases and cases in which the sensor nodes leak more than T nodes. The proposed method's ability to reduce the total traffic when compromised nodes generate numerous false messages is evidenced by our mathematical analysis.
The main contributions of the paper are as follows. 1) First, we propose an algorithm to detect false event messages in WSNs with high probability. In our algorithm, each message contains event information and a one-time authentication code (EOC; exclusive OR code). The bit length of the proposed one-time authentication code is shorter than the authentication information used in existing studies. Thus, the required amount of traffic required for event messages can be reduced compared to existing methods. As a result, the proposed algorithm can prevent network congestion caused by malicious nodes. As mentioned previously, the goal of this study is to reduce traffic volume and network congestion in WSNs. 2) Second, each forwarding node can determine whether the one-time authentication code in the forwarding message is legitimate based on its hashing vector. A hash vector is a data structure that can determine whether the given data is an element of a set, thereby allowing a few false positives and achieving high spatial efficiency. The network manager can control the detection ability of hashing vectors. Here, when the target detection probability is set to 0.5, a false event message can be detected in two hops on average. 3) Third, an update mechanism of the one-time authentication code is also proposed. This update mechanism disables one-time authentication codes that the attacker may have obtained and updates them to new codes that cannot be predicted by the attacker. The update mechanism is a novel feature required only for the proposed method, and the ability of the proposed method to detect false messages using one-time authentication codes is almost even with existing methods. Note that the update mechanism does not require much traffic, and the overall traffic volume of the proposed method, including the update mechanism, is very low compared to existing methods. 4) Fourth, we conducted experiments and compared the proposed algorithm with four existing algorithms. The experimental results demonstrate that the proposed method can reduce the maximum amount of traffic an attacker can produce. We present the subsequent sections of this study in the following manner: we portray a model of sensor networks and false events in Section II, explore related methods and their issues in Section III, illustrate the design of our method in Section IV, explore the efficacy of our method alongside the parameter settings in Section V (the findings from the previous section serve as the basis for assessments), and summarize this study in Section VII. The proposed method reduces the amount of traffic required to detect false event messages compared to existing studies while maintaining the ability to detect false event messages.

II. RELATED WORK
Secured data aggregation in WSNs has been a subject of inquiry in various studies [9], [10]. Although in-network false events cannot be detected via the methodology in these studies, false events at the sink can be detected. Therefore, a WSN cannot detect trespassing or other such events due to network congestion compromised nodes cause.
The distribution of secret keys to sensors is enabled via a method proposed in studies focusing on in-network false event detection. Each node possessing symmetric keys encapsulates this method's fundamental concept. Each node's location serves as the basis for producing keys following deployment, or each node is preloaded with the keys. In addition, certain nodes share the same key. The MACs, key IDs, and IDs of sensors proximal to an event are carried in a report these sensors collectively produce. A node's agreement with the report is reflected by a MAC that a node produces using one of its symmetric keys. The probabilistic validation of the MACs carried in the report is enabled by every forwarding node within the process of a report being forwarded to the sink over several hops. Delivery does not occur when reports contain false MACs. The related literature endeavors to enhance the likelihood that a forwarding node possesses a key used to generate MACs in the event message.
Keys are randomly distributed to sensor nodes within false message detection methods with randomized key distribution mechanisms [5], [11]. A legitimate event can only be generated when the sensor nodes collect more than T discrete MACs. This mechanism can be used in a scenario in which the sink moves. Nonetheless, threshold behavior is displayed by the mentioned randomized key distribution mechanism. When the hacker or attacker has more than T keys, the detection of false events cannot occur.
Each sensor node's location serves as the basis for distributing keys to every node within false message detection methods using a location-based key distribution mechanism [6], [7], [8], [12]. The locations of the sink and nodes enable the static forwarding node identification for every source node. A burst of large quantities of false messages is generated by the above-mentioned methods despite their ability to detect false messages in-network, attributed to the long messages that compromised nodes can generate in such contexts. The default of T = 5 has been established in various existing studies. In WSNs, 8 bytes is typically thought to be the length of a MAC [13].
By contrast, 40 bytes is considered the standard packet size without security mechanisms in WSNs in some studies [14], [15]. Therefore, compromised nodes can generate a size that doubles the original packet in a false message, reaching up to 80 bytes in length. A method for detecting false messages effectively handles scenarios that involve the location of the sink altering and the presence of multiple compromised nodes in a certain study [7]. Nevertheless, as the method uses T MACs, attacks in the form of bursts of large quantities of false messages occur, similar to other relevant studies. Babu et al. proposed Secure Data Aggregation based on the Principle Component Analysis (SDA-PCA) algorithm [9]. A cluster head is selected in their algorithm among neighboring sensor nodes according to node quality and energy level. The cluster head gathers sensing information of the surrounding sensor nodes, compresses the information, and then sends a message to the sink. Sensor nodes with high mobility are considered malicious nodes in SDA-PCA. Such nodes are not allowed to become cluster heads. This mechanism prevents false event messages from reaching the sink. Since the criteria for determining whether a node is malicious are clear, it is easy to place a malicious node so that it does not meet these criteria. Therefore, a malicious node may become a cluster head. It is also possible for a cluster head to provide much incorrect information.
Since the algorithm assumes that the messages generated by the cluster head are correct, it cannot detect when a cluster head creates a false event message.
Wang et al. proposed a secure aggregation protocol in WSNs [10]. The protocol selects a cluster head among neighboring nodes based on link quality and remaining energy. Data confidentiality is protected because communications between nodes and cluster heads and between cluster heads and sinks are encrypted. Moreover, each node and each cluster head have a unique key, preventing spoofing. However, since only the sink can detect this spoofing, the spoofing cannot be detected while transferring the message to the sink. Therefore, if an attacker compromises even one node, he can generate many false event messages that the in-network cannot detect.
Furthermore, Pedroso and Santos [16] suggested a clustering algorithm (CONsensus Based Data FIlteriNg for IIoT; CONFINIT) that prevents false message attacks. In that algorithm, each node compares its sensing values with those of other nodes. If a node's sensing values differ significantly from those of other nodes, the node is likely malicious. In this case, other honest nodes ignore the potentially malicious node. As a result, CONFINIT tries to prevent malicious nodes from generating false event messages. However, if a malicious node succeeds in generating a false event message, it is not detected as a false event message because there is no mechanism to detect false event messages at the forwarding node or sink.
The methods introduced so far can prevent the generation of false event messages to some extent or detect them in the sink. However, they cannot detect false event messages in-network, i.e., at the forwarding node, which is the goal of this paper.
Ye et al. proposed a Statistical En-route Filtering (SEF) mechanism [5]. They proposed an algorithm that pioneered detecting false event messages in the network. Sensor nodes that detect the target event generate MACs based on the event content and node ID, and the cluster head, at minimum, collects T MACs. Keys for MAC generation are randomly deployed to each node and information about which node holds which key is not maintained. Thus, the problem is that with only T keys compromised, an attacker can generate any number of false event messages from any location.
By distributing "legitimate key ID combinations" to each node in advance, an in-network detection method was proposed for messages generated with illegitimate keys [7]. In their approach named MobiSink, when a message is forwarded to the sink, each forwarding node checks to see if the message has at least T MACs with the correct key IDs appended. The detection rate of false messages is high, but the disadvantage is that the message length becomes very long because T MACs and T key IDs must be appended to every message.
A combinatorial design-based partial en-route filtering scheme (CD-PEFS) was proposed by Kumar et al. [8]. In CD-PEFS, three cluster heads are selected in each cluster. One cluster head gathers sensing information from the surrounding sensor nodes and generates a message with T secret shares, T node IDs, and (k +1) MACs where k represents the number of keys stored in cluster heads. The value of k is 10 on average when the number of clusters is 150. As the number of clusters increases, the value of k also increases. Although the detection rate of false event messages of CD-PEFS is very high, there is a great deal of data added to the message.
Yi proposed an en-route message authentication scheme (EMAS) [12]. In EMAS, each sensing node sends the information with event data, key ID, location ID, and MAC to a cluster head. The cluster head maps T MACs into a compressed filter using hash functions. The cluster head generates a message with the compressed filter, T node IDs, T location IDs, and T key IDs. Each forwarding node can detect false event messages based on the compressed filter, node IDs, location IDs, and key IDs. Although EMAS compresses T MACs into one filter, the amount of messages is still significant because T node IDs, T location IDs, and T key IDs are necessary for the authentication.
As mentioned above, all existing research aimed at detecting false event messages in-network adds many data to the event messages for authentication. False event messages generated by compromised nodes are detected in the in-network early. However, the amount of data in each message is so large that the composed nodes can easily cause network congestion. These four algorithms [5], [7], [8], [12] are compared with the proposed algorithm in Section V.
The suggested method uses one-time authentication codes instead of a large amount of authentication, such as T key IDs and T MACs, which significantly reduces the amount of data for authentication added to messages. This one-time authentication is a novel idea not found in existing research. Of course, since one-time authentication is used, an update mechanism is required. We propose an algorithm to perform this mechanism efficiently and securely.

III. SYSTEM MODEL
We outline the model of false event attacks and the sensor network model used in this section.

A. SENSOR NETWORK MODEL
Multiple small sensor nodes deployed at high density comprise the sensor network examined in this study. Adaptation to node failures and enhanced detection accuracy are enabled using multiple nodes to detect an event. A node is selected to be the center-of-stimulus node (CoS) while all detecting nodes report the signals they detect. The CoS collects and summates all received detection findings. A multihop path is used by the event message, which is a term used to describe the message the CoS delivers to the sink. All nodes use a localization scheme to identify their geographic location once they are deployed [17].
Given that the sensor nodes' design followed costeffectiveness principles, we presume that they do not possess tamper-resistant hardware. Following deployment, we also presume that no movement occurs in the nodes. The literature reflects these suppositions [18], [19], [20], [21].
Symmetric-key cryptography serves as the basis for most WSNs due to the computationally expensive nature of cryptographic asymmetry based on public-key schemes [22].
In addition, we presume that every node can validate the sink's message, and the sink can validate its messages to the sensor nodes via a secure mechanism, such as μTESLA, thereby ensuring that the sink cannot be compromised [23].
We assume that the number of legitimate events is relatively small, e.g., a few times per month. The target events are assumed to include forest fires and intrusions by attackers, and such events do not occur frequently.
Possible applications for the proposed method include intruders, arson, theft, and other application scenarios that detect human crime. These applications are one of the example scenarios in WSNs. In addition, human crime is not expected to occur frequently. Thus, the proposed method works effectively for such applications. In other words, the amount of traffic required by the proposed method required is very low compared to existing methods. If the frequency of events is high, the number of updates of the one-time authentication code increases, and the amount of traffic required by the proposed method increases. This reduces the advantage of the proposed method in that it requires less traffic than the existing methods.

B. ATTACK MODEL
A network can have several sensor nodes compromised by a hacker within it. An attacker can load secret keys attained from distinct nodes onto a compromised node following the exposure of all codes, data, and secret keys resulting from a sensor node being compromised. Compromised nodes can seemingly detect a proximal event when no such event exists. These false reports, which can cause a user to make harmful decisions, can induce network congestion. We assume that WSNs are equipped with replicated node detection methods [24]. In other words, if an attacker takes more than one node, he/she has complete control over that node but cannot replicate many malicious nodes. An attacker can generate arbitrary event messages. However, the forwarding node may detect an attackergenerated event message as illegitimate since each forwarding node checks whether the event message is legitimate. The attacker is free to use the information from the compromised node to mislead the forwarding node into assuming that the attacker-generated event message is legitimate. Although an attacker can compromise multiple nodes, we assume that the number of compromised nodes does not exceed 100 since it is difficult to realistically expect an attacker to compromise more than 100 nodes without the system administrator being aware. An attacker can also intercept communications between normal nodes. Table 1 lists the notations used in this study. We used a regular geographic grid to separate the terrain. T nodes in the same grid can generate an event message.

A. SUMMARY OF THE PROPOSED METHOD
The variable i represents the n i 's node ID, and n i represents the sensor node. The sink generates code r i available only once for each node n i in advance. When creating an event message, nodes cooperate to create exclusive OR code (EOC) R from T codes in the same grid. Please note that the sink can know all possible legitimate combinations of T codes in advance. In other words, the sink knows all possible legitimate EOCs in advance. Even if an attacker compromises T nodes, the attacker cannot create a legitimate EOC when all compromised nodes are not on the same grid because a legitimate EOC needs to be created from T nodes in the same grid.
Each node is preloaded with a hashing vector created from all legitimate EOCs. Since the hashing vector can check whether an element is included in the set from which the hashing vector was generated, each forwarding node can detect a false event message (i.e., an event message with nonlegitimate EOC) with a certain probability.   Here, there are two main challenges. First, if all nodes have the same hashing vectors, the detecting ability is the same among all hashing vectors. Hence, if a false event message is not detected at the first hop, it is not detected until the end. The second challenge is that the EOC information of a legitimate event message could be leaked to the attacker. Therefore, the update mechanism is necessary.
This process and the relationship between the process,  Tables 2 and 3. The following section describes the detailed process of the proposed algorithm. It also describes an approach to solving two issues.
The number of codes assigned to nodes in a grid is represented by q(q ≥ T), and g represents the number of grids. A node is given to each of the prepared qg codes-r 1 , r 2 , . . . , r qg . The variable b represents the codes' bit length. Methods involving local leader election [25], such as an event's occurrence, identify the leader node, n w , within the nodes in a grid, G w . The leader node, n w , receives

FIGURE 5. A hashing vector's execution process.
a report from its subordinate nodes when they detect an event. By adopting the XOR operation of the T codes selected randomly from the received keys, the leader node generates EOCs.
Determining whether an EOC is valid is facilitated by the information each node receives. In the scenario in which an attacker compromises more than T nodes, the false event in-network can be detected. Nonetheless, as aforementioned, the method proposed in this study and all existing related studies cannot detect false messages when a hacker compromises a T node in a grid. Nonetheless, locationbased key distribution or the mechanisms used in this study prevent the establishment in other grids of the message's occurrence point.
The codes are used to generate an EOC. In other words, the EOC is generated by the XOR operation of T codes assigned to T nodes. The same code cannot be used again, as the forwarding of the message containing the EOC could be a compromised node. Therefore, the codes used to generate the EOC must be updated via a suitable method.

B. PROCESSES
First, each node creates its code (Section IV-B1). This code is used to generate a one-time authentication code (EOC) when nodes detect events. Each node stores a hashing vector (Section IV-B2) created from legitimate set of EOCs in advance at the sink node. Here, when an event is detected, several nodes collaboratively generate an event message containing an EOC (Section IV-B3). A forwarding node then determines whether the event message is legitimate using the hashing vector stored in the node and the EOC attached to the event message (Section IV-B4). Finally, the sink determines if the received message is correct (Section IV-B5). The details of this process are described in the following.

1) CREATION OF CODES
The variable G w represents a grid with an ID of w. Each node n i has a symmetric key k i , which is shared with the sink. After deployment, node n i generates cord r i based on the ID of the grid where node n i exists. The variable g represents the number of grids. Each grid has q codes. One of these codes is given to each node in the grid.
The grid that nodes are in determines the codes the nodes can create based on the proposed method. C and (X 0 , Y 0 ) denote the grid size and grid's reference point, respectively. A reference point can encapsulate an arbitrary point. Grid G i can be expressed as follows when (X x i , Y y i ) is its central point: The nodes are preloaded with the reference point (X 0 , Y 0 ), a hash function, h, a shared master secret key, κ, and grid size, C. A method also used in related studies allows the location of each node to be identified following deployment [26]. A noncomplex calculation allows each node to identify its grid's central point ( The equation below enables the calculation of the code r i of node n i : The rank of ID i among all node IDs in ascending order in the same grid is conveyed via the variable (i), and concatenation is represented by ||.
The node ID is broadcasted by each node n i to one another in the grid G w following deployment. The rank of the node n i 's ID i is ascertained by node n i , which assembles IDs in ascending order.
Every pair of neighboring nodes shares a unique pairwise key. The variable k i,j represents the pairwise key of nodes n j and n i .

2) HASHING VECTOR
Every node uses a hashing vector to generate information about the EOCs. A Bloom filter is an equivalent concept to that of a hashing vector [27]. H[A] denotes the denomination used to represent the hashing vector generated from set A. Using α and H[A], where the former represents an element, α is a member of set A that can be ascertained. Nonetheless, false positives can occur, as α is not a member of A, yet H[A] states that α is a member of A. Fig. 5 portrays a hashing vector's execution process. The hash function h with a range of 1, . . . , m is selected following the assignment of a vector, V, of m bits, all initially set to 0.
The bit at position h(a) in V is set to 1 for every element a ∈ A where a bit could be set to 1 many times. The bit position h(α) is checked when a query for α is made. It is determined that α is not in set A when the value is 0. Although the following inference could be incorrect, when the value is not 0, α is considered to be in the set. If incorrect, the result is a false positive. An acceptable probability of producing a false positive can be facilitated by selecting a suitable parameter, m.
After hashing all n elements of a ∈ A into a hashing vector, the probability that a particular bit is still 0 is The length of the hashing vector's bits and the number of elements in a target set are represented by m and n, respectively. The target FPR is set to FPR t . By solving the equation of FPR t = e −n/m with regard to m, we have Before deployment, hashing vectors are created at the sink from all possible EOCs that may be generated in the WSN. The specific EOC preparation is described in the next subsection. Provide each hashing vector to each node before deployment. The collective filtering power across the forwarding can be substantial, even if the number of false positives is significant. However, indeed, it does not make sense to have the same hash vector at all nodes. Therefore, each hashing vector v i at node n i is generated by a unique seed, s i . Further, the hashing vector, v i , is generated and checked using h(s i ||α) instead of h(α) as a hash value of α. EOCs are equivalent to elements in the proposed method. Pseudorandom number generators are used to generate s i [28].

3) REPORT GENERATION
When an event occurs in grid G w , {r i , E} embodies the form an event report takes after every node n i that identifies the signal and prepares such a report. The description of the event is denominated by the variable E. A node n i creates a message described as follows: The encryption of the message, r i , using key k i,u is represented by the variable Enc(k i,u , r i ). The ID of the leader node of grid G w and the stream concatenation are represented by u and ||, respectively. Subsequently, node n i delivers M i to n u , the leader node. The leader node extracts and gathers all Enc(k i,u , r i )||E located in the detecting nodes. T reports are selected randomly when the leader node collects more than T reports. The term M i 1 , . . . , M i T is representative of the T reports. In addition, the term r i 1 , . . . , r i T is attained after the reports are decrypted by node n u , which then creates the following: The XOR operation is denoted by ⊕. The resulting R w is the grid's EOC.
The leader node's neighbor is denoted by node n v in the final report to the sink, represented by M = E||Enc(K u,v , R w ) and delivered by the leader node.

4) EN-ROUTE FILTERING
When node n j sends the event message M = E||Enc(K i,j , R w ) to node n i , the decryption of Enc(K i,j , R w ) enables node n i to attain R w . Subsequently, the node uses the hashing vector to determine whether the EOC R w is valid. The message is dropped if the node deems the EOC R w false. The message is delivered to the next node, n s , when the node determines that the EOC R w might be legitimate.

5) SINK VERIFICATION
Given that all information about EOCs is contained in the event message delivered to the sink, the sink can validate the EOCs. A message is considered generated by a hacker when it fails this validity assessment.

C. CODE UPDATE
The codes used to generate an EOC in a message must be updated if a legitimate message is delivered to the sink. This must occur, as there is a chance that the EOC was learned by a compromised node among one of the forwarding nodes of the message. Given that false messages can solely be generated from the grid in which an event occurs by an attacker cognizant of the EOC, a slight delay in updating the keys is not considered a significant issue.
To update EOCs' codes, the sink regulates multiple types of data.

1) DATA REGULATED AT THE SINK
The sink regulates the two tables presented below.
RT (Table of codes and EOCs): Let RS be the set of all EOCs that can be generated in the WSN. The variable R i represents an element of RS. The codes r i 1 , . . . , r i T generate R i . The ID of the grid apportioned R i and R i is linked to the above-mentioned T codes via RT.
VT (Table of hashing vectors): Before deployment, every node n i has a hashing vector, v i , assigned to it, where the sink generates the latter. A random seed, s i , is created and used upon the sink generating the hashing vector, v i . Node n i is linked to each seed s i and hashing vector v i via VT.

2) UPDATING HASHING VECTORS AND CODE PROCEDURE
When grid G w delivers a legitimate message to the sink with the R w EOC, the sink generates the ID w of the grid apportioned R w and R w as the sink learns the T codes (r i 1 , . . . , r i T ) via RT.
The sink generates novel EOC R w = r i 1 ⊕ r i 2 ⊕ ...r i T and T codes via the use of pseudorandom number generators. Messages M j = Enc(k j , r i j )(j = 1, . . . , T) are delivered to the nodes of grid G w from the sink. Then, the sink sets R w and r i j while removing R w and r i j to update RT.
The old hashing vector v should be replaced with a new hashing vector v . However, if the sink sends v as is, even if it is encrypted, compromised nodes may decrypt the information and obtain v . Therefore, in the proposed algorithm, only different bits of v and v are sent from the sink. Subsequently, the sink delivers v i ⊕ v i , comprising distinct bits of v and v to each node. For example, assume that v = 0100100 and v = 0100001. In this case, the data {4, 6} is sent from the sink to the corresponding node. Equation (7) represents this process. The subscript "(2)" denotes the value shown in a binary number, and m signifies the bit length of a hashing vector.
The symbol & represents a bitwise AND, and represents a left arithmetic shift.
Each node that receives the message reverses the specified bits from 1 to 0 or from 0 to 1.

A. NUMBER OF HOPS UNTIL THE NODES DETECT A FALSE MESSAGE
When detecting a false message, the required mean hop count was evaluated. For example, if the hop count is three, the third forwarding node has detected the false message. The probability, p 1 , of identifying a false message in a node comprises the first step in the calculation. When no node is compromised, (1 − FPR t ) represents the target probability of detecting a false message in a node. By suitably altering a hashing vector's bit length, an arbitrary value of FPR t can be attained.
An attacker can acquire the N c hashing vectors when the N c nodes are compromised. Resultantly, elements of RS can be searched. The variable m l represents the number of elements in RS. Since q codes are assigned to each grid and T codes are selected from the q codes for generating an event message, the number of possible combinations of T codes per grid is q C T . Moreover, because there are g grids, the number of possible combinations of T codes of all grids is represented by The bit length of a code or an EOC is b. The number of possible representations of b bits is 2 b . Since the number of legitimate EOCs is m l (Equation (8)), the number of false EOCs is 2 b − m l . When an attacker compromises N c nodes, the attacker can have N c hashing vectors. The probability that N c hashing vectors cannot detect each false EOC as false is (FPR t ) N c because the false positive rate of each hashing vector is set to (FPR t ). Let m f characterize the number of false EOCs that the attacker thinks may be legitimate. The value is represented by Since there are m l legitimate EOCs and m f false EOCs that the attacker considers possibly legitimate, the probability that the attacker generates a false EOC is represented by m f /(m l + m f ). Each forwarding node can detect a false EOC with probability (1 − FPR t ) if the node receives a false EOC. Therefore, the expected probability that each forwarding node can judge the received EOC is generated by the attacker is represented by Let p h signify the expected number of hops until the forwarding nodes identify a false event message and let H represent the maximum number of hops to the sink. The probability that the ith node can identify a false event message is denoted by (1 − p 1 ) i−1 · p 1 because (i − 1) nodes do not identify it with probability (1 − p 1 ) i−1 and the ith node detects it with probability p 1 . Therefore, the expected number of hops is given by The variable H represents the maximum number of hops to the sink.

B. THE MAXIMUM QUANTITY OF TRAFFIC AN ATTACKER CAN CREATE PER GRID
The variable D represents an upper limit of the occurrence frequency of events within the same grid. The bit length of the event information is |E| and the bit length of EOC is b. When there are D event messages, the amount of the messages is represented by

C. THE QUANTITY OF COMMUNICATION TRAFFIC RESULTING FROM LEGITIMATE EVENT MESSAGES
A message related to updating codes from the sink must also be delivered by the nodes to the sink alongside an event message based on the proposed method. When nodes in a grid detect an event, up to q nodes send the event information E and its code to the leader node. Then the leader node generates an event message, which is forwarded to the sink. When the number of hops is h, the total amount for generating and forwarding the event message to the sink is given by The variable represents the expected number of bits in a hashing vector that must be altered upon the sink altering one element of RS. The variables m and n denote the hashing vectors' bit length and the number of elements, respectively.
Assume that α represents the deleting element, and β embodies the new element. We should consider the following five cases. The number of bits in the hashing vector that must be changed when the sink changes one element of the RS is different in each case. The number of bits that must be changed is 0 in case 1 because deleting α and adding β do not affect the hashing vector. Similarly, the number of bits that must be varied is 1, 1, 0, and 2, in the cases 2, 3, 4, and 5, respectively.
Please recall that R(n, m) signifies the false positive rate of a hashing vector with a bit length m and number of elements n (Equation (3)). Hence, the probability that H[S] = H[S α ] is represented by R(n − 1, m) because this situation characterizes the false positive of H[S] against α / ∈ S. Consequently, the probability of each case occurring is expressed as follows. 1 The variable H represents the mean number of hops a node makes to the sink. Equation (4) is used to calculate a hashing vector's bit length, m. The term log 2 m signifies the number of bits required to embody an arbitrary bit position of the hashing vector. The terms N and log 2 N represent the numbers of sensor nodes and bits required to embody a node, respectively. Consequently, the following expression demonstrates communication traffic quantity for updating hashing vectors and codes:

D. AMOUNT OF CONSUMED ENERGY
During the reception, transmission, and power down, the leaked current was 16, 18, and 0.01 mA, respectively, when we used mica2 Berkeley motes as sensor nodes [29]. Although these sensor nodes are not the newest ones, they are still the subject of research [30], [31], [32] because of their superior performance. We presume a voltage of 3 V and a bit rate of 19.2 Kbps are standard figures for the two units of measurement.
LT + ET represents the number of bits for sending an event message and updating EOCs. Since the bit rate is 19.2 kbps, the required time of treating (LT + ET) bits is (LT + ET)/ (19.2 · 1000).
Additionally, because sending and receiving a message requires (16+18) mA and the voltage is 3V, the required energy for sending an event message and updating EOCs is given by 3(LT + ET)/ (19.2 · 1000). When there are v event messages, the consumed energy is denoted by 3(LT + ET)/(19.2 · 1000)v.
Each sensor node consumes energy even if there are no events. There are N sensor nodes and each sensor node requires 0.01 mA. Therefore, the consumed energy in the WSN per month is represented by N · 3 · 0.01 · 3600 · 24 · 30, where 3600·24·30 signifies the number of seconds per month.
In total, the following expression describes the monthly energy consumption of all sensor nodes when there are N sensor nodes and events occur v times per month:
We assume that the target events do not occur frequently; thus, the default value for the number of events per month was D = 1, and the value of D varied from 1 to 10 in our experiments.
Here, H is the average number of hops from a node to the sink. This value varies depending on the configuration of the WSN. In the literature [5], the maximum and minimum numbers of hops were set to 100 and 10, respectively, i.e., the average number of hops was set to approximately 50. In addition, the number of hops was set to 50 in a previous study [7]. Thus, in reference to these studies, H was set to 50 in our experiments. The number of hops to the sink does not have strong effect on the evaluation because all methods (other than SEF), including the proposed method, detect the false message within one or two hops.
In this context, T or more nodes must detect the same phenomenon. Thus, if the value of T is too large, the ability to detect legitimate events is reduced. In contrast, if the value of T is too small, the success rate of attacks will increase. This tradeoff must be considered. Many previous papers [5], [7], [12] assumed a default value of T = 5, and we followed this parameter setting in reference to these previous studies. Note that the value of T varied from 1-10 in our experiments.
The number of bits required to represent the contents of an event is highly dependent on the application. In a WSN without security mechanisms, approximately 40 bytes are required to represent a detected event [14], [15]. Based on this value and assuming that additional information, e.g., detailed sensing data, may be added, the value of |E| was set to 64 bytes in our experiments.
The number of grids g can be determined by the system administrator to control the detection rate of false messages and the amount of traffic. We performed preliminary experiments and set g to values that indicate good performance (e.g., g = 1,000). In our experiments, the value of g was varied from 10-1,000. Note that the value of q varies automatically with the value of g (q = N/g).
Here, N is the number of nodes in a WSN. In the literature [7], the default N value was set to 10,000. We followed this setting and set N=10,000 as the default value, and the value of N varied from 1,000-10,000 in our experiments.

B. EVALUATION RESULTS
Initially, the FPR t and the number of compromised nodes, N c , determine how we perform the experiments on the detection rate per node. Fig. 6 illustrates the findings. We chose compromised nodes randomly from the entire network. This evaluation setting is common for evaluating detection methods' performance [5], [7], [8], [12].
The detection rate is almost the same as the FPRt value. However, when N c is around 100 and FPRt is set to 0.5, the detection rate is slightly lower than the FPRt value. However, because it is difficult to imagine a situation in which more than 100 nodes are compromised without the WSN administrator being aware of it, FPRt can be set to 0.5.
The detection rate is lower around Nc = 100 and FRP t = 0.5 because the attacker can estimate legitimate EOCs by compromising many sensor nodes. Consequently, the attacker can check the obtained hashing vectors whether or not randomly generated codes are legitimate. When Nc is large, the attacker can check a greater number of hashing  vectors. When FRP t is small, the ability to judge each hashing vector is large. Therefore, when Nc is large and FRP t is small, the probability that an attacker can generate legitimate EOCs becomes large. Fig. 7 shows the detection rate per node when changing the values of T and g. We set N = 10, 000. The detection rate is very low when g and T are around 10. The symbol g represents the number of grids, so each grid has 1,000 nodes in this case. If each grid has many sensor nodes and T is large, the number of elements in RS is huge according to Equation (8). As a result, the detection rate becomes low, according to Equation (10). Therefore, this parameter setting should not be recommended. We recommend that the number of sensor nodes of each grid is from 10 to 20. That is, if N is 10, 000, g is recommended to be from 500 to 1,000. Fig. 8 demonstrates the projected number of hops until the detection of a false message. The higher the value of FPRt, the higher the detection rate and smaller the number of hops required before detection; when FPRt is 0.5, the average number of hops is 2, indicating that a sufficiently high detection rate is achieved.
We compared SEF [5], MobiSink [7], CD-PEFS [8], and EMAS [12] to the proposed method. Fig. 9 shows the results. When Nc > T, we found the security mechanism of SEF was inoperative, as the number of hops was 50. Conversely, owing to the considerable authentication information provided to each message, CD-PEFS and EMAS needed a maximum of one hop to detect false messages. The number of hops until nodes can detect false messages of SEF is high because the security mechanism of SEF breaks down when more than T nodes are compromised. In contrast, other methods, including the proposed method, can detect false messages even when more than T nodes are compromised.
Since the security mechanism of CD-PEFS, EMAS, and MobiSink are robust, the number of hops until nodes can detect false messages is only one hop. Furthermore, because the detection rate of the proposed rate is 0.5 (this value is equal to 1-FPR t ), the number of hops of the proposed method is two on average. Although the proposed method increases the number of hops required to detect a false message by one hop, the experiments described later show that it reduces the amount of traffic an attacker can generate.
Then, when no node is compromised, we calculated the mean energy consumption per node. Fig. 10 illustrates the findings. The methods used in related studies demonstrate that the proposed method requires more energy. Nonetheless, even when 10 events occurred monthly, we only found a marginal difference in the increase rate, from 77.8 to 78.3. In this experiment, we assume there are no compromises. The result represents the amount of traffic required to run the WSN during normal operation. Although the amount of traffic per event message is the smallest in the proposed method, the proposed method requires additional traffic for updating EOCs and hashing vectors. Despite these differences in the proposed methods, there is little difference in the amount of energy consumed by the WSN as a whole for all methods due to the relatively large amount of energy consumed by the nodes during normal operation.
Finally, we analyzed the proposed method, CD-PEFS, and SEF to ascertain the maximum traffic quantity a hacker could create per grid. Fig. 11 presents the findings. Given that T MACS can be appended to the message in CD-PEFS and SEF, an attacker could generate long messages, producing substantially more traffic than with the proposed method. By contrast, a lightweight authentication mechanism was performed via the proposed method, yielding minimal traffic production by compromised nodes. In existing studies, each event message requires T MACs or T key IDs for the message authentication; hence, the amount of traffic becomes very large. On the other hand, our method requires only one EOC for authentication. This mechanism can greatly reduce false message traffic. As a result, the maximum amount of traffic an attacker receives for the proposed method is the lowest compared to the existing methods. Fig. 9 reflects the detection rate, which suggests the performance of CD-PEFS is superior. Conversely, Fig. 11 reflects the amount of traffic; thus, CD-PEFS performs poorly.
We have additional experiments with varying T and N. The results are shown in Figs. 12 and 13, respectively. In all existing methods, the traffic increases as T increases. Conversely, the proposed method's traffic is not affected by its value because T codes are mapped to a single EOC. The bit length of this EOC is not affected by the value of T. On the other hand, the value of N has no significant effect on any of the methods. Some methods require node ID information, but the bit length of the node ID only increases on a log scale compared to an increase in the number of nodes.
From these results, in all parameter settings, the amount of traffic an attacker can generate of the proposed method is the lowest compared to other methods.

VII. DISCUSSION
A symmetric key encryption scheme encrypts all messages. This encryption process is common in all existing studies.  The computation cost of a symmetric key encryption scheme is much lower than a public key cryptography scheme [33]. Further, the calculation of a hash value can be performed with much less computation than that required for encryption using a symmetric key encryption scheme [34]. Regarding the operation of XOR and checking a specified bit of a hashing vector, the computation cost can be negligible because these operations are simple bit operations. Therefore, the proposed method's computation cost is minimal for each sensor node.
The sink needs to generate a seed and its code for each sensor node. Then, it generates EOCs by an XOR operation. Moreover, it needs to generate a hashing vector for each sensor node. The number of EOCs is g q C T ; therefore, if the values of g or q are large, the sink needs to generate a lot of EOCs. However, because the generation of EOC requires only a simple XOR operation, this process does not take a long time.
The amount of traffic is evaluated in Figs. 10 and 11, where there are no malicious nodes and a malicious node generates a lot of false messages, respectively. From these figures, the proposed method is robust against false message attacks by malicious nodes, although the amount of traffic during no malicious nodes is almost identical to that of the existing method.
The proposed method assumes that the occurrence frequency of legitimate events is not high. The communication overhead becomes non-negligible if many legitimate events occur because the codes of several sensor nodes should be updated each time a legitimate message is forwarded to the sink. Notably, an attacker wants to generate network congestion to prevent a legitimate event message from reaching the sink. If there are many legitimate events, it is challenging for an attacker to attack all legitimate events in the first place. The assumption that the situation in which such an attack might occur is one in which the number of correct event messages is small is realistic.
The maximum number of messages an attacker can produce equals the maximum number of messages a legitimate node receives. Additionally, the maximum number of messages a node can produce can be determined by the WSN manager according to the application of the WSN. For example, if the application is crime detection, an event does not occur so frequently. Therefore, even if there is an upper limit to the number of messages that can be generated, normal node operation is not restricted.

VIII. CONCLUSION
In this paper, we proposed an algorithm that can detect false messages in a large WSN with a small number of hops. Unlike existing studies, only a single, one-time authentication code is assigned to each event message for authentication in the proposed method, significantly reducing the amount of message traffic compared to existing methods. Since existing research requires T MACs, T key IDs, or other information for authentication, an attacker can generate much traffic just by generating one false message. On the other hand, the existing methods can detect a false message in one hop, but it takes two hops to detect a false message with the default setting of the proposed method. However, the proposed method can prevent network congestion attacks by significantly reducing the amount of traffic. Experiments have shown that the proposed method can reduce traffic volume by more than 60%. In a future study, we will deploy several hundred sensor nodes and perform demonstration tests.