Secure and Energy-Efficient Bio-Inspired Clustering and Deep Learning-Based Routing Using Blockchain for an Edge-Assisted WSN Environment

In recent years, the volume of data transmission in Wireless Sensor Network (WSN) environments has increased due to their dynamic nature. However, WSNs face many issues during data transmission, such as low energy efficiency, weak security, and short network lifetime. This research presents secure and energy-efficient clustering and routing techniques for an edge-assisted WSN environment to address these problems. The proposed work includes four major processes: quad-tree-based network construction, energy-efficient clustering, RL-based duty cycling, and secure multipath routing. This work constructs the network based on a quad-tree structure to improve network management and reduce complexity. After network construction, sensors are authenticated based on their ID and location using the Lightweight Encryption Algorithm (LEA), which provides high security by eliminating illegitimate sensor nodes. This research model then performs clustering using Tasmanian Devil Optimization (TDO), which selects the optimal cluster head (CH) and forms the clusters. CH selection and clustering are performed dynamically by considering time and event metrics, which increases communication efficiency and reduces energy depletion. To reduce energy consumption further, this research model performs duty cycling using the Improved Twin Delayed Deep Deterministic Policy Gradient (ITD3) algorithm, increasing the network lifetime. Finally, secure multipath routing is performed using a game-theory-based Generative Adversarial Network (GTGAN). During routing, the GTGAN ranks the selected multipaths based on their hop counts: the highest-ranked paths are chosen for transmitting emergency messages, and medium-ranked paths are selected for non-emergency message transmission, which reduces data loss due to energy depletion. All transactions are stored on the blockchain for increased security.
The simulation of this research is conducted in the NS-3.26 network simulator, and performance is evaluated using various metrics, showing that the proposed work achieves superior performance compared to existing works.


I. INTRODUCTION
Wireless Sensor Networks (WSNs) consist of distributed sensor nodes and one or more sink nodes that monitor physical conditions of the real-time environment, such as temperature, motion, and vibration. WSNs are suitable for many applications, including forest fire detection, air quality monitoring, and agriculture monitoring [1], [2], [3]. However, network lifetime and delay are important metrics in a WSN, addressed by performing clustering, duty cycling, and routing. Because sensor nodes are difficult to maintain, random deployment results in significant energy usage. Hence, the sensor nodes are deployed optimally using zone-based, hexagonal, and grid-based structures [4], [5]. One of the most significant WSN techniques for reducing energy consumption is clustering, which collects the sensed data from a group of nodes rather than from each node individually. Most existing works use bio-inspired optimization algorithms for clustering, such as Particle Swarm Optimization (PSO), Mayfly Optimization (MFO), and the Genetic Algorithm (GA). These algorithms choose the best cluster head (CH) by considering factors such as residual energy, distance, and latency. Optimal CH selection leads to high communication efficiency and reliability, which also increases the network lifetime [6], [7], [8], [9], [10]. Duty cycling is another important process for providing energy efficiency in WSNs; it schedules the sensor nodes and allocates a time slot to each sensor [11], [12]. To increase the network lifetime, sensor nodes are scheduled in ON, OFF, and IDLE modes and share the sensing activity according to their allocated time slots. Putting a sensor node in OFF mode does not affect the overall system function; rather, it reduces energy consumption. Recently, Q-learning-based reinforcement algorithms have been proposed for node scheduling, which provide optimal scheduling strategies by optimizing the reward function [13], [14], [15]. One of the major issues in WSNs is managing the energy available at the sensor nodes to extend the network lifetime, which can be achieved by developing an energy-efficient routing algorithm [42]. In general, nodes far from the source or destination must route their data packets via multiple hops due to distance and coverage constraints [45]. In a WSN, routing delivers the sensed data from the CH to the sink or base station. Optimal routes are selected from multiple candidates to reduce routing failures caused by excessive delay [16], [46]. In a cluster-based routing protocol, the data are sent through the shortest multi-hop optimal routing path [43]. Multi-hop routing also allows the nodes to monitor malicious activity at other nodes of the WSN. Because of the presence of malicious nodes and the vulnerability of individual nodes to attacks, designing secure routing protocols is one of the most important issues in WSNs [44]. Existing works used metaheuristic optimization algorithms for routing, such as Grey Wolf Optimization (GWO) and Aquila optimization, which produced results slowly due to their slow convergence. Hence, routing is instead performed here by a deep learning algorithm, which provides routing results faster and thereby decreases the transmission delay. Routing attacks cause high data loss and low throughput, since during routing the data are forwarded along the best paths. Hence, the data is encrypted before being sent along the optimal route, and the legitimacy of each hop node is evaluated to increase security. Secure routing increases throughput and the packet delivery ratio [17].
(The associate editor coordinating the review of this manuscript and approving it for publication was Hosam El-Ocla.)
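The Q-learning-based duty-cycle scheduling described above can be sketched in a few lines. The following is a minimal illustrative example, not the paper's ITD3 method: the energy-bucket states, per-mode energy costs, and reward weights are all assumptions made for this sketch.

```python
import random

MODES = ["ON", "IDLE", "OFF"]
ENERGY_COST = {"ON": 2.0, "IDLE": 0.5, "OFF": 0.1}  # energy drawn per slot (assumed)
SENSE_GAIN = {"ON": 1.0, "IDLE": 0.2, "OFF": 0.0}   # sensing utility per slot (assumed)

def bucket(energy, capacity=100.0, n_buckets=5):
    """Discretize residual energy into a small state index."""
    return min(n_buckets - 1, int(n_buckets * energy / capacity))

def train(episodes=200, alpha=0.1, gamma=0.9, eps=0.2, seed=7):
    """Tabular Q-learning over duty-cycle modes for one node."""
    rng = random.Random(seed)
    q = [[0.0] * len(MODES) for _ in range(5)]  # Q[state][action]
    for _ in range(episodes):
        energy = 100.0
        while energy > 1.0:
            s = bucket(energy)
            # epsilon-greedy action selection over the three duty-cycle modes
            a = rng.randrange(len(MODES)) if rng.random() < eps else max(
                range(len(MODES)), key=lambda i: q[s][i])
            mode = MODES[a]
            # Reward trades sensing utility against energy drain; weights assumed.
            reward = SENSE_GAIN[mode] - 0.3 * ENERGY_COST[mode]
            energy -= ENERGY_COST[mode]
            s2 = bucket(max(energy, 0.0))
            q[s][a] += alpha * (reward + gamma * max(q[s2]) - q[s][a])
    return q
```

Running `train()` yields a Q-table whose greedy action per energy bucket is the learned mode schedule for that node.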

A. MOTIVATIONS AND OBJECTIVES
To provide a secure and energy-efficient framework for an edge-assisted WSN environment, this work addresses the following issues:
• Poor network management - Random sensor node placement results in excessive complexity, which lowers the network's manageability. In addition, deploying a large number of sensor nodes in a particular area also leads to poor network management due to the difficulty of administering them.
• Low energy efficiency - In existing works, routing is frequently performed without scheduling, which results in excessive energy usage: the sensor nodes are always in active mode and have no individual time slots for execution, which leads to frequent data collisions during communication and low energy efficiency.
• High packet loss rate - Improper CH selection leads to high packet loss, because existing works consider only a limited set of parameters (i.e., energy, distance, delay) for CH selection. In addition, routing failures caused by excessive delay and the presence of erroneous data also increase the packet loss rate.
• Low security - In some existing works, clustering and routing are performed without considering any security constraints, which increases security threats. In others, data is sent through ostensibly secure routing, but attackers can still easily forge the data due to the absence of cryptographic protection. In addition, trust values are calculated and kept at the base station without any security precautions, making them vulnerable to falsification by attackers, which is a serious concern.
Motivated by the problems mentioned above, this work aims to improve energy efficiency, reduce latency, and reduce packet loss rates while providing security.
Additionally, this research addresses issues including inadequate network administration, low security, and low energy efficiency. The remainder of this study is organized as follows: Section II reviews earlier research and identifies its gaps. Section III explains the research methodology in detail, with illustrations of the procedures, algorithms, and mathematical representations. Section IV describes the experimental findings of this work, including the simulation setup, comparative analysis, and research summary. Section V explains the significance of this work. The conclusion of the proposed study and future directions are presented in Section VI.

II. PRELIMINARIES
This section presents a detailed literature survey. Table 2 summarizes the surveyed works, including their concepts, algorithms, and research gaps. In [21], the authors proposed a multipath routing model for a wireless control environment. The proposed work performed graph routing for uplink, broadcast, and downlink processes. Multipath routing was performed for data transmission, with a planning algorithm selecting the shortest path as the optimal path. Every path has backups to provide uninterrupted connections and reduce transmission errors. The suggested work was implemented in a WirelessHART network simulator and compared with three existing techniques. Based on the simulation results, the suggested work performed better in terms of latency, energy use, network lifetime, and packet delivery ratio. However, the shortest path was selected as the optimal path without security considerations, which leads to high packet loss and reduces throughput.
The authors in [22] proposed an optimization-based clustering process for the WSN environment. The main focus of this research is to minimize energy usage during clustering. It proposed a red deer algorithm for clustering, which selects an optimal CH by considering energy, node degree, sink distance, and neighbor distance.
After clustering was completed, secure data transmission was performed using blockchain: all transactions were recorded on the blockchain to enhance security. The numerical results show that the suggested approach outperformed previous efforts in terms of network longevity, energy use, throughput, and packet delivery ratio. However, the red deer algorithm leads to high latency due to its slow convergence, which reduces clustering performance. In addition, static CH selection leads to energy depletion.
In [23], the authors provided secure and energy-efficient communication in an IoT-based WSN environment. This research used a private key in the form of a random integer for enhancing security, where the private key was generated by an elliptic curve cryptography algorithm. The proposed work includes four processes, namely cluster identification, authentication, cluster head identification, and key distribution. An enhanced LEACH protocol was proposed for the optimal selection of the CH in this environment. The comparison results demonstrate that the suggested research achieved better outcomes in terms of dead nodes, energy usage, and network lifetime. However, elliptic curve cryptography was applied for secret key creation during authentication, and the authors note that its shorter key length makes it more easily vulnerable to attack. The authors in [24] proposed a secure routing protocol for an IoT-based WSN environment. This research includes three processes: routing, load balancing, and secure data transmission. Initially, routing was performed based on a multipath routing protocol that selected the optimal CH based on residual energy. Data transmission then comprised three processes: data encryption during transmission, data aggregation, and decryption and authentication for security. A ubiquitous data storage system provided the authentication and security information. Finally, the performance of the proposed work was evaluated based on energy efficiency, throughput, network lifetime, end-to-end delay, and storage capacity. However, the CH was selected based on a single metric, namely residual energy, which is not sufficient for optimal CH selection and led to poor communication efficiency. For the WSN environment, the authors in [25] proposed an energy-efficient routing protocol. The proposed work has two phases, a setup phase and a steady phase. The network is split into four zones during
the setup phase based on a distance threshold. Zones three and four were taken into consideration when choosing the CH based on residual energy. In this study, a hole-removal algorithm was proposed to lower energy usage. Finally, data transmission took an energy threshold into account. According to the comparison findings, the proposed work performed better in terms of network lifetime, residual energy, throughput, and network stability. However, data transmission considered only the energy threshold, and because the data was exchanged over a public channel without any security restrictions, this creates serious security risks. A secure cluster routing protocol for the WSN environment was proposed in [26]. This work proposes a lightweight trust management method for increasing security. The node reputation value was measured by a binomial distribution and updated dynamically to calculate new trust values for the nodes. The CH was selected by considering environmental parameters to improve stability. This research used a multi-dimensional secure routing protocol for improving security and reducing energy consumption. Finally, the comparison results demonstrate that the suggested approach outperformed previous efforts in terms of security and energy efficiency.
A new clustering algorithm was proposed for reducing energy consumption in the WSN environment [27]. Maximizing network longevity is the primary goal of this research. The proposed work includes node deployment, clustering, data collection, and routing. After the nodes had been deployed in a distributed fashion, the OPTICS clustering method was used to accomplish clustering and CH selection. The CH gathered data from the cluster's members based on allotted time slots for data transmission. The acquired data was then delivered from the CH to the sink node along the best route. The simulation findings show that the suggested work outperformed previous efforts in terms of energy usage and network lifetime. However, the random placement of sensor nodes increases energy consumption and reflects poor network management, which makes clustering and routing more challenging. For data aggregation and routing in a WSN environment, the authors in [28] suggested a reinforcement learning technique. Reinforcement learning (Q-learning) was used to encourage the desired behavior through rewards. To save energy, the sensor nodes first collected the data using multi-mode operations. Data aggregation was then carried out to minimize spatial and temporal data duplication during routing. The best path with the fewest hops was chosen to deliver the aggregated data to the sink nodes. The effectiveness of this research was assessed in terms of delay, complexity, control-message overhead, and queue management. The simulation outcomes reveal that the suggested work achieved improved performance in data aggregation and routing. However, because Q-values must be updated frequently, the latency increases, which may lead to routing failure. In [29], the authors proposed a heuristic algorithm for performing routing in a
WSN environment. The proposed work includes three major processes, namely route discovery, securing the discovered routes, and route maintenance. Initially, sensor nodes were modeled as an undirected graph to discover the path between two nodes easily. Heuristic algorithms decide the routing. Next, security was provided for the selected route by using a cryptography algorithm. Finally, alternative paths were identified for the selected routes when a path exceeded a predefined threshold. The comparison results show that the proposed work achieved better performance in terms of throughput, delay, packet drop rate, energy consumption, network overhead, failed routes, and computation overhead compared to previous works. The main aim of [30] is to reduce energy consumption and increase network lifetime by using enhanced LEACH-based clustering and TDMA-based scheduling. Initially, clustering was performed by enhanced LEACH, which combines grey wolf optimization and particle swarm optimization for selecting the optimal CH. After clustering was completed, TDMA-based scheduling was performed over four modes, namely sense, transmit, receive, and sleep, which improved the network lifetime. Finally, data transmission was performed between the CHs and sink nodes by selecting the optimal channel using fuzzy logic based on channel capacity, RSS, and packet error rate. The simulation results show that the proposed work achieved better performance in terms of throughput, energy consumption, delay, network lifetime, and packet loss. However, fuzzy logic provides only approximate values, which leads to poor channel selection and reduces the performance of data transmission.
Multi-objective seagull optimization was suggested for clustering and routing in a WSN-assisted IoT environment [31]. Remaining energy, network coverage, node degree, and communication cost were the factors considered when choosing a CH. After the best CH had been chosen, routing was carried out taking queue length, communication cost, link quality, and residual energy into account. The simulation results demonstrate that the suggested approach outperformed previous efforts in terms of packet delivery ratio, energy usage, delay, and network lifetime.
However, although optimal routing was carried out based on queue length, communication cost, link quality, and residual energy, it results in lower throughput and a higher packet loss rate due to the nodes' lack of physical protection. The authors in [32] proposed secure routing using blockchain in a WSN environment. Initially, the data was encrypted and sent to authenticated sensor nodes through secure routing to increase security. The authentication transactions are stored on the blockchain, which comprises both private and public blockchains. Routing was performed by selecting the next hop based on calculated trust values. The experimental results show that the proposed work achieved better performance in terms of packet delivery ratio.
In [33], the authors proposed optimization-based energy-efficient routing for an IoT-based WSN environment. The proposed work includes four processes, namely network setup, clustering, path selection, and node aggregation. Cluster heads were selected based on network coverage, network connectivity, and network longevity. After selecting the optimal CH, routing was initiated by selecting the optimal routing path using the sailfish optimizer algorithm. According to the experimental findings, the suggested work performed better in terms of energy usage, packet delivery ratio, and bandwidth utilization.
However, routing failures occurred due to excessive delay and erroneous data, because neither physical security nor the shortest path was considered. A reinforcement learning algorithm was proposed for sensor scheduling in multi-target tracking [34]. The proposed work includes three processes, namely clustering, prioritizing, and allocation. Initially, dynamic clustering was performed to reduce energy consumption. Prioritizing was then performed based on target type, node density, energy, and distance. The Q-learning algorithm was used for task allocation, maximizing the reward function to obtain optimal scheduling and allocation. The experimental results show that the proposed work achieved better performance in terms of tracking accuracy and energy efficiency. The authors in [35] proposed an improved optimization technique for clustering and routing in an IoT-based WSN environment. The CH was chosen using the Archimedes optimization technique, considering distance, node degree, and energy. Routing was performed based on a teaching-learning-based optimization algorithm that selects an optimal route for data processing. Finally, the simulation results show that the proposed work achieved better performance in terms of energy consumption, network lifetime, latency, and packet delivery ratio. However, this research suffers from high energy consumption due to the absence of scheduling: the sensor nodes are always active, which reduces the energy efficiency.

III. SYSTEM MODEL
This research focuses on using blockchain technology for secure clustering and routing in a WSN environment. The main objective is to reduce energy consumption and latency and to increase throughput and security. This research includes entities such as sensor nodes, edge servers (which host the clustering, scheduling, and routing agents), cloud servers, and a blockchain, which are described as follows:
(i) Sensor Nodes: These are the distributed devices that sense the environment and transmit the sensed data; they register with the network for authentication before participating in clustering and routing.
(ii) Edge Server: It provides additional resources to the WSN environment to reduce energy consumption and latency. It includes three agents for performing clustering, scheduling, and routing to obtain a long network lifetime.
(iii) Cloud Server: It is used to update and store the sensing data of the sensor nodes for later use, which reduces the storage burden on the WSN environment.
(iv) Blockchain: It provides security for the WSN environment via a tamper-proof ledger that stores all the transactions, such as authentication, clustering, scheduling, routing, and encrypted data, which leads to high security and a low packet loss rate.
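As a hedged illustration of how a tamper-proof ledger makes such transactions verifiable, the following toy hash-chained ledger shows that altering any stored transaction invalidates the chain. This is not the paper's blockchain implementation; the block field names are assumptions made for the sketch.

```python
import hashlib
import json
import time

class MiniChain:
    """Toy hash-chained ledger: each block commits to the previous block's hash."""

    def __init__(self):
        genesis = {"index": 0, "prev": "0" * 64, "tx": "genesis", "ts": 0.0}
        genesis["hash"] = self._digest(genesis)
        self.blocks = [genesis]

    @staticmethod
    def _digest(block):
        # Hash a canonical JSON encoding of the block's committed fields.
        payload = json.dumps({k: block[k] for k in ("index", "prev", "tx", "ts")},
                             sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()

    def add(self, tx):
        """Append a transaction record (e.g., an authentication or routing event)."""
        prev = self.blocks[-1]
        block = {"index": prev["index"] + 1, "prev": prev["hash"],
                 "tx": tx, "ts": time.time()}
        block["hash"] = self._digest(block)
        self.blocks.append(block)

    def verify(self):
        """Return True iff every block's hash and back-link are intact."""
        for i, b in enumerate(self.blocks):
            if b["hash"] != self._digest(b):
                return False
            if i > 0 and b["prev"] != self.blocks[i - 1]["hash"]:
                return False
        return True
```

`chain.add("auth: node-17 ticket issued")` appends a block; `verify()` recomputes every hash and back-link, so any later modification of a stored transaction is detected.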

Radio Model for Communication:
In a wireless sensor network, the first-order radio model is a simplified abstraction of how devices communicate. It represents the energy consumption and transmission range of these devices using two key parameters:
1. Transmit energy (E_tx): the energy consumed by a node to transmit data over a certain distance. The energy expended grows with the distance to the receiver.
2. Receive energy (E_rx): the energy consumed by a node to receive data from another node; in this model, the reception cost depends on the message length rather than the distance.
The first-order radio model estimates the energy consumed whenever a sensor node transmits or receives in each cycle. The energy consumed during transmission is typically proportional to the distance covered; as a first approximation, the energy required to send a packet over distance d can be represented as E(d) = k · d, where k is a proportionality constant that captures the relationship between energy consumption and distance. This model is widely used for basic simulations and optimizations in the early stages of network design and analysis, to understand the energy efficiency and communication range of the sensor nodes. More precisely, the energy spent by a transmitter to send a k-bit message over a distance d is,

E_tx(k, d) = E_elec · k + E_fs · k · d^2, if d < d0
E_tx(k, d) = E_elec · k + E_amp · k · d^4, if d >= d0 (4)
E_rx(k) = E_elec · k
d0 = sqrt(E_fs / E_amp) (5)
The first term represents the energy consumed by the radio electronics, while the second represents the energy consumed by the amplifier. The electronics energy (E_elec) depends on factors such as the digital coding, modulation, filtering, and spreading of the signal, while the choice between the free-space (E_fs) and multipath (E_amp) fading channel models depends on the transmission distance d relative to the crossover distance d0.
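The two-regime energy model above can be sketched as follows. The parameter values for E_elec, E_fs, and E_amp are the commonly used first-order-radio defaults and are assumptions here, not values taken from this paper.

```python
import math

# Commonly used LEACH-style first-order radio parameters (assumed defaults).
E_ELEC = 50e-9      # J/bit, electronics energy
E_FS = 10e-12       # J/bit/m^2, free-space amplifier
E_AMP = 0.0013e-12  # J/bit/m^4, multipath amplifier
D0 = math.sqrt(E_FS / E_AMP)  # crossover distance between the two channel models

def tx_energy(k_bits, d):
    """Energy to transmit a k-bit message over distance d metres."""
    if d < D0:
        return E_ELEC * k_bits + E_FS * k_bits * d ** 2   # free-space regime
    return E_ELEC * k_bits + E_AMP * k_bits * d ** 4       # multipath regime

def rx_energy(k_bits):
    """Energy to receive a k-bit message (distance-independent electronics cost)."""
    return E_ELEC * k_bits
```

For a 4000-bit packet, `tx_energy(4000, 50)` uses the free-space d^2 term (since 50 m is below d0 of roughly 87.7 m), while `tx_energy(4000, 100)` switches to the multipath d^4 term.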

A. QUAD TREE-BASED NETWORK CONSTRUCTION
The network is initially built using a quad-tree-based hexagonal architecture, which increases the effectiveness of network administration [41]. Four types of hexagonal subdivision are used in the network, where each division includes one parent node and four child nodes. The center point of each hexagon is calculated from the lattice points.
The lattice points are defined in a 2D Cartesian coordinate system by the vectors x and y. The hexagonal lattice vector model is produced by connecting the hexagonal lattice points with these vectors, so each hexagonal cell is represented by a combination of the two vectors x and y. Let S_n denote the set of lattice points obtained by partitioning the hexagonal grid at resolution n, where d denotes the partition direction (d = {0, x, y, −x, −h}) and S_0 denotes the first-level lattice points. For n ≥ 1, let l_n denote the linear combination of x_n and h_n, i.e., the set of all lattices in the vector space at the nth level. A label of S_i in S_n is a sequence of digits in ordinary form with b_i ∈ {0, 1, 2, 3, 4}, giving a one-to-one correspondence between hexagon cells and labels. The proposed quadtree encoding cannot cover the entire spherical surface on its own and hence needs adjustment: for codes in the bottom-right corner, additional hexagonal grids are introduced, and the hexagonal grid pair in the top-left corner is defined with x = x*_n, h = h*_n, and R_L_n = R_L*_n. The grid part exceeding the bottom-right corner exactly matches the part that is empty in the top-left corner; likewise, the grid parts that overrun the bottom and top-left corners match the vacant parts in the bottom and top corners, respectively. Hence, an optimal hexagon quadtree structure is constructed. All sensor nodes are deployed at the center points of the hexagons. This type of quad-tree construction reduces the complexity of network management.
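The recursive four-way subdivision idea can be illustrated with a toy square quadtree. The hexagonal-lattice encoding of [41] is more involved, so this sketch only shows the parent/child subdivision step; the square cells and the per-cell capacity threshold are simplifying assumptions.

```python
class QuadTree:
    """Toy quadtree: a cell splits into four children once it holds too many nodes."""

    def __init__(self, x, y, size, capacity=4):
        self.x, self.y, self.size = x, y, size
        self.capacity = capacity
        self.points = []
        self.children = None  # four sub-cells once subdivided

    def insert(self, px, py):
        """Insert a node position; returns False if it lies outside this cell."""
        if not (self.x <= px < self.x + self.size and
                self.y <= py < self.y + self.size):
            return False
        if self.children is None:
            if len(self.points) < self.capacity:
                self.points.append((px, py))
                return True
            self._subdivide()
        return any(c.insert(px, py) for c in self.children)

    def _subdivide(self):
        # One parent cell becomes four equal child cells (the quadtree step).
        h = self.size / 2
        self.children = [QuadTree(self.x + dx, self.y + dy, h, self.capacity)
                         for dx in (0, h) for dy in (0, h)]
        for p in self.points:  # push existing nodes down into the sub-cells
            any(c.insert(*p) for c in self.children)
        self.points = []

    def depth(self):
        if self.children is None:
            return 1
        return 1 + max(c.depth() for c in self.children)
```

Inserting more nodes than a cell's capacity triggers subdivision, so densely populated regions get finer cells while sparse regions stay coarse, which is the management benefit the text describes.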
Initially, all the sensor nodes register their ID and location with the Secure Agent (SA) for authentication. After registration, the SA issues an authentication ticket using a Lightweight Encryption Algorithm (LEA), which provides high security with minimal execution time. In this work, the authentication ticket is a key generated by the LEA. The LEA algorithm includes key expansion and encryption processes. The key is the basic component for performing encryption and decryption; hence a secure key prevents attackers from recovering the data, and the key length must be kept sufficiently large. The proposed LEA is a 64-bit block cipher, meaning it uses a 64-bit key to encrypt the data, and it takes five rounds for encryption and decryption; hence it generates five unique round keys. After key generation, encryption is initiated. The confusion and diffusion processes include logical functions such as substitution, swapping, and left shifts. The first 64-bit plaintext matrix (T_P) is segmented into four 16-bit segments, namely T_P(y_0-15), T_P(y_16-31), T_P(y_32-47), and T_P(y_48-63); the bits are mixed further in every round. LEA performs a swapping function, which obscures the original data by changing the bit order and thereby increases the confusion in the ciphertext. Then a bitwise XNOR is performed with the individual round keys: the round key ε_i, obtained earlier from the key expansion procedure, is applied to T_P(y_0-15) and T_P(y_48-63), yielding rd_11 and rd_14, respectively. The 64-bit key is fed into the key expansion block to expand the key. The expansion procedure works as follows: first, the 64-bit key is separated into four 16-bit segments. Then the F function is applied, which plays a key role. (An analogous F criterion appears in the quadtree subdivision process, where it evaluates whether a cluster is homogeneous enough to be treated as a single entity or should be split into sub-clusters; the exact form of that criterion varies from application to application based on the nature of the supplied data.) Here, the F function is calculated on the 16-bit segments (ε_c) obtained after an initial permutation of the cipher key, where i = 1 to 4 for the first four round keys. Then ε_ai is obtained by forwarding the 16 bits of ε_iF to the F function. The F function includes both R and S tables, which perform the non-linear and linear transformations that produce the diffusion and confusion functions, as shown in Table 5. The F function output is represented in a 4 × 4 matrix format. The fifth key is obtained by an XOR operation over the four round keys. The XNOR output is fed into the F operation to generate the key-expansion outputs EF_l1 and EF_r1. The F operation used in encryption is similar to that used in key expansion; it performs both swapping and substitution using bitwise XOR, applied between EF_l1 and T_P(y_32-47) to obtain rd_12, and between EF_r1 and T_P(y_16-31) to obtain rd_13. The computation of rd_i,j is defined as follows:
rd_i,j = T_P(y_i,j) ⊙ ε_i, for j = 1 and 4
rd_i,j = T_P(y_i,j+1) ⊕ EF_li, for j = 2
rd_i,j = T_P(y_i,j+1) ⊕ EF_ri, for j = 3 (23)
In this way, the round transformation prepares the following round: rd_11 becomes T_P(y_16-31), rd_12 becomes T_P(y_0-15), rd_13 becomes T_P(y_48-63), and rd_14 becomes T_P(y_32-47). These steps are repeated in every round, and the outputs of the final round are merged to obtain the ciphertext (T_C). The proposed authentication algorithm performs better than other cryptographic algorithms such as AES, DES, and Blowfish.
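To make the round structure concrete, the following is a toy five-round, 64-bit generalized Feistel cipher operating on four 16-bit segments. It only illustrates the segment/XOR/swap pattern described above: it is not the paper's LEA (the R and S tables of Table 5 are not reproduced, and the round function here is an assumed add-rotate-xor mix), and it must not be used for real security.

```python
MASK = 0xFFFF  # 16-bit segment mask

def _f(w, k):
    """Toy round function: add-rotate-xor on a 16-bit word (assumed, not Table 5)."""
    w = (w + k) & MASK
    w = ((w << 5) | (w >> 11)) & MASK  # 16-bit left rotation by 5
    return w ^ k

def encrypt_block(block, keys):
    """Encrypt one 64-bit block with five 16-bit round keys."""
    a, b, c, d = [(block >> s) & MASK for s in (48, 32, 16, 0)]
    for k in keys:
        b ^= _f(a, k)
        d ^= _f(c, k)
        a, b, c, d = b, c, d, a  # rotate the four segments (the swap step)
    return (a << 48) | (b << 32) | (c << 16) | d

def decrypt_block(block, keys):
    """Invert the rounds: undo the rotation, then cancel the XORed round function."""
    a, b, c, d = [(block >> s) & MASK for s in (48, 32, 16, 0)]
    for k in reversed(keys):
        a, b, c, d = d, a, b, c  # undo the segment rotation
        b ^= _f(a, k)
        d ^= _f(c, k)
    return (a << 48) | (b << 32) | (c << 16) | d
```

Because `a` and `c` pass through each round unchanged before the swap, re-applying the same round function cancels the XOR, so decryption exactly reverses encryption.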

B. ENERGY EFFICIENT CLUSTERING
After authentication is completed, the clustering process is initiated to reduce energy consumption and latency. This research proposes the Tasmanian Devil Optimization (TDO) algorithm for clustering, a bio-inspired optimization algorithm that delivers results quickly owing to its fast convergence. Clustering is carried out by a clustering agent deployed on the edge server. Initially, the optimal CH is selected through fitness evaluation by TDO; fitness is evaluated by considering trust, energy, distance, delay, direction, and node connectivity, after which clustering is performed. At first the nodes are unclassified; each node identifies its neighbors based on distance, and this continues until the requirements for forming a cluster are met. The population of TDO is initialized as

Z = [Z_1, Z_2, ..., Z_n]^T, with Z_i = (z_{i,1}, ..., z_{i,m})

where Z represents the population of TDO, Z_i represents the ith candidate solution, and n and m represent the number of search agents and the number of decision variables, respectively. The objective function of TDO is defined as

OF_i = OF(Z_i)

where OF_i represents the objective function value, which quantifies the quality of the ith candidate solution. TDO includes two phases: exploration (feeding by eating carrion) and exploitation (feeding by hunting prey). In TDO, for every Tasmanian devil, the positions of the other population members in the search space are treated as the locations of carrion. The kth member is selected at random as the target carrion for the ith Tasmanian devil:

γ_i = Z_k, k ∈ {1, 2, ..., n}, k ≠ i

where γ_i represents the carrion nominated by the ith Tasmanian devil. Based on the nominated carrion, the new position of the Tasmanian devil in the search space is calculated. If the carrion has a better objective value, the Tasmanian devil moves toward it; otherwise, it moves away from the carrion. The position update of TDO is defined as

Z^{new,δ1}_{i,j} = z_{i,j} + rand · (γ_{i,j} − I · z_{i,j}),  if OF_{γ_i} < OF_i
Z^{new,δ1}_{i,j} = z_{i,j} + rand · (z_{i,j} − γ_{i,j}),  otherwise

where Z^{new,δ1}_i represents the ith Tasmanian devil's updated status, OF^{new,δ1}_i represents its objective function value, OF_{γ_i} represents the objective function value of the nominated carrion, rand represents a random value in the range [0, 1], and I represents a random value equal to one or two. In the exploitation phase, the Tasmanian devil hunts prey using two strategies: first, scanning the area, then selecting and attacking the prey; second, stopping the chase and eating. The prey selection mirrors the carrion selection:

Ṕ_i = Z_k, k ∈ {1, 2, ..., n}, k ≠ i

where Ṕ_i represents the prey nominated by the ith Tasmanian devil. After discovering the position of the prey, the new position of the Tasmanian devil is calculated: if the nominated prey has a better objective value, the Tasmanian devil moves toward it; otherwise, it moves away:

Z^{new,δ2}_{i,j} = z_{i,j} + rand · (Ṕ_{i,j} − I · z_{i,j}),  if OF_{Ṕ_i} < OF_i
Z^{new,δ2}_{i,j} = z_{i,j} + rand · (z_{i,j} − Ṕ_{i,j}),  otherwise   (31)

where Z^{new,δ2}_{i,j} represents the new position of the ith Tasmanian devil, OF^{new,δ2}_i represents its objective function value, and OF_{Ṕ_i} represents the objective function value of the nominated prey. To simulate the chase in the second strategy, the Tasmanian devil tracks the prey within the neighborhood of the attack location. The chase stage is governed by the neighborhood radius

α = 0.01 · (1 − u/U)

where α denotes the radius of the neighborhood, and u and U represent the current iteration and the maximum iteration count, respectively. In this stage, the position of the Tasmanian devil is taken as the neighborhood centre while the prey is being chased, and the neighborhood radius represents the range within which the Tasmanian devil follows the prey. The new position of the Tasmanian devil during the chase is then calculated as

z^{new}_{i,j} = z_{i,j} + (2 · rand − 1) · α · z_{i,j}

The Tasmanian devil adopts the new position if it has a better objective value than the existing position. In this way, TDO identifies the optimal CH for clustering. The mapping between cluster heads and clusters is evaluated as follows. Considering the cluster heads and noise points, a ground-truth dataset is computed that identifies the nodes suitable for each cluster. A confusion matrix is then constructed with rows corresponding to the ground-truth labels (true classes) and columns corresponding to the predicted labels (cluster IDs), and it is populated according to how the algorithm's assignments match the ground truth. A sample confusion matrix and the way it guides cluster-head selection are given below; many evaluation metrics can be calculated from such a confusion matrix.
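As a concrete illustration, the carrion-feeding position update above can be sketched in a few lines (a minimal sketch assuming a minimization objective; the function and variable names are illustrative, not from the paper):

```python
import random

def tdo_carrion_update(z, carrion, of_z, of_carrion):
    """One TDO carrion-feeding position update for candidate z.

    z, carrion       -- current candidate and nominated carrion (lists of floats)
    of_z, of_carrion -- their objective values (lower is better)
    """
    r = random.random()        # rand drawn from [0, 1]
    I = random.choice([1, 2])  # I is randomly one or two
    if of_carrion < of_z:      # carrion is better: move toward it
        return [zj + r * (cj - I * zj) for zj, cj in zip(z, carrion)]
    # carrion is worse: move away from it
    return [zj + r * (zj - cj) for zj, cj in zip(z, carrion)]
```

The prey-feeding update of the exploitation phase has the same shape, with the nominated prey in place of the carrion.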
Here:
1. TP (True Positives): points correctly classified into each cluster.
2. FP (False Positives): points wrongly classified into a cluster.
3. FN (False Negatives): points belonging to a cluster but not classified into it.
4. TN (True Negatives): noise points correctly classified as noise.

Here, clusters and CHs are dynamically reconstructed based on time and event metrics. Time measures how long a node has acted as CH and is used to evaluate the resource consumption of the CH: if the current node has served as CH for too long, its resource efficiency is evaluated to prevent energy depletion. The event metric is the resource consumption itself, i.e., energy. If the energy consumption exceeds a certain threshold, reconstruction is performed to reduce data loss and CH failure. The threshold is generated as

g(w, v) = − Σ_{e∈E} w(e) log v(e)   (36)

where g represents the threshold value, e is the energy, and w and v represent discrete probability density functions. Reconstruction is performed at certain time intervals to balance the resources.
If the reconstruction value is 0, the current CH and cluster are retained; if it is 1, the CH and cluster are reconstructed. The value 0.6 is derived from the probability density function: it corresponds to roughly 60% of the node's lifetime, i.e., more than half, at which point another node is chosen as CH. This dynamic CH selection reduces energy depletion and CH failure and increases communication efficiency when performing massive amounts of data aggregation and transmission.
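The event-based reconstruction check can be sketched as follows (the threshold g(w, v) and the 0.6 limit follow the text; the helper names and the example distributions are hypothetical):

```python
import math

def reconstruction_threshold(w, v):
    """g(w, v) = -sum_e w(e) * log v(e) over the discrete
    energy-consumption distributions w and v (eqn (36))."""
    return -sum(w[e] * math.log(v[e]) for e in w)

def should_reconstruct(g, limit=0.6):
    """Return 1 (rebuild CH and cluster) once the threshold value
    exceeds roughly 60% of the node lifetime, else 0 (keep the CH)."""
    return 1 if g > limit else 0
```

For two uniform two-point distributions, g equals log 2 ≈ 0.693, which exceeds 0.6 and so triggers reconstruction.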

C. RL-BASED DUTY CYCLING
Because sensor nodes are always in active mode, the chosen CH transmits data continuously, which consumes a large amount of energy. Duty cycling is performed by a scheduling agent deployed on the edge server. To reduce energy consumption, RL-based duty cycling is performed, which allocates an individual time slot to each sensor and reduces data collisions during transmission. This work performs duty cycling using the Improved Twin Delayed Deep Deterministic Policy Gradient (ITD3) algorithm, shown in Figure 3. It includes two critic networks to avoid the prediction error of a single critic network, leading to higher accuracy and faster convergence. The proposed duty cycling process includes three modes: ON, OFF, and IDLE. In ON mode, sensor nodes are allowed to transmit the sensed data. In OFF mode, sensor nodes are turned off to increase network lifetime. In IDLE mode, sensor nodes only receive data and are not allowed to transmit. Table 7 depicts the state, action, and reward of ITD3.
The agent performs duty cycling to enhance the lifetime of the sensors. To do so, the ITD3 agent evaluates a policy and a Q-value function. The policy is the strategy the agent uses to take actions in the environment; it is a function that maps states directly to actions, and the agent tries to find an optimal policy. The Q-values (quality values in reinforcement learning and Q-learning), also known as the action-value function, represent the expected cumulative reward an agent can obtain by taking a specific action. In general, Q(s, a) = E[sum of discounted rewards starting from state s, taking action a, and following the policy]; the Q-value indicates how good it is to take a given action in a given state. Policy-gradient-based deep reinforcement learning methods include two critic networks to improve decoupling from the network environment, which also avoids excessive errors. The relationship and execution of the proposed learning network proceed as follows. At time slot τ, exploration noise χ is added to the policy function π_ω(E_t). The actor network performs an action F_t based on the state of the duty-cycling environment and observes the next state E_{t+1} together with the reward N_t. The tuple (E_t, F_t, N_t, E_{t+1}) is stored in the experience replay buffer. Subsequently, the actor target network selects the next action F_{t+1} based on the next state E_{t+1}. Finally, N sample transitions are selected randomly from the experience replay buffer and used jointly by the six networks. The critic 1 and critic 2 present networks are individually responsible for computing the values Q_{ϕ1}(E_m, F_m) and Q_{ϕ2}(E_m, F_m). The parameters of these networks are periodically copied to the critic 1 and critic 2 target networks.
However, the critic 1 and critic 2 target networks are mainly responsible for calculating the target Q values o_1m and o_2m, respectively.

End If
End For
End For
Obtain duty cycling with high efficiency
End
The smaller of the two values is taken as the ultimate target Q value o_m to reduce the overestimation bias caused by continuous accumulation and propagation, which is defined as

o_m = N_m + γ · min(o_1m, o_2m)

Based on the target Q value, the neural network applies gradient backpropagation to update the parameters ϕ1 and ϕ2 of the present critic networks so as to reduce the loss function. The loss function is estimated as

L(ϕ_n) = (1/N) Σ_m (o_m − Q_{ϕn}(E_m, F_m))^2

where n may be one or two. Additionally, to limit the accumulation of error and improve training stability, delayed updating is used: the target and policy networks are updated only at regular intervals.
The policy gradient function of ITD3 is defined as

∇_ω J(ω) = (1/N) Σ_m ∇_F Q_{ϕ1}(E_m, F)|_{F=π_ω(E_m)} · ∇_ω π_ω(E_m)

In this way, continuous parameter updating and learning improve the training performance of ITD3, resulting in optimal scheduling policies.
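The clipped double-Q target at the heart of the twin-critic design can be sketched as follows (a generic TD3-style target, offered as an assumption about ITD3's form; the discount factor gamma and the done flag are illustrative):

```python
def td3_target_q(reward, q1_next, q2_next, gamma=0.99, done=False):
    """Target Q value o_m: taking the smaller of the two target-critic
    estimates curbs the overestimation bias of a single critic."""
    if done:  # terminal transition: no bootstrapped future value
        return reward
    return reward + gamma * min(q1_next, q2_next)
```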

D. SECURE MULTIPATH ROUTING
The sensed data are sent from source to destination through secure routing. Before routing, the data are classified into two classes, emergency and non-emergency, by considering data size, deadline, and data type.
To improve security, the data are transmitted in encrypted form, encrypted by LEA using eqns (23) and (24). Routing is performed by a routing agent deployed on the edge server. In routing, the next hop is selected based on energy, distance, trust, and link quality. From the multiple candidate paths, the optimal route is selected by considering packet delivery ratio, number of hops, and link stability. Routing is performed using the game theory-based Generative Adversarial Network (GTGAN) algorithm, which provides faster and more accurate results. In this research, the network model is built on the Stackelberg game, a model of price leadership: the leader takes a decision first, and the other players (followers) then decide their own actions. Embedding the Stackelberg game in the GAN calls for multiple generators and a single discriminator, with the discriminator in the position of leader; training alternates between the discriminator and the generators. The aim of the generators is to minimize the value Ṽ(G_1, G_2, ..., G_J; D)/J, while the aim of the discriminator is to maximize it. The Stackelberg GAN objective function is the summation of the individual losses. The discriminator output reaches 0.5 when the generators succeed in misleading it.
The optimal generators and discriminator, and the loss values of the Stackelberg GAN generators and discriminator, are computed as follows. Sample a minibatch of N data samples {f(1), f(2), ..., f(N)} from the data-generating distribution p_data(f), and update the discriminator using its average stochastic gradient.

End For
Sample a minibatch of N noise samples {e(1), e(2), ..., e(N)} from the noise prior p_g(e), and update the J generators by descending their average stochastic gradients.

End For
End
In the Stackelberg GAN, one discriminator learns J data modes with J generators. A special learning process is followed: all the generators are trained together while sharing a single discriminator, and every generator-discriminator pair shares an equivalent weight. Finally, the optimal path is obtained from the multiple candidate paths by playing the Stackelberg game. Based on the results, the routing paths are ranked by number of hops (i.e., in ascending order of path length) from source to destination; backup agents, deployed on the blockchain, select an alternate path for routing. This process reduces node failure and routing failure due to energy depletion and erroneous data.
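The hop-count ranking and the emergency/non-emergency path assignment described above can be sketched as follows (paths are node lists; picking the middle of the ranking as the "medium-ranked" path is one plausible reading of the text):

```python
def rank_paths(paths):
    """Rank candidate routes by hop count, fewest hops first."""
    return sorted(paths, key=len)

def assign_path(paths, emergency):
    """Emergency traffic takes the highest-ranked (shortest) route;
    non-emergency traffic takes a medium-ranked route."""
    ranked = rank_paths(paths)
    return ranked[0] if emergency else ranked[len(ranked) // 2]
```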
In rare cases of non-emergency message transmission, a node may not have enough energy to transmit its data along the medium-ranked path. All transactions, including trust values, routing paths, backup route information, and encrypted data, are recorded on the blockchain to enhance security.

IV. EXPERIMENTAL RESULTS
The experimental findings of the proposed SE2Bio-CR model are presented in this section, in two subsections: the simulation setup and the comparative analysis. The results prove that the proposed work achieves better performance both qualitatively and quantitatively by performing quad tree-based network construction, energy-efficient clustering, RL-based duty cycling, and secure multipath routing.

A. SIMULATION SETUP
The simulation setup for the proposed work is explained in this section. The simulation is conducted with the NS-3.26 network simulator and executes quad tree-based network construction, energy-efficient clustering, RL-based duty cycling, and secure multipath routing. Table 8 lists the simulation parameters of the proposed work.

B. COMPARATIVE ANALYSIS
This section provides a comparative analysis between the proposed SE2Bio-CR model and existing models, namely the DRL-SS [36], HOA-WSN [37], and HybMeta-WSN [38] models. The comparison considers various performance metrics, such as throughput, latency, energy consumption, packet delivery ratio, network lifetime, and alive nodes, and shows that the proposed SE2Bio-CR model achieves better performance than the existing models.

a: Impact of Energy Consumption
The difference between the initial energy and the remaining energy is used to calculate the energy consumption.
This metric is used to analyze how much energy the system consumes to complete all the processes in the WSN environment. The energy consumption C_E is defined as

C_E = ὴ − α

where ὴ represents the initial energy and α represents the remaining energy. The comparison of energy consumption with respect to simulation rounds is shown in Fig.

b: Impact of Latency
Latency is used to determine the additional time taken by the system to complete all processes; lower latency denotes higher system efficiency. Latency Ĺ is calculated as

Ĺ = Ç − Ë

where Ç represents the current completion time and Ë represents the expected completion time. The proposed SE2Bio-CR model incurs lower latency than the existing models, as shown by the graphical representation of latency with respect to simulation rounds in Figure 6. To reduce latency during data transmission, energy-efficient clustering is performed using the TDO algorithm, which optimally selects the CH and performs optimal clustering; its fast convergence increases communication reliability and reduces latency. Latency also arises from data collisions, which are reduced by performing duty cycling using the ITD3 algorithm. To reduce delay during routing, multipath routing is performed using the GTGAN algorithm, which optimally selects the shortest and most secure path for data transmission by playing a game, increasing processing speed and reducing latency. The existing DRL-SS, HOA-WSN, and HybMeta-WSN models performed clustering using particle swarm optimization with affinity propagation, grey wolf optimization, and brainstorm optimization, respectively; these consider only limited metrics, which is not enough for optimal clustering. Their clustering also takes much time due to slow convergence, which leads to high latency. Routing in these works is likewise performed by optimization algorithms that consider limited metrics for selecting a route, leading to inefficient routing and thereby high latency. The proposed SE2Bio-CR model achieves 2.2 s less latency than the DRL-SS model, 4 s less than the HOA-WSN model, and 5.1 s less than the HybMeta-WSN model.
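The two metrics above reduce to simple differences; a trivial sketch (the helper names are illustrative, not the paper's notation):

```python
def energy_consumption(initial_energy, remaining_energy):
    """C_E: difference between initial and remaining energy."""
    return initial_energy - remaining_energy

def latency(current_completion, expected_completion):
    """Extra time taken beyond the expected completion time."""
    return current_completion - expected_completion
```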

c: Impact of Throughput
The number of packets successfully delivered from source to destination over time is measured as throughput. Throughput (Tr) is computed as

Tr = R_S / t

where R_S represents the number of successfully transmitted packets and t represents the time. To increase throughput, secure routing is performed by GTGAN, which ranks the routing options based on the number of hops; selecting the optimal route for each transmission increases throughput. In addition, the data are encrypted by LEA during transmission, so they cannot be hacked or compromised by attackers, which also leads to high throughput. The existing works do not perform optimal routing and clustering, considering only limited metrics, which leads to lower throughput. They also do not focus on security, which further decreases the throughput compared with the proposed model. The proposed SE2Bio-CR model achieves 9% higher throughput than the DRL-SS model, 16% higher than HOA-WSN, and 23% higher than the HybMeta-WSN model.

d: Impact of Packet Delivery Ratio
This metric is used to compare the number of packets received with the number of packets sent by the sender. The packet delivery ratio is calculated as the ratio of successfully received packets to transmitted packets. Fig. 8 shows the graphical representation of the packet delivery ratio for both the proposed and existing works; it proves that the proposed SE2Bio-CR model achieves a high packet delivery ratio compared to existing models. Here, the CH is selected optimally and dynamically to increase communication efficiency and the packet delivery ratio. Secure routing also increases the packet delivery ratio: before routing, data are classified into emergency and non-emergency, which helps reduce the packet loss rate due to missed deadlines, and the data are encrypted by LEA before transmission. The routing paths are ranked and ordered in ascending order by GTGAN based on the number of hops. Emergency messages are sent along the highest-ranked path because it contains the smallest number of hops; hence the data are transmitted before the deadline, which increases the packet delivery ratio. Non-emergency data are sent along the medium-ranked paths.
In the worst case, alternative paths are selected via the backup agent to reduce packet loss. In this way, this research achieves a higher packet delivery ratio than existing models. The static clustering and CH selection processes used by the existing models result in higher energy consumption and data loss. In addition, all data (both emergency and non-emergency) are sent along a single selected routing path, which leads to high data loss because of long waiting times and thus a lower packet delivery ratio. The proposed SE2Bio-CR model achieves an 8.6% higher packet delivery ratio than the DRL-SS model, 15.6% higher than the HOA-WSN model, and 21.6% higher than HybMeta-WSN.

e: Impact of Network Lifetime
This metric is employed to assess the sensing network's lifetime, described as the number of sensor nodes that continue to function over a specific period in the environment. The comparison of network lifetime with respect to simulation rounds is shown in Fig. 9.
According to the comparison results, the proposed SE2Bio-CR model outperforms earlier works in terms of network lifetime. In this research, the sensor nodes are deployed in a quad tree-based structure, which improves network management, reduces energy consumption, and increases network lifetime. Dynamic clustering and CH selection are performed based on time and event metrics, which increases network lifetime by reducing energy consumption. In contrast, the existing works perform static CH selection, which leads to energy depletion and a reduced network lifetime. The existing HOA-WSN and HybMeta-WSN models do not consider duty cycling, which results in high energy consumption and a shorter network lifetime. Here, optimal and secure routes are selected for data transmission, which reduces transmission latency and increases network lifetime. The existing works lack optimal clustering, scheduling, and routing, which increases energy consumption and reduces network lifetime. The proposed SE2Bio-CR model achieves 8% longer network lifetime than the DRL-SS model, 13% longer than HOA-WSN, and 18% longer than the HybMeta-WSN model.

f: Impact of Alive Node
This metric is used to calculate the count of alive nodes in the network, i.e., nodes in an active state; this count is directly proportional to the network lifetime.

V. DISCUSSION
The major highlights of the research are as follows.
• To improve network management, this research proposed a quad tree-based hierarchical architecture that enhances network-management capability. In addition, sensor authentication is performed using the LEA algorithm, which ensures the legitimacy of the sensor nodes. DRL-SS uses a corona structure with only a sink node in each corona, to which all nodes must transmit, and provides no security. HOA-WSN uses random deployment and calculates trust values based on fuzzy logic, which provides only approximate results and reduces security. HybMeta-WSN has low fault tolerance, which leads to high security threats due to a lack of physical security. Compared with all three, the proposed model is more efficient.
• To increase communication reliability, dynamic clustering is performed using TDO, which selects the CH dynamically and reduces energy consumption during data collection and aggregation. DRL-SS uses PSO and AP, whose slow convergence makes clustering time-consuming and leads to high energy consumption and latency. HOA-WSN selects the CH based on distance, energy, security, and delay, which increases complexity, and its values are approximated by fuzzy logic, which reduces overall security. HybMeta-WSN performs clustering with a brainstorm optimization algorithm that considers only distance and energy for CH selection, which is not enough for optimal CH selection; its static CH selection leads to energy depletion and increased data loss.
• To reduce energy consumption and improve network lifetime, this research applies the ITD3 algorithm, which schedules the sensors in three modes, namely ON, OFF, and IDLE, and also reduces data collisions.
In DRL-SS, Q-learning is used, which results in high latency because the Q value must be updated periodically, impacting the duty cycling. In HOA-WSN, the challenge is that the trust values are computed based on fuzzy logic and stored on a public channel, where they can be compromised. HybMeta-WSN uses brainstorm optimization with a Lévy distribution to determine optimal CHs, but it does not perform any scheduling process and also faces high security threats due to a lack of physical security.
• To enhance the data delivery ratio and reduce network congestion, this research performs secure routing using the GTGAN algorithm, which delivers results quickly and thus reduces delayed data transmission. DRL-SS uses ant colony optimization, which leads to high data loss, and it does not address the legitimacy of nodes, which reduces throughput and increases the packet loss rate. HOA-WSN performs secure routing by calculating the trust values of nodes but lags in data security; both aspects are considered in the proposed model. In HybMeta-WSN, data transmission uses inter-cluster routes determined by the WWO-HC algorithm, which does not provide individual time slots for sensors and therefore incurs high latency. Considering all the points above, the proposed model outperforms the existing models. Table 9 depicts the numerical analysis of the proposed and existing works, including the average values of the performance metrics.

VI. CONCLUSION AND FUTURE WORK
This research mainly focused on designing a secure and energy-efficient framework for an edge-assisted WSN environment. To achieve this goal, the proposed work performs four major processes. Initially, all sensor nodes are deployed in a quad tree-based structure to increase network-management capability and reduce the complexity caused by random deployment. To increase security, all sensor nodes are authenticated using LEA, which guards against external attacks and thereby provides high security. To increase energy efficiency and reduce latency, dynamic CH selection and clustering are performed using the TDO algorithm, considering time and event metrics, which provides high communication reliability and increases throughput. Energy consumption is reduced by performing duty cycling with the ITD3 algorithm, which takes actions based on the current environment and increases network lifetime. After that, secure multipath routing is performed with GTGAN, which provides optimal and secure routes for data transmission. Before transmission, the data are encrypted by LEA to increase throughput and reduce the packet loss rate due to erroneous data. Alternate paths are also selected from the backup agent to reduce data loss when sensor nodes lack the energy to transmit data from source to destination, which increases throughput. Finally, the simulation is conducted with the NS-3.26 network simulator, and the performance is evaluated on several metrics, demonstrating that the proposed work achieves superior performance compared to existing works. In the future, this research plans to integrate Software Defined Networking (SDN) with WSN to increase network management, scalability, and flexibility.

Fig 1
Fig 1 represents the flow of the work. Table 3 depicts the goals of the proposed work, including processes, algorithms, and corresponding goals. Fig 3 represents the architecture of the proposed work, which includes all of its processes. The entities are described as follows. (i) Sensor nodes: responsible for sensing and updating the current status of the WSN environment. (ii) Edge server: provides additional resources to the WSN environment to reduce energy consumption and latency; it includes three agents that perform clustering, scheduling, and routing to obtain a high network lifetime. (iii) Cloud server: updates and stores the sensing data of the sensor nodes for further use, which reduces the storage burden of the WSN environment. (iv) Blockchain: provides security for the WSN environment through a tamper-proof ledger that stores all transactions, such as authentication, clustering, scheduling, routing, and encrypted data, leading to high security and a lower packet loss rate. Radio model for communication: in a wireless sensor network, the first-order radio model is a simplified abstraction of communication behavior, representing the energy consumption and transmission range of the devices. It has two key parameters: 1. Transmit energy (E_tx): the energy consumed by a node to transmit data over a certain distance; the expended energy grows with the distance to the receiver. 2.
Receive energy (E_rx): the energy consumed by a node to receive data from another node; in the first-order model this depends on the packet size rather than the distance. The first-order radio model offers an estimate of the energy consumed whenever a sensor node transmits or receives in each cycle. The energy consumed during transmission is typically proportional to the distance covered; hence, the energy required to transmit or receive a packet over distance d can be represented accordingly.
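Under typical first-order radio-model constants from the literature (the numeric values below are common textbook defaults, assumed here rather than taken from this paper), the per-packet energies can be sketched as:

```python
# Typical first-order radio model constants (assumed, not from this paper).
E_ELEC = 50e-9     # J/bit consumed by transmitter/receiver electronics
EPS_AMP = 100e-12  # J/bit/m^2 consumed by the transmit amplifier

def tx_energy(k_bits, d_m):
    """Energy to transmit k bits over distance d (free-space d^2 loss)."""
    return E_ELEC * k_bits + EPS_AMP * k_bits * d_m ** 2

def rx_energy(k_bits):
    """Energy to receive k bits (independent of distance)."""
    return E_ELEC * k_bits
```

For a 1000-bit packet over 100 m this gives 1.05 mJ to transmit and 0.05 mJ to receive.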

FIGURE 1 .
FIGURE 1. Flow chart of the proposed model.

FIGURE 3 .
FIGURE 3. Architecture of proposed system model.
The critic 1 and critic 2 target networks are mainly responsible for calculating the target Q values o_1m and o_2m, respectively.

Pseudocode for ITD3 for Duty Cycling
Input: Energy, buffer storage
Output: Duty cycling (ON, OFF, IDLE)
Begin
Initialize the actor, critic 1, and critic 2 networks with random parameters (ω, ϕ1, ϕ2)
Initialize the target network parameters with ω′ = ω, ϕ′1 = ϕ1, and ϕ′2 = ϕ2
Initialize the highest episode count Ȇ, the highest iteration count U, and the policy update delay
Clear the experience replay buffer
For episode = 0 to Ȇ do
Generate the initial observation state of the duty cycling environment
For u = 0 to U do
Choose action F_t based on policy π_ω(E_t) with exploration noise
Observe next state E_{t+1} and reward N_t
Store (E_t, F_t, N_t, E_{t+1}) in the experience replay buffer
E_t = E_{t+1}
Sample N transition tuples (E_m, F_m, N_m, E_{m+1}) from the replay buffer
Compute the target Q value o_m using eqn (40)
Update the parameters ϕ1 and ϕ2 of the critic present networks using eqn (41)
If episode mod update delay then
Update the parameters of the actor present network
Update the target network parameters ω′, ϕ′1, ϕ′2

In the Stackelberg GAN, there are J generators and one discriminator.

Pseudocode for GTGAN
Input: Path selection metrics
Output: Optimal ranked routes
Begin
For each training epoch do
For each discriminator training step do
Sample a minibatch of N noise samples {e(1), e(2), ..., e(N)} from the noise prior p_g(e)

Fig. 7
Fig. 7 compares the throughput of the proposed and existing models with respect to the simulation round. The comparison results demonstrate that the proposed SE2Bio-CR model outperforms the other models in terms of throughput. The high throughput is achieved by performing secure authentication, energy-efficient clustering, RL-based duty cycling, and secure multipath routing. All sensor nodes are authenticated by a secure agent using the LEA algorithm, which increases security and eliminates illegitimate nodes, helping to increase throughput. Efficient clustering and routing lead to high communication reliability and throughput.

FIGURE 8 .
FIGURE 8. Comparison of packet delivery ratio.

Fig 10
Fig 10 represents the alive-node comparison of the proposed and existing works with respect to simulation rounds. The alive-node count gradually decreases as the number of simulation rounds increases. The proposed SE2Bio-CR model retains a higher number of alive nodes than the existing works because it performs RL-based dynamic duty cycling using the ITD3 algorithm. Here, the edge server performs clustering, scheduling, and routing, which increases network lifetime and the number of alive nodes by reducing energy consumption. The existing works have fewer alive nodes due to inefficient clustering, scheduling, and routing. In the proposed SE2Bio-CR model, the number of alive nodes decreases only from 100 to 90, whereas the DRL-SS model has 85 alive nodes, HOA-WSN has 75, and the HybMeta-WSN model has 67. The difference between the proposed and existing models is five alive nodes for DRL-SS, 15 for HOA-WSN, and 23 for HybMeta-WSN.

TABLE 2 .
Comparative analysis of literature survey.

TABLE 3 .
Design goals of the proposed SE2Bio-CR model.
FIGURE 2. First-order radio model.

TABLE 4 .
Summary of system variables.

TABLE 5 .
Representation of S and R table.

TABLE 6 .
Representation of truth table.

TABLE 7 .
Representation of state, action, reward of ITD3.

TABLE 9 .
Numerical analysis of proposed and existing model.