Permissioned blockchain and deep reinforcement learning enabled security and energy efficient Healthcare Internet of Things

Recently, the Healthcare Internet of Things (H-IoT) has been widely applied to alleviate the global challenge of the coronavirus disease 2019 (COVID-19) pandemic. However, security and limited energy capacity issues remain the two main factors that prevent the large-scale application of the H-IoT. Therefore, a permissioned blockchain and deep reinforcement learning (DRL)-empowered H-IoT system is presented in this research to address these two issues. The proposed H-IoT system can provide real-time security and energy-efficient healthcare services to control the propagation of the COVID-19 pandemic. To address the security issue, a permissioned blockchain method is adopted to guarantee the security of the proposed H-IoT system. As for handling the limited energy constraint, we employ the mobile edge computing (MEC) method to offload the computing tasks to alleviate the computational burden and energy consumption of the proposed H-IoT system. We also adopt an energy harvesting method to improve performance. In addition, a DRL method is employed to jointly optimize both the security and energy efficiency performance of the proposed system. The simulation results demonstrate that the proposed solution can balance the requirements of security and energy efficiency issues and hence can better respond to the COVID-19 pandemic.

…applications of the H-IoT in the healthcare industry. For example, one review summarizes the changes that ICTs have brought to the healthcare industry, where the IoT, cloud and fog computing, and big data have reshaped the entire healthcare ecosystem [5]. Because IoT-based healthcare applications have the potential to improve the efficiency of healthcare delivery systems and thereby alleviate the challenges of the COVID-19 pandemic [6], an increasing number of IoT applications have been deployed in the healthcare industry in recent years. For example, advanced digital technologies, including the IoT, big data analytics, AI, and blockchain, have already been applied to mitigate the impact of the COVID-19 pandemic by monitoring, detecting, and preventing the spread of COVID-19 [7]. Furthermore, an IoT platform can support intelligent COVID-19 prevention and control, including symptom diagnosis, quarantine monitoring, contact tracing, social distancing, outbreak forecasting, and mutation tracking [8]. To prevent the spread of COVID-19, the IoT can also support remote diagnosis and detection [9]. In addition, an empirical study of staff at healthcare manufacturing companies in South Korea demonstrates that digital healthcare technologies, including AI and the IoT, can improve the performance of healthcare supply chains in response to the pandemic [10].
However, due to the limited computing and communication resources of H-IoT systems, security and limited energy capacity have become the two main issues preventing the large-scale application of the H-IoT. For example, [2] reviews the challenges facing the H-IoT in the healthcare industry, including low latency, low-power operation, security, real-time operation, scalability, service availability, interoperability, and regulatory policies. In addition, a blockchain-based energy-efficient data collection system is presented in [11] to address the security and energy efficiency challenges of IoT applications. In summary, because research on the H-IoT in the COVID-19 pandemic is still in its initial stages, work that addresses both the security and the energy efficiency of the H-IoT is still lacking. Therefore, in this study, we introduce a permissioned blockchain and deep reinforcement learning (DRL)-enabled H-IoT system to improve both its security and its energy efficiency.
To address the security issue, a permissioned blockchain is used to protect the proposed H-IoT system. An increasing number of studies employ blockchains to ensure the security and auditability of IoT applications [12]. In addition, blockchains can improve the interoperability, privacy, security, reliability, and scalability of IoT systems [13]. Blockchain not only plays a strategic role in the healthcare sector but can also improve clinical practice in dealing with the COVID-19 pandemic [14]. Therefore, a blockchain is adopted in this research to guarantee the security of the H-IoT system. However, the permissionless blockchains employed in most current research are ill-suited to the limited computing and communication resources of the H-IoT system. Hence, a permissioned blockchain is chosen in this study because of the limited energy and computational resources of H-IoT devices.
To address the limited energy capacity issue, mobile edge computing (MEC) and energy harvesting are adopted. An increasing number of studies employ MEC to offload computing tasks to edge servers in order to improve the performance of IoT systems [15]. Therefore, in this study, MEC is used to alleviate the computing burden and energy consumption of the H-IoT. In addition, some researchers have adopted energy harvesting to capture energy from external sources and thereby prolong the running time of IoT devices [16]. We therefore also deploy energy harvesting to improve the energy efficiency of the proposed H-IoT system. Furthermore, a joint optimization method is employed that takes into account both the security requirements and the energy efficiency of the H-IoT system. Considering the complex and dynamic nature of the blockchain-based and MEC-enabled H-IoT system, we model it as a discrete Markov decision process (MDP) [17] whose goal is to maximize the system reward. Since the dynamics of the H-IoT system cannot be specified in advance, a DRL method is adopted for the joint optimization task. In particular, an asynchronous advantage actor-critic (A3C) algorithm is used to jointly optimize the throughput of the permissioned blockchain system and the energy efficiency of the H-IoT system.
This study is organized as follows. Section II reviews related studies on H-IoT applications for the COVID-19 pandemic. Section III introduces the system model and the problem formulation for addressing the security and energy efficiency issues of the H-IoT system during COVID-19. Section IV describes the proposed DRL method for the joint optimization of the H-IoT system. Section V presents extensive simulation results to evaluate the proposed H-IoT system. Sections VI and VII present the discussion and conclusions, respectively. The abbreviations frequently used in this study are listed in Table 1.

II. RELATED WORKS
The related research described in this section contains three components: the H-IoT-related applications in the COVID-19 pandemic, the security issue of such H-IoT applications, and the limited energy capacity issue of these H-IoT applications.

A. H-IOT-RELATED APPLICATIONS IN THE COVID-19 PANDEMIC
The H-IoT, as an important and emerging application of the IoT in the healthcare industry, has developed rapidly in recent years with the advancement of ICTs and the widespread use of wearable smart IoT devices [1], [2]. A comprehensive review of future H-IoT applications is presented in [2], covering machine learning, fog and edge computing, big data, blockchain, and SDNs. The review in [3] focuses on three H-IoT technologies, namely, sensing, communication, and data analytics and inference. In [18], a comprehensive review summarizes the integration of the IoT and cloud computing in the healthcare industry, proposes a complete IoT and cloud computing framework for healthcare, and provides a brief classification. That study also addresses the threats, vulnerabilities, and attack risks of the IoT in the healthcare industry. The COVID-19 pandemic has accelerated the application of advanced ICTs in healthcare [7]. For example, [7] summarizes the applications of IoT, big data, AI, and blockchain technologies for tracking during the COVID-19 pandemic and for mitigating the pandemic's indirect negative impacts. In addition, a comprehensive review summarizes the advanced ICTs applied to alleviate the impact of the COVID-19 pandemic, including the IoT, unmanned aerial vehicles (UAVs), blockchain, AI, and 5G [19]. In [8], another review summarizes IoT platforms for COVID-19 prevention and control and details a combined fog and cloud IoT platform for this process, which includes symptom diagnosis, quarantine monitoring, contact tracing, and social distancing (in the prevention stage), as well as outbreak forecasting and mutation tracking (in the control stage). In addition, a novel remote COVID-19 diagnosis and detection system based on AI and the IoT is presented in [9].
This system not only helps doctors reduce direct contact with patients but also helps curb the spread of the virus. The study in [20] discusses the applications of the IoT and AI in healthcare in different South American countries during the COVID-19 pandemic. Another study surveys the state-of-the-art literature on machine learning-based big data analytics for IoT-based smart healthcare and discusses the strengths and weaknesses of these techniques [21]. Furthermore, a comprehensive review of the potential roles of machine learning in the H-IoT is presented in [22]; these roles include diagnosis, prognosis and spread control, assistive systems, monitoring, and logistics.
Although an increasing number of studies have combined advanced digital technologies or ICTs to prevent and treat COVID-19 in recent years, many challenges remain for the large-scale practical application of the H-IoT. For example, the main challenges of the H-IoT summarized in [2] include low latency, low-power operation, security, real-time operation, scalable deployment, networking solutions, service availability, interoperability, and regulatory policy. Given the characteristics of the H-IoT system, security and limited energy capacity are the two main challenges affecting system performance during the COVID-19 pandemic. Furthermore, little research addresses both the security and the limited energy capacity of the H-IoT, especially in the dynamic environments of the COVID-19 pandemic. Therefore, permissioned blockchain and DRL methods are employed in this research to address these two issues and thereby improve the H-IoT's performance in response to the COVID-19 pandemic.

B. THE SECURITY OF H-IOT APPLICATIONS IN THE COVID-19 PANDEMIC
An H-IoT system must handle a large amount of personal data [23], such as heart and breathing data [2], personal medical information [24], and personal living information, such as real-time location and travel details [25]. Because this information is private and must be kept secure [26], an increasing number of studies have adopted blockchains to enhance the security of H-IoT systems. For instance, several comprehensive surveys summarize the research issues, applications, and challenges of blockchain technologies in IoT fields [13], [27], [28]. In addition, [14] demonstrates the strategic role of blockchain in healthcare, showing that blockchains can not only be used to share information safely among different groups of people but can also improve clinical practice related to the COVID-19 pandemic. The study in [13] discusses the integration of blockchain technology with the IoT and describes research issues concerning blockchains for next-generation networks. Furthermore, blockchain technologies are already widely employed to guarantee security in industrial settings [29], [30] and large-scale IoT applications [31]. There are three main types of blockchains: public or permissionless blockchains, private or permissioned blockchains, and consortium or federated blockchains [12]. In light of their limited computing and energy capacities, an increasing number of H-IoT applications choose private or permissioned blockchains in practice [32].
Although a growing number of studies employ blockchains to ensure the security of the H-IoT, these blockchain-based H-IoT systems are themselves vulnerable to malicious attacks. For instance, [33] summarizes blockchain-related solutions for IoT security, [34] lists the security challenges of smart contracts in IoT systems, and [35] reports the research challenges involved in evaluating the performance of blockchain-based security and privacy systems for the IoT. However, most current research focuses only on the applications of blockchain systems in IoT scenarios, and few studies discuss the security threats and attacks in blockchain-supported IoT systems. Therefore, we employ the practical Byzantine fault tolerance (PBFT) consensus mechanism to counter the threats and attacks posed by malicious nodes in the proposed permissioned blockchain system. Furthermore, little research has addressed the throughput of blockchain systems in H-IoT systems. Therefore, our research not only adopts a permissioned blockchain but also optimizes its throughput in the proposed H-IoT system.

C. THE LIMITED ENERGY CAPACITY ISSUE OF H-IOT APPLICATIONS IN THE COVID-19 PANDEMIC
Although the proposed permissioned blockchain system reduces the computing burden compared with a public blockchain system, it still introduces extra computing tasks and energy consumption to the H-IoT system. To reduce energy consumption, MEC is often employed to offload tasks to edge servers [36]. For example, a DRL-based decentralized and efficient structure was presented in [37] for blockchain-enabled IoT systems to reduce transaction delays and transmission power consumption, which improves the efficiency and reliability of such systems. In addition, related studies have adopted MEC to enhance the capability and reduce the energy consumption of IoT systems [15], [16], [36], [38], [39]. Therefore, we also employ an MEC system to offload computing tasks and reduce the energy consumption of the proposed H-IoT system. Furthermore, the proposed H-IoT system decides whether to offload computing tasks to MEC servers based on real-time conditions, which covers both the partial offloading and the binary offloading scenarios [40].
Energy harvesting, which collects energy from natural or artificial environmental sources for IoT networks, can reduce their dependence on batteries and prolong their operating lifetime [41]-[43]. In recent years, an increasing number of studies have employed energy harvesting to extend the operating lifetime of IoT systems [15], [16], [38], [39]. This study likewise employs the linear energy harvesting model [38] to prolong the operating lifetime of the H-IoT in response to the COVID-19 pandemic.
Although several studies address the energy efficiency of H-IoT systems, most of them focus only on improving energy efficiency while neglecting security, which is indispensable for H-IoT applications responding to the COVID-19 pandemic. Therefore, in our research, security and energy efficiency are jointly addressed to support deploying the H-IoT in complex environments.
The DRL method, which combines deep learning and reinforcement learning, is often used to address real-time decision-making problems [44] and can effectively handle dynamic environments [45]. Given the special characteristics of blockchain-enabled IoT systems, DRL methods are widely employed to cope with their dynamic environments and to obtain optimal solutions for blockchain-empowered IoT systems [46]. Therefore, a DRL method is employed in our study to optimize both the throughput of the proposed permissioned blockchain system and the energy efficiency of the H-IoT system. The details of the proposed DRL method are described in Sections III and IV.

III. SYSTEM MODEL AND PROBLEM FORMULATION
In this section, the H-IoT system framework is introduced first. We then describe the permissioned blockchain and throughput model, followed by the MEC system and energy management model. Finally, the problem formulation is presented.

A. H-IOT SYSTEM FRAMEWORK
Although an increasing number of H-IoT applications are employed to respond to the COVID-19 pandemic, the limited energy and computing capability of the H-IoT restricts its large-scale application in practice. Fortunately, an MEC system can significantly increase the computing capability of the H-IoT system, while an energy harvesting system can relax current battery capacity limits and prolong the running time of the H-IoT system. These two methods can improve performance and broaden the applications of the H-IoT in the COVID-19 pandemic. Accordingly, in this research, a secure and energy-efficient H-IoT system is presented, as shown in Figure 1.
As illustrated in Figure 1, the proposed H-IoT system consists of an MEC network that includes several MEC servers, a permissioned blockchain system deployed on the MEC network, and several H-IoT devices, each with an energy harvesting module and a computing module that can process computing tasks locally or offload them to the MEC servers. The details of the proposed H-IoT system are presented in the following subsections.

B. PERMISSIONED BLOCKCHAIN SYSTEM AND THROUGHPUT MODEL
Considering the limited computing and energy resources of the H-IoT system, a permissioned blockchain is deployed on the MEC servers to guarantee the security of the proposed H-IoT system by defending against threats and attacks from malicious nodes during the COVID-19 pandemic.
Following [47], the permissioned blockchain system in this research involves two main steps: generating a block and appending the block to the blockchain. Generating a block consists of collecting, validating, and packaging transactions into a block, while appending the block includes broadcasting the generated block to the other block producers and, once consensus on the new block has been reached through the consensus mechanism, appending it to the local blockchain.
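The two-step workflow above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Block`, `generate_block`, `append_block`, and `GENESIS_PREV_HASH` are hypothetical, transactions are plain dictionaries, validation is a caller-supplied predicate, and the broadcast and consensus stage is abstracted into a single `consensus_reached` flag (in this system, a PBFT round would supply it).

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass, field
from typing import Callable

GENESIS_PREV_HASH = "0" * 64  # hypothetical parent hash for the first block


@dataclass
class Block:
    index: int
    prev_hash: str
    transactions: list[dict]
    block_hash: str = field(init=False)

    def __post_init__(self) -> None:
        # Hash the block contents together with the link to the previous block.
        payload = json.dumps(
            {"index": self.index, "prev": self.prev_hash, "txs": self.transactions},
            sort_keys=True,
        )
        self.block_hash = hashlib.sha256(payload.encode()).hexdigest()


def tip_hash(chain: list[Block]) -> str:
    """Hash of the last block in the local chain (or the genesis placeholder)."""
    return chain[-1].block_hash if chain else GENESIS_PREV_HASH


def generate_block(chain: list[Block], pending: list[dict],
                   is_valid: Callable[[dict], bool], max_txs: int) -> Block:
    """Step 1: collect pending transactions, keep the valid ones, package them into a block."""
    valid = [tx for tx in pending if is_valid(tx)][:max_txs]
    return Block(index=len(chain), prev_hash=tip_hash(chain), transactions=valid)


def append_block(chain: list[Block], block: Block, consensus_reached: bool) -> bool:
    """Step 2: append locally only after consensus, and only if the block extends the tip."""
    if consensus_reached and block.prev_hash == tip_hash(chain):
        chain.append(block)
        return True
    return False
```

Capping each batch at `max_txs` plays the role of the block size: a larger cap packs more transactions into each block.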
Because there may be malicious nodes in the proposed H-IoT system, the proposed blockchain system mainly deals with manipulation-based attacks [27], in which a malicious node sends incorrect or false information to the blockchain system [47]. Hence, the PBFT consensus protocol is employed in this study to guarantee the security of the H-IoT system. The complete PBFT consensus process includes five steps: Request, Pre-prepare, Prepare, Commit, and Reply [47]. In the PBFT consensus mechanism, every transaction is validated by every other node in the blockchain system, and the final consensus decision follows a majority rule that prevents Byzantine replicas controlled by dishonest or malicious nodes from corrupting the result [33]. In short, PBFT achieves robust consensus as long as fewer than one-third of the replicas are malicious [48].
Let the proposed permissioned blockchain system contain $q$ consensus nodes, defined as $Q = \{1, 2, \ldots, q\}$. First, the H-IoT system sends a consensus request for the new block to the blockchain system. The blockchain system then randomly chooses a primary node and designates all other nodes as replica nodes. Following [47], [49], PBFT consensus in the H-IoT involves three main steps. First, the primary node verifies the signature and the message authentication code (MAC) of the new block, then generates a MAC and broadcasts the block and MAC to all replica nodes. Second, each replica node verifies the received block and MAC, then generates its own MAC and broadcasts it to the other replica nodes, so that every node verifies the MACs and blocks received from all other nodes. Finally, any node that receives more than $2L$ correct messages, where $L = \lfloor (q - 1)/3 \rfloor$ is the maximum number of faulty nodes tolerated, sends a correct reply message to the primary node; if the primary node receives more than $2L$ correct replies, the consensus process is regarded as successful. Based on the PBFT consensus mechanism, the proposed system can filter out threats and attacks from malicious nodes in the H-IoT system.
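The quorum arithmetic behind these steps can be checked with a small counting model. This is a deliberately simplified sketch with hypothetical function names, assuming that every node receives every message, that honest nodes always send correct MACs while malicious nodes always send incorrect ones, and that the primary itself is honest (PBFT's view-change protocol for a faulty primary is not modelled).

```python
from __future__ import annotations


def max_faulty(q: int) -> int:
    """Largest number L of Byzantine nodes that q PBFT nodes tolerate: L = floor((q - 1) / 3)."""
    return (q - 1) // 3


def pbft_round(is_honest: list[bool]) -> bool:
    """Return True if the primary collects more than 2L correct replies."""
    q = len(is_honest)
    L = max_faulty(q)
    # Prepare/commit: honest nodes broadcast correct MACs and malicious nodes incorrect
    # ones, so every node observes the same number of correct messages.
    correct_msgs = sum(is_honest)
    # An honest node that sees more than 2L correct messages sends a correct reply;
    # malicious nodes never send a correct reply.
    correct_replies = sum(1 for honest in is_honest if honest and correct_msgs > 2 * L)
    # Reply: the primary accepts the block once it has more than 2L correct replies.
    return correct_replies > 2 * L


# With q = 4 (L = 1), one malicious node is tolerated but two are not.
print(pbft_round([True, True, True, False]), pbft_round([True, True, False, False]))
```

The model makes the one-third bound concrete: consensus succeeds exactly when the number of honest nodes exceeds $2L$, which for $q = 3L + 1$ nodes means at most $L$ of them may be malicious.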
As a robust and well-known consensus protocol adopted by Hyperledger, PBFT employs a replication algorithm to tolerate Byzantine failures [33]. Moreover, since PBFT can support the smart contracts of the Hyperledger platform, a growing body of research adopts the PBFT consensus protocol as the foundation of current blockchain applications [50]. Therefore, because it is built on PBFT, the framework proposed in this study not only supports the blockchain system in the H-IoT system but can also be integrated with current blockchain frameworks.
Following related studies [17], [30], [47], the transactional throughput of a permissioned blockchain system depends on two parameters, namely, the block size and the block interval. Let $T_b$ denote the block interval. The transactional throughput $\Omega$ of the permissioned blockchain system, in transactions per unit time, can then be modelled as follows:
$$\Omega = \frac{S_b}{\chi \, T_b},$$
where $S_b$ represents the block size and $\chi$ denotes the average transaction size [17], [30], [47].
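A quick numerical check of the throughput model, assuming it takes the form $\Omega = S_b/(\chi T_b)$, that is, the number of transactions per block divided by the block interval. The function name and parameter values are illustrative only and are not the simulation settings of this study.

```python
def blockchain_throughput(block_size: float, avg_tx_size: float, block_interval: float) -> float:
    """Omega = (S_b / chi) / T_b, in transactions per second when T_b is in seconds."""
    if avg_tx_size <= 0 or block_interval <= 0:
        raise ValueError("average transaction size and block interval must be positive")
    txs_per_block = block_size / avg_tx_size  # S_b / chi transactions fit in one block
    return txs_per_block / block_interval


# Illustrative values: 2 MB blocks, 200-byte transactions, one block every 5 s.
omega = blockchain_throughput(2_000_000, 200, 5)
print(omega)  # 2000.0
```

Doubling $S_b$ or halving $T_b$ doubles $\Omega$, which is why both appear as decision variables; this simple model does not capture the extra propagation and consensus overhead that larger blocks or shorter intervals can introduce.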

C. MEC SYSTEM AND ENERGY MANAGEMENT MODEL
To guarantee security, the permissioned blockchain system adds extra computing tasks to the proposed H-IoT system. Therefore, both an MEC system and an energy management model are used in this study to improve the performance of the proposed H-IoT system. Similar to related studies [36], [38], consider an MEC system consisting of $M$ H-IoT devices and $N$ mobile edge servers, with time divided into slots of fixed duration indexed by $k \in \{1, 2, 3, \ldots\}$. The H-IoT system generates a computing task of $C^{(k)}$ bits at time slot $k$ [38]. Following related research [36], the computing task can be partitioned into $\lambda$ parts. The H-IoT system then decides whether to process the computing task on local devices or to offload it to mobile edge servers. Specifically, $B_i^{(k)}$ denotes the radio link transmission rate at which the H-IoT system offloads tasks to the selected mobile edge server $i$ (where $1 \le i \le N$) at time slot $k$, and $x^{(k)}$ (where $0 \le x^{(k)} \le 1$) is the offloading rate of the H-IoT system. If $x^{(k)} = 0$, the H-IoT system executes the entire computing task on local devices; if $x^{(k)} = 1$, it offloads the entire task to the selected mobile edge server $i$. When $0 < x^{(k)} < 1$, the H-IoT system processes $(1 - x^{(k)})C^{(k)}$ bits on local devices while offloading $x^{(k)}C^{(k)}$ bits to the selected mobile edge server $i$. In short, $x^{(k)}$ represents both the partial and the binary offloading decisions of the proposed H-IoT system [40]. The key notations are listed in Table 2.
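The partial and binary offloading cases can be illustrated as follows. This sketch assumes, as a hypothetical discretization not stated explicitly in the text, that the offloading rate takes values $x^{(k)} = j/\lambda$ for $j \in \{0, \ldots, \lambda\}$, so that offloading $j$ of the $\lambda$ parts covers binary offloading ($j = 0$ or $j = \lambda$) as well as partial offloading. The function name `split_task` is hypothetical.

```python
from __future__ import annotations


def split_task(task_bits: float, num_parts: int, parts_offloaded: int) -> tuple[float, float, float]:
    """Return (x, local_bits, offloaded_bits) when parts_offloaded of num_parts parts are offloaded."""
    if num_parts < 1 or not 0 <= parts_offloaded <= num_parts:
        raise ValueError("need num_parts >= 1 and 0 <= parts_offloaded <= num_parts")
    x = parts_offloaded / num_parts          # offloading rate x^(k) = j / lambda
    offloaded_bits = x * task_bits           # x^(k) C^(k), sent to edge server i
    local_bits = task_bits - offloaded_bits  # (1 - x^(k)) C^(k), processed locally
    return x, local_bits, offloaded_bits


# A 1 Mbit task split into lambda = 4 parts: j = 0 and j = 4 are the binary cases.
for j in range(5):
    print(split_task(1_000_000, 4, j))
```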

| Notation | Description |
|---|---|
| $C^{(k)}$ | Volume of the computing task during time slot $k$ (bits) |
| $\lambda$ | Number of partitions of the computing task |
| $B_i^{(k)}$ | Radio transmission rate between the H-IoT device and edge server $i$ |
| $x^{(k)}$ | Ratio of data offloaded to the edge server |
| $M$ | Number of CPU cycles needed to process one bit of the computing task |
| $f_m$ | CPU cycle frequency of the local H-IoT device |
| $\mu$ | Effective capacitance coefficient |
| $T_i^{(k)}$ | Transmission duration to offload the computing task to edge server $i$ |
| $P$ | Transmission power to offload the computing task to edge server $i$ |
| $b^{(k)}$ | Battery level |
| $\rho^{(k)}$ | Amount of energy harvested during time slot $k$ |
| $E_l^{(k)}$ | Energy consumption of the local H-IoT device |
| $E_i^{(k)}$ | Energy consumption of offloading the task to edge server $i$ |
| $E^{(k)}$ | Total energy consumption during time slot $k$ |
| $\Omega$ | Transactional throughput of the permissioned blockchain system |
| $S_b$ | Block size of the permissioned blockchain system |
| $T_b$ | Block interval of the permissioned blockchain system |
| $\chi$ | Average transaction size |

1) Energy consumption of local computing
Following related studies [36], [38], we assume that the central processing unit (CPU) performs local computing, so the CPU cycle frequency can be used to measure local computing performance. Given that $M$ is the number of CPU cycles required per input bit, the total number of CPU cycles for local computing is $(1 - x^{(k)})C^{(k)}M$. Based on [36], [38], dynamic voltage and frequency scaling techniques can be applied, so the H-IoT devices can control their energy consumption by adjusting the CPU frequency $f_m$ for each cycle $m \in \{1, 2, \ldots, (1 - x^{(k)})C^{(k)}M\}$. The local computing energy consumption $E_l^{(k)}$ can hence be modelled as follows:
$$E_l^{(k)} = \sum_{m=1}^{(1 - x^{(k)})C^{(k)}M} \mu f_m^{2},$$
where $\mu$ is the effective capacitance coefficient, which depends on the chip architecture of the H-IoT devices [51].
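The local energy model, taken here as $E_l^{(k)} = \sum_m \mu f_m^2$ over the $(1 - x^{(k)})C^{(k)}M$ local CPU cycles, can be sketched as follows. With a fixed frequency $f$ for every cycle, the sum collapses to $\mu f^2 (1 - x^{(k)})C^{(k)}M$. The function names and the values of $\mu$, $f$, and $M$ in the example are illustrative assumptions, not the settings of this study.

```python
from __future__ import annotations


def local_energy_per_cycle(freqs_hz: list[float], mu: float) -> float:
    """General DVFS case: E_l = sum_m mu * f_m**2, one frequency entry per local CPU cycle."""
    return sum(mu * f * f for f in freqs_hz)


def local_energy_fixed_freq(task_bits: float, x: float, cycles_per_bit: float,
                            mu: float, freq_hz: float) -> float:
    """Fixed-frequency case: mu * f**2 * (1 - x) * C * M."""
    local_cycles = (1.0 - x) * task_bits * cycles_per_bit  # (1 - x^(k)) C^(k) M
    return mu * freq_hz ** 2 * local_cycles


# Illustrative: 1 Mbit task, half processed locally, 1000 cycles/bit, 1 GHz, mu = 1e-28.
e_local = local_energy_fixed_freq(1e6, 0.5, 1000, 1e-28, 1e9)
print(e_local)  # about 0.05 J
```

Because the per-cycle energy grows with $f_m^2$, halving the clock frequency cuts the energy of each cycle by a factor of four, at the cost of a longer local execution time.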

2) Energy consumption of MEC
According to [36], [38], the H-IoT device can offload $x^{(k)}C^{(k)}$ bits of the task to mobile edge server $i$ at time slot $k$ with uplink radio transmission rate $B_i^{(k)}$. Letting $T_i^{(k)}$ denote the transmission duration of the H-IoT device to offload the task to the edge server, the transmission duration can be modelled as follows:
$$T_i^{(k)} = \frac{x^{(k)}C^{(k)}}{B_i^{(k)}}.$$
Following [36], [38], given the transmit power $P$, the energy consumption $E_i^{(k)}$ of offloading the task to mobile edge server $i$ at time slot $k$ during $T_i^{(k)}$ can be modelled as follows:
$$E_i^{(k)} = P\,T_i^{(k)} = \frac{P\,x^{(k)}C^{(k)}}{B_i^{(k)}}.$$
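The offloading model, $T_i^{(k)} = x^{(k)}C^{(k)}/B_i^{(k)}$ and $E_i^{(k)} = P\,T_i^{(k)}$, reduces to a few lines. The function name `offload_cost` is hypothetical, and the rate and power values in the example are illustrative assumptions.

```python
from __future__ import annotations


def offload_cost(task_bits: float, x: float, rate_bps: float, tx_power_w: float) -> tuple[float, float]:
    """Return (T_i, E_i): uplink time for x*C bits at rate B_i, and the transmit energy P*T_i."""
    if rate_bps <= 0:
        raise ValueError("transmission rate must be positive")
    duration_s = x * task_bits / rate_bps  # T_i^(k) = x^(k) C^(k) / B_i^(k)
    energy_j = tx_power_w * duration_s     # E_i^(k) = P T_i^(k)
    return duration_s, energy_j


# Illustrative: offload half of a 1 Mbit task over a 10 Mbit/s link at 0.5 W.
t_i, e_i = offload_cost(1e6, 0.5, 10e6, 0.5)
print(t_i, e_i)
```

Set against the local model, this makes the offloading trade-off explicit: a fast link (large $B_i^{(k)}$) makes offloading cheap, whereas a slow link can make local processing the better choice.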

3) Energy harvesting and energy consumption
Assume that each H-IoT device is equipped with an energy harvesting module that can capture energy from sources such as solar, wind, and RF signals, and that the harvested energy is stored in the battery to supply both local computing and offloading. As in [36], let $\rho^{(k)}$ denote the amount of energy harvested during time slot $k$, and let $E^{(k)}$ denote the total energy consumption during time slot $k$, defined as follows:
$$E^{(k)} = E_l^{(k)} + E_i^{(k)}.$$
According to [38], the battery level at the beginning of time slot $k+1$ can be modelled as follows:
$$b^{(k+1)} = \max\left\{ b^{(k)} - E^{(k)} + \rho^{(k)},\; 0 \right\}.$$
If the battery charge is insufficient, for example, when $b^{(k+1)} = 0$, the H-IoT device drops the computing task [51].
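One slot of the battery dynamics can be sketched as follows. Several details go beyond the text and should be read as hypothetical: a task whose total energy $E^{(k)} = E_l^{(k)} + E_i^{(k)}$ exceeds the current battery level is dropped and consumes no energy, the harvested energy $\rho^{(k)}$ is added at the end of the slot, and the level is clipped to $[0, b_{\max}]$ for an assumed capacity `b_max`. The function name `battery_step` is also hypothetical.

```python
from __future__ import annotations


def battery_step(b: float, e_local: float, e_offload: float,
                 harvested: float, b_max: float) -> tuple[float, bool]:
    """Advance the battery by one slot; return (next_battery_level, task_dropped)."""
    total = e_local + e_offload  # E^(k) = E_l^(k) + E_i^(k)
    dropped = total > b          # insufficient charge: the task is dropped
    spent = 0.0 if dropped else total
    # Add the harvested energy rho^(k) and keep the level within [0, b_max].
    next_b = min(max(b - spent + harvested, 0.0), b_max)
    return next_b, dropped


print(battery_step(1.0, 0.2, 0.3, 0.1, 2.0))   # task executed
print(battery_step(0.1, 0.2, 0.3, 0.05, 2.0))  # task dropped
```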

D. PROBLEM FORMULATION
To improve the security and energy efficiency of the proposed H-IoT system, the system needs to simultaneously maximize the blockchain throughput and minimize the energy consumption. Therefore, similar to related studies [17], [30], the overall objective function of the H-IoT system is modelled as follows:
$$F = w_1 \Omega - (1 - w_1)\, w_2 E^{(k)},$$
where $w_1$ ($0 < w_1 < 1$) is a weight parameter that combines the two objectives into a single function, and $w_2$ is a mapping factor that brings the two objectives to the same scale.
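The weighted objective can be evaluated as follows, assuming the form $w_1 \Omega - (1 - w_1) w_2 E^{(k)}$. The function name and numbers are illustrative, with $w_2$ chosen so that the throughput term (transactions per second) and the energy term (joules) are of comparable magnitude.

```python
def joint_objective(throughput: float, energy: float, w1: float, w2: float) -> float:
    """w1 * Omega - (1 - w1) * w2 * E: reward throughput, penalize energy (w2 rescales joules)."""
    if not 0.0 < w1 < 1.0:
        raise ValueError("w1 must lie strictly between 0 and 1")
    return w1 * throughput - (1.0 - w1) * w2 * energy


# Illustrative: Omega = 2000 tx/s, E = 0.05 J, equal weighting, w2 = 1e4.
print(joint_objective(2000.0, 0.05, 0.5, 1e4))
```

Raising $w_1$ shifts the optimum toward throughput (larger blocks, shorter intervals), whereas lowering it favours energy savings through offloading and frequency choices.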

1) Discrete MDP problem
Similar to related research on MEC in the IoT, we model this joint optimization problem as a discrete MDP. The problem can be formulated as a four-tuple $\langle S, A, P, r \rangle$, where $S$ denotes the state space of the H-IoT system, $A$ represents its discrete action space, $P$ is the state transition probability, and $r$ is the reward function.

2) System state space and transition probability
At each epoch, the DRL agent learns from experience in the state space and updates its decision policy based on the observed state. The state at epoch $t$ can be represented as follows:
$$s(t) = \left[ b^{(k)}(t),\, C^{(k)}(t),\, \rho^{(k)}(t),\, \chi(t) \right],$$
where $b^{(k)}(t)$ is the battery level, $C^{(k)}(t)$ denotes the computing task, $\rho^{(k)}(t)$ represents the amount of harvested energy, and $\chi(t)$ denotes the transaction size. Similar to related research [52], the transition from the current state $s(t)$ to a new state $s(t+1)$ can be modelled as follows:
$$s(t+1) = \Phi\big(s(t), a(t), r(t)\big),$$
where $\Phi(\cdot)$ denotes the transition mapping, $r(t)$ represents the randomness of the environment, $s(t), s(t+1) \in S(t)$, and $a(t) \in A(t)$. The state transition is therefore governed not only by the current action $a(t)$ but also by random conditions, such as the energy harvesting conditions and the wireless communication conditions.
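The state vector and its stochastic transition can be sketched as follows. For illustration only, this assumes that the effect of the action is summarized by the energy it consumes, that the battery evolves as in the energy model (clipped to an assumed capacity `b_max`), and that the next task size, harvested energy, and transaction size are drawn from uniform distributions whose ranges are hypothetical. The names `HIoTState` and `transition` are likewise hypothetical.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass(frozen=True)
class HIoTState:
    battery: float      # b^(k)(t)
    task_bits: float    # C^(k)(t)
    harvested: float    # rho^(k)(t)
    avg_tx_size: float  # chi(t)

    def as_vector(self) -> list[float]:
        """Flatten the state into the input vector fed to the DRL network."""
        return [self.battery, self.task_bits, self.harvested, self.avg_tx_size]


def transition(s: HIoTState, energy_spent: float, rng: random.Random, b_max: float) -> HIoTState:
    """s(t+1) depends on the action (via energy_spent) and on random environment conditions."""
    battery = min(max(s.battery - energy_spent + s.harvested, 0.0), b_max)
    return HIoTState(
        battery=battery,
        task_bits=rng.uniform(0.5e6, 1.5e6),    # hypothetical task-size range (bits)
        harvested=rng.uniform(0.0, 0.1),        # hypothetical harvesting range (J)
        avg_tx_size=rng.uniform(150.0, 250.0),  # hypothetical transaction-size range (bytes)
    )


s0 = HIoTState(battery=1.0, task_bits=1e6, harvested=0.1, avg_tx_size=200.0)
s1 = transition(s0, energy_spent=0.5, rng=random.Random(0), b_max=2.0)
print(s1.as_vector())
```

Passing an explicit `random.Random` instance keeps simulated episodes reproducible for a fixed seed.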

3) System action space
The DRL agent dynamically chooses actions to maximize the system reward. The action at each epoch $t$, consisting of the offloading decision $a_o(t)$, the block size $a_{bs}(t)$, and the block interval $a_{bi}(t)$, can be modelled as follows:
$$a(t) = \left[ a_o(t),\, a_{bs}(t),\, a_{bi}(t) \right].$$
The offloading decision, which determines whether the computing task is processed by local devices or offloaded to edge servers, can be defined as follows:
$$a_o(t) = x^{(k)} \in \left\{ 0, \tfrac{1}{\lambda}, \tfrac{2}{\lambda}, \ldots, 1 \right\}.$$
Following related studies [17], [47], the block size and block interval decisions, which are important for optimizing the throughput of the proposed permissioned blockchain system, can be defined as follows:
$$a_{bs}(t) \in (0, M_{bs}],$$
$$a_{bi}(t) \in (0, M_{bi}],$$
where $M_{bs}$ and $M_{bi}$ represent the maximum block size and maximum block interval, respectively.
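A discrete DRL agent such as DQN or A3C outputs an index into a finite action set, so the ranges for block size and block interval have to be discretized. The sketch below assumes evenly spaced levels (a hypothetical choice) and offloading rates $j/\lambda$, and enumerates every joint action $[a_o, a_{bs}, a_{bi}]$ with `itertools.product`. The function name and grid sizes are hypothetical.

```python
from __future__ import annotations

from itertools import product


def build_action_space(num_parts: int, max_block_size: float, max_block_interval: float,
                       size_levels: int, interval_levels: int) -> list[tuple[float, float, float]]:
    """Enumerate joint actions (a_o, a_bs, a_bi) on hypothetical evenly spaced grids."""
    offload = [j / num_parts for j in range(num_parts + 1)]  # 0, 1/lambda, ..., 1
    sizes = [max_block_size * (j + 1) / size_levels
             for j in range(size_levels)]                    # levels in (0, M_bs]
    intervals = [max_block_interval * (j + 1) / interval_levels
                 for j in range(interval_levels)]            # levels in (0, M_bi]
    return list(product(offload, sizes, intervals))


# Illustrative grid: lambda = 4, M_bs = 8 (MB), M_bi = 10 (s), 4 size and 5 interval levels.
actions = build_action_space(num_parts=4, max_block_size=8.0, max_block_interval=10.0,
                             size_levels=4, interval_levels=5)
a_o, a_bs, a_bi = actions[37]  # the agent's discrete output selects one entry of this list
print(len(actions), (a_o, a_bs, a_bi))
```

The action space grows multiplicatively, as $(\lambda + 1)$ times the number of size levels times the number of interval levels, which is one reason to keep each grid coarse.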

4) Reward function
The objective of this research is to jointly optimize the security and energy efficiency of the proposed H-IoT system in response to the COVID-19 pandemic. This can be expressed as a joint optimization problem that maximizes the transactional throughput of the permissioned blockchain system and minimizes the energy consumption of the H-IoT system, which requires jointly optimizing the offloading decision, block size, and block interval. The problem can hence be represented as follows:
$$
\begin{aligned}
\max_{a(t)} \quad & w_1 \Omega - (1 - w_1)\, w_2 E^{(k)} \\
\text{s.t.} \quad & C1:\ 0 \le \rho^{(k)} \le b_{\max}, \\
& C2:\ E^{(k)} \le b^{(k)},
\end{aligned}
\tag{14}
$$
where C1 states that the amount of energy harvested from available sources cannot exceed the maximum battery level $b_{\max}$, and C2 states that the H-IoT system drops the computing task when the battery level is insufficient.
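The reward at epoch $t$ can then be expressed as follows:
$$r(t) = w_1 \Omega(t) - (1 - w_1)\, w_2 E^{(k)}(t),$$
where $\Omega(t)$ and $E^{(k)}(t)$ are the blockchain throughput and the energy consumption resulting from action $a(t)$.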

IV. DRL METHODS FOR H-IOT SYSTEM
This section presents the details of the DRL algorithm, including a comparison of two mainstream DRL algorithms and the pseudocode of the proposed algorithm. DRL algorithms are increasingly employed in IoT applications to solve MDP problems [44]. However, considering the special features of the H-IoT system, only DRL algorithms that support discrete action spaces are considered in this research. Therefore, the deep Q-network (DQN) algorithm [53] and A3C [54] are evaluated. In addition, we compare the performance of DQN and A3C with that of a random algorithm, in which the agents randomly choose available actions without considering the complex scenarios [36]. The comparison results are shown in Figure 2, where the blue solid line represents the average reward of the A3C algorithm, and the orange and green dashed lines denote the performance of the DQN algorithm and the random algorithm, respectively. Both DQN and A3C outperform the random algorithm, which demonstrates the validity of these two algorithms. Although DQN obtains a large average reward quickly, its results fluctuate as the number of episodes increases, and it struggles to achieve stable performance. The A3C algorithm is much more robust and achieves stable, good performance as the number of episodes increases.
In addition, according to the simulation results in [54], the A3C algorithm is more robust than other DRL algorithms. Therefore, we chose A3C to solve our joint optimization problem. Besides its robustness, A3C combines the benefits of value-based and policy-based methods, and it supports both discrete and continuous action spaces.
According to [54], [55], the A3C algorithm adopts a single deep neural network to approximate both the policy and the value function. It maintains a policy $\pi(a_t \mid s_t; \theta)$, approximated by a softmax output layer, and an estimate of the value function $V(s_t; \theta_v)$, produced by a linear output layer. Because the agents share the network parameters in A3C, the policy and value functions are updated after every $t_{\max}$ steps or when a terminal state is reached. Following related studies [17], [30], [55], there are two cost functions in the A3C algorithm, associated with the two network outputs. For the policy function, the cost function can be defined as follows:
$$f_\pi(\theta) = \log \pi(a_t \mid s_t; \theta)\big(R_t - V(s_t; \theta_v)\big) + \beta H\big(\pi(s_t; \theta)\big), \tag{16}$$
where $\theta$ and $\theta_v$ are the network parameters, $H(\pi(s_t; \theta))$ is an entropy term, and the parameter $\beta$ controls the strength of the entropy regularization term. In (16), $R_t$ denotes the estimated discounted reward over the interval from $t$ to $t + k$, which can be defined as follows:
$$R_t = \sum_{i=0}^{k-1} \Upsilon^{i} r_{t+i} + \Upsilon^{k} V(s_{t+k}; \theta_v),$$
where $k$ is upper bounded by $t_{\max}$, $r_{t+i}$ denotes the immediate reward, and $\Upsilon \in (0, 1]$ is the discount factor.
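The discounted return $R_t$ and the per-step policy cost in (16) can be computed for a short rollout as follows. This is a framework-free sketch with hypothetical function names; an actual A3C implementation would compute the same quantities on tensors and rely on automatic differentiation for the gradients. Returns are accumulated backwards, $R \leftarrow r_i + \Upsilon R$, starting from the bootstrap value $V(s_{t+k}; \theta_v)$, or from 0 if the rollout ended in a terminal state.

```python
from __future__ import annotations

import math


def n_step_returns(rewards: list[float], bootstrap_value: float, gamma: float,
                   terminal: bool) -> list[float]:
    """R_t for every step of a rollout of at most t_max rewards (gamma plays the role of Upsilon)."""
    ret = 0.0 if terminal else bootstrap_value
    returns = []
    for r in reversed(rewards):
        ret = r + gamma * ret
        returns.append(ret)
    returns.reverse()
    return returns


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_objective(logits: list[float], action: int, ret: float, value: float, beta: float) -> float:
    """log pi(a|s) * (R - V) + beta * H(pi) for a single step."""
    probs = softmax(logits)
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return math.log(probs[action]) * (ret - value) + beta * entropy


def value_cost(ret: float, value: float) -> float:
    """(R - V)^2, the cost for the value-function estimate."""
    return (ret - value) ** 2


print(n_step_returns([1.0, 1.0, 1.0], bootstrap_value=10.0, gamma=0.5, terminal=False))
```

The advantage $R_t - V(s_t; \theta_v)$ scales the log-probability term: actions that did better than the critic expected become more likely, while the entropy bonus keeps the policy from collapsing too early.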
In addition, the cost function for the estimated value function can be modelled as follows:

$$f_v(\theta_v) = \bigl(R_t - V(s_t; \theta_v)\bigr)^2. \tag{18}$$

According to [54], [55], the gradient of the policy cost function f_π(θ) with respect to parameter θ can be defined as follows:

$$\nabla_\theta f_\pi(\theta) = \nabla_\theta \log \pi(a_t \mid s_t; \theta)\,\bigl(R_t - V(s_t; \theta_v)\bigr) + \beta\, \nabla_\theta H\bigl(\pi(s_t; \theta)\bigr). \tag{19}$$

The gradient of the value cost function f_v(θ_v) with respect to parameter θ_v can be represented as follows:

$$\nabla_{\theta_v} f_v(\theta_v) = \frac{\partial \bigl(R_t - V(s_t; \theta_v)\bigr)^2}{\partial \theta_v}. \tag{20}$$

Following related studies [17], [30], [54], [55], the standard non-centered RMSProp algorithm is used to optimize both the policy function and the value function:

$$g = \alpha g + (1 - \alpha)\,\Delta\theta^2, \tag{21}$$

$$\theta \leftarrow \theta - \eta \frac{\Delta\theta}{\sqrt{g + \epsilon}}, \tag{22}$$

where α indicates the momentum, g is the running average of the squared gradients, Δθ represents the accumulated gradient of the cost functions, η denotes the learning rate, and ϵ is a small positive parameter.

Following related studies [17], [30], [54], the pseudocode of the A3C algorithm is described in Algorithm 1.

Algorithm 1 A3C-based joint optimization algorithm for H-IoT
1: Initialize global parameters θ, θ_v, and global step counter T = 0
2: Initialize related parameters T_max, t_max, discount factor Υ, learning rate η, number of agents W, and ϵ
3: Initialize local parameters θ′, θ′_v, and local step counter t = 1
4: for each episode do
5:   for w = 1 to W do
       ⋮
       for i = t − 1 to t_start do
       ⋮
20:        Accumulate policy gradients w.r.t. θ according to (19)
       ⋮
22:        Accumulate value gradients w.r.t. θ_v according to (20)
23:      end for
24:      Asynchronously update θ and θ_v according to (21) and (22)
25:   end for
26: end for
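The non-centered RMSProp update in (21) and (22) can be sketched in pure Python as follows, assuming element-wise operations on flat parameter lists. The class name `SharedRMSProp` and the defaults for the momentum/decay factor (`alpha = 0.99`) and `eps = 1e-8` are assumptions; the default learning rate of 1e-4 matches the value selected in Section V. In A3C the squared-gradient statistic g is shared across agents; this sketch only imitates that by letting several agents call one instance in turn, without real asynchronous threads.

```python
import math
from typing import List, Sequence


class SharedRMSProp:
    """Non-centered RMSProp, element-wise:

        g     <- alpha * g + (1 - alpha) * dtheta**2
        theta <- theta - eta * dtheta / sqrt(g + eps)

    One instance holds the statistic g that all agents share.
    """

    def __init__(self, size: int, eta: float = 1e-4, alpha: float = 0.99,
                 eps: float = 1e-8) -> None:
        self.g = [0.0] * size  # running average of squared gradients
        self.eta, self.alpha, self.eps = eta, alpha, eps

    def step(self, theta: Sequence[float], dtheta: Sequence[float]) -> List[float]:
        """Return updated global parameters given one agent's accumulated gradient."""
        if not (len(theta) == len(dtheta) == len(self.g)):
            raise ValueError("parameter, gradient and statistic sizes must match")
        new_theta = []
        for j, (p, d) in enumerate(zip(theta, dtheta)):
            self.g[j] = self.alpha * self.g[j] + (1.0 - self.alpha) * d * d
            new_theta.append(p - self.eta * d / math.sqrt(self.g[j] + self.eps))
        return new_theta


if __name__ == "__main__":
    opt = SharedRMSProp(size=3)
    theta = [0.0, 0.0, 0.0]
    # Two hypothetical agents push their accumulated gradients to the shared optimizer.
    for dtheta in ([0.5, -1.0, 0.0], [0.4, -0.8, 0.1]):
        theta = opt.step(theta, dtheta)
    print(theta)
```

Sharing g means that a large gradient reported by one agent also damps the step size of the other agents' subsequent updates on the same parameters.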

V. SIMULATION RESULTS
In this section, extensive simulations are conducted to evaluate the proposed H-IoT system. The simulations were implemented in PyTorch on the Python-based PyCharm platform. Specifically, the computer has an Intel Core i9-10900K CPU with 64 GB of memory and an Nvidia GeForce RTX 3090 GPU with 24 GB of memory, and the software environment is PyTorch 1.7.1 with Python 3.8 on Windows 10. The simulated H-IoT system consists of an MEC system with several MEC server devices, a permissioned blockchain system, and several H-IoT devices. The main simulation parameters are listed in Table 3 [16], [51].
To evaluate the performance of the proposed method, we compared it with the following three schemes:
• Proposed scheme with fixed block size: this scheme uses the same method as the proposed scheme, except that the block size of the permissioned blockchain system is fixed;
• Proposed scheme with fixed block interval: this scheme adopts a fixed block generation interval for the permissioned blockchain system, and all other settings are the same as those of the proposed scheme;
• Proposed scheme with only offloading: this scheme does not process any computing tasks on local devices; all computing tasks are offloaded to the MEC server devices.
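One way to see how the three comparison schemes differ from the proposed scheme is to treat each one as a restriction on the agent's discrete action space. The sketch below assumes a hypothetical action of the form (offloading flag, block size, block interval) with a binary offloading decision; the paper's actual action space, units, and the fixed values used here (8.0 and 5.0) are illustrative assumptions, not values from Table 3.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Optional, Sequence, Tuple

# Hypothetical action: (offload flag, block size, block interval).
Action = Tuple[int, float, float]


@dataclass(frozen=True)
class Scheme:
    name: str
    fixed_block_size: Optional[float] = None      # None: block size chosen by the agent
    fixed_block_interval: Optional[float] = None  # None: block interval chosen by the agent
    offload_only: bool = False                    # True: every task goes to the MEC servers


def action_space(scheme: Scheme, block_sizes: Sequence[float],
                 block_intervals: Sequence[float]) -> List[Action]:
    """Enumerate the discrete actions available under a scheme.

    Offload flag: 0 = process locally, 1 = offload to an MEC server.
    """
    offload = [1] if scheme.offload_only else [0, 1]
    sizes = ([scheme.fixed_block_size] if scheme.fixed_block_size is not None
             else list(block_sizes))
    intervals = ([scheme.fixed_block_interval] if scheme.fixed_block_interval is not None
                 else list(block_intervals))
    return list(product(offload, sizes, intervals))


SCHEMES = [
    Scheme("proposed"),
    Scheme("fixed block size", fixed_block_size=8.0),
    Scheme("fixed block interval", fixed_block_interval=5.0),
    Scheme("only offloading", offload_only=True),
]

if __name__ == "__main__":
    for s in SCHEMES:
        print(s.name, len(action_space(s, [1.0, 2.0, 4.0, 8.0], [1.0, 3.0, 5.0])))
```

Under this view, each baseline removes one decision dimension from the agent, so any gap in average reward reflects the value of optimizing that dimension jointly.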
Because the learning rate is crucial for DRL algorithms, we evaluated the performance of the proposed A3C algorithm under different learning rates; the results are shown in Figure 3. A learning rate of 1e-4 yields the best average reward in most episodes, so we used this value in the remaining simulations.
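The selection criterion above, choosing the learning rate with the best average reward in most episodes, can be expressed as a win count over per-episode reward curves. The helper below is hypothetical, and the curves in its usage example are made-up numbers rather than values read from Figure 3.

```python
from collections import Counter
from collections.abc import Hashable
from typing import Dict, Sequence


def best_in_most_episodes(curves: Dict[Hashable, Sequence[float]]) -> Hashable:
    """Return the candidate whose per-episode average reward is highest in the
    largest number of episodes.

    Ties within an episode, and ties in the win count, go to the candidate
    listed first in `curves`.
    """
    lengths = {len(c) for c in curves.values()}
    if len(lengths) != 1 or 0 in lengths:
        raise ValueError("need non-empty curves of equal length")
    n_episodes = lengths.pop()
    wins = Counter({k: 0 for k in curves})
    for ep in range(n_episodes):
        winner = max(curves, key=lambda k: curves[k][ep])  # first maximum wins ties
        wins[winner] += 1
    return max(curves, key=lambda k: wins[k])


if __name__ == "__main__":
    # Hypothetical reward curves, not results from the paper.
    curves = {
        1e-3: [1.0, 5.0, 1.0, 1.0],
        1e-4: [2.0, 2.0, 2.0, 2.0],
        1e-5: [0.0, 0.0, 3.0, 0.0],
    }
    print(best_in_most_episodes(curves))  # 0.0001
```

Counting per-episode wins is less sensitive to a single spike than comparing peak rewards, which matters when the curves fluctuate as they do for DRL training.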
Figure 4 presents a comprehensive comparison of the four schemes, each represented by a different line. Although the results fluctuate across episodes, the proposed scheme (blue solid line) achieves the best performance in most episodes, while the orange dashed line has the lowest values. Furthermore, the proposed scheme with a fixed block size outperforms the scheme with only offloading, whereas the proposed scheme with a fixed block interval achieves the lowest results. These simulation results validate the performance of the proposed scheme. In addition, we explored the performance of the proposed scheme under different conditions; the results are presented in Figures 5, 6, and 7.
First, Figure 5 presents the effects of the computing task size. In Figure 5, the average rewards of all four schemes decrease as the computing task size increases, mainly because larger computing tasks consume more energy. In addition, the proposed scheme (blue solid line) obtains the largest average reward in most cases, which indicates that it performs well across different computing task sizes.
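To make the energy argument concrete, the sketch below uses a model that is common in the MEC literature: local execution costs κ·C·f² joules for C CPU cycles at CPU frequency f, and offloading costs p·D/r joules for uploading D bits at transmit power p and Shannon rate r = B·log2(1 + p·h/N0). These models and the default κ = 1e-27 are assumptions for illustration, not necessarily the system model of this paper. Under them, both costs grow linearly with the task size, which is consistent with the decreasing rewards in Figure 5.

```python
import math


def local_energy(cycles: float, cpu_freq_hz: float, kappa: float = 1e-27) -> float:
    """Energy (J) to execute a task locally: E = kappa * C * f^2, where C is the
    number of CPU cycles and kappa the effective switched capacitance (assumed)."""
    return kappa * cycles * cpu_freq_hz ** 2


def offload_energy(data_bits: float, tx_power_w: float, bandwidth_hz: float,
                   channel_gain: float, noise_w: float) -> float:
    """Energy (J) to upload a task to an MEC server: E = p * D / r,
    with Shannon uplink rate r = B * log2(1 + p * h / N0)."""
    rate = bandwidth_hz * math.log2(1.0 + tx_power_w * channel_gain / noise_w)
    return tx_power_w * data_bits / rate


if __name__ == "__main__":
    # Hypothetical task: doubling its size doubles both energy costs.
    for scale in (1, 2):
        print(local_energy(scale * 1e9, 1e9),
              offload_energy(scale * 1e6, 0.1, 1e6, 3e-8, 1e-9))
```

Offloading trades the quadratic dependence on CPU frequency for a dependence on channel quality, which is why the offloading decision is worth optimizing rather than fixing, as the only-offloading baseline does.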
Second, Figure 6 illustrates the effects of the average transaction size. As shown in Figure 6, the average rewards of all four schemes decrease as the average transaction size increases, while the proposed scheme (blue solid line) achieves the highest value most of the time. A larger average transaction size means that fewer transactions fit into each block, which reduces the throughput of the permissioned blockchain system and, in turn, the average reward.
Third, Figure 7 presents the effects of the maximum block size. As shown in Figure 7, the performance of all four schemes improves as the maximum block size increases, and the proposed scheme (blue solid line) achieves the highest value most of the time. For the proposed scheme with a fixed block size (green dashed line), the fixed block size changes with the maximum block size. Performance improves because a larger maximum block size allows more transactions to be included in each block, which increases the throughput of the proposed permissioned blockchain system.
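The throughput reasoning behind Figures 6 and 7 can be illustrated with a simplified model often used in blockchain performance studies: the number of transactions per block is the block size divided by the average transaction size, rounded down, and the throughput is that number divided by the block interval. This formula is an assumption here rather than the paper's stated model, the sketch takes the block size equal to the maximum block size, and the example values are hypothetical, not the parameters of Table 3.

```python
def blockchain_throughput(block_size_bytes: int, avg_tx_size_bytes: int,
                          block_interval_s: float) -> float:
    """Transactions per second: floor(S_B / x) / T_I, where S_B is the block size,
    x the average transaction size, and T_I the block interval (assumed model)."""
    if avg_tx_size_bytes <= 0 or block_interval_s <= 0:
        raise ValueError("transaction size and block interval must be positive")
    tx_per_block = block_size_bytes // avg_tx_size_bytes  # whole transactions only
    return tx_per_block / block_interval_s


if __name__ == "__main__":
    # Hypothetical values.
    print(blockchain_throughput(1_000_000, 250, 2.0))  # 2000.0 (baseline)
    print(blockchain_throughput(1_000_000, 500, 2.0))  # 1000.0 (larger transactions)
    print(blockchain_throughput(2_000_000, 250, 2.0))  # 4000.0 (larger block size)
```

Doubling the average transaction size halves the throughput, while doubling the block size doubles it, which matches the opposite trends reported in Figures 6 and 7.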

VI. DISCUSSION
The COVID-19 pandemic has not only caused severe consequences globally but has also affected the healthcare industry, especially in terms of pandemic prevention and control. Although an increasing number of H-IoT applications have been deployed in response to the pandemic, most of them do not address both security and energy efficiency. In this research, a permissioned blockchain method and a DRL method are adopted to ensure the security and energy efficiency of H-IoT applications in the context of the COVID-19 pandemic.
Because field experiments are difficult to conduct, this research adopts simulation methods to evaluate the performance of the proposed H-IoT system. Although the simulation results demonstrate the effects of the proposed system, they cannot fully reflect complex real-world situations, especially in large-scale applications. Therefore, field experiments remain necessary. In addition, other challenges for H-IoT, including latency and real-time operation, remain to be addressed.

VII. CONCLUSION
To improve the performance of the H-IoT system and mitigate the global impact of COVID-19, this research develops a secure and energy-efficient H-IoT system for COVID-19. The proposed system not only ensures security by using a permissioned blockchain system but also reduces energy consumption by offloading tasks to MEC server devices and by capturing energy with energy-harvesting devices. In addition, a DRL algorithm is employed to jointly optimize the throughput of the blockchain system and the energy efficiency of the proposed H-IoT system. Extensive simulation results demonstrate that the proposed system can balance the throughput of the permissioned blockchain system and the energy efficiency of the H-IoT system. This illustrates that the proposed methods can alleviate the security and energy efficiency issues of H-IoT systems, thereby improving their ability to mitigate the impact of COVID-19.
Future research will focus on two main directions. First, we will study how to apply the proposed H-IoT system to real-world healthcare applications to alleviate security and energy efficiency concerns in COVID-19 pandemic control; doing so will require considering more factors. Second, we plan to further study the combination of blockchain and DRL to address more complex H-IoT applications, such as real-time and big data-related H-IoT applications.