Energy-Efficient Edge Offloading in Heterogeneous Industrial IoT Networks for Factory of Future

Ultra-reliable low-latency communication (URLLC) and massive machine-type communication (mMTC) in 5G are envisioned to support intelligent automation in heterogeneous Factory of Future (FoF) networks, and mobile-edge computing (MEC) is considered a promising system for enabling real-time task processing at the edge of the network. In the future factory, production machines and environmental monitoring devices will be endowed with wireless connectivity for mobility. These devices are deployed to run complicated real-time tasks. To ensure that such mission-critical tasks are processed in time, parts of the tasks should be completed with the assistance of the edge server or even the cloud. In this work, we jointly investigate the partial task offloading, computation, and communication (licensed and unlicensed) resource allocation problem under the trade-off between overall power consumption and quality of service (QoS) satisfaction. A 2-tier MEC-cloud framework is provided, wherein the IoT mobile devices (MDs) are able to partition tasks into segments and offload them to the MEC and the cloud server. Considering the limits of communication and computation resources, we propose a mechanism called the 5G and NR-U opportunity-cost-based offloading algorithm (5G/NR-U OCBOA) to optimize resource allocation. The mechanism comprises two algorithms: 5G OCBOA for the licensed-only case and NR-U OCBOA for the unlicensed case. We iteratively perform the two algorithms to obtain the final solution. The simulation results show that our low-complexity algorithms outperform the benchmark greedy algorithms in almost all cases. The proposed algorithm achieves up to 59.3% lower MD blocking probability, up to 58.7% power saving gain, and up to 47.6% more QoS gain.


I. INTRODUCTION
In recent years, smart factory technology has emerged in many research topics as a way to increase productivity and efficiency, where smart industrial IoT MDs such as robots and assembly arms collaborate with each other and with workers. Many kinds of Information and Communication Technologies (ICTs) also work together to provide information exchange and intelligent functions [1], which boosts the development of the intelligent Industrial Internet-of-Things (IIoT). This kind of factory automation is expected to process vast amounts of data and orchestrate complex cyber-physical components, which can realize unmanned factories. To achieve automation, large numbers of sensors and controllers are needed to collect real-time information for immediate reaction during the manufacturing process, for example in precision instrument control, autonomous production lines, and emergency detection and reaction [2].
The associate editor coordinating the review of this manuscript and approving it for publication was Tie Qiu.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
3GPP has identified the key points of factory automation in [3]. One example is the control of flows and chemical reactions, referred to as process automation, which requires communications for supervisory and open-loop control applications, as well as process monitoring and tracking operations inside an industrial plant. For this purpose, a large number of sensors and monitors are required. In addition, factory automation for manufacturing requires ultra-reliable low-latency communication (URLLC) on the user plane, such as 10 ms E2E latency and $1 - 10^{-5}$ reliability. For some typical use cases of closed-loop motion control and local monitoring, the demand is characterized by a more extreme E2E latency of 1 ms, with a more stringent reliability of $1 - 10^{-8}$ or more [4], [5]. Furthermore, the Clear5G project (see footnote 1) has pointed out that a critical challenge of automotive IIoT is the system capacity demand. Based on the Clear5G deliverable [6], the critical issue is the lack of dynamic control across diverse wireless network resources, including wireless traffic arrangement across radio access technologies (RATs) and the balance across the available spectrum. The IoT devices are also constrained by physical limitations such as energy and computation capability. When computationally intensive tasks are performed on the MDs, the QoS can be significantly affected by the limited processing power of the MDs. The Multi-access Edge Computing (MEC) system provides a new solution to this challenge by bringing computation resources into proximity with the devices.
The MEC system can also offer a service environment with ultra-low latency, high bandwidth, local caching, and direct access to real-time network information, which is envisioned to support many URLLC scenarios such as real-time monitoring and operating systems in smart factories [7], [8].
To deal with the huge amount of traffic in the plant of an intelligent factory, the Clear5G project focuses on the feasibility of the multi-RAT scenario. Along with the licensed band, operators increasingly use the unlicensed band to alleviate cellular traffic on their licensed band. Several new unlicensed band technologies have emerged and been investigated, such as LTE on Unlicensed (LTE-U), LTE Licensed-Assisted Access (LAA) [9], and LTE-WLAN Aggregation (LWA) [10]. The above-mentioned radio access technologies (RATs) have formed a new type of heterogeneous network (HetNet), which is key to enhancing network coverage and data throughput [11]. Recently, 5G New Radio in Unlicensed Spectrum (NR-U) [12] has appeared as a promising technology to satisfy the stringent IIoT use cases, and the integration of the MEC system with HetNets will form a more robust network in FoF scenarios [13].
However, to the best of our knowledge, no literature targets energy-efficient (EE) MEC task offloading and application (APP) performance under heterogeneous networks, which might be a promising IIoT scenario for factory automation in the future. Therefore, the main contributions of this work are summarized as follows:
• A 2-tier partial offloading RAT/MEC-cloud architecture for IIoT is proposed. We consider both the 5G licensed and unlicensed spectrum (5G NR-U) as the RATs for wireless communication. With the limited energy and computational capability of the MDs, we formulate a generalized assignment problem (GAP) and jointly optimize the energy consumption on the MDs and their transmission reliability.
• In our previous work [14], we proposed an opportunity-cost-based offloading algorithm (OCBOA) for licensed-band-only heterogeneous cellular networks. Building on that work, we propose an enhanced version of OCBOA, which can efficiently solve the MEC task offloading and resource allocation problem not only for the licensed band but also for the unlicensed band.
• We conduct extensive experiments and report several insightful findings. We jointly consider the energy efficiency (EE) and APP QoS satisfaction in this work. The utility function can be tailored to the customized requirements of different factory scenarios by adjusting a weight factor. The simulation results demonstrate that our proposed mechanism and algorithms outperform the benchmark algorithms in almost all cases, and that the QoS requirement is satisfied as much as possible.
The remainder of the article is arranged as follows. We review the literature in Section II. The system model is described in Section III, including the MD description, the throughput model, and the energy model of licensed and unlicensed offloading. The problem formulation is introduced in Section IV, and the proposed solution is provided in Section V. Performance evaluation is provided in Section VI, and the paper is concluded in Section VII.
1 The Horizon 2020 EU-Taiwan collaboration Clear5G project focuses on 5G wireless solutions to empower the factories of the future. The Clear5G team is formed by a combination of successful, innovative, and well-known European and Taiwanese major corporations, SMEs, and research and academic institutions. Clear5G is dedicated to 5G and beyond communication architecture design, resource management, protocol enhancements, standardization, prototyping, and demonstration. https://clear5g.eu/

II. RELATED WORKS
A. MOBILE-EDGE COMPUTATION OFFLOADING
Many research works have been done on mobile edge computing. In [15], Chen et al. proposed the architecture of a software-defined ultra-dense network (SD-UDN) embedded MEC system. By deploying a controller at the macro cell (MC) base station (BS), the global information about MDs, BSs, edge nodes, cloud, and tasks can be acquired. They presented an efficient task offloading solution which not only reduces the task duration but also considers the battery capacity of the MDs. The authors in [16] proposed a multi-user and multi-MEC scheme where the offloading strategies and wireless resource allocation are jointly investigated. By exploiting the tradeoff between multi-user diversity and multi-MEC diversity, an energy-efficient strategy is proposed to optimize the offloading decisions. In [17], a combination of MEC and wireless power transfer is proposed. The authors performed task offloading and computing optimization under a multi-antenna access point (AP) Time Division Multiple Access (TDMA) scenario. In [18], using the concept of residual energy, Zhang et al. proposed an energy-aware offloading scheme considering the available number of small cells (SCs) and the CPU cycle frequency of user equipment (UEs). To find the best computational and communication resource allocation strategy, a game-theoretic greedy approach is proposed in [19] to conduct MEC data offloading in ultra-dense IoT networks. Because task offloading decision problems are sometimes NP-hard, evolutionary strategies such as the genetic algorithm (GA) in [20] and the artificial fish swarm algorithm (AFSA) in [21] have been proposed to solve such difficult problems. Some authors explore machine learning frameworks to solve the dynamic system resource allocation problem.
In [22], the authors proposed edge-to-edge and edge-to-cloud resource allocation through an on-demand AI learning approach, which can adaptively match the dynamic demand of tasks and improve the QoS of services. There are also research efforts on decentralized algorithms for computation offloading that reduce the reliance on acquiring system dynamics information. In [23], the researchers proposed a novel peer offloading framework to account for network traffic offloading, bringing more opportunities for distributed MEC systems. The authors in [24] investigated a decentralized computational traffic offloading and content caching problem and demonstrated that the decentralized algorithm performs no worse than centralized algorithms.
Moreover, the promising MEC server system can bring in-network caching capability, which is capable of reducing duplicate information exchange and increasing transmission efficiency [25]. Researchers have developed some content caching strategies for different network optimization purposes. The caching strategies presented in [26] and [27] are based on content popularity, while the authors in [28] designed a flexible caching algorithm on the basis of stochastic network information, which can be more flexible and resilient for network dynamics.

B. RAT-SELECTION UNDER HETEROGENEOUS NETWORKS
Regarding network heterogeneity in 5G and beyond systems, the properties of different radio access technologies (RATs) should be tackled separately in order to perform system optimization [29]. In [30], Lee et al. generalized the multiple network interface activation problem for multi-RAT-equipped smartphones. A generalized multiple radio access architecture is proposed, where energy, latency, and data quota are jointly considered for a task offloading problem. In [31], Singh et al. proposed and demonstrated a simple yet optimal algorithm for downlink traffic splitting and aggregation in a multi-RAT heterogeneous network (HetNet), where the traffic of each MD is split across an MC and an SC. The throughput and capacity gains from the proposed aggregation algorithm are shown to be up to 70% over the baseline RAT selection algorithms. In [32], the same group of authors proposed an enhanced version of the solution, where each MD's traffic is split across multiple RATs or APs.
The proposed algorithm maximizes a general network-wide utility. The developed framework is also applicable to RATs that employ millimeter-wave-based entities and working frequencies. In [33], the authors presented Software-Defined Network (SDN) based WiFi offloading and load balancing solutions for cellular networks.

III. SYSTEM MODEL
A. FACTORY OF FUTURE SCENARIO
As shown in Fig. 1, we consider a 2-tier industrial IoT network with multiple 5G NR small cells (SCs) coexisting with one 5G NR-U network. Some nearby WLAN users cause interference to NR-U users. In the system, there are N IoT MDs with both 5G NR and NR-U air interfaces, and they can switch to their preferred radio access technology to achieve higher QoS. 5G NR works on the licensed band, and 5G NR-U works on the unlicensed band. To provide computation and storage capabilities for local devices, each radio access technology (RAT) node is equipped with one MEC server and is further connected to the central cloud server for a larger provision of computation capability. Each RAT is also connected to a self-organizing network (SON) controller, which provides configuration and optimization of the radio access networks [34]. Each IoT MD has computing capability but can complete only one task at a time, which implies that MDs and tasks are paired one-to-one. An MD can also communicate with an NR SC or NR-U node to obtain additional computational resources at the MEC server and, via a fiber connection, at the cloud server. We assume that each task is divisible. Therefore, when a task arrives at MD i, it can be decomposed and processed in three places: (1) MD i itself, (2) the MEC server within the associated RAT j, and (3) the cloud server via the associated RAT j, as illustrated in Fig. 2.
Here we assume that each MEC server has limited computation capability and can only accommodate a fixed number of tasks. On the other hand, the cloud server has more powerful computation and storage capability with infinite capacity.

C. MOBILE DEVICE DESCRIPTION
In the target FoF scenarios, each MD is equipped with one or more cameras to record real-time streams for environment monitoring. MDs can monitor the surrounding environment all the time and react reasonably according to their observations. The raw videos must be further processed with computer vision techniques to retrieve useful information, for example through video captioning or object detection and recognition. However, limited by their physical and hardware size, the MDs are assumed to be unable to finish the task by themselves, so they always have to offload some parts of the task to the network. Thanks to the rapid evolution of deep-learning-based methods in the computer vision field, video processing can be accelerated by GPU parallel computing in the MEC or cloud server. The extracted information can then be fed into a reinforcement-learning-based agent system for decision making.
With video processing as the main application, each MD i can be described by a 7-field notation set $(D_i, \varphi_i, p_{i,trans}, p_{i,comp}, T_i^{max}, P_{i,trans}^{max}, f_i^{MD})$. It contains the video data size $D_i$ (in bits), the total number of video frames $\varphi_i$, the transmission power $p_{i,trans}$ (in Watt) on each subcarrier or on the single unlicensed subcarrier band, the average GPU processing power $p_{i,comp}$ (in Watt), the requested completion latency $T_i^{max}$ (in seconds), the maximal RF transmission power $P_{i,trans}^{max}$ (in Watt), and the MD GPU computation capability $f_i^{MD}$ (in frames/sec). Note that $D_i$ and $\varphi_i$ are proportional to each other for the resolution currently in use.
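The 7-field description above maps naturally onto a small record type. The following is a minimal sketch, not the authors' implementation: the class name `MobileDevice`, the field names, and the sample values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MobileDevice:
    """Seven-field description of one MD running a video task (names are assumed)."""
    data_size_bits: float   # D_i, video data size (bits)
    num_frames: int         # phi_i, total number of video frames
    p_trans_w: float        # p_{i,trans}, transmit power per subcarrier (W)
    p_comp_w: float         # p_{i,comp}, average GPU processing power (W)
    t_max_s: float          # T_i^max, requested completion latency (s)
    p_trans_max_w: float    # P_{i,trans}^max, maximal RF transmit power (W)
    f_md_fps: float         # f_i^MD, MD GPU capability (frames/s)

    def bits_per_frame(self) -> float:
        # D_i and phi_i are proportional for a given resolution,
        # so their ratio is the per-frame payload.
        return self.data_size_bits / self.num_frames


# Illustrative values only; they are not taken from the paper.
md = MobileDevice(8e6, 100, 0.05, 2.0, 1.0, 0.2, 30.0)
print(md.bits_per_frame())
```

Keeping the per-frame payload as a derived quantity reflects the proportional relationship between the data size and the frame count noted in the text.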

D. THROUGHPUT MODEL OF 5G NR LICENSED OFFLOADING
Assume that OFDMA is used for uplink transmission. There are $K_j$ available subcarriers for uplink transmission in NR SC j, and the set is denoted as $\mathcal{K}_j$. Let $\{g_{i,k,j} \mid i \in \mathcal{N}, k \in \mathcal{K}_j, j \in \mathcal{M}\}$ denote the channel gain matrix, where $g_{i,k,j}$ denotes the channel gain on subcarrier k between MD i and NR SC j. Each subcarrier has bandwidth B. Assume a flat fading environment where the channel gain remains the same within one transmission. σ is the background white Gaussian noise power.
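With the quantities defined above (subcarrier bandwidth B, channel gains, noise power σ, and the per-subcarrier transmit power of the MD), an aggregated uplink rate is commonly computed by summing per-subcarrier Shannon rates. The sketch below assumes exactly that form; it is an illustration under this assumption, not necessarily the paper's rate expression, and the function name and sample numbers are hypothetical.

```python
import math


def aggregated_rate_bps(allocated, gains, p_trans_w, bandwidth_hz, noise_w):
    """Sum of per-subcarrier Shannon rates over the allocated subcarriers.

    Assumes interference-free (exclusive) subcarriers and equal transmit
    power p_trans_w on every allocated subcarrier.
    """
    return sum(
        bandwidth_hz * math.log2(1.0 + p_trans_w * gains[k] / noise_w)
        for k in allocated
    )


# Hypothetical channel gains for three subcarriers of one NR SC.
gains = {0: 3e-9, 1: 1e-9, 2: 7e-9}
rate = aggregated_rate_bps([0, 2], gains, 0.1, 180e3, 1e-10)
print(f"{rate / 1e6:.2f} Mbit/s")
```

Allocating more subcarriers raises the rate but also the total transmit power, which is the coupling between throughput and energy that the later sections exploit.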
Because OFDMA allocates subcarriers exclusively, we ignore interference. For MD i, which offloads its task to the MEC server on NR SC j and to the cloud server, the aggregated data rate $R^{SC}_{ij}$ is the sum of the rates achieved on the subcarriers allocated to MD i in NR SC j.
5G New Radio in Unlicensed Spectrum (NR-U) aims to support the licensed spectrum in the 5G era [12] and has been included in 3GPP Release 16. It allows NR to access not only the existing 5 GHz unlicensed band but also the new "greenfield" 6 GHz unlicensed band. Here we assume that only the 5 GHz band is used for NR-U, in coexistence with other WLAN users. Based on an adaptation of the well-known Bianchi model [35], the authors in [36] and [37] analyzed the coexistence of WiFi and LTE-LAA using a 2-D Markov model and conducted a series of experiments to support the correctness of their results. Because the Listen-Before-Talk (LBT) mechanism used in 5G NR-U for fair unlicensed spectrum access is similar, we apply a similar method to analyze the coexistence of NR-U and WiFi. Table 1 lists the notation used in the following discussion. Here, we assume that there are $n_n$ NR-U stations (1 gNB and $n_n - 1$ MDs) and $n_w$ WiFi stations (1 MEC node and $n_w - 1$ MDs) transmitting in UL, which are co-channel and co-located, and that the transmission buffers of all the MDs are always full. Assume that NR-U and WiFi have the same sensing period before backoff. Following the results derived in [36], the throughput of NR-U is expressed in terms of the probability that an NR-U node transmits a packet successfully in one slot, the fraction of data within one TXOP $T_D$, the total average time of all possible events $T_E$, and the NR-U physical code rate $r_n$. The throughput of a single NR-U node, say node i, is denoted $R^n_i$.
Unlike the infrastructure nodes powered by the grid, the MDs face more critical energy concerns since most of them are mobile and powered by batteries. Therefore, in this work, we focus on the energy consumption of the MDs only. For MD i served by MEC node j, the overall task processing energy consumption model includes the communication energy term $E_{ij,comm}$ and the local computation energy term $E_{i,comp}$, that is,
$E_{ij} = E_{ij,comm} + E_{i,comp}$. (4)
The communication energy consumption term $E_{ij,comm}$ can be licensed-band supported (transmitted via an NR SC), $E^{SC}_{ij,comm}$, $j > 0$, or unlicensed-band supported (transmitted via NR-U), $E^{NR-U}_{i0,comm}$. In the following two subsections, we derive the offloading energy consumption models of the MDs for these two types.

F. TRANSMITTING ENERGY MODEL OF 5G NR LICENSED OFFLOADING
First, we consider the MDs served by the licensed band. Let $S_i$ indicate the associated NR SC of MD i. The transmission power is obtained by aggregating the transmission power on the individual subcarriers allocated in NR SC j. Thus, the overall communication energy consumption $E^{SC}_{ij,comm}$ is calculated from this aggregated transmission power and the transmission duration.
Since the LBT mechanism of NR-U is similar to the WiFi distributed coordination function (DCF) mechanism, we derive the energy consumption model with a slight modification of the WiFi energy model in [38]. An NR-U MD attempting to transmit parts of the task in a time slot has to contend with other incumbent WiFi devices. Therefore, the overall communication energy consumption can be written as
$E^{NR-U}_{i0,comm} = E^{NR-U}_{i,sensing} + E^{NR-U}_{i,trans}$,
where $E^{NR-U}_{i,sensing}$ and $E^{NR-U}_{i,trans}$ are the energy consumption of carrier sensing and transmission, respectively.
In addition, the energy consumption of carrier sensing accounts for transmission failures and retransmissions. It is expressed in terms of the carrier sensing coefficient $\rho_{sensing}$, the packet successful delivery ratio PDR, the total number of packets in one transmission $n_{Pkts}$, and the average sensing time $t_{sensing}$.
The average sensing time $t_{sensing}$ depends on N, the total number of contending unlicensed users, and m, the maximum number of retransmission back-off attempts. The transmission energy consumption is derived from $R^n_i$, the unlicensed throughput for MD i.
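The decomposition of the NR-U communication energy into a sensing part and a transmission part can be sketched as below. This is an illustration under stated assumptions: the transmission energy is taken as transmit power times airtime (offloaded bits divided by the unlicensed throughput), which is a generic power-times-time model rather than the exact expression adapted from [38], and the sensing energy is treated as a given input. All function names and numbers are hypothetical.

```python
def nru_trans_energy_j(p_trans_w, offloaded_bits, throughput_bps):
    """Assumed model: transmit power multiplied by the airtime of the offloaded bits."""
    return p_trans_w * offloaded_bits / throughput_bps


def nru_comm_energy_j(e_sensing_j, e_trans_j):
    """Communication energy on NR-U: carrier sensing plus transmission."""
    return e_sensing_j + e_trans_j


# Illustrative numbers only: 0.5 W, 4 Mbit offloaded at 2 Mbit/s.
e_trans = nru_trans_energy_j(0.5, 4e6, 2e6)
e_comm = nru_comm_energy_j(0.25, e_trans)
print(e_trans, e_comm)
```

Under this model a lower unlicensed throughput, for example because more WiFi stations contend for the channel, directly lengthens the airtime and raises the energy spent by the MD.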

H. COMPUTATIONAL PROCESSING ENERGY CONSUMPTION
As mentioned previously, the overall task processing energy consumption model (4) includes the communication energy term and the local computation energy term. For $E_{ij,comm}$, we have the two alternative terms $E^{SC}_{ij,comm}$, $j > 0$, and $E^{NR-U}_{i0,comm}$. Next, we derive the local computing part $E_{i,comp}$.
The local computing energy consumption relies on the efficiency of the processor. In our video processing application, the GPU is the main processor, providing parallel processing for computing acceleration. For real-time object detection, the video is split into frames and fed into a convolutional neural network for individual image processing. When a GPU performs video processing, the power consumption remains stable without much fluctuation due to the fixed neural network architecture, so it can be represented by a fixed average power $p_{i,comp}$. Thus, the computation energy consumption on MD i is the product of $p_{i,comp}$ and the local processing time. The total energy consumption for a video task partially offloaded from MD i to the MEC node by NR SC j (say $E^{SC}_{ij}$, $j > 0$) or by the NR-U node (say $E^{NR-U}_{i0}$) then follows from (4) with the corresponding communication energy term.
I. LATENCY OF 5G LICENSED AND 5G NR-U OFFLOADING
For simplicity, we only consider the uplink transmission because the downloaded processed results are usually much smaller than the original uploaded data. Since the task is decomposed and computed in different places, the latency of MD i depends on the completion time of the last finished part. Thus, we have
$T_i = \max\{T_{i,local}, T_{i,remote}\}$, (13)
where $T_{i,local}$ indicates the time of local computing, and $T_{i,remote}$ is the time for remote processing.
The local computing time is determined by the locally processed share of the task and $f^{MD}_i$, the computation capability of MD i. $T_{i,remote}$ depends on the choice between the two bands, similarly to the energy consumption part. First, for licensed-band transmission, let $S_i$ denote the associated NR SC of MD i. The remote processing time is then expressed in terms of $f^{MEC}_j$ and $f^{cloud}$, the computation capabilities of the MEC on NR SC j and of the cloud server, respectively; $R^{SC}_{ij}$, the aggregated data rate between MD i and NR SC j; and $t_{rc}$, the transmission time of the wired backhaul connection between NR SC j and the cloud server.
Similarly, for 5G NR-U unlicensed-band transmission, the remote processing time is expressed in the same way, with the unlicensed throughput $R^n_i$ in place of $R^{SC}_{ij}$.
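The latency structure described in this subsection, where the task finishes when its slowest part finishes, can be sketched as follows. The sketch assumes that the MEC and cloud branches run in parallel once the upload completes and that the cloud branch additionally pays the backhaul time; these are illustrative assumptions, not the paper's exact remote-time expression, and all names and values are hypothetical.

```python
def local_time_s(local_frames, f_md_fps):
    """Time for the MD to process its local share of frames."""
    return local_frames / f_md_fps


def remote_time_s(offloaded_bits, rate_bps, mec_frames, f_mec_fps,
                  cloud_frames, f_cloud_fps, t_rc_s):
    """Upload time plus the slower of the MEC and cloud branches (assumed parallel)."""
    upload = offloaded_bits / rate_bps
    mec = mec_frames / f_mec_fps
    # The cloud branch pays the backhaul time t_rc only if it receives frames.
    cloud = t_rc_s + cloud_frames / f_cloud_fps if cloud_frames > 0 else 0.0
    return upload + max(mec, cloud)


def task_latency_s(t_local, t_remote):
    """Latency is set by the part of the task that finishes last."""
    return max(t_local, t_remote)


# Illustrative split of a 100-frame task: 30 local, 40 at the MEC, 30 in the cloud.
t_local = local_time_s(30, 30.0)
t_remote = remote_time_s(4e6, 8e6, 40, 100.0, 30, 300.0, 0.1)
latency = task_latency_s(t_local, t_remote)
print(t_local, t_remote, latency)
```

For the unlicensed case the same functions apply with the NR-U throughput passed as `rate_bps`.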

J. THE UTILITY FUNCTION
For computationally intensive object detection jobs, the battery life of devices could be one of the most essential issues.
If we let all the MDs offload their tasks to MEC through the RAT they prefer the most, it might cause the deficiency of communication resources, and some of the MDs may not be able to maintain their required QoS. Therefore, a suitable global offloading strategy is needed. We jointly consider the trade-off between energy consumption and attained QoS on MDs.
In terms of energy saving, we define the energy-efficient (EE) utility function. Given a maximum energy budget $E^{max}_i$ for an arriving task on MD i, the expected energy consumption $E_{ij}$ for task offloading from MD i through RAT j to the MEC cannot exceed $E^{max}_i$. Thus, we determine the normalized EE utility function of MD i and RAT j, given in (17). Under the assumption that the promised communication reliability has been reached in the 5G system, we consider object detection performance as the QoS of the application. Note that the detection performance relies on the intrinsic video resolution, where videos with higher resolution give higher detection accuracy. We use the utility term $U^Q_i$ to denote the detection performance of MD i, and adopt the well-known object detection metric, mean average precision (mAP) [39], to quantify the detection accuracy. In brief, mAP refers to the detection success rate of each target object in one frame, which is directly connected to the resolution level and ranges in [0, 1].
Let Q i denote the mAP score that an MD gets in detection, with a higher score corresponding to higher accuracy. If a task of the MDs can be decomposed and finished within latency requirement with the assistance of the MEC server and cloud, it will take the whole mAP score as QoS utility reward. However, if a task cannot be finished in time, that is, not be served by any MEC or cloud server, the QoS utility reward will be zero.
Jointly taking EE and QoS into account, we define the final utility function $U_{ij}$ of MD i and RAT j in (18), which weights the EE utility term against the QoS utility term through a weighting factor α ∈ [0, 1].
There is a trade-off between these two terms in (18). When the EE utility term is critical (say α is close to one), the MDs tend to offload as large a portion of the task as possible. The network might then face a resource shortage and be unable to serve all the MDs. Therefore, as the weighting factor α increases, more MDs are blocked and obtain zero scores on detection performance, which decreases the mAP of the entire system. On the contrary, if the weighting factor α is low, saving power is less critical. The MDs are then willing to process a larger portion of the task locally, and the number of served MDs increases. As a result, the mAP of the entire system increases as well.
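The trade-off controlled by α can be illustrated with a small sketch. It assumes a convex combination of the two terms, a normalized EE term of the form (budget minus consumption) divided by budget, and a QoS term equal to the mAP score when the MD is served and zero otherwise. The exact forms of (17) and (18) may differ, so the functions below are hypothetical stand-ins.

```python
def ee_utility(e_ij, e_max):
    """Assumed normalized EE term: fraction of the energy budget that is saved."""
    if e_ij > e_max:
        return 0.0  # budget violated, no EE reward in this sketch
    return (e_max - e_ij) / e_max


def qos_utility(map_score, served):
    """Full mAP score if the task finishes in time, zero if the MD is not served."""
    return map_score if served else 0.0


def utility(alpha, e_ij, e_max, map_score, served):
    """Assumed weighted sum: alpha weights EE, (1 - alpha) weights QoS."""
    return alpha * ee_utility(e_ij, e_max) + (1.0 - alpha) * qos_utility(map_score, served)


# Illustrative values: half of the energy budget used, mAP of 0.8.
for alpha in (0.0, 0.5, 1.0):
    print(alpha, utility(alpha, 2.0, 4.0, 0.8, True))
```

With α = 0 only the detection accuracy counts, and with α = 1 only the energy saving counts, which mirrors the two extremes discussed in the text.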

IV. PROBLEM FORMULATION
Based on the APP profiles and the system model described in Section III, we define the formal problem in this section. In ultra-dense industrial IoT heterogeneous networks, the MDs have |M| RATs available for offloading the task. However, both the licensed and the unlicensed radio spectrum are limited. For a specific RAT, the channel quality and throughput vary with the location and condition of the MDs. In addition, the availability of computational resources at the MEC nodes depends on the current traffic condition. Some MEC nodes may have plenty of computational resources left, but they may not be the ideal nodes for task offloading for some MDs due to the long communication delay and poor radio link quality of the corresponding RAT. Moreover, we should ensure that critical tasks can be offloaded successfully with satisfactory QoS.
Let $X = \{x_{ij} \mid x_{ij} \in \{0, 1\}, i \in \mathcal{N}, j \in \mathcal{M}\}$ denote the assignment matrix, where $x_{ij} = 1$ means the task of MD i is offloaded to the MEC server, and possibly further to the cloud, by NR SC j; otherwise $x_{ij} = 0$. The main objective is to maximize the sum of the utility function (18) over all the MDs by allocating both communication and computation resources. The complete problem can be mathematically formulated as
$\max_{X, W, y_i} \sum_{i \in \mathcal{N}} \sum_{j \in \mathcal{M}} x_{ij} U_{ij}$ (19a)
subject to constraints (19b)-(19h). In (19), constraints (19b) and (19c) enforce the latency and energy budgets of all tasks. (19d) states the radio resource constraint, which means that the subcarriers allocated to all MDs cannot exceed the maximum available subcarriers in the corresponding RAT. (19e) is the MEC capacity constraint. Similar to the assumption in [21], we assume that a single MEC server can serve $C^{MEC}_j$ MDs at a time, with a fixed operating GPU frequency, running on separate VMs. (19f) limits the transmission power of each MD. (19g) ensures that each MD must be served to finish its task, and (19h) asserts the task distribution constraint.

V. PROPOSED SOLUTION
The original optimization problem shown in (19) can be seen as a typical generalized assignment problem (GAP), which is NP-hard [40] and even APX-hard, and has up to $|\mathcal{M}|^{|\mathcal{N}|}$ possible solutions. To reduce the computational complexity, we propose a heuristic method called the 5G licensed-only & NR-U unlicensed-assistance opportunity-cost-based offloading algorithm (5G/NR-U OCBOA) to approach the optimal solution. The proposed method is based on the concept of opportunity cost, originally introduced in economics. As Fig. 3 shows, there are three main steps in the proposed 5G/NR-U OCBOA mechanism. First, we run the 5G licensed-only opportunity-cost-based offloading algorithm (5G OCBOA) to determine the task offloading and resource allocation solution for the 5G licensed-only scenario. Then, based on the solution of 5G OCBOA, we perform the NR-U unlicensed-band-assisted opportunity-cost-based offloading algorithm (NR-U OCBOA) to identify the MDs that may obtain performance gains if their serving band is switched from the licensed to the unlicensed one, and to determine their task offloading and resource allocation results. Finally, once the MDs to be served by the NR-U unlicensed band are determined, we execute 5G OCBOA again to find the solution for the remaining MDs, which will be served by the 5G licensed band. Table 2 lists the notations used in the 5G/NR-U OCBOA.
In Procedure 1, we find the offloading profile of every possible MD and RAT pair with the least resource consumption, that is, with the fewest subcarriers and computation resources, so as to accommodate more tasks in the system. In addition, Procedure 1 also determines the corresponding utility function $U_{ij}$. Recalling the latency function (13), $T_i$ can be calculated by solving the equation $T_{i,local} = T_{i,remote}$, and the utility value $U_{ij}$ can then be easily obtained as well.
Based on the outcomes of Procedure 1, in the main body we calculate the corresponding offloading distribution of each MD to the MEC and cloud server (i.e., for MD i, we obtain $y_i$), and the values of the novel types of utility functions for each MD and NR SC pair.
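The balancing condition used in Procedure 1, namely choosing the split so that the local and remote parts finish at the same time, can be worked out in closed form for a simplified case. The sketch below assumes a single remote server (MEC only, no cloud branch) and a split by a local fraction y of the frames, with the offloaded data proportional to the offloaded frames. The function name and numbers are hypothetical.

```python
def balanced_local_fraction(num_frames, data_bits, f_md_fps, f_mec_fps, rate_bps):
    """Local fraction y that solves T_local(y) = T_remote(y) in the simplified model.

    T_local  = y * num_frames / f_md_fps
    T_remote = (1 - y) * (data_bits / rate_bps + num_frames / f_mec_fps)
    """
    local_all = num_frames / f_md_fps                             # time if everything ran locally
    remote_all = data_bits / rate_bps + num_frames / f_mec_fps    # time if everything was offloaded
    return remote_all / (local_all + remote_all)


# Illustrative task: 100 frames, 8 Mbit, MD at 10 fps, MEC at 50 fps, 4 Mbit/s uplink.
y = balanced_local_fraction(100, 8e6, 10.0, 50.0, 4e6)
t_local = y * 100 / 10.0
t_remote = (1 - y) * (8e6 / 4e6 + 100 / 50.0)
print(y, t_local, t_remote)
```

Because the local time grows with y while the remote time shrinks, the point where the two are equal minimizes the maximum of the two, which is the latency in (13).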
In the following, we introduce three novel types of utility functions apart from the original one (18), called desirability fulfillment vectors, to obtain better results than the original utility function where possible. When a task in MD i is assigned to the MEC node and the cloud server through 5G licensed NR SC j, the desirability fulfillment vector is denoted as $f_{ij}$. In other words, the value $f_{ij}$ gives a numerical incentive measure when NR SC j is chosen as a candidate RAT for task offloading of MD i. The desirability fulfillment vector $f_{ij}$ is defined in three types, which are
1) the negative cost of the assignment ($-C_{ij}$)
2) the value-cost ratio of the assignment ($U_{ij}/C_{ij}$)
3) the throughput of the assignment (Thput)
The idea behind the negative cost ($-C_{ij}$) of the assignment is to select the task with the smallest workload, that is, the smallest required number of subcarriers in NR SC j. Since the objective function is maximized, using the negative workload as the objective gives an MD with a smaller workload a higher probability of being selected. The value-cost ratio of the assignment jointly considers the original utility function (18) and the workload. Finally, the throughput of the assignment means that only the throughput is considered as the selection criterion.
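The three desirability types described above can be expressed as one small selector function. This is a sketch only: the string labels used to choose the type and the sample numbers are hypothetical, while the three formulas follow the definitions in the text.

```python
def desirability(kind, utility, cost, throughput):
    """Desirability f_ij of assigning MD i to NR SC j under one of the three types."""
    if kind == "neg_cost":
        return -cost                 # fewer required subcarriers is better
    if kind == "value_cost":
        return utility / cost        # utility gained per unit of workload
    if kind == "throughput":
        return throughput            # throughput is the only criterion
    raise ValueError(f"unknown desirability type: {kind}")


# Illustrative pair: utility 0.6, needs 3 subcarriers, achieves 12 Mbit/s.
for kind in ("neg_cost", "value_cost", "throughput"):
    print(kind, desirability(kind, 0.6, 3, 12e6))
```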
Then, given the selected desirability fulfillment vector type, we try to maximize the sum of the desirability fulfillment vectors of all the MDs. Inspired by the concept of opportunity cost from economics, in the main body we iteratively consider all of the unassigned MDs (in the beginning, all of the MDs are tagged unassigned) and search for the MD, say MD $i^*$, which has the largest gap ($d_{i^*}$) between the largest and the second-largest desirability value $f_{ij}$, $i \in \mathcal{N}$, in its list of feasible offloading NR SC candidates; then MD $i^*$ is assigned to its most preferred NR SC j due to the largest value of $d_{i^*}$. Fig. 5 gives an example of how we select MD $i^*$. If there is only one feasible NR SC that can satisfy the request of MD i, then the gap for MD i is ∞. After the iteration, we get the preliminary resource allocation results. The preliminary results can be further enhanced in Procedure 2.
Procedure listing fragment (partially recovered, intermediate lines missing):
9: if $U_{ij}$ > $U_{ij}$ then
[...]
13: else
14: counter = counter + 1
15: if counter = $|\mathcal{N}|$ then
16: break
17: for i in $\mathcal{N}$ do
18: j ← {j | $U_{ij} = \max_{\hat{j}} U_{i\hat{j}}$, $\hat{j} \in \mathcal{M}$, $C_{i\hat{j}} \le Q_{\hat{j}}$}
19: if j = ∅ then
[...]
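The selection rule of the main body, which repeatedly picks the unassigned MD with the largest gap between its best and second-best feasible desirability, can be sketched as a regret-style greedy assignment. The sketch below is a simplified illustration rather than the authors' Algorithm 1: it tracks only one resource per NR SC (remaining subcarriers), ignores the MEC capacity limit, and breaks ties by the lowest MD index. All names and numbers are hypothetical.

```python
import math


def opportunity_cost_assign(f, cost, capacity):
    """Greedy assignment by largest gap between best and second-best desirability.

    f[i][j]     desirability of assigning MD i to NR SC j
    cost[i][j]  subcarriers MD i needs in NR SC j
    capacity[j] subcarriers available in NR SC j
    Returns (assignment dict MD -> NR SC, sorted list of blocked MDs).
    """
    remaining = list(capacity)
    unassigned = set(range(len(f)))
    assignment = {}
    while unassigned:
        best = None  # (gap, MD index, preferred NR SC)
        for i in sorted(unassigned):
            feasible = sorted(
                (j for j in range(len(remaining)) if cost[i][j] <= remaining[j]),
                key=lambda j: f[i][j],
                reverse=True,
            )
            if not feasible:
                continue  # MD i cannot be served right now
            # A single feasible NR SC means an infinite gap.
            gap = math.inf if len(feasible) == 1 else f[i][feasible[0]] - f[i][feasible[1]]
            if best is None or gap > best[0]:
                best = (gap, i, feasible[0])
        if best is None:
            break  # every remaining MD is blocked
        _, i, j = best
        assignment[i] = j
        remaining[j] -= cost[i][j]
        unassigned.remove(i)
    return assignment, sorted(unassigned)


# Three MDs, two NR SCs, each with room for one MD.
f = [[0.9, 0.2], [0.8, 0.7], [0.6, 0.5]]
cost = [[2, 2], [2, 2], [2, 2]]
assignment, blocked = opportunity_cost_assign(f, cost, [2, 2])
print(assignment, blocked)
```

In the example, MD 0 is assigned first because it would lose the most by missing its preferred NR SC. A plain utility-greedy rule would also start with MD 0 here, but in general the gap criterion protects MDs that have no good alternative.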
Procedure 2 consists of two main operations that fine-tune the resource allocation. The first operation checks whether a better result can be obtained by adjusting the preliminary NR SC assignment. That is, using the original utility function (18), each MD is checked to see whether it could be served by a more appropriate NR SC with a higher utility value. If so, the MD is reassigned to the new NR SC. After the first operation finishes, the second operation allocates the remaining resources. In the previous steps (Procedure 1 and the main body), some MDs might be blocked and left unserved due to their relatively poor desirability fulfillment values. After the first operation, there might be some room for such MDs. Thus, in the second operation, we further allocate the remaining resources to the unserved MDs, where possible, by checking the original utility value. Finally, we obtain the final task distribution and resource allocation results, which completes 5G OCBOA.
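The two operations of Procedure 2 can be illustrated with the following sketch. It is a simplified reading of the description above rather than the paper's exact procedure; the function name `refine`, the data layout, and the stopping rule (repeat until no MD moves) are assumptions made for illustration.

```python
from typing import Dict, List, Optional


def refine(assign: Dict[int, Optional[int]],
           utility: List[List[float]],
           cost: List[List[float]],
           remaining: List[float]) -> Dict[int, Optional[int]]:
    """Fine-tune a preliminary assignment {MD i: NR SC j or None}.

    `remaining` is the free resource per NR SC and is updated in place.
    """
    assign = dict(assign)
    n_sc = len(remaining)

    # Operation 1: move served MDs to a strictly better NR SC until none moves.
    moved = True
    while moved:
        moved = False
        for i, j in list(assign.items()):
            if j is None:
                continue
            better = [k for k in range(n_sc)
                      if utility[i][k] > utility[i][j]
                      and cost[i][k] <= remaining[k]]
            if better:
                k = max(better, key=lambda c: utility[i][c])
                remaining[j] += cost[i][j]   # release the old NR SC
                remaining[k] -= cost[i][k]   # occupy the new NR SC
                assign[i] = k
                moved = True

    # Operation 2: give leftover resources to MDs that are still unserved.
    for i, j in list(assign.items()):
        if j is None:
            fits = [k for k in range(n_sc) if cost[i][k] <= remaining[k]]
            if fits:
                k = max(fits, key=lambda c: utility[i][c])
                remaining[k] -= cost[i][k]
                assign[i] = k
    return assign


# MD 0 sits on NR SC 0 although NR SC 1 is free and better; MD 1 is blocked.
utility = [[1.0, 3.0], [2.0, 0.5]]
cost = [[1.0, 1.0], [1.0, 1.0]]
free = [0.0, 1.0]
refined = refine({0: 0, 1: None}, utility, cost, free)
```

Because every move in the first loop strictly increases the utility of the moved MD, the loop cannot cycle, which matches the intuition behind the convergence claim. In the toy case, MD 0 moves to NR SC 1 and the freed NR SC 0 is then given to the previously blocked MD 1.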

2) ANALYSIS AND DISCUSSIONS
In the following, we prove that the adjustment in Procedure 2 will always converge. That is, the global equilibrium exists.
Proposition 1: The adjustment in Procedure 2 converges in a finite number of steps.
Proof: We treat the adjustment procedure as a decision game. Let N be the number of MDs, A be the set of action profiles (A i and A −i represent the sets of doing and not doing the action decision of MD i), and U be the global reward function. The whole game G can be expressed as

$$G = \langle N, A, U \rangle.$$

In our problem, the global reward function is the sum of the separate utilities gained from each MD. Note that the utilities defined between every pair of MD and RAT node are mutually independent. Let u be the individual reward function u : A i , A −i → R. Then, the global reward function satisfies the property that

$$U(A_i, A_{-i}) - U(A_i', A_{-i}) = u(A_i, A_{-i}) - u(A_i', A_{-i}),$$

where $A_i'$ denotes an alternative action decision of MD i. This property states that the utility change of an individual MD equals the utility change of the entire system, which is the definition of an exact potential game in game theory. Thus, the iteration process of the adjustment reaches a Nash equilibrium and converges in a finite number of steps according to the potential game theorems [41], [42].

Algorithm 1 5G Opportunity-Cost-Based Offloading Algorithm (5G OCBOA)
Input: N , M, C ij , Q j ;
Output: r, S i , N ;
1: Initialize: Call Procedure 1 to get the best offloading profile and the corresponding utility U ij between every pair of RAT j and MD i.
2: Based on the utility U ij and the resource requirement C ij , derive the corresponding desirability measure f ij .
...
21: N ← N \{i * }
22: Call Procedure 2 to further improve the allocation results.
23: return r, S i , N , Q j

In NR-U OCBOA, we consider the unlicensed band as a complementary transmission resource. With the initial licensed allocation result from 5G OCBOA, we construct an auxiliary matrix Y for unlicensed resource allocation. In the auxiliary matrix Y , each row gives the difference in utility performance of a specific MD when it is switched from the 5G licensed band to the NR-U unlicensed band under the scenario of each column. The column index represents the number of MDs that the unlicensed band will support. That is to say, the element Y (i, j) is the difference in the utility value of MD i when j MDs in total are served by the NR-U unlicensed band. For instance, the first column corresponds to the case in which the unlicensed band serves only one MD. Under such a single-MD-served scenario, the throughput of each MD is calculated by (3). Then, for the j-th column, we sum up the j largest MD utility gains. However, the selected MDs have to satisfy the condition that the received signal strength indicator (RSSI) between NR-U and the MD is higher than a threshold ζ . This condition ensures the feasibility of NR-U offloading. In the j-th column, the number of selected MDs might be less than j due to the lack of eligible MDs, but this does not affect the results. After checking all the columns, we determine the best number of NR-U assisted MDs and the actual MDs by finding the largest column sum of utility gains, which completes the NR-U resource allocation. Fig. 6 gives an example of NR-U OCBOA. The pseudocode of NR-U OCBOA is given in Algorithm 2.

Algorithm 2 NR-U OCBOA (partial listing)
if y i j ≥ 0 and RSSI between MD i and NR-U > ζ then
10: l ← l + 1
11: κ ← κ + y i j
12: ← ∪ {i }
13: if κ > κ then
14: κ ← κ

Proof: First, we spend at most O(|N|^2) time checking utilities by iterating over all columns and all elements of the auxiliary matrix. Then, we assign the NR-U RAT to the selected MDs, which takes at most O(|N|) time. Therefore, the computational complexity is O(|N|^2 + |N|) = O(|N|^2).

As mentioned at the beginning of this section (and shown in Fig. 3), when the second-step algorithm, NR-U OCBOA, finishes, we have only determined how many and which MDs are going to be served by the NR-U unlicensed band. Therefore, we need to perform 5G OCBOA again to make the final optimal decision for the rest of the MDs, which are going to be served by the 5G NR SC licensed band. The optimal decision includes the task distribution of each MD and the allocation of the licensed band and MEC computational resources.
In the end, considering the computational complexity of the entire system, we combine the complexities of 5G OCBOA and NR-U OCBOA. The overall computational complexity is also O(|N||M| log |M| + |N|^2 + c · |N||M|). Again, if |N| ≫ |M|^2, the computational complexity is dominated by O(|N|^2).
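The column-wise selection used by NR-U OCBOA can be sketched as follows. This is an illustrative reading of the description of the auxiliary matrix Y, not the authors' Algorithm 2: the function name `select_nru_mds`, the zero-based column index (column k stands for k + 1 served MDs), and the toy values are assumptions. For brevity the sketch sorts each column, so it does not reproduce the O(|N|^2) bound stated above.

```python
from typing import List, Tuple


def select_nru_mds(Y: List[List[float]],
                   rssi: List[float],
                   zeta: float) -> Tuple[float, List[int]]:
    """Pick which MDs are moved to the unlicensed band.

    Y[i][k]  utility gain of MD i if k + 1 MDs share the NR-U band
    rssi[i]  RSSI between MD i and the NR-U node
    zeta     RSSI threshold for NR-U eligibility
    Returns (best column sum, sorted list of selected MD indices).
    """
    n = len(Y)
    best_sum, best_set = 0.0, []
    for k in range(n):                      # column k: k + 1 MDs served
        eligible = [i for i in range(n)
                    if Y[i][k] >= 0 and rssi[i] > zeta]
        # Keep the k + 1 largest gains; fewer if not enough MDs qualify.
        chosen = sorted(eligible, key=lambda i: Y[i][k], reverse=True)[:k + 1]
        total = sum(Y[i][k] for i in chosen)
        if total > best_sum:
            best_sum, best_set = total, sorted(chosen)
    return best_sum, best_set


# Hypothetical gains for three MDs; MD 2 is too far from the NR-U node.
Y = [[4.0, 2.5, 1.0],
     [3.0, 2.0, 1.0],
     [5.0, 3.0, -1.0]]
rssi = [-60.0, -60.0, -90.0]
best_sum, best_set = select_nru_mds(Y, rssi, zeta=-80.0)
```

Here the column sums are 4.0, 4.5, and 2.0, so the sketch selects the second column and moves MDs 0 and 1 to the unlicensed band; MD 2 is excluded by the RSSI condition despite its large gains.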

VI. PERFORMANCE EVALUATION
A. TESTBED SETTINGS
Consider a multi-MD and multi-RAT industrial scenario in which the MDs and the MEC nodes are randomly distributed in a square area with a side length of 500 meters. To fit the Clear5G FoF scenarios, we select the elevated gNB path loss model proposed by Nokia in [43]. The NR-U behavior aligns with that of WiFi. The other scenario parameters are listed in Table 3. For image recognition, we select three of the most popular graphics cards as the processors: the mobile devices, the edge server, and the cloud are equipped with a 940MX, a GTX 1060, and two GTX 1080 Ti cards, respectively. The maximum processing power of the graphics cards is taken from the NVIDIA data sheets.
We adopt the state-of-the-art object detection and classification method You Only Look Once, version 3 (YOLOv3) [44]. YOLOv3 performs not only object localization but also object classification. For real-time video streaming, the video is split into frames, and YOLOv3 finds all the objects and their classes in each individual frame. In our simulation, we use the pre-trained YOLO-tiny model to run detection on five Chicago street video clips with nine classes (people, handbag, backpack, bicycle, car, motorcycle, bus, truck, traffic light). To measure the processing efficiency in frames per second (FPS), we run five-minute testing footage at 30 FPS and 1080p resolution on each graphics card 100 times and take the average frame processing speed for each card. Note that, when performing image recognition, all frames are resized to 608 × 608 pixels as the input of the neural network. Therefore, the resolution level does not affect the processing efficiency, only the precision (QoS level). During the simulation, we randomly pick 5-15 consecutive frames from the testing footage and assign the resolution level arbitrarily.

B. COMPARED BENCHMARK ALGORITHMS
In our simulation, two other algorithms are provided as benchmarks: the Utility-Greedy algorithm (UGA) and the Device-First algorithm (DFA).
a. Utility-Greedy Algorithm (UGA): In UGA, the system iteratively chooses the task with the largest utility among the MDs and offloads it to the MEC and the cloud server through the RAT with the best signal quality, until the entire system resource is exhausted.
b. Device-First Algorithm (DFA): In DFA, each MD makes its best effort to process as much of its task as possible locally and offloads the rest to the MEC and the cloud server. The offloading sequence is randomly shuffled, which means every MD has an equal chance of getting the transmission and computational resources.
As for the usage of the licensed and unlicensed resources, we have two offloading strategies: the unlicensed resource-preferred (UP) strategy and the licensed resource-preferred (LP) strategy. In other words, when both licensed and unlicensed spectrum resources are available, LP MDs use the licensed resource first, and UP MDs use the unlicensed resource first. Thus, we have four benchmark algorithms: UGA(UP), UGA(LP), DFA(UP), and DFA(LP).
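To make the greedy baseline concrete, the following sketch shows one possible reading of UGA. It is not the benchmark implementation used in the simulation: the function name `utility_greedy`, the use of a single utility value per task, and the choice to skip a task that does not fit (rather than stop) are assumptions.

```python
from typing import Dict, List


def utility_greedy(utility: List[float],
                   signal: List[List[float]],
                   cost: List[List[float]],
                   capacity: List[float]) -> Dict[int, int]:
    """Serve tasks in decreasing utility, each on its best-signal RAT.

    utility[i]    utility of the task of MD i (assumed one value per task)
    signal[i][r]  signal quality between MD i and RAT r (higher is better)
    cost[i][r]    resource MD i needs on RAT r
    capacity[r]   resource available on RAT r
    Returns {served MD: RAT}; MDs that are missing from the result are blocked.
    """
    remaining = list(capacity)
    served: Dict[int, int] = {}
    order = sorted(range(len(utility)), key=lambda i: utility[i], reverse=True)
    for i in order:
        r = max(range(len(remaining)), key=lambda k: signal[i][k])
        if cost[i][r] <= remaining[r]:
            remaining[r] -= cost[i][r]
            served[i] = r
    return served


# Hypothetical scenario: MDs 1 and 2 both see RAT 0 as their best RAT.
utility = [3.0, 9.0, 5.0]
signal = [[-70.0, -60.0], [-50.0, -80.0], [-55.0, -65.0]]
cost = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
served = utility_greedy(utility, signal, cost, [1.0, 1.0])
```

In this toy case MD 2 is blocked because its best RAT is already full, even though RAT 1 still has room when MD 2 is considered, which hints at why a purely greedy choice can lead to more blocking.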
For our proposed algorithm, we have three variants corresponding to the three desirability fulfillment vectors: the negative workload of the assignment (OCBOA(-w)), the value-cost ratio of the assignment (OCBOA(p/w)), and the throughput of the assignment (OCBOA(Thput)).

1) BLOCKING SITUATIONS
When the requirement of an MD cannot be satisfied due to a lack of resources, we say that the MD is blocked from the system service. Fig. 7a shows the blocking results under different numbers of MDs. As the number of MDs increases, the number of blocked MDs also increases. In general, the UP MDs are blocked more often than the LP MDs due to the contention-based and collision-prone nature of unlicensed spectrum usage. The MDs in UGA mode are generally blocked the most, since the greedy behavior consumes more system resources than the behavior in DFA mode, which leads to a large number of blocked MDs. In contrast, the MDs in DFA mode consume the least system resources because they process as much of the task as possible locally before offloading. Thus, among the benchmarks, the MDs in DFA mode have the lowest blocking probability. The proposed algorithms jointly consider the efficiency of the licensed and unlicensed spectrum, which increases the number of served MDs compared to the benchmark algorithms. Among the three desirability fulfillment vectors, the one considering the negative workload (-w) gives higher precedence to tasks with a light load, so it can accommodate more MDs and hence results in a lower blocking probability. On the other hand, the one that takes only the throughput into consideration performs the worst.
That is because a higher throughput generally implies a larger bandwidth requirement, so always choosing the highest throughput consumes the bandwidth resource the fastest. Fig. 7b investigates the blocking situation under different latency requirements. When we make the E2E latency demand more stringent, say, from 10 ms to 2 ms, the MDs require more resources to accomplish their tasks in time. Some MDs are then blocked due to the lack of resources, which increases the number of blocked MDs. Fig. 7a and Fig. 7b show opposite trends: the greater the number of MDs, the higher the blocking probability, whereas the longer the latency requirement, the lower the blocking probability.
2) UTILITY
Fig. 8a shows the system utility under different numbers of MDs. The system utility initially increases with the number of MDs. However, after reaching a peak, each utility curve starts to decline. The reason is that the overall system resource is limited: when the number of MDs grows, some of the MDs are blocked and not served by the system. In such a situation, offloading failures start to happen, and the blocked MDs have to process their whole tasks locally. Moreover, as mentioned previously, the blocked MDs cannot finish the task by themselves because of their limited capability and therefore get no accuracy reward, which causes the decline in utility.
For the benchmark algorithms, UP is usually worse than LP due to contention and collisions on the shared unlicensed spectrum. The UGA MDs perform well when their number is low and the resources are sufficient; however, when the number of MDs increases, the system resource is exhausted quickly, and the severe blocking leads to a low utility. In addition, the system utility of the DFA MDs exceeds that of the UGA MDs when the number of MDs is large. This is because each MD in DFA mode uses the least system resources, so the system in DFA mode can accommodate more MDs than in UGA mode and thereby reduces the utility loss caused by blocking events.
The proposed algorithm with each of the three desirability fulfillment vectors outperforms the benchmark algorithms in utility, since we jointly consider the usage of licensed and unlicensed resources. Using the value-load ratio (p/w) as the decision criterion gives the best utility performance when the number of MDs is small, while using the negative workload (-w) as the decision criterion alleviates blocking and thus gives the best utility performance when the number of MDs is large. Finally, the MDs in the throughput mode, OCBOA(Thput), consume more system resources, which leads to an early drop in the system utility.

3) ENERGY CONSUMPTION
For the benchmark algorithms, the MDs in UP mode consume additional energy for carrier sensing. Moreover, as the number of competitors for the unlicensed band grows, the carrier sensing energy consumption increases proportionally. In addition, when many competitors contend for the unlicensed band resource, the throughput of each competitor is lower and might not satisfy its latency requirement, which causes an offloading failure, and the blocked MD must then complete the task by itself. That is why the energy consumption in UP mode is higher than that in LP mode.
In this work, we consider only the energy consumption of the MDs. Since the MDs in UGA mode make good use of the remote computing resources and process the smallest share of the tasks locally, their energy consumption is the lowest when the number of MDs is small. On the other hand, the DFA MDs process most of the tasks locally, so their energy consumption is larger than that of the UGA MDs. However, when the number of MDs increases, the system may not be able to serve all of the MDs, and some of them are blocked. The energy consumption is then dominated by the severity of the blocking, since the blocked MDs have to complete the entire task locally. In summary, because DFA consumes the least system resources and accommodates more MDs than UGA, it saves more energy when the number of MDs is large.
The proposed algorithm with each of the three desirability fulfillment vectors outperforms the benchmark algorithms in almost all cases. Using the value-load ratio (p/w) as the selection criterion yields relatively lower energy consumption when the number of MDs is small, since the MDs whose tasks have higher utility and lower energy consumption have a higher chance of being selected. On the other hand, using the workload (-w) as the decision criterion lets the system serve more MDs. Hence, when the number of MDs grows, the overall energy consumption of the workload-considering case grows the least. In general, our best algorithm, the workload mode (-w), saves more than 50% energy in comparison to the worst benchmark algorithm, UGA(UP).

4) DETECTION PERFORMANCE
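In terms of detection performance, we consider a new metric called the average mAP ($\overline{\mathrm{mAP}}$), which is the average of the mAP values of all the MDs, that is,

$$\overline{\mathrm{mAP}} = \frac{1}{|N|} \sum_{i \in N} \mathrm{mAP}_i,$$

where $\mathrm{mAP}_i$ denotes the mAP obtained by MD i. Recall from the definition of mAP that, when an MD is served by the system successfully, it gets its precision reward; otherwise, it gets nothing.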
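As a small worked example of this metric, the sketch below averages per-MD mAP values and counts blocked MDs as zero. The function name `average_map` and the convention of marking a blocked MD with `None` are illustrative assumptions.

```python
from typing import List, Optional


def average_map(map_scores: List[Optional[float]]) -> float:
    """Average mAP over all MDs; a blocked MD (None) contributes zero."""
    if not map_scores:
        return 0.0
    return sum(s if s is not None else 0.0 for s in map_scores) / len(map_scores)


# Hypothetical case: four MDs, two of which are blocked.
all_served = average_map([0.5, 0.75, 0.5, 0.75])
half_blocked = average_map([0.5, 0.75, None, None])
```

With these values the average drops from 0.625 to 0.3125 when half of the MDs are blocked, which shows how blocking directly lowers the average mAP even if the served MDs keep their precision.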
In Fig. 8c, we fix the resolution level and check the average mAP of the system. When the system is able to serve all of the MDs with sufficient resources, the average mAP increases. However, when the system resource is exhausted due to the excessive demand of videos with higher resolution, e.g., 1080p, some of the MDs are blocked. In such a case, those MDs cannot fulfill their QoS requirements, which results in a zero precision reward for them and a decline in the average mAP. Because it requires the least communication and computation resources, DFA(LP) can accommodate more tasks and reach a higher mAP level, even though its computational energy consumption is high. Nevertheless, our proposed algorithm in both the value-load ratio (p/w) and the workload (-w) modes reaches a satisfactory mAP level with almost the best energy-saving performance, as already depicted in Fig. 8b. In addition, Fig. 8c shows that the proposed algorithm in the workload mode (-w) provides the best mAP performance at all levels of video resolution, and that 720p is the most suitable video resolution for object detection under this configuration.

5) EFFECT OF α
Recall from (18) that our utility function has one adjustable weight factor α. In Fig. 8d, we use the value-load ratio (p/w) as the decision criterion and vary the weight α. In addition, we fix the number of frames in one task to ten, with different resolutions, such as 240p, 360p, 480p, and 720p. Fig. 8d shows how the weight α affects the system power consumption and the average detection accuracy (mAP). A value of α close to one implies that the MDs put more emphasis on the energy consumption term than on the detection performance term. With a high value of α, the MDs tend to offload a larger portion of their tasks, since local processing costs more power than transmitting. In such a case, the network communication and remote computation resources (at the edge and the cloud) are consumed quickly. Therefore, with a high value of α, some MDs are blocked and cannot get any mAP score, which decreases the mAP performance of the entire system.
In contrast, when we decrease the value of α, the mAP score becomes more important than the energy consumption. The MDs thus try to find the safest offloading strategy to ensure that the task is completed and the mAP score is obtained. Therefore, we obtain a higher mAP score for the entire system at the cost of higher system energy consumption. This is why the average accuracy (mAP) increases with the energy consumption.

VII. CONCLUSION
Based on the requirements of the industrial use case, we adopt the state-of-the-art object detection framework YOLOv3 to emulate a real object detection application in the factories of the future. By transforming the offloading decisions into a QoS-optimization problem with a set of real physical constraints, we find an energy-efficient and QoS-satisfying solution in an effective manner with our proposed opportunity-cost-based offloading algorithm.
Simulation results show that, with better coordination of local and remote computation resources and of licensed and unlicensed transmission resources, the proposed algorithm outperforms the other baseline algorithms in almost all cases, with lower energy consumption and lower task blocking probability. The proposed algorithms can also support the offloading of high-resolution video streaming and reach high object detection quality, which is imperative in factory production. Furthermore, by adjusting the weighting of energy consumption and precision reward, we can optimize the entire system utility for different system design purposes and requirements. As a follow-up to this study, we plan to consider task offloading for more than one application on a single MD. It is also recommended to take other QoS requirements, such as service continuity, into account for a more comprehensive and realistic IIoT scenario.
YUNG-LIN HSU (Graduate Student Member, IEEE) received the M.S. degree from the Department of Communications, Navigation and Control Engineering, National Taiwan Ocean University, Taiwan. He is currently pursuing the Ph.D. degree with the Graduate Institute Communication Engineering, National Taiwan University, Taiwan. He is also with the European Union's Horizon 2020 Clear5G Project. He has dedicated himself to multiaccess edge computing (MEC) task offloading schemes in 5G and beyond heterogeneous networks, such as the industrial IoT (IIoT) data exchanging scenarios. His research interests include OFDM techniques, 5G NR and beyond wireless communications, 5G NR and beyond initial access, vehicular to everything communications (V2X), and smart factory infrastructure communication design.