On Offloading Decision for Mobile Edge Computing Systems Considering Access Reservation Protocol

For several years, mobile edge computing (MEC) has been highlighted as a promising technique to support emerging computation-intensive applications in cellular networks, e.g., 5G and 6G. Most previous studies have mainly focused on jointly optimizing communication (i.e., radio) and computation resources to improve offloading performance without considering specific communication protocols required for operating the MEC systems in practice. In this article, we newly design a contention-based access reservation protocol (ARP) for efficiently supporting simultaneous task offloading requests from a number of edge devices (EDs), and further define necessary signalings required to determine an optimal offloading factor. Thereafter, we formulate an optimization problem to find the optimal offloading factor that minimizes the task-completion latency. Through simulations, we mainly evaluate the task-completion latency performance of the MEC system incorporated with our proposed access reservation protocol under a multiple-input multiple-output (MIMO) environment. Particularly, we thoroughly investigate several practical aspects such as the effect of the number of edge devices attempting to join the task offloading, the number of antennas equipped at the edge device, and the amount of reserved radio resources required for configuring access reservation protocol on the optimal offloading decision. From the results, we verify the validity of our approach, and provide meaningful insights regarding how the network should be configured to fruitfully exploit the offloading with the MEC systems.


I. INTRODUCTION
With the rapid proliferation of various Internet-of-Things (IoT) devices such as wearable devices, smart sensors mounted on autonomous vehicles, and automated actuators in smart factories, unprecedented new types of computation-intensive applications requiring real-time and fast processing of a large amount of data are constantly emerging. Even though recent IoT devices have relatively powerful computation capabilities compared to traditional energy-hungry devices, they often need more computation resources to fulfill their own missions while satisfying stringent requirements on task-completion latency performance. To meet such demand, a notion of mobile edge computing (MEC), which is a distributed computing paradigm deploying high-power computing resources near network edges such as the device itself and the base station (BS), has attracted great attention in the field of communication networks.
(The associate editor coordinating the review of this manuscript and approving it for publication was Tiago Cruz.)
Under a network architecture incorporating the MEC server, an IoT device can perform task offloading (i.e., computation offloading), namely, transferring part or all of its task to the MEC server and requesting remote computing there. Due to its advantages in terms of energy consumption and latency, MEC has been consistently highlighted as one of the top 10 innovative technologies by Gartner since 2018. Nevertheless, several challenging issues remain open to industry and academia. The most important one is optimizing two kinds of resources, namely, communication (i.e., radio) and computation resources.
Most previous studies have focused on jointly optimizing them under either an energy consumption constraint or a latency constraint [1], [2], [3], [4], [5]. Among them, numerous studies have addressed the task-completion latency issue [2], [3], [4], [5]. Particularly, Feng et al. [2] jointly optimized the computation and radio resource allocation to minimize the overall task-completion latency. Pursuing a similar objective, Tan et al. [3] considered collaborative MEC systems where the computing power of idle edge devices, as well as that of the MEC server, is utilized collaboratively to improve the offloading performance. Moreover, Yang et al. [4] employed a value iteration algorithm (VIA) to select an optimal offloading node (i.e., server) from a pool of heterogeneous edge servers, taking into consideration the available network bandwidth and the location of the mobile devices, to minimize the offloading time. Zhang et al. [5] proposed a small area-based parallel task offloading model where each task generated by the mobile devices is offloaded to the corresponding servers for parallel execution to meet ultra-low latency requirements.
Naturally, the offloading incurs extra communication latency during signaling exchanges between the devices and the BS integrated with the MEC server. This becomes much more severe in the massive IoT scenario of cellular networks such as 5G and 6G. One of the challenging issues is to efficiently handle simultaneous offloading requests from a large number of IoT devices in practice [6].
To this end, an efficient design of protocols and the relevant signaling is required, but, to the best of our knowledge, most previous studies have not handled this important aspect yet. Particularly, Ke et al. [6] investigated a grant-free massive access (GFMA) scheme and demonstrated its effect on the offloading performance, which may be a reasonable approach when the size of the task is not large. However, when the data size becomes large, a grant-based (i.e., scheduling-based) offloading scheme becomes preferable since the task offloading cannot be completed within a few consecutive time slots. In such a situation, the role of an access reservation protocol for grant management becomes more important.
Motivated by this, we design a contention-based access reservation protocol (ARP) for efficiently handling simultaneous task offloading requests from a number of edge devices (EDs). Furthermore, we define several control signaling messages required to determine the optimal offloading factor. We thoroughly investigate the task-completion latency of the MEC systems incorporated with our proposed ARP under a multiple-input multiple-output (MIMO) environment. The main contributions of our work can be summarized as follows:
• We propose a MEC framework consisting of a contention-based access reservation protocol (ARP) and scheduling-based offloading, which is especially suitable for supporting IoT scenarios.
• We provide an offloading decision strategy, which finds an optimal offloading factor to minimize overall task-completion latency (See Algorithm 1).
• Through extensive simulations, we investigate the effect of several important control parameters on the task-completion latency performance: the number of edge devices attempting to join the offloading, the number of antennas equipped at the edge device, and the amount of radio resources reserved for configuring our proposed ARP.
The rest of this paper is organized as follows. In Section II, we introduce related work. In Section III, we describe our system model including communication model, local and offloading computing models. In Section IV, we newly propose an access reservation protocol tailored for the MEC systems. In Section V, we provide an offloading decision strategy for the MEC systems considering our proposed ARP. In Section VI, we present several numerical results. Finally, we draw conclusions in Section VII.

II. RELATED WORK
The notion of MEC has attracted great attention recently, and there is no doubt that it will play an important role in the next generation of cellular systems. To satisfy various requirements, the industry and academia have made an enormous effort to perform extensive research on exploiting the MEC in commercial communication networks, e.g., the 5GX-MEC service of SK Telecom in the Republic of Korea.
As mentioned before, the studies in [2], [3], [4], [5] have mainly focused on minimizing the task-completion latency. In addition, several studies have optimized both the communication and computing resources to minimize energy consumption or task-completion latency under various system models. With the recent deep penetration of unmanned systems into communication architectures, there has been a research trend to mount computing capability on unmanned vehicles such as high-altitude platforms [6], [7] and low Earth orbit (LEO) satellites [8], and to solve various optimization problems [9], [10]. Particularly, Tun et al. [9] considered the unmanned aerial vehicle (UAV)-assisted scenario and jointly optimized the task offloading decision, resource allocation, and UAV trajectory while considering the communication and computation latency requirements. Sun et al. [10] optimized the three-dimensional deployment of the UAVs in a multi-UAV-enabled MEC system to minimize the total time required for the UAVs to complete the offloaded task. In [11], Zhang et al. investigated a UAV-assisted MEC system consisting of a flying UAV and a ground base station, both equipped with computation resources. They designed a game-theoretic computation offloading (GTCO) scheme to minimize the weighted cost of time latency and energy consumption under the constraints of offloading decisions and resource competition.
The offloading can be interpreted as consecutive uplink transmissions to transfer tasks to the MEC server. The event that multiple devices request offloading at the same time may occur, and, thus, several researchers have considered the user association problem, specifically selecting a group of devices to get the opportunity for offloading [12], [13], [14], [15], [16].
Dong et al. [14] investigated the MEC system's effective offloading rate and latency in a resource-limited carrier sensing multiple access with collision avoidance (CSMA/CA) network considering the transmission outage and medium access control (MAC)-layer channel competition. Chen et al. [17] proposed a channel-reserved MAC protocol for a centralized edge computing-based IoT network to reduce collisions and response latency. Xing et al. [15] optimized task assignment along with time and power allocation to minimize the computation latency in a multi-user cooperative MEC system using the time division multiple access (TDMA) communication protocol. In [16], Ding et al. introduced the use of non-orthogonal multiple access (NOMA) in a MEC system to efficiently reduce latency and energy consumption during offloading.
Considering IoT networks in a dynamic environment, Mohammed et al. [18] proposed a new architecture for a blockchain-based multi-UAV-enabled MEC for secure computational offloading and resource allocation, and utilized a deep reinforcement learning method to handle the complex computation offloading and resource allocation problem. Han et al. [19] studied a mobile cloud computing system consisting of a single mobile device and a cloud computation platform, and proposed a theoretical framework to analyze the outage performance of the computation offloading. Zheng et al. [20] formulated the problem of multi-user computation offloading for mobile cloud computing in a practical dynamic environment as a stochastic game, and demonstrated that the formulated stochastic game was similar to a weighted potential game with at least one Nash equilibrium (NE). They then proposed a multi-agent stochastic learning algorithm with a guaranteed convergence rate to achieve the NE.

III. SYSTEM MODEL
Fig. 1 describes the system model we consider. We consider a single base station (BS) integrated with a mobile edge computing (MEC) server, which plays a key role in providing high-performance computing resources to the resource-demanding edge devices (EDs) in proximity. The BS is equipped with $J$ antennas, where $j$ represents the antenna index at the BS. Let $\mathcal{J} = \{1, \cdots, J\}$ denote the set of antenna indices at the BS, where $|\mathcal{J}| = J$. Thanks to the multiple antenna configuration at the BS side, spatial diversity gain or spatial multiplexing gain can be exploited during uplink transmissions (e.g., task offloading) according to the antenna configuration at the ED side.
We consider multiple EDs having on-board computation capability. For simplicity, the computation capability of each ED is assumed to be the same, and that of the ED $k$ is denoted by $f_k$. Let $K$ denote the number of EDs which are possible candidates for joining the task offloading to the MEC server. Each ED is equipped with $I$ antennas, where $i$ represents the antenna index at each ED. Let $\mathcal{I} = \{1, \cdots, I\}$ denote the set of antenna indices at the ED side, where $|\mathcal{I}| = I$.
We consider a scenario where each ED generates a relatively large volume of data (e.g., 100 Mbits or 5 Gbits). Thus, even though the computation capability of the ED is sufficient, partial/full offloading is sometimes essential due to stringent requirements such as task-completion latency. We assume that the generated data can be partitioned into subsets of any size that can be executed at the ED and the MEC server in parallel [21]. Each ED establishes a connection with the BS using the access reservation protocol (ARP), which is further explained in Section IV, and exchanges several signaling messages to acquire the fundamental system information required for decision-making on the offloading.

A. COMMUNICATION MODEL
When each device transfers its task to the MEC server, the use of a wireless channel is necessary. Let $\mathbf{H}_k = [h^k_{ji}] \in \mathbb{C}^{J \times I}$ denote the channel coefficient matrix from the ED $k$ to the BS, where $h^k_{ji}$ ($\forall j \in \mathcal{J}$, $\forall i \in \mathcal{I}$), the element in the $j$-th row and $i$-th column of $\mathbf{H}_k$, is the channel coefficient from the $i$-th antenna at the ED $k$ to the $j$-th antenna at the BS. We consider a block fading model where the channel state remains constant within a basic radio unit, i.e., 1 ms, and Rayleigh fading with $h^k_{ji} \sim \mathcal{CN}(0, 1)$ as the small-scale fading model. Note that we do not consider large-scale fading since each device can perform power control, i.e., open-loop/closed-loop power control before/after connection establishment, to compensate for the large-scale path loss. The transmit signal-to-noise ratio (SNR) of the ED $k$ is denoted by $\mathrm{SNR}_k$, where $\mathrm{SNR}_k \le \mathrm{SNR}_{\max}$ and $\mathrm{SNR}_{\max}$ is the maximum achievable SNR when the device applies the maximum transmit power.
The wireless system should configure the radio resource to enable wireless communications. We especially consider an orthogonal frequency division multiple access (OFDMA) system. Let $B$ denote the total bandwidth allocated to the MEC system; the radio resource is used in a fully orthogonal manner.³ Our proposed framework expects that a fraction of the bandwidth is reserved to configure the access reservation protocol, which operates in a contention-based manner, and that the rest of the bandwidth is used for scheduling-based uplink transmissions for task offloading. Thus, let $B_{ARP}$ and $B_{OFF}$ represent the bandwidth reserved for the ARP and the bandwidth reserved for the uplink transmissions during task offloading to the MEC server, respectively, where $B = B_{ARP} + B_{OFF}$. We assume that the basic radio unit occupies $B_{REF}$ in the frequency domain and 1 ms in the time domain.⁴ Thus, the numbers of basic radio units per time slot available for the ARP and for task offloading are
$$N_{ARP} = \frac{B_{ARP}}{B_{REF}}, \qquad N_{OFF} = \frac{B_{OFF}}{B_{REF}}.$$
Note that a performance trade-off can be observed according to how the bandwidth is split between $B_{ARP}$ and $B_{OFF}$.
³ When multiple devices simultaneously perform offloading in the uplink direction, the bandwidth reserved for task offloading must be shared by the participants. Thanks to the scheduler adopted at the BS, e.g., a proportional fair (PF) scheduler, the interference among devices can be well managed. Note that the PF scheduler can also be effective in preventing channel capture by a malicious selfish device.
⁴ In commercial LTE/5G systems, a resource block (RB) occupies 180 kHz and 0.5 ms in the frequency and time domains, respectively; thus, the basic radio unit in our system model is equivalent to two consecutive RBs, i.e., a single RB pair, which is the minimum resource that can be scheduled/allocated to mobile terminals in LTE/5G.

B. ON-BOARD LOCAL COMPUTING MODEL
Let $\alpha_k$ and $\beta_k$ denote the size of the original data generated by the edge device $k$ and the offloading factor (or, equivalently, offloading ratio) determined by the edge device $k$, respectively. The total amount of data to be locally computed is $\alpha_k (1 - \beta_k)$, and the latency required for local computing can be calculated as:
$$T_k^{l} = \frac{\alpha_k (1 - \beta_k)\, c_k}{f_k}, \tag{1}$$
where $f_k$ is the maximum computation capability (cycles/sec) of the edge device $k$ and $c_k$ is the number of CPU cycles required to process one bit at the edge device $k$. Note that the local computing performance highly depends on the amount of remaining task to be processed locally as well as on the device's capability. In particular, $T_k^{l}$ is a function of $\beta_k$, whose optimal value is determined by comparison with the expected time spent on offloading and remote computation at the MEC server. We sometimes write $T_k^{l}(\beta_k)$ when the dependence on $\beta_k$ needs to be explicit (see (12) and (13)).
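As a small illustration of the local computing model, the latency of the non-offloaded part can be evaluated as the remaining data size times the cycles per bit, divided by the device's CPU frequency. The sketch below is only illustrative: the function name and the numeric values (10 cycles/bit, a 1 GHz CPU) are assumptions chosen for the example and are not taken from Table 1.

```python
def local_latency(alpha_bits, beta, c_k, f_k):
    """Latency (s) of computing the non-offloaded part alpha*(1-beta) locally."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("offloading factor must lie in [0, 1]")
    remaining_bits = alpha_bits * (1.0 - beta)
    return remaining_bits * c_k / f_k


# Hypothetical device: 10 cycles per bit, 1 GHz CPU, 100 Mbit task.
for beta in (0.0, 0.3, 1.0):
    print(beta, local_latency(100e6, beta, c_k=10, f_k=1e9))
```

The latency decreases linearly in the offloading factor and reaches zero when the whole task is offloaded.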

C. OFFLOADING AND REMOTE COMPUTING MODEL
If the value of $\beta_k$ is non-zero, the edge device transmits a fraction of its computation task to the MEC server via the wireless channel. The instantaneous channel capacity between the edge device $k$ and the BS at time slot $n$ can be expressed as:
$$C_k[n] = B_k[n] \log_2 \det\left(\mathbf{I}_J + \frac{\mathrm{SNR}_k[n]}{I}\, \mathbf{H}_k[n] \mathbf{H}_k[n]^{\mathsf{H}}\right), \tag{2}$$
where $B_k[n]$ ($\le B_{OFF}$) is the bandwidth assigned to the edge device $k$ at time slot $n$, $\mathrm{SNR}_k[n]$ is the transmit SNR at time slot $n$, $\mathbf{H}_k[n]$ is the instantaneous wireless channel coefficient matrix at time slot $n$, $\mathbf{I}_J$ is the $J \times J$ identity matrix, and $(\cdot)^{\mathsf{H}}$ denotes the conjugate transpose. It is worth noting that the amount of bandwidth allocated to the edge device $k$ depends on various factors such as the scheduling policy adopted by the BS and the number of devices sharing the radio resources.⁵ The total amount of data to be offloaded to the MEC server via the wireless channel is $\alpha_k \beta_k$, and the number of uplink transmissions until the completion of the offloading is $n_k^{OFF-TX}$, which is the minimum value of $n$ satisfying
$$\sum_{t=1}^{n} C_k[t] \cdot (1\ \text{ms}) \ge \alpha_k \beta_k. \tag{3}$$
In this case, the latency required for transmitting the partial/full task to the BS (i.e., offloading) is $n_k^{OFF-TX}$ ms with the time slot configuration of 1 ms.
Since the amount of data processed at the MEC server is $\alpha_k \beta_k$, the latency required for remote computing at the MEC server can be calculated as:
$$T_k^{s} = \frac{\alpha_k \beta_k\, c_s}{\eta_k f_s}, \tag{4}$$
where $f_s$, $\eta_k$, and $c_s$ represent the maximum computation capability (cycles/sec) of the MEC server, the computation resource utilization efficiency of the edge device $k$, and the number of CPU cycles required to process one bit at the MEC server, respectively.⁶ If we assume that the computation capability of the MEC server is evenly shared among $K$ edge devices, then $\eta_k$ in (4) simplifies to $\eta_k = 1/K$. Consequently, the time duration for offloading and remote computation at the MEC server is
$$n_k^{OFF-TX} + T_k^{s}. \tag{5}$$
It is worth noting that (5) is also a function of $\beta_k$. Thus, we sometimes use the notation $n_k^{OFF-TX}(\beta_k) + T_k^{s}(\beta_k)$ when we need to explicitly represent $\beta_k$ (see (11), (12) and (13)).
⁵ For example, if the scheduler assigns equal radio resources (e.g., bandwidth) to all $K$ devices, then $B_k[n]$ can be calculated as $B_k[n] = B_{OFF}/K$.
⁶ If the BS is equipped with $\kappa$ MEC servers, the computational capability can be improved by a factor of up to approximately $\kappa$ in an ideal situation. In our paper, we consider a single-server scenario, i.e., $\kappa = 1$, to focus on the effect of our proposed ARP on the offloading decision, and the result can be readily extended to a multi-MEC-server scenario.
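The offloading and remote computing model can be sketched in code as follows. This is a hedged illustration, not the authors' simulator: it assumes the textbook equal-power MIMO capacity expression B log2 det(I + (SNR/I) H H^H), a 1 ms slot, and hypothetical helper names (`mimo_capacity_bps`, `offload_slots`, `remote_latency`). All numeric values are made up for the example.

```python
import math


def mimo_capacity_bps(bandwidth_hz, snr_linear, H):
    """Assumed form: B * log2 det(I + (SNR / I) * H^H H), H given as J rows of I entries."""
    n_rx, n_tx = len(H), len(H[0])
    # Gram matrix G = I + (SNR / n_tx) * H^H H; it has the same determinant
    # as I_J + (SNR / n_tx) * H H^H and is Hermitian positive definite.
    G = [[(1.0 if a == b else 0.0)
          + (snr_linear / n_tx)
          * sum(H[j][a].conjugate() * H[j][b] for j in range(n_rx))
          for b in range(n_tx)]
         for a in range(n_tx)]
    log2_det = 0.0
    for p in range(n_tx):
        pivot = G[p][p]
        log2_det += math.log2(abs(pivot))  # pivots are real and positive here
        for r in range(p + 1, n_tx):
            factor = G[r][p] / pivot
            for c in range(p, n_tx):
                G[r][c] -= factor * G[p][c]
    return bandwidth_hz * log2_det


def offload_slots(alpha_bits, beta, capacities_bps, slot_s=1e-3):
    """Smallest n such that the data sent in slots 1..n reaches alpha*beta bits."""
    target_bits = alpha_bits * beta
    if target_bits <= 0:
        return 0
    sent_bits = 0.0
    for n, capacity in enumerate(capacities_bps, start=1):
        sent_bits += capacity * slot_s
        if sent_bits >= target_bits:
            return n
    raise ValueError("capacity sequence ended before the offloading completed")


def remote_latency(alpha_bits, beta, c_s, f_s, eta_k):
    """Server-side latency (s) for alpha*beta bits with a share eta_k of f_s."""
    return alpha_bits * beta * c_s / (eta_k * f_s)


# Hypothetical numbers: 2x2 identity channel, SNR = 2 (linear), 1 MHz per device.
H = [[1.0, 0.0], [0.0, 1.0]]
rate = mimo_capacity_bps(1e6, 2.0, H)  # 2 bits/s/Hz over 1 MHz
slots = offload_slots(10_000, 0.5, [rate] * 10)
total_s = slots * 1e-3 + remote_latency(10_000, 0.5, c_s=10, f_s=10e9, eta_k=0.5)
print(rate, slots, total_s)
```

In a time-varying channel, `capacities_bps` would hold one capacity value per slot, so the slot count reflects the fading realization rather than a single average rate.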

IV. ACCESS RESERVATION PROTOCOL FOR MEC SYSTEM
In this section, we emphasize the necessity of the access reservation protocol (ARP) and propose an ARP particularly tailored for operating computation offloading in the MEC systems. Generally, each user equipment (UE) or mobile terminal, including an IoT device, is not always in a connected state with the BS and stays in an out-of-connection (i.e., idle) state to prolong its operation lifetime. Moreover, EDs with sufficient computing capability have no reason to always maintain a connection with the BS, since they primarily attempt on-board local computing and proceed to offload to the MEC server in proximity only if more powerful computing resources are required.
This implies that even when an ED decides to attempt offloading to the MEC server, it should re-establish a connection with the BS before the actual transmission of a fraction of the task to the BS takes place.⁷ The role of the connection re-establishment is important, but most previous studies do not consider this practical aspect.
To elaborate on the ARP, we consider a contention-based two-step protocol (i.e., a multi-channel slotted ALOHA protocol), since a contention-based approach is appropriate when the system cannot configure dedicated resources for all devices.⁸ The proposed ARP is adequate for supporting simultaneous access requests from a number of devices with a small amount of resources. The detailed explanations of the two steps are as follows:
1) (Step 1) Access Reservation Request (ARR):
In this step, each ED transmits the access reservation request (ARR) message to the BS through an ARP resource randomly selected among the $N_{ARP}$ resources.⁹ To complete the access reservation in two steps, the edge device identifier (EDID) must be included in the ARR message to let the BS know which device wants to attempt offloading, and a control message such as a connection request message should also be included.¹⁰ Once the ED transmits the ARR message, it starts the contention resolution (CR) timer.

2) (Step 2) Access Grant Response (AGR):
The BS attempts to decode the signals received on the ARP resources. If there exists any successfully decoded ARR message, the BS broadcasts the access grant response (AGR). The AGR message should contain the EDID and an uplink grant (UG), where the EDID field is required to identify the destination of the message and the UG field is required to indicate the location of the uplink resources to be subsequently used for additional signaling.
If the ED successfully receives an adequate AGR message before the CR timer expires, it regards the access reservation as successfully completed, and the subsequent signaling can then be performed via the granted resource. Otherwise, if the ED does not receive any AGR message before the CR timer expires, it regards the ARR message as not successfully decoded at the BS.¹¹ In this case, the ED performs the backoff procedure and reattempts the ARP at the next available opportunity.
⁷ Even when a certain ED does not perform any offloading to the server, the connection re-establishment is still required for reporting the local computing result to the application server via the BS.
⁸ The random access (RA) procedure is the protocol serving the same purpose in commercial cellular networks such as LTE/5G. It can be classified as a variation of the multi-channel slotted ALOHA protocol, but it consists of four handshaking steps.
⁹ Multiple ARP resources, $N_{ARP}$ per time slot, can be configured, and they occur periodically. Without loss of generality, we consider a period of 1 ms, which implies that there exist $N_{ARP}$ resources in the frequency domain in every time slot.
¹⁰ We do not specify the message format, but the two aforementioned fields are mandatory and the others remain optional.
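The two-step exchange described in this section can be mimicked with a short contention simulation. The sketch below is a simplification under stated assumptions: decoding fails only because of collisions (channel errors are ignored), every collided ED retries at the very next opportunity (the backoff duration is not modeled), and the function names `arp_round` and `run_arp` are hypothetical.

```python
import random


def arp_round(ed_ids, n_arp, rng):
    """One ARR/AGR exchange: each ED picks a resource uniformly at random;
    only EDs that are alone on their resource are granted."""
    choice = {ed: rng.randrange(n_arp) for ed in ed_ids}
    load = {}
    for resource in choice.values():
        load[resource] = load.get(resource, 0) + 1
    return {ed for ed, resource in choice.items() if load[resource] == 1}


def run_arp(num_eds, n_arp, max_attempts, rng):
    """Return {ed: attempt index of success}; EDs absent from the result gave up."""
    pending = set(range(num_eds))
    success_at = {}
    for attempt in range(1, max_attempts + 1):
        if not pending:
            break
        granted = arp_round(sorted(pending), n_arp, rng)
        for ed in granted:
            success_at[ed] = attempt
        pending -= granted
    return success_at


# Example: 8 EDs contending for 5 ARP resources, at most 10 attempts each.
print(run_arp(8, 5, 10, random.Random(0)))
```

EDs missing from the returned dictionary correspond to devices that exhaust their attempts and fall back to local computing only.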

V. OFFLOADING DECISION IN MEC SYSTEM CONSIDERING ACCESS RESERVATION PROTOCOL
In this section, we elaborate on the operating mechanism of the MEC system incorporated with our proposed ARP. Thereafter, we explain an offloading decision strategy determining the optimal offloading factor by formulating an optimization problem minimizing the overall task-completion latency.
¹¹ The ARR message may fail to be decoded due to a collision among multiple devices using the identical ARP resource, or due to a poor wireless channel condition between the device and the BS.

A. OPERATION PROCEDURE
Fig. 2 shows the overall operation procedure of the MEC system incorporated with the ARP, which is explained in detail below. Once the data is generated at the ED side, the ED should determine whether to proceed with the offloading or not and, if so, the amount of data to be transferred to the MEC server. At the same time, the ED should prepare for the worst-case scenario in which it does not perform offloading and computes the task locally, whatever the reason is. This implies that the ED should start local computing as soon as the data is generated, regardless of the final decision on the offloading. In order to make a decision on the offloading, the ED needs some system information such as the current wireless channel condition from the ED itself to the BS and the number of EDs sharing the resources (i.e., the wireless channel and the computation resource of the MEC server). To this end, each ED and the BS follow a three-phase procedure:
1) (Access Reservation Phase) Each ED should perform the ARP since most of the EDs are expected to be out of connection with the BS, as mentioned in the previous section.

2) (Decision-Making Phase) Each ED exchanges several signaling messages with the BS to acquire the required system information.¹² Based on the acquired information, each ED calculates the optimal offloading factor and determines whether to proceed with the offloading or not. Thereafter, the ED notifies the BS of the offloading decision.

3) (Offloading & Remote Computing Phase) If the offloading factor is not zero, the ED proceeds with offloading and remote computing.
Note that there are two events in which the ED terminates the offloading attempt and performs local computing only: 1) the ED cannot succeed in its access reservation attempt, and 2) the optimal offloading factor turns out to be zero.
¹² Without connection establishment, the ED would have to estimate such information by itself. Even though exchanging signals incurs extra time, it is more important to avoid an inaccurate estimate of the offloading factor, since most of the latency occurs during the transmission of a fraction of the task to the BS (i.e., offloading).

B. DECISION ON THE OFFLOADING FACTOR
To determine the optimal offloading factor, the ED should consider the components affecting the latency performance, such as the ARP, the extra signaling for acquiring system information, the offloading to the MEC server, and the remote computation at the MEC server.
During the access reservation phase, the primary factor that affects the latency is the collision caused by simultaneous access attempts from multiple edge devices. Particularly, when two or more devices transmit their ARR messages on the identical ARP resource at the same time, no request message can be successfully decoded due to the packet collision. The collision probability, $p_c$, can be expressed as
$$p_c = 1 - \left(1 - \frac{1}{N_{ARP}}\right)^{K-1}, \tag{6}$$
where $N_{ARP}$ and $K$ represent the number of resources reserved for the ARP (i.e., the number of ARP resources) and the number of EDs joining the ARP, respectively. Thus, the time spent for the ARP of the ED $k$, $T_k^{A}$, can be expressed as
$$T_k^{A} = T_S + (m - 1)\,(T_B + T_{CR}), \tag{7}$$
where $m$ is the number of ARP attempts until the ARP succeeds, with $m \le M_{\max}$, and $M_{\max}$ is the maximum number of ARP attempts allowed. In addition, $T_S$, $T_B$ and $T_{CR}$ represent the time duration for completing a successful ARP, the time duration for performing the backoff procedure, and a pre-defined contention resolution (CR) timer, which is the time required to recognize the ARP failure at the ED side, respectively. Note that $T_B$ and $T_{CR}$ take effect only when the ARP attempt fails at least once, i.e., $m > 1$. The event that the ED cannot succeed in its ARP attempts within $M_{\max}$ implies that there may be a large number of EDs trying to share the radio resources. In this case, the ED cannot obtain any of the required system information from the BS since it fails to establish a connection with the BS. It immediately terminates the offloading attempt and focuses on its local computing (i.e., case #1 in Fig. 2).
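To make the access reservation latency concrete, the sketch below evaluates a collision probability and the resulting ARP time. It assumes the standard multi-channel slotted ALOHA expression (a tagged ED collides when at least one of the other K - 1 EDs picks the same resource) and an ARP time made of (m - 1) failed rounds, each costing the CR timer plus the backoff, followed by one successful exchange. The timer values in the example are hypothetical.

```python
def collision_probability(n_arp, num_eds):
    """Probability that a tagged ED shares its randomly chosen ARP resource
    with at least one of the other (num_eds - 1) EDs."""
    return 1.0 - (1.0 - 1.0 / n_arp) ** (num_eds - 1)


def arp_latency(m, t_s, t_b, t_cr):
    """ARP time when the m-th attempt succeeds: (m - 1) failed rounds, then one success."""
    if m < 1:
        raise ValueError("at least one attempt is needed")
    return (m - 1) * (t_cr + t_b) + t_s


# N_ARP = 5 with K = 2 and K = 8, as in the scenarios of Section VI.
for k in (2, 8):
    print(k, collision_probability(5, k))

# Hypothetical timers in ms: T_S = 3, T_B = 5, T_CR = 2, success on the 3rd attempt.
print(arp_latency(3, t_s=3, t_b=5, t_cr=2))
```

With five ARP resources, moving from two to eight contending EDs raises the per-attempt collision probability from 0.2 to about 0.79, which is why the access reservation time grows quickly in the overloaded case.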
On the contrary, in the event that the ED succeeds in its ARP attempt, it requests the system information required for deciding the offloading factor, such as $\tilde{K}$ and $\tilde{\mathbf{H}}_k$. Let $\tilde{K}$ and $\tilde{\mathbf{H}}_k$ denote the number of devices sharing the resources (e.g., radio and computation) and the current channel condition of the device $k$, respectively.
Let $T_k^{o}$ denote the time spent for the offloading and remote computation at the MEC server. With $\tilde{K}$, $\tilde{\mathbf{H}}_k$, and $\alpha_k$, the ED $k$ estimates $T_k^{o}$ for varying values of $\beta_k$, i.e., $0 \le \beta_k \le 1$, according to (8) and (9), which combine the access reservation time $T_k^{A}$, $T_k^{DM}$, and $f(\beta_k; \tilde{K}, \tilde{\mathbf{H}}_k, \alpha_k)$. Here, $T_k^{DM}$ represents the latency required for the extra signaling for acquiring system information, and $f(\cdot)$ is a function that estimates the time spent for the offloading and remote computation at the MEC server when a fraction $\beta_k$ of $\alpha_k$ is transferred to the server under the assumption of $\tilde{K}$ and $\tilde{\mathbf{H}}_k$. Since $T_k^{A}$ is already known and $T_k^{DM}$ is deterministic, $T_k^{o}(\beta_k)$ in (8) is mainly affected by the selected offloading factor, $\beta_k$.
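With a given value of $\beta_k$, $\tilde{n}_k^{OFF-TX}$ uplink transmissions are expected to be required, which is the minimum value of $n$ satisfying
$$\sum_{t=1}^{n} \frac{B_{OFF}}{\tilde{K}} \log_2 \det\left(\mathbf{I}_J + \frac{\mathrm{SNR}_{\max}}{I}\, \tilde{\mathbf{H}}_k \tilde{\mathbf{H}}_k^{\mathsf{H}}\right) \cdot (1\ \text{ms}) \ge \alpha_k \beta_k. \tag{10}$$
It is worth noting that (10) is similar to (2), but we have to rely on partial a priori information such as $\tilde{K}$ and $\tilde{\mathbf{H}}_k$ and on additional assumptions during the decision-making phase. The additional assumptions applied to (10) are as follows:
• Each ED is assumed to be allocated an equal share of the bandwidth, i.e., $B_k[n] = B_{OFF}/\tilde{K}$.
• $\tilde{K}$ is assumed not to change during the offloading.
• The maximum SNR, $\mathrm{SNR}_{\max}$, is considered regardless of the time slot.
Now, we can represent $f(\beta_k; \tilde{K}, \tilde{\mathbf{H}}_k, \alpha_k)$ in (8) as
$$f(\beta_k; \tilde{K}, \tilde{\mathbf{H}}_k, \alpha_k) = \tilde{n}_k^{OFF-TX}(\beta_k) + T_k^{s}(\beta_k). \tag{11}$$
Finally, the ED can formulate an optimization problem to determine the optimal offloading factor, $\beta_k^{\star}$, which minimizes the overall task-completion latency. Since local computing and offloading proceed in parallel, the optimization problem (P1) can be formulated as follows:
$$\text{(P1)}: \quad \min_{\beta_k} \; \max\left\{ T_k^{l}(\beta_k),\; T_k^{o}(\beta_k) \right\} \quad \text{s.t.} \quad 0 \le \beta_k \le 1.$$
Note that $\beta_k^{\star} = 0$ implies that the ED $k$ decides not to perform offloading and depends on the local computing only (i.e., case #2 in Fig. 2).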
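One simple way to solve such a one-dimensional min-max problem numerically is a grid search over the offloading factor. The sketch below is a brute-force stand-in, not the paper's Algorithm 1 (which is not reproduced in this text). The two latency models passed in are hypothetical callables, and the choice of 1000 grid steps is arbitrary.

```python
def optimal_offloading_factor(local_latency, offload_latency, steps=1000):
    """Grid search for beta in [0, 1] minimizing max(local, offload).

    beta = 0 means no offloading at all, so only the local latency counts there.
    """
    best_beta, best_latency = 0.0, local_latency(0.0)
    for i in range(1, steps + 1):
        beta = i / steps
        latency = max(local_latency(beta), offload_latency(beta))
        if latency < best_latency:
            best_beta, best_latency = beta, latency
    return best_beta, best_latency


# Hypothetical linear models (seconds): 1 s for fully local computing, and
# 50 ms of ARP/signaling overhead plus 2 s per unit of offloading factor.
beta_star, latency = optimal_offloading_factor(
    lambda b: 1.0 * (1.0 - b),
    lambda b: 0.05 + 2.0 * b,
)
print(beta_star, latency)
```

Because the local latency decreases and the offloading latency increases with the offloading factor, the minimum sits at their crossing point whenever one exists. Otherwise the search returns an endpoint, e.g., zero when the fixed overhead alone exceeds the fully local latency.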
Naturally, the number of devices and the amount of available radio and computation resources change over time. Thus, even though $\beta_k^{\star}$ is the best decision at the time when it is made, the task-completion latency actually achieved may differ. Specifically, the time spent for the offloading and remote computing at the MEC server may change since the channel status varies in every time slot and the number of edge devices joining the offloading changes dynamically.

VI. NUMERICAL RESULTS
In this section, we present several numerical results to demonstrate the performance of the MEC systems incorporated with the ARP under a MIMO environment. The specific simulation parameters are listed in Table 1.
Fig. 3 shows examples of the offloading decision in several scenarios when the data size is set to 100 Mbits. For all subfigures in Fig. 3, we consider $N_{ARP} = 5$, $N_{OFF} = 45$, $J = 8$, and SNR = 20 dB. In Fig. 3a, we present the result under a normal situation where two EDs with a single antenna attempt to offload their tasks, i.e., $K = 2$ and $I = 1$. As the offloading factor increases, the time required for the on-board local computing decreases since the amount of remaining task decreases. On the contrary, as the offloading factor increases, the time required for offloading and remote computing at the MEC server increases since the amount of task offloaded to the MEC server increases. Note that most of the latency during the offloading and remote computing at the MEC server is caused by the time spent on offloading (i.e., uplink transmissions), not by the time spent on the ARP or the processing time at the MEC server. Accordingly, there exists a point where the two lines intersect, which is the optimal point that achieves the minimum task-completion latency. Particularly, in Fig. 3a, the optimal offloading factor is approximately 0.3. This implies that 70% of the task is computed locally while the remaining 30% is computed remotely in parallel, which results in a task-completion latency of 0.7 sec. If the device performs fully local computing or fully remote computing, the latency becomes 1 sec or 2.3 sec, respectively.
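The figures quoted for Fig. 3a can be cross-checked with a short calculation. The sketch below assumes that both latency curves are linear in the offloading factor and that the fixed ARP and signaling overhead is negligible, which is only an approximation suggested by the observation that uplink transmission dominates the latency. The helper name `intersection_beta` is hypothetical.

```python
def intersection_beta(t_local_full, t_offload_full, t_overhead=0.0):
    """Crossing point of two linear latency curves.

    Local curve:   t_local_full * (1 - beta)
    Offload curve: t_overhead + (t_offload_full - t_overhead) * beta
    """
    slope = t_offload_full - t_overhead
    beta = (t_local_full - t_overhead) / (t_local_full + slope)
    beta = min(1.0, max(0.0, beta))  # clamp to the feasible range
    return beta, t_local_full * (1.0 - beta)


# Fully local: 1 sec, fully remote: 2.3 sec (values read from the text for Fig. 3a).
beta, latency = intersection_beta(1.0, 2.3)
print(round(beta, 3), round(latency, 3))
```

Under these assumptions the crossing point is close to an offloading factor of 0.3 and a latency of 0.7 sec, in line with the values reported for Fig. 3a.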
The effect of the number of EDs on the offloading decision is shown in Fig. 3b, where we consider an overloaded situation with K = 8 and I = 1. The latency of on-board computing follows the same trend as in Fig. 3a, since it depends only on the amount of the task that remains local. However, the latency of offloading and remote computing at the MEC server differs in two respects. First, much more time is required to complete the access reservation, since the contention becomes more severe as the number of contending participants increases, i.e., K: 2 → 8. Second, even after the ARP is successfully completed, the time duration for offloading also increases, since more devices share the given amount of radio resources during the offloading. This results in a steep increase in the latency as the offloading factor increases. Note that even though the computing power of the MEC server is sufficient, it must also be shared among multiple devices. Accordingly, the processing time at the MEC server also increases slightly due to the resource sharing. Consequently, in the overloaded situation, each edge device is better off reducing the amount of task offloaded to the MEC server in order to reduce the overall task-completion latency.
The effect of the number of antennas at each device on the offloading decision is shown in Fig. 3c, where we consider a normal situation, i.e., K = 2, but each device has multiple antennas, i.e., I = 8. The number of EDs is the same as in Fig. 3a, and, thus, the time duration for the ARP and the processing time at the MEC server show trends similar to those in Fig. 3a. However, the time duration required for offloading is much lower than in Fig. 3a and Fig. 3b. This is due to the spatial multiplexing gain that can be achieved in the MIMO channel. Consequently, in a normal situation with more antennas equipped at the device side, increasing the offloading factor so as to transfer more of the task to the server is the better strategy for the device.
Fig. 4 shows examples of offloading decisions when the data size is 5 Gbits. Regardless of the scenario, all graphs show trends similar to those in Fig. 3. This is because our MEC framework operates in a scheduling-based manner after the completion of the ARP. Even though the latency of the ARP may depend on the contention status, the latency of offloading and remote computing at the MEC server depends strongly on the size of the original data. In other words, the time spent on offloading and remote computing at the MEC server increases in proportion to the original data size. Fig. 4 considers a data size 50 times larger than that in Fig. 3, and, thus, the achievable task-completion latency becomes approximately 50 times longer. It is worth noting that the offloading factor changes slightly, especially in Fig. 4b, since the contention status differs in this experiment.
The portion of the final task-completion latency spent in the ARP becomes negligible when the larger data size is considered, which implies that scheduling-based offloading after a contention-based ARP is a reasonable approach.
Fig. 5a shows the effect of the number of EDs, K, on the task-completion latency for several combinations of I and N_ARP when J = 8 and SNR = 20 dB. As K increases, the latency increases and converges to 1 sec regardless of the values of I and N_ARP, which is the latency achieved when the device performs on-board local computing only, without offloading. For any given amount of radio resources reserved for the ARP, e.g., N_ARP = 2, 5, and 10, the contention during the access reservation phase naturally becomes more severe as the number of contention participants, K, increases. With a larger value of N_ARP, more devices can exploit remote computing at the MEC server via partial offloading, which implies that each device can achieve a latency of less than 1 sec. When each ED is equipped with multiple antennas, the time duration required for offloading can be dramatically reduced (see Fig. 3c), and thus much better latency performance can be achieved. The corresponding offloading factors are shown in Fig. 5b, where the offloading factors decrease and converge to zero as more EDs attempt to participate in the contention-based ARP for the offloading.
Fig. 6a shows the effect of the number of transmit antennas equipped at each ED, I, on the task-completion latency for several combinations of J and K when N_ARP = 8 and SNR = 20 dB. In all cases, the offloading factor is nonzero, and, thus, each ED decides to offload some fraction of its task to the MEC server. Comparing the cases having the same value of J reveals the effect of the number of devices sharing both the radio and computation resources.
As expected, lower latency is achieved with a smaller value of K. Furthermore, comparing the cases having the same value of K reveals the effect of the number of receive antennas at the BS. It is worth noting that the spatial multiplexing gain is limited by the minimum of the number of transmit antennas and the number of receive antennas. Accordingly, when J is less than I, the spatial multiplexing gain cannot be fully exploited, and, thus, the rate at which the latency decreases drops sharply beyond a certain point, i.e., once I > J. To exploit the antenna technique in the spatial domain in this case, more antennas should be equipped at the receiver side, i.e., the BS.
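The min(I, J) limit on the spatial multiplexing gain can be illustrated with a rough sketch. The rate expression below is a textbook high-SNR approximation (min(I, J) parallel streams, each at log2(1 + SNR)), not the channel model used in the simulations; it ignores power splitting across streams and any diversity or array gain. The function names are hypothetical.

```python
import math


def multiplexing_gain(num_tx, num_rx):
    """Number of parallel spatial streams the I x J MIMO link can support."""
    return min(num_tx, num_rx)


def approx_rate_bps_hz(num_tx, num_rx, snr_db=20.0):
    """Crude high-SNR spectral efficiency estimate (assumption, see lead-in)."""
    snr_linear = 10 ** (snr_db / 10)
    return multiplexing_gain(num_tx, num_rx) * math.log2(1 + snr_linear)


# Sweep the number of ED antennas I for a BS with J = 8 antennas.
for num_tx in (1, 2, 4, 8, 16):
    gain = multiplexing_gain(num_tx, 8)
    rate = approx_rate_bps_hz(num_tx, 8)
    print(f"I = {num_tx:2d}: streams = {gain}, approx. rate = {rate:.1f} bit/s/Hz")
```

In this approximation the rate stops growing once I exceeds J, which mirrors the saturation described for Fig. 6a.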
When J = 8, the latency decreases gradually as I increases from 1 to 8, where I ≤ J always holds. Even when a larger value of J is considered, i.e., J = 16, the performance cannot be dramatically improved, since the number of transmit antennas becomes the bottleneck in this case; nevertheless, a slight diversity gain improves the link quality during the task offloading, which in turn contributes to the improvement of the latency performance. It is worth noting that, to fully exploit the high-power computing capability of the MEC server, it is most important to equip sufficient antennas at the ED side, under the assumption that the BS has sufficient antennas. Even if the BS is equipped with hundreds of antennas, as in massive MIMO, only the diversity gain can be exploited by devices with a single antenna. The corresponding offloading factors are shown in Fig. 6b, where the offloading factors tend to increase as more transmit antennas are equipped at the device side.
Fig. 7a shows the effect of the number of ARP resources, N_ARP, on the task-completion latency for several combinations of I and K when J = 8 and SNR = 20 dB. Regardless of the value of N_ARP, the more transmit antennas there are, the lower the task-completion latency, which coincides with the result already found in Fig. 6. In addition, the task-completion latency is reduced with a smaller value of K, since the contention during the access reservation phase is alleviated.
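The dependence of the latency on the number of ARP resources in Fig. 7a can be reproduced qualitatively with a toy model. The sketch assumes that a fixed pool of N resources is split between the ARP and the offloading, that each of the K devices picks one ARP resource uniformly at random per attempt (multi-channel slotted ALOHA), and that the offloading time scales with K divided by the number of offloading resources. The parameters `t_attempt` and `offload_work` are illustrative values, not the settings of Table 1.

```python
def arp_success_prob(n_arp, k):
    """Probability that none of the other k - 1 devices picks the same ARP resource."""
    return (1.0 - 1.0 / n_arp) ** (k - 1)


def toy_remote_latency(n_arp, k, n_total=50, t_attempt=0.05, offload_work=1.0):
    """ARP time plus offloading time for a given split of the n_total resources."""
    n_off = n_total - n_arp
    p = arp_success_prob(n_arp, k)
    if n_off <= 0 or p == 0.0:
        return float("inf")              # no offloading resources, or certain collision
    t_arp = t_attempt / p                # expected attempts (geometric) times attempt time
    t_off = offload_work * k / n_off     # k devices share the offloading resources
    return t_arp + t_off


def best_n_arp(k, n_total=50):
    """Number of ARP resources that minimizes the toy latency."""
    return min(range(1, n_total), key=lambda n: toy_remote_latency(n, k, n_total))


for k in (2, 8):
    n_best = best_n_arp(k)
    print(f"K = {k}: best split uses {n_best} ARP resources, "
          f"latency = {toy_remote_latency(n_best, k):.3f} s")
```

Even in this simplified model the minimum lies strictly between the two extremes: too few ARP resources make collisions dominate, while too many starve the uplink transmissions.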
Increasing the number of ARP resources alleviates the contention during the access reservation phase and thus shortens the time needed to complete the ARP. However, since N = N_ARP + N_OFF, the radio resources left for the task offloading (i.e., uplink transmissions) decrease correspondingly, and the time duration for the task offloading may increase as an opposing effect. Thus, for given parameters such as I and K, there always exists an optimal point that achieves the minimum task-completion latency. When the device is equipped with multiple antennas, the wireless channel can be efficiently utilized in the spatial domain, and a smaller number of ARP resources suffices compared to the case of I = 1. The corresponding offloading factors are presented in Fig. 7b.
Fig. 8 compares the task-completion latency of our proposed technique with that of the state-of-the-art work [6], a grant-free multiple access (GFMA) based offloading strategy. For convenience, we call it the GFMA-offloading strategy. Note that the GFMA-offloading strategy operates in the same manner as a multi-channel slotted ALOHA protocol. In our proposed framework, each device first contends over the N_ARP resources during the access reservation phase and thereafter performs scheduling-based consecutive offloading.
On the contrary, in the GFMA-offloading strategy, each device transmits every fraction of the task in a contention-based manner. In other words, it uses a resource selected at random among the N resources whenever it attempts to send a partitioned task, which may result in a resource collision. If any of the offloaded packets experiences a collision, it is retransmitted after a back-off procedure. For a fair comparison, we applied the same retransmission model as ours, and the related parameters such as T_S, T_B, and T_CR are also set according to Table 1. Furthermore, we applied the same offloading factors that are found for operating our MEC framework, so that we can isolate the effect of the different schemes on the offloading performance. In all cases, our proposed technique outperforms the GFMA-offloading strategy. This is because our technique requires only a few contentions, all during the access reservation phase, whereas the GFMA-offloading strategy requires a large number of contentions throughout the entire offloading process. Furthermore, with the GFMA-offloading strategy, the task-completion latency initially increases with K up to a certain point, since the contention becomes more severe. Beyond that point, it decreases again and converges to the same result as ours, since the same offloading factor is applied, i.e., β → 0 (see Fig. 5b). As a result, the GFMA-offloading strategy cannot exploit the advantage of offloading, and its performance worsens as the contention becomes more severe, e.g., when the number of contention participants increases or when the number of offloading attempts increases.
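The difference in the number of contentions between the two schemes can be made concrete with a back-of-the-envelope sketch. It assumes that all K devices contend in every slot, that resources are chosen uniformly and independently, and that the number of attempts until success is geometric. It counts contention attempts only and does not model back-off times or the retransmission parameters of Table 1. The function names and the fragment count are hypothetical.

```python
def success_prob(num_resources, k):
    """Probability that a device's randomly chosen resource is collision-free."""
    return (1.0 - 1.0 / num_resources) ** (k - 1)


def expected_contentions_arp(k, n_arp=5):
    """Proposed scheme: contend only until the reservation succeeds."""
    return 1.0 / success_prob(n_arp, k)


def expected_contentions_gfma(k, num_fragments, n_total=50):
    """GFMA-offloading: every task fragment is sent over a contended resource."""
    return num_fragments / success_prob(n_total, k)


k, fragments = 8, 100   # illustrative values
print(f"ARP : {expected_contentions_arp(k):.1f} contention attempts")
print(f"GFMA: {expected_contentions_gfma(k, fragments):.1f} contention attempts")
```

In this model the ARP contention count does not depend on the number of fragments, whereas the GFMA count grows linearly with it, which is consistent with the gap observed in Fig. 8.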

VII. CONCLUSION
In this article, we designed an access reservation protocol (ARP) tailored for operating mobile edge computing (MEC) systems. We investigated the task-completion latency of MEC systems incorporating the proposed ARP, and proposed an offloading decision strategy by formulating an optimization problem that finds the optimal offloading factor minimizing the overall task-completion latency. Through simulations, we evaluated the performance of the MEC system with the ARP under a MIMO environment, and presented results on how the overall latency is affected by the number of edge devices, the number of transmit antennas at the device side, and the amount of radio resources reserved for the ARP. We conclude that the offloading decision (e.g., the offloading factor) changes when the practical aspects required for operating MEC systems are taken into account, and that multiple antennas are highly recommended at the device side as well, in order to efficiently exploit the high-power computation capability of the MEC server. Evaluating the performance in more practical environments (e.g., multi-server and heterogeneous scenarios) and developing a fully decentralized offloading decision strategy remain as future work.