Intelligent Mobile Edge Computing With Pricing in Internet of Things

In this paper, we investigate mobile edge computing (MEC) networks for intelligent information services, where there are N users equipped with K antennas and one access point (AP). The users have some computational tasks, and some of them can be decoupled by the AP, at the cost of a fee charged by the AP. For the considered system, we firstly consider two important metrics of interest: latency and fee. Then, we formulate a stochastic game to model the interaction between users and the AP. In this game, the AP sets prices to maximize its profit, while users devise the offloading strategy to reduce both the latency and charge. We further optimize the system by applying the array signal processing schemes on the users, in order to reduce the transmission latency. Simulation results are finally presented to verify the effectiveness of the stochastic game, and it is shown that the array signal processing scheme can help reduce the transmission latency significantly.


I. INTRODUCTION
In recent years, research on wireless communications has made great progress [1]-[3], and both the transmission data rate and the reliability have increased dramatically [4], [5]. For example, the data rate of fifth-generation (5G) communication systems is about ten to one hundred times that of fourth-generation (4G) communication systems. To support this rapidly increasing data rate, many new techniques have been proposed. In particular, multiple-antenna techniques have been proposed to increase the data rate by exploiting the spatial and temporal gains among antennas [6]-[8]. As a virtual form of multiple antennas, the relaying technique has been shown to improve the data rate effectively by providing transmission diversity gain [9]-[12]. Besides multiple antennas and relaying, the cognitive technique has attracted much attention from researchers [13]-[15], since it can utilize the spectrum resources efficiently and help improve the transmission data rate [16]-[18]. Recently, the intelligent surface reflection technique has been proposed, which has extended the research of wireless communication from the conventional engineering perspective to the perspective of material science [19]-[22].
The associate editor coordinating the review of this manuscript and approving it for publication was Shouguang Wang.
As an extension and application of 5G communication systems, the intelligent internet of things (IoT) has attracted much attention from researchers, since it can be used in many fields and in daily life, such as intelligent transportation systems and intelligent video surveillance. Many new technologies have been proposed to support the applications of the intelligent IoT. One major advance is the wireless caching technique [23]-[25], where files can be pre-stored at nearby nodes during off-peak traffic periods. In this area, an important research problem is to decide which files should be cached at the nodes, since the storage at the nodes is limited in general [26], [27]. The conventional most popular content (MPC) and largest content diversity (LCD) schemes can be applied, which obtain the largest signal cooperation gain and the largest caching gain, respectively [28], [29]. Besides the caching technique, some intelligent algorithms can be applied to intelligent information services. For example, Q-learning based intelligent algorithms [30]-[34] have been proposed for wireless transmission systems, in order to guarantee the security of the application systems [35].
An evolution of wireless caching is mobile edge computing (MEC), which has been widely applied in intelligent information services in recent years. In MEC networks, nodes can not only cache and communicate, but also compute, or help compute, the tasks from nearby nodes. In this way, the computational tasks can be computed very efficiently, with limited latency and energy consumption. In MEC networks, an important research problem is the offloading strategy, which determines which task should be computed by which node. In this area, some existing works such as [36]-[38] have studied the offloading strategy and proposed some intelligent algorithms, in order to reduce both the latency and energy consumption.
In this paper, we investigate MEC networks for intelligent information services, where there are N users equipped with K antennas and one access point (AP). The users have some computational tasks, and some of them can be decoupled by the AP, at the cost of a fee charged by the AP. For the considered system, we firstly consider two important metrics of interest: latency and fee. Then, we formulate a stochastic game to model the interaction between users and the AP. In this game, the AP sets prices to maximize its profit, while users devise the offloading strategy to reduce both the latency and charge. We further optimize the system by applying the array signal processing schemes on the users, in order to reduce the transmission latency. Simulation results are finally presented to verify the effectiveness of the stochastic game, and it is shown that the array signal processing scheme can help reduce the transmission latency significantly.
The organization of this paper is as follows. After the introduction in this section, we discuss the system model of MEC networks from the perspectives of both the users and the AP in Sec. II. Then, we introduce how to optimize the system performance by using intelligent algorithms as well as array signal processing in Sec. III. Sec. IV presents the simulation results, and conclusions are finally drawn in Sec. V.

II. SYSTEM MODEL
Fig. 1 describes the system model of the MEC network with multiple users, where there exists one access point (AP) with one MEC server. Each user has a task to be computed within a slot. Due to their limited computational power, the users may not be able to complete the tasks within the prescribed time, and they need to offload a part of the task or the full task to the nearby AP, which has a powerful computational capacity. The AP can assist the users in completing the offloaded tasks and charges the users for this service. We assume that there are N users, and that each user n is equipped with K antennas and has a task of length l_n. Therefore, the set of tasks of all users can be denoted as L = {l_1, l_2, ..., l_N}. The AP can provide users with different computational capabilities based on the users' requirements, and it reasonably sets a higher price for a more powerful computational capability. The set of computational capabilities offered by the AP is {ξ_1, ξ_2, ..., ξ_M} with ξ_1 ≤ ξ_2 ≤ ... ≤ ξ_M, and the corresponding price set is {µ_1, µ_2, ..., µ_M}.
The AP can evaluate proper price parameters, while the users can obtain the price set by the AP through the information exchange model and design a proper offloading strategy. It is worth noting that each user cannot obtain the offloading decisions of the other users.

A. USER MODEL ANALYSIS
In this system, we focus on two quantities for each user: the charge that the user pays for the tasks computed by the AP, and the latency, which includes both the local computing latency and the latency of the part offloaded to the AP. We use θ_n ∈ [0, 1] to denote the offloading decision of user n. When θ_n = 0, the whole task is computed locally, where the computational capability of user n is denoted by ζ_n. When θ_n = 1, the whole task of user n is computed by the MEC server. When 0 < θ_n < 1, a partial task of size θ_n l_n is offloaded to the AP and the residual task of size (1 − θ_n) l_n is computed locally. From the offloading decision, the local computing time of user n, T_local,n, is given by Eq. (1) in terms of the residual task size (1 − θ_n) l_n and the capability ζ_n. Since user n pays nothing when the whole task is computed locally, the charge is zero in that case. When a part of the task is offloaded, user n transmits it to the AP over the wireless link, and the AP computes it and returns the result to the user. The user also pays the associated charge to the AP. The transmit latency is given by Eq. (2) in terms of the offloaded size θ_n l_n and the transmit rate R_n of user n, and R_n is given by Eq. (3). In Eq. (3), B_n is the bandwidth allocated by the system to user n, and P_n is the transmit power of user n. The channel parameter h_n of the wireless link from user n to the AP is modeled as a zero-mean complex Gaussian variable with a user-specific variance, and σ^2 is the power of the additive white Gaussian noise (AWGN) at the AP [39]-[41], where the noise effect on the receiver can be found in the literature [42]-[44]. In addition, the time for the AP to execute the offloaded task of user n is given by Eq. (4) when the AP sets the price to µ_m. Since the computational result is generally very small, the time to return the result is ignored in this system.
From the above description, the latency to complete the offloaded task, T_off,n, is the sum of the transmit latency and the execution time at the AP, as written in Eq. (5). In practice, task offloading and local computing can proceed in parallel on mobile devices. Therefore, the total time for user n to complete its own task is the maximum of the local computing time and the offloading time, i.e., T_total,n = max{T_local,n, T_off,n}, which is Eq. (6). Due to the computational assistance, user n needs to pay a charge to the AP. We assume that the charge is proportional to the size of the offloaded task θ_n l_n, as given in Eq. (7). The n-th user can improve its communication and computational performance by reducing the total latency and the total charge. As wireless mobile communication technology continues to develop, transmitting a large amount of data is no longer a limiting factor in wireless networks, so users can reduce the total latency by offloading more of their tasks to the AP with its powerful computational capability. However, by Eq. (7), users have to pay a higher charge to the AP if they offload more. From the above description, the key to improving the user's performance is to design a proper offloading strategy θ_n.
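The latency and charge model described above can be sketched numerically as follows. This is a minimal illustration rather than the authors' implementation: the function and parameter names are hypothetical, and the closed forms (a size divided by a rate or capability, a Shannon-type rate, and a charge equal to the price times the offloaded size) are assumptions inferred from the verbal description. The task size is also assumed to be expressed in units compatible with the computational capability.

```python
import math


def transmit_rate(bandwidth_hz, tx_power, channel_gain, noise_power):
    """Assumed Shannon-type rate; channel_gain stands for |h_n|^2."""
    return bandwidth_hz * math.log2(1.0 + tx_power * channel_gain / noise_power)


def latency_and_charge(theta, task_size, local_cap, ap_cap, price, rate):
    """Return (t_local, t_off, t_total, charge) for one user.

    Every closed form below is an assumption inferred from the text.
    """
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    t_local = (1.0 - theta) * task_size / local_cap  # residual part, computed locally
    t_tran = theta * task_size / rate                # offloaded part, sent over the link
    t_ap = theta * task_size / ap_cap                # offloaded part, executed at the AP
    t_off = t_tran + t_ap                            # result return time is ignored
    t_total = max(t_local, t_off)                    # both parts proceed in parallel
    charge = price * theta * task_size               # proportional to the offloaded size
    return t_local, t_off, t_total, charge


if __name__ == "__main__":
    rate = transmit_rate(bandwidth_hz=1e6, tx_power=3.0, channel_gain=1.0, noise_power=1.0)
    print(latency_and_charge(0.5, 8e6, 2e6, 4e6, 1e-6, rate))
```

With these toy numbers the offloaded branch is slower than the local branch, so the total time is set by the offloading latency.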
For the users, there are two important metrics of interest in the MEC-based wireless network, and we try to minimize both the latency and the charge to reduce each user's cost. Improving the users' performance is therefore a multi-objective optimization problem, which is difficult to solve in practice. In addition, users may face an urgent task in some scenarios and may prefer to pay less in others. We therefore use a weighted factor λ to turn the multi-objective optimization problem into a linearly weighted objective function. The linearly weighted objective function is given by Eq. (8), where the weighted factor λ ∈ [0, 1] weights the latency against the charge, and the resulting value is the total cost for the n-th user to complete its computational task. The weighted factor λ not only simplifies the multi-objective optimization problem into a single-objective optimization problem, but also enables Eq. (8) to apply to more scenarios. In particular, when the value of λ becomes larger, the latency becomes dominant in the optimization problem. In contrast, when the value of λ becomes small, the users need to compute the task locally as much as possible to reduce their total cost.
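The role of the weighted factor can be illustrated with a short sketch. The linear form below, with λ multiplying the latency and (1 − λ) multiplying the charge, is an assumption consistent with the statement that a larger λ makes the latency dominant; the function name `weighted_cost` is hypothetical.

```python
def weighted_cost(lam, t_total, charge):
    """Assumed linearly weighted cost: lam weights latency, (1 - lam) weights charge."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * t_total + (1.0 - lam) * charge


if __name__ == "__main__":
    # Sweep the weighted factor for a fixed latency and charge.
    for lam in (0.0, 0.5, 1.0):
        print(lam, weighted_cost(lam, t_total=3.0, charge=4.0))
```

Under this assumed form, the cost of a fixed offloading decision decreases with λ whenever the charge exceeds the total latency, and increases with λ otherwise.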

B. AP MODEL ANALYSIS
The AP with the MEC server earns revenue by assisting users in computing tasks. We assume that the AP's profit can be expressed as a function of the sum of the offloaded tasks, and we take the AP's price µ_m as the independent variable of the profit function. Accordingly, the AP's profit is formulated in Eq. (9), where the vectors θ = (θ_1, θ_2, ..., θ_N) and l = (l_1, l_2, ..., l_N) are the offloading decision list and the task size list of all users, respectively, and C_total(θ, l) is the total cost for the AP to compute all offloaded tasks. From Eq. (9), we find that the AP's profit is related to the price and the total offloaded tasks, and hence the AP's pricing directly affects its profit. If the price is too low, the AP's profit will decrease, while users tend to compute their tasks locally if the price is too high. Therefore, a dynamic pricing scheme should be applied to adapt to a variety of scenarios and make more profit for the AP.
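A minimal sketch of the AP's profit is given below. The revenue term (price times the total offloaded size) follows the assumption that the charge is proportional to the offloaded size. The cost term is a hypothetical linear stand-in for C_total(θ, l), whose actual form is not specified here, and `unit_cost` is an illustrative parameter.

```python
def ap_profit(price, thetas, task_sizes, unit_cost):
    """Revenue minus cost for the AP under assumed forms.

    unit_cost * offloaded is a hypothetical linear stand-in for C_total(theta, l).
    """
    if len(thetas) != len(task_sizes):
        raise ValueError("thetas and task_sizes must have the same length")
    offloaded = sum(theta * size for theta, size in zip(thetas, task_sizes))
    revenue = price * offloaded
    cost = unit_cost * offloaded
    return revenue - cost


if __name__ == "__main__":
    # Two users: one offloads half of its task, the other offloads everything.
    print(ap_profit(price=0.2, thetas=[0.5, 1.0], task_sizes=[2.0, 3.0], unit_cost=0.05))
```

The sketch shows why the price cannot be tuned in isolation: the profit depends on the offloading decisions, which in turn react to the price.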

III. PROBLEM OPTIMIZATION
In this section, we first analyze the objective functions of the users and the AP, from which we formulate the system optimization as a stochastic game problem. The Win or Learn Fast Policy Hill Climbing (WoLF-PHC) method is applied to solve the stochastic game problem. Moreover, we apply some array signal processing schemes to further enhance the system performance of MEC networks.

A. STOCHASTIC GAME FORMULATION
In this system, each user wants to minimize the total cost of completing its computational task by designing the offloading decision θ_n based on the computational capability and the AP's price. Meanwhile, the AP wants to increase its profit by changing prices. Therefore, the system model can be described by two optimization problems. In problem P1, the AP maximizes its profit in Eq. (9) by selling the computational capability to users at the price µ_m. In problem P2, each user minimizes its own cost in Eq. (8) by choosing the optimal offloading decision θ_n for a given price µ_m. Note that P1 and P2 are coupled in a complicated way: the AP's pricing strategy affects the offloading decision of each user, and each user's offloading decision θ_n in turn influences the AP's profit. Hence, P1 and P2 can be described as a stochastic game problem.
VOLUME 8, 2020
A stochastic game problem can be described as a multi-agent reinforcement learning problem with a known reward matrix. However, the multi-agent problem suffers from the "moving target" difficulty: the environment seen by each agent changes whenever another player changes its offloading decision. Each player is not able to control the other players or even know their next state, i.e., each user can only obtain the price of the AP, but it cannot obtain the offloading decisions of the other users.
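Because each user observes only the AP's price, its side of the game can be illustrated as a best response computed from the price alone. The sketch below uses a plain grid search over the offloading decision, which is a swapped-in baseline for illustration and not the learning method used in this paper. The latency, charge and weighted-cost forms are assumptions inferred from the verbal model, and all names and numbers are hypothetical.

```python
def best_response(lam, task_size, local_cap, ap_cap, price, rate, grid=101):
    """Grid-search the offloading decision that minimizes the assumed weighted cost."""
    best_theta, best_cost = 0.0, float("inf")
    for i in range(grid):
        theta = i / (grid - 1)
        t_local = (1.0 - theta) * task_size / local_cap
        t_off = theta * task_size / rate + theta * task_size / ap_cap
        charge = price * theta * task_size
        cost = lam * max(t_local, t_off) + (1.0 - lam) * charge
        if cost < best_cost:
            best_theta, best_cost = theta, cost
    return best_theta, best_cost


if __name__ == "__main__":
    # Hypothetical price menu: (price, AP capability) pairs in normalized units.
    menu = [(0.1, 1.0), (0.2, 2.0), (0.5, 4.0)]
    for price, ap_cap in menu:
        theta, cost = best_response(0.5, 1.0, 0.7, ap_cap, price, rate=4.0)
        print(price, ap_cap, theta, round(cost, 4))
```

With λ = 0 the search returns θ = 0 (compute everything locally), and with λ = 1 it returns the θ that balances the local and offloaded latencies, which matches the qualitative behavior described for the weighted factor.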
To solve the multi-agent optimization problem, we apply the Win or Learn Fast Policy Hill Climbing (WoLF-PHC) algorithm. WoLF-PHC extends Policy Hill Climbing (PHC) with the "Win or Learn Fast" principle, and PHC is a reinforcement learning algorithm that extends Q-learning by increasing the selection probability of the action with the maximum expected value. As the name implies, each agent determines whether it is currently winning or losing. It uses a low learning rate when it is winning, and it learns quickly otherwise.
In the following, we introduce the details of WoLF-PHC in multi-agent MEC networks. WoLF-PHC can be applied to multi-agent stochastic game scenarios because it combines Q-learning and PHC. As shown in Fig. 2, for each agent, there are mainly three parts in WoLF-PHC: the environment, Q-learning and PHC. Each agent selects an action according to the selection probabilities maintained in the PHC model, which favor the action with the maximum expected value. Then, each agent obtains the reward value and the next state from the environment model for the selected action. Finally, each agent updates the Q-table in the Q-learning model by using the action, the reward and the next state. The detailed description of WoLF-PHC is given in Algorithm 1.
Algorithm 1 WoLF-PHC
for each agent i in all agents do
(a) From state s, select action a with probability π_i(s, a), with some exploration.
(b) Observe the reward r and the next state s', and update the Q-table by Eq. (12).
(c) Update the estimate of the average policy π̄ by Eq. (13).
(d) Update π_i(s, a) by Eq. (14) and constrain it to a legal probability distribution, with the learning rate δ selected by Eq. (15).
end for
In this algorithm, the Q-values are stored and updated in the same manner as in Q-learning, which is described by Eq. (12). Instead of using the action with the highest Q-value as the response for a given state, a probabilistic policy π is used, which follows a selection probability function. The selection probability function consists of one probability per action. As the agent takes actions, the selection probability function is modified by Eq. (14), where A_i is the action set of agent i. In WoLF-PHC, the learning rates δ_w and δ_l are the two candidate values of the algorithm's learning rate δ, and the rule for selecting δ is given by Eq. (15). The estimate of the average policy π̄(s, a) is used to judge whether agent i is currently winning or not. It depends on the number of times C(s) that the current state s has been visited, and it is updated by Eq. (13). In addition, WoLF-PHC is rational, since only the rate of the learning process is altered. This modification provides additional time for the other players to adapt to the agent's changes in the same environment.
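The steps of Algorithm 1 can be sketched for a single agent as follows. This is a sketch of the standard WoLF-PHC update rules and not necessarily the exact form of Eqs. (12)-(15). The class name, the discount factor `gamma`, the exploration rate `epsilon`, and the integer encoding of states and actions are assumptions made for illustration.

```python
import random


class WoLFPHCAgent:
    """One WoLF-PHC agent with tabular Q-values (illustrative sketch)."""

    def __init__(self, n_states, n_actions, alpha=0.8, gamma=0.9,
                 delta_win=0.1, delta_lose=0.5, epsilon=0.1, seed=None):
        if n_actions < 2:
            raise ValueError("need at least two actions")
        self.n_actions = n_actions
        self.alpha, self.gamma = alpha, gamma
        self.delta_win, self.delta_lose = delta_win, delta_lose
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.pi = [[1.0 / n_actions] * n_actions for _ in range(n_states)]
        self.pi_avg = [[1.0 / n_actions] * n_actions for _ in range(n_states)]
        self.count = [0] * n_states

    def select_action(self, s):
        # Step (a): sample from the current policy, with some exploration.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        return self.rng.choices(range(self.n_actions), weights=self.pi[s])[0]

    def update(self, s, a, r, s_next):
        # Step (b): Q-learning update.
        target = r + self.gamma * max(self.q[s_next])
        self.q[s][a] = (1.0 - self.alpha) * self.q[s][a] + self.alpha * target
        # Step (c): running estimate of the average policy, using the visit count C(s).
        self.count[s] += 1
        for b in range(self.n_actions):
            self.pi_avg[s][b] += (self.pi[s][b] - self.pi_avg[s][b]) / self.count[s]
        # Learning rate: small when winning, large when losing.
        v_now = sum(p * q for p, q in zip(self.pi[s], self.q[s]))
        v_avg = sum(p * q for p, q in zip(self.pi_avg[s], self.q[s]))
        delta = self.delta_win if v_now > v_avg else self.delta_lose
        # Step (d): move probability mass toward the greedy action.
        best = max(range(self.n_actions), key=lambda b: self.q[s][b])
        for b in range(self.n_actions):
            if b != best:
                step = min(self.pi[s][b], delta / (self.n_actions - 1))
                self.pi[s][b] -= step
                self.pi[s][best] += step


if __name__ == "__main__":
    agent = WoLFPHCAgent(n_states=1, n_actions=3, gamma=0.0, seed=0)
    rewards = [0.0, 1.0, 0.2]  # toy single-state environment
    for _ in range(2000):
        action = agent.select_action(0)
        agent.update(0, action, rewards[action], 0)
    print(agent.pi[0])
```

Because each step removes at most the probability mass an action currently holds, the policy remains a legal probability distribution without a separate projection step.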

B. ARRAY SIGNAL PROCESSING
In this system, all users have multiple antennas, and hence the channel gain can be exploited by array signal processing. Users need to offload a part of the task to the AP over the wireless link in order to reduce their total cost, and they can choose among different multi-antenna schemes to improve the channel gain from the users to the AP. Each user is equipped with K antennas, and a simple method to exploit the multiple antennas is the random antenna selection (RAS) scheme, in which each user randomly chooses only one antenna among the K ones. Accordingly, the transmit rate at which the offloaded task is transmitted from user n to the AP follows Eq. (3) with h_n replaced by h_n,k, where h_n,k is the zero-mean complex Gaussian channel parameter when user n selects antenna k to communicate with the AP. In addition, other schemes are exploited at the users in this subsection. Generally, the selection combining (SC) method can improve the equivalent channel gain, thereby increasing the transmit rate. With SC, user n selects the antenna with the largest channel gain, so the transmit rate is obtained from Eq. (3) with the channel gain replaced by the largest gain among the K antennas. The maximum ratio transmission (MRT) is another scheme to improve the transmit rate of the users. With MRT, all K antennas are used jointly when users offload tasks to the AP, so the transmit rate is obtained from Eq. (3) with the channel gain replaced by the combined gain of the K antennas. This method can significantly improve the users' transmit rate at the cost of more RF chains.
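The three schemes can be compared with a short sketch. It assumes a Shannon-type rate and the usual effective gains: one randomly chosen |h_n,k|^2 for RAS, the largest |h_n,k|^2 for SC, and the sum of all |h_n,k|^2 for MRT under a fixed total transmit power. The function names and numerical values are hypothetical.

```python
import math
import random


def sample_channel_gains(num_antennas, variance, rng):
    """Draw |h_{n,k}|^2 per antenna; for a CN(0, variance) channel it is exponential with mean variance."""
    return [rng.expovariate(1.0 / variance) for _ in range(num_antennas)]


def effective_gain(gains, scheme, rng=None):
    """Effective channel gain under each scheme (assumed standard forms)."""
    if scheme == "RAS":
        return (rng or random).choice(gains)  # one antenna picked at random
    if scheme == "SC":
        return max(gains)                     # antenna with the largest gain
    if scheme == "MRT":
        return sum(gains)                     # all antennas combined coherently
    raise ValueError("unknown scheme: " + scheme)


def transmit_rate(bandwidth_hz, tx_power, gain, noise_power):
    """Assumed Shannon-type rate for a given effective gain."""
    return bandwidth_hz * math.log2(1.0 + tx_power * gain / noise_power)


if __name__ == "__main__":
    rng = random.Random(1)
    gains = sample_channel_gains(num_antennas=2, variance=1.0, rng=rng)
    for scheme in ("RAS", "SC", "MRT"):
        gain = effective_gain(gains, scheme, rng)
        print(scheme, transmit_rate(20e6, 1.0, gain, 0.1))
```

For any channel realization the effective gains satisfy RAS ≤ SC ≤ MRT, so the transmit latency of the offloaded part can only shrink in that order.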

IV. SIMULATION RESULTS
In the simulations, we evaluate the proposed multi-agent game algorithm with different antenna selection schemes. There are 3 users in this system, and their computational capabilities are set to 0.7 × 10^9 cycle/sec, 0.6 × 10^9 cycle/sec and 0.7 × 10^9 cycle/sec, respectively. Moreover, each user is equipped with two antennas, and the size of the computed task l_n is in the range of [2, 3] megabytes. Different pricing schemes are used, and there are three prices corresponding to three computational capabilities. The three prices are set to 0.1, 0.2 and 0.5, respectively, and the three computational capabilities are set to 1 × 10^9 cycle/sec, 2 × 10^9 cycle/sec and 4 × 10^9 cycle/sec, respectively. This means that users need to pay a higher charge when they choose a more powerful computational capability.
If not specified otherwise, we use an equal bandwidth of 20 MHz for each price. In the WoLF-PHC algorithm, α is equal to 0.8, and δ_w and δ_l are set to 0.1 and 0.5, respectively. Fig. 3 and Fig. 4 show the convergence of the algorithm. We present how the weighted cost of each user and the AP's profit vary with the number of iterations in the WoLF-PHC algorithm, where we set the weighted factor λ to 0.5. From Fig. 3, we can find that the overall trend of the weighted cost is decreasing, while the profit of the AP rises with fluctuations, as shown in Fig. 4. All curves converge after more than 2500 iterations. From these results, we can see that WoLF-PHC can be used to solve the multi-agent game problem efficiently.
In Fig. 5, the weighted cost of each user is examined with respect to the weighted factor λ, which varies from 0 to 1. Some other offloading schemes are compared with the proposed WoLF-PHC. There, "All-Offloading (m = 1)", "All-Offloading (m = 2)" and "All-Offloading (m = 3)" mean that users 1, 2 and 3 offload the whole task to the AP with θ_n = 1, where the AP prices are set to µ_1, µ_2 and µ_3, respectively. In addition, we consider the offloading scheme where each user computes the whole task locally, denoted by "All-Local". From these results, we find that the offloading decision "WoLF-PHC" has a smaller weighted cost than the other offloading decisions when the weighted factor λ ∈ [0.2, 1]. In contrast, the weighted cost of "WoLF-PHC" is almost the same as that of "All-Local" when λ ∈ [0, 0.1] and n ∈ [1, N]. The reason is that the computed task is non-urgent in this regime, so each user wants to pay the AP as little as possible and tends to execute the whole task locally. As λ increases, the latency dominates the weighted cost of the users, and accordingly, users prefer offloading the task to the AP to reduce the latency. The charge of user n is larger than T_total,n when the decision "All-Offloading (m = 3)" is used, and therefore the curve "All-Offloading (m = 3)" decreases as λ increases. In addition, the curve "WoLF-PHC" increases with λ when λ ∈ [0, 0.6]. This indicates that the latency dominates in the weighted cost.
In practical scenarios, the task may be urgent, and it should be completed within a prescribed time in MEC networks. We therefore examine whether the task of each user can be completed within a time limit, and we observe the influence of different time limits on the users' weighted cost and the AP's profit. In order to examine the problem more broadly, we add a fourth price and computational capability for the AP, namely µ_4 = 0.8 and ξ_4 = 8 × 10^9, respectively. The simulation results are shown in Fig. 7. From the results, we can see that the users' weighted cost increases when the time limit decreases. This implies that each user needs to increase θ_n in order to complete its own task within the time limit. Accordingly, the users' total cost and the AP's profit increase.
FIGURE 7. Impact of the time limit on the weighted cost for each user, where λ is equal to 0.5.
In addition, the weighted cost is presented in Fig. 6 with different antenna selection schemes versus the weighted factor λ. As described in Sec. III, RAS, SC and MRT are employed, and the antenna selection schemes mainly affect the transmit latency. There, we set the number of antennas to 2 and set the weighted factor to 0.5. From Fig. 6, we can find that MRT has the smallest weighted cost, and SC outperforms RAS in the total cost for all users when the weighted factor λ varies in [0.3, 1.0]. In contrast, the three antenna selection schemes show similar performance when λ ∈ [0.0, 0.2]. This is because the charge is dominant and the task tends to be computed locally instead of by the AP. In Eq. (6), the total time required to complete the task l_n is the maximum of T_local,n and T_off,n for each user, while the antenna selection schemes can only help reduce the transmission latency. Therefore, the antenna selection schemes cannot affect the system performance when T_local,n > T_off,n.
Finally, we examine the impact of bandwidth on MEC networks. We set different bandwidth schemes for the different prices of the AP. A simple bandwidth allocation scheme is to allocate the bandwidth equally across the prices, where the bandwidth corresponding to prices [µ_1, µ_2, µ_3] is [20, 20, 20] MHz. Another scheme is to allocate the bandwidth based on the prices, where the bandwidth corresponding to prices [µ_1, µ_2, µ_3] is [10, 20, 30] MHz. For simplicity, we denote these two schemes by "E-WoLF-PHC" and "P-WoLF-PHC", respectively. From Figs. 8 and 9, we find that the scheme "P-WoLF-PHC" performs better when the tasks are urgent. This is because users need a more powerful computational capability and the AP sets higher prices for users when the weighted factor λ increases. Meanwhile, users tend to offload the whole task to the AP once the weighted factor λ reaches 0.8 in Fig. 9. Hence, the AP's profit does not increase when λ varies from 0.8 to 1.0.

V. CONCLUSION
In this paper, we investigated MEC networks for intelligent information services, where there are N users equipped with K antennas and one AP. The users had some computational tasks, and some of them could be decoupled by the AP, at the cost of a fee charged by the AP. For the considered system, we first considered two important metrics of interest: latency and fee. Then, we formulated a stochastic game to model the interaction between the users and the AP. In this game, the AP set prices to maximize its profit, while the users devised the offloading strategy to reduce both the latency and the charge. We further optimized the system by applying array signal processing schemes at the users, in order to reduce the transmission latency. Simulation results were finally presented to verify the effectiveness of the stochastic game, and it was shown that the array signal processing schemes could help reduce the transmission latency significantly. In future work, we will apply the considered MEC networks to IoT-based systems such as those in [45]-[47]. Moreover, we will consider applying some other intelligent algorithms [48]-[54] to the considered system, in order to further enhance the system performance by reducing the latency and energy consumption.