Artificial Intelligent Multi-Access Edge Computing Servers Management

The advances of multi-access edge computing (MEC) have paved the way for the integration of MEC servers, as intelligent entities, into the Internet of Things (IoT) environment as well as into 5G radio access networks. In this paper, a novel artificial intelligence-based MEC servers' activation mechanism is proposed by adopting the principles of Reinforcement Learning (RL) and Bayesian Reasoning. The considered problem concerns the MEC servers' activation decision-making, which aims at enhancing the reputation of the overall MEC system while accounting for the total computing cost of efficiently serving the users' computing demands and, at the same time, guaranteeing the satisfaction of their Quality of Experience (QoE) prerequisites. Each MEC server decides autonomously whether it will be activated or remain in sleep mode by utilizing the theory of Bayesian Learning Automata (BLA). A human-driven, peer-review-based evaluation of the services provided by the edge computing system is also introduced based on the concept of Bayesian Truth Serum (BTS), which supports the development of a reputation mechanism for the MEC servers' provided services. The satisfaction that the intelligent MEC servers derive from their autonomous decisions is captured via a holistic utility function, which they aim to maximize in a distributed manner. Finally, detailed numerical results, obtained via modeling and simulation, highlight the key operation features and the superiority of the proposed framework.


I. INTRODUCTION
With the advent of the Internet of Things (IoT) and 5G networks, the number of mobile devices, such as smartphones, wearable devices, tablet computers, and others, has dramatically increased, resulting in 8.2 billion mobile subscriptions in 2020 [1]. Moreover, the mobile applications running on these devices, such as social networking, video-based applications, and online gaming, request services with stringent delay, energy, and processing constraints. Such applications are typically resource-hungry in terms of computation demand and energy consumption; thus, the mobile devices cannot support them locally and offload them to other available computing resources [2]. The current end-to-end computing continuum consists of multi-access edge, fog, and cloud computing.
The associate editor coordinating the review of this manuscript and approving it for publication was Shiwen Mao.
The novel concept of multi-access edge computing (MEC) brings the computing capability closer to the end users by deploying modest-size MEC servers at the edge of the radio access networks. Compared to cloud computing, MEC can significantly improve the data offloading delay, the corresponding energy consumption of the mobile devices, and the security levels, given that the users' offloaded tasks travel shorter distances. A large part of the recent literature has focused on the problem of the users' optimal data offloading (or, equivalently, task offloading) by proposing either centralized approaches exploiting the softwarization of the data offloading decision-making process [3] or distributed approaches based on optimization [4], game-theoretic [5], or learning methods [6] to support the users' autonomous decision-making.
However, limited research efforts have been devoted to the problem of activating a sufficient number of MEC servers, either from the users' perspective, i.e., satisfying the users' Quality of Service (QoS) and Quality of Experience (QoE) prerequisites [7], or from the perspective of the service providers that own the MEC servers, i.e., maximizing their techno-economic benefit or profit. Consequently, in this paper, the problem of distributed and autonomous MEC servers' management is studied based on an artificial intelligence-enabled framework that is jointly driven by the end users' QoE satisfaction and the computing service providers' reputation.

A. RELATED WORK
In this section, we give a brief overview of related research works on MEC servers' management in terms of their activation and the allocation of their computing capability to the end users. The problem of managing the computing capabilities of the servers attracted the researchers' interest early on. In 2007, the problem of non-instantaneous server activation (i.e., the non-zero time to replicate and activate a server) was studied in [8]. The authors introduce a dynamic approach to share the servers' resources among multiple users, while considering hysteresis control to reduce the servers' activation and deactivation cost when momentary fluctuations of the workload occur. In [9], the authors introduce a two-timescale approach, where the servers' activation decision occurs at a slow time scale, while the allocation of the servers' computing capacity to the users' offloaded tasks is made at a faster time scale based on the servers' power scaling criterion. The resulting two-timescale joint job scheduling and servers' management problem is addressed via stochastic optimization. An energy saving-driven approach is proposed in [10], where the authors formulate a minimization problem of the MEC servers' total energy consumption under the constraints of the users' QoS prerequisites and mobility patterns. The solution of this problem yields the set of MEC servers to be activated based on the users' offloading task request profiles.
In [11], a three-layer computing architecture is introduced consisting of the user, edge computing, and cloud computing layers. Focusing on the edge computing plane, the authors propose a hierarchical structure of the geo-distributed activated MEC servers to aggregate the users' offloaded tasks, efficiently exploit the MEC servers' resources, and manage the workloads during peak hours. The latter challenge is also addressed in [12], where virtualized network functions are strategically allocated to the available servers in order to minimize the energy consumption of the computing and networking infrastructure, as well as to improve the protection level against resource demand uncertainty. In [3], a software-defined networking approach is introduced to jointly manage the users' task offloading and the exploitation of the MEC servers' resources based on a game-theoretic and a reinforcement learning approach, respectively. A similar approach is proposed in [13], by adopting a deep reinforcement learning model, while considering that the users may be reluctant to reveal personal information about their MEC server selection preferences and the computing demands of their offloaded tasks.
A centralized approach for the MEC servers' management problem is introduced in [14], where a regional orchestrator manages the servers' activation and the distribution of workloads towards guaranteeing the users' QoS prerequisites by formulating the problem as a stochastic overlapping coalition formation game. Focusing on the energy-efficient operation of the MEC servers, the authors in [15] use a Lyapunov optimization approach to determine the MEC servers' energy harvesting policy from the environment, jointly with the task offloading scheduling. Following the philosophy of the MEC servers' energy-efficient operation, a dynamic power management scheme is introduced in [16] to decide when and for how long each MEC server should be in sleep mode in order to minimize the system's energy consumption without compromising the satisfaction of the users' QoS prerequisites. A scalability analysis of the MEC system is discussed in [17] for an increasing number of users and offloaded tasks. The authors adopt a Particle Swarm Optimization method in order to decide on the number of activated MEC servers, while lowering the users' serving delay and maximizing the MEC system's cost effectiveness.
A minority game theoretic approach is introduced in [18], [19] via considering the total number of MEC servers that should be activated to serve the users and allowing the MEC servers to autonomously decide their activation in order to respect the aforementioned constraint by following the theory of minority games. This work has been further extended in [20] via considering the problem of users' association to the MEC servers and introducing a distributed reinforcement learning-based decision making process.

B. CONTRIBUTIONS & OUTLINE
Despite the efforts made in the previous works regarding the MEC servers' management, how to incorporate the end users' personalized Quality of Experience satisfaction in the MEC servers' activation decision making remains an open issue. Moreover, capturing the MEC servers' reputation, in order to facilitate the infrastructure/service providers' penetration in the computing market, is even more challenging. In this work, we strive to tackle these issues. In detail, the design goal is to capture the end users' evaluation of the edge computing system and of the activated MEC servers that process their offloaded computing tasks in terms of satisfying their QoE prerequisites, to create a reputation model for the MEC servers, and to devise a distributed and autonomous MEC servers' activation mechanism based on the principles of artificial intelligence. The main contributions of this work, which differentiate it from the rest of the literature, are summarized below.
1) A human-driven evaluation of the edge computing system (i.e., activated MEC servers) is introduced in regards to the services that they provide to the end users, based on the theory of Bayesian Truth Serum. The proposed approach supports the extraction of an objective evaluation from subjective data provided by the users.

2) A reputation scheme is proposed based on a novel Bayesian model to quantify the reputation of each activated MEC server in the computing market, based on the Quality of Service that it provides to the end users and on the latter's subjective QoE satisfaction.

3) An artificial intelligence-based MEC servers' activation mechanism is devised based on the theory of Bayesian Learning Automata. The MEC servers decide whether to be activated or not in a distributed and autonomous manner, towards improving their reputation in the overall edge computing system, while considering their cost to serve the users' computing demands. A distributed and low-complexity algorithm is also designed that converges to the Bayesian Nash equilibrium, which is a stable operation point for the whole edge computing system.

4) A series of experiments is performed to evaluate the performance of the overall MEC servers' management scheme, in terms of the extraction of the objective evaluation of the MEC servers' provided computing services based on the users' subjective data, the reputation scheme for the MEC servers, and their autonomous decision making regarding their activation. Also, a detailed comparative evaluation against alternative MEC servers' management schemes demonstrates the superiority and benefits of the proposed framework.
The rest of the paper is organized as follows. The system model is introduced in Section II, while the users' evaluation mechanism of the MEC servers is discussed in Section III. Section IV introduces the MEC servers' reputation model and the artificial intelligence-based MEC servers' activation is analyzed in Section V. Simulation results are investigated in Section VI, while Section VII concludes the paper.

II. SYSTEM MODEL
The considered system consists of a set of users $U$ and a set of MEC servers $S$. At each time slot $t$, each MEC server $s \in S$ has a computation capability $F_s^{(t)}$ [CPU cycles/sec] that is allocated to the users' offloaded computing tasks in order to process them. The MEC server's reputation is gained based on its capability to serve the users and satisfy their Quality of Experience prerequisites. More information regarding the MEC servers' reputation scheme is provided in Section IV.
Each user $u \in U$ offloads a task $T_u^{(t)}$ at time slot $t$, whose parameter $\phi_u^{(t)}$ expresses the computation complexity of the task requested by the user. Its value depends on the nature of the application, i.e., a larger value of $\phi_u^{(t)}$ expresses a more computationally intensive task. The problem of the users' association to the available MEC servers is not addressed in this paper; instead, an approach similar to that of our previous works [21], [22] is adopted, which considers the users' computation tasks and personal characteristics in order to select a MEC server. Thus, in the rest of the analysis, we consider that each user's tasks are offloaded to the edge computing system and distributed among all the activated MEC servers following an intelligent software-defined orchestration mechanism [23], [24].
Eventually, each MEC server $s \in S$ decides in an autonomous and distributed manner whether it will offer its computation capability to the users' tasks execution or not, i.e., $a_s^{(t)} \in \{0, 1\}$, and we define $S^{(t)} = \{s \in S : a_s^{(t)} = 1\}$ to be the set of the activated MEC servers at a specific time slot $t$. The adopted system model, as well as the overall architecture of the proposed artificial intelligent multi-access edge computing servers management system, is presented in Fig. 1.

III. HUMAN-DRIVEN COMPUTING SYSTEM EVALUATION
In this section, a human-driven peer-review-based evaluation of the edge computing system and the corresponding MEC servers is introduced with regard to the services that they provide to the end users, based on the theory of Bayesian Truth Serum. At the end of each time slot $t$, each user $u \in U$ evaluates the QoE perceived from the activated MEC servers, which are all responsible for the execution of its task $T_u^{(t)}$ at time slot $t$ in a distributed manner. In order to assess how truthful the users' evaluations are, we adopt the concept of Bayesian Truth Serum (BTS), which elicits an overall objective evaluation from subjective data when the ground truth is unknown, while being strictly Bayes-Nash incentive compatible for $|U| \to \infty$ [25]. Specifically, at time slot $t$, user $u$ answers a binary question, i.e., "Are you satisfied with the service provided by the activated MEC servers $S^{(t)}$?", by providing the following two evaluation reports:
• The information report $x_u^{(t,S^{(t)})} = (x_{u,0}^{(t,S^{(t)})}, x_{u,1}^{(t,S^{(t)})})$, where $x_{u,i}^{(t,S^{(t)})} \in \{0, 1\}$ equals 1 if $i$ is the user's personal answer, with $i = 1$ denoting "YES" and $i = 0$ denoting "NO".
• The prediction report $y_u^{(t,S^{(t)})} = (y_{u,0}^{(t,S^{(t)})}, y_{u,1}^{(t,S^{(t)})})$, where $y_{u,0}^{(t,S^{(t)})}$ denotes the prediction regarding the fraction of the users whose answer is "NO" and $y_{u,1}^{(t,S^{(t)})}$ is the prediction for the fraction of the users whose answer is "YES".
Based on the users' information and prediction reports, the population endorsement frequencies $\bar{x}_i^{(t,S^{(t)})}$ (Eq. 1) and the geometric mean of the users' population predictions $\bar{y}_i^{(t,S^{(t)})}$ (Eq. 2) for each of the answers $i \in \{0, 1\}$ are calculated as follows:
$$\bar{x}_i^{(t,S^{(t)})} = \frac{1}{|U|} \sum_{u \in U} x_{u,i}^{(t,S^{(t)})} \tag{1}$$
and
$$\log \bar{y}_i^{(t,S^{(t)})} = \frac{1}{|U|} \sum_{u \in U} \log y_{u,i}^{(t,S^{(t)})} \tag{2}$$
Consequently, the BTS score $sc_u(x_u^{(t,S^{(t)})}, y_u^{(t,S^{(t)})})$ of each user $u$, which depicts how truthful its personal answer was, is calculated based on the formula:
$$sc_u = \sum_{i \in \{0,1\}} x_{u,i}^{(t,S^{(t)})} \log \frac{\bar{x}_i^{(t,S^{(t)})}}{\bar{y}_i^{(t,S^{(t)})}} + \alpha \sum_{i \in \{0,1\}} \bar{x}_i^{(t,S^{(t)})} \log \frac{y_{u,i}^{(t,S^{(t)})}}{\bar{x}_i^{(t,S^{(t)})}} \tag{3}$$
where the first part of Eq. 3 is the information score and the second part is the prediction score of the user. The information score increases if an answer $i \in \{0, 1\}$ is surprisingly common, i.e., if the mean endorsement population frequency $\bar{x}_i^{(t,S^{(t)})}$ is higher than the corresponding mean prediction $\bar{y}_i^{(t,S^{(t)})}$. The surprisingly common criterion is based on the Bayesian reasoning principle, which states that a user believes that the rest of the population will underestimate its personal opinion, and thus reports a higher prediction for that answer in order to further support it.
As a result, according to the Bayesian argument [26], the users that are truthful regarding their personal information reports $x_u^{(t,S^{(t)})}$ should also provide higher predictions for their own answer, which leads to the conclusion that a truthful user's opinion is more likely to be surprisingly common. Moreover, it should be noted that the prediction score (second term of Eq. 3) acts as a penalty proportional to the Kullback-Leibler divergence [27] between the actual population endorsement frequencies of the answers $\bar{x}_i^{(t,S^{(t)})}$ and the respective user's population predictions $y_{u,i}^{(t,S^{(t)})}$. The physical meaning of this observation is that the optimal prediction score is achieved when the user's prediction is equal to the actual mean frequency of the answer $i$ (absolute accuracy), i.e., $y_{u,i}^{(t,S^{(t)})} = \bar{x}_i^{(t,S^{(t)})}$. In this case, the prediction score is equal to 0, since the user experiences a zero penalty.
The parameter $\alpha > 0$ in Eq. 3 controls the effect of the prediction score (penalty) on the total BTS score, i.e., the higher the value of $\alpha$, the higher the contribution of the prediction score to the BTS score and the lower the overall BTS score. The physical meaning of the parameter $\alpha$ is that it fine-tunes the importance of the prediction error: a potential minority of the users, who may be more distant from the activated MEC servers, will experience a worse QoE satisfaction due to the increased latency than the majority of the users, who are closer and thus satisfied. As a consequence, the prediction of the unsatisfied users may not be representative of the overall service provided by the activated MEC servers, and thus they should not be strictly penalized because of the inaccurate prediction. As a next step, the average BTS score $\bar{u}_i^{(t)}$ of each answer $i \in \{0, 1\}$ is calculated over the users who endorsed it:
$$\bar{u}_i^{(t)} = \frac{1}{|U| \, \bar{x}_i^{(t,S^{(t)})}} \sum_{u \in U} x_{u,i}^{(t,S^{(t)})} \, sc_u \tag{4}$$
Finally, the most truthful answer $x_{BTS}^{(t,S^{(t)})}$ regarding the activated MEC servers' provided services at time slot $t$ is the answer that has the highest average BTS score, i.e.:
$$x_{BTS}^{(t,S^{(t)})} = \arg\max_{i \in \{0,1\}} \bar{u}_i^{(t)} \tag{5}$$
Since all the activated MEC servers contribute to the users' tasks execution in a distributed manner, the outcome of Eq. 5 determines whether the service provided by the activated MEC servers $S^{(t)}$ satisfied ($x_{BTS}^{(t,S^{(t)})} = 1$) or not ($x_{BTS}^{(t,S^{(t)})} = 0$) the users' QoE prerequisites, based on the corresponding users' subjective evaluation.
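The BTS scoring steps of this section can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the four-user toy data, the natural logarithm, and the default $\alpha = 1$ are assumptions made only for this example, and every prediction is assumed to be strictly positive and to sum to 1.

```python
import math

def bts_scores(answers, predictions, alpha=1.0):
    """Return (x_bar, y_bar, scores) for binary answers in {0, 1}.

    answers[u]     -- personal answer of user u (0 = "NO", 1 = "YES")
    predictions[u] -- pair (y_u0, y_u1), each strictly between 0 and 1
    """
    n = len(answers)
    # Eq. 1: population endorsement frequency of each answer.
    x_bar = [sum(1 for a in answers if a == i) / n for i in (0, 1)]
    # Eq. 2: geometric mean of the users' predictions for each answer.
    y_bar = [math.exp(sum(math.log(p[i]) for p in predictions) / n)
             for i in (0, 1)]
    scores = []
    for a, p in zip(answers, predictions):
        # Information score: positive when the user's answer is
        # more common than the population predicted.
        info = math.log(x_bar[a] / y_bar[a])
        # Prediction score: a penalty that is zero only when the
        # prediction equals the actual endorsement frequencies.
        pred = sum(x_bar[i] * math.log(p[i] / x_bar[i])
                   for i in (0, 1) if x_bar[i] > 0)
        scores.append(info + alpha * pred)  # Eq. 3
    return x_bar, y_bar, scores

def most_truthful_answer(answers, scores):
    """Eq. 4 and Eq. 5: average the scores per answer, pick the best."""
    averages = {}
    for i in (0, 1):
        group = [s for a, s in zip(answers, scores) if a == i]
        if group:
            averages[i] = sum(group) / len(group)
    return max(averages, key=averages.get), averages

# Hypothetical toy population: three "YES" users and one "NO" user.
answers = [1, 1, 1, 0]
predictions = [(0.4, 0.6), (0.5, 0.5), (0.3, 0.7), (0.6, 0.4)]
x_bar, y_bar, scores = bts_scores(answers, predictions)
best, averages = most_truthful_answer(answers, scores)
```

In this toy population the "YES" answer is endorsed by 75% of the users while its geometric-mean prediction is lower, so it is surprisingly common and wins the comparison of Eq. 5.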

IV. SERVERS REPUTATION SCHEME
In this section, we exploit the human-driven evaluation of the edge computing system, as presented in the previous section, in order to create a reputation scheme for the MEC servers, capturing their capability to serve the users and satisfy their QoE constraints. Towards capturing the MEC servers' capability to serve the users and the corresponding penetration of the infrastructure/service provider, who owns the MEC servers, into the computing market, we introduce the reputation $\mu_s^{(t)}$ for each activated MEC server $s$ at time slot $t$. A Bayesian model is devised, which features adverse selection based on the Bayesian updating of the users' belief [28].
Specifically, all users share the same prior belief distribution $\mu_0^{(s,t)} = \mu_0$, $\forall s \in S$, $\forall t$, regarding the QoE satisfaction that each MEC server $s$ can provide to them. Each MEC server $s$ can offer either a high or a low QoE satisfaction, with probabilities $a_H$ and $a_L$, respectively, where $0 < a_L < a_H < 1$. At every time slot $t$, the activated MEC servers that offer their computation services to the users are evaluated by the BTS mechanism (Section III), i.e., whether the users are overall satisfied ($x_{BTS}^{(t,S^{(t)})} = 1$) or not ($x_{BTS}^{(t,S^{(t)})} = 0$). We denote as $Q_s^{(t)}$ the number of times until time slot $t$ that the MEC server $s$ offered its services to the users and was evaluated as helpful ($x_{BTS}^{(t,S^{(t)})} = 1$), and as $F_s^{(t)}$ the number of times that the service of MEC server $s$ was evaluated as not helpful ($x_{BTS}^{(t,S^{(t)})} = 0$). Thus, the posterior reputation $\mu_s^{(t)}$ of each MEC server $s$ (Eq. 6) is obtained by the Bayesian updating of the prior $\mu_0$ with the counts $Q_s^{(t)}$ and $F_s^{(t)}$ [28].
Based on Eq. 6, we observe that there is a correlation between the history of the users' positive and negative evaluations of MEC server $s$, i.e., $Q_s^{(t)}$ and $F_s^{(t)}$, respectively, and its corresponding reputation. Specifically, the MEC server's reputation $\mu_s^{(t)}$ increases with respect to the positive received evaluations, i.e., $Q_s^{(t)}$, and decreases with respect to the negative evaluations, i.e., $F_s^{(t)}$.
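To illustrate the monotonic behavior described above, the following sketch applies Bayes' rule to a two-type model. It is one plausible reading of the reputation update and not necessarily the exact form of Eq. 6: it assumes that $a_H$ and $a_L$ are the probabilities that a high-QoE and a low-QoE server, respectively, receives a helpful evaluation, and that $\mu_0$ is the prior probability of the high type. The function and argument names are hypothetical.

```python
def posterior_reputation(mu0, a_high, a_low, helpful, not_helpful):
    """Posterior probability that a server is of the high-QoE type.

    Assumed two-type Bayesian update: `helpful` and `not_helpful` play
    the role of the counts of positive and negative BTS outcomes.
    """
    # Likelihood of the observed evaluation history under each type.
    like_high = a_high ** helpful * (1 - a_high) ** not_helpful
    like_low = a_low ** helpful * (1 - a_low) ** not_helpful
    return mu0 * like_high / (mu0 * like_high + (1 - mu0) * like_low)

# Values mu0 = 0.2, a_H = 0.51, a_L = 0.49 are those of Section VI.
no_history = posterior_reputation(0.2, 0.51, 0.49, 0, 0)
after_praise = posterior_reputation(0.2, 0.51, 0.49, 10, 0)
after_blame = posterior_reputation(0.2, 0.51, 0.49, 0, 10)
```

With no evaluation history the posterior equals the prior, each helpful evaluation raises it, and each unhelpful one lowers it, which matches the qualitative description of the reputation scheme.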

A. MEC SERVERS UTILITY FUNCTION
Following the presented reputation scheme for the activated MEC servers at each time slot $t$, in this section we formulate each MEC server's utility function, which captures the benefit of the MEC server from participating or not in the computing system and offering its computing capability to the users. Each MEC server $s$ acts as an artificial intelligent agent making decisions in an autonomous and distributed manner, thus determining whether it should provide its computation capability $F_s^{(t)}$, i.e., $a_s^{(t)} = 1$ (the server is activated), or not, i.e., $a_s^{(t)} = 0$ (the server remains in sleep mode), for the users' data processing at time slot $t$. A holistic utility function is introduced for each MEC server to capture its perceived benefit in terms of the rewards obtained by processing the users' offloaded data (based on the users' positive or negative evaluations). The utility function $u_s^{(t)}$ of MEC server $s \in S$ is formulated in Eq. 7, where $R_u^{(t)}$ denotes the overall reward which is offered by the user $u \in U$ at time slot $t$ for the computational services of the activated MEC servers, i.e., $\forall s \in S$ with $a_s^{(t)} = 1$. In case that $a_s^{(t)} = 1$, i.e., the MEC server $s$ is activated at time slot $t$, the first part of Eq. 7 expresses the portion of the total reward $\sum_{u \in U} R_u^{(t)}$ that MEC server $s$ will receive with respect to its reputation $\mu_s^{(t)}$: the higher the reputation of $s$ compared to the other activated servers $s' \in S^{(t)}$, the larger the portion of the reward that $s$ will receive from the users. The second term of the utility function depicts the operating computing costs of the activated MEC server $s$ in order to process parts of all the users' tasks. In case that $a_s^{(t)} = 0$, i.e., the MEC server remains in sleep mode, the experienced utility is 0, as the MEC server $s$ will neither receive a reward from the users nor spend its computational resources. The introduced utility function drives the MEC servers to select in an autonomous and distributed manner whether they will serve the users or not at a certain time slot $t$, by weighing the potential economic gains against the operating costs.
In order to formulate the overall reward $R_u^{(t)}$, $\forall u \in U$, that is provided by each user to the activated MEC servers, we adopt the concept of the economic sale price [28] based on the reputation $\mu_s^{(t)}$, as formulated in Eq. 6. Specifically, in order to depict in a holistic way the efficiency of the overall MEC environment in terms of serving the users' computation demands, we consider the average Bayesian reputation of the activated MEC servers $s \in S^{(t)}$, which offer their computational services to the users at time slot $t$, as follows:
$$\mu^{(t)} = \frac{1}{|S^{(t)}|} \sum_{s \in S^{(t)}} \mu_s^{(t)}$$
It is noted that if no MEC server is activated, we have $\mu^{(t)} = 0$. Thus, based on the average holistic Bayesian belief $\mu^{(t)}$, the users reward the activated MEC servers with a reward $R_u^{(t)}$ that follows the economic sale price of [28]. It is observed that the reward $R_u^{(t)}$ is an increasing function of the average reputation $\mu^{(t)}$, i.e., the better the overall posterior belief of the users regarding the QoE satisfaction provided by the MEC servers, the higher the reward offered by the users.
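The structure of the utility (a reputation-dependent share of the users' total reward minus an operating cost, and zero utility in sleep mode) can be sketched as follows. Several ingredients are assumptions made only for illustration and are not taken from the paper: the reward form `mu * a_high + (1 - mu) * a_low` (one common sale-price expression, chosen because it increases with the average reputation), the proportional sharing rule, the linear cost with the hypothetical coefficient `unit_cost`, and all function names.

```python
def average_reputation(reputations, active):
    """Average reputation of the activated servers (0 if none is active)."""
    values = [reputations[s] for s in active]
    return sum(values) / len(values) if values else 0.0

def user_reward(mu_avg, a_high, a_low):
    """Assumed sale-price reward: increasing in mu_avg since a_high > a_low."""
    return mu_avg * a_high + (1 - mu_avg) * a_low

def server_utility(s, reputations, active, n_users,
                   a_high, a_low, capability, unit_cost):
    """Sketch of a reward-share-minus-cost utility for server s."""
    if s not in active:
        return 0.0  # sleep mode: no reward and no operating cost
    mu_avg = average_reputation(reputations, active)
    total_reward = n_users * user_reward(mu_avg, a_high, a_low)
    # Assumed proportional split; reputations are positive whenever the
    # prior is positive, so the denominator does not vanish.
    share = reputations[s] / sum(reputations[k] for k in active)
    return share * total_reward - unit_cost * capability[s]

reputations = {0: 0.3, 1: 0.1}
capability = {0: 10.0, 1: 10.0}
u0 = server_utility(0, reputations, [0, 1], 500, 0.51, 0.49, capability, 0.1)
u1 = server_utility(1, reputations, [0, 1], 500, 0.51, 0.49, capability, 0.1)
```

Under these assumptions, two servers with equal computation capability differ in utility only through their reputation, so the better-reputed server collects the larger share.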

V. ARTIFICIAL INTELLIGENT SERVERS ACTIVATION
Towards realizing an autonomous and distributed decision-making mechanism, we propose a Learning Automata-based learning algorithm, where each MEC server $s$ operates as a Bayesian Learning Automaton (BLA) [29] in order to determine whether it should offer its computation capability $F_s^{(t)}$ for the users' offloaded data execution or not at a certain time slot $t$. The goal of each MEC server is to maximize its utility function, as presented in Eq. 7. Fig. 1 presents the overall procedure of the Bayesian Learning Automata, where the learning process unfolds throughout the iterations at time slot $t$ and, in every iteration $ite$, each BLA (i.e., MEC server $s$) selects an action $a_s^{(ite,t)} \in \{0, 1\}$. According to the feedback that the server $s$ receives from the MEC environment at iteration $ite$, it develops an intelligence, i.e., an action probability distribution, based on which it can make better decisions in the future, i.e., in the next iterations, that will lead it to receive higher rewards during the learning process. In the following subsections, we present the distributed algorithms regarding the process by which the MEC environment determines whether to reward or penalize the MEC servers for their decisions, and the Bayesian Learning Automata decision making process. It is noted that traditional Learning Automata schemes need a predefined (constant) value of the learning parameter, which controls the tradeoff between the learning speed and the learning accuracy. However, this assumption is unrealistic within the examined dynamic MEC environment. Thus, by adopting the Bayesian Learning Automata scheme, which does not require a constant learning parameter, we enable our framework to be dynamic and adaptable during the learning process.

A. BAYESIAN LEARNING AUTOMATA REWARD AND PENALTY FORMULATION
In this section, we present the process of determining if a MEC server $s \in S$, i.e., a Bayesian Learning Automaton, should be rewarded or penalized for its selected action $a_s^{(ite,t)}$. The MEC server $s$ is rewarded or penalized depending on the utility $u_s^{(ite,t)}$ (Eq. 7) that it experiences in an iteration $ite$. Specifically, if the MEC server decides to participate ($a_s^{(ite,t)} = 1$) in the users' offloaded data execution, but it provides a low computation capability $F_s^{(t)}$ [...] is evaluated according to that value. The BLA rewards and penalties calculation algorithm is executed in every iteration of a time slot of the decision-making learning process and is presented in Algorithm 1.
The complexity of the BLA rewards and penalties calculation algorithm is $O(|S|)$, since every MEC server $s$ determines whether it will be rewarded or penalized by performing algebraic calculations, and the complexity of the latter is constant, i.e., $O(1)$.

B. BAYESIAN LEARNING AUTOMATA DECISION MAKING
The BLA-based learning method, which operates under the philosophy of the Bayesian reasoning framework [30], is computationally efficient, since it is independent of any learning parameter (in contrast to other Learning Automata-based schemes). Thus, it is self-adaptive to dynamic changes of the MEC environment. Furthermore, it maintains a Beta probability distribution per action instead of storing an action probability vector, which would increase the memory and computation needs of the corresponding algorithm [31]. Every MEC server $s$ that acts as a BLA preserves two hyper-parameters $\alpha_{s,i}$ and $\beta_{s,i}$, which correspond to the number of times that $s$ received a reward and a penalty, respectively, by choosing action $i \in \{0, 1\}$. Those hyper-parameters form the Beta probability density function (PDF) as follows:
$$f(x; \alpha_{s,i}, \beta_{s,i}) = \frac{x^{\alpha_{s,i}-1} (1-x)^{\beta_{s,i}-1}}{\int_0^1 v^{\alpha_{s,i}-1} (1-v)^{\beta_{s,i}-1} \, dv}$$
thus leading to the respective cumulative Beta distribution function:
$$F(x; \alpha_{s,i}, \beta_{s,i}) = \frac{\int_0^x v^{\alpha_{s,i}-1} (1-v)^{\beta_{s,i}-1} \, dv}{\int_0^1 v^{\alpha_{s,i}-1} (1-v)^{\beta_{s,i}-1} \, dv} \tag{10}$$
where $x \in [0, 1]$. In every iteration $ite$ at a specific time slot $t$, if the decision $a_s^{(ite,t)}$ of MEC server $s$ leads to a reward ($E_{a_s^{(ite,t)}} = 1$), then the corresponding hyper-parameter $\alpha_{s,i}$ is increased, whereas if it leads to a penalty, the corresponding hyper-parameter $\beta_{s,i}$ is increased. Thus, the cumulative Beta distribution function (Eq. 10) serves as a Bayesian metric of the reward probabilities of each of the available actions $a_s^{(ite,t)} \in \{0, 1\}$. The steps of the BLA-based decision making process are presented in Algorithm 2, where convergence is our termination criterion: it becomes true when each MEC server $s$ that experiences a reward, i.e., $E_{a_s^{(ite,t)}} = 1$, chooses the same action $a_s^{(ite,t)}$ for a specific number of $K$ iterations, implying that a near-optimal solution has been determined with high probability.
Algorithm 2 (BLA-based decision making), main steps: until convergence, each MEC server $s = 1, \dots, |S|$ picks $x_{s,0}$ and $x_{s,1}$ from the Beta distributions of its two actions and participates ($a_s^{(ite,t)} = 1$) if $x_{s,0} \le x_{s,1}$, otherwise it remains in sleep mode; then, for each MEC server, the hyper-parameters of the selected action are updated according to whether a reward ($E_{a_s^{(ite,t)}} = 1$) or a penalty was received.
Regarding the complexity of the BLA-based decision making algorithm, for a specific time slot $t$ and a specific iteration $ite$, each MEC server $s$ determines its action $a_s^{(ite,t)}$, which involves only algebraic calculations needing constant time $O(1)$, thus leading to a total execution time of $O(|S|)$. Afterwards, Algorithm 1 is executed in order to determine all the rewards and penalties, with a corresponding complexity of $O(|S|)$ (Section V-A), and the hyper-parameters of all the MEC servers are updated, with a complexity of $O(|S|)$. Thus, the total complexity of Algorithm 2 for time slot $t$ is $O(|S| \cdot Ite^{(t)})$, where $Ite^{(t)}$ is the total number of iterations until convergence in that time slot. The overall complexity of Algorithm 2 for all the $T$ time slots over which the MEC system is examined is $O(T \cdot (|S| \cdot \max_{t \in [1,T]} \{Ite^{(t)}\}))$.
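The decision and update steps of a single Bayesian Learning Automaton can be sketched as follows. This is a simplified, single-server illustration under stated assumptions: the hyper-parameters start at 1 (a uniform prior), the reward signal comes from a hypothetical `environment` callback that stands in for Algorithm 1, and convergence is declared when the same action is rewarded in `k` consecutive iterations. The class and function names are illustrative only.

```python
import random

class BayesianLearningAutomaton:
    """Two-action BLA: index 0 = sleep, index 1 = participate."""

    def __init__(self, rng):
        self.rng = rng
        self.alpha = [1, 1]  # reward counts per action (assumed start at 1)
        self.beta = [1, 1]   # penalty counts per action (assumed start at 1)

    def choose(self):
        # Draw one sample per action from its Beta distribution and
        # participate when the sample of action 1 is at least as large.
        x0 = self.rng.betavariate(self.alpha[0], self.beta[0])
        x1 = self.rng.betavariate(self.alpha[1], self.beta[1])
        return 1 if x0 <= x1 else 0

    def update(self, action, rewarded):
        # A reward increases alpha, a penalty increases beta.
        if rewarded:
            self.alpha[action] += 1
        else:
            self.beta[action] += 1

def run_server(environment, k=3, max_iterations=1000, seed=0):
    """Iterate until the same action is rewarded k times in a row."""
    automaton = BayesianLearningAutomaton(random.Random(seed))
    streak_action, streak = None, 0
    for iteration in range(1, max_iterations + 1):
        action = automaton.choose()
        rewarded = environment(action)
        automaton.update(action, rewarded)
        if rewarded and action == streak_action:
            streak += 1
        elif rewarded:
            streak_action, streak = action, 1
        else:
            streak_action, streak = None, 0
        if streak >= k:
            return automaton, action, iteration
    return automaton, None, max_iterations

# Hypothetical environment in which only participation is rewarded.
automaton, final_action, iterations = run_server(lambda a: a == 1)
```

Each iteration costs a constant number of operations per server, which is consistent with the $O(|S|)$ per-iteration complexity stated above when the loop body is repeated for all servers.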

VI. NUMERICAL RESULTS
In this section, a detailed numerical evaluation is provided in terms of the overall proposed framework's operation effectiveness, its scalability, and its efficiency compared to other alternatives. Specifically, the extraction of the users' evaluation through the Bayesian Truth Serum approach is presented and compared to an alternative approach, named Robust Bayesian Truth Serum, to show the drawbacks and benefits of the proposed human-driven computing system evaluation (Section VI-A). Then, the operation of the Bayesian Learning Automata (BLA) and the reputation system is demonstrated (Section VI-B), and a detailed comparative evaluation against other approaches is presented (Section VI-C).
The proposed framework's evaluation was conducted on a MacBook Pro laptop, 2.5 GHz Intel Core i7, with 16 GB LPDDR3 available RAM. We consider $|S| = 20$ MEC servers, $K = 3$, $T = 100$ time slots, $\mu_0 = 0.2$, $a_H = 0.51$, $a_L = 0.49$, and $F_s^{(t)} \in [10 \cdot 10^9, 12 \cdot 10^9]$ CPU cycles/sec, which is expressed in a smaller range of computing power units, i.e., $F_s^{(t)} \in [10, 12]$, thus ensuring the same order of magnitude for the individual terms in the MEC servers' utility function. In the following analysis, we consider a population of $|U| = 500$ users, unless otherwise explicitly stated.

A. EXTRACTION OF HUMAN-DRIVEN COMPUTING SYSTEM EVALUATION
In this section, the proposed human-driven computing system evaluation based on the theory of the Bayesian Truth Serum (BTS) is presented. In the following, we consider $|U| = 15$ users for the evaluation of the BTS component at a specific time slot $t$. Fig. 3a presents the average population endorsement frequencies $\bar{x}_i^{(t,S^{(t)})}$ (Eq. 1) for both answers $i \in \{0, 1\}$, i.e., the "YES" and "NO" answers. We observe that the majority of the users, i.e., 73.33%, voted for "YES" since they were satisfied with the perceived QoE provided by the MEC servers at time slot $t$. Fig. 3b shows the mean prediction $\bar{y}_i^{(t,S^{(t)})}$ (Eq. 2) of the users regarding the population proportion that voted for "YES" and for "NO", i.e., 54.18% and 33.46%, respectively, which leads to the outcome that "YES" is a surprisingly common answer. Based on the results presented in Fig. 3a and 3b, we obtain the average BTS scores $\bar{u}_i^{(t)}$ (Eq. 4) for the aforementioned answers, which are 0.2275 and −0.62562, respectively. Therefore, based on Eq. 5, the users that are overall satisfied with the experienced QoE are more truthful than the users with the opposite opinion (as shown by the higher average BTS score for the answer "YES"); thus, the final answer determined by the BTS framework (Section III) is "YES", i.e., $x_{BTS}^{(t)} = 1$. Complementary to the above analysis, Table 1 presents all the users' answers, predictions, and individual scores, which led to the previously discussed results, presented in Fig. 3a-3c. The results reveal that the users who provided the same answer achieve the same information score, since the latter is only related to the mean endorsement population frequencies $\bar{x}_i^{(t,S^{(t)})}$ and the mean predictions $\bar{y}_i^{(t,S^{(t)})}$ of the answers. Specifically, the mean endorsement frequency of the "YES" answer $\bar{x}_1^{(t,S^{(t)})}$ (73.33%) is higher than the corresponding mean prediction $\bar{y}_1^{(t,S^{(t)})}$ (54.18%), which makes this answer surprisingly common and thus a truthful answer, while, on the other hand, the "NO" answer is unsurprisingly common.
Moreover, we can clearly observe that the closer the prediction $y_u^{(t,S^{(t)})}$ of a user $u$ is to the mean of the actual personal answers $x_i^{(t,S^{(t)})}$, $i \in \{0, 1\}$, the better its prediction score is, meaning that it is closer to the optimal value of 0. Thus, we observe that user 4, who answered "YES", had the best prediction score, since its prediction $y_4^{(t,S^{(t)})}$ was the most accurate among all the users' predictions. In combination with the positive information score of its surprisingly common answer, this makes it the most truthful answer in the examined system, receiving the highest BTS score (0.2558). Another interesting observation is that the proposed BTS framework penalizes users with extreme beliefs. Specifically, we observe that user 2, who answered "YES", is extremely biased, since it predicted that 98% of the population would provide "YES" as a personal report and only the remaining 2% would answer "NO". Because of that extreme prediction report, the user received the worst prediction score (−0.4781), since its prediction was far from the actual mean endorsement frequencies of the population, and its overall BTS score was negative (−0.2215), ranking it in 12th place among all the BTS scores in the examined system. Furthermore, user 12 answers "NO", but gives a higher prediction for the "YES" answer (70%) than for the "NO" answer (30%). In that case, we observe that even though this user receives a negative information score (because of its unsurprisingly common answer), it also receives the second-best prediction score (−0.0027) and a better BTS score than the aforementioned extreme user 2, which makes user 12 more truthful than user 2. Finally, the prediction score of user 1 (−0.1132), who is uncertain about its opinion since its prediction is $y_1^{(t,S^{(t)})} = (0.5, 0.5)$, is close to the median of the prediction scores' range. Thus, we conclude that the proposed BTS framework evaluates uncertain users in a neutral manner, while still assigning them a positive BTS score.
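The prediction scores quoted above can be checked with a short sketch. It assumes the standard BTS prediction score with unit weight and natural logarithm, i.e., $\sum_i x_i \ln(y_i / x_i)$, where $x_i$ are the endorsement frequencies and $y_i$ is the user's prediction; the paper's own equations are not reproduced in this section, so this form, the function name, and the variable names are assumptions. The endorsement frequencies are taken as 11/15 and 4/15, which corresponds to the reported 73.33% "YES" share among 15 users.

```python
import math


def bts_prediction_score(prediction, endorsement_freq, alpha=1.0):
    """Assumed BTS prediction score: alpha * sum_i x_i * ln(y_i / x_i).

    The score is at most 0 and equals 0 only when the prediction
    matches the endorsement frequencies exactly.
    """
    return alpha * sum(
        x * math.log(y / x)
        for x, y in zip(endorsement_freq, prediction)
        if x > 0  # answers nobody endorsed contribute nothing
    )


# Endorsement frequencies (YES, NO): 11 of the 15 users answered YES.
x_bar = (11 / 15, 4 / 15)

# Prediction reports (YES, NO) quoted in the text for three users.
scores = {
    "user 1": bts_prediction_score((0.50, 0.50), x_bar),
    "user 2": bts_prediction_score((0.98, 0.02), x_bar),
    "user 12": bts_prediction_score((0.70, 0.30), x_bar),
}

for user, score in scores.items():
    print(f"{user}: {score:.4f}")
```

Under these assumptions the sketch yields roughly −0.1132, −0.4781, and −0.0027 for users 1, 2, and 12, in line with the values discussed above. It also shows why the extreme report of user 2 is penalized so heavily: the logarithm of the 2% prediction for "NO" dominates the sum.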
As a next step, we examine the descendant of the BTS, i.e., the Robust Bayesian Truth Serum (RBTS), which is strictly Bayes-Nash incentive compatible for $|U| \geq 3$ [32], in order to compare it with our BTS component. The key difference between the two is that the RBTS model converges to a stable equilibrium solution even for a small number of users, i.e., $|U| \geq 3$, while the BTS approach requires $|U| \to \infty$. Thus, the RBTS approach is considered to be more stable than the BTS framework for smaller populations, leading to the elicitation of more truthful answers. Specifically, according to the RBTS, each user $u$ provides the following two reports:
• The information report $x_u^{(t,S^{(t)})} \in \{0, 1\}$, which is its personal answer, where 1 denotes the answer "YES" and 0 denotes the answer "NO".
• The prediction report $y_u^{(t,S^{(t)})} \in [0, 1]$, which is the user's prediction regarding the fraction of the users that answered "YES", i.e., $x_u^{(t,S^{(t)})} = 1$.
Afterwards, a reference user $r_u = u + 1 \pmod{|U|}$ and a peer user $p_u = u + 2 \pmod{|U|}$ are selected for each user $u \in U$. The RBTS score of each user $u$, which captures how truthful its information report was, is then calculated following [32] as the combination of an information score and a prediction score, both evaluated against the report of the peer user $p_u$ through a strictly proper scoring rule $R_{sps}$ [32].
Finally, the truthful answer $x_{RBTS}^{(t)}$ regarding the MEC servers' service at time slot $t$ is the answer that has the highest average RBTS score. In the experiment presented in this section, the outcome of the RBTS was identical to the one obtained by the BTS mechanism, i.e., $x_{RBTS}^{(t)} = 1$ (i.e., "YES"), which allows a fair comparison of the two frameworks. Table 2 presents the users' information, prediction, and RBTS scores, where the personal and prediction reports are the same as in the aforementioned BTS instance. The results reveal that users 5 and 10, who provide the same answer, i.e., "YES", and almost identical prediction reports, achieve completely different RBTS scores (1.8064 and 1.3131, respectively), whereas under the BTS approach their scores are almost identical (0.1923 and 0.1851, respectively). This outcome reveals the inherent drawback of the RBTS mechanism, which determines the truthfulness of each user based only on its reference user $r_u$ and peer user $p_u$. Given that the reference and peer users are different for each user, the final RBTS scores depend entirely on the reports of only two other users, which can substantially weaken the incentive for truthful reporting.
Moreover, we observe that user 2, who holds extreme beliefs and gave the prediction report $y_2^{(t)} = (0.98, 0.02)$, acquired the 5th best RBTS score, while the BTS approach discouraged such extreme behavior by ranking it in 12th place. However, if we swap the positions of users 2 and 10, the RBTS scores would be 0.6552 and 1.6375, which means that in that case user 2 would be ranked in 12th place among all the users. Thus, we conclude that even though the RBTS converges to a Bayes-Nash Equilibrium for smaller populations ($|U| \geq 3$) than the BTS mechanism ($|U| \to \infty$), it is less stable regarding the users' individual truthfulness scores. The basic cause of this fundamental problem is that the RBTS score of each user depends on its position (i.e., index) within the population. For that reason, we conclude that the RBTS mechanism provides a weaker incentive for the users to be truthful than the BTS, which is considerably more stable, as shown in the previous discussion.
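The position dependence discussed above can be illustrated with a minimal sketch of RBTS scoring. Since the paper's RBTS equations are not reproduced in this section, the sketch assumes the commonly stated formulation of the mechanism in [32]: a shadowed prediction is built from the reference user's prediction report, and the binary quadratic scoring rule is used as the strictly proper scoring rule $R_{sps}$. The function names and the three-user toy reports are hypothetical and are not the data of Table 2.

```python
def quadratic_score(y, omega):
    """Binary quadratic scoring rule (strictly proper), with values in [0, 1]."""
    return 2 * y - y * y if omega == 1 else 1 - y * y


def rbts_scores(x, y):
    """Assumed RBTS scores for information reports x and prediction reports y.

    x[u] is 1 (YES) or 0 (NO); y[u] is the predicted fraction of YES answers.
    The reference user is u+1 and the peer user is u+2 (both modulo |U|).
    """
    n = len(x)
    scores = []
    for u in range(n):
        r, p = (u + 1) % n, (u + 2) % n
        # Shadowed prediction: shift the reference user's prediction
        # up if u answered YES and down if u answered NO.
        delta = min(y[r], 1 - y[r])
        shadow = y[r] + delta if x[u] == 1 else y[r] - delta
        information = quadratic_score(shadow, x[p])
        prediction = quadratic_score(y[u], x[p])
        scores.append(information + prediction)
    return scores


# Hypothetical reports of three users.
original = rbts_scores([1, 1, 0], [0.6, 0.7, 0.4])

# The same three reports, with the last two users swapping positions.
swapped = rbts_scores([1, 0, 1], [0.6, 0.4, 0.7])

# The user reporting (YES, 0.7) sits at index 1 before the swap and index 2 after.
print(round(original[1], 2), round(swapped[2], 2))
```

In this toy instance the user with reports (YES, 0.7) scores 1.87 at index 1 but only 0.51 at index 2, although neither its own reports nor the set of reports in the population has changed. This is the same effect as the one observed above when users 2 and 10 exchange positions.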

B. BAYESIAN LEARNING AUTOMATA & REPUTATION EVALUATION
In this section, we study the behavior of the servers' reputation scheme and of the artificial intelligent servers' activation mechanism, which follows the Bayesian Learning Automata approach, as introduced in Sections IV and V, respectively. Our goal is to show the performance of the proposed framework in determining each MEC server's optimal decision of being activated or remaining in sleep mode. As the BLA iterations proceed, the server's preference for the activation action ($a_s^{(ite,t)} = 1$) keeps increasing until the server converges to its final decision of being activated. The reason that led the MEC server to that decision can be derived from the Beta Probability Density Function (PDF) presented in Fig. 4c. Specifically, the results illustrate that the probability of the server $s$ being activated ($a_s^{(t)} = 1$) is higher (blue curve) than the probability of remaining in sleep mode (red curve). This behavior is explained by the results presented in Fig. 4d, where the hyper-parameter $\alpha_{s,1}$ is monotonically increasing, while the hyper-parameter $\beta_{s,1}$ takes small values. The latter observation means that the server's decision of being activated ($a_s^{(ite,t)} = 1$) led most of the time to a reward ($E_{a_s^{(ite,t)}} = 1$), rather than a penalty. The results presented in Fig. 5 illustrate the operation of the proposed servers' reputation scheme. Specifically, Fig. 5a captures the average reputation $\mu^{(ite,t)}$ (Eq. 8) and the total users' reward $\sum_{u \in U} R_u^{(ite,t)}$ (Eq. 9) as a function of the BLA iterations at a certain time slot $t$. First, we observe that both the average reputation and the users' reward change over the iterations, since in every iteration the decision $a_s^{(ite,t)}$ of each server $s$ may be different, meaning that different MEC servers decide to participate until the BLA's convergence point. Also, we conclude that the greater the average reputation $\mu^{(ite,t)}$ is, the more reward $\sum_{u \in U} R_u^{(ite,t)}$ the users offer to the servers.
The latter observation confirms our theoretical formulation (Eq. 9) and is also expected according to the Bayesian belief, i.e., the greater the servers' reputation is, the stronger the users' trust that they will experience a high QoE, and thus the higher the reward they provide to the MEC servers. Moreover, Fig. 5b illustrates a Monte Carlo evaluation (10,000 executions) of the MEC servers' achieved utility with respect to the BLA iterations at a certain time slot $t$. The results indicate that fewer than 100 iterations (equivalent to 0.8 sec) are required for the MEC servers to converge to their final decisions, while at the same time the MEC servers' average utility is maximized. This is due to the fact that the proposed framework drives each MEC server to make optimal decisions $a_s^{(ite,t)}$ regarding its activation (or not), with each MEC server taking its overall reputation into account. Fig. 6 shows the operation of the artificial intelligent servers' activation mechanism for two indicative MEC servers. Specifically, Fig. 6a shows that MEC server 1 achieves a higher reputation than MEC server 2. Thus, the users place more trust in server 1 to satisfy their QoE requirements. This fact can also be validated in Fig. 6b, where the total number of positive users' evaluations for the first server, i.e., $Q_1^{(t)}$, is always higher than the corresponding $Q_2^{(t)}$ of the second server. This indicates that the users were more satisfied when server 1 contributed to their tasks' execution ($Q_1^{(t)} > F_1^{(t)}$, and $Q_1^{(t)}$ increases faster than $F_1^{(t)}$) than they were when server 2 was activated, since for server 2 the total number of negative evaluations became equal to that of the positive evaluations after a certain point in time ($Q_2^{(t)} = F_2^{(t)}$). Thus, MEC server 2 remained in sleep mode for the last time slots, since it was not profitable for it to serve the users.
Moreover, the fact that the users were satisfied with the QoE received from MEC server 1 drove the latter to become active more times than MEC server 2 (Fig. 6c). The latter observation is also confirmed by the results presented in Fig. 6d, which show that the first server received a higher fraction of the total users' reward due to its higher reputation, which finally resulted in a higher experienced utility over the time horizon (Fig. 6e). Fig. 7 depicts the BLA iterations and the corresponding real execution time required for the convergence of the MEC servers' activation decision making as a function of the increasing number of MEC servers. The results reveal that the proposed framework scales very well with the increasing number of servers, resulting in realistic execution times. Thus, the proposed framework can be implemented in a realistic MEC environment in a real-time or close to real-time manner.
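The role of the Beta hyper-parameters in the activation decision can be sketched with a generic two-action Bayesian Learning Automaton. This is a minimal illustration under stated assumptions rather than the paper's algorithm: each action keeps a Beta($\alpha$, $\beta$) belief, the action with the larger sampled value is played, and the environment's reward or penalty is simulated with fixed, hypothetical reward probabilities instead of the utility-based feedback of Section V. All names (`run_bla`, `reward_prob`) are illustrative.

```python
import random


def run_bla(reward_prob, iterations=500, seed=0):
    """Two-action Bayesian Learning Automaton (action 0: sleep, action 1: active).

    reward_prob[a] is a hypothetical probability that action a is rewarded.
    Returns the final Beta hyper-parameters and how often each action was played.
    """
    rng = random.Random(seed)
    alpha = [1.0, 1.0]  # 1 + number of rewards per action (uniform prior)
    beta = [1.0, 1.0]   # 1 + number of penalties per action
    counts = [0, 0]
    for _ in range(iterations):
        # Draw one sample from each action's Beta belief and play the larger one.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in (0, 1)]
        action = 0 if samples[0] > samples[1] else 1
        counts[action] += 1
        # Simulated environment feedback: reward updates alpha, penalty updates beta.
        if rng.random() < reward_prob[action]:
            alpha[action] += 1
        else:
            beta[action] += 1
    return alpha, beta, counts


# Hypothetical setting in which being active is rewarded far more often.
alpha, beta, counts = run_bla(reward_prob=(0.2, 0.9))
print("alpha:", alpha, "beta:", beta, "plays:", counts)
```

When activation is rewarded most of the time, the $\alpha$ hyper-parameter of action 1 grows steadily while its $\beta$ stays small, so the Beta density of that action concentrates near 1 and the automaton converges to the activation decision. This mirrors the qualitative behavior described for Fig. 4c and 4d.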

C. COMPARATIVE EVALUATION
In this section, a comparative analysis is presented between the proposed artificial intelligent MEC servers' management framework, denoted as (e), and four other approaches, denoted as (a)-(d). The results reveal that our approach, i.e., (e), outperforms all the alternatives, since it jointly considers both the servers' reputation and computation capability in a holistic and intelligent manner, confirming that it enables all the servers to take the most profitable decisions in an autonomous manner. Approach (c) achieves the second-best results in terms of the MEC servers' average achieved utility, given that it allows only the most trusted MEC servers to become active. The worst performance is exhibited by the first two alternative approaches (i.e., (a) and (b)), as they are the least sophisticated and do not consider the system parameters in the MEC servers' activation decision making.

VII. CONCLUSION
The establishment of multi-access edge computing (MEC) and its widespread adoption in 5G networks require its management to be refined in an intelligent and cost-efficient manner. In this paper, a novel approach towards determining the MEC servers' activation, as well as the users' data processing, is introduced, based on the principles of artificial intelligence, and specifically of reinforcement learning and Bayesian reasoning. The MEC servers operate as autonomous decision-makers that aim to maximize their utility function, considering the reward offered by the users, which is modeled following the economic sale price model, as well as the respective computing costs incurred by the aforementioned processing. Accordingly, the users follow a human-driven evaluation of the edge computing system, based on the Bayesian Truth Serum concept, regarding the satisfaction of their QoE prerequisites. This evaluation leads to a reputation scheme that characterizes the efficiency of the MEC servers in the edge computing system. A low-complexity distributed algorithm that maximizes the utility function of the MEC servers is introduced, and detailed numerical results that demonstrate our framework's operation and superiority are presented.
Our current and future work focuses on extending this model under the principles of Deep Reinforcement Learning, where the MEC servers will explore the continuous space of their available computing power resources in order to provide the optimal amount for the users' data processing. Moreover, we examine the prospect of incorporating Contract Theory into our framework, where each user acts as an "employer" offering rewards to the MEC servers, which act as "employees" providing their computing resources to the users so that the latter's data can be processed.