A Two-Stage Incentive Mechanism Design for Quality Optimization of Hierarchical Federated Learning

In this paper, we investigate the aggregated model quality maximization problem in hierarchical federated learning, whose decision problem is proved to be NP-complete. We develop a two-stage mechanism, MaxQ, to maximize the sum of local model quality. In the first stage, an algorithm based on matching game theory is proposed to associate mobile devices with edge servers, which is proved to achieve stability and a $\frac{1}{2}$-approximation ratio. In the second stage, we design an incentive mechanism based on contract theory to maximize the quality of the models submitted by mobile devices to edge servers. Through thorough experiments, we analyse the performance of MaxQ and compare it with the existing mechanisms FAIR and EHFL under the deep learning models ResNet18, ResNet50 and AlexNet. It is found that the model quality can be improved by 8.20% and 7.81%, 10.47% and 11.87%, and 10.98% and 11.97% under the three models, respectively.


I. INTRODUCTION
According to the forecast of Cisco [1], by 2023 the total number of Internet users will reach 5.3 billion, and the number of network devices will reach 29.3 billion. Ericsson predicts that the average monthly total mobile network traffic will exceed 300 EB by 2026 [2]. In traditional machine learning, a large amount of user data is uploaded to the cloud server for centralized training. However, this may leak users' private information [3] and bring huge delay and energy consumption [4]. To address these challenges, Google proposed federated learning (FL) in 2016 [5]. In FL, model training is performed on mobile devices, and only the parameter updates of the locally trained models need to be transmitted to the cloud server for global aggregation, which makes the size of the uploaded data significantly smaller and thus reduces energy consumption and latency. In addition, since users do not have to upload original data, their privacy can be protected.
In FL, the cloud server mainly serves as a parameter server [6]. As the number of mobile devices involved in the model training process increases [7], FL faces the challenges of overloaded transmission in the infrastructure networks and overloaded computation in the cloud. With the development of edge computing, computing devices at the edge of networks can act as parameter servers [8], [9], and hierarchical federated learning (HFL) has been proposed [10], whose cloud-edge-client architecture differs from the traditional cloud-client one. In HFL, edge servers aggregate parameter updates from the nodes in their region and upload the results to the cloud server. This new kind of federated learning technology is used in many applications, such as anomaly detection [11], unmanned aerial vehicular networks [12], and power transformer fault diagnosis [13]. The framework of HFL is shown in Figure 1. Model quality may be the most important performance metric in HFL applications [14], such as autonomous driving [15], [16] and health monitoring [17]. Since the size and quality of the data sets on mobile devices usually differ, so does the quality of their local models. Since participating in model training consumes the devices' own resources [18], it is necessary to design effective incentive mechanisms to encourage them to contribute high-quality model updates in the training process. However, there are two key challenges in the incentive mechanism design for HFL. First, the mobile devices associated with a particular edge server may change dynamically due to their mobility, i.e., they can enter or leave the region of the edge server at any time. Second, it is difficult for edge servers to evaluate the real contribution of mobile devices until the model training process is finished and the quality of their submitted model updates is measured.
On the other hand, it is also hard for mobile devices to know the exact reward they can get from the edge server before they submit their local updates.
To address these two challenges, we design the mechanism MaxQ for the maximization of HFL model quality, which consists of two stages. In the first stage, we determine the association relationship between the devices and edge servers based on many-to-one matching game theory [19]. In the second stage, we address the issue of information asymmetry between the two parties based on contract theory, in which edge servers can offer different rewards to devices according to their different local model quality. After that, in the process of model aggregation, larger weights are given to model updates with higher quality, reducing the influence of updates with lower quality. In this way, MaxQ can select mobile devices with high estimated quality to participate in model training, encourage them to contribute high-quality model updates, and thus improve the HFL performance.
The main contributions of this paper can be summarized as follows.
• We define the global model quality maximization problem MQM for HFL, and we prove that its decision problem is NP-complete.
• We develop the mechanism MaxQ to maximize the global model quality, which consists of two stages. In the first stage, we design an algorithm based on matching game theory to associate mobile devices with edge servers, which can achieve stability and a $\frac{1}{2}$-approximation ratio. In the second stage, we design an incentive mechanism based on contract theory, the optimality of which is also proved.
• We conduct thorough simulation experiments and find that MaxQ can effectively accelerate the model convergence speed, while improving the model quality by 8.20% and 7.81%, 10.47% and 11.87%, and 10.98% and 11.97% on the ResNet18, ResNet50 and AlexNet models respectively, compared with the existing methods FAIR and EHFL.

The remainder of this paper is organized as follows. We define the system model and the quality maximization problem of HFL in Section II. The mechanism MaxQ is proposed in Section III. We investigate its performance through simulations in Section IV. Section V briefly introduces the related work, before concluding in Section VI.

II. SYSTEM MODEL AND PROBLEM FORMULATION
In this section, we introduce the system model of hierarchical federated learning and the problem of model quality maximization. The main notations used in this paper are listed in Table 1.

A. HIERARCHICAL FEDERATED LEARNING PROCESS
For hierarchical federated learning, we assume that there is one cloud server, multiple edge servers and many mobile devices. We denote the set of edge servers as $\mathcal{M} = \{m_1, m_2, \ldots, m_M\}$ and the set of mobile devices as $\mathcal{N} = \{n_1, n_2, \ldots, n_N\}$. Model training takes place in discrete time slots $\mathcal{T} = \{t_1, t_2, \ldots, t_T, \ldots\}$, and the model can be updated once in each time slot.
The cloud server first distributes a machine learning model to each edge server. We assume that in iteration $t$ the budget of edge server $m_i$ is $B_i^t$ ($i = 1, 2, \ldots, M$) for the recruitment of mobile devices. According to the budget and the estimated capacity of different mobile devices, $m_i$ determines the set of devices $\mathcal{N}_i^t$ participating in the model training process in its region, with reward $r_{ij}^t$ paid to a selected device $n_j$ in iteration $t$. Once selected, $n_j$ uses its own data set $D_j$ with size $d_j = |D_j|$ to train the model and uploads the update to $m_i$, the quality of which is denoted as $q_{ij}^t$. Since the computing capacity of mobile devices is limited, it is assumed that each device can participate in the model training process of only one edge server in each iteration. After receiving all local model updates from the selected devices, the edge server aggregates them into a new edge model, which is then published to the selected mobile devices again. This process continues until the quality of the edge model reaches a certain threshold, and the model is then uploaded to the cloud server. After receiving all the edge models, the cloud server aggregates them into a new global model, which is distributed to the edge servers again. This procedure continues until the quality of the global model satisfies a certain requirement.
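The cloud-edge-client loop above can be sketched as a toy simulation. This is a hypothetical illustration only: the "models" are scalars, "training" is a fixed nudge toward each device's data mean, and the size-weighted averaging stands in for real parameter aggregation.

```python
# Toy sketch of the HFL loop: device training -> edge aggregation -> cloud
# aggregation. Scalar "models" keep the focus on the control flow.

def local_train(model, data_mean, lr=0.5):
    """One local update: move the scalar model toward the device's data mean."""
    return model + lr * (data_mean - model)

def aggregate(models, weights):
    """Weighted average, used for both edge and cloud aggregation."""
    return sum(m * w for m, w in zip(models, weights)) / sum(weights)

def hfl_round(global_model, edges):
    """edges: one list per edge server of (data_mean, data_size) device tuples."""
    edge_models = []
    for devices in edges:
        locals_ = [local_train(global_model, mean) for mean, _ in devices]
        sizes = [size for _, size in devices]
        edge_models.append(aggregate(locals_, sizes))
    # Cloud aggregation weighted by the total data size in each region.
    region_sizes = [sum(size for _, size in devices) for devices in edges]
    return aggregate(edge_models, region_sizes)

edges = [[(1.0, 10), (2.0, 30)], [(4.0, 20)]]  # two edge servers, three devices
model = 0.0
for _ in range(20):
    model = hfl_round(model, edges)
# model converges toward the size-weighted mean of the device data means (2.5)
```

The fixed point here is the data-size-weighted mean $(10\cdot 1 + 30\cdot 2 + 20\cdot 4)/60 = 2.5$, mirroring the intuition that aggregation weights determine what the global model converges to.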

B. DEFINITION OF MODEL QUALITY
The size of training data samples and their quality can significantly affect the model quality in federated learning [20]. An intuitive way to measure model quality is to test its accuracy on a real data set, but this causes great overhead because the test has to be performed in each iteration. In this paper, we instead propose to use a function of the reduction of the loss function after each iteration to measure the quality.
We assume that mobile device $n_j$ starts the local model training with loss function $loss_j(\tau^-)$ in time slot $\tau^-$, and updates the local model during the time slots, resulting in a loss function $loss_j(\tau)$ in time slot $\tau$. We define the loss reduction as
$$\Delta loss_j^t = loss_j(\tau^-) - loss_j(\tau).$$
According to the size of the data samples and the corresponding improvement in the loss function, the model quality of mobile device $n_j$ in iteration $t$ can be defined as
$$q_{ij}^t = \phi\, d_j + \upsilon\, \Delta loss_j^t,$$
where $\phi$ and $\upsilon$ are weight coefficients [21].
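A minimal sketch of this quality metric follows. The linear combination of data-set size and loss reduction, and the particular weights `phi` and `upsilon`, are illustrative assumptions rather than the paper's calibrated values.

```python
# Hedged sketch of the model-quality metric: a weighted combination of the
# data-set size and the per-iteration loss reduction.

def model_quality(d_j, loss_before, loss_after, phi=0.3, upsilon=0.7):
    """Quality of a local update from data size and loss improvement."""
    delta_loss = loss_before - loss_after   # positive when training helped
    return phi * d_j + upsilon * delta_loss

q = model_quality(d_j=100, loss_before=1.8, loss_after=1.2)
```

Note that a device that fails to reduce its loss contributes little beyond its data-size term, which is exactly what lets the mechanism down-weight low-quality updates later during aggregation.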

C. PROBLEM FORMULATION
Since the quality of the aggregated global model is influenced by the local models, the optimization problem of maximizing the aggregated model quality can be transformed into maximizing the sum of the local model qualities, which is formulated as the following MQM (Model Quality Maximization) problem.
$$\max_{X^t, R^t} \sum_{m_i \in \mathcal{M}} \sum_{n_j \in \mathcal{N}_i^t} x_{ij}^t\, q_{ij}^t$$
$$\text{s.t.} \quad \sum_{n_j \in \mathcal{N}_i^t} x_{ij}^t\, r_{ij}^t \le B_i^t, \quad \forall m_i \in \mathcal{M}, \qquad (5)$$
$$\sum_{m_i \in \mathcal{M}} x_{ij}^t \le 1, \quad \forall n_j \in \mathcal{N}, \qquad (6)$$
where $X^t = \{x_{ij}^t \mid m_i \in \mathcal{M}, n_j \in \mathcal{N}_i^t, t \in \mathcal{T}\}$ is the matching matrix denoting the association relationship between edge servers and mobile devices, and $R^t = \{r_{ij}^t \mid m_i \in \mathcal{M}, n_j \in \mathcal{N}_i^t, t \in \mathcal{T}\}$ is the reward matrix of the edge servers. Constraint (5) indicates that the reward paid by $m_i$ in iteration $t$ cannot exceed its maximum budget $B_i^t$, and constraint (6) indicates that each mobile device can participate in the model training of at most one edge server in each iteration.
For the hardness of MQM, we have the following result shown as Theorem 1, i.e., its corresponding decision problem is NP-complete.
Theorem 1: The decision problem of MQM is NP-complete.
Proof: The decision problem of MQM can be defined as follows: in iteration $t$, does there exist an association and reward assignment such that the model quality of each selected mobile device reaches a threshold $q^t$, under the constraints that the sum of the rewards paid by each edge server to mobile devices cannot exceed its maximum budget and each mobile device can participate in the training process of at most one edge server?
The decision problem of the multi-dimensional knapsack problem with allocation restrictions (MKAR) is known to be NP-complete [22]. An instance of the decision problem of MKAR is as follows: given an item set and a set of knapsacks, the goal is to maximize the total value of the allocated items, where each item can be allocated to at most one knapsack and the total weight of the items allocated to each knapsack cannot exceed its capacity. Each instance of the decision problem of MKAR can be reduced to an instance of the decision problem of MQM: for the item set $O$ and the knapsack set $B$, there is a corresponding mobile device set $\mathcal{N}$ and edge server set $\mathcal{M}$, respectively. We assume the reward and model quality of each mobile device are the same for different edge servers, i.e., $r_{ij}^t = r_j^t$ and $q_{ij}^t = q_j^t$. In this way, each instance of the decision problem of MKAR can be reduced to an instance of the decision problem of MQM in polynomial time. Since a candidate solution of the decision problem of MQM can also be verified in polynomial time, the decision problem of MQM is NP-complete.

III. DESIGN OF MECHANISM MaxQ FOR THE MODEL QUALITY MAXIMIZATION
In this section, we propose the mechanism MaxQ for the maximization of model quality, which consists of two stages. In the first stage, we associate mobile devices with edge servers based on a matching game to make the HFL process feasible. In the second stage, based on contract theory, we maximize the quality of the local models trained by the mobile devices associated with each edge server, that is, the sum of the quality of the local models of all devices. Finally, in order to improve the quality of the global model, we use the model quality as the weight in model aggregation.

A. MATCHING GAME FOR ASSOCIATING MOBILE DEVICES WITH EDGE SERVERS
We use matching game theory to associate proper mobile devices with edge servers. Because each mobile device can participate in the model training of only one edge server, while each server may have more than one device associated with it, we construct a one-to-many matching game between the servers and the devices.
First, we estimate the model quality of different mobile devices. Assuming mobile device $n_j$ participated in local model training in iterations $t_0, t_1, \ldots, t_k$, we estimate its model quality $\hat{q}_{ij}^{t_{k+1}}$ for iteration $t_{k+1}$. Since the model quality may change over time and the quality measured in a more recent time slot reflects the change trend more accurately, an exponential forgetting function is used to allocate different weights to the quality values obtained in different slots, giving greater weight to the more recent model quality records and smaller weight to the earlier ones. The weight of the latest model quality record is set to 1, and the weights of the other records are calculated according to their relative time distance to the current slot with forgetting factor $\lambda \in (0, 1)$. For iteration $t_{k+1}$, the estimated model quality of $n_j$ is calculated as
$$\hat{q}_{ij}^{t_{k+1}} = \frac{\sum_{l=0}^{k} \lambda^{k-l}\, q_{ij}^{t_l}}{\sum_{l=0}^{k} \lambda^{k-l}}.$$
We define the problem of maximizing the estimated model quality accordingly, and use matching game theory to solve it. In iteration $t$, we define the matching matrix between edge servers and mobile devices as $X^t = \{x_{ij}^t\}$, where $x_{ij}^t = 1$ indicates that mobile device $n_j$ is allocated to edge server $m_i$, and the two parties form a matching. The matching is stable when neither the server nor the device can improve its revenue by establishing a new matching. Server $m_i$ requires $N_i^t$ mobile devices, and the possible reward values it can pay are $(r_{i1}^t, r_{i2}^t, \ldots, r_{iN_i^t}^t)$. $R_{ij}^t$ is the random variable denoting the reward of $m_i$ for device $n_j$, with $P\{R_{ij}^t = r_{ik}^t\} = p_{ik}^t$ and $\sum_{k=1}^{N_i^t} p_{ik}^t = 1$. The real reward value is unknown before the model training process is accomplished. The profit function of device $n_j$ associated with $m_i$ is defined as
$$u_j^t(m_i) = E(R_{ij}^t) - c\, d_j,$$
where $E(R_{ij}^t)$ is the mathematical expectation of $R_{ij}^t$, and $c$ is the cost of processing each piece of data sample in the model training process.
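The exponentially forgetting estimate can be sketched in a few lines. The normalisation by the sum of the weights is an assumption of this sketch; the forgetting factor `lam` and the sample history are illustrative.

```python
# Sketch of the exponential-forgetting quality estimate: the latest observed
# quality gets weight 1 and each older record decays by the forgetting
# factor lambda per slot.

def estimate_quality(history, lam=0.8):
    """history: observed qualities q^{t_0}..q^{t_k}, oldest first."""
    k = len(history) - 1
    weights = [lam ** (k - l) for l in range(len(history))]   # latest -> 1
    return sum(w * q for w, q in zip(weights, history)) / sum(weights)

q_hat = estimate_quality([0.5, 0.6, 0.9])   # recent improvement dominates
```

With `lam=0.8` the weights here are 0.64, 0.8 and 1, so the estimate (about 0.697) sits well above the plain mean of 0.667, reflecting the recent upward trend.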
According to $u_j^t(m_i)$, we can establish a priority list of edge servers $\mathcal{R}_j(\mathcal{M})$ for device $n_j$. Similarly, according to the server-side profit function $v_i^t(n_j)$, we can establish a priority list of mobile devices $\mathcal{K}_i(\mathcal{N}) = \{n_{i_1}, n_{i_2}, \ldots\}$ for $m_i$. We design the matching algorithm between edge servers and mobile devices as shown in Algorithm 1. Firstly, we calculate the profit functions of the mobile devices and edge servers, and then obtain the sorted lists of available mobile devices and available edge servers, which give the priority list of mobile devices $\mathcal{K}_i(\mathcal{N})$ for each edge server $m_i$ and the priority list of edge servers $\mathcal{R}_j(\mathcal{M})$ for each mobile device $n_j$. A match is successful when $n_j$ is in the priority list $\mathcal{K}_{j_1}(\mathcal{N})$ of its most preferred edge server $m_{j_1}$; the priority lists of all edge servers are then updated, i.e., device $n_j$ is removed from them. When $\mathcal{K}_i(\mathcal{N})$ becomes empty, $m_i$ is removed from the priority lists of all mobile devices. If there is no change in $\mathcal{K}_i(\mathcal{N})$ between two consecutive rounds, the best mobile device $n_{i_1}$ in $\mathcal{K}_i(\mathcal{N})$ is removed. When $n_j$ is not in the priority list $\mathcal{K}_{j_1}(\mathcal{N})$ of $m_{j_1}$, the edge server $m_{j_1}$ is removed from $\mathcal{R}_j(\mathcal{M})$ and $n_j$ then proposes to $m_{j_2}$. The procedure continues until the matching succeeds or $\mathcal{R}_j(\mathcal{M})$ is empty.
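The core of this procedure can be illustrated with a simplified, hypothetical device-proposing matching loop. This sketch omits the budgets, probabilistic rewards, and list-pruning details of Algorithm 1: each unmatched device proposes to its best remaining server, and a server keeps its quota of highest-quality proposers.

```python
# Simplified device-proposing matching sketch (deferred-acceptance style):
# servers tentatively hold their best proposers up to a quota and reject
# the rest, who move down their own preference lists.

def match(device_prefs, quality, quotas):
    """device_prefs[j]: ordered server list for device j (best first).
    quality[(i, j)]: server i's estimated quality for device j.
    quotas[i]: max devices server i can host."""
    next_choice = {j: 0 for j in device_prefs}      # pointer into each pref list
    held = {i: [] for i in quotas}                  # tentative matches per server
    free = list(device_prefs)
    while free:
        j = free.pop()
        prefs = device_prefs[j]
        if next_choice[j] >= len(prefs):
            continue                                # j exhausted its list: unmatched
        i = prefs[next_choice[j]]
        next_choice[j] += 1
        held[i].append(j)
        held[i].sort(key=lambda d: -quality[(i, d)])
        if len(held[i]) > quotas[i]:
            free.append(held[i].pop())              # reject the worst proposer
    return held

prefs = {"n1": ["m1", "m2"], "n2": ["m1", "m2"], "n3": ["m1"]}
q = {("m1", "n1"): 3, ("m1", "n2"): 2, ("m1", "n3"): 1,
     ("m2", "n1"): 3, ("m2", "n2"): 2}
result = match(prefs, q, {"m1": 2, "m2": 1})
# m1 keeps its two best proposers (n1, n2); n3 only listed m1 and ends unmatched
```

Because a server only ever swaps a held device for a strictly better proposer, and a device only moves down its list after a rejection, no server-device pair left unmatched would both prefer each other, which is the stability property proved below for Algorithm 1.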
For the performance of Algorithm 1, we have the following results.
Theorem 2: The time complexity of Algorithm 1 is $O(MN^2)$.
Proof: Lines 1 to 2 and the loop in lines 3 to 4 each contain only one level of loops, which execute $M$ and $N$ times, respectively. In the worst case, each match is unsuccessful, so the while loop in lines 5 to 19 executes $2N$ iterations. Inside the while loop, lines 5 to 10 form a double loop that executes $MN$ times in the worst case, and lines 11 to 19 form another double loop that also executes $MN$ times in the worst case. Since the number of edge servers is smaller than the number of mobile devices, i.e., $M < N$, Algorithm 1 executes $2MN^2$ steps in the worst case. Therefore, the time complexity of Algorithm 1 is $O(MN^2)$.
Theorem 3: The matching game in Algorithm 1 is able to achieve stability.
Proof: Consider a mobile device $n_j$ that has not formed a matching relationship after $l$ rounds of matching, and assume the mobile devices that have already matched with edge server $m_i$ are $n_{i_1}, n_{i_2}, \ldots$. Since these devices rank higher than $n_j$ in the priority list of $m_i$, the edge server $m_i$ is not willing to break an existing matching relationship to establish a new one with $n_j$.

Conversely, if $n_j$ establishes a matching relationship with $m_i$ in the $l$-th round, then $m_i$ is the optimal remaining choice of $n_j$ in that round. As a result, $n_j$ is not willing to change the existing matching relationship.
In summary, for edge servers and mobile devices that complete the matching process, both parties will not change the existing matching relationship, so the matching can achieve stability.
Theorem 4: The approximation ratio of Algorithm 1 is $\frac{1}{2}$.

Proof: In iteration $t$, we assume that $U^t$ is the set of mobile devices that have not been allocated, and $\mathcal{M}_U^t$ is the set of edge servers that can afford the reward requirement of at least one of the mobile devices in $U^t$. Let $Q^t(U^t)$ be the sum of the model quality of all mobile devices in $U^t$, and $Q^t$ the sum of the model quality of all allocated mobile devices. Since the mobile devices allocated to each edge server satisfy the half-full property, we have $Q^t(U^t) \le Q^t$.

For any feasible solution, we assume that $\tilde{U}^t \subset U^t$ is the set of mobile devices it assigns from $U^t$ and $Q^t(\tilde{U}^t)$ is the sum of the model quality of these devices. Among all solutions, including the optimal one, the mobile devices in $U^t$ can only be allocated to the edge servers in $\mathcal{M}_U^t$, so $Q^t(\tilde{U}^t) \le Q^t(U^t)$. Therefore, the maximum model quality obtainable by any allocation satisfies
$$Q_{opt}^t \le Q^t + Q^t(U^t) \le 2Q^t,$$
i.e., $Q^t \ge \frac{1}{2} Q_{opt}^t$, so Algorithm 1 has a $\frac{1}{2}$-approximation ratio.

B. OPTIMAL CONTRACT DESIGN BETWEEN MOBILE DEVICES AND EDGE SERVERS
After the association relationship between mobile devices and edge servers is determined, edge servers can provide the devices with appropriate reward contracts to motivate them to improve the quality of their locally trained models.

1) UTILITY OF EDGE SERVERS AND MOBILE DEVICES
In iteration $t$, the data sample sizes of the mobile devices in the set $\mathcal{N}_i^t$ matched with edge server $m_i$ can be listed in ascending order, $d_{i1}^t < d_{i2}^t < \cdots < d_{iN_i^t}^t$. Due to information asymmetry, $m_i$ does not know the accurate size of the data samples of mobile device $n_j$; it only knows the probability $p_{ij}^t$ that $n_j$ holds the samples $D_{ij}^t$, with $\sum_{n_j \in \mathcal{N}_i^t} p_{ij}^t = 1$. Edge servers should therefore provide different contracts for devices with different data samples. We denote the contract provided by $m_i$ to $n_j$ as $(r_{ij}^t(\theta_{ij}^t, d_{ij}^t), d_{ij}^t)$, where $d_{ij}^t$ is the size of the data samples used by $n_j$ to train the model, $q_{ij}^t$ is the model quality, and $r_{ij}^t(\theta_{ij}^t, d_{ij}^t)$ is the reward paid by $m_i$; $r_{ij}^t(\theta, d)$ is a monotonically increasing concave function of $\theta$ and $d$. The higher the model quality, the higher the provided reward.
With contract $(r_{ij}^t(\theta_{ij}^t, d_{ij}^t), d_{ij}^t)$, the utility of the edge server $m_i$ with which $n_j$ is associated is
$$U_i^t = \sum_{n_j \in \mathcal{N}_i^t} p_{ij}^t \left( \omega\, q_{ij}^t - r_{ij}^t(\theta_{ij}^t, d_{ij}^t) \right),$$
where $\omega > 0$ denotes the normalization parameter between model quality and rewards. Under the constraint that the payment of the edge server cannot exceed its maximum budget, its utility maximization problem, with the incentive for mobile devices to train high-quality models, is to maximize $U_i^t$ subject to $\sum_{n_j \in \mathcal{N}_i^t} r_{ij}^t(\theta_{ij}^t, d_{ij}^t) \le B_i^t$. The utility of the mobile device $n_j$ is defined as
$$u_j^t = r_{ij}^t(\theta_{ij}^t, d_{ij}^t) - c\, d_{ij}^t,$$
where $c$ denotes the cost of the mobile device per piece of data sample and $d_{ij}^t$ is the size of the data samples. When the mobile device $n_j$ does not participate in the model training, its reward is 0.
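These two utilities can be encoded directly. The concrete square-root reward function and all parameter values below are illustrative assumptions; the paper only requires $r(\theta, d)$ to be increasing and concave.

```python
# Illustrative encoding of the server and device utilities above, with an
# assumed concave reward function r(theta, d).

import math

def reward(theta, d):
    """Assumed monotonically increasing, concave reward r(theta, d)."""
    return 10.0 * math.sqrt(theta * d)

def server_utility(q, theta, d, omega=2.0):
    """omega * q - r(theta, d): value of the model quality minus the payment."""
    return omega * q - reward(theta, d)

def device_utility(theta, d, c=0.1):
    """r(theta, d) - c * d: payment minus the per-sample training cost."""
    return reward(theta, d) - c * d

u_dev = device_utility(theta=0.4, d=250)   # positive, so the IR constraint holds
```

A device accepts a contract only when `device_utility` is non-negative, which is precisely the individual-rationality condition formalized next.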

2) OPTIMAL CONTRACT DESIGN
To ensure the proposed contracts are feasible, each contract must satisfy individual rationality and incentive compatibility.
Definition 1 (Individual Rationality (IR)): The utility of a mobile device $n_j$ participating in the model training associated with $m_i$ is not less than 0, that is,
$$r_{ij}^t(\theta_{ij}^t, d_{ij}^t) - c\, d_{ij}^t \ge 0.$$
Definition 2 (Incentive Compatibility (IC)): Each mobile device $n_j$ obtains its maximum utility by choosing the contract designed for its own type, that is,
$$r_{ij}^t(\theta_{ij}^t, d_{ij}^t) - c\, d_{ij}^t \ge r_{ik}^t(\theta_{ik}^t, d_{ik}^t) - c\, d_{ik}^t, \quad \forall n_k \in \mathcal{N}_i^t.$$
We can then define the optimal contract design problem for the edge server $m_i$ as maximizing its utility $U_i^t$ subject to its budget constraint and the IR and IC constraints of all associated mobile devices.
Theorem 5 (Monotonicity): For any feasible contract, a higher reward corresponds to a higher model quality, i.e., $r_{ij}^t \ge r_{ik}^t$ if and only if $\theta_{ij}^t \ge \theta_{ik}^t$.

Proof: Necessity: if $\theta_{ij}^t \ge \theta_{ik}^t$, since $r_{ij}^t(\theta, d)$ is a monotonically increasing function of $\theta$ and $d$, we have $r_{ij}^t \ge r_{ik}^t$. Adequacy: if $r_{ij}^t \ge r_{ik}^t$, then by the same monotonicity, $\theta_{ij}^t \ge \theta_{ik}^t$.

Theorem 6 (Reduce IR): When the mobile device with the smallest size of data samples satisfies the IR constraint, i.e., $r_{i1}^t - c\, d_{i1}^t \ge 0$, the other mobile devices also satisfy the IR constraint.

Proof: For feasible contracts $(r_{ij}^t(\theta_{ij}^t, d_{ij}^t), d_{ij}^t)$, based on the IC constraint, we have
$$r_{ij}^t(\theta_{ij}^t, d_{ij}^t) - c\, d_{ij}^t \ge r_{i1}^t(\theta_{i1}^t, d_{i1}^t) - c\, d_{i1}^t.$$
Based on Lemma 1 and Theorem 5, $r_{ij}^t(\theta, d)$ is a monotonically increasing function of $\theta$ and $d$. Therefore, if $r_{i1}^t(\theta_{i1}^t, d_{i1}^t) - c\, d_{i1}^t \ge 0$, the other mobile devices also satisfy the IR constraint.
To reduce the IC constraints, we use the following definitions from [23].
Definition 3: For the IC constraints between mobile device $n_j$ and mobile device $n_k$ participating in the training of the edge server $m_i$'s model: when $k = j - 1$, the IC constraint is called a local downward incentive constraint (LDIC); when $k = j + 1$, it is called a local upward incentive constraint (LUIC).

Theorem 7 (Reduce IC): According to monotonicity, the IC constraints can be reduced to the local downward incentive constraints (LDIC) and local upward incentive constraints (LUIC).
Proof: We assume there are three mobile devices $n_{i(j+1)}$, $n_{ij}$, $n_{i(j-1)}$ participating in model training for the edge server $m_i$. Based on the IC constraint, the LDIC between adjacent types can be chained: if the LDIC holds between $n_{i(j+1)}$ and $n_{ij}$ and between $n_{ij}$ and $n_{i(j-1)}$, then the downward incentive constraint also holds between $n_{i(j+1)}$ and $n_{i(j-1)}$. Applying this argument to all mobile devices participating in model training associated with the edge server $m_i$, they all satisfy the DIC constraints once the LDIC constraints hold. Similarly, it can be proved that the mobile devices satisfy the UIC constraints once the LUIC constraints hold.
The IC constraints can thus be reduced to the LDIC and LUIC constraints. According to Theorems 5, 6 and 7, the optimization problem (19) is reduced and relaxed accordingly. The result is a convex optimization problem, and the optimal contract $(r_{ij}^t, d_{ij}^t)$ can be obtained using convex optimization tools. With the reward $r_{ij}^t$ from edge server $m_i$ to mobile device $n_j$, $d_{ij}^t$ data samples are contributed by $n_j$ and the edge model quality of $m_i$ is maximized, resulting in the optimization of the global model quality.
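A toy numeric instance illustrates the reduced problem. Here the IR constraint of the smallest type is taken as binding ($r = c\,d$), and a coarse search over candidate data sizes stands in for a convex solver; the logarithmic quality model and all parameter values are assumptions of this sketch.

```python
# Toy sketch of the reduced contract problem: with IR binding (r = c * d),
# the server picks the data size d that maximises omega * q(d) - c * d.

import math

def server_profit(d, omega=2.0, c=0.1):
    q = math.log(1.0 + d)            # assumed concave quality in data size
    return omega * q - c * d

# Coarse grid search over candidate contract sizes (a convex solver would
# find the same interior optimum, 2/(1+d) = 0.1/omega^-1 -> d = 19).
best_d = max(range(1, 200), key=server_profit)
```

The concavity of the assumed quality model is what makes the relaxed problem well behaved: the marginal value of extra data eventually falls below the per-sample cost, pinning down a finite optimal contract size.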

C. MODEL AGGREGATION BASED ON MODEL QUALITY
In iteration $t$, each associated mobile device uploads its local model parameters $w_{ij}^t$ to the edge server $m_i$, and $m_i$ aggregates them to update the edge model parameters $w_i^t$. Different from the existing methods, we consider not only the size of the training data set of each node but also the quality of the local models:
$$w_i^t = \frac{\sum_{n_j \in \mathcal{N}_i^t} q_{ij}^t\, w_{ij}^t}{\sum_{n_j \in \mathcal{N}_i^t} q_{ij}^t}. \qquad (41)$$
Similarly, the global aggregated model is updated as
$$w^t = \frac{\sum_{m_i \in \mathcal{M}} q_i^t\, w_i^t}{\sum_{m_i \in \mathcal{M}} q_i^t}, \qquad (42)$$
where $q_i^t$ is the quality of the edge model of $m_i$. We improve the model training process of HFL by incorporating our proposed incentive mechanism based on the matching game and contract theory, as well as the new model aggregation approach, as shown in Algorithm 2.
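The quality-weighted averaging step can be sketched as follows; parameter vectors are plain lists here, and the quality values are illustrative.

```python
# Sketch of quality-weighted aggregation: each parameter vector is weighted
# by its reported model quality instead of by data size alone.

def aggregate_by_quality(updates, qualities):
    """Element-wise quality-weighted average of parameter lists."""
    total_q = sum(qualities)
    dim = len(updates[0])
    return [sum(q * w[k] for q, w in zip(qualities, updates)) / total_q
            for k in range(dim)]

# A high-quality update (q=3) pulls the edge model three times harder
# than a low-quality one (q=1).
w_edge = aggregate_by_quality([[1.0, 2.0], [3.0, 4.0]], [1.0, 3.0])
```

The same function applies at both levels: with local qualities $q_{ij}^t$ it implements equation (41) at the edge, and with edge qualities $q_i^t$ it implements equation (42) at the cloud.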

IV. PERFORMANCE EVALUATION
In this section, we investigate the performance of our proposed MaxQ through simulations.

A. SIMULATION ENVIRONMENT SETTINGS
We consider a hierarchical federated learning scenario consisting of 60 mobile devices, 5 edge servers and a cloud server. It is assumed that each mobile device can train the model for one edge server and each edge server can associate with up to 10 mobile devices participating in its model training. We assume that each mobile device arrives or leaves randomly. For the model training, an image classification task is considered, on the standard datasets CIFAR10 [24] with the ResNet18 and ResNet50 models, as well as MNIST [25] with the AlexNet model. The possible reward from the edge server to each mobile device lies in [5, 50]. We compare the performance of MaxQ with the existing algorithms FAIR [20] and EHFL [26], the former using a quality-based reverse auction for node selection in FL and the latter considering the optimization of device association with edge servers in HFL.

In Algorithm 2, the set of mobile devices $\mathcal{N}_i^t$ training the model for each edge server $m_i$ is first obtained according to Algorithm 1. Based on contract theory, each edge server $m_i$ then encourages the mobile devices in $\mathcal{N}_i^t$ to select their training data sets, and each device $n_j$ performs local model training to obtain the local model parameters $w_{ij}^t$. Every $l$ iterations, the edge server $m_i$ performs edge model aggregation according to equation (41), and every $kl$ iterations, the cloud server performs global model aggregation according to equation (42).

B. ANALYSIS OF MODEL ACCURACY AND CONVERGENCE

It can be observed from Figure 2 that the accuracy of MaxQ is higher than that of FAIR and EHFL after the same number of iterations. After convergence, the model accuracy achieved by MaxQ is 80.18%, 80.60% and 95.03% on the three models, while that of FAIR and EHFL is 73.56% and 77.13%, 78.31% and 78.10%, and 92.35% and 92.96%, respectively. Also, it can be found in Figure 3 that the loss of MaxQ is lower after the same number of iterations.
This is due to the effectiveness of the proposed incentive mechanism, which encourages mobile devices to provide high-quality data so that the model quality is improved, resulting in faster convergence.

C. ANALYSIS OF THE MODEL QUALITY
We analyse the obtained model quality under different mechanisms. It can be observed from Figure 4 that the quality of MaxQ is higher than that of FAIR and EHFL, with improvements of 8.20% and 7.81%, 10.47% and 11.87%, and 10.98% and 11.97%, respectively. This is because MaxQ selects mobile devices with high-quality models for each edge server based on the matching game and contract theory. In FAIR, only the selection of mobile devices is optimized, while the selection of edge servers is not considered. In EHFL, edge servers select mobile devices with lower energy consumption during training, so devices with smaller datasets may be selected, which cannot improve the model quality.

D. ANALYSIS OF TRAINING DELAY
We also investigate the training delay under our proposed method. The computing delay of local models is calculated as $t_{comp} = cD/f$, where $c$ is the number of CPU cycles needed to process one bit of a data sample (set as 20 cycles/bit in the simulation), $D$ is the size of the data samples in bits and $f$ is the CPU cycle frequency (set as 1 GHz). Figure 5 demonstrates the average computing delay of local models on mobile devices under ResNet18, ResNet50 and AlexNet, respectively. It can be found that the average delay of MaxQ and EHFL is better than that of FAIR. Compared with the traditional two-layer FL, HFL can reduce the training delay because the quality of edge models is usually better than that of local models on devices.
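The delay formula maps directly to code; the 10 MB example data set below is an illustrative value, while the 20 cycles/bit and 1 GHz settings come from the simulation configuration above.

```python
# Computing delay of a local model per the text: t_comp = c * D / f,
# with c = 20 cycles/bit and f = 1 GHz as in the simulation settings.

def computing_delay(data_bits, cycles_per_bit=20, freq_hz=1e9):
    """Seconds needed to process data_bits on a single device CPU."""
    return cycles_per_bit * data_bits / freq_hz

# e.g. a hypothetical 10 MB local data set
delay_s = computing_delay(10 * 8 * 1024 * 1024)   # ~1.68 s
```

Since the contract determines the data size $d_{ij}^t$ each device trains on, it also directly determines this per-round computing delay, linking the incentive design to the delay results in Figure 5.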

E. EFFECTIVENESS OF MaxQ WITH THE EXISTENCE OF SELFISH NODES
To analyse the effectiveness of MaxQ in tolerating selfish nodes, we introduce a certain number of selfish nodes which upload randomly generated model parameters for edge model aggregation instead of performing actual training. As the number of selfish nodes increases, the quality of the aggregated model decreases, as shown in Figure 6. It can also be found that the global model accuracy of MaxQ is higher than that of FAIR and EHFL, with average improvements of 18.10% and 64.95%, 17.62% and 63.74%, and 21.22% and 82.35%, respectively. The performance of EHFL is worse than that of MaxQ and FAIR because the latter two methods both use the model quality as weight coefficients in the aggregation process, while EHFL simply takes the average of the uploaded parameters. MaxQ outperforms FAIR because in FAIR model aggregation is performed only once on the parameter server in each global iteration, while in MaxQ the aggregation is also performed on the edge servers, which reduces the influence of node selfishness.

V. RELATED WORK
Google proposed federated learning (FL) in 2016 [5]. However, in cloud-based FL the communication efficiency between cloud servers and mobile devices may be low, while in edge-based FL the number of participating mobile devices may be limited. Therefore, Liu et al. proposed cloud-edge-client based hierarchical federated learning [10].
Compared with FL, HFL can significantly reduce the communication overhead between mobile devices and the parameter server [27]. In the literature, Wang et al. [28] proposed the HF-SGD model with multi-level parameter aggregation. Xu et al. [26] proposed a device allocation strategy according to the energy consumption of different devices. However, they do not consider how to design effective incentive mechanisms.
To encourage mobile devices to train high-quality models in HFL, some researchers use the size of training data samples to measure the contribution of mobile devices [29], [30]. By combining game theory with deep reinforcement learning, Zhan et al. [30] proposed an incentive mechanism for FL. They formulated the interaction between the parameter server and the clients as a Stackelberg game. Pandey et al. [31] developed an incentive mechanism with a value-based compensation strategy, where the reward to mobile devices is proportional to their level of participation in FL. A two-stage Stackelberg game approach is used to solve the initial optimization problem to maximize the benefits of both parties, and an admission control scheme is introduced for clients to ensure a certain accuracy level. Deng et al. [20] investigated the learning quality maximization problem in FL based on auction theory. However, the above work does not address the issue of information asymmetry between edge servers and mobile devices.
In [32], an incentive mechanism based on contract theory was proposed. Among the existing work using contract theory, Kang et al. [33], [34] proposed to recruit devices with high-quality data to participate in FL. Reference [34] proposed an effective incentive mechanism combining reputation with contract theory, which encourages high-reputation mobile devices with high-quality data to participate in model learning. Ding et al. [35] introduced a multi-dimensional contract design method for the context where users' multi-dimensional private information is considered, and also investigated the impact of the level of information asymmetry on the server's optimal policy design. Li et al. [36] transformed the problem of incentivizing data holders into a utility maximization problem and established an incentive mechanism based on contract theory. Contract-theory-based incentive mechanism design is also investigated in related research areas, e.g., mobile crowd sensing [18]. However, little existing work has considered the dynamics of the association relationship between mobile devices and edge servers, which is different from our work. We introduce contract theory on the basis of [37] to encourage mobile devices to train high-quality models after the association relationship is determined.

VI. CONCLUSION
To optimize the model training quality in hierarchical federated learning, we investigate the global model quality maximization problem, whose decision problem is proved to be NP-complete. We propose a mechanism, MaxQ, which consists of two stages. In the first stage, we associate mobile devices with proper edge servers based on matching game theory, which is proved to achieve stability and a $\frac{1}{2}$-approximation ratio. In the second stage, we propose an incentive mechanism based on contract theory to improve the quality of the models submitted by mobile devices to edge servers. Through thorough experiments, we analysed the performance of MaxQ under the deep learning models ResNet18, ResNet50 and AlexNet, compared with the existing approaches FAIR and EHFL. It is found that MaxQ can improve the model quality by 8.20% and 7.81%, 10.47% and 11.87%, and 10.98% and 11.97% with the different learning models, respectively.