Pilot Contamination Mitigation in Massive MIMO Cloud Radio Access Networks

Massive multiple-input multiple-output (MIMO) technology is expected to achieve significant gains in both signal-to-interference-plus-noise ratio (SINR) and throughput for 5G cellular wireless networks. Efficient and highly accurate channel state information (CSI) acquisition at the base stations (BSs) is essential to realize the potential benefits of massive MIMO systems. However, the limited number of orthogonal pilot sequences available for CSI estimation forces pilot reuse, which causes interference between pilot sequences and thus erroneous channel estimates. This phenomenon, known as pilot contamination, severely limits system performance. In this paper, we therefore address the pilot contamination problem in massive MIMO Cloud-Radio Access Network (C-RAN) systems. Leveraging real telecom operator data from Call Detail Records (CDR), our objective is to maximize the average uplink achievable rate by reducing the effect of pilot contamination. As this problem is a non-linear integer problem with no known polynomial-time solution, we develop a two-stage solution. First, a coalition game is proposed where Remote Radio Heads (RRHs) gather into clusters with random user-pilot allocation. Then, two greedy heuristics are applied to match each user in the cluster with a given pilot, with the goal of further improving the average uplink rate achieved in the first stage. Simulation results show that our heuristic solutions for pilot contamination mitigation outperform both the traditional pilot allocation solution and a state-of-the-art pilot allocation scheme based on large-scale fading in terms of average uplink achievable rate and SINR.


I. INTRODUCTION
The fifth generation (5G) of wireless communication networks is expected to fulfill the demand for higher data rates. Many technologies have therefore been proposed to improve the communication rates of 5G [1]; among them, massive multiple-input multiple-output (MIMO) is considered a promising solution [2]. Massive MIMO is an extension of MIMO, which essentially groups together antennas at the transmitter and receiver to provide better throughput and spectrum efficiency. Through spatial multiplexing, massive MIMO allows the transmission of multiple parallel data streams over the same time-frequency resources. This increases capacity and data rate without the need for more spectrum or more base stations (BSs). In this architecture, the conventional BS is equipped with hundreds of antennas, which are processed coherently to increase both the signal quality and the data rate on the uplink and downlink [3]. As the number of antennas increases in a massive MIMO system, radiated beams become narrower and spatially focused toward the user. These beams concentrate energy into a smaller region of space, which increases the throughput of the desired user while reducing the interference to neighboring users.
For signal detection and precoding, massive MIMO relies on channel state information (CSI). CSI describes the state of the communication link from the transmitter to the receiver. It represents the combined effect of fading, scattering, and power decay with distance. With perfect CSI, the performance of massive MIMO can grow linearly with the number of transmitting or receiving antennas [4]. For conventional channel estimation in time division duplex (TDD) massive MIMO systems, the base station can estimate the downlink channel by exploiting channel reciprocity. During uplink, users in every cell send their pilot sequences to their serving base station. Based on these received pilot signals, the base station estimates the uplink channels [5]. This estimated CSI is then used in signal detection on the uplink and in precoding on the downlink.
Thus, the accuracy of channel estimation is one of the key factors that significantly influences the performance of massive MIMO systems. In pilot-based channel estimation, because pilot resources are limited, the same pilot sequences are reused by users in different cells. This causes higher inter-cell interference, which seriously degrades the channel estimation accuracy at the BS and consequently hinders the performance of massive MIMO. This effect is called pilot contamination (PC) and cannot be eliminated even when the number of BS antennas goes to infinity [6]. Many studies have been proposed to mitigate the impact of pilot contamination. However, most of these studies are based on pilot allocation algorithms, whose purpose is to find the best user pilot allocation (UPA) that minimizes the harmful effect of pilot contamination. Such solutions suffer from high complexity, especially when the number of users in the network is high. Another approach is to resort to clustering methods. Here, the base stations (BSs) are grouped into clusters and managed by a central unit. This solution reduces the impact of PC in any cluster of cells, as the users they serve are allocated orthogonal pilots. Hence, PC is totally eliminated within the same cluster. The concept of BS clustering can be realized in Cloud-Radio Access Networks (C-RAN), where the conventional base station is split into a Remote Radio Head (RRH) and a Base Band Unit (BBU). While the BBUs are pooled in a cloud data center, the RRHs are distributed across multiple sites. Moreover, the RRHs are connected to the BBUs via high-performance optical fronthaul links. This separation allows the RRHs to be grouped together and form clusters. In this paper, we investigate the problem of pilot contamination for massive MIMO systems in C-RAN. Our proposed scheme aims to maximize the average uplink achievable rate by reducing the effect of pilot contamination.
As this problem is an integer non-linear programming problem with no known polynomial-time solution, we develop a two-stage solution. The first stage is cast as a coalition game, where the RRHs organize themselves into clusters. Since the RRHs that belong to the same cluster share the same pool of orthogonal pilots, this minimizes the impact of PC by reducing the number of disjoint RRHs. Note that, at this stage, the pilots available to the cluster are randomly allocated to users. Then, two greedy heuristics are applied to match each user in the cluster with a given pilot. The goal is to further improve the total average uplink rate achieved in the first stage.

II. RELATED WORK
Since the pilot contamination (PC) problem has a serious impact on the performance of massive MIMO systems, many solutions have been proposed to curtail its effect. The works in [7], [8] proposed a time-shifted pilot method to reduce the effect of PC. The idea is to divide cells into several clusters, where the cells in each cluster transmit pilots in different time slots. The proposed scheme ensures that there is no PC among users from different clusters. However, these studies only consider PC between clusters (inter-cell interference), while intra-cell interference is totally ignored. The authors in [9], [10], and [11] used the Fractional Pilot Reuse (FPR) method to reduce PC. Generally, this method can be classified into two categories. The first divides the cells into two groups. The cells in the same group are assigned the same pilot sequences, while those in different groups are assigned orthogonal pilot sequences to mitigate PC [9], [10]. The second category divides the users into two groups according to PC levels, namely center users, who suffer from modest PC, and edge users, who suffer from severe PC. A cell-center pilot group is reused for all cell-center users, while a cell-edge pilot group is applied for edge users in adjacent cells [11]. This method improves the quality of service (QoS) of edge users at the cost of a slight rate loss for center users. Note that all the works mentioned above assign the pilot sequences randomly to users.
Power control is also a promising method to reduce PC. The authors in [12] proposed a power control scheme that splits the coherence time into two parts and sends pilots in different time slots. In [13], the transmit power is controlled such that user groups with significant cross gains choose different transmission time slots, or the transmit power of users assigned identical pilot sequences is reduced. However, this scheme needs a control mechanism to ensure that the pilot sequences are synchronous in adjacent cells. In [14], the authors developed techniques based on existing long-term evolution (LTE) measurements, namely open loop power control (OLPC) and pilot sequence reuse schemes that avoid PC within a group of cells. Furthermore, the authors in [15] mitigated pilot contamination by optimizing the pilot power of each user, while both pilot power and data power are jointly optimized in [16]. However, when the number of BS antennas increases, the power control method becomes inefficient.
Various pilot allocation schemes have been developed to reduce the effect of PC. A Smart Pilot Assignment (SPA) scheme is proposed in [17] to improve the performance of users with severe PC. Using the large-scale characteristics of fading channels, the BS first measures the inter-cell interference of each pilot sequence caused by the users with the same pilot sequence in adjacent cells. Then, the SPA method sequentially assigns the pilot sequence with the smallest inter-cell interference to the user with the worst channel quality to improve its performance. The authors in [18] proposed an Adaptive Pilot Allocation (APA) scheme to mitigate the effect of PC. The objective is to improve the total achievable throughput in the network. As the users with the smallest large-scale fading achieve the lowest user throughput, the proposed algorithm tries to improve their throughput by assigning them unique orthogonal pilot sequences. The scheme starts by sorting the users in ascending order of their large-scale fading. Then, it computes the number of unique orthogonal pilots, denoted by P, that can be assigned to the first P users with the lowest large-scale fading. After that, these P users are assigned orthogonal pilot sequences that are not reused by any other user in the system, which significantly reduces the inter-cell interference and improves the total achievable throughput in the network. To reduce the effect of PC, the authors in [19] and [20] develop algorithms for pilot sequence allocation. A heuristic solution for pilot allocation was formulated in [19]. The aim is to maximize the achievable sum rate of the target cells. To this end, the users in the network are divided into two groups: edge and center users. The orthogonal pilots are allocated to edge users in the different cells, whereas center users share the available pilots.
In [20], the authors present two game-theoretic approaches to reduce PC, where the players are the BSs. The first game is a non-selfish game as the payoff function of the player takes into account both suffered interference and caused interference. In the second one, the players are selfish and their payoff function only considers the suffered interference.
The negative impact of imperfect channel state information (ICSI) on the performance of massive MIMO systems has been addressed in [21], [22], [23]. The authors in [21] modeled and analyzed the impact of random radio-frequency (RF) mismatches on the performance of linear precoding in a TDD multi-user massive MIMO system. Taking the channel estimation error into account, they used the truncated Gaussian distribution to model the RF mismatch. They also derived closed-form expressions of the output signal-to-interference-plus-noise ratio for maximum ratio transmission and zero-forcing precoders. Simulation results showed the critical impact of erroneous channel estimation on the performance of massive MIMO systems. In [22], the authors developed novel closed-form uplink and downlink spectral efficiency (SE) expressions that take imperfect channel estimation into account. This study used statistical channel cooperation power control (SCCPC) to mitigate inter-user interference. A novel channel prediction framework that integrates the imperfect channel estimation of massive MIMO-OFDM into a deep neural network scheme is proposed in [23]. Numerical results showed that a deep neural network is an efficient method for imperfect channel estimation in massive MIMO-OFDM systems. It outperformed the conventional interpolation-based least squares method and achieved higher channel estimation accuracy.
Game theory has also been widely used to mitigate PC. In [24], a distributed algorithm based on a coalition game was proposed for pilot allocation between cells. Instead of modeling the user pilot allocation (UPA) problem, the authors propose to divide the orthogonal pilots equally between the cells. The goal is to maximize the average spectral efficiency of each cell. Thus, each cell, which has its own pool of orthogonal pilots, seeks to form clusters with other cells to gain access to more pilots and serve more users. In [25], users are considered as players, and the purpose is to find the partition of players into clusters that maximizes network performance. The idea is that users in the same coalition share the same pilot. The algorithm starts from a random clustering. Then, each user moves to the coalition that improves its performance. In the end, the game reaches a stable partition, meaning that no user can find a better coalition to switch to. However, this study focused on user clustering, which increases the complexity of finding the best user partition, especially since the number of players in massive MIMO systems is expected to be enormous. Unlike all the aforementioned works, our study benefits from the physical separation between the BBUs and the RRHs to mitigate the effect of pilot contamination on the performance of massive MIMO systems. The originality of our work is the integration of massive MIMO technologies in the C-RAN context, where the RRHs are grouped into clusters that share the same pool of orthogonal pilots. This fully eliminates intra-cell interference and reduces the effect of inter-cell interference through dynamic cluster formation. In addition, our approach takes into account the variation of traffic load conditions given by the Call Detail Records (CDR) of a real telecom operator.
The main contributions of our work can be summarized as follows:
• We investigate the pilot contamination (PC) problem for time division duplex (TDD) massive MIMO systems in the C-RAN architecture. Our objective is to maximize the average uplink achievable rate by reducing the impact of pilot contamination.
• We prove that the optimal user-pilot allocation problem that reduces the effect of PC while maximizing the average uplink achievable rate is an NP-hard problem.
• To reduce the high computational complexity of this problem, we develop a two-stage solution to mitigate the effect of PC. First, a coalition game is proposed in which the RRHs organize themselves into clusters, where the users within the same cluster share orthogonal pilot sequences. At this stage, the RRHs are considered as players, which reduces the complexity of finding the best partition in comparison with [25] (i.e., the number of RRHs to be clustered is smaller than the number of users), and the pilots are randomly allocated to the users in each cluster. Then, based on the output of the first stage, two heuristic solutions, namely the Greedy and ϵ-Greedy algorithms, are applied in each cluster to improve the total average achievable uplink rate by choosing to which user a given pilot should be allocated.
• Our solution fully eliminates the intra-cell interference, since the users within the same cluster are allocated orthogonal pilot sequences, and reduces the effect of inter-cell interference by finding the best user pilot allocation (UPA).
• To reduce the signaling overhead, our approach can be executed on two different time scales: the large time scale, which corresponds to each hour of the day, and the small time scale, which corresponds to user arrivals. More precisely, our two heuristic solutions, the Greedy and ϵ-Greedy algorithms, are applied at each user arrival to immediately assign the arriving user the best free pilot, while the clustering stage can be executed at each hour of the day to enhance the total network performance expressed in terms of the uplink achievable rate.
• For a realistic scenario, we evaluate our proposed solution based on a real Call Detail Records (CDR) dataset and study its complexity.
The rest of this paper is organized as follows. Section III describes the system model. In Section IV, we formulate the problem of user pilot allocation (UPA) and study its complexity. In Section V, the devised pilot mitigation algorithm based on a coalition game is presented. Our heuristic solutions for UPA are introduced in Section VI. Section VII describes the implementation of our solution for pilot contamination (PC) mitigation. Simulation results are presented in Section VIII. Finally, concluding remarks are provided in Section IX.

III. SYSTEM MODEL
Consider the uplink of a time division duplex (TDD) massive MIMO system in a Cloud-Radio Access Network (C-RAN), as shown in Fig. 1. We denote by $\mathcal{R} = \{r_1, r_2, \cdots, r_R\}$ and $\mathcal{B} = \{b_1, b_2, \cdots, b_B\}$ the sets of $R$ RRHs and $B$ BBUs, respectively. In this work, we assume that each RRH $r$ is equipped with $M$ antennas and can be associated with at most one BBU. Each RRH covers an area of radius $r_c$. Since the BBUs are physically separated from the RRHs in the C-RAN architecture, many RRHs can be attached to one BBU and form a single cluster. Users that belong to a single cluster (i.e., are served by the same BBU) equally share the total bandwidth allocated to the BBU, denoted by $W_b$. We denote by $\mathcal{U} = \{u_1, u_2, \cdots, u_U\}$ the set of $U$ active users, which are uniformly distributed in the network. User $u$ is equipped with a single antenna and is associated with its RRH $r$ according to the Call Detail Records (CDR) presented in [26]. This is a data structure that contains spatial and temporal information about user association. More precisely, for each user, we know the user identity as well as the user-RRH association as a function of the date and time of day. Let $\mathcal{U}_r$ denote the set of $U_r$ users served by RRH $r$, where $U_r \ll M$ [27], [28].
We denote by $\beta^{u}_{rr'}$ the large-scale fading between user $u$ attached to RRH $r$ and interfering RRH $r'$. It depends on both the path loss and the shadow fading and can be expressed as $\beta^{u}_{rr'} = \gamma^{u}_{rr'} / (d^{u}_{rr'})^{\alpha}$, where $\gamma^{u}_{rr'}$ is the shadow fading, $d^{u}_{rr'}$ is the distance between user $u$ attached to RRH $r$ and RRH $r'$, and $\alpha$ is the path loss exponent. As the distance between user $u$ and RRH $r$ is much larger than the distance between the antenna elements, we assume that $\beta^{u}_{rr'}$ is independent of the antenna index. Furthermore, we denote by $g^{u}_{rr'}$ the small-scale fading between user $u$ attached to RRH $r$ and RRH $r'$. The small-scale fading is assumed to be statistically independent for all users and exponentially distributed with unit mean.
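As a small illustration of the large-scale fading model, the following Python sketch evaluates $\beta = \gamma / d^{\alpha}$ for two distances. The log-normal shadowing model, the function name `large_scale_fading`, and the numeric values (path loss exponent 3.5, distances of 100 m and 400 m) are assumptions made only for this example.

```python
import random

def large_scale_fading(distance_m, path_loss_exp, shadow_std_db, rng):
    """Return beta = gamma / d**alpha.

    gamma is drawn as log-normal shadowing (an assumption of this sketch):
    a zero-mean Gaussian in dB, converted to linear scale.
    """
    shadow_db = rng.gauss(0.0, shadow_std_db)
    gamma = 10.0 ** (shadow_db / 10.0)
    return gamma / (distance_m ** path_loss_exp)

rng = random.Random(0)
# With the shadowing spread set to 0 dB, gamma equals 1 and only path loss remains.
beta_near = large_scale_fading(100.0, 3.5, 0.0, rng)
beta_far = large_scale_fading(400.0, 3.5, 0.0, rng)
print(beta_near / beta_far)  # ratio of the two gains, i.e. 4 ** 3.5
```

With a non-zero `shadow_std_db`, repeated calls give a random spread around the path loss value, which is how interfering users at similar distances can still contribute quite different contamination levels.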
Let $h^{u}_{rr'} \in \mathbb{C}^{M \times 1}$ represent the channel gain between RRH $r'$ and user $u$ attached to RRH $r$. It depends on the large-scale fading and the small-scale fading and can be expressed as in [30].
In time division duplex (TDD) mode, data transmission is divided into coherence frames, as shown in Fig. 2. Coherence frames depend on both the coherence time $T_c$ and the coherence bandwidth $W_c$. Within each frame, the channel between user $u$ and its serving RRH $r$ has a constant response. Consequently, each frame contains $S = T_c \cdot W_c$ transmission symbols [31], [32], and the channel coherence frame limits the length of the TDD frames. Based on the reciprocity between the uplink and downlink channels, a TDD massive MIMO C-RAN system works in three phases during each coherence frame. In the first phase, all users in all cells transmit their pilot sequences synchronously to their corresponding RRHs. Based on the pilot sequences, the RRHs estimate their uplink channel matrices. In the second phase, the users transmit their uplink data symbols, which are processed at the RRHs using the channel estimates. In the last phase, based on channel reciprocity in TDD massive MIMO mode, each RRH precodes the downlink data according to the estimates obtained in the first phase and then transmits the precoded data symbols to its users.

A. CHANNEL ESTIMATION PHASE
In this phase, all users simultaneously send pilot sequences to their corresponding RRHs, which use them to estimate the propagation channels between users and RRHs and subsequently to detect the transmitted user data. In time division duplex (TDD) mode, the length of the pilot training sequences is proportional to the number of active users rather than to the number of RRH antennas. Assuming that $\tau_p$ symbols out of $S$ are allocated for pilot signaling, the remaining $S - \tau_p$ symbols are used for data transmission. The $\tau_p$ symbols allow only $\tau_p$ orthogonal pilot sequences. More precisely, only $\tau_p$ users in the entire system can transmit pilots without interfering with other users. Thus, in massive MIMO C-RAN systems, where the number of users is expected to be large, only users belonging to the same cluster, managed by the same BBU, are guaranteed to have orthogonal pilot sequences. Hence, the same set of pilot sequences is reused for users in other clusters, so that the estimates of one RRH are contaminated with channel components of users served by other RRHs [27] in different clusters. This effect, known as pilot contamination (PC), curbs the performance of massive MIMO systems for both uplink and downlink.
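As a rough numerical illustration of the pilot budget described above, the following Python sketch computes the frame size $S = T_c \cdot W_c$, the number of data symbols $S - \tau_p$, and the fraction of the frame spent on pilots. The function name `frame_budget` and the values (1 ms coherence time, 200 kHz coherence bandwidth, 20 pilot symbols) are illustrative assumptions, and the overhead is counted in symbols as $\tau_p / S$, which may differ from the normalization used elsewhere in the paper.

```python
def frame_budget(coherence_time_s, coherence_bw_hz, num_pilot_symbols):
    """Return (S, data_symbols, pilot_fraction) for one coherence frame."""
    total_symbols = round(coherence_time_s * coherence_bw_hz)  # S = T_c * W_c
    if num_pilot_symbols > total_symbols:
        raise ValueError("pilots cannot occupy more than the whole frame")
    data_symbols = total_symbols - num_pilot_symbols           # S - tau_p
    pilot_fraction = num_pilot_symbols / total_symbols
    return total_symbols, data_symbols, pilot_fraction

# Assumed example values: T_c = 1 ms, W_c = 200 kHz, tau_p = 20.
budget = frame_budget(1e-3, 200e3, 20)
print(budget)
```

In this toy setting at most 20 users can hold mutually orthogonal pilots, which is why pilots must be reused across clusters once the user population grows.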
We denote by $\Psi_r = [\Psi^{1}_{r}, \cdots, \Psi^{U_r}_{r}]^{T}$, of dimension $U_r \times \tau_p$, the pilot matrix that holds the orthogonal pilot sequences assigned to the $U_r$ users served by RRH $r$, which satisfies $\Psi_r \Psi_r^{H} = \tau_p I_{U_r}$. Accordingly, we denote by $Y_r \in \mathbb{C}^{M \times \tau_p}$ the received pilot signal at RRH $r$. It is expressed in terms of the pilot transmit power $p_p$ and the additive white Gaussian noise (AWGN) matrix $Z_r \in \mathbb{C}^{M \times \tau_p}$, whose entries are distributed as $\mathcal{CN}(0, \delta_z^2)$. To avoid intra-cell interference among users in the same cluster, we assume that the number of users in one cluster is less than or equal to the number of pilot sequences in one time frame (i.e., only $\tau_p$ users in one cluster can be served simultaneously without interfering with each other).
We define the binary variable $a^{p}_{r,u}$, which equals 1 if pilot $p$ is allocated to user $u$ attached to RRH $r$ and 0 otherwise. Each RRH correlates its received pilot signals with its own orthogonal pilot signals, while all users in other clusters contribute to PC. Thus, the channel estimate $\hat{h}^{u}_{r}$ of user $u$ attached to RRH $r$ is obtained by correlating $Y_r$ with $(\Psi^{u}_{r})^{*}$ as in [29], where $(\cdot)^{*}$ denotes the complex conjugate and $w^{u}_{r}$ is the equivalent noise.
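The correlation step above can be illustrated with a minimal, noise-free Python sketch. It assumes DFT sequences as the orthogonal pilots (so that the correlation of a pilot with itself equals $\tau_p$) and unit large-scale fading; the helper names and the three-user setup are hypothetical and are not taken from the paper.

```python
import cmath
import random

def dft_pilot(index, length):
    """Pilot `index` of an orthogonal DFT family (an assumption of this sketch)."""
    return [cmath.exp(2j * cmath.pi * index * n / length) for n in range(length)]

def received_pilot_signal(channels, pilot_ids, length, pilot_power):
    """Y[m][n] = sqrt(p_p) * sum over users of h_u[m] * pilot_u[n] (no noise)."""
    num_antennas = len(channels[0])
    amp = pilot_power ** 0.5
    Y = [[0j] * length for _ in range(num_antennas)]
    for h, pid in zip(channels, pilot_ids):
        pilot = dft_pilot(pid, length)
        for m in range(num_antennas):
            for n in range(length):
                Y[m][n] += amp * h[m] * pilot[n]
    return Y

def estimate_channel(Y, pilot_id, length, pilot_power):
    """Correlate each antenna's samples with the conjugate pilot and normalize."""
    pilot = dft_pilot(pilot_id, length)
    scale = (pilot_power ** 0.5) * length
    return [sum(y * p.conjugate() for y, p in zip(row, pilot)) / scale for row in Y]

rng = random.Random(1)
M, tau_p = 4, 4

def random_channel():
    return [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(M)]

# One served user on pilot 0, one same-cluster user on pilot 1,
# and one user of another cluster that reuses pilot 0.
h_own, h_other_pilot, h_same_pilot = random_channel(), random_channel(), random_channel()
Y = received_pilot_signal([h_own, h_other_pilot, h_same_pilot], [0, 1, 0], tau_p, 1.0)
h_hat = estimate_channel(Y, 0, tau_p, 1.0)
contaminated = [a + b for a, b in zip(h_own, h_same_pilot)]
```

Here `h_hat` matches `h_own + h_same_pilot`: the user on the orthogonal pilot vanishes from the estimate, while the user reusing pilot 0 does not, which is exactly the contamination that remains however many antennas are added.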

B. DATA PHASE
During the data phase, the received signal at RRH $r$ is expressed in terms of the following quantities: $p_t$ denotes the uplink data transmit power, $x^{u}_{r'}$ is the normalized symbol transmitted by user $u$ associated with RRH $r'$, with $\mathbb{E}[|x^{u}_{r'}|^{2}] = 1$, and $n_r \in \mathbb{C}^{M \times 1}$ denotes the AWGN vector, with $\mathbb{E}[n_r n_r^{H}] = I_M$.
Using the channel estimate of user $u$ in (5), the maximum ratio (MR) detector is employed by RRH $r$ to separate the data transmitted by user $u$, as in [33]. In (7), the first term denotes the desired signal component, the second term represents the intra-cell interference, the third term is PC, the fourth term denotes the inter-cell interference, and the last term represents the uncorrelated interference and noise, which decreases substantially as more BS antennas are added and goes to zero when the number of BS antennas is infinite [34], [35].
According to (7), the uplink SINR of user $u$ in RRH $r$ can be expressed as in (9). However, when the number of BS antennas $M$ goes to infinity, the uplink SINR achieved by user $u$ attached to RRH $r$ depends only on the large-scale fading coefficients [36], and (9) simplifies accordingly. Consequently, the corresponding average uplink achievable rate of user $u$ attached to RRH $r$ is determined by this SINR together with two quantities: $\mu_0 = \tau_p / T_c$, which represents the loss of spectral efficiency caused by transmitting uplink pilots to estimate the channel (that is, the ratio of the pilot length $\tau_p$ to the channel coherence time $T_c$), and $W_u$, the channel bandwidth allocated to user $u$.
It is clear that the channel estimate $\hat{h}^{u}_{r}$ of user $u$ attached to RRH $r$ is a linear combination of the channels $h^{u'}_{rr'}$ of users in all cells having the same pilot sequence, which is the cause of the PC problem. It is also clear that the contributions of the AWGN and the small-scale fading coefficients approach zero as the number of BS antennas $M$ goes to infinity. However, the average uplink achievable rate is still limited by PC and cannot increase even if we increase the transmit power $p_t$ or $p_p$. In the following, we address the problem of UPA. Our goal is to improve the average uplink achievable rate by reducing the effect of PC. To this end, we develop a two-stage solution. In Table 1, we summarize the notation for the system model.
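To make the large-antenna regime concrete, the Python sketch below uses one commonly cited asymptotic form, in which the SINR is the squared large-scale fading of the desired user divided by the sum of squared large-scale fading coefficients of the users reusing the same pilot. This form, the rate expression $W_u (1 - \mu_0) \log_2(1 + \mathrm{SINR})$, the function names, and the numeric values are assumptions for illustration and are not necessarily identical to the paper's equations.

```python
import math

def asymptotic_sinr(beta_desired, betas_same_pilot):
    """SINR limited only by pilot contamination (assumed asymptotic form)."""
    interference = sum(b * b for b in betas_same_pilot)
    if interference == 0.0:
        return math.inf  # nobody reuses the pilot, so no contamination
    return beta_desired ** 2 / interference

def uplink_rate(bandwidth_hz, pilot_overhead, sinr):
    """Rate in bit/s after removing the fraction of resources spent on pilots."""
    return bandwidth_hz * (1.0 - pilot_overhead) * math.log2(1.0 + sinr)

# Assumed values: one desired user and two contaminating users on the same pilot.
sinr = asymptotic_sinr(1e-7, [1e-8, 2e-8])
rate = uplink_rate(1e6, 0.1, sinr)
print(sinr, rate)
```

Note that neither function takes a transmit power argument: in this regime, scaling the power of every user leaves the ratio unchanged, so only a different pilot assignment (a different list of contaminating users) can improve the rate.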

IV. USER PILOT ALLOCATION (UPA) PROBLEM
As shown in (10), the average uplink achievable rate of user $u$ attached to RRH $r$ can be improved by reducing the effect of pilot contamination (PC), which appears in the denominator of the SINR in (10). This is done by suitably assigning the available pilots to the users. Thus, our optimization problem (P) consists in finding the optimal user pilot allocation (UPA) that maximizes the total average uplink achievable rate. Accordingly, problem (P) maximizes this total rate (12) subject to
$a^{p}_{r,u} \neq a^{p}_{r,u'}, \quad \forall (u, u') \in \mathcal{U}_r,\ u \neq u'$ (13)
Constraint (13) guarantees that users in the same cluster are allocated orthogonal pilot sequences.

A. COMPLEXITY ANALYSIS
The optimization problem presented in (12) can be classified as an NP-hard problem. We show this by relating our UPA problem to the Graph Coloring problem, whose optimization version is well known to be NP-hard. Graph Coloring [37] is the problem of assigning colors to the elements of a graph (edges or vertices) subject to certain constraints. In its simplest form, it colors the vertices of a graph such that no two adjacent vertices share the same color; this is called a vertex coloring. Let the vertices of the conflict graph be the users and the colors be the pilots, with an edge joining two users that must not share a pilot. Using the Graph Coloring approach, the UPA problem consists in coloring the conflict graph under the following constraint: vertices joined by an edge are given different colors (i.e., once two users are adjacent, we should avoid assigning the same pilot to them). The Graph Coloring problem is known to be NP-complete [38], which means that the UPA optimization problem (12) is NP-hard and cannot be solved optimally in polynomial time unless P = NP.
The optimal solution can be obtained through exhaustive search. However, this requires exploring all possible user pilot allocations (UPA) to find the combination of users and pilots that maximizes the total average uplink achievable rate while satisfying all constraints. Consequently, the computational complexity of obtaining the optimal solution using exhaustive search is in $\mathcal{O}(U^{\psi})$, which becomes intractable when the number of users is large. To apply the UPA algorithm to practical networks and solve the problem feasibly, we develop a two-stage solution to mitigate the effect of PC. First, a coalition game (CG) is proposed to reduce the effect of PC through RRH clustering. Second, two greedy heuristics are applied in each cluster to improve the total average uplink achievable rate by properly associating each user with a given pilot sequence.
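The coloring view can be sketched in a few lines of Python. The sketch assumes that users are vertices, pilots are colors, and two users conflict when they are served in the same cluster; the function names and the tiny example are hypothetical.

```python
def cluster_conflict_edges(clusters):
    """Connect every pair of users served in the same cluster."""
    edges = []
    for users in clusters:
        for i, u in enumerate(users):
            for v in users[i + 1:]:
                edges.append((u, v))
    return edges

def is_proper_coloring(edges, pilot_of):
    """True when no two adjacent users (vertices) share a pilot (color)."""
    return all(pilot_of[u] != pilot_of[v] for u, v in edges)

clusters = [["u1", "u2"], ["u3"]]
edges = cluster_conflict_edges(clusters)
valid = is_proper_coloring(edges, {"u1": 0, "u2": 1, "u3": 0})    # pilot reused across clusters
invalid = is_proper_coloring(edges, {"u1": 0, "u2": 0, "u3": 1})  # pilot reused inside a cluster
print(valid, invalid)
```

Checking a given allocation is cheap; the hardness lies in searching over allocations for the one that also maximizes the rate.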
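The cost of exhaustive search can be seen on a toy instance. The Python sketch below enumerates every assignment of distinct pilots to the users of one cluster and keeps the best one. It assumes a simplified per-user rate $\log_2(1 + g_u / I_p)$, where $g_u$ is a desired gain and $I_p$ is the contamination seen on pilot $p$; these inputs and the function name `exhaustive_upa` are illustrative and do not reproduce the paper's objective (12).

```python
import itertools
import math

def exhaustive_upa(desired_gain, interference):
    """Try every injective user-to-pilot mapping and return the best one.

    desired_gain[u]: toy desired signal gain of user u.
    interference[p]: toy contamination power observed on pilot p.
    """
    users = list(desired_gain)
    best_rate, best_alloc = -math.inf, None
    # permutations(pilots, len(users)) yields P! / (P - U)! candidate allocations.
    for pilots in itertools.permutations(interference, len(users)):
        total = sum(
            math.log2(1.0 + desired_gain[u] / interference[p])
            for u, p in zip(users, pilots)
        )
        if total > best_rate:
            best_rate, best_alloc = total, dict(zip(users, pilots))
    return best_alloc, best_rate

alloc, rate = exhaustive_upa({"a": 8.0, "b": 1.0}, {0: 1.0, 1: 4.0, 2: 2.0})
print(alloc, rate)
```

Two users and three pilots already give six candidates; the count grows factorially with the cluster size, which motivates the two-stage heuristic.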

V. PILOT CONTAMINATION MITIGATION AS A COALITION GAME
To reduce the complexity of finding the optimal solution of (P), we present in this section an approach based on coalition game (CG) theory to mitigate the effect of PC. Due to the limited number of orthogonal pilot sequences, the same pilot sequences are reused in different clusters, which amplifies the effect of PC. The idea is that the RRHs organize themselves into clusters in order to reduce the number of disjoint RRHs, where the users within the same cluster share orthogonal pilot sequences. Thus, instead of proposing an algorithm for pilot allocation to decrease the effect of PC, we propose to find the best sets of RRHs that can be clustered together such that the number of users in each cluster is less than or equal to the number of orthogonal pilot sequences. Note that, at this stage, the pilots in each cluster are randomly assigned to the served users.

Theorem 5.1 (Coalition Partition):
A coalitional structure $S$ is a partition of $\mathcal{R}$. It is a set of clusters $\{S_1, \ldots, S_i\}$, where $S_i$ represents an agreement between the RRHs to be associated with a single BBU, such that the clusters are pairwise disjoint and together cover $\mathcal{R}$.

Theorem 5.2 (Coalition Value):
A coalition value $\upsilon(S_i)$ is a real number that quantifies the total utility that the players (i.e., RRHs) can get from coalition $S_i$. In our study, $\upsilon(S_i)$ represents the total average uplink achievable rate that can be achieved in cluster $S_i$.

The coalition formation algorithm, which produces the best coalition structure, is described in Algorithm 1. Starting from a random initial coalitional structure, the algorithm explores all possible partitions, whose number is given by the Bell number $B_r$, and selects the one that maximizes the total average uplink achievable rate presented in (10). As mentioned earlier, the first stage ignores the UPA. Thus, in the second stage, two heuristic solutions, namely the Greedy and ϵ-Greedy algorithms, are applied in each coalition (i.e., cluster) resulting from our first stage to improve the total average uplink achievable rate.

Algorithm 1 (excerpt):
  Calculate the utility achieved by partition $S_i$;
  If $S_i$ is preferred over $S^*$ according to (16)
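The first stage can be sketched in Python as a search over all set partitions of the RRHs. The sketch assumes a feasibility rule (the users of a cluster must not exceed the number of orthogonal pilots) and takes the coalition value as a caller-supplied function; `toy_utility` below is a placeholder and not the paper's rate-based value, and all names and numbers are hypothetical.

```python
def set_partitions(items):
    """Yield every partition of `items` as a list of blocks (Bell-number many)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or open a new block that holds only `first`.
        yield [[first]] + partition

def best_partition(rrhs, users_per_rrh, num_pilots, utility):
    """Return the feasible partition with the largest total coalition value."""
    best, best_value = None, float("-inf")
    for partition in set_partitions(rrhs):
        too_big = any(
            sum(users_per_rrh[r] for r in block) > num_pilots for block in partition
        )
        if too_big:
            continue  # a cluster may not serve more users than there are pilots
        value = sum(utility(block) for block in partition)
        if value > best_value:
            best, best_value = partition, value
    return best, best_value

def toy_utility(block):
    return len(block) ** 2  # placeholder: rewards larger clusters

num_partitions = len(list(set_partitions([1, 2, 3, 4])))
best, value = best_partition(
    ["r1", "r2", "r3"], {"r1": 2, "r2": 2, "r3": 3}, 5, toy_utility
)
print(num_partitions, best, value)
```

Four RRHs already admit 15 partitions, and the count grows very quickly with the number of RRHs, which is why clustering RRHs rather than users keeps this stage manageable.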

VI. HEURISTIC SOLUTIONS FOR USER PILOT ALLOCATION (UPA) PROBLEM
At this stage, the best coalitional structure $S$ has been obtained from the first stage (i.e., the RRH coalitions are known). For each coalition $S_i \in S$ (i.e., cluster), we know the number of users in the cluster as well as the set of free pilots, denoted by $P^{*}_{S_i} = \{p^{*}_{1}, \ldots, p^{*}_{|P^{*}_{S_i}|}\}$ (i.e., the pilots that are not assigned to any user in cluster $S_i$). The two algorithms for user pilot allocation (UPA) start by comparing the number of users in each cluster with the number of free pilots available to this cluster. If $\mathrm{card}(P^{*}_{S_i})$ is equal to zero (i.e., there are no free pilots in cluster $S_i$), the algorithm stops and the pilots remain randomly allocated to users according to the first stage. However, when $\mathrm{card}(P^{*}_{S_i})$ is greater than zero (i.e., the total number of pilot sequences allocated to cluster $S_i$ is greater than the total number of users belonging to this cluster), one of the two heuristic solutions for UPA can be applied to the clusters formed by our coalition game (CG).

A. GREEDY UPA
The purpose of this approach is to maximize the total average uplink achievable rate of the target cluster by reducing the effect of pilot contamination (i.e., the interference caused by users that are assigned the same pilot sequence in other clusters). This can be done by finding the best user pilot association (UPA). Given a cluster $S_i$, we denote by $U_{S_i}$ the set containing all users belonging to $S_i$, and by $P^*_{S_i} = \{p^*_1, \ldots, p^*_{|P^*_{S_i}|}\}$ the set of free pilots in $S_i$. Each user $u \in U_{S_i}$ randomly chooses a pilot $p^*_i \in P^*_{S_i}$ and computes its new uplink achievable rate according to (10). If this allocation enhances the uplink achievable rate of user $u$, then the binary variable $a^{p^*_i}_{S_i,u}$ is set to 1 (i.e., pilot $p^*_i$ in cluster $S_i$ is allocated to user $u$). Moreover, the previous pilot $p$ used by user $u$ in the first stage becomes free to be allocated to other users within the same cluster (i.e., $a^{p}_{S_i,u} = 0$). Otherwise, user $u$ keeps its previous pilot. These steps are repeated for each user in $U_{S_i}$. Note that, at the end of the round, each user $u$ assumes that it has obtained a sub-optimal pilot allocation. We summarize the proposed solution in Algorithm 2.

Algorithm 2 (excerpt)
For all users $u \in U_{S_i}$
5: Compute the current uplink achievable rate, using the current pilot $p$ assigned to user $u \in U_{S_i}$ from the first stage, according to (10);
6: Choose a new pilot $p^*_i \in P^*_{S_i}$;
7: Compute the new uplink achievable rate using the new UPA;
8: If the new uplink achievable rate is greater than the current uplink achievable rate

The epsilon-Greedy solution, ϵ-Greedy, is another heuristic method employed for UPA. Unlike the Greedy algorithm, in each round a user $u \in U_{S_i}$ either tries a random pilot $p^*_i \in P^*_{S_i}$ from the set of free pilot sequences or keeps its current UPA. At the initial step, each user $u$ randomly selects a number $q$ between 0.0 and 1.0. If the selected $q$ is greater than ϵ, then $u$ chooses a random pilot $p^*_i \in P^*_{S_i}$ and computes its new uplink achievable rate according to the new allocation. If the new uplink achievable rate is higher than the current uplink achievable rate, then pilot $p^*_i$ is assigned to user $u$. Otherwise, user $u$ keeps its current pilot assignment. Conversely, when $q$ is less than ϵ, user $u$ keeps its current pilot assignment. With $q$ drawn uniformly, a user therefore tries a new pilot with probability $1 - ϵ$ and keeps its current UPA with probability ϵ. Note that this step is repeated $n$ times for each user $u$ in cluster $S_i$. Algorithm 3 summarizes the ϵ-Greedy solution.
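The two heuristics described in this section can be illustrated with the following minimal Python sketch. It is not the authors' implementation: the rate expression (10) is not reproduced here, so `make_toy_rate` is a hypothetical stand-in in which the interference seen on a pilot is a fixed number, and all function names, the dictionary-based data layout, and the treatment of the boundary case `q == epsilon` (counted as "keep") are assumptions.

```python
import math
import random


def make_toy_rate(signal, contamination, noise=1.0):
    """Hypothetical stand-in for the rate in (10): log2(1 + SINR), where the
    interference depends only on the pilot (contamination from other clusters)."""
    def rate(user, pilot):
        return math.log2(1.0 + signal[user] / (contamination[pilot] + noise))
    return rate


def _try_swap(user, assignment, free, rate_fn, rng):
    """Draw one random free pilot; keep it only if the user's rate improves."""
    current = assignment[user]
    candidate = rng.choice(free)
    if rate_fn(user, candidate) > rate_fn(user, current):
        free.remove(candidate)
        free.append(current)  # the previous pilot becomes free in the cluster
        assignment[user] = candidate


def greedy_upa(assignment, free, rate_fn, rng):
    """Greedy UPA: a single attempt per user."""
    assignment, free = dict(assignment), list(free)
    if free:  # with no free pilot, the first-stage allocation is kept
        for user in list(assignment):
            _try_swap(user, assignment, free, rate_fn, rng)
    return assignment, free


def epsilon_greedy_upa(assignment, free, rate_fn, rng, epsilon, n_rounds):
    """epsilon-Greedy UPA: n_rounds per user, exploring only when q > epsilon."""
    assignment, free = dict(assignment), list(free)
    if free:
        for user in list(assignment):
            for _ in range(n_rounds):
                q = rng.random()  # uniform in [0.0, 1.0)
                if q > epsilon:
                    _try_swap(user, assignment, free, rate_fn, rng)
    return assignment, free


if __name__ == "__main__":
    rng = random.Random(0)
    rate = make_toy_rate({"u1": 10.0, "u2": 8.0}, {0: 5.0, 1: 4.0, 2: 0.5, 3: 0.1})
    start = {"u1": 0, "u2": 1}
    print(greedy_upa(start, [2, 3], rate, rng))
    print(epsilon_greedy_upa(start, [2, 3], rate, rng, epsilon=0.3, n_rounds=20))
```

In this sketch a swap is accepted only when it raises the rate of the user that makes it, so the sum of the toy rates never decreases. The effect of a swap on users in other clusters, which the full expression (10) would capture, is deliberately left out.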
As illustrated in the simulation section, the ϵ-Greedy algorithm outperforms the Greedy algorithm, notably when the number of free pilots in a cluster increases. In fact, when the ϵ-Greedy approach is used for UPA, each user has the opportunity to explore different UPAs over $n$ rounds, to find a suboptimal pilot allocation that improves its uplink achievable rate. However, when the Greedy algorithm is applied for UPA, each user gets only one attempt to explore a new UPA to enhance its performance in terms of uplink achievable rate.

Algorithm 3 (excerpt)
If $q > ϵ$
8: Compute the current uplink achievable rate, using pilot $p$ assigned to user $u \in U_{S_i}$ from the first stage, based on (10);
9: Choose a new pilot $p^*_i \in P^*_{S_i}$;
10: Compute the new uplink achievable rate using the new UPA;
11: If the new uplink achievable rate is greater than the current uplink achievable rate
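The difference in the number of attempts can be quantified with a short calculation. It assumes that $q$ is drawn uniformly from $[0, 1]$, that a user explores only when $q > ϵ$, and that the $n$ rounds are independent; the independence assumption and the numerical values below are illustrative and do not come from the simulations.

```latex
\Pr[\text{explore in one round}] = \Pr[q > \epsilon] = 1 - \epsilon, \qquad
\mathbb{E}[\text{attempts in } n \text{ rounds}] = n\,(1 - \epsilon), \qquad
\Pr[\text{at least one attempt}] = 1 - \epsilon^{n}.
```

For example, with $ϵ = 0.1$ and $n = 10$, a user makes on average 9 attempts under ϵ-Greedy, against exactly one attempt under Greedy. More generally, ϵ-Greedy explores more than Greedy on average whenever $n(1 - ϵ) > 1$.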

VII. USER PILOT ALLOCATION IMPLEMENTATION
For the implementation of our solutions (clustering through a coalition game (CG), Greedy, and ϵ-Greedy), the BBUs require the large-scale fading coefficients as well as the pilot contamination (PC) term to calculate the users' average uplink achievable rate. During the uplink transmission in massive MIMO C-RAN systems, each RRH forwards the pilot signals of its served users to its serving BBU via the fronthaul links. Based on the received pilot signals, the BBU estimates the channel state information (CSI) and can infer the large-scale fading between each user and its serving RRH. Moreover, the BBUs are all connected via the X2 interface, which allows them to share and exchange information concerning PC in the network. Thus, the centralized BBUs are able to compute the users' average uplink achievable rate according to (10), and to efficiently implement our proposed algorithms in a practical scenario.
Furthermore, to reduce the signaling overhead, our algorithms can be executed on two time scales: the long-time scale, which corresponds to each hour of the day, and the short-time scale, which corresponds to the user arrivals. We assume that the users are uniformly distributed in the network. Given a random RRH clustering, each user is assigned a sub-optimal pilot by the Greedy or the ϵ-Greedy heuristic solution described in Section VI, during the short-time scale. As each arriving user is assigned a pilot that can reduce the effect of PC, its uplink achievable rate can be enhanced with high probability. In the worst-case scenario, where the average uplink achievable rate is still limited by PC, the clustering stage is executed during the long-time scale, which improves the average uplink achievable rate given by (11). Together, these two time-scale allocations aim to maximize the average uplink achievable rate by reducing the effect of PC.
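The two-time-scale operation can be pictured with the following Python sketch. It assumes that the clustering stage is triggered at the start of every hour and that one pilot allocation is performed per user arrival; `recluster` and `allocate_pilot` are hypothetical callables standing in for the coalition game and for the Greedy or ϵ-Greedy heuristic, and the event log exists only to make the ordering visible.

```python
def run_two_time_scales(arrivals_by_hour, recluster, allocate_pilot):
    """Interleave the long-time scale (clustering) and short-time scale (UPA).

    arrivals_by_hour: dict mapping an hour to the list of users arriving in it.
    recluster: hypothetical callable, hour -> clustering (long-time scale).
    allocate_pilot: hypothetical callable, (clustering, user) -> pilot
                    (short-time scale, one call per arrival).
    """
    events = []
    for hour in sorted(arrivals_by_hour):
        clusters = recluster(hour)               # once per hour
        events.append(("cluster", hour))
        for user in arrivals_by_hour[hour]:      # once per user arrival
            pilot = allocate_pilot(clusters, user)
            events.append(("pilot", hour, user, pilot))
    return events


if __name__ == "__main__":
    log = run_two_time_scales(
        {4: ["a"], 5: ["b", "c"]},
        recluster=lambda hour: {"hour": hour},
        allocate_pilot=lambda clusters, user: 0,
    )
    for event in log:
        print(event)
```

Keeping the expensive clustering step on the hourly loop and only the lightweight pilot choice on the per-arrival loop is what limits the signaling overhead in this arrangement.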

VIII. SIMULATION RESULTS
To show the effectiveness of our solutions in mitigating the effect of PC, we compare them with the traditional approach, where the pilots are randomly allocated to users, and with the Adaptive Pilot Allocation (APA) scheme [18], where unique pilots are assigned to the users with the smallest large-scale fading so that they do not interfere with other users in the system.
The simulation results were obtained using Matlab on a machine with an Intel Core i5 2.5 GHz processor and 8 GB of RAM. For illustration, a hexagonal cellular topology with 7 cells is considered, where each cell has one RRH equipped with M antennas, as presented in Fig. 1, and serves U users that are randomly and uniformly distributed inside the cell. The center RRH is surrounded by a ring of 6 immediately adjacent RRHs. All results are averaged over 1000 simulations and shown with 95% confidence intervals. Note that each cluster, formed by one or many RRHs managed by one BBU, can simultaneously serve up to 30 users (recall that the number of users in one cluster must be less than or equal to the number of orthogonal pilots). The simulation parameters adopted are summarized in Table 2.

The number of activated BBUs given by the two baseline solutions (random allocation and APA) depends exclusively on the number of active RRHs. More specifically, this number is equal to the number of serving RRHs. However, for the same number of users, our heuristic CG solution reduces the number of active BBUs, mainly at low traffic load (at 4:00), in comparison with the two other solutions. By reducing the number of clusters (number of active BBUs), the reuse of non-orthogonal pilots between different clusters decreases. This improves both the average achievable SINR and the uplink rate (cf. Figs. 5 and 7) by decreasing the impact of pilot contamination (PC).

According to (10), the SINR achieved by a user depends on the interference caused by PC. When the APA solution is applied for pilot allocation, the achievable SINR exceeds that provided by the traditional solution based on random allocation for most of the day. By allocating orthogonal pilot sequences to the users with the worst SINR, the APA solution eliminates any PC that can affect these users, which improves the total average uplink achievable rate in the network.
However, when the number of users increases (at 10:00, 17:00 and 19:00), the remaining orthogonal pilot sequences given by the APA scheme are shared by the rest of the users, which magnifies the effect of PC. This leads to a lower SINR than that of the random allocation.
Besides, when our heuristic CG solution is used for the user pilot allocation (UPA), it outperforms the random pilot allocation solution in every case. In fact, this solution effectively reduces the number of clusters (i.e., the number of active BBUs, cf. Fig. 4) in comparison with the random pilot allocation solution. This significantly enhances the achievable SINR by reducing PC between clusters. Moreover, when the Greedy and the ϵ-Greedy algorithms are applied to our CG solution, a significant gain can be achieved in terms of SINR if the number of available pilot sequences is greater than the number of users in the cluster.

Note that, at 4:00, all users are located in one cluster (i.e., one BBU is activated), and their number (30 users) is exactly equal to the number of orthogonal pilots $τ_p$ that can be allocated to one cluster. Thus, the Greedy and the ϵ-Greedy algorithms cannot provide any enhancement to our heuristic CG solution. This is due to the lack of free pilots, which prevents users from exploring a new UPA to improve the achievable SINR. In contrast, at 5:00, there are almost 40 users located in 3 clusters (i.e., 3 BBUs are activated, cf. Fig. 4), with an average of 13 users per cluster, as shown in Fig. 6. This means that the number of available pilots in each cluster (30 pilots) is much higher than the number of active users in each cluster. Therefore, each user has the opportunity to explore different UPAs to improve its SINR. As a result, the Greedy and the ϵ-Greedy algorithms can bring a significant enhancement in terms of SINR to our CG solution. Also, whenever there is a large number of free pilots (i.e., pilots not assigned to any user in a cluster) for a small number of users in a cluster (at 5:00), the ϵ-Greedy algorithm provides better performance than the Greedy algorithm.
We conclude that our heuristic coalition game (CG) solutions for user pilot allocation (UPA) achieve a significant improvement in SINR in comparison with the random and Adaptive Pilot Allocation (APA) solutions. This is due to the efficient mitigation of pilot contamination (PC). Besides, when the number of free pilots is greater than the number of users in a cluster, the Greedy and the ϵ-Greedy algorithms can improve the performance of our CG solution. Furthermore, as the number of free pilots increases in a cluster, the ϵ-Greedy algorithm performs better than the Greedy algorithm.

Fig. 7 illustrates the users' average uplink achievable rate during the day. When APA is adopted for UPA, it achieves a higher uplink rate than the random pilot allocation solution, mainly at low load (i.e., when the number of users is below 50). In fact, when APA is applied for UPA, PC on the users with the lowest large-scale fading can be avoided (i.e., they are assigned unique pilot sequences). As a result, the uplink rate of these users is enhanced, which raises the users' average uplink achievable rate in the network. However, when the number of users in the network is high (at 10:00, 17:00, and 19:00), the APA solution can no longer curtail the impact of PC. This is due to the limited number of free pilot sequences shared by users, which leads to a lower average uplink rate in comparison with the random pilot allocation solution.

Moreover, when our CG approach is adopted, it achieves a higher SINR (cf. Fig. 5) than the random pilot allocation solution. As a consequence, the users' average uplink rate given by our approach surpasses that of the random pilot allocation solution. Furthermore, when the Greedy and ϵ-Greedy algorithms are applied to our CG solution, they reduce the effect of PC, especially when the number of available pilots is much higher than the number of users in a cluster. As a result, they can further enhance the average uplink rate achieved by our CG solution.
To illustrate the significant gain that can be achieved by our approaches for the UPA solution, we present in Fig. 8 and 9 the cumulative SINR and the cumulative uplink rate, respectively. We can see that when our CG solution is used for UPA, the SINR can increase by up to 15% and 24% in comparison with the APA solution and the random pilot allocation solution, respectively, at the end of the day. As a matter of fact, our CG solution adapts the number of clusters formed to the network load conditions, which decreases the reuse of non-orthogonal pilots between different clusters. Consequently, this reduces the impact of PC, providing a higher achievable SINR compared to the two other solutions (the APA solution and the random pilot allocation solution). Moreover, by adopting the Greedy or the ϵ-Greedy algorithm, the SINR achieved by our heuristic CG solution can be improved by 40% and 54%, respectively. As explained in Section VI, the two algorithms reassign a user to a free pilot of its cluster whenever this improves its uplink achievable rate. This reduces the pilot contamination effect between clusters, enhancing the total achievable SINR at the end of the day. Furthermore, by increasing the achievable SINR, our CG solution can achieve up to 10% and 18% higher uplink rate in comparison with the APA solution and the random allocation solution, respectively. In addition, when the Greedy or the ϵ-Greedy algorithm is applied to our CG solution, the achievable uplink rate can be enhanced by 15% and 23%, respectively, at the end of the day.

[Figure: Cumulative average users uplink achievable rate. x-axis: Daytime (Hours).]

In Fig. 10, we show the mean execution time for the Greedy solution, the ϵ-Greedy solution, and our heuristic coalition game solution for UPA (CG-Greedy UPA). We can see that when the CG-Greedy solution is used for UPA, it takes an average of 6.5 ms. Note that this sequential approach is executed once at each hour of the day (i.e., at the long-time scale). Meanwhile, our two heuristic solutions for UPA (the Greedy and ϵ-Greedy solutions), which are employed at each user arrival, take on average 0.4 ms and 0.6 ms, respectively. The results show that our approach has the potential to be applied to real-time traffic that requires low latency, on two different time scales: the long-time and short-time scales.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2022.3177629, IEEE Access.

IX. CONCLUSION
In this paper, we have investigated the pilot contamination (PC) problem for time division duplex massive MIMO systems in the Cloud-Radio Access Network (C-RAN) architecture. The aim is to maximize the average uplink achievable rate by reducing the effect of pilot contamination in massive MIMO C-RAN. For a realistic simulation scenario, we utilized the Call Detail Records (CDR) provided by a real network operator. As this is an integer non-linear programming problem that has no solution in polynomial time, we developed a two-stage solution to solve it. First, a coalition game (CG) solution is proposed that aims to reduce the effect of PC by finding the best clustering of RRHs. In this phase, the pilots are randomly allocated to users in each cluster. Then, two greedy heuristics, namely the Greedy and ϵ-Greedy algorithms, are applied to match each user in the cluster with a given pilot. The goal is to further improve the total average uplink rate achieved in the first stage. To show the effectiveness of our heuristic solutions for pilot contamination reduction, we compared them with the traditional solution, where the pilots are randomly allocated to users, as well as with a state-of-the-art pilot allocation scheme based on large-scale fading.