Energy-Efficient User Clustering and Downlink Beamforming for MIMO-SCMA in C-RAN

Multiple-inputmultiple-output (MIMO) sparse code multiple access (SCMA) is of great interest for future wireless networks to achieve higher spectral efficiency and support massive connectivity. In this paper, we investigate the key problems of user clustering and downlink beamforming for MIMO-SCMA in a cloud radio access network (C-RAN). Using channel state information available at the central processor, an efficient user clustering algorithm based on the constrained $K$ -means method is proposed. Subsequently, two iterative algorithms for beamforming design are developed by minimizing the total transmission power under quality-of-service (QoS) and fronthaul capacity constraints. In the first approach, we approximate the continuous non-convex constraints by convex conic ones using first-order Taylor expansion and iteratively solve a sequence of mixed-integer second order cone programs (MI-SOCPs) to achieve high quality solution, but with higher complexity. In the second approach, a two-stage low-complexity solution is developed in which beamforming matrices obtained from each stage are combined to form a single beamformer for each user. In the first stage, cluster beamformers are designed by taking advantage of block diagonalization, while in the second stage, user-specific beamformers are determined by minimizing transmission power. The performance of the proposed user clustering and downlink beamforming approaches for MIMO-SCMA in C-RAN is validated through simulations over mmWave channels. Compared to benchmark approaches, the results show significant improvements in terms of transmit power and spectral efficiency.


I. INTRODUCTION
In mobile wireless networks, multiple access technologies are of crucial importance to meet performance requirements in terms of data throughput, network capacity, device connectivity and energy consumption. Recently, the application of non-orthogonal multiple access (NOMA) techniques to fifth generation (5G) and beyond 5G (B5G) wireless networks has received considerable attention. In effect, NOMA allows multiple users to access overlapping time and frequency resource elements in the same spatial layer [1]. Hence, this technology has the potential to provide higher spectral efficiency and meet the massive connectivity demand needed for machine-to-machine (M2M) communications and internet of things (IoT) in future wireless networks [2].
The associate editor coordinating the review of this manuscript and approving it for publication was Jiayi Zhang . NOMA techniques can be classified into three main categories, namely: code domain, power domain and multiple domain [3]. In code domain NOMA, different codes are applied to modulate the data streams of the users over multiple resource elements in a sparse manner. Hence, the processed data of different users can be multiplexed over the same resource elements, wherein the induced sparsity allows the control of interference. Power domain NOMA relies on the use of superposition coding strategies, wherein user signals are simultaneously broadcast with different power levels at the transmitter, while successive interference cancellation (SIC) techniques are employed to separate them at the receiver. In multiple domain NOMA, such as pattern division multiple access (PDMA) and lattice partition multiple access (LPMA), multiple user signals are superimposed in multiple domains, including power, code, and spatial domains. VOLUME 9, 2021 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ Sparse code multiple access (SCMA) is a code domain NOMA scheme inspired from the well-known code division multiple access (CDMA) technique. While CDMA extends each information symbol (taken, e.g., from a quadrature amplitude modulation (QAM) constellation) into a finite sequence of complex symbols by using orthogonal or near orthogonal spreading codes, SCMA directly maps each group of bits into a sequence of complex symbols by merging together the symbol mapper and the CDMA spreader. The overall process can be interpreted as a coding procedure from the binary domain to a multidimensional complex domain, which in turn raises new problems in terms of codebook design [4].
As an emerging network architecture for 5G and B5G, cloud radio access network (C-RAN) offers several benefits, e.g., improved energy efficiency, ability to handle interference on a larger scale and increased network capacity [5]. The C-RAN architecture consists of three main components, namely: the central processor, the remote radio heads (RRH) and the fronthaul links. The central processor, which is located in one or more data centers within the cloud, is responsible for all the baseband processing. The RRHs connect wireless devices to the network, alike base stations in current cellular networks. The fronthaul link provides connectivity (e.g., via dedicated optical fiber or microwave links) between the central processor and the RRHs. The C-RAN architecture concentrates the baseband processing in the central processor and coordinates the operation of the RRHs. This separation of the central processor and RRHs functionalities reduces the power consumption and complexity of the RRHs, since the latter only need to perform basic transceiver operations.

A. RELATED WORKS
There have been extensive studies devoted to the design of multidimensional constellations for downlink and uplink SCMA systems. In [6], the performance of a systematic sub-optimal design for the mother constellations (from which the individual user codebooks are derived) is investigated and a unified metric is proposed to obtain the optimum codebooks using a specific mother constellation. The authors in [7] evaluate the average bit error rate (BER) performance of SCMA systems in which codebooks are based on star-QAM signaling constellations. Multidimensional constellations with a low number of projections are designed in [8] based on the extrinsic information transfer (EXIT) chart using a multistage optimization. Subsequently, an appropriate labeling method based on the EXIT chart is optimized for the resulting constellation. In [9], the design of SCMA codebooks based on star-QAM constellations is addressed and an analytical approach to obtain the theoretical BER performance over Rayleigh fading channels is proposed. The design of an efficient suboptimal SCMA codebook is proposed in [10] for a large scale scenario with growing number of resources and users.
The use of multiple antennas along with multiple-input multiple-output (MIMO) techniques can lead to significant performance improvements in terms of user capacity, spectral efficiency, and peak data rates, by taking advantage of spatial diversity, multiplexing or beamforming gains. In [11], a joint sparse graph is constructed for a MIMO-SCMA system model, and the corresponding virtual SCMA codebooks are designed for the detector, wherein the message passing algorithm (MPA) is employed to reconstruct the transmitted data bits. In [12], a joint decoding algorithm is proposed for MIMO-SCMA systems based on space frequency block codes (SFBC), which exhibits lower computational complexity than MPA and yet achieves a similar block error rate (BLER). A novel downlink MIMO mixed-SCMA scheme is proposed in [13], such that the transmitted codewords for each user over different antennas come from different codebooks. The authors in [14] propose near-optimal low-complexity iterative receivers based on factor graph for a downlink MIMO-SCMA system over frequency selective fading channels.
Recently, the C-RAN architecture has aroused great interest for the implementation of MIMO-NOMA transmission schemes. In [15], a novel framework for C-RAN is proposed in which two users are scheduled over the same resources according to power domain NOMA, while the performance of cell-edge users is further enhanced by means of coordinated beamforming. Stochastic geometry is used to analyze the outage probability of NOMA under C-RAN in [16], where power domain multipexing along with SIC are employed to increase downlink system capacity. The application of beamforming along with power domain NOMA is investigated for cache-enabled C-RAN in [17]. The design of robust radio resource allocation and beamforming approaches for MIMO-SCMA systems under C-RAN is studied in [18], where the aim is to maximize the total sum rate of users subject to a minimum required rate for each slice.

B. MOTIVATIONS AND CONTRIBUTIONS
MIMO-SCMA combines MIMO techniques, which increase capacity by transmitting different signals over multiple antennas, and SCMA which improves spectral efficiency and device connectivity by transmitting multiple user signals over the same radio resources. As seen in works related to power domain NOMA [19], [20], the joint application of spatial user clustering along with beamforming techniques in MIMO-SCMA systems has the potential to improve spectral efficiency and reduce the total transmit power. Additionally, when considered within a C-RAN architecture, this approach makes it possible to increase the number of supported users in the network by using a common codebook for users in different clusters, while the effect of inter-cluster interference can be eliminated by centralized beamformer design and coordinated RRH operation. In spite of its importance, the joint problem of user clustering and beamforming has not received considerable attention in the literature on MIMO-SCMA, let alone C-RAN. Motivated by the above considerations, we propose energy-efficient user clustering and downlink beamforming approaches for MIMO-SCMA in C-RAN. Our main contributions in addressing the above challenges are summarized as follows: 1) We approach the user clustering problem by modifying the widely-used K -means method from the field of machine learning, in order to limit the number of users in each cluster. Specifically, the proposed constrained K -means algorithm uses the Euclidian metric to characterize the similarities between the user channel vectors and the cluster centers, and seeks to group users with channel vectors exhibiting large correlation. The elbow method is utilized to find the optimum number of clusters for the network. 2) We formulate the beamforming design and RRH selection as a non-convex mixed-integer nonlinear programing (MINLP) optimization problem, aiming to minimize the total transmit power while satisfying the signal-to-interference-plus-noise ratio (SINR) and fronthaul capacity constraints. We then propose transformations to reformulate the problem as a difference of convex functions (DC) program and derive two algorithms for solving the problem. In the first algorithm, we iteratively approximate the continuous non-convex constraints by convex ones using first-order Taylor expansion and solve a sequence of mixed-integer second-order cone programing (MI-SOCP) using dedicated solvers. This algorithm entails high computational complexity, yet it can achieve high quality solution.
3) The second algorithm is based on a two-stage low-complexity beamforming approach wherein the beamforming matrices obtained from each stage are multiplied to form the final beamformer. In the first stage, specifically, a block diagonalization (BD) technique is adopted to design the cluster beamformers (one for each cluster), which remove the inter-cluster interference and thus enhance the quality-of-service (QoS) for intra-cluster users. In the second stage, the user-specific beamformers are designed along with RRH selection by employing a smoothed 0 -norm approximation. The resulting optimization problem is solved via the convex-concave procedure (CCCP) with guaranteed convergence [21]. 4) We evaluate the performance of the proposed algorithms for user clustering and downlink beamforming using in-depth simulations of MIMO-SCMA in C-RAN with mmWave channel models and different parameter configurations. The results illustrate the convergence behavior of the new algorithms and the effect of various parameters on the system performance, while providing useful insights into the advantages of the proposed approaches over competing ones from the literature.

C. ORGANIZATION
The rest of the paper is organized as follows: Section II introduces the MIMO-SCMA system model under C-RAN and describes the problem under consideration. The proposed constrained K -means algorithm for user clustering is introduced in Section III. The two-stage energy-efficient beamforming approach for eliminating inter-cluster interference and minimizing total transmit power is developed In Section IV. The results of our simulation experiments are presented in Section V, followed by the conclusion in Section VI. Notations: Scalars, vectors and matrices are respectively denoted by lower case, boldface lower case and boldface upper case letters. For a matrix A, [A] i,j denotes its (i, j)th entry, while A T and A H denote its transpose and conjugate transpose, respectively. The operators . 2 and . 0 denote the Euclidean and zero norms of a vector, respectively. For a set A, |A| denotes its cardinality. C m×n (R m×n ) denotes the space of m × n complex (real) matrices. B m×n denotes binary matrices of size m×n where the set B = {0, 1}. We use CN (µ, σ 2 ) to denote a complex circular Gaussian random variable with mean µ and variance σ 2 .

II. SYSTEM MODEL AND PROBLEM DESCRIPTION
We consider downlink transmission in a MIMO-SCMA system under C-RAN, as illustrated in Figure 1. The system consists of L RRHs, each equipped with M antennas, and J single-antenna users. The RRHs indexed by l ∈ L {1, . . . , L}, are connected to the central processor via limited-capacity fronthaul links. Due to the fronthaul constraint, each user is cooperatively served by a specific subset of RRHs through joint beamforming. Moreover, the users are partitioned into K non-overlapping clusters, indexed by k ∈ K {1, . . . , K } with the kth cluster comprising J k users VOLUME 9, 2021 such that J = K k=1 J k . Below, we provide further details on the SCMA encoder, mmWave channel, received signal model, and problem description. For convenience, we list the key notations of this paper in Table 1.

A. SCMA ENCODER
In SCMA, contiguous groups of data bits from each user are directly mapped to sparse N -dimensional codewords selected from a predefined codebook and then transmitted over N radio resources, e.g., orthogonal frequency division multiple access (OFDMA) subcarriers. The SCMA encoder for the ith user can be defined as f i : B U → X i which is a one-to-one mapping from the set of U -bit tuples to a codebook X i ⊂ C N of N -dimensional codewords, with cardinality |X i | = 2 U . Specifically, for b = [b 1 , . . . , b U ] ∈ B U , the corresponding codeword is obtained as, where x is a sparse vector with C < N non-zero elements. Each user is assigned C subcarriers such that no two users occupy the same set of subcarriers. Hence, only q users can be supported by SCMA, as given by [22], In this work, we group users into K clusters of size J k ≤ q and remove inter-cluster interference so that the users in different clusters can use common codebooks. Referring to (1), we can associate to each codeword x a vector y containing its C non-zero elements in the same order, i.e., y is obtained from x by removing its zero elements.
For convenience, we represent this operation by the function φ : C N → C C , so that y = φ(x) = [y(1), . . . , y(C)]. Through this operation, the original codebook X i ⊂ C N is transformed into a constellation of C-dimensional codewords, i.e., → Y i denote the composite mapping of f i and φ, so that for any b ∈ B U , and x = f i (b), we have, From this perspective, the SCMA encoder can be redefined as f i (b) = S i g i (b), where matrix S i ∈ B N ×C maps a C-dimensional constellation point to an N -dimensional codeword. Note that S i contains N − C all-zero rows and hence, all the codewords in codebook X i contain 0 in the same N − C positions. Moreover, an identity matrix of order C is obtained by removing the all-zero rows from S i .
The set of resources occupied by user i is determined by the positions (or indices) of the non-zero elements of the binary indicator vector In effect, the complete SCMA encoder structure for q users and N subcarriers can be represented by a factor graph, with associated matrix F = [f 1 , . . . , f q ] ∈ B N ×q . In this interpretation, subcarrier node n and user node i are connected if and only if the corresponding element of matrix F is equal to 1, i.e., [F] n,i = 1.

B. CHANNEL MODEL
Due to the propagation characteristics at such high frequencies, the application of MIMO-SCMA communication in the mmWave band is more challenging than in a conventional low-frequency scenario. The mmWave-based channel vector h (l) jk (n) ∈ C 1×M from the lth RRH to the jth user in the kth cluster over the nth subcarrier can be expressed as the discrete sum of a line-of-sight (LOS) and P non line-of-sight (NLOS) components [23], [24], i.e., h (l) where: p is the path index, with p = 0 corresponding to LOS and p ≥ 1 to NLOS paths; d (l) jk is the distance between the RRH and the user; α (p) is the path loss exponent; a (lp) jk (n) denotes the complex gain for the pth path which follows a complex circular Gaussian distribution, i.e., a (lp) jk (n) ∼ CN (0, 1); and a(θ (lp) jk ) ∈ C 1×M is the antenna array steering vector. In the case of a uniform linear antenna array, the steering vector is given by, where θ (lp) jk is the normalized direction of the pth path. The latter can be expressed as, where φ (lp) jk ∈ [0, 2π] is the angle of departure (AoD) of the pth path, d is the inter-antenna element spacing, and λ is the wavelength at the operating frequency.
In MIMO systems operating at mmWave frequencies, a single-path model is often adopted for the channel vectors by retaining only one dominant path in (4) [25]. In most cases, the latter will be the LOS path, whose gain can be as much as 20dB stronger than that of NLOS paths [26]. However, when there is no LOS path due to blockage, the dominant NLOS path can be considered instead. Hence, the mmWave channel model can be simplified to, where, for simplicity of notation, the superscript p for the path index has been removed.

C. SIGNAL MODEL
Let x jk (n) ∈ C denote the codeword element intended for the jth user in the kth cluster over the nth subcarrier. Due to the sparsity of the SCMA encoder, x jk (n) can be either 0, or a non-zero element with normalized power, i.e., E{|x jk (n)| 2 } = 1. Codeword element x jk (n) is transmitted from the M antennas of the lth RRH by employing the beamforming vector w (l) jk (n) ∈ C M ×1 . Hence, the transmit signal of the lth RRH over the nth subcarrier can be expressed as, where U n,k denotes the set of users in the kth cluster occupying the nth subcarrier. Owing to the limited-capacity fronthaul link, only a selected group of RRHs serve a specific user cooperatively. The process of RRH selection for transmission can be performed through beamforming. That is, w (l) jk (n) 2 = 0 implies that the lth RRH does not participate in the transmission for that user over its assigned subcarrier. Hence, the corresponding network-wide beamforming vector, jk (n)] ∈ C 1×LM denote the network-wide channel vector for the jth user in the kth cluster and z(n) = [z (1) ∈ C LM ×1 denote the network-wide transmit signal over the nth subcarrier. The received signal at the jth user in the kth cluster over the nth subcarrier is given by, r jk (n) = h jk (n)z(n) + n jk (9) where n jk ∼ CN (0, σ 2 jk ) is an additive noise term. We can express the received signal of this user as a sum of the desired signal, the interference from the other users in that cluster (intra-cluster interference), the inter-cluster interference and the noise, i.e., The SINR of the jth user in the kth cluster over the nth subcarrier with non-zero codeword element is given by, where the first term in the denominator represents the intra-cluster interference and the second term represents the inter-cluster interference, i.e., The total transmit power for the whole network over N subcarriers is given by, Upon substitution of (8) into (14) and assuming that the transmitted codewords x jk (n) from different sources are uncorrelated and have zero mean and unit variance, we can write the total transmit power as, where the last equality follows from the definition of the network-wide beamforming vector.

D. PROBLEM DESCRIPTION
In this work, our objective is to group users into nonoverlapping clusters and design beamformers such that the total transmit power is minimized while constraining the inter-cluster interference, the user SINRs and the fronthaul capacity. Indeed, removing inter-cluster interference not only enhances the SINR at the user terminal, but also allows the transmitter to use a common SCMA codebook to serve users in different clusters, which in turn boosts network capacity. To further satisfy the requirements imposed by the limited-capacity fronthaul links of C-RAN, dynamic RRH selection is taken into consideration in our formulation. In order to address the above challenges and obtain the desire solution, we conceive efficient algorithms for user clustering and beamforming design with low complexity.
Specifically, we propose an efficient user clustering algorithm based on the constrained K -means method in Section III. Then, the beamformer design is addressed in Section IV by means of a two-stage energy-efficient approach wherein the inter-cluster interference is removed using a BD technique in the first stage and the total transmit power is optimized under SINR and fronthaul capacity constraints in the second stage.

III. USER CLUSTERING
In this section, we first introduce the proposed constrained K -means algorithm for user clustering. We then apply the elbow method to determine the number of clusters. Finally, we evaluate the computational complexity of the proposed algorithm.

A. CONSTRAINED K -MEANS CLUSTERING
K -means is a celebrated method for grouping inharmonious multi-dimensional data points into K clusters such that a similarity criterion within clusters is maximized [27], [28]. In effect, K -means attempts to group J data points (or vectors) . . , d J } into K clusters by finding cluster centers {c 1 , c 2 , . . . , c K } such that similarities between the points in the same group are high while similarities between the points in different groups are low. Two key factors in the K -means method are the number of clusters K , which is predetermined, and the similarity metric [29].
In the current MIMO-SCMA application, high correlation between the channel vectors of the users in a cluster can provide a better beamforming performance. Indeed, if users in a cluster have highly correlated channels, more degrees of freedom will be left for the inter-cluster interference cancellation (as explained in Section IV). In this work, we utilize the Euclidian distance as a similarity metric to measure the correlation between a user's channel vector and the cluster centers. Moreover, to account for variations of channel gains due to fading and other propagation effects, the channel vectors are normalized, averaged over subcarriers, and treated as the data points in the application of the K -means method, i.e., where h j (n) ∈ C 1×LM for j ∈ J {1, . . . , J } are the known network-wide channel vectors of all users prior to clustering.
The K -means method can be presented as an optimization problem for finding the K best centers such that the sum of squared Euclidean (SSE) distance between the data points and their nearest cluster centers is minimized. Specifically, this optimization problem can be expressed as follows, where C {c k |k ∈ K}.
Proposition 1: Given d j and c k ∈ C 1×LM , we have, Proof: The result follows directly from the linear programming duality theory [30].
By introducing selection variables ι {ι j,k |j ∈ J , k ∈ K} and using Proposition 1, we can reformulate problem (15) as the following problem, where ι j,k = 1 if the jth data point is closest to the kth cluster center, i.e., belongs to the kth cluster, and ι j,k = 0 otherwise. While the K -means method does not involve a priori constraint on the number of users in each cluster [31], the SCMA encoder in the current application can support at most q users over N subcarriers. To avoid solutions with more than q data points in a cluster, we propose adding explicit constraints to problem (19) so that each cluster contains at most q data points, i.e., The constrained K -means algorithm solves problem (20) iteratively by uncoupling cluster center and selection variables. Specifically, in each iteration, this algorithm alternates between solving a linear program for variable ι with fixed c and solving a problem for c with fixed ι. The overall constrained K -means algorithm for solving problem (20) is summarized in Algorithm 1, where the superscript t denotes the iteration index.
Proposition 2: There exists an optimal solution for the cluster assignment subproblem in Algorithm 1 such that ι j,k ∈ {0, 1}.
Proof: See Appendix A. According to Proposition 2 and Appendix A, we can use the network simplex algorithm which is faster than mixed integer solvers for tackling the cluster assignment subproblem.
Proposition 3: The constrained K -means algorithm terminates in a finite number of iterations at a cluster assignment that is locally optimal. That is, the limit point of the iterates K } by selecting K data points from the dataset randomly. Set t = 0.

Repeat:
1) Cluster assignment: Solve the following linear program with fixed c (t) .
2) Cluster update: Update the cluster centers as, generated by the constrained K -means algorithm is a stationary point that satisfies the Karush-Kuhn-Tucker (KKT) conditions for problem (20). Proof: At each iteration, the cluster assignment step cannot increase the objective function of (20). The cluster update step will either strictly decrease the value of the objective function of (20) or the algorithm will terminate since, is a strictly convex optimization problem with a unique global solution (as shown in the cluster update step in Algorithm 1). Thus, the objective of (20) is strictly non-increasing and bounded below by zero. Moreover, there are a finite number of ways to assign J points to K clusters such that each cluster has at most q points and Algorithm 1 does not permit repeated assignments. Consequently, the algorithm must terminate at some cluster assignment that is locally optimal.

B. NUMBER OF CLUSTERS
The choice of the number of clusters K plays a key role in the performance of K -means clustering [32]. An appropriate number of clusters can accurately reflect specific distribution characteristics of users in the network. While the number of clusters cannot exceed the number of users, it should also satisfy the constraint on the maximum number of users in each cluster. However, finding the optimal K is a major challenge in clustering analysis, and there is no definitive solution.
To address this problem, a number of approaches have been proposed such as the elbow [33], silhouette [34], and gap statistic [35] methods. Among these, the elbow method is possibly the most well-known and utilized as it entails the lowest computational complexity while providing very good performance.
Herein, we employ the elbow method to determine the number of clusters. The elbow method is a heuristic method which involves running the clustering algorithm on the dataset and evaluating a clustering criterion for different values of K . The plot of the clustering criterion versus the number of clusters resembles an arm in which the elbow point (the point of discontinuity in the slope of the curve) determines the appropriate number of clusters for the dataset. The sum of the normalized within-cluster SSE distance is a common clustering criterion for applying the elbow method along with K -means.
In a given cluster C k , the within-cluster SSE distance between the data points is given by, Hence, the sum of the normalized within-cluster SSE distances can be expressed as, where |C k | shows the cardinality of the cluster C k . It should be noted that although the sum of the normalized within-cluster SSE distance can give a proper measure of the compactness of the clustering, we may encounter cases with more than one elbow point or no elbow point. In such cases, other reliable methods mentioned before can be used to find the best K .

C. COMPLEXITY ANALYSIS
In this subsection, we analyze the computational complexity of the proposed constrained K -means algorithm by considering the number of required operations (e.g. complex addition and multiplication) in each step and in each iteration of the algorithm. Specifically, we divide the operations for each iteration into three steps: • Calculation of Euclidean distances: The complexity of calculating the Euclidean distance between the data points and the cluster centers is O(JKLM ).
• Cluster assignment: The complexity of solving cluster assignment subproblem via network simplex algorithm is O(J 3 K 2 (log(J )) 2 ) (See Appendix A).
• Cluster update: The complexity of updating the cluster centers is O(JKLM ). Assuming that the algorithm converges after T K iterations. The overall complexity of Algorithm 1 can be expressed as,

IV. DOWNLINK BEAMFORMING
In this section, we first formulate the beamforming design as a non-convex mixed-integer nonlinear programing (MINLP) optimization problem, aiming to minimize the total transmit VOLUME 9, 2021 power while satisfying the QoS and fronthaul capacity constraints. We then propose transformations and convex approximation techniques to derive two iterative algorithms for solving the problem. In the first algorithm, we approximate the continuous non-convex constraints by convex ones using first-order Taylor expansion. Hence, we are able to arrive at a sequence of mixed-integer second-order cone programing (MI-SOCP), for which dedicated solvers are available.
Although the MI-SOCP algorithm entails high computational complexity, it is shown that it can achieve high quality solution [36]. Hence, in this paper, we use MI-SOCP algorithm as a benchmark. A simplified suboptimal approach is also proposed which designs the beamformers in two stages to achieve lower complexity. In the first stage, the cluster beamformers are determined by taking advantage of BD to remove intercluster interference. In the second stage, we obtain the user-specific beamformers with the aid of CCCP method to minimize the total transmit power. Finally, the convergence and the computational complexity of the proposed algorithms are discussed.

A. BEAMFORMING PROBLEM
Our objective is to optimize the total transmit power through joint design of the dynamic RRH selection scheme and the beamforming vectors subject to the QoS and fronthaul capacity constraints on each individual RRH. Let the binary variable s where R jk (n) = log 2 (1 + γ jk (n)) denotes the transmission rate, N jk shows the set of subcarriers occupied by the jth user in the kth cluster, γ min , C max , and P max are the minimum required SINR for the user over the subcarrier, the maximum capacity constraint for each RRH over the subcarrier, the maximum available total transmit power, respectively. Constraints (25b) and (25c) guarantee QoS by removing the inter-cluster interference and satisfying SINR requirements, respectively. The constraint (25d) shows that the sum-rate of the users served by the lth RRH over the nth subcarrier should be smaller than the maximum fronthaul capacity C max . Constraint (25e) utilizes the so-called Big M method which indicates that the beamformer w l jk (n) 2 = 0 if the lth RRH does not participate in transmission for the jth user in the kth cluster over the nth subcarrier, i.e., s (l) j,k (n) = 0, but leaves the beamformer ''open'' otherwise. Therefore, P max can be any large number. Constraint (25f) guarantees that each user is served by at least one RRH. Although constraint (25f) appears to be redundant, it is added to reduce the size of the feasible set of the associated problem which in turn improves the convergence time of the solver. We refer the interested reader to [37] for additional details.
Problem (25) is a non-convex MINLP problem, which can be considered as an NP-hard problem in general and is one of the most challenging class of mathematical optimization problems [38]. Obtaining its optimal solution is challenging due to the non-convexity of the SINR constraints, the combinatorial nature of the RRH selection variable s (l) j,k (n), and the coupling between the variables s (l) j,k (n) and R jk (n) in the fronthaul constraint. Even when the RRH selection scheme s (l) j,k (n) is given, problem (23) is still non-convex and computationally difficult. In the following subsections, we develop two beamforming approaches to find a suboptimal solution.

B. MI-SOCP BEAMFORMING APPROACH
In this section, we first reformulate the problem (25) into a more tractable form. We then solve the resulting optimization problem via a CCCP-based algorithm with guaranteed convergence to a local stationary solution of the transformed problem.
Without loss of optimality, SINR constraint (25c) can be rewritten as the following second-order cone (SOC) constraint, where I (1) jk (n) and I (2) jk (n) are the intra-and inter-cluster interference as expressed in (12) and (13) respectively. We have restricted h jk (n)w jk (n) to be positive real, which incurs no loss of optimality since we can always phase-rotate the vector w jk (n) such that h jk (n)w jk (n) is positive real without affecting the cost function or the constraints.
Let us introduce the auxiliary variables u j,k (n) and v j,k (n) as the upper bounds on the SINR and transmission rate for the jth user in the kth cluster over the nth subcarrier. Hence, constraint (25d) can be rewritten as follows, Since the expression of γ jk (n) is in fractional form, the constraint in (27) is difficult to handle. Therefore, we introduce the auxiliary variables l j,k (n) as the lower bound of the denominator, and then equivalently transform (27) as the following two constraints, From the above discussion, we can finally reformulate the problem (25) into an equivalent problem as given below, where the identity 4xy = (x + y) 2 − (x − y) 2 is used to obtain (32f). We note that even by continuous relaxation of binary variables s (l) j,k (n), optimization problem (32) is still non-convex due to constraints (32d)-(32f). However, the latter can be expressed as differences of two convex functions. Thus, the obtained optimization problem can be efficiently solved using the iterative CCCP.
Basically, CCCP iteratively solves a sequence of convex subproblems, each of which is constructed by linearizing the concave part of the DC constraints using their first-order Taylor expansions [21]. Specifically, the first-order Taylor expansion of the right side of constraint (32d) around the current pointŵ jk (n) is expressed as, (w j k (n);ŵ j k (n)) = j =j,j ∈U n,k where {.} denotes the real part of its argument. In the same way, we convexify the right side of constraints (32d) and (32e) by using the first-order Taylor expansions around the current pointsv jk (n),ŝ (l) j,k (n), andv jk (n) as, By applying the above approximations to the non-convex constraints (32d)-(32f), we can formulate the convex
2) Set t = t + 1. Until: Termination criterion is met: P T < . approximation of problem (32) as shown below, j,k (n),v jk (n)). (36e) Hence, based on CCCP, we solve subproblem (36) at each iteration. Problem (36) is a MI-SOCP which can be solved via modern solvers such as MOSEK [39] or GUROBI [40]. The proposed iterative algorithm is summarized in Algorithm 2. The algorithm terminates if the variation of the total transit power, i.e., P T , is less than a preset threshold .
Initialization: Choosing a feasible point for initialization of Algorithm 2 is essential. For this purpose, we simply setv jk (n) = log 2 (1 + γ min ) and then solve the following feasibility problem P = find{s (l) j,k (n)|(25f),(25g), k j s (l) j,k (n)v jk (n) ≤ C max } which is a mixed-integer linear program which can be solved optimally by off-the-shelf solvers such as MOSEK or GUROBI. Subsequently, we solve the following quadratic program with fixedŝ (l) j,k (n) via any general-purpose solver using interior-point method,

C. TWO-STAGE BEAMFORMING APPROACH
In order to reduce the computational complexity, we propose a two-stage energy-efficient beamforming approach such that, where B k (n) ∈ C LM ×a is the kth cluster beamformer obtained in the first stage which should eliminate the inter-cluster interference and v jk (n) ∈ C a×1 is the user-specific beamformer for the jth user in the kth cluster optimized in the second stage. VOLUME 9, 2021 Using channel state information (CSI) available at the central processor, BD beamforming can be adopted in a MIMO-SCMA system to remove the inter-cluster interference and enhance the QoS for intra-cluster users [41]. Hence, the users in different clusters can share codebooks. Although BD algorithm does not work well in the presence of imperfect CSI, we considered a second stage for beamforming in which the QoS can be guaranteed. Specifically, the BD beamforming projects the transmitted signal onto the null-space of the interfering channels and hence eliminates the inter-cluster interference.
To find the corresponding null-space, let us define, (40) where k ∈ K and H −k (n) ∈ C LM ×(J −J k ) is the matrix containing all interfering channels for the kth cluster. We seek B k (n) orthogonal to the column span of H −k (n), i.e., H −k (n) T B k (n) = 0. Here, it is assumed that the total number of antennas LM is larger than the total number of users J .
The singular value decomposition (SVD) can be employed to calculate the cluster beamformers. Applying the SVD to H −k (n) yields, where U k (n) ∈ C LM ×LM and V k (n) ∈ C (J −J k )×(J −J k ) are unitary matrices and k (n) ∈ R LM ×(J −J k ) is the rectangular diagonal matrix of singular values. Let r denote the rank of matrix H −k (n), which corresponds to the number of non-zero diagonal entries in k (n). The null-space of the interfering channel matrix H −k (n) is spanned by the left singular vectors (i.e. columns of matrix U k (n)) associated to the zero singular values of H −k (n). We can express the kth cluster beamformer as, where u i,k (n) denotes the ith column of U k (n). As mentioned before, constraint (25e) implies that w j,k (n) = 0. Without loss of optimality, the binary RRH selection variable s (l) j,k (n) can be replaced by w (l) jk (n) 2 2 0 , as in [42], [43]. Therefore, upon substitution of (38) and 0 -norm, problem (25) can be rewritten as, It should be noted that the fronthaul capacity constraint (43d) which is expressed in the form of 0 -norm, indicates the inherently dynamic RRH selection. That is, owing to this fronthaul constraint, the network-wide beamforming vectors w jk (n) may have a sparse structure. Although the number of constraints is reduced and the binary RRH selection variable is removed, problem (43) is still non-convex due to constraints (43c) and (43d).
As mentioned before, using cluster beamformer B k (n) obtained from BD can remove inter-cluster interference. Hence, the SINR of the jth user in the kth cluster over the nth subcarrier can be expressed as, where the inter-cluster interference term in the denominator is removed. Consequently, SINR constraint (43c) can be rewritten as follows, which is a SOC constraint.
To address the non-convexity of constraint (43d), we first introduce the auxiliary variables u j,k (n), v j,k (n), and t (l) j,k (n) as the upper bounds of the SINR, transmission rate, and 0 -norm for the jth user in the kth cluster over the nth subcarrier. Hence, constraint (43d) can be rewritten as follows, We then propose to approximate the non-convex 0 -norm by a reweighted 1 -norm as follows [44], β (l) jk (n) is a constant weight which is updated in each iteration according to, whereŵ (l) jk (n) is obtained from previous iteration and τ is a small constant regularization factor controlling the smoothness of the approximation. Based on the updating rule (51), β (l) jk (n) is inversely proportional to the transmit power level ŵ (l) jk (n) 2 2 . Hence, the RRHs with lower transmit power for the jth user in the kth cluster would have higher weights and hence would be forced to further reduce its transmit power and eventually be dropped out of the group of participating RRHs for that user.
We can employ the approach mentioned in IV.B to deal with the non-convexity of the constraints and use CCCP
Until Termination criterion is met: P T < .
to solve the optimization problem. Hence, based on CCCP, we solve the following subproblem at each iteration, Problem (52) is convex and can be solved via any general-purpose solver using interior-point methods [45]. The proposed CCCP-based iterative algorithm is summarized in Algorithm 3.
Initialization: In this case, an initial point for Algorithm 3 is obtained by generatingv jk (n) randomly. Then, w jk (n) and β (l) jk (n) are calculated as in (38) and (51) respectively.t (l) j,k (n) is set to ŵ (l) jk (n) 2 2 0 , andv jk (n) is set to the transmission rate calculated usingŵ jk (n).

D. CONVERGENCE AND COMPLEXITY ANALYSIS
With a feasible initial point, repeated application of the CCCP iteration is guaranteed to converge to a stationary solution of the problem with DC constraints. It can be seen that the optimal solution obtained from the previous iteration, i.e.,ŵ jk (n), is feasible for the convex subproblem at the next iteration for both algorithms. The achieved objective at the current iteration cannot be greater than the one at the previous iteration. Since, the objective function is non-increasing and bounded below by zero, it follows that both algorithms will converge to a point that according to [46] is locally optimal. We refer the interested reader to [46] for a rigorous proof of the convergence.
For Algorithm 2, the overall complexity is dominated by solving the MI-SOCP problem in (36). In particular, there are JLN binary variables s (l) j,k (n), resulting in 2 JLN combinations for all the binary variables. Thus, assuming that MI-SOCP algorithm terminates after T M iterations, the worst-case complexity can be written as At each iteration, the CCCP-based algorithm solves the convex subproblem (52) which can be approximated by a sequence of SOCPs via the successive approximation method. Each SOCP can then be solved via a general-purpose solver, e.g., SDPT3 in CVX [47] with a complexity of O((JCLM ) 3 ). Assuming that the CCCP terminates after T C iterations, the worst-case computational complexity is therefore given by, (54)

V. SIMULATION RESULTS
In this section, numerical experiments are carried out to illustrate the performance of the proposed energy-efficient user clustering and downlink beamforming for MIMO-SCMA in C-RAN. We consider the channel model as described in Section II.B with bandwidth of W = 2 GHz and carrier frequency of 28 GHz. The AoDs are assumed to follow a uniform distribution in [0, 2π]. The inter-antenna spacing is d = λ/2 to reduce the effect of mutual coupling and correlation among neighbouring antenna elements. The noise figure is N f = 40 dBm, hence, the noise power is σ 2 jk = −174 + 10 log 10 (W ) + N f dBm [23]. The pathloss exponent of the LOS and NLOS paths in (4) are α (0) = 2 and α (p) = 3, respectively. For SCMA encoder, the number of subcarriers is N = 4, and the number of non-zero elements for each codeword is C = 2. The corresponding factor graph matrix is, It should be noted that the structure of the factor graph matrix with fixed N and C does not affect system performance significantly. Table 2 summarizes the simulation setting parameters.
We use Monte Carlo experiments to evaluate the performance of the proposed algorithms for user clustering and downlink beamforming. The total transmit power and sum rate are measured for different parameter configurations and the results are compared with benchmark approaches in the literature.  Figures 2a and 2b find the optimal number of clusters K and evaluate its impact on the performance of the proposed scheme. In Figure 2a, we plot the sum of the normalized within-cluster SSE distance which serves as clustering criterion in the elbow method described in Section III.B. It can be seen W K decreases when K increases and the elbow point can be found at K = 4. 1 To gain further insight into the impact of the number of clusters, we investigate the transmit power performance versus target SINR, γ min in Figure 2b, where the number of clusters increases from 2 to 6. It is observed that the total transmit power increases monotonically as γ min increases. Moreover, the best performance is achieved when the number of clusters is K = 4. On one hand, for K < 4, an increase in the number of users in a cluster results in larger intra-cluster interference which results in higher transmit power. On the other hand, increasing the number of clusters intensifies inter-cluster interference which increases power consumption in the first stage beamforming for interference cancellation. We thereby observe that better user clustering with lower total transmit power can be found at K = 4 and the elbow method can efficiently find the optimal number of clusters in this case.
Figures 3a and 3b present the convergence behaviour of the proposed constrained K -means and CCCP-based algorithms. Figure 3a shows the objective value achieved by the constrained K -means algorithm with three different initial points. In this regards, J = 12 users are grouped into K = 4 non-overlapping clusters of size less than 6, i.e., q = 6. We observe that the algorithm converges rapidly in a few steps and the gap between final results of different initial points is small. In Figure 3b, the convergence performance of the CCCP-based algorithms is investigated for the case of γ min = 3 dB. It can be seen that the algorithm converges in less than 15 iterations monotonically to a same value for different initial points.
In Table 3, we present the run-time comparison between the proposed two-stage approach and the MI-SOCP 1 Due to space limitation, we omit the results on the Silhouette and gap statistic methods here for brevity However, it should be noted that using each of these methods for determining the optimal number clusters gives the same result while elbow method entails lower computational complexity.  beamforming for different values of γ min . The results, 2 show that the complexity of the proposed two-stage beamforming algorithm is much less than that of the MI-SOCP approaches, owing to the use of the smoothed 0 -norm approximation.
In Figure 4, we compare the transmit power versus sum rate among different clustering and beamforming algorithms. The cluster-head approach proposed in [48] is used as a benchmark for user clustering which selects the K users 2 Based on the use of a desktop computer equipped with 8th Generation Intel i7-8700 6-core processor (12M Cache, 4.6 GHz) and 32GB RAM. with the highest channel gains as the cluster centers. The cluster assignment is then used to group users into clusters. We also consider the performance of the MIMO-SCMA system with exhaustive and random clustering. The combination of exhaustive search for user clustering and MI-SOCP for beamforming is shown to attain the best performance among all the algorithms. However, this comes at the cost of high computational complexity. As it can be seen from Figure 4, the proposed constrained K -means clustering algorithm exhibits better performance compared to random search and cluster-head approach and can partition users more efficiently. Regarding the beamforming algorithms, it is shown in Figure 4 that the suboptimal solution achieved by the proposed two-stage beamforming is very close to the high-quality solution obtained by MI-SOCP.
To better appreciate the benefits of the proposed MIMO-SCMA scheme in terms of spectral efficiency, we examine the achievable sum rate of the users within the network. We consider orthogonal multiple access (OMA) and power domain NOMA as benchmarks with similar parameters except the number of multiplexed signals over each subcarrier. Specifically, in OMA, each user is assigned only one subcarrier such that no interference occurs with other user signals. Hence, the maximum number of users in each cluster is equal to the number of subcarriers, i.e., q = N . In power domain NOMA, all users have access to all the subcarriers and no constraint is applied to the maximum number of users in a cluster. In this section, we refer to power domain NOMA simply by NOMA. Figure 5 compares the total transmit power versus achievable sum rate among the proposed SCMA, power domain NOMA and OMA schemes. Two different channel models are considered for this purpose, one with no NLOS path, i.e., P = 0, and the other with P = 3 NLOS components. It is observed that in both cases, the results of the proposed SCMA scheme outperforms other schemes in terms of sum rate and the performance gap gets larger as the transmit power increases. Moreover, we observe that the transmit power for NOMA is more than that of OMA. However, as the sum rate increases, the results for OMA exhibits a noticeable increase in transmit power compared to NOMA.
To investigate the impact of imperfect CSI on the proposed user clustering and downlink beamforming, we model the estimated channel vector as follows, where h jk (n) is the actual channel vector and jk (n) is CSI error with i.i.d. entries following a complex Gaussian distribution, i.e., jk (n) ∼ CN (0, σ 2 e I). Figure 6 shows the transmit power comparison for the channel model with P = 3 NLOS paths, where the perfect  and imperfect CSI with different σ e scenarios are considered. It can be seen that the system performance is sensitive to the CSI accuracy. This is due to the fact that the channel correlation is used as the similarity metric for the proposed constrained K -means algorithm, which largely depends on the obtained CSI at the central processor. Moreover, the BD algorithm does not work well in the presence of imperfect CSI and can not remove inter-cluster interference totally. In order to enhance the performance of the proposed user clustering and downlink beamforming in the presence of imperfect CSI, one can use a more sophisticated similarity metric in the clustering algorithm or robust beamforming in the second stage of the proposed approach [49]- [51]. However, these considerations are beyond of the scope of this work.  Figure 7 presents the average number of associated RRHs per user versus the fronthaul link capacity for different target SINRs, γ min . It can be seen that due to the limitation on the capacity of the fronthaul links, each user can be only served by a small group of RRHs. For fixed γ min , the number of RRHs associated with each user will increase as the fronthaul link capacity grows. Moreover, for a fixed fronthaul link capacity, the group of associated RRHs will increase as γ min gets smaller. In fact, the data rate of each user will become smaller for a lower γ min . Thus, each RRH can serve more users with lower data rate.   J = 18 users are grouped into K = 4 clusters. It is worth noting that the number of clusters are determined through the elbow method for this case. For larger M , better beamforming results are expected as more degrees of freedom will be left for inter-cluster interference cancellation. We can observe that the total transmission power decreases as the number of antennas increases which is a consequence of narrower beamforming. Figure 9 shows the total transmit power versus total number of users J for different target SINRs, γ min . In this regard, the channel model with P = 3 NLOS paths is considered and the users are grouped into K = 4 clusters. As the results indicate, the total transmission power depends on the number of users and γ min . As the number of users or the target SINR increases, larger total transmission power is needed to satisfy QoS and fronthaul capacity requirements.

VI. CONCLUSION
In this work, the design of user clustering and beamforming approach was investigated for MIMO-SCMA in C-RAN. We proposed a constrained K -means algorithm for user clustering. By taking advantage of CSI available at the central processor, this algorithm was applied to partition users into non-overlapping clusters based on the correlation between channel vectors. The beamforming design was formulated as an optimization problem, with the aim to minimize the total transmit power under the SINR and fronthaul capacity constraints, and two iterative algorithms were proposed for its solution. In the first approach, the high-quality solution was achieved by solving a MI-SOCP in each iteration via dedicated solvers. In the second approach, a two-stage low-complexity beamforming design was proposed where in the first stage, the BD was employed to obtain the cluster beamformers, while in the second stage, the design of user-specific beamformers was formulated as an optimization problem. Through simulations, it was shown that the proposed user clustering and beamforming approaches for MIMO-SCMA systems can effectively decrease total transmit power, eliminate inter-cluster interference, and improve spectral efficiency compared to the benchmark approaches.

APPENDIX A PROOF OF PROPOSITION 2
In order to prove proposition 3.2, we first transform the cluster assignment subproblem in Algorithm 1 into its equivalent form as a minimum cost flow (MCF) linear network optimization problem. We then show that the optimal selection variable ι j,k is binary, which can be found using fast network simplex algorithms instead of complex mixed integer linear programming [52].
In general, a MCF problem has an underlying directed graph structure G = (V, E) defined by a set of vertices (nodes), V, and a set of edges (arcs), E. For each node ν ∈ V, we associate a value b(ν) indicating whether it is a supply node (b(ν) > 0), a demand node (b(ν) < 0), or a transshipment node (b(ν) = 0). For each edge (ν, ω) ∈ E, we associate a flow of f (ν, ω) on the edge with cost of c(ν, ω) per unit flow. The optimization model for the MCF problem can be formulated as, Let each data point d j correspond to a supply node with b(d j ) = 1 and each cluster center c k correspond to a demand node with b(c k ) = −q. The cost of the edge (d j , c k ) can be expressed as, (59) VOLUME 9, 2021 To satisfy the feasibility constraint of the problem, we consider an articial supply node, a, such that, b(a) = −J + Kq. (60) This artificial node has no edge to or from data points, while the cost of edge form node a to cluster center c k is zero, i.e. c(a, c k ) = 0 ∀k ∈ K. These identifications establish the equivalence between the MCF and the cluster assignment subproblem in Algorithm 1 in which the selection variable ι j,k corresponds to flow f (d j , c k ). The MCF equivalent directed graph structure is shown in Figure 9.
According to [52,Proposition 5.4], since b(d j ), b(c k ), and b(a) are all integers, the optimal flow solution is integervalued. Since the selection variable ι j,k corresponds to flow f (d j , c k ), and since k f (d j , c k ) = 1, the optimal ι j,k is integer with maximum value equal to 1, i.e. ι j,k ∈ {0, 1}.
The MCF formulation allows one to solve the cluster assignment subproblem via network simplex algorithm which is faster than general linear programming codes. Specifically, the complexity of solving cluster assignment subproblem via network simplex algorithm is given by [52], where the number of vertices |V|, and number of edges |E| in our case are, It is of interest to investigate the asymptotic complexity of the algorithms when J and K are large, i.e., when we let J > K → ∞. Under this condition, we can obtain the asymptotic complexity as, C O(J 3 K 2 (log(J )) 2 ).
(64)  His research focuses on the study of advanced algorithms for the processing of information bearing signals by digital means. He has coauthored more than 300 refereed publications in these areas. His research has been funded by the Natural Sciences and Engineering Research Council of Canada, the Fonds de Recherche sur la Nature et les Technologies from the Government of Quebec, and some major industrial sponsors, including Nortel Networks, Bell Canada, InterDigital, and Microsemi. His research interests include areas of statistical signal processing, including detection and estimation, sensor array processing, adaptive filtering, and applications thereof to broadband communications and audio processing. He has been an Associate Editor of the IEEE SIGNAL PrOCESSING LETTERS, the IEEE TRANSACTIONS ON SIGNAL PROCESSING, and the EURASIP Journal on Applied Signal Processing. He has also served on the technical committees for several international conferences in the fields of communications and signal processing. VOLUME 9, 2021