Efficient Distributed Multi-Task Schemes for mmWave MIMO Channel Estimation

In this study, the problem of sparse channel estimation is investigated with the employment of a fully distributed approach. We exploit the spatially joint sparsity structure of the involved channels to formulate the channel estimation problem in the angular domain. The devices collaboratively estimate their channel sparsity support sets before the local estimation of the channel values, assuming the existence of global and common support subsets. The combination of the proposed distributed scheme with the classical Simultaneous Orthogonal Matching Pursuit (SOMP) algorithm is called Weighted Distributed Simultaneous Orthogonal Matching Pursuit (WDiSOMP). The performance of WDiSOMP is assessed under a multi-task scheme considering a new weighted voting method. The new mechanism is applied at each support set index and enhances its estimation accuracy and, thus, the eventual estimation of the overall channel. The mean squared error (MSE) is utilized to derive the performance bounds and assess the efficacy of the WDiSOMP estimator. Finally, the performance and the theoretical findings are evaluated via the comparison of WDiSOMP with Distributed Simultaneous Orthogonal Matching Pursuit (DiSOMP), local SOMP, and a centralized approach based on Structured SOMP (SSOMP) in terms of the MSE.


I. INTRODUCTION
The millimeter wave (mmWave) spectrum plays a key role in the implementation of 5G (and upcoming generations) communication systems. The available bands are between 30 GHz and 300 GHz (i.e., wavelengths between 1 mm to 10 mm), and the large unoccupied frequencies in these bands can be exploited to meet the increasing demand for transmission capacity [1], [2], [3], [4]. However, the transmitting signals in the mmWave bands confront high penetration loss and propagation attenuation, restricting the communication range [4]. Massive Multi-Input, Multi-Output (MIMO) and directional beamforming (facilitated by the short wavelength of mmWave signals) have been leveraged in the relevant The associate editor coordinating the review of this manuscript and approving it for publication was Bilal Khawaja . literature to moderate this loss and ensure reliability in communication.
The incorporation of mmWave MIMO antenna systems in the physical layer of 5G networks [5] has increased their transmission efficiency (spectral and power), throughput and network coverage, ensuring the successful support of various evolving technologies, such as Vehicular-to-Vehicular (V2V) or Vehicular-to-Everything (V2X), Device-to-Device (D2D), and Device-to-Everything (D2E) communication systems, Machine-to-Machine (M2M)/ Internet-of-Things (IoT), Augmented Reality (AR), Virtual Reality (VR), Smart City, etc. [6]. MmWave is a key technology for the deployment of Heterogeneous Networks (HetNets) in 5G IoT (such as smart homes, smart cities, e-health) [7].
To fully exploit the potential of these applications, channel knowledge is required among all the involved antenna pairs of the participating devices. In practical scenarios, the mmWave (massive) MIMO channel is sparse in the angular domain due to the narrow-angle spread and limited scattering in the propagation environment at the base station (BS) side [8]. Thus, a few significant paths are actually active (and, essentially, fewer than the number of transmitting/receiving antennas) and contribute to the transmission. In recent years, Compressed Sensing (CS) techniques have been widely used for channel estimation tasks [9], [10]. CS theory can be leveraged to estimate the channel with reduced training overhead by identifying the parameters involved in the commonly adopted geometric channel model (channel gains, Angles of Departure (AoD) and Angles of Arrival (AoA)), related only to those paths, rather than estimating the channel matrix. In the following sections, related works in the domain of channel estimation with CS-based approaches, the motivation and the contribution of this research will be described.

A. RELATED WORKS
Numerous state-of-the-art solutions for mmWave channel estimation are outlined by Hassan et al. in [10]. In the relevant literature, the problem is mainly addressed either in point-to-point [8], [11], [12], [13] or in point-to-multiple points frameworks [14]. In addition, the sparse channel recovery approaches [15] can be employed either individually [11], [16] or centrally (e.g., at the BS) [8], [17], [18]. In the centralized solutions, the CS algorithms exploit the common sparsity patterns among the channels of nearby devices that are manifested in their measurements due to the transmission environment and, thus, enhance the estimation accuracy or reduce the training overhead. The downlink pilot training overhead would be prohibitively in MIMO systems with large-scale antenna arrays. Exploiting already established CS techniques and the hidden sparsity in sparse channel matrices, efficient channel estimation can be accomplished with fewer training symbols and improved performance due to the limited local scattering at the BS.
More specifically, to further enhance the estimation performance and reduce pilot demand, they leverage the so-called joint sparsity either among devices or among different sub-carriers of single devices due to the shared transmission environment. [8] is one of the first works to establish this property with devices of multiple antennas. In [13], [17], [19], [20], [21], and [22], channel estimation algorithms exploit common sparsity patterns via a distributed-aided approach employed over different orthogonal frequency division multiplexing (OFDM) sub-carriers of single devices. It should be noted that in previous algorithms for channel estimation, such as in the Joint Orthogonal Matching Pursuit (J-OMP) in [8] and the Distributed Sparsity Adaptive Matching Pursuit (DSAMP) in [17], the term ''distributed'' refers to geographically distributed devices that send their data to a central node for processing (e.g., the BS), or engage distributed algorithms executed, however, locally over different sub-carriers of a single device. Here, this term refers to a fully distributed channel estimation approach in which many devices collaborate and exchange information with their neighborhood. Moreover, in [23], Dai et al. impose a non-uniform size burst-sparsity structure in the sparse channel without considering any correlations among the channels of different users. They propose a generic Sparse Bayesian Learning (SBL) framework that jointly detects unknown outliers and bursts and estimates the channel. Furthermore, the authors in [14] introduce a common sparsity model, where the individual non-zero values are treated as outliers, and propose an SBL method that jointly estimates the channel and finds the users' groups. In [24], the authors exploit common sparsity to build a sparse Bayes tensor and Direction of Arrival (DoA) trackingbased channel estimation algorithm for V2X mmWave massive hybrid MIMO systems.
Recent studies see channel estimation as multi-task, which, however, is interpreted differently from this study. Specifically, in [25], the MIMO channel prediction is based on a multi-task least square support vector machine that exploits the spatial correlations between the channels of different antenna pairs instead of predicting the channel per antenna pair independently. Also, in [26], a joint channel estimation scheme is presented for a reconfigurable intelligent surface-assisted mmWave communication system. In particular, it estimates the direct and cascaded channels at the same coherence time instead of learning them separately. Different from the above works, in [27], a multi-task deep learning-based network is proposed to simultaneous design the hybrid precoding and combining matrices under the geometric mmWave channel model. In Table 1, we demonstrate a brief comparative description of our work with recent previous studies. The methods listed in this table are compared, focusing on some key traits, i.e., the assumed properties of the channel, whether multi-task is considered from a sparse representation perspective, the architecture and the type of processing (centralized, distributed, local).
Moving on to the distributed joint sparsity recovery literature [10], [28], several algorithms have been developed for the joint sparsity models JSM-1 (common support and individual values) and JSM-2 (common support), and their generalizations, which, first, were introduced by Baron et al. in [29]. These algorithms either follow a greedy and iterative process (e.g., OMP) or solve an optimization problem (e.g., Least absolute shrinkage and selection operator -Lasso) to estimate the desired sparse vectors or matrices. Here, we will focus on a greedy method, especially OMP, due to its simplicity, fast implementation and lower computational complexity than optimization techniques. However, other sparse coding algorithms can be used, such as matching pursuit (MP) [30]. The distributed versions of subspace pursuit (SP) and orthogonal matching pursuit (OMP), the so-called DiSP and DiOMP, respectively, were first introduced in [31], and they apply to both JSM-1 and JSM-2 models (assuming a priori knowledge of the sparsity structure). Besides, two versions of Decentralized and Collaborative Orthogonal Matching Pursuit (DC-OMP1, DC-OMP2) for JSM-1 have been suggested in [32], taking into account some communication constraints. Compared to DiOMP, DC-OMP 1 recovers the support set with higher accuracy by utilizing a voting mechanism incorporated in the iterations of the OMP. DC-OMP 2 is based on SOMP and is much more accurate than DC-OMP 1 but at the cost of a higher communication burden. It is noted that in the framework of distributed collaborative algorithms, our recent works [33], [34] constitute state-of-the-art solutions for a multi-task joint sparse representation model. Compared to their predecessors, they follow an adaptive process via the employment of adaptive weights majority voting and only need to know the total sparsity level.

B. MOTIVATION
The constantly increasing use of smart devices and multimedia services, as well as the demand for high data rates, high energy efficiency, low latency and high quality of service (QoS), have made fifth-generation (5G) networks and beyond more imperative than ever. 5G networks address the emerging challenges by considering i) massive MIMO systems in the mmWave range, ii) full-duplex (FD) transmissions, iii) D2D communication protocols and iv) network densification with heterogeneous networking architectures, where various types of cells (e.g., small, large) and access technologies are expected to coexist. In the near future, direct D2D communication is regarded as an essential and promising technology, e.g., for extending capacity, enabling scalability and traffic offloading in various short-range applications, including IoT, M2M, and emergency networks [35]. In the vision of 5G networks, users organized into small groups will exchange information, e.g., about the network and channel status, cooperate in various operations of common interest and employ new technologies in favour of coordinated communication [36].
A thorough search of relevant literature shows that the trend toward distributed and cooperative schemes also emerges in the physical layer, with some indicative, yet notable, examples being beamforming [37], [38] and spectrum sensing [39]. For the first time, the present study focuses on the problem of fully-distributed channel estimation for mmWave MIMO systems and proposes novel and efficient schemes that enable devices to cooperate for estimating the involved channels.

C. OUR CONTRIBUTION
As observed in Table 1, our method is distinct from the sparsity-induced multi-task and processing perspectives. In particular, the main innovations of this research are the following: • The measurements model allows the proposed solution to be directly applicable to scenarios with either single-antenna devices, following a Single Measurement Vector method, e.g., OMP, or multi-antenna devices, utilizing a Multiple Measurement Vector method, e.g., SOMP.
• A multi-task framework is elaborated under which the mmWave MIMO sparse channel is modeled and estimated, exploiting the joint sparsity among the channels of proximal devices surrounded by global and some common scattering clusters.
• A fully distributed cooperative channel estimation algorithm, WDi(S)OMP, is suggested that undertakes the multi-task commonalities among the sparsity patterns of adjacent devices in an efficient manner. WDi(S)OMP remains well-performing even when non-uniform overlapping sparsity groups occur in the network. It is achieved by incorporating a new voting mechanism which estimates an individual weight per support set index, not a global one as in our prior scheme [34], and further improves the channel estimation accuracy and reduces training time by the BS.
• A performance evaluation of WDi(S)OMP is presented from various aspects. More specifically: -The new algorithm requires a low number of iterations, as it properly exploits the channel sparsity level, which, in mmWave communications, is small. -The overall complexity is discussed at each stage of the algorithm per iteration -communication round. -The associated communication overhead due to collaboration is analytically captured. Despite the overhead, the distributed approaches, such as the one presented here, might be preferable in many cases encountered in practice (e.g., cases with low channel sparsity appearing in the mmWave range and/or multi-antenna end-users) in contrast to adopting centralized processing at BS. -A theoretical analysis is conducted where the MSE is analytically computed assuming an oracle estimator as a benchmark, from which we evaluate in the experiments how well the suggested estimator behaves and identify the parameters that most affect our solution to attain the minimum error values. The evaluation of the algorithm is demonstrated via simulations, and it turns out that it performs significantly better than the existing ones. Several experiments have been conducted based on the elaborated multi-task sparsity model to evaluate the channel matrix recovery performance in terms of the number of measurements (i.e., training overhead) and noise conditions (i.e., Signal-to-Noise-Ratio (SNR)).
The rest of this paper is organized as follows. In Section II, the data model is described. In Section III, the joint sparse channel model is elaborated in detail, under which our estimator will be designed. Then, Section IV presents the problem formulation for the mmWave channel estimation. In the following, Section V shows the proposed distributed MIMO downlink channel estimation algorithm with the aid of the CS theory. Moreover, Section VI analyzes the proposed method's required iterations and computational complexity, the respective transmission overhead, and the impact of support set and angular grid resolution on reconstruction quality. In Section VII, several experiments are presented to verify the efficiency of the proposed approach, accompanied by the appropriate discussion. Finally, Section VIII outlines conclusions and future directions of our research.
The following notations are used in this paper. Uppercase bold letters are matrices, lowercase bold letters are vectors, letters with a hat are estimations, (·) T denotes the transposition, (·) H denotes the complex conjugate transposition, E[·] means statistical expectation, Tr(.) denotes the trace of a matrix, I is the identity matrix, λ min (·), λ max (·) are the minimum and maximum eigenvalue of a matrix; supp(h), supp(H) are the index set of the non-zero entries of vector h and the non-zero entries columns of matrix H, respectively; A S denotes the sub-matrix formed by collecting the columns of A whose indexes are in set S; A F , A and a denote the Frobenius norm, spectrum norm of A and Euclidean norm of vector a respectively.

II. THE SYSTEM MODEL
In this paper, we will focus on downlink transmission. BS of N t antennas communicates with K devices of N r antennas to transmit (broadcasts) T training symbol vectors which are concatenated in X ∈ C N t ×T in T time slots via a narrowband, block-fading mmWave MIMO channel. Each device receives the signal where matrix H k ∈ C N r ×N t captures the mmWave downlink MIMO channel from the BS to the k-th device. Also, Y k ∈ C N r ×T and N k ∈ C N r ×T denote the concatenated received signal and additive noise vectors, respectively. The elements of the latter are modeled as random vectors with elements being independent and identically distributed (i.i.d.) and complex Gaussian distribution with zero mean and variance equal to σ 2 .
Since mmWave channels are expected to have limited scattering on the BS side, a geometric channel model is adopted where each cluster of scatterers is assumed to contribute multiple propagation paths between the BS and device k. Under this model, the physical channel matrix H ∈ C N r ×N t between the BS and each device can be given as [40] where L AoD represents the number of AoDs, each of which contributes L AoA arrival paths on the device side, α p,q is the complex gain of the q-th path on the device side that departs from the p path on the BS side, φ p , θ p,q are the p-th AoD and its corresponding q-th AoA, and a t (φ p ), a r (θ p,q ) are the antenna array response vectors at the BS and devices, respectively. The matrix form of the channel model in (2) can be written as where A t = a t (φ 1 ), a t (φ 2 ), . . . , a t (φ L AoD ) , A r = a r (θ 1 ), a r (θ 2 ), . . . , a r (θ L AoA ) , H a ∈ C L AoA ×L AoD is a sparse matrix whose non-zero complex values are associated with the channel path gains.
Assuming Uniform Linear Arrays (ULAs), the transmit response vector a t (φ p ) is defined as while the receive response vector a r (θ p,q ) is given as where λ is the signal wavelength at mmWave frequency and d the inter-element spacing.
Since the quality of CS-based channel estimation is limited by the quantization error in the angular domain representation, the AoDs and AoAs (or the corresponding atoms) are taken from high-density grids of G r L AoA and G t L AoD points, obtaining overcomplete dictionaries D t ∈ C N t ×G t , D r ∈ C N r ×G r [40], correspondingly. Then, the channel matrix H can be represented in terms of a non-diagonal sparse matrix H v ∈ C G r ×G t , containing the path gains of the quantized angles, as follows The representation in (6) constitutes a discretized approximation of the channel matrix that can be exploited to cast the estimation of H to that of detecting some non-zero coefficients in the sparse channel matrix H v . Focusing on the rows of the channel matrix H v , they will contain only a few non-zero values corresponding to the AoDs, due to the sparsity of mmWave channels on the BS side. Notice that the number of non-zero values in each of those columns is exactly equal to L AoA .
In the special case of ULAs, with d = λ 2 and quantized AoDs and AoAs selected from uniform grids of resolution equal to transmitting and receiving antennas (G t = N t and G r = N r ), respectively, the dictionaries D r ∈ C N r ×N r and D t ∈ C N t ×N t result in unitary Discrete Fourier Transform (DFT) matrices. This case is known as the Virtual Channel Model [41].

III. JOINT SPARSE CHANNEL MODELING
In this section, the adopted joint sparsity model for the involved channels will be presented. This model captures the different tasks in the network, which in [42] is called multi-tasking with overlapping parameters.
A set of K geospatially distributed devices are interconnected by a graph G(V, E), where V = {1, 2, . . . , K } is the set of devices (or nodes) and E = {(i, j) : Node i ∈ V is connected with node j ∈ V}.
As illustrated in Figure 1, the scattering nature of the transmission environment may lead to the devices grouping according to the scatterers getting involved in their channels. As stated in [8], the devices that belong to the same group may face and thus be affected by the same scatterer. It means that the sparse channel matrices H v k share the same sparsity support set on the BS side, but the corresponding path gains may differ due to the surrounding area of each device. To improve the downlink channel estimation performance, we exploit the previously described relationship on the sparsity support sets among the involved channels of the devices [8], [17].
More specifically, the scatterers on the BS side can be clustered into global and common, from now on referred to as g and c j , which translates the corresponding AoDs into global to all devices in the network and common to particular subgroups. From this observation, we conclude that both the columns (otherwise atoms) of the employed dictionary D t are also global and common, respectively. Following [28], the physical channel matrix structural sparsity is discriminated into 1) Sparsity Support within Channel Matrix: The rows within H v k usually have the same column sparsity support, namely there exists support set S k in H v k , such that

2) Global and Multiple Common Supports between
Different Channel Matrices: The channel matrices of different devices are correlated, especially when the devices are physically close to each other. Based on this condition, different devices and their respective (sparse) channel matrices tend to share some global and, partially, common local scatterers on the BS part. Based on the previously established sparsity cases, the sparse representation matrix H v k of device k can be written as where H g k represents the sparse matrix whose row support set is globally shared among all devices in the network and H c j k captures the sparse matrix whose support set is commonly shared among different subgroups of devices in the network. Also, I k is a set of indices j = {1, 2, . . . , J } indicating the maximum number of common interest support sets per device k.
Moreover, it is assumed that the rows of H g k and H c j k s of each device k share L g non-zero columns and L c j non-zero columns, respectively. So, the sparsity pattern of a device k is divided into global and multiple common interest support sets. The column sparsity is induced by the assumption that the resolution of the departing angular grid points is much greater than the number of the active transmission paths, i.e., L AoD G t . Considering a scenario of multiple devices, the AoDs of the devices' channels impacted by the same scattering clusters are similar [8], [14]. It means that they share the same non-zero columns (i.e., global and common, respectively) in their sparse channels, as shown in Figure 2, which motivated us to investigate the channel estimation via a distributed algorithm that estimates the shared support sets in a collaborative manner. In the following, some definitions will be provided.
and devices in the overlapping areaH v 5 ,H v 6 assuming N r = 2, G r = N r and G t = 256 where red, green and blue bins stand for the global and partially common atoms with total sparsity level L AoD = 6. = {1, 2, . . . , G t } denotes the set of atom indices (namely, the number of AoDs) in the overcomplete dictionary D t .
Definition 1 (Global Interest Support Set): Let the sparse representation matrix H g k , whose columns are globally shared with only L g nonzero entries. The global interest support set is defined as where S g ⊂ and |S g | = L g (l 0 sparsity). Definition 2 (Common Interest Support Set): Let the sparse representation matrix H c j k with only L c j nonzero entries. The common interest support set is defined as where S c j ⊂ , |S c j | = L c j (l 0 sparsity) and C j ⊂ V a set of devices indices that are impacted by the common scattering cluster c j , for j = 1, 2, . . . , J , and are concerned about S c j . Based on the previous definitions, the support set S k of device k is formulated as S k = S g ∪ {S c j k }, for all j ∈ I k , meaning that the joint sparsity L AoD,k ≤ L g + j∈I k L c j . Here, despite S g being globally shared among all devices in the network or S c j being the same among particular devices in the same subgroup C j , it doesn't hold for the respective non-zero values of H g k and H c j k which remain individual and possibly independent among the devices.
Thus, taking into account the aforementioned sharing of support sets, the channel H k of device k can be written as From (10), we see that the column sparsity in According to the adopted formulation, D r is hidden in the sparse matrixH v k which will be the matrix of interest in the proposed algorithm. Here, the columns of D r are fixed to L AoA , i.e., as if we had the receiver response matrix A r ∈ C N r ×L AoA , and the joint sparsity is only observed in the AoDs. In summary, considering (10), our aim is to exploit the joint sparsity sets of neighboring devices to estimate the sparse channel per device in a collaborative and distributed manner. Such an approach will be elaborated on in the following.

IV. MULTI-TASK MMWAVE CHANNEL ESTIMATION
In this section, the problem formulation for the multi-task mmWave channel estimation will be presented. Given the geometric channel model representation in (10), the problem of interest will be addressed as a sparse reconstruction based on compressed sensing, assuming an overcomplete D t for the AoDs sparsifying matrix.
Employing per device k the representation (10), we can rewrite the measurements model in (1), in compact form, as Also, each column ofȲ k , which constitutes the Single Measurement Vectors (SMV) analogue, is captured bȳ According to the Multiple Measurement Vectors (MMV) model [43], in (11), each device k exploits simultaneously the low dimensional measurement vectors, {ȳ k,r } N r r=1 , from all antenna elements, that share the same measurement matrix and sparsity support sets S k , to recover the high dimensional G t × 1 sparse channel vectors between all receiving antennas and BS. Let . . ,H v K } be the collection of parameter matrices across the network. Through (11), the objective of each device k is to determine the sparse AoDs, which is equivalent to estimating the rows support set ofH v k ∈ C G t ×N r . Each device k, ignoring the joint sparsity channel modelling, F on its own, locally (without cooperation) or the aggregated optimization function K k=1 J k (H v k ) could be used in a centralized manner, as in [20]. However, in this paper, we follow a third direction that leads to a fully distributed approach. More specifically, by taking into account the joint row sparsity between thē H v k s of different devices, individual but related cost functions J k (H v k ) can be adopted, thus, allowing us to formulate the problem as a multi-task one [42]. The relatedness lies in the fact that the support set of a device k is S k = S g ∪ {S c j k }, e.g. with global and partially common elements. The common elements with the S l s of its neighborhood L in k will be taken into consideration in the collaboration step of the proposed approach, where the (joint) support set is estimated through adaptive weights voting. Hence, the optimization problem can be formulated locally as (15) Accurate retrieval of the sparse channel matrixH v k is only guaranteed when the measurement matrix satisfies specific properties from CS like the Restricted Isometry Property (RIP) and mutual coherence. As this matrix is dependent on both the training symbols and the transmit (overcomplete) sparsifying matrix D t , the symbols can be properly designed to acquire a more optimized measurement matrix to ensure that the channel is reconstructed successfully with a high probability. Such issues have been addressed, for example, in [16] and [44], but, here, the symbols are selected randomly, which still makes the measurement matrix ''well-behaved''.

V. VOTING-BASED MULTI-TASK DISTRIBUTED SOLUTION
In this section, the main stages of the proposed algorithm, WDiSOMP, are analysed. Since the dictionary D t in (10) is predefined, the channel estimation is equivalent to the estimation ofH v k in (11), assuming, in the transmission environment, the existence of sparsity division captured in (7) or equivalently in (10). More specifically, WDiSOMP was designed to estimate the sparse channel in a collaborative distributed manner. The collaboration stage concerns only the support sets S k s, while the non-zero values of the path gains are individual and, thus, estimated via Least-Squares (LS) locally. Our solution employs a weighted majority voting mechanism with adaptive weights that can be estimated either at the support set or index level. With the incorporation of this step, the proposed algorithm only needs to know the total number of departing paths or angles L AoD .

A. PROPOSED SOLUTION
The algorithm operates in stages (see Algorithm 1). The first stage initializes the process, and the second one is executed iteratively, consisting of a collaborative (through voting) and a local step (where the SOMP runs to make an individual estimate of the sparsity support set) and the final one that concerns channel estimation is executed locally.
In the Initialization stage, the BS, for T time slots, transmits to all devices in the network a number of training symbols, represented by matrix X. Then, each device k collects all measurements in Y k and executes SOMP (assuming the overcomplete sparsifying matrix and an empty support set) to get an initial local estimate of its complete support setŜ k .
In the Iterative stage, each device k, ∀k, transmits itsŜ k to its neighbors denoted by the set L out k , and receives the estimatesŜ l 's from its neighbors, ∀l ∈ L in k (communication step). At iteration i of the algorithm, each device exploits the receivedŜ l 's and engages a weighted majority voting mechanism to determine the i most voted indices. Then, SOMP is executed locally to acquire the remaining non-zero indices, up to L AoD , and againŜ k s are communicated over the network. These steps are repeated for L AoD iterations. It is assumed, here, that all devices communicate via a connected network and can exchange their support sets indices by engaging in a suitable D2D communication protocol [45]).
Each device k exploits the estimated setŜ k on (11) to retain only these specific columns of , apply LS to find the associated non-zero values ofĤ v k and through (10) its channel matrix.
The above stages and the corresponding steps are also illustrated in Figure 3, where the information flow is depicted per device k. The BS broadcasts the same training data to all devices in the network. Then, each device exploits the acquired measurements to initialize the process and iteratively (via sharing -collaboration) estimate the desired support set. Finally, each device k utilizes the estimated support set to recover the sparse channel representation and, thus, the channel matrix. The aim of adopting a weighted voting mechanism was to make the algorithm WDiSOMP robust to different conditions. In particular, this step (step 1 of stage II) was motivated by the fact that a device may not have the same impact on the support set retrieval, which can happen for several reasons. First, even if the devices have the same sparsity profile, they may face different communication conditions (i.e., noise level). Also, different common support sets among device groups may exist on the unknown sparsity patterns. Finally, the (hidden) sparsity order of the support sets and the grouping of devices are not usually known. For these reasons, two weighting methods are selected in the majority voting mechanism. The first method can cope with these cases with satisfactory performance [33] except for a case [34] where part of the support set of a device in the network is partially shared with different subgroups of devices, such that all devices' support sets have the same cardinality. For this purpose, the second method is suggested.

Algorithm 2:
WMVoting More specifically, at each iteration i of the algorithm, device k employs a zero elements vector z k,i of size G t and activates the positions indexed by each receivedŜ l,i , adding the weight a lk at the relevant positions for all l ∈ L in k . Afterwards, the device k, through function 'bestindices', finds the set of indices corresponding to the i largest amplitude components of the final z k . These steps are captured in Algorithm 2. The weights a lk may be updated assuming either of the two following adaptive weights methods similar to the one in [46].

1) WEIGHT METHOD BASED ON SUPPORT SET AS SINGLE ENTITY
We introduce the instantaneous metric that measures the number of different elements in the involved sets. We further smooth the quantity b lk,i through a first-order filter, say, where v ∈ (0, 1) is a small positive factor. Then, the combination weight assigned for each index j by device k to device l is approximated by At this point, it should be noted that the total number of common indices could be measured to determine the b lk,i s and the corresponding weight a lk,i . In either case, all indices of a neighbour are graded with the same weight assigning the same significance to all indices even if some of them aren't included in the device's support set. This mechanism can efficiently identify devices' groups with i) similar sparsity patterns and different noise levels (per group) and ii) non-overlapping but common sparsity patterns inside the groups [33]. To manage the existence of overlapping sparsity profiles, like the one depicted in Figure 1, we assumed a per index weighting method in the majority voting mechanism.

2) WEIGHT METHOD BASED ON PER-INDEX SUPPORT SET SIMILARITIES
Here, our aim is to give an individual weight for each index of the support set (L AoD in number) of all devices. To achieve this, each device k votes with '1' the support set indices that belong to its neighbor l and its own support set. Then, a candidate index approved by the majority of neighboring devices wins. At each iteration i, each device k builds a matrix Z k ∈ R G t ×|L in k | (initialized with zero values), which is updated by ones only in positions dictated byŜ l,i ∩Ŝ k,i−1 = ∅ and fed as input in (19). At iteration i, the weight that device k assigns to its neighbor l per support set index j is estimated as follows for l ∈ L in k , k ∈ K and all j ∈Ŝ k ∩Ŝ l . The elaborated weight methods are exploited in the voting mechanism in Algorithm 1 to enhance the channel estimation performance. In the following section, we will analytically estimate the error bounds of the proposed estimator.

VI. PERFORMANCE EVALUATION
In this section, we analyze the performance of the proposed estimator from various aspects. More specifically, the analysis will concern i) the iterations of the algorithm, ii) the computational complexity, iii) the communication overhead of WDiSOMP compared to the rest considered schemes, iv) the channel estimation error in terms of the support set reconstruction quality and, finally, v) the impact of angular grid resolution assuming an overcomplete dictionary to minimize the channel quantization error.

A. REQUIRED ITERATIONS BEFORE TERMINATION AND COMPUTATIONAL COMPLEXITY
In Algorithm 1, the iterations of WDiSOMP are fixed, constant by construction, and equal to L AoD . Also, they show how many times the devices exchange their support set indices (communication step) in the network. The communication step includes local processing, which is highly dependent on the available energy of the device, especially for devices with limited resources (e.g., battery-dependent) to run an energy harvesting process. In this study, we assume that the devices satisfy the minimum power requirements to solve the assigned tasks. SOMP is executed locally at each iteration of WDiSOMP, exploiting the obtained part of the support after voting to build an estimate of the full support set until its size is equal to L AoD . The number of internal iterations of SOMP is exactly L AoD − i (since |Ŝ k,voting | = i). As a result, in WDiSOMP running time, the total iterations of SOMP is Moving on to the computational complexity of the proposed algorithm, we consider that operations like addition, subtraction, division and multiplication of any two numbers have computational complexity O (1). Also, the complexity for two matrices multiplication of size M × N and N × P is O (MPN ).
Let us focus on communication round i. As we see in Figure 3, the WDiSOMP algorithm executed locally at each device k has two specific processing steps that actually contribute to the overall computational complexity. In particular, first, the complexity of SOMP is on the order of O(N r G t Ts i ) [47], where s i = L AoD −i is the size of the support set that is estimated at iteration i. Second, the suggested voting mechanism contributes a small cost to the complexity of the algorithm due to the adaptive weights estimate. Both weighting methods should first compute the involved b lk,i s. In the first method, the cost mainly derives from the fact that each device k should measure the number of different indices between its support set and one of its neighbors (|Ŝ l,i \ S k,i−1 |). In the second, per index method, the respective cost stems from i) the intersection operation (Ŝ l,i ∩Ŝ k,i−1 ) among the support set of device k and its neighbors, and ii) the term vZ k (j, l), j = 1, 2, . . . , L AoD whose activation depends on the previous step. In either case, the cost is in the order of the support set size or, else, sparsity level, i.e., O(L AoD ). After this sub-step, the cost of estimating the desired a lk,i for each one of its K −1 neighbors (where K −1 captures the |L in k |) and itself involves simple operations, K − 1 sums (denominator) and 1 division, thus the cost is O(K ). In the former method, the weights a lk,i are the same for all indices, while the latter estimates a weight per index.
Having available the weights a lk,i , the computational complexity of the majority voting (Algorithm 2) is determined by: i) the number of additions to the zero-elements vector z k , which are equal to L AoD (the support set size or sparsity level) for each of the K − 1 neighbors and the device itself, or, else, KL AoD in total, and ii) the sorting method used in the function 'bestindices' to find the i most voted indices. In our case, the number of elements to be sorted is small and equal to the support set level L AoD .
Finally, at Stage III of Algorithm 1, each device k uses an LS estimator to acquire the non-zero involved coefficients (i.e., the channel gains). This stage adds a cost which depends on the computation of the pseudo-inverse of matrix , keeping only that columns dictated byŜ k , i.e. O(L AoD T 2 + L 3 AoD ), and then by its multiplication with the measurements' matrix imposing an additional cost of O(L AoD TN r ).
It is noted here that, apart from the voting mechanism, the complexity of the other two local processing methods (namely, SOMP and LS) depends on the adopted algorithms. As this paper aims to study, for the first time, a fully-distributed channel estimation algorithm enhanced by carefully designed voting mechanisms that improve the cooperation among the devices, two well-known simple algorithms have been chosen. However, it should be clear that other candidates with lower complexity can also be selected.
Before moving on to the next section, let us compare the complexity of the proposed algorithm with the ones that will be compared in the simulations section. WDiSOMP has a slightly higher complexity than DiSOMP due to the employed voting mechanism. However, as demonstrated in the experiments, its performance is greatly improved, while there is no requirement for knowing beforehand the sparsity structure of the involved channels. As for the locally executed SOMP and the centralized SSOMP (namely, the one executed after all data are collected to a central point), the computational complexity is on the order of O(N r G t TL AoD ) and O(KN r G t TL AoD ), respectively. Again, as it will be demonstrated, their performance is worse. In the case of the centralized solution SSOMP, we assume that each device k transmits to the BS separately the real and complex part of the measurements inȲ k ∈ C T ×N r , assigning q bits per value, which corresponds to 2N r Tq bits from each device to BS. Then the BS, by employing SSOMP, utilizes the measurements from all devices to estimate the joint support set, whose indices are fed back to the devices delivering L AoD log 2 (G t ) bits. Hence, the total number of bits amounts to Moving on to the distributed algorithms, let us, first, assume a network topology in which each node k transmits to its L out k neighbors the L AoD indices of the candidate support set derived by applying SOMP locally. In (W)DiSOMP, only L AoD indices from {1, . . . , G t } (where G t denotes the number of atoms) are transmitted, and each index is supposed to be encoded with log 2 (G t ) bits, which entails L AoD log 2 (G t ) transmitted bits. Each device transmits in each iteration the full support set to its neighbors, and this process is repeated L AoD times until the full sparsity level is achieved. Given that the proposed approach is fully distributed, |L out k | = K − 1, the total transferred bits amount to From (22), it is shown that the transmission overhead of the distributed schemes is a quadratic function of the sparsity parameter L AoD , considering the rest parameters constant. In Table 2, we show the T c o in terms of the feedback quantization bits q, for N r = {1, 2, 4, 8}. It is observed that the T c o cost is significantly impacted by the increase in receiving antennas and feedback quantization bits. In Figure 4, we capture the impact of training symbols T in the T c o , T d o of SSOMP and WDiSOMP, respectively, assuming devices of single, two and four antennas. We have set the resolution of the AoDs dictionary (e.g., the number of atoms) equal to G t = 512( N t = 128). The communication overhead of the distributed solutions is independent of the receiving antennas and thus kept at a constant level. Also, it is observed that the distributed solutions prevail for N r > 2. These parameters only affect the complexity of the centralized solution, making the required communication cost prohibitive when the involved devices have many antenna elements.  Table 3, the centralized and distributed solutions are assessed in terms of the atoms G t for various q. These outcomes show a subtle increase in the overhead of SSOMP and a higher increase in the one of (W)DiSOMP. In Table 4, the overhead of the distributed solutions is evaluated in terms of the sparsity level, selecting proper L AoD compatible for mmWave systems [17], [19], [48] and setting fixed values at the rest parameters. The same table presents the performance of the centralized solution. Although the increase in the number of receiving antennas is accompanied by an essential increase in T c o , this isn't verified with the G t and L AoD that are involved in the second term of (21).

Moreover, in
From the above, we conclude that the distributed approaches, such as the one presented here, might be preferable in many cases encountered in practice (e.g., cases with low sparsity and/or non-single antenna users).

C. IMPACT OF SUPPORT SET ON RECONSTRUCTION QUALITY
The greedy approaches are based on the LS solution to estimate the complex values corresponding to the estimated VOLUME 10, 2022  support set. Their accuracy may deteriorate if part of the support set isn't correctly identified. To evaluate the proposed CS-based channel estimator and determine a lower error bound, we formulate the problem for the support set reconstruction assuming an oracle estimator, which considers perfect knowledge of the true AoDs.
First, we reform the measurements model assuming (3) to acquire the followinḡ withH k a = (A r H k a ) H ∈ C L AoD ×N r and a = X H A t ∈ C T ×L AoD the measurement matrix for the Least Squares (LS) problem. The problem in (23) is similar to the one in (11) if we keep the L AoD known atoms (or, else, AoDs) involved in the channel, D t is reduced to A t (response matrix structure). Each column of (23) is captured byȳ k,r for r = 1, 2, . . . , N r and is written asȳ withh k,r a =H k a (:, r) ∈ C L AoD ×1 ,n k,r =N k (:, r) ∈ C T ×1 . Given (24), the oracle LS approach estimates the L AoD entries ofh k,r a , requiring H a a be full rank, i.e., T ≥ L AoD and assuming that AoDs are exactly known, bȳ Then, the oracle estimate of H k , if in (10) we keep the L AoD rows ofH v k and D t is substituted by A t , is given by ].
An accurate recovery of the L AoD -row matrixH k a is equivalent to simultaneously recovering N r vectors {h k,r a } N r r=1 . Considering the oracle estimator, which estimates the channel gain under the assumption that A t is known, the MSE can be written as whereh k,r e =h k,r a −h k,o,r a . Then, keeping the equivalent form of channel estimation error in (27), the MSE can be written as The inequality holds due to the Rayleigh-Ritz ratio [49].
Combining (24) with (25) and assuming that E n k,rn From (27)- (30), we conclude that the MSE is lower-bounded as Assuming an orthonormal basis, and specifically the DFT matrix, to construct the AoDs dictionary A t in a , it holds that A H t A t = I L AoD and thus λ min A H t A t = 1, simplifying (31) to Denominator Tr H a a is equivalently written as a ). To attain the lower bound, H a a should be a diagonal matrix with identical diagonal elements. Previous condition holds if and only if the eigenvalues λ i 's are equal. In particular, if XX H = T I and exploiting A H t A t = I L AoD , Tr H a a = L AoD T and therefore The bound in (33) increases linearly with the sparsity level L AoD and the receive antennas N r , while it is inversely proportional to the number of pilots T . The oracle LS exploits the exact support-AoDs information; thus, under the previous conditions, its MSE will be the lower bound of WDiSOMP and every other estimator, which ignores the true AoDs and identifies them through a learning process under noisy conditions. Furthermore, when part of the estimated support set isn't correct, there will be a term in the error that captures the impact of the incorrect indices. This term is expected to become powerful in high SNR conditions adding a systematic deviation from the real support or channel and a respective floor. As part of erroneous indices is shrunk, the floor will gradually disappear, and the estimators' error will tend towards the lower bound. Such an approach could lead to designing more effective weighting methods than the ones elaborated here, suitable for multi-task joint sparsity to correct the imposed error (e.g., due to quantization). Although this study is out of scope here, it constitutes a challenging task which would deserve further investigation.

D. IMPACT OF DICTIONARY RESOLUTION
The aim of this subsection is to study the impact of the dictionary resolution, exploiting its mutual coherence, which will help us determine an upper bound for the error of the quantized channel matrix H in (6). The rationale behind the estimation of the upper-bound is to quantify it based on the quality of D t and , which depends on the number of grid points G t and is measured by the mutual coherence. Assuming an overcomplete dictionary D t to minimize the quantization error, the symbols matrix should be properly designed to ensure the successful retrieval of the sparse channel matrixĤ v k by the SOMP (locally). For this purpose, we exploit (10) to write the error, in an equivalent form, as follows whereH v k = h v k,1 |h v k,2 | . . . |h v k,N r . An upper bound for the quantized channel estimation error is derived, assuming that the supports ofĥ v k,r andh v k,r are similar. Hence, the channel estimation error in (34), based on the Cauchy Schwarz inequality, satisfies First, we will focus to find an upper bound for the term with exponential elements as in (4), is equal to Based on (36), Exploiting the properties of the mutual coherence, µ G S k = µ G S k , thus the off-diagonal entries ofG S k are upper-bounded by µ (D t ). From this observation, all the off-diagonal entries ofG S k can be replaced by the µ (D t ), constitutingG S k . Then, the Gershgorin circle theorem is utilized to find an upper-bound of the λ max I +G S k . In particular, it holds that λ max (G S k ) ≤ λ max (I +G S k ) ≤ 1 + (L AoD − 1)µ (D t ). From this inequality, (35) is rewritten as whereμ = 1 + (L AoD − 1)µ (D t ). Inμ, the term (L AoD − 1)µ (D t ) denotes the imposed overhead in the sparse channel estimation error due to the discretization with resolution G t higher than the transmitter antennas N t . VOLUME 10, 2022 It depends on the mutual coherence of the dictionary D t , which, in turn, depends on the selected resolution G t . When G t = N t , µ (D t ) = 0, andμ = 1. However, for G t N t , µ (D t ) attains values close to 1, makingμ and thus the upper bound of the channel error in (38) on the order of L AoD . In (38), ĥv k,r −h v k,r 2 2 represents the sparse channel vector estimation error of the SMV problem in (12). The proposed algorithm (WDiSOMP) aims to estimate the MIMO channel by exploiting the commonalities in the support sets of the sparse channel matrices of nearby devices. All devices execute SOMP locally (using the measurements collected from all antenna elements), exchange the estimated support sets with their neighbors' and apply the weighted voting mechanism to estimate their support set. To ensure support recovery, the amplitudes of all nonzero elements must be large enough to overcome the noise. Hence, we define h v k,r,min = min m∈S k h v k,r (m) to derive a condition that the (S)OMP correctly identifies the support S k . Suppose that the (where constant β > 0) occurs simultaneously for all antenna elements and thus h v k,r, holds. An upper-bound of this estimation error is derived by considering the Lemma 3 for OMP performance guarantees in [51]. When the support S k is successfully recovered, our estimator operates as the oracle from which the corresponding complex gains are estimated as h v,o k,r S k = † S kȳ k,r and its error is written as where the last inequality is derived by using the definition of B r and (1−(L AoD −1)µ( )) 2 by applying the Gershgorin circle theorem to λ min H S k S k . Combining (38) with (39), the upper bound is as The upper bound of the squared error in (40) shows that the sparse channel recovery performance is affected by the mutual coherence of the measurement matrix , for which µ ( ) < 1/(2L AoD − 1) should hold in order to ensure reliable estimation of the support set in noisy conditions. The upper limit relates to the fact that some of the L AoD involved AoDs in the channel matrix may occupy off-diagonal entries of the Gram matrixG, with magnitudes, at most, equal to µ (D t ). Given an overcomplete dictionary, the training symbols matrix should be properly designed to reduce the magnitudes of these off-diagonal entries. Table 5 lists the right term in (40) (upper bound) of the instantaneous channel error in terms of the mutual coherence µ( ) assuming an overcomplete DFT-based dictionary and β = 0.5, N t = 128, L AoD = 6. Assuming that the training pilots' design achieves µ( ) = 0.05, the obtained upper bound is higher in low SNR regimes and decreases linearly as the noise variance reduces (high SNR). Also, the upper bound increased by 48%, when the G t = 256 with µ (D t ) = 0.637 increased to G t = 512 with µ (D t ) = 0.9. This bound is maintained at low levels as low as possible, and close to zero is the µ( ). To attain µ( ) values close to zero, X should be appropriately constructed, which is out of scope in this work. In our study, the symbols are designed as random Gaussian variables, thus ensuring a ''well-behaved'' measurement matrix and, hence, correct channel reconstruction with high probability.

VII. SIMULATIONS AND DISCUSSION
In this section, the elaborated algorithm is evaluated for the problem of channel estimation in a ULA mmWave massive MIMO system with fully-digital transmitters and receivers by employing SOMP locally.
To be more specific, the proposed distributed scheme is evaluated under two different voting mechanisms in a multitasking scenario. Moreover, its behavior will be compared to the individual SOMP (which is executed locally by each device), DiSOMP (which engages a simple majority voting mechanism) and the Structured SOMP (SSOMP), which is executed in a centralized manner at the BS. The performance measure, namely, Normalized Mean Squared Error (NMSE), is defined as  (17), (19)).
In the following, the NMSE behavior of these methods will be assessed i) in terms of the number of training symbols T = [12,40] for a specific Signal-to-Noise-Ratio (SNR) value, i.e., SNR = 20dB, and ii) for a fixed T = 30 in terms of SNR in the range [0, 30].
We assume a network of K = 10 devices and a BS with N r = 2 and N t = 128 antenna elements, respectively. Following the relevant literature, e.g. [20], [40], a DFT-based overcomplete, predefined dictionary D t is assumed (due to the similarity of the array response vectors with its columns) with resolution G t = 512 atoms. Note that the AoDs and AoAs were selected from [0, π], and the AoAs were randomly assigned. We assume that there are L g = 2 global interest AoDs that contribute to all devices' channels and two different groups of L c = 4 commonly shared AoDs (or channel paths) that contribute to devices' channels 1-5 and 7-10, respectively. Also, 1 out of 10 devices, and specifically device 6, partially shares its L c = 4 AoDs with different subgroups. Table 6 presents the AoDs values that were considered to create the devices' channels and the corresponding grouping. The elements of the training matrix X ∼ CN (0, 1 T ) are i.i.d. random variables [20]. Each device has no prior knowledge of the sparsity order apart from the total sparsity level (L AoD = 6), which is similar to all devices. Figure 5 depicts the average NMSE over all involved devices. It is noticed that WDiSOMP1, which assumes voting on support set level, approaches the behavior of local SOMP. However, WDiSOMP2 achieves superior performance than SOMP, and particularly DiSOMP, due to the employment of the voting method on the index level. For the latter, the floor in the performance is attained for low values of T and remains unchanged even when measurements T are high enough. This behavior is justified by the unknown sparsity structure, which WiDSOMP 1, 2 can learn via the suggested voting mechanisms. Also, WDiSOMP, in either case, attains the performance of SSOMP (the centralized solution), which, nonetheless, needs to know the number of global L g and common L c indices in the sparsity pattern division, as in [20]. Figure 6 demonstrates the channel estimation performance at the two subgroups and device 6, focusing on WDiS-OMP 1, 2. It is reminded that these subgroups comprise devices 1-5 and 7-10, respectively, whose support set is not partially shared and device 6, which shares 4 out of 6 indices in its support set with the devices from different subgroups. It is observed that the behavior of the algorithm follows, as expected, the one observed in Figure 5, namely, follows the average NMSE. The performance of WDiSOMP2 in devices 1-5 enhanced slightly, while device 6, which captures the challenging multi-tasking condition, has considerably benefited from the new weighting mechanism that operates at the index level. The same holds for devices 7-10. Now, we will focus on the second experiment of this study. In Figure 7, we show the average performance of all methods in terms of SNR for T = 30. Then, in Figure 8, the performance of WDiSOMP1 and WDiSOMP2 for the devices 1-5, 7-10 and 6 is captured. Both in this case, it is verified that the behavior of the proposed algorithm, under the per-index weighting method before voting, is superior to exploiting the support set as a global entity (i.e., not per index).
In Figure 9, we isolate the average behavior of WDiSOMP and local SOMP in terms of the noise level for T = {30, 60}. The theoretical lower error bound, presented in Section VI, is also plotted to validate the performance of the proposed scheme for the mmWave channel estimation. More specifically, the lower bound was derived based on (31). For T = 30, WDiSOMP1 has a similar performance to SOMP with a performance gap by WDiSOMP2 for high SNR. When the available measurements per device were doubled (T = 60), the two versions of WDiSOMP presented comparable behavior and, in high SNR, were better than SOMP (with a smaller TABLE 6. Multitask device grouping according to AoDs sparsity patterns for L g = 2, L c = 4.  gap in their performances than previously). Also, in high SNR, the floor in the performance of WDiSOMP relates to the reconstruction error imposed by the mistaken support indices that the devices estimate (to find then the corresponding AoDs) compared to the oracle LS estimator, which assumes prior knowledge of them to estimate the channel. This experiment indicated that, in medium-to-high SNR, WDiSOMP benefited from the per index weighting method when the number of measurements was not enough. In low SNR, it was considerably reduced the distance between WDiSOMP1, WDiSOMP2 and the lower bound. However, they didn't approach the lower bound since the matrix X isn't properly designed, making the eigenvalues λ i s of H a a different, even if it still holds that Tr H a a = L AoD i=1 λ i ( H a a ) = L AoD . As mentioned in Section VI.C, the eigenvalues equality is a necessary condition to approach the lower bound, which in our case isn't satisfied.
From the above simulations, it is noted that WDiSOMP, with the use of either weighting method, can efficiently exploit, under different conditions, the inherent common sparsity structures and its performance is not impacted when various related device groupings coexist.

VIII. CONCLUSION
In conclusion, the channel estimation problem is addressed by a distributed approach that exploits the joint sparsity among proximal devices' sparse channel matrices to estimate the desired parameters collaboratively. The downlink multi-device MIMO channel estimation is solved via WDiSOMP, a fully distributed method that exploits a generalized model for the sparsity support of the involved channel matrices. The enhanced performance of WDiSOMP has been evaluated under the per index weighted voting mechanism when the structure of the sparsity support is unknown and non-uniform under a multi-tasking framework. Also, the transmission overhead of the proposed algorithm was investigated and compared to the corresponding cost of the centralized solution. The outcomes revealed the efficacy of the method under specific configurations of the involved parameters.
A promising future direction is to examine and broaden the proposed method in hybrid architectures [16], [41] since, in this work, fully-digital transmitters and receivers are considered. Furthermore, another goal is to enhance the performance of the proposed algorithm by focusing on the proper training matrix design. This analysis will help to attain the performance bound of the oracle LS estimator and ensure reliable channel recovery under the overcomplete dictionary (as already mentioned in the performance analysis). Another extension of the study could focus on alternative weighting methods in the voting process (quantifying the part of the error related to the deviation of the MSE from the lower bound) to optimize the mmWave channel estimation performance (under such a complicated joint sparsity framework). Finally, a well-known challenging direction is to reformulate the suggested fully distributed cooperative approach under a CS-based federated learning framework in order to benefit from the advantages and capabilities (such as privacy) that this technique provides, exploiting the recent research effort in this field.