Machine Learning-Based Self-Interference Cancellation for Full-Duplex Radio: Approaches, Open Challenges, and Future Research Directions

In contrast to the long-held belief that wireless systems can only operate in half-duplex mode, full-duplex (FD) systems can transmit and receive concurrently over the same frequency band, theoretically doubling the spectral efficiency. Despite their significant potential, FD systems suffer from inherent self-interference (SI) caused by the coupling of the transmit signal into the transceiver's own FD receive chain. Self-interference cancellation (SIC) techniques are the key enablers of FD operation, and they can be implemented in the propagation, analog, and/or digital domains. In particular, digital-domain cancellation is typically performed using model-driven approaches, which have proven insufficient to cope with the growing complexity of forthcoming communication systems. Recently, data-driven machine learning (ML) approaches have been introduced for digital SIC to overcome the complexity hurdles of traditional methods. This article reviews and summarizes the recent advances in applying ML to SIC in FD systems. Further, it analyzes the performance of various ML approaches using different performance metrics, such as the achieved SIC, training overhead, memory storage, and computational complexity. Finally, this article discusses the challenges of applying ML-based techniques to SIC, highlights their potential solutions, and provides a guide for future research directions.

In the past few decades, researchers have devoted considerable attention to canceling the SI in IBFD systems. Generally, SIC can be performed in the analog and/or digital domains. Analog-domain cancellation can be performed passively at the radio frequency (RF), i.e., propagation, level using antenna isolation [21], beamforming [28], polarized antennas [44], circulators [45], and/or hybrid junction networks [46]. Alternatively, analog-domain cancellation can be carried out actively by generating a pre-processed copy of the SI signal, which is used to cancel the original SI signal at the Rx chain. Analog-domain cancellation is often unable to suppress the SI signal down to the Rx noise floor. As a consequence, further attention has been directed to canceling the SI at baseband using digital-domain cancellation [47], [48], [49], [50], [51], [52], [53], [54], [55], [56]. At low or moderate transmit power levels, digital-domain cancellation is typically performed using linear cancelers, which reconstruct an estimated copy of the SI signal based on techniques such as least-squares (LS) channel estimation [47], [49], [53]. However, at high transmit power levels, linear cancellation alone is insufficient to suppress the SI down to the Rx noise floor because of the strongly non-linear behavior of the FD transceiver's components, such as the power and low-noise amplifiers (PA and LNA) [47], [49], [52]. Thus, non-linear digital cancellation is applied alongside linear cancellation to bring the SI down to the Rx noise floor. Non-linear SIC is conventionally performed using model-driven approaches, e.g., polynomial models, which have been shown to fit well in practice; however, they require many trainable parameters, which in turn translate into higher computational requirements [57].
The major contributions of this work are as follows:
• We have highlighted the main challenges and potential research directions for the successful adoption of ML approaches for canceling the SI in FD transceivers.

The rest of this article is organized as follows. Section II introduces the ML-based FD system model. Section III summarizes the traditional approaches for SIC in FD transceivers. Section IV reviews the up-to-date contributions that apply ML approaches to SIC. Simulation results are presented in Section V, challenges and future directions are summarized in Section VI, and, finally, concluding remarks are drawn in Section VII. The detailed organization of this article is depicted in Fig. 1.

II. ML-BASED FD SYSTEM MODEL
The system model, consisting of an FD transceiver with a single transmit antenna and a single receive antenna, RF cancellation, and digital cancellation stages, is illustrated in Fig. 2. At the Tx chain, the digital baseband samples, denoted by $x(n)$ with $n$ as the sample index, are first distorted by the in-phase and quadrature-phase (IQ) imbalance of the mixer and then by the non-linearities of the PA. The digital equivalent of the baseband transmitted signal at the output of the Tx chain can be expressed as [99], [100], [101]

$$x_t(n) = \sum_{\substack{p=1 \\ p\ \text{odd}}}^{P} \sum_{m=0}^{M_{PA}-1} h_{m,p}\, x_{IQ}(n-m)^{\frac{p+1}{2}} \left(x_{IQ}^*(n-m)\right)^{\frac{p-1}{2}}, \quad (1)$$

with $x_{IQ}(n)$ as the IQ mixer's output signal and $(\cdot)^*$ as the complex conjugate operator, whereas $M_{PA}$, $h_{m,p}$, and $P$ are the memory depth, impulse response, and non-linearity order of the PA, respectively. In (1), $p$ is an odd number, i.e., only the odd-order non-linearities are taken into account, e.g., $p \in \{3, 5, \ldots, 9\}$, since the even-order non-linearities fall out of band and are filtered out by the Rx's analog and digital filters [100]. The transmitted signal $x_t$ propagates through an SI channel, inevitably producing SI at the Rx chain. Consequently, the received signal at the output of the Rx chain, i.e., at the output of the analog-to-digital converter (ADC), can be written as [127]

$$y(n) = y_{SoI}(n) + y_{SI}(n) + w(n), \quad (2)$$

where $w(n) \sim \mathcal{CN}(0, \sigma^2)$ denotes the thermal noise, which is complex Gaussian distributed with zero mean and variance $\sigma^2$, $y_{SoI}(n)$ is the received signal of interest (SoI), and $y_{SI}(n)$ is the SI signal, which can be expressed as [99], [100], [101]

$$y_{SI}(n) = \sum_{\substack{p=1 \\ p\ \text{odd}}}^{P} \sum_{q=0}^{p} \sum_{m=0}^{M_i-1} h_{m,q,p}\, x(n-m)^{q} \left(x^*(n-m)\right)^{p-q}, \quad (3)$$

with $h_{m,q,p}$ as the impulse response of an overall channel that captures the total effect of all transceiver impairments, e.g., PA non-linearities, IQ imbalance, and the SI channel, and $M_i$ as the memory introduced by the PA, the SI channel delay spread at the Rx, etc.
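The Tx-side model in (1) and the SI-plus-noise received signal in (2) can be simulated in a few lines of NumPy. The sketch below is illustrative only. The IQ-imbalance form $x_{IQ}(n) = K_1 x(n) + K_2 x^*(n)$, the gains `k1`/`k2`, the tap values, and the purely linear FIR SI channel `h_si` are our assumptions and are not parameters taken from [99], [100], [101]. The sketch relies on the identity $x^{(p+1)/2}(x^*)^{(p-1)/2} = x|x|^{p-1}$, which holds for odd $p$.

```python
import numpy as np

def delayed(x, m):
    """Return the sequence x(n - m), zero-padded for n < m."""
    out = np.zeros_like(x)
    out[m:] = x[: len(x) - m]
    return out

def tx_chain(x, k1, k2, h_pa):
    """IQ imbalance followed by an odd-order memory-polynomial PA, as in (1).

    h_pa maps each odd order p to its memory taps [h_{0,p}, h_{1,p}, ...].
    """
    x_iq = k1 * x + k2 * np.conj(x)            # assumed IQ-imbalance model
    x_t = np.zeros_like(x_iq)
    for p, taps in h_pa.items():
        for m, h in enumerate(taps):
            xm = delayed(x_iq, m)
            # x^((p+1)/2) * conj(x)^((p-1)/2) == x * |x|^(p-1) for odd p
            x_t += h * xm * np.abs(xm) ** (p - 1)
    return x_t

def received_si(x, k1, k2, h_pa, h_si, noise_var, rng):
    """y(n) = y_SI(n) + w(n), i.e., (2) with no SoI present."""
    x_t = tx_chain(x, k1, k2, h_pa)
    y_si = np.convolve(x_t, h_si)[: len(x)]   # assumed linear FIR SI channel
    w = np.sqrt(noise_var / 2) * (
        rng.standard_normal(len(x)) + 1j * rng.standard_normal(len(x))
    )
    return y_si + w

# Hypothetical example parameters, chosen only for illustration.
rng = np.random.default_rng(0)
x = (rng.standard_normal(1000) + 1j * rng.standard_normal(1000)) / np.sqrt(2)
h_pa = {1: [1.0, 0.1], 3: [0.05, 0.01], 5: [0.005]}
y = received_si(x, 1.0, 0.05, h_pa, np.array([0.8, 0.3j, 0.1]), 1e-4, rng)
```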
To better evaluate how well the SI cancelers suppress the SI signal, we assume, for simplicity, that there is no SoI from any other FD transmit receive points (TRPs) and no mutual interference from any base station transmitting at the same frequency [90], [96], [97], [99], [100], [101]. Hence, the received signal at the Rx chain's output consists only of the SI signal plus noise. The objective of the digital SI canceler is thus to suppress the SI down to the Rx noise floor. To that end, we first estimate the linear SI channel (i.e., the channel that produces the linear SI component) using traditional LS channel estimation, which, for a single transmit and a single receive antenna, is performed as [99], [100], [101]

$$\hat{\mathbf{h}} = \left(\mathbf{X}^{tr\,H} \mathbf{X}^{tr}\right)^{-1} \mathbf{X}^{tr\,H} \mathbf{y}^{tr}, \quad (4)$$

with $(\cdot)^{-1}$ and $(\cdot)^H$ as the inverse and conjugate transpose operators, respectively. The channel estimate is $\hat{\mathbf{h}} \in \mathbb{C}^{M_i \times 1}$. The matrix $\mathbf{X}^{tr} \in \mathbb{C}^{(N_{tr}-M_i)\times M_i}$ is the convolution (Toeplitz) matrix whose rows collect the current and $M_i - 1$ past training input samples. The vector $\mathbf{y}^{tr} \in \mathbb{C}^{(N_{tr}-M_i)\times 1}$ stacks the corresponding received training samples, and $N_{tr}$ is the number of training samples. Upon estimating the SI channel $\hat{\mathbf{h}}$, the linear SI component can be reconstructed in the training and testing phases, respectively, as

$$\mathbf{y}^{tr}_{SI,lin} = \hat{\mathbf{h}} \otimes \mathbf{x}^{tr}, \qquad \mathbf{y}^{ts}_{SI,lin} = \hat{\mathbf{h}} \otimes \mathbf{x}^{ts},$$

where $\otimes$ denotes the convolution operator. The vector $\mathbf{x}^{tr} = [x(1), x(2), \ldots, x(N_{tr})]^T$ is formed from the training samples, with $(\cdot)^T$ as the transpose operator. The vector $\mathbf{x}^{ts}$ is constructed in the same way from the testing samples (not from the training samples), with $N_{tr}$ replaced by $N_{ts}$, the number of testing samples. Note that, after performing the convolution, the sequences $\mathbf{y}^{tr}_{SI,lin}$ and $\mathbf{y}^{ts}_{SI,lin}$ are resized to align with the dimensions of $\mathbf{y}^{tr}$ and $\mathbf{y}^{ts}$, respectively.
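The LS estimate in (4) and the subsequent linear reconstruction can be sketched with NumPy as follows. This is a minimal sketch under the row layout assumed above, where each row is $[x(n), x(n-1), \ldots, x(n-M_i+1)]$. The helper names are hypothetical. Solving the normal equations mirrors (4) directly. In practice, `np.linalg.lstsq` is numerically more robust when $\mathbf{X}^{tr}$ is ill-conditioned.

```python
import numpy as np

def conv_matrix(x, m_i):
    """Rows [x(n), x(n-1), ..., x(n-M_i+1)] for n = M_i, ..., N-1.

    This yields N - M_i rows, matching the dimension of X^tr in the text.
    """
    return np.array([x[n - m_i + 1 : n + 1][::-1] for n in range(m_i, len(x))])

def ls_channel_estimate(x_tr, y_tr, m_i):
    """LS estimate h_hat = (X^H X)^{-1} X^H y of the linear SI channel."""
    X = conv_matrix(x_tr, m_i)
    y = y_tr[m_i:]                      # received samples aligned with the rows of X
    XH = X.conj().T
    return np.linalg.solve(XH @ X, XH @ y)

def reconstruct_linear_si(h_hat, x):
    """Convolve, then truncate so the result aligns with the received samples."""
    return np.convolve(x, h_hat)[: len(x)]
```

The truncation in `reconstruct_linear_si` plays the role of the resizing step mentioned above. It drops the convolution tail so that sample $n$ of the output lines up with sample $n$ of the received signal.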
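The non-linear SI component, which is used to train the ML approaches, e.g., NNs and SVRs, is obtained by subtracting the reconstructed linear component from the original SI signal:

$$\tilde{\mathbf{y}}^{tr}_{SI,nl} = \mathbf{y}^{tr} - \mathbf{y}^{tr}_{SI,lin}.$$

Since ML approaches are typically trained using real-valued inputs, we separate the real and imaginary parts of $\mathbf{X}^{tr}$ to construct the input feature map $\mathbf{Z}^{tr}_{nl}$. For ML algorithms trained using the input samples only, the $n$th row $\mathbf{z}(n)$ of this map stacks the real and imaginary parts of the corresponding row of $\mathbf{X}^{tr}$. For algorithms trained with both input and output samples, $\mathbf{z}(n)$ also includes part of the output samples, as will be discussed in Section IV. After constructing $\mathbf{Z}^{tr}_{nl}$, we separate the real and imaginary parts of $\tilde{\mathbf{y}}^{tr}_{SI,nl}$ to serve as training labels. Thus, during the training phase of the non-linear canceler, $\mathbf{Z}^{tr}_{nl}$ is used with $\Re\{\tilde{\mathbf{y}}^{tr}_{SI,nl}\}$ and $\Im\{\tilde{\mathbf{y}}^{tr}_{SI,nl}\}$ to learn the modeling functions $f_{\Re}(\cdot)$ and $f_{\Im}(\cdot)$, which approximate the real and imaginary parts of the non-linear SI signal, respectively. The real and imaginary parts are then predicted in the testing phase as

$$\Re\{\hat{\mathbf{y}}^{ts}_{SI,nl}\} = f_{\Re}\left(\mathbf{Z}^{ts}_{nl}\right), \qquad \Im\{\hat{\mathbf{y}}^{ts}_{SI,nl}\} = f_{\Im}\left(\mathbf{Z}^{ts}_{nl}\right),$$

where $\mathbf{Z}^{ts}_{nl}$ is the non-linear canceler's testing matrix, which is formed in the same way as $\mathbf{Z}^{tr}_{nl}$ but with $N_{tr}$ replaced by $N_{ts}$. The non-linear SI signal is then obtained by combining the real and imaginary parts as

$$\hat{\mathbf{y}}^{ts}_{SI,nl} = \Re\{\hat{\mathbf{y}}^{ts}_{SI,nl}\} + j\,\Im\{\hat{\mathbf{y}}^{ts}_{SI,nl}\}.$$

The estimated SI signal, i.e., after applying the linear and non-linear cancellations, can be expressed as

$$\hat{\mathbf{y}}^{ts}_{SI} = \mathbf{y}^{ts}_{SI,lin} + \hat{\mathbf{y}}^{ts}_{SI,nl},$$

and the residual SI signal can be written as

$$\mathbf{y}^{ts}_{res} = \mathbf{y}^{ts} - \hat{\mathbf{y}}^{ts}_{SI}.$$

The total SIC achieved by applying the linear and non-linear cancellations can be quantified in dB as

$$\text{SIC} = 10 \log_{10}\left( \frac{\sum_{n} |y(n)|^2}{\sum_{n} |y_{res}(n)|^2} \right),$$

with $y(n)$ and $y_{res}(n)$ as the $n$th samples of $\mathbf{y}^{ts}$ and $\mathbf{y}^{ts}_{res}$, respectively.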
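The bookkeeping around the non-linear canceler can be sketched as follows. This covers the real-valued feature map, the two real-valued label columns, recombination of the real and imaginary predictions, and the SIC metric in dB. The sketch deliberately leaves $f_{\Re}$ and $f_{\Im}$ abstract, so any regressor (NN, SVR, etc.) can supply `pred_re` and `pred_im`. All function names are hypothetical.

```python
import numpy as np

def real_feature_map(X):
    """Z_nl: real parts of each complex regressor row followed by its imaginary parts."""
    return np.hstack([X.real, X.imag])

def real_labels(y_nl):
    """Two real-valued label columns: Re{y_nl} and Im{y_nl}."""
    return np.column_stack([y_nl.real, y_nl.imag])

def estimated_si(y_lin, pred_re, pred_im):
    """y_hat_SI = y_SI,lin + (f_Re(Z) + j * f_Im(Z))."""
    return y_lin + pred_re + 1j * pred_im

def sic_db(y, y_res):
    """Total SIC in dB: received power divided by residual power."""
    return 10 * np.log10(np.sum(np.abs(y) ** 2) / np.sum(np.abs(y_res) ** 2))
```

For example, a residual whose amplitude is 0.1 times that of the received signal corresponds to 20 dB of SIC.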

III. TRADITIONAL APPROACHES FOR SIC IN FD TRANSCEIVERS
Canceling the SI in FD transceivers can be performed using various techniques that span the propagation, analog, and/or digital domains [28], [43], as summarized in Fig. 3. The following subsections briefly review these SIC approaches, discussing their advantages, disadvantages, and/or challenges.

A. PROPAGATION DOMAIN SELF-INTERFERENCE CANCELLATION
Canceling the SI within the propagation domain is typically performed at the earliest stage of the FD transceiver, i.e., around the Tx and Rx antennas. Propagation-domain cancellation can be accomplished passively using techniques such as antenna separation, coupling networks, phase control, cross-polarization, and/or surface treatments [28], [43], as shown in Fig. 3. Alternatively, it can be done actively using techniques such as active coupling networks, active cross-polarization, and/or Tx beamforming [43]. Antenna interfaces, such as balanced duplexers and circulators, can also be employed, as shown in Fig. 3. Applying SIC within the propagation domain has the advantage of preventing the SI signal from saturating the front end of the FD Rx. However, in some cases it may also suppress the desired signal (i.e., the SoI) [28], and it can require additional hardware circuitry in the FD transceiver. Hence, attention is also directed to canceling the SI in other signal domains, e.g., the analog and digital domains.

B. ANALOG DOMAIN SELF-INTERFERENCE CANCELLATION
Canceling the SI within the analog domain is performed in the analog circuits between the antennas and the digital conversion stages [28], [43]. Analog-domain cancellation approaches have been classified based on their architecture, location, and tunability, as depicted in Fig. 3 [43]. One common architecture for analog-domain cancellation uses digitally assisted techniques based on auxiliary transmit chains [43]. Digitally assisted analog-domain cancellation has the advantage of preventing the SI signal from saturating the ADC, especially in mobile channel environments. However, the noise floor of the auxiliary transmit chain can desensitize the Rx. In addition to Rx desensitization, processing in the analog domain can be very costly and difficult to scale up to a larger number of antennas (i.e., the multiple-input multiple-output (MIMO) scenario) [28]. Attention is thus also directed to canceling the SI in the digital domain. This assumes that propagation- and analog-domain SIC provide enough suppression to keep the SI within the dynamic range of the Rx's ADC.

C. DIGITAL DOMAIN SELF-INTERFERENCE CANCELLATION
Canceling the SI in the digital domain is performed after the ADC using techniques such as channel modeling and/or Rx beamforming, as shown in Fig. 3. Digital-domain approaches based on channel modeling exploit the fact that the Rx of an IBFD TRP knows its own transmitted signal, which can be used to model the transceiver's impairments. Specifically, in channel modeling-based SIC, linear, widely linear, and reference-based models are applied to approximate the SI channel effects. Non-linear models, such as the Wiener, Hammerstein, Wiener-Hammerstein, and parallel Hammerstein models, are also employed to model the transceiver's non-linearities, as shown in Fig. 3. Digital-domain cancellation has the advantage that the processing is relatively easy to perform and less hardware-expensive than analog-domain cancellation [28]. However, it can increase the computational complexity of the FD transceiver [57].

From the previous discussion, applying the traditional approaches for SIC in FD transceivers can come with challenges, such as extra hardware, higher cost, and/or additional computational complexity. In contrast, applying ML approaches to SIC in FD communications can relax such requirements, as reported in [90], [95], [96], [97], [99], [100], [101]. Given this potential, more research effort has recently been devoted to canceling the SI in FD transceivers using ML approaches. This article provides an in-depth survey of digital-domain SIC based on ML non-linearity modeling techniques for tackling the SIC problem in FD transceivers.

IV. ML-BASED APPROACHES FOR SIC IN FD TRANSCEIVERS
Fig. 4 summarizes the up-to-date contributions that apply ML-based approaches to SIC in FD transceivers. As can be seen from the figure, SIC in FD systems can be performed using traditional ML approaches, such as NNs and SVRs. Advanced ML techniques, such as TC, TensorFlow graphs, and random Fourier features (RFFs) integrated with online learning, have also been investigated for modeling the SI in FD transceivers. Other ML approaches, such as dynamic regression (DR), GMMs, DU, LL, and the adaptive projected subgradient method (APSM), have also been studied for SIC. Among the different ML approaches applied to SIC, NNs are the most popular because of their proven capability to model non-linearities with lower complexity than other ML techniques. In this section, we review and summarize the up-to-date research progress in applying ML-based approaches to SIC in FD transceivers.

A. NEURAL NETWORK (NN)-BASED SIC APPROACHES
Broadly speaking, ML-based SIC in FD transceivers relies mostly on NNs, owing to their advantages over other ML approaches. As can be seen from the right-hand side of Fig. 4, a broad range of NN architectures has been introduced for SIC in FD transceivers. These range from typical NNs to customized architectures such as grid-based NNs, hybrid-layers NNs, and adaptive NNs. The following subsections review and summarize the recent advances in applying NNs to SIC in FD transceivers.

1) TYPICAL STRUCTURES
The first attempt to apply NNs to SIC in FD transceivers was made in [90], where a shallow feed-forward NN (FFNN) is used to approximate the non-linear SI signal. Like the real-valued time delay NN (RV-TDNN) in [128], the FFNN in [90] has an input layer fed by real-valued inputs consisting of the current and past samples of the input signal, which accounts for the FD system's memory effect. These samples are passed to a hidden layer that captures the FD system's non-linearities and, finally, to an output layer that estimates the target non-linear SI signal, as can be observed from Fig. 5(a-i). Simulation results show that, at a similar SIC performance, the RV-TDNN can reduce memory storage and computational complexity compared to the polynomial model [90], which is a general form of the widely used parallel Hammerstein model [122]. The hardware implementation of NN-based cancelers is investigated in [95], [96]. There, the RV-TDNN-based canceler is shown to be more area- and energy-efficient than the polynomial-based canceler at a similar performance.
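The RV-TDNN forward pass described above can be sketched in NumPy. The input is the real and imaginary parts of $x(n), \ldots, x(n-M+1)$. One hidden layer is followed by two linear output neurons that give the real and imaginary parts of the estimate. This is a minimal sketch, not the implementation of [90]: the ReLU activation, the weight initialization, and the helper names are assumptions, and training (e.g., by backpropagation on the residual labels) is omitted.

```python
import numpy as np

def tdnn_features(x, memory):
    """Re/Im of x(n), ..., x(n-memory+1) for n = memory-1, ..., N-1."""
    X = np.array([x[n - memory + 1 : n + 1][::-1] for n in range(memory - 1, len(x))])
    return np.hstack([X.real, X.imag])

def init_rv_tdnn(memory, hidden, rng):
    """One hidden layer and two linear output neurons (Re and Im)."""
    return {
        "W1": 0.1 * rng.standard_normal((2 * memory, hidden)),
        "b1": np.zeros(hidden),
        "W2": 0.1 * rng.standard_normal((hidden, 2)),
        "b2": np.zeros(2),
    }

def rv_tdnn_forward(params, Z):
    h = np.maximum(0.0, Z @ params["W1"] + params["b1"])  # assumed ReLU activation
    out = h @ params["W2"] + params["b2"]
    return out[:, 0] + 1j * out[:, 1]                     # recombine Re and Im

def num_params(params):
    """Total number of trainable parameters (weights and biases)."""
    return sum(p.size for p in params.values())
```

With a memory of 4 and 8 hidden neurons, the network has $8 \cdot 8 + 8 + 8 \cdot 2 + 2 = 90$ parameters. Counts of this kind underlie the memory comparisons against polynomial cancelers.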
In [97], a typical recurrent NN (RNN) is introduced for canceling the interference in FD transceivers. The RNN [97] is trained in the same way as the RV-TDNN, using real-valued inputs consisting of the current and past samples. Unlike the RV-TDNN [90], the RNN employs both forward and recurrent connections to enhance its learning capabilities [97], as can be seen from Fig. 5(a-ii). At a similar cancellation performance, a shallow RNN (i.e., one with a single hidden layer) can reduce memory and computational complexity compared to the typical RV-TDNN [97].
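To show how recurrent connections differ from the purely forward connections of the RV-TDNN, the sketch below implements a generic Elman-style recurrent layer in NumPy. It processes the real-valued feature rows in time order, and a hidden state carries context from previous samples. The tanh activation, the initialization, and the exact connectivity are assumptions and may differ from the RNN in [97]. Training is omitted.

```python
import numpy as np

def init_rnn(num_features, hidden, rng):
    return {
        "Wx": 0.1 * rng.standard_normal((num_features, hidden)),  # forward weights
        "Wh": 0.1 * rng.standard_normal((hidden, hidden)),        # recurrent weights
        "b": np.zeros(hidden),
        "Wo": 0.1 * rng.standard_normal((hidden, 2)),             # Re/Im outputs
        "bo": np.zeros(2),
    }

def rnn_forward(params, Z):
    """Feed the feature rows in time order; h carries information from past rows."""
    h = np.zeros(params["Wh"].shape[0])
    outs = []
    for z in Z:
        h = np.tanh(z @ params["Wx"] + h @ params["Wh"] + params["b"])
        outs.append(h @ params["Wo"] + params["bo"])
    out = np.array(outs)
    return out[:, 0] + 1j * out[:, 1]
```

The recurrent matrix `Wh` adds $H^2$ parameters per layer for $H$ hidden neurons. This term explains why stacking recurrent layers can quickly increase memory.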
In [97], [98], a complex-valued time delay NN (CV-TDNN) is investigated for canceling the FD system's SI.As can be observed from Fig. 5(a-iii), the CV-TDNN has a similar network architecture to that of RV-TDNN [90], while employing only one neuron instead of two neurons at the output layer.As its name implies, the CV-TDNN is trained using CV inputs and labels instead of the real-valued ones utilized in the case of RV-TDNN and RNN.Simulation results show that a shallow CV-TDNN-based canceler could be beneficial in terms of computational complexity when compared to its RV-TDNN and RNN counterparts at a similar SIC performance [90], [97].

2) GRID-BASED STRUCTURES
In [99], two grid-based NN structures, termed the ladder-wise grid structure (LWGS) and the moving-window grid structure (MWGS), are introduced for modeling the interference in FD transceivers. The LWGS and MWGS are trained using CV data. They are built from a grid of connections (analogous to the nodes of fully-connected NNs) between the neurons of the input, hidden, and output layers, as shown in Fig. 5(b). As their names imply, the LWGS arranges the connections between the layers' neurons in a ladder-wise topology, while the MWGS uses a moving-window technique to arrange them, as can be seen from Fig. 5(b-i) and (b-ii), respectively. With such a grid topology, the LWGS and MWGS use fewer connections between the input and hidden layers' neurons than their fully-connected counterparts. This reduces the number of required parameters and, consequently, the computational complexity. Simulation results indicate that the LWGS and MWGS [99] can achieve performance comparable to that of the CV-TDNN [97] while reducing memory storage and computational complexity.

3) HYBRID-LAYERS STRUCTURES
In [100], two hybrid-layers NN architectures, referred to as the hybrid convolutional recurrent NN (HCRNN) and the hybrid convolutional recurrent dense NN (HCRDNN), have been introduced for learning the FD system's SI. The HCRNN and HCRDNN are trained using RV inputs and built from a combination of different NN layers, such as convolutional, recurrent, and dense layers, as shown in Fig. 5(c). They combine the characteristics of each layer type to improve the learning capabilities compared to the typical and grid-based NN architectures [90], [97], [99]. In particular, the HCRNN relies on a convolutional layer, whose weight-sharing property reduces the number of required parameters and, consequently, the computational complexity. It also relies on a recurrent hidden layer for its ability to learn temporal behavior. The HCRDNN, on the other hand, adds a dense layer after the convolutional and recurrent layers to build a highly predictive NN model with low computational complexity. The HCRNN and HCRDNN [100] are shown to reduce computational complexity while achieving SIC performance similar to that of the typical and grid-based structures, albeit at the cost of increased memory requirements [90], [97], [99].

4) OUTPUT FEEDBACK STRUCTURES
In [101], two output feedback (OF)-based NN structures, namely the two-hidden-layers NN (2HLNN) and the dual-neurons two-hidden-layers NN (DN-2HLNN), have been introduced for canceling the SI in FD transceivers. As their names imply, the OF-based NN structures use part of the output samples, fed back through a buffer to the input layer, as training features. In other words, their input feature map includes not only the input samples but also the output samples, as shown in Fig. 5(d). Feeding back part of the output samples helps account for the over-the-air SI propagation delay spread. This enhances the learning capabilities and, consequently, improves the SIC performance compared to NN structures trained only on the input samples. In the 2HLNN, the input features (including both input and output samples) are fully connected to the first hidden layer's neurons, as shown in Fig. 5(d-i). In the DN-2HLNN, however, the input features are not fully connected to the first hidden layer in the traditional manner. Instead, the features related to the input samples are connected to one neuron to capture the input signal's memory effect, while those related to the output samples are connected to another neuron to capture the output signal's memory effect, as shown in Fig. 5(d-ii). Simulation results [101] reveal that the DN-2HLNN can reduce memory storage and computational complexity while achieving SIC performance similar to that of the LWGS, MWGS, HCRNN, HCRDNN, and 2HLNN [99], [100], [101].
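An output-feedback feature map of the kind described above can be sketched as follows. Each row concatenates the current and past input samples with a buffer of past output samples, and the result is then split into real and imaginary parts. The buffer lengths `m_in` and `m_out` are hypothetical. We also assume that only past outputs $y(n-1), \ldots, y(n-m_{out})$ are fed back, since $y(n)$ is the quantity being predicted. The exact buffer layout used in [101] may differ.

```python
import numpy as np

def of_feature_map(x, y, m_in, m_out):
    """Rows: Re/Im of [x(n), ..., x(n-m_in+1), y(n-1), ..., y(n-m_out)]."""
    rows = []
    for n in range(max(m_in - 1, m_out), len(x)):
        x_taps = x[n - m_in + 1 : n + 1][::-1]   # current and past inputs
        y_taps = y[n - m_out : n][::-1]          # fed-back past outputs
        feats = np.concatenate([x_taps, y_taps])
        rows.append(np.concatenate([feats.real, feats.imag]))
    return np.array(rows)
```

Setting `m_out=0` recovers the input-only feature map used by the non-OF structures.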

5) ADAPTIVE STRUCTURES
In [102], a channel-adaptive NN structure, referred to as the channel-robust NN (CHRNN), has been integrated with an LS-based linear canceler to perform SIC in FD transceivers over time-varying SI channels. In more detail, in [102], a linear canceler is trained continuously, frame by frame, to estimate the channel coefficients. A pre-trained NN then constructs the non-linear SI signal from either raw or processed channel coefficients, as shown in Fig. 5(e-i) and (e-ii), respectively. In the former case, the pre-trained NN is fed directly with the channel coefficients estimated by the linear canceler. In the latter, it is fed with a processed version of these coefficients [102, eq. (7)]. Simulation results indicate that the CHRNN learns better when fed with processed rather than raw channel coefficients. Further, the results reveal that the CHRNN-based canceler can reduce computational complexity while attaining performance similar to that of a polynomial-based canceler adapted to time-varying SI channels [102].

6) DEEP STRUCTURES
The concept of DL has also been introduced for modeling the interference in FD transceivers. In [97], deep versions of the typical RV-TDNN, RNN, and CV-TDNN, shown in Fig. 5(f), have been introduced to perform SIC with lower memory and complexity. Using deep rather than shallow NNs is motivated by the fact that a deep NN with few neurons per layer (i.e., with lower memory storage and computational complexity) can typically generalize better than a shallow NN with many neurons in a single layer [89]. Simulation results show that a deep CV-TDNN can reduce memory storage and computational complexity while achieving performance similar to that of a shallow CV-TDNN [97]. However, this does not hold in all cases. A deep RNN, for example, increases memory storage and computational complexity compared to a shallow RNN because of its many recurrent connections. Finally, adopting a deep RV-TDNN for SIC decreases the complexity but increases the memory storage compared to its shallow counterpart [97]. DL has also been studied for SIC in FD systems in other contexts, such as [103], [104], [105].
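The trade-off between deep-narrow and shallow-wide networks comes down to simple arithmetic: weights plus biases for memory, and weight multiplications per forward pass for complexity. The helper below counts both for a fully-connected network. The layer sizes are hypothetical and are not taken from [97]. The count also ignores recurrent connections, which add $H^2$ weights per recurrent layer of $H$ neurons.

```python
def dense_param_count(layer_sizes):
    """Weights plus biases of a fully-connected network, e.g., [16, 40, 2]."""
    return sum(a * b + b for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))

def dense_mults_per_sample(layer_sizes):
    """Real multiplications per forward pass (activations and additions ignored)."""
    return sum(a * b for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))

shallow = [16, 40, 2]     # one wide hidden layer (hypothetical sizes)
deep = [16, 10, 10, 2]    # two narrow hidden layers (hypothetical sizes)
print(dense_param_count(shallow), dense_param_count(deep))            # 762 302
print(dense_mults_per_sample(shallow), dense_mults_per_sample(deep))  # 720 280
```

For these sizes, the deep network needs fewer than half the parameters and multiplications of the shallow one. Whether it matches the shallow network's SIC must be checked empirically.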

B. SUPPORT VECTOR REGRESSOR (SVR)-BASED SIC APPROACHES
Despite being extensively used for SIC in FD transceivers, NN-based cancelers suffer from some inherent characteristics of NN models, such as high training complexity and poorer generalization when few training examples are available. To overcome these bottlenecks, SVRs, a variant of support vector machines, have recently been introduced as alternatives to NNs for modeling the SI. The first attempt to apply SVRs to SIC is presented in [106] for frequency division duplex (FDD) transceivers rather than FD transceivers. In [106], an SVR model generates a replica of the undesired second-order intermodulation distortion (IMD2) signal caused by transmit leakage. SVRs were subsequently applied to SIC in FD systems in a few works [107], [108]. The following subsections review and summarize these attempts.

1) NESTED-BASED APPROACHES
The first attempt to apply SVRs to SIC in FD transceivers was made in [107]. There, a non-linear time-delay SVR (TDSVR)-based canceler is integrated with a linear canceler in a nested scheme to suppress the SI signal down to the Rx noise floor. The nested TDSVR (NTDSVR), shown in Fig. 6(i), is trained using an input feature map that includes the real and imaginary parts of the current and past input samples. The odd higher-order terms of the input samples (with memory) are also included as training features. The output labels for training the NTDSVR are created by first estimating the SI channel and then applying inverse filtering to remove the effect of the linear SI channel [107]. After the channel effect is removed, the output samples, denoted by $\tilde{y}_{SI,nes}$ in Fig. 6(i), contain only the effect of the non-linearities and serve as labels to train the NTDSVR. Once the non-linear SI component is reconstructed, the linear channel is applied to reconstruct the linear component. The estimated SI signal, including the linear and non-linear components, is then subtracted from the original SI signal to perform SIC. Simulation results show that the NTDSVR-based canceler improves SIC performance compared to conventional non-linear polynomial-based cancelers [107].
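The TDSVR input feature map described above, with current and past samples plus their odd higher-order terms split into real and imaginary parts, can be sketched as follows. The basis function $x(n-m)|x(n-m)|^{p-1}$ and the default `orders` are assumptions, and the exact feature set in [107] may differ. Each resulting real-valued label (Re or Im) could then be fitted by a separate regressor, for example scikit-learn's `sklearn.svm.SVR`.

```python
import numpy as np

def odd_order_features(x, memory, orders=(1, 3, 5)):
    """Re/Im of x(n-m) * |x(n-m)|^(p-1) for m = 0..memory-1 and each odd p in orders."""
    rows = []
    for n in range(memory - 1, len(x)):
        taps = x[n - memory + 1 : n + 1][::-1]
        basis = np.concatenate([taps * np.abs(taps) ** (p - 1) for p in orders])
        rows.append(np.concatenate([basis.real, basis.imag]))
    return np.array(rows)
```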

2) RESIDUAL-BASED APPROACHES
a) RTDSVR: The second attempt to apply SVRs to SIC in FD transceivers is investigated in [108], where a residual-based TDSVR (RTDSVR) is introduced. The input feature map for training the RTDSVR is constructed as in the nested approach [107]. However, the output labels are created differently, from the residual signal after linear SIC, as can be seen from Fig. 6(ii). In the residual scheme, the linear SI channel is first estimated, and the linear component of the SI signal is then fully reconstructed. This estimate is subtracted from the original SI signal. The resulting residual SI signal, denoted by $\tilde{y}_{SI,nl}$, contains only the non-linear component and is used to train the RTDSVR. After the non-linear SI is reconstructed, it is combined with the linear component and subtracted from the original SI to perform SIC. Simulation results reveal that the RTDSVR improves SIC compared to the NTDSVR, especially at low or moderate transmit power levels [108].
b) OF-TDSVR: The effect of feeding back part of the output samples as training features for SVR-based cancelers has not previously been considered in the literature. This article examines it for the first time by integrating an SVR model, referred to as the output-feedback time-delay SVR (OF-TDSVR), with a linear canceler in a residual scheme to suppress the SI signal. Like the OF-based NN structures, the OF-TDSVR is trained using an input feature map that includes both input and output samples, as shown in Fig. 6(iii). As shown for NNs, feeding back part of the output samples helps account for the over-the-air SI propagation delay spread. This can enhance the learning capabilities and, subsequently, improve the SIC performance compared to existing SVR-based cancelers trained only on the input samples. It may also reduce the training overhead compared to existing SVR benchmarks in the literature.

C. ADVANCED ML-BASED SIC APPROACHES
Advanced ML approaches, such as TC, TensorFlow graphs, and RFFs, integrated with online learning, have recently been introduced for SIC in FD transceivers.The details of such advances are provided in the following subsections.

1) TENSOR COMPLETION (TC)
In [109], a canonical system identification (CSID) approach, based on a low-rank tensor constraint optimization problem, is utilized to approximate the non-linear SI signal as in the case of NNs and residual-based SVRs.In more detail, the CSID approach formulates the SIC problem as a low-rank tensor decomposition problem to be solved using an alternating least squares optimization algorithm.Simulation results [109] indicate that the CSID-based cancelers could achieve similar performance to that of the polynomial and NN-based cancelers [90], [96].Meanwhile, they can be beneficial from the computational complexity perspective at the cost of higher memory storage requirements.
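Alternating least squares for a low-rank tensor decomposition can be sketched as follows. The example uses a generic rank-$R$ canonical polyadic (CP) decomposition of a fully observed three-way tensor. Each step fixes two factor matrices and solves a linear LS problem for the third. This illustrates only the ALS principle mentioned above. It is not the CSID formulation of [109], which builds the tensor from the canceler's input/output data and may also handle missing entries (tensor completion).

```python
import numpy as np

def khatri_rao(a, b):
    """Column-wise Kronecker product: (I x R), (J x R) -> (I*J x R)."""
    return np.einsum("ir,jr->ijr", a, b).reshape(-1, a.shape[1])

def cp_reconstruct(a, b, c):
    """T[i, j, k] = sum_r a[i, r] * b[j, r] * c[k, r]."""
    return np.einsum("ir,jr,kr->ijk", a, b, c)

def cp_als(t, rank, iters=200, seed=0):
    """Rank-`rank` CP decomposition of a 3-way tensor by alternating least squares."""
    rng = np.random.default_rng(seed)
    i, j, k = t.shape
    a, b, c = (rng.standard_normal((d, rank)) for d in (i, j, k))
    t1 = t.reshape(i, j * k)                      # mode-1 unfolding
    t2 = t.transpose(1, 0, 2).reshape(j, i * k)   # mode-2 unfolding
    t3 = t.transpose(2, 0, 1).reshape(k, i * j)   # mode-3 unfolding
    for _ in range(iters):
        # Each update is an LS solve with the other two factors held fixed.
        a = t1 @ np.linalg.pinv(khatri_rao(b, c).T)
        b = t2 @ np.linalg.pinv(khatri_rao(a, c).T)
        c = t3 @ np.linalg.pinv(khatri_rao(a, b).T)
    return a, b, c
```

Storing the factors requires $(I + J + K)R$ numbers instead of $IJK$, which is where low-rank models save memory relative to the full tensor.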

2) TENSORFLOW GRAPHS
In [110], TensorFlow graphs, a recent advance in ML, are introduced to cancel the SI in a real-time software-defined radio (SDR). In ML, graphs let researchers and developers write an abstracted version of their techniques as data-flow graphs, which can then be used to implement a wide range of ML algorithms [111]. Building on such graphs, [110] performs SIC in a real-time SDR using an NN implemented as a Google TensorFlow graph. Simulation results reveal that the TensorFlow graph-based approach can achieve SIC that reaches the hardware limit and surpasses existing digital non-ML-based SIC approaches in the literature [110].

3) RANDOM FOURIER FEATURES (RFFS)
In [112], RFFs and the least mean-squares (LMS) algorithm are integrated with online linear regression to perform SIC in FD transceivers. In principle, RFFs are used to scale up kernel-based ML techniques by providing a non-linear transformation of the input data into a higher-dimensional feature space. Hence, non-linearities in the original input space can be modeled efficiently using linear techniques in the transformed feature space, resulting in scalable, fast-converging, and computationally efficient solutions [113], [114]. Based on this, [112] first transforms the input samples using RFFs. The residual SI signal after linear SIC is then used together with the transformed input to approximate the non-linear SI signal with an LMS-based canceler. The estimated signal is subtracted from the original SI to obtain the residual SI signal. An estimation vector is then updated online from that residual using an RFF-based observation matrix. Simulation results show that an online RFF-LMS-based canceler can be beneficial in terms of both SIC and complexity compared to batch learning algorithms such as NTDSVRs [112].⁸
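The RFF-plus-LMS idea can be sketched as an online regressor. Each real-valued input vector is mapped to $D$ random Fourier features $\sqrt{2/D}\cos(\boldsymbol{\omega}_i^T\mathbf{z} + b_i)$, with $\boldsymbol{\omega}_i \sim \mathcal{N}(\mathbf{0}, \sigma^{-2}\mathbf{I})$ and $b_i \sim \mathcal{U}[0, 2\pi]$. These features approximate a Gaussian kernel of width $\sigma$. A complex weight vector is then updated by LMS on the prediction error. This is a minimal sketch: the Gaussian kernel, the class name, and all parameter values are assumptions, and the update rule in [112] may differ in detail.

```python
import numpy as np

class RFFLMS:
    """Online LMS regression on random Fourier features (Gaussian-kernel approximation)."""

    def __init__(self, dim_in, num_features, kernel_width, step, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(0.0, 1.0 / kernel_width, size=(num_features, dim_in))
        self.b = rng.uniform(0.0, 2 * np.pi, size=num_features)
        self.scale = np.sqrt(2.0 / num_features)
        self.theta = np.zeros(num_features, dtype=complex)   # complex output weights
        self.step = step

    def features(self, z):
        return self.scale * np.cos(self.w @ z + self.b)

    def predict(self, z):
        return self.theta @ self.features(z)

    def update(self, z, target):
        """One LMS step; returns the a-priori error (before the weights change)."""
        phi = self.features(z)
        err = target - self.theta @ phi
        self.theta += self.step * err * phi
        return err

# Hypothetical usage: z = [Re{x(n)}, Im{x(n)}], target = residual SI after linear SIC.
model = RFFLMS(dim_in=2, num_features=200, kernel_width=1.0, step=0.5)
```

Each update costs $O(D \cdot \dim(\mathbf{z}))$ operations and needs no stored training batch, which is the main practical appeal of the online RFF-LMS approach over batch kernel methods.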

D. OTHER ML-BASED SIC APPROACHES
Seeking more advantages in other ML approaches investigated in other disciplines, the DR, GMMs, DU, LL, and APSM have been explored for SIC in FD transceivers.The details of such approaches are provided in the following subsections.

1) DYNAMIC REGRESSION (DR)
In [119], a classical DR model is introduced for canceling the interference in FD transceivers. DR models are generally used in ML problems to identify how strongly a certain output is related to an input and to forecast future outputs. Based on this, in [119], a classical DR model is used to represent the memory effect caused by the amplifiers in FD systems. Once the DR coefficients are estimated, the SI signal is jointly estimated in the time and frequency domains and subtracted from the original SI signal to perform the digital SIC. Simulation results reveal that the DR-based SIC approach could achieve a high digital SIC performance and effectively attenuate the SI signal to close to the Rx noise floor level. The DR-based SIC approach has also been validated on a real-time SDR platform and demonstrated via video streaming [119].

8 Although the RFFs are integrated with online regression in [112], they are used with various ML algorithms in other disciplines, such as [115], [116], [117], [118].
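A classical DR model regresses the current output on the current and lagged inputs, which is one way to capture amplifier memory. The sketch below fits such a model by least squares on complex baseband samples and subtracts the reconstruction. It works in the time domain only, whereas [119] estimates jointly in time and frequency. The memory depth L, the helper names, and the small normal-equation solver are illustrative assumptions.

```python
def lagged_design(x, L):
    """Rows [x[n], x[n-1], ..., x[n-L+1]] for n >= L-1 (complex samples)."""
    return [[x[n - k] for k in range(L)] for n in range(L - 1, len(x))]

def solve(A, b):
    """Gaussian elimination with partial pivoting (works for complex entries)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    sol = [0j] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][k] * sol[k] for k in range(r + 1, n))
        sol[r] = s / M[r][r]
    return sol

def fit_dr(x, y, L):
    """Least-squares coefficients h minimizing sum |y[n] - sum_k h_k x[n-k]|^2."""
    X = lagged_design(x, L)
    t = y[L - 1:]
    # Normal equations: (X^H X) h = X^H t
    XhX = [[sum(row[i].conjugate() * row[j] for row in X) for j in range(L)]
           for i in range(L)]
    Xht = [sum(row[i].conjugate() * ti for row, ti in zip(X, t)) for i in range(L)]
    return solve(XhX, Xht)

def predict_dr(x, h):
    """Reconstructed SI for n >= L-1; subtracting it from y[L-1:] gives the residual."""
    return [sum(hk * xk for hk, xk in zip(h, row)) for row in lagged_design(x, len(h))]
```

On noise-free data generated by a short memory channel, the fit recovers the channel taps and the residual after subtraction vanishes.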

2) GAUSSIAN MIXTURE MODELS (GMMS)
In [120], an ML approach based on GMM clustering is introduced to design an FD transceiver that can detect the desired signal (i.e., the SoI) directly, without digital-domain cancellation or even channel estimation. As the name implies, GMM clustering fits the training data with a mixture, i.e., a superposition, of Gaussian distributions and assigns each data point to a cluster based on its conditional probabilities [121]. In more detail, in [120], the received signal is first clustered, and a one-to-one mapping of the symbols, based on GMM clustering and the expectation-maximization (EM) algorithm, is used to perform signal detection in each cluster. Simulation results reveal that an FD transceiver using GMM clustering could achieve a bit error rate comparable to that of FD transceivers employing maximum likelihood detectors with perfect channel knowledge. It could also achieve a better one than maximum likelihood detectors relying on LS/LMS channel estimation [120]. However, this transceiver is limited to operating scenarios in which low-order modulation schemes are employed.
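To make the clustering step concrete, the sketch below runs EM for a Gaussian mixture on received I/Q samples, with one component per constellation point and a shared circular variance. Several choices here are illustrative assumptions: initializing the means from phase-sorted bins, the QPSK toy data, and a rotation/scaling standing in for an unknown channel. The cluster-to-symbol mapping of [120] is not shown.

```python
import cmath
import math

def gmm_em(points, k, iters=50):
    """EM for a k-component GMM on complex I/Q samples with a shared circular
    variance var (component density proportional to exp(-|z - m|^2 / var))."""
    n = len(points)
    # Deterministic initialization: median point of each of k phase-sorted bins.
    ordered = sorted(points, key=cmath.phase)
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    weights = [1.0 / k] * k
    var = sum(min(abs(p - m) ** 2 for m in means) for p in points) / n
    for _ in range(iters):
        # E-step: responsibilities (posterior cluster probabilities), log-sum-exp.
        resp = []
        for p in points:
            logs = [math.log(w) - abs(p - m) ** 2 / var for w, m in zip(weights, means)]
            top = max(logs)
            probs = [math.exp(v - top) for v in logs]
            total = sum(probs)
            resp.append([q / total for q in probs])
        # M-step: re-estimate weights, means, and the shared variance.
        nk = [sum(r[j] for r in resp) for j in range(k)]
        weights = [c / n for c in nk]
        means = [sum(r[j] * p for r, p in zip(resp, points)) / nk[j] for j in range(k)]
        var = sum(r[j] * abs(p - means[j]) ** 2
                  for r, p in zip(resp, points) for j in range(k)) / n
    return means, weights, var

def assign(points, means):
    """Hard decision: nearest cluster mean (equal to the maximum-responsibility
    choice when the mixture weights are equal)."""
    return [min(range(len(means)), key=lambda j: abs(p - means[j])) for p in points]
```

Because no channel is estimated, the learned means directly absorb the unknown rotation and scaling. This mirrors the estimation-free detection idea, although [120] may differ in its details.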

3) DEEP UNFOLDING (DU)
In [122], an ML approach based on DU is introduced for canceling the interference in FD transceivers. DU converts model-based methods that are solved with iterative optimization algorithms into layer-wise structures analogous to those of NNs [123], [124]. This allows iterative optimization methods to be combined with NN libraries/tools across a wide range of tasks and applications. In [122], DU is applied to SIC: a cascade of non-linear blocks, modeling the impact of the PA and IQ mixer non-linearities, is trained with the traditional backpropagation algorithm to approximate the SI signal. Simulation results corroborate that the DU-based SIC approach could be beneficial from memory storage and computational complexity perspectives when compared with literature benchmarks, e.g., polynomial- and CV-TDNN-based cancelers, at a similar cancellation performance [122].

4) LAZY LEARNING (LL)
In [125], an ML approach based on LL is introduced to perform the SIC in cellular wireless networks operating with FD transmission. As the name implies, LL-based models postpone generalization beyond the training data until the system is queried. Based on this concept, [125] uses an offline stage to generate the interference database and an online stage to transmit the data. In the offline phase, the FD system's output signal, excluding the SoI, is recorded in a database. In the online phase, in which the system is fully operated with the SoI, a suitable SI value is looked up in the offline-generated database with the help of a learning approach to perform the digital SIC. Simulation results show that the LL-based SIC approach could be effectively used to cancel the interference and enable FD transmission in cellular wireless networks [125].
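The two LL phases can be sketched as a lookup table. Offline, (transmit-feature, SI) pairs are recorded without any SoI. Online, the SI for a new transmit feature is predicted by averaging its k nearest stored neighbours and is then subtracted. The k-nearest-neighbour rule, the feature vector (current and previous transmit samples), and k are assumptions for illustration, and [125] does not necessarily use this exact learner.

```python
class LazySICDatabase:
    """Lazy-learning SI lookup (illustrative). Nothing is fitted upfront:
    the offline phase only stores (transmit feature, SI sample) pairs, and
    generalization happens at query time via k nearest neighbours."""

    def __init__(self, k=3):
        self.k = k
        self.keys = []    # transmit-side feature vectors (tuples of complex)
        self.values = []  # SI recorded offline, with no SoI present

    def record(self, feature, si):
        """Offline phase: store one observation."""
        self.keys.append(tuple(feature))
        self.values.append(si)

    def lookup(self, feature):
        """Online phase: average the SI of the k closest stored features."""
        scored = sorted(
            (sum(abs(a - b) ** 2 for a, b in zip(key, feature)), idx)
            for idx, key in enumerate(self.keys))
        nearest = [idx for _, idx in scored[:self.k]]
        return sum(self.values[i] for i in nearest) / len(nearest)

def cancel(db, feature, received):
    """Digital SIC: subtract the looked-up SI estimate from the received sample."""
    return received - db.lookup(feature)
```

The trade-off is typical of lazy learners: training is just storage, so each online query pays for a search over the whole database.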

5) ADAPTIVE PROJECTED SUBGRADIENT METHOD (APSM)
In [126], an ML SIC approach based on parallel APSM is introduced for canceling the interference in FD transceivers. Specifically, in [126], a hybrid kernel is first constructed by combining linear and non-linear Gaussian kernels. This kernel is then used within a parallel APSM approach, in which a non-linear function approximating the SIC problem is extracted using projections. Simulation results show that the hybrid-kernel-based APSM approach could model the SI more accurately than a SIC method employing normalized LMS filtering [126]. Moreover, it can be parallelized, i.e., it can use parallel processing to reduce the system latency.
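A simplified kernel APSM can be sketched as follows. The estimate is a kernel expansion, and each step moves it by the average of its projections onto the hyperslabs {f : |f(x_i) − y_i| ≤ eps} of the q most recent samples. Because all q projections are computed from the same current estimate, the update is parallelizable. The hybrid kernel weight a, the Gaussian width, eps, q, the real-valued toy target, and the unpruned dictionary are assumptions, not the settings of [126].

```python
import math

def hybrid_kernel(x, z, a=0.5, sigma=0.5):
    """Hybrid kernel: a * linear + (1 - a) * Gaussian (a sum of valid kernels)."""
    lin = sum(xi * zi for xi, zi in zip(x, z))
    sq = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return a * lin + (1.0 - a) * math.exp(-sq / (2.0 * sigma ** 2))

class KernelAPSM:
    """Simplified kernel APSM: f(.) = sum_c alpha_c * k(c, .)."""

    def __init__(self, eps=0.01, q=4, mu=1.0, kernel=hybrid_kernel):
        self.eps, self.q, self.mu, self.kernel = eps, q, mu, kernel
        self.alphas = {}  # center (tuple) -> expansion coefficient
        self.recent = []  # the q most recent (x, y) pairs

    def predict(self, x):
        return sum(a * self.kernel(c, x) for c, a in self.alphas.items())

    def update(self, x, y):
        self.recent = (self.recent + [(tuple(x), y)])[-self.q:]
        # All projections use the current estimate f_n, so these q terms are
        # independent of each other and could be computed in parallel.
        steps = []
        for xi, yi in self.recent:
            e = yi - self.predict(xi)
            if abs(e) > self.eps:  # outside the hyperslab |f(xi) - yi| <= eps
                shrink = e - math.copysign(self.eps, e)
                steps.append((xi, shrink / self.kernel(xi, xi)))
        weight = 1.0 / len(self.recent)  # equal convex weights
        for xi, beta in steps:
            self.alphas[xi] = self.alphas.get(xi, 0.0) + self.mu * weight * beta
```

The step size mu = 1 lies inside the (0, 2) range used by APSM-type updates with convex weights. A practical canceler would also prune or sparsify the dictionary, which this sketch omits.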
So far, we have surveyed the up-to-date contributions that apply ML-based approaches for SIC in FD transceivers, as summarized in Table 1. The choice of a particular ML-based approach for SIC depends on the system demands, such as the achieved SIC, training overhead, memory storage, and computational complexity. The following section helps to select a suitable ML-based approach for SIC in FD systems.

V. SIMULATION RESULTS AND DISCUSSIONS
In this section, we provide a case study that compares the performance of the prominent ML approaches surveyed in Section IV with that of the polynomial canceler for two test setups (i.e., two training datasets) and various dataset sizes. Specifically, we evaluate the prominent ML approaches in terms of the achieved SIC, PSD performance, training overhead, memory storage, and computational complexity, and compare them with those of the polynomial-based canceler.

A. SELECTED APPROACHES
First, from the NN-based approaches shown on the right-hand side of Fig. 4, we select the typical NN architectures, i.e., RV-TDNN, RNN, and CV-TDNN [90], [97], as they were the first literature benchmarks to apply ML approaches for SIC in FD transceivers. We also select the OF-based NN architectures, i.e., 2HLNN and DN-2HLNN, as they have proven efficient in terms of memory storage and computational complexity compared with the other NNs [101]. Second, from the SVR-based approaches shown on the upper side of Fig. 4, we select the RTDSVR [108], as it has been shown to outperform the NTDSVR [107], especially at low or moderate transmit power levels. Additionally, we include the investigated OF-TDSVR for comparison with the existing NN and SVR benchmarks. Third and last, from the advanced and other ML approaches shown on the lower and left-hand sides of Fig. 4, we select the TC [109] and DU [122] approaches. These have proven efficient in terms of memory storage and/or computational complexity compared with the RV-TDNN and CV-TDNN, respectively. In the following subsections, we evaluate and compare the selected approaches on two test setups using various performance metrics, namely the achieved SIC, PSD performance, training overhead, memory storage, and computational complexity.⁹

B. MEASUREMENT SETUP
The measurement setup used to capture the datasets for training the prominent ML-based approaches selected in Section V-A is described in Fig. 7. Herein, an FD testbed employing one transmit antenna and one receive antenna (1T1R) was set up in an indoor lab environment to generate two datasets [90], [96], [109]. The first dataset [90] uses an orthogonal frequency division multiplexing (OFDM) signal with quadrature phase-shift keying (QPSK) modulation and a 10 MHz bandwidth, while the second [96], [109] uses a QPSK-modulated OFDM signal with a 20 MHz bandwidth. The average transmit power is set to 10 dBm and 32 dBm in the first and second datasets, respectively. The transmitted and received data are captured at sampling rates of 20 MHz and 80 MHz for the first and second datasets, respectively, i.e., at two and four times the respective signal bandwidths. A higher sampling frequency enables the ML approaches to model the higher-order intermodulation distortion terms and thus to suppress the SI efficiently, especially when high transmit power levels are used.
At the Rx side of the FD testbed, total analog (i.e., passive and active) cancellations of 53 dB and 65 dB are applied in the first and second datasets, respectively, to prevent the SI signal from saturating the sensitive Rx chain of the FD node. The digital received data after the ADC are then captured and transferred to a personal computer (PC) for offline post-processing.

C. PARAMETERS SETTING
The goal of this work is to find the peak performance of each SI canceler, e.g., polynomial, NN, SVR, TC, and DU; in other words, we aim to find the maximum SIC that each canceler can attain. We then compare the different cancelers in terms of the training overhead, memory storage, and computational complexity required to achieve their maximum SIC. To that end: 1) for the polynomial canceler [90], we have optimized the non-linearity order $P$ and the memory length $M_i$; 2) for the NN-based cancelers, e.g., RV-TDNN, RNN, and CV-TDNN [90], [97], [101], we have optimized the memory length $M_i$ along with the NN hyperparameters, such as the number of hidden neurons $n_h$, batch size (BS), learning rate (LR), activation function, and training optimizer; 3) for the SVR-based cancelers, i.e., RTDSVR and OF-TDSVR [108], we have obtained the optimum values of the memory length $M_i$, the regularization term $C$, and the margin $\epsilon$, along with the kernel hyperparameter $\gamma$; 4) for the TC approach [109], we have tuned the memory length $M_i$ along with the optimization problem's hyperparameters, such as the tensor rank $F$, the number of quantization levels $I$, the regularization parameter $\rho$, and the smoothness factor $\mu_n$; 5) for the DU approach, we have optimized the memory length $M_i$ and the LR and BS of the follow-the-regularized-leader (FTRL) optimizer, as in [122]. The ranges for hyperparameter tuning and the optimal hyperparameter values for the first and second datasets are summarized in Tables 3 and 4, respectively.

D. PERFORMANCE COMPARISON
In this subsection, we assess the performance of the prominent ML-based SIC approaches in terms of their SIC, PSD, training time, memory storage, and computational complexity, and compare them with those of the polynomial model. Afterward, we evaluate the efficiency of each canceler according to system demands. All the SIC approaches considered in this analysis are trained using the datasets described in Section V-B, with the parameter settings optimized in Section V-C.

1) SIC PERFORMANCE
Fig. 8(a) and (b) show the total SIC achieved by the different ML-based SIC approaches, compared with the polynomial model, when tested on the first and second datasets, respectively, with 2000, 3000, 4000, and 5000 samples.¹¹ In the first dataset, where a low average transmit power is employed, the polynomial-based canceler achieves the highest cancellation performance among the cancelers for most dataset sizes. However, in the second dataset, where a high average transmit power is used, the RV-TDNN-based canceler provides the highest cancellation for all dataset sizes. The figures also show that the RTDSVR achieves the lowest cancellation performance, whether the transmit power is low or high. Further, using part of the output samples as features for training the SVR models enhances the cancellation performance compared with the existing RTDSVR, i.e., the OF-TDSVR attains a significantly higher SIC than the RTDSVR benchmark. In sum, the polynomial canceler could be a good choice when a low transmit power is used, since a low transmit power produces weaker non-linear SI components. However, when a higher transmit power is employed, the RV-TDNN could be a better choice, since a high transmit power produces stronger non-linear SI components.¹²

11 In this work, we provide a case study that compares the performance of different ML approaches with the polynomial canceler when achieving the maximum SIC (i.e., peak performance) at short dataset sizes, e.g., 2000, 3000, 4000, and 5000 samples. In contrast, in our previous works [99], [100], [101], we compared the different ML approaches with the polynomial canceler when attaining a similar SIC (i.e., equi-performance) at a large dataset size, e.g., 20,000 samples. Accordingly, some of the results obtained in this work may differ from those reported in [99], [100], [101].

12 All SI cancelers achieve a higher non-linear cancellation in the second dataset than in the first because of the increased non-linearity. Interestingly, however, the total SIC achieved in the former is lower than that in the latter, as the sample results in Table 5 show. This is due to the degradation of the linear canceler's performance with increased non-linearity.

2) PSD PERFORMANCE
Fig. 9(a) and (b) show the power spectra of the residual SI signal after applying the different ML-based SIC approaches, compared with that of the polynomial-based canceler, for the first and second datasets, respectively, using 5000 samples as an example. Fig. 9(a) shows that, in the first dataset, the polynomial-based canceler suppresses the SI signal with the lowest gap to the Rx noise floor among the cancelers. It provides a gap to the Rx noise floor of 90.8 − 88.7 = 2.1 dB, bringing the SI signal very close to the Rx noise floor level. Fig. 9(b) shows that the RV-TDNN-based canceler provides the lowest gap to the Rx noise floor in the second dataset. It attains a gap of 85.3 − 81.3 = 4 dB, bringing the SI signal close to the Rx noise floor level. The RV-TDNN achieves a lower gap than the polynomial canceler because it reduces the carrier leakage around the DC tone, as shown in Fig. 9(b) [96]. Finally, the SIC values achieved by the polynomial, RV-TDNN, and RTDSVR cancelers, as an example, match those reported in Table 5.

TABLE 5. SIC of Different Approaches When Trained Using 5000 Samples of the First and Second Datasets

3) TRAINING OVERHEAD
In this subsection, we assess the training time, i.e., fitting time, required by each SI canceler to complete the training process. For the polynomial-based canceler, we measure the time needed to estimate the polynomial model's coefficients with the LS algorithm. For the NN- and DU-based cancelers, we take the training time as the average over different random seeds. For the SVR models, we approximate the training time as the maximum of the times needed to fit the two SVRs, which estimate the real and imaginary parts of the non-linear SI signal, respectively, as shown in Fig. 6. Finally, for the TC-based canceler, we measure the time required to fit the low-rank tensor decomposition problem. Fig. 10(a) and (b) depict the resulting training times of all the ML-based cancelers, compared with the polynomial model, when tested on the first and second datasets, respectively. The polynomial-based canceler requires the lowest training time whether a low or a high average transmit power is employed, i.e., whether it is trained on the first or the second dataset. The RTDSVR also shows a favorable training time, i.e., it trains faster than all other cancelers except the polynomial-based canceler. The SIC enhancement provided by the OF-TDSVR, however, comes at the cost of a longer training time than that of the RTDSVR benchmark. Additionally, the TC- and DU-based cancelers require significantly longer training times than the others. This makes them unfavorable choices for SIC, especially in operating scenarios where training time matters. Finally, the training time of all SI cancelers typically increases with the dataset size.

4) MEMORY STORAGE
In this subsection, we assess the memory storage of the different ML approaches in terms of the total number of parameters required in the inference stage and compare it with that of the polynomial model. The number of parameters of the polynomial-based canceler is calculated as $2M_i + 2M_i\left[\left(\frac{P+1}{2}\right)\left(\frac{P+1}{2}+1\right) - 1\right]$ [90]. The numbers of parameters of the typical RV-TDNN, RNN, and CV-TDNN are $2M_i(n_h+1) + 3n_h + 2$, $2M_i + n_h(n_h+5) + 2$, and $2M_i + 2(M_i n_h + 2n_h + 1)$, respectively, where $n_h$ is the number of hidden neurons [90], [97]. For the OF-based NN structures, i.e., 2HLNN and DN-2HLNN, the numbers of parameters are $2M_i + 2\{n_{h1}(M_i + M_o + n_{h2} + 1) + 2n_{h2} + 1\}$ and $2M_i + 2(M_i + M_o + 4n_{h2} + 3)$, respectively. Here, $n_{h1}$ and $n_{h2}$ are the numbers of neurons in the first and second hidden layers [101]. The SVR models, i.e., RTDSVR and OF-TDSVR, employing a radial basis function (RBF) kernel, require $2M_i + N_{sv}^{\Re} + N_{sv}^{\Im} + 8$ parameters [108], [129]. Here, $N_{sv}^{\Re}$ and $N_{sv}^{\Im}$ are the numbers of support vectors needed to approximate the unknown functions of the SVRs that estimate the real and imaginary parts of the non-linear SI signal, respectively. Finally, the number of parameters of the TC-based canceler depends on $M_i$, the tensor rank $F$, and the number of quantization levels $I$ [109], while that of the DU-based canceler is $2\{M_i(\frac{P+1}{2}) + 2\}$. A summary of the total parameters used to evaluate the memory storage of the various SI cancelers is shown in Table 6.
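The parameter-count expressions above can be transcribed directly, which may help when reproducing Table 6. This sketch makes a few assumptions: the non-linearity order P is odd, so that (P + 1)/2 is an integer; M stands for $M_i$ and Mo for $M_o$; and the helper names are hypothetical. The TC expression is omitted, and the example sizes in the checks are arbitrary rather than the tuned values of Tables 3 and 4.

```python
def _half_order(P):
    """(P + 1) / 2 for an odd non-linearity order P."""
    if P % 2 == 0:
        raise ValueError("P is assumed to be odd")
    return (P + 1) // 2

def poly_params(M, P):
    K = _half_order(P)
    return 2 * M + 2 * M * (K * (K + 1) - 1)

def rv_tdnn_params(M, nh):
    return 2 * M * (nh + 1) + 3 * nh + 2

def rnn_params(M, nh):
    return 2 * M + nh * (nh + 5) + 2

def cv_tdnn_params(M, nh):
    return 2 * M + 2 * (M * nh + 2 * nh + 1)

def hlnn2_params(M, Mo, nh1, nh2):
    return 2 * M + 2 * (nh1 * (M + Mo + nh2 + 1) + 2 * nh2 + 1)

def dn_hlnn2_params(M, Mo, nh2):
    return 2 * M + 2 * (M + Mo + 4 * nh2 + 3)

def svr_params(M, nsv_re, nsv_im):
    # Grows with the support-vector counts, i.e., with the training-set size.
    return 2 * M + nsv_re + nsv_im + 8

def du_params(M, P):
    return 2 * (M * _half_order(P) + 2)
```

Only the SVR count depends on the data rather than the architecture. This is why the SVR bars in Fig. 11 grow with the dataset size.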
Fig. 11(a) and (b) depict the number of parameters required by the various SI cancelers when tested on the first and second datasets, respectively. The DU-based canceler requires the fewest parameters for both datasets and all dataset sizes. In the first dataset, the SVR-based cancelers, i.e., RTDSVR and OF-TDSVR, require the most parameters, because their parameter counts depend mainly on the numbers of support vectors of their two SVRs, which in turn depend on the number of training examples [129]. Accordingly, Fig. 11(a) and (b) show that the SVR models' parameters increase significantly as the dataset size increases. Finally, in the second dataset, the RNN-based canceler requires the most parameters because of its many recurrent connections.

5) COMPUTATIONAL COMPLEXITY
In this subsection, we evaluate the computational complexity of the various ML-based SIC approaches in terms of the total number of floating-point operations (FLOPs) required in the inference stage and compare it with that of the polynomial model. The number of FLOPs of the polynomial-based canceler is $10M_i + 10M_i\left[\left(\frac{P+1}{2}\right)\left(\frac{P+1}{2}+1\right) - 1\right] - 2$ [90]. The numbers of FLOPs of the typical RV-TDNN, RNN, and CV-TDNN are $10M_i + n_h(4M_i + 5)$, $10M_i + 2n_h(n_h + \frac{9}{2})$, and $10\{M_i(n_h+1) + \frac{6}{5}n_h\}$, respectively [90], [97]. For the 2HLNN and DN-2HLNN, the numbers of FLOPs are $10M_i + 10\{n_{h1}(M_i + M_o) + n_{h1}n_{h2} + \frac{6}{5}n_{h2}\}$ and $10M_i + 10(M_i + M_o + \frac{16}{5}n_{h2})$, respectively [101]. The worst-case numbers of FLOPs of the SVR models, i.e., RTDSVR and OF-TDSVR, employing an RBF kernel, scale with $dM_i$, the numbers of support vectors, and the number of testing samples $Q$ [108]. Here, $d$ is the degree, e.g., $d = 3$ for RTDSVR and $d = 1$ for OF-TDSVR. Finally, the numbers of FLOPs of the TC and DU approaches are $8M_i(2F+1) - 3F - 7$ and $10\{M_i(\frac{P+1}{2}) + 2\}$, respectively [109], [122]. A summary of the numbers of FLOPs used to assess the computational complexity of the various cancelers is shown in Table 6.¹³

Fig. 12(a) and (b) depict the FLOPs required by the various SI cancelers when tested on the first and second datasets, respectively. The DU- and TC-based cancelers require the fewest FLOPs for all dataset sizes in the first and second datasets, respectively. The polynomial-, RV-TDNN-, and DN-2HLNN-based cancelers require a reasonable number of FLOPs compared with the others for all dataset sizes. Finally, the SVR-based cancelers, i.e., RTDSVR and OF-TDSVR, require a prohibitive computational complexity compared with the others. Their FLOPs depend on the numbers of support vectors of their two SVRs as well as on the number of testing samples $Q$ [108].
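The inference FLOP counts above can likewise be transcribed as functions, with the fractional coefficients multiplied out so that everything stays in integer arithmetic (for example, 10 · (6/5) n_h = 12 n_h). As in the parameter-count case, the following are assumptions: an odd P, M standing for $M_i$, Mo for $M_o$, and hypothetical helper names. The SVR expressions are left out.

```python
def _half_order(P):
    """(P + 1) / 2 for an odd non-linearity order P."""
    if P % 2 == 0:
        raise ValueError("P is assumed to be odd")
    return (P + 1) // 2

def poly_flops(M, P):
    K = _half_order(P)
    return 10 * M + 10 * M * (K * (K + 1) - 1) - 2

def rv_tdnn_flops(M, nh):
    return 10 * M + nh * (4 * M + 5)

def rnn_flops(M, nh):
    return 10 * M + 2 * nh * nh + 9 * nh  # 10M + 2*nh*(nh + 9/2)

def cv_tdnn_flops(M, nh):
    return 10 * M * (nh + 1) + 12 * nh  # 10*(M*(nh + 1) + (6/5)*nh)

def hlnn2_flops(M, Mo, nh1, nh2):
    # 10M + 10*(nh1*(M + Mo) + nh1*nh2 + (6/5)*nh2)
    return 10 * M + 10 * nh1 * (M + Mo) + 10 * nh1 * nh2 + 12 * nh2

def dn_hlnn2_flops(M, Mo, nh2):
    return 10 * M + 10 * (M + Mo) + 32 * nh2  # 10M + 10*(M + Mo + (16/5)*nh2)

def tc_flops(M, F):
    return 8 * M * (2 * F + 1) - 3 * F - 7

def du_flops(M, P):
    return 10 * (M * _half_order(P) + 2)
```

Keeping the counts in integers avoids rounding noise when several cancelers are tabulated side by side.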

6) CANCELER EFFICIENCY
In the previous subsections, we evaluated the performance of each SI canceler in terms of its SIC (or PSD), training overhead, memory storage, and computational complexity. This analysis showed that some cancelers excel in SIC performance, while others are promising in terms of training time, memory storage, and/or computational complexity. The question, then, is how to select an ML-based SIC approach that fits a target application, i.e., meets the system criteria. This subsection addresses that question to help select a suitable SIC approach for the system requirements.
The challenge in the SIC problem is to find an SI canceler that maximizes the achieved SIC while minimizing the training time, memory storage, and computational complexity. We have therefore devised an efficiency measure $\eta$ that evaluates each canceler on these metrics. It combines four per-canceler efficiencies, namely the cancellation efficiency $\eta_C$, the training efficiency $\eta_\tau$, the storage efficiency, and the complexity efficiency $\eta_F$. These are combined through binary weighting factors for cancellation ($w_C$), training ($w_\tau$), storage, and complexity ($w_F$), each set to 0 or 1 according to the system requirements.¹⁴

The cancellation efficiency is computed from $C$, the total SIC achieved by each canceler over a certain dataset, together with $C_{\max}$ and $C_{\min}$, the maximum and minimum SIC attained by any canceler within this dataset. Similarly, the training efficiency is computed from $\tau$, the training time needed by each canceler over a certain dataset, together with $\tau_{\max}$ and $\tau_{\min}$, the maximum and minimum training times required by any canceler within this dataset. Likewise, the storage efficiency is computed from the number of parameters required by each canceler over a certain dataset, together with the maximum and minimum numbers of parameters needed by any canceler within this dataset. Finally, the complexity efficiency is computed from $F$, the number of FLOPs required by each canceler over a certain dataset, together with $F_{\max}$ and $F_{\min}$, the maximum and minimum numbers of FLOPs required by any canceler within this dataset.

Table 7 reports the efficiency $\eta$ of the various SI cancelers over the first and second datasets. The polynomial model achieves the highest efficiency for most test cases in the first dataset, i.e., the polynomial-based canceler is efficient when a low average transmit power is used and the non-linearity is not severe. However, in the second dataset, where a high transmit power is used, the RV-TDNN-based canceler achieves the highest efficiency for most test cases. Table 7 also shows that the polynomial-based canceler needs a large number of training examples to achieve the highest efficiency; e.g., it does not attain the highest efficiency when trained using 2000 samples of the first dataset. In addition, the RV-TDNN works well in test cases where training overhead is not among the system demands. For example, in the second dataset, the RV-TDNN-based canceler does not attain the highest efficiency in any test case where $w_\tau = 1$, and the polynomial-based canceler becomes a better choice in such cases.
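The efficiency measure can be prototyped as below. Only its ingredients are described above (C, τ, the parameter count, F, their per-dataset extremes, and the binary weights), so the combination rule in this sketch is an assumption that should be checked against the original definition. Each metric is min-max normalized across the cancelers of one dataset and oriented so that 1 is best (highest SIC; lowest training time, parameter count, and FLOPs). Then $\eta$ is taken as the mean of the normalized efficiencies whose weights equal 1. The metric keys and all numbers in the checks are hypothetical, not the results of Table 7.

```python
def normalized(value, lo, hi, higher_is_better):
    """Min-max normalize to [0, 1] so that 1 is always the best score."""
    if hi == lo:
        return 1.0  # every canceler ties on this metric
    score = (value - lo) / (hi - lo)
    return score if higher_is_better else 1.0 - score

# Direction of "better" for each metric: more SIC is better; less of the rest.
HIGHER_IS_BETTER = {"sic": True, "train": False, "params": False, "flops": False}

def efficiency(metrics, weights):
    """metrics: {canceler: {"sic", "train", "params", "flops"}} for ONE dataset.
    weights: {"sic": 0/1, "train": 0/1, "params": 0/1, "flops": 0/1}.
    Returns {canceler: eta}, with eta ASSUMED to be the mean of the selected
    normalized efficiencies."""
    active = [m for m, w in weights.items() if w]
    if not active:
        raise ValueError("at least one weighting factor must be 1")
    eta = {}
    for name, vals in metrics.items():
        total = 0.0
        for m in active:
            column = [v[m] for v in metrics.values()]
            total += normalized(vals[m], min(column), max(column), HIGHER_IS_BETTER[m])
        eta[name] = total / len(active)
    return eta
```

Setting a weight to 0 removes that metric from the ranking, which is how a "training overhead does not matter" scenario ($w_\tau = 0$) would be expressed.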
In sum, after testing several ML-based approaches for SIC in FD transceivers on two test setups with short dataset sizes, we conclude the following. Model-driven approaches, e.g., the polynomial-based canceler, can be a good choice in operating scenarios with a low transmit power; at high transmit power levels, data-driven ML approaches, e.g., the RV-TDNN-based canceler, can be a better choice.

VI. CHALLENGES AND FUTURE RESEARCH DIRECTIONS
The previous sections provided a comprehensive overview of applying ML-based approaches for SIC in FD transceivers and identified suitable SIC approaches for different system criteria. Although the works surveyed in this manuscript play a significant role in enabling ML techniques for SIC in FD transceivers, more effort is needed to adopt such techniques in practical wireless systems employing FD transmission. The following subsections delve into the main challenges of applying ML-based approaches for SIC in FD transceivers and provide a guide for future research directions.

A. CONSIDERING THE EFFECT OF SOI WHILE PERFORMING THE SIC
The existing ML-based SIC approaches consider the cancellation of the SI signal only, i.e., no signal from any remote FD or half-duplex TRP is considered. However, in practical real-time FD systems, an FD node must perform SIC while an SoI from another TRP is being received and demodulated. Initial works in [127], [130] investigated the joint detection of the SI and SoI and showed that an NN-based SI canceler helps to enhance signal demodulation. Despite the potential of these works, several issues remain to be addressed, and detecting the SoI while performing the SIC is open to improvements from both performance and complexity perspectives. For instance, all ML-based approaches surveyed in this manuscript are trained and verified using time-domain samples, i.e., they work entirely in the time domain. However, if the SoI uses a frequency-domain modulation format, e.g., OFDM, the SIC could be performed in the frequency domain. This would resemble the demodulation pilots of fifth-generation new radio or future 6G systems (uplink or downlink demodulation reference signals), which occupy specific time and frequency resources [127]. Thus, adapting ML-based SIC approaches to work with frequency- rather than time-domain samples is a direction for future investigation.

B. TACKLING THE TIME-VARYING SI CHANNELS
The existing ML-based SIC approaches use offline-trained ML algorithms to estimate the SI signal over a static SI channel. However, in practice, movements of user equipment and TRPs and/or environmental changes can make the SI channel vary over time, and the ML algorithms may then need to be retrained to track it. As shown in Fig. 10, some ML algorithms require long training times, i.e., they cannot be retrained quickly enough during FD transmission, which can lead to significant performance degradation. Initial works in [102], [131] investigate SI cancellation under time-varying SI channels. However, these works are preliminary, and the topic remains open to improvements from both performance and complexity perspectives. For instance, applying reinforcement and online learning to iteratively track the time-varying SI channel is a possible future research direction. How performance and/or complexity scale when reinforcement and online learning are employed can also be studied in future work.

C. APPLYING ML APPROACHES FOR SIC IN FD MIMO SYSTEMS
The ML-based SIC approaches surveyed in this work are trained and verified using a single-input single-output (SISO) FD testbed. However, in recent communication standards, MIMO has become a basic transmit/receive scheme, so extending these ML-based SIC techniques from SISO to MIMO FD transceivers is imperative. Under MIMO operation, the complexity of SIC approaches typically grows rapidly, since each of the N receive antennas is interfered with by all M transmit antennas. A straightforward way to process several SI signals in the digital domain is to use separate SI cancelers, each considering the interfering signals from all transmit antennas; however, this results in excessive complexity. Alternative approaches can be designed to address this issue. For instance, exploiting the spatial correlation between the MIMO channels to build a common SI canceler, rather than separate cancelers, could reduce the impractical computational complexity of traditional MIMO SIC approaches and is a direction for future investigation [132].

D. TRAINING COMPLEXITY OF ML-BASED SIC APPROACHES
The computational complexity of the existing ML-based SIC approaches is typically evaluated and compared in terms of the FLOPs required in the inference stage, i.e., after training has been completed. However, estimating the training complexity (in terms of FLOPs) is also crucial, especially for ML algorithms intended for online learning, as described in Section VI-B. For instance, the numbers of FLOPs required for backpropagation in NNs, for approximating the unknown function via optimization in SVRs, and for solving the low-rank tensor decomposition problem in TC-based cancelers should be quantified. This would provide insights into the feasibility of applying ML-based approaches for SIC in real-time FD transceivers.

VII. CONCLUSIONS
In this paper, we have surveyed the up-to-date contributions in applying ML approaches for SIC in FD transceivers. Our review found that ML-based interference cancellation in FD transceivers was initially performed with traditional approaches, such as NNs and SVRs. Advanced ML approaches, such as TC, TensorFlow graphs, and RFFs integrated with online learning, have also been employed for SIC. Further, ML approaches proven in other disciplines, such as DR, GMMs, DU, LL, and APSM, have been used to model the SI in FD transceivers. Following the survey, we provided a case study that evaluates the prominent ML-based approaches over short dataset sizes on two test setups with different transmit power levels. Specifically, we assessed the prominent data-driven ML-based approaches in terms of SIC, PSD, training time, memory storage, and computational complexity, and compared them with model-driven approaches, e.g., the polynomial-based canceler. We then evaluated the efficiency of the different SIC approaches on these metrics to select a suitable approach for given system requirements. This study found that model-driven approaches, e.g., the polynomial-based canceler, could be a good choice when a low transmit power is used (i.e., low non-linearity exists). However, at high transmit power (i.e., high non-linearity exists), data-driven ML-based approaches, e.g., the RV-TDNN-based canceler, could be a better choice. Finally, we identified the research gaps in applying ML approaches for SIC in FD transceivers, paving the way for future research directions such as considering the SoI effect, extending to MIMO FD transceivers, and tackling time-varying SI channels.

FIGURE 1. Organization of this paper.

FIGURE 2. ML-based FD system model with linear and non-linear digital cancellation stages.

FIGURE 4. ML-based approaches for SIC in FD transceivers.

FIGURE 5. NN-based approaches for SIC in FD transceivers.

FIGURE 6. SVR-based approaches for SIC in FD transceivers. The NTDSVR is trained using $\tilde{y}_{\mathrm{SI,nes}}$, which is generated after estimating the SI channel and performing inverse channel filtering. However, the RTDSVR and OF-TDSVR are trained using $\tilde{y}_{\mathrm{SI,nl}}$, which is generated after linear SI estimation and reconstruction [108].

FIGURE 8. SIC of different ML-based SI cancelers compared to the polynomial canceler over the first and second datasets.

FIGURE 10. Training time of different ML-based SI cancelers compared to the polynomial canceler over the first and second datasets.

FIGURE 12. FLOPs of different ML-based SI cancelers compared to the polynomial canceler over the first and second datasets.

TABLE 1. Summary of ML-Based Approaches Applied for SIC in FD Transceivers
VOLUME 5, 2024Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.

TABLE 2. Measurement Setup Specifications

To post-process the captured data at the PC, Python 3.7.5 is installed in a Windows environment, with Spyder 5.1.5 as the integrated development environment for developing, comparing, and evaluating the different ML-based SIC approaches.¹⁰ Finally, to analyze the performance of the various ML-based approaches at different dataset sizes, we split each of the above-mentioned datasets into four separate datasets containing 2000, 3000, 4000, and 5000 samples, respectively. In all test cases, the first 90% of the samples are used for training (and validation, if any), while the last 10% are reserved for testing. The specifications of the measurement setup employed in this work are detailed in Table 2.
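The chronological 90/10 split described above, together with the time-delay input vectors that memory-based cancelers consume, can be sketched as follows. The helper names, the integer percentage argument, and the zero-padding at the start of the record are illustrative assumptions. Stacking real and imaginary parts gives 2·M real features per sample.

```python
def split_dataset(x, y, train_pct=90):
    """Chronological split: the first train_pct% of samples are used for training
    (and validation), and the rest for testing. No shuffling, so the test
    block is the tail of the record."""
    cut = (len(x) * train_pct) // 100  # integer arithmetic avoids float rounding
    return (x[:cut], y[:cut]), (x[cut:], y[cut:])

def delay_line_features(x, memory):
    """Real-valued inputs of length 2*memory per sample:
    [Re x[n], ..., Re x[n-M+1], Im x[n], ..., Im x[n-M+1]],
    zero-padded at the start so that every sample has a feature vector."""
    padded = [0j] * (memory - 1) + list(x)
    feats = []
    for n in range(len(x)):
        taps = padded[n:n + memory][::-1]  # x[n], x[n-1], ..., x[n-M+1]
        feats.append([t.real for t in taps] + [t.imag for t in taps])
    return feats
```

Keeping the split chronological means the test block is never seen during training, even though consecutive samples share delay-line taps.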