From Torch to Projector: Fundamental Tradeoff of Integrated Sensing and Communications

Sensing and communications (S&C) have historically been developed in parallel. In the recent decade, they have been evolving from separation to integration, giving rise to the integrated sensing and communications (ISAC) paradigm, which has been recognized as one of the six key 6G usage scenarios. Despite the plethora of research works dedicated to ISAC signal processing, the fundamental performance limits of S&C remain widely unexplored in an ISAC system. In this tutorial paper, we attempt to summarize the recent research findings in characterizing the performance boundary of ISAC systems and the resulting S&C tradeoff from an information-theoretical viewpoint. We begin with a folklore "torch metaphor" that depicts the resource competition mechanism of S&C. Then, we elaborate on the fundamental capacity-distortion (C-D) theory, indicating the incompleteness of this metaphor. Towards that end, we further elaborate on the S&C tradeoff by discussing a special case within the C-D framework, namely the Cramér-Rao bound (CRB)-rate region. In particular, S&C have preference discrepancies over both the subspace occupied by the transmitted signal and the adopted codebook, leading to a "projector metaphor" complementary to the ISAC torch analogy. We also present two practical design examples by leveraging the lessons learned from fundamental theories. Finally, we conclude the paper by identifying a number of open challenges.


I. INTRODUCTION
ISAC has recently been recognized by the International Telecommunication Union (ITU) as one of the vertices supporting the 6G hexagon of usage scenarios [1]. By relying on unified hardware platforms and radio waveforms, such integration enables resource-efficient cooperation between S&C subsystems, supporting various emerging applications including vehicular-to-everything (V2X) networking, extended reality (XR), and digital twins.
Early endeavours in ISAC system design aim to extend the capability of existing infrastructures, and hence developed into sensing-centric and communication-centric paradigms. Representative sensing-centric schemes include communicating using radar sidelobes and the permutation degrees of freedom (DoF) in the waveform-antenna mapping of multiple-input multiple-output (MIMO) radars [2]. Communication-centric schemes are exemplified by sensing relying on orthogonal frequency-division multiplexing (OFDM) waveforms [3]. These schemes do not target the optimal S&C performance. To push the benefit of ISAC (termed the "integration gain" [4]) to its limit, joint design schemes have emerged more recently [5], which aim at conceiving novel joint signaling strategies from the ground up, capable of accomplishing both tasks simultaneously. A natural, but easily overlooked question that arises is: What is the fundamental limit of the integration gain?
If we focus back on conventional individual S&C systems, we see that their fundamental performance limits originate from the resource budget. For example, the celebrated Shannon capacity formula for the scalar Gaussian channel exactly expresses the dependency of the communication performance on the available power and bandwidth. However, the performance limit of ISAC systems is vastly different. In essence, the problem of ISAC system design is a multi-objective optimization problem. To elaborate, separated S&C design can be viewed as a special case of ISAC design, namely one employing a time-sharing strategy. By utilizing the synergies between S&C, more favorable performance can be achieved. However, barring some rare occasions, it is highly unlikely that S&C would achieve their optimal performance at the same time, suggesting the existence of a fundamental S&C tradeoff in an ISAC system. Such a tradeoff may be framed by the Pareto boundary of the multi-objective ISAC system design problem.
For decades, S&C have been regarded as an information-theoretical "odd couple" that are mutually intertwined in profound ways [6]. At large, the S&C tradeoff may be understood from the perspective of a signal-and-system duality. For instance, let us consider a simple linear Gaussian model given by Y = HX + Z, (1) with X, Y, H and Z being the transmitted ISAC signal, the received signal, the channel, and the noise, respectively. For communication, the task is to decode the message encoded in X, whereas for sensing, the task is to extract information from the channel H. In light of this, one may view H as the "transmitted signal" from the environment to be sensed, while viewing X as the "channel" that this "signal" passes through.
While the connection between estimation and information theories has been well studied in the context of, e.g., the I-MMSE equation [7], the fundamental tradeoff still remains widely open in general. To this end, this article summarizes current understandings of the different components of the S&C tradeoff in a coherent manner, through the prism of information theory [8]-[12]. We commence with a folklore "torch metaphor" depicting the resource competition between S&C subsystems, followed by the general capacity-distortion theory suggesting the incompleteness of this metaphor. We then introduce the CRB-rate region, which clearly indicates that the S&C tradeoff is two-fold: apart from resources, S&C subsystems also have different preferences on the input distribution. For each component of this tradeoff, we provide design examples illustrating its practical implications. Finally, we conclude this paper with some open challenges.

II. THE TORCH METAPHOR
The fundamental S&C tradeoff in ISAC has been exemplified (implicitly or explicitly) by the "torch metaphor" in some works [5], [13], as illustrated in Fig. 1. In this picture, the role played by the ISAC base station (BS) resembles a child holding a torch: if she points the torch towards the communication user, the user receives the message, while the sensing target is left in the dark and hence can hardly be seen. On the other hand, if the target is maximally illuminated, the user receives very weak signals, resulting in highly noise-corrupted messages. This metaphor is intuitive, and it provides us with the following basic understandings of the S&C tradeoff:
• Power allocation across orthogonal or quasi-orthogonal dimensions (in the case of Fig. 1 it is space, or angle, but it could be time or frequency) offers an immediate way to trade off communication and sensing performance;
• Since we are using a unified signal, when the communication user and the sensing target are "close to each other", the S&C tradeoff becomes less prominent.
Inspired by these intuitions, most existing research contributions on ISAC system design focus on power allocation in a generalized sense, including beamforming in MIMO systems [5] and subcarrier power allocation in OFDM systems [14]. Despite the effectiveness of these techniques, we may still wonder whether the torch metaphor depicts the full picture of the S&C tradeoff.
Recently, it has been found that the torch metaphor does cover the full picture when the sensing task is to detect the presence of a potential target, under a specific choice of sensing performance metrics [12]. Specifically, in this scenario, during the n-th channel use, the ISAC BS determines the absence/presence of the target (represented by a state η ∈ {0, 1}) based on an echo Y_{s,n}, while the user aims to decode the information conveyed in the transmitted ISAC signal X_n based on its received signal Y_{c,n}. Both the echo and the user-received signal are contaminated by circularly symmetric Gaussian noise, denoted by Z_{s,n} and Z_{c,n}, respectively. Such a scenario may be characterized as

Y_{c,n} = H_c X_n + Z_{c,n}, Y_{s,n} = η H_s X_n + Z_{s,n}, (2)

where n = 1, 2, ..., N, N is the coding block length, H_c denotes the communication channel, and H_s represents the target response matrix (also referred to as the sensing channel [10]-[12]). For communication, a natural performance metric is the communication rate given by

R = lim_{N→∞} (log M_N)/N, (3)

under the assumption that the decoding error probability vanishes as N tends to infinity, where M_N denotes the size of the communication codebook. For sensing, the performance metric chosen in [12], [15] is the error exponent defined as

E = −lim_{N→∞} (log δ_N)/N,

where δ_N denotes the maximum detection error probability over all codewords and target states, namely

δ_N = max_{η∈{0,1}} max_{1⩽w⩽M_N} Pr{η̂ ≠ η | W = w}.

Under the aforementioned setting, the set constituted by all achievable (R, E) pairs (termed the rate-exponent region) is shown to be characterized by [12]

R ⩽ log det(I + σ_c^{-2} H_c R_X H_c^H), E ⩽ (4σ_s^2)^{-1} tr(H_s R_X H_s^H), (4)

where R_X = E{X_n X_n^H} is the statistical covariance matrix of the transmitted ISAC signal, satisfying a power budget constraint Tr{R_X} ⩽ P. From (4) we observe a concrete version of the torch metaphor. Apparently, the communication-optimal R_X has its eigenvectors matched to those of H_c, with the eigenvalues determined by the water-filling power allocation [16]. By contrast, the sensing-optimal R_X is aligned with the dominant eigenvector (possibly having multiplicity larger than 1) of H_s
[12]. Therefore, as long as the eigenspaces of H_s and H_c do not match, or the water-filling strategy does not concentrate the power on the dominant eigenvector, there will be an S&C tradeoff. Furthermore, (4) also gives us an explicit sense in which the communication user and the sensing target are "close to each other": their distance may be characterized by a measure of discrepancy between the "communication subspace" span(H_c) and the "sensing subspace", i.e., the dominant eigenvector of H_s. In light of this, such a tradeoff is termed the subspace tradeoff (ST) in [10], and thus the torch metaphor may be represented concisely as the statement

S&C tradeoff = ST. (5)

To further reveal the nature of the ST, let us consider a tangible example, in which the BS is equipped with collocated transmit and receive uniform linear antenna arrays (half-wavelength spacing) of size N_s = M = 10, the user has a single antenna (i.e., N_c = 1), and the sensing target is point-like. In this scenario, the communication and sensing channels are given by

H_c = α_c a^T(θ_c), H_s = α_s a(θ_s) a^T(θ_s),

where α_c and α_s denote the scalar channel coefficients, θ_c and θ_s denote the bearing angles of the user and the target relative to the BS, respectively, while a(θ) is the array steering vector given by a(θ) = [1, e^{jπ sin(θ)}, e^{j2π sin(θ)}, ..., e^{jπ(N_s−1) sin(θ)}]^T. Correspondingly, the characterization of the rate-exponent region (4) can then be simplified into the scalarized form

R_X(λ) = arg max_{R_X ⪰ 0, tr(R_X) ⩽ P} [ λ log(1 + σ_c^{-2} H_c R_X H_c^H) + (1 − λ)(4σ_s^2)^{-1} tr(H_s R_X H_s^H) ], (6)

where λ ∈ [0, 1] is a parameter controlling the S&C tradeoff.
The communication and sensing signal-to-noise ratios (SNRs) may be expressed as SNR_c = |α_c|^2 P/σ_c^2 and SNR_s = |α_s|^2 P/σ_s^2, respectively. Using (6), we may readily obtain the rate-exponent regions for given configurations of (θ_c, θ_s), as portrayed in Fig. 2. Note that the closeness between the S&C subspaces in this scenario is characterized by the intersection angle between a(θ_c) and a(θ_s). We may observe from Fig. 2 that the S&C tradeoff becomes more prominent as the angular separation between the target and the user increases. More intuitively, as can be seen from Fig. 3, in the θ_c = 60°, θ_s = 90° scenario, the sensing-optimal and communication-optimal beam patterns (corresponding to λ = 0 and λ = 1, respectively) have a large overlap. By contrast, in the θ_c = 60°, θ_s = −30° scenario, the sensing- and communication-optimal beam patterns are almost orthogonal to each other. This corroborates our intuition about the ST: more synergy between the S&C tasks is witnessed as the corresponding subspaces become closer to each other. In the language of the torch metaphor, we may say that the "ISAC torch" in the case of Fig. 3a can simultaneously illuminate both the user and the target, while it has to apply a power-splitting strategy in the case of Fig. 3b.
Fig. 3. Beamspace illustration of the subspace tradeoff (corresponding to the rate-exponent regions in Fig. 2). The "ISAC torch" can simultaneously illuminate both the user and the target in case (a), while it has to apply a power-splitting strategy in case (b).
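The qualitative behavior of this ULA example can be sketched numerically. The snippet below is a minimal illustration rather than the exact characterization in [12]: it sweeps a simple power-splitting family R_X(λ) = λR_X^c + (1−λ)R_X^s between a maximum-ratio-transmission communication-optimal covariance and a dominant-eigenvector sensing-optimal covariance (generally suboptimal compared with solving (6) exactly), assumes unit channel gains α_c = α_s = 1, and reports the exponent as tr(H_s R_X H_s^H)/σ_s^2 up to a constant.

```python
import numpy as np

def steer(theta, n):
    # Half-wavelength ULA steering vector a(theta)
    return np.exp(1j * np.pi * np.arange(n) * np.sin(theta))

M, P, sigc2, sigs2 = 10, 1.0, 1.0, 1.0
theta_c, theta_s = np.deg2rad(60), np.deg2rad(-30)

a_c, a_s = steer(theta_c, M), steer(theta_s, M)
h_c = a_c.conj()                      # single-antenna user channel (alpha_c = 1)
H_s = np.outer(a_s, a_s)              # point-target response matrix (alpha_s = 1)

# Communication-optimal covariance: maximum-ratio transmission along h_c
Rc = P * np.outer(h_c, h_c.conj()) / np.linalg.norm(h_c) ** 2
# Sensing-optimal covariance: dominant eigenvector of H_s^H H_s
w, V = np.linalg.eigh(H_s.conj().T @ H_s)
v = V[:, -1]
Rs = P * np.outer(v, v.conj())

def rate(RX):                         # communication rate in bits per channel use
    return np.log2(1 + (h_c.conj() @ RX @ h_c).real / sigc2)

def exponent(RX):                     # detection error exponent, up to a constant
    return np.trace(H_s @ RX @ H_s.conj().T).real / sigs2

for lam in (0.0, 0.5, 1.0):           # lam = 0: sensing-optimal, lam = 1: comm-optimal
    RX = lam * Rc + (1 - lam) * Rs
    print(f"lambda={lam:.1f}  R={rate(RX):.3f}  E={exponent(RX):.3f}")
```

For the widely separated pair θ_c = 60°, θ_s = −30°, the two endpoints differ sharply; re-running with closely spaced angles shrinks the gap, matching the trend shown in Fig. 2.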
Now that we have seen that (5) holds true for the target presence detection problem under the sensing metric of the error exponent, two questions naturally arise:
• Does (5) hold in general?
• If not, what is the key condition for (5) to hold? Is it the chosen metric (error exponent), the specific type of sensing task (target detection), or both?

III. CAPACITY-DISTORTION THEORY
To answer these questions, one has to rely on a more general analytical framework, one that applies to both estimation and detection tasks and can favorably handle all reasonable sensing performance metrics. The capacity-distortion theory [8]-[10] is conceived in the hope of building such a universal framework.

A. Rate-Distortion Region
As the name suggests, the capacity-distortion theory investigates the tradeoff between the communication capacity and the sensing distortion. Originally proposed by Shannon in the context of rate-distortion theory for lossy data compression [17], distortion refers to a wide range of functions taking the form of d(η, η̂), whose inputs are the true value of some quantity (in sensing problems, the sensing parameter) η and its estimate η̂. Due to the randomness of the communication message, the ISAC waveform (codeword) X^N is also random. To reflect the sensing performance over a relatively long period of time, a common practice is to use the expectation of the distortion, instead of its instantaneous values, as the performance metric. For example, in estimation tasks, the squared Euclidean distance d(η, η̂) = ∥η − η̂∥^2 is a widely applied distortion function, whose expectation is the mean squared error (MSE). In binary (e.g., target presence) detection problems, in which the task is to determine the value of a binary variable η ∈ {0, 1}, a valid distortion function is the Hamming distance given by d(η, η̂) = η ⊕ η̂, whose expectation is related to commonly used detection metrics, namely the detection probability P_D and the false alarm rate P_FA, as follows:

E{d(η, η̂)} = Pr{η = 1}(1 − P_D) + Pr{η = 0} P_FA.

Note that for constant false-alarm rate (CFAR) detectors based on the Neyman-Pearson criterion with fixed P_FA [18], minimizing the expected Hamming distance is equivalent to maximizing the detection probability P_D. Besides capacity and distortion, there is yet another important ingredient in the capacity-distortion theory, namely the transmission cost. To elaborate, not all ISAC waveforms (or codewords) cost equally in terms of wireless resources. For example, the points in a quadrature amplitude modulation (QAM) constellation having different amplitudes yield different power consumptions. Apparently, as the overall resource budget increases, the S&C performances can be simultaneously enhanced. Therefore, one has to discuss the
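The relation between the expected Hamming distortion and (P_D, P_FA) can be checked with a quick Monte Carlo sketch. A hypothetical scalar threshold detector η̂ = 1{y > τ} is assumed below, and the prior, target amplitude, and threshold values are purely illustrative.

```python
import random
from statistics import NormalDist

random.seed(0)
p1, s, tau = 0.5, 2.0, 1.0        # prior Pr{eta=1}, target amplitude, threshold
Phi = NormalDist().cdf

# Analytic detection probability and false-alarm rate of the test y > tau
P_D = 1 - Phi(tau - s)
P_FA = 1 - Phi(tau)
analytic = p1 * (1 - P_D) + (1 - p1) * P_FA   # expected Hamming distortion

# Monte Carlo estimate of E{eta XOR eta_hat}
trials = 200_000
err = 0
for _ in range(trials):
    eta = 1 if random.random() < p1 else 0
    y = eta * s + random.gauss(0.0, 1.0)      # unit-variance Gaussian noise
    eta_hat = 1 if y > tau else 0
    err += eta ^ eta_hat
mc = err / trials
print(f"analytic={analytic:.4f}  monte-carlo={mc:.4f}")
```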
capacity-distortion tradeoff under a specific resource budget. Similar to the sensing distortion, in order to account for the randomness of the codewords, we typically use the expectation of the resource cost over all possible codewords as the measure of the transmission cost.
Once the resource budget is given and the sensing distortion metric is chosen, the S&C tradeoff is expressed in terms of the largest achievable rate-distortion region. Formally speaking, given an expected resource budget B, the rate-distortion-cost triple (R, D, B) is said to be achievable (in the infinite-block-length regime) if there exists a sequence of (2^{NR}, N) codes {X^N | N ∈ ℕ} encoding the message W ∈ {1, ..., 2^{NR}}, and a state estimator η̂ : (Y_s^N, X^N) → η̂^N, such that the following holds as N → ∞ [10]:

Pr{Ŵ ≠ W} → 0, E{(1/N) Σ_{i=1}^N d(η_i, η̂_i)} ⩽ D, E{(1/N) Σ_{i=1}^N b(X_i)} ⩽ B, (8)

where b(·) denotes the instantaneous cost of a single codeword letter. In addition, for generality, the values of the sensing parameters are allowed to vary across time, denoted by η_i at the i-th channel use, constituting a parameter sequence η^N. The capacity-distortion function given a specific resource budget B is then defined as

C(D, B) = sup{R : (R, D, B) is achievable}.

For simplicity of discussion, let us consider memoryless channels, exemplified by (2) and illustrated in Fig. 4. The effect of such channels can be expressed as a transition law p(Y_c, Y_s | X, η), which, given the channel input X and the sensing parameter η, produces a communication output Y_c and a sensing output Y_s, whereas their relationship with the channel input X can be different from the linear model (2). Moreover, the way that the sensing parameter η couples with the channel can also be different. For such channels, it is natural to consider per-block resource budgets and distortion metrics, given by

B^{(N)} = (1/N) Σ_{i=1}^N E{b(X_i)}, D^{(N)} = (1/N) Σ_{i=1}^N E{d(η_i, η̂_i)}.

The conditions (8) can then be concretized as

limsup_{N→∞} D^{(N)} ⩽ D, (11a) limsup_{N→∞} B^{(N)} ⩽ B, (11b)

together with a vanishing decoding error probability. In order to gain insights from (11), observe that when the sensing distortion constraint (11a) is absent, the capacity is given by the classical result of "capacity-with-cost" [20]

C(B) = max_{p_X(X)} I(X; Y_c | η) s.t. E{b(X)} ⩽ B. (12)

The result (12) implies that the capacity in the presence of a cost budget has a single-letter representation, which greatly simplifies further analysis. Furthermore, since the sensing channel is memoryless, the optimal estimator does not rely on historical observations, and hence the expected distortion can be written as a function of the channel input, i.e., E{d(η_i, η̂_i) | X_i = X} = c(X). In light of this, the capacity-distortion function can now be viewed as a capacity with two costs, namely [10]

C(D, B) = max_{p_X(X)} I(X; Y_c | η) (13a)
s.t. E{c(X)} ⩽ D, (13b)
E{b(X)} ⩽ B. (13c)

Remarkably, the result (13) suggests that the expected sensing distortion may alternatively be viewed as a kind of "sensing-induced communication cost". This understanding enables us to extend even further the range in which the capacity-distortion theory is applicable, to scenarios in which the sensing performance cannot be described by a proper distortion function. For example, the Cramér-Rao bound (CRB), widely used as the performance metric of estimation problems, does not depend on any specific estimator η̂. Sometimes it does not even depend on the true value of the parameter η. Therefore, the CRB is not a distortion function, but it is related to the transmitted codeword X (as will be discussed later), and hence the capacity-distortion theory can still be applied in a generalized sense.
In the aforementioned scenario, we did not consider feedback. One may wonder whether designing the ISAC codebook by relying on feedback can further enhance the S&C performance. To this end, the state-dependent memoryless channel with delayed feedback (SDMC-DF) model has been investigated [10]. The effect of an SDMC-DF can be expressed as [8]

p(Y^N, η^N | X^N) = Π_{i=1}^N p_η(η_i) p(Y_i | X_i, η_i),

where both sensing and communication rely on the same channel, which produces the output Y^N. For sensing tasks, the channel input X_i at the i-th channel use cannot be designed based on the real-time channel output Y_i or the parameter value η_i, since the design process has to be causal. Rather, one can only rely on a delayed feedback Z_{i−1}, which may be a function of the previous channel output Y_{i−1}.
It turns out that, for SDMC-DF models, (13) also applies, with the slight modification that Y_c is replaced by Y. To see this, first note that feedback does not improve the capacity of memoryless channels. As for the sensing performance, it has been shown that the optimal sensing distortion is achieved by the simple letter-wise minimum expected posterior distortion estimator [10], taking the form of

η̂_i = arg min_{η'} E{d(η_i, η') | X_i = x_i, Y_i = y_i}. (15)

We may now see that the expected distortion also admits a single-letter representation, and thus (13) still applies.
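As a concrete illustration of such a letter-wise estimator, consider the scalar Gaussian state channel Y = ηX + N with η, N ~ N(0, 1) (the toy model revisited in the next subsection). For quadratic distortion, the minimum expected posterior distortion estimator is the Gaussian posterior mean η̂ = xy/(1 + x^2), with posterior variance 1/(1 + x^2). The sketch below verifies this by simulation; the fixed input value x is illustrative.

```python
import random

random.seed(1)
x = 1.5                                  # a fixed channel input, known to the estimator
trials = 300_000
se = 0.0
for _ in range(trials):
    eta = random.gauss(0.0, 1.0)         # state to be sensed
    y = eta * x + random.gauss(0.0, 1.0) # channel output Y = eta*X + N
    # Letter-wise MMSE estimator E{eta | X = x, Y = y}: the Gaussian posterior mean
    eta_hat = x * y / (1 + x * x)
    se += (eta - eta_hat) ** 2
mse = se / trials
print(f"empirical MSE = {mse:.4f}, posterior variance = {1 / (1 + x * x):.4f}")
```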
From the aforementioned discussion, we may conclude that a key condition for (13) to hold is the memorylessness of the channel. Indeed, for channels with memory, even the capacity itself remains an open problem. Furthermore, for such channels, online and offline estimators may exhibit very different performances. These problems deserve further investigation.

B. Computing the Capacity-Distortion Boundary
From a pure theorist's perspective, the result (13) is a complete information-theoretic characterization (as opposed to the operational definition (11)) of the capacity-distortion region. But wait! Remember that our expectation of the capacity-distortion theory in the first place was that it would help us understand the nature of the S&C tradeoff, hopefully beyond the ST. However, we cannot even dig the ST itself out of (13), let alone any further insight.
To serve our purpose, a possible approach is to explicitly compute the capacity-distortion functions for some representative scenarios, in the hope of obtaining useful intuitions. It turns out that the renowned Blahut-Arimoto (B-A) algorithm [21], originally proposed for computing the unconstrained capacity, can be applied to compute capacity-distortion functions with some modifications. To elaborate, given an initialization of the trial distribution r(X) for p_X(X), the original B-A algorithm solving C = max_{p_X(X)} I(X; Y) is a fixed-point iteration repeating the following two steps in each round:
1) Update the trial distribution q(X|Y) for the a posteriori distribution p_{X|Y}(X|Y) according to

q(X|Y) = r(X) p(Y|X) / Σ_{X'} r(X') p(Y|X'); (16)

2) Update the trial distribution r(X) according to

r(X) ∝ exp{Σ_Y p(Y|X) log q(X|Y)}, (17)

which is equivalent to solving the following constrained entropy maximization problem:

max_r H(r) s.t. −Σ_X r(X) Σ_Y p(Y|X) log q(X|Y) = t

for some constant t. Now that our objective function is the conditional mutual information I(X; Y|η), we shall replace (16) with

q(X|Y, η) = r(X) p(Y|X, η) / Σ_{X'} r(X') p(Y|X', η).

In addition, since two extra constraints (13b) and (13c) are now enforced, we should replace (17) with

r(X) ∝ exp{E_η{Σ_Y p(Y|X, η) log q(X|Y, η)} − λ c(X) − μ b(X)},

where λ and μ are the Lagrangian multipliers corresponding to the constraints (13b) and (13c), respectively. From the above discussion, we get a vague sense that the sensing distortion requirements render the resulting ISAC signal X "less random", as they impose constraints on the entropy maximization problem. Of course, this is not yet a rigorous statement with a well-defined meaning. To validate our intuition, let us consider the toy example of a real-valued, single-input single-output (SISO) Gaussian SDMC-DF channel model (with Rayleigh fading)

Y_i = η_i X_i + N_i,

where η_i and N_i are mutually independent zero-mean Gaussian variables with unit variance, while X_i is the ISAC waveform satisfying the power constraint E{X_i^2} ⩽ P. We consider the perfect feedback Z_i = Y_i and the quadratic sensing distortion d(η, η̂) = (η − η̂)^2. In this scenario, the sensing-optimal estimator is the letter-wise minimum mean squared error (MMSE) estimator given by η̂_{i,MMSE} = E{η_i | X_i, Y_i}, whose MSE can be calculated as

E{(η_i − η̂_{i,MMSE})^2} = E{1/(1 + X^2)},

where X is the single-letter representation of the channel input following the probability distribution p_X(X).
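A coarse numerical sketch of this modified B-A iteration for the toy channel is given below. The distortion and power constraints are handled through fixed Lagrangian penalties λ and μ (which one would sweep to trace the whole boundary), c(x) = 1/(1 + x^2) is the letter-wise MMSE distortion derived above, and all grids and parameter values are illustrative.

```python
import numpy as np

# Grids for input X, state eta ~ N(0,1), and output Y of Y = eta*X + N
x = np.linspace(-3, 3, 41)
eta = np.linspace(-4, 4, 33)
y = np.linspace(-16, 16, 161)
w_eta = np.exp(-eta**2 / 2); w_eta /= w_eta.sum()       # discretized N(0,1) prior
dy = y[1] - y[0]

# Channel law p(y | x, eta): Gaussian with mean eta*x and unit variance
mean = eta[None, :, None] * x[:, None, None]
p_y = np.exp(-(y[None, None, :] - mean)**2 / 2) / np.sqrt(2 * np.pi) * dy

c = 1 / (1 + x**2)                                       # sensing-induced cost c(x)
b = x**2                                                 # power cost b(x)

def ba(lam, mu, iters=150):
    """Modified Blahut-Arimoto with distortion penalty lam and power penalty mu."""
    r = np.full(len(x), 1 / len(x))
    for _ in range(iters):
        # Step 1: trial posterior q(x | y, eta) = r(x) p(y|x,eta) / sum_x' ...
        q = r[:, None, None] * p_y
        q /= q.sum(axis=0, keepdims=True) + 1e-300
        # Step 2: r(x) proportional to exp(E_eta int p log q dy - lam*c - mu*b)
        expo = (p_y * np.log(q + 1e-300)).sum(axis=2) @ w_eta - lam * c - mu * b
        r = np.exp(expo - expo.max()); r /= r.sum()
    # Conditional mutual information I(X; Y | eta) in nats, and expected distortion
    py_bar = (r[:, None, None] * p_y).sum(axis=0, keepdims=True)
    ratio = (p_y + 1e-300) / (py_bar + 1e-300)
    I = (r[:, None, None] * p_y * np.log(ratio)).sum(axis=(0, 2)) @ w_eta
    return I, r @ c, r

I0, D0, _ = ba(lam=0.0, mu=0.3)
I5, D5, _ = ba(lam=5.0, mu=0.3)
print(f"lam=0: I={I0:.3f} nats, E[c]={D0:.3f};  lam=5: I={I5:.3f} nats, E[c]={D5:.3f}")
```

Increasing the distortion penalty λ pushes probability mass towards large |x| (where the posterior variance 1/(1 + x^2) is small), lowering the expected distortion at the price of the rate, which is exactly the boundary-tracing behavior described in the text.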
Under the aforementioned formulation, the capacity-distortion boundary may be numerically computed using the modified B-A algorithm, as portrayed in Fig. 5. At the communication-optimal point ①, the input distribution is Gaussian, attaining the maximum rate at the price of the largest sensing distortion. By contrast, at the sensing-optimal point ④, the input distribution corresponds to binary phase-shift keying (BPSK) modulation, for which X^2 = P holds deterministically, and hence the minimum expected distortion E{1/(1 + X^2)} = 1/(1 + P) is attained. As we move along the capacity-distortion boundary from the communication-optimal point ① to the sensing-optimal point ④, the corresponding input distribution p_X(X) exhibits a smooth transition from the Gaussian distribution to the BPSK modulation, which agrees with our intuition that the ISAC signal becomes less random as the sensing distortion requirement becomes more stringent.
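The two endpoints can be sanity-checked directly. By Jensen's inequality, E{1/(1 + X^2)} ⩾ 1/(1 + E{X^2}) = 1/(1 + P), with equality if and only if X^2 = P almost surely, i.e., for constant-modulus inputs such as BPSK. A quick Monte Carlo comparison at power P = 1:

```python
import numpy as np

rng = np.random.default_rng(0)
P = 1.0
X_gauss = rng.normal(0.0, np.sqrt(P), 500_000)            # Gaussian codebook, power P
X_bpsk = np.sqrt(P) * rng.choice([-1.0, 1.0], 500_000)    # BPSK codebook, same power

d_gauss = np.mean(1 / (1 + X_gauss**2))   # expected MMSE distortion E{1/(1+X^2)}
d_bpsk = np.mean(1 / (1 + X_bpsk**2))     # equals 1/(1+P) exactly (constant modulus)
print(f"Gaussian: {d_gauss:.4f}  BPSK: {d_bpsk:.4f}  Jensen bound: {1 / (1 + P):.4f}")
```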
Note that there is not enough room to accommodate the ST in the SISO scenario. Therefore, the S&C tradeoff in the aforementioned example follows solely from the difference in signal preference between communication and sensing tasks. Intuitively, communication tasks favor signals with a higher degree of randomness, in order to pack more information into the signal. For example, the capacity of the AWGN channel is achieved by Gaussian-distributed signals, the Gaussian distribution having the maximum entropy under a power constraint. By contrast, sensing tasks favor signals that are deterministic in some sense, in order to better distinguish the received signals (echoes) coupled with sensing parameters taking different values. Such a tradeoff is termed the deterministic-random tradeoff (DRT) [11]. Naturally, we would then wonder whether the DRT exists in more general scenarios, what effect it would exhibit, and how it would interact with the ST when the latter exists. Unfortunately, although the modified B-A algorithm is universal in principle, it is hardly applicable to general settings in practice, due to its enormous computational complexity. To elaborate, the modified B-A algorithm relies on numerical integration, which can be computationally prohibitive when the dimensionality of the signals or the sensing parameters is high. For example, if the number of samples per dimension is K, the total number of samples N_MC required by the widely used Monte Carlo integration method will be on the order of

N_MC = O(K^{N_Y + N_X + N_η}),

where N_Y, N_X and N_η are the dimensionalities of Y, X, and η, respectively. The exponentially increasing sample complexity would easily surpass the capability of most computing devices, even for small-scale MIMO systems. Furthermore, the modified B-A algorithm cannot provide analytical solutions, which would be useful for practical system design.
Considering these difficulties, to push our understanding of the S&C tradeoff further, we may need to sacrifice the generality of the capacity-distortion theory to some degree. For example, we may try to find specific distortion functions (or non-distortion sensing-induced communication costs, as implied by (13)) that are both easy to analyze and sufficiently general to reflect the ST and the DRT. In what follows, we consider one such example in detail.

IV. CRB-RATE REGION
We now shift our attention from the general distortion measure d(η, η̂) to a specific sensing metric, namely the CRB for target parameter estimation. Unlike the MSE, which depends on the specific estimator employed, the CRB serves as a global lower bound for all unbiased estimators (satisfying the regularity condition), usually leads to more tractable analytical expressions, and is achievable in the high-SNR regime [18]. Recalling the single-letterization of the expected distortion in (15), one may treat the CRB as a cost function of the transmitted codeword X, and consider the interplay between the CRB and the communication rate as a special case of the C-D tradeoff.

A. Vector Gaussian System Model
We commence by re-examining the model in (2), which may be extended to a more generic form as

Y_c = H_c X + Z_c, (18a)
Y_s = H_s(η) X + Z_s, (18b)

where the sensing channel H_s ∈ C^{N_s×M} is now defined as a deterministic, possibly nonlinear function of the sensing parameter η ∈ R^K, e.g., an angular MIMO radar channel [22]. If not otherwise specified, the transmitted codeword X ∈ C^{M×T} will be referred to as the ISAC signal matrix in this section. This model may be viewed as a special case of the generic model shown in Fig. 4. Following the preceding memoryless channel assumption, both the sensing parameter η and the communication channel H_c ∈ C^{N_c×M} vary every T symbols in an i.i.d. manner. The discussion of channels with memory (e.g., Markov channels) is left for future work. For convenience, we also assume that the ISAC Tx has perfect knowledge of H_c. Finally, Z_c and Z_s are zero-mean white Gaussian noise matrices with variances σ_c^2 and σ_s^2, respectively. At this point, it is worthwhile to concretize the separate channel models in (18) by illustrating the specific scenarios considered in this section. As shown in Fig. 6, the ISAC Tx emits a dual-functional signal X to simultaneously communicate with a single communication Rx and sense one or more targets. The sensing Rx is either collocated with the Tx (monostatic sensing) or connected to the Tx via a wired link (bistatic sensing). In both cases, both the Tx and the sensing Rx have perfect knowledge of the ISAC signal X.
Nevertheless, X is unknown to the communication Rx, as it conveys useful information intended for the communication user. We are thus tempted to model X ~ p_X(X) as a random matrix whose realization is perfectly known at both the ISAC Tx and the sensing Rx, but unknown at the communication Rx.

B. CRB with Random but Known Nuisance Parameters
Since the communication performance of (18a) can be directly measured by the mutual information I(X; Y_c | H_c), we are now in a position to rethink the sensing performance evaluation in ISAC systems. Indeed, in conventional radar systems, probing signals are typically deterministic and well designed with good ambiguity properties. In ISAC systems, on the other hand, the transmitted signal varies randomly from block to block, owing to the communication data embedded in the signal. This imposes a unique challenge in defining the CRB, which now becomes a function of the random signal X.
One possible approach would be to treat X as a nuisance parameter, and either consider it as part of the unknown sensing parameter, or integrate it out of the likelihood function describing the observations, leading to the classical hybrid or marginal CRB expressions, respectively. Nonetheless, neither of these methods grasps the fundamental feature of the ISAC system, namely that the random signal X is known to the sensing Rx; the resulting bounds are thus loose in general. To that end, we resort to a Miller-Chang type CRB [23], obtained by computing the CRB for a given instance of X and then taking the expectation over X. For any weakly unbiased estimator of η, the MSE is lower-bounded by the Miller-Chang Bayesian CRB in the form of

ϵ_MC = E_X{tr(J_{η|X}^{-1})}, (19)

where the expectation is taken with respect to X, and J_{η|X} denotes the Bayesian Fisher information matrix (BFIM) of η given by

J_{η|X} = E_{Y_s,η|X}{−∂^2 ln p(Y_s, η | X) / ∂η ∂η^T}, (20)

which decomposes into an observation-related term and a term contributed by the prior distribution p_η(η). More precisely, the BFIM J_{η|X} can be expressed as an affine map of the sample covariance matrix R̃_X = T^{-1} X X^H, namely

J_{η|X} = (T/σ_s^2) Φ(R̃_X), (21)

where Φ(·) is an affine map whose linear part has coefficients determined by the matrices F_i and G_j, and whose constant part is

J̃_P = σ_s^2 T^{-1} J_P, (22)

with the term J_P being contributed by the prior distribution p_η(η), i.e., the second term in (20). In particular, the matrices F_i and G_j are partitioned from the Jacobian matrix F := ∂vec(H_s^*)/∂η. By noting (19), it turns out that the Miller-Chang CRB is nothing but an equivalent "expected sensing distortion" as discussed in Sec. III-A (though it is not a true distortion measure), and may hence be viewed as a sensing-induced cost imposed on the signaling resources. Despite the high dimensionality, one may still deduce useful results on the CRB-rate tradeoff by exploiting the affine structure of the BFIM, as discussed in the sequel.

C. CRB-Rate Tradeoff
The CRB-rate tradeoff can be characterized by the following Pareto optimization problem

min_{p_X(X)} α ϵ_MC − (1 − α) I(X; Y_c | H_c) (23a)
s.t. E{b(X)} ⩽ B, (23b)

where α ∈ [0, 1] is a weighting factor controlling the priority of the S&C performance. We highlight here that by moving the CRB from the objective (23a) into the constraints (23b), (23) may be equivalently recast as a constrained capacity characterization problem with two cost functions, as in (13).
Needless to say, fully depicting the Pareto boundary of the CRB-rate region would incur an unaffordable computational overhead, as one has to numerically search for the optimal p_X(X) using the modified B-A algorithm discussed in Sec. III-B. To reveal fundamental insights into the CRB-rate tradeoff, we are instead interested in the two corner points of the Pareto frontier shown in Fig. 7, namely P_CS, the minimum achievable sensing CRB under the constraint of maximum communication capacity, and P_SC, the maximum achievable rate under the constraint of minimum CRB. Obviously, the line segment connecting the two points forms a time-sharing inner bound. In what follows, we briefly characterize the S&C performance at these two points.

D. P CS Performance Characterization
Let us first examine the point P_CS. For the point-to-point Gaussian channel, it is well known that the Gaussian distribution is the unique capacity-achieving input distribution (CAID) under an average power cost [24], in which case (23) reduces to a capacity characterization problem with only a power constraint by simply letting α = 0. More specifically, at P_CS, each column of X follows a circularly symmetric complex Gaussian distribution CN(0, R_X^CS) in an i.i.d. manner, where the statistical covariance matrix R_X^CS is obtained by solving the following rate maximization problem

max_{R_X ⪰ 0, tr(R_X) ⩽ P} log det(I + σ_c^{-2} H_c R_X H_c^H). (24)

It is readily observed that the optimal solution of (24) has the following eigenvalue decomposition structure

R_X^CS = U_c Λ_c U_c^H, (25)

where U_c contains the right singular vectors of H_c, and Λ_c contains the optimal eigenvalues that can be obtained from the water-filling method. Accordingly, the optimal ISAC signal structure at P_CS is

X = (R_X^CS)^{1/2} D, (26)

where the entries of D are i.i.d. and subject to CN(0, 1).
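A minimal sketch of obtaining R_X^CS by water-filling over the eigenmodes of H_c^H H_c is given below; the channel realization and parameter values are illustrative, and the bisection-based water-level search is one of several standard implementations.

```python
import numpy as np

rng = np.random.default_rng(2)
Nc, M, P, sigma2 = 4, 6, 10.0, 1.0
Hc = (rng.normal(size=(Nc, M)) + 1j * rng.normal(size=(Nc, M))) / np.sqrt(2)

g, Uc = np.linalg.eigh(Hc.conj().T @ Hc)       # eigenmodes of the comm channel
g = np.maximum(g, 0)                           # clip numerical negatives

def waterfill(g, P, sigma2):
    # Bisect the water level nu so that sum_i max(0, nu - sigma2/g_i) = P
    pos = g > 1e-12
    lo, hi = 0.0, P + sigma2 / g[pos].min()
    for _ in range(100):
        nu = (lo + hi) / 2
        p = np.where(pos, np.maximum(0.0, nu - sigma2 / np.where(pos, g, 1.0)), 0.0)
        lo, hi = (nu, hi) if p.sum() < P else (lo, nu)
    return p

p = waterfill(g, P, sigma2)
R_CS = Uc @ np.diag(p) @ Uc.conj().T           # statistical covariance at P_CS
rate = np.sum(np.log2(1 + g * p / sigma2))
print(f"allocated power = {p.sum():.4f}, rate = {rate:.3f} bits/use")
```

Note that with N_c < M the channel has rank-deficient modes that receive zero power, so R_X^CS is generally rank-deficient as well, which matters for the sensing loss bound discussed next.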
One would then raise the natural question: What is the sensing performance if a Gaussian signal is transmitted? From (19) to (22), it is clear that the CRB is determined by the sample covariance matrix R̂_X = T⁻¹XXᴴ rather than by the statistical covariance matrix R_X = E(R̂_X). At P_CS, since each column of X is i.i.d. Gaussian distributed, R̂_X follows a complex Wishart distribution. Note that tr{[Φ(R̂_X)]⁻¹} is a convex function of R̂_X. Denoting the CRB at P_CS as ϵ_CS and applying Jensen's inequality, we obtain a lower bound that holds for an arbitrarily distributed R̂_X. This suggests a certain sensing performance loss due to the Wishart-distributed R̂_X, since the Jensen lower bound is attained only when R̂_X = E(R̂_X) = R_X, which holds for a Wishart matrix only when T → ∞.
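The Jensen gap caused by the Wishart-distributed sample covariance can be verified by a short Monte Carlo experiment. Here we take Φ as the identity map, a simplifying assumption for illustration, so the sensing functional reduces to tr(R̂_X⁻¹); with R_X = I, the Jensen lower bound is M, while the known inverse-Wishart mean gives MT/(T − M):

```python
import numpy as np

rng = np.random.default_rng(0)
M, T, trials = 4, 16, 3000
# statistical covariance R_X = I, so the Jensen lower bound is tr(R_X^{-1}) = M
vals = []
for _ in range(trials):
    X = (rng.standard_normal((M, T)) + 1j * rng.standard_normal((M, T))) / np.sqrt(2)
    R_hat = X @ X.conj().T / T          # Wishart-distributed sample covariance
    vals.append(np.trace(np.linalg.inv(R_hat)).real)
mc_mean = float(np.mean(vals))
# complex inverse-Wishart mean: E[tr(R_hat^{-1})] = M * T / (T - M), which
# exceeds M for finite T and converges to M as T -> infinity
```

The simulated mean strictly exceeds the Jensen bound for finite T, matching the discussion above.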
A more nontrivial result in [11] provides an upper bound on ϵ_CS involving M_CS = rank(R_X^CS), which clearly indicates that the maximum sensing performance loss at P_CS is jointly determined by the number of sensing parameters K and the rank of R_X^CS. Note again that when T → ∞, the sensing performance is lossless, since the upper bound converges to its lower counterpart.

E. P_SC Performance Characterization
The performance characterization at P_SC is more challenging than that of P_CS, as the achieving strategy remains unknown in general. Denoting the achievable CRB at P_SC as ϵ_min and applying Jensen's inequality again, we obtain a bound that holds for any complex positive semidefinite matrix R_X satisfying the average power constraint, where R_X^SC is the solution of the deterministic CRB minimization problem (30). While problem (30) is a convex semidefinite program (SDP), it is not strictly convex; as a result, its optimal solution is not unique. In fact, all solutions of an SDP belong to the subspace spanned by the maximum-rank solution, and may hence be parameterized in terms of U_s, which consists of the eigenvectors of the maximum-rank sensing-optimal solution corresponding to the nonzero eigenvalues, and Λ_s, a positive semidefinite Hermitian matrix. Provably, (30) admits a unique solution in most situations [11], in which case the equality in (29) holds if and only if the sample covariance matrix is deterministic. That is, R̂_X becomes a deterministic matrix when the global minimum ϵ_min of the CRB is attained. One may then wonder whether any communication DoF is left in the ISAC signal at P_SC. The answer is, nontrivially, yes. This is because a deterministic R̂_X does not necessarily imply a deterministic X. The latter may still be a random signal conveying information, whose deterministic left factor is fixed by R_X^SC; the communication DoFs at P_SC are thus contributed solely by the randomness of a semi-unitary factor Q.
We are now ready to characterize the achievable communication rate at P_SC, i.e., to seek the optimal distribution p_Q(Q) over the set of all M_SC × T semi-unitary matrices, namely the Stiefel manifold S(T, M_SC), such that the mutual information I(Q; Y_c|H_c) is maximized. In the high-SNR regime, this is equivalent to solving a sphere-packing problem over the Stiefel manifold, where the optimal p_Q(Q) is the uniform distribution, leading to the asymptotic achievable rate in [11], whose residual term converges to zero as T → ∞.
Observe immediately that when T → ∞, the communication DoFs are lossless, since even a Gaussian matrix has asymptotically orthogonal rows as T increases, making it asymptotically equivalent to a semi-unitary matrix.
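Both properties, the exact orthogonality of a Haar-uniform semi-unitary Q and the asymptotic row orthogonality of a long Gaussian codeword, can be checked numerically. The QR-based sampler below is a standard recipe for drawing uniformly from the Stiefel manifold; all sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def random_semi_unitary(m, t, rng):
    """Draw Q (m x t, m <= t) uniformly on the Stiefel manifold S(t, m):
    QR-decompose a complex Gaussian matrix, then fix the phases of R's
    diagonal so that the resulting distribution is Haar-uniform."""
    G = (rng.standard_normal((t, m)) + 1j * rng.standard_normal((t, m))) / np.sqrt(2)
    Qtall, R = np.linalg.qr(G)
    Qtall = Qtall * (np.diag(R) / np.abs(np.diag(R)))  # phase correction per column
    return Qtall.conj().T                              # rows are orthonormal

M = 4
Q = random_semi_unitary(M, 16, rng)
gram_exact = Q @ Q.conj().T             # exactly the identity I_M

T_large = 4096
G = (rng.standard_normal((M, T_large)) + 1j * rng.standard_normal((M, T_large))) / np.sqrt(2)
gram_gauss = G @ G.conj().T / T_large   # approaches I_M as T grows
```

The semi-unitary sample has an exactly deterministic Gram matrix, while the Gaussian sample only approaches it, which is precisely why the DoF loss vanishes as T → ∞.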

F. The Two-fold S&C Tradeoff: A Projector Metaphor
The above results clearly demonstrate the effects of both the ST and the DRT in an ISAC system. By comparing the left-most factors of X^CS and X^SC, we see that the communication- and sensing-optimal ISAC signals should be aligned with U_c and U_s, respectively, which may be regarded as the orthonormal bases of the communication and sensing subspaces. The ST is then nothing but shifting signal power from span(U_c) to span(U_s), perfectly fitting the picture of the "ISAC torch" metaphor. More interestingly, by comparing the right-most factors of X^CS and X^SC, namely D and Q, we see that the communication- and sensing-optimal signals adopt Gaussian and semi-unitary codebooks, respectively, which again reflects the DRT. That is, as the ISAC system moves along the Pareto frontier from P_CS to P_SC, p_X(X) gradually changes from Gaussian to a less random distribution, eventually becoming the uniform distribution over semi-unitary matrices. In that sense, the DRT considered in Fig. 5 is simply a one-dimensional special case, since a semi-unitary matrix reduces to a constant-modulus signal in its scalar form, namely the BPSK modulation in Fig. 5(e).
In addition to the tradeoff between input distributions, the DRT may also be observed in the attainable communication and sensing DoFs at the two corner points. At P_CS, the communication subsystem acquires the full DoF of the Gaussian channel, namely M_CS, the rank of R_X^CS. As shown in (28), due to Gaussian signaling, the sensing subsystem suffers a DoF loss, or equivalently a reduction in the number of individual observations of the target, of up to min{K, M_CS}. At P_SC, the sensing subsystem attains the full DoF of T thanks to the deterministic sample covariance matrix. In contrast, the semi-unitary signaling incurs a communication DoF loss.
To achieve the S&C performance tradeoff, it is critical to determine the steering direction of the ISAC signal. More importantly, what kind of codebook is transmitted along that direction also matters. With the above understanding, we may now refine the "ISAC torch" metaphor into a more comprehensive picture, i.e., the "ISAC projector". As shown in Fig. 8, a child (the ISAC Tx) holds a projector and wishes to simultaneously illuminate a target (sensing) while sending an image to a receiver (communication). To form an image, the brightness of each pixel may be used to convey information; nevertheless, the dark pixels result in imperfect illumination of the target.

A. DRT: Sensing with Random Signals
The lessons learned from the DRT inspire us to rethink the practical design philosophy of ISAC systems. In particular, one has to take the randomness of communication data into account while conceiving a sensing strategy, which is a unique challenge that emerges in the context of ISAC. In this subsection, we investigate a novel precoding design for sensing with random signals.
Let us consider again the sensing model (18b), with the sensing parameters being the entries of the channel matrix H_s, i.e., η = vec(H_s). For a given instance of X, the linear minimum MSE (LMMSE) estimator admits a closed form involving the channel correlation matrix R_H = E(H_sᴴH_s), and the resulting estimation error is given in (37). Once again, the estimation error depends on the instantaneous realization of the random ISAC signal X. To depict the average sensing performance, we take the expectation of (37) over X, yielding (38). We refer to (38) as the ergodic LMMSE (E-LMMSE) [25], as it may be understood as a time average over different realizations of X. Obviously, it is lower-bounded by the Miller-Chang CRB in (19). We now investigate a specific form of the ISAC signal by letting X = WS, where W ∈ C^{M×M} is a precoding matrix, and S = [s_1, s_2, ..., s_T] ∈ C^{M×T} contains column-wise i.i.d. data symbols satisfying E(s_i) = 0 and E(s_i s_iᴴ) = I. A fundamental question to ask is: What is the optimal precoder W that minimizes the E-LMMSE? In classical MIMO radar waveform design, strictly orthogonal signals are typically employed, namely T⁻¹SSᴴ = I, where S is a (scaled) semi-unitary matrix, corresponding to the point P_SC discussed in Sec. IV.
In such a case, it is known that the LMMSE-optimal precoder has a water-filling structure [26], in which Q and Λ contain the eigenvectors and eigenvalues of R_H, respectively, and µ_0 is a constant meeting the power constraint ‖W_WF‖²_F = P_T. However, in the ISAC scenario, the water-filling solution (39) may not be optimal due to the randomness in S. Evidently, applying Jensen's inequality to (38) yields (40), and it turns out that the water-filling solution (39) minimizes the Jensen lower bound in (40), rather than ξ_Hs itself. Specifically, when T is comparable to M, and when non-unitary codebooks are employed, e.g., a Gaussian codebook (where SSᴴ follows a Wishart distribution), the orthogonality of S breaks down and the Jensen bound is no longer tight. To see this, we show an example in Fig. 9a with M = 64 and N_s = 32, where the water-filling precoder (39) is applied to both a Gaussian-distributed and a semi-unitary data matrix. The Jensen bound (the semi-unitary sensing performance) is attained only when T ⩾ 2048.
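The looseness of the Jensen bound for short blocks can be reproduced at a small scale. The sketch below assumes a row-wise i.i.d. H_s, so that the per-receive-antenna LMMSE error takes the standard form tr[(R_H⁻¹ + σ⁻²WSSᴴWᴴ)⁻¹]; the exponential correlation model, the sizes, and the identity precoder are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
M, T, sigma2, trials = 8, 12, 0.5, 2000
# hypothetical exponential channel correlation matrix R_H
R = np.array([[0.8 ** abs(i - j) for j in range(M)] for i in range(M)])
Rinv = np.linalg.inv(R)
W = np.eye(M)  # identity precoder, for illustration only

def lmmse_error(S):
    """Per-antenna LMMSE error tr[(R_H^{-1} + sigma^{-2} W S S^H W^H)^{-1}]."""
    A = Rinv + (W @ S @ S.conj().T @ W.conj().T) / sigma2
    return np.trace(np.linalg.inv(A)).real

# semi-unitary codebook: S S^H = T I exactly, so the error is deterministic
G = rng.standard_normal((T, M)) + 1j * rng.standard_normal((T, M))
S_unit = np.sqrt(T) * np.linalg.qr(G)[0].conj().T
xi_unitary = lmmse_error(S_unit)

# Gaussian codebook: only E(S S^H) = T I holds, so Jensen's inequality is strict
draws = ((rng.standard_normal((M, T)) + 1j * rng.standard_normal((M, T))) / np.sqrt(2)
         for _ in range(trials))
xi_gauss = float(np.mean([lmmse_error(S) for S in draws]))
```

For T comparable to M, the averaged Gaussian-codebook error strictly exceeds the semi-unitary one, mirroring the gap shown in Fig. 9a.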
To account for the random nature of the ISAC signal, one needs to develop novel precoders that directly minimize the E-LMMSE, rather than its Jensen lower bound. Here we briefly introduce two possible methodologies, namely data-dependent and data-independent designs, both of which outperform the water-filling precoder [25].
1) Data-Dependent Precoding: Despite its randomness, the fact that S is known at the ISAC Tx enables a precoding design based on given instances of S. Let us denote the nth realization of S as S_n, and the to-be-designed precoder as W_n. One may then directly minimize ξ_{Hs|X} based on the known S_n by solving problem (41). (To the best of our knowledge, most existing ISAC precoding literature overlooks the data randomness by assuming T⁻¹SSᴴ ≈ I.) While problem (41) is non-convex, it provably admits an optimal closed-form solution [27]. Consequently, one may minimize the E-LMMSE ξ_Hs by minimizing every instance of ξ_{Hs|X}.
2) Data-Independent Precoding: The optimality of the data-dependent precoder comes at the price of high complexity, as one has to solve for W_n for every instance S_n. To ease the computational burden, an alternative option is to conceive a data-independent precoder, where a single W is leveraged for every instance of S. This leads to a stochastic optimization problem that can be solved via the stochastic gradient descent (SGD) algorithm in an offline manner, where massive training samples may be locally generated based on the adopted communication codebook.
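A toy version of the offline, data-independent design can be sketched as projected gradient descent over a locally generated pool of codewords. The numerical gradient, correlation model, pool size, and step size below are illustrative choices rather than the algorithm of [25]:

```python
import numpy as np

rng = np.random.default_rng(3)
M, T, sigma2 = 4, 8, 1.0
P = float(M)                                   # power budget: ||W||_F^2 = P
R = np.array([[0.7 ** abs(i - j) for j in range(M)] for i in range(M)])
Rinv = np.linalg.inv(R)

def xi(W, S):
    """Per-antenna LMMSE error for a given precoder and codeword."""
    A = Rinv + (W @ S @ S.conj().T @ W.conj().T) / sigma2
    return np.trace(np.linalg.inv(A)).real

def num_grad(W, S, eps=1e-5):
    """Numerical gradient wrt the real and imaginary parts of W."""
    g = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        for d in (1.0, 1j):
            E = np.zeros_like(W); E[idx] = eps * d
            g[idx] += d * (xi(W + E, S) - xi(W - E, S)) / (2 * eps)
    return g

# offline training pool, locally generated (Gaussian codebook here)
pool = [(rng.standard_normal((M, T)) + 1j * rng.standard_normal((M, T))) / np.sqrt(2)
        for _ in range(20)]
avg_xi = lambda W: float(np.mean([xi(W, S) for S in pool]))

W = np.eye(M, dtype=complex) * np.sqrt(P / M)  # feasible starting point
xi_before = avg_xi(W)
for _ in range(50):                            # projected full-batch descent
    g = np.mean([num_grad(W, S) for S in pool], axis=0)
    W = W - 0.02 * g
    W = W * np.sqrt(P) / np.linalg.norm(W)     # project back onto the power sphere
xi_after = avg_xi(W)
```

Even this crude descent reduces the empirical E-LMMSE on the training pool while respecting the power constraint, illustrating why a randomness-aware precoder can beat designs tuned to the Jensen surrogate.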
To validate the performance of the proposed sensing precoding designs for random signals, we show their estimation errors in Fig. 9b with M = 64, T = 32, and N_s = 32, where a Gaussian codebook is again employed for generating S. As expected, both precoding designs significantly outperform the classical water-filling approach (39). In particular, the computationally expensive data-dependent design achieves better average estimation performance (a 0.4-1.4 dB gain) than its data-independent counterpart, while the latter attains a favorable performance-complexity tradeoff. This provides strong evidence that data randomness is non-negligible in ISAC signaling. To approach the S&C performance boundary, DRT-aware ISAC precoding techniques are yet to be implemented in practical ISAC systems.

B. Frequency-domain ST: Valuating Sensing Resources
Let us now turn our attention to the ST. In previous sections, we illustrated the ST using spatial-domain examples, where, roughly speaking, the ST manifests itself as a preference for directly illuminating beams, shared by both the communication user and the sensing target. By contrast, in non-spatial scenarios the ST may take a less intuitive, but also more interesting and nontrivial, form.
To see this, let us consider the example of ranging waveform design, characterized by the simple observation model (42), where y(t), s(t), and n(t) represent the received signal, the transmitted signal, and the noise with constant power spectral density (PSD) N_0, respectively. The term τ = d/c denotes the propagation delay, with c being the propagation speed and d the distance to be estimated. For this model, the CRB is given by (43), in which the key quantity is referred to as the "root-mean-square (RMS) bandwidth" [28]. What does (43) imply? Upon assuming that the signal is constrained to reside in the frequency interval [0, f_high], we find immediately that the CRB-optimal signal is in fact a sinusoid at frequency f_high, which maximizes the RMS bandwidth. The intuition behind this result is that, as long as the integer ambiguity can be resolved, ranging methods based on carrier-phase sensing yield the optimal performance, as has long been recognized in the literature of global navigation satellite system (GNSS)-based positioning [29]. Of course, if not supplemented by further information, the integer ambiguity of a single-tone signal can never be resolved; the CRB, however, is known to be unable to capture the ambiguity phenomenon. To this end, we may use the Ziv-Zakai bound (ZZB) [30]. Although it does not admit a closed-form expression, we may observe from (44) that the ZZB can reflect the ambiguity phenomenon: it is a decreasing function of (1 − R(x)). Since the normalized ACF R(x) achieves its maximum at R(0) = 1, the quantity (1 − R(x)) can be viewed as a measure of the sidelobe level. Intuitively, at a fixed noise level, a higher sidelobe level makes the sidelobes less distinguishable from the mainlobe, and hence causes larger errors.
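The RMS-bandwidth argument can be illustrated numerically. The sketch below uses the common definition β² = ∫f²S̄(f)df with the PSD S̄(f) normalized to unit energy, and compares a flat spectrum over [0, f_high] against a narrow tone near f_high; both spectra and the band edge are hypothetical examples:

```python
import numpy as np

def rms_bandwidth(f, psd):
    """beta = sqrt( sum f^2 psd(f) / sum psd(f) ), uniform frequency grid
    assumed so the grid spacing cancels in the ratio."""
    psd = psd / psd.sum()               # normalize to unit energy
    return float(np.sqrt((f ** 2 * psd).sum()))

f_high = 100e6                          # 100 MHz upper band edge (illustrative)
f = np.linspace(0.0, f_high, 20_001)

beta_flat = rms_bandwidth(f, np.ones_like(f))        # flat (rectangular) spectrum
tone = np.exp(-0.5 * ((f - f_high) / 5e3) ** 2)      # near-single-tone at f_high
beta_tone = rms_bandwidth(f, tone)
# theory: the flat spectrum gives beta = f_high / sqrt(3), while a tone at
# f_high gives beta = f_high; the delay CRB scales as 1 / beta^2
```

Concentrating all power at the band edge maximizes β and hence minimizes the CRB, which is exactly the single-tone optimum discussed above (ambiguity aside).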
With the aid of the ZZB, we are now able to understand the behavior of waveforms achieving (near-)optimal sensing performance. Numerically computed PSDs of ZZB-optimal waveforms (using the method in [30]) are plotted in Fig. 10. Observe that as the total SNR increases, the ZZB-optimal waveform refocuses its power from the low-frequency band to the high-frequency band. The reason is that when the total SNR is sufficiently high, one may effectively resolve the ambiguity caused by relatively high sidelobes, and hence the power is focused on the high-frequency band. By contrast, when the total SNR is lower, one needs to lower the sidelobe level to combat the ambiguity issue, which inevitably widens the mainlobe, leading to low-frequency waveforms. This is a remarkable observation suggesting that sensing tasks have a unique preference regarding the subspaces in which the ISAC signal resides. It stands in stark contrast to communication tasks, for which the optimal frequency-domain power allocation is the water-filling strategy. In particular, water-filling assigns more power to frequency bands with higher SNR, and may even abandon some low-SNR bands when the total power constraint is stringent, or equivalently, when the total SNR is relatively low. In the language of the ST, we may say that in the frequency domain, the communication subspace corresponds to the frequency bands with high SNR. By contrast, the sensing subspace does not depend solely on the SNR; rather, as the total SNR increases, the sensing subspace moves from the low-frequency band to the high-frequency band, as portrayed in Fig. 11.
We may obtain further insights from the perspective of resource allocation. One of the resources of communication tasks is the DoF, which manifests itself as bandwidth in the frequency domain. For communication tasks, the value of a frequency band depends only on its quality (in terms of SNR). For sensing tasks, however, the value of a frequency band depends not only on its quality but also on its location. In light of this, we may say that the DoF is not, by its nature, a sensing resource. We are thus motivated to ask the following question: What do we really mean when we say "sensing resources"?

VI. CONCLUDING REMARKS AND OPEN CHALLENGES
Unfortunately, at the moment of writing, we do not have a well-stated answer to this question. After all, in contrast to communication tasks, which always aim to deliver information, sensing tasks have vastly diverse purposes and hence may rely on different resources. Apart from this question, many important problems remain open in the context of S&C tradeoffs. To name a few: 1) How does the DRT manifest itself under generic sensing performance metrics? 2) How do we characterize the S&C tradeoff when the channels are not memoryless? What are the performances of online and offline estimators in such scenarios? 3) How do we design ISAC systems capable of achieving the entire capacity-distortion boundary (not just the corner points)? 4) Can we unify S&C performance metrics? These challenging questions remind us that there is still a long way to go before the merit of ISAC can be fully understood and utilized. Nevertheless, the ST-DRT decomposition (i.e., the "projector metaphor") is likely to serve as a useful meta-intuition in future investigations of ISAC systems: the fundamental tradeoff in ISAC manifests itself as the preference discrepancies between S&C tasks, concerning both the resources (ST) and the signal patterns (DRT).

Figure 1. Graphical illustration of the torch metaphor.

Figure 2. The rate-exponent region of the single-antenna user, point-like target scenario, with communication and sensing SNRs equal to 10 dB and 0 dB, respectively.

Figure 3. Beamspace illustration of the subspace tradeoff (corresponding to the rate-exponent regions in Fig. 2). The "ISAC torch" can simultaneously illuminate both the user and the target in case (a), while it has to apply a power-splitting strategy in case (b).

Figure 4. The structure of a generic ISAC system with memoryless channels, often considered in the capacity-distortion theory.

Figure 5. The capacity-distortion boundary of the real-valued SISO Gaussian channel scenario with B = 10 dB, as well as Pareto-optimal input distributions p_X(X) along the boundary.

Figure 6. The ISAC scenario described in (18), where the dual-functional waveform X is known to both the ISAC Tx and the sensing receiver (Rx).

Figure 8. Graphical illustration of the projector metaphor.
Here, (a) is due to the convexity of ξ_{Hs|X} with respect to SSᴴ, and (b) follows from the fact that E(SSᴴ) = T I. The equality in (a) holds only asymptotically as T/M → ∞, since T⁻¹SSᴴ ≠ I under the i.i.d. assumption on the columns of S.

Figure 9. Sensing with random signals. (a) Tightness of the Jensen bound with increasing block length. (b) Estimation performance of the DRT-aware precoding designs.

Figure 11. Frequency-domain communication and sensing subspaces (for the ranging task) in the low- and high-SNR regimes.