Forward-Aware Information Bottleneck-Based Vector Quantization: Multiterminal Extensions for Parallel and Successive Retrieval


Abstract: Consider the following setup: Through a joint design, multiple observations of a remote data source shall be locally compressed before getting transmitted via several error-prone, rate-limited forward links to a (distant) processing unit. For addressing this specific instance of the multiterminal Joint Source-Channel Coding problem, in this article, the foundational principle of the Information Bottleneck method is fully extended to obtain purely statistical design approaches, enjoying the Mutual Information as their fidelity criterion. Specifically, the forms of stationary points for two types of distributed compression schemes are characterized here. Subsequently, those acquired solutions are utilized as the centerpiece of the proposed generic, iterative algorithm, termed the Multiterminal Forward-Aware Vector Information Bottleneck (M-FAVIB), for addressing the design optimizations. Leveraging an unfolding trick, it will be proven that both distributed compression schemes fall into the category of Successive Upper-Bound Minimization, ensuring their convergence to a stationary point. Eventually, the effectiveness of the proposed compression schemes will be substantiated as well by means of numerical investigations over some typical transmission scenarios.

Index Terms: CEO problem setup, distributed quantization, information bottleneck, mutual information, noisy channels.

I. INTRODUCTION
THE Information Bottleneck (IB) method was introduced in [1] as a task-based compression technique, and its initial applications trace back to Unsupervised Learning, where it was utilized as an information-theoretic approach to Cluster Analysis [2]. Briefly, its main idea is to extract a relevant summary of data by bringing a statistically correlated variable into play that determines the meaning of relevance. The data set is then summarized such that its information content w.r.t. that target variable is mostly retained. The inevitable trade-off between the informativity and compactness of the obtained result is established by employing two Mutual Information (MI) terms, one gauging each aspect. Interested readers are referred to [3]-[6] for further insights into this Variational Principle. Particularly, [5] provides an exhaustive discussion on the information-theoretic and learning features of the IB method and points out relevant connections to some classical problems like the Wyner-Ahlswede-Körner problem [7], [8], the efficiency of investment information [9], and also the Wyner-Ziv setup [10] under the Common Reconstruction (CR) constraint [11] with the logarithmic-loss distortion [12].
The underlying setup of the IB method can be immediately exploited to address the problem of Noisy Source Coding (NSC) [13]-[16]. Specifically, utilizing the IB paradigm to compress an imperfect observation from a remote data source yields a purely statistical design approach. Besides, a certain instance of this framework tackles the problem of designing quantizers that maximize the end-to-end transmission rate for fixed source statistics, a criterion that is highly desirable in all modern transmission systems. The footprints of the IB method are already traceable in various fields of advanced data communication schemes. Those applications encompass the design of Analog-to-Digital converters for receiver front ends [17], polar code construction [18], and efficient discrete decoding concepts [19], [20] with excellent performance, to mention a few.
The multiterminal extensions of the original IB method have been considered as well (see, e.g., [21]-[27]). The underlying scenario for such distributed schemes usually deals with several noisy observations from a (set of) remote source signal(s) that have to be compressed (with possibly different rates) following specific strategies such that, collectively, the retrieved signals after decompression preserve as much information as possible about the (set of) source signal(s). In practice, it occurs rather frequently that the compressed signals at the quantizers' outputs have to be transmitted over second error-prone hops to be fed into a (distant) processing unit. Instances of such scenarios include the distributed inference sensor networks with imperfect connections to the fusion center [28], [29], the Cloud-based Radio Access Networks (Cloud-RANs) with noisy fronthaul links [30], [31], the cooperative relaying setups with the Quantize-and-Forward strategy [32], [33], and also devices with unreliable memories [34], [35]. The aforementioned applications can be subsumed under a more general framework, the Joint Source-Channel Coding (JSCC) [36], [37], wherein the impacts of the imperfect forwarding of the quantizers' outputs are taken into account within the quantization design formulation. A closer look at the pertinent literature on this topic reveals some general approaches: Among others, several techniques have been proposed for judicious assignments of binary codewords to the quantizers' output signals [38], [39] or, concentrating on the squared-error distortion, specific modifications have been suggested [40], [41] for adapting the conventional Lloyd algorithm [42] to the JSCC setup. Furthermore, in [28], [43], [44], the MI has been considered as the fidelity criterion to acquire quantization schemes maximizing the end-to-end transmission rate. An extensive review of different approaches has been provided in [45], [46]. Also, [47], [48] survey some distributed JSCC schemes over independent, multiple access, and broadcast channels.
Contributions: The full-format extension of the original IB method for NSC to uniterminal JSCC has been developed in [49], where an iterative algorithm, termed Forward-Aware Vector Information Bottleneck (FAVIB), has been proposed to tackle the underlying non-convex design problem. Herein, we concentrate on a particular instance of the multiterminal JSCC problem in which several observations from a given data source shall be compressed locally before getting transmitted over multiple noisy forward links to a (distant) processing unit. This can be reckoned as a straightforward extension of the well-known Chief Executive Officer (CEO) problem setup [50], in which the communications between the deployed agents and the CEO happen over several error-prone channels. As one tangible example out of many practical applications of such a distributed setup, one can think of achieving a highly reliable wireless transmission scheme by leveraging the spatial diversity obtained by the joint processing of the incoming signals from a densely deployed network of radio access points with overlapping coverage areas. Pursuing the IB philosophy, we then introduce the parallel and successive compression schemes, both enjoying a joint design of local quantizers and utilizing specific MI terms to quantify the informativity and compactness of their resultant outcomes. Subsequently, with a similar argument to [49], we characterize the form of stationary solutions regarding the considered distributed compression problems. Particularly, we find the stationary points obtained by the Lagrangian relaxation of the constrained optimization modeling of such problems. Thereupon, we propose a generic algorithm, the Multiterminal FAVIB (M-FAVIB), that leverages the derived stationary points to address both design optimizations. Applying an unfolding trick, we also prove that both schemes fall into the Successive Upper-Bound Minimization (SUM) framework [51], ensuring that they converge to a stationary point. In a variety of practical occasions, the proposed methods become crucial, among others, in applications where the separate employment of iterative, modern error-correcting techniques on the noisy forward links is prohibited by stringent latency constraints, or when dealing with hardware imperfections for which the separate utilization of those techniques results in a substantial overhead regarding the energy efficiency, giving rise to a plain waste of resources since they are utilized merely to take precautions against worst-case situations [35]. Further, it should be mentioned that from a purely theoretical viewpoint, since the optimality of Shannon's source and channel separation [52] does not hold in general [53], e.g., when working in a non-asymptotic blocklength regime, devising such JSCC schemes becomes relevant.
Outline: The uniterminal IB-based JSCC will be briefly discussed in Section II as a prelude to the multiterminal extensions. In Section III, a specific distributed setup that frequently appears in a diversity of applications is considered, and, over that, the parallel and successive compression schemes are introduced. Subsequently, the stationary points are characterized for both cases, and, in Section IV, a generic algorithm, the M-FAVIB, is presented to address the design optimizations together with the convergence proofs to the stationary points. The effectiveness of the devised compression schemes is corroborated in Section V by means of numerical investigations over typical transmission setups. Finally, a brief wrap-up is given in Section VI. The proofs have been relegated to the Appendix.

II. IB-BASED JSCC: A BRIEF OVERVIEW OF THE UNITERMINAL CASE
In [49], the Noisy Source Coding (NSC) scenario of the Information Bottleneck (IB) method [1] has been extended to the Joint Source-Channel Coding (JSCC) setup for quantization of a single noisy observation (scalar/vector-valued). For that, the illustrated system model in Fig. 1 has been considered. A data source, $x$, is observed via a discrete memoryless (access) channel with the transition probabilities, $p(y|x)$. The observation, $y$, is then compressed into the signal, $z$, ahead of transmission over another discrete, memoryless (forward) channel with the transition probabilities, $p(t|z)$, to a (distant) processing unit. To formulate the quantizer design problem for this setup, fully aligned with the fundamental principle of the IB paradigm, two Mutual Information (MI) terms are employed. On the one hand, the relevant information, $I(x; t)$, which is the MI between the source, $x$, and the forward channel output signal, $t$, is considered as the natural choice for gauging the informativity of the resultant outcome. On the other hand, the MI between the input and output of the compression unit, $I(y; z)$, called the compression rate, is chosen as the term that quantifies its compactness.
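The design problem is then to maximize the relevant information subject to an upper-bound, $R$, on the compression rate, i.e.,

$$p^*(z|y) = \operatorname*{argmax}_{p(z|y):\; I(y;z) \le R} I(x;t). \tag{1}$$

By the method of Lagrange Multipliers (LM), (1) can be reformulated as an unconstrained optimization (up to the mapping's validity)

$$p^*(z|y) = \operatorname*{argmax}_{p(z|y)} \; I(x;t) - \lambda I(y;z), \tag{2}$$

with $\lambda \ge 0$, denoting the counterpart relaxation of the upper-bound, $R$, in (1). Utilizing Variational Calculus, it has been shown in [49] that for stationary points of the objective functional in (2), it holds (for each pair $(y, z) \in \mathcal{Y} \times \mathcal{Z}$)

$$p(z|y) = \frac{p(z)}{\psi(y,\beta)} \exp\Big(-\beta \sum_{t} p(t|z)\, D_{\mathrm{KL}}\big(p(x|y)\,\|\,p(x|t)\big)\Big), \tag{3}$$

with $\beta = \frac{1}{\lambda}$ and $\psi(y, \beta)$ being a normalization function, ensuring the resultant mapping's validity. Further, an iterative routine, termed Forward-Aware Vector Information Bottleneck (FAVIB), has been proposed in [49] that carries out the Fixed-Point Iteration method on (3). The convergence of FAVIB to a local optimum of its objective functional has been proven as well, and it has been shown that this compression scheme enjoys inherent error-protection, capable of obviating the need for channel coding on the noisy forward link.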
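The fixed-point iteration described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the routine of [49]: it assumes the stationary form $p(z|y) \propto p(z)\exp(-\beta \sum_t p(t|z) D_{\mathrm{KL}}(p(x|y)\|p(x|t)))$, measures the divergence in nats (a different logarithm base only rescales $\beta$), and replaces a convergence test by a fixed iteration count. The names `favib`, `kl_rows`, and `mutual_information`, as well as the toy distributions, are hypothetical.

```python
import numpy as np

def mutual_information(p_ab):
    """I(a;b) in bits from a joint pmf matrix p(a,b)."""
    p_a = p_ab.sum(axis=1, keepdims=True)
    p_b = p_ab.sum(axis=0, keepdims=True)
    mask = p_ab > 0
    return float(np.sum(p_ab[mask] * np.log2(p_ab[mask] / (p_a @ p_b)[mask])))

def kl_rows(p, q, eps=1e-12):
    """D_KL(p[i] || q[k]) in nats for all row pairs, shape (len(p), len(q))."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.einsum('ix,ikx->ik', p, np.log(p[:, None, :] / q[None, :, :]))

def favib(p_xy, p_t_given_z, beta, n_iter=200, seed=0):
    """Fixed-point iteration on the assumed stationary form; returns p(z|y)."""
    rng = np.random.default_rng(seed)
    n_y = p_xy.shape[1]
    n_z = p_t_given_z.shape[0]
    p_y = p_xy.sum(axis=0)                       # p(y)
    p_x_given_y = (p_xy / p_y).T                 # rows p(x|y), shape (n_y, n_x)
    q = rng.random((n_y, n_z))
    q /= q.sum(axis=1, keepdims=True)            # random valid start p(z|y)
    for _ in range(n_iter):
        p_z = p_y @ q                            # p(z)
        p_xt = p_xy @ q @ p_t_given_z            # p(x,t)
        p_t = np.maximum(p_xt.sum(axis=0, keepdims=True), 1e-300)
        p_x_given_t = (p_xt / p_t).T             # rows p(x|t)
        d_yt = kl_rows(p_x_given_y, p_x_given_t) # D_KL(p(x|y) || p(x|t))
        d_yz = beta * d_yt @ p_t_given_z.T       # average over t ~ p(t|z)
        logits = np.log(np.maximum(p_z, 1e-300))[None, :] - d_yz
        logits -= logits.max(axis=1, keepdims=True)
        q = np.exp(logits)
        q /= q.sum(axis=1, keepdims=True)        # normalization plays the role of psi
    return q

# Toy setup (hypothetical numbers): binary source, 4-level observation, BSC forward link.
p_x = np.array([0.5, 0.5])
p_y_given_x = np.array([[0.6, 0.25, 0.1, 0.05],
                        [0.05, 0.1, 0.25, 0.6]])
p_xy = p_x[:, None] * p_y_given_x
e = 0.1
p_t_given_z = np.array([[1 - e, e], [e, 1 - e]])
q = favib(p_xy, p_t_given_z, beta=20.0)
p_xt = p_xy @ q @ p_t_given_z
print("I(x;t) =", mutual_information(p_xt), "bits; I(x;y) =", mutual_information(p_xy), "bits")
```

Because $x \leftrightarrow y \leftrightarrow z \leftrightarrow t$ forms a Markov chain, the printed relevant information can never exceed $I(x;y)$, which gives a quick sanity check on any implementation.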

III. IB-BASED JSCC: PARALLEL & SUCCESSIVE MULTITERMINAL EXTENSIONS
We consider the illustrated system model in Fig. 2 for multiterminal extensions of the uniterminal JSCC setup. Explicitly, a data source, $x$, is observed independently via $J$ discrete, memoryless (access) channels. The outputs of these channels have to be locally compressed (yet following a joint design) and then transmitted over $J$ discrete, memoryless (forward) links to a (distant) processing unit that should infer the source signal, $x$. Analogous to the uniterminal scenario, we presume that the source distribution, $p(x)$, the access channels' transition probabilities, $p(y_j|x)$, and the forward channels' statistics, $p(t_j|z_j)$, are available for all branches. We presume that the Markovian independence relation $x \leftrightarrow y_j \leftrightarrow z_j \leftrightarrow t_j$ applies per branch, $j$, and the counterpart signals of every two distinct branches are conditionally independent given the source signal, $x$.
To formulate the design problem(s) through an analogous approach to the original IB framework, one should specify the responsible terms regarding both the informativity and compactness of the resultant outcomes. In this work, we naturally choose the end-to-end transmission rate, $I(x; t_{1:J})$, as the term gauging the information preservation. In contrast, there is no natural, unique choice for the other side of the trade-off, and, indeed, different meaningful expressions can be applied. Herein, we consider two distinct constraint sets for stipulating the compactness of outcomes (extending the ideas raised in [31]). Then, for each specific choice, we characterize the form of stationary points regarding all the individual quantizer mappings. Those solutions will be utilized as the backbone of the proposed M-FAVIB algorithm to address both design optimizations for this multiterminal IB-based JSCC setup. We also provide the convergence proofs to the stationary points of the objective functionals.
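For concreteness, the end-to-end transmission rate, $I(x; t_{1:J})$, can be evaluated directly from the model statistics. The sketch below assumes the Markov structure stated above, so that each branch collapses to an effective channel $p(t_j|x)$ and the branches are conditionally independent given $x$. The function name `end_to_end_rate`, the helper `bsc`, and the toy numbers are hypothetical.

```python
import numpy as np

def end_to_end_rate(p_x, access, quantizers, forward):
    """I(x; t_1..t_J) in bits.

    access[j]     : p(y_j|x), shape (n_x, n_yj)
    quantizers[j] : p(z_j|y_j), shape (n_yj, n_zj)
    forward[j]    : p(t_j|z_j), shape (n_zj, n_tj)
    """
    # Effective per-branch channel p(t_j|x) via x -> y_j -> z_j -> t_j.
    eff = [a @ q @ f for a, q, f in zip(access, quantizers, forward)]
    joint = p_x[:, None]                      # starts as p(x), shape (n_x, 1)
    for ch in eff:
        # Conditional independence given x: multiply in the next branch
        # and flatten the growing t-tuple into one axis.
        joint = (joint[:, :, None] * ch[:, None, :]).reshape(len(p_x), -1)
    p_t = joint.sum(axis=0, keepdims=True)    # p(t_1..t_J)
    mask = joint > 0
    ratio = joint[mask] / (p_x[:, None] * p_t)[mask]
    return float(np.sum(joint[mask] * np.log2(ratio)))

def bsc(e):
    """Binary symmetric channel with crossover probability e."""
    return np.array([[1 - e, e], [e, 1 - e]])

p_x = np.array([0.5, 0.5])
one_branch = end_to_end_rate(p_x, [bsc(0.1)], [np.eye(2)], [bsc(0.05)])
two_branches = end_to_end_rate(p_x, [bsc(0.1)] * 2, [np.eye(2)] * 2, [bsc(0.05)] * 2)
print(one_branch, two_branches)
```

Adding a second branch cannot reduce the rate, since $t_1$ is a function of $(t_1, t_2)$; the example merely makes the diversity gain visible for one arbitrary parameter choice.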

A. Parallel Retrieval: Compression Without Side-Information
As the first choice of the imposed constraint set, we consider the case where individual branches are subject to different rate limitations and, from the compression perspective, no side-information is utilized when treating each particular observation, $y_j$, allowing for purely parallel processing across branches. This fully discrete scenario has been analyzed in [25] for an identical setup to the one depicted in Fig. 2, with the major difference of presuming ideal (rate-limited) forward links. There, for characterizing the form of stationary points and tackling the design problem, the Multivariate Information Bottleneck (MIB) [21] framework was tailored to the considered scenario. The derivations conducted here extend the results provided in [25] to this parallel IB-based JSCC setup. The design problem is then formulated as looking for the optimal set, $\mathcal{P}^* = \{p^*(z_1|y_1), \cdots, p^*(z_J|y_J)\}$, where

$$\mathcal{P}^* = \operatorname*{argmax}_{\mathcal{P}:\; I(y_j; z_j) \le R_j \;\forall j} I(x; t_{1:J}), \tag{4}$$

with $0 \le R_j \le \log_2 |\mathcal{Z}_j|$ bits, setting an upper-bound on the $j$-th compression rate, $I(y_j; z_j)$. Employing the LM method, the design problem (4) can be reformulated as an unconstrained optimization (up to the pertinent mappings' validity)

$$\mathcal{P}^* = \operatorname*{argmax}_{\mathcal{P}} \; I(x; t_{1:J}) - \sum_{j=1}^{J} \lambda_j I(y_j; z_j), \tag{5}$$

with $\lambda_j \ge 0$, denoting the counterpart relaxation of the upper-bound, $R_j$, in (4). Aligned with the analysis performed in [1], the following theorem characterizes the form of stationary points of the objective functional in (5).

Fig. 3. An illustrative example of parallel retrieval: Independent processing flow across branches from the codeword, $z_j$, to the retrieved one, $t_j$. Mappers are located in the compression units and demappers in the (remote) processing unit.
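Theorem 1 (Parallel IB-based JSCC): Assume the joint distribution, $p(x, y_{1:J}) = p(x) \prod_j p(y_j|x)$, the forward channels' statistics, $p(t_j|z_j)$, and $\lambda_j$ are available for all $j = 1$ to $J$. The set of local quantizers, $\{p(z_j|y_j) \,|\, \forall j\}$, is a stationary point of the parallel IB-based JSCC Lagrangian if and only if for each pair $(y_j, z_j) \in \mathcal{Y}_j \times \mathcal{Z}_j$

$$p(z_j|y_j) = \frac{p(z_j)}{\psi^{\text{Par.}}_{z_j}(y_j, \beta_j)} \exp\big(-d^{\text{Par.}}(y_j, z_j)\big), \tag{7}$$

with $\psi^{\text{Par.}}_{z_j}(y_j, \beta_j)$ being a partition (normalization) function, ensuring the pertinent quantizer mapping's validity, and for the relevant distortion, $d^{\text{Par.}}(y_j, z_j)$, it holds

$$d^{\text{Par.}}(y_j, z_j) = \beta_j \sum_{z^{-j}_{1:J}} p(z^{-j}_{1:J}|y_j) \sum_{t_{1:J}} p(t_{1:J}|z_{1:J})\, D_{\mathrm{KL}}\big(p(x|y_j, z^{-j}_{1:J}) \,\|\, p(x|t_{1:J})\big), \tag{8}$$

with $\beta_j = \frac{1}{\lambda_j}$ and $p(t_{1:J}|z_{1:J}) = \prod_{j=1}^{J} p(t_j|z_j)$, resulting from the presumed Markov properties.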
The proof has been presented in Appendix A. The solution provided in Theorem 1 incorporates the ideal forwarding scenario as a specific instance. Explicitly, for the case in which the forward channels are presumed to be rate-limited but error-free, the design problem is to maximize the overall transmission rate, which now reduces to $I(x; z_{1:J})$, under the same set of constraints on individual compression rates. Through substituting $p(t_j|z_j) = \delta_{t_j, z_j}$ for each pair $(z_j, t_j) \in \mathcal{Z}_j \times \mathcal{T}_j$ (with the Kronecker Delta notation), the relevant distortion in (8) boils down to

$$d^{\text{Par.}}(y_j, z_j) = \beta_j \sum_{z^{-j}_{1:J}} p(z^{-j}_{1:J}|y_j)\, D_{\mathrm{KL}}\big(p(x|y_j, z^{-j}_{1:J}) \,\|\, p(x|z_{1:J})\big), \tag{9}$$

which is identical to the one provided in [25]. This verifies the Backward Compatibility of our deduced solution for this (parallel) IB-based JSCC problem to its NSC counterpart.
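The reduction to the ideal forwarding case can be written out in one step. The derivation below takes as an assumption that the relevant distortion has the structure of (25) with $q(x|t_{1:J})$ replaced by $p(x|t_{1:J})$; the Kronecker deltas then collapse the sum over $t_{1:J}$ onto the single term $t_{1:J} = z_{1:J}$.

```latex
\begin{align}
d^{\text{Par.}}(y_j, z_j)
 &= \beta_j \sum_{z^{-j}_{1:J}} p(z^{-j}_{1:J}|y_j)
    \sum_{t_{1:J}} \Big[\prod_{\ell=1}^{J} \delta_{t_\ell, z_\ell}\Big]\,
    D_{\mathrm{KL}}\big(p(x|y_j, z^{-j}_{1:J}) \,\big\|\, p(x|t_{1:J})\big) \\
 &= \beta_j \sum_{z^{-j}_{1:J}} p(z^{-j}_{1:J}|y_j)\,
    D_{\mathrm{KL}}\big(p(x|y_j, z^{-j}_{1:J}) \,\big\|\, p(x|z_{1:J})\big).
\end{align}
```

The second line uses that $t_{1:J} = z_{1:J}$ with probability one under error-free links, so conditioning on $t_{1:J}$ coincides with conditioning on $z_{1:J}$.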
Coding Perspective: To obtain a clear picture of possible applications of this setup, one can consider a conventional coding scenario. For that, as depicted in Fig. 3, the processing flow from the compressed signal, $z_j$, up to the retrieved one, $t_j$, is presumed to be realized independently across branches. Each mapper allocates a bit-tuple, $\mathbf{z}_j$, to the codeword, $z_j$, and, conversely, each demapper retrieves a codeword, $t_j$, from the received bit-tuple, $\mathbf{t}_j$, at the output of a Binary Symmetric Channel (BSC). Contrary to the case of ideal forwarding, it must be noted that, here, the apposite choice of bit-tuple mappings (applied in the compression units) becomes relevant in terms of the end-to-end information preservation. The respective demappings will be performed independently across branches in the (remote) processing unit.
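To illustrate why the bit-tuple mapping matters, the sketch below builds the forward channel statistics, $p(t_j|z_j)$, induced by a mapper, a BSC, and a demapper. It rests on assumptions not spelled out above: each codeword is sent as a fixed-length bit-tuple over independent BSC uses, the mapping is a bijection onto all tuples of that length, and the demapper inverts the mapper. The function name `forward_channel_from_bits` and the two example mappings are hypothetical.

```python
import numpy as np

def forward_channel_from_bits(mapping, n_bits, e):
    """p(t|z) for codewords sent as bit-tuples over a BSC with crossover probability e.

    mapping[z] is the integer whose n_bits-bit binary expansion is the tuple
    assigned to codeword z; mapping must be a permutation of range(2**n_bits).
    """
    mapping = [int(m) for m in mapping]
    n = len(mapping)
    p = np.zeros((n, n))
    for z in range(n):
        for t in range(n):
            # Number of bit flips needed to turn the tuple of z into the tuple of t.
            d = bin(mapping[z] ^ mapping[t]).count("1")
            p[z, t] = e**d * (1 - e) ** (n_bits - d)
    return p

e = 0.1
p_natural = forward_channel_from_bits([0, 1, 2, 3], n_bits=2, e=e)  # 00, 01, 10, 11
p_gray = forward_channel_from_bits([0, 1, 3, 2], n_bits=2, e=e)     # 00, 01, 11, 10
print(p_natural[1, 2], p_gray[1, 2])
```

Under the natural mapping, codewords 1 and 2 differ in both bits, whereas under the second mapping they differ in one bit, so the confusion probability between these neighbouring clusters changes with the mapping. Since $p(t_j|z_j)$ enters the relevant distortion, the quantizer design reacts to this choice.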

B. Successive Retrieval: Compression With Side-Information
As the second choice regarding the imposed set of constraints, we again consider the case where individual branches are subject to different rate limitations. Here, contrary to the previous setup, side-information is exploited from the compression perspective when treating the observation, $y_j$. The principal idea behind this scheme is fully aligned with the well-known Wyner-Ziv setup [10] for source coding, in which a statistically correlated signal is utilized as side-information at the decoder. With the major difference of presuming ideal (rate-limited) forward channels, a completely discrete scenario has been analyzed in [27] for an identical setup to the one illustrated in Fig. 2. There, to address the design problem, the algorithm proposed in [24] for the Distributed Information Bottleneck (DIB) has been generalized to allow for various rate constraints across branches. The derivations conducted here extend the results provided in [27] to this successive IB-based JSCC setup. The design problem is then formulated as looking for the optimal set, $\mathcal{P}^* = \{p^*(z_1|y_1), \cdots, p^*(z_J|y_J)\}$, where

$$\mathcal{P}^* = \operatorname*{argmax}_{\mathcal{P}:\; I(y_j; z_j | t_{1:j-1}) \le R_j \;\forall j} I(x; t_{1:J}), \tag{10}$$

with $0 \le R_j \le \log_2 |\mathcal{Z}_j|$ bits, stipulating an upper-bound on the $j$-th conditional compression rate, $I(y_j; z_j | t_{1:j-1})$. Here as well, by employing the LM method, the design problem (10) can be restated as an unconstrained optimization (up to the pertinent mappings' validity)

$$\mathcal{P}^* = \operatorname*{argmax}_{\mathcal{P}} \; I(x; t_{1:J}) - \sum_{j=1}^{J} \lambda_j I(y_j; z_j | t_{1:j-1}), \tag{11}$$

with $\lambda_j \ge 0$ being associated with the upper-bound, $R_j$, in (10). Aligned with the analysis performed in [1], the following theorem characterizes the form of stationary points of the objective functional in (11).

Fig. 4. An illustrative example of successive retrieval: Interconnected processing flow across branches from the codeword, $z_{\pi(j)}$, to the retrieved one, $t_{\pi(j)}$, in line with the well-known principle of "binning & detection" (see, e.g., [55]).
Theorem 2 (Successive IB-based JSCC): Assume the joint distribution, $p(x, y_{1:J}) = p(x) \prod_j p(y_j|x)$, the forward channels' statistics, $p(t_j|z_j)$, and $\lambda_j$ are available for all $j = 1$ to $J$. The set of local quantizers, $\{p(z_j|y_j) \,|\, \forall j\}$, is a stationary point of the successive IB-based JSCC Lagrangian if and only if for each pair $(y_j, z_j) \in \mathcal{Y}_j \times \mathcal{Z}_j$

$$p(z_j|y_j) = \frac{p(z_j)}{\psi^{\text{Suc.}}_{z_j}(y_j, \beta_j)} \exp\big(-d^{\text{Suc.}}(y_j, z_j)\big), \tag{13}$$

with $\psi^{\text{Suc.}}_{z_j}(y_j, \beta_j)$ being a partition (normalization) function, ensuring the pertinent quantizer mapping's validity, and the relevant distortion, $d^{\text{Suc.}}(y_j, z_j)$, being given by (14), with $\beta_j = \frac{1}{\lambda_j}$ and $p(t_{1:J}|z_{1:J}) = \prod_{j=1}^{J} p(t_j|z_j)$, resulting from the presumed Markov properties.
The proof has been presented in Appendix B.
A closer look at the relevant distortion obtained in (14) reveals that, basically, it extends the solution derived in (8) for the case of parallel retrieval by two extra terms appearing due to the consideration of side-information for the compression rates of individual branches. In the upcoming section, the common structure present in the results for both parallel and successive compression schemes will be leveraged for devising a generic algorithm, the M-FAVIB, to address the respective design optimizations.
The ideal forwarding scenario is encompassed as a specific instance in the solution derived in Theorem 2. Explicitly, in the case where the forward channels are presumed to be rate-limited but error-free, the design problem is to maximize the overall transmission rate, which now reduces to $I(x; z_{1:J})$, under the adapted set of constraints on individual conditional compression rates, i.e., for the $j$-th branch it must hold that $I(y_j; z_j | z_{1:j-1}) \le R_j$. By substituting $p(t_j|z_j) = \delta_{t_j, z_j}$ for each pair $(z_j, t_j) \in \mathcal{Z}_j \times \mathcal{T}_j$ (with the Kronecker Delta notation), the relevant distortion calculated in (14) boils down to the expression in (15), which is the same as the one provided in [27]. This confirms the Backward Compatibility of our deduced solution for this (successive) IB-based JSCC problem to its NSC counterpart.
Coding Perspective: As a tangible example of possible applications for this setup, one can consider a conventional coding scenario. Denoting by $\pi(\cdot)$ a specific demapping order, as illustrated in Fig. 4, the processing flow from the compressed signal, $z_{\pi(j)}$, up to the retrieved one, $t_{\pi(j)}$, is presumed to be realized in an interconnected fashion across branches. Therefore, the major difference compared to the previous setup is that the demappings are performed in a sequential manner across branches such that the already retrieved signals, $t_{\pi(1):\pi(j-1)}$, can be utilized as available side-information when applying the pertinent demapping to retrieve $t_{\pi(j)}$, since all demapper blocks are located in the (remote) processing unit. In this fashion, contrary to the parallel setup, the correlation present between the signals of different branches will be leveraged as well.

A. Proposed Algorithmic Approach
In this section, we develop a generic, iterative algorithm to tackle both distributed compression design problems. To that end, it has to be noted that irrespective of the chosen compression scheme, the derived stationary solution for every pair $(y_j, z_j) \in \mathcal{Y}_j \times \mathcal{Z}_j$ of each individual quantizer mapping has the following implicit form

$$p(z_j|y_j) = \frac{p(z_j)}{\psi^{r}_{z_j}(y_j, \beta_j)} \exp\big(-d^{r}(y_j, z_j)\big), \tag{16}$$

with $r \in \{\text{Par.}, \text{Suc.}\}$. The right side of (16) can be interpreted as a functional featuring all the individual quantizer mappings, $\{p(z_j|y_j) \,|\, \forall j\}$, as its input arguments since they come into play when calculating the relevant distortion, $d^{r}(y_j, z_j)$. This indicates that (16) can be viewed as (with $\mathcal{F}_j$ denoting a specific functional)

$$p(z_j|y_j) = \mathcal{F}_j\big(p(z_1|y_1), \cdots, p(z_J|y_J)\big), \tag{17}$$

for the $j$-th branch. Going through all branches then yields a non-linear system that extends the structure of Multivariate Fixed-Point systems [56] to the field of functionals, wherein the functions of multiple variables are substituted by the functionals of multiple mappings. Hence, the conventional iterative procedures can be applied for solving the obtained system as well. Herein, we propose an iterative procedure with the synchronous updating rule, in line with the standard Jacobi method for solving linear systems [56].
The proposed algorithm, termed Multiterminal Forward-Aware Vector IB (M-FAVIB), with the pseudo-code presented in Alg. 1, proceeds as follows: Commencing with a set of random (yet valid) mappings, $\{p^{(0)}(z_j|y_j) \,|\, \forall j\}$, for each pair, $(y_j, z_j) \in \mathcal{Y}_j \times \mathcal{Z}_j$, the updates are executed (till convergence) as

$$p^{(i+1)}(z_j|y_j) = \frac{p^{(i)}(z_j)}{\psi^{r}_{z_j}(y_j, \beta_j)} \exp\big(-d^{r}(y_j, z_j)\big), \tag{18}$$

with $i$ representing the iteration counter. The quantizer output probability, $p^{(i)}(z_j)$, and the relevant distortion, $d^{r}(y_j, z_j)$, in (18) are calculated by using the current versions of all the individual quantizer mappings, $\{p^{(i)}(z_j|y_j) \,|\, \forall j\}$, to marginalize the joint distribution, $p^{(i)}(x, y_{1:J}, z_{1:J}, t_{1:J})$. The updates are performed synchronously, i.e., at iteration $i + 1$, all the individual quantizer mappings of different branches, $\{p^{(i+1)}(z_j|y_j) \,|\, \forall j\}$, will be updated based on the previous configuration of the same set, i.e., $\{p^{(i)}(z_j|y_j) \,|\, \forall j\}$. In the following part, it will be proven that irrespective of the chosen compression scheme (i.e., parallel or successive retrieval), the M-FAVIB algorithm converges to a stationary point of the objective functional. Since such a stationary point is not necessarily a global optimum, as a commonly used workaround for avoiding poor results, the aforementioned procedure can be repeated with different starting points, $\{p^{(0)}(z_j|y_j) \,|\, \forall j\}$, with the aim of retaining the best outcome. It should be noted that, as its name suggests, the M-FAVIB algorithm is directly applicable to the case of Vector Quantization (VQ), following the same line of argumentation that has already been provided in [49] for its uniterminal counterpart.

Alg. 1 Multiterminal Forward-Aware Vector IB (M-FAVIB)
Input: $p(x, y_{1:J})$, $\beta_j > 0$, $p(t_{1:J}|z_{1:J})$, conv. parameter $\varepsilon > 0$
Output: A (soft) partition $z_j$ of $\mathcal{Y}_j$ into (at most) $|\mathcal{Z}_j|$ clusters
Initialization: $i = 0$, random mappings $\{p^{(0)}(z_j|y_j) \,|\, \forall j\}$
while True do
    for j = 1 : J do
        • find the $i$-th update for distributions in $d^{r}(y_j, z_j)$ from ...
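The synchronous update can be sketched for the parallel scheme with $J = 2$ branches. This is an illustrative reading of the procedure, not the authors' implementation: it assumes the relevant distortion has the form given in (25) with $q(x|t_{1:J}) = p(x|t_{1:J})$, measures divergences in nats, and runs a fixed number of iterations instead of testing against the convergence parameter. The names `branch_distortion`, `m_favib_parallel`, `bsc`, and all numerical values are hypothetical.

```python
import numpy as np

def branch_distortion(p_x, A_own, A_oth, q_oth, F_own, F_oth, p_x_t, beta, eps=1e-12):
    """Relevant distortion d(y_own, z_own); p_x_t is p(x|t_own, t_oth) as [x, t_own, t_oth]."""
    p_zo_x = A_oth @ q_oth                                   # p(z_oth | x)
    joint = np.einsum('x,xy,xz->xyz', p_x, A_own, p_zo_x)    # p(x, y_own, z_oth)
    p_yz = joint.sum(axis=0)                                 # p(y_own, z_oth)
    p_x_yz = joint / np.maximum(p_yz, eps)                   # p(x | y_own, z_oth)
    p_z_y = p_yz / p_yz.sum(axis=1, keepdims=True)           # p(z_oth | y_own)
    log_p = np.log(np.maximum(p_x_yz, eps))
    log_q = np.log(np.maximum(p_x_t, eps))
    # D_KL(p(x|y,z') || p(x|t,t')) for every (y, z', t, t'), in nats.
    kl = (np.einsum('xyz,xyz->yz', p_x_yz, log_p)[:, :, None, None]
          - np.einsum('xyz,xtu->yztu', p_x_yz, log_q))
    # Average over z' ~ p(z'|y), t ~ p(t|z_own), t' ~ p(t'|z').
    return beta * np.einsum('yz,yztu,kt,zu->yk', p_z_y, kl, F_own, F_oth)

def m_favib_parallel(p_x, A, F, betas, n_z, n_iter=100, seed=0):
    """Jacobi-style fixed-point iteration for two branches; returns [p(z_1|y_1), p(z_2|y_2)]."""
    rng = np.random.default_rng(seed)
    q = [rng.random((A[j].shape[1], n_z[j])) for j in range(2)]
    q = [m / m.sum(axis=1, keepdims=True) for m in q]        # random valid start
    for _ in range(n_iter):
        eff = [A[j] @ q[j] @ F[j] for j in range(2)]         # p(t_j | x)
        p_xt = np.einsum('x,xt,xu->xtu', p_x, eff[0], eff[1])
        p_x_t = p_xt / np.maximum(p_xt.sum(axis=0), 1e-12)   # p(x | t_1, t_2)
        new_q = []
        for j in range(2):
            o = 1 - j
            pxt_j = p_x_t if j == 0 else p_x_t.transpose(0, 2, 1)
            d = branch_distortion(p_x, A[j], A[o], q[o], F[j], F[o], pxt_j, betas[j])
            p_z = (p_x @ A[j]) @ q[j]                        # p(z_j)
            logits = np.log(np.maximum(p_z, 1e-300))[None, :] - d
            logits -= logits.max(axis=1, keepdims=True)
            m = np.exp(logits)
            new_q.append(m / m.sum(axis=1, keepdims=True))
        q = new_q   # synchronous: every branch was updated from the previous configuration
    return q

def bsc(e):
    return np.array([[1 - e, e], [e, 1 - e]])

p_x = np.array([0.5, 0.5])
A = [np.array([[0.6, 0.25, 0.1, 0.05], [0.05, 0.1, 0.25, 0.6]]),
     np.array([[0.5, 0.3, 0.15, 0.05], [0.05, 0.15, 0.3, 0.5]])]
F = [bsc(0.05), bsc(0.1)]
q_opt = m_favib_parallel(p_x, A, F, betas=[20.0, 20.0], n_z=[2, 2])
print(np.round(q_opt[0], 3))
print(np.round(q_opt[1], 3))
```

Collecting the new mappings in `new_q` and swapping them in only after the inner loop is what makes the update synchronous. Overwriting `q[j]` inside the loop would instead give a Gauss-Seidel-style sweep, which is a different scheme from the one described above. In practice, the routine would be restarted from several seeds and the outcome with the largest objective value kept.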

B. Proof of Convergence
In this part, it will be shown that the design optimizations for parallel and successive schemes can be addressed by an alternating minimization w.r.t. the set of all quantizer mappings, $\mathcal{P}$, and another set of apposite auxiliary probability distributions, $\mathcal{Q}$, by introducing a tight variational upper-bound, $\tilde{F}^{r}(\mathcal{P}, \mathcal{Q})$, on the respective objective functional, $F^{r}(\mathcal{P})$. The focal update step of M-FAVIB can then be interpreted as merging together the updates of $\mathcal{P}$ and $\mathcal{Q}$. Through this unfolding trick, one principally shows that M-FAVIB falls into the category of Successive Upper-Bound Minimization (SUM)⁹ [51], ensuring that it converges to a stationary point. Similar approaches have been presented in [24] and [26] for the convergence proofs of their proposed algorithms.

⁸ Per iteration, for every realization, $y_j \in \mathcal{Y}_j$, of the access channel output, the sum of the terms calculated in (18) (ignoring $\psi^{r}_{z_j}$) over all output bins, $z_j \in \mathcal{Z}_j$, acts as the normalization (partition) function, $\psi^{r}_{z_j}(y_j, \beta_j)$, to ensure the validity of the updated mapping.
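The mechanism behind such a variational upper-bound can be checked numerically on its information-preservation part. For any auxiliary conditional distribution $q(x|t)$, the cross-entropy $-E_{x,t}\{\log q(x|t)\}$ upper-bounds $H(x|t)$, with equality iff $q(x|t) = p(x|t)$; the gap is an averaged Kullback-Leibler divergence. The sketch below uses a single output $t$ and random distributions purely for illustration; the function names are hypothetical.

```python
import numpy as np

def conditional_entropy(p_xt):
    """H(x|t) in bits from a joint pmf p(x,t) with rows indexed by x."""
    p_t = p_xt.sum(axis=0, keepdims=True)
    p_x_given_t = p_xt / p_t
    mask = p_xt > 0
    return float(-np.sum(p_xt[mask] * np.log2(p_x_given_t[mask])))

def cross_entropy_bound(p_xt, q_x_given_t):
    """-E_{x,t}[log2 q(x|t)] for a strictly positive auxiliary pmf q(x|t)."""
    return float(-np.sum(p_xt * np.log2(q_x_given_t)))

rng = np.random.default_rng(1)
p_xt = rng.random((3, 4))
p_xt /= p_xt.sum()                                   # joint pmf p(x,t)
q = rng.random((3, 4))
q /= q.sum(axis=0, keepdims=True)                    # arbitrary auxiliary q(x|t)

h = conditional_entropy(p_xt)
loose = cross_entropy_bound(p_xt, q)                 # bound with an arbitrary q
tight = cross_entropy_bound(p_xt, p_xt / p_xt.sum(axis=0, keepdims=True))  # q = p(x|t)
print(h, loose, tight)
```

Minimizing the bound over $q$ for fixed mappings therefore recovers the true posterior, which is the role of the $\mathcal{Q}$-update, while minimizing over the mappings for fixed $q$ gives the $\mathcal{P}$-update.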
Lemma 2: $\tilde{F}^{\text{Par.}}(\mathcal{P}, \mathcal{Q})$ is separately convex in $\mathcal{P}$ and $\mathcal{Q}$.
Proof: It follows directly from the application of the log-sum inequality [54].
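Lemma 3: For a fixed $\mathcal{P}$, there exists a unique $\mathcal{Q}$ that minimizes $\tilde{F}^{\text{Par.}}(\mathcal{P}, \mathcal{Q})$, given by

$$q^*(x|t_{1:J}) = p(x|t_{1:J}), \tag{23}$$

where $p(x|t_{1:J})$ is calculated from $\mathcal{P}$.
Proof: It follows directly from the proof of Lemma 1.

Lemma 4: For a fixed $\mathcal{Q}$, there exists a $\mathcal{P}$ that minimizes $\tilde{F}^{\text{Par.}}(\mathcal{P}, \mathcal{Q})$, given by

$$p^*(z_j|y_j) = \frac{p(z_j)}{\tilde{\psi}^{\text{Par.}}_{z_j}(y_j, \beta_j)} \exp\big(-\tilde{d}^{\text{Par.}}(y_j, z_j)\big) \quad \forall (y_j, z_j) \in \mathcal{Y}_j \times \mathcal{Z}_j, \tag{24}$$

with $\tilde{\psi}^{\text{Par.}}_{z_j}(y_j, \beta_j)$ being a partition function, ensuring the pertinent quantizer mapping's validity, and the relevant distortion, $\tilde{d}^{\text{Par.}}(y_j, z_j)$, is calculated by

$$\tilde{d}^{\text{Par.}}(y_j, z_j) = \beta_j \sum_{z^{-j}_{1:J}} p(z^{-j}_{1:J}|y_j) \sum_{t_{1:J}} p(t_{1:J}|z_{1:J})\, D_{\mathrm{KL}}\big(p(x|y_j, z^{-j}_{1:J}) \,\|\, q(x|t_{1:J})\big). \tag{25}$$

⁹ The principal idea of the SUM is to optimize a sequence of approximate objective function(al)s (satisfying certain mild assumptions [51]), rather than directly optimizing the original non-convex and/or non-smooth objective function(al).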
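Proof: It follows from the same line of reasoning as in the proof of Theorem 1, noting that

$$\frac{\delta\, E_{x, t_{1:J}}\{\log q(x|t_{1:J})\}}{\delta p(z_j|y_j)} = p(y_j) \sum_{z^{-j}_{1:J}} p(z^{-j}_{1:J}|y_j) \sum_{t_{1:J}} p(t_{1:J}|z_{1:J}) \sum_{x} p(x|y_j, z^{-j}_{1:J}) \log q(x|t_{1:J}). \tag{26}$$

Merging together the optimal results for $\mathcal{Q}$ and $\mathcal{P}$ from the last two Lemmas, one directly obtains the focal update step (7)-(8).

Successive Retrieval: The design optimization in (11) can be reformulated as minimizing the functional

$$F^{\text{Suc.}}(\mathcal{P}) = \sum_{j=1}^{J} \lambda_j I(y_j; z_j) - \sum_{j=2}^{J} \lambda_j I(z_j; t_{1:j-1}) + H(x|t_{1:J}), \tag{27}$$

over $\mathcal{P}$. Defining the set of auxiliary probability distributions $\mathcal{Q} = \{q(z_2|t_1), \cdots, q(z_J|t_{1:J-1}), q(x|t_{1:J})\}$ and the functional $\tilde{F}^{\text{Suc.}}(\mathcal{P}, \mathcal{Q})$ in (28), Lemma 5 states that $F^{\text{Suc.}}(\mathcal{P}) \le \tilde{F}^{\text{Suc.}}(\mathcal{P}, \mathcal{Q})$, where the equality holds iff $q(z_j|t_{1:j-1}) = p(z_j|t_{1:j-1})$ for $j = 2$ to $J$ and $q(x|t_{1:J}) = p(x|t_{1:J})$.

Lemma 6: $\tilde{F}^{\text{Suc.}}(\mathcal{P}, \mathcal{Q})$ is separately convex in $\mathcal{P}$ and $\mathcal{Q}$.
Proof: It follows directly from the application of the log-sum inequality [54].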
Proof: It follows directly from the proof of Lemma 5.

Lemma 8: For a fixed $\mathcal{Q}$, there exists a $\mathcal{P}$ that minimizes $\tilde{F}^{\text{Suc.}}(\mathcal{P}, \mathcal{Q})$, given by

$$p^*(z_j|y_j) = \frac{p(z_j)}{\tilde{\psi}^{\text{Suc.}}_{z_j}(y_j, \beta_j)} \exp\big(-\tilde{d}^{\text{Suc.}}(y_j, z_j)\big) \quad \forall (y_j, z_j) \in \mathcal{Y}_j \times \mathcal{Z}_j, \tag{32}$$

with $\tilde{\psi}^{\text{Suc.}}_{z_j}(y_j, \beta_j)$ being a partition function, ensuring the pertinent quantizer mapping's validity, and the relevant distortion, $\tilde{d}^{\text{Suc.}}(y_j, z_j)$, is calculated by (33).
Proof: It follows from the same line of reasoning as in the proof of Theorem 2, noting (34) and, in the case of $n > j$, the functional derivative of $E_{z_n, t_{1:n-1}}\{\log q(z_n|t_{1:n-1})\}$ w.r.t. $p(z_j|y_j)$ in (35).

The focal update step (13)-(14) of M-FAVIB regarding any of the involved quantizer mappings is directly obtained by merging together the optimal results for $\mathcal{Q}$ and $\mathcal{P}$ from the last two Lemmas.¹⁰ Hence, the convergence of M-FAVIB to a stationary point of the objective functional is ensured by [51].

C. Supplementary Mathematical Discussion
$^{10}$ Regarding the second component in (33), replacing $q(z_j|t_{1:j-1})$ by $p(z_j|t_{1:j-1}) = \frac{p(t_{1:j-1}, z_j)}{p(t_{1:j-1})}$, the log term inside can be rewritten through $\log \frac{p(t_{1:j-1}, z_j)}{p(t_{1:j-1})\,p(z_j)} = \log p(t_{1:j-1}|z_j) - \log p(t_{1:j-1})$. Furthermore, the term $\sum_{t_{1:j-1}} p(t_{1:j-1}|y_j) \log p(t_{1:j-1})$ can be ignored since it does not depend on $z_j$ and, therefore, gets absorbed into the respective normalization function.

For the parallel retrieval, applying (7)-(8) as the central update step of the M-FAVIB, in the limit of letting $\beta_j \to 0$, the design problem (5) boils down to minimizing the $j$-th compression rate, $I(y_j; z_j)$ (presuming fixed $p(z_\ell|y_\ell)$ and $\lambda_\ell$ for all $\ell = 1$ to $J$ and $\ell \neq j$), w.r.t. the $j$-th quantizer mapping, $p(z_j|y_j)$. In that case, each realization, $y_j \in \mathcal{Y}_j$, is allocated to all output bins, $z_j \in \mathcal{Z}_j$, equiprobably (state of full diffusion). In this fashion, the input and output of the $j$-th compression unit become statistically independent, and the respective compression rate, $I(y_j; z_j)$, reaches its global minimum, i.e., zero. Further, for finite $\beta_j$ values, stochastic (soft) mappings, $p(z_j|y_j)$, are engendered in general, while in the asymptotic case of letting $\beta_j \to \infty$, the partition function, $\psi^{\mathrm{Par.}}_{z_j}(y_j, \beta_j)$, for each realization, $y_j$, allocates all the probability mass to the specific bin that yields the minimum $d^{\mathrm{Par.}}_{\beta_j}$ value and, hence, induces the quantizer mapping to become deterministic (hard), i.e., $p(z_j|y_j) \in \{0, 1\}$ for each pair $(y_j, z_j) \in \mathcal{Y}_j \times \mathcal{Z}_j$ (state of full concentration).

To rationalize this, by presuming fixed $p(z_\ell|y_\ell)$ and $\lambda_\ell$ for all $\ell = 1$ to $J$ and $\ell \neq j$, and by letting $\beta_j \to \infty$ ($\lambda_j \to 0$), the design optimization (5) boils down to maximizing the overall transmission rate, $I(x; t_{1:J})$, w.r.t. the $j$-th quantizer mapping, $p(z_j|y_j)$, which is a convex maximization task. To discern this, it should be noted that $I(x; t_{1:J})$ is convex w.r.t. $p(t_{1:J}|x)$ for a fixed $p(x)$ [54]. Furthermore, $p(t_{1:J}|x)$ and $p(t_j|x)$ are related through an affine transform, and $p(t_j|x)$ is, in turn, related to the quantizer mapping through

$$p(t_j|x) = \sum_{y_j \in \mathcal{Y}_j} \sum_{z_j \in \mathcal{Z}_j} p(t_j|z_j)\, p(z_j|y_j)\, p(y_j|x), \quad (37)$$

which is also an affine transform. This, in turn, concludes the proof of the claimed proposition. Resorting to a well-known theorem from convex maximization theory, asserting that a convex function defined over a closed and convex set attains its global maximum at an extreme point of that set [58, Ch. 4], it is directly inferred that it suffices to focus on deterministic mappings. To realize this, one may recall that the space of valid mappings, $p(z_j|y_j)$, is a closed, convex polytope generated by the Cartesian product of $|\mathcal{Y}_j|$ probability simplices [59]. The extreme points of this polytope occur at its corners, corresponding to the Cartesian product of the corners of its constituent probability simplices, yielding a deterministic mapping per extreme point.
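The two limiting regimes of $\beta_j$ discussed above can be illustrated with a minimal sketch. It assumes a generic Gibbs-type update, $p(z|y) \propto p(z)\exp(-\beta\, d(y,z))$, with an explicit factor $\beta$ and a fixed, uniform $p(z)$. The distortion table `d` holds hypothetical toy values and is not the distortion term of (7)-(8); the function name `gibbs_mapping` is likewise illustrative.

```python
import numpy as np

def gibbs_mapping(p_z, d, beta):
    """Return p(z|y) proportional to p(z) * exp(-beta * d[y, z]); every row sums to one."""
    logits = np.log(p_z)[None, :] - beta * d
    logits -= logits.max(axis=1, keepdims=True)  # stabilize the exponentials
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

# Hypothetical toy values: 3 input realizations y, 2 output bins z.
p_z = np.array([0.5, 0.5])
d = np.array([[0.2, 1.0],
              [0.9, 0.1],
              [0.4, 0.6]])

soft = gibbs_mapping(p_z, d, beta=1e-9)  # beta -> 0: full diffusion
hard = gibbs_mapping(p_z, d, beta=1e3)   # beta -> infinity: full concentration
print(np.round(soft, 3))
print(np.round(hard, 3))
```

For very small `beta`, every row approaches the (here uniform) marginal, so the mapping carries no information about `y`. For very large `beta`, each row places its mass on the bin with the smallest distortion entry.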
For the successive retrieval, analogous behavior is observed as well. Specifically, applying (13)-(14) as the central update step of the M-FAVIB (presuming fixed $p(z_\ell|y_\ell)$ and $\lambda_\ell$ for all $\ell = 1$ to $J$ and $\ell \neq j$) and letting $\beta_j \to 0$ ($\lambda_j \to \infty$), the design problem (11) boils down to minimizing the $j$-th conditional compression rate, $I(y_j; z_j|t_{1:j-1})$, w.r.t. the $j$-th quantizer mapping, $p(z_j|y_j)$. Like before, this is addressed by allotting each realization, $y_j \in \mathcal{Y}_j$, to all output clusters, $z_j \in \mathcal{Z}_j$, equiprobably (state of full diffusion), resulting in the global minimum (i.e., zero) of the respective conditional compression rate, $I(y_j; z_j|t_{1:j-1})$, since it is non-negative and upper-bounded by $I(y_j; z_j)$. Similarly, for finite $\beta_j$ values, stochastic mappings, $p(z_j|y_j)$, are engendered in general, while in the asymptotic case of letting $\beta_j \to \infty$, as before, the partition function, $\psi^{\mathrm{Suc.}}_{z_j}(y_j, \beta_j)$, induces the quantizer mapping, $p(z_j|y_j)$, to become deterministic (state of full concentration).

To justify this, a similar line of reasoning as in the previous case is applicable. Specifically, by presuming fixed $p(z_\ell|y_\ell)$ and $\lambda_\ell$ for all $\ell = 1$ to $J$ and $\ell \neq j$, and by letting $\beta_j \to \infty$ ($\lambda_j \to 0$), the design optimization (11) boils down to maximizing an objective$^{11}$ that comprises non-negatively weighted terms of the form $\lambda_n I(z_n; t_{1:n-1})$ w.r.t. the $j$-th quantizer mapping, $p(z_j|y_j)$, which can be shown to be a convex maximization task. To that end, it suffices to demonstrate that $I(z_n; t_{1:n-1})$ is convex w.r.t. $p(z_j|y_j)$, since $\lambda_n$ is non-negative and the sum of several convex functions is convex as well. $I(z_n; t_{1:n-1})$ is a convex function of $p(t_{1:n-1}|z_n)$ for a fixed $p(z_n)$ [54]. Noting that the relation between $p(t_{1:n-1}|z_n)$ and $p(z_j|y_j)$, established by (39), is also an affine transform, in turn, concludes the proof of the claimed proposition.
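The convexity property invoked in both arguments (for a fixed input distribution, the mutual information is a convex function of the channel) can be spot-checked numerically. The sketch below is only a sanity check under assumed toy dimensions, not a proof: the two random channels `A` and `B`, the alphabet sizes, and the helper name `mutual_information` are all illustrative.

```python
import numpy as np

def mutual_information(p_x, channel):
    """I(x; t) in bits for an input pmf p_x and a row-stochastic channel p(t|x)."""
    joint = p_x[:, None] * channel
    denom = p_x[:, None] * joint.sum(axis=0)[None, :]
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / denom[mask])))

rng = np.random.default_rng(0)
p_x = np.full(4, 0.25)                   # fixed input distribution
A = rng.dirichlet(np.ones(3), size=4)    # random channel p_A(t|x)
B = rng.dirichlet(np.ones(3), size=4)    # random channel p_B(t|x)

# Convexity: I(a*A + (1-a)*B) <= a*I(A) + (1-a)*I(B) for every a in [0, 1].
gaps = []
for a in np.linspace(0.0, 1.0, 11):
    chord = a * mutual_information(p_x, A) + (1 - a) * mutual_information(p_x, B)
    gaps.append(chord - mutual_information(p_x, a * A + (1 - a) * B))
print(min(gaps) >= -1e-12)
```

Since an affine map of the quantizer mapping preserves convexity, a non-negative gap along every chord is what the argument above relies on.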

A. (Forward-) Awareness Vs. Unawareness
Regarding the depicted system model in Fig. 2, we presume equiprobable source signals from a standard 16-QAM (Quadrature Amplitude Modulation) constellation ($\sigma_x^2 = 10$) over $J = 3$ branches. The access link between the source and each compression unit has been modeled as a discrete memoryless channel, approximating a discrete-time, discrete-input, continuous-output Additive White Gaussian Noise (AWGN) channel with the noise variance, $\sigma_n^2$. To acquire the transition probability matrices, rather than prequantizing the output signal, 160 samples have been generated per branch, following a pure Monte Carlo approach. Denoting by $N$ the allowed number of quantizers' output clusters, we consider a symmetric $N \times N$ forward channel model per branch that is characterized by the reliability parameter, $\theta$, in the following fashion: Each input symbol is received correctly with the probability $1 - \theta$ and erroneously (as each of the other output symbols) with the probability $\frac{\theta}{N-1}$. Therefore, higher $\theta$ values indicate less reliable transmissions and vice versa.$^{12}$ Further, note that for a specific reliability value, $\theta$, the transition probabilities, $p(t_j|z_j)$, will be influenced by the particular choice of $N$. We consider a totally symmetric setup (having the same access channel noise variance, $\sigma_n^2$, and the same forward channel reliability value, $\theta$, across branches), and set all $\beta_j$ values (for $j = 1, 2, 3$) to 100. Then, the end-to-end transmission rate, $I(x; t_{1:3})$, is calculated as the performance indicator.
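As a minimal sketch of the two building blocks of this setup, the snippet below constructs the symmetric $N \times N$ forward channel from $\theta$ and checks the average power of a 16-QAM grid. The coordinate set $\{\pm 1, \pm 3\}$ is an assumption chosen because it is consistent with $\sigma_x^2 = 10$. The capacity is computed as $\log_2 N$ minus the entropy of one row, which is the standard result for symmetric channels. All function names are illustrative.

```python
import numpy as np

def symmetric_channel(n, theta):
    """n x n matrix p(t|z): correct w.p. 1 - theta, each wrong symbol w.p. theta / (n - 1)."""
    channel = np.full((n, n), theta / (n - 1))
    np.fill_diagonal(channel, 1.0 - theta)
    return channel

def capacity_bits(n, theta):
    """Capacity of the symmetric channel: log2(n) minus the entropy of one row."""
    row = symmetric_channel(n, theta)[0]
    row = row[row > 0]  # skip zero entries (0 * log 0 = 0)
    return float(np.log2(n) + np.sum(row * np.log2(row)))

# Assumed 16-QAM grid with coordinates in {-3, -1, 1, 3}.
levels = np.array([-3.0, -1.0, 1.0, 3.0])
qam16 = np.array([a + 1j * b for a in levels for b in levels])
avg_power = float(np.mean(np.abs(qam16) ** 2))

print(avg_power)
for n in (2, 4, 6):
    print(n, round(capacity_bits(n, 0.1), 4))
```

For the same $\theta$, both the off-diagonal entries and the capacity change with $N$, which is the dependence on $N$ noted in the text.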
The required quantization is performed by the proposed M-FAVIB (parallel retrieval) and by the baseline MultiIB algorithm from [25]. The aim is to check whether integrating the forward channels' effects into the quantizers' design problem yields a performance gain over the case wherein the error-prone forward channels are totally neglected and one simply aims at maximizing $I(x; z_{1:3})$. Since these algorithms are randomly initialized, to be fair, we choose an identical set of starting points, $\{p^{(0)}(z_j|y_j)\,|\,\forall j\}$, for both approaches. To avoid poor results, each method has been repeated 100 times, and the best outcome has been retained. The obtained curves are illustrated in Fig. 5.
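The comparison protocol (identical random starting points for both methods, 100 repetitions, best outcome retained) can be sketched as follows. This is only an outline under stated assumptions: `design` and `score` are hypothetical callables standing in for a quantizer design routine and for the evaluation of the end-to-end rate; neither is reproduced here. The sizes $|\mathcal{Y}_j| = 160$ and $N = 4$ are likewise assumed for illustration.

```python
import numpy as np

def random_mappings(rng, sizes, n_bins):
    """One random row-stochastic p(z_j|y_j) per branch; sizes[j] is the number of inputs y_j."""
    return [rng.dirichlet(np.ones(n_bins), size=n_y) for n_y in sizes]

def best_of_restarts(design, score, inits):
    """Run `design` from every starting point and keep the highest-scoring outcome."""
    results = [design(init) for init in inits]
    scores = [score(result) for result in results]
    best = int(np.argmax(scores))
    return results[best], scores[best]

# A seeded generator makes the starting points reproducible, so the very same
# list `inits` can be handed to both design routines under comparison.
rng = np.random.default_rng(seed=0)
inits = [random_mappings(rng, sizes=[160, 160, 160], n_bins=4) for _ in range(100)]
```

Passing the same `inits` list to both routines ensures that any observed gap stems from the design criterion rather than from the initialization.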
Specifically, for two distinct values of the access channels' noise variance, namely, $\sigma_n^2 = 0.15, 0.20$ (with $I(x; y_{1:3}) \approx 4$ bits), the allowed number of output clusters (per branch), $N$, has been varied from 2 to 6, and the overall transmission rate, $I(x; t_{1:3})$, has been calculated for four choices of the forward channels' reliability value, namely, $\theta = 0, 0.1, 0.2, 0.3$. For the particular case of $\theta = 0$, corresponding to error-free forward channels, the M-FAVIB (parallel retrieval) and the MultiIB [25] algorithms yield the same result. For the other three cases, corresponding to error-prone forward channels, it is directly observed that, compared to the MultiIB algorithm that completely ignores the imperfect forwarding of the quantizers' output signals, the proposed M-FAVIB provides larger end-to-end transmission rates as the allowed number of quantizers' output clusters increases. This substantiates the fact that integrating the forward channels' effects into the design problem of the compression scheme is, indeed, beneficial. This was to be expected, noting that for the M-FAVIB, the compressed signals are designed such that, in addition to capturing well the information from the remote source, they also account for the errors occurring over the imperfect forward links, while for the MultiIB, by ignoring the impacts of the forward channels, they are designed solely to preserve information about the source. Comparing the respective curves of the M-FAVIB with the pertinent results for the specific case of error-free forward channels, $\theta = 0$, reveals that by increasing the number of output levels, the information loss occurring over imperfect forward channels is steadily reduced.

$^{12}$ The capacity of this symmetric $N \times N$ forward channel model is equal to $\log_2 N + (1 - \theta)\log_2(1 - \theta) + \theta \log_2 \frac{\theta}{N-1}$ bits [60].

Fig. 6. End-to-end transmission rate, $I(x; t_{1:3})$, vs. allowed number of bins, $N$; equiprobable 16-QAM signaling ($\sigma_x^2 = 10$), discrete AWGN access channels with the noise variance, $\sigma^2_{n_j}$, for branch $j$, symmetric $N \times N$ forward channels with the reliability value, $\theta_j$, for branch $j$, $\beta_j = 100$ for $j = 1, 2, 3$, the convergence parameter, $\varepsilon = 10^{-3}$.
Through a more detailed inspection of the provided results in Fig. 5, it is observed that, contrary to the baseline curves, the performance of the M-FAVIB remains almost the same for the chosen values of the access channels' noise variance, $\sigma_n^2$. The reason is that, in the interplay between the influencing parameters, i.e., the access channels' noise variance and the forward channels' reliability parameter, the predominant limiting factor is the reliability of the forward channels. This does not happen for the baseline curves, as they do not take the imperfections of the forward channels into account. The performance gap between the baseline and the M-FAVIB stems from the fact that the baseline totally ignores the forwarding effects and only tries to maximize $I(x; z_{1:3})$. Thus, it does not fully leverage the available resources (the allowed forward rates) in the sense of attempting to preserve as much end-to-end rate, $I(x; t_{1:3})$, as possible. The saturation (flattening effect) occurring over the baseline curves can be understood analogously. The lower the access channels' noise variance, $\sigma_n^2$, the lower the number of quantizers' output clusters required to come quite close to the maximum supportable value of $I(x; z_{1:3})$ and, consequently, the sooner the saturation effect sets in, thus ending up with lower end-to-end transmission rates, $I(x; t_{1:3})$.
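How the forward channels reduce the end-to-end rate relative to the error-free case can be made concrete with a small numerical sketch. It assumes that the branches are conditionally independent given $x$ (as in a CEO-type setup) and uses a hypothetical toy configuration (binary source, $J = 2$ branches, identity quantizers) instead of the 16-QAM scenario of the simulations. All function names are illustrative.

```python
from functools import reduce
import numpy as np

def mutual_information(p_x, channel):
    """I(x; out) in bits for an input pmf p_x and a row-stochastic channel p(out|x)."""
    joint = p_x[:, None] * channel
    denom = p_x[:, None] * joint.sum(axis=0)[None, :]
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / denom[mask])))

def end_to_end_rate(p_x, access, quantizers, forward):
    """I(x; t_1, ..., t_J), assuming the branches are conditionally independent given x."""
    # Per-branch channel p(t_j|x) = sum over y_j, z_j of p(y_j|x) p(z_j|y_j) p(t_j|z_j).
    branches = [a @ q @ f for a, q, f in zip(access, quantizers, forward)]
    # Combine the branches into p(t_1, ..., t_J | x) via a row-wise outer product.
    joint_channel = reduce(
        lambda acc, ch: np.einsum("xa,xb->xab", acc, ch).reshape(acc.shape[0], -1),
        branches,
    )
    return mutual_information(p_x, joint_channel)

def symmetric_channel(n, theta):
    """n x n channel: correct w.p. 1 - theta, each wrong symbol w.p. theta / (n - 1)."""
    channel = np.full((n, n), theta / (n - 1))
    np.fill_diagonal(channel, 1.0 - theta)
    return channel

# Hypothetical toy setup: binary source, J = 2 branches, identity quantizers.
p_x = np.array([0.5, 0.5])
access = [symmetric_channel(2, 0.1)] * 2   # p(y_j|x)
quantizers = [np.eye(2)] * 2               # p(z_j|y_j)
rate_clean = end_to_end_rate(p_x, access, quantizers, [symmetric_channel(2, 0.0)] * 2)
rate_noisy = end_to_end_rate(p_x, access, quantizers, [symmetric_channel(2, 0.2)] * 2)
print(f"theta = 0.0: {rate_clean:.4f} bits, theta = 0.2: {rate_noisy:.4f} bits")
```

With error-free forward channels, the computed value coincides with $I(x; z_{1:J})$; with $\theta > 0$, it drops, in line with the data processing inequality. A forward-aware design targets this end-to-end quantity directly.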

B. Joint Vs. Separate Design
In this part, we substantiate the fact that the joint design of local quantizers is, indeed, beneficial compared to the simplest approach of applying independent (separately designed) local quantizers across branches. For that, we consider the same setup as that of the previous part but with different (i.e., asymmetric) parameter specifications. Explicitly, we presume equiprobable source samples from a 16-QAM constellation. The access links are modeled as discrete memoryless channels (160 samples per branch), approximating discrete-time, discrete-input, continuous-output AWGN channels with the particular noise variance, $\sigma^2_{n_j}$, for the $j$-th branch. The forward links are modeled as $N \times N$ symmetric channels, each characterized by the particular reliability parameter, $\theta_j$, for the $j$-th branch. We set all $\beta_j$ values (for $j = 1, 2, 3$) to 100 and calculate the end-to-end transmission rate, $I(x; t_{1:3})$, as the performance indicator, when varying the allowed number of quantizers' output clusters, $N$, from 2 to 6. Here, we perform the same investigations for the M-FAVIB (parallel retrieval) and for the case in which, per branch, the FAVIB algorithm proposed in [49] has been employed to maximize $I(x; t_j)$ for $j = 1, 2, 3$, separately. To be fair, like before, the same set of starting points, $\{p^{(0)}(z_j|y_j)\,|\,\forall j\}$, has been used for both approaches, and to avoid poor results, each method has been repeated 100 times with the best outcome retained. The obtained results are illustrated in Fig. 6.

C. Parallel Vs. Successive Retrieval
In the last part, we again consider the same setup as before, but with completely symmetric specifications regarding both the access and forward channels (analogous to the first round of investigations in subsection V-A), i.e., an equiprobable 16-QAM source signaling over $J = 3$ branches, each featuring a discrete memoryless access channel (160 samples per branch), approximating a discrete-time, discrete-input, continuous-output AWGN channel with the noise variance, $\sigma_n^2$ (same value across branches), and a symmetric $N \times N$ forward channel model with the reliability parameter, $\theta$ (same value across branches). The quantizers' allowed number of output clusters is fixed to $N = 4$, and the $\beta_j$ values (for $j = 1, 2, 3$) are varied from 1.5 to 3.5. The end-to-end transmission rates and the forward (compression) sum rates have been calculated for the M-FAVIB with both the parallel and the successive retrieval. The obtained results are illustrated in Fig. 7.
Explicitly, in Fig. 7a, we fixed the noise variance of the access channels to $\sigma_n^2 = 0.25$ and varied the reliability value of the forward channels, namely, $\theta = 0.05, 0.10, 0.15$. Conversely, in Fig. 7b, we fixed the reliability value of the forward channels to $\theta = 0.05$ and varied the access channels' noise variance, namely, $\sigma_n^2 = 0.25, 0.50, 0.75$ (with $I(x; y_{1:3}) \approx 4, 3.95, 3.75$ bits, respectively). As the main takeaway, it can be immediately observed from both results that, under identical specifications, the utilization of the available side-information can decrease the overall forward rate required for supporting a certain level of the end-to-end transmission rate, compared to the parallel scheme wherein the correlations among different forward channels' outputs are totally neglected. This can be rationalized by noting that, due to the presumed Markovian properties, conditioning on the previous forward channels' outputs can either reduce the current unconditional compression rate or keep it unchanged, since it holds that $I(y_j; z_j|t_{1:j-1}) = I(y_j; z_j) - I(z_j; t_{1:j-1})$, and the MI is non-negative.
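The identity used in the last step follows from the chain rule of the mutual information. The short derivation below is a sketch that assumes the Markov chain $t_{1:j-1} \to y_j \to z_j$, i.e., that $z_j$ is generated from $y_j$ alone.

```latex
\begin{align}
I(y_j, t_{1:j-1}; z_j) &= I(t_{1:j-1}; z_j) + I(y_j; z_j \,|\, t_{1:j-1}) \\
                       &= I(y_j; z_j) + \underbrace{I(t_{1:j-1}; z_j \,|\, y_j)}_{=\,0 \text{ (Markov chain)}} \\
\Rightarrow \quad I(y_j; z_j \,|\, t_{1:j-1}) &= I(y_j; z_j) - I(z_j; t_{1:j-1}) \;\le\; I(y_j; z_j).
\end{align}
```

The first two lines expand the same quantity in the two possible orders, and equating them gives the stated relation. The inequality then holds because $I(z_j; t_{1:j-1}) \ge 0$.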

VI. SUMMARY
We concentrated on a particular multiterminal Joint Source-Channel Coding problem wherein several noisy observations from a remote source are compressed locally (yet following a joint design) prior to getting forwarded to a (distant) processing unit over multiple error-prone channels. Herein, we adapted the fundamental idea of the Information Bottleneck method and proposed two distributed compression schemes. For that, we obtained the form of stationary points regarding the individual local quantizers and also presented a generic iterative algorithm, the M-FAVIB, which extends the schemes MIB [25] and FAVIB [49]. Applying an unfolding trick, we linked the proposed algorithm to the Successive Upper-Bound Minimization, thereby providing the proof of convergence to a stationary point. After an in-depth analysis of the proposed method, we also substantiated its effectiveness by means of numerical simulations.

B. Proof of Theorem 2
The successive IB-based JSCC Lagrangian, $\mathcal{L}_{\mathrm{Suc.}}$, in (12) is a functional of all the individual local quantizer mappings, $\{p(z_j|y_j)\,|\,\forall j\}$. To arrive at a stationary point of it, its derivative w.r.t. every quantizer mapping, $p(z_j|y_j)$, must be equated to zero. Associating a Lagrange multiplier, $\lambda_{y_j}$, with every realization, $y_j \in \mathcal{Y}_j$, of the observation, $y_j$, one can incorporate the validity conditions into the overall successive JSCC Lagrangian, $\mathcal{L}^{\mathrm{Ov.}}_{\mathrm{Suc.}}$. To calculate the derivative of $\mathcal{L}^{\mathrm{Ov.}}_{\mathrm{Suc.}}$ w.r.t. $p(z_j|y_j)$, only the corresponding derivative of its second term must be determined at this point, as the pertinent derivatives of its first and last terms are already given in (45) and (43), respectively. Applying the stationarity condition, $\frac{\delta \mathcal{L}^{\mathrm{Ov.}}_{\mathrm{Suc.}}}{\delta p(z_j|y_j)} = 0$, the stationarity equation (54) is inferred from (43), (45), (52) and (53). Bringing the second component in (54) to the other side of the equality, multiplying both sides by $\beta_j = \frac{1}{\lambda_j}$, exponentiating them, and, eventually, multiplying by $p(z_j)$, it holds

$$p(z_j|y_j) = p(z_j) \exp\!\Big(-d^{\mathrm{Suc.}}_{\beta_j}(y_j, z_j) + \beta_j \lambda^{\mathrm{Suc.}}_{y_j}\Big). \quad (56)$$

Enforcing the validity condition, $\sum_{z_j} p(z_j|y_j) = 1$, and noting that $\lambda^{\mathrm{Suc.}}_{y_j}$ is independent of $z_j$, one can treat $\exp(-\beta_j \lambda^{\mathrm{Suc.}}_{y_j})$ as the partition function, $\psi^{\mathrm{Suc.}}_{z_j}$, to arrive at the form of (13).
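The normalization step can be written out explicitly. The following is a sketch in the notation of this proof, where the distortion term $d^{\mathrm{Suc.}}_{\beta_j}$ is taken as given and the last line is presumably the form referred to as (13).

```latex
\begin{align}
1 = \sum_{z_j} p(z_j|y_j)
  &= \exp\!\big(\beta_j \lambda^{\mathrm{Suc.}}_{y_j}\big)
     \sum_{z_j} p(z_j) \exp\!\big(-d^{\mathrm{Suc.}}_{\beta_j}(y_j, z_j)\big) \\
\Rightarrow \quad
\psi^{\mathrm{Suc.}}_{z_j}(y_j, \beta_j)
  := \exp\!\big(-\beta_j \lambda^{\mathrm{Suc.}}_{y_j}\big)
  &= \sum_{z_j} p(z_j) \exp\!\big(-d^{\mathrm{Suc.}}_{\beta_j}(y_j, z_j)\big) \\
\Rightarrow \quad
p(z_j|y_j)
  &= \frac{p(z_j)}{\psi^{\mathrm{Suc.}}_{z_j}(y_j, \beta_j)}
     \exp\!\big(-d^{\mathrm{Suc.}}_{\beta_j}(y_j, z_j)\big).
\end{align}
```

Since the multiplier does not depend on $z_j$, it factors out of the sum, which is why the partition function is fully determined by the unnormalized weights.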

Fig. 1. The considered system model for uniterminal Joint Source-Channel Coding. Access and forward channels are assumed to be discrete and memoryless.

Fig. 2. The considered system model for multiterminal Joint Source-Channel Coding. All access and forward channels are assumed to be discrete and memoryless.

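The parallel IB-based JSCC Lagrangian is a functional of all the individual local quantizer mappings, $\{p(z_j|y_j)\,|\,\forall j\}$. To arrive at a stationary point of it, its derivative w.r.t. every quantizer mapping, $p(z_j|y_j)$, must be equated to zero. Associating a Lagrange multiplier, $\lambda_{y_j}$, with every realization, $y_j \in \mathcal{Y}_j$, of the observation, $y_j$, one can incorporate the validity conditions into the overall parallel JSCC Lagrangian, $\mathcal{L}^{\mathrm{Ov.}}_{\mathrm{Par.}}$. Due to the positivity of $p(y_j)$ and by applying the stationarity condition, $\frac{\delta \mathcal{L}^{\mathrm{Ov.}}_{\mathrm{Par.}}}{\delta p(z_j|y_j)} = 0$, together with the definition of the KL divergence, it holds

$$-\sum_{z^{-j}_{1:J}} p(z^{-j}_{1:J}|y_j) \sum_{t_{1:J}} p(t_{1:J}|z_{1:J})\, D_{\mathrm{KL}}\big(p(x|y_j, z^{-j}_{1:J}) \,\|\, p(x|t_{1:J})\big) - \lambda_j \log \frac{p(z_j|y_j)}{p(z_j)} + \lambda^{\mathrm{Par.}}_{y_j} = 0, \quad (47)$$

with $\lambda^{\mathrm{Par.}}_{y_j}$ being equal to

$$\lambda^{\mathrm{Par.}}_{y_j} = \dots \times D_{\mathrm{KL}}\big(p(x|y_j, z^{-j}_{1:J}) \,\|\, p(x|t^{-j}_{1:J})\big). \quad (48)$$

Bringing the second summand in ...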