Self-Guided Belief Propagation – A Homotopy Continuation Method

Belief propagation (BP) is a popular method for performing probabilistic inference on graphical models. In this work, we enhance BP and propose self-guided belief propagation (SBP) that incorporates the pairwise potentials only gradually. This homotopy continuation method converges to a unique solution and increases the accuracy without increasing the computational burden. We provide a formal analysis to demonstrate that SBP finds the global optimum of the Bethe approximation for attractive models where all variables favor the same state. Moreover, we apply SBP to various graphs with random potentials and empirically show that: (i) SBP is superior in terms of accuracy whenever BP converges, and (ii) SBP obtains a unique, stable, and accurate solution whenever BP does not converge.


INTRODUCTION
Computing the marginal distribution and evaluating the partition function are two fundamental problems of probabilistic graphical models. Both problems are NP-hard to solve [4], which substantiates the need for efficient approximation methods.
Belief propagation (BP) provides an efficient way to approximate the marginal distribution and has a long success story in many applications, including computer vision, speech processing, social network analysis, and error correcting codes [13], [17], [25]. It is, however, still an open problem to obtain a rigorous understanding of the limitations of BP for general graphs, where BP may fail to converge because: (i) multiple solutions exist, and which one BP converges to depends on implementation details; (ii) one or all fixed points are unstable and messages oscillate far away from any fixed point [12], [23], [37]. These limitations motivate the search for modifications of BP that overcome these issues in order to increase the accuracy and enhance the convergence properties.
One way to provide convergence guarantees is to consider an equivalent optimization problem that minimizes the Bethe free energy $F_B$. This, however, comes at the cost of an increased runtime complexity; polynomial-time algorithms only exist for restricted classes of problems, and even approximating the global minimum might be problematic for graphical models with arbitrary potentials [3], [28], [39]. Hence, the pursuit of methods that approximate the marginals with both runtime and convergence guarantees is still ongoing.
In this work, we present self-guided belief propagation (SBP), which aims to fill this gap. The observation that strong pairwise potentials reduce accuracy and deteriorate the convergence properties [15] inspired us to construct a homotopy; i.e., we first consider only local potentials (where BP is exact) and subsequently modify the model by increasing the pairwise potentials to the desired values. SBP thus solves a deterministic sequence of models that iteratively refines the Bethe approximation towards an accurate solution that is uniquely defined by the initial model.
We evaluate SBP for grid graphs, complete graphs, and random graphs with Ising potentials and, compared to BP, we observe superior performance in terms of accuracy; in fact, SBP achieves more accurate results than Gibbs sampling in a fraction of the runtime. We theoretically demonstrate optimality of the selected fixed point for attractive models with unidirectional local potentials. Additionally, SBP enhances the convergence properties and excels for general models, where SBP provides accurate results despite the non-convergence of BP. We further expect that the ease of use lowers the hurdle for practical applications.
The paper is structured as follows: Section 2 provides some background on probabilistic graphical models, belief propagation, and methods that minimize the free energy. Our proposed algorithm is presented in Section 3 and important properties are presented subsequently. We evaluate SBP and discuss empirical observations in Section 4 and provide a more formal analysis in Section 5. Finally, we conclude the paper in Section 7.

BACKGROUND
In this section, we briefly introduce probabilistic graphical models and specify the models considered in this work. We further introduce the BP algorithm and its connection to the Bethe approximation.

Probabilistic Graphical Models
Let us consider an undirected graph $G = (X, E)$, where $X = \{X_1, \dots, X_N\}$ is the set of nodes and $E$ is the set of undirected edges. Two nodes $X_i$ and $X_j$ are joined by an edge if $e_{ij} \in E$. We denote the set of neighbors of $X_i$ by $\partial(X_i) = \{X_j \in X : e_{ij} \in E\}$.
Let us define a probabilistic graphical model $U = (G, \Psi)$, where $\Psi = \{\Phi_1, \dots, \Phi_K\}$ is the set of all $K$ potentials and $X$ is the set of random variables. In this work we focus on pairwise models, i.e., all potentials consist of two variables at most, so that the joint distribution factorizes according to
$$P_X(x) = \frac{1}{Z} \prod_{e_{ij} \in E} \Phi_{X_i,X_j}(x_i, x_j) \prod_{i=1}^{N} \Phi_{X_i}(x_i),$$
where each edge is only considered once, i.e., $e_{ij} = e_{ji}$. We consider the following two problems: (i) evaluating the partition function $Z$, which is the normalization constant of the joint distribution,
$$Z = \sum_{x} \prod_{e_{ij} \in E} \Phi_{X_i,X_j}(x_i, x_j) \prod_{i=1}^{N} \Phi_{X_i}(x_i);$$
(ii) obtaining the marginal distribution
$$P_{X_m}(x_m) = \sum_{x \setminus x_m} P_X(x),$$
where $X_m \subset X$ may be any set of RVs. Evaluating the partition function is equivalent to minimizing the free energy $F$, where $\min F = F^* = -\ln Z$ [43]. Note that both problems considered are in fact equivalent, as $F$ attains its minimum precisely for the marginal distribution, but they are intractable in general [4].
Relaxing the problem by only approximating the marginal distribution and the partition function admits an elegant iterative algorithm. This method was discovered multiple times in different fields and is known as belief propagation (BP) in computer science, the sum-product algorithm in information theory, and the cavity method or the Bethe method in physics (cf. Sec. 2.3); we refer the reader to [18], [22] for a good overview. The observation that fixed points of BP are in a one-to-one correspondence with stationary points of the Bethe free energy (cf. [43]) paved the way for a better understanding and alternative approaches that minimize the Bethe free energy directly (cf. Sec. 2.4).

MODEL SPECIFICATION
In this work we focus on Ising models, i.e., binary pairwise models where every random variable $X_i$ takes values from $S = \{-1, +1\}$. It is often more convenient to work with the mean (or magnetization) $m_i = \mathbb{E}[X_i]$ and the correlation instead of considering the singleton marginals $P_{X_i}(x_i)$ and the pairwise marginals $P_{X_i,X_j}(x_i, x_j)$ explicitly. Let us define couplings $J_{ij} \in \mathbb{R}$ that are assigned to each edge $e_{ij} \in E$ and local fields $\theta_i \in \mathbb{R}$ that act on each variable $X_i \in X$. These parameters define the pairwise potentials $\Phi_{X_i,X_j}(x_i, x_j) = \exp(J_{ij} x_i x_j)$ and the local potentials $\Phi_{X_i}(x_i) = \exp(\theta_i x_i)$. The corresponding joint distribution from (5) is consequently given by
$$P_X(x) = \frac{1}{Z} \exp\Big( \sum_{e_{ij} \in E} J_{ij} x_i x_j + \sum_{i=1}^{N} \theta_i x_i \Big).$$
We distinguish two different types of interactions between random variables: if $J_{ij}$ is negative then the edge $e_{ij}$ is repulsive; if $J_{ij}$ is positive then the edge $e_{ij}$ is attractive. We call a model $U$ attractive if it contains only attractive edges, and refer to it as a general model otherwise.
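For small models, the quantities defined above can be checked by brute force. The following is a minimal sketch under the Ising parametrization just introduced; the function name `exact_marginals` and the edge-dictionary layout are illustrative assumptions, not part of the original work. It enumerates all $2^N$ states and is therefore only feasible for small $N$.

```python
import itertools
import math


def exact_marginals(theta, J):
    """Exact P(X_i = +1) and ln Z of a small Ising model by enumeration.

    theta: list of local fields theta_i.
    J:     dict mapping an edge (i, j) to its coupling J_ij (one entry per edge).
    """
    n = len(theta)
    Z = 0.0
    p_plus = [0.0] * n
    for x in itertools.product((-1, +1), repeat=n):
        # Exponent of the unnormalized joint distribution.
        exponent = sum(J_ij * x[i] * x[j] for (i, j), J_ij in J.items())
        exponent += sum(t * xi for t, xi in zip(theta, x))
        weight = math.exp(exponent)
        Z += weight
        for i in range(n):
            if x[i] == +1:
                p_plus[i] += weight
    return [p / Z for p in p_plus], math.log(Z)
```

The returned `ln Z` equals $-F^*$, so the same routine yields a reference value for both problems on small graphs.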

BELIEF PROPAGATION (BP)
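BP approximates the marginals by recursively exchanging messages between random variables. The messages from $X_i$ to $X_j$ at iteration $n + 1$ are updated according to
$$\mu^{n+1}_{ij}(x_j) \propto \sum_{x_i \in S} \Phi_{X_i,X_j}(x_i, x_j)\, \Phi_{X_i}(x_i) \prod_{X_k \in \partial(X_i) \setminus \{X_j\}} \mu^{n}_{ki}(x_i) \qquad (6)$$
and are normalized so that $\sum_{x_j \in S} \mu^n_{ij}(x_j) = 1$.
Let $\mu^n = \{\mu^n_{ij}(x_j) : e_{ij} \in E, x_j \in S\}$ be the set of all messages at iteration $n$ and let the mapping induced by (6) be denoted as $\mu^{n+1} = \mathrm{BP}(\mu^n)$. If all successive messages remain unchanged, i.e., $\mu^{n+1} = \mu^n$, then BP has converged to a fixed point $\mu^\bullet$. We further write $\mu^\bullet = \mathrm{BP}^\bullet(\mu^0)$, where $\mathrm{BP}^\bullet$ performs BP until convergence. The singleton marginals $P_{X_i}$ and pairwise marginals $P_{X_i,X_j}$ are subsequently approximated by
$$\tilde P_{X_i}(x_i) = \frac{1}{Z_i} \Phi_{X_i}(x_i) \prod_{X_k \in \partial(X_i)} \mu^\bullet_{ki}(x_i), \qquad (9)$$
$$\tilde P_{X_i,X_j}(x_i, x_j) = \frac{1}{Z_{ij}} \Phi_{X_i}(x_i)\, \Phi_{X_j}(x_j)\, \Phi_{X_i,X_j}(x_i, x_j) \prod_{X_k \in \partial(X_i) \setminus \{X_j\}} \mu^\bullet_{ki}(x_i) \prod_{X_l \in \partial(X_j) \setminus \{X_i\}} \mu^\bullet_{lj}(x_j), \qquad (10)$$
where $Z_i, Z_{ij} \in \mathbb{R}^*_+$ guarantee that all probabilities sum to one. These approximations further constitute the pseudomarginals
$$\tilde P_{X_B} = \{\tilde P_{X_i}, \tilde P_{X_i,X_j} : X_i \in X, e_{ij} \in E\}. \qquad (11)$$
The performance of BP depends not only on $G$ [37] but on the potentials $\Psi$ as well [14], [23]. On Ising models, BP converges to a unique stable fixed point if the couplings $J_{ij}$ are weak (relative to $\theta_i$). For attractive models with strong couplings, multiple solutions exist and BP converges to one of them. For general models that contain strong couplings, multiple solutions may exist and BP does not converge (even if the solution is unique) [15], [23]. Note that these differences in behavior coincide with different phases in statistical mechanics (cf. [22], [33], [45]).
If BP fails to converge and the messages oscillate, one can try to achieve convergence by either changing the update rule [6], [16], [32], or by replacing the messages with a convex combination of the last messages [24]. The latter method is known as damping (BP$_D$), where a damping parameter $\epsilon \in [0, 1)$ specifies the new update rule
$$\mu^{n+1} = (1 - \epsilon)\, \mathrm{BP}(\mu^n) + \epsilon\, \mu^n.$$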
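The message update, the damped variant, and the resulting singleton marginals can be sketched in a few lines. This is an illustrative implementation under stated assumptions rather than the authors' code: it uses a parallel update schedule and a uniform message initialization (the experiments reported later use random scheduling and random initialization), and the names `run_bp` and `damping` are hypothetical.

```python
import math

STATES = (-1, +1)


def run_bp(theta, J, damping=0.0, max_iter=1000, tol=1e-10):
    """Parallel sum-product BP on an Ising model.

    theta: list of local fields; J: dict {(i, j): coupling}, one entry per edge.
    Returns (P(X_i = +1) for every i, number of iterations, converged flag).
    """
    n = len(theta)
    coup, nbrs = {}, {i: [] for i in range(n)}
    for (i, j), J_ij in J.items():
        coup[(i, j)] = coup[(j, i)] = J_ij
        nbrs[i].append(j)
        nbrs[j].append(i)
    # mu[(i, j)][a] is the message from X_i to X_j evaluated at x_j = STATES[a].
    mu = {e: [0.5, 0.5] for e in coup}
    converged, it = False, 0
    for it in range(1, max_iter + 1):
        new = {}
        for (i, j) in coup:
            out = []
            for xj in STATES:
                total = 0.0
                for a, xi in enumerate(STATES):
                    term = math.exp(theta[i] * xi + coup[(i, j)] * xi * xj)
                    for k in nbrs[i]:
                        if k != j:
                            term *= mu[(k, i)][a]
                    total += term
                out.append(total)
            norm = sum(out)
            # Damping: convex combination of the new and the previous message.
            new[(i, j)] = [(1 - damping) * o / norm + damping * old
                           for o, old in zip(out, mu[(i, j)])]
        change = max((abs(x - y) for e in coup for x, y in zip(new[e], mu[e])),
                     default=0.0)
        mu = new
        if change < tol:
            converged = True
            break
    # Singleton pseudomarginals from the final messages.
    p_plus = []
    for i in range(n):
        b = [math.exp(theta[i] * xi) for xi in STATES]
        for k in nbrs[i]:
            b = [bv * mv for bv, mv in zip(b, mu[(k, i)])]
        p_plus.append(b[1] / (b[0] + b[1]))
    return p_plus, it, converged
```

On tree-structured graphs the returned marginals are exact; with `damping > 0` the same fixed point is reached, only more slowly.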

THE BETHE APPROXIMATION & RELATED WORK
The Bethe free energy $F_B(\tilde P_{X_B}) := E_B(\tilde P_{X_B}) - S_B(\tilde P_{X_B})$ is obtained by only considering the pseudomarginals (cf. (11)), where the energy $E_B(\tilde P_{X_B})$ and the entropy $S_B(\tilde P_{X_B})$ are defined by
$$E_B(\tilde P_{X_B}) = -\sum_{e_{ij} \in E} \sum_{x_i, x_j} \tilde P_{X_i,X_j}(x_i, x_j) \ln \Phi_{X_i,X_j}(x_i, x_j) - \sum_{i=1}^{N} \sum_{x_i} \tilde P_{X_i}(x_i) \ln \Phi_{X_i}(x_i),$$
$$S_B(\tilde P_{X_B}) = -\sum_{e_{ij} \in E} \sum_{x_i, x_j} \tilde P_{X_i,X_j}(x_i, x_j) \ln \tilde P_{X_i,X_j}(x_i, x_j) + \sum_{i=1}^{N} (|\partial(X_i)| - 1) \sum_{x_i} \tilde P_{X_i}(x_i) \ln \tilde P_{X_i}(x_i). \qquad (14)$$
More specifically, the Bethe free energy is given by
$$F_B(\tilde P_{X_B}) = \sum_{e_{ij} \in E} \sum_{x_i, x_j} \tilde P_{X_i,X_j}(x_i, x_j) \ln \frac{\tilde P_{X_i,X_j}(x_i, x_j)}{\Phi_{X_i,X_j}(x_i, x_j)} - \sum_{i=1}^{N} \sum_{x_i} \tilde P_{X_i}(x_i) \ln \Phi_{X_i}(x_i) - \sum_{i=1}^{N} (|\partial(X_i)| - 1) \sum_{x_i} \tilde P_{X_i}(x_i) \ln \tilde P_{X_i}(x_i), \qquad (15)$$
where every stable fixed point $\tilde P^\bullet_{X_B}$ corresponds to a local minimum $F_B^\bullet$; the converse, however, need not be the case, i.e., not every local minimum of $F_B$ corresponds to a stable fixed point [36].
This correspondence between BP and $F_B$ led to a better understanding of BP and inspired many methods that minimize $F_B$ directly [41], [44]. The minimization, however, is still highly non-trivial and requires good approximation methods in practice. Strong pairwise potentials reduce the accuracy of the Bethe approximation and are responsible for its non-convexity [14]. One can therefore correct the entropy term (14) by accounting for the strong potentials; this admits convex relaxations that provide provably convergent message passing algorithms [7], [9], [20], [21], [34]. There is, however, a trade-off between convergence properties and accuracy in general, and the Bethe approximation often provides accurate results, if it can be minimized, and outperforms convex free energy approximations [21], [39]. Thus, it is a relevant problem to directly approximate $F_B$ in a way that allows for efficient minimization. Polynomial-runtime algorithms exist that approximate $F_B$ for restricted models: these include sparsity constraints [28] or require attractive models [39]. If both properties are fulfilled, i.e., for locally tree-like attractive models, the Bethe approximation is exact and can be optimized efficiently [5]. Note that $F_B$ provides an upper bound on $F$ for attractive models [27], [42].
We aim to efficiently approximate $F_B$, similarly to [39]: their approximation can be made $\epsilon$-accurate; this, however, comes at the cost of giving up runtime guarantees for general models. Our work, on the contrary, provides an approximation in constant runtime (cf. Theorem 3 in Sec. 5); the approximation error, however, cannot be made arbitrarily small for general models. Both methods overcome their respective disadvantages when restricting the models; i.e., both methods efficiently minimize the Bethe approximation for attractive models.

SELF-GUIDED BELIEF PROPAGATION (SBP)
We start with an intuitive justification of the proposed method and subsequently introduce SBP in detail. We further present practical considerations and pseudocode of SBP. We provide a formal treatment of the properties of SBP in Sec. 5.
The current understanding of BP is that strong (pairwise) potentials negatively influence BP. The overall number of iterations can be reduced by incorporating the potentials slowly [2]. However, inspired by the recent observation that strong local potentials increase accuracy and lead to better convergence properties [15], we rather aim to only reduce the influence of the pairwise potentials that negatively influence BP. It is indeed worth considering whether an accurate fixed point emerges if we start from a simple model with independent random variables and slowly increase the potentials' strength [11]. SBP achieves this by homotopy continuation, i.e., it solves the simple problem first and, by repetitive application of BP, keeps track of the fixed point as the interaction strength is increased by a scaling term. We present pseudocode of the algorithm in Sec. 3.2.
The pairwise potentials at index $m$ are exponentially scaled by
$$\Phi_{X_i,X_j}(x_i, x_j)_{[m]} = \Phi_{X_i,X_j}(x_i, x_j)^{\zeta_m}. \qquad (17)$$
The initialization determines the performance of BP if multiple fixed points exist; SBP always provides a favorable initialization by the preceding fixed point and performs the composite function
$$\mu^\bullet_{[M]} = \mathrm{BP}^\bullet_{[M]} \circ \mathrm{BP}^\bullet_{[M-1]} \circ \dots \circ \mathrm{BP}^\bullet_{[1]}(\mu^0).$$
This may lead to problems if the fixed point becomes unstable for some value $m < M$. If the messages start to oscillate and BP does not converge within a pre-specified number of iterations, it cannot be used to keep track of the fixed point.
Instead, SBP provides the last stable fixed point in that case, i.e., $\mu^\bullet_{[m]}$ for the largest index $m$ at which BP still converged. In other words, SBP is an iterative algorithm that either provides a stationary point $F_B^\bullet$, or an approximation thereof if $F_B^\bullet$ is not stable with respect to BP. First, SBP relaxes the problem until all random variables are independent and the Bethe approximation is exact. Then, the problem is deformed into the original one by increasing $\zeta$ from zero to one. Consequently, $F_B$ is deformed such that the stationary point $F_B^\bullet$ emerges as a well-behaved path (cf. Prop. 1 in Sec. 5). SBP keeps track of this solution with BP constantly correcting the stationary point.
We illustrate how SBP keeps track of a fixed point in Fig. 1 for a problem where BP does not converge. Initially, SBP obtains the pseudomarginals for $\zeta = 0$ by running BP on the simple model. Then, SBP estimates the pseudomarginals of the desired problem by successively increasing $\zeta$ and running BP. Indeed, a smooth solution path emerges and SBP is capable of tracking it. Note that the fixed point becomes unstable for $\zeta > 0.7$; SBP stops and provides the last stable solution as an approximation. Note that the approximated marginals are already close to the exact ones in this example; experiments show that this is often the case (cf. Sec. 4).
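The procedure described above can be sketched as a short continuation loop. The following is a minimal illustration under several assumptions: it uses a fixed, uniform schedule for $\zeta$, a parallel BP schedule, and the log-ratio message parametrization $\nu_{ij} = \frac{1}{2} \ln \frac{\mu_{ij}(+1)}{\mu_{ij}(-1)}$, for which the Ising BP update becomes $\nu_{ij} = \operatorname{atanh}\big(\tanh(\zeta J_{ij}) \tanh(\theta_i + \sum_{k \neq j} \nu_{ki})\big)$. The function names `bp_fixed_point` and `sbp` are hypothetical and do not reproduce Algorithm 1.

```python
import math


def bp_fixed_point(theta, Jd, zeta, nu, max_iter=1000, tol=1e-9):
    """Run BP on the model with couplings zeta * J, starting from messages nu.

    nu maps a directed edge (i, j) to the log-ratio message from X_i to X_j.
    Returns (messages, converged flag).
    """
    for _ in range(max_iter):
        new = {}
        for (i, j) in nu:
            # Cavity field: local field plus all incoming messages except from j.
            h = theta[i] + sum(v for (k, l), v in nu.items() if l == i and k != j)
            new[(i, j)] = math.atanh(math.tanh(zeta * Jd[(i, j)]) * math.tanh(h))
        change = max((abs(new[e] - nu[e]) for e in nu), default=0.0)
        nu = new
        if change < tol:
            return nu, True
    return nu, False


def sbp(theta, J, M=10, max_iter=1000):
    """Track the BP fixed point while zeta grows from 0 to 1 in M >= 2 steps.

    Returns (means m_i at the last stable fixed point, zeta that was reached).
    """
    Jd = {}
    for (i, j), J_ij in J.items():
        Jd[(i, j)] = Jd[(j, i)] = J_ij
    nu = {e: 0.0 for e in Jd}  # fixed point of the independent model (zeta = 0)
    zeta_reached = 0.0
    for m in range(M):
        zeta = m / (M - 1)
        candidate, converged = bp_fixed_point(theta, Jd, zeta, nu, max_iter)
        if not converged:
            break  # keep the last stable fixed point as the approximation
        nu, zeta_reached = candidate, zeta
    means = [math.tanh(theta[i] + sum(v for (k, l), v in nu.items() if l == i))
             for i in range(len(theta))]
    return means, zeta_reached
```

The marginals follow from the means via $\tilde P_{X_i}(+1) = (1 + m_i)/2$. If `zeta_reached` is smaller than one, the result corresponds to a surrogate model with weakened couplings.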

Practical considerations
In practice, the runtime of SBP is influenced by the difference between two successive fixed points $\mu^\bullet_{[m]}$ and $\mu^\bullet_{[m+1]}$; this difference is primarily determined by the number of steps $M$. Ideally, $M$ should be as large as possible. This, however, increases the runtime (cf. Theorem 3); in practice we would choose $M$ as small as possible but as large as necessary. Moreover, one can adaptively increase the step size if two successive fixed points are close, i.e., if the difference between $\mu^\bullet_{[m]}$ and $\mu^\bullet_{[m+1]}$ is small (cf. [29, pp. 23], [1]). Our experiments show that it is sufficient to use rather coarse steps; we used $M \leq 10$ for all reported experiments.
Additionally, instead of initializing BP$_{[m]}$ with its preceding fixed point messages, i.e., $\mu^0_{[m]} = \mu^\bullet_{[m-1]}$, one can estimate the initial messages by extrapolating from the $k$ preceding fixed points to reduce the overall number of iterations. We empirically observed that the benefit diminishes for $k > 3$.
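A step-size controller of the kind alluded to above could look as follows. This is only a sketch of one plausible rule, not the controller used in the paper: the threshold, the growth factor, and the function name `next_step` are all assumptions chosen for illustration.

```python
def next_step(zeta, step, change, threshold=0.05, growth=2.0):
    """Propose the next scaling value and step size.

    zeta:   current scaling value in [0, 1].
    step:   step size used to reach zeta.
    change: distance between the last two fixed points (any norm).
    The step is enlarged only if the fixed point moved by less than threshold.
    """
    if change < threshold:
        step *= growth
    zeta_next = min(zeta + step, 1.0)  # never step past the model of interest
    return zeta_next, step
```

A more defensive variant would also shrink the step and retry when BP fails to converge; whether that pays off depends on how sharply the fixed point loses stability.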
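The idea of predicting the next initialization from earlier fixed points can be illustrated with polynomial extrapolation. Note that this sketch swaps in Lagrange extrapolation through the last $k$ fixed points as a simple stand-in; it is not the cubic spline routine mentioned for the pseudocode, and the function name and the dictionary layout of the messages are assumptions.

```python
def extrapolate_messages(zetas, history, zeta_next, k=3):
    """Predict messages at zeta_next from the last k tracked fixed points.

    zetas:   scaling values at which fixed points were obtained (distinct).
    history: list of dicts (edge -> message value), aligned with zetas.
    Returns a dict with the extrapolated value for every edge.
    """
    zs = zetas[-k:]
    hs = history[-k:]
    guess = {}
    for edge in hs[-1]:
        value = 0.0
        for a, za in enumerate(zs):
            # Lagrange basis polynomial of node za, evaluated at zeta_next.
            weight = 1.0
            for b, zb in enumerate(zs):
                if b != a:
                    weight *= (zeta_next - zb) / (za - zb)
            value += weight * hs[a][edge]
        guess[edge] = value
    return guess
```

With `k=1` this reduces to reusing the preceding fixed point. Extrapolating probabilities directly may leave the interval $[0, 1]$, so in practice one would extrapolate an unconstrained quantity such as log-ratio messages or renormalize afterwards.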

Pseudocode
Pseudocode of SBP is presented in Algorithm 1. The maximum number of iterations for BP is given by $N_{BP} = 10^3$. We randomly initialize $\mu^0$ and either use a fixed step size or an adaptive step size (adaptive stepsize = 1). The sequence of messages is contained in $\{\mu^\bullet_{[m]}\}$. Cubic spline extrapolation is applied in ExtrapolateMsg to estimate the initial messages of the subsequent model.
We further present the pseudocode for the adaptive step size controller in Algorithm 2.

EXPERIMENTS
We apply SBP to $n \times n$ grid graphs of different sizes, complete graphs with $N = 10$ random variables, and random graphs with $N = 10$ random variables and with an average degree of $|\partial(X_i)| = 3$. We consider attractive (Sec. 4.2) and general (Sec. 4.3) models for each of these graphs. Experiments were performed for these graphs in order to make the results comparable to previous work [21], [30], [31], [40].

Experimental settings
SBP is evaluated and compared to BP, BP$_D$ (BP with damping), and Gibbs sampling. The accuracy is evaluated by the mean squared error (MSE) between the approximate marginals $\tilde P_{X_i}$ and the exact marginals $P_{X_i}$, where the exact marginals $P_{X_i}$ are obtained by the junction tree algorithm [19]. For binary random variables we can apply symmetry properties so that the error only needs to be evaluated for a single state. We analogously evaluate the error MSE$_B$ with respect to the pseudomarginals at the global minimum of the Bethe free energy, where we approximate $F_B^*$ by [39]. We further compare the runtime of all methods by counting the overall number of BP iterations and the number of iterations for Gibbs sampling (computing the acceptance probability requires similar runtime as one BP message update).
We consider $L = 100$ models with random potentials for every experiment. The initial messages are randomly initialized 100 times for each of these $L$ models, before applying BP with and without damping. We consider BP (and BP$_D$) as converged for a model if at least a single message initialization (out of 100) exists for which BP converges. We report the convergence ratio, i.e., the number of experiments (or probabilistic graphical models) for which BP converged divided by the overall number of experiments $L$. SBP, on the other hand, allows one to obtain an approximation of the terminal fixed point in case this fixed point is unstable, which prevents BP and BP$_D$ from converging.
The reported error (MSE) and the number of iterations are averaged over all convergent runs of BP and BP$_D$ (i.e., BP$^\bullet$ and BP$^\bullet_D$), while all runs that did not converge are discarded. On the contrary, we average the error and the number of iterations over all $L$ models for SBP (SBP$_{all}$), Gibbs sampling (Gibbs$_{all}$), and for minimization of the Bethe approximation ($F^*_{B,all}$). For BP and SBP we set the maximum number of iterations to $N_{BP} = 10^3$ and use random scheduling. For BP$_D$ we choose a large damping factor $\epsilon = 0.9$ to account for the strong couplings and therefore increase the maximum number of iterations to $N_{BP} = 10^4$. Such a large damping factor helps to prioritize convergence over runtime; this admits comparison of marginal accuracy for a wide range of models. Carefully selecting a damping factor that depends on a given model may reduce the number of iterations until convergence but cannot increase the accuracy; moreover, if chosen too small, BP$_D$ may fail to converge at all. The accuracy of SBP is only marginally affected by its parameters and we use the following parameters for all experiments: $M \leq 10$, adaptive step size, and cubic spline extrapolation. Gibbs sampling is run for $10^5$ iterations.
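The two evaluation quantities described in this section are simple to compute. The sketch below assumes that the MSE is the mean squared difference of $P(X_i = +1)$ over all variables; the exact normalization used for the reported numbers may differ, and both function names are illustrative.

```python
def mse(approx_p_plus, exact_p_plus):
    """Mean squared error between approximate and exact P(X_i = +1)."""
    n = len(exact_p_plus)
    return sum((a - e) ** 2 for a, e in zip(approx_p_plus, exact_p_plus)) / n


def convergence_ratio(flags_per_model):
    """Fraction of models for which at least one initialization converged.

    flags_per_model: one list of booleans per model, one entry per
    random message initialization.
    """
    converged = sum(1 for flags in flags_per_model if any(flags))
    return converged / len(flags_per_model)
```

Because $P(X_i = -1) = 1 - P(X_i = +1)$ for binary variables, the squared error is identical for both states, which is why a single state suffices.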

Attractive models
We consider grid graphs with $N = 100$ random variables ($10 \times 10$), random graphs with $N = 10$ random variables, and complete graphs with $N = 10$ random variables. We generate $L = 100$ models for every value of $\beta \in \{0, 0.5, \dots, 5\}$ and sample the potentials according to $\theta_i \sim U(-0.5, 0.5)$ and $J_{ij} \sim U(0, \beta)$; i.e., overall we consider 1100 different parametrizations for each individual graph structure. Note that BP is initialized 100 times for every considered model. We compute the MSE for every value of $\beta$ and visualize the mean and the standard deviation of the MSE as well as the number of iterations in Fig. 3a. Note that BP (magenta) converges rapidly for all graphs considered; hence, there is no additional benefit for BP$_D$ (green), which only increases the number of iterations. SBP (blue) only slightly increases the number of iterations as compared to BP and converges in fewer iterations than BP$_D$. Note that SBP is guaranteed to capture the global optimum if all local potentials are unidirectional (cf. Sec. 5). But even if we do allow for random local potentials, we empirically observe that SBP consistently outperforms BP with respect to accuracy. This becomes especially evident for models with strong couplings: these models exhibit multiple stable fixed points [14] such that, depending on the initialization, BP often converges to inaccurate fixed points.

General models
General models traditionally pose problems for BP and other methods that aim to minimize the Bethe approximation.
First, in order to evaluate the performance of SBP we consider $\theta_i = \theta \in \{0, 0.1, 0.4\}$ and draw the couplings with equal probability from $J_{ij} \in \{-1, 1\}$; the results are summarized in Tab. 1. Although BP and BP$_D$ fail to converge for most models, we observe that SBP stops after only a few iterations and significantly outperforms BP in terms of accuracy. In fact, SBP achieves accuracy competitive with Gibbs sampling but requires three orders of magnitude fewer iterations.
Second, we further apply SBP to general graphs and evaluate whether SBP provides a good approximation of the pseudomarginals $\tilde P^*_{X_B}$ that correspond to the global minimum of the Bethe free energy $F_B^*$. Therefore we consider grid graphs (of size $5 \times 5$), which still allows us to approximate $F_B^*$, and the related pseudomarginals $\tilde P^*_{X_B}$, reasonably well by [39]. The results are summarized in Tab. 1 and show that SBP approximates $\tilde P^*_{X_B}$ within the accuracy of our reference method (MSE$_B$). We further report in Tab. 1 the number of times SBP obtains the terminal fixed point, i.e., the fixed point for $U_M$ ($F_B^\bullet(\zeta_M)$ equals SBP). It becomes obvious that SBP approximates the terminal fixed point reasonably well, despite frequently stopping for $\zeta_m < 1$. Moreover, looking at the MSE reveals that SBP not only approximates the pseudomarginals $\tilde P^*_{X_B}$ well, but concurrently provides an accurate approximation of the exact marginals.
Third, we investigate how the approximation quality depends on the scaling parameter $\zeta_m$. Therefore, we depict the evolution of the MSE (to the exact solution) and MSE$_B$ (to the approximate solution) in Fig. 2. We observe that MSE$_B$ (blue) decreases monotonically with every iteration, which empirically verifies that SBP proceeds along a well-behaved solution path (cf. Prop. 1). Note that MSE$_B$ decreases rapidly in the first iterations and SBP spends a major part of the overall runtime on slight improvements. The MSE to the exact solution, on the other hand, decreases first until it increases again as SBP incorporates stronger couplings. Stronger couplings tend to degrade the quality of the Bethe approximation in loopy graphs and lead to marginals that are increasingly biased towards one state [14], [37]. This explains why the MSE to the exact solution increases as SBP converges towards the terminal fixed point. One could exploit this behavior and restrict the runtime by stopping SBP after consumption of a fixed iteration budget; this may even increase the accuracy with respect to the exact solution.
Finally, we aim to investigate the influence of the coupling strength: therefore we consider $\theta_i \sim U(-0.5, 0.5)$ and $J_{ij} \sim U(-\beta, \beta)$. For every $\beta \in [0, 5]$ we execute $L = 100$ experiments and present the averaged results in Fig. 3b. Note that we restrict the results to $\beta \leq 2$ on the grid graph because BP converged only sporadically for models with stronger couplings. SBP requires only slightly more iterations than BP and fewer than BP$_D$, even though we compare only to models where BP (or BP$_D$) converged. The benefits of SBP become increasingly evident as the coupling strength increases. Again, SBP (blue) significantly outperforms BP$^\bullet$ (magenta) and BP$^\bullet_D$ (green) on all graphs with respect to accuracy.
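The random models used in the attractive and general experiments can be generated along the following lines. This is a minimal sketch for the grid graph only; the helper names `grid_edges` and `sample_ising` and the use of Python's `random.Random` are illustrative assumptions.

```python
import random


def grid_edges(n):
    """Edges of an n x n grid graph with nodes numbered row by row."""
    edges = []
    for r in range(n):
        for c in range(n):
            i = r * n + c
            if c + 1 < n:
                edges.append((i, i + 1))  # right neighbor
            if r + 1 < n:
                edges.append((i, i + n))  # lower neighbor
    return edges


def sample_ising(edges, num_nodes, beta, attractive, rng):
    """Draw theta_i ~ U(-0.5, 0.5) and couplings with strength bound beta.

    attractive=True  draws J_ij ~ U(0, beta);
    attractive=False draws J_ij ~ U(-beta, beta) (general model).
    """
    theta = [rng.uniform(-0.5, 0.5) for _ in range(num_nodes)]
    low = 0.0 if attractive else -beta
    J = {e: rng.uniform(low, beta) for e in edges}
    return theta, J
```

For example, `sample_ising(grid_edges(10), 100, beta=2.0, attractive=False, rng=random.Random(0))` yields one general model on the $10 \times 10$ grid; repeating the call with the same generator gives further independent draws.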

THEORETICAL PROPERTIES
Here we present some more formal arguments and discuss the properties of SBP to understand under which conditions the algorithm (presented in Sec. 3) can be expected to perform well. We refer to Sec. 6 for the proofs and only present the most important theorems as well as their implications below.
Along a solution path, the pseudomarginals $\tilde P^\bullet_{X_B}(\zeta)$ are obtained from the tracked fixed point by (9)-(10). In particular, we refer to the start and end point by $\tilde P^\bullet_{X_B}(\zeta = 0)$ and $\tilde P^\bullet_{X_B}(\zeta = 1)$, respectively. The following example in Fig. 4 illustrates the solution set of a grid graph with attractive edges. This example exhibits a unique solution path according to our definition; note, however, that a second curve exists, which lacks a start point and is therefore of no relevance for any method that proceeds along a solution path defined by the homotopy in (19).

Properties of SBP
The following proposition summarizes the main properties of the solution path that is specified and followed by SBP.
Proposition 1 (Properties for attractive and general models).
Theorem 2 (Prop. 1.2). Let $\tilde P^\bullet_{X_B}(\zeta)$ be the pseudomarginals that are uniquely defined along the solution path that originates from is stable. This is an immediate consequence of the fact that the convergence properties can only degrade along a given solution path [15].
Prop. 1 is of fundamental importance, but does not relate to the accuracy of the obtained stationary point.Assessing the quality of the Bethe approximation and the accuracy of BP for general models is still an open research question that is beyond the scope of this work.However, we further present Prop. 2 to discuss the accuracy of the obtained solution for attractive models.
Proposition 2 (Properties for attractive models with unidirectional fields). The solution path $c(\zeta)$ leads towards an accurate solution with P
We start by generalizing Griffiths' inequality [8] to the fixed points of BP (Lemma 4) and subsequently provide Theorems 5-6 that discuss the accuracy of the fixed point obtained by SBP.
Lemma 4. Consider two attractive probabilistic graphical models $U_0$ and $U_1$ with equal $G$ and with all potentials specified by $\theta_i > 0$ and by $J_{ij,0}$ and $J_{ij,1}$, where $J_{ij,0} < J_{ij,1}$ for all $e_{ij} \in E$. Let us consider a fixed point of BP with positive mean m
Theorem 5. Consider an attractive model with $\theta_i > 0$ (see footnote 5). Then, $m^\bullet_i(\zeta)$ increases monotonically along the solution path $c(\zeta)$; in particular, SBP minimizes the Bethe approximation error and is optimal with respect to marginal accuracy, i.e.,
Theorem 6 (Prop. 2). Consider an attractive model with $\theta_i = 0$. Then, SBP obtains the exact solution P
To conclude, for attractive models with $\theta_i \leq 0$ or $\theta_i \geq 0$, SBP either obtains the fixed point that corresponds to the global minimum $F_B^*$ (Theorem 5), or it obtains the fixed point that corresponds to the exact solution, i.e., to $F^*$ (Theorem 6). For general models, $m^\bullet_i$ need not increase monotonically along the solution path and it is not obvious whether SBP obtains the most accurate fixed point. However, the experimental results in Sec. 4.3 at least corroborate Prop. 1.3 that SBP does often converge to accurate fixed points.

DERIVATIONS OF SECTION 5
This section contains the detailed proofs for Sec. 5.

Proofs for Proposition 1
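Proof [of Theorem 1]: First, let us obtain the singleton marginals $\tilde P_{X_i}(x_i) = \sum_{x_j \in S} \tilde P_{X_i,X_j}(x_i, x_j)$. For $\Phi_{X_i,X_j}(x_i, x_j)_{[1]} = 1$ it follows that marginalizing over (10) equates to
$$\tilde P_{X_i}(x_i) = \frac{1}{Z_{ij}} \Phi_{X_i}(x_i) \prod_{X_k \in \partial(X_i) \setminus \{X_j\}} \mu^\bullet_{ki}(x_i)$$
$$\qquad \cdot \sum_{x_j \in S} \Phi_{X_j}(x_j) \prod_{X_l \in \partial(X_j) \setminus \{X_i\}} \mu^\bullet_{lj}(x_j). \qquad (22)$$
Note that according to (6) $\mu^\bullet_{ji}(x_i)$ is equivalent (up to normalization) to the second line of (22), so that $\tilde P_{X_i}(x_i) \propto \Phi_{X_i}(x_i) \prod_{X_k \in \partial(X_i)} \mu^\bullet_{ki}(x_i)$, which equals (9). Since the pairwise potentials are constant, the messages do not depend on $x_i$ and cancel after normalization. It follows that $\tilde P_{X_i}(+1) = e^{\theta_i} / (e^{\theta_i} + e^{-\theta_i}) = P_{X_i}(+1)$.
Proof [of Theorem 2]: First we show that the Bethe free energy $F_B(\zeta)$ itself is an analytic function. Consider (15) with the pairwise potentials defined by (17). Then, the derivative with respect to $\zeta$ is given by
$$\frac{\partial F_B(\zeta)}{\partial \zeta} = -\sum_{e_{ij} \in E} \sum_{x_i, x_j} \tilde P_{X_i,X_j}(x_i, x_j) \ln \Phi_{X_i,X_j}(x_i, x_j). \qquad (24)$$
As an immediate consequence we observe that $F_B(\zeta)$ is continuously differentiable (see footnote 6), as (24) is a finite sum over finite terms (see footnote 7). We specifically consider the stationary point $F_B^\bullet(\zeta)$ along the solution path. Further note that the set of stationary points is finite [36] (see footnote 8) and that pitchfork bifurcations may only occur if $\theta_i = 0$ [14], [26], in which case SBP obtains the exact solution (cf. Theorem 6).
5. Note that equal results can be obtained for $\theta_i < 0$ because of symmetry properties.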
Proof [of Theorem 3]: SBP increases $\zeta_m$ as long as BP converges in fewer than $N_{BP}$ iterations, and stops otherwise. Consequently, BP corrects the accuracy of the fixed point for each value $\zeta_m$ within a bounded number of iterations. The runtime of SBP is further determined by the choice of $M$, i.e., the step size (cf. Sec. 3.1). Assume that SBP converges for $\zeta_m$; then it does so in $O(M \cdot N_{BP})$.
First, we show that for all $e_{ij} \in E$
6. Strictly speaking, $F_B(\zeta)$ is an analytic function.
7. This is in accordance with the fact that true phase transitions (singularities in the derivative of the free energy) can occur only in the thermodynamic limit, where (24) is an infinite sum that equates to infinity.
8. This is also required for the one-step replica symmetry breaking assumption [22, Sec. 19].

More formally, SBP considers an increasing length-$M$ sequence $\{\zeta_m\}$, where $m = 1, \dots, M$, such that $\zeta_m < \zeta_{m+1}$ and $\zeta_m \in [0, 1]$ with $\zeta_1 = 0$ and $\zeta_M = 1$. This further indexes a sequence of probabilistic graphical models $\{U_m\}$ that converges to the model of interest $U_M = U$. We further denote the fixed points of BP for $U_m$ by $\mu^\bullet_{[m]}$. Every probabilistic graphical model $U_m$ has a set of potentials $\Psi_{[m]}$.

Fig. 1: Illustrative example: SBP proceeds along the smooth solution path and obtains accurate marginals despite instability of the terminal fixed point.

First, we fix our notation: we denote the pseudomarginals of $U_m$ by $\tilde P^\bullet_{X_B[m]} = \tilde P^\bullet_{X_B}(\zeta_m)$, and, with slight abuse of notation, we refer to the corresponding stationary point of the Bethe free energy by $F_B^\bullet(\zeta_m) = F_B(\tilde P^\bullet_{X_B}(\zeta_m))$. It is beneficial to study the behavior of SBP as $M$ tends towards infinity. Therefore we consider the unit interval $\zeta \in [0, 1]$ to be the compact support of the functions $F_B(\zeta)$ and $\tilde P_{X_B}(\zeta)$. SBP is inspired by the idea to proceed along a so-called solution path as $\zeta$ increases from zero to one in order to obtain the marginal distributions for the model of interest. Therefore, we shall consider a continuous homotopy function $H(\mu, \zeta) : \mathbb{R}^{|\mu|+1} \to \mathbb{R}^{|\mu|}$ that is defined by $H(\mu, \zeta) = \mu - \mathrm{BP}(\mu)$, where $\Psi = \Psi(\zeta)$.
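The homotopy function can be evaluated numerically to check whether a given set of messages lies on the solution path, since fixed points of BP are exactly its zeros. The sketch below is an illustration for Ising models under an assumed log-ratio message parametrization, $\nu_{ij} = \frac{1}{2} \ln \frac{\mu_{ij}(+1)}{\mu_{ij}(-1)}$, in which one BP update reads $\nu_{ij} \leftarrow \operatorname{atanh}(\tanh(\zeta J_{ij}) \tanh(\theta_i + \sum_{k \neq j} \nu_{ki}))$; the function name is hypothetical.

```python
import math


def homotopy_residual(nu, theta, Jd, zeta):
    """Evaluate H(nu, zeta) = nu - BP(nu) for the model with couplings zeta * J.

    nu:    dict mapping a directed edge (i, j) to the log-ratio message i -> j.
    theta: list of local fields.
    Jd:    dict with the coupling for every directed edge (both directions).
    Returns a dict with one residual per directed edge; all residuals are
    zero exactly at a BP fixed point of the scaled model.
    """
    residual = {}
    for (i, j) in nu:
        # Cavity field: local field plus all incoming messages except from j.
        h = theta[i] + sum(v for (k, l), v in nu.items() if l == i and k != j)
        update = math.atanh(math.tanh(zeta * Jd[(i, j)]) * math.tanh(h))
        residual[(i, j)] = nu[(i, j)] - update
    return residual
```

At $\zeta = 0$ the all-zero message vector is the unique root, which corresponds to the independent model from which the solution path starts.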
The stationary point $F_B^\bullet(\zeta)$ emerges from $F_B^\bullet(\zeta = 0) = F_B^*(\zeta = 0)$; this start point is unique by Theorem 1. It follows by (24) that $F_B^\bullet(\zeta)$ varies in a continuous fashion along the unique solution path for $\zeta \in [0, 1]$. Further note that stationary points are in a one-to-one correspondence with fixed points of BP, which completes the proof.

TABLE 1: RESULTS FOR GENERAL MODELS WITH $J_{ij} \in \{-1, 1\}$ ON GRID GRAPHS ($N = 25$ AND $N = 100$), COMPLETE GRAPHS ($N = 10$), AND RANDOM GRAPHS ($N = 10$). WE REPORT THE MSE TO THE EXACT MARGINALS AND THE MSE$_B$ TO THE BETHE APPROXIMATION, CONVERGENCE RATIO, AND THE OVERALL NUMBER OF BP ITERATIONS. ONLY CONVERGED RUNS ARE CONSIDERED FOR BP$^\bullet$ AND BP$^\bullet_D$ BUT ALL RUNS ARE CONSIDERED FOR SBP$_{all}$, GIBBS$_{all}$, AND $F^*_{B,all}$.
and the associated stationary point $F_B^\bullet(\zeta)$ are continuous on their compact support $\zeta \in [0, 1]$.
Theorem 3 (Prop. 1.3). There exists some $\zeta_m \leq 1$ so that SBP converges to $\tilde P^\bullet_{X_B}(\zeta_m) \in L$ in $O(M N_{BP})$.
SBP is consequently capable of efficiently tracking the fixed point that emerges as $\zeta$ increases and requires $M N_{BP}$ iterations at most. SBP may, however, only converge to a surrogate model for $\zeta_m < 1$ and is not guaranteed to obtain the pseudomarginals of the desired problem. One can characterize this error by computing a bound on $|F_B^\bullet(\zeta_m) - F_B^\bullet(\zeta_M)|$ given the difference between $\Psi_{[m]}$ and $\Psi_{[M]}$ (cf. [12, Th. 16]).
Corollary 3.1. SBP obtains the pseudomarginals of the desired problem if and only if the endpoint $\tilde P^\bullet_{X_B}(\zeta = 1)$ is stable.