New Branching Filters With Explicit Negative Dependence

Particle filters are used to solve nonlinear filtering problems. We focus on the sampling step of a particle filter and present new algorithms that introduce explicit negative dependence between the number of particles reassigned at each location, with the goal of improving the performance of the filtering algorithm. We review partial and complete sampling in the context of both interacting and branching filters, that is, when the number of particles stays constant through all steps and when it does not. In particular, we use the quick simulation field algorithm to reproduce the variance structure induced by the minimal variance filter and create a new filtering algorithm. A numerical example is used to assess the performance of the new algorithms.


I. INTRODUCTION
Noisy, stochastic phenomena are ubiquitous in nature, and attempts to understand such phenomena are diverse and span disciplines. Yet, the task is difficult because observations are often incomplete with abundant random noise. Nonlinear filtering is one popular way of modeling such phenomena that accounts for both of these factors. Often, we represent an original signal by a sequence of random variables (X t ) t≥0 , and its observations by (Y t ) t≥0 , which are modeled as some function of (X t ) t≥0 , corrupted by noise: where V t is a noise term. The goal is to obtain an approximation for the best least-squares estimator, E[f (X t ) | Y t , . . . , Y 1 ], where f is a measurable, bounded function. Sequential Monte Carlo (SMC) methods are an important class of algorithms that achieve this goal, relying heavily on a sampling step in the process. This step is the focus of the present article.
Nonlinear filtering has a number of applications in different domains. Since the 60's, it has been a staple of the defense, search and rescue, and aerospace industries. Later, it also became an essential tool in image processing. In mathematical finance, it is widely used to calibrate models with unobservable factors or variables (volatilities, credit risk, The associate editor coordinating the review of this manuscript and approving it for publication was Liang Hu . instantaneous interest rates) using only observable quantities such as asset prices (see for example [14], [17], [18], [23]). Nonlinear filtering also finds applications in fields as diverse as biology [31], aeronautics [39] and epidemiology [13].

A. MOTIVATION
We adopt the treatment of nonlinear filtering in [11] and [20]. Let ( , F, P) be a probability space. Let a non-observable signal be represented by (X t ) t∈N : → E, where E is some complete, separable metric space. In this case, t could represent discrete time steps. Typically, it is assumed that X is a Markov process with a specified initial distribution p(x 0 ) and evolution equation that, given X t , returns X t+1 up to some noise term. Our indirect observations of the non-observable signal are of the form where h : E → R d is a known, measurable function, and the V t are independent, random vectors with a common, strictly positive, bounded density g that is independent of (X t ) t≥0 . For simplicity, we also define Y t := (Y 1 , . . . , Y t ), X t := (X 1 , . . . , X t ).
Given this setting, we try to estimate, track and predict the signal based on distorted, corrupted partial observations. In order to recall the concepts underlying particle filtering, both the signal and the observations will be seen as random variables; the values of the observations will only be fixed when algorithms serving to compute estimates are presented. Throughout the paper, random variables are denoted by uppercase letters, while we use lowercase letters to represent realizations thereof.
Our goal is to develop an estimator for the conditional probabilities P(X t ∈ A|Y t ) for all Borel sets A, or equivalently, for the conditional expectations E[f (X t )|Y t ], the best least-squares approximation of f (X t ) given all observations thus far, for all f : E → R that are bounded, measurable functions. Clearly, it would be convenient if we could directly and easily sample from the posterior distribution p(X t |Y t ), but the computational complexity of such a method is usually too great [11].
Sequential Monte Carlo (SMC) methods comprise an important class of algorithms that approximate the conditional probabilities P(X t ∈ A | Y t ), or equivalently, E[f (X t )|Y t ]. This is done by sequentially incorporating the observations Y t in the computation of the filters via the Bayes formula and importance sampling. The system of simulated particles that approximates the conditional distribution of the signal is updated with each new observation (see for example Chapter 10 of [1] for more details on particle filters).
In our framework, the importance distribution used to generate the particle system is a probability measure Q, under which we assume that the signal and observation process {(X t , Y t ), t = 0, 1, . . .} has the same distribution as the signal and noise process {(X t , V t ), t = 0, 1, . . .} does under P. It follows that under Q, the importance density of the observations is g and the observations and the signal are independent. The information given by the observations is incorporated into the likelihood process {L t , t = 0, 1, . . .} defined by for t = 1, 2, . . . and L 0 = 1, so that L t = L t−1 α t (X t−1 ). The likelihood process can be used to obtain the probability measure P from Q using Girsanov's theorem. It follows that the unnormalized filters of interest are given by E Q [L t f (X t )|Y t ], where E Q [·] denotes the expectation under the Q measure, which is approximated using the generated particle system. Each time step of a SMC algorithm is comprised of two parts. The first one is the mutation step, where the particles are evolved using the transition density under the importance distribution Q; this allows for the computation of the approximated filter. After a new observation is incorporated, the second step involves sampling from the empirical distribution of the particles in order to avoid weight degeneracy, which can negatively impact the performance of the algorithm.
The focus of this article is the aforementioned sampling step, where low-weight particles are eliminated and replaced by average weight ones. This step is necessary because the variance of the weights increases over time. In practice, as the particle system develops, there tend to be a few particles with very high weights and a lot of particles with very low weights. This leads to either wasted computational power, when many low-weight particles are propagated, or to poor estimates of the conditional expectation.
The sampling step is an important factor in the speed of SMC algorithms, and it can easily become performancelimiting when it is poorly conceived. Aside from the actual number of operations in the algorithm itself, a sampling method can also affect performance by influencing the number of particles propagated (this is the case in so-called branching particle filters). If the number of particles in the system grows too large, then more operations will have to be performed.
An ideal sampling method should even weights without introducing excess noise nor computations. Various existing sampling algorithms seek to strike such a balance between variance reduction and execution speed. The new algorithms introduced in this article were built with such a goal in mind.

B. PREVIOUS WORK
The bootstrap particle filter [15] was an important step in the development of fast SMC methods. Numerous improvements were made over the years to the sampling step of the bootstrap filter. Residual sampling is introduced in [29] to reduce sampling noise and execution time. Stratified sampling is introduced in [19] to reduce the variance of the uniform random variables involved in sampling. Combined sampling, a combination of residual and stratified sampling, is discussed in [10]. The minimal variance algorithm, which we discuss further in this work, is presented in [4]. All these sampling methods keep the number of particles in the system constant throughout the time steps. In this work, we will refer to algorithms based on such sampling methods as interacting filters. See [5] for a survey for convergence results and [8] for a recent exposition on interacting particle filters.
Other sampling algorithms return a random number of particles, and when they are incorporated in an SMC algorithm, the number of particles can vary at each time step. Such algorithms are often referred to as branching filters. Previous work on branching particle filters includes [2], [6], [9], [20]. Contrary to beliefs about particle instability (i.e., particle numbers either exploding or going to zero), [20] shows that a branching particle filter can be stable if the correct normalizing constant is used. Still, the stability in particle numbers, and indirectly the performance of the algorithm, is affected by the variance of the number of branches (or offspring) assigned to each particle in the sampling step, and by the variance of the total number of particles at each step. Adding negative VOLUME 8, 2020 dependence between the number of offspring assigned to each particle can help reduce this latter variance.
Intuitively, two random variables that are negatively dependent tend to move in opposite directions; if one takes a small value, the probability that the other one takes a large value is increased. This idea is closely related to negative lower orthant dependence, which occurs when two random variables X 1 and X 2 satisfy P( for all x 1 , x 2 . An imperfect but widely used measure of dependence is the coefficient of correlation, which we use later in this article. For more details on negative dependence, we refer the reader to Chapter 5 of [32]. In our context, inducing negative dependence between the number of offspring at adjacent locations ensures that the particles are more evenly distributed among all locations. To optimize this effect, it is desirable to maximize the negative correlation induced between the different locations. The concept of maximal negative dependence is well studied in two dimensions; a pair of countermonotonic random variable attains such a maximum. We say that two random variables (X 1 , X 2 ) are countermonotonic if there exist a random variable Z and real functions f and g, with f increasing and g decreasing, such that (X 1 , X 2 ) has the same distribution as (f (Z ), g(Z )) (see for example [33]). The concept of extreme negative dependence is not clearly defined for vectors of dimension higher than 2. In this general case, a type of extreme negative dependence structure for random vectors is joint mixability. We say that a random vector X = (X 1 , . . . , X d ) is jointly mixable (see [38] In this article, we exploit both concepts to identify and induce negative dependence between the number of offspring redistributed to each particle. In [20], negative dependence was produced by stratification using the Yates-Fisher shuffle or partial shuffle consisting of some exchanges. A key step in the new algorithms presented in this work is the generation of correlated discrete random fields. Indeed, we wish to generate a sequence of random variables with specified marginal probability mass functions and a given covariance matrix. The fact that our random variables must be discrete precludes the use of the variety of popular methods to generate Gaussian random variables (for example, [30], [34], [35], [37]). The methods that best fulfilled our criteria were the quick simulation fields (QSF) method of [26] and the list sequential sampling of [3]. These sampling methods are incorporated in specific branching algorithms in order to speed them up while keeping some control on the variance of the total number of particles in the system.

C. UNIQUE CONTRIBUTIONS
Many of the refinements to the bootstrap algorithm involve the use of residues, the use of negative correlation in the sampling step or the use of partial sampling. All classes of improvements reduce sampling noise and improve performance. However, the best ways to implement the negative dependence and partial sampling are unknown. In this work, we create new sampling algorithms by considering novel methods to induce negative dependence between the number of particles reassigned to each particle location in the sampling step. We also show that the use of partial sampling can be implemented in the new filters, as well as in existing interacting and branching filters, for performance improvement. This is explored in an extensive numerical experiment, through which we also show that our branching algorithms maintain respectable particle control.
Although our work is somewhat similar to [10] and [27], those two works do not consider branching filters, nor QSF and the list sequential method of [3] as ways to generate negatively dependent particles.
The paper is divided as follows. In Section II, we compare complete and partial sampling, and discuss the implementation of interacting and branching algorithms in each context. Section III presents a review of existing methods as further motivation for our new algorithms, which are also outlined in this section. Numerical experiments are presented in Section IV, and concluding remarks are in Section V.

II. COMPLETE AND PARTIAL SAMPLING
We denote by N t ∈ N the number of particles in the filter at the end of the t-th time step, after sampling; N t may vary across time steps, for example when branching filters are used (this is further explained below). Let (x i,t ) , the transition density of X , generated at the beginning of the t-th time step. At t = 0, we consider ( where each x i,0 is sampled from p 0 .
Each particle is assigned a likelihood i,t that weighs the particle based on how well it approximates the original signal according to the observations. This likelihood is given by where g is the common, strictly positive, bounded density of V t . The sampling step in particle filters is often necessary to avoid weight degeneracy, that is, to avoid ending up with a few particles having extremely high weight compared to the rest. In most common sampling methods, all particles are redistributed (interacting filters) or branched (branching filters) randomly. The probability that a particle appears in the new sample is proportional to its weight i,t .
Remark 1: Technically, we consider one-step predictor filters by incorporating the observation equation Y t = h (X t−1 ) + V t instead of the more commonly used tracking filter observation model Y t = h (X t ) + V t . Our setting describes the important situation where one must estimate the signal prior to receiving the current observation compared to the usual assumption that you may use the current observation in the signal estimate. All the algorithms and analysis given herein transfer seamlessly to the more common tracking filter setting by simply replacing α t (X t−1 ) with α t (X t ) and α t (x i,t−1 ) with α t (x i,t ) everywhere. The algorithms would stay the same except one has to evolve the particles prior to calculating the weights, i.e. switch steps 5 and 6 in Algorithm 1, to handle this α t argument change. There are two reasons why the one-step predictor was considered: i) It is a very important setting; ii) It is consistent with the pathspace convergence and other empirical results in [21] and [20] respectively. The pathspace convergence results obtained in [21] would have been, at least notationally, more difficult if the tracking observation model had been used.
In the rest of this section, we compare the situation where all particles are resampled at each time step to the one where only a subset of the particles is considered for sampling.

A. COMPLETE SAMPLING
We use the term complete sampling to refer to algorithms in which all the particles are resampled. That is, the sampling set at time t, denoted by C t , is {1, . . . , N t−1 }. We further define the normalized weights associated with (x i,t−1 ) for i ∈ C t , so that i∈C t a i,t = 1.
Complete sampling can be performed via interacting or branching algorithms. In this work, interacting sampling procedures refer to algorithms that redistribute the same number of particles as there were in the original sample, such as the multinomial or stratified sampling algorithms of [10] or the minimal variance algorithm of [4]. Since the same type of sampler is used throughout the time steps, the number of particles used in the filter remains constant, that is, N t = N 0 for all t.
We call branching samplers the algorithms in which each particle can be split into a random number of offspring, which are then used as the starting point for the next time step, without a guarantee that the number of particles remains constant across all time steps. In complete sampling algorithms, the average number of offsprings that each particle has is proportional to its normalized weight a i,t . The main difference between interacting and branching samplers is the constant particle requirement for interacting algorithms and the flexibility in particle numbers for branching samplers.
Filters that make use of branching samplers (herein called branching filters) have been criticized for particle instability. However, by forcing the expected number of particles to be equal to the initial number of particles N 0 at each time step, the algorithms of [20] show increased particle stability. In this work, the branching samplers we propose are based on the same idea, but we add explicit negative dependence between the branching decisions from one particle to the other to further control the variation in the number of particles.

B. PARTIAL SAMPLING
As stated previously, it may not be necessary to sample all particles at every time step. Indeed, results in [20] show that it may be advantageous to leave alone particles whose weights are neither too big nor too small as we avoid introducing excess sampling noise. If a weight is too small, we would like to eliminate that particle; if the weight is large, that particle should likely have multiple offspring. All of this must be done without introducing bias. There are other possible partial sampling schemes [12], but here, for simplicity, we use the following one from [20].
We define the sampling set C t as for some r ≥ 1 fixed and wherē In other words, we resample the particles whose likelihood is smaller than r −1¯ t or larger than r¯ t . That is, if its likelihood falls outside of a given interval around the N 0 -average¯ t , it gets resampled. Using the N 0 -average rather than dividing the sum of the likelihoods by N t−1 helps to keep the number of particles stable.
Interacting and branching samplers are applied differently to the partial sampling set C t . Indeed, interacting samplers use the normalized weights associated with the sampling set as defined by (1) and randomly redistribute the number of particles in C t to the locations of the resampled particles. The normalized weights are such that i∈C t a i,t = 1, that is, the sum of the normalized weights associated to the resampled particles should sum to 1. We also observe that the normalized weights used in interacting sampler algorithms depend on C t . In particular, the normalized weight a i,t for a given particle will change if complete, rather than partial, sampling is used (given that the sampling set is affected), or if the parameter r is modified.
In contrast, even when only a subset of particles are branched, branching samplers consider the weight of the particle as a proportion of the sum of the weights of all particles, resampled or not. To each resampled particle, branching samplers assign a number of offspring based on the ratio i,t /¯ t . This is true whether the sampling set contains all the particles or not; the composition of the sampling set does not change the distribution of the number of offspring. In other words, since the number of offsprings assigned to each particle is proportional to the weights normalized by the sum of all the particles' weights, not only those in the sampling set. This subtle but important difference between interacting and branching samplers affect their implementation in the general filtering procedure. We conclude this section with VOLUME 8, 2020 two important remarks that summarize the differences and similarities between the different types of algorithms.
Remark 2: In all cases, except for partial branching samplers, the weights normalized over the sampling set, (a i ) i∈C t , are used for sampling. This is indeed also the case in complete branching samplers; when the sampling set contains all the particles, we observe that for all i ∈ {1, . . . , N t−1 }, This similarity between interacting and branching complete samplers is further explained in Section III. Remark 3: For both interacting and branching algorithms, complete sampling can be seen as a special case of partial sampling (using r = 1), so one only needs to implement the partial sampling algorithm while ensuring that setting r to 1 is possible. However, our numerical results will show that complete sampling is rarely the best choice so r > 1 should generally be used.

C. IMPLEMENTATION
In light of the observations above, a general partial sampling algorithm is given by Algorithm 1 (below). The first part of the algorithm (lines 2 to 15) describes the mutation step and the treatment of the non-sampled particles; it is identical for interacting or branching filters. In the algorithm, and going forward, for a ∈ R, we denote a = max{z ∈ Z : z ≤ a} and {a} = a − a .
The ''if'' statement on line 16 splits the algorithm into two. Lines 16 to 21 refer to the use of an interacting sampler. Line 18 is intentionally left vague, as it should be replaced with a specific sampler. In the next sections, we discuss sampling algorithms in further details. Interacting samplers will be presented as subroutines that take as input the particles and their weights, (x i,t ,ˆ i,t ) i∈C t and return a vector of the same length containing the sampled values.
Lines 22 to 32 are used for branching samplers. Line 23 can be replaced by one of the branching algorithms presented in the next section. Herein, branching samplers take as input a vector of probabilities ({ i,t /¯ t }) i∈C t corresponding to each of the particles in C t , and return a vector of Bernoulli random variables (ρ i,t ) i∈C t of the same length, where each ρ i,t takes the value 0 or 1, that is ρ i,t ∈ {0, 1}. The number of offspring assigned to location i t (and therefore given valuex i,t ) is given by i,t /¯ t + ρ i,t , and the weights at these locations is reset to the N 0 -average. These steps correspond to lines 26 to 29. In the next sections, we detail the branching algorithms that can replace line 23.

III. NEW SAMPLING ALGORITHMS
In its simplest form, the sampling part of the filtering procedure consists essentially in sampling a fixed number n of independent random variables from the categorical (empirical) distribution obtained in the mutation step, or a subset thereof. In this section, we focus on a single time step, so we drop the reference to time t from the notation. We also let n denote Algorithm 1 General SMC Algorithm 1: procedure SMC(N 0 , T , r) T ∈ N is the number of time steps Initialize particles 2: end for Estimate the conditional expectation 8: Determine the sampling set 10: ∈ ( 1 r¯ t , r¯ t )} Count the particles for next iteration 11: 15: end for 16: if Interacting Filter then 17: Get normalized weights (a i,t ) i∈C t 18: end if 22: if Branching Filter then 23: Create mean 24: for i ∈ C t do 25: for j = 1, . . . , M i,t do Propagate offspring 27: x N t +j,t =x i,t 28: , from which normalized weights can be obtained as in (1), so that

A. REVIEW OF (SOME) EXISTING ALGORITHMS
In this section, we focus on complete sampling (i.e. we assume that the sampling set contains all the particles), since it better highlights the common theory behind interacting and branching samplers. A discussion on the application of branching algorithms in partial sampling is provided at the end of the section.

1) MULTINOMIAL BOOTSTRAP
Multinomial bootstrap (see [15]) is arguably the simplest sampling algorithm, and refers to the simulation of n independent random variables, each one taking the value x i with probability a i , i = 1, . . . , n, so that each random variable is drawn from a categorical distribution. The result of these n independent draws can be expressed as a vector M = (M 1 , . . . , M n ), where each M i takes a value in {0, 1, . . . , n} and represents the number of times the value x i is drawn, so that n i=1 M i = n. The vector M has a multinomial distribution with parameters (n, a 1 , . . . , a n ), so the random variables M 1 , . . . , M n are clearly not independent. Indeed, it can easily be shown that In other words, the simplest multinomial bootstrap procedure results in negatively correlated numbers of offspring at any two different locations i and j.

2) REDUCING SAMPLING NOISE
To improve the performance of the particle filter, it is desirable to control the variance of two quantities (among others): • The number of particles redistributed to each each individual site, M i ; and • The total number of particles reassigned, N =: n i=1 M i . While reducing the variance of each M i also reduces the variance of N , the opposite is not necessarily true. For example, as explained above, multinomial bootstrap yields a constant total number of offspring n, thus attaining the smallest possible variance for N , but does not control the variance of each M i . We remark that in this case, the random vector M is jointly mixable, since P n i=1 M i = n = 1. As mentioned above, joint mixability is a type of extreme negative dependence.
Even when P(N = n) = 1, as in interacting filters, reducing the variance of each M i while retaining unbiasedness is desirable in order to improve the performance of the filtering procedure (see [1] for more details). Many wellknown interacting algorithms, such as residual sampling [29], stratified sampling [19], as well as combined (interacting) sampling [10] reduce the variance of the individual number of particles assigned to each site. Our implementation of these algorithms is presented for reference in the appendix.
If one focuses only on decreasing the variance of the M i 's and relaxes the constraint that N is almost surely constant, then one can attain the lowest possible variance for each M i . Indeed, in order to keep the filtering procedure free of bias, sampling must be done so that E[M i ] = na i (interacting samplers) or E[M i ] = i /¯ (branching samplers) for each i. If we define by A a the set of integer-valued random variables with expectation a ∈ R, it can be shown (see for example Exercise 9.1 of [1]) that the random variable Y ∈ A a that attains the lowest possible variance is given by Y = a , with probability 1 − {a} a + 1, with probability {a}, and has variance {a}(1 − {a}). We recall that x denotes the floor of x, that is x =: max{z ∈ Z : z ≤ x}, and It follows that the lowest possible variance for the individual number of offspring at each location i is given by , and can be attained by letting where U 1 , . . . , U n are Uniform random variables on [0, 1]. The uniform random variables do not need to be independent to achieve this lower bound.
Branching-type procedures for sampling are based on this idea and attain the lowest possible variance for each M i by allocating na i offspring with probability 1 − {na i }, and na i + 1 offspring with probability {na i } at each location i. If this operation is performed independently at each location (for example using (3) Remark 4: In our implementation of branching filters (see Algorithm 1), the number of particles observed at the beginning of a given step, N t may differ from the initial number of particles N 0 . However, we use N 0 to determine the number of offspring assigned to each particle; that is, we set If, for each i, we let M i = na i + I i , where I i has marginal Bernoulli distribution with mean {na i }, then the variance of N can be reduced by adding negative dependence between the I i 's. This is done in [20] via stratification. The resulting combined branching algorithm is presented for reference in the appendix. In Section III-C, we introduce new branching algorithms that explicitly induce negative correlation between the I i 's.
There exists a specific dependence structure for the vector I = (I i ) n i=1 (or equivalently, for the M ) that allows both individual variances of each M i and the variance of N to be minimal. It stems from the minimal variance algorithm introduced by [4] (see also [1]) and ensures that Var . . , n} and that P(N = n) = 1, VOLUME 8, 2020 so that Var(N ) = 0. 1 [1] also explain that the resulting random vector minimizes Var The minimal variance algorithm keeps track of the number of offspring left to distribute and compares it with its theoretical average, in order to assign the offspring in a way that minimizes the variance. The algorithm is implemented via embedded ''if'' statements.
Remark 5: The minimal variance algorithm induces a vector of jointly mixable Bernoulli random variables. Indeed, n i=1 na i offspring are distributed in a deterministic manner, since each location i receives na i offspring with probability 1. Since the algorithm ensures that N = n almost surely, the number of offspring that are randomly re-distributed must be equal to n − n i=1 na i . In other words, n i=1 I i = n − n i=1 na i a.s., and therefore the vector I is jointly mixable.
Our first algorithm, presented in Section III-B, is inspired by the minimal variance algorithm. The use of ''if'' statements is replaced with the Quick Simulation Field (QSF) algorithm of [26], which allows quick simulation of a joint distribution with given marginals and correlation structure.

3) SPECIAL CASE: PARTIAL SAMPLING WITH BRANCHING ALGORITHMS
The discussion above does not apply directly to the case where branching samplers are used for partial sampling, since in this case, the sampling weights i /¯ do not sum to 1. Indeed, recall from Algorithm 1 that for i ∈ C, where C denotes the sampling set, where ρ i is a Bernoulli with mean { i /¯ }. It follows that the expected value of the total number of offspring assigned to particles in the sampling set is Recall that¯ , the N 0 -average, is calculated using all the particles at a given time step. It follows that the expected number of offspring is not necessarily equal to the number of particles in the sampling set, and so partial branching samplers cannot be included in the previous discussion. Nonetheless, branching algorithms developed using the ideas presented above can also be applied to partial sampling; this is demonstrated empirically in Section IV.

B. QSF-BASED MINIMAL VARIANCE ALGORITHM
The QSF algorithm (introduced in [24], see also [25], [26]) sequentially generates random variables with pre-specified marginal probability mass functions and covariance matrix, by modeling the problem as the simulation of random vertices 1 The pseudo-code for the specific version of the algorithm we discuss here is provided in the Appendix. on a graph with given edge weights. The vertices of the graph correspond to the random variables to simulate, while the edge weights correspond to the covariances between a pair of random variables/vertices. The full version of the algorithm uses auxiliary marginal distributions in order to enlarge the set of reproducible joint distributions. In this article, we use a simplified version of the algorithm to generate our random variables, which we recall here.
We consider a collection of n marginal probability mass function with finite support denoted by π i (·), i ∈ {1, . . . , n}.
, and introduce the following auxiliary functions: We also denote X i = (X 1 , . . . , X i ) and x i = (x 1 , . . . , x i ), i ∈ {1, . . . , n}, with X = X n . Following [26], we have that if . . , n}} are numbers such that the right-hand side of is in [0, 1], then the joint distribution of the random vector X obtained recursively using P(X 1 = x 1 ) = π 1 (x 1 ), (4) and has marginal distributions π i (·) and covariances Cov(X i , X j ) = s 2 ij , i, j ∈ {1, . . . , n}. The goal of our new algorithm is to reproduce the marginal distributions of M i and M 1:i−1 =: i−1 k=1 M k , as well as the correlations between the M i 's induced by the minimal variance algorithm. That is, following Section 9.2 of [1], we want where a 1:i = i k=1 a k . We also want to have the same covariances between M 1:i−1 =: i−1 k=1 M k and M i , i ∈ {2, . . . , n} as those that result from the application of the minimal variance sampling algorithm, given in the following proposition.  .

Algorithm 2 QSF Minimal Variance
Number of offspring at location i 7: Place offspring in new location 9: x N t−1 −h+k,t =x i,t  For the induction step, we denote U i as in the proof of Proposition 1 and let (U i ) n i=1 be i.i.d Uniform(0,1) random variables. To show (a), we note by (7), (8) and (9) that if {na i } + {na i+1:n } < 1, and To show (c), it suffices to observe that The result follows by (a) and the induction hypothesis.
To show (d), we first re-write (M 1:i − na 1:i ) 2 as where the last equality is obtained using the induction hypothesis and (b). The QSF minimal variance is built so that Cov(M 1:i−1 , M i ) = σ , with σ given by (10) and (11) if {na i } + {na i+1:n } < 1, and We also observe that where the fourth equality holds by (14).
If {na i:n } = 0, then by (14), {na i } = {na i+1:n } = 0 since both values must be non-negative. It follows from (12)  where the second equality holds by (15) and the last equality is true whether {na i+1:n } is 0 or not.

C. BRANCHING ALGORITHMS WITH NEGATIVE DEPENDENCE
The branching algorithms that we propose in this section ensure that the variance of the number of offspring at each location, M i , remain minimal for all i. However, we relax the condition that the total number of particles remain stable through time. Such a relaxation is considered with the goal of reducing computational time of the sampling procedure.

1) ANTITHETIC VARIATES
The lowest possible correlation between two random variables with given marginals can only be attained if the random variables are countermonotonic.
Here we propose to correlate the I i 's two-by-two so that each couple has a countermonotonic dependence structure. We do so by simulating n/2 Uniform(0,1) random variables (or (n+1)/2 if n is odd), and by generating two countermonotonic Bernoullis using each of the uniform random variables. The pseudo-code for this method is given in Algorithm 3.

2) LIST SEQUENTIAL SAMPLING
The sampling method of [3] can be used to generate a vector of dependent Bernoulli random variables with predetermined conditional correlations between each component. It is similar to a special case of the quick simulation fields algorithm of [26] applied to multivariate Bernoulli random variables, but it uses conditional covariances instead of unconditional ones.
The method stems from the experiment design and sampling literature; each Bernoulli to simulate can be seen as a unit which will either be sampled or not in the context of a survey. The inclusion probability of each unit is proportional to some quantity of interest and it can differ from one unit to the other. Introducing negative correlation between inclusion indicators can reduce the variability of the results of the survey.
The general idea of the method is to go through each unit one by one in a pre-specified order (so it is a type of list sequential sampling) and to decide whether or not this unit will be included in the sample; this is equivalent to simulating a Bernoulli, where 1 indicates inclusion of the unit. After each unit is sampled, the conditional laws (or inclusion probabilities) of all the yet-to-be-sampled units are updated, based on the value of the new simulated Bernoulli (or inclusion decision) and on correlations (or correlation-based weights) chosen by the sampler.
Therefore, for all i ∈ {1, . . . , n}, if we let p (0) i = { i /¯ } be the unconditional Bernoulli parameter (or unconditional inclusion probability), the updated parameters are given by To ensure that the updated conditional inclusion probabilities p A non-zero weight creates dependence between the i th and the j th random variables. It is explained in [3] that negative correlations will be obtained by choosing positive weights β (i) j−i . The maximal weight that can be chosen is therefore the upper bound in (16), given that the sum of the total allocated weights remains smaller than or equal to 1. This list sequential method using maximal weights is implemented in Algorithm 4. The parameter m, to be selected by the user, indicates the maximum number of random variables that will be negatively correlated with each ρ i .

A. MODEL
To assess the speed and stability of the proposed algorithms, we test on a common model from the particle filtering literature (see for example [28]), given by where U t is normally distributed with mean 0 and variance 10 and V t is standard Cauchy distributed. For this model, we calculate the error as where f is defined as

1) PROCEDURE
In all experiments, to determine an optimal value of r for sampling, we proceed in the following manner. We set the number of time steps to be T = 1000 and the number of trials to be 1000. We generate 1000 random copies of the signal, one for each trial, at the beginning of the experiments. We use these random copies for all experiments so as to be consistent when evaluating different algorithms and evaluating values of r for a given algorithm. The values of r we study range from 1 to 6. This range was chosen based on preliminary experiments to ensure that it includes the optimal value for each algorithm.
For each algorithm and for each value of r, we seek the execution time of the algorithm for a fixed performance. To do so, for each algorithm, we specify an initial number of particles N 0 = 150 and run the algorithm for 1000 trials. At the end of each trial, we calculate the error. If the average error over all trials is lower than the specified threshold, we then accept that average time. If the error is above the threshold, we increment the number of particles by 10 and repeat the experiment.
All the algorithms are implemented according to the pseudo-code included in this article. For the list sequential sampling (Algorithm 4), we set m = 3 to increase the amount of dependence between the particle locations without slowing down the algorithm too much.
We set the error threshold to be 14. These values are based on preliminary experiments which determined an error that was not so high to be unachievable given our limited computational resources, but not so low as to be reached trivially by all algorithms. All run times are recorded in milliseconds.

2) SUMMARY STATISTICS
The main summary statistics used to evaluate particle variation are standard deviation range and the average maximum and minimum number of particles. For every trial i and over all time steps in that trial, denote the maximum number of particles by N max,i , the minimum number of particles by N min,i , and the average number of particles by N i . Let N max denote the average of all N max,i 's, N min the average of all N min,i 's, and N the average of all N i 's. We report both N max and N min .
For each trial, the standard deviation of the number of particles is calculated after all T time steps, and then averaged across all the trials. The resulting statistics is denoted σ N . We then calculate the standard deviation range. The intuition for the standard deviation range is that we wish to know the range of the number of particles corresponding to two standard deviations both below and above the mean. To obtain the standard deviation range, we divide 4 times the standard deviation σ N by N :

3) CONFIGURATION
All simulations were coded in C++ and run in RStudio using Rcpp on a PC with a Dual Intel Xeon Processor E5-2650 v2 and 64 GB of RAM.

1) BRANCHING PARTICLE STABILITY
An important concern about branching methods is particle control. Here, we assess the variation in the number of particles resulting from the branching algorithms we consider. We first observe that in all three cases, σ increases with r. Increasing r means that fewer particles are branched, since increasing r widens the interval in which the particles are left untouched. As r increases, the weights of the particles considered for branching become either very high (and will likely create a large number of offspring) or very low (and will die off with a high probability). This explains the larger variation in the number of particles when r increases.
We also observe that a higher number of initial particles N 0 contributes to stabilizing the number of particles throughout the filter. Comparing Fig. 1a and 1b shows that the effect becomes significant for higher values of r. This trend confirms similar particle stability experiments in [20]. The risk of the number of particles exploding decreases as N 0 increases. This result might be surprising for those who believe that branching algorithms are doomed to particle instability, but is not so surprising once we note that, as described in [20], the expected number of particles at time t for our branching algorithms is always the initial number of particles N 0 rather than the previous number of particles N t−1 .
For lower values of r, antithetic variates and list sequential sampling appear to offer significantly higher particle stability than the combined branching algorithm. This may be due to the explicit negative dependence structure between the number of particles at certain locations. As r increases, this difference disappears, and combined branching offers greater stability. We find in the next section that the optimal parameter r for these three algorithms fall between 2.05 and 3.50, depending on the method. In this range, all three algorithms perform very similarly in terms of particle stability. It should be noted that while particle stability resulting from branching methods is exemplified here in a particular context, previous experiments in different settings have also lead to similar conclusions. In particular, [22] uses sequential branching Monte Carlo to simulate asset prices in the context of the Heston stochastic volatility model (see [16]) and show similar particle stability. Such property was also observed when using branching particle filters to calibrate the Heston model in [36]. We are therefore confident that the particle stability property we observe in the numerical example presented here translates to a variety of problems. Table 1 presents the results of the experiment described in Section IV-B1. As expected, the basic bootstrap is slower than the other algorithms and requires a higher number of initial particles to reach a similar precision. Fig. 3 shows that execution time of the basic bootstrap algorithm can be reduced by partial sampling, but remains high. It is interesting to note that the best value r for the bootstrap algorithm is much higher at 5.65 than for all the other algorithms (between 2.05 and 3.50). Such a high value results in much fewer particles being sampled, which speeds up the algorithm. Although 5.65 is identified as the optimal r, Fig. 3 shows that the run time of the algorithm (for the fixed performance that we selected) remains around the same level for values of r between 3.5 and 6.  The antithetic variates branching algorithm is fastest overall, both when considering the best performing r (in Table 1 and Fig. 2). Fig. 3 and 4 show that this result is robust for most values of r considered. This result is not surprising given the low computational complexity of the antithetic variates algorithm, combined with the fact that it requires the simulation of half as many random variables as there are particles to branch. Nonetheless, except for the bootstrap, the other algorithms considered are at most 10% slower than the antithetic variates. These run times are also not particularly sensitive to the choice of r, although they tend to increase for high values of r.

2) PERFORMANCE RESULTS AND DISCUSSION
It is important to remark that the problem considered here is chosen from the interacting particle filter literature, and that our results do not contradict other empirical results showing greater outperformance by branching algorithms in problems chosen to highlight the benefits of branching (see for example [21]).
All algorithms can be sped up by partial sampling, as shown in Fig. 2. The time improvement is most noticeable for the basic bootstrap algorithm. A slight U-shape can be observed in Fig. 4 for all algorithms except the bootstrap, indicating that values of r between 2 and 4 are optimal for this problem, with performance worsening for r outside of this range. We hypothesize that this observation is the result   of two phenomenons. First, when r is close to 1, too many particles are sampled, which introduces unnecessary noise in the system. In other words, we are unnecessarily perturbing particles that already accurately represent the signal. When r is too large, not enough sampling is taking place, which leaves low-quality particles untouched. Further insight can be obtained from examining Fig. 6: there seems to be a tradeoff between the value of r and the initial number of particles N 0 required to reach the error threshold. When r is high, fewer computational resources are expended sampling particles, but more particles are needed to reach the error threshold. When r is low, we possibly require fewer particles to reach the error threshold, but need to sample more of them. The new algorithms we introduced in this article fare at least as well as existing ones. However, it appears from Fig. 3 that the list sequential sampling algorithm may be slightly slower overall. The negative dependence it creates between the number of particles at different locations may be too weak to offset the added computational complexity. In contrast, the antithetic variates branching induces negative dependence between particles only two-by-two. As shown in Fig. 6, this somewhat less sophisticated negative dependence structure seems to require a slightly higher initial number of particles N 0 . However, the simplicity of its implementation makes it faster, which makes up for the increased number of particles.
It could also be argued that the version of the combined branching algorithm we implemented is new, since we modified it to remove the shuffle. Our results show that the possible bias induced by this omission does not worsen the performance, and that the modified algorithm performs similarly to the branching algorithms that include explicit negative dependence.
Our new implementation of the minimal variance dependence structure structure, which uses quick simulation fields rather than embedded ''if'' statements, performs as well as the minimal variance algorithm of [7]. When the best value for r is considered (in Table 1), both run times are very close. This result holds across values of r, see Fig. 4.
While minimizing the variance of the total number of particles is theoretically desirable, our results show that relaxing this requirement and letting N vary may result in better performing algorithms. The increased variance in the number of particles can be offset by the reduction in run time, especially when the variance of the particles at each location is kept minimal. In the problem we consider here, the performance of the branching algorithms is therefore not impaired by the variation in the number of particles.

V. CONCLUDING REMARKS
We proposed new sampling algorithms that explicitly create negative dependence between the number of particles reallocated to each location in the sampling step of a particle filter. One of these new algorithms establishes a novel way to impose the so-called minimal variance dependence structure via quick simulation fields. Our method may be easier to understand and our numerical experiments show that it performs at least as well as the original minimal variance algorithm. In order to improve the performance of the filtering procedure, all three new algorithms keep the variance of the number of particles at each location minimal, but some allow for variation in the total number of particles. Numerical results show that explicit negative dependence can be implemented efficiently and results in high performing algorithms.
We also recall the partial sampling scheme of [20] and propose to implement it in a wide range of interacting and branching procedures. In all the cases we considered, partial sampling reduces the run time of filtering algorithms for a fixed performance. We also show that the introduction of partial resampling into interacting particle filters significantly closes the gap between them and branching particle filters.
Although our numerical results are model specific, they contribute to the growing body of literature (see [20], [22]) providing evidence that branching particle filters are stable when they are correctly implemented. Further testing and application of branching filters and branching Monte Carlo methods in general is desirable to further the analysis of these methods.

APPENDIX
This section contains some of the algorithms mentioned in the main part of the text. They are the algorithms that are Generate U i a Uniform(0,1) 8: while V i ≤ p j do 10: j = j − 1 11: end while 12: x N t +k,t =x j+1,t while U k ≥ p j do 20: j = j + 1 21: end while 22: x N t +S+k,t =x j,t 23: end for 24: end procedure implemented for comparison purposes in Section IV. Similarly to the algorithms presented in Section III, we drop the reference to time to simplify the algorithms. for i ∈ C t do Deterministic number of offspring at location i. end if 14: for k ∈ {1, . . . , M i,t } do 15: x N t−1 −h+k,t =x i,t for i ∈ C t do 4: Generate U i , a Uniform i−1 n , i n 5: ρ i,t = 1 {U i ≤{ i,t /¯ }} 6: end for 7: end procedure The basic multinomial bootstrap algorithm [15] is detailed in Algorithm 5. The interacting combined sampler [10] is a combination of the residual [29] and stratified [19] methods. The implementation we consider is given in Algorithm 6. The minimal variance algorithm was introduced in [4]. In Algorithm 7, we present modified version of the one in [1], given in [20]. Finally, a modified version of the combined branching algorithm of [20], against which we compare our new branching algorithms, is detailed in Algorithm 8. The modification rests in the fact that we do not perform a full random permutation, as in the original version of the algorithm. Rather, we rely upon the natural particle reorderings in Algorithm 1 as a result of sampling to permute the particles.