Cooperative Localization for Multiple Soccer Agents Using Factor Graphs and Sequential Monte Carlo

This paper addresses the cooperative localization problem for a multiagent system in the framework of belief propagation. In particular, we consider the RoboCup 3D Soccer Simulation scenario, in which the networked agents are able to obtain simulated measurements of the distance and bearing to both known landmarks and teammates, as well as of the direction of arrival (DOA) of messages received from allies around the field. There are, however, severe communication restrictions between the agents, which limit the size and periodicity of the information that can be exchanged between them. We factorize the joint probability density function of the states of the robots conditioned on all measurements in the network in order to derive the corresponding factor graph representation of the cooperative localization problem. We then apply the sum-product algorithm (SPA) and introduce suitable implementations thereof using hybrid Gaussian Mixture Model (GMM) / Sequential Monte Carlo (SMC) representations of the individual messages that are passed at each network location. Simulation results show that the cooperative estimates of position and orientation converge faster and present smaller errors than the non-cooperative estimates in situations where agents do not observe landmarks for a long period.

required to track objects of interest (such as targets in surveillance applications) or continuously map their surrounding environment. In particular, the ability to estimate and share their own positions is an essential perception capability that agents must have in typical military and civilian applications. In this context, fully-distributed cooperative localization methods, e.g. [3]-[15], are of special interest in practical applications, since they often scale with the number of network nodes and do not rely on either a data fusion center or a central system administrator [5], a single point of failure in conventional, centralized architectures.
In this paper, we focus on the cooperative localization problem considering a sensing environment based on the RoboCup 3D Soccer Simulation domain [16], in which individual humanoid soccer players are able to indirectly observe their corresponding locations with respect to (w.r.t.) known and uniquely identified landmarks, e.g. soccer field corners or goalposts, and w.r.t. allied players (we use the terms allies and teammates interchangeably). Moreover, each individual player obtains information about its desired displacement from its omnidirectional walking engine.
More specifically, players can measure the direction of arrival (DOA) of incoming messages transmitted by their allies scattered across the soccer field as well as the range and bearing to both allies and landmarks that are detected and unambiguously identified in their cameras' field-of-view (FoV). RoboCup rules (refer to [16]) impose particular rates at which each robot takes new measurements and specify how often one agent may broadcast information to its teammates.
The communication limitations imposed by the rules are a major concern, since the agents transmit only unidirectionally, in pre-allocated time slots, and with a limited number of bytes per slot. An algorithm for such a scenario cannot require multiple information exchanges between robots at the time instants when mutual measurements are assimilated.

A. RELATED WORK
Estimation techniques are usually categorized as Bayesian and non-Bayesian. Two frequently employed non-Bayesian algorithms, least squares (LS) and maximum likelihood (ML), as well as their distributed implementation, are reviewed in [4]. For an extensive review of non-Bayesian localization techniques for wireless sensor networks, see also [17].
The main Bayesian techniques for sequential state estimation are based on the standard Kalman filter [18]-[20] and on Sequential Monte Carlo (SMC) filters, also known as particle filters [21], [22]. Algorithms for robot self-localization often rely on cameras to detect landmarks and use them as references for global positioning. In robot soccer, the most commonly used landmarks are goalposts and field lines, especially corners and intersections [23]. Reference [24] presents a survey of Kalman-based filters for robot vision applications, e.g. localization, motion control and tracking, whereas seminal work on particle-filter-based algorithms for robotics is presented in [25], [26]. Dellaert et al. [26] remark that, in general, the particle filter approach is more suitable for robot global positioning in scenarios where there can be ambiguities, such as robot soccer, due to the Kalman filter's limitations in propagating a multimodal probability density function. An application of Monte Carlo localization in robot soccer is shown in [27].
The approaches mentioned in the previous paragraph are non-cooperative, i.e. the scenario consists of a single robot taking measurements of landmarks in its surroundings, with prior information on the map known by the robot. Collaborative multi-robot localization was proposed in [9]-[15], emphasizing the need for decentralized approaches in the face of the downsides of a fusion center (the introduction of a single point of failure in the system), as well as its high processing and communication costs. Fox et al. [9] propose a method where each robot must keep a factorial representation, that is, its own local belief. In particular, those beliefs are stored as sets of particles, following previous work on Monte Carlo single-robot localization. During localization, detections that a robot makes of other robots are used to introduce additional probabilistic constraints, which are transmitted by the observer robot to the observed robot so that the latter can update its self-belief.
An EKF-based approach is proposed in [10], whose main challenge is the need for all-to-all communication at all time instants to maintain local estimates of the cross-correlations and coupling between agents. Scenarios with communication restrictions were explored in [11], where persistent connectivity between the robots is relaxed, but the robots still need full-duplex communication when updating their beliefs. A different approach based on consensus is proposed in [12], where agents are required to iteratively exchange information whenever a mutual measurement is taken.
A belief propagation framework for formulating the cooperative localization problem is proposed in [5] and explored in [13]-[15]. Each robot can be naturally represented as one node of a factor graph where a local belief is stored and propagated to the other nodes according to a message passing schedule defined by the sum-product algorithm (SPA) (see [28]-[31] for further details). Different implementations arise from this framework. For example, a parametric message passing scheme is proposed in [13], where the functions that specify the observation model are linearized and parameterized as Gaussian functions, and the computation of complex integrals in the SPA is reduced to simpler operations on their distribution parameters. On the other hand, a terrain-aided navigation application is presented in [14], where the integrals are computed using an SMC method. The resulting algorithm represents the underlying probability distributions more precisely, which is advantageous in scenarios where there can be ambiguities, at the cost, however, of being computationally heavier. The factor graph formulation is also used in [15] to model and address specific problems of time-of-arrival (TOA) based systems, such as clock synchronization, among others.
Other cooperative approaches have arisen in robot soccer in which the team jointly tracks objects in the field, such as the ball, and an individual agent then uses the common view to improve its self-estimate. In particular, [32] presents a cooperative localization estimator based on least squares, which stores a batch of observations and then optimizes over the possible states. Reference [33] presents a real-time algorithm for tracking the ball using Bayesian sensor fusion, and [34] presents an online and decentralized approach using particle filters to jointly perform object tracking and cooperative localization.

B. CONTRIBUTIONS AND PAPER OUTLINE
The main contribution of this paper is to propose a cooperative self-localization algorithm for an environment with communication restrictions, where the relative measurements are taken with cameras and DOA sensors as defined in [16]. Differently from [12], [15], those measurements do not require clock synchronization between the agents, although vision-based localization imposes its own challenges, such as the ambiguity the environment may present [9] and the fact that the robots' vision can be blocked by other robots [23]. The way the camera and other sensor data are preprocessed to obtain useful information, either from landmarks or from teammates, is outside the scope of this paper.
We use the factor graph framework to translate our network topology into a belief propagation algorithm, in which the agents cooperatively propagate their marginal posterior distributions (or local beliefs) regarding their individual poses, i.e. the robots' positions and orientations w.r.t. a two-dimensional Cartesian coordinate system attached to the soccer field. Unlike previous works, we do not assume that the robots are capable of communicating with any other player in their team at any time instant, especially when assimilating cooperative measurements. Instead, the proposed belief propagation algorithm is designed to abide by the communication constraints imposed by RoboCup rules, whereby only one agent may transmit a limited amount of data at each time instant reserved for team communication.
Additionally, we provide a feasible implementation of the SPA based on hybrid Gaussian Mixture Model (GMM) / Sequential Monte Carlo representations of the propagated beliefs, which are better suited for scenarios where there can be ambiguities [9], [26]. While the local computations are performed using particle methods, the communication cost of the beliefs is substantially reduced by parameterizing them before transmission.
Compared to non-cooperative methods for localization, which rely solely on local data, incorporating mutual measurements into the self-localization problem reduces the number of detected landmarks that a robot requires to achieve a unimodal estimate and, as verified in this paper, leads to significantly faster error convergence and to higher accuracy in scenarios where the robots remain a long time without detecting any landmarks. Compared to other cooperative localization methods, our approach does not require agents to exchange information with each other at the moment they collect mutual measurements.

1) PAPER OUTLINE
The paper is divided into six sections. Sec. I is this Introduction. In Sec. II, we describe the signal model. In the sequel, we introduce the proposed factorization of the joint posterior distribution of the individual robot poses conditioned on all available network measurements and derive its corresponding factor graph (FG) representation. In Sec. III, we present the message passing schedule for our cyclic FG. In Sec. IV, we detail our implementation of the SPA using the hybrid GMM / SMC approximation. Simulations and comparison of different scenarios are shown in Sec. V and conclusions are presented in Sec. VI.

2) NOTATION
We denote scalars by a or A, vectors by boldface a, and matrices by boldface A. $(\cdot)^T$ denotes transposition. Random scalars and random vectors are denoted by uppercase A and boldface A, respectively, and can be distinguished from deterministic scalars or matrices by context. The probability density function (p.d.f.) p evaluated at a is denoted by p(a). The p.d.f. of a multivariate real-valued Gaussian distribution parameterized by the mean vector $\boldsymbol{\mu}$ and the covariance matrix $\boldsymbol{\Sigma}$ is represented as $f_{\mathcal{N}}(\mathbf{x}; \boldsymbol{\mu}, \boldsymbol{\Sigma})$.

A. SIGNAL MODEL OF THE ROBOT DYNAMICS AND MEASUREMENTS
We assume a system with R humanoid robot soccer players (or agents). At discrete-time instant n, the state of agent r is a three-dimensional real vector $\mathbf{x}_{r,n}$ containing its (x, y) position in the field and its orientation (heading) w.r.t. the x axis,
$$\mathbf{x}_{r,n} = \begin{bmatrix} x_{r,n} & y_{r,n} & \psi_{r,n} \end{bmatrix}^T.$$
Each robot is equipped with a camera, with which it may measure the distance and bearing to fixed landmarks in the field and to players of the same team, also known as teammates or allies. Those elements are uniquely identified, i.e. there is no wrong data association once they are detected within the camera's FoV and subsequently identified by the robot. At time n, each individual measurement taken by agent r of a landmark l or an ally a is denoted respectively as $\mathbf{o}_{r\leftarrow l,n}$ or $\mathbf{o}_{r\leftarrow a,n}$. The sets containing the indices of the measured objects are denoted by $\mathcal{L}_{r,n}$ and $\mathcal{A}_{r,n}$, respectively. Robots are also able to broadcast messages to their team. In that case, each listening agent obtains a measurement of the DOA of that message, i.e. of the straight line from itself to the broadcaster. The DOA measurement of a broadcasting agent $b \in \mathcal{B}_{r,n}$ is denoted as $\phi_{r\leftarrow b,n}$. In our setup, broadcasting is restricted to a maximum of one agent at a time, so that $\mathcal{B}_{r,n}$ is either a singleton or an empty set at each time step. Also, $\mathcal{B}_{r,n}$ is the same for all r, since all robots can hear the broadcaster.
The mathematical modeling of the state dynamics and the sensor measurements is described in the sequel. The sampling time is fixed to a value T s .

1) ROBOT DYNAMICS
We use the dynamic model given in [27], [35], [36]. The system input is the vector
$$\mathbf{u}_{r,n} = \begin{bmatrix} \Delta x'_{r,n} & \Delta y'_{r,n} & \Delta\psi_{r,n} \end{bmatrix}^T,$$
where each entry is known by agent r and corresponds to the displacement of the robot calculated by the omnidirectional walking engine of the humanoid. The entries are given in the local Cartesian (x', y') frame of the robot, which is rotated (counter-clockwise) by an angle $\psi_{r,n}$ w.r.t. the global (x, y) frame attached to the field. Therefore, the x' and y' axes point along the robot's heading and to the robot's left, respectively. The robot's dynamic model is given by
$$\mathbf{X}_{r,n+1} = \mathbf{f}(\mathbf{X}_{r,n}, \mathbf{u}_{r,n}) + \mathbf{W}_{r,n}, \quad (3)$$
where
$$\mathbf{f}(\mathbf{x}_{r,n}, \mathbf{u}_{r,n}) = \begin{bmatrix} x_{r,n} + \Delta x_{r,n} \\ y_{r,n} + \Delta y_{r,n} \\ \psi_{r,n} + \Delta\psi_{r,n} \end{bmatrix} \quad (4)$$
and the input is converted to the field reference frame according to
$$\begin{bmatrix} \Delta x_{r,n} \\ \Delta y_{r,n} \end{bmatrix} = \begin{bmatrix} \cos\psi_{r,n} & -\sin\psi_{r,n} \\ \sin\psi_{r,n} & \cos\psi_{r,n} \end{bmatrix} \begin{bmatrix} \Delta x'_{r,n} \\ \Delta y'_{r,n} \end{bmatrix}.$$
The state noise $\mathbf{W}_{r,n}$ is modeled as a multivariate Gaussian vector with distribution $\mathcal{N}(\mathbf{0}, \mathbf{Q})$. Conditioning the current state vector on the previous one, we have $\mathbf{x}_{r,n+1} | \mathbf{x}_{r,n} \sim \mathcal{N}\big(\mathbf{f}(\mathbf{x}_{r,n}, \mathbf{u}_{r,n}), \mathbf{Q}\big)$, with associated p.d.f. $p_f$ given by
$$p_f(\mathbf{x}_{r,n+1} | \mathbf{x}_{r,n}) = f_{\mathcal{N}}\big(\mathbf{x}_{r,n+1}; \mathbf{f}(\mathbf{x}_{r,n}, \mathbf{u}_{r,n}), \mathbf{Q}\big). \quad (7)$$
Lastly, it is worth noting that, under RoboCup rules [16], the robot's dynamics are updated at every discrete time step.
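To make the update concrete, the following sketch (ours, not the paper's code; the function name and the optional noise arguments are assumptions) propagates one robot state by a commanded local-frame displacement, rotating it into the field frame before adding it, and optionally adds Gaussian state noise with covariance Q:

```python
import numpy as np

def motion_model(x, u, rng=None, Q=None):
    """Propagate one robot state x = [x, y, psi] by the commanded
    displacement u = [dx', dy', dpsi], given in the robot's local frame.
    The local displacement is rotated by the heading psi into the field
    frame before being added to the state; Gaussian state noise with
    covariance Q is added only when a random generator is supplied."""
    px, py, psi = x
    dxl, dyl, dpsi = u
    # Rotate the local (x', y') displacement into the global (x, y) frame.
    dx = np.cos(psi) * dxl - np.sin(psi) * dyl
    dy = np.sin(psi) * dxl + np.cos(psi) * dyl
    x_next = np.array([px + dx, py + dy, psi + dpsi])
    if rng is not None and Q is not None:
        x_next = x_next + rng.multivariate_normal(np.zeros(3), Q)
    return x_next
```

For instance, a robot at the origin heading along the +y axis that commands one unit of forward motion ends up at (0, 1) with its heading unchanged.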

2) DIRECTION OF ARRIVAL FROM BROADCASTING ALLY
The observation model for the direction of arrival of the message from broadcasting ally b to robot r is given by
$$\Phi_{r\leftarrow b,n} = h_\phi(\mathbf{X}_{r,n}, \mathbf{X}_{b,n}) + V^\phi_{r,n}, \quad (8)$$
where
$$h_\phi(\mathbf{x}_{r,n}, \mathbf{x}_{b,n}) = \operatorname{atan2}(y_{b,n} - y_{r,n},\ x_{b,n} - x_{r,n}) - \psi_{r,n} \quad (9)$$
and $v^\phi_{r,n} \sim \mathcal{N}(0, \sigma^2_\phi)$. The function atan2 takes into account the signs of the numerator and denominator to compute the arc tangent in the range $(-\pi, +\pi]$. The subtraction of the orientation of the receiving robot reflects the fact that all measurements are taken in local coordinates w.r.t. robot r.
Conditioning the measured direction of arrival on the states of the broadcasting and receiving robots, it follows that $\Phi_{r\leftarrow b,n} | \mathbf{x}_{r,n}, \mathbf{x}_{b,n} \sim \mathcal{N}\big(h_\phi(\mathbf{x}_{r,n}, \mathbf{x}_{b,n}), \sigma^2_\phi\big)$, with associated p.d.f. $p_\Phi$ given by
$$p_\Phi(\phi_{r\leftarrow b,n} | \mathbf{x}_{r,n}, \mathbf{x}_{b,n}) = f_{\mathcal{N}}\big(\phi_{r\leftarrow b,n}; h_\phi(\mathbf{x}_{r,n}, \mathbf{x}_{b,n}), \sigma^2_\phi\big). \quad (10)$$
Under RoboCup rules, the broadcast window opens every two time instants for a maximum of one broadcasting robot. Each listener is then informed of the broadcaster's self-belief and takes the opportunity to perform a DOA measurement. The message that an agent can transmit is also limited to 20 bytes in size.
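A minimal sketch of the noise-free DOA function h_phi (the function name and the angle-wrapping choice are ours) follows the description above: the field-frame bearing from r to b via atan2, minus the receiver's heading, wrapped back to the principal interval:

```python
import numpy as np

def h_phi(x_r, x_b):
    """Noise-free DOA of a message from broadcaster b as seen by robot r:
    the field-frame bearing atan2(yb - yr, xb - xr) expressed in r's local
    frame by subtracting r's heading psi_r, then wrapped to the principal
    interval via the quadrant-aware arc tangent of (sin, cos)."""
    ang = np.arctan2(x_b[1] - x_r[1], x_b[0] - x_r[0]) - x_r[2]
    return np.arctan2(np.sin(ang), np.cos(ang))
```

A broadcaster directly "ahead" of a robot yields a DOA of zero in the robot's local frame, regardless of the robot's global heading.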

3) CAMERA OBSERVATION
The observation model for the measurement of range and bearing obtained from observing an object o, which can be either a landmark or an ally, using the camera is given by
$$\mathbf{O}_{r\leftarrow o,n} = \begin{bmatrix} d(\mathbf{X}_{r,n}, \mathbf{X}_{o,n}) \\ h_\phi(\mathbf{X}_{r,n}, \mathbf{X}_{o,n}) \end{bmatrix} + \begin{bmatrix} V^\rho_{r,n} \\ V^\phi_{r,n} \end{bmatrix}, \quad (12)$$
where
$$d(\mathbf{x}_{r,n}, \mathbf{x}_{o,n}) = \sqrt{(x_{o,n} - x_{r,n})^2 + (y_{o,n} - y_{r,n})^2} \quad (13)$$
and $h_\phi$ is already defined in (9). The noise in each component is white and Gaussian, with the variance $\sigma^2_\rho$ of the distance noise $v^\rho_{r,n}$ depending on the true distance (14). Conditioning the camera observation on the states of robot r and object o, the associated conditional p.d.f. $p_O$ is written as
$$p_O(\mathbf{o}_{r\leftarrow o,n} | \mathbf{x}_{r,n}, \mathbf{x}_{o,n}) = f_{\mathcal{N}}\Big(\mathbf{o}_{r\leftarrow o,n};\ \begin{bmatrix} d(\mathbf{x}_{r,n}, \mathbf{x}_{o,n}) \\ h_\phi(\mathbf{x}_{r,n}, \mathbf{x}_{o,n}) \end{bmatrix},\ \operatorname{diag}(\sigma^2_\rho, \sigma^2_\phi)\Big).$$
Note that the dependence of the variance of the distance noise $v^\rho_{r,n}$ in (14) on the true distance poses additional complexity, since $\mathbf{x}_{r,n}$ is always unknown, and $\mathbf{x}_{o,n}$ is known only if o corresponds to a landmark. In the filter described later on, robot r estimates the variance in (14) using the local estimates $\hat{\mathbf{x}}_{r,n}$ and $\hat{\mathbf{x}}_{o,n}$ if o is a teammate, or the local estimate $\hat{\mathbf{x}}_{r,n}$ and the known vector $\mathbf{x}_l$ if o is a landmark. Under RoboCup rules, each agent captures an image from its camera every three time instants, taking the corresponding measurements to the objects (landmarks and allies) which lie in the angular section defined by the camera's FoV, independently of the range.
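As an illustrative sketch (ours, not the paper's implementation), the range-bearing observation can be generated as below; a range-noise standard deviation proportional to the true distance is used as a stand-in for the distance-dependent variance in (14), whose exact form is not reproduced here, and sigma_rho_rel and sigma_phi are assumed parameter names:

```python
import numpy as np

def camera_observation(x_r, x_o, sigma_rho_rel=0.01, sigma_phi=0.02, rng=None):
    """Range-bearing measurement of object o taken by robot r.
    Range is the Euclidean distance d(x_r, x_o); bearing is the local-frame
    angle to o. The range-noise standard deviation is assumed proportional
    to the true distance, mimicking a distance-dependent variance."""
    d = np.hypot(x_o[0] - x_r[0], x_o[1] - x_r[1])
    ang = np.arctan2(x_o[1] - x_r[1], x_o[0] - x_r[0]) - x_r[2]
    phi = np.arctan2(np.sin(ang), np.cos(ang))  # wrap to principal interval
    if rng is not None:
        d = d + rng.normal(0.0, sigma_rho_rel * d)
        phi = phi + rng.normal(0.0, sigma_phi)
    return np.array([d, phi])
```

Without a generator the function returns the noise-free observation, which is convenient for checking the geometry in isolation.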
We note that the ranging between agents is done using a camera, which does not rely on TOA measurements, as in [12], [15]. Although TOA ranging can yield precise measurements, it relies heavily on clock synchronization between agents, which typically has to be jointly estimated with the robot's pose.
In order to consolidate all measurements that are taken at time instant n by robot r, we denote the vector y r,n as the collection of all measurements o r←l,n , o r←a,n and φ r←b,n , for l ∈ L r,n , a ∈ A r,n and b ∈ B r,n .
Considering a state vector $\mathbf{X}_n$ and an input vector $\mathbf{u}_n$ comprising the individual state vectors $\{\mathbf{X}_{r,n}\}_{r=1:R}$ and input vectors $\{\mathbf{u}_{r,n}\}_{r=1:R}$ of all robots, and an observation vector $\mathbf{Y}_n$ collecting the measurements taken by all robots at time n, we end up with the complete state-space model
$$\mathbf{X}_{n+1} = \mathbf{f}'(\mathbf{X}_n, \mathbf{u}_n) + \mathbf{W}_n,$$
$$\mathbf{Y}_n = \mathbf{h}_n(\mathbf{X}_n) + \mathbf{V}_n,$$
where $\mathbf{f}'$ is the direct extension of function $\mathbf{f}$ in (4) and $\mathbf{h}_n$ is a vector function composed of instances of the functions $h_\phi$ and $d$ defined respectively in (9) and (13), whose dimension depends on the number of measurements at time n. The vector $\mathbf{W}_n$ collects the state noises in (3), whereas $\mathbf{V}_n$ collects the observation noises in (8) and (12).

B. FACTORIZATION AND REPRESENTATION WITH FACTOR GRAPH
Factor graphs are a graphical way of representing factorized multivariate functions and of systematically obtaining the marginalization w.r.t. any of the variables by applying the sum-product algorithm, as originally proposed in [28]. References [4], [14] are examples of the application of FGs in the context of distributed statistical signal processing. At time instant N, the conditional p.d.f. of interest for the filtering problem is $p(\mathbf{x}_N | \mathbf{y}_{1:N})$, which can be obtained from the marginalization of the posterior joint p.d.f.
$$p(\mathbf{x}_{0:N} | \mathbf{y}_{1:N}) \propto p(\mathbf{y}_{1:N} | \mathbf{x}_{0:N})\, p(\mathbf{x}_{0:N}), \quad (23)$$
where the proportionality symbol $\propto$ indicates that the right-hand side of the expression must be multiplied by a normalization factor such that the function on the left is a valid p.d.f. Assuming that each agent moves independently and using the Markovian assumption for each individual agent, as presented in the dynamics model (7),
$$p(\mathbf{x}_{0:N}) = \prod_{r=1}^{R} p(\mathbf{x}_{r,0}) \prod_{n=1}^{N} p_f(\mathbf{x}_{r,n} | \mathbf{x}_{r,n-1}). \quad (24)$$
Using the assumptions of conditional independence of each measurement and the models in (10) and (12), the likelihood $p(\mathbf{y}_{1:N} | \mathbf{x}_{0:N})$ is written as
$$p(\mathbf{y}_{1:N} | \mathbf{x}_{0:N}) = \prod_{n=1}^{N} \prod_{r=1}^{R} \prod_{l \in \mathcal{L}_{r,n}} p_O(\mathbf{o}_{r\leftarrow l,n} | \mathbf{x}_{r,n}) \prod_{a \in \mathcal{A}_{r,n}} p_O(\mathbf{o}_{r\leftarrow a,n} | \mathbf{x}_{r,n}, \mathbf{x}_{a,n}) \prod_{b \in \mathcal{B}_{r,n}} p_\Phi(\phi_{r\leftarrow b,n} | \mathbf{x}_{r,n}, \mathbf{x}_{b,n}). \quad (25)$$
We introduce short notations for the p.d.f.'s in (24) and (25), writing each factor as a function $\lambda$ of the variables it connects in the graph. Fig. 1 depicts the factor graph corresponding to (23), using the results from (24) and (25), detailing robots r and b and their measurements at time instant n. Note that, in this particular example, robots r and b see each other in their cameras and robot b is broadcasting its self-belief. The symbol $\leftarrow$ in the observations is not commutative, which means, e.g., that robot r sees a with its camera, but a does not see r. Note that the symbol $\Rightarrow$ in Fig. 1 indicates a restriction on the existence of the preceding factor nodes, i.e. the factor nodes exist whenever the logical proposition following the symbol $\Rightarrow$ is satisfied.
The decentralized approach, where each agent's state estimation is performed locally, results in a cyclic factor graph. By contrast, a centralized approach would generate a cycle-free factorization which can compute exact results, but does not scale with the number of agents. The joint state vector would grow linearly with R, whereas the joint observation vector could grow with $R^2$, considering the fully connected scenario. Furthermore, Soccer 3D rules do not allow a direct communication link at all times between players or to a central data fusion processor, which makes any centralized approach infeasible.

III. MESSAGE PASSING SCHEDULE
The message passing schedule comes directly from the implementation of the SPA following the rules for factor and variable nodes, described in [28], [31]. Due to the mutual measurements, our factor graph contains cycles, which implies that, to the best of our knowledge, there is no systematic procedure to exactly calculate the marginals. In addition, our focus is on the filtering problem, where the self-estimates must be available at each agent for real-time applications, which makes a smoothing approach with backward message passing prohibitive. The algorithm is detailed as follows.
Step 1: predict. Let $m^B_{r,n-1}(\mathbf{x}_{r,n-1})$ and $m^B_{r|s,n-1}(\mathbf{x}_{s,n-1})$ denote the marginal posterior p.d.f.'s (or beliefs) of the poses of robots r and s, respectively, stored by robot r at instant n − 1. Robot r first computes the prediction (P) message
$$m^P_{r,n}(\mathbf{x}_{r,n}) = \int \lambda^P_{r,n}(\mathbf{x}_{r,n}, \mathbf{x}_{r,n-1})\, m^B_{r,n-1}(\mathbf{x}_{r,n-1})\, d\mathbf{x}_{r,n-1}, \quad (30)$$
which is proportional to the predicted belief of its own pose obtained via the Chapman-Kolmogorov equation using the factor $\lambda^P_{r,n}$ (26) defined from the transition p.d.f. in (7).
Step 2: assimilate measurements of landmarks within the camera's FoV. Next, robot r assimilates the camera range-bearing measurements $\{\mathbf{o}_{r\leftarrow l,n}\}$ of the observed landmarks $l \in \mathcal{L}_{r,n}$. Specifically, it computes the prediction (P) plus landmark assimilation (L) message
$$m^{PL}_{r,n}(\mathbf{x}_{r,n}) = m^P_{r,n}(\mathbf{x}_{r,n}) \prod_{l \in \mathcal{L}_{r,n}} m^L_{r\leftarrow l,n}(\mathbf{x}_{r,n}), \quad (31)$$
in which each message
$$m^L_{r\leftarrow l,n}(\mathbf{x}_{r,n}) = p_O(\mathbf{o}_{r\leftarrow l,n} | \mathbf{x}_{r,n}) \quad (32)$$
represents the likelihood of the individual measurement $\mathbf{o}_{r\leftarrow l,n}$ of a landmark l at a known location $\mathbf{x}_l$.
Steps 3 and 4: broadcast self-belief (broadcaster robot) and assimilate DOA measurement (non-broadcaster robots). In the sequel, robot r follows some mutually agreed convention among all network nodes to select the current broadcasting robot. If it is robot r's turn, it broadcasts its locally updated belief $m^{PL}_{r,n}(\mathbf{x}_{r,n})$ to its allies across the soccer field and makes the message $m^{PLH}_{r,n}(\mathbf{x}_{r,n}) = m^{PL}_{r,n}(\mathbf{x}_{r,n})$. Otherwise, it listens for a similar message from the selected broadcasting robot $b \neq r$. Upon receiving the message $m^{PL}_{b,n}(\mathbf{x}_{b,n})$, node r measures the DOA $\phi_{r\leftarrow b,n}$ of the incoming message from robot b and assimilates it by computing
$$m^{PLH}_{r,n}(\mathbf{x}_{r,n}) = m^{PL}_{r,n}(\mathbf{x}_{r,n})\, m^H_{r\leftarrow b,n}(\mathbf{x}_{r,n}), \quad (33)$$
which is given by the product between the locally updated belief $m^{PL}_{r,n}(\mathbf{x}_{r,n})$ and
$$m^H_{r\leftarrow b,n}(\mathbf{x}_{r,n}) = \int \lambda^H_{r\leftarrow b,n}(\mathbf{x}_{r,n}, \mathbf{x}_{b,n})\, m^B_{r|b,n}(\mathbf{x}_{b,n})\, d\mathbf{x}_{b,n}, \quad (34)$$
where $\lambda^H_{r\leftarrow b,n}(\mathbf{x}_{r,n}, \mathbf{x}_{b,n})$ is defined in (28) and $m^B_{r|b,n}(\mathbf{x}_{b,n})$ is the latest belief that robot r has about robot b's pose at instant n. In this case, $m^{PL}_{b,n}(\mathbf{x}_{b,n})$ is directly assigned to $m^B_{r|b,n}(\mathbf{x}_{b,n})$, since b has just broadcast the most updated belief it has for itself.
Step 5: extrapolate beliefs. When assimilating the camera range-bearing measurements $\mathbf{o}_{r\leftarrow a,n}$ of the observed allies $a \in \mathcal{A}_{r,n}$, robot r will not have access to the most updated belief of a teammate that has not broadcast at time instant n. Therefore, each node locally keeps an extrapolated belief of the other robots, for all $s \in \{1, \ldots, R\} \setminus \{r\}$, computed as
$$m^B_{r|s,n}(\mathbf{x}_{s,n}) = \int p_w(\mathbf{x}_{s,n} | \mathbf{x}_{s,n-1})\, m^B_{r|s,n-1}(\mathbf{x}_{s,n-1})\, d\mathbf{x}_{s,n-1}, \quad (35)$$
where $p_w(\mathbf{x}_{s,n} | \mathbf{x}_{s,n-1})$ denotes the transition p.d.f. of the extrapolation model. In the robot soccer context, a sophisticated model could take into consideration the strategy followed by the team. In our case, however, we adopt a simple random walk model with Gaussian increments in the components of the state vector, such that $p_w(\mathbf{x}_{s,n} | \mathbf{x}_{s,n-1}) = f_{\mathcal{N}}(\mathbf{x}_{s,n}; \mathbf{x}_{s,n-1}, \boldsymbol{\Sigma}_w)$, with
$$\boldsymbol{\Sigma}_w = \frac{1}{9} \operatorname{diag}\big(\Delta x_{MAX}^2,\ \Delta y_{MAX}^2,\ \Delta\psi_{MAX}^2\big). \quad (36)$$
The covariance matrix in (36) is diagonal, and its entries $\sigma^2_x$, $\sigma^2_y$ and $\sigma^2_\psi$ satisfy $3\sigma_x = \Delta x_{MAX}$, $3\sigma_y = \Delta y_{MAX}$ and $3\sigma_\psi = \Delta\psi_{MAX}$, resulting in the 1/9 factor. For a Gaussian random walk model, this guarantees a probability of 99.7% that the displacement in each direction is smaller than the maximum possible displacement.
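The 1/9 factor and the covariance-inflation view of the extrapolation step can be sketched as follows (a sketch under our naming assumptions; the Gaussian-parameterized version of the inflation is described in Sec. IV, Step 5):

```python
import numpy as np

def extrapolation_cov(dx_max, dy_max, dpsi_max):
    """Diagonal covariance Sigma_w of the Gaussian random-walk model used
    to extrapolate a teammate's belief: 3*sigma equals the maximum per-step
    displacement on each axis, hence variances (d_max / 3)^2 = d_max^2 / 9."""
    return np.diag([dx_max**2, dy_max**2, dpsi_max**2]) / 9.0

def extrapolate_belief(mu, cov, cov_w):
    """One random-walk extrapolation step on a Gaussian belief N(mu, cov):
    the mean is kept and the covariance is inflated by Sigma_w."""
    return mu, cov + cov_w
```

With a maximum per-step displacement of 0.3 on each position axis, the resulting position variances are 0.3² / 9 = 0.01, i.e. a standard deviation of exactly one third of the maximum displacement.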
As in (26), we define the following short notation for $p_w$:
$$\lambda^W_{s,n}(\mathbf{x}_{s,n}, \mathbf{x}_{s,n-1}) = p_w(\mathbf{x}_{s,n} | \mathbf{x}_{s,n-1}).$$
Step 6: assimilate measurements of allies within the camera's FoV. Then, robot r incorporates the observations of allied (A) robots by computing the message
$$m^{PLHA}_{r,n}(\mathbf{x}_{r,n}) = m^{PLH}_{r,n}(\mathbf{x}_{r,n}) \prod_{a \in \mathcal{A}_{r,n}} m^A_{r\leftarrow a,n}(\mathbf{x}_{r,n}), \quad (38)$$
where each message $m^A_{r\leftarrow a,n}(\mathbf{x}_{r,n})$, $\forall a \in \mathcal{A}_{r,n}$, is computed as
$$m^A_{r\leftarrow a,n}(\mathbf{x}_{r,n}) = \int \lambda^A_{r\leftarrow a,n}(\mathbf{x}_{r,n}, \mathbf{x}_{a,n})\, m^B_{r|a,n}(\mathbf{x}_{a,n})\, d\mathbf{x}_{a,n}, \quad (39)$$
with $\lambda^A_{r\leftarrow a,n}(\mathbf{x}_r, \mathbf{x}_a)$ defined in (29) and $m^B_{r|a,n}(\mathbf{x}_{a,n})$ representing robot r's belief of robot a's pose at instant n.
Step 7: assign new updated belief. Finally, we make the updated belief $m^B_{r,n}(\mathbf{x}_{r,n}) = m^{PLHA}_{r,n}(\mathbf{x}_{r,n})$ and store it along with the remaining robot pose beliefs $\{m^B_{r|s,n}(\mathbf{x}_{s,n})\}$, $s \in \{1, \ldots, R\} \setminus \{r\}$, to be used in the next time step iteration n + 1. Algorithm 1 summarizes the proposed message passing schedule, indicating the SPA procedure running at each network location r at a particular time step n. The algorithm is initialized at each network node r assuming that $m^B_{r,0}(\mathbf{x}_{r,0}) = f_{\mathcal{N}}(\mathbf{x}_{r,0}; \boldsymbol{\mu}_{r,0}, \boldsymbol{\Sigma}_{r,0})$ and $m^B_{r|s,0}(\mathbf{x}_{s,0}) = f_{\mathcal{N}}(\mathbf{x}_{s,0}; \boldsymbol{\mu}_{s,0}, \boldsymbol{\Sigma}_{s,0})$, $\forall s \in \{1, \ldots, R\} \setminus \{r\}$. Fig. 2 shows the derived factor graph and message passing scheme, highlighting the communication between two different nodes r and b at instant n. The blue and red arrows represent the messages computed inside nodes r and b, respectively. The filled arrows show the intranode messages, while the dashed arrow shows one internode message, i.e. a message sent over the communication network. The variables and factors filled in gray, and the messages over them, represent the extrapolation that each node performs of its teammates' beliefs.

IV. SMC IMPLEMENTATION OF THE SPA
Due to the nonlinearities in the model and the lack of closed-form analytical expressions for the integrals in the problem, we implement the forward SPA in Algorithm 1 using Monte Carlo representations of the functions that correspond to the messages, according to the following steps. Initialization step: obtain a particle representation of the previous belief. When initializing the algorithm, we directly sample $N_p$ particles from the known distribution $m^B_{r,0}(\mathbf{x}_{r,0})$ in order to obtain its corresponding weighted particle representation $\{(w^{B,(p)}_{r,0}, \mathbf{x}^{(p)}_{r,0})\}$, $p \in \{1, \ldots, N_p\}$, with uniform weights $w^{B,(p)}_{r,0} = 1/N_p$, where the notation $\mathbf{x}^{(p)} \sim f(\mathbf{x})$ indicates that the sample $\mathbf{x}^{(p)}$ is drawn according to the continuous probability distribution specified by the p.d.f. f.
After the algorithm has started running, its last step already returns $m^B_{r,n-1}(\mathbf{x}_{r,n-1})$ as a weighted particle representation $\{(w^{B,(p)}_{r,n-1}, \mathbf{x}^{(p)}_{r,n-1})\}$, $p \in \{1, \ldots, N_p\}$, so this step is not needed at subsequent time instants.
Step 1: predict. The prediction step consists of evaluating the Chapman-Kolmogorov equation, which can be approximated using the particle representation of $m^B_{r,n-1}$ according to
$$m^P_{r,n}(\mathbf{x}_{r,n}) \approx \sum_{p=1}^{N_p} w^{B,(p)}_{r,n-1}\, f_{\mathcal{N}}\big(\mathbf{x}_{r,n}; \mathbf{f}(\mathbf{x}^{(p)}_{r,n-1}, \mathbf{u}_{r,n}), \mathbf{Q}\big). \quad (43)$$
The approximate representation of message $m^P_{r,n}(\mathbf{x}_{r,n})$ in (43) can be interpreted as a multimodal Gaussian with $N_p$ modes, each with weight $w^{B,(p)}_{r,n-1}$ and centered at $\mathbf{f}(\mathbf{x}^{(p)}_{r,n-1}, \mathbf{u}_{r,n})$, calculated from the previous particle state and the desired displacement. Note, however, that the approximation in (43) differs from the method proposed for the prediction step in [14], where the equivalents of both $m^B_{r,n-1}(\mathbf{x}_{r,n-1})$ and $\lambda^P_{r,n}(\mathbf{x}_{r,n}, \mathbf{x}_{r,n-1})$ are approximated by Gaussians or sums of Gaussians.

Algorithm 1: SPA message passing schedule at node r, time step n
  Step 1: predict
    Compute message $m^P_{r,n}(\mathbf{x}_{r,n})$ from the previous belief $m^B_{r,n-1}(\mathbf{x}_{r,n-1})$ as in (30).
  Step 2: assimilate landmarks
    for $l \in \mathcal{L}_{r,n}$ do
      Compute the likelihood $m^L_{r\leftarrow l,n}(\mathbf{x}_{r,n})$ as in (32) using the measurement $\mathbf{o}_{r\leftarrow l,n}$ of the l-th observed landmark at a known location $\mathbf{x}_l$.
    end for
    Compute message $m^{PL}_{r,n}(\mathbf{x}_{r,n})$ as in (31) from the predicted message $m^P_{r,n}(\mathbf{x}_{r,n})$ and the individual likelihoods $\{m^L_{r\leftarrow l,n}(\mathbf{x}_{r,n})\}_{l \in \mathcal{L}_{r,n}}$.
  Steps 3 and 4: broadcast or assimilate DOA
    Use the team policy to determine the broadcasting node b at instant n.
    if broadcasting node b equals r then
      Broadcast message $m^{PL}_{r,n}(\mathbf{x}_{r,n})$.
      Make $m^{PLH}_{r,n}(\mathbf{x}_{r,n}) = m^{PL}_{r,n}(\mathbf{x}_{r,n})$.
    else
      Receive message $m^{PL}_{b,n}(\mathbf{x}_{b,n})$ from remote node b and obtain the DOA measurement $\phi_{r\leftarrow b,n}$.
      Assign $m^B_{r|b,n}(\mathbf{x}_{b,n}) = m^{PL}_{b,n}(\mathbf{x}_{b,n})$.
      Compute message $m^{PLH}_{r,n}(\mathbf{x}_{r,n})$ as in (33) from the messages $m^B_{r|b,n}(\mathbf{x}_{b,n})$ and $m^{PL}_{r,n}(\mathbf{x}_{r,n})$.
    end if
  Step 5: extrapolate beliefs
    for $s \in \{1, \ldots, R\} \setminus \{r\}$ do
      Compute the extrapolated belief $m^B_{r|s,n}(\mathbf{x}_{s,n})$ of node s at node r as in (35) from $m^B_{r|s,n-1}(\mathbf{x}_{s,n-1})$.
    end for
  Step 6: assimilate observed allies
    for $a \in \mathcal{A}_{r,n}$ do
      Compute message $m^A_{r\leftarrow a,n}(\mathbf{x}_{r,n})$ as in (39) using the extrapolated belief $m^B_{r|a,n}(\mathbf{x}_{a,n})$ of the a-th ally and the observation $\mathbf{o}_{r\leftarrow a,n}$.
    end for
  Step 7: assign new belief
    Compute message $m^{PLHA}_{r,n}(\mathbf{x}_{r,n})$ as in (38) from the messages $m^{PLH}_{r,n}(\mathbf{x}_{r,n})$ and $\{m^A_{r\leftarrow a,n}(\mathbf{x}_{r,n})\}_{a \in \mathcal{A}_{r,n}}$.
    Make $m^B_{r,n}(\mathbf{x}_{r,n}) = m^{PLHA}_{r,n}(\mathbf{x}_{r,n})$.

Having interpreted $m^P_{r,n}(\mathbf{x}_{r,n})$ as a GMM in (43), we sample $N_p$ new particles from it in order to build a new weighted particle set $\{(w^{P,(p)}_{r,n}, \mathbf{x}^{(p)}_{r,n})\}$, $p \in \{1, \ldots, N_p\}$, such that $\mathbf{x}^{(p)}_{r,n} \sim m^P_{r,n}(\mathbf{x}_{r,n})$ and $w^{P,(p)}_{r,n} = 1/N_p$. The computational complexity of this step is $O(N_p)$, corresponding to the sampling of the new particles.
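The GMM interpretation of the SMC prediction step can be sketched as follows (a sketch under our naming assumptions, not the paper's code): pick a mixture mode per new sample in proportion to the previous weights, propagate its center through the motion model, and perturb with the state-noise covariance Q:

```python
import numpy as np

def predict_particles(particles, weights, u, Q, rng):
    """SMC prediction step: interpret m^P as a Gaussian mixture with one
    mode per previous particle, centered at f(x^(p), u) with covariance Q
    and weight w^(p), then draw Np fresh equally-weighted samples from it."""
    Np = len(particles)
    # Choose one mixture mode per new sample, proportionally to the weights.
    idx = rng.choice(Np, size=Np, p=weights)
    noise = rng.multivariate_normal(np.zeros(3), Q, size=Np)
    new = np.empty((Np, 3))
    for i, j in enumerate(idx):
        px, py, psi = particles[j]
        # Rotate the commanded local displacement into the field frame.
        dx = np.cos(psi) * u[0] - np.sin(psi) * u[1]
        dy = np.sin(psi) * u[0] + np.cos(psi) * u[1]
        new[i] = [px + dx, py + dy, psi + u[2]] + noise[i]
    return new, np.full(Np, 1.0 / Np)
```

Drawing each sample through a randomly selected mode is an exact way of sampling a Gaussian mixture, and it keeps the cost linear in the number of particles.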
Step 2: assimilate measurements of landmarks within the camera's FoV. At this point, each particle preserves its state $\mathbf{x}^{(p)}_{r,n}$ but has its weight updated according to the likelihood of the landmark measurements,
$$w^{L,(p)}_{r,n} \propto w^{P,(p)}_{r,n} \prod_{l \in \mathcal{L}_{r,n}} m^L_{r\leftarrow l,n}(\mathbf{x}^{(p)}_{r,n}),$$
with the proportionality constant such that $\sum_{p=1}^{N_p} w^{L,(p)}_{r,n} = 1$.
Step 3: broadcast self-belief (broadcaster robot). The most accurate way of transmitting a Monte Carlo belief is simply to communicate all weighted particles, which would require transmitting $3N_p$ real numbers. In our case, however, we parameterize the distribution to be broadcast, $m^{PL}_{r,n}(\mathbf{x}_{r,n})$, as a single-mode Gaussian $\mathcal{N}(\hat{\boldsymbol{\mu}}_{r,n}, \hat{\boldsymbol{\Sigma}}_{r,n})$ with
$$\hat{\boldsymbol{\mu}}_{r,n} = \sum_{p=1}^{N_p} w^{L,(p)}_{r,n}\, \mathbf{x}^{(p)}_{r,n}, \qquad \hat{\boldsymbol{\Sigma}}_{r,n} = \sum_{p=1}^{N_p} w^{L,(p)}_{r,n} \big(\mathbf{x}^{(p)}_{r,n} - \hat{\boldsymbol{\mu}}_{r,n}\big)\big(\mathbf{x}^{(p)}_{r,n} - \hat{\boldsymbol{\mu}}_{r,n}\big)^T,$$
which requires the transmission of 9 real numbers (3 for the mean vector and 6 for the covariance matrix, considering its symmetry). For our particular signal model, the heading of an agent is not needed for assimilating the mutual measurements, thus further reducing the bandwidth requirement to 5 real numbers (2 for the mean vector, 3 for the covariance matrix). Considering the limitation of 20 bytes, we represent each real number with 4 bytes. When robot r receives the belief from a broadcaster b at time n, r assigns a Gaussian $\mathcal{N}(\boldsymbol{\mu}_{r|b,n}, \boldsymbol{\Sigma}_{r|b,n})$ to the belief it has about b, $m^B_{r|b,n}(\mathbf{x}_{b,n})$, with $\boldsymbol{\mu}_{r|b,n} = \hat{\boldsymbol{\mu}}_{b,n}$ and $\boldsymbol{\Sigma}_{r|b,n} = \hat{\boldsymbol{\Sigma}}_{b,n}$, as calculated by b.
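The collapse of the weighted particle set into the 5-number message (2 position means plus the 3 distinct entries of the symmetric 2×2 position covariance, each encoded in 4 bytes to fit the 20-byte slot) can be sketched as (function name ours):

```python
import numpy as np

def parameterize_belief(particles, weights):
    """Collapse a weighted particle set over [x, y, psi] into the (x, y)
    weighted mean and 2x2 weighted covariance that are broadcast; the
    heading is dropped, since it is not needed to assimilate the mutual
    measurements."""
    xy = particles[:, :2]
    mu = weights @ xy                      # weighted empirical mean
    diff = xy - mu
    cov = (weights[:, None] * diff).T @ diff  # weighted empirical covariance
    return mu, cov
```

The receiver then simply reinstates a Gaussian belief from the five numbers, at the cost of discarding any multimodality the broadcaster's particle set may have had.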
Step 4: assimilate DOA measurement (non-broadcaster robots). Similarly to the assimilation of landmarks, at this point each particle preserves its state, but its weight $w^{L,(p)}_{r,n}$ is changed according to the likelihood of the DOA measurement $\phi_{r\leftarrow b,n}$ given the states of the particle and of the broadcasting ally. However, we only have the parameterized marginal probability distribution of that ally's state, i.e. $m^B_{r|b,n}(\mathbf{x}_{b,n})$. In the particle filter approach, in order to represent the ally's state, we start by directly sampling $N_q$ particles from $m^B_{r|b,n}$ with uniform weights, i.e. we build the weighted particle set $\{(1/N_q, \mathbf{x}^{(q)}_{b,n})\}$, $q \in \{1, \ldots, N_q\}$, keeping the particle states $\mathbf{x}^{(p)}_{r,n}$ of the landmark assimilation and updating the weights according to the likelihood of the measurement, i.e.
$$w^{H,(p)}_{r,n} \propto w^{L,(p)}_{r,n}\, \frac{1}{N_q} \sum_{q=1}^{N_q} \lambda^H_{r\leftarrow b,n}\big(\mathbf{x}^{(p)}_{r,n}, \mathbf{x}^{(q)}_{b,n}\big). \quad (51)$$
Note that the rightmost sum on the right-hand side of (51) is the Monte Carlo approximation of the integral on the right-hand side of (34) using the weighted particle set $\{(1/N_q, \mathbf{x}^{(q)}_{b,n})\}$. The computational complexity of this step is $O(N_p N_q)$, corresponding to the update of the weights of the $N_p$ particles using the $N_q$ particles of the broadcaster robot.
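A sketch of this Monte Carlo weight update (ours; an unnormalized Gaussian DOA likelihood is used, which is sufficient since the weights are renormalized afterwards):

```python
import numpy as np

def assimilate_doa(particles, weights, b_particles, phi_meas, sigma_phi):
    """DOA assimilation: each of the Np local particles keeps its state and
    is reweighted by the average likelihood of the measured DOA over the Nq
    samples drawn from the broadcaster's parameterized belief (the inner
    average approximating the integral in the SPA message)."""
    new_w = np.empty(len(particles))
    for p, (xp, wp) in enumerate(zip(particles, weights)):
        ang = np.arctan2(b_particles[:, 1] - xp[1],
                         b_particles[:, 0] - xp[0]) - xp[2]
        # Wrapped angular error between predicted and measured DOA.
        err = np.arctan2(np.sin(ang - phi_meas), np.cos(ang - phi_meas))
        lik = np.exp(-0.5 * (err / sigma_phi) ** 2).mean()
        new_w[p] = wp * lik
    s = new_w.sum()
    return new_w / s if s > 0 else np.full(len(particles), 1.0 / len(particles))
```

Particles whose predicted DOA to the broadcaster matches the measurement retain most of the weight, while inconsistent particles are suppressed.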
Step 5: extrapolate beliefs. At time n, each robot r updates the Gaussian representation $\mathcal{N}(\hat{\boldsymbol{\mu}}_{r|s,n}, \hat{\boldsymbol{\Sigma}}_{r|s,n})$ of the belief it has about the other robots, $m^B_{r|s,n}(\mathbf{x}_{s,n})$, except for a robot that may have just broadcast. For the zero-mean random walk model that we adopt, extrapolating consists of maintaining the mean vector and inflating the covariance matrix, i.e.
$$\hat{\boldsymbol{\mu}}_{r|s,n} = \hat{\boldsymbol{\mu}}_{r|s,n-1}, \qquad \hat{\boldsymbol{\Sigma}}_{r|s,n} = \hat{\boldsymbol{\Sigma}}_{r|s,n-1} + \boldsymbol{\Sigma}_w. \quad (53)$$
Step 6: assimilate measurements of allies within the camera's FoV. Similarly to the assimilation of the DOA, for each observation $\mathbf{o}_{r\leftarrow a,n}$ we must have a local representation with $N_q$ particles of the belief of ally a, drawn directly from $m^B_{r|a,n}$, i.e. the weighted particle set $\{(w^{B,(q)}_{a,n}, \mathbf{x}^{(q)}_{a,n})\}$, $q \in \{1, \ldots, N_q\}$, with $w^{B,(q)}_{a,n} = 1/N_q$.
As in (51), we construct a particle representation of m^{PLHA}_{r,n}(x_{r,n}) as a weighted particle set {(w^{(p)}_{r,n}, x^{(p)}_{r,n})} whose weights sum to one. Again, the rightmost sum on the right-hand side of (56) is the Monte Carlo approximation of the integral on the right-hand side of (39) using the weighted particle set {(w̃^{B,(q)}_{a,n}, x^{(q)}_{a,n})}. The computational complexity of this step is O(N_p N_q R), corresponding to the update of the weights of the N_p particles using the N_q particles of at most R − 1 observed teammates. In practice, however, due to limitations in the camera's FoV, the cardinality of the set A_{r,n}, which collects the measurements o_{r←a,n} at node r at any instant n, is typically lower than R − 1, and much lower when R is large, resulting in a tenable computational cost even as the number of agents grows. In soccer applications in particular, R is usually limited.
Step 7: assign new updated belief. Message m^{PLHA}_{r,n}(x_{r,n}) is then adopted as the most recent belief of robot r, i.e. the particle representation of m^B_{r,n}(x_{r,n}) is obtained by keeping the state and final weight of each particle after the last assimilation step. Considering all the assimilation steps, the overall computational complexity of the fully cooperative algorithm is O(N_p N_q R + N_p L).

A. SCENARIO GENERATION
We simulated a team of R = 5 robots on a single computer, using a simulator developed in Matlab, applying the timing constraints for moving, communicating and collecting camera measurements specified by the RoboCup 3D Soccer Simulator [16]. There are no opponents in the field. The robot trajectories and all sensor measurements were generated according to the physical models described in Sec. II-A. Both the dynamic and observation noise parameters were empirically estimated by comparing the measurements provided by the Soccer 3D simulator and the corresponding ground truth. More specifically, the noise parameters of the camera include not only the intrinsic noise of this sensor, but also the effect of the coordinate conversion from the image frame to the local frame, whose origin is the vertical projection of the center of the robot's torso and which is oriented according to its heading.
We simulated a total of N = 5000 discrete time instants, with sampling time T_s = 20 ms. Each robot r receives a fixed input sequence {u_{r,n}}_{n=1:N} indicating the desired displacements, given in the local reference frame, for moving around the field. The actual trajectory {x_{r,n}}_{n=1:N} each robot r performs (ground truth) also depends, however, on the particular realization of the state noise sequence {w_{r,n}}_{n=1:N}, as described in (3), with covariance matrix Q. Each robot detects objects within the coverage area of its camera FoV. It is worth noting that we keep each camera fixed w.r.t. the robot's torso such that its FoV always points along the robot's heading. For the sake of simplicity, all teammates and landmarks within the angular section defined by the camera FoV are detected, independently of their range. Also, those objects are treated as points, so this model does not consider that one object may be occluded by another.
Note that the particular sequence of landmarks and teammates detected in the camera depends on the ground truth of each robot, which in turn is random due to the presence of state noise. Nevertheless, we defined the noise-free trajectory of each robot over a circle so that their ground-truth poses roughly repeat periodically. The initial self-belief of each agent is represented by a set of particles drawn from a distribution N(x̃_{r,0}, Σ_{r,0}), where x̃_{r,0} is one sample of the distribution N(x_{r,0}, Σ_{r,0}), centered at the ground truth of agent r, x_{r,0}. The latter belief is also used by the other teammates to initialize their belief about r. For all agents r, we choose the same initial covariance matrix Σ_{r,0}. In the prediction step, each robot r uses the input signal u_{r,n} and the covariance matrix Q of the state noise. On the other hand, since each landmark, DOA or ally position measurement can be independently assimilated, we can use different assimilation schemes within the observation step, leading to different versions of the proposed filter. Specifically, in order to compare to what degree the cooperation improves the state estimation, we propose four variants of the cooperative filter:
1) PL. Non-cooperative filter in which the observation step consists solely of landmark assimilation, similar to [27]. Steps 1 and 2 of Algorithm 1 are performed.
2) PLH. Steps 1 to 4 are performed. This version extends the PL filter by assimilating DOA measurements.
3) PLA. Steps 1 to 7, excluding 4, are performed. This version extends the PL filter by assimilating detections of allies within the camera's FoV.
4) PLHA. Steps 1 to 7 are performed. All cooperative measurements are assimilated.
When needed, the r-th robot extrapolates the beliefs of the other agents with a random walk model assuming maximum speeds of 1.0 m/s for the x and y components and 3.0 rad/s for ψ. For the sampling interval T_s = 20 ms, this yields x_MAX = y_MAX = 0.02 m and ψ_MAX = 0.06 rad in (36).
Every two time instants, one robot broadcasts its current belief, generating a DOA measurement at the listeners. The communication schedule follows a round-robin scheme, starting with robot r = 1 at time instant 2T_s, changing to robot r + 1 in the next communication slot, and restarting from robot 1 after robot R's turn.
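The schedule above can be sketched as a small helper (the function name and the 1-based robot ids are ours):

```python
def broadcaster_at(n, R, period=2):
    """Round-robin schedule: which robot (1-based id) broadcasts at discrete
    time n, given one broadcast every `period` instants starting at
    n = period with robot 1. Returns None at non-communication instants."""
    if n <= 0 or n % period != 0:
        return None
    return (n // period - 1) % R + 1
```

For R = 5, robots 1, 2, ..., 5 broadcast at n = 2, 4, ..., 10, and robot 1 broadcasts again at n = 12.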
Concerning implementation details, given that the sensors have very small covariance parameters and that a relatively large number of measurements may be available at each agent for joint assimilation, we artificially increased the covariance values seen by the filter, a technique called covariance inflation. Higher covariance parameters correspond to probability densities with wider peaks, whose values do not fall rapidly to zero. This technique was necessary to prevent a large number of particles from having their weights collapse to zero due to the very small variance of the sensors. The multiplicative factors were chosen empirically as 100 and 50 for the range and bearing variances of the camera, respectively, and 20 for the DOA variance.
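The effect can be illustrated numerically: with a very small sensor variance, almost all particles receive negligible likelihood, while the inflated variance keeps many more particles alive. The variance values below are hypothetical, chosen only to make the effect visible; they are not the paper's parameters:

```python
import numpy as np

# Illustration of why covariance inflation is needed: with the raw sensor
# variance, nearly all particle likelihoods underflow toward zero.
rng = np.random.default_rng(1)
predicted = rng.normal(5.0, 0.2, size=1000)  # predicted ranges of 1000 particles
z = 5.0                                      # measured range
var_raw = 1e-5                               # tiny (hypothetical) sensor variance
var_inflated = 100 * var_raw                 # inflation factor 100, as in the text

lik_raw = np.exp(-0.5 * (z - predicted) ** 2 / var_raw)
lik_inflated = np.exp(-0.5 * (z - predicted) ** 2 / var_inflated)

surviving_raw = int(np.count_nonzero(lik_raw > 1e-12))
surviving_inflated = int(np.count_nonzero(lik_inflated > 1e-12))
```

Under the raw variance only the few particles almost exactly at the measured range survive; the inflated variance retains a substantial fraction of the set.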
The overconfidence that a robot may have about its own belief is another problem that may affect the assimilation of mutual measurements. It is characterized by the error being much larger than the magnitude of the elements in the diagonal of the covariance matrix of the self-belief. In order to minimize that effect, whenever a robot broadcasts its self-belief, we enlarge the covariance matrix by adding a term δ = 0.01 to the elements in its diagonal.
Finally, in order to avoid the phenomenon of particle impoverishment of the resulting belief m^B_{r,n}(x_{r,n}), we performed resampling and regularization [21], [22] whenever the effective number of particles, defined as the inverse of the sum of the squares of the weights, falls below 0.6 N_p.
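The effective-sample-size criterion can be sketched as follows (the function names are ours; only the trigger condition is shown, not the resampling and regularization schemes of [21], [22]):

```python
import numpy as np

def effective_sample_size(weights):
    """Neff = 1 / sum(w_p^2) for normalized weights, as defined in the text."""
    return 1.0 / np.sum(np.asarray(weights) ** 2)

def needs_resampling(weights, threshold=0.6):
    """Trigger resampling (and regularization) when Neff < threshold * Np."""
    return effective_sample_size(weights) < threshold * len(weights)
```

With uniform weights Neff equals N_p (no resampling needed); when all the mass collapses onto one particle, Neff equals 1 and resampling is triggered.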

C. PERFORMANCE METRICS
Let P_{x,y} = [1 0 0; 0 1 0] and P_ψ = [0 0 1] define two projection matrices such that P_{x,y} x_{r,n} and P_ψ x_{r,n} represent the 2D robot position and orientation, respectively.
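Concretely, the two projections simply select the position and heading components of the state x = (x, y, ψ):

```python
import numpy as np

# The two projection matrices acting on the state x = (x, y, psi)
P_xy = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0]])
P_psi = np.array([[0.0, 0.0, 1.0]])

x = np.array([3.0, -1.5, 0.7])  # an example pose
position = P_xy @ x             # the (x, y) components
heading = P_psi @ x             # the psi component
```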
The base metric for evaluating the filter performance is the root mean square error (RMSE) between the ground truth state x_{r,n} and its estimate x̂_{r,n}.
Detailing each metric, the first one is the estimated position RMSE versus time, e^{x,y}_{n′}, averaged over all robots and Monte Carlo runs; analogous definitions hold for the orientation RMSE, e^{ψ}_{n′}, and for the average number of observed landmarks, ℓ̄_{r,n′}, where |L_{r,n′,m}| denotes the cardinality of the set L_{r,n′,m}. Additionally, for each robot r and Monte Carlo run m, we define β_{r,n′,m} as the blackout duration, i.e. the number of consecutive camera observation instants ending at time index n′ such that the set of observations L_{r,n,m} is empty for all n within the discrete interval {n′ − 3(β_{r,n′,m} − 1), ..., n′} ⊂ C. In precise mathematical terms, we use the following recursion to compute the blackout duration: β_{r,n′,m} = β_{r,n′−3,m} + 1 if L_{r,n′,m} = ∅, and β_{r,n′,m} = 0 otherwise.
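The blackout recursion can be sketched as a simple pass over the landmark counts at successive camera observation instants (the function name and list-based interface are ours):

```python
def blackout_durations(landmark_counts):
    """Blackout duration beta at each successive camera observation instant:
    the number of consecutive camera frames, ending at the current one,
    with no landmark in view."""
    beta, out = 0, []
    for count in landmark_counts:
        beta = beta + 1 if count == 0 else 0
        out.append(beta)
    return out
```

For example, the count sequence [2, 0, 0, 0, 1, 0] yields durations [0, 1, 2, 3, 0, 1]: the blackout counter grows while no landmark is seen and resets on the first detection.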

Figs. 4 and 5 show respectively e^{x,y}_{n′} and e^{ψ}_{n′}, i.e. the averaged position and orientation RMSE evolution over time, considering all robots and Monte Carlo runs, for the four variants of the proposed filter. As expected, a higher degree of cooperation improved the self-localization ability, whereas the non-cooperative PL variant presented larger errors. Among the cooperative filters, the PLHA variant tends to present the lowest error, which is expected since it assimilates all measurements available in the network. Moreover, the cooperative PLHA filter shows a faster convergence time, i.e. it shortens the time required to eliminate the initial uncertainty from the prior distribution. During specific time intervals, the PLH and PLA filters, which assimilate only part of the mutual measurements, achieve similar or even slightly lower error levels compared to that of the PLHA filter and present similar convergence times: whereas PLA is slightly slower, PLH has almost the same convergence rate. Finally, we recall that no stationary state is expected in this problem since the number of camera observations depends on the particular realization of the robots' trajectories, and can therefore vary significantly over time.
Switching the perspective to each robot individually, Figs. 6 and 7 display respectively e^{x,y}_r and e^{ψ}_r, i.e. the position and orientation RMSE, averaged over all time instants and Monte Carlo runs, for the four variants of the filter. This offers a different view of how cooperation impacts the error for each agent. In particular, note that agent 2, which has the worst RMSE for the non-cooperative filter, benefits the most from the cooperation. This is reflected in a smaller variability of the errors among the individual agents, each performing a different trajectory, when they estimate their poses cooperatively. Note that this variability is stronger for the position error. Finally, it is worth noting that, compared to PLA, the PLH filter further reduces the RMSE variability among agents observed in the non-cooperative PL filter.
We further investigate the RMSE peaks in Figs. 4 and 5 by detailing the RMSE curves of robots 2 and 5. In particular, Figs. 8 and 9 show, for robots 2 and 5 respectively, the position and orientation RMSE versus time of the PL and PLHA filters, along with the average number of observed landmarks ℓ̄_{r,n} and the mean blackout duration β̄_{r,n} for each time instant.
Robot 5 exhibits very similar error levels for both the position and orientation RMSE when comparing the non-cooperative PL filter and the fully cooperative PLHA filter. Robot 2, on the other hand, presents much worse errors under the PL filter than under the PLHA filter, with pronounced peaks at time instants 35 and 80 seconds. The average number of observed landmarks, which is the only measurement assimilated in the non-cooperative filter, does not alone explain the higher error of robot 2, since the error peaks occur at time instants where the same number of landmarks is seen by the robot. Additionally, when considering all time instants, both robots 2 and 5 observe a similar average number of landmarks (refer to Figs. 8-c and 9-c).
A characteristic of the signal model of this problem is that a single landmark observation is sufficient to maintain a non-ambiguous estimate of a robot's pose once it has been determined. The continuous absence of measurements, or a blackout, makes the robot blind to its environment and, if that blackout is long enough, the robot is prone to lose its track due to dead reckoning. Figs. 8-d and 9-d indicate the mean blackout duration β̄_{r,n} experienced respectively by robots r = 2 and r = 5 at each time instant n, considering all Monte Carlo runs. In the sequel, we investigate the correlation between the mean blackout evolution and the peaks along the RMSE curves.

TABLE 1. Pearson correlation coefficients between the position and orientation RMSE, e^{x,y} and e^{ψ}, and the mean blackout duration, β̄, or the average number of landmarks, ℓ̄. The four vectors collect robots 1 to R and time instants n ∈ C_ρ.
First, we define the set C_ρ ≜ {n ∈ C | n_c ≤ n ≤ N}, with cardinality |C_ρ| = ⌊(N − n_c)/3⌋ + 1, containing all camera observation instants starting at n_c = 99 to ignore the initial convergence time.³ Then we define the vectors e^{x,y}_{1:R, n∈C_ρ}, e^{ψ}_{1:R, n∈C_ρ}, β̄_{1:R, n∈C_ρ} and ℓ̄_{1:R, n∈C_ρ}, which collect the position and orientation RMSE, as well as the mean blackout duration and average number of landmarks, from all robots and all camera observation instants in C_ρ. We suppress their subscripts from now on. We then calculate their Pearson correlation coefficient using the formula ρ_{a,b} = Σ_{i=1:N_s} (a_i − μ_a)(b_i − μ_b) / sqrt( Σ_{i=1:N_s} (a_i − μ_a)² · Σ_{i=1:N_s} (b_i − μ_b)² ), where N_s, μ_a and μ_b are, respectively, the number of samples in the vectors a and b and their sample means. In our case, a = e^{x,y} or e^{ψ}, b = β̄ or ℓ̄, and N_s = R |C_ρ|.
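The coefficient is straightforward to compute directly from the two vectors (a minimal sketch; the function name is ours):

```python
import numpy as np

def pearson(a, b):
    """Sample Pearson correlation coefficient between equal-length vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    da, db = a - a.mean(), b - b.mean()
    return (da @ db) / np.sqrt((da @ da) * (db @ db))
```

Perfectly linearly related vectors give +1, anti-related vectors give −1, matching the usual normalization of the formula above.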
The results are shown in Table 1. The correlation values between 0.674 and 0.900 indicate that there is indeed a correlation between those quantities. Specifically, the strong positive correlation between the mean blackout duration and the position / orientation RMSE curves suggests that the absence of camera observations indicated by the former significantly contributes to the error increase of the latter. On the other hand, the weak negative correlation between the RMSE curves and the average number of observed landmarks supports our hypothesis, i.e. the peaks along the RMSE curves are not related to fluctuations in the average number of assimilated landmarks.
Finally, we assessed the computational cost of the algorithm by measuring the average CPU time. The second column in Table 2 shows the CPU time for each of the four variants of the algorithm, namely PL, PLH, PLA, and PLHA, averaged over all robots, Monte Carlo runs and time instants. The values are presented in normalized units such that the average CPU time for the non-cooperative PL variant, where players only assimilate landmark measurements, is normalized to one. The normalization is performed since the absolute values obtained using the Matlab implementation do not properly represent the values that would be obtained by an efficient implementation (using C++, for example) designed to assimilate measurements online in a real-world scenario. Moreover, the discussion in Klaas et al. [37] suggests that an implementation of marginal particle filter-type algorithms similar to ours, with N_p = N_q, might be feasible with complexity O(N_p log N_p) rather than O(N_p²). On the other hand, previous results from our robotics team reported in [27] show that a non-cooperative Monte Carlo localization implementation running with N_p = 200 particles in a similar scenario would have a CPU time of 0.025 ms if run on an Intel Core i7-4720HQ CPU @ 2.60 GHz, a typical setup for the robot soccer simulation. The third column in Table 2 shows the projected CPU time for the proposed algorithms using that value as a time estimate for an alternative, efficient implementation. The fourth column in turn shows the duty cycle, i.e. the average CPU usage per sampling time, of the localization algorithms considering the sampling time T_s = 20 ms.

³ The floor operator ⌊x⌋ denotes the largest integer less than or equal to x.
As expected, the average CPU time increases as more of the data available in the network is assimilated by the individual agents. In particular, as shown in the second column of Table 2, the PLHA variant, where players assimilate directions of arrival of broadcast messages in addition to range and bearing measurements for both landmarks and allied teammates, required an average CPU time in our simulations roughly 9 times greater than that of the PL variant, indicating a trade-off between computational cost and localization performance. The simulations were conducted using Matlab R2015a, on a 64-bit machine with an Intel Core i9-7900X CPU @ 3.30 GHz and 64 GB RAM. Finally, the estimated duty cycles indicate that efficient implementations of the proposed algorithms are probably suitable for online processing.

VI. CONCLUSION
In this paper, we approached the self-localization problem for multiple robotic agents in a cooperative and distributed way, assuming a scenario where communication is limited by external restrictions such that robots cannot exchange information whenever they need to in order to cooperate. The factor graph formulation allows a decentralized modeling, which is the only feasible approach for this problem.
The implementation with sequential Monte Carlo allows the numerical evaluation of intractable expressions that do not have a closed-form analytical solution.
Concerning the final algorithm's RMSE, the cooperative approach provides marginal improvements for agents that already present low error levels. On the other hand, it can significantly improve the estimates for agents that present high errors due to a prolonged lack of landmark detections. The main drawback of the proposed implementation of cooperative filters is its high computational cost, which, although feasible in our scenario, can be a restriction in others.
In particular, the computational cost of the implementation, especially when assimilating mutual measurements using the marginal particle filter, is the main limitation of the proposed method. Although our results indicate that our hardware can run the algorithm, that may not be the case in other scenarios where there are either tight restrictions on processing time or hardware specifications, or where the number of particles must be increased for a higher-resolution representation of the posterior p.d.f.
Future work can be done on proposing distributed approaches that reduce the need for sampling particles of a teammate's distribution. Moreover, the proposed cooperative formulation allows the incorporation of joint localization and tracking of objects, such as the ball or opponents. Another topic is the evaluation of performance bounds for the proposed algorithm, considering the communication restrictions and the fact that the network topology is not regular over time. Finally, alternative methods from the literature can be investigated to reduce the complexity of the marginal particle filters from O(N_p²) to O(N_p log N_p). As supplementary material, we provide the source code of the simulations⁴ and a video⁵ with the scenario generation and an animated representation of the results.