Interplay Between Topology and Social Learning Over Weak Graphs

This work examines a distributed learning problem where the agents of a network form their beliefs about certain hypotheses of interest. Each agent collects streaming (private) data and continually updates its belief by means of a <italic>diffusion</italic> strategy, which blends the agent's data with the beliefs of its neighbors. We focus on <italic>weakly-connected</italic> graphs, where the network is partitioned into sending and receiving sub-networks, and we allow for heterogeneous models across the agents. First, we examine <italic>what</italic> agents learn (<italic>social learning</italic>) and provide an analytical characterization for the long-term beliefs at the agents. Among other effects, the analysis predicts when a leader-follower behavior is possible, where some sending agents control the beliefs of the receiving agents by forcing them to choose a particular and possibly fake hypothesis. Second, we consider the dual or reverse learning problem that reveals <italic>how</italic> agents learn: given the beliefs collected at a receiving agent, we would like to discover the influence that any sending sub-network might have exerted on this receiving agent (<italic>topology learning</italic>). An unexpected interplay between social and topology learning emerges: given <inline-formula><tex-math notation="LaTeX">$H$</tex-math></inline-formula> hypotheses and <inline-formula><tex-math notation="LaTeX">$S$</tex-math></inline-formula> sending sub-networks, topology learning can be feasible when <inline-formula><tex-math notation="LaTeX">$H\geq S$</tex-math></inline-formula>. The latter being only a necessary condition, we then examine the feasibility of topology learning for two useful classes of problems. The analysis reveals that a critical element to enable topology learning is a sufficient degree of <italic>diversity</italic> in the statistical models of the sending sub-networks.


Index Terms
Social learning, topology learning, weakly-connected networks, Bayesian update, diffusion strategy.

I. INTRODUCTION
In a social learning problem, several agents linked through a network topology form their individual opinions about a phenomenon of interest (learning process) by observing the beliefs of their neighboring agents (social interaction) [2]-[8]. One relevant paradigm for social learning is that of weakly-connected graphs, which are prevalent in social networks and have been studied in [9]-[11]. Under this model, there are two categories of sub-networks: sending sub-networks and receiving sub-networks, where the former feed information to the latter without getting information back from them. This scenario is common over social networks. For example, a celebrity may have a large number of followers, whose individual opinions are not necessarily followed by the celebrity. Another example is that of media networks, which promote the emergence of opinions by feeding data to users without paying attention to feedback from them. This work addresses two fundamental challenges arising in the study of social learning problems. One challenge is to understand the fundamental mechanism and implications of specific social learning strategies on opinion formation. In particular, over weak graphs, it is critical to understand how the receiving agents are influenced by the sending sub-networks. It is not difficult to envisage that the network topology plays an important role in determining the opinion formation. This motivates the second problem, which can be regarded as a dual learning problem. Given observations of the receiving agents' behavior, we want to establish whether it is possible to learn the strength of connections (weighted topology) from the sending components to the receiving agents. This second question is useful in identifying the main sources for opinion formation over a network.

A. Related Work
The goal of social learning is to let each individual agent create its own opinion (formally, choose one from among a finite set of hypotheses) through local consultation with its neighbors. One classical distinction for social learning models is Bayesian vs. non-Bayesian models. In the former category, the belief of each agent is obtained by computing posterior distributions through Bayes' rule [2]-[4], [12]-[15]. In order to accomplish this task, each agent must have some detailed knowledge of the other agents' likelihoods, and some prior knowledge about the phenomenon of interest. In the latter category, this level of knowledge is not assumed and the agents implement suitable distributed algorithms to interact with their neighbors and to aggregate their beliefs into their own [9]-[11], [16]-[22]. The present work considers non-Bayesian learning.
There are many useful implementations for non-Bayesian learning. The implementations differ in terms of the rule the agents adopt to update their beliefs. One first distinction concerns the type of distributed strategy.
A second distinction regards the way the beliefs are combined. In [10], [11], [19], they are combined linearly, while a linear combination of the logarithmic beliefs is used in [21], [22]. This latter form is motivated by the fact that, in many detection problems, the best detection statistic is given by the linear combination of log-likelihoods and not of likelihoods. As a matter of fact, using the log-belief combination can help achieve an improved (i.e., faster) learning rate [22].
Once a particular learning rule has been chosen, the behavior of opinion formation will depend heavily on the type of network where the information propagates. In this respect, the majority of prior works in the literature focus on strongly-connected networks. In these networks, there always exist paths linking any two agents in both directions (which makes them connected) and, in addition, at least one agent has a self-loop and places some partial trust in its own data (which makes them strongly connected). Under a homogeneous setting where the underlying true hypothesis is the same for the entire network, it has been shown that over strongly-connected networks all agents are able to discover and agree on the true hypothesis. This result is available in [11], [19] for diffusion and consensus rules with linear belief combination, and in [22] for the diffusion rule with linear log-belief combination. There are results available also for the heterogeneous setting, where different agents might have different data models, different likelihoods, and promote different opinions across the network. In particular, the diffusion rule with linear log-belief combination with a doubly-stochastic combination matrix is considered in [21], where it is shown that, over a strongly-connected network, all agents reach a common opinion, by minimizing cooperatively the sum of Kullback-Leibler (KL) divergences across the network.
In contrast, the important case of weakly-connected networks has received limited attention and was addressed more recently in [9], [10] by using the linear-belief-combination rule. Several interesting phenomena arise over weak graphs, which are absent from strongly-connected networks. The main difference in this work from [9], [10] is that we now consider the following general setting: i) diffusion-type algorithms with linear combination of log-beliefs (as opposed to the beliefs themselves), ii) heterogeneous data and inference models, and iii) inverse topology learning.

B. Main Results
This work leads to two main contributions. First, we characterize (Theorem 1) the limiting beliefs of the agents (as learning time elapses) through analytical formulas that depend in a transparent manner on inferential descriptors (Kullback-Leibler divergences) and network descriptors (limiting combination matrix power). Some revealing behaviors can be deduced from these formulas. For example, we will be able to characterize a mind-control effect in terms of the interplay between the detection capacity of each agent and the centrality of the different agents. We will see that useful analogies arise with what has been observed in [10]. For example, some of the effects shown in [10] will now be shown to hold under greater generality since our formulas can be obtained by relaxing some assumptions used in [10], in particular, the all-truths-are-equal assumption and the prevailing-signal assumption. However, we will observe also some remarkable distinctions with respect to [10]. For instance, we will observe that the individual belief of each receiving agent will necessarily collapse to (or concentrate at) a single hypothesis (which might be different across the agents). This is in contrast with [10], where the beliefs of the receiving agents could end up assigning some probabilities to more than one hypothesis. The reason for this difference in behavior arises from the difference in the combination rule used in this work in comparison to [10].
October 31, 2019 DRAFT
The second contribution concerns the topology learning problem. We are interested in learning the topology linking the receiving agents to the sending components. This question is interesting because it would then allow us to identify the main sources of information in a network and how they influence opinion formation.
Nevertheless, the inverse topology problem is challenging because we will assume that we can only observe the beliefs evolving at the receiving agents. In particular, we will be able to recover some macroscopic topology information, in terms of the limiting weights that each receiving agent sees from each sending component. We call this macroscopic information since these weights incorporate: i) the global effect coming from all agents belonging to a sending component, and ii) the effect of intermediate receiving agents linked to the receiving agent under consideration. The relevance of estimating these global weights lies in the fact that the limiting beliefs of the receiving agents depend solely on this aggregate information.
We will establish conditions under which topology inference becomes feasible. More specifically, given $H$ hypotheses and $S$ sending components, under the assumption of homogeneous statistical models within each sending component, we will ascertain that a necessary condition to achieve consistent topology learning is (Lemma 1):
$$H \geq S. \tag{1}$$
Having established a necessary condition, we will examine some useful models to see whether topology learning can in fact be achieved. We consider first a structured Gaussian model where: i) the true underlying (Gaussian) distributions are distinct across the sending sub-networks; and ii) the (Gaussian) likelihoods are equal across the sending sub-networks, and contain the true distributions. For this setting, we will show in Theorem 2 that topology learning is feasible only when $S = 2$. We then recognize that one fundamental element for topology learning is the diversity between the sending sub-networks. Adding this further element, we will establish in Theorem 3 that the problem is feasible for any $S$ provided that (1) holds, and even under more general (e.g., non-Gaussian) models.
In summary, we remark that there are two learning problems coexisting in our work: a social learning problem and a topology learning problem. The former is the primary, or direct inferential problem that the agents are deployed for. The latter is the dual, or reverse problem, which is in fact based on observation of the output (the beliefs) of the primary learning problem. One useful conclusion stemming from our analysis is to reveal some unexpected interplay between these two coexisting learning problems (see Sec. VIII further ahead).
Notation. We use bold font notation for random variables, normal fonts for their realizations. Capital letters are generally used for matrices, whereas small letters are used for vectors or scalars. Given a matrix $M$, the symbol $M^{\dagger}$ denotes its Moore-Penrose pseudoinverse. The $L \times L$ identity matrix is denoted by $I_L$. Likewise, the $L \times 1$ vector of all ones is denoted by $\mathbb{1}_L$. The notation "a.s." signifies "almost-surely".

A. Data Model and Inference
We consider a network of $N$ agents that collect streaming data from the environment. Formally, the random variable $\xi_{k,i} \in \mathcal{X}_k$ denotes the data at agent $k \in \{1, 2, \ldots, N\}$ collected at time $i \in \mathbb{N}$. The data are assumed to be independent over time, whereas they can be dependent across agents (i.e., over space).
We work under a heterogeneous setting. First, the space where the data are defined, $\mathcal{X}_k$, is allowed to vary across agents. For example, the data at different agents can be random vectors of different sizes. In the social learning context, this scenario is not that uncommon since different types of attributes can be observed by different users across the network. Second, the data $\xi_{k,i}$ are assumed to be generated according to certain distributions $f_k(\xi)$, which are allowed to vary across the agents as well, namely, for $k = 1, 2, \ldots, N$:
$$\xi_{k,i} \sim f_k(\xi), \qquad \xi \in \mathcal{X}_k. \tag{2}$$
This type of heterogeneity can arise in practice for several reasons. For example, some agents may intentionally inject fake data to bias the other agents towards fake hypotheses.
Based on the available data $\{\xi_{k,i}\}$, the network agents aim to solve an inferential problem that consists in choosing one state of nature from among a finite collection $\Theta = \{1, 2, \ldots, H\}$, with $H$ denoting the number of possible hypotheses. To solve such an inferential problem, the agents rely on a family of likelihood functions.
Specifically, for $k = 1, 2, \ldots, N$, the likelihood function of agent $k$ is denoted by:
$$L_k(\xi|\theta), \qquad \xi \in \mathcal{X}_k, \quad \theta \in \Theta. \tag{3}$$
We will often write $L_k(\theta)$ instead of $L_k(\xi|\theta)$ for simplicity. In our treatment, we assume that the data can be modeled either as continuous or discrete random variables, with the same (continuous or discrete) nature across all agents. Accordingly, both the true distributions and the likelihoods will be either probability density or probability mass functions, depending on the considered type of random variables.
We remark that the considered model includes the possibility that the true distribution $f_k(\xi)$ is equal to a certain likelihood $L_k(\xi|\theta_0)$ (as assumed, e.g., in [10], [22]), and in this case $\theta_0$ can be considered as the true underlying hypothesis. More generally (e.g., in [21]), the true distribution $f_k(\xi)$ is not equal to any of the likelihoods $L_k(\xi|\theta)$, which might happen when the agents have an approximate knowledge of the statistical models, so that even the likelihood closest to the true distribution can be a mismatched version thereof.
In order to quantify the dissimilarity between the true distribution and a certain likelihood, we will use the Kullback-Leibler (KL) divergence [23]:
$$D(f_k \| L_k(\theta)) \triangleq \mathbb{E}\left[\log \frac{f_k(\xi_k)}{L_k(\xi_k|\theta)}\right]. \tag{4}$$
We remark that we have written $\xi_k$ in place of $\xi_{k,i}$ to highlight identical distributions across time, and that the expectation is computed under the actual distribution of $\xi_k$, i.e., under $f_k$. In the forthcoming treatment we use the following regularity condition [21], [22].
Assumption 1 (Finiteness of KL Divergences): All the KL divergences in (4) are well-posed, namely, for all $k = 1, 2, \ldots, N$, and all $\theta \in \Theta$ we have that:
$$D(f_k \| L_k(\theta)) < \infty. \tag{5}$$
Remark 1 (Likelihood Support): Assumption 1 implies that the likelihood $L_k(\xi_{k,i}|\theta)$ can be equal to zero only for an ensemble of realizations $\xi_{k,i}$ having zero probability under the distribution $f_k$.
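As a small illustration of the KL divergence and of the finiteness requirement in Assumption 1, the following sketch evaluates the divergence between a true distribution and three candidate likelihoods for discrete data. The function name `kl_divergence` and the numerical probability mass functions are hypothetical choices made only for this example; they do not come from the paper.

```python
import math

def kl_divergence(f, lik):
    """KL divergence D(f || lik) between two pmfs given as equal-length lists."""
    total = 0.0
    for p, q in zip(f, lik):
        if p == 0.0:
            continue  # outcomes with zero probability under f do not contribute
        if q == 0.0:
            return math.inf  # likelihood vanishes where f has mass: divergence is infinite
        total += p * math.log(p / q)
    return total

# Hypothetical binary-data example: true pmf and three candidate likelihoods.
f_true = [0.5, 0.5]
likelihoods = {1: [0.5, 0.5], 2: [0.9, 0.1], 3: [1.0, 0.0]}
divergences = {theta: kl_divergence(f_true, lik) for theta, lik in likelihoods.items()}
```

In this toy setting the third likelihood assigns zero probability to an outcome that occurs with positive probability under the true distribution, so its divergence is infinite and the finiteness condition would be violated for that hypothesis.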

B. Social Learning Algorithm
Motivated by the diffusion strategy in [10], [11] for opinion formation over social networks, in this article we consider the useful variations studied in [21], [22]. The learning procedure employs a two-step algorithm that can be described as follows. For each admissible hypothesis $\theta \in \Theta$ at time $i$, each agent $k$ uses its own fresh private observation, $\xi_{k,i}$, to compute the local likelihood $L_k(\xi_{k,i}|\theta)$. Using this likelihood, agent $k$ updates its local belief, $\mu_{k,i-1}(\theta)$, obtaining an intermediate belief $\psi_{k,i}(\theta)$ through the following update rule:
$$\psi_{k,i}(\theta) = \frac{\mu_{k,i-1}(\theta)\, L_k(\xi_{k,i}|\theta)}{\sum_{\theta' \in \Theta} \mu_{k,i-1}(\theta')\, L_k(\xi_{k,i}|\theta')}. \tag{6}$$
Then, agent $k$ aggregates the intermediate beliefs received from its neighbors through the following combination rule (the division by the denominator term in (7) is meant to ensure that $\mu_{k,i}(\theta)$ is a probability measure with its entries adding up to one):
$$\mu_{k,i}(\theta) = \frac{\exp\left\{\sum_{\ell=1}^{N} a_{\ell k} \log \psi_{\ell,i}(\theta)\right\}}{\sum_{\theta' \in \Theta} \exp\left\{\sum_{\ell=1}^{N} a_{\ell k} \log \psi_{\ell,i}(\theta')\right\}}, \tag{7}$$
where $a_{\ell k}$ is the nonnegative combination weight that agent $k$ uses to scale the intermediate log-belief received from agent $\ell$. It is assumed that $a_{\ell k}$ is equal to zero if $k$ does not receive information from $\ell$, which means that agent $k$ can combine only intermediate beliefs received from its neighbors. When collected into a combination matrix $A$ (with $(\ell, k)$-entry equal to $a_{\ell k}$), these combination weights are assumed to obey the standard requirements that make $A$ a left-stochastic matrix, namely, we have that:
$$a_{\ell k} \geq 0, \qquad \sum_{\ell=1}^{N} a_{\ell k} = 1 \quad \text{for all } k = 1, 2, \ldots, N. \tag{8}$$
The value of $\mu_{k,i}(\theta)$ provides agent $k$'s estimate, at time $i$, of how likely it is that the true hypothesis is $\theta$. We remark that, differently from [10], the second step in (7) combines linearly the logarithm of the intermediate beliefs, $\log \psi_{\ell,i}(\theta)$, in the neighborhood of agent $k$. Exponentiation and normalization are used to construct the final belief.
We invoke the following standard initial condition, motivated by the fact that, at time i = 0, the agents have no elements to discard any hypothesis [21], [22].
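The two-step procedure described in this subsection (a local Bayesian update followed by a linear combination of log-beliefs, exponentiation, and normalization) can be sketched as follows. This is only a minimal illustration under stated assumptions: the function names, the two-agent network, the numerical likelihood values, and the uniform initial beliefs are all hypothetical, and strictly positive beliefs and likelihoods are assumed so that every logarithm is finite.

```python
import math

def bayesian_update(belief, likelihoods):
    """First step: scale the previous belief by the local likelihood and normalize."""
    unnorm = [m * l for m, l in zip(belief, likelihoods)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def log_combine(psi, weights):
    """Second step: weighted sum of log intermediate beliefs, then exponentiate and normalize.

    psi[l] is the intermediate belief vector of agent l and weights[l] is the
    weight that the combining agent assigns to agent l (zero for non-neighbors).
    """
    num_hyp = len(psi[0])
    log_mu = [sum(w * math.log(p[theta]) for w, p in zip(weights, psi) if w > 0)
              for theta in range(num_hyp)]
    shift = max(log_mu)  # subtract the maximum for numerical stability
    unnorm = [math.exp(v - shift) for v in log_mu]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def diffusion_step(beliefs, likelihoods, A):
    """One iteration for all agents; A[l][k] is the weight agent k gives to agent l."""
    n = len(beliefs)
    psi = [bayesian_update(beliefs[k], likelihoods[k]) for k in range(n)]
    return [log_combine(psi, [A[l][k] for l in range(n)]) for k in range(n)]

# Hypothetical example: agent 0 only trusts itself, agent 1 averages both agents.
A = [[1.0, 0.5],
     [0.0, 0.5]]                         # columns add up to one (left-stochastic)
beliefs = [[0.5, 0.5], [0.5, 0.5]]       # uniform, strictly positive initial beliefs
likelihoods = [[0.8, 0.2], [0.3, 0.7]]   # likelihood of the fresh sample under each hypothesis
new_beliefs = diffusion_step(beliefs, likelihoods, A)
```

In this toy configuration agent 0 plays the role of a sending agent (it ignores agent 1), while agent 1 blends both intermediate beliefs geometrically, which is the effect of combining log-beliefs linearly.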

III. WEAK GRAPHS
In this section, we consider the case of a weak graph (or weakly-connected network), which is defined as follows [9], [10]. The overall network $\mathcal{N} = \{1, 2, \ldots, N\}$ is divided into $S + R$ disjoint components (see Fig. 1). The first $S$ sub-networks form the sending part $\mathcal{S}$, whereas the remaining $R$ sub-networks form the receiving part $\mathcal{R}$:
$$\mathcal{S} \triangleq \bigcup_{s=1}^{S} \mathcal{N}_s, \qquad \mathcal{R} \triangleq \bigcup_{r=S+1}^{S+R} \mathcal{N}_r. \tag{10}$$
We remark that $S$ and $R$ denote the number of sending and receiving components, respectively. They do not denote the cardinalities of $\mathcal{S}$ and $\mathcal{R}$, which are instead given by:
$$|\mathcal{S}| = \sum_{s=1}^{S} N_s, \qquad |\mathcal{R}| = \sum_{r=S+1}^{S+R} N_r, \tag{11}$$
where the notation $N_j$ denotes the number of agents in sub-network $\mathcal{N}_j$. Communication from a sending component to a receiving component is permitted, whereas communication in the reverse direction is forbidden.
Communication between sending agents is possible, but a sending sub-network is identified by the following two conditions: i) each sending sub-network is strongly connected [24]; and ii) agents belonging to different

Fig. 1. One example of a weakly-connected network, with sending sub-networks $\mathcal{N}_1$ and $\mathcal{N}_2$, and receiving sub-networks $\mathcal{N}_3$ and $\mathcal{N}_4$.
R sub-networks being allowed. In particular, we assume that each receiving sub-network is connected to at least one agent in each sending sub-network.
Without loss of generality, we assume that the network nodes across the $S + R$ components are listed in increasing order. According to the above description, the combination matrix corresponding to a weakly-connected network admits the following convenient block decomposition [9], [10]:
$$A = \begin{bmatrix} A_S & A_{SR} \\ 0 & A_R \end{bmatrix}, \tag{12}$$
where the matrix $A_S = \mathrm{blockdiag}\{A_{\mathcal{N}_1}, A_{\mathcal{N}_2}, \ldots, A_{\mathcal{N}_S}\}$ contains the combination weights within the sending sub-networks, and has a block-diagonal form since communication between sending sub-networks is forbidden.
The matrix block $A_{SR}$ contains the combination weights for the communication that takes place from sending agents to receiving agents. The bottom-left matrix block is zero since there are no direct links from the receiving agents to the sending ones. Finally, the matrix block $A_R$ contains the combination weights ruling the communication among the receiving agents. Figure 1 offers a graphical illustration of the weakly-connected paradigm.
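It was shown in [9] that the limiting combination matrix power has the following structure:
$$A_{\infty} \triangleq \lim_{i \to \infty} A^{i} = \begin{bmatrix} E & EW \\ 0 & 0 \end{bmatrix}, \tag{13}$$
where
$$E = \mathrm{blockdiag}\left\{p^{(1)} \mathbb{1}_{N_1}^{\top},\, p^{(2)} \mathbb{1}_{N_2}^{\top},\, \ldots,\, p^{(S)} \mathbb{1}_{N_S}^{\top}\right\} \tag{14}$$
is a block diagonal matrix that stacks the $N_s \times 1$ Perron eigenvectors $p^{(s)}$ associated with the $s$-th sending sub-network¹, and where
$$W = A_{SR}\left(I_{|\mathcal{R}|} - A_R\right)^{-1}, \qquad \Omega = EW. \tag{15}$$
We denote the entries of $\Omega$ by $[\omega_{\ell k}]$ and we keep indexing the columns of the $|\mathcal{S}| \times |\mathcal{R}|$ matrix $\Omega$ with an index
$$k \in \{|\mathcal{S}| + 1, |\mathcal{S}| + 2, \ldots, N\}. \tag{16}$$
Since the limiting matrix power is left-stochastic and has a zero bottom block, the limiting weights $\omega_{\ell k}$ obey:
$$\sum_{\ell \in \mathcal{S}} \omega_{\ell k} = 1, \tag{17}$$
i.e., $\Omega$ is left-stochastic. From (15) we can also write:
$$\Omega = E A_{SR} \sum_{j=0}^{\infty} A_R^{j}, \tag{18}$$
whence we see that $\omega_{\ell k}$ embodies the sum of influences over all paths from sending agent $\ell$ to receiving agent $k$.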
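To make the limiting weights concrete, the sketch below approximates the limiting matrix power numerically for a small weak graph and reads off the weights that one receiving agent assigns to the sending agents. The four-agent combination matrix is a hypothetical example (agents 0 and 1 form one sending sub-network, agent 2 forms a second one, and agent 3 is the only receiving agent); the helper names and the number of multiplications are likewise assumptions made for illustration.

```python
def matmul(X, Y):
    """Plain matrix product of two lists of lists."""
    return [[sum(X[i][m] * Y[m][j] for m in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def approximate_limit(A, steps=200):
    """Approximate the limiting power of A by repeated multiplication."""
    P = A
    for _ in range(steps):
        P = matmul(P, A)
    return P

# Hypothetical left-stochastic matrix: entry (l, k) is the weight agent k gives to agent l.
A = [
    [0.5, 0.3, 0.0, 0.2],
    [0.5, 0.7, 0.0, 0.1],
    [0.0, 0.0, 1.0, 0.3],
    [0.0, 0.0, 0.0, 0.4],
]

P = approximate_limit(A)
omega = [P[l][3] for l in range(3)]       # limiting weights seen by receiving agent 3
influence_subnet_1 = omega[0] + omega[1]  # aggregate weight of the first sending sub-network
influence_subnet_2 = omega[2]             # aggregate weight of the second sending sub-network
```

In this example the direct weights from the sending agents to agent 3 are 0.2, 0.1 and 0.3, but the limiting weights redistribute the influence inside each sending sub-network according to its Perron eigenvector, while the two aggregate influences add up to one.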

IV. LIMITING BELIEFS OF RECEIVING AGENTS
Let us momentarily consider a single-agent scenario where agent $\ell$ operates alone. A natural way for agent $\ell$ to choose a hypothesis would be to choose the $\theta$ that gives the best match between a model $L_\ell(\theta)$ and the distribution of the observed data, $f_\ell$. One measure of the match between $f_\ell$ and $L_\ell(\theta)$ is the KL divergence $D(f_\ell \| L_\ell(\theta))$. The smaller the value of this divergence is, the higher the match between the data and the model. For this reason, a strategy could be that of choosing the $\theta$ that minimizes the divergence $D(f_\ell \| L_\ell(\theta))$. In the social learning context, this optimization problem turns into a distributed optimization problem. In particular, under our social learning setting over weak graphs, we will show soon (Theorem 1) that the log-belief diffusion strategy in (6)-(7) will end up minimizing (without knowing the true distributions) the following average divergence at receiving agent $k \in \mathcal{R}$:
$$D_k(\theta) \triangleq \sum_{\ell \in \mathcal{S}} \omega_{\ell k}\, D(f_\ell \| L_\ell(\theta)), \tag{19}$$
which is a weighted combination, through the limiting combination weights $\{\omega_{\ell k}\}$, of the KL divergences of the sending agents reaching $k$. The role of average divergence measures like the one in (19) already arose in the case of strongly-connected networks. For example, it was shown in [21], [22] that with the log-belief diffusion strategy in (6)-(7), each agent ends up minimizing the same weighted combination of divergences. Under classical identifiability conditions, such minimization leads each individual agent to discover the true underlying hypothesis [22] or the best available approximation thereof [21]. In our weak-graph setting, however, the effect of minimizing $D_k(\theta)$ (which depends on the particular receiving agent $k$) will be less obvious. We already see from (19) that the average divergence combines topological attributes, encoded in the limiting combination weights, with inferential attributes, encoded in the local KL divergences. The interplay arising between the network topology and social learning will be critical in determining the choices of the receiving agents.
¹ For $s = 1, 2, \ldots, S$, the Perron eigenvector of the sub-matrix $A_{\mathcal{N}_s}$ corresponding to the $s$-th sending sub-network is given by: $A_{\mathcal{N}_s} p^{(s)} = p^{(s)}$, $\mathbb{1}_{N_s}^{\top} p^{(s)} = 1$, $p^{(s)} \succ 0$.
Throughout the work, we will invoke the following classical identifiability assumption that, as we will see in our examples, arises naturally in several models of interest.
Assumption 3 (Unique Minimizer): For each $k = 1, 2, \ldots, N$, the function $D_k(\theta)$ has a unique minimizer:
$$\theta_k \triangleq \arg\min_{\theta \in \Theta} D_k(\theta). \tag{20}$$
We are now ready to characterize the limiting belief of the receiving agents. The following theorem is an extension to the case of weakly-connected graphs of similar theorems proved in [21], [22] for the case of strongly-connected graphs.
Theorem 1 (Belief Collapse at Receiving Agents): Let $k \in \mathcal{R}$. Under Assumptions 1-3 we have that:
$$\lim_{i \to \infty} \mu_{k,i}(\theta_k) \overset{\text{a.s.}}{=} 1, \tag{21}$$
where the symbol $\overset{\text{a.s.}}{=}$ denotes that the pertinent limit exists almost-surely. Moreover, for all $\theta \neq \theta_k$, the convergence of the belief to zero takes place at an exponential rate as:
$$\lim_{i \to \infty} \frac{1}{i} \log \mu_{k,i}(\theta) \overset{\text{a.s.}}{=} D_k(\theta_k) - D_k(\theta). \tag{22}$$
Proof: The proof combines the techniques to establish the convergence of the social learning algorithm used, e.g., in [21], [22] for strongly-connected graphs, with the convergence results of the combination matrix over weak graphs used in [10], [11]. The detailed steps are reported in Appendix A.
Several insightful conclusions arise from Theorem 1.

Remark 3 (Collapse): The limiting belief of each receiving agent is always degenerate, meaning that it collapses to a single hypothesis, when sufficient time for learning is allowed.
Remark 4 (Discord): Different agents can in principle be in discord, since they can converge to different hypotheses. The particular behavior (who chooses what) will depend on a weighted combination of KL divergences.

Remark 5 (Mind Control): We see from (19) and (20) that only the local divergences corresponding to the sending agents, $\ell \in \mathcal{S}$, determine the value of $D_k(\theta)$ and, hence, of $\theta_k$. Therefore, the limiting hypothesis $\theta_k$ at agent $k$ is determined by the KL divergences pertaining only to the statistical models within the sending sub-networks, and, hence, irrespective of the data sensed at agent $k$ within its receiving sub-network. In a nutshell, we see the emergence of a mind-control effect: i) the final states of the receiving agents are dependent only upon the properties of the detection problems at the sending agents; and ii) different network topologies allow the sending agents to drive the receiving agents to potentially different decisions. The emergence of a mind-control effect over weakly-connected networks was already discovered in [9], [10]. Here, we establish a similar effect albeit one where the receiving agents attain degenerate beliefs. In comparison, in [9], [10], receiving agents end up assigning nonzero probabilities to more than one hypothesis.
Remark 6 (Distinctions relative to [10]): Besides these commonalities, there are nevertheless important distinctions between the behavior observed in our setting and what was observed in [10]. First, for the linear-belief-combination algorithm used in [10], the limiting belief of a receiving agent was shown to be a convex combination of the limiting beliefs of the sending agents, with the convex weights coming from the matrix $W$ in (15). This means that if two sending sub-networks have, e.g., limiting beliefs that collapse to different hypotheses, then the limiting belief of a receiving agent can have nonzero values at these two different locations. In comparison, in the log-belief-combination algorithm considered here the limiting beliefs are always concentrated at a single hypothesis.
Moreover, in [10], the analysis required some regularity assumptions called the all-truths-are-equal and prevailing-signal assumptions. These assumptions are not required in Theorem 1. In a sense, the fact that these assumptions can be dropped shows that some relevant effects, such as mind control, hold under greater generality and more relaxed settings.
Finally, it is useful to observe that one fundamental role in our setting is played by the weighting matrix $\Omega = EW$, while in [10] the main role was played by $W$ alone.

A. Canonical Examples
In order to examine in more detail the implications of Theorem 1, we consider a simple yet insightful example.
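The sending and receiving components are:
$$\mathcal{S} = \mathcal{N}_1 \cup \mathcal{N}_2, \qquad \mathcal{R} = \mathcal{N}_3, \tag{23}$$
namely, we have two sending sub-networks, $\mathcal{N}_1$ and $\mathcal{N}_2$, and one receiving sub-network $\mathcal{N}_3$. For what concerns the inferential model, we assume there are three possible hypotheses, $\theta \in \{1, 2, 3\}$. The likelihood functions are the same across all agents. In particular, we assume that, for all $\xi \in \mathbb{R}$, and for $\theta \in \{1, 2, 3\}$:
$$L_\ell(\xi|\theta) = \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{(\xi - m_\theta)^2}{2}\right\}, \tag{24}$$
where the means corresponding to the different hypotheses are chosen as, for some $\Delta > 0$:
$$m_1 = -\Delta, \qquad m_2 = 0, \qquad m_3 = +\Delta. \tag{25}$$
We further assume that the true distributions of the sending sub-networks (recall that only the sending sub-networks determine the limiting beliefs of the receiving sub-network) are Gaussian distributions, with expectations chosen among the expectations in (25). In particular, we assume that agents belonging to sub-network $\mathcal{N}_1$ generate data according to model $\theta = 1$, i.e., with expectation equal to $-\Delta$, whereas agents belonging to sub-network $\mathcal{N}_2$ generate data according to model $\theta = 3$, i.e., with expectation equal to $+\Delta$. Formally we write:
$$f_\ell(\xi) = \begin{cases} L_\ell(\xi|1), & \ell \in \mathcal{N}_1, \\ L_\ell(\xi|3), & \ell \in \mathcal{N}_2. \end{cases} \tag{26}$$
Recalling that the KL divergence between two unit-variance Gaussian distributions of expectations $a$ and $b$ is given by $0.5(a - b)^2$, under the setting described above we can write, for all $k \in \mathcal{R}$:
$$D_k(\theta) = \frac{(m_\theta + \Delta)^2}{2} \sum_{\ell \in \mathcal{N}_1} \omega_{\ell k} + \frac{(m_\theta - \Delta)^2}{2} \sum_{\ell \in \mathcal{N}_2} \omega_{\ell k},$$
which further implies:
$$D_k(1) = 2\Delta^2 \sum_{\ell \in \mathcal{N}_2} \omega_{\ell k}, \qquad D_k(2) = \frac{\Delta^2}{2}, \qquad D_k(3) = 2\Delta^2 \sum_{\ell \in \mathcal{N}_1} \omega_{\ell k},$$
where, in the intermediate equality, we used (17). As a result, we can compute the limiting hypothesis, for each $k \in \mathcal{R}$, as:
$$\theta_k = \arg\min_{\theta \in \{1, 2, 3\}} D_k(\theta). \tag{30}$$
From (18), one can argue that $\sum_{\ell \in \mathcal{N}_s} \omega_{\ell k}$ reflects the sum of influences over all paths connecting all sending agents in sub-network $s$ to receiving agent $k$.
In order to find the minimizer in (30), we start by using (17) in (30), which yields:
$$D_k(1) = 2\Delta^2 \Big(1 - \sum_{\ell \in \mathcal{N}_1} \omega_{\ell k}\Big), \qquad D_k(2) = \frac{\Delta^2}{2}, \qquad D_k(3) = 2\Delta^2 \sum_{\ell \in \mathcal{N}_1} \omega_{\ell k}. \tag{31}$$
In view of Theorem 1, the belief of the $k$-th receiving agent will converge to $\theta_k = 1$ if the following two conditions are simultaneously verified:
$$D_k(1) < D_k(2) \;\Leftrightarrow\; \sum_{\ell \in \mathcal{N}_1} \omega_{\ell k} > \frac{3}{4}, \qquad D_k(1) < D_k(3) \;\Leftrightarrow\; \sum_{\ell \in \mathcal{N}_1} \omega_{\ell k} > \frac{1}{2}. \tag{32}$$
Taking the most stringent condition in (32) reveals that:
$$\sum_{\ell \in \mathcal{N}_1} \omega_{\ell k} > \frac{3}{4} \;\Rightarrow\; \theta_k = 1. \tag{33}$$
In summary, we conclude that agent $k$ follows the opinion promoted by sending sub-network $\mathcal{N}_1$ if the influence of sub-network $\mathcal{N}_1$ on agent $k$ is "sufficiently large".
The situation is reversed if the influence of sub-network $\mathcal{N}_2$ is sufficiently large, namely,
$$\sum_{\ell \in \mathcal{N}_2} \omega_{\ell k} > \frac{3}{4} \;\Rightarrow\; \theta_k = 3, \tag{34}$$
where we recall that hypothesis $\theta = 3$ is promoted by sub-network $\mathcal{N}_2$. However, there is another possibility. It occurs when:
$$\frac{1}{4} < \sum_{\ell \in \mathcal{N}_1} \omega_{\ell k} < \frac{3}{4} \;\Rightarrow\; \theta_k = 2. \tag{35}$$
In this case, no clear dominance from one sub-network can be ascertained, and each receiving agent will choose $\theta_k = 2$, i.e., an opinion that does not coincide with any of the opinions promoted by the sending sub-networks.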
From (33) and (34), we see that the dominance of one of the sending sub-networks is determined by the aggregate influence $\sum_{\ell \in \mathcal{N}_1} \omega_{\ell k}$, with the complementary aggregate influence being $\sum_{\ell \in \mathcal{N}_2} \omega_{\ell k} = 1 - \sum_{\ell \in \mathcal{N}_1} \omega_{\ell k}$. The main way to manipulate these factors consists in varying the sizes of the sending sub-networks or their connections with the receiving agents.
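The decision rule of this example can be checked numerically with a short sketch. It assumes the Gaussian setting described above (unit-variance likelihoods with means $-\Delta$, $0$, $+\Delta$, sub-network $\mathcal{N}_1$ generating data with mean $-\Delta$ and $\mathcal{N}_2$ with mean $+\Delta$), and it takes as input the aggregate influence of $\mathcal{N}_1$ on a receiving agent. The function names and the default value of $\Delta$ are hypothetical.

```python
def average_divergence(mean, x, delta):
    """Average divergence for a hypothesis with the given mean.

    x is the aggregate limiting weight of sub-network N_1 (true mean -delta);
    the remaining weight 1 - x belongs to sub-network N_2 (true mean +delta).
    The KL divergence between unit-variance Gaussians with means a and b is 0.5 * (a - b)**2.
    """
    return x * 0.5 * (mean + delta) ** 2 + (1.0 - x) * 0.5 * (mean - delta) ** 2

def limiting_hypothesis(x, delta=1.0):
    """Hypothesis minimizing the average divergence (ties are excluded by Assumption 3)."""
    means = {1: -delta, 2: 0.0, 3: delta}
    divergences = {theta: average_divergence(m, x, delta) for theta, m in means.items()}
    return min(divergences, key=divergences.get)

choice_dominant_n1 = limiting_hypothesis(0.8)   # influence of N_1 above 3/4
choice_dominant_n2 = limiting_hypothesis(0.2)   # influence of N_2 above 3/4
choice_balanced = limiting_hypothesis(0.645)    # no clear dominance
```

With an aggregate influence of 0.8 the agent follows $\mathcal{N}_1$, with 0.2 it follows $\mathcal{N}_2$, and with an intermediate value such as 0.645 it settles on the middle hypothesis $\theta = 2$.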
In order to illustrate more carefully the possible scenarios, we consider the following simulation framework:
• The strongly-connected sending components $\mathcal{N}_1$ and $\mathcal{N}_2$ are generated as Erdős-Rényi random graphs with connection probability $q$, and the entries of the corresponding combination matrix are determined by the averaging rule, namely,²
$$a_{\ell k} = \begin{cases} 1/n_k, & \text{if } \ell \text{ is a neighbor of } k, \\ 0, & \text{otherwise}, \end{cases}$$
where $n_k$ is the number of neighbors of node $k$ (including node $k$ itself). In our experiments we set $q = 0.7$.
• A receiving agent $k$ is connected to each sending agent $\ell$ with probability $\pi_s$ (a Bernoulli draw), where $\pi_s$ depends on the sending sub-network $s$ to which $\ell$ belongs. Given the total number $d_k$ of directed edges from sending agents to agent $k$, we initially set $a_{\ell k} = 1/d_k$. The combination matrix $A$ of the overall network is normalized so that it is left-stochastic.
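The two ingredients of this simulation framework can be sketched as follows. This is a rough illustration, not the code used for the experiments: the function names, the sub-network sizes, the random seed, and the choice of resampling until the sending graph is connected are assumptions, and the final normalization of the overall matrix $A$ is not reproduced here.

```python
import random

def erdos_renyi(n, q, rng):
    """Undirected Erdos-Renyi adjacency matrix with a self-loop on every node."""
    adj = [[i == j for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < q:
                adj[i][j] = adj[j][i] = True
    return adj

def is_connected(adj):
    """Depth-first search from node 0 to check that every node is reachable."""
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j in range(len(adj)):
            if adj[i][j] and j not in seen:
                seen.add(j)
                stack.append(j)
    return len(seen) == len(adj)

def averaging_rule(adj):
    """Entry (l, k) equals 1/n_k when l is a neighbor of k (k itself included), else 0."""
    n = len(adj)
    A = [[0.0] * n for _ in range(n)]
    for k in range(n):
        n_k = sum(adj[l][k] for l in range(n))
        for l in range(n):
            if adj[l][k]:
                A[l][k] = 1.0 / n_k
    return A

def sending_component(n, q, rng):
    """Resample until the graph is connected (an assumption of this sketch)."""
    while True:
        adj = erdos_renyi(n, q, rng)
        if is_connected(adj):
            return averaging_rule(adj)

def initial_receiving_weights(sizes, pis, rng):
    """Bernoulli(pi_s) link to each agent of sending sub-network s, then weights 1/d_k."""
    links = [rng.random() < pi for size, pi in zip(sizes, pis) for _ in range(size)]
    d_k = sum(links)
    return [1.0 / d_k if linked else 0.0 for linked in links] if d_k else [0.0] * len(links)

rng = random.Random(0)
A_N1 = sending_component(12, 0.7, rng)
column_sums = [sum(A_N1[l][k] for l in range(12)) for k in range(12)]
weights = initial_receiving_weights([12, 4], [0.5, 0.5], rng)
```

Since the generated graph is undirected, connected, and has self-loops, the resulting sending component is strongly connected in the sense used in the text, and each column of its combination matrix adds up to one.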
It is now possible to examine different scenarios by manipulating the size of the sending sub-networks as well as the send-receive connection probabilities $\pi_s$.
- Setup 1 or "How majorities build a majority". In Fig. 2, we set $\pi_1 = \pi_2 = 0.5$, i.e., it is equally probable that a receiving agent connects to any sending agent, irrespective of the sending sub-network. In view of this uniformity, we can expect that the limiting weights $\omega_{\ell k}$ are sufficiently uniform across the two sending sub-networks and, hence, that the value of $\sum_{\ell \in \mathcal{N}_1} \omega_{\ell k}$ is primarily determined by the sub-network size $N_1$. In the example we are going to illustrate, we assume that the number of agents in sub-network $\mathcal{N}_1$ is three times larger than the size of sub-network $\mathcal{N}_2$. From the lowermost panel in Fig. 2, we observe that receiving agents (18, 19, 20) converge to $\theta = 1$, i.e., to the opinion promoted by $\mathcal{N}_1$. We see also that agent 17 takes a minority position and opts for $\theta = 2$, i.e., it follows neither the opinion promoted by $\mathcal{N}_1$ nor the one promoted by $\mathcal{N}_2$. This shows the following interesting effect. Even if sub-network $\mathcal{N}_1$ is bigger, for the specific topology shown in the example (see uppermost panel of Fig. 2), the aggregate weight of agent 17 is $\sum_{\ell \in \mathcal{N}_1} \omega_{\ell, 17} = 0.645$. This means that condition (35) is actually verified, which explains why agent 17 opts for $\theta = 2$. In summary, we observed that building a majority of agents in $\mathcal{N}_1$ relative to $\mathcal{N}_2$ yields a majority of receiving agents opting for the hypothesis promoted by $\mathcal{N}_1$.
Fig. 3. How filter bubbles build a majority: Convergence of beliefs when the connectivity from sending sub-network $\mathcal{N}_1$ is dominant.
- Setup 2 or "How filter bubbles build a majority". Under this setup, we assume that both sending components have the same size, but $\pi_s$ is different for each of the two components. We set $\pi_1 = 0.9$ and $\pi_2 = 0.1$ so that each receiving agent $k$ tends to have more connections with sub-network $\mathcal{N}_1$ than with $\mathcal{N}_2$. This scenario is considered in Fig. 3, where we see that all agents end up agreeing with opinion $\theta = 1$, i.e., with the opinion promoted by the sending component $\mathcal{N}_1$. Therefore, enclosing the receiving agents in the "filter bubble" determined by the overwhelming flow of data coming from $\mathcal{N}_1$ essentially makes these agents blind to the solicitations coming from $\mathcal{N}_2$. We notice that, while in the example of Fig. 2 one receiving agent behaves differently from the majority of agents, for the specific parameters used in Fig. 3 all agents opt for the same hypothesis. However, another type of distinction arises in terms of convergence rate. We observe that agent 10 is reluctant to follow the received solicitations, since for the first half of the observation window the preferred hypothesis is $\theta = 2$, and the convergence to $\theta = 1$ is significantly slower than the convergence of the other agents.
- Setup 3 or "Truth is somewhere in between". We now address the balanced case where the sending sub-networks have the same size and a similar number of connections to the receiving sub-network ($\pi_1 = \pi_2 = 0.5$). Under this setting, it is expected that no dominant behavior emerges, and (35) holds. We see in Fig. 4 that the receiving agents' opinions tend to converge with full confidence to hypothesis $\theta = 2$ ($m_\theta = 0$), which is an opinion pushed by none of the sending agents. How can we explain this effect? One interpretation is that, in the presence of conflicting suggestions coming from the two sub-networks, the receiving agent opts for a conservative choice. If sending sub-network $\mathcal{N}_1$ says "choose $-\Delta$", while sending sub-network $\mathcal{N}_2$ says "choose $+\Delta$", then the receiving agent prefers to be agnostic and stays in the middle, i.e., it chooses 0. Referring to real-life situations, we can think of a person betting on a soccer match between teams A and B. Assume that discordant solicitations come from the environment, i.e., the person receives data suggesting a bet on the victory of team A, as well as data suggesting a bet on the victory of team B. If there is not sufficient evidence to let one suggestion prevail, then the most probable choice would be betting on a draw! This "truth-is-somewhere-in-between" effect is remarkable: it is peculiar to the weakly-connected setting and has not been observed before, e.g., it was not present in [10].
In summary, it is the cumulative influence of a sending group over a receiving agent that determines whether the agent will follow the group's opinion or not. This situation emulates the social phenomenon of herd behavior: agents choose to ignore their private signal in order to follow the most influential group of agents. When none of the above dominance situations occurs, the receiving agent can opt for an opinion that is not promoted by any of the sending agents.
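The three behaviors summarized above can be reproduced with a minimal numerical sketch. It rests on assumptions that are not stated in this section: unit-variance Gaussian likelihoods with means $-\Delta$, $0$, $+\Delta$ for $\theta = 1, 2, 3$, two sending sub-networks whose true means are $-\Delta$ and $+\Delta$, and a receiving agent that selects the hypothesis minimizing the weighted divergence $\sum_s x_s \, \frac{1}{2}(m_\theta - \nu_s)^2$, where $x_s$ is the aggregate weight received from sub-network $s$. The function name `preferred_hypothesis`, the value $\Delta = 1$, and the weight pairs other than 0.645 are hypothetical.

```python
import numpy as np

def preferred_hypothesis(x, lik_means, true_means):
    """Return the (1-based) hypothesis minimizing the weighted KL divergence.

    x          : aggregate weights from each sending sub-network (sum to 1)
    lik_means  : means m_theta of the unit-variance Gaussian likelihoods
    true_means : means nu_s of the true distributions of the sub-networks
    """
    x = np.asarray(x, dtype=float)
    m = np.asarray(lik_means, dtype=float)
    nu = np.asarray(true_means, dtype=float)
    # KL divergence between unit-variance Gaussians: 0.5 * (m_theta - nu_s)^2
    d = 0.5 * (m[:, None] - nu[None, :]) ** 2
    network_divergence = d @ x            # one value per hypothesis
    return int(np.argmin(network_divergence)) + 1

delta = 1.0                               # assumed value, for illustration only
lik_means = [-delta, 0.0, delta]          # hypotheses theta = 1, 2, 3
true_means = [-delta, delta]              # N1 promotes theta = 1, N2 promotes theta = 3

dominant = preferred_hypothesis([0.9, 0.1], lik_means, true_means)
agent_17 = preferred_hypothesis([0.645, 0.355], lik_means, true_means)
balanced = preferred_hypothesis([0.5, 0.5], lik_means, true_means)
print(dominant, agent_17, balanced)
```

Under these assumptions, the agent follows $\mathcal{N}_1$ only when its aggregate weight from $\mathcal{N}_1$ exceeds 0.75, so an aggregate weight of 0.645 leads to the intermediate hypothesis, in line with the behavior reported for agent 17.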

V. TOPOLOGY LEARNING
In the previous section we examined the effect of the network topology on the social learning of the agents. In particular, we discovered how the topology and the states of the sending agents determine the opinion formation by the receiving agents. The way the information is delivered across the network ultimately determines the minimizers in (20), i.e., the value that each receiving agent's belief will converge to. We now examine the reverse problem. Assume we observe the belief evolution of part of the network. We would like to use this information to infer the underlying influences and topology. This is a useful question to consider because understanding the topology can help us understand why a particular agent adopts a certain opinion. The main question we consider now is this: given some measurements collected at the receiving agents, can we estimate their connections to the sending sub-networks?
We shall answer this question under the following assumption of homogeneity of likelihoods and true distributions inside the individual sending sub-networks.

Assumption 4 (Homogeneity within sending sub-networks): For $s = 1, 2, \ldots, S$, we assume that the distribution and the likelihood functions within the $s$-th sending sub-network are equal across all agents in that sub-network, namely, for all $\ell \in \mathcal{N}_s$:
$$f_\ell = f^{(s)}, \qquad L_\ell(\theta) = L^{(s)}(\theta).$$

One main consequence of Assumption 4 is that (19) becomes:
$$D_k(\theta) = \sum_{s=1}^{S} \Big( \sum_{\ell \in \mathcal{N}_s} \omega_{\ell k} \Big) D_{\mathrm{KL}}\big(f^{(s)} \,\|\, L^{(s)}(\theta)\big), \tag{38}$$
where $\mathcal{N}_s$ denotes the collection of agents in the $s$-th sending sub-network. Equation (38) has the following relevant implication. Under Assumption 4, the network topology influences the average divergence $D_k(\theta)$ only through an aggregate weight:
$$x_{sk} \triangleq \sum_{\ell \in \mathcal{N}_s} \omega_{\ell k} = \sum_{\ell \in \mathcal{N}_s} w_{\ell k}. \tag{39}$$
The latter equality, using $w_{\ell k}$ instead of $\omega_{\ell k}$, comes straightforwardly from (13) and (15). This equality reveals that the aggregate weights depend solely on the matrix $W$, and not on the matrix $E$ of Perron eigenvectors. In other words, the inner structure of the pertinent sending sub-network $s$ does not influence the aggregate weight $x_{sk}$.

We notice that, while a combination weight $a_{\ell k}$ accounts for a local, small-scale pairwise interaction between agent $\ell$ and agent $k$, the aggregate weight $x_{sk}$ accounts for macroscopic topology effects, for two reasons. First of all, $x_{sk}$ is determined by the limiting weights $\omega_{\ell k}$, which embody not only direct connection effects between $\ell$ and $k$, but also effects mediated by multi-hop paths connecting $\ell$ and $k$. Second, from (39) we see that $x_{sk}$ embodies the global effect coming from all agents belonging to the $s$-th sending component. In other words, $x_{sk}$ is a measure of the effect from all agents in sending sub-network $s$ on agent $k$. Since, in view of Theorem 1, the average divergence determines the behavior of the limiting belief, we conclude from (38) that the network topology ultimately influences the particular hypothesis chosen by a receiving agent only through these global weights $\{x_{sk}\}$.
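The aggregate weights can be illustrated on a toy weak graph. The sketch below is only indicative: the combination matrix `A`, the agent indexing, and the left-stochastic convention (each column sums to 1) are assumptions made here, and the expression $W = A_{SR}(I - A_R)^{-1}$ is the usual closed form for weakly-connected graphs, assumed to match the matrix $W$ referred to in (13), which is not reproduced in this section. The limiting weights are approximated by a high power of `A`.

```python
import numpy as np

# Hypothetical weak graph: agents 0-1 form N1, agents 2-3 form N2, agent 4 is receiving.
# Assumed convention: A[l, k] is the weight that agent k assigns to agent l,
# so every column of A sums to 1 (left-stochastic), and receiving agents
# do not feed information back to sending agents.
A = np.array([
    [0.5, 0.5, 0.0, 0.0, 0.2],
    [0.5, 0.5, 0.0, 0.0, 0.1],
    [0.0, 0.0, 0.5, 0.5, 0.3],
    [0.0, 0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.4],
])
sub_networks = [[0, 1], [2, 3]]
sending = [0, 1, 2, 3]
receiving = [4]

# Limiting weights omega_{lk}: numerical stand-in for the limit of the powers of A.
A_lim = np.linalg.matrix_power(A, 200)
x_from_omega = np.array([A_lim[idx, 4].sum() for idx in sub_networks])

# Same aggregate weights from W = A_SR (I - A_R)^{-1} (assumed form of W).
A_SR = A[np.ix_(sending, receiving)]
A_R = A[np.ix_(receiving, receiving)]
W = A_SR @ np.linalg.inv(np.eye(len(receiving)) - A_R)
x_from_W = np.array([W[idx, 0].sum() for idx in sub_networks])

print(x_from_omega, x_from_W)
```

In this toy example both computations give the same pair of aggregate weights, which illustrates why the inner structure of each sending sub-network (captured by the Perron eigenvectors) does not enter $x_{sk}$.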
We assume that the data available for estimating $x_{sk}$ are the shared (intermediate) beliefs, $\psi_{k,i}(\theta)$. We will say that consistent topology learning is achievable if the $x_{sk}$ can be correctly guessed when sufficient time is given for learning, i.e., we will focus on the limiting data, for all $\theta \neq \theta_k$:³
$$y_k(\theta) \triangleq \lim_{i \to \infty} \frac{1}{i} \log \psi_{k,i}(\theta) = D_k(\theta_k) - D_k(\theta). \tag{40}$$
Accordingly, the topology inference problem we are interested in can be formally stated as follows. For any receiving agent $k$, introduce its global-weight vector:
$$x_k \triangleq [x_{1k}, x_{2k}, \ldots, x_{Sk}]^\top, \tag{41}$$
and consider the vector stacking the $H$ limiting beliefs $y_k(\theta)$ (i.e., the data):
$$y_k \triangleq [y_k(1), y_k(2), \ldots, y_k(H)]^\top. \tag{42}$$
The main question is whether we can estimate $x_k$ consistently from observation of $y_k$. In the sequel we will sometimes refer to this problem as a macroscopic topology inference problem; see Fig. 5 for an illustration.
As compared to other topology inference problems, we are faced here with one critical element of novelty. We have no data coming from the sending agents. This means that the correlation between sending and receiving agent pairs cannot be computed. This is in sharp contrast with traditional topology inference problems, where the estimation of connections between pairs of agents is heavily based on comparison (e.g., correlation) between data streams coming from these pairs of agents [25]-[27]. In contrast, we focus here on the asymmetrical case in which, when estimating the weights $x_{sk}$ from sending to receiving agents, no data are available from the sending agents. For this reason, the topology learning problem addressed in this work is significantly different from other traditional topology problems studied in the literature.

VI. IS MACROSCOPIC TOPOLOGY LEARNING FEASIBLE?
We now examine the feasibility of the topology learning problem illustrated in the previous section.
Let us preliminarily introduce a matrix $D = [d_{\theta s}]$, which collects the $H \times S$ divergences between any true distribution in the sending sub-networks and any likelihood, and whose $(\theta, s)$-th entry is:
$$d_{\theta s} \triangleq D_{\mathrm{KL}}\big(f^{(s)} \,\|\, L^{(s)}(\theta)\big). \tag{43}$$
Using (41) and (43) in (38), the network divergence of receiving agent $k$, evaluated at $\theta$, can be written as:
$$D_k(\theta) = \sum_{s=1}^{S} x_{sk} \, d_{\theta s}. \tag{44}$$
Through (42) we can rewrite the limiting data in (40) as:
$$y_k = \big(1_H e_{\theta_k}^\top - I_H\big) D \, x_k. \tag{45}$$
It is useful to introduce the matrix:
$$B_k \triangleq \big(1_H e_{\theta_k}^\top - I_H\big) D, \tag{46}$$
where $e_m$ is an $H \times 1$ vector with all zeros and a one in the $m$-th position. It is important to note that $B_k$ has its $\theta_k$-th row equal to zero. We can now formulate the topology problem in terms of the following constrained system:
$$y_k = B_k x_k, \qquad 1_S^\top x_k = 1, \qquad x_k > 0, \tag{47}$$
where we remark that the notation $x_k > 0$ signifies that all entries in the solution vector $x_k$ must be strictly positive. This positivity constraint is enforced because, by assumption, each receiving sub-network is connected to at least one agent from each sending sub-network, which implies that the true vector we are looking for, $x_k$, has all positive entries. The equality constraint in (47) can be readily included in matrix form by introducing the augmented matrix and vector:
$$C_k \triangleq \begin{bmatrix} B_k \\ 1_S^\top \end{bmatrix}, \qquad \tilde{y}_k \triangleq \begin{bmatrix} y_k \\ 1 \end{bmatrix}, \tag{48}$$
which allow rewriting (47) as:
$$C_k x_k = \tilde{y}_k, \qquad x_k > 0. \tag{49}$$

We are now ready to state formally the concept of feasibility for the topology learning problem. First, we want to solve the problem under the assumption that the matrix of divergences, $D$, is known, i.e., that sufficient knowledge is available about the underlying statistical models (likelihoods and true distributions). In this respect, we remark that the matrix $B_k$ in (46) depends on $\theta_k$, which in turn depends on the unknowns $x_{sk}$ as well through (20). However, from Theorem 1 we know that the beliefs (and also the intermediate beliefs) converge to 1 at $\theta_k$. Therefore, we can safely estimate $\theta_k$ from the limiting data $y_k(\theta)$, which is tantamount to assuming that the matrix $B_k$ is known.

Accordingly, achievability of a consistent solution for the topology learning problem translates into the condition that the linear system in (49) should admit a unique solution. We will now prove the following result.

Lemma 1 (Necessary Condition for Macroscopic Topology Learning): The topology learning problem described by the system in (49) admits a unique solution if, and only if:
$$\mathrm{rank}(C_k) = S. \tag{50}$$
Thus, a necessary condition for topology learning is that the number of hypotheses is at least equal to the number of sending sub-networks, namely, that:
$$H \geq S. \tag{51}$$

Proof: We remark that we are not concerned with the existence of a solution for the constrained linear system (49). In fact, this system admits at least a solution, namely, the true weight vector, $x_k \in \mathbb{R}_+^S$, which by assumption fulfills the equation $C_k x_k = \tilde{y}_k$. Let us now focus on the unconstrained system (i.e., the system in (49) without the inequality constraints), whose set of solutions is given by [29]:
$$C_k^\dagger \tilde{y}_k + \big(I_S - C_k^\dagger C_k\big) z, \tag{52}$$
where $z \in \mathbb{R}^S$ is an arbitrary vector. Assume first that $\mathrm{rank}(C_k) = S$. Then $C_k^\dagger C_k = I_S$, which implies that the second term on the RHS in (52) is zero, which in turn implies that the unconstrained system has the unique solution:
$$C_k^\dagger \tilde{y}_k = x_k. \tag{53}$$
The latter equality holds because, if the unconstrained system has a unique solution, this is also the unique solution for the constrained system, i.e., it coincides with $x_k$ and satisfies the positivity constraints. Accordingly, we have proved that whenever $\mathrm{rank}(C_k) = S$, the constrained system has the unique solution corresponding to the true vector $x_k$.

We now show that when $\mathrm{rank}(C_k) < S$ the constrained system has infinitely many solutions. Since any solution of the unconstrained system takes on the form (52), and since $x_k$ is a particular solution, there will exist a certain vector $z_0$ such that $x_k$ can be written as:
$$x_k = C_k^\dagger \tilde{y}_k + \big(I_S - C_k^\dagger C_k\big) z_0. \tag{54}$$
Consider a solution $x'_k$ in (52) that corresponds to another vector, $z = z_0 + \epsilon$, where $\epsilon$ is a perturbation vector:
$$x'_k = x_k + \big(I_S - C_k^\dagger C_k\big) \epsilon. \tag{55}$$
Since by assumption $x_k > 0$, we conclude from (55) that for sufficiently small perturbations it is always possible to obtain a distinct $x'_k > 0$, which implies that the constrained system in (49) has infinitely many solutions.

In summary, we conclude that the topology learning problem is feasible if, and only if, $\mathrm{rank}(C_k) = S$. Finally, by observing that the augmented matrix $C_k$ is an $(H + 1) \times S$ matrix with an all-zeros row, so that its rank cannot exceed $\min\{H, S\}$, we have in fact proved the claim of the lemma.
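The rank test of Lemma 1 and the pseudo-inverse solution can be sketched numerically as follows. This is an illustrative sketch under stated assumptions: the divergence values in `D` and the weights in `x_true` are hypothetical, the helper name `augmented_matrix` is introduced here, and the sign convention $B_k = (1_H e_{\theta_k}^\top - I_H) D$ is assumed (the rank does not depend on the sign). The data are noiseless, i.e., the limiting data are computed exactly from the true weights.

```python
import numpy as np

def augmented_matrix(D, theta_k):
    """Stack B_k = (1 e^T - I) D and an all-ones row (theta_k is 0-based)."""
    H, S = D.shape
    e = np.zeros(H)
    e[theta_k] = 1.0
    B = (np.outer(np.ones(H), e) - np.eye(H)) @ D   # row theta_k of B is all zeros
    return np.vstack([B, np.ones((1, S))])

# Hypothetical divergence matrix with H = S = 3 and no special structure.
D = np.array([
    [0.2, 1.0, 1.5],
    [0.9, 0.3, 1.1],
    [1.4, 1.2, 0.1],
])
x_true = np.array([0.5, 0.3, 0.2])                  # positive weights summing to 1

theta_k = int(np.argmin(D @ x_true))                # hypothesis minimizing the network divergence
C = augmented_matrix(D, theta_k)
y_aug = C @ x_true                                  # noiseless augmented data [y_k; 1]

rank_C = int(np.linalg.matrix_rank(C))
x_hat = np.linalg.pinv(C) @ y_aug                   # pseudo-inverse solution

# With H = 2 < S = 3 the rank cannot reach S, so the weights are not identifiable.
D_small = D[:2, :]
rank_small = int(np.linalg.matrix_rank(augmented_matrix(D_small, 0)))

print(rank_C, np.round(x_hat, 6), rank_small)
```

When the rank equals $S$ the pseudo-inverse returns the true weights; in the second case the rank is bounded by $H < S$ and infinitely many positive solutions are compatible with the data.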
Lemma 1 has at least three useful implications. First, it reveals a fundamental interplay between social learning and topology learning: the possibility of estimating $x_k$ depends on the comparison between two seemingly unrelated quantities, the number of hypotheses $H$ (an attribute of the social inferential problem) and the number of sending sub-networks $S$ (an attribute of the network topology).

Second, the necessary condition in (51) highlights that topology learning over social networks is challenging. For example, if the agents of the social network want to solve a binary detection problem ($H = 2$), then the maximum number of sending sub-networks that could allow faithful topology estimation is $S = 2$. Increasing the complexity of the social learning problem (i.e., increasing $H$) is beneficial to topology estimation, since it allows $S$ to increase as well.

Third, we see that having more sending sub-networks makes topology learning more complicated. This is because increasing the number of sending sub-networks increases the number of unknowns (i.e., the dimension of $x_k$), while not adding information, since in our setting we are not allowed to probe the sending nodes.

Remarkably, when examining jointly the social learning and the topology learning problems, the roles of the data and of the unknowns are exchanged. In the social learning problem, more hypotheses means more unknowns and more sending sub-networks means more data; in the topology learning problem, the situation is exactly reversed.

A. Structured Gaussian Models
In this section we consider the practical case of a Gaussian model, defined as follows.
• All sending sub-networks use the same family of likelihoods. These likelihoods are unit-variance Gaussian likelihoods with different means $\{m_\theta\}$.
• Each true distribution coincides with one of the likelihoods. This implies that the distribution of the $s$-th sending sub-network, $f^{(s)}$, is a unit-variance Gaussian distribution with mean $\nu_s$ that is chosen among the means $\{m_\theta\}$, namely, for $s = 1, 2, \ldots, S$:
$$\nu_s \in \{m_1, m_2, \ldots, m_H\}. \tag{56}$$
• The sending sub-networks have different means.

Using (43) and the definition of KL divergence between Gaussian distributions, the matrix $D$ is given by:
$$d_{\theta s} = \frac{1}{2} (m_\theta - \nu_s)^2. \tag{57}$$
From (57) it is readily seen that, if the sending sub-networks share the same true distribution (i.e., if $\nu_1 = \nu_2 = \cdots = \nu_S$), then the matrix $D$ has rank 1, and, hence, the topology learning problem is obviously not feasible.
As said, we will instead focus on the opposite case where the true expectations are all distinct. For ease of presentation, and without loss of generality, we can assume that the sending sub-networks are numbered so that the expectations of the true distributions are:
$$\nu_s = m_s, \qquad s = 1, 2, \ldots, S, \tag{58}$$
which implies that (57) takes on the form:
$$d_{\theta s} = \frac{1}{2} (m_\theta - m_s)^2. \tag{59}$$
The structure in (59) implies that, for $H = S$, the matrix $D$ is a Euclidean distance matrix (but for the constant $1/2$) [28]. These matrices are constructed as follows. Given points $r_1, r_2, \ldots, r_L$ belonging to $\mathbb{R}^{\mathrm{dim}}$, the $(i, j)$-th entry of the matrix $\mathrm{EDM}(r_1, r_2, \ldots, r_L)$ is given by the squared Euclidean distance between points $r_i$ and $r_j$.
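The construction of a Euclidean distance matrix can be sketched in a few lines. The helper name `edm` and the numerical means are hypothetical; the sketch also checks, on this example, the known property that the rank of a Euclidean distance matrix built from points in $\mathbb{R}^{\mathrm{dim}}$ is at most $\mathrm{dim} + 2$ (here $\mathrm{dim} = 1$).

```python
import numpy as np

def edm(points):
    """Euclidean distance matrix: entry (i, j) is the squared distance between points i and j."""
    r = np.asarray(points, dtype=float)
    if r.ndim == 1:                       # scalar points: treat them as points in R^1
        r = r[:, None]
    diff = r[:, None, :] - r[None, :, :]  # pairwise differences, shape (L, L, dim)
    return np.sum(diff ** 2, axis=-1)

means = [1.0, 2.0, 3.0, 4.0]              # hypothetical likelihood means m_1, ..., m_4
E = edm(means)
D_square = 0.5 * E                        # divergence matrix for H = S under the Gaussian model
rank_E = int(np.linalg.matrix_rank(E))

print(E)
print(rank_E)
```

Because the points lie on a line, the rank stays at 3 no matter how many distinct means are added, which is the structural limitation exploited in the analysis of the Gaussian model.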
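Accordingly, we see from (59) that, for $H = S$:
$$D = \frac{1}{2} \, \mathrm{EDM}(m_1, m_2, \ldots, m_S). \tag{60}$$
For $H > S$, the matrix $D$ can be described as an extended Euclidean distance matrix, constructed as follows. Let:
$$E_L \triangleq \frac{1}{2} \, \mathrm{EDM}(m_1, m_2, \ldots, m_L), \tag{61}$$
and let $F$ be the $(H - S) \times S$ matrix with entries, for $\theta = S + 1, S + 2, \ldots, H$ and $s = 1, 2, \ldots, S$:
$$[F]_{\theta s} = \frac{1}{2} (m_\theta - m_s)^2. \tag{62}$$
Then, we have the following representation:
$$D = \begin{bmatrix} E_S \\ F \end{bmatrix}. \tag{63}$$
The following theorem, which establishes the feasibility of the topology learning problem for the considered Gaussian model, relies heavily on some fundamental properties of Euclidean distance matrices.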
Theorem 2 (Macroscopic Topology Learning under Structured Gaussian Models): Let $S \geq 2$ and $H \geq S$. Assume that all sending sub-networks have the same family of unit-variance Gaussian likelihood functions $L(\theta)$ with distinct means $\{m_\theta\}$, for $\theta = 1, 2, \ldots, H$. Assume that the true distributions $f^{(s)}$, within the sending sub-networks $s = 1, 2, \ldots, S$, are unit-variance Gaussian with distinct means $\nu_s$, chosen from the collection $\{m_\theta\}$. Then, under Assumption 3 (so that the matrix $B_k$ in (46) is well defined), for all receiving agents $k \in \mathcal{R}$ we have that:
$$\mathrm{rank}(C_k) = 2. \tag{64}$$

Proof: The proof is reported in Appendix B.
Remark 7 (Topology Learning under Structured Gaussian Models is Challenging): In view of Lemma 1, Eq. (64) has the following implication. Under the considered Gaussian model, topology learning is feasible only when $S = 2$. We remark also that, when $S = 2$, condition (51) plays no role, since any meaningful classification problem has at least $H = 2$. In summary, Theorem 2 reveals that the structure of the Gaussian model makes topology learning very challenging, as this problem is not solvable for networks with more than 2 sending sub-networks. Thus, the theorem reveals that $H \geq S$ is not a sufficient condition for consistent topology learning.
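The rank statement of Theorem 2 can be checked numerically on small instances. The sketch below is a sanity check rather than a proof; it assumes the sign convention $B(\theta) = (1_H e_\theta^\top - I_H) D$, uses hypothetical integer means, and introduces the helper names `augmented_matrix` and `gaussian_divergences` for illustration.

```python
import numpy as np

def augmented_matrix(D, theta):
    """Stack B(theta) = (1 e^T - I) D and an all-ones row (theta is 0-based)."""
    H, S = D.shape
    e = np.zeros(H)
    e[theta] = 1.0
    B = (np.outer(np.ones(H), e) - np.eye(H)) @ D
    return np.vstack([B, np.ones((1, S))])

def gaussian_divergences(lik_means, S):
    """d_{theta s} = 0.5 * (m_theta - m_s)^2, with true means nu_s = m_s for s = 1..S."""
    m = np.asarray(lik_means, dtype=float)
    return 0.5 * (m[:, None] - m[None, :S]) ** 2

ranks = {}
for H, S in [(3, 2), (3, 3), (5, 4)]:
    D = gaussian_divergences(np.arange(1, H + 1), S)   # hypothetical means 1, 2, ..., H
    # Rank of the augmented matrix for every possible limiting hypothesis theta
    ranks[(H, S)] = [int(np.linalg.matrix_rank(augmented_matrix(D, t))) for t in range(H)]

print(ranks)
```

One way to read the result: each nonzero row of $B(\theta_k)$ equals $(m_{\theta_k} - m_\theta)\big(\tfrac{1}{2}(m_{\theta_k} + m_\theta) 1_S - \nu\big)^\top$, with $\nu = [\nu_1, \ldots, \nu_S]^\top$, so all rows of the augmented matrix lie in the two-dimensional span of $1_S$ and $\nu$, regardless of $S$.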

B. Diversity Models
We can now examine the effect that diversity in the models of the sending sub-networks can have on topology learning. Since the limiting beliefs are essentially determined by the divergence matrix $D$, it is meaningful to impose a form of diversity in terms of the divergences between distributions and likelihoods. In other words, differently from the Gaussian case illustrated in the previous section, we now require that the entries of $D$ are not tightly related to each other, namely, we allow them to assume values in $\mathbb{R}_+^{H \times S}$ (where we denote by $\mathbb{R}_+$ the nonnegative reals) with no strong structure linking them.

One typical model for this type of diversity is that the divergences perceived by the different agents (i.e., across index $s$), and corresponding to different hypotheses (i.e., across index $\theta$), are modeled as absolutely continuous random variables. This randomness is a formal way to embody some degree of variability in how the agents "see" the world. For example, this is a useful model to consider when the agents, due to imperfect knowledge, have likelihoods that are slightly perturbed versions of some nominal model. Examples of this type are illustrated in the next section.
In order to avoid confusion, it is important to remark one fundamental property. Under the diversity setting, the matrix $D$ is random⁴, with entries modeled as absolutely continuous random variables. The full-rank property for this type of matrices is a classical result. However, we observe from (46) that the matrix $B_k$ is obtained from $D$ through multiplication by a matrix that depends on a random variable, $\theta_k$, which in turn depends statistically upon the entries of $D$. Finally, we know from (48) that $C_k$ is obtained from $B_k$ by adding an all-ones row. Accordingly, to determine the rank of $C_k$ we need to address carefully these intricate dependencies. This is accomplished in the proof of the forthcoming Theorem 3.

Next, we address the topology learning problem. First, for an observation time $i$, we construct the empirical data $\hat{y}_k(\theta) = (1/i) \log \psi_{k,i}(\theta)$, and construct an estimate $\hat{\theta}_k$ as the value of $\theta$ that maximizes $\hat{y}_k(\theta)$ (i.e., the hypothesis at which the belief will collapse to 1). We can then construct an estimate for $B_k$ as:
$$\hat{B}_k = \big(1_H e_{\hat{\theta}_k}^\top - I_H\big) D,$$
from which we obtain $\hat{C}_k$ by adding an all-ones row, according to (48). At this point, we have verified on the simulated data that, for any receiving agent $k \in \{9, 10, 11, 12\}$, the matrices $\hat{C}_k$ are full column rank. Then, we used (53) with empirical matrices replacing the exact ones to estimate the connection-weight vector $x_k$ as:⁶
$$\hat{x}_k = \hat{C}_k^\dagger \begin{bmatrix} \hat{y}_k \\ 1 \end{bmatrix}.$$
We see from Fig. 6 (right) that this procedure allows us to retrieve the topology coefficients $\{x_{sk}\}$, provided that the system evolves for a sufficiently long time.

[...] different means of the likelihoods. Moreover, we see from the parameters of the perturbing random variables that a relatively small perturbation is already sufficient to enable consistent topology learning.
c) Beta with $H = S = 3$. Finally, we consider a non-Gaussian example. The network topology is the same as in the last example. However, now the likelihood functions follow a Beta distribution with scale parameter equal to 2 and with shape parameters given by $\theta + 1 + u_{\theta s}$, where $\{u_{\theta s}\}$, for $\theta \in \{1, 2, 3\}$ and $s \in \{1, 2, 3\}$, are independent random variables sampled from a uniform distribution with support $[-0.1, 0.1]$. The true distributions coincide with the unperturbed likelihoods, i.e., the true distribution of the $s$-th sending sub-network is a Beta distribution with scale parameter equal to 2 and shape parameter equal to $s + 1$. For the receiving sub-network we apply the same type of random perturbation of the likelihoods, whereas the true distributions are Beta with scale and shape parameters equal to 2. The belief convergence for the receiving agents can be seen in the middle group of panels of Fig. 8. In the rightmost group of panels, we see the convergence of the topology estimates.
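The estimation procedure used in these examples (empirical data, estimate of the limiting hypothesis, estimated augmented matrix, pseudo-inverse) can be sketched as follows. The sketch does not simulate the social learning algorithm: as a stand-in, it synthesizes log-beliefs that decay exponentially at the rates $D_k(\theta) - D_k(\theta_k)$, which is an assumption made here for illustration. The divergence matrix, the weights, and the function names are likewise hypothetical.

```python
import numpy as np

def estimate_aggregate_weights(log_belief, i, D):
    """Estimate x_k from the log of the intermediate belief at time i, with D known."""
    H, S = D.shape
    y_hat = np.asarray(log_belief, dtype=float) / i   # empirical data (1/i) log psi
    theta_hat = int(np.argmax(y_hat))                 # hypothesis where the belief concentrates
    e = np.zeros(H)
    e[theta_hat] = 1.0
    B_hat = (np.outer(np.ones(H), e) - np.eye(H)) @ D
    C_hat = np.vstack([B_hat, np.ones((1, S))])       # add the all-ones row
    x_hat = np.linalg.pinv(C_hat) @ np.append(y_hat, 1.0)
    return x_hat, theta_hat

# Hypothetical divergence matrix and true aggregate weights.
D = np.array([
    [0.2, 1.0, 1.5],
    [0.9, 0.3, 1.1],
    [1.4, 1.2, 0.1],
])
x_true = np.array([0.6, 0.3, 0.1])

def synthetic_log_belief(i):
    """Stand-in for the belief at time i: exponential decay at rate D_k(theta) - D_k(theta_k)."""
    div = D @ x_true
    log_unnorm = -i * (div - div.min())
    return log_unnorm - np.log(np.sum(np.exp(log_unnorm)))   # normalize in the log domain

errors = {}
for i in (5, 2000):
    x_hat, theta_hat = estimate_aggregate_weights(synthetic_log_belief(i), i, D)
    errors[i] = float(np.max(np.abs(x_hat - x_true)))

print(errors, theta_hat)
```

In this synthetic setting the estimation error is visible at short observation times and becomes negligible at long ones, which mirrors the requirement that the system evolve for a sufficiently long time.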

VIII. SOCIAL LEARNING VS. TOPOLOGY LEARNING
In this work we have considered two learning problems. The first problem is the Social Learning (SL) problem, which is the goal of the agents in the network. These agents aim at forming their opinions after consulting the beliefs of their neighbors through an iterative update-and-combine SL algorithm. The second problem is the Topology Learning (TL) problem, where a receiving agent (or some entity monitoring its behavior) attempts to get knowledge about the connections between that receiving agent and the sending sub-networks. We can refer to the SL problem as the direct learning problem, in the sense that it is the original inferential problem the network is deployed for. Likewise, we can refer to the TL problem as the dual learning problem, since it is an inferential procedure that takes as input data the output of the direct SL problem.

The analysis conducted in this work has revealed some interesting interplay between SL and TL problems. Let us make a summary of the main results. We recall that $S$ denotes the number of sending sub-networks, and $H$ the number of hypotheses. First, we established in Lemma 1 that $H \geq S$ is a necessary condition to achieve consistent TL. This condition has a remarkable interpretation. In a sense, the number of hypotheses is an index (even if not the only one) of complexity associated to the SL problem since, other conditions being equal, more hypotheses make the SL problem more complicated. Likewise, the number of sending components represents an index of complexity of the TL problem, since, other conditions being equal, estimating more links is more complicated. According to these remarks, the condition $H \geq S$ implies that the TL problem can be feasible when its complexity is not greater than the complexity of the SL problem. Such an interplay is not obvious at all. As a matter of fact, in the traditional topology inference problems, the connections between agents are inferred from some kind of pairwise measure of their dependency. In our setting, since we cannot measure the output of the sending sub-network, we cannot get direct data quantifying dependency between a receiving and a sending agent. Our TL inference is based instead on the belief functions. The belief function contains some richness of information in that it is evaluated for the $H$ different values of $\theta$. This richness (i.e., $H$) is critical to enable feasibility of the TL problem. In particular, $H \geq S$ means that the richness of information in the belief function should be greater than or equal to the number of unknown topology weights to be estimated, $S$.

Having established a necessary condition for consistent TL, we moved on to examine some useful models to see whether and when consistent TL is in fact achievable. First, we have considered a structured Gaussian model where all sending sub-networks use the same family of Gaussian likelihoods, and the sending sub-networks have distinct true distributions, each one coinciding with one of the likelihoods. We have shown in Theorem 2 that the TL problem is feasible only if $S = 2$, for any $H \geq 2$. The limited possibility of achieving consistent TL can be ascribed to the limited diversity existing between the different sub-networks (which all use the same family of likelihoods). This observation motivated the analysis of more general models with a certain degree of diversity, a condition formalized by saying that the KL divergences between true distributions and likelihoods are not structured, i.e., they are nonnegative real numbers with no particular relationship among them. Under this setting we have ascertained that, if $H \geq S$, the TL problem becomes feasible for almost all configurations, in a precise mathematical sense as stated in Theorem 3. In summary, two critical features that enable consistent TL are: at least as many hypotheses as sending components, and a sufficient degree of diversity.

APPENDIX A PROOF OF THEOREM 1
Exploiting (6) we can write, for $\theta, \theta' \in \Theta$:

Using (7) we have:

By iterating over $i$, we can write:

Under Assumptions 1-2, thanks to the integrability of the log-ratios between the true distributions and the likelihoods implied by (5), through standard limiting arguments (see, e.g., [21], [22]) it is possible to determine the asymptotic behavior of (70) by: i) replacing the powers of matrix $A$ with their limit $A_\infty$ in (12); and ii) applying the strong law of large numbers to conclude that:

where in the second-last equality we used (4), and we performed the replacement $[A_\infty]_{\ell k} = \omega_{\ell k}$, which holds in view of the block representation in (12) since $k$ is a receiving agent and since we adopt the indexing in (16). Now, in light of Assumption 3, we conclude that:

for all $\theta \neq \theta_k$. Since the denominator of the ratio $\mu_{k,i}(\theta) / \mu_{k,i}(\theta_k)$ is bounded by 1, Eq. (72) implies that the numerator $\mu_{k,i}(\theta)$ converges to zero. Since the belief function must sum to 1, the result in (21) holds.

APPENDIX B PROOF OF THEOREM 2
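Preliminarily, it is useful to introduce some auxiliary matrices. We let, for all $\theta = 1, 2, \ldots, H$:
$$I(\theta) \triangleq 1_H e_\theta^\top - I_H, \qquad B(\theta) \triangleq I(\theta) D, \tag{73}$$
and
$$C(\theta) \triangleq \begin{bmatrix} B(\theta) \\ 1_S^\top \end{bmatrix}. \tag{74}$$
In view of Eqs. (46) and (48), the definitions in (73) and (74) imply:
$$B_k = B(\theta_k), \qquad C_k = C(\theta_k).$$

We continue by showing some useful properties of the matrix $D$ under the considered Gaussian model. Let us focus on the representation in (63). It is a known result that the rank of a Euclidean distance matrix with $n$ points in $\mathbb{R}^{\mathrm{dim}}$ is at most $\mathrm{dim} + 2$ [28]. Since in our case $\mathrm{dim} = 1$, we can write:
$$\mathrm{rank}(E_S) \leq 3. \tag{76}$$
Moreover, for the cases $S = 2$ and $S = 3$ we have that, denoting by $e_{ij} = \frac{1}{2}(m_i - m_j)^2$ the $(i, j)$-th entry:
$$E_2 = \begin{bmatrix} 0 & e_{12} \\ e_{12} & 0 \end{bmatrix}, \qquad E_3 = \begin{bmatrix} 0 & e_{12} & e_{13} \\ e_{12} & 0 & e_{23} \\ e_{13} & e_{23} & 0 \end{bmatrix},$$
and, hence:
$$\det(E_2) = -e_{12}^2, \qquad \det(E_3) = 2 \, e_{12} \, e_{13} \, e_{23}.$$
Therefore, when the points that determine the Euclidean distance matrix are all distinct, both the above matrices are full rank. Thus, when $S = 2$, we have that $\mathrm{rank}(E_S) = 2$. When $S > 2$, since $E_3$ is full rank, and in view of (76), we have instead $\mathrm{rank}(E_S) = 3$. From the representation of $D$ in (63), we then conclude that:
$$\mathrm{rank}(D) = \begin{cases} 2, & S = 2, \\ 3, & S > 2. \end{cases} \tag{80}$$
Next we state and prove a useful lemma.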
Lemma 2: Let $I(\theta)$ be defined as in (73). Then, for all $\theta = 1, 2, \ldots, H$ we have that:
$$I_H - I^\dagger(\theta) I(\theta) = \frac{1}{H} 1_H 1_H^\top. \tag{81}$$

Proof of Lemma 2: For ease of notation, in the following proof the explicit dependence on $\theta$ is suppressed, and we write $I$ in place of $I(\theta)$. By definition of the Moore-Penrose inverse, matrix $I^\dagger$ satisfies:
$$I I^\dagger I = I, \qquad I^\dagger I I^\dagger = I^\dagger, \qquad (I I^\dagger)^\top = I I^\dagger, \qquad (I^\dagger I)^\top = I^\dagger I. \tag{82}$$
Then we note that:
$$I \big(I_H - I^\dagger I\big) = I - I I^\dagger I = 0, \tag{83}$$
where in the second equality we used the first identity in (82). Equation (83) implies that the columns of $(I_H - I^\dagger I)$ belong to the null space of $I$, denoted by $\mathcal{N}(I) = \{v : I v = 0\}$. On the other hand, in view of (73) we can write:

Now we would like to see if equality is satisfied for the cases $S = 2$ (with $H > 2$) and $S > 2$ (with $H \geq S$). To this end, we start by noticing that equality in Sylvester's inequality holds if, and only if, there exist matrices $X$ and $Y$ that solve [29]:

which in turn admits a solution if, and only if, [30]:

Applying Lemma 2, from (95) we get:

which means that the equality sign in (92) or (93) holds if, and only if:

In particular, we will now show that (97) does not hold for $S = 2$, while it holds for $S > 2$.

Let us start with the case $S = 2$ (and $H > 2$). We will appeal to the representation of $D$ in (63), which for the case $S = 2$ can be written as:

Let us move on to examine the case $S > 2$ and $H \geq S$. It is known that, for an $L \times L$ Euclidean distance matrix $M$, one has $M M^\dagger 1_L = 1_L$, implying that $1_L$ belongs to the range space of $M$ [31]. We can apply this result to the matrices $E_S$ and $E_H$ in (63), since they are proportional to Euclidean distance matrices. In particular, we can say that there exist vectors $u_S$ and $u_H$ such that $E_S u_S = 1_S$ and $E_H u_H = 1_H$. In particular, one of the (infinitely many) solutions is given by:

Applying now (99) into (63), we can write:

Equation (97) now follows by observing that:

We have in fact shown that (97) holds true for $S > 2$, which implies that (93) becomes an equality for $S > 2$.
In summary, we have shown so far that $\mathrm{rank}(B) = 2$ for all $H \geq S$ (but for the case $H = S = 2$, which has been examined separately). We will now use this result to prove the claim of the theorem, namely, that $\mathrm{rank}(C) = 2$. Since $C$ is obtained from $B$ by adding an all-ones row, determining the rank of $C$ from that of $B$ amounts to checking whether the row vector $1_S^\top$ lies in the row space of $B$, which is tantamount to ascertaining whether there exists $z$ such that:
$$z^\top B = 1_S^\top. \tag{102}$$
Since we exclude the case $H = S = 2$, we have always $H \geq 3$. Now, let us consider an EDM $E_3$ defined on 3 distinct points $p_1, p_2, p_3$. Since in this case $E_3$ is full rank, the system $v_3^\top E_3 = 1_3^\top$ has the following (unique) solution:

where we denoted by $e_{ij} = \frac{1}{2}(p_i - p_j)^2$ the $(i, j)$-th entry of $E_3$. Let us now introduce the vector:

Since, for $H \geq 3$, we know that $\mathrm{rank}(E_H) = 3$, we conclude that:
$$v_H^\top E_H = 1_H^\top, \tag{105}$$
which, using the block representation of $D$ in (63), yields:
$$v_H^\top D = 1_S^\top. \tag{106}$$
In view of (106), one solution $z$ to (102) exists if $z^\top I = v_H^\top$, that is, if $v_H$ lies in the row space of $I$.

On the other hand, from the definition in (73), we see that the matrix $I$ can be represented as:

where the bold notation highlights the $\theta$-th row and column. According to (107), the row space of $I$ is:

which is equivalent to:
$$\big\{ v \in \mathbb{R}^H : v^\top 1_H = 0 \big\}. \tag{109}$$
Examining (103), from straightforward algebra it can be shown that $v_3^\top 1_3 = 0$, which, in light of (104), implies that $v_H^\top 1_H = 0$. Using (109), we conclude that $v_H$ lies in fact in the row space of $I$, which finally implies, for $H \geq S$ (excluding the case $H = S = 2$), that $\mathrm{rank}(C) = 2$.
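Two linear-algebra facts used in this appendix can be checked numerically: that $I_H - I^\dagger I$ is the projector onto the span of $1_H$ (the null space of $I$), and that any vector with zero sum lies in the row space of $I$. The sketch assumes the form $I(\theta) = 1_H e_\theta^\top - I_H$, which is consistent with the zero-row property of $B_k$ but is an assumption here; the helper name `selector_matrix` and the test vector `v` are hypothetical.

```python
import numpy as np

def selector_matrix(H, theta):
    """Assumed form of I(theta): 1_H e_theta^T - I_H (theta is 0-based)."""
    e = np.zeros(H)
    e[theta] = 1.0
    return np.outer(np.ones(H), e) - np.eye(H)

H = 4
ones = np.ones(H)

# Check that I_H - pinv(I) I equals (1/H) 1 1^T for every theta.
checks = []
for theta in range(H):
    I_theta = selector_matrix(H, theta)
    projector = np.eye(H) - np.linalg.pinv(I_theta) @ I_theta   # projects onto the null space of I(theta)
    checks.append(bool(np.allclose(projector, np.outer(ones, ones) / H)))

# A vector with zero sum lies in the row space of I(theta): z^T I(theta) = v^T is solvable.
v = np.array([1.0, -2.0, 0.5, 0.5])       # entries sum to zero
I0 = selector_matrix(H, 0)
z = np.linalg.pinv(I0.T) @ v              # least-squares solution of I0^T z = v
residual = float(np.linalg.norm(I0.T @ z - v))

print(checks, residual)
```

The zero residual reflects the fact that the row space of $I$ is the orthogonal complement of its null space, which is spanned by $1_H$.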

APPENDIX C PROOF OF THEOREM 3
We remark that in our setting the divergences are modeled as random variables, which implies that the value of $\theta_k$ is random as well. We should take this into account when proving the claim of the theorem. First, we observe that:

We now show that, for all $\theta = 1, 2, \ldots, H$:

It is useful to visualize the matrix $C(\theta)$ as follows:

The matrix $C(\theta)$ has $H - 1$ random rows (i.e., excluding the all-zeros and all-ones rows). Thus, when $H > S$ there are at least $S$ rows with random entries. These random entries are jointly absolutely continuous since i) so are the entries of $D$; and ii) the mapping from $D$ to (the random entries of) $C(\theta)$ is non-singular.⁷ This implies that, for $H > S$:

which proves (111) for the case $H > S$.

We switch to the case $H = S$. Let us denote by $B_{S-1}(\theta)$ the sub-matrix of $B(\theta)$ obtained by deleting its last column, and by $b_S(\theta)$ the last column of $B(\theta)$. We can write:

We notice that $B_{S-1}(\theta)$ depends only on the sub-matrix $D_{S-1}$ that is obtained by deleting from $D$ the last column. It is thus meaningful to introduce the set of matrices:

Recalling that $B_{S-1}(\theta)$ contains an all-zeros row, we see that, given a matrix $D_{S-1} \in \mathcal{E}$, there exists a unique sequence of weights:
$$w_1, w_2, \ldots, w_{\theta-1}, w_{\theta+1}, \ldots, w_S, \tag{116}$$
to obtain the row vector $1_{S-1}^\top$ as a weighted linear combination of the rows of $B_{S-1}(\theta)$. Accordingly, given a matrix $D_{S-1} \in \mathcal{E}$, the rank of $C(\theta)$ will be equal to $S$ if the last row in $C(\theta)$ cannot be obtained as a linear combination of the rows of $B(\theta)$. In view of (114), this corresponds to checking whether the linear combination of the elements in $b_S$ with the same weights is equal to 1, namely, if:

The proof of the theorem will now be complete if we show that the probability of having a unique $\theta_k$ is equal to 1. To this aim, by using (20) and (44), we see that:
$$\theta_k = \arg\min_{\theta} \sum_{s=1}^{S} x_{sk} \, d_{\theta s}. \tag{122}$$
Let us consider the summations in (122) corresponding to different values of $\theta$. Since the random variables $\{d_{\theta s}\}$ are jointly absolutely continuous (and since $x_k$ is not an all-zeros vector), the probability that two or more summations are equal is zero, which finally implies that $\theta_k$ is unique.
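The two probability-one statements used in this proof (full column rank of $C(\theta)$ and uniqueness of the minimizer) can be probed with a small Monte Carlo experiment. This is an empirical illustration, not a substitute for the proof. The uniform distribution for the entries of $D$, the Dirichlet draw for the weights, the number of trials, and the helper names are all assumptions made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(7)

def augmented_matrix(D, theta):
    """Stack B(theta) = (1 e^T - I) D and an all-ones row (theta is 0-based)."""
    H, S = D.shape
    e = np.zeros(H)
    e[theta] = 1.0
    B = (np.outer(np.ones(H), e) - np.eye(H)) @ D
    return np.vstack([B, np.ones((1, S))])

def trial(H, S):
    """One random draw: check uniqueness of the minimizer and full rank for all theta."""
    D = rng.uniform(0.0, 1.0, size=(H, S))      # absolutely continuous entries (assumed uniform)
    x = rng.dirichlet(np.ones(S))               # hypothetical positive weights summing to 1
    div = D @ x
    unique_min = bool(np.sum(div == div.min()) == 1)
    full_rank = all(
        np.linalg.matrix_rank(augmented_matrix(D, t)) == S for t in range(H)
    )
    return unique_min and full_rank

results = {
    (H, S): all(trial(H, S) for _ in range(200))
    for (H, S) in [(3, 3), (5, 3)]
}
print(results)
```

Rank-deficient draws and ties among the summations form a set of probability zero, so they are not expected to appear in any finite number of trials.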

Fig. 2. How majorities build a majority: Convergence of beliefs when the size of sending sub-network N1 is dominant.

Fig. 4. Truth is somewhere in between: Convergence of beliefs under balanced influences.


Fig. 5 .
Fig. 5. Macroscopic topology inference problem. The object of topology inference is the set of global weights $x_{sk}$ from sending sub-network $s$ to receiving agent $k$. For example, the weight $x_{1k}$ in the figure embodies the influence of all sending agents in $N_1$, over all paths (possibly including intermediate receiving agents) leading to receiving agent $k \in N_3$.

Fig. 6. Unperturbed Gaussian model. Left. Network topology. Middle. Belief convergence at the receiving agents. Right. Estimated macroscopic topology. For each of the four panels, the numbers on the right denote the true values $\{x_{sk}\}$, with different colors denoting different $s$, according to the legend.

Fig. 7. Perturbed Gaussian model. Left. Network topology. Middle. Belief convergence at the receiving agents. Right. Estimated macroscopic topology. For each of the four panels, the numbers on the right denote the true values $\{x_{sk}\}$, with different colors denoting different $s$, according to the legend.

Fig. 8. Perturbed Beta model. Left. Network topology. Middle. Belief convergence at the receiving agents. Right. Estimated macroscopic topology. For each of the four panels, the numbers on the right denote the true values $\{x_{sk}\}$, with different colors denoting different $s$, according to the legend.