Evaluation Mechanism of Collective Intelligence for Heterogeneous Agents Group

Collective intelligence is manifested when multiple agents work coherently in observation, interaction, decision-making and action. In this paper, we define and quantify the intelligence level of heterogeneous agent groups with an improved Anytime Universal Intelligence Test (AUIT), based on an extension of the existing evaluation of homogeneous agent groups. We analyze the relationship of intelligence level with agent composition, group size, spatial complexity and testing time. The intelligence level of heterogeneous agent groups is compared with that of homogeneous ones to analyze the effects of heterogeneity on collective intelligence. Our work helps to understand the essence of collective intelligence more deeply and reveals the effect of various key factors on group intelligence level.


I. INTRODUCTION
Collectives, or groups, are a very common organizational structure of intelligent creatures. Collective intelligence refers to groups of individuals acting collectively in ways that seem intelligent [1], and it only occurs when there are interactions between the agents in the group [2]. A human group's performance on a wide variety of tasks can be explained by a general collective intelligence factor [3]. The theory of collective intelligence is helpful for understanding many aspects of group performance, bringing benefits to scientific research and practical applications [4].
There are several authoritative methods to quantify the intelligence of an isolated agent, but it is hard to quantify the intelligence of groups. David L. Dowe [5] proposed an additional computational requirement on intelligence, the ability of expression, as an extension to the Turing Test. The C-Test, presented by J. Hernández-Orallo [6], is a test of comprehension ability that is equally applicable to humans and machines. Kannan and Parker [7] proposed an effective metric for the evaluation of learning capability, attempting to evaluate the quality of learning towards understanding system-level fault tolerance. Schreiner [8] presented a study on creating standard measures for systems that can be considered intelligent, carried out by the US National Institute of Standards and Technology (NIST). Javier Insa-Cabrera [9] [10] analysed the influence of including agents of different degrees of intelligence and identified the components that should be considered when measuring social intelligence in multi-agent systems. Fox and Martin [11] developed an agent benchmark model as a basis for analyzing and comparing multi-agent systems with cognitive capabilities. In the research of Anthon and Jannett [12], the intelligence of agent-based systems is based on the intelligence cost of the task. Hibbard [13] proposed a metric for measuring intelligence based on a hierarchy of increasingly complex environment sets, in which an agent's intelligence is measured as the ordinal of the most difficult set of environments it can pass. Chmait et al. [14] proposed a metric considered universal and appropriate for empirically measuring the intelligence level of different agents or groups. J. Hernández-Orallo [15] presented a way to estimate the difficulty and discriminating power of any task instance.
A measure for machine intelligence was proposed by Legg and Hutter [16], who mathematically formalized essential features about human intelligence to produce a general measure of intelligence for arbitrary machines.
Nader Chmait [14,17,18] provided an information-theoretic solution and, for the first time, quantified and analyzed the impact of communication and observation abilities on the intelligence of homogeneous multi-agent systems. They considered a series of factors hindering and influencing the effectiveness of interactive cognitive systems [19][20][21].
Heterogeneous groups are aggregations of two or more interacting agents with different behaviors [20]. The intelligence level of heterogeneous groups cannot be obtained directly from the homogeneous group model [22]. In this paper, we develop a mechanism for quantifying the intelligence level of heterogeneous groups. We find that the intelligence level of heterogeneous collectives is higher than that of same-size homogeneous collectives in most cases, and that the composition (heterogeneity) of heterogeneous collectives also has an important impact on the intelligence level.
The remainder of the paper is organized as follows. Section II introduces the model and mechanism we define. The experiment settings and parameters are given in Section III. We present our experiment results in Section IV, along with some discussion and analysis of the quantitative results. In Section V, we briefly draw conclusions and introduce future directions.

II. MODEL AND METHOD
The Anytime Universal Intelligence Test (AUIT) [23] is a method to evaluate the intelligence level of homogeneous multi-agent groups. The test simulates agents working in a finite environment and calculates the rewards corresponding to their actions. The average reward over all agents is considered as the intelligence level of the group. The model works in a toroidal grid space, named the Λ* (Lambda Star) Environment.
To evaluate the intelligence level of heterogeneous groups, we extend the model as shown in Fig. 1. The environment is a toroidal grid space (periodic boundaries), which means that moving off one border makes an object appear on the opposite one. In this test environment, there are objects from the finite set Ω = {π_1, π_2, ..., π_x, ⊕, ⊖}, which contains the working agents (Π ⊆ Ω, Π = {π_1, π_2, ..., π_x}) and two moving special objects, Good (⊕) and Evil (⊖). The two special objects travel in the environment with movement patterns of measurable complexity. Each element in Ω can perform actions from the finite set of move actions A = {left, right, up, down, up-left, up-right, down-left, down-right, stay}. Reward is defined as a function of the distance of the evaluated agent to the objects ⊕ and ⊖ [20]. In the test environment, given an agent π_j, its reward r_j is a real number calculated as in Eqs. (1) and (2), in which r_j^+ represents the reward caused by ⊕ and r_j^- represents the reward caused by ⊖. Here d_{π_j,⊕} and d_{π_j,⊖} denote the (toroidal) chessboard distance [24] between π_j and ⊕ or ⊖, respectively; e.g., in a 10-by-10 grid-world, the distance from cell (2, 1) to (2, 10) is 1. The reward of an agent combines the effects of the two distances. A snapshot of the agent rewards map is shown in Fig. 2.
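The toroidal chessboard distance and the move set described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function and variable names are hypothetical, cells are assumed to be 1-indexed (row, column) pairs as in the 10-by-10 example, and the convention that "up" decreases the row index is an assumption.

```python
# Hypothetical names; cells are 1-indexed (row, col) pairs, as in the text's example.
# Assumption: "up" decreases the row index and "left" decreases the column index.
ACTIONS = {
    "left": (0, -1), "right": (0, 1), "up": (-1, 0), "down": (1, 0),
    "up-left": (-1, -1), "up-right": (-1, 1),
    "down-left": (1, -1), "down-right": (1, 1), "stay": (0, 0),
}


def toroidal_chessboard_distance(a, b, m):
    """Chessboard (Chebyshev) distance between cells a and b on an m-by-m torus."""
    dr = abs(a[0] - b[0])
    dc = abs(a[1] - b[1])
    # On a torus, each axis can be crossed in either direction; take the shorter way.
    dr = min(dr, m - dr)
    dc = min(dc, m - dc)
    return max(dr, dc)


def move(cell, action, m):
    """Apply one move action with periodic boundaries (1-indexed cells)."""
    dr, dc = ACTIONS[action]
    return ((cell[0] - 1 + dr) % m + 1, (cell[1] - 1 + dc) % m + 1)


# The example from the text: cells (2, 1) and (2, 10) in a 10-by-10 grid-world.
print(toroidal_chessboard_distance((2, 1), (2, 10), 10))
# Moving left from column 1 wraps around to column 10.
print(move((2, 1), "left", 10))
```

Taking the shorter way around on each axis is what makes the two cells in the example neighbours even though their column indices differ by 9.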
The two special objects act as moving targets in the evaluation, and the agents' task is to chase Good (⊕) and keep away from Evil (⊖). With these settings, a test episode consisting of a series of ϑ iterations proceeds as in Algorithm 1.
In Algorithm 1, an observation means the reward information within the observation range of π_j^i (its 1-Moore-neighbourhood cells) [25]. The evaluation result is the average reward of each agent over each iteration [20], as shown in Algorithm 1.
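As a compact restatement of the averaging described above, the evaluation result for a group of n agents over ϑ iterations can be written as below. The symbol \bar{r} is introduced here only for illustration and does not appear in the original; r_j^i is the reward returned to agent j in iteration i, as in Algorithm 1.

```latex
\bar{r} \;=\; \frac{1}{n\,\vartheta} \sum_{i=1}^{\vartheta} \sum_{j=1}^{n} r_j^{\,i}
```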

Algorithm 1 Evaluation algorithm
Input: Π (set of n evaluated heterogeneous agents), special objects (⊕ and ⊖), A (set of actions), environment size m×m, iteration number ϑ
Output: The evaluation of an n-agent group's intelligence
1: Agents from Π ⊆ Ω and the two special objects ⊕ and ⊖ are randomly distributed in the m-by-m toroidal grid-world // Initialize
2: for i ← 1 to ϑ do
3:   for j ← 1 to n do
...
9:   The two special objects ⊕ and ⊖ perform the next action in their movement pattern and renew the rewards distribution in the environment
10:   for j ← 1 to n do
11:     The environment returns a reward r_j^i to π_j^i according to its distance to the special objects // Reward
12: ...

In environment µ, there are two types of complexity. One is task complexity K(µ) [20], which corresponds to the difficulty of the task and is represented by the Kolmogorov complexity [26] of the two special objects' movement patterns. The other is search space complexity, or environmental complexity, H(µ) [20], represented by the Shannon entropy [27] of the environment, which stands for the uncertainty of µ and corresponds to the size of the environment. To evaluate the collectives under the same task complexity, we run the simulation with the special objects following the same movement pattern while the other settings (total number of agents, environment size, test and iteration times) remain the same. We increase the search space complexity by enlarging the environment.
In the heterogeneous group evaluation, we take several types of agents into consideration: Local Search Agent, Oracle Agent, and Random Agent.
a. Local Search Agent: In each iteration, this agent chooses the cell with the highest reward in its observation range as its target cell. If the rewards of all cells in the observation range are equal, it chooses one at random.
b. Oracle Agent: It knows the movement pattern of the Good special object and approaches it in the fastest way.
c. Random Agent: It randomly chooses one neighbor cell as its next target cell.
The synergy among the agents in the group is crucial to performance [28]. The evaluation also considers different group communication methods:
a. Direct communication (Talking): Agents get all agents' exact observations.
b. Indirect communication: Agents get their own observation exactly. Each of them gets others' observations with random fake rewards.
c. Imitation: Agents take the same action as another agent in their observation range. When more than one agent is in range, they randomly choose one to follow.
In each iteration, agents get information according to the communication method they use. Agents with direct communication get all agents' exact observations, while agents with indirect communication get others' observations with bias. Agents with imitation get the action information of the other agents in their observation range. After that, agents choose and perform actions, and then they get rewards from the environment. At the end of a test episode, the average reward over all agents is taken as the intelligence level of the group.
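To make the flow of one test episode concrete, the following Python sketch runs a mixed group of the three agent types on a toroidal grid and returns the average reward. It is a simplified illustration under several assumptions that are not taken from the paper: all names are hypothetical; the reward function is a placeholder (attraction to Good minus repulsion from Evil) and is not Eqs. (1) and (2); both special objects follow the same cyclic movement pattern; the Oracle Agent is modelled as knowing Good's next cell; and communication between agents is omitted, so every agent acts on its own exact observation.

```python
import random

# All names are hypothetical; cells are 0-indexed (row, col) on an m-by-m torus.
MOVES = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]  # 8 neighbours + stay


def dist(a, b, m):
    """Toroidal chessboard distance."""
    dr, dc = abs(a[0] - b[0]), abs(a[1] - b[1])
    return max(min(dr, m - dr), min(dc, m - dc))


def step(cell, mv, m):
    """Move with periodic boundaries."""
    return ((cell[0] + mv[0]) % m, (cell[1] + mv[1]) % m)


def reward(cell, good, evil, m):
    """Placeholder reward, NOT Eqs. (1)-(2): higher near Good, lower near Evil."""
    return 1.0 / (dist(cell, good, m) + 1) - 1.0 / (dist(cell, evil, m) + 1)


def local_search(cell, good, evil, m, rng):
    """Go to the highest-reward cell in the 1-Moore neighbourhood; ties are random."""
    options = [step(cell, mv, m) for mv in MOVES]
    best = max(reward(c, good, evil, m) for c in options)
    return rng.choice([c for c in options if reward(c, good, evil, m) == best])


def oracle(cell, good, evil, m, rng):
    """Go to the reachable cell closest to Good (here: Good's next cell is given)."""
    options = [step(cell, mv, m) for mv in MOVES]
    best = min(dist(c, good, m) for c in options)
    return rng.choice([c for c in options if dist(c, good, m) == best])


def random_agent(cell, good, evil, m, rng):
    """Go to a uniformly random neighbour cell."""
    return step(cell, rng.choice([mv for mv in MOVES if mv != (0, 0)]), m)


def run_episode(policies, m, iterations, pattern, seed=0):
    """Average reward per agent per iteration for one test episode."""
    rng = random.Random(seed)
    cells = [(rng.randrange(m), rng.randrange(m)) for _ in policies]
    good = (rng.randrange(m), rng.randrange(m))
    evil = (rng.randrange(m), rng.randrange(m))
    total = 0.0
    for i in range(iterations):
        mv = pattern[i % len(pattern)]
        good_next = step(good, mv, m)
        # Agents observe and act; only the oracle is given Good's next cell.
        cells = [
            policy(cell, good_next if policy is oracle else good, evil, m, rng)
            for policy, cell in zip(policies, cells)
        ]
        # The special objects perform the next action of their movement pattern.
        good, evil = good_next, step(evil, mv, m)
        # The environment returns a reward to every agent.
        total += sum(reward(c, good, evil, m) for c in cells)
    return total / (len(policies) * iterations)


# A 9:1 group of local search agents and one oracle in a 20x20 grid, 20 iterations.
group = [local_search] * 9 + [oracle]
print(run_episode(group, m=20, iterations=20, pattern=[(0, 1), (1, 0)]))
```

Keeping each agent type as a plain function with the same signature makes it easy to assemble groups of any composition as a list; the communication methods would be added as a separate step that alters what each agent observes before it acts.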

III. NUMERICAL EXPERIMENT PARAMETERS
Our experiments examine the impact of agent composition, group size, environmental complexity and evaluation time on the intelligence level of heterogeneous groups. We also compare the intelligence of heterogeneous groups with that of same-size homogeneous groups to understand the impact of heterogeneity. The agents and communication methods used in our simulation, as well as the corresponding symbols, are listed in Table I. In the evaluation of heterogeneous groups, we mainly carry out the simulation in a 20×20 environment, corresponding to H(µ) = 17.2 bits, and each test contains 20 iterations. When we examine the impact of environmental complexity, time and group size, H(µ) varies from 13.2 bits to 19.6 bits, the number of iterations varies from 10 to 500, and the number of agents varies from 10 to 60 (the ratio of the components remains the same). In the comparison of heterogeneous and homogeneous groups, we use same-size homogeneous groups with all other settings unchanged.
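The link between environment size and H(µ) can be illustrated with a short calculation. The sketch below assumes, as one possible reading that the paper does not state explicitly, that H(µ) is the Shannon entropy of a uniform distribution over the placements of the two special objects on distinct cells of an m-by-m grid; the function name is hypothetical.

```python
import math


def search_space_entropy_bits(m):
    """Entropy in bits of a uniform distribution over placements of two
    distinguishable special objects on distinct cells of an m-by-m grid.
    This reading of H(mu) is an assumption, not the paper's stated formula."""
    cells = m * m
    return math.log2(cells * (cells - 1))


for m in (10, 20, 30):
    print(m, round(search_space_entropy_bits(m), 2))
```

Under this assumption, a 20×20 grid gives about 17.28 bits, close to the 17.2 bits quoted above, and 10×10 and 30×30 grids give about 13.27 and 19.63 bits, near the two ends of the stated range. This suggests, but does not confirm, that the range corresponds to grids from 10×10 to 30×30.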

A. Evaluation of heterogeneous agents group
1) Agents Composition
In heterogeneous groups, agents with different decision strategies or communication methods work cooperatively. We combine different agents into heterogeneous groups and evaluate their intelligence, showing the results in Fig. 3. The intelligence level of a heterogeneous group is mainly determined by the intelligence level of its components. Moreover, a group of the same size shows a higher intelligence level as its heterogeneity gets stronger, which indicates that heterogeneity can help improve group intelligence.

2) Impact of Agent Number
As the saying goes, "many hands make light work". We enlarge the group size step by step to observe the changes in the quantified collective intelligence level. In Fig. 4, the ratio of one type of agent to the other in the collectives remains 9:1 (e.g., the second bar of the first cluster with "/" pattern represents a collective consisting of 18 SLs and 2 TLs). In the test environment, more agents means that more information can be observed, leading to a better reward. As the number of agents continues to increase, the result approaches an upper limit.

3) Impact of Environment Complexity
We gradually increase the size of the environment to increase the search space complexity. Fig. 5 illustrates that heterogeneous collective intelligence decreases as the environmental complexity increases. As the environment space gets larger, it becomes much harder to meet ⊕ and elude ⊖ in finite time, resulting in a decrease in heterogeneous group performance. When the environment is too large for any agent to sense or learn the position of the special objects, the group may perform as if all agents were walking randomly and aimlessly, with rewards close to 0.

4) Impact of Evaluation Time
We extend the evaluation time by increasing the number of iterations for each test. The results are shown in Fig. 6, in which heterogeneous collectives show a performance increase. As time goes on, agents get more chances to seek and follow ⊕ as well as to stay away from ⊖. The evaluation results gradually approach an upper limit when the test time is long enough. In Fig. 6, the gap between IL10&TL10 and IL10&O10 is larger than that between SL10&TL10 and SL10&O10. This means that the intelligence level of indirectly communicating heterogeneous groups is more stable than that of imitative heterogeneous groups.

1) Agents Composition
We evaluate the intelligence level of heterogeneous groups and same-size homogeneous groups, with the results shown in Fig. 7. The intelligence level of a heterogeneous group is clearly higher than the average level of its components when they are in homogeneous groups. This implies that heterogeneity does have a positive impact on group intelligence level.
In Fig. 8, SL(&TL) denotes the quantified result of SL in the heterogeneous collectives that also contain TL. In these heterogeneous collectives, the ratio of SL or IL to the other type of agents

2) Impact of Agent Number
Group performance goes up as the number of agents increases in both the heterogeneous and the homogeneous cases. In Fig. 9, the homogeneous group TL is clearly less intelligent than O. However, in the heterogeneous cases, SL&TL and SL&O achieve similar performance, indicating that with indirect communication the improvement caused by heterogeneity may have no absolute relation to the original performance of the wiser agents. Indirectly communicating groups improve to a similar intelligence level, even if agents of different intelligence levels are added to the groups.
Fig. 9: Evaluation of groups of various agent numbers.

3) Impact of Environment Complexity
In Fig. 10, homogeneous groups and heterogeneous groups both show a decrease in intelligence level as environmental complexity increases. The rate of decrease for a heterogeneous group appears to be roughly the average of the rates of its components. The impact of environment complexity on heterogeneous group intelligence level is thus mostly determined by the components. Heterogeneity can make the group performance more stable.

4) Impact of Evaluation Time
Homogeneous and heterogeneous collectives all show a similar performance increase as time expands, as shown in Fig. 11. The performance of most heterogeneous groups rises more slowly than that of homogeneous groups. IL9&O1 clearly outperforms TL10 when the time is long enough, and SL9&O1 may work as well as TL10. The performance of heterogeneous collectives that contain a few high-performance agents gets very close to that of homogeneous collectives that contain only high-performance agents. As time expands, we can expect heterogeneous groups consisting mainly of low-performance agents to reach a high intelligence level. This is of great practical significance, since high-performance agents such as TL and O are often energy-intensive or even impractical.

V. CONCLUSION AND FUTURE WORKS
To evaluate the intelligence level of heterogeneous groups, we improve the Anytime Universal Intelligence Test (AUIT) model and method. We evaluate the intelligence level of different heterogeneous groups and study the impact of agent composition together with communication methods, group size, environment complexity and evaluation time. The experimental results show that (a) heterogeneity can improve the group intelligence level; (b) more agents and longer test time can also lead to better group performance; (c) the intelligence level improvement of heterogeneous groups that mainly adopt indirect communication is quite stable, while groups mostly made of imitative agents are more likely to be affected by external conditions such as the space size and the evaluation time.
In the future, to make the simulation closer to real situations, especially for the indirect communication method, the generation of fake rewards should depend on the distances between agents. To expand our work, we consider enriching the agent types, for example by incorporating reinforcement learning agents, and adopting different agent organizational structures in the simulation.