SECTION I

## INTRODUCTION

IN ORDER for agents to initiate proper behaviors in dynamic environments, a practical vision system should be able to process images and extract useful cues in real-time. This ability is critical for both animals and autonomous robots, especially for future robots, which may play a role in our daily life. The basic skills, such as collision avoidance, are vital for their success in interacting with their human hosts. However, previous segmentation and registration based robotic vision techniques have not been able to reliably and cheaply recognize collision in real-time in dynamic environments [9], [29]. Even with several kinds of sensors, such as visual, ultrasound, infrared, laser, and mini-radar, for object recognition (for example, [1], [10], [32], and [57]), it is still very difficult for a robot to run autonomously without collision in complex dynamic environments without human intervention. In another application field, to reduce or alleviate the impact of road collisions and the number of casualties in driving scenarios, a reliable technique for visual based collision recognition is badly needed [54], [62].

On the other hand, nature has provided a rich source of inspiration for artificial visual systems. Many animals use their visual systems to successfully avoid collision in the real world. Insects in particular, with their rapid reactions to dynamic scenes, use only a small amount of neural hardware and are very attractive as sources of inspiration (for example, [18], [25], [26], Web and Reeve 2003, [13], reviewed by [23]; [24], [28], [33], [60], and [69]). In insects’ visual pathways, identified specialized neurons have been known for several decades (for example, [35], [38], and [39]). The properties revealed can be used to produce unique, computationally efficient models for visual sensors for collision recognition.

Recently, specialized neurons found in animals have been used as the model in producing artificial vision systems for collision recognition. For example, an identified neuron in the locust, the lobula giant movement detector (LGMD) (for example, [35], [41], [42], and [49]) has been used as the basis for an artificial visual system for collision avoidance in robots [6], [40], [43], [44], [46], [60], [65], [67] and [61], [66], and more recently in cars [51], [61], and embedded in hardware [34].

Several feature selective neurons may also be combined to provide a robust collision detecting visual system. Direction selective neurons (DSNs) have been found in animals for decades, for example, in insects such as the locust [38], [39], beetle and fly [7], [19], also in vertebrates such as the rabbit [2], [3], [52] as reviewed by [55] and the cat (for example, [31] and [37]). Such DSNs could be used to signal looming (for example, [22]; Harrison, 2006). When organised in an asymmetrical layered network, these DSNs can produce a neural network specialized for collision recognition [64]. By training and then testing in either a driving situation or in a robotic laboratory, the combined DSNs were shown to reliably detect collisions in dynamic scenes [64], [68].

In animals, it is believed that many different specialized visual neurons act together to extract and fuse different visual cues from dynamic scenes. However, when the LGMD and DSNs coexist in a natural or an artificial visual neural system, can they serve the collision recognition role together or does only one type of neuron contribute? This question needs to be addressed. An investigation into the robustness of the LGMD and the DSNs, comparing their competence for collision recognition, can also provide useful information for the design of artificial vision systems for robots or cars. In insects, little is known as to how the LGMD and DSNs interact with each other. However, it is possible to investigate interactions by allowing currently available computational models of the LGMD and the DSNs [6], [40], [46], [60], [61], [62], [63], [64], [67], [68] to operate either alone or in cooperation on the same platform or agent. An agent here refers to an entity or a complex neural system that consists of several different types of neural subsystems and is capable of responding to input visual images. The LGMD and DSNs can be such neural subsystems that form an agent. The agent is then exposed to a specific collision recognition task during a period of continuous development.

Evolutionary computation, especially genetic algorithms (e.g., [17], [20], and a recent example, [12]), has provided useful tools to investigate the competence and possible cooperation between similar visual neural subsystems in specific environments. In this paper, we used a genetic algorithm to investigate the competence and possible cooperation of the LGMD and the DSNs in specific environments. There were three different types of collision recognition agents, each with a different type of neural subsystem functioning for collision recognition: an LGMD agent using the LGMD neural subsystem, a DSNs agent using the DSNs neural subsystem and a Hybrid agent using the Hybrid neural subsystem. The LGMD, DSNs and Hybrid neural subsystems all exist in each agent's visual system and evolve simultaneously in a robotic environment.

Since all three neural subsystems coexist in each agent's visual system during an evolution process, coevolution (for example, [36]) has been considered as an option. In biology, coevolution is the change of a biological object that is triggered by the change of a related object [59]. Each party in a coevolutionary relationship exerts selective pressures on the other, thereby affecting each other's evolution (for example, [58]). In evolutionary computation, coevolution can be competitive [21] or cooperative [36]; both aim to produce better search results. In this study, our focus is on the competence of the LGMD, the DSNs and their cooperative neural networks. As the LGMD and DSNs are both specialized for one visual task, collision recognition, competitive coevolution seems to be the right choice for them; because the Hybrid neural subsystem needs the cooperation of both the LGMD and the DSNs, cooperative coevolution seems to be a good choice in that case. However, it would be a complex task to use both coevolution strategies to investigate the competence of the three neural subsystems simultaneously in one evolution process. Fortunately, there is a simple way to accommodate and compare different types of subsystems in an evolution process: setting specific gene(s) to determine which candidate subsystem plays the role. We introduced a switch gene to accommodate these coexisting neural subsystems while giving each subsystem the opportunity to compete for the collision recognition role during an evolution.

Within the whole visual neural system of an agent, the switch gene determines which neural subsystem plays the collision recognition role. During an evolution, each type of agent is evaluated according to their performances on collision recognition tasks. The most important indicators of success are the number of each type of agent in the whole population and their performance over successive generations. Over successive generations agents that perform well have more chances to affect the newly produced switch genes. This means that the competence of that type of agent can be reflected in the increasing number of its kin agents (with similar switch genes) in the whole population.

Via these evolutionary computations, we want to know which type of agent is able to adapt to the environment quickly and robustly, that is to say, which one is more likely to develop the collision recognition ability and prevent others from doing the same task. Secondly, we want to know if there is a need for cooperation between the LGMD and the DSNs for a collision recognition task; this may be the case if the hybrid agent can easily dominate the whole population. We hope the experiments will provide useful conclusions or suggestions for designing artificial vision systems for mobile robots and cars.

SECTION II

## METHODS AND FORMULATIONS

In this section, the visual neural subsystems, including the LGMD, the DSNs, and especially their adaptable parts, are described. The switch gene, the parameters of the visual neural subsystems, the evolving environment and the experimental setup are also described in this section.

### A. LGMD Neural Subsystem

The LGMD [see Fig. 1(a)] used in this study is based on the previous model described in [6], [40], [46], and [60] with minor changes.

Fig. 1. Schematic illustration of the LGMD (a), the DSNs (b) and the hybrid neural subsystems (c). Note that the LGMD has symmetrical lateral inhibition but the direction selective neuron, L for example, has leftward lateral inhibition. In the neural vision system, the P, E and I layers are shared by the LGMD and the DSNs. The scales of the P, E, I, and S layers are the same: 100 pixels by 80 pixels arranged in a matrix. The hybrid neural subsystem combines the excitation of the LGMD with the excitation and the intermediate output of the DSNs. The outputs of the hybrid neural system are also spikes.

The LGMD model is composed of four groups of cells—photoreceptor P, excitatory E, inhibitory I and summing S, and two single cells—feed-forward inhibition (FFI) and LGMD.

#### 1) P Layer

The first layer of the neural network consists of the photoreceptor $P$ cells, which are arranged in matrix form. The luminance $L_{f}$ of each pixel in the input image at frame $f$ is captured by each photoreceptor cell; the change of luminance $P_{f}$ between frames of the image sequence is then calculated and forms the output of this layer. The output of a cell in this layer is defined by $$P_{f}(x,y)=\left(L_{f}(x,y)-L_{f-1}(x,y)\right)+\sum_{i}p_{i}P_{f-i}(x,y)\tag{1}$$ where $P_{f}(x,y)$ is the change of luminance corresponding to pixel $(x,y)$ at frame $f$, $x$ and $y$ are the pixel coordinates, $L_{f}$ and $L_{f-1}$ are the luminance at the current frame $f$ and the previous frame $f-1$, respectively, and the persistence coefficient $p_{i}$ is defined by $p_{i}={(1+e^{\mu i})}^{-1}$ with $\mu\in(-\infty,+\infty)$.
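
As a rough illustration of (1), the following Python sketch computes the P-layer output for small images stored as nested lists. The number of past frames included in the persistence sum (`n_p`) and the value of `mu` are assumptions for illustration only; neither is specified in this section.

```python
import math

def persistence_coeffs(mu, n_p):
    """p_i = 1 / (1 + exp(mu * i)) for i = 1..n_p (n_p is an assumed cutoff)."""
    return [1.0 / (1.0 + math.exp(mu * i)) for i in range(1, n_p + 1)]

def p_layer(lum_now, lum_prev, past_p, coeffs):
    """Eq. (1): luminance change plus weighted persistence of earlier P outputs.

    past_p[0] holds P_{f-1}, past_p[1] holds P_{f-2}, and so on.
    """
    rows, cols = len(lum_now), len(lum_now[0])
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            change = lum_now[y][x] - lum_prev[y][x]
            persist = sum(p * prev[y][x] for p, prev in zip(coeffs, past_p))
            out[y][x] = change + persist
    return out
```

With `mu = 0` every coefficient equals 0.5, so half of the previous frame's change is carried over into the current output.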

#### 2) I and E Layers

The output of the $P$ cells forms the input to two separate cell types in the next layer. The first type, the excitatory cells, pass excitation directly to their retinotopic counterparts in the third layer, the $S$ layer. The second type, the lateral inhibition cells, pass inhibition, after a delay of one image frame, to the neighbors of their retinotopic counterparts in the $S$ layer. The strength of inhibition spread to a cell in this layer is given by $$I_{f}(x,y)=\sum_{i=-n}^{n}\sum_{j=-n}^{n}P_{f-1}(x+i,y+j)\,w_{I}(i,j),\quad (i\neq j\ \text{if}\ i=0)\tag{2}$$ where $I_{f}(x,y)$ is the inhibition in pixel $(x,y)$ at the current frame $f$, $w_{I}(i,j)$ are the local inhibition weights, and $n$ defines the size of the inhibited area.
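
The lateral inhibition in (2) is a small weighted sum over the previous frame's P output. The sketch below uses $n=1$ with the local weights quoted in the Parameter Setting subsection (25% for the four nearest neighbors, 12.5% for the four diagonal neighbors). Treating cells outside the image as contributing nothing is an assumption; the section does not state how borders are handled.

```python
# Local inhibition weights w_I(i, j) for n = 1; the centre (0, 0) is excluded.
W_LOCAL = {
    (-1, 0): 0.25, (1, 0): 0.25, (0, -1): 0.25, (0, 1): 0.25,
    (-1, -1): 0.125, (-1, 1): 0.125, (1, -1): 0.125, (1, 1): 0.125,
}

def i_layer(p_prev, weights=W_LOCAL):
    """Eq. (2): inhibition at (x, y) from the delayed P output of its neighbours."""
    rows, cols = len(p_prev), len(p_prev[0])
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            total = 0.0
            for (i, j), w in weights.items():
                xx, yy = x + i, y + j
                # Assumption: neighbours outside the image contribute nothing.
                if 0 <= xx < cols and 0 <= yy < rows:
                    total += p_prev[yy][xx] * w
            out[y][x] = total
    return out
```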

#### 3) S Layer

The excitatory flow from the $E$ cells and the inhibition from the $I$ cells are summed by the $S$ cells using the following equation: $$S_{f}(x,y)=\left\vert P_{f}(x,y)\right\vert-\left\vert I_{f}(x,y)\right\vert W_{I}\tag{3}$$ where $W_{I}$ is the global inhibition weight. Only excitations that exceed a threshold value are able to reach the summation cell LGMD: $$\tilde{S}_{f}(x,y)=\begin{cases}S_{f}(x,y)&\text{if } S_{f}(x,y)\geq T_{r}\\ 0&\text{if } S_{f}(x,y)<T_{r}\end{cases}\tag{4}$$ where $T_{r}$ is the threshold.
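
Equations (3) and (4) can be combined into a single pass over the layer, as in this minimal sketch. The numeric values used for `w_global` and `t_r` in any call are placeholders; the actual threshold is listed in Table I, which is not reproduced here.

```python
def s_layer(p_now, inhib, w_global, t_r):
    """Eq. (3) followed by the threshold of Eq. (4)."""
    out = []
    for p_row, i_row in zip(p_now, inhib):
        row = []
        for p, i in zip(p_row, i_row):
            s = abs(p) - abs(i) * w_global   # Eq. (3)
            row.append(s if s >= t_r else 0.0)  # Eq. (4)
        out.append(row)
    return out
```

For example, with `w_global = 1.0` and `t_r = 5.0`, a cell with excitation 10 and inhibition 2 passes 8 on to the LGMD cell, whereas a cell with excitation 3 and inhibition 2 is silenced.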

#### 4) LGMD Cell

The membrane potential of the LGMD cell, $U_{f}$, is the summation of all the excitations in the $S$ cells, as described by the following equation: $$U_{f}=\sum_{x=1}^{k}\sum_{y=1}^{l}\left\vert\tilde{S}_{f}(x,y)\right\vert\tag{5}$$ The membrane potential $U_{f}$ is then transformed to a spiking output using a sigmoid transformation $$u_{f}=\left(1+e^{-U_{f}n_{\rm cell}^{-1}}\right)^{-1}\tag{6}$$ where $n_{\rm cell}$ is the total number of cells in the $S$ layer. Since (5) is a sum of absolute values, $U_{f}$ is greater than or equal to zero and the sigmoid membrane potential $u_{f}$ varies from 0.5 to 1. The collision alarm is decided by the spiking of the LGMD cell. If the membrane potential $u_{f}$ exceeds the threshold $T_{s}$, a spike is produced. A certain number of successive spikes, denoted by $S^{LGMD}$, will trigger the collision alarm in the LGMD cell. However, spikes may be suppressed by the FFI cell when whole field movement occurs [46].
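
A minimal sketch of (5), (6) and the successive-spike rule follows. The function names, the threshold `t_s` and the spike count `n_spikes` (standing in for $S^{LGMD}$) are illustrative assumptions, and FFI suppression is deliberately left out of this sketch.

```python
import math

def lgmd_potential(s_tilde):
    """Eq. (5)-(6): sum |S~| over the layer, then squash with a sigmoid."""
    n_cell = sum(len(row) for row in s_tilde)
    total = sum(abs(v) for row in s_tilde for v in row)
    return 1.0 / (1.0 + math.exp(-total / n_cell))

def collision_alarm(potentials, t_s, n_spikes):
    """Return True once n_spikes successive frames spike (u_f > t_s)."""
    run = 0
    for u in potentials:
        run = run + 1 if u > t_s else 0
        if run >= n_spikes:
            return True
    return False
```

An empty scene gives the lower bound of the sigmoid, 0.5, which matches the range stated above.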

#### 5) FFI Cell

In the absence of feed forward inhibition (FFI), the LGMD network may produce spikes and a false collision signal when challenged by a sudden change of visual scene, for example during a rapid turn. The feed forward inhibition cell works to cope with such whole field movement, when a large number of $P$ cells are activated [40], [46]. The FFI at a given frame is taken from the summed output of the photoreceptor cells with one frame delay: $$F_{f}=\sum_{x=1}^{k}\sum_{y=1}^{l}\left\vert P_{f-1}(x,y)\right\vert n_{\rm cell}^{-1}.\tag{7}$$ Once $F_{f}$ exceeds its threshold $T_{FFI}$, spikes in the LGMD are inhibited immediately.
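
The FFI rule in (7) amounts to a mean absolute change over the previous frame, used as a gate on the LGMD spike. The sketch below illustrates this; the gating function `spike_allowed` and the threshold values passed to it are hypothetical names and numbers, not taken from the original model code.

```python
def ffi(p_prev):
    """Eq. (7): mean absolute P output of the previous frame."""
    n_cell = sum(len(row) for row in p_prev)
    return sum(abs(v) for row in p_prev for v in row) / n_cell

def spike_allowed(u_f, t_s, f_f, t_ffi):
    """A spike needs u_f above T_s and is vetoed when F_f exceeds T_FFI."""
    return u_f > t_s and f_f <= t_ffi
```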

The early visual processing layers such as P, I, and E are treated as developed layers and the adaptable variable between I to S layer is the inhibition weight $W_{I}$. The FFI threshold $T_{FFI}$ and the LGMD cell's threshold $T_{s}$ are also adaptable during evolution. Other parameters are all treated as developed (and are given in later sections) and fixed without change during the evolution.

### B. The DSNs Neural Subsystem

The DSNs [see Fig. 1(b)] fuse the visual motion cues extracted by several direction selective neurons. These neurons share the same photoreceptor $P$ cells with the LGMD network and have their own excitatory $E$ cells and inhibitory $I$ cells, which are similar to those in the LGMD network. They have several groups of summing cells ($SL$, $SR$, $SU$, $SD$, etc.), direction selective cells ($L$, $R$, $U$, $D$, etc.), several intermediate cells, and a spiking cell $sx$ [64]. We will take the left inhibitory summing cells $SL$ and the left inhibitory cell $L$ as examples to illustrate the neural system.

#### 1) SL Layer

The inhibition from an $I$ cell is passed on to its retinotopic counterpart's neighboring cells in the next layer. The inhibition is passed, with one image frame delay, asymmetrically from between one to eight cells away. The summed strength of inhibition to a cell in this layer is $$I_{f}^{L}(x,y)=\sum_{i=1}^{m_{I}}\sum_{j=-n_{I}}^{n_{I}}P_{f-1}(x+i,y+j)\,w_{I}^{L}(i,j),\quad (m_{I}>n_{I})\tag{8}$$ where $I_{f}^{L}(x,y)$ is the summed inhibition to the $SL$ cell and $w_{I}^{L}(i,j)$ are the local inhibition weights. In the above equation, inhibition can spread in four directions: up, down, left, and right, though in an asymmetrical way. The spread to the left is stronger than that to the right since $m_{I}$ is greater than $n_{I}$. At this stage we found that it was not necessary to use all three inhibition directions because the outputs of several direction selective neurons are combined at the next level to extract and then fuse the visual motion cues. To save computing time, we set $n_{I}$ to 0 (and $m_{I}$ to 8), so that inhibition has a maximum spread of eight pixels to the left, resulting in directional selectivity with a single nonpreferred direction (leftward in this instance) [63]. With a strong inhibition from the right side, the excitation caused by left translating movements will be reduced or even cancelled [63]. Therefore, the summing cell $L$ keeps silent with objects moving to the left but is excited by motion in the other three directions (R, U, and D).

The excitatory flow gathered in an $SL$ cell will be $$S_{f}^{L}(x,y)=\left\vert P_{f}(x,y)\right\vert-\left\vert I_{f}^{L}(x,y)\right\vert W_{I}^{L}\tag{9}$$ where $W_{I}^{L}$ is the global inhibition weight.
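
To see how (8) and (9) produce a silent response to leftward motion when $n_{I}=0$, consider the following one-row sketch. The local weights in `w_local`, the short reach `m_i=2` used in the example and the handling of the image edge are illustrative assumptions, not the values of Table II.

```python
def sl_layer(p_now, p_prev, w_local, w_global, m_i=8):
    """Eq. (8)-(9) with n_I = 0: a cell at x is inhibited by the delayed
    excitation of the cells at x+1 .. x+m_i in the same row."""
    rows, cols = len(p_now), len(p_now[0])
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            inhib = 0.0
            for i in range(1, m_i + 1):
                # Assumption: cells beyond the image edge contribute nothing.
                if x + i < cols:
                    inhib += p_prev[y][x + i] * w_local[i - 1]
            out[y][x] = abs(p_now[y][x]) - abs(inhib) * w_global
    return out

# An edge that moved left (from x=2 to x=1) meets inhibition left behind at x=2.
left = sl_layer([[0.0, 3.0, 0.0, 0.0]], [[0.0, 0.0, 4.0, 0.0]], [1.0, 0.5], 1.0, m_i=2)
# An edge that moved right (from x=1 to x=2) sees no delayed excitation on its right.
right = sl_layer([[0.0, 0.0, 3.0, 0.0]], [[0.0, 4.0, 0.0, 0.0]], [1.0, 0.5], 1.0, m_i=2)
```

In this toy case the leftward edge ends up below zero at its new position, so it would be removed by any non-negative threshold $T_{rL}$, while the rightward edge keeps its full excitation.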

#### 2) L Cell

The excitations in the $SL$ cells are summed by the left inhibitory cell $L$. However, to reach the summation cell, excitations must exceed the threshold $T_{rL}$: $$\tilde{S}_{f}^{L}(x,y)=\begin{cases}S_{f}^{L}(x,y)&\text{if } S_{f}^{L}(x,y)\geq T_{rL}\\ 0&\text{if } S_{f}^{L}(x,y)<T_{rL}\end{cases}.\tag{10}$$

The membrane potential of the left inhibitory cell $L$ is $$U_{f}^{L}=\sum_{x=1}^{k}\sum_{y=1}^{l}\left\vert\tilde{S}_{f}^{L}(x,y)\right\vert.\tag{11}$$ The membrane potential of the $L$ cell is then transformed using a sigmoid function $$u_{f}^{L}=\left(1+e^{-U_{f}^{L}n_{\rm cellL}^{-1}}\right)^{-1}\tag{12}$$ where $n_{\rm cellL}$ is the total number of cells in the $SL$ layer. Since $U_{f}^{L}$ is not less than zero according to (11), the membrane potential $u_{f}^{L}$ varies sigmoidally from 0.5 to 1.

The membrane potential $u_{f}^{R}$ for right inhibitory cell $R$, $u_{f}^{U}$ for up inhibitory cell $U$ and $u_{f}^{D}$ for down inhibitory cell $D$ can be obtained in a similar way. The outputs of the network L, R, U, and D, etc. are then combined to extract collision cues.

#### 3) DSNs

In previous research, the direction selective neurons have been successfully organized for collision recognition [64]. In this paper, a layered network [see Fig. 1(b)] is used to fuse the several neurons for collision recognition; the efficiency of this structure has been demonstrated in recent studies [65], [68]. For a fusion network with $n$ layers, where layer $i$ has $m_{i}$ intermediate cells, the inputs to the network are the excitations in the direction selective neurons, i.e., $$\{F\}_{f}^{1}=\left(s_{f}^{L}\quad s_{f}^{R}\quad s_{f}^{U}\quad s_{f}^{D}\quad\cdots\right)^{T}\tag{13}$$ where $s_{f}^{L}$, $s_{f}^{R}$, $s_{f}^{U}$, and $s_{f}^{D}$ are the excitations in the $L$, $R$, $U$, and $D$ neurons, and $\{F\}_{f}^{1}$ is the input array to the FNs. The output of the $i^{\rm th}$ layer can be formulated in matrix form as $$\{F\}_{f}^{i}=[W]_{f}^{i}\{F\}_{f}^{i-1}\tag{14}$$ where $\{F\}_{f}^{i}$ and $\{F\}_{f}^{i-1}$ are the excitation arrays in the $i^{\rm th}$ and $(i-1)^{\rm th}$ layers, respectively, and $[W]_{f}^{i}$ is the weight matrix.

A spiking cell $sx$ sums its adjacent layer's excitation. If the excitation $\kappa_{f}$ in the spiking cell ${\rm s}x$ exceeds the threshold $T_{sp}$, a spike is produced as the output. If several successive spikes $S^{DSNs}$ are produced, a collision is recognized by the neural system DSNs.
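
The layer-by-layer fusion of (14) and the thresholding in the spiking cell can be sketched as repeated matrix-vector products. How the last layer feeds the spiking cell is not spelled out above, so this sketch assumes the final weight matrix has a single row whose weighted sum is $\kappa_{f}$; the function names and example weights are hypothetical.

```python
def fuse(inputs, weight_matrices):
    """Apply Eq. (14) layer by layer: F^i = W^i F^(i-1)."""
    layer = list(inputs)
    for w in weight_matrices:
        layer = [sum(w_ij * f_j for w_ij, f_j in zip(row, layer)) for row in w]
    return layer

def dsns_spike(inputs, weight_matrices, t_sp):
    """Assumption: the last matrix has one row that yields kappa_f."""
    kappa = fuse(inputs, weight_matrices)[0]
    return kappa > t_sp

# Two inputs, two intermediate cells, one spiking cell (toy sizes).
weights = [[[1.0, 0.0], [0.5, 0.5]], [[1.0, -1.0]]]
kappa_example = fuse([1.0, 2.0], weights)
```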

In the DSNs, the direction selective neurons are considered as developed parts and will be fixed without change during evolution processes. These direction selective neurons are at a similar level to the LGMD. The adaptable variables of the DSNs are the connection weights and threshold of the next level of their organization where the outputs of the directionally selective neurons are combined [see Fig. 1(b)].

### C. The Hybrid Neural Subsystem

The hybrid neural subsystem is represented by a neural network which combines the outputs of the LGMD and the DSNs neural subsystems and outputs its own spikes. As illustrated in Fig. 1(c), the output of the LGMD and the final and intermediate outputs from the DSNs are fused in the cooperative neural network. The details of the Hybrid neural subsystem are similar to those of the LGMD and the DSNs and are not repeated here.

The adaptable part is the cooperative neural network in which the weights and threshold are adjustable. Note that the hybrid system not only depends on the weights and threshold of the cooperative neural network but also the input from the DSNs and the LGMD which are also flexible during evolutionary processes.

### D. The Switch Gene

As described above, among the three collision recognition neural subsystems, the LGMD is at the lowest level in terms of complexity, with the fewest adaptable variables; the DSNs are at an intermediate level, and the Hybrid system represents the highest level, with the greatest number of adaptable variables. Since the three neural subsystems coexist within a whole neural system and evolve in the same environment for the same visual task, a switch gene is introduced to determine which neural subsystem plays the collision recognition role within the neural vision system as a whole. As schematically illustrated in Fig. 2, the agent takes its name from the neural subsystem that is connected to decision making.

Fig. 2. Schematic illustration of the three collision recognition agents. Three neural subsystems (the LGMD, the DSNs and the Hybrid) coexist in the same entity, an agent or a bigger neural system. The switch gene controls the information flow from the three neural subsystems to the decision making level. Only one neural subsystem's decision can feed into the final decision making, and the whole neural system is named after that subsystem. Once the connection from a certain neural subsystem to the decision making has been made, the other neural subsystems are blocked from sending output to the decision making. The double arrow between the DSNs and the Hybrid represents two levels of excitation flow from the DSNs to the Hybrid. (a) DSNs agent. (b) Hybrid agent. (c) LGMD agent.

During the evolutionary development period of the whole neural system, the switch gene adapts within a range of values from 0.5 to 3.5. As shown in Fig. 2, if the switch gene is located within the range 0.5 to 1.5, the LGMD neural subsystem plays the collision recognition role; the outputs of the DSNs and the Hybrid are blocked and become redundant, and the whole neural system is termed an LGMD agent. If the switch gene is located within the range from 1.5 to 2.5, the Hybrid plays the role; the LGMD and DSNs are the functioning parts of the hybrid system but are blocked from making any direct connection to the decision making, and the whole neural system is termed a Hybrid agent. Otherwise, the DSNs play the role; the LGMD and the Hybrid are blocked and become redundant, and the whole system is termed a DSNs agent. The switch gene could take any other range of real values instead of 0.5 to 3.5, as long as an equal (or randomized) opportunity is provided for each type of agent.
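
The mapping from switch gene value to agent type can be written as a simple range lookup. In this sketch the function name and the treatment of the exact boundary values 1.5 and 2.5 are assumptions, since the text does not say which side they belong to.

```python
def agent_type(switch_gene):
    """Map a switch gene value in [0.5, 3.5] to the subsystem that decides."""
    if not 0.5 <= switch_gene <= 3.5:
        raise ValueError("switch gene must lie within 0.5 to 3.5")
    if switch_gene < 1.5:   # assumption: 1.5 itself is assigned to Hybrid
        return "LGMD"
    if switch_gene < 2.5:   # assumption: 2.5 itself is assigned to DSNs
        return "Hybrid"
    return "DSNs"
```

Because the three ranges have equal width, a uniformly random switch gene gives each agent type the same chance in the first generation.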

### E. Parameter Setting

Parameters of the LGMD are set before the experiments. The range of adaptable variables is mainly decided based on empirical experience to balance computing, searching costs and opportunities.

The input video images are 100 (horizontal) by 80 (vertical) pixels; images are grey scale, ranging from 0 to 255 (a unitless parameter; this will not be restated for similar parameters hereafter). Therefore there are 8000 cells in the $P$ layer and the same number of cells in each of the $I$, $E$ and $S$ layers. The lateral inhibition spreads to neighbors one layer away, with one frame delay. The local inhibition weights are set as: 25% for the four nearest neighbors and 12.5% for the four diagonal neighbors. Other parameters are listed in Table I. These parameters are set based on early experiments and are not adaptable in the following evolution experiments unless stated.

TABLE I THE PARAMETERS OF THE LGMD

The inhibition weight $W_{I}$ is adaptable within (0.5 $\sim$ 2.0); the FFI threshold $T_{FFI}$ adapts within the range from 0.5 to 1.0, and the LGMD cell's threshold $T_{s}$ is also adaptable, within the range 0.0 to 30.0, during the evolution.

The selectiveness of the DSNs is assumed to be a developed character in this study and is not alterable during the evolution. Parameters of the DSNs are given in Table II, based on our experimental study. The local inhibition weight $w_{I}(i)$ is set to be as strong as 5.5 to ensure the inhibitory effect and directional selectivity. The four direction selective neurons used in this paper are: left inhibited DSN $L$, right inhibited DSN $R$, upward inhibited DSN $U$, and downward inhibited DSN $D$.

TABLE II THE PARAMETERS OF THE DSNS

There are four intermediate cells in the DSNs. In this case, there are a total of 21 weights and thresholds that are adaptable in the evolution process. The connection weights are allowed to adapt between $(-1.0\sim 1.0)$. The threshold of the spiking cell is allowed to adapt within (0.0 $\sim$ 10.0).

There are six input cells connected to the spiking cell of the Hybrid, the six connection weights are all adaptable, within $(-1.5\sim 1.5)$ and the spiking cell threshold is within (0.0 $\sim$ 4.0).

### F. Setting up Evolution Experiments

Evolutionary computation has been very successful in different applications; computer vision is one of the areas in which evolution processes have been used to tackle problems at a variety of different levels (for example, [12], [30], and [64]).

In this study, similar specialized neural subsystems coexist in a vision system; they need to compete with each other for a specific role, collision recognition, or cooperate to achieve a better performance. All of this competition and cooperation happens simultaneously in one evolution process. As stated above, the coevolution computation strategies are not adopted directly. In coevolution computation (e.g., [12] and [21]), relative fitness is often used for judging one agent against another in a different group. However, in this study, not only is the number of agents of each type in the whole population an important indicator, but the absolute fitness value, which represents the overall performance of different agents in the specific environment, is extremely important as well. The solution in our case is to introduce the switch gene, which determines which of the three subsystems performs visual collision recognition for the whole entity. With the switch gene incorporated, a normal genetic algorithm [8] with slight modification becomes the best procedure.

#### 1) Algorithm Setting

A population of agents (60 hereafter, unless restated differently) in each generation are processed via a genetic algorithm [8], [17], [61]. The first generation is produced randomly. To form a new generation, the worst performing agents (20% of the whole population in a generation) are replaced. New agents (20% of a whole population) are produced by the best performing parents in the previous generation through crossover. Single-point crossover routine is used to perform crossover with probability set to 0.75 [8]. Mutation is made to the chromosomes (binary coded) of these newly produced agents with a mutation rate 0.1.
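
One generation of the replacement scheme described above might look like the following sketch. Several details are assumptions made only for illustration: chromosomes are lists of bits, parents are drawn from the top 20% of the ranked population, a child without crossover is a copy of one parent, and the mutation rate of 0.1 is applied per bit.

```python
import random

def next_generation(population, fitness, rng,
                    replace_frac=0.2, p_cross=0.75, p_mut=0.1):
    """Replace the worst 20% with mutated offspring of the best agents."""
    ranked = sorted(population, key=fitness, reverse=True)
    n_new = round(len(ranked) * replace_frac)
    survivors = ranked[:len(ranked) - n_new]
    parents = ranked[:n_new]  # assumption: parents come from the top 20%
    children = []
    while len(children) < n_new:
        a, b = rng.sample(parents, 2)
        child = list(a)
        if rng.random() < p_cross:
            cut = rng.randrange(1, len(a))  # single-point crossover
            child = a[:cut] + b[cut:]
        # Assumption: each bit flips independently with probability p_mut.
        child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
        children.append(child)
    return survivors + children
```

With a population of 60 this replaces 12 agents per generation and leaves the 48 best agents untouched. Because the switch gene is part of the chromosome, mutation can reintroduce an agent type that has disappeared from the population.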

In an evolution process, different types of agents evolve in the same environment simultaneously and are therefore affected by the presence of their rivals. Different groups of agents are evaluated according to their absolute fitness value, which is assigned under the same rule. Therefore, the worst performing type of agent may be driven to extinction by the best performing agents. Because of the random factor in producing new agents, mutation may bring an extinct agent type back again in subsequent generations.

#### 2) Fitness

Each agent's behaviour is evaluated based on its weighted success rate [64], i.e., its fitness value. In each generation, an agent that responds to all visual events correctly, i.e., recognizes imminent collisions and makes no mistakes on translating scenes or other challenges, scores a fitness value (success rate) of 100%; an agent that fails in all events scores a fitness value of 0%; an agent that fails in a noncolliding challenge scores a lowered fitness value (reduced success rate); an agent that fails in a colliding event gets a sharp reduction in success rate, since a collision event is much more important in scoring than a noncollision one. For example, failure in a collision sequence may be equal to four times the failure in a noncollision event. However, an agent scores 50% in fitness value if it fails only in all collision events or fails only in all noncollision events.

The fitness of an agent may be formulated as follows: $$F_{k}=\left(1-\frac{\sum\limits_{i=1}^{N_{v}}f_{\rm event}^{i}}{M_{\rm nb}}\right)\times 100\%\tag{15}$$ where $F_{k}$ is the fitness value of the $k^{\rm th}$ agent in the population, $f_{\rm event}^{i}$ is the score for the $i^{\rm th}$ event of the total $N_{v}$ events, $M_{\rm nb}$ is the highest possible score, and $f_{\rm event}^{i}$ depends on performance (failure or success): $$f_{\rm event}^{i}=\begin{cases}K_{\rm col}&\text{failure in collision event}\\ K_{\rm non}&\text{failure in noncollision event}\\ 0&\text{success}\end{cases}\tag{16}$$ where $K_{\rm col}$ is the score for failure in a collision event and $K_{\rm non}$ is the score for failure in a noncollision event. For a collision event, failure means no collision signal is sent out by the agent 3 $\sim$ 30 frames before the real collision. $K_{\rm col}$ is several times bigger than $K_{\rm non}$ so that an agent that fails only in all collision events and an agent that fails only in all noncollision events have the same fitness value: 50%. In an evolutionary process, $N_{v}=14$ (including 4 collision events), $K_{\rm col}$ is 2.5, $K_{\rm non}$ is 1 and $M_{\rm nb}$ is 20.
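
As a check on the constants quoted above, (15) and (16) can be evaluated directly. The representation of an evaluation run as a list of `(is_collision_event, failed)` pairs is an assumption made for this sketch.

```python
K_COL, K_NON, M_NB = 2.5, 1.0, 20.0  # values quoted in the text

def fitness(events):
    """Eq. (15)-(16). events: list of (is_collision_event, failed) pairs."""
    penalty = sum(K_COL if is_col else K_NON
                  for is_col, failed in events if failed)
    return (1.0 - penalty / M_NB) * 100.0

# 4 collision events and 10 noncollision events, as in the experiments.
fail_collisions = [(True, True)] * 4 + [(False, False)] * 10
fail_noncollisions = [(True, False)] * 4 + [(False, True)] * 10
```

Failing all four collision events costs $4\times 2.5=10$ and failing all ten noncollision events costs $10\times 1=10$, so both cases give $(1-10/20)\times 100\%=50\%$, and failing everything gives $M_{\rm nb}=20$ and a fitness of 0%.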

Fig. 3. Samples from the video sequences making up the robotic laboratory environments in which the three types of agents were evolving. The number under each image is the number of its corresponding video sequence. The arrows in the images are added to indicate schematically the visual motion direction. The black ball is 95 mm in diameter. In video sequences 1 and 2, the ball was moving across the field of view from left to right at an intermediate speed, taking 19 and 20 frames respectively; in video sequences 3 and 4, the robot was turning anticlockwise at about 50 °/s while moving forwards at 3.2 cm/s; in video sequences 5 and 6, the robot was turning clockwise at about 50 °/s while moving forwards at 3.2 cm/s; in video sequence 7, the ball was bouncing to the right; in video sequences 8 and 9, the ball was bouncing up and down; in video sequence 10, the ball was bouncing to the left; in video sequence 11, the ball was approaching the robot at 0.4 $\sim$ 0.5 m/s from the right side; in video sequences 12 and 13, the ball was approaching the robot at 0.4 $\sim$ 0.5 m/s from the central area; in video sequence 14, the ball was approaching the robot at 0.4 $\sim$ 0.5 m/s from the left side. There were 60 frames in each video sequence. The collision sequences were numbers 11 $\sim$ 14. The robot's field of view was 60 ° [64].

#### 3) Evolving Environments

To cultivate well performing agents, the evolving environment should include as many visual events as possible. However, a huge video database may result in unacceptable computing time. Balance can be achieved by carefully selecting visual events to form the evolving environment. As illustrated in Fig. 3, a group of video sequences, which were recorded in a robotic laboratory with a Khepera II mobile robot, are selected to form the environment for the agents to evolve in. Each sequence represents one event that can cause strong excitation in the photoreceptor layer. These sequences include a robot interacting with a black ball and turning around. Since these video sequences were recorded directly from a mobile robot, many visual perturbation challenges such as bumping and shaking were present. These video images were all taken at about 25 frames per second.

SECTION III

## RESULTS AND DISCUSSIONS

Four rounds of evolution have been conducted with three types of neural subsystems coexisting and evolving simultaneously with the same parameter and environment setting. Following this, another three rounds of special evolution, in which only one type of neural subsystem is allowed to play the recognition role in each evolution, have then been conducted in the same environment. Each evolution ran for about 16 hours on a Dell laptop computer (P4 CPU 2.8 GHz). Results are shown in Fig. 4 to Fig. 9, respectively.

Fig. 4. Results with three groups of collision recognition agents evolving during the four rounds of evolution. The number of best agents means the number of agents with success rate equal or greater than 90%. (a) Round 1; (b) round 2; (c) round 3; and (d) round 4.

As shown in Fig. 4, there were about the same number of the three types of agents in the first generation in each of the four rounds of evolution. However, the LGMD agents quickly established themselves with an increasing number of kin agents (“kin agents” here means agents that use the same neural subsystem for collision recognition) and dominated the whole population after about 10 [see Fig. 4(a), (b), and (c)] to 40 [see Fig. 4(d)] generations. The Hybrid agent also showed very strong ability in the early generations of the 3rd and 4th rounds of evolution, and in a specific isolated evolution process in which only the Hybrid agents were involved in collision recognition [see Fig. 5(b)]. The LGMD agent likewise performed well in its isolated evolution, with high averaged fitness and best fitness values [see Fig. 5(a)]. The above results showed that the LGMD has the ability to detect collisions robustly and leaves no opportunity for others to do the same work in this environment.

Fig. 5. Results of three rounds of special evolution in which only one specific type of agent is allowed to be involved in collision recognition in each evolution.

The values of the switch gene versus generation number in the four rounds of evolution are plotted in Fig. 6. Although the switch gene values were uniformly distributed in the first generation, their distribution tended to lock to the LGMD neural subsystem within 10 to 20 generations. The trend of this distribution over generations illustrated the number of the different agents and reflected the competence of certain types of agents.

Fig. 6. Switch gene value in each generation in the four rounds of evolution. Note that in each generation there were 60 different agents, each with its own switch gene value; some of the gene values were very close and may overlap in the plot. The LGMD played the collision recognition role if the switch gene value fell within the range (0.5 $\sim$ 1.5), the Hybrid played the role if it was within (1.5 $\sim$ 2.5), and the DSNs played the role if it was within (2.5 $\sim$ 3.5). (a) Round 1; (b) round 2; (c) round 3; and (d) round 4.
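The switch gene ranges given in the caption of Fig. 6, together with the 90% success-rate criterion for "best agents" used in Fig. 4, can be illustrated with a minimal sketch. The function names `subsystem_for` and `count_best_agents` are hypothetical, and the handling of values that fall exactly on a range boundary (half-open intervals here) is an assumption, since the source only states the ranges.

```python
def subsystem_for(switch_gene: float) -> str:
    """Map a switch gene value to the subsystem playing the recognition role.

    Ranges follow the Fig. 6 caption; boundary handling is an assumption.
    """
    if 0.5 <= switch_gene < 1.5:
        return "LGMD"
    if 1.5 <= switch_gene < 2.5:
        return "Hybrid"
    if 2.5 <= switch_gene <= 3.5:
        return "DSNs"
    raise ValueError(f"switch gene value {switch_gene} is outside 0.5-3.5")


def count_best_agents(switch_genes, success_rates, min_rate=0.9):
    """Count, per subsystem type, the agents whose success rate is >= min_rate."""
    counts = {"LGMD": 0, "Hybrid": 0, "DSNs": 0}
    for gene, rate in zip(switch_genes, success_rates):
        if rate >= min_rate:
            counts[subsystem_for(gene)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical population of four agents (not data from the experiments).
    genes = [1.0, 2.0, 3.0, 1.2]
    rates = [0.95, 0.50, 0.90, 0.92]
    print(count_best_agents(genes, rates))
```

Tallying the counts per generation in this way would give curves of the kind plotted in the "number of best agents" panels.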

The gene values converged very quickly, as shown in Fig. 7. The majority of the gene values lay within a narrow area (see Fig. 7, 50th and 100th generations), which meant that the LGMD genes had converged and remained stable over several generations. The number of its kin agents also demonstrated the robustness of the LGMD in recognizing collisions, since a slight change in the gene values did not cause a significant behavioural difference (see Fig. 7 and Appendix). In contrast, the genes of the DSNs and the Hybrid neural subsystem showed little convergence over the generations; see, for example, the 18th gene of the DSNs and the 3rd gene of the Hybrid neural subsystem in Fig. 8.

Fig. 7. Distribution of the three gene values of the LGMD neural subsystem in the 1st, 50th, and 100th generations. Data are from the first round of evolution.
Fig. 8. Distribution of the 18th gene in the DSN neural subsystem and the 3rd gene in the Hybrid neural subsystem in the 1st, 50th, and 100th generations. Data are from the 1st round of evolution. The two genes were randomly selected.

The LGMD agent was also tested using similar visual clips, and the results are shown in Fig. 9. The LGMD agent was picked from the 100th generation's 48 best LGMD agents. The chromosomes of 13 of those 48 agents were transformed from binary to decimal values and are shown in the Appendix. The chromosome of the agent used in the test is in the first column. Note that only the 22nd–24th genes belong to the LGMD neural system; the others are either the redundant DSNs or Hybrid genes or the switch gene. The test showed that the LGMD agent was able to recognise collisions in similar scenarios [see Fig. 9(a) and (b)] and did not respond to noncollision scenes [see Fig. 9(c), (d), and (e)]. As shown in Fig. 9, the optimal combination of the LGMD cell and the FFI cell resulted in the observed good performance. The selectivity for looming objects over translating objects was largely based on this optimal combination.

The LGMD agent was also challenged with two unfamiliar scenes. One scene was captured when the robot moved towards clustered blocks without collision [see Fig. 9(f)]. The agent responded to it correctly: no repeated spikes were sent out. The other scene was captured when a ball approached the robot on a collision course in the first stage and missed the robot at the final stage [see Fig. 9(h)]. It was no surprise that the LGMD agent detected a collision at around frame no. 40, because the LGMD agent tended to detect collisions quite early [e.g., Fig. 9(a) and (b)] during the approach stage [see Fig. 9(h)].
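The way thresholds, spikes, and repeated spikes relate to one another in Fig. 9 can be sketched as follows. This is only an illustration under stated assumptions: the rule that a spike is emitted when the LGMD excitation exceeds its threshold while the FFI excitation stays at or below its own threshold, and the rule that a collision is reported after `n_consecutive` successive spikes, are assumed here rather than taken from this section. The threshold values and excitation traces are hypothetical.

```python
def spike_train(lgmd, ffi, lgmd_threshold, ffi_threshold):
    """Per-frame spikes: LGMD above its threshold and not suppressed by FFI.

    Assumption: an FFI excitation above ffi_threshold suppresses the spike.
    """
    return [
        (k > lgmd_threshold) and (f <= ffi_threshold)
        for k, f in zip(lgmd, ffi)
    ]


def first_collision_frame(spikes, n_consecutive=4):
    """Index of the frame completing n_consecutive successive spikes, or None.

    Assumption: isolated spikes are ignored; only repeated spikes count.
    """
    run = 0
    for frame, spiking in enumerate(spikes):
        run = run + 1 if spiking else 0
        if run >= n_consecutive:
            return frame
    return None


if __name__ == "__main__":
    # Hypothetical excitation traces for a looming object (not measured data).
    lgmd = [0.2, 0.6, 0.7, 0.8, 0.9, 0.95]
    ffi = [0.1] * 6
    spikes = spike_train(lgmd, ffi, lgmd_threshold=0.5, ffi_threshold=0.5)
    print(first_collision_frame(spikes))
```

Under these assumptions, a scene that produces only scattered spikes, such as forward motion past clutter, never completes a run and yields no detection.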

The LGMD, the DSNs and the Hybrid neural subsystem extracted and fused visual cues at different levels (the LGMD at a lower level, the DSNs at an intermediate level and the Hybrid at a higher level), and their flexibilities also differed owing to their physical structures, so it had been hard to predict which one would win the competition. The above evolutionary experiments made the outcome clear: if the DSNs coexisted with the LGMD, they may not have had the chance to develop themselves for collision recognition in the specific environments, and the cooperation of the DSNs and the LGMD for collision recognition would also be difficult to develop in this case. However, the DSNs and the Hybrid agents could reach a high success rate when evolving alone (see Fig. 5), which meant that if the LGMD's output was blocked, the chances for the DSNs alone and the cooperative Hybrid neural subsystem to play the collision recognition role would be high. The results may also suggest that the DSNs may have to be involved in other visual tasks instead of collision recognition if they were to coexist with the LGMD. In the future, more visual tasks may be introduced into the evolution to investigate the possible functional diversity and coordination of these neural vision subsystems.

For visual neural systems, the evolving environment was also critically important in forming and determining a structure for certain tasks. Often, the best agent in one specific environment may not be the best in another, unfamiliar environment. Interestingly, quite similar results were obtained when we put the competition and coordination game into another dynamic environment involving driving scenarios, as briefly shown in Fig. 10. The LGMD agents also dominated most of the population after several generations in our driving scenario experiments. However, we noted that the best scored agent was not always an LGMD one: the DSNs and the Hybrid scored very high fitness values in three out of four rounds of evolution, which was consistent with previous studies in which only one type of DSN was allowed to evolve in a specific environment [63], [64]. It was also harder for the LGMD to gain domination in the whole population [see Fig. 10(b), 4th round].

Fig. 9. One of the best agents (LGMD agent, from the 1st round of evolution) processing different test scenes. The adaptable value of this agent is detailed in Appendix C in the first column. Frame numbers are shown under each image frame. The dashed horizontal lines are the thresholds for LGMD (blue) and FFI (red). Excitation levels are indicated in solid lines with LGMD in blue and FFI in red. Spikes are represented by asterisks. For the approaching cases, the last image shown is the one taken when the ball touched the robot. (a) Processing an approaching ball on a direct collision course. (b) Processing another ball approaching on a direct collision course. (c) Processing a moving ball translating at the same range from the camera. (d) Processing a bouncing ball. (e) Processing a nearby translating ball. (f) Processing turning scenes. (g) Processing forward motion in a clustered environment. (h) Processing a near miss scene.

The LGMD agents may benefit from their relatively stable structure with a smaller search space, although the switch gene has structurally provided each type of agent with equal opportunity. To test this, further experiments were carried out by assigning 18 dummy variables to the LGMD agents in addition to their three variables. These dummy variables were involved in some simple addition and subtraction operations but made no final contribution to the LGMD outputs. Two rounds of evolution were conducted and the results (see Fig. 11) showed no significant difference from those of the previous experiments. Additional experiments with a significantly enlarged random factor (a higher mutation rate, in this case) were also carried out three times, in order to see whether this could provide a better chance for the DSNs or Hybrid agents. The results of these three rounds of evolution are shown in Fig. 12. More random changes in the genes did not lower the LGMD agents' competence, though the success rate of the LGMD agents understandably dropped with a higher mutation rate [see Fig. 12(c), right column].
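To make the role of the mutation rate concrete, the following sketch applies a simple mutation operator to a binary chromosome. It is not the operator used in the experiments, whose details are not given in this section. It assumes that the mutation rate is the probability that an offspring chromosome is mutated at all, and that a mutation flips one randomly chosen bit. The function name `mutate` and the example chromosome are hypothetical.

```python
import random


def mutate(chromosome, mutation_rate, rng):
    """Return a copy of a binary chromosome, possibly with one bit flipped.

    Assumption: with probability mutation_rate the offspring is mutated,
    and a mutation flips exactly one randomly chosen bit.
    """
    child = list(chromosome)
    if rng.random() < mutation_rate:
        position = rng.randrange(len(child))
        child[position] = 1 - child[position]
    return child


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so that the run is repeatable
    parent = [0, 1, 1, 0, 1, 0, 0, 1]  # hypothetical 8-bit chromosome
    for rate in (0.4, 0.6, 0.8):  # the rates listed in the Fig. 12 caption
        trials = 1000
        changed = sum(mutate(parent, rate, rng) != parent for _ in range(trials))
        print(rate, changed / trials)
```

Under this assumption, raising the rate from 0.4 to 0.8 roughly doubles the fraction of offspring that differ from their parents, which is one way in which a higher rate could disturb agents that are already well adapted.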

SECTION IV

## FURTHER DISCUSSIONS

The domination of the LGMD agent may be explained by the robust computational structure of the LGMD neural system for collision recognition. An LGMD subsystem is stable as it ‘sums’ the excitations resulting from expanding edges (e.g., [40], [41], [42]) regardless of the direction of their movement. The excitation level of the LGMD system in response to similar visual stimuli, for example those in Fig. 13(a)–(d), will be the same, as these expanding edges are summed without directional bias. However, these similar visual stimuli [see Fig. 13(a)–(d)] will elicit quite different outputs from the directionally sensitive neurons of the DSNs, making the learning process much more difficult for their postsynaptic network. For the same reason, it will not be any easier for the hybrid neural subsystem to adapt to these challenges quickly.
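The contrast between direction-blind summation and direction-selective outputs can be illustrated with a toy calculation. This is a deliberately simplified sketch, not the LGMD or DSN model itself: each looming stimulus is reduced to a hypothetical amount of edge motion in the four directions L, R, U, and D, with edges moving in three of them as in Fig. 13. Only the pattern for stimulus (b) follows the Fig. 13 caption; the patterns for (a), (c), and (d) are assumed for illustration.

```python
DIRECTIONS = ("L", "R", "U", "D")

# Hypothetical edge-motion amounts per direction for four looming stimuli.
# Only (b) (L, R and U, but not D) follows the Fig. 13 caption.
STIMULI = {
    "a": {"L": 1.0, "R": 1.0, "U": 0.0, "D": 1.0},
    "b": {"L": 1.0, "R": 1.0, "U": 1.0, "D": 0.0},
    "c": {"L": 0.0, "R": 1.0, "U": 1.0, "D": 1.0},
    "d": {"L": 1.0, "R": 0.0, "U": 1.0, "D": 1.0},
}


def lgmd_like_excitation(edge_motion):
    """Sum the edge excitation with no regard to direction."""
    return sum(edge_motion[d] for d in DIRECTIONS)


def dsn_like_outputs(edge_motion):
    """Keep one output per direction, as four directional neurons would."""
    return tuple(edge_motion[d] for d in DIRECTIONS)


if __name__ == "__main__":
    for name, motion in STIMULI.items():
        print(name, lgmd_like_excitation(motion), dsn_like_outputs(motion))
```

In this toy setting the summed excitation is identical for all four stimuli, whereas the four directional output patterns all differ, so a network reading the directional outputs has four distinct cases to learn.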

The computational structure of each collision recognition subsystem determines the learning efficiency. A robust agent, such as a well-performing LGMD agent in the above experiments, produced offspring that also performed well even though these offspring's genes were slightly altered by both crossover and mutation. In the competitive developmental process described above, the DSNs and the even more complex cooperative hybrid agents clearly had difficulties in generating offspring that performed well. The robustness of the LGMD suggested that it could be a good model for designing artificial vision systems for collision recognition and avoidance for mobile robots, vehicles, airplanes and other high speed mobile machines.

Fig. 10. (a) Sample images from video footage representing a driving environment in which the agents evolved. The number under each image is the video sequence number. Video sequence 1 was a car collision scene while driving at high speed, video sequence 2 was a car collision scene while driving at low speed, video sequence 3 was a leftward translating van while the camera was stationary, video sequence 4 was a left running pedestrian while driving at very low speed, video sequence 5 was a left walking pedestrian while driving at very low speed, video sequence 6 was a turning car while driving at low speed, video sequence 7 was a fast translating car while waiting at a roundabout, video sequence 8 was a car cutting in while driving at normal speed on a motorway, video sequence 9 was a scene with road symbols (arrows) while driving at high speed, and video sequence 10 was a scene with road symbols (arrows and zebra lines) while driving at high speed. (b) Results of the four rounds of evolution. The left column shows the number of agents over generations and the right column shows the fitness over generations.
Fig. 11. Results of three groups of collision recognition agents evolving (in the robotic lab, Fig. 3) during the two rounds of evolution in which 18 dummy variables were assigned to the LGMD agents in addition to their original 3 variables. The LGMD and DSNs agents had the same number of variables in the evolution. The number of best agents means the number of agents with a success rate equal to or greater than 90%. (a) Round 1 and (b) round 2.
Fig. 12. Results of three groups of collision recognition agents evolving (in the robotic lab, Fig. 3) during the three rounds of evolution in which the mutation rates were set to (a) 0.4, (b) 0.6, and (c) 0.8, respectively, to introduce a larger random factor into the evolution. (a) Round 1; (b) round 2; and (c) round 3.
Fig. 13. Examples of similar looming (collision) visual stimuli, each of which has edges moving in three different directions, as indicated by the arrows. All four looming objects will elicit a similar level of LGMD excitation but will trigger different outputs from the four directionally sensitive neurons. For example, (b) will only trigger responses from the L, R, and U directional neurons but not from the D directional neuron, so the DSNs collision recognition system has to learn to cope with each of these looming objects differently.

On the other hand, the experiments demonstrated the way three different types of functioning neural subsystems coexist and work in one entity via a switch gene. The full potential of DSNs and the hybrid subsystems has been confirmed and demonstrated in separate experiments (see Fig. 5). In product design and system engineering, redundancy is often specially introduced for enhancing reliability. Redundant structures in an artificial vision system may be necessary to gain further robustness and reliability. In future research, it is important to investigate how the collision recognition functionality could be reorganized from the redundant structures if malfunction occurs in the dominant subsystem.

The results of this study are useful for both the design of artificial vision systems and in understanding biovision systems but the limitation of this study is also obvious. The video database used in this study only represented a limited number of collision patterns; however, colliding objects and patterns in an environment can be very diverse. Although previous studies showed that these motion-sensitive neural vision systems could cope with a wide range of colliding objects even when trained in a simple environment with simple objects [64], it may be interesting to investigate if the cooperation is necessary when these agents are challenged with very complex and diverse scenes with colliding objects. It is also worth investigating how these neural systems may evolve for multiple visual tasks in the future.

The LGMD in locusts and direction selective neurons in many animal species including locusts are still under investigation (see [15], [16], Santer [46], [47], [48], and [67]). The interaction of these direction selective neurons to guide behaviour in animals is also a subject of speculation (e.g., [11]). Our study above shows that an evolution method may provide chances to explore possible competition and coordination mechanisms between these neurons for specific visual tasks. We hope that by using modeling and evolutionary computation methods, together with the increasing information revealed by scientific investigations in insects’ visual pathways and the continuous investigation on developmental brain science (e.g., [27] and [56]), efficient and robust active vision systems could be created for future autonomous robots to interact with dynamic environments effectively.

SECTION V

## CONCLUSION

In the above sections, we have investigated the competence and cooperation between the LGMD and the DSNs for the visual collision recognition role via evolution processes. Represented by three different types of agents, i.e., the LGMD agent, the DSNs agent and the cooperative Hybrid agent, the neural subsystems evolved in the environments simultaneously. The experiments showed that the LGMD has the ability to establish its role for collision recognition very quickly and therefore reduces the other neural systems' chance of developing the same skill.

The LGMD is very robust in detecting collisions; it is therefore an ideal model for designing artificial vision systems for the collision recognition task. Although the cooperation of the LGMD and the DSNs can be very successful, there has been little chance for the neural system to develop coordination aimed solely at collision recognition, because the LGMD would already have gained a dominant role in this case. The DSNs may have to develop themselves for other visual tasks to maintain their existence.

This study gave us a chance to look at the developmental process of several specific neural subsystems fighting for their places via evolution. The above results provide useful information for the design of novel artificial vision systems for collision recognition which can be used in robots, cars, and many other application areas. With similar methods, the coordination between these visual neural systems for multiple visual tasks could be investigated in the future.

## APPENDIX

The binary to decimal transformed chromosome of the first 8 of the 48 best agents (which are all LGMD agents) in the 100th generation from the 1st round of evolution:

### ACKNOWLEDGMENT

The authors would like to thank all the anonymous reviewers for their valuable comments and suggestions to revise the manuscript.

## Footnotes

This work was supported in part by EU FP7 Projects EYE2E (269118) and HAZCEPT (318907).

S. Yue is with the School of Computer Science, University of Lincoln, Lincoln LN6 7TS, U.K. (e-mail: syue@lincoln.ac.uk; yue.lincoln@gmail.com).

F. C. Rind is with the Institute of Neuroscience, Newcastle University, Newcastle upon Tyne, NE1 7RU, U.K. (e-mail: cliare.rind@newcastle.ac.uk).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
