Neuromorphic Computing for Interactive Robotics: A Systematic Review

Modelling functionalities of the brain in human-robot interaction contexts requires a real-time understanding of how each part of a robot (motors, sensors, emotions, etc.) works and how they interact to accomplish complex behavioural tasks in the environment. Human brains process information using event-based impulses, also known as spikes, which make living creatures very efficient and able to outperform current mainstream robotic systems in almost every task that requires real-time interaction. In recent years, combined efforts by neuroscientists, biologists, computer scientists and engineers have made it possible to design biologically realistic hardware and models that can endow robots with the required human-like processing capability based on neuromorphic computing and Spiking Neural Networks (SNNs). However, while some attempts have been made, a comprehensive combination of neuromorphic computing and robotics is still missing. In this article, we present a systematic review of neuromorphic computing applications for socially interactive robotics. We first introduce the basic principles, models and architectures of neuromorphic computation. The remaining articles are classified according to the applications they focus on. Finally, we identify potential research topics for fully integrated socially interactive neuromorphic robots.


I. INTRODUCTION
The biological intelligence of living beings has long been studied to explore their capabilities of memorising, thinking, perceiving, and acting accordingly. Among all species, humans have a remarkable capacity to make sound and quick decisions in diverse situations, sometimes based on vague and incomplete information. Humans perform complex behaviours that are important for surviving in dynamic environments. Advances in computational and behavioural neuroscience and embodied cognitive systems provide a baseline to integrate interdisciplinary approaches for further technological progress in robotics. With increasing efforts to mimic those functional and structural principles, roboticists have researched how the brain, robot sensors, and actuators operate together to perform complex tasks in a real-world environment [1]. To acquire more autonomy and operate in the real world, robots should: 1) perceive their environments in real time, 2) process sparse information with high energy efficiency and low response latency, 3) behave appropriately under changing conditions and acquire self-learning ability.
With the emergence of increasingly powerful computers and sophisticated sensing systems, machine learning algorithms have become increasingly capable and have achieved success in several scientific and commercial applications. Recently, advances have been made in deep-learning approaches based on the hierarchical nature of the human vision system [2]. However, the current dominant machine learning (ML) models in robots are far from performing human-like tasks that require precise motor control, fast reaction time and adaptation to external conditions. Besides this, these ML models also lack scalability. Furthermore, the divergence between the human brain and current technology can be exemplified by the fact that a hypothetical clock-based computer running a ''human-scale'' brain simulation would require approximately 12 gigawatts of power, whereas the actual brain works with just 20 watts [3]. A major bottleneck that severely limits the up-scaling of intelligent interactive agents is the unnatural discretization of time imposed by mainstream processing and sensing architectures [4], which are based on arbitrary internal clocks. Clock frequencies must be increased to deal with the continuous inputs of the real world. However, very high frequencies prove unfeasible and make large-scale applications of the current hardware inefficient. Living creatures avoid this inefficiency by processing information using spikes, which help them to perceive and act in the real world exceptionally well. A challenge for human-like machine intelligence is to imitate the efficient neuro-synaptic framework of the physical brain.
The associate editor coordinating the review of this manuscript and approving it for publication was Daniel Augusto Ribeiro Chaves. VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
This area has been investigated extensively in recent years, and many new technologies and methods have been developed that try to mimic the biological behaviour of the human brain, which consumes very little energy and acts very fast. One such method is neuromorphic computing. Neuromorphic computing (also known as brain-inspired computing) is a multidisciplinary research paradigm that investigates large-scale processing systems that support natural neuronal computations through spike-driven communication. Compared to traditional approaches, the key advantages of neuromorphic computing are energy efficiency, execution speed and robustness against local failures [5]. Currently, analog-programmable non-volatile memory (NVM) devices such as phase change memory (PCM) [6], resistive RAM (RRAM) [7], conductive bridging RAM (CBRAM) [8] and spin-transfer torque magnetic RAM (STT-MRAM) [9] are at the heart of these neuromorphic computing devices. The gradual switching of the resistance level in these devices is the key to neuromorphic computing and robotics applications [10]. Moreover, the neuromorphic design overcomes the distortion of the artificial discretization of time by using asynchronous event-driven computing that matches the temporal evolution of the external world [11]. Inspired by this event-driven type of information processing, emerging hardware and software knowledge in the fields of neuroscience and electronics has made it possible to design biologically-inspired machines by using Spiking Neural Networks (SNNs) to model cognitive and interactive capabilities [12].
The interaction between humans and machines is of great relevance for the fields of both neuromorphic computing and robotics. Utilising neuromorphic technologies in robotics, from perception to motor control, is a promising approach to creating robots that can seamlessly integrate into society. In neurorobotics (neuromorphic computing and robotics), bio-inspired sensors are used to efficiently encode sensory signals. A neurorobotic system also adapts to different environmental conditions by integrating inputs from multiple sensors and using event-based computation to accomplish desired tasks [13]. Figure 1 summarises the landscape of neuromorphic computing and interactive robotics. Hardware and software simulators use specific neuron and synaptic models according to the desired applications.
However, so far, a comprehensive review of the two fields has not been performed. Researchers have focused on narrow problems and proposed solutions that incrementally build on the mainstream machine learning paradigm rather than radically change it. To foster future research on interactive neurorobotics, this article focuses on how neuromorphic computing and SNNs can be used for interactive robotics and what impact SNN models have when they are used along with neuromorphic hardware. We investigated the contributions made in this field both theoretically and practically in terms of hardware and software platforms for the development of neurorobotics solutions. In the next section, we introduce details of spiking neural networks, neuron and synapse models. Section III gives the methodology and criteria, and discusses the results, of the systematic search of the scientific literature on neuromorphic computing and SNNs for interactive robotics. Section IV presents and briefly discusses neuromorphic hardware, simulators and frameworks that were used in the articles found in the systematic review. In Section V we present a comprehensive review of the articles, which are grouped and discussed according to the applications they focus on. Section VI is dedicated to an analysis of shortcomings and possible future directions for research in this field. Section VII gives our conclusions.

II. SPIKING NEURAL NETWORK (SNN)
Neural networks are usually classified into three generations and all of them somehow mimic the multilayer architecture of the human brain, but the behaviour of neurons differs significantly among them [14]. In the first generation, the output of a neuron is binary (0,1) and it is obtained by simple thresholding of the weighted synaptic input. In 1943 McCulloch and Pitts showed that networks of artificial neurons have the ability to do some mathematical and logical computation [15].
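As an illustration of the first-generation neuron described above, the following minimal Python sketch thresholds a weighted sum of binary inputs to produce a binary output. The function names, weights and thresholds are illustrative assumptions, not taken from [15]; they are chosen only to show that such units can implement simple logic.

```python
def mcculloch_pitts(inputs, weights, threshold):
    """Binary threshold unit: output 1 if the weighted input sum reaches the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0


def logic_and(a, b):
    # Both inputs must be active to reach the threshold of 2 (assumed values).
    return mcculloch_pitts([a, b], [1, 1], threshold=2)


def logic_or(a, b):
    # A single active input is enough to reach the threshold of 1 (assumed values).
    return mcculloch_pitts([a, b], [1, 1], threshold=1)
```

Evaluating `logic_and` and `logic_or` over the four binary input pairs reproduces the corresponding truth tables.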
With time, another concept came about when researchers developed the backpropagation technique for multilayer perceptron networks. This technique resolves the limitations of prior neural perceptron technique and it is extensively used in deep learning today. This second generation of neural networks is also known as Artificial Neural Networks (ANNs). The main difference from the first generation of neural networks is in the neuron output: in ANNs this can be a real number, which is the result of the weighted sum of all the inputs after being passed through a transfer function, usually sigmoidal. Weights are obtained as a result of some machine-learning algorithm ranging from simple linear regression to high-level classification. Modern computational hardware is widely available to evaluate the novel concepts of advanced neural networks, such as learning protocols and inventive architecture.
First and second generations of neural networks have only limited capability to model their biological counterparts and, in particular, they have no time reference for the electrical signals that have been described in biological neural networks. Moreover, there is still limited research available about biological processes. When processing real-time data, the human brain offers efficient signal processing in which information is encoded in a number of features related to spikes, including the specific times of events [16]. This idea of simulating neural events led researchers to the development of Spiking Neural Networks (SNNs), which are, so far, the most biologically plausible models. Figure 2 shows an overview of all three generations of neural networks (NNs). The first generation of neural networks is based on the McCulloch-Pitts (MP) neuron model, which shows that artificial neurons can perform basic computations. In the second generation of NNs, continuous non-linear activation functions (e.g., ReLU, sigmoid, tanh) give promising results in applications like multi-class classification, detection and identification. Finally, spiking neural networks, also called the third generation of neural networks [17], imitate the action of biological neurons in the brain. The neurons in the brain are excitable and produce action potentials, which are also known as spikes. These spikes are the basic currency of the brain. They allow neurons to perform computation and to communicate with other neurons. During a spike, a neuron releases a neurotransmitter, a chemical that travels across a synapse before reaching another neuron. Spiking neural networks also operate on spikes; these are discrete events taking place at specific times. SNNs take a spike train as input and produce a spike train at the output. The state variable(s) of a neuron in an SNN change according to the mathematical model of the neuron.
If the value of the neuron state variable representing its membrane potential exceeds a predefined threshold, the neuron will send a single impulse (spike) to each post-synaptic neuron [14].
The major advantages of SNNs are temporal plasticity, reduced computational complexity, and ease of use with neural interfaces [18]. In recent years, the popularity of SNNs has increased and several models have been developed for image classification and object recognition using SNNs. Spiking neural networks are also suitable for a diverse range of applications related to computer vision and robotics, such as classification, clustering and pattern recognition. There are many examples of converting data directly from sensors [19], [20], [21], intelligent systems controlling manipulators [11], [22], and robots [23], [24], [25]. Moreover, detection and recognition tasks [26], [27], [28], and the processing of numerical data with the Neural Engineering Framework (NEF) [29], [30], [31] can also be performed with SNNs.

A. LEARNING IN SPIKING NEURAL NETWORKS
The key concepts of spiking neural network operation do not allow the use of the classical learning techniques that are appropriate for a conventional neural network. There are nevertheless several methods to train an SNN. For unsupervised learning in SNNs, the best-known method is spike-timing-dependent plasticity (STDP) [32]. In STDP, the change in synaptic weight is based on the difference in firing time of pre- and post-synaptic neurons. A pre-synaptic spike that arrives before the post-synaptic action potential leads to Long-Term Potentiation (LTP), whereas a pre-synaptic spike that arrives after the post-synaptic spike leads to Long-Term Depression (LTD) of the synapse. The change of the synapse plotted as a function of the relative timing of pre- and post-synaptic spikes is known as the STDP function or learning window. For supervised learning in SNNs, there are methods like back-propagation [33], where learning algorithms like SpikeProp [34] and FreqProp [35] demonstrate how a network of spiking neurons with a biologically plausible time constant can perform complex non-linear classification tasks in temporal coding. Another such method is ReSuMe [36], which is suitable not only for movement control but also for other applications like identification and modelling of non-stationary objects. There are also a few models available for reinforcement learning in SNNs. One of them is the actor-critic model [37], which uses temporal-difference learning by combining local plasticity rules with global reward signals. This network is capable of solving non-trivial grid-world tasks with sparse rewards. There is also a method for reinforcement learning in SNNs through modulation of STDP [38]. This modulation is used as a global reward signal that leads to reinforcement learning.

B. SPIKING NEURAL NETWORK ARCHITECTURES
Like other neural networks, SNN architectures are mainly divided into four groups [39].

1) FEED-FORWARD NEURAL NETWORKS
It is a classical neural network architecture in which data is transmitted only in one direction, without any cyclic connection, and processing can take place over many hidden layers. In feed-forward neural networks, which are commonly known as a multilayered network of neurons or Deep Neural Networks (DNNs), information first enters the input node, moves through hidden layers and finally, the results come out through the output nodes. The network does not have any connection to feed the information coming out at the output, back into the network. The majority of modern artificial neural network architectures, such as convolutional neural networks (CNNs) and deep neural networks (DNNs) are feedforward [40]. In robotics, it is usually adopted for low-level sensory acquisition, such as vision [41], olfactory [42] and tactile sensing [43].

2) RECURRENT NEURAL NETWORKS (RNN)
Unlike feed-forward NNs, recurrent neural networks leverage error correction. In simple words, in RNNs the output from the previous step is fed as an input to the subsequent step. An RNN is a recursive network with a certain structure, such as a linear chain. This mechanism is used by living organisms to process arbitrary input sequences using the internal memory stored within their recurrent networks. Besides this, a Liquid State Machine (LSM) is also a type of recurrent network of spiking neurons in which the internal connectivity parameters remain static during the training process [44]. In robotics, RNNs are widely used for speech recognition [45], control [1] and planning [46].
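The feedback of the previous output into the next step can be made concrete with a deliberately small sketch. It uses a single non-spiking unit with a tanh activation rather than a spiking neuron, and the weights `w_x` and `w_h` are arbitrary illustrative assumptions; the point is only to show how the recurrent term carries information across time steps.

```python
import math


def rnn_run(inputs, w_x=0.5, w_h=0.8, h0=0.0):
    """Run a one-unit recurrent network over a sequence and return every hidden state."""
    h = h0
    states = []
    for x in inputs:
        # The previous state h is fed back together with the current input x.
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states
```

With the input sequence `[1, 0, 0]`, the state stays non-zero after the input has returned to zero, which is the internal memory the text refers to. Setting `w_h=0.0` removes the feedback and the state drops to zero as soon as the input does.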

3) HYBRID NEURAL NETWORK STRUCTURES
In this type of hybrid SNN structure, some neurons have feed-forward connections whereas others have recurrent connections. Hybrid structures of this type are often used for end-to-end training of SNNs for tasks like object detection and pattern recognition [26]. Experiments with a hybrid approach demonstrated promising results at a lower computational cost [26].

4) HYBRID NEURAL NETWORK ARCHITECTURES
In these architectures SNNs interact with ANNs. This type of Hybrid ANN-SNN can be trained without conversion; it results in highly accurate networks which are also more computationally efficient than their ANN counterparts [47].

C. SPIKING NEURAL NETWORK MODELS
SNNs are built on the mathematical description of biological neurons. Usually, neuron models are expressed in the form of differential equations. So far, many mathematical descriptions of spiking neuron models have been proposed, which process inhibitory and excitatory inputs using internal state variables. Here, we discuss some of the most influential spiking neuron models, chosen mainly because of their simplicity and widespread use in robotics applications.

1) HODGKIN-HUXLEY NEURON MODEL
The first bio-inspired neural model was developed by Sir Alan Hodgkin and Sir Andrew Huxley in 1952 [48]. Their mathematical model describes how action potentials in neurons are initiated and how they propagate. The model is a set of nonlinear differential equations that approximates the electrical characteristics of neurons. They also explained the ionic mechanisms underlying the initiation and propagation of action potentials in the giant axon of the squid. The following differential equation of the Hodgkin-Huxley model relates the change in membrane potential to the current flowing across the membrane:

$$C_m \frac{dV_m}{dt} = -I_{ion} + I_{ext} \tag{1}$$

Here $I_{ext}$ is an externally applied current, $C_m$ is the membrane capacitance and $V_m$ is the membrane voltage. The ionic current $I_{ion}$ is the combination of three components: a sodium current, a potassium current and a small leakage current [49].

2) IZHIKEVICH NEURON MODEL
The Izhikevich neuron model [50] combines the biological plausibility of the Hodgkin-Huxley model with the computational efficiency of the leaky integrate-and-fire (LIF) neuron model. This model reproduces the spiking and bursting behaviour of known types of cortical neurons [20]. There are only 2 state variables and 2 parameters to tune in order to reproduce the complex behaviour of cortical neurons [50].
This model can be represented as a 2-D system of differential equations [50]:

$$\frac{dV_m}{dt} = 0.04 V_m^2 + 5 V_m + 140 - U + I_m \tag{2}$$

$$\frac{dU}{dt} = a (b V_m - U) \tag{3}$$

where $V_m$ is the membrane potential, $U$ is the membrane recovery variable and $I_m$ is the injected bias current or incoming synaptic spikes. $a$ and $b$ are dimensionless parameters whose values shape the neuron's behaviour.
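To show how this two-variable system behaves in practice, the sketch below integrates the standard published form of the Izhikevich model, dV/dt = 0.04V² + 5V + 140 − U + I and dU/dt = a(bV − U), with a forward-Euler step. The after-spike reset (V ← c, U ← U + d when V reaches 30 mV), the ''regular spiking'' values a = 0.02, b = 0.2, c = −65, d = 8 and the 0.5 ms step are assumptions taken from the commonly used formulation of the model, not from the text above.

```python
def simulate_izhikevich(i_m, a=0.02, b=0.2, c=-65.0, d=8.0, t_total=500.0, dt=0.5):
    """Forward-Euler simulation of one Izhikevich neuron; returns spike times in ms."""
    v = c          # membrane potential (mV), started at the reset value
    u = b * v      # recovery variable, started on its nullcline
    spike_times = []
    for k in range(int(t_total / dt)):
        # Evaluate both derivatives from the current state before updating it.
        dv = 0.04 * v * v + 5.0 * v + 140.0 - u + i_m
        du = a * (b * v - u)
        v += dt * dv
        u += dt * du
        if v >= 30.0:  # spike peak reached: record the spike and reset (assumed rule)
            spike_times.append((k + 1) * dt)
            v = c
            u += d
    return spike_times
```

With no input current the neuron relaxes to rest and stays silent, while a constant input such as `i_m=10.0` produces repeated spikes. Other firing patterns are obtained by changing the four parameters.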

3) LEAKY INTEGRATE-AND-FIRE (LIF) NEURON MODEL
This is the most commonly used model because of its simplicity. In the leaky integrate-and-fire model, each cell has a membrane potential $V_m$, a capacitance $C_m$ and a leaky channel that allows current to flow across the membrane with resistance $R_m$. The charge carriers travelling across the membrane are driven by the voltage $V_e$. When the voltage in the cell exceeds the threshold value $V_{th}$, an action potential (impulse or spike) is generated. After the spike occurs, the voltage is artificially dropped to the reset value $V_{reset}$. After emitting a spike, a neuron may have a ''rest'' period in which it cannot be excited. This time is usually known as the ''refractory period''. The differential equation that describes how the membrane potential $V_m$ changes over time ($\frac{dV_m}{dt}$) in the face of an externally applied membrane current $I_m$ is as follows:

$$C_m \frac{dV_m}{dt} = -\frac{V_m - V_e}{R_m} + I_m \tag{4}$$

A simplified form of the last equation, with $\tau_m = R_m C_m$, is:

$$\tau_m \frac{dV_m}{dt} = -(V_m - V_e) + R_m I_m \tag{5}$$

Here $V_e$, $R_m$ and $\tau_m$ are taken to be intrinsic properties of the cell, while $I_m$ is the external current and $V_m$ is the membrane potential [51].
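The threshold, reset and refractory behaviour described above can be sketched with a simple forward-Euler simulation of τ_m dV_m/dt = −(V_m − V_e) + R_m I_m. All numerical values below (time constant, resistance, voltages, refractory time and step size) are illustrative assumptions chosen only to give plausible magnitudes; they are not taken from [51].

```python
def simulate_lif(i_m, t_total=0.2, dt=1e-4, tau_m=0.02, r_m=1e7,
                 v_e=-0.070, v_th=-0.054, v_reset=-0.080, t_ref=0.002):
    """Forward-Euler LIF neuron driven by a constant current i_m (amperes).

    Returns the list of spike times in seconds. Voltages are in volts.
    """
    v = v_e                 # start at the resting (leak) potential
    refractory_left = 0.0   # time remaining in the refractory period
    spike_times = []
    for k in range(int(round(t_total / dt))):
        if refractory_left > 0.0:
            # The neuron cannot be excited during the refractory period.
            refractory_left -= dt
            continue
        v += dt * (-(v - v_e) + r_m * i_m) / tau_m
        if v >= v_th:
            spike_times.append((k + 1) * dt)
            v = v_reset     # artificial drop to the reset value
            refractory_left = t_ref
    return spike_times
```

With these assumed values the neuron needs R_m I_m to exceed V_th − V_e = 16 mV to fire, so a 1 nA input stays sub-threshold while larger currents produce spike trains whose rate grows with the current.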

D. SYNAPTIC PLASTICITY MODELS
After the selection of an appropriate neural model, a synapse model should be chosen to connect neurons within and across the layers of the spiking neural network. Initially proposed by Hebb in 1949 [52], synaptic plasticity is a mechanism for learning and memory based on theoretical analysis. Depending on the relationship between neural activity and synaptic plasticity, synaptic plasticity models are roughly classified into two types: spike-based and rate-based [1].

1) RATE-BASED
Rate-based models are the most common models of synaptic plasticity proposed over the years. These models use a definition of firing rate that refers to an average of the spike count over time [53]. Here, the magnitude of synaptic plasticity is determined by the rate of pre- and post-synaptic firing over a specific time period. This type of model is typically used for converting conventional artificial neural networks to spiking neural networks by using backpropagation [54], [55]. The ANN is trained with the backpropagation technique and is then converted into an equivalent SNN by relating the activation of ANN units to the firing rate of spiking neurons. The relation between the transfer function of spiking neurons and the activation unit of the ANN has been thoroughly discussed in [56], [57], and [58]. Rate-based models have been used successfully in experiments on robot-based sensory and/or motor systems [59].
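The relation between an ANN activation and a spiking neuron's firing rate can be illustrated with a toy example. The sketch below drives a non-leaky integrate-and-fire neuron with a constant input and counts spikes per time step; with a threshold of 1 and a ''reset by subtraction'' rule, the resulting rate approximates a ReLU of the input. The neuron variant, the reset rule and the function names are assumptions made for illustration and are not the specific conversion procedures of [54], [55].

```python
def if_firing_rate(x, n_steps=1000, threshold=1.0):
    """Firing rate (spikes per step) of a non-leaky integrate-and-fire neuron.

    The neuron receives the constant input x at every step and is reset by
    subtracting the threshold after each spike (assumed reset rule).
    """
    v = 0.0
    spikes = 0
    for _ in range(n_steps):
        v += x
        if v >= threshold:
            spikes += 1
            v -= threshold
    return spikes / n_steps


def relu(x):
    """Rectified linear activation of a conventional ANN unit."""
    return max(0.0, x)
```

For inputs between 0 and the threshold, `if_firing_rate(x)` stays close to `relu(x)`, and negative inputs never produce a spike. Inputs above the threshold saturate at one spike per step, which is one reason practical conversion schemes rescale weights or thresholds.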

2) SPIKE-BASED
A synaptic model with a spike-based learning rule was developed in the early 1980s [60], [61]. Several experiments showed that synaptic plasticity is influenced by the exact time of individual spikes. The phenomenon, which has been named spike-timing-dependent plasticity (STDP), alters the synaptic weight based on the relative timing of pre- and post-synaptic spikes: a causal relationship (pre- before post-synaptic spike) causes the synaptic weight to increase, while an anti-causal relationship (post- before pre-synaptic spike) causes the opposite effect.
Bienenstock et al. [62] have created a theoretical learning rule called the BCM rule. The basis of this algorithm is that the instantaneous firing rate rather than individual spikes (as for STDP) set the pattern of weight modifications. Izhikevich and Desai [63] have proven an equivalence of the BCM rule and the STDP spike-pair rule, opening both to common theoretical treatment and adding flexibility in specific model implementation choices.
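In recent years, the STDP learning rule has been successfully implemented as the underlying neural learning mechanism in robotics within both simulated and real environments. Equation 6 gives the precise mathematical definition of the STDP learning rule, written in terms of the spike-time difference $\Delta t = t_{post} - t_{pre}$:

$$\Delta w = \begin{cases} A_+ \exp(-\Delta t / \tau_+), & \Delta t > 0 \\ -A_- \exp(\Delta t / \tau_-), & \Delta t < 0 \end{cases} \tag{6}$$

where $A_+$ and $A_-$ represent the strength of potentiation and depression, respectively, and $\tau_+$ and $\tau_-$ are the time constants of the potentiation and depression windows.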
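The pair-based STDP rule can be sketched as a small function of the spike-time difference Δt = t_post − t_pre: the weight change is A₊·exp(−Δt/τ₊) when the pre-synaptic spike comes first and −A₋·exp(Δt/τ₋) when it comes second. The amplitudes and time constants below are illustrative assumptions (time in milliseconds), and the choice to return zero for exactly coincident spikes is also an assumption of this sketch.

```python
import math


def stdp_delta_w(delta_t, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Weight change for one pre/post spike pair, with delta_t = t_post - t_pre in ms."""
    if delta_t > 0:
        # Causal pair (pre before post): potentiation, decaying with the time gap.
        return a_plus * math.exp(-delta_t / tau_plus)
    if delta_t < 0:
        # Anti-causal pair (post before pre): depression, decaying with the time gap.
        return -a_minus * math.exp(delta_t / tau_minus)
    return 0.0  # coincident spikes: no change (assumed convention)
```

Plotting `stdp_delta_w` over a range of positive and negative time differences gives the learning window: the largest changes occur for nearly coincident spikes and the effect fades as the gap grows.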

III. SYSTEMATIC REVIEW METHODOLOGY
For this systematic review, we selected three databases: IEEE Xplore, Scopus and Web of Science. The search keywords were selected according to our goal of identifying scientific articles at the intersection of neuromorphic computing and interactive robotics. To this end, we identified two groups of keywords with similar meanings, respectively: 1) ''neuromorphic computing'', ''spiking neural network'' and ''brain inspired computing''; 2) ''interactive robotics'', ''social robotics'', ''humanoid robotics''. In group 2, we also included ''cognit*'' to find cognitive models that may be used to learn and implement interactive behaviour in agents and robots. Finally, we used the pair ''brain-inspired computing'' AND ''robot*'' because we wanted to find alternative methods by using the ''brain-inspired computing'' keywords but, when combined with the keywords of group 2, it returned only 2 articles. We searched all these keywords in the three databases as part of Title, Abstract and Keywords, and downloaded the resulting publications for an initial screening. We added this metadata to Rayyan.ai [64], an online tool for systematic reviews. The total number of articles we added was 955. First, we removed all duplicates, reducing the total count to 951. After that, we decided to limit our systematic review to articles published in or after 2017. This reduced the total publication count to 358. Finally, we labelled all remaining publications by searching our keywords in title, abstract, and keywords. As this review focuses on socially interactive robotics and neuromorphic computing, we shortlisted publications presenting neuromorphic computing techniques to implement a new behaviour for human-robot interaction. In case of ambiguity in the metadata, we also reviewed the body of the article. The review was done independently by the three authors; articles were shortlisted only if two authors agreed that the article was pertinent.
After this screening, the number of publications remaining for review was reduced to 60. Figure 3 summarises the paper selection procedure using a PRISMA chart. Figure 4 shows the distribution per year of the publications retrieved, compared with the number of publications related to robotics which are considered for this systematic review.
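The screening steps described in this section (duplicate removal, year cut-off and reviewer agreement) can be summarised as a small filtering routine. The sketch below is only an illustration of the logic: the record fields (`title`, `year`, `votes`), the title-based duplicate check and the reading of the agreement rule as ''at least two of three votes'' are assumptions, and the function does not reproduce how Rayyan.ai stores or deduplicates records.

```python
def screen(records, min_year=2017, min_votes=2):
    """Return the titles that survive deduplication, the year filter and reviewer voting.

    Each record is a dict with hypothetical keys: 'title' (str), 'year' (int)
    and 'votes' (one boolean per reviewer, True meaning 'pertinent').
    """
    seen = set()
    shortlisted = []
    for rec in records:
        key = rec["title"].strip().lower()   # crude duplicate key (assumption)
        if key in seen:
            continue                         # step 1: drop duplicates
        seen.add(key)
        if rec["year"] < min_year:
            continue                         # step 2: keep articles from 2017 onwards
        if sum(rec["votes"]) >= min_votes:
            shortlisted.append(rec["title"])  # step 3: enough reviewers agree
    return shortlisted
```

In this sketch a record is kept only if it is not a duplicate, was published in or after the cut-off year, and is judged pertinent by the required number of reviewers.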

IV. NEUROMORPHIC CHIPS, SIMULATORS AND FRAMEWORKS
In socially interactive neurorobotics, neuromorphic hardware is one of the essential elements for the robot to perform cognitive tasks. With recent advancements in neuroscience and in the chip industry, new neuromorphic hardware has been introduced for the simulation of SNNs. In the tables below, we briefly present the neuromorphic chips (Table 1), simulators (Table 2), and frameworks (Table 3) we came across during our review. We also present the robots (Table 4) used in our selected publications. There are several other neuromorphic chips (e.g. TrueNorth [65], Braindrop [66], SyNAPSE [67], FACETS [68], NeuroMem [69], NM500 [70], SynSense [71]), simulators (e.g. Genesis [72], SpikeFun [73]) and humanoid robots (e.g. Pepper [24], Nao [74]) which are not discussed here as they were not used in any of the publications we reviewed. Some general-purpose simulators, such as Mayavi [75] and CSIM [76], focus on simulating the environment for models instead of simulating SNN models. Unlike other simulators, they do not have built-in neuron and/or synaptic models. Besides this, we also found several articles where no neuromorphic hardware is used: input signals are obtained by sensors on the robot and SNN models are used as processing layers. We discuss this further in the Future Directions section.

V. SNNs IN ROBOTICS APPLICATIONS
The use of SNNs in robotics introduces considerable complexity with limited benefits when performing simple tasks. In cognitive robotics, the goal is to understand the environment and compute the output. Such an approach usually returns useful insights for neural architectures and learned behaviour, especially when dedicated neural hardware is available. So far we have briefly discussed spiking neural networks and their models. In this section, we dive deeper into the applications of SNNs and neuromorphic computing in the field of socially interactive robotics. Robots provide an interesting testbed for SNNs, yet their application requires finding solutions to many problems such as power consumption, action duration, and output fidelity. Through our systematic review, we found 5 major directions in which contributions have been made. Before getting into the details of these applications, we summarise our analysis in Table 5 and Table 6. Here, Table 5 shows the robots and hardware used in the experiments, while Table 6 is about the software or simulation platform used to conduct experiments. These tables guide the reader through the type of robots and platforms that are typically used in experiments related to neurorobotics.

A. SIGNAL ACQUISITION AND PROCESSING APPLICATIONS
The implementation of robotic devices with intelligent sensors has been recognized as an important ingredient in the development of modern-day robots. A robot should plan and execute a series of operations autonomously while adjusting to the surrounding environment in real time. Image segmentation and identification is one of the major objectives of a robot vision system. The features of the object of interest must be obtained and compared with a reference library for identification [120]. While much research has been conducted on vision-based identification, the combination of vision and non-vision sensors promises improvements in the speed of the recognition process. Rast et al. [28] demonstrated a learning system that uses an iCub robot and a SpiNNaker system to solve object identification tasks. The SpiNNaker neuromorphic system [121] is a neural network simulation platform designed for real-time simulations. The details of the hardware and chip architecture are given by Furber et al. [77]. The major goal of SpiNNaker is to achieve real-time processing of real-world data. Meanwhile, iCub is a research-grade humanoid robot with 53 degrees of freedom, typically used in developing and testing embodied AI algorithms. The most common protocol used to communicate between iCub and the host PC is called Yet Another Robot Platform (YARP) [122].
Furthermore, Rast et al. [28] made several enhancements to the basic networks and showed how they can be used to direct performance towards behaviourally relevant goals. They observed that behaviourally relevant STDP appears to contribute more strongly to positive learning than to negative learning. The paper demonstrated the integration of neuromorphic chips and humanoid robots to show how such a system can learn to recognize and attend to preferred objects without relying on off-line training. SpiNNaker uses the External/Internal Event Input-Output (EIEIO) protocol [123], which is the standard protocol for communicating AER data, as well as general or device-specific commands, between heterogeneous platforms. Figure 5 shows the general block diagram of the proposed system. In an extended publication, García et al. [95] showed how neuroanatomically grounded SNNs for visual attention can be extended with word learning capabilities. After the completion of learning, the robot was able to call the name of the object when visual input was present.

FIGURE 5. General block diagram of iCub-SpiNNaker system. The I/O from the robot is converted into YARP bottles. It is then processed by a host-based EIEIO transceiver which converts the message into spikes to transmit and receive from SpiNNaker.
As significant research has been conducted to understand the processing of multi-sensory information, neuroscientists, psycho-physicists, and psychologists have suggested models that explain how information from different sensors is integrated in order to perceive the environment. The term convergence-zone was proposed by Damasio [124]; it is one of the plausible models that explain the mechanism of the multi-modal human perception process. Inspired by this, Al-Qaderi and Rad [22] proposed a multi-modal perceptual system for social robotics. Unlike the majority of studies in machine perception, which deal with uni-modal sensory systems, they proposed a multi-modal system which uses concepts from fading memory [21], binding criteria [125], cell assemblies [52], and top-down influences [126]. To assess the performance of their system they selected the Pioneer 3DX, which is a popular mobile research platform. They equipped the robot with an RGB camera, a Kinect sensor, a directional microphone, and a sonar sensor. Experimental results showed that multi-modal perceptual systems perform better than uni-modal systems.

B. PATTERN RECOGNITION APPLICATIONS
Visual or pattern recognition is a fundamental component for most robotic systems operating in the real world. A large variety of tasks related to human-robot interaction require visual or pattern recognition. Therefore, a lack of successful recognition is often an impediment to applying robotics systems to real-world situations. In particular, in cognitive robotics visual or pattern recognition is a building block of complex systems including many other components such as pose estimation, grasping and manipulation [127]. So far, this problem has received comparatively little attention and was usually dealt with through methods that require strict supervision during the training phase, such as uncluttered views of the objects or meta-data about object position or orientation towards the camera.
Fanello et al. [27] proposed some improvements in robot visual perception capabilities with a limited number of constraints. They modified state-of-the-art coding-pooling pipelines for visual recognition to improve the representation while maintaining real-time performance. Recently, Mansouri-Benssassi and Ye [118] explored the application of bio-inspired SNNs with unsupervised STDP learning to facial expression recognition. They pre-processed the images with Laplacian of Gaussian (LoG) filters to detect the edges and contours of the facial image. After that, a spike train is created using a Poisson distribution where the firing rate is directly proportional to the input pixels' intensity. Lastly, they adopted online STDP [128] to perform unsupervised learning, see Figure 5 (left). The proposed approach was evaluated on two publicly available facial expression datasets and achieved better accuracy than some popular methods such as Histogram of Oriented Gradients (HOG) features or CNNs. In another publication by Al-Qaderi and Rad [96], a multi-modal perceptual system was introduced for efficient facial recognition. The experiments were conducted in real-world scenarios where a robot has to recognize the face of the user while moving around.

FIGURE. Left panel: SNN workflow for facial expression recognition. a) Raw image. b) Image with LoG filter and Poisson spike train creation with convolution layer. c) Excitatory layer. d) Inhibitory layer [118]. Right panel: a) Teaching phase, where the learner visualises the target action. b) Turn-taking phase, where the learner extracts the nonverbal information. c) Trial phase, where the learner confirms the target action [98].
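The rate-coding step described above, in which brighter pixels fire more often, can be illustrated with a minimal sketch. This is not the implementation of [118]: the function name `poisson_encode`, the time step, the maximum rate, and the Bernoulli-per-step approximation of a Poisson process are all assumptions made for illustration, and the LoG filtering stage is omitted.

```python
import numpy as np


def poisson_encode(image, n_steps=100, max_rate_hz=100.0, dt=0.001, seed=0):
    """Encode pixel intensities as approximate Poisson spike trains.

    Returns a boolean array of shape (n_steps,) + image.shape in which
    True marks a spike. All default parameter values are illustrative
    assumptions, not values taken from the reviewed work.
    """
    # Negative values (e.g. after filtering) are treated as zero intensity.
    image = np.clip(np.asarray(image, dtype=float), 0.0, None)
    peak = image.max()
    # Normalise to [0, 1]; an all-zero image produces no spikes at all.
    norm = image / peak if peak > 0 else np.zeros_like(image)
    # Per-step spike probability, directly proportional to pixel intensity.
    p_spike = np.clip(norm * max_rate_hz * dt, 0.0, 1.0)
    rng = np.random.default_rng(seed)
    # One independent Bernoulli draw per pixel and per time step.
    return rng.random((n_steps,) + image.shape) < p_spike
```

Summing the returned array over its first axis gives per-pixel spike counts, which should grow with pixel intensity; a zero-intensity pixel never fires.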
Besides this, experiments have been conducted to recognize human motion in real time by using SNNs and clustering of direction vectors to analyse nonverbal information and construct a cognitive system that recognizes the target motion and acquires actions [98]. This instructed learning is done in three phases. In the teaching phase, the teacher shares the cognitive environment and the learner visualises and confirms the target action. In the turn-taking phase, the learner extracts the nonverbal information, and in the final phase, the learner imitates the action. Figure 6 (right) shows the block diagram of this instructed learning.
In another process, proposed by Cyr and Thériault [99], robots learn the relationship between left/right and horizontal/vertical visual stimuli, regardless of the location of the images or their specific pattern composition. Here, different patterns were shown to the robot, and correct recognition along with an appropriate movement was expected from it. Experiments also show that with different rewarding rules, the SNN can adapt its behaviour in real-time.
Similarly, Cyr, Thériault, and Chartier [30] propose a 2-bit task (XOR) with visual compound binary images as input and a left/right action as output. Here the robot can adapt its behaviour by learning other, simpler associative rules during runtime. Furthermore, the impact on the neural architecture was also explored when passing from a 2-bit to a 3-bit task.

FIGURE (caption fragment). After that, it is fed to Spiking Neural Networks and then to a feed-forward neural network for classification. This classification layer produces an output angle that is used to control the motor [19].

C. SPEECH RECOGNITION APPLICATIONS
Speech recognition is the ability of a machine or robot to identify words and phrases in spoken language and convert them into a machine-readable format. In most common cases of speech recognition in robots, the voice command is captured through a microphone, processed on a computer, and finally sent back to the robot for action. Speech is sometimes hard to recognize efficiently in a noisy environment. To solve this problem, Davila-Chacon, Liu, and Wermter [19] proposed the embodied embedded cognition approach to improve Automatic Speech Recognition (ASR) systems for robots in noisy environments. Sound Source Localization (SSL) was used to locate the direction of the sound source accurately in a short time. Figure 7 represents the SSL architecture. In this approach, before performing an ASR task, the robot orients itself toward the angle where the signal-to-noise ratio (SNR) of speech is maximised for one microphone. Here a spiking neural network was applied to calculate the angle of the sound source. The system was tested on both the iCub robot and Soundman, and its performance was measured and used as a baseline for the iCub robot head.
The results of this test show that the proposed ASR with SSL can considerably improve the accuracy of speech recognition in a humanoid robot. Moreover, the proposed approach, in which SSL directed the orientation of the robot and selected the appropriate channel as input to the ASR system, showed better results than other approaches in which the sound signals of both channels (left and right) are averaged before being fed to the ASR system. In the future, speech recognition can also be used as a bootstrapping mechanism to train neural layers, as it can perform auditory grouping in the frequency and time domains [19].
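To make the sound source localization step more concrete, the sketch below estimates a source angle from a two-microphone recording. It deliberately swaps in a classical technique, cross-correlation of the two channels to obtain the interaural time difference, in place of the spiking neural network used in [19]. The function name `estimate_azimuth`, the far-field geometry, and the speed-of-sound constant are assumptions made for illustration.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air (assumption)


def estimate_azimuth(left, right, sample_rate, mic_distance):
    """Estimate the source azimuth in degrees from two microphone signals.

    Positive angles mean the sound reached the right microphone first,
    i.e. the source is on the right-hand side. A far-field source and
    equal-length signals are assumed.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    # Full cross-correlation; index i corresponds to lag i - (len(right) - 1).
    corr = np.correlate(left, right, mode="full")
    # A positive lag means the left channel trails the right channel.
    lag = int(np.argmax(corr)) - (len(right) - 1)
    itd = lag / sample_rate  # interaural time difference in seconds
    # Far-field model: itd = mic_distance * sin(angle) / speed of sound.
    sin_angle = np.clip(itd * SPEED_OF_SOUND / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_angle)))
```

In a pipeline like the one described above, an angle of this kind would drive the head orientation, after which the single channel facing the speaker is passed to the ASR system instead of the average of both channels.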
In another approach, the iCub robot was used to perform vestibulo-ocular reflex (VOR) tasks with a spiking cerebellar model embedded in an adaptive real-time control loop [100]. The model operates as a feed-forward controller that integrates several neural models with a specific neural topology and characteristics. Using STDP, the cerebellar model effectively adapts the reflex, generating the eye motor commands that compensate for the head movement of the iCub robot.

D. MOTOR CONTROL APPLICATIONS
The goal of many studies in this field is to develop a robot that can mimic human cognitive and motor behaviour. From simple tasks like pointing in different directions to advanced learning tasks, movement, grasping and touching actions in robotics have several applications. A method was proposed to generate and control pointing motions for a robot using a spiking neural network. The SNN learns a base motor primitive for pointing at targets (left, right, up, down). The network was able to combine multiple motor primitives to control a robot in real-time to reach a specific point. The performance of the network was evaluated with the help of a humanoid robot (HoLLiE). A board with target points was placed in front of the robot, and the robot had to produce motion towards a target point. The major benefit of this approach is that it does not depend on the specific kinematic structure of the robot and can be used with different robots [31]. Figure 8 shows the overall architecture of the process. Here the motion generation layer produces the activation pattern for four correction primitives, and the motion command is given to the motor control system. Similar work was done by Mirus et al. [129]. Here the mobile robot is capable of moving around and detecting objects in an unknown environment.
A different study by Batres-Mendoza et al. [23] presents the development of a real-time locomotion system for a Hexapod robot using bio-inspired computing. The improved Quaternion-based Signal Analysis (iQSA) method, which is based on quaternion algebra, is used for processing and classification [130]. Here data was collected from 120 users to create a decision rule for the iQSA method. The proposed system has three major parts. The first is signal acquisition through a Brain-Computer Interface (BCI), which translates electrical signals from the brain into computer input. In the second step, the acquired data is analysed and processed with the iQSA method. Finally, an SNN is used to mimic Central Pattern Generator (CPG) [131] behaviour to control the robot's movements. CPGs are specialised neural networks that are capable of creating rhythmic patterns without the use of sensory input. These patterns allow us to coordinate and control repetitive activities, such as chewing, swimming, walking, and running. Figure 9 presents the block architecture of the system. The system was implemented in real-time for performance evaluation. The experimental results show that the robot was able to replicate the gait pattern generated through the user's mental activity with a slight delay.
Similarly, Lele et al. [101] coupled a CPG with a Dynamic Vision Sensor (DVS) for a prey-tracking scenario in a closed-loop robotic system. Here the legs of a Hexapod robot are controlled by a network of spiking neurons. There are six neurons, each controlling one leg, and each spike of these neurons causes movement in the corresponding leg. For the learning of various gaits, a supervision-based weight adaptation algorithm is proposed. The results show that a maximum of three gaits can be programmed using this algorithm with high energy efficiency when implemented on modern neuromorphic hardware.

FIGURE 8. Brief block diagram of the motion generation approach. It contains three major components: first, the motion generation layer produces circular activity that creates activation patterns for primitives. Second, the motor control layer has an arm base primitive and arm correction primitives for the pointing motion and for pointing to the target, respectively. Third, the target layer takes the relative distance between the target and the base point for selective excitation to activate the correction primitives.

FIGURE 9. Communication architecture between the Brain-Computer Interface (BCI) and the Hexapod robot. The EEG signals are acquired through the Emotiv EPOC headset. These are then transferred to the iQSA module to determine robot movements. Finally, commands are given to the robot locomotion module via Bluetooth [131].
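The idea of six rhythm-generating neurons, one per leg, with each spike triggering a step of the corresponding leg, can be sketched as follows. This is a hand-wired illustration, not the learned network of [101]: the leg names, the tripod phase offsets, and the function `cpg_spike_times` are assumptions, the neurons are uncoupled free-running oscillators, and no weight adaptation is performed.

```python
import numpy as np

LEGS = ("L1", "R1", "L2", "R2", "L3", "R3")
# Tripod gait (assumed): L1, R2, L3 step together, half a cycle apart
# from R1, L2, R3.
TRIPOD_OFFSETS = (0.0, 0.5, 0.5, 0.0, 0.0, 0.5)


def cpg_spike_times(offsets=TRIPOD_OFFSETS, frequency_hz=1.0,
                    duration_s=3.0, dt=0.001):
    """Simulate six free-running oscillator neurons, one per leg.

    Each neuron integrates a constant drive and emits a spike (one leg
    movement) whenever its phase reaches 1, after which the phase is
    reduced by 1. No sensory input is used. Returns a dict mapping each
    leg name to its list of spike times in seconds.
    """
    phase = np.array(offsets, dtype=float)
    spikes = {leg: [] for leg in LEGS}
    n_steps = int(round(duration_s / dt))
    for step in range(1, n_steps + 1):
        phase += frequency_hz * dt  # constant drive, identical for all legs
        for i in np.flatnonzero(phase >= 1.0):
            spikes[LEGS[i]].append(step * dt)
            phase[i] -= 1.0
    return spikes
```

Changing the offsets yields other rhythmic patterns; in a learned CPG the phase relations would emerge from the synaptic weights rather than being set by hand.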
Alongside the above-discussed methodologies, the movement of robotic arms has also remained an area of focus. The model proposed by Zahra, Navarro-Alarcon, and Tolu [104] shows how a controller built at the cellular level guides the motion of the robot arm with real-time data from sensors. The model is inspired by the biological features of the cerebellum, monitoring the firing rate and pattern of the different groups of neurons in it. This model contains two layers of neurons, one input and one output, with all-to-all connections between them, to provide a transformation between two correlated spaces. Because the network correlates the spatial velocities in the two layers, it acts as a differential map, hence the name Differential Mapping Spiking Neural Network (DMSNN). STDP is used for learning. The model was tested with the UR3 robot from Universal Robots, where the elbow and shoulder joints are controlled to test the manipulation of the end effector to a specific position. The experiment shows that such a model can reduce error in a certain direction with fast convergence of learning. Moreover, it can be developed further for real-time adaptive robot control in multiple challenging environments and applications [132], [133]. In addition, another study showed a similar architecture where a virtual robot controlled by a specific SNN could independently learn simpler associative rules [30]. Furthermore, Tieck et al. [107] demonstrated the concept of soft-grasping to control a robotic arm with an SNN. The robotic arm is able to grasp objects of different sizes, stiffness and shapes without complex contact point planning or inverse kinematics calculations. This approach required only one example of each grasping motion to train the primitive, and it can be used on different robots with features similar to the human hand. In another experiment, they used sEMG signals to activate motion reflexes on a robotic hand. The trained network can classify the sEMG signals and detect finger activation. The reflexes of the finger are modelled with motion primitives and mapped to the robot kinematics [108], [109].

FIGURE. The basic block diagram of prosthetic control through a Brain-Computer Interface (BCI). As shown, the brain provides the EEG signals to the FeNeuCube framework which, in turn, gives instructions to the controller. Finally, the controller forwards the control command to the prosthetic hand [85].
There are also applications of neurorobotics in the field of medical science. For example, Kumarasinghe et al. [85] presented a proof-of-concept study for prosthetic control with a Brain-Computer Interface (BCI). They used finite automata theory and a NeuCube evolving SNN architecture. The prosthetic hand was designed to perform tasks such as grasping and touching. Figure 10 shows the basic architecture of the proposed BCI. Here learning happens in two stages. The first stage uses an unsupervised STDP learning rule, where evolving synaptic connections are formed according to the difference between the spike times of pre- and post-synaptic neurons. With this approach, the SNN cube will activate the same group of neurons when a similar input stimulus is provided. The second stage uses a supervised learning paradigm to update the connection weights between the output and hidden layers. The framework (FaNeuRobot) integrates the evolving spiking neural network model of the brain with finite automata that describe the neuromuscular behaviour of the forearm muscles and flexors during movement.
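The unsupervised first stage relies on STDP, in which the sign and size of a weight change depend on the relative timing of pre- and post-synaptic spikes. The sketch below shows the textbook pair-based form of the rule as a general illustration. It is not the specific rule or parameterisation used in NeuCube or in [85]; the function names, amplitudes, and time constants are assumed values.

```python
import math


def stdp_delta_w(delta_t, a_plus=0.01, a_minus=0.012,
                 tau_plus=0.02, tau_minus=0.02):
    """Weight change for one spike pair, with delta_t = t_post - t_pre (s).

    Pre-before-post (delta_t > 0) strengthens the synapse, post-before-pre
    (delta_t < 0) weakens it, and the effect decays exponentially with the
    size of the timing difference. All constants are illustrative.
    """
    if delta_t > 0:
        return a_plus * math.exp(-delta_t / tau_plus)
    if delta_t < 0:
        return -a_minus * math.exp(delta_t / tau_minus)
    return 0.0


def apply_stdp(weight, pre_spikes, post_spikes, w_min=0.0, w_max=1.0):
    """Accumulate pair-based STDP over all spike pairs and clip the weight."""
    for t_pre in pre_spikes:
        for t_post in post_spikes:
            weight += stdp_delta_w(t_post - t_pre)
    return min(max(weight, w_min), w_max)
```

For example, `apply_stdp(0.5, [0.010], [0.015])` raises the weight because the pre-synaptic spike precedes the post-synaptic one by 5 ms, while swapping the two spike times lowers it.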
Another interesting study focuses on solving mobility issues for disabled and elderly people. In most existing systems, a user has to control the robot manually, which can be difficult for elderly or disabled people. Obo et al. [106] presented a multi-modal interface to control the robot remotely. They also proposed a cognitive platform to control robots based on the concept of the perception-action cycle. Here a spiking neural network is used for spatio-temporal modelling of the interaction between the environment and the user. A self-organised neural network based on an unsupervised learning paradigm is used in this system. They developed a seat with pressure sensors, and the robot's movement depends on the user's movement while sitting on the seat. Besides this, a stationary eye-tracker (Tobii Eye Tracker 4C) is attached to retrieve the gaze and head pose. Figure 11 shows the summarised system architecture, which has two major parts, the perceptual system and the action system. The experimental results show that the teleoperation system can change the sensitivity of the interface according to the operation.

FIGURE 11. Summarised architecture of the system. It has two major parts: 1) the perceptual system, where the environmental map information is used to detect the space in which the robot can move around, and a self-organized neural network is used to extract perceptual information; 2) the action system, where behavioural features in teleoperation are extracted and commands are given to motor control. Based on the perception-action cycle, an SNN is used for spatio-temporal modelling.

E. COGNITION AND LEARNING APPLICATIONS
Biological systems generally have a memory, which is defined as the ability to preserve, learn and reproduce past adaptive states. As discussed earlier, synaptic plasticity is an important mechanism of memory and learning on a cellular level. Several mathematical models exist that can simulate cognitive maps, where synaptic plasticity yields the emergence of spatial memory in SNNs. Spatial memory in robots is used for storage and retrieval of information that is used to plan a route to a desired location and to remember where an object is located or where an event occurred [102]. In this subsection, instead of discussing the classifications of learning [1], we discuss the contributions related to spatial memory, learning and using that learning to predict the next action for social robots. Table 7 details the learning mode, rule and paradigm in each application.

FIGURE. Here the robot and LiDAR provide inputs to the SNN for mapping [113].
Spatial mapping is an important component for developing Simultaneous Localization and Mapping (SLAM) in social robots. Tang and Michmizos [113] presented an algorithm that solves the navigation problem for robots by learning the environment using a specialised neural network. This biology-inspired network runs on the Loihi neuromorphic chip, which interacts with the Robot Operating System (ROS) in real-time, see Figure 12. The robot is equipped with a 360-degree LiDAR sensor. Here the SNN uses a Winner-Take-All (WTA) structure and heterosynaptic competitive learning for place field generation, and dendrites for reference frame transformation. The algorithm produced an accurate environmental map by using error-free odometry signals from the Gazebo simulator.
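The winner-take-all principle mentioned above can be illustrated with a minimal rate-based sketch of competitive learning. It is a simplification made under stated assumptions, not the spiking, Loihi-based network of [113]: the function `wta_step`, the learning rate, and the weight normalisation are illustrative choices, and the input vector stands in for a hypothetical sensor feature vector.

```python
import numpy as np


def wta_step(weights, x, lr=0.1):
    """Run one winner-take-all competitive-learning step in place.

    weights: float array of shape (n_units, n_inputs), one row per unit.
    x: input vector of length n_inputs.
    Returns the index of the winning unit.
    """
    activations = weights @ x
    # Competition: only the most strongly activated unit wins.
    winner = int(np.argmax(activations))
    # Only the winner learns; its weight vector moves toward the input.
    weights[winner] += lr * (x - weights[winner])
    # Renormalise so that a unit cannot win simply by growing its weights.
    weights[winner] /= np.linalg.norm(weights[winner])
    return winner
```

Repeatedly presenting inputs from one region of the input space makes a single unit respond selectively to that region, which is loosely analogous to a place field forming for one location.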
Cyr and Thériault [99] used the concept of operant conditioning for a robot to learn the spatial map through reinforcement and punishment. The robot can learn the relationship between different visual stimuli irrespective of their pattern or location in the images. The final results show that when the rewarding rule is changed, the SNN can adapt its behaviour in real-time. They also presented two exploration strategies in a virtual robot controlled by an SNN. A virtual robot controller simulates thigmotaxis (the movement of an organism either towards or away from a stimulus) and boldness (the propensity to engage in risky behaviour). The network performed visual learning tasks solved through an operant conditioning procedure [115], [134]. Similarly, another article presents learning in robots where an SNN can implement several variations of learning through classical conditioning with positive or negative reinforcement. The SNN model is implemented on a Field-Programmable Gate Array (FPGA), and a Synapto-Dendritic Kernel Adapting Neuron (SKAN) model is used for neural delay [112].
The publication by Zharinov et al. [103] presented a model of spatial memory implemented on an SNN. This model was then tested on a robot moving in an environment with neutral and harmful regions. Here the dynamics of the neural population determine the movement of the robot. The STDP learning rule rearranges the SNN coupling and forms spatial memory according to the surroundings. After training, the robot learns to avoid the harmful region. A similar approach was presented by Lobov et al. [102], where the robot has to explore the surroundings and, after training is completed, determine where the harmful region is and avoid moving into it. The proposed network can remap neutral and harmful regions when the dangerous zone moves to another place. Thus, the robot adapts to the changing world. On the topic of memory, another model was presented for associative memory in the form of an SNN. A mobile robot was used to demonstrate how neuromorphic hardware can reduce energy consumption with the same computing power when implemented on a Loihi neuromorphic chip [111]. Moreover, associative learning tasks were also performed to investigate the abstract sameness/difference (SD) relation in bio-inspired robots. The model uses an artificial SNN as the robot's brain-controller and STDP as the learning rule. The experimental results show that robots can learn in different scenarios depending on their previous actions and the applied reinforcing rules [116]. Improvements have also been made in the working memory of robots using SNNs with dynamic synapses. The proposed model was able to refine, overwrite or resist changes in the duration and configuration of incoming stimuli. The authors showed how local changes in the environment can be accounted for by short- or long-term changes in synaptic plasticity. Online unsupervised learning with the STDP learning rule is used for the working memory of robots so that they can perform tasks in evolving environments [105].
Besides this, work has been done in the direction of episodic memory for robots, where a robot can perform more versatile cognitive tasks like exploration, localization, and navigation [110]. For example, if the robot has to make and serve milk tea, it needs to be familiar with the environment through a cognitive map and episodic memory to perform the sub-tasks in an appropriate order. Based on the idea of behaviour-based robotics, an algorithm is developed that can robustly perform navigation and exploration tasks. Additionally, another limitation for self-learning in robotics is explored, which is related to having a sense of time. A brain-inspired spatial cognitive system is developed that integrates episodic memory and a cognitive map. This system helps the robot to recognize and remember different locations while storing the correct sequence to perform difficult tasks. An algorithm was also presented to overcome the fundamental issue of designing the topologies and parameters for the SNN. It was evaluated with a simple sensory-motor decision task using evolutionary computation. The authors showed how some variations in topology and parameters can affect the behaviour of a system. In the experiments, the complexity of the task increased gradually so that the algorithm could evolve with it [42]. This approach can be used in different learning tasks for social robotics where the robot is in a continuously changing environment [135]. In addition, Huang, Wu, and Qiao [114] show how emotion plays an important role in value calculation, as an important intrinsic motivation of decision making. The TurtleBot3 used a model-based decision-making approach with emotional intrinsic rewards to solve continuous control problems. Similarly, another study presented a proof of concept for biologically plausible socio-emotional robots. The robot was able to execute an amygdala model to determine emotional state from visual input. The model was run on SpiNNaker, Loihi and Braindrop neuromorphic chips [119].
As mentioned earlier, developing brain-inspired networks that mimic the complex structure of the brain is a very difficult task. Collaboration between researchers from biology, neuroscience and computer science helps to design somewhat biologically realistic models of the brain as SNNs. To overcome this complexity, the Neurorobotics Platform was introduced to easily establish communication between available brain and body models. It is a web-based environment where brain models can connect to a detailed simulation of robot bodies [136]. There are built-in experimental sequences, environments, conditions, robots and brain-body connectors to help users with less programming background [3]. This project is part of the EU flagship Human Brain Project (HBP). The functional overview of the Neurorobotics Platform is presented in Figure 13. Figure 14 is an expanded form of Figure 1. It shows the sensors and chips in the hardware section. Here the red outlined boxes indicate chips that have not yet been used in robotics applications. The software section contains a list of simulators, frameworks and platforms. It also shows the neuron models typically used in robotics applications.

VI. FUTURE DIRECTION
In the previous sections, contributions and advancements in SNN-based social robotics have been reviewed in terms of their focused applications. This section focuses on the critical analysis of the reviewed articles. We discuss the shortcomings in this area and the major areas on which future research should focus.
Humanoid robots: Although many experiments are conducted in the area of neuromorphic computing for robotics, most of them are very basic when it comes to cognitive tasks [105], [110], [99]. Usually, they focus on robots moving around a dedicated environment, constructing a spatial map, or storing some information in their memory block. A few studies considered human involvement in the experiments for detection and identification, but they lack human-robot interaction in a social context. Besides this, none of the experiments were done with a humanoid robot. So, there is still a need to conduct experiments with a humanoid robot where the robot also interacts with humans instead of just interacting with the environment.
Neuromorphic Chips: As we can see in Table 4, only two types of neuromorphic chips are used in our reviewed papers. Experiments with Intel's Loihi chip are mostly related to improving memory or spatial mapping, while experiments with the SpiNNaker architecture are focused on behavioural learning and visual attention. In the experiments with SpiNNaker, the neuromorphic chip is not attached to the robot. The SpiNNaker system receives inputs from the sensors in the robot via the EIEIO protocol for processing and returns action commands. On the other hand, experiments with the Loihi chip do not involve humanoid robots. In most of the reviewed articles a TurtleBot is used, which is equipped with multiple sensors and a Loihi chip. This shows that there are still several open options, such as using other available neuromorphic chips and conducting experiments in more complex scenarios where a humanoid robot has to make decisions in real-time.
Hardware: Many experiments conducted in our reviewed articles use non-spiking sensors (e.g. a web camera is used instead of a retina camera [127]). These sensors send the signal to the processing unit, which converts it to spikes and feeds them to the neuromorphic hardware. This process could become faster and more energy-efficient if neuromorphic sensors were used instead, and both speed and energy efficiency are important requirements in social robotics.
Personalization: Another challenge faced by the social neurorobotics field is about personalising the robot. This can only be done through interdisciplinary research of roboticists and neuro-scientists. Usually, roboticists use simplified brain models to make real-time simulations, while neuro-scientists work on detailed brain models which are difficult to embed into the real world due to their high complexity. The community needs more solutions like the Neurorobotics Platform which provides adequate tools to model highly detailed environments, virtual robots, and complex neural networks for both roboticists and neuro-scientists.
Generalised Framework: None of the reviewed articles offers a general-purpose framework that provides both training and modelling functionality. Besides this, the available algorithms to convert ANNs to SNNs [137], [138], [57], [58] are still at a preliminary stage because the accuracy of converted SNN models is much lower than that of ANNs [139]. The underlying reason is that training SNNs for deep networks is notoriously difficult. In the future, advancements in these ANN-to-SNN conversion algorithms are required. Moreover, specialized mechanisms for analysing the performance of SNN models are lacking. There is a need for better mechanisms to evaluate computational characteristics such as power consumption and speed, which are of vital importance for social robotic applications.
Ethics: Lastly, some aspects have rarely been considered so far, such as ethical aspects in social neurorobotics. The development of social neurorobotics is still in its early stages, which makes it an ideal candidate for proactive and anticipatory ethical reflection. The major concern is trust and safety when it needs to be decided whether to use the robot in a social environment or not. Another aspect that might affect the adoption of social neurorobotics is data privacy, namely where and how data from the sensors of robots is processed and how this data is shared with another robot in a socially interactive environment. Therefore, research is needed to ensure that the development of neurorobotics is ethical, desirable, and socially acceptable.

VII. CONCLUSION
The neuromorphic computing approach has shown great promise for achieving human-like robotic intelligence in terms of computation, speed, and energy efficiency. This will be greatly beneficial in building robots that can exhibit realistic human-like social interaction. To support further research toward socially interactive neuromorphic robotics, this article delivered a systematic literature review of neuromorphic computing methods and tools that can be applied to improve socially interactive robotics. It first introduced the biological justification of SNNs along with their general architectures. After that, we presented mainstream neuron and synapse models that are used to design SNNs. The reviewed articles were grouped according to the application or problem they are trying to solve, and we highlighted five major areas of application (signal acquisition and processing, pattern recognition, speech recognition, motor control, cognition and learning). Finally, the most relevant neuromorphic chips, simulators, frameworks, and robots were discussed in view of their application to socially interactive robotics. Most of the reviewed articles focus on improvements of existing techniques for one specific application rather than exploring their full integration to build more capable systems. Only a few studies demonstrated social interaction between robots and humans in real-world scenarios, but the robots had several limitations and were able to interact only in a closed-loop fashion. Besides this, the lack of universal training methods and conversion mechanisms for neuromorphic models is also a major challenge for the development of human-like social interactions in neurorobots. To this end, we remark that more interdisciplinary collaborations, especially between roboticists and neuroscientists, are needed to fully merge neuromorphic computing and social robotics.