Computational Modeling of Prefrontal Cortex for Meta-Cognition of a Humanoid Robot

For robot intelligence and human-robot interaction (HRI), complex decision-making, interpretation, and adaptive planning processes are great challenges. These require recursive task processing and meta-cognitive reasoning mechanisms. In the human brain, these cognitive skills are realized by the prefrontal cortex, which is a part of the neocortex. Previous studies in neurocognitive robotics do not meet these requirements. This study therefore aims to develop a brain-inspired robot control architecture that performs spatial-temporal and emotional reasoning. We present a novel solution based on a computational model of the prefrontal cortex for humanoid robots. The computational mechanisms are mainly built on biophysically plausible neural structures embodied in different dynamics. The system is composed of several computational modules corresponding to the dorsolateral, ventrolateral, anterior, and medial prefrontal regions, and it is also responsible for organizing the working memory. A reinforcement meta-learning based explainable artificial intelligence (xAI) procedure is applied to the working memory regions of the computational prefrontal cortex model. Experimental evaluation and verification tests are carried out with the developed software framework embodied in the humanoid robot platform. The humanoid robot's perceptual states and cognitive processes, including emotion, attention, and intention-based reasoning skills, can be observed and controlled via the developed software. Several interaction scenarios are implemented to monitor and evaluate the model's performance.


I. INTRODUCTION
Today, humanoid robots work together with humans as personal assistants, and they are expected to take on more duties in the future. Owing to their bipedal structure and physical capabilities, humanoid robots are well suited to interacting with humans. Recent advancements in artificial intelligence and cognitive neuroscience can contribute to the evolution of their technologies [1], [2]. In the next decades, humanoid robots with cognitive skills are expected to be widely used in social areas such as assistive, entertainment, and rehabilitation fields [3]-[6]. Robots used in social areas require human-like cognitive skills such as reasoning, decision making, and problem-solving [4]. These embodied skills may organize very complex behavior patterns rather than perform deterministic or repetitive tasks [5]. Humanoid robots with enhanced embodied cognitive abilities can be used to assist disabled individuals struggling to interact with their social environment by guiding their accessibility and communication [4], [5].
The associate editor coordinating the review of this manuscript and approving it for publication was Wei Zhang.
As a long-term outlook of this study, humanoid robots whose intelligence rivals that of humans are expected to interact and collaborate socially in every social area, behaving as a part of humanity, so that they raise the living standards of society. To this end, the main purpose of this study is to construct a suitable computational framework of a cognitive model for a humanoid robot, because the cognitive skills that exist biologically in humans can help to realize these goals [7]-[9]. Therefore, it is essential to investigate the biological nature of cognitive systems from the viewpoint of a humanoid robot.
In nature, these cognitive functions and skills are biologically realized by cortical and cerebral lobes in the human brain [10], [11]. The anatomical structure of the cerebral cortex includes two main cortical structures, called the frontal and posterior parts [12], [13].
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
FIGURE 1. Human brain cortical zones [7].
Cognitive functions related to reasoning, planning, and working memory are carried out in the prefrontal cortex, which is a part of the frontal lobe of the cerebral cortex (Figure 1). This region is divided into four sub-areas: the dorsolateral, ventrolateral, anterior, and medial prefrontal regions [14]-[16]. The dorsolateral and ventrolateral prefrontal cortices realize spatial and temporal reasoning tasks, respectively. The anterior prefrontal cortex (Brodmann area 10) performs emotional reasoning tasks. The medial prefrontal cortex, which includes the anterior cingulate cortex (ACC), dorsomedial PFC (dmPFC), and orbitofrontal cortex (OFC) regions, is involved in working memory and meta-cognition. Monitoring and association of these reasoning skills are realized by this computational unit.
As a major challenge, this study focuses on solving interaction games and logical puzzles in order to investigate planning problems and interaction failures during human-robot interaction (HRI) [17]-[27]. An interaction game or logical puzzle can include memory-based sorting or prediction tasks. Such games and puzzles can be a useful testbed for integrating emotion, intention, and attention-based reasoning skills into these problems [26], [27]. The problem addressed in this paper is therefore whether a humanoid robot acting as a personal assistant or companion can achieve these goals by establishing rational social interaction with humans. Human-like social interaction between a robot and humans is very hard and requires high-level mental skills, including joint attention. Consequently, spatial-temporal reasoning, emotional reasoning, and meta-cognitive planning become critically important in the computational mental model of a humanoid robot during interaction experiments between robot and human [17]-[20]. While the humanoid robot attempts to interact with a human, the two can establish a link that includes communication channels for emotion (feeling), skill, and physical resource sharing [21]-[23]. Thus, a humanoid robot can be utilized as a multi-modal mediator in assistive case studies such as the rehabilitation of children suffering from attention deficit or autism spectrum disorders [24], [25]. Like these children, humanoid robots may fail to achieve accurate meta-cognitive planning, focusing, and sensory-motor association, and may fail to produce the right decisions, rational emotional responses, and precise motor commands that help to exhibit spatial behavior patterns (e.g., movement skills such as imitation gestures and object tracking) [22]-[24].
Recently, brain-inspired computational cognitive architectures have been developed for human-robot interaction (HRI) with humanoid robots to solve behavior planning and meta-cognitive reasoning problems [17]-[21]. The number of such projects is increasing rapidly, and the topic is expected to become more prominent in the future. Several remarkable examples of computational architectures based on cerebral/neocortex models of the brain have been presented in recent years [1]-[6]. In these studies, various machine learning methods were employed for modeling embodied cognition and adaptive behavior. Khamassi et al. developed a computational neurocognitive model of the prefrontal cortex for the navigation of rat-like robots [28]-[34]. In their model, reinforcement learning was utilized to realize the cognitive functions of these cortical structures. Yildirim et al. [35] introduced an integrative cognitive architecture for an object-driven cortex. They combined multiple AI methods, including causal generative models and hybrid symbolic-continuous planning algorithms with physics-based dynamics. To realize these, the object-driven cortex system draws on biological evidence from specific regions of the human brain. Gómez-Martínez et al. [36] put forward a bio-inspired self-responding emotional behavior system for virtual creatures. The model works as a concurrent, parallel distributed system that enables virtual creatures to adapt to their environment and generate more reliable behavior. A case study with an interaction scenario was implemented on a 3D simulation platform so that the results could be observed while the creature interacted with the environment. Hernández García et al. [37] utilized a biologically plausible spiking neural network to realize visual attention and object labeling for the iCub humanoid robot.
In that work, a cortical neural network model imitating human brain functions provides an associative memory that can learn the names of objects through visual attention. Experiments were conducted with both a simulated and a real iCub robot, which was capable of associating a label with an object when visual and word stimuli were provided simultaneously in the experiment scene. A planning and decision-making computational model that takes affective information into account was developed by Cervantes et al. [38]. This model, inspired by several brain areas, is organized into sub-modules providing related cognitive functions such as an emotional system, rational decision-making, episodic memory, and a motivational system. The claims of their study were tested in a case study whose experimental results included a notable comparison. Mizutani et al. [39] worked toward a whole-brain connectome architecture for achieving general-purpose artificial intelligence. Their study aims to develop a biologically meaningful AI architecture that includes neural circuit representations of the entire brain. Verschure recently reported a set of design principles underlying the mind, brain, and body nexus (MBBN) [40], proposing a brain-inspired cognitive architecture composed of several computational columns and layers. An embodied, biologically plausible model with adaptive behavior skills was studied by Maffei et al. [41]. In both models, the brain's ability to maintain strong interaction between an embodied agent and its environment through action is captured by the distributed adaptive control (DAC) architecture. The DAC-X architecture [41] is organized into two complementary structures of layers and columns, providing a computational neuro-cognitive architecture for use in robots. A set of layers, including reactive, adaptive, and contextual layers, describes the developmental stages of the architecture.
The columnar structures, which cover the processing of states of the world, the self, and the generation of action, describe functional sub-modules of the architecture. A range of hypotheses related to the work has been verified using humanoid and mobile robots in a range of application scenarios. Zeng et al. [42] worked on imitating self-consciousness using brain-inspired modeling for a humanoid robot. Their study aims to build an approximate computational model of the primate mirror neuron system to realize cognitive functions such as self-recognition. In the first stage, the robot learns the correlations between self-generated actions and visual feedback in motion using a spike-timing-dependent plasticity (STDP) algorithm. In the second stage, it learns the expected appearance of a body gesture whose visual feedback is consistent with its motion. During these tasks, the robot utilizes multi-sensor fusion to learn its own body in the real world and in the mirror. Vernon et al. [43] studied joint episodic-procedural memory in cognitive robotics. In their view, joint episodic-procedural memory enables internal simulation to be conditioned by the current context, semantic memory, and the agent's value system, which provides the motives, while context and semantics constrain the combinatorial explosion of potential perception-action associations and allow effective action selection to achieve the goals. Samsonovich proposed an emotional biologically inspired cognitive architecture (eBICA) based on mental schemas, with schemas and mental states as the major components and moral schemas embodied into appraisals as attributes [44]. In this model, patterns of social emotions and semantic spaces are represented and controlled by these appraisals, and emotional modulation shapes behavior and clusters emotional states in the arousal-valence domain.
In an experiment involving human subjects and virtual agents, the proposed architecture was tested based on a simple paradigm involving a virtual world. The results revealed that eBICA with a moral schema was able to manipulate human behavior. Such a framework may be useful for enabling collaboration between virtual partners and humans, self-regulated learning of virtual agents, and the development of realistic emotional intelligence. In addition, Sengor et al. reported a robot model based on cortico-striato-thalamic circuits, developing a computational model of the basal ganglia for use with a mobile robot [45]. The main purpose of this field of research is to investigate the potential use of robot models for implementing intelligent systems to inspire new approaches and techniques. A number of studies have examined reward-based computational mechanisms, shedding light on biological explanations of animal action selection processes. One study used a Khepera II mobile robot to study the implementation of goal-directed behavior. Moreover, computational neural models of cognitive processes have been utilized to describe higher-order processes like goal-directed behavior. A neuro-computational model of the auditory-cue fear acquisition was investigated by Navarro-Guerrero et al. [46], elucidating several principles of fear conditioning necessary for developing adaptive, self-protective systems, and suggesting that sensory-motor processing is an essential component of fear learning. This hybrid approach has been reported to be an effective tool for examining the temporal relationship between auditory sensory cues. A detailed study of computational mechanisms based on neural circuits in the brain has supported the development of safer robots and a better understanding of fear processing.
In this study, we present a novel, complete solution based on a computational model of the prefrontal cortex for humanoid robots, composed of several sub-modules corresponding to the dorsolateral, ventrolateral, anterior, and medial prefrontal regions. The main contribution of this framework is a computational approximation of consciousness and awareness, so that the humanoid robot can perform high-level (meta-cognitive) reasoning that associates spatial-temporal and emotional reasoning skills. One of the major impacts of embodying this architecture in a humanoid robot is that the robot, as a synthetic life form, establishes human-like interaction with society and its environment. Using a reinforcement learning algorithm, the spatial-temporal and emotional reasoning skills realize several processes, including attention, short-term memory, decision making (e.g., arithmetic and logical), planning, analysis of cause-effect relations, and problem-solving. These activities are monitored by the medial part of the computational prefrontal cortex model, which works as a reward mechanism. All cognitive transactions in the proposed prefrontal cortex model are organized in the working memory. The working memory, which is a type of associative memory, is stored in the weights of an attractor network. Forward and backward propagation in the network give rise to imagination, recall, and prediction abilities. In this framework, the weights (synaptic strengths) of the network related to working memory are updated by the various learning paradigms corresponding to the prefrontal cortex model. An unsupervised learning method is utilized for clustering the observation data constructed in the cognitive map (working memory). For attention, intention, emotion, and meta-cognitive reasoning, a probabilistic model utilizing reinforcement learning is considered for learning skills including inference, decision making, and planning in the robot.
In the proposed model, post-linguistic processing is realized by applying grammar rules, and the acquired linguistic statements are stored in the working memory.
The rest of the article is organized as follows. Section II explains the problem statement and the related hypotheses. Section III gives background knowledge about materials and methods. Section IV sets out the design principles of the computational model of the prefrontal cortex, which realizes working memory, decision making, meta-cognitive planning, spatial-temporal reasoning, and inference on the humanoid robot. Section V introduces the experimental platform and the implementation of the test scenarios. Section VI presents the experimental results and performance evaluation statistics. Finally, the discussion, concluding remarks, and future work are presented in Section VII.

II. PROBLEM STATEMENT AND EVALUATION CRITERIA OF THE GOAL
For decades, the functions of the prefrontal cortical regions, which are engaged as a form of cognitive control whenever behavioral procedures need to be modified or reorganized and which are responsible for regulating complex decision making, meta-cognitive planning, and reasoning, have been widely studied by neuro-cognitive and robotics researchers [47], [48]. These efforts have contributed to social interaction experiments with humanoid robots.
In this section, the major concern is how to evaluate or measure the objectives of the problem in a rational social interaction between the humanoid robot and the human. To this end, an interaction game, such as a logical puzzle including memory-based sorting or prediction tasks, is utilized to test the hypotheses that address the objectives of this study. The proposed study is therefore hypothesized to involve not only relations between the subdivisions of the lateral PFC (the dorsolateral, ventrolateral, and anterior regions) but also relations between the medial and lateral PFC, for the field of computational neuro-cognitive robotics and its social interaction applications.
In light of these facts, I propose a hypothesis (H1): memory-based sorting or prediction tasks, such as a logical puzzle game, are robustly solved by using spatiotemporal and emotional reasoning skills trained by reinforcement learning in a computational model of the lateral PFC during rational social interaction. Spatial reasoning skills performed by the dorsolateral PFC correspond to behavioral (movement-based) intentions. Attention-based mental activities are associated with the temporal reasoning skills realized by the ventrolateral PFC. Emotional reasoning skills are performed by the anterior PFC region. Accordingly, if all these skills, including attention, intention, and emotion-based reasoning activities, are employed simultaneously to contribute to the solution of the problem, the time needed to complete simple sorting or prediction tasks in the puzzle game is expected to decrease while the accuracy of the tasks increases during the experiments.
I also put forward a hypothesis (H2): the spatial-temporal and emotional reasoning skills in the lateral PFC are regulated by a computational model of the medial PFC during human-like social interaction with a humanoid robot. The computational model of the medial PFC, composed of several sub-modules including the anterior cingulate cortex (ACC), dorsomedial PFC, and orbitofrontal cortex (OFC), involves association and supervision procedures for monitoring and reorganizing these mental transactions. In the medial PFC, the computational aspect of meta-cognition, interpreted as a multi-modal reward mechanism, supports the reinforcement learning activities of the lateral PFC involved in spatiotemporal and emotional mental skills. Reward computation based on the action-reward association is carried out by the ACC dynamics, and reward computation grounded in the stimulus-reward association is executed by the OFC dynamics. For decades, many researchers have associated the concepts of consciousness and self-awareness with meta-cognition, which is considered to play a major role in human-like cognition. If the meta-cognition mechanism in the computational model of the medial PFC integrates appropriate rewards into all mental functions of the lateral PFC for regulating complex decision making, meta-cognitive planning, and reasoning activities, then convergence errors in the reinforcement learning of lateral PFC skills (attention, intention, and emotion-based reasoning activities) are expected to decrease while the success rates of the simple sorting or prediction tasks in the puzzle game increase during the experiments.
Another key issue is that the dopaminergic gain released by the ventral tegmental area (VTA), which tunes parameters such as the learning rate, reward discount factor, and exploration rate, affects not only the reinforcement learning phase but also the reward computation phase in the medial PFC. To realize synaptic neural plasticity, which can provide a great advantage for adaptation, an optimization algorithm is employed through an unsupervised learning method. Furthermore, depending on the conditions, synaptic neural plasticity can be parameterized via dopamine modulation to optimize convergence speed. In light of this, if an optimization algorithm acting as a neural plasticity mechanism helps to obtain a better dopaminergic gain parameter, learning is expected to be faster and decision-making more robust. This optimization mechanism with parametric modulation is also important for the interaction between short-term memory in the PFC and episodic memory (e.g., historical data related to past events during the interaction) provided by the hippocampal regions.

III. BACKGROUND ABOUT MATERIALS AND METHODS
To achieve the goals and examine the proposed hypotheses, the anatomical and functional features of the PFC in the human brain should be analyzed using the principles of mathematics, cognitive science, and neuroscience. According to cognitive-neuroscientific evidence and from the viewpoint of computer science, the prefrontal cortex, which is the key to developmental meta-cognition, might be interpreted through combinations of different computational methods and machine learning paradigms. As an artificial intelligence methodology, this approach can be employed in neurocognitive robotics, human-robot interaction, and social robotics applications.
Neural models are very useful for representing and encoding cognitive activities. Computational modeling of cognitive and mental processes in the human brain, for application in humanoid robots, requires high-density neural structures [7], [8]. Among these models, spiking neural networks (SNN) [49]-[51], including spike response models, integrate-and-fire models, the Izhikevich model, and biologically plausible models (the Hodgkin-Huxley model and conductance or ion-flow based models), are the most similar to biological neural systems. A spiking neural cell can be represented as a nonlinear circuit with a capacitor-like integrator. In addition, stochastic and chaotic behaviors such as dynamic attractions are taken into consideration. The conductance-based neural model and its population activity can be described in terms of the following quantities. $I_{ext}$ represents the external input current, including synaptic and interconnection currents. $V_j$ denotes the membrane potential of the $j$-th neuron, with capacitance coefficient $C$. $E_i$ denotes the reversal potentials of the ion channels. The conductance parameters $g_i$ and reversal potentials $E_i$ generate ion currents through the gate variables $\phi_i$, which are computed by ordinary differential equations. The ensemble or neural population activity is computed as $A_k(t)$, where $\delta$ and $N_k$ are the number of spikes in the time interval $\Delta t$ and the number of neurons in the $k$-th neural population, respectively. The cloud of spiking neurons is driven by firing rate models of a neural population. Besides the main neuron pool, there can be sub-populations called excitatory and inhibitory populations. The general form of the neural population density can be interpreted by the Jansen-Rit neural mass model equations [52]. There is a very close relationship between the signal relayed from the population density of a neural mass and electroencephalography (EEG) activity [53].
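As a hedged sketch, the conductance-based membrane equation and the population activity described above can be written in a standard textbook form using the symbols defined in this section. The sum over ion channels $i$, the sign convention, and the reading of $\delta$ as the spike count in the window $[t, t + \Delta t]$ are assumptions, so the exact form used in the model may differ.

```latex
C \frac{dV_j}{dt} = -\sum_i g_i \, \phi_i \, (V_j - E_i) + I_{ext},
\qquad
A_k(t) = \frac{1}{\Delta t} \, \frac{\delta(t;\, t + \Delta t)}{N_k}
```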
The term $A_k(t)$, which is the mean ensemble firing rate of the presynaptic excitatory input, is passed into the computational model of the neural mass. The population dynamics driven by the neural mass equation then provide information flow in the computational neural tissue. The neural population activity that stimulates the neural mass is generated from mean firing rate coding, which involves the synaptic firing density and the spike counts in a population of spiking neurons. Assembling small groups of neural populations produces larger complex structures (e.g., cortical regions or neural tissues) [54].
Their connection topologies allow brain activities with chaotic dynamic characteristics to be computationally reproduced for artificial brain frameworks.
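The relation between spiking neurons and the population activity $A_k(t)$ can be illustrated with a minimal Python sketch. It uses leaky integrate-and-fire neurons as a simplified stand-in for the conductance-based model; the function names, parameter values, and noise term are illustrative assumptions, not part of the proposed framework.

```python
import random


def simulate_lif_population(n_neurons=100, steps=1000, dt=1e-3, i_ext=1.6,
                            tau=0.02, v_thresh=1.0, v_reset=0.0,
                            noise=0.05, seed=0):
    """Simulate leaky integrate-and-fire neurons; return spike counts per step."""
    rng = random.Random(seed)
    # Random initial membrane potentials desynchronize the population.
    v = [rng.uniform(v_reset, v_thresh) for _ in range(n_neurons)]
    spikes_per_step = []
    for _ in range(steps):
        count = 0
        for j in range(n_neurons):
            # Forward-Euler step of tau * dV/dt = -V + I_ext, with a crude
            # per-step Gaussian perturbation added to the input (assumption).
            v[j] += (-v[j] + i_ext + rng.gauss(0.0, noise)) * dt / tau
            if v[j] >= v_thresh:
                v[j] = v_reset  # spike and reset
                count += 1
        spikes_per_step.append(count)
    return spikes_per_step


def population_activity(spikes_per_step, n_neurons, dt, window_steps):
    """A_k(t): spikes in a window divided by (N_k * window length), in Hz."""
    rates = []
    last_start = len(spikes_per_step) - window_steps
    for start in range(0, last_start + 1, window_steps):
        n_spikes = sum(spikes_per_step[start:start + window_steps])
        rates.append(n_spikes / (n_neurons * window_steps * dt))
    return rates


spikes = simulate_lif_population()
activity = population_activity(spikes, n_neurons=100, dt=1e-3, window_steps=50)
```

Each entry of `activity` is the mean firing rate of the population over one 50 ms window, which is the kind of signal that would be fed into a neural mass model.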
As a large-scale neural activity, field dynamics that visualize cognitive behaviors might be associated with functional magnetic resonance imaging (fMRI) [55], [56]. Dynamic neural fields (DNF), a dynamic field (DF) branch, are driven by the Amari equations [57], [58]. The field activities which exhibit certain stochastic and nonlinear dynamic properties behave like wave packets traveling along the neural field.
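As a hedged sketch, the Amari field equation in its commonly cited form, written with the symbols used in this section, would read as follows. The time constant $\tau$, the integration variables $x'$ and $y$, and the exact arrangement of the exogenous term are assumptions rather than the confirmed form of the model.

```latex
\tau \frac{\partial U_i(x,t)}{\partial t} = -U_i(x,t)
  + \int w_i(x - x') \, f\big(U_i(x',t)\big) \, dx'
  + \int S_j(x,y) \, I_d(y,t) \, dy
  + h
```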
Here, $x - x'$ is a spatial information distance from the mean of the cortical field. The weight matrix $w_i(x - x')$, which is a function of the spatial information distances, contains the connection strengths of the synaptic activation inside the field $U_i(x,t)$. The function $f$ is a sigmoid activation function. The parameter $h$ corresponds to the bias. External effects enter through the exogenous connection matrix $S_j(x,y)$ and the exogenous input field activity $I_d(y,t)$. As parallel and distributed computing blocks, the field dynamics of the computational neural tissue improve the model by employing biophysically meaningful spiking neuron populations and neural masses. According to the field activity $U_i(x,t)$, the dynamic associative memory related to cognitive skills can be shaped by multi-modal or strange attractors exhibiting chaotic behaviors, bifurcations, limit cycles, and chattering. Attractive or repulsive forces arise from the gradient of the field potentials in the n-dimensional $U_i(x,t)$. As a fragment of the short-term memory, the associative memory pattern related to a cognitive behavior is formed by the trajectory of the specific memory state $q_i(t)$. The attractor network, which is a nonlinear dynamic system, can be expressed by a differential equation in which $\alpha(t)$ scales the magnitude of the attractive or repulsive forces $\nabla U_i(x,t)$ and sets the speed of the memory state transition along the trajectory. The stability features of the attractor dynamics, such as convergent/divergent characteristics, saddle points, cyclic behaviors (oscillations), and point attractors (e.g., a stable focus or a basin of attraction), can be evaluated by Lyapunov analysis [59]. In addition, the nonlinear neural dynamics that contribute to the realization of cognitive skills provide a good opportunity for solving dynamic programming problems [60], [61]. Thus, the activities of cognitive skills can be interpreted as a dynamic programming system like a typical finite state machine with discrete state transitions [60], [62]. The probabilistic representation of the discrete state transitions can be evaluated by a Markov process, which is a stochastic system [60]-[62].
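The attractor dynamics described above can be sketched as a gradient flow. The example below assumes the form $dq/dt = -\alpha \nabla U(q)$ with a one-dimensional double-well potential standing in for the field potential; the potential, the sign convention, and all names are hypothetical choices made for illustration only.

```python
def grad_potential(q):
    """Gradient of the assumed double-well potential U(q) = (q**2 - 1)**2 / 4."""
    return q * (q * q - 1.0)


def recall(q0, alpha=1.0, dt=0.01, steps=2000):
    """Integrate dq/dt = -alpha * dU/dq with forward Euler from the cue q0."""
    q = q0
    for _ in range(steps):
        q -= alpha * grad_potential(q) * dt
    return q


# Two partial cues on either side of the unstable point q = 0.
state_a = recall(0.3)
state_b = recall(-0.2)
```

Starting from a partial cue, the memory state settles into the nearest point attractor (here $q = \pm 1$), which mirrors the recall behavior attributed to the associative memory; a larger `alpha` speeds up the transition along the trajectory.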
According to whether the sequential states are observable, the Markov process, which is similar to Bayesian networks, is divided into several types such as hidden Markov models (HMM), Markov decision processes (MDP), and partially observable MDPs (POMDP) [63], [64]. Bayesian networks, driven by Bayes' rule, can be represented as a graph topology of the state network that includes probabilistic causal relations $a_{ij}$ between the nodes (states).
The causal relations or transition probabilities are stored in a matrix that defines the network topology of Bayesian network-based or Markovian models [65], [66]. These models, which are essential for reasoning and inference, can utilize different training algorithms, including maximum likelihood estimation, mixture models (e.g., the Gaussian mixture model (GMM)), the Viterbi algorithm, the forward-backward and Baum-Welch algorithms, and the Bayes learning rule [64]-[67]. MDP-based models, which can be solved with value iteration algorithms, are usually related to reinforcement learning methods employing the Bellman learning rule [61]. Each sequential node (state) in a network of Bayesian or Markovian domains can carry extra computations, including decision-making processes [60], [66]. These processes, related to reasoning and inference operations, might involve not only deterministic or rule-based approaches but also probabilistic or stochastic approaches. Rule-based decision-making methods (e.g., decision trees, random forests, and fuzzy logic tools such as fuzzy cognitive maps) are used for rule-based decision operations that provide "if-then" cause-effect reasoning [68]-[72]. Decision tree approaches, which consist of stacked rules, employ a hierarchically structured form of nodes and branches [68], [70]. A random forest, an ensemble learning method that bundles multiple decision trees, is aggregated via a stochastic approach for classification, regression, and other complex decision-making tasks with multi-modal goals [69], [70]. Several supervised or unsupervised learning methods can be applied to these models. In addition, probabilistic or fuzzy uncertainties can be attributed to decision-making tasks. Different rules related to these tasks can be created via fuzzy variables whose uncertainty is expressed by a membership function [71].
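The value iteration mentioned above for MDP-based models can be sketched in a few lines of Python. The three-state chain, its transition probabilities, rewards, and the discount factor below are made-up illustrative values, not data from the proposed model.

```python
def value_iteration(transitions, rewards, gamma=0.9, tol=1e-8):
    """Solve a small MDP with the Bellman optimality backup.

    transitions[s][a] is a list of (probability, next_state) pairs and
    rewards[s][a] is the expected immediate reward of action a in state s.
    """
    values = [0.0] * len(transitions)
    while True:
        delta = 0.0
        for s in range(len(transitions)):
            best = max(
                rewards[s][a]
                + gamma * sum(p * values[s2] for p, s2 in transitions[s][a])
                for a in range(len(transitions[s]))
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


# Hypothetical chain: action 0 = stay, action 1 = try to advance (80% success).
transitions = [
    [[(1.0, 0)], [(0.8, 1), (0.2, 0)]],
    [[(1.0, 1)], [(0.8, 2), (0.2, 1)]],
    [[(1.0, 2)], [(1.0, 2)]],  # absorbing goal state
]
rewards = [[0.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
values = value_iteration(transitions, rewards)
```

The transition lists play the role of the matrix of transition probabilities $a_{ij}$, with one such matrix per action.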
The fuzzy cognitive map (FCM) introduced by Bart Kosko is defined as a network topology of causal relationships between the nodes (e.g., concepts, events) [72], [73]. For rule-based FCM, these nodes can be replaced by the rules with fuzzy variables [74], [75]. As a typical variation of FCM, the causal relationships can have probabilistic functions. Hebbian learning or genetic algorithm based training methods have been mostly utilized in FCM [76].
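A fuzzy cognitive map of this kind can be sketched as a weighted graph whose concept activations are updated iteratively. The update rule below (previous activation plus weighted inputs, squashed by a sigmoid) is one common variant and is assumed here; the three concepts and their causal weights are hypothetical.

```python
import math


def fcm_step(state, weights):
    """One synchronous update: A_j <- sigmoid(A_j + sum_{i != j} A_i * w[i][j])."""
    n = len(state)
    new_state = []
    for j in range(n):
        total = state[j] + sum(state[i] * weights[i][j] for i in range(n) if i != j)
        new_state.append(1.0 / (1.0 + math.exp(-total)))
    return new_state


def run_fcm(state, weights, max_iters=200, tol=1e-9):
    """Iterate the map until the activations stop changing."""
    for _ in range(max_iters):
        new_state = fcm_step(state, weights)
        if max(abs(a - b) for a, b in zip(new_state, state)) < tol:
            return new_state
        state = new_state
    return state


# weights[i][j] is the causal influence of concept i on concept j (made up).
weights = [
    [0.0, 0.6, -0.4],
    [0.0, 0.0, 0.7],
    [0.3, 0.0, 0.0],
]
final_state = run_fcm([0.8, 0.2, 0.5], weights)
```

The resulting fixed point can be read as the steady-state degree of activation of each concept under the given causal relationships.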
Because a very large amount of data processing is required to realize the cognitive skills and execute the mental activities of the PFC, different deep learning paradigms can be considered, such as the convolutional neural network (CNN), recurrent neural network structures like long short-term memory (LSTM), and deep reinforcement learning models such as the deep Q-network (DQN). Depending on the conditions, they might be combined into a hybrid machine learning approach.
In the classical structure of a convolutional neural network, which has a strong feature learning capacity resembling high-level abstraction processes in the human brain, the cascaded layers are stacked as [input (x) - convolution layer - ReLU - max pool layer] (Figure 2). The convolution operator requires convolution filters (weights) in the form of a rectangular prismatic tensor. The rectified linear unit (ReLU) is an activation function. The network is trained by stochastic gradient descent (SGD) with momentum so that the features to be encoded are hierarchically extracted from the pattern of large-scale neural transactions [77]-[79]. As a recurrent neural network (RNN), the LSTM (Figure 3.a), which resembles memory activations in the human brain, can be constructed with convolutional layers. It is capable of retaining short-term memory contexts over long periods in the episodic memory pattern [77], [78], [80]. A conventional LSTM unit is composed of a memory cell with input, output, and forget gates. For training, the backpropagation through time algorithm can be used [80], [81].
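The [convolution - ReLU - max pool] stage described above can be sketched in plain Python on a single-channel image. This is a minimal illustration with a hand-picked edge filter; a real network would learn its filters with SGD and would normally use a tensor library, and all names here are illustrative.

```python
def conv2d(image, kernel):
    """Valid 2-D cross-correlation, the operation CNN layers call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def relu(fmap):
    """Rectified linear unit applied element-wise."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling with a size x size window."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]


# 5x5 image with a vertical edge between columns 1 and 2.
image = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
# Hand-picked filter that responds to dark-to-bright vertical edges.
kernel = [[-1.0, 1.0], [-1.0, 1.0]]
features = max_pool(relu(conv2d(image, kernel)))
```

The pooled feature map keeps a strong response only where the edge lies, which is the kind of hierarchical feature extraction the stacked layers are meant to provide.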
The deep Q-network (DQN), which combines the Q-learning algorithm with deep neural networks, can be re-organized as an LSTM network embodied with convolutional layers [82]- [84] (Figure 3.b). As a reinforcement learning method, DQN based models, which allow sequential decision-making tasks, can be associated with meta-cognitive activities in the medial PFC of the human brain. In addition, compared with a conventional reinforcement learning approach, these methods with deep neural networks provide more flexible function approximation to reach human-level performance on rational social interaction involving complex decision-making skills. For optimization in the training phase, the loss function of the network is computed with respect to the learnable parameter set θ. The experience replay memory (D) is a set of tuples (s, a, r, s'). Here r is the reinforcement reward and Q(s, a) is the Q value of a state (s) and action (a) pair. The SGD algorithm minimizes the loss function on samples drawn from the experience replay memory, which is updated with new tuples during training.
For the computational PFC model of the humanoid robot, additional model parameters (e.g., dopaminergic gain, motivation factor, learning rate) can be optimized by an adaptation algorithm. The optimization mechanism that employs the adaptation algorithm is mostly chosen from unsupervised learning methods (e.g., the self-organizing map (SOM)) or metaheuristic methods (e.g., the genetic algorithm (GA)). According to the chosen fitness criteria, the parameters to be adapted can be searched by a genetic algorithm. During this process, conventional operations including mutation and crossover are utilized to obtain optimal values of the parameters [85], [86]. The self-organizing map is preferred for tasks such as clustering and dimension reduction on the parameters to be adapted [87], [88].
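As an illustration of the DQN loss over the experience replay memory D, the following sketch assumes the standard mean squared temporal-difference error with a separate target parameter set. The linear Q-function, the three replay tuples, and all names are hypothetical stand-ins for the deep network; terminal states are ignored for brevity.

```python
import random

def q_values(theta, state):
    """Toy linear Q-function: one weight row per action (stand-in for the network)."""
    return [sum(w * x for w, x in zip(row, state)) for row in theta]

def dqn_loss(theta, theta_target, batch, gamma=0.9):
    """Mean of (r + gamma * max_a' Q(s', a'; theta_target) - Q(s, a; theta))^2."""
    total = 0.0
    for s, a, r, s_next in batch:
        target = r + gamma * max(q_values(theta_target, s_next))
        td_error = target - q_values(theta, s)[a]
        total += td_error ** 2
    return total / len(batch)

# Hypothetical replay memory D of (s, a, r, s') tuples.
replay = [
    ([1.0, 0.0], 0, 1.0, [0.0, 1.0]),
    ([0.0, 1.0], 1, 0.0, [1.0, 0.0]),
    ([1.0, 1.0], 0, 0.5, [1.0, 0.0]),
]

rng = random.Random(0)
batch = rng.sample(replay, 2)          # mini-batch drawn from D
theta = [[0.0, 0.0], [0.0, 0.0]]       # two actions, two state features
loss = dqn_loss(theta, theta, batch)
```

In practice the gradient of this loss with respect to θ drives the SGD update, while the target parameters are refreshed only periodically to stabilize training.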

IV. COMPUTATIONAL PREFRONTAL CORTEX MODEL
The computational model of the prefrontal cortex (Figure 4), which is crucial for many mental abilities such as meta-cognition, consciousness, and self-awareness, is built on a dynamic domain (e.g., neural field, attractor model) based on large-scale ensembles of bio-inspired spiking neural structures. With this approach, mental activities including cognitive skills, behaviors, emotions, and rational thinking (planning or sequential decision-making tasks), which are represented as a sequence of statements or stacked rules, can be dynamically encoded by temporal synaptic activations (e.g., spiking, ensemble, and field dynamics) in high-density complex network topologies (weights). The computational neural structure of the PFC, which is consistent with its neuro-morphological equivalent (e.g., the human brain), is composed of several subregions (or modules): the dorsolateral (dlPFC), ventrolateral (vlPFC), anterior (aPFC), and medial (mPFC) prefrontal cortex areas. Two major data streams coming from the parietal and temporal regions (lobes) of the sensory cortex are propagated into the dlPFC and vlPFC modules of the computational model of lateral PFC by the dorsal (where) and ventral (what) pathways, respectively. In addition, the prefrontal cortex is supported by connections to limbic system components including the amygdala, thalamus, hippocampus, basal ganglia, and hypothalamus so that it regulates cognitive skills like decision making and planning. Extensive parallel and distributed neurocognitive transactions, which employ stochastic recursive mathematical models and dynamic nonlinear programming methodologies, occur in the computational PFC model, which includes hybrid machine learning algorithms such as recurrent and reinforcement deep learning with partially observable variables. These partially observable variables, or internal states, are represented by a sequence of statements or stacked rules in the proposed large neural network structure.
In order to achieve adaptation, a set of tunable hyper-parameters, such as the dopaminergic gain vector and the learning and exploration rates, can be optimized by metaheuristic learning methods (e.g., a genetic or evolutionary algorithm) so that learning processes and decision-making tasks are performed faster and more robustly.
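A minimal sketch of such a metaheuristic search is given below, assuming a small genetic algorithm with elitism, uniform crossover, and Gaussian mutation. The search ranges in `BOUNDS`, the population settings, and the quadratic `toy_fitness` are hypothetical; in the real system the fitness would come from the measured learning performance of the robot.

```python
import random

# Hypothetical search ranges; the study does not list them.
BOUNDS = {
    "learning_rate": (1e-5, 1e-2),
    "exploration_rate": (0.01, 1.0),
    "dopaminergic_gain": (0.0, 2.0),
}

def random_individual(rng):
    """Sample one hyper-parameter set uniformly inside the bounds."""
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in BOUNDS.items()}

def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each gene is copied from one of the two parents."""
    return {k: (parent_a[k] if rng.random() < 0.5 else parent_b[k]) for k in BOUNDS}

def mutate(individual, rng, rate=0.2):
    """Add Gaussian noise to some genes and clip them back into the bounds."""
    child = dict(individual)
    for k, (lo, hi) in BOUNDS.items():
        if rng.random() < rate:
            noisy = child[k] + rng.gauss(0.0, 0.1 * (hi - lo))
            child[k] = min(hi, max(lo, noisy))
    return child

def evolve(fitness, generations=30, pop_size=20, seed=0):
    """Keep the best quarter each generation and refill with mutated offspring."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        elite = ranked[: pop_size // 4]
        children = []
        while len(elite) + len(children) < pop_size:
            parent_a, parent_b = rng.sample(elite, 2)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = elite + children
    return max(population, key=fitness)

def toy_fitness(individual):
    """Hypothetical fitness that peaks at arbitrary target values."""
    target = {"learning_rate": 0.00025, "exploration_rate": 0.1, "dopaminergic_gain": 1.0}
    return -sum(
        ((individual[k] - target[k]) / (BOUNDS[k][1] - BOUNDS[k][0])) ** 2
        for k in BOUNDS
    )

best = evolve(toy_fitness)
```

Because the elite individuals are carried over unchanged, the best fitness in the population never decreases from one generation to the next.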

A. LATERAL PREFRONTAL CORTEX (LPFC)
The computational model of lateral PFC, which includes dlPFC, vlPFC, and aPFC, allows memory-based sorting or prediction tasks such as a logical puzzle game to be solved robustly by using spatiotemporal and emotional reasoning skills during rational social interaction. The observations $o_t$ are fetched as system inputs.
As exogenous stimuli, dlPFC accepts and processes the spatial data stream (dorsal or where pathway). In the computational model of lateral PFC, the dorsolateral prefrontal cortex (dlPFC) realizes spatial cognitive skills including intentions, spatial planning, and decision-making tasks. These tasks are related to the spatial properties of objects or events, including coordinates, orientations, distances, dimensions, and movements. Moreover, dlPFC is involved in developing the working memory related to spatial thinking. It is linked to the ventrolateral, anterior (Brodmann area 10), and medial prefrontal cortex regions. As an output data stream, the connections between the basal ganglia and dlPFC are associated with action selection on sequences of motor commands and with procedural memory development. In addition, the connections between dlPFC and the motor regions (or motor cortex) are responsible for the generation of motor commands (or actions). The dorsolateral part of the state transition model $P_{\mathrm{dlPFC}}(dl(t+1) \mid e(t), vl(t), a(t))$ and of the observation model $P_{\mathrm{dlPFC}}(od(t) \mid e(t+1), vl(t+1), a(t))$ are derived by a computational dynamic memory model including high-level neural transactions, such as field activations and spiking behaviors, which are generated by a certain number of spiking neurons.
In the computational model of lateral PFC, the temporal data stream (ventral or what pathway), which is an exogenous stimulus, is processed by the ventrolateral PFC (vlPFC). It realizes temporal cognitive skills including attention, temporal planning, and decision-making tasks. These tasks are related to the temporal properties of objects or events, including descriptions, shapes, and colors. Furthermore, vlPFC is effective in developing the temporal part of the working memory related to non-spatial thinking. It is connected to the dorsolateral, anterior (Brodmann area 10), and medial prefrontal cortex regions. The neocortex-inspired computational dynamic memory model, including high-level neural transactions such as field activations and spiking behaviors, is employed to derive $P_{\mathrm{vlPFC}}(vl(t+1) \mid e(t), dl(t))$ and $P_{\mathrm{vlPFC}}(ov(t) \mid e(t+1), dl(t+1))$, which are the ventrolateral PFC parts of the state transition and observation models, respectively.
Emotion related cognitive skills such as emotional memory processes, emotional reasoning, and inference tasks are performed by the anterior prefrontal cortex (aPFC) module. The aPFC accepts an exogenous input data stream that includes emotional responses released from the amygdala as a part of the limbic system. In addition, aPFC, which provides excitatory or inhibitory control over the dlPFC and vlPFC modules, decides how emotional information (states) influences behavior sequences. A computational dynamic memory model driven by neocortex-like high-level neural transactions, such as field activations and spiking behaviors, is utilized to obtain the aPFC part of the state transition model $P_{\mathrm{aPFC}}(e(t+1) \mid dl(t), vl(t))$ and of the observation model $P_{\mathrm{aPFC}}(oa(t) \mid dl(t+1), vl(t+1))$. The belief state b is updated by belief propagation over these state transition and observation models. The reward function ρ(b, a), which depends on the belief states, is computed from the reward function defined on the world states.
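The belief propagation and the belief-based reward mentioned above can be sketched with the standard update for a partially observable Markov decision process. The sketch assumes a flat discrete state space rather than the factored (dl, vl, e) state of the model, and the transition table `T`, observation table `O`, and reward table `R` are hypothetical.

```python
def belief_update(belief, action, observation, T, O):
    """Standard update: b'(s') is proportional to O[a][s'][o] * sum_s T[a][s][s'] * b(s)."""
    n = len(belief)
    unnormalized = [
        O[action][s_next][observation]
        * sum(T[action][s][s_next] * belief[s] for s in range(n))
        for s_next in range(n)
    ]
    z = sum(unnormalized)
    if z == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [u / z for u in unnormalized]

def belief_reward(belief, action, R):
    """rho(b, a) = sum_s b(s) * R[s][a], the expected world-state reward."""
    return sum(b * R[s][action] for s, b in enumerate(belief))

# Hypothetical two-state, one-action, two-observation model.
T = [[[0.9, 0.1], [0.2, 0.8]]]      # T[a][s][s']
O = [[[0.8, 0.2], [0.3, 0.7]]]      # O[a][s'][o]
R = [[1.0], [-1.0]]                 # R[s][a]

b0 = [0.5, 0.5]
b1 = belief_update(b0, action=0, observation=0, T=T, O=O)
rho = belief_reward(b0, 0, R)
```

After observing `o = 0`, which is more likely in state 0, the belief shifts toward state 0, and the expected reward of the uniform prior belief is zero for this reward table.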
The Q values of the system are learned according to the Bellman equation, and the optimal value $V_t(b)$ is obtained by maximizing the Q values with respect to the actions. The computational model of lateral PFC is optimized by minimizing the loss function $L(b, a; \theta_i)$. The network weights $\theta_i$ are updated by the gradient descent algorithm, where α is a parameter related to training speed. These updates yield the optimal Q values by decreasing the difference between the approximate Q values $Q_t(b, a; \theta_i)$ and the actual (target) Q values $Q_{\mathrm{target}}(b, a; \theta_i)$, which are computed by the Bellman equation.
The computational model of lateral PFC (Figure 5), consisting of a deep reinforcement learning algorithm with partially observable dynamics, is constructed as an LSTM network embodied with convolutional layers. The observation information and the computed reward coming from mPFC are concatenated and represented by $x_t$, which is fed into the LSTM. $c_t$ indicates the cell memory, and the obtained network output $h_t$ is associated with the approximate Q values. Owing to the contribution of the cognitive skills, including attention, intention, and emotion-based reasoning activities, to the solution of the rational social interaction problem with a humanoid robot, it is expected that the durations of simple sorting or prediction tasks in the puzzle game decrease while the accuracy of the tasks increases during the experiments.
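The roles of $x_t$, $c_t$, and $h_t$ can be illustrated with one step of a conventional LSTM cell. This is a minimal sketch: the convolutional front end is omitted, the hidden size is set equal to a hypothetical number of actions so that $h_t$ can be read directly as approximate Q values, and the all-zero parameters from `zero_params` are placeholders for trained weights.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(weights, bias, vector):
    """Compute W @ vector + bias for a list-of-rows weight matrix."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step with input (i), forget (f), and output (o) gates."""
    z = list(x_t) + list(h_prev)                      # concatenate input and state
    i_gate = [sigmoid(v) for v in affine(params["W_i"], params["b_i"], z)]
    f_gate = [sigmoid(v) for v in affine(params["W_f"], params["b_f"], z)]
    o_gate = [sigmoid(v) for v in affine(params["W_o"], params["b_o"], z)]
    g_cand = [math.tanh(v) for v in affine(params["W_g"], params["b_g"], z)]
    c_t = [f * c + i * g for f, c, i, g in zip(f_gate, c_prev, i_gate, g_cand)]
    h_t = [o * math.tanh(c) for o, c in zip(o_gate, c_t)]
    return h_t, c_t

def zero_params(input_size, hidden_size):
    """Placeholder parameters; a trained network would supply real values."""
    params = {}
    for gate in "ifog":
        params["W_" + gate] = [[0.0] * (input_size + hidden_size)
                               for _ in range(hidden_size)]
        params["b_" + gate] = [0.0] * hidden_size
    return params

observation = [0.2, 0.7, 0.1]        # hypothetical observation features
reward = 1.0                         # hypothetical reward from mPFC
x_t = observation + [reward]         # concatenated LSTM input
h_t, c_t = lstm_step(x_t, [0.0, 0.0], [1.0, 1.0], zero_params(4, 2))
greedy_action = max(range(len(h_t)), key=h_t.__getitem__)
```

With zero parameters every gate equals 0.5, so the cell memory is simply halved; this makes the data flow easy to check by hand before real weights are plugged in.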

B. MEDIAL PREFRONTAL CORTEX (MPFC)
As one of the most critical regions, the medial PFC module, which is composed of several sub-areas including dmPFC, vmPFC (OFC), and ACC, has a supervisor or regulator role over the lateral PFC. During a rational social interaction between a human and the humanoid robot, the states of the spatial-temporal and emotional reasoning skills are monitored by a computational model of mPFC (Figure 6). This model is responsible for cognitive and mental processes such as monitoring and reorganizing complex decision making (e.g., planning or problem-solving), attention, meta-reasoning, associative working memory, adaptive learning, behavior execution, emotion regulation, multi-modal integration of action-reward and/or stimulus-reward associations, language processes, and expectation (prediction) tasks.
An arousal signal produced by mPFC is released to the hypothalamic module so that the motivational gain (e.g., hypothalamic responses) modifies perceptual parameters to optimize cognitive perception skills in the sensory cortex regions. In addition, past information related to long-term or episodic memory, coming from the hippocampus, is observed for the reward computation processes. Besides, distinctive information related to short-term or working memory, processed by the modules of the PFC, is broadcast to the hippocampus model. In order to optimize the parameters of the reward computation process, the dopaminergic gain released by the ventral tegmental area (VTA) is employed by the sub-modules of the computational medial PFC. As a property of meta-cognition in a rational social interaction scenario between the human and the humanoid robot, the reinforcement learning activities of the lateral PFC, which involve spatiotemporal and emotional mental skills, are controlled via multi-modal rewards computed by the sub-modules of mPFC. This approach works like an inverse reinforcement learning methodology that realizes a reward estimation function. At the first stage, the internal sub-states (c), inferred from $b(s_t)$ and the hippocampal information ($o_t^{hc}$, $a_t^{hc}$), are estimated by the random forest model.
$$R_{\mathrm{acc}}(s_t, a_t) = P(c \mid b, a_t^{hc}) \tag{16}$$
As a part of the multi-modal reward computation activities, the action-reward association functions $R_{\mathrm{acc}}(s_t, a_t)$ are performed by the ACC module, which is sensitive to past actions. A sequence of past actions shapes these activities. Also, the reward computation processes related to the stimulus-reward association functions $R_{\mathrm{ofc}}(s_t, a_t)$, which are realized by the OFC module, are modified via a sequence of past observations (stimuli). The fusion of these associative functions (the stimulus-reward and the action-reward) is supervised by dmPFC. In addition, the dmPFC module undertakes failure detection (or conflict monitoring) tasks.
In order to realize reasoning tasks that can create cause-effect relationships through these internal sub-states (c), a deep belief network (DBN) consisting of convolutional layers is utilized to train the rule-based fuzzy cognitive map. This map executes a rule base of if-then like statements, which yields consecutive rational (cause-effect) relations such as $(c \xrightarrow{\phi_i} c)$, where $\phi_i$ defines the weight matrix of the network. This network is tuned by an evolutionary algorithm with respect to the network cost function (fitness) so that its performance is guaranteed to increase or be maintained. Its training procedure is based on a gradient descent algorithm.
The reinforcement learning process, driven by the computed reward released from mPFC, is performed on the spatiotemporal and emotional reasoning skills hosted by the modules of the lateral PFC. As a result, while the success rates of the memory-based simple sorting or prediction tasks in a rational social interaction scenario including a logical puzzle game increase, the convergence errors in the reinforcement learning of lateral PFC related skills, including attention, intention, and emotion-based reasoning activities, are expected to decrease through meta-cognitive planning and reasoning activities during the experiments.

V. EXPERIMENTAL SETUP AND IMPLEMENTATION
In this study, the main experiment platform is a Bioloid humanoid robot from Robotis (Figure 7) [89]. Bioloid is composed of 24 smart servo actuators (AX-12A), several peripheral body sensors (e.g., SparkFun 9 DoF Razor IMU M0, IR transmitters, 2-axis gyro, proximity sensor for distance measurement), and the main controller CM530 containing a 32-bit Arm Cortex based microcontroller, external ... In order to perform visual perception processes, the humanoid robot is supported by a ZED 3D stereo camera with 2K video, 6 DoF positional tracking, and 20 m depth spatial mapping [91]. For the audio environment, additional equipment including a 2 × 2 microphone array and a stereo speaker is mounted on the robot's head, which has a 2 DoF pan-tilt neck. The computational workload of the system architecture is hosted on an Nvidia Jetson TX2 embedded platform [92]. Brief specifications of the embedded system include a 64-bit quad-core Arm Cortex A57 processor at 2.1 GHz with 2 MB L2 cache, a dual-core Denver processor at 2.0 GHz with 2 MB L2 cache, 8 GB of 128-bit DDR4 RAM at 59.9 GB/s, an Nvidia Pascal architecture GPU with 256 CUDA cores, and 32 GB eMMC and SATA storage. For control, connections to the humanoid robot can be made via Bluetooth or USB 3.0 ports. We mostly preferred the USB connection in the experiments.
As an implementation experiment (Figure 8), the interaction scenario is based on a game-like logical puzzle including memory-based sorting and prediction tasks (e.g., the Tic-Tac-Toe game, the memory matching game, and the Wisconsin card sorting test (WCST)), which are used for evaluating cognitive functions and diagnosing neurological disorders such as schizophrenia, dementia, and other prefrontal cortex lesions [93]- [96]. In the WCST, people must categorize cards with respect to four different criteria (color, shape, number, and symbol) using only feedback on whether the classification is correct [97], [98]. The classification rule changes every 10 cards, and the participant, who is likely to make one or more mistakes when the rule changes, is expected to discover the new rule. The task evaluates how well participants can adapt to the changing rules [99], [100]. In the memory matching game, the cards are first shuffled randomly and placed face down on an n × n grid board. In each move, the player turns over any two cards and should remember what was on each card and where it was. If the two cards match, the player keeps them; otherwise, they are turned back over. Tic-tac-toe is a turn-taking logical game in which the participant places an "X" card in a square while the opponent places an "O" card in another square. In this game, played on an n × n grid board, the player who places all "X" or "O" cards in a straight line (e.g., a row, column, or diagonal) wins the match.
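The rule-switching structure of the WCST described above can be sketched as a small feedback environment. This is a simplified illustration, not the experimental software: the participant directly names a sorting criterion instead of matching physical cards, and the class name `WCSTTask`, the seed, and the shift strategy in `play` are hypothetical.

```python
import random

CRITERIA = ("color", "shape", "number", "symbol")

class WCSTTask:
    """Hidden sorting rule that changes after every `cards_per_rule` cards."""

    def __init__(self, cards_per_rule=10, seed=0):
        self.rng = random.Random(seed)
        self.cards_per_rule = cards_per_rule
        self.rule = self.rng.choice(CRITERIA)
        self.cards_seen = 0

    def feedback(self, chosen_criterion):
        """Return only whether the classification was correct, then advance."""
        correct = chosen_criterion == self.rule
        self.cards_seen += 1
        if self.cards_seen % self.cards_per_rule == 0:
            # Switch to a different rule without telling the participant.
            self.rule = self.rng.choice([c for c in CRITERIA if c != self.rule])
        return correct

def play(task, n_cards=40):
    """Win-stay / lose-shift player: keep a criterion until it fails."""
    guess = CRITERIA[0]
    correct_count = 0
    for _ in range(n_cards):
        if task.feedback(guess):
            correct_count += 1
        else:
            guess = CRITERIA[(CRITERIA.index(guess) + 1) % len(CRITERIA)]
    return correct_count

score = play(WCSTTask(seed=0), n_cards=40)
```

Even this simple strategy needs at most three errors to recover after each rule change, which is the kind of adaptation cost the task is designed to measure.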
Achieving these tasks may involve several cognitive functions, such as intention, attention, and emotion, simultaneously [101]- [103]. Intention, which involves requests for an action sequence, is related to the spatial skills originating in the dlPFC module. Attention, which requires temporal focus and the contextual attributes of an object (or event), is related to the temporal skills hosted by the vlPFC module. As a control mechanism that regulates behaviors, the emotional reasoning skill driven by the aPFC module associates events (e.g., experiences, memory fragments during the interaction) with artificial feelings (emotional states) such as satisfaction (e.g., well-being), excitation, and inhibition (e.g., frustration). The logical puzzle game, including memory-based sorting or prediction tasks, is then shaped according to this emotional modulation (control). The meta-cognition property of the computational model of the mPFC module allows supervision over the spatial-temporal and emotional reasoning skills provided by the lateral PFC.
The performance of the spatial-temporal and emotional reasoning skills provided by the functions of the computational lateral PFC is tested in the first experiment. The prediction tasks test the non-spatial or temporal decision-making skills related to attentional abilities, while the sorting tasks in a logical-puzzle-like memory-based interaction game evaluate the spatial planning skills associated with intention-based functions. While these tasks are being solved, the emotional states of the humanoid robot are monitored so that the emotional reasoning functions can provide a decisional bias (inference) to the mentioned skills of the computational lateral PFC. Durations and accuracies are measured throughout the experiment. At the end of the experiment, the hypothesis (H1) is expected to be verified via the contribution of the cognitive functions (e.g., attention, intention, and emotion-based reasoning) in the lateral PFC model to the solution of the rational social interaction problem with a robot.
In the second experiment, the efficiency of the meta-cognition property of the computational model of mPFC is evaluated via reward-based monitoring of the functions of the computational lateral PFC model. The stimulus-associated reward activities originating in the OFC module are examined using human feedback events perceived by the humanoid robot in the interaction game. The action-related reward activities derived from the ACC module are examined using the social cues that the humanoid robot receives from the human. Then the performance of managing both functionalities, which is handled by the dmPFC module, is evaluated so that self-awareness and consciousness can be investigated. During the experiment, success rates and convergence errors are observed periodically. At the end of the experiment, the hypothesis (H2) is expected to be justified via the contribution of the reward computation based meta-cognition functions hosted by the mPFC model.
For the last experiment, the performance of all mental operations, such as associative learning, complex decision-making, and meta-cognitive planning, is improved by an optimization algorithm that ensures neural plasticity. In addition, the parameters including the learning rate, reward discount factor, and exploration rate are adaptively tuned via the dopaminergic gain. Moreover, depending on the conditions, the dopamine modulation is parametrically driven by synaptic neural plasticity to tune the convergence speed. Training durations, convergence speeds, and learning performances are monitored during the experiment.
At the end of the experiments, the robot reports learning and interaction statistics. Under the supervision of the operator, the levels of rehabilitation and the contribution of the neurocognitive architecture embodied in the humanoid robot are discussed in detail.

VI. EXPERIMENTAL RESULTS
The computational prefrontal cortex system framework for a humanoid robot was modeled, simulated, and tested in the Robot Operating System (ROS) middleware with the Kinetic distribution running on the Ubuntu 16.04 LTS operating system. This framework, which is a ROS package, is composed of many nodes written with the Python (rospy) and C++ (roscpp) client libraries of ROS [104]. For image processing and computer vision tasks, the OpenCV and Point Cloud libraries were preferred [105], [106]. In addition, the TensorFlow library, which is a machine learning framework, was utilized to handle neural networks and deep learning applications [107]. There is also a software development kit (SDK/API) for the Bioloid robot platform. For communication between the hardware of the humanoid robot and the SDK, the firmware of the Bioloid robot platform was updated before the experiments. The developed system is visualized by several 2D/3D graphical user interface (GUI) tools of ROS (e.g., the Gazebo simulator, the rviz visualization environment, rqt_graph, rqt_plot, rqt_bag). Snapshots of the experiments implementing human-robot interaction scenarios are shown in Figure 9. In the conducted experiments, the humanoid robot and the human face each other in the experiment scene (Figure 10), where several game cards with different features (e.g., size, shape, color, number, location, and symbol) are located on the spatial workspace (e.g., a small platform or desk) between the human and the robot. In these experiment scenes, the Wisconsin card sorting test (WCST), memory matching, and Tic-Tac-Toe game scenarios were executed so that the memory-based logical puzzle game involving sorting and prediction tasks was implemented during the experiments.
Audio, visual, and joint information is received by the humanoid robot platform. Visual inputs are regarded as saliencies (e.g., ID, color, location), gaze direction, and skeletal information of the human (for hand pointing) used in object or gesture recognition, while audio input is evaluated for sound localization and speech recognition. The internal stimuli of the robot's body are joint angles, gyro, and proximity information. The robot platform can execute gestural behaviors such as hand pointing as output actions (or motor commands). In addition, the system enables the robot to communicate with a human through speech.
In the experiments, turn-taking interaction tasks including the WCST, the memory game, and the tic-tac-toe game were spontaneously performed between a humanoid robot and a human participant. To verify the proposed hypotheses, the computational prefrontal cortex model was realized and evaluated for a humanoid robot. The data presented in Table 1 provide indicators of performance evaluation, including ability scores for a performed activity and average response times.
In order to investigate H1, the performance of the LPFC model functions including working memory, attention, intention, and emotional reasoning was evaluated using the average response times and the accuracy scores.
The accuracy scores were computed as (achieved_activity_count × 100) / total_activity in the experiments. In addition, the success rates were computed as 1 / error_rate for each task (e.g., memory game, WCST, and tic-tac-toe). State-based events such as content matching evaluate the performance of the attention skills associated with vlPFC. The performance of the intention skills related to dlPFC is evaluated with movement-based events (direction prediction). The emotions "excited", "relaxed", "bored", and "stressed" are triggered by the responses to spatiotemporal events. Accordingly, the average response times for basic sorting or prediction tasks decreased, while the average accuracy scores of the tasks increased during the first experiment, since all of these skills, including attention, intention, and emotion-based reasoning behaviors, are utilized simultaneously to help solve the problem involving LPFC based planning tasks.
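The two evaluation metrics defined above translate directly into code. The function names and the sample counts below are hypothetical; only the two formulas come from the text.

```python
def accuracy_score(achieved_activity_count, total_activity):
    """Accuracy in percent: achieved_activity_count * 100 / total_activity."""
    if total_activity <= 0:
        raise ValueError("total_activity must be positive")
    return achieved_activity_count * 100.0 / total_activity

def success_rate(error_rate):
    """Success rate defined as the reciprocal of the error rate."""
    if error_rate <= 0:
        raise ValueError("error_rate must be positive")
    return 1.0 / error_rate

# Hypothetical counts for one task.
accuracy = accuracy_score(41, 50)   # 41 of 50 activities achieved
rate = success_rate(0.25)           # error rate of 0.25
```

Note that the reciprocal definition is undefined for an error-free run, so the sketch rejects a zero error rate rather than guessing a value.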
The performance of the meta-cognition mechanism in the computational model of medial PFC, referred to in H2, was investigated by observing the success rates (rewards) in Figure 11 and the convergence errors (costs) in Figure 12 for the experiment scenarios including the memory game (Figure 11.a), WCST (Figure 11.b), and tic-tac-toe (Figure 11.c). The results show that the rewards in experiment 3 are slightly better than those in experiment 2, while the rewards in experiment 1 remain behind both for all scenarios. The reward trend in the WCST task fluctuates slightly less than in the other tasks, and the largest oscillations occurred in the memory game task. The convergence errors in the reinforcement learning of lateral PFC related skills, including attention, intention, and emotion-based reasoning activities, decreased while the success rates of the simple sorting or prediction tasks in the puzzle game increased during the experiments, because the meta-cognition mechanism in the computational model of medial PFC supplies appropriate rewards to all mental functions of lateral PFC for regulating complex decision making, meta-cognitive planning, and reasoning activities.
The learning performances, in terms of training steps (episodes) and convergence speeds (costs), were monitored during the experiments (Figure 12). The results show that in experiment 3 the costs in the reinforcement learning process of mental functions such as attention, intention, and emotion-based reasoning activities decrease more efficiently when the meta-cognition mechanism is combined with the optimization mechanism that tunes the learning performance of the cognitive skills. This mechanism is more effective for tic-tac-toe (Figure 12.c) than for the other two scenarios. Although the difference between the memory game (Figure 12.a) and WCST (Figure 12.b) is small, the cost in the memory game task is slightly worse than the cost in the WCST task. With this optimization algorithm, the learning process is faster and the decision-making activities are more robust.
The size of the convolution filter (weight tensor) is chosen as a stack of 4 time frames with a dimension of 84 × 110 × 2, representing the width, height, and depth of the image pattern, respectively. The alpha coefficient, known as the learning rate, is 0.00025. The exploration probability is 1.0 at the start, its decay rate is 0.00001, and its minimum is constrained to 0.01. The reward discounting rate (gamma) is 0.9. In Table 2, different models related to LPFC are evaluated with respect to training performance. In order to achieve intention, attention, and emotional reasoning skills, deep reinforcement learning centered models such as double deep Q networks (DDQN), DQN with LSTM, partially observable deep reinforcement learning with LSTM, and spiking neural dynamics are compared to determine which performs best. Although the results are close to each other, three bands can be distinguished when they are examined in detail. In the first band, model 1, which consists of DQN, recorded the worst result, with an average accuracy of 56-58% for the training session and 51-55% for the test session. In the second band, models 2 and 3, involving DDQN and DQN with LSTM respectively, achieve better scores than model 1. In the last band, deep POMDP with LSTM (model 4) and model 4 with spiking neural dynamics (model 5) obtained better results than the other models. The best performance was achieved, by a small margin, by model 5 with an average accuracy of 82-91% for the training session and 74-78% for the test session.
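The reported hyper-parameters can be collected into a short sketch of an epsilon-greedy exploration schedule. The text gives only the start value, the minimum, and the decay rate of the exploration probability; the exponential decay form used here is an assumption, as are the function names.

```python
import math
import random

LEARNING_RATE = 0.00025   # alpha
GAMMA = 0.9               # reward discounting rate
EPS_START = 1.0           # exploration probability at the start
EPS_MIN = 0.01            # lower bound on the exploration probability
EPS_DECAY = 0.00001       # decay rate per training step

def exploration_probability(step):
    """Assumed exponential decay from EPS_START toward EPS_MIN."""
    return EPS_MIN + (EPS_START - EPS_MIN) * math.exp(-EPS_DECAY * step)

def choose_action(q_values, step, rng):
    """Epsilon-greedy: explore with the scheduled probability, else act greedily."""
    if rng.random() < exploration_probability(step):
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)

rng = random.Random(0)
action = choose_action([0.1, 0.5, 0.2], step=0, rng=rng)
eps_late = exploration_probability(100000)
```

Under this assumed form the exploration probability falls to roughly 0.37 after 100,000 steps and approaches 0.01 asymptotically, so exploration never stops completely.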
The advantage of the models used in mPFC is presented by comparison with different models in Table 3. As the inverse reinforcement learning technique, various types of reward generator models composed of deep belief networks (DBN) with a random forest model, a rule-based fuzzy cognitive map (FCM), and an optimization mechanism (genetic algorithm) are evaluated for achieving meta-cognition. The results show that the third model, with GA optimization, has a remarkable advantage over the other models, with an average accuracy of 79-87% for the training session and 67-71% for the test session. While the DBN with the random forest model gave a better result than the model with the rule-based FCM, the DBN with random forest-based FCM model outperformed both model 1 and model 2.

VII. CONCLUSION
Brain-inspired computational modeling approaches are expected to have a large impact on human-robot interaction studies, since socially aware robots equipped with a neuro-cognitive architecture, as a synthetic life form, will be widely used in social areas such as assistive, entertainment, and rehabilitation fields. Therefore, studies that enable human-like cognition for socially aware robots are expected to improve people's living standards [108]- [110]. This study is important in that humanoid robots with enhanced embodied cognitive abilities can be used to assist disabled individuals who struggle to interact with their social environment by supporting their accessibility and communication. In addition, this study is expected to have a progressive impact on human-robot interaction research that aims at a computational approximation of consciousness and human-like cognition for socially aware robots. These challenges require recursive task processing and a meta-cognitive reasoning mechanism. Naturally, the human brain realizes these cognitive skills through the prefrontal cortex, which is a part of the neocortex.
In this study, a new computational framework of the prefrontal cortex model, which is composed of several sub-modules including the dorsolateral, ventrolateral, anterior, and medial prefrontal regions, was developed and tested for human-robot interaction (HRI). The new computational framework was embedded in a humanoid robot platform. The major novelty of this framework is a computational representation of the human brain that enables artificial consciousness and imitates self-awareness, so that the humanoid robot performed human-level mental activities such as complex decision-making, goal-oriented behavior planning, and meta-cognitive reasoning until it attained the optimal goal state during the experiments involving human-robot interaction scenarios. In order to realize spatio-temporal and emotional reasoning skills involving attention, intention, short-term memory, decision making (e.g., arithmetic and logical), planning, analysis of cause-effect relations, and problem solving, the computational prefrontal cortex framework, which is based on the neuromorphic foundations of human mental activities, utilizes a deep reinforcement learning algorithm with partially observable state dynamics and LSTM for the LPFC model. As a reward computing mechanism, the activities of LPFC are monitored and regulated by the mPFC model of the computational prefrontal cortex framework, which employs a deep belief network, a rule-based fuzzy cognitive map (alternatively a fuzzy random forest model), and a genetic algorithm to achieve meta-cognitive reasoning. In addition, the mental activities in the computational framework constitute a working memory stored in the weights of an attractor network, as a sort of associative memory. The network weights in this working memory are updated by the machine learning algorithms of the prefrontal cortex model. Mental functions such as emotional responses and spatiotemporal behavioral planning activities were observed during the experiments.
The developed computational framework, which executes the interaction scenarios, ran in a software environment including the related libraries (SDK/API) and the ROS middleware hosted by the Ubuntu operating system.
Three experiments were performed to verify the proposed architecture and the related hypotheses H1 and H2. The experiments comprise interaction game scenarios based on logical puzzles that include memory-based sorting and prediction tasks (e.g., the Tic-Tac-Toe game, the memory matching game, and the Wisconsin card sorting test (WCST)). The first experiment examines the performance of the spatio-temporal and emotional reasoning skills provided by the functions of the computational lateral PFC. During Experiment 1, the time needed to complete the simple sorting or prediction tasks in the puzzle game decreases while the accuracy on these tasks increases. Thus, hypothesis H1 is verified by the contribution of the cognitive functions (e.g., attention, intention, and emotion-based reasoning) in the lateral PFC model to the solution of the rational social interaction problem with a humanoid robot. The second experiment evaluates the efficiency of the monitoring property of the computational mPFC model, that is, its reward-generation-based meta-cognitive activity over the functions of the computational lateral PFC model. At the end of Experiment 2, it is observed that, with meta-cognitive planning and reasoning activities, the convergence errors in the reinforcement learning of lateral-PFC-related skills (attention, intention, and emotion-based reasoning) decrease, while the success rates of the simple sorting or prediction tasks in the puzzle game increase. Therefore, hypothesis H2 is validated by the contribution of the reward-computation-based meta-cognition functions hosted by the mPFC model. In addition, the set of hyperparameters, treated as a dopaminergic gain vector comprising the learning rate, the reward discount factor, and the exploration rate, is adaptively tuned by an optimization algorithm.
This adaptive hyperparameter tuning mechanism, which provides neural plasticity, significantly improves the performance of all mental operations, such as associative learning, complex decision-making, and meta-cognitive planning, in the last experiment. Training durations, convergence errors (costs), and learning performance (rewards) are monitored during the experiment. At the end of the experiments, the humanoid robot reports learning and interaction statistics.
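The adaptive tuning of the gain vector (learning rate, reward discount factor, exploration rate) can be sketched as a small evolutionary search. This is a minimal sketch under stated assumptions: the text says only that an optimization algorithm is used, so the genetic-style loop, the parameter ranges in `BOUNDS`, and the placeholder `toy_fitness` (which stands in for the reward actually measured during learning) are all hypothetical.

```python
import random
from typing import Callable, List, Tuple

Gain = Tuple[float, ...]  # (learning_rate, discount_factor, exploration_rate)

# Assumed search ranges for each component of the gain vector.
BOUNDS = [(1e-4, 1.0), (0.0, 0.999), (0.0, 1.0)]


def clip(gain: Gain) -> Gain:
    """Keep every component inside its assumed range."""
    return tuple(min(max(v, lo), hi) for v, (lo, hi) in zip(gain, BOUNDS))


def toy_fitness(gain: Gain) -> float:
    """Placeholder objective; a real system would return measured reward."""
    target = (0.1, 0.9, 0.2)  # arbitrary optimum chosen for the demo
    return -sum((a - b) ** 2 for a, b in zip(gain, target))


def evolve(fitness: Callable[[Gain], float], generations: int = 30,
           pop_size: int = 20, sigma: float = 0.05,
           seed: int = 0) -> Tuple[Gain, List[float]]:
    """Elitist search: keep the best quarter, refill by crossover and mutation."""
    rng = random.Random(seed)
    pop = [tuple(rng.uniform(lo, hi) for lo, hi in BOUNDS)
           for _ in range(pop_size)]
    history = []  # best fitness at the start of each generation
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        elite = pop[: pop_size // 4]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = tuple(rng.choice(pair) for pair in zip(a, b))  # crossover
            child = clip(tuple(v + rng.gauss(0.0, sigma) for v in child))  # mutation
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness), history


best_gain, best_history = evolve(toy_fitness)
```

Because the elite individuals are carried over unchanged, the best fitness recorded in `best_history` never decreases from one generation to the next, which is a convenient property when the tuned values feed a running learner.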
From a reverse-engineering perspective on artificial intelligence, this approach may be extended to the computational modeling of cognitive and mental functions related to various cortical and subcortical regions, such as the limbic system (with components including the basal ganglia, amygdala, hippocampus, and hypothalamus), the motor cortex, the cerebellum, the brain stem, and the parietal, temporal, and occipital (visual cortex) lobes, so that a whole-brain model is realized for a humanoid robot. In future studies, I will investigate the computational realization of the other neuro-cognitive functions for meta-cognition, which are key issues for modeling consciousness and self-awareness during a rational social interaction scenario between a human and a humanoid robot and which are beyond the deliverables of this research. In addition, computational methods for further adaptive tuning of the hyperparameters may provide a more detailed understanding of the processes involved. Besides, the proposed architecture could be further improved in future research by integrating a personality model into the brain-inspired neuro-cognitive architecture for the humanoid robot. These efforts may also contribute to building artificial life organisms (systems) and lead toward artificial general intelligence in the long term.

ACKNOWLEDGMENT
The Jetson TX2 embedded development platform used for this research was donated by NVIDIA Corporation.