Cognitive System of a Virtual Robot Based on Perception, Memory, and Hypothesis Models for Calligraphy Writing Task

In this paper, we propose a robotic cognitive system that can teach itself to perform a specific assignment: it accumulates experiences through bottom-up thinking and then makes decisions on its own through top-down thinking based on those experiences. That is, the cognitive system has a self-learning ability and becomes smarter as it accumulates experience. In essence, the cognitive system possesses a perception model, a memory model, and a hypothesis model. The perception model converts image information into perception codes. The memory model stores past and present experiences and provides them to the perception model and the hypothesis model. The hypothesis model, which generates the next decision according to the experiences from the memory model, is the most important part of the proposed cognitive system. To validate the performance of the proposed system, we use Chinese calligraphy writing tasks performed by a virtual robot in simulation to evaluate the abilities of the cognitive system. In order to generate the coordinates of the writing brush, we made the virtual robot practice Chinese calligraphy through bottom-up thinking to construct the writing patterns. The illustrative examples in this paper show that the virtual robot can learn to write Chinese calligraphy through top-down thinking based on its own experiences.


I. INTRODUCTION
Most studies of robotics have focused on mechanics, control, and program design. For program design, programmers write computer code according to a fixed execution flow so that the robot can perform primitive actions such as assembling, clamping, and placing objects through precise positioning control. This process is well suited to fixed operating procedures in factories. Because workpieces do not change very often in manufacturing, programmers do not need to rewrite computer code frequently. However, some scenarios, such as low-volume automation, require flexible production rather than a fixed process. In these cases, tasks change frequently, so reprogramming becomes inevitable. To deal with this issue, programming by demonstration (PbD) was developed [1], [2], [3], [4], [5], where the robot mimics tasks through demonstration and teaching by a human. However, PbD generates coordinates for the robot by reproducing the human demonstration. If there is a slight change in the environment, the robot can only act according to the previously recorded coordinates. The robot cannot know the causal relationship between input and output, which may cause it to react incorrectly. To solve this problem, a fundamental cognitive system needs to be established that allows the robot to recognize the relevance of the task's input and output.
By combining cognitive psychology and robotics, [6], [7] revealed a psychological structure with schemas and chains for robots to learn simple tasks. These studies showed good results in applying higher-level cognitive strategies to robots. However, it appeared difficult for these methods to learn high-dimensional and complex low-level cognitive processing. Note that the relationship between the image observed by the robot and the robot's motion trajectory is complicated. It would be difficult to evaluate these high-order nonlinear relationships via a statistical approach. Artificial neural networks, on the other hand, are well suited to finding such complex associations. Thus, the proposed cognitive system is designed using a deep learning-based approach. Furthermore, by referring to the concept of cognitive computing, the proposed cognitive system also allows the robot to adapt to different environments.
Cognitive computing systems are generally information adept, dynamically trained and adaptive, probabilistic, highly integrated, meaning-based, and highly interactive [9], [10], [11]. The information-adept property is the ability to synthesize ideas or answers by integrating multiple heterogeneous sources. The dynamic-training-and-adaptive property makes it possible for the model to learn when it receives new information. The probabilistic property predicts the probability of valuable connections between relevant patterns based on context. The highly integrated property requires a central learning system that can interact with the historical data of all modules. The meaning-based property concerns the structure of language, semantics, and relationships, which are explainable. The highly interactive property is designed for human-model interaction, which allows the model to communicate with a human.
To design a cognitive system, it is necessary to use the components of cognitive computing [8], [10], [12], [13], [14], [15]. We refer to the mental process of human thinking as the design concept of the cognitive system, and then use the framework of cognitive computing to design the system. In a human cognitive system, perception, memory [16], and hypothesis generation are the key operating elements. Therefore, to establish a cognitive system that resembles a human's cognition, these three elements must be considered.
Most cognitive computing research focuses on perception and information organization, similar to methods for pattern recognition [17], [18], [19], [20], [21]; by referring to human perceptual processing, computer vision is used to implement the feature extraction process. The proposed cognitive system extends the perception process to the sensory-motor system. In addition to perceptual processing, we also add the processing of memory and hypothesis generation. Through the interaction of these models, the cognitive learning process from image to action is established.
The first key element in cognitive systems is perception [22], [23], [24], [25], [26], where the brain organizes and interprets incoming stimulus signals. Perceptual processing is divided into bottom-up and top-down processing [27], [28], [29]. Upon receiving a stimulus signal, the human brain extracts, analyzes, and organizes the characteristics of the stimulus; this is called bottom-up processing. Top-down processing establishes a connection between stimulus signals and experiences based on memory. Humans perceive the world and establish concepts through these two perceptual processing modes (bottom-up and top-down). The perception process is divided into three stages [28]: sensation, perceptual organization, and recognition. These functions correspond to a convolutional neural network's (CNN's) [29], [30] input layer, convolutional layers, and fully connected layers, respectively. In designing the cognitive system of robots, we do not use the CNN for visual recognition; the proposed cognitive system includes only the perceptual organization stage of robot perception, which enables the robot to convert physical signals into neural signals.
Memory is a space for storing and retrieving information. The proposed cognitive system therefore includes a memory model to ensure that the system can remember its past experiences. The classic account of memory storage in psychology is the Multi-Store Model [31], [32], which describes how information is processed in memory and divides memory into three types: the sensory register (SR), short-term memory (STM), and long-term memory (LTM) [27], [28], [31], [32], [33], [34]. Through the interaction of these three types of memory, we can adopt the Multi-Store Model in the proposed cognitive system, so that the robot has memory and can use its past experiences for future decision-making.
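As a concrete, deliberately simplified illustration of the Multi-Store Model described above, the sketch below models a sensory register, a bounded short-term memory, and a long-term memory in Python. All class and method names are our own illustrative choices, not the paper's implementation.

```python
from collections import deque

# Toy Multi-Store Model: sensory register -> STM (bounded queue) -> LTM.
class MultiStoreMemory:
    def __init__(self, stm_size=5):
        self.sensory_register = None          # holds only the latest stimulus
        self.stm = deque(maxlen=stm_size)     # recent experiences, queue-like
        self.ltm = []                         # consolidated experiences

    def sense(self, stimulus):
        self.sensory_register = stimulus

    def attend(self):
        # Attended sensory information moves into short-term memory.
        if self.sensory_register is not None:
            self.stm.append(self.sensory_register)

    def consolidate(self):
        # Rehearsed short-term items are consolidated into long-term memory.
        self.ltm.extend(self.stm)

mem = MultiStoreMemory(stm_size=3)
for k in range(5):
    mem.sense(f"experience-{k}")
    mem.attend()
print(len(mem.stm))   # STM keeps only the 3 most recent items
mem.consolidate()
print(len(mem.ltm))   # the 3 rehearsed items reach LTM
```

The bounded `deque` captures the defining feature of short-term memory, its limited capacity, while the unbounded list plays the role of long-term storage.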
Over the past years, much research has designed explainable AI models [35], [36]. However, the proposed method focuses only on the information-adept, dynamic-training-and-adaptive, probabilistic, and highly integrated characteristics of cognitive computing. Although the meaning-based property is also an important problem to be dealt with, the behavior of our proposed model can only be analyzed indirectly by inspecting long-term memory. The proposed cognitive system is capable of self-adaptation and of accumulating experiences for learning; its operating mechanism uses the perception model (sensation and perceptual organization), memory model (Multi-Store Model), and hypothesis generation model [12], [38] of cognitive psychology. These models can be established using the learning ability of deep neural networks. Thus, we use convolutional neural networks to perform complex multi-model training and simulate basic mental processes.
How to establish cognitive abilities for robots is an important research topic. The research in [39] combined cognitive science and reinforcement learning to design a cognitive controller, establishing a state-of-the-art cognitive system for cognitive tracking radar. Referring to cognitive science, their design consisted of a cognitive controller, a short-term memory, a cognitive perceptor, and an environment. This cognitive system was successfully applied to the field of communication with satisfactory results. Unfortunately, in the field of robotics, robots are often required to generate an entire motion at one time. That is to say, the robot cannot receive a reward corresponding to a single output pose within the motion, which makes it difficult to apply such reward-based reinforcement learning methods in this situation. Through the proposed cognitive system, the relationship between outputs and inputs can be established by the robot itself without a reward function. The outputs can be modified according to the understood state associations, giving the robot high environmental adaptability. By combining cognitive computing, robot theory, and cognitive psychology, a fundamental robot cognitive system can therefore be established in this paper.

II. COGNITIVE SYSTEMS OF ROBOTS
The important parts of the proposed cognitive system are a memory model, a hypothesis model, and a perception model. Figure 1 illustrates a learning-based robot with the proposed cognitive system, and the architecture of the proposed cognitive system is shown in Fig. 2. For clarity and easy reference, the nomenclature is listed in Table 1. In Fig. 1, the perception model encodes image data (inputs) to extract information. The memory model records past perception codes and past hypotheses. The hypothesis model creates hypotheses to generate actions that are sent to the motor system. The motor system controls the motions (outputs) of the robot according to the actions sent by the hypothesis model. R_mp = {M_k, Õ_k} is the information from the memory model that helps the perception model adapt to various situations. {A_k} is the information from the perception model and the hypothesis model that needs to be retained in the memory model. R_ph = {C_{O*_k}, C_{M_k}} is the perception code that helps the hypothesis model make new hypotheses. O*_k is the expected observation image given as the input of the system. O_k is the observation image, the output of the system, which is captured by a camera after the motion finishes.
With these fundamental components, the proposed cognitive system can learn to solve simple tasks. Most robotic research focuses on offline programming instead of online learning to solve uncertain tasks. With the proposed cognitive system, the robot is able to learn tasks online from accumulated experiences.
In Fig. 2, the proposed cognitive system can learn to make the current observation vector O_k (output) approximate the expected observation vector O*_k (input) through a learning process. The Hypothesis Net (DNN1) generates the hypothesis A_k as the action according to the expected observation vector O*_k (input). When switch S_1 is closed, the robot performs actions through the motor system according to the hypothesis A_k. The motor system controls the robot's movement based on the hypothesized action sent by the hypothesis model. After the motor system finishes the action, the proposed cognitive system observes the result (output) as the observation vector O_k. Then, the cognitive system retains the observation vector O_k, the perception code C_{O_k}, and the action vector A_k in the memory model. The memory consolidation process, controlled by switches S_2, S_3, and S_4, helps Memory Net (DNN2) memorize past experiences. Finally, the hypothesis model can generate a new hypothesis A_{k+1} as the action of the motor system based on these experiences.
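The learning loop described above can be sketched as follows. This is a toy Python illustration: the functions standing in for DNN1, the motor system, and the perception encoder are hypothetical placeholders (random projections and a truncation), not the trained networks in the paper.

```python
import numpy as np

# Illustrative sketch of one learning iteration in Fig. 2:
# hypothesize -> act -> observe -> memorize.
rng = np.random.default_rng(0)

def hypothesis_net(o_star):            # stand-in for DNN1: O*_k -> A_k
    return np.tanh(o_star @ rng.standard_normal((o_star.size, 4)))

def motor_system(action):              # stand-in for the robot (S1 closed)
    return np.clip(action @ rng.standard_normal((action.size, 16)), 0, 1)

def perception_encoder(image):         # stand-in for the perception model
    return image[:8]                   # toy "encoding": truncate the vector

memory = []                            # stands in for the STM/LTM stores
o_star = rng.random(16)                # expected observation O*_k

for k in range(3):                     # a few practice iterations
    a_k = hypothesis_net(o_star)       # generate hypothesis A_k
    o_k = motor_system(a_k)            # execute and observe the result O_k
    c_ok = perception_encoder(o_k)     # perception code C_{O_k}
    memory.append((o_k, c_ok, a_k))    # retain the experience (S2-S4)

print(len(memory))                     # three experiences retained
```

In the actual system, the retained experiences would drive memory consolidation and the generation of the next hypothesis A_{k+1}.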

III. PERCEPTION MODEL
Image processing technology has solved many problems and made robots more intelligent than ever before. However, it is hard to describe abstract information in image form. Inspired by human perception, we built neural networks for robot perception in order to encode images into the form of neural information, because humans do not process image data directly, but rather convert images into neural information in the brain.
The striate cortex (primary visual cortex) is the first image processing area in the occipital cortex, which processes the edge and contour information at the first stage of the occipital lobe [40], [41], [42]. Then the neural signal is divided into two paths: the dorsal stream and the ventral stream [43]. The dorsal stream projects the signal to the posterior parietal lobe for guiding action, and the ventral stream projects from the primary visual cortex to the inferotemporal cortex which identifies the concept of the object.
Convolutional neural networks extract features such as rings or edges in the first layer, which also occurs in classification and autoencoder [44] tasks. We trained an autoencoder on the CIFAR-10 dataset and visualized the first-layer filters, as shown in Fig. 3. The filters in the first layer are rings, rays, and simple directional shapes. The kernel maps are very similar to the stimuli of the striate cortex tested on cats [41], macaque monkeys [45], and humans [42]. The outputs of the first convolutional layer in the autoencoder are very similar to the information processed by simple cells and complex cells, and also resemble the stimuli used in past experiments to excite neurons [40], [41], [42], [45]. The proposed cognitive system utilizes convolutional neural networks to model the primary visual cortex, which processes the information in the image data. The perception model is shown in Fig. 4 and includes an encoder, a decoder, and two switches. The encoder converts image information into a neural code, called the perception vector or perception code. The purpose of the encoding process is to let the cognitive system extract features from the image that can be directly processed by the other models of the cognitive system.
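To illustrate the kind of edge and contour processing attributed to the striate cortex and to first convolutional layers, the toy example below convolves a binary image with a hand-written oriented kernel. This is an illustration of the principle only, not the paper's trained encoder.

```python
import numpy as np

# Valid 2D convolution (correlation form) written out explicitly.
def conv2d_valid(img, kernel):
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

vertical_edge = np.array([[-1., 0., 1.],
                          [-1., 0., 1.],
                          [-1., 0., 1.]])   # responds to vertical contours

img = np.zeros((6, 6))
img[:, 3:] = 1.0                            # a vertical step edge

response = conv2d_valid(img, vertical_edge)
print(response[:, 1])                       # strong response along the edge
```

The oriented kernel fires only where the image intensity changes horizontally, which is the same selectivity reported for simple cells and observed in learned first-layer filters.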
O_k is the current observation image of the system output, and C_{O_k} is the perception vector of O_k. M_k is the image information imagined and constructed by the Memory Net (DNN2), and C_{M_k} is the perception code of M_k. Õ_k is the image information in the memory model, and C_{Õ_k} is the perception code of Õ_k. The two switches in Fig. 4 control the training process of the perception model so that it can adapt to different situations.

IV. MEMORY MODEL
The memory model helps the robot store and consolidate memories, enabling the cognitive system to remember past experiences. The hypothesis model does not generate hypotheses randomly, but rather according to the memories, in order to achieve the expected results. The memory model shown in Fig. 5 includes a Memory Net (DNN2), a short-term memory (STM), and a long-term memory (LTM). A_k is the hypothesis generated by the Hypothesis Net (DNN1), and Ã_k is the set of past hypotheses from the short-term and long-term memories. X_k is the sampled data vector, which consists of the past observation vector X^o_k and the past perception code X^c_k of the observation vector. S_2 is a switch that controls the training process of Memory Net (DNN2) to store and consolidate memories. S_3 is a switch that chooses the data, M_k or C_{M_k}, used to consolidate memories. S_4 is a switch that chooses between the memory retrieval process (A_k) and the memory consolidation process (Ã_k). The main parts of the memory model are described below: (1) Memory Net (DNN2): A 16-layer deep neural network, which is an important part of the memory model [8]. It learns to imagine the result of the observation image according to the hypotheses from the hypothesis model. Memory Net (DNN2) helps Hypothesis Net (DNN1) generate better hypotheses by accumulating experiences M_k.
(2) Short-term memory (STM): A memory space that stores recent hypotheses A_k; observation vectors O_k and Õ_k; and perception codes C_{O*_k}, C_{O_k}, and C_{Õ_k}, with a queue-like structure. The size of the short-term memory is limited to l_s. As soon as a new hypothesis is created, the hypothesis, observation vectors, and perception codes are added to the short-term memory, as shown in Fig. 6.
(3) Long-term memory (LTM): A memory space that stores past experiences, including hypotheses, observations, and perception codes from the short-term memory. The LTM consists of three memory subspaces: positive memory, negative memory, and impressive memory. The positive memory space stores past experiences whose past perception codes have good quality scores; the quality score of a past perception code evaluates whether the expected perception code C_{O*_k} and the current perception code C_{O_k} are similar. Conversely, the negative memory stores past experiences whose past perception codes have bad quality scores. The size of the positive memory is limited to l_p, and the size of the negative memory is limited to l_n. The impressive memory is a memory space that stores past experiences with a variety of quality scores in the long-term memory; its size is limited to l_i. The long-term memory plays an important role in consolidating memories in Memory Net (DNN2). The positive memory provides experiences that could be used to solve the tasks, so the hypothesis model can easily refer to them to make better-quality hypotheses that lead the cognitive system in the right direction. The negative memory remembers bad hypotheses generated by the hypothesis model, to avoid repeating the same mistakes. The impressive memory ensures the diversity of the long-term memory and prevents the hypothesis model from always making the same hypotheses. This increases the imagination of the hypothesis maker by preventing Hypothesis Net (DNN1) from overfitting.
(4) Memory consolidation: Because the number of neurons is limited, we cannot provide the cognitive system with infinite memory space. Like a human, the cognitive system needs to consolidate its memories by recalling them again and again. The consolidation process is divided into two stages. The first stage is observation imagination, where Memory Net (DNN2) learns to imagine the observation vector X^o_k related to the hypothesis Ã_k, as shown in Fig. 6. At this stage, S_2 is closed, S_3 switches to M_k, and S_4 switches to Ã_k. In this process, M_j approximates O_j, where j = 1, 2, ..., k. The second stage is perception code imagination, where the memory model consolidates the past perception code X^c_k of the observation vector and S_3 switches to C_{M_k}. In this process, C_{M_j} approximates C_{O_j}, where j = 1, 2, ..., k.

V. HYPOTHESIS MODEL
Figure 7 shows the proposed hypothesis model for making new hypotheses, which consists of a Hypothesis Net (DNN1), an action switch S_1, a judgement switch S_5, and a memory retrieval process. The Hypothesis Net (DNN1) is an 11-layer neural network that generates the hypothesis A_k according to its weights and the expected input image O*_k. The action switch S_1 decides whether hypothesis A_k is sent to the motor system. The judgement switch S_5 changes hypotheses by updating the weights of Hypothesis Net (DNN1); updating these weights is essential for changing hypotheses. The key part of the memory retrieval process is to evaluate the similarity between the perception code C_{O*_k} of the expected observation and the perception code C_{M_k} of the memorized observation vector, as shown in Fig. 8. The loss function is defined as:

loss = (1/n_c) * sum_{i=1}^{n_c} ( C_{O*_k}(i) − C_{M_k}(i) )²    (1)

where n_c is the length of the perception code, and C_{O*_k} and C_{M_k} are the perception codes sent from the perception model. By minimizing the loss function (1) between the perception code of the expected image and the perception code of the memorized observation, Hypothesis Net (DNN1) can generate the next hypothesis with better quality.
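A minimal sketch of this loss, the mean squared error between the two perception codes, could look like the following. The code vectors here are made-up illustrative values, not codes produced by the actual perception model.

```python
import numpy as np

# Memory-retrieval loss in (1): MSE between the perception code of the
# expected observation and the perception code of the memorized observation.
def retrieval_loss(c_o_star, c_m):
    n_c = c_o_star.size                     # length of the perception code
    return np.sum((c_o_star - c_m) ** 2) / n_c

c_o_star = np.array([0.2, 0.8, 0.5, 0.1])   # C_{O*_k}: expected code
c_m      = np.array([0.2, 0.6, 0.5, 0.3])   # C_{M_k}: memorized/imagined code
print(retrieval_loss(c_o_star, c_m))        # 0.02
```

Minimizing this quantity with respect to the Hypothesis Net weights pushes the imagined outcome of a hypothesis toward the expected observation in perception-code space rather than raw pixel space.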

VI. CHINESE CALLIGRAPHY WRITING TASK
The challenge of writing calligraphy is not the same as writing with a pen, because the brush tip, made of goat or wolf hair, changes shape according to the posture of the shaft of the writing brush. When writing with a pen, only the position coordinates on the 2D plane, [x, y]^T, need to be considered. When writing with a brush, the multi-dimensional posture [x, y, z, θ, φ]^T needs to be considered. Chao et al. [46] used generative adversarial nets to allow robots to learn to write calligraphy characters. Through the deep learning models, the robot can practice writing calligraphy on its own. However, this method ended up controlling only the position coordinates [x, y, z]^T and failed to consider the movements of rotation and tilt. Our proposed cognitive system, in contrast, allows the robot to learn to generate not only the position coordinates [x, y, z]^T but also the rotation and tilt, i.e., the full posture [x, y, z, θ, φ]^T.
To evaluate the abilities of the proposed cognitive system in this paper, we chose a Chinese calligraphy writing task which requires complex motions to be controlled. With a calligraphy writing sample from a human, the system can generate coordinates to control a virtual robot [8], [47] to write Chinese calligraphy which is shown in Fig. 9. In this paper, the Hypothesis Net (DNN1) makes hypotheses as actions to write Chinese calligraphy, and the Memory Net (DNN2) judges these hypotheses. Then the virtual robot can improve the writing result by retrieving historical experiences. Instead of using the mean squared error (MSE) of the image data as the loss function, we use the MSE of the perception code of the image as the loss function in (1). The proposed cognitive system has the perception, the memory, and the hypothesis models to enable the virtual robot to learn and write Chinese calligraphy.

A. PERCEPTION MODEL SETTING
Image processing has developed rapidly, solving many problems and making robots more intelligent. Unfortunately, abstract information is very difficult to describe in image form. Inspired by human perception, we built neural networks for robot perception in order to encode the writing of Chinese calligraphy into neural information, because humans do not process image data directly, but rather convert the image into neural information in the brain.
In [8], a Writer Net split a stroke of calligraphy writing into several 20 × 20 region-of-interest (ROI) images to reduce the complexity of the vision data. To integrate the encoder with the Hypothesis Generation Networks (HGN) in [8], we design the Encoder in Fig. 10 with a sequence of multiple ROI images to encode the image of Chinese calligraphy writing.
The architecture of the Encoder in the proposed perception model in Fig. 10 is shown in Table 2. The Encoder is similar to, but smaller than, the Writer Net in [8]. All of the nonlinear activation functions in the convolution layers are ReLU [48], and all of the masks are 3 × 3 filters. Instead of a max pooling layer, we used average pooling to reduce the dimension. Calligraphy words are binarized images with filled shapes; therefore, average pooling keeps more information about the shape than max pooling. There are 13 layers in both the Encoder and the Decoder of the perception model, and they have a symmetric network architecture.
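The pooling choice can be illustrated numerically: on a binarized, filled shape, max pooling reports only whether any ink is present in a window, while average pooling preserves how much of the window is covered. The sketch below uses a hand-written 2 × 2 pooling for illustration, not the paper's network.

```python
import numpy as np

# Compare max pooling and average pooling on a binarized stroke patch.
def pool2x2(img, mode):
    h, w = img.shape
    out = np.zeros((h // 2, w // 2))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            win = img[2*i:2*i+2, 2*j:2*j+2]
            out[i, j] = win.max() if mode == "max" else win.mean()
    return out

stroke = np.array([[1., 1., 0., 0.],
                   [1., 0., 0., 0.],
                   [0., 0., 0., 1.],
                   [0., 0., 1., 1.]])       # a binarized diagonal stroke

print(pool2x2(stroke, "max"))    # ink coverage collapses to all-or-nothing
print(pool2x2(stroke, "avg"))    # ink coverage (0.75) is preserved
```

On this patch, max pooling returns 1.0 for any window containing ink, whereas average pooling distinguishes a mostly-filled window (0.75) from a barely-touched one, which matters for filled calligraphy shapes.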
In order to generate the stroke data of Chinese calligraphy, we split each stroke into several ROI images. Figure 11 shows the ROI images of different strokes. Even though different strokes are not similar in the image domain, their writing patterns are related: ROI images picked from one stroke may also appear in other stroke images or recur in future writing.
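The ROI splitting step can be sketched as a simple non-overlapping 20 × 20 tiling of the stroke image. The actual ROI selection in [8] may differ (for example, it may follow the stroke trajectory), so treat this as an illustration only.

```python
import numpy as np

# Split a stroke image into non-overlapping 20x20 ROI tiles.
def split_into_rois(stroke_img, roi=20):
    h, w = stroke_img.shape
    rois = []
    for i in range(0, h - h % roi, roi):
        for j in range(0, w - w % roi, roi):
            rois.append(stroke_img[i:i+roi, j:j+roi])
    return rois

stroke = np.zeros((60, 40))                 # a toy 60x40 stroke image
rois = split_into_rois(stroke)
print(len(rois), rois[0].shape)             # 6 tiles of shape (20, 20)
```

Working on small ROIs rather than the whole image reduces the dimensionality the encoder must handle, and identical local patterns can be reused across different strokes.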

B. HYPOTHESIS NET (DNN1) AND MEMORY NET (DNN2) IN CHINESE CALLIGRAPHY LEARNING TASK
In [8], we designed a fundamental Hypothesis Generation Network (HGN) for Chinese calligraphy learning tasks. In this paper, we propose a cognitive system utilizing a 15-layer Writer Net [8] and a 16-layer Estimator Net [8] as the Hypothesis Net (DNN1) and Memory Net (DNN2), respectively. The difference between the Writer Net in [8] and the Hypothesis Net (DNN1) in this paper lies in the memory retrieval process shown in Fig. 12. With the proposed memory model shown in Fig. 13, the learning process of the proposed cognitive system is more stable than the HGN's learning process in [8]. The proposed cognitive system stores past writing experiences in the memory model.

VII. SIMULATIONS
Considering that there are multiple neural networks (DNN1, DNN2, encoder, and decoder) in the cognitive system, sufficient memory space is required in the training stage (including gradient computation). Although the computational load of each individual neural network is not heavy, sufficient memory is nevertheless required to train the multiple networks at the same time. Table 3 presents the time complexity and space complexity of the proposed cognitive system. Since the cognitive system learns online, the complexity shown in Table 3 refers to training rather than just testing. With the batch size set to 5 in our experiments, the space complexity is about 13 GB, which is a difficult burden for most GPU devices. As a result, the Taiwan Computing Cloud (TWCC) is used to execute the practicing process of Chinese calligraphy writing, where two Tesla V100 GPUs, each with 64 GB of memory, are used to simulate the learning process of calligraphy writing. To write Chinese calligraphy, a virtual calligraphy writing robot [8], [47] is chosen to train the proposed cognitive system. Fig. 14 shows the Chinese calligraphy writing simulation system [47], where an action vector controls a series of control points to draw the writing results. The Chinese calligraphy writing task is to learn the 8 fundamental strokes shown in Fig. 15: pecking, horsewhip, passing lightly, sideways, jump, dismemberment, bridle, and crossbow. These 8 strokes can be used to build up many kinds of Chinese words; that is, the virtual robot with the proposed cognitive system can write many Chinese words by learning these 8 fundamental strokes. In this experiment, the dimension of the action space is n_0 = 5, indicating an action vector a^i_k = [x, y, z, θ, φ]^T. The number of action points a^i_k in an action vector A_k, n_1, is set to 123.
The number of chosen memories used to help consolidate the memory, n_2, is set to 5. The length of the perception code is n_3 = 128. The size of the short-term memory should be smaller than the size of the long-term memory. We chose the size of the memory space to be n_4 = 35, which consists of short-term memory l_s = 5, positive memory l_p = 5, negative memory l_n = 5, and impressive memory l_i = 20. We pre-trained the perception model with a few calligraphy images to speed up the learning process of the cognitive system. The Adam optimization algorithm was used as the optimizer for all neural networks in the cognitive system. The learning rates associated with switches S_2, S_5, and S_6 were set to 10^−5, and all of them used the loss function in (1).
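For reference, the hyperparameters listed above can be collected into a single configuration. The dictionary keys below are our own names mirroring the symbols in the text (they are not from the authors' code), and the consistency check confirms that n_4 = l_s + l_p + l_n + l_i = 35.

```python
# Hypothetical configuration dict summarizing the experiment's settings.
config = {
    "n0_action_dims": 5,        # action vector [x, y, z, theta, phi]^T
    "n1_action_points": 123,    # action points per action vector A_k
    "n2_consolidation": 5,      # memories sampled to help consolidation
    "n3_code_length": 128,      # perception code length n_c
    "stm_size": 5,              # short-term memory l_s
    "ltm_positive": 5,          # positive memory l_p
    "ltm_negative": 5,          # negative memory l_n
    "ltm_impressive": 20,       # impressive memory l_i
    "learning_rate": 1e-5,      # for switches S2, S5, and S6
    "batch_size": 5,
}

n4_total = (config["stm_size"] + config["ltm_positive"]
            + config["ltm_negative"] + config["ltm_impressive"])
assert n4_total == 35           # total memory space n_4 as stated in the text
print(n4_total)
```

Keeping the sizes in one place makes the STM/LTM budget explicit and easy to audit against the text.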
The simulation results in Tables 4 and 5 evaluate the performance of the memory model and perception model for three methods: the original hypothesis generation networks (original HGN) in [8]; the hypothesis model with the memory model (the proposed system without the perception model); and the hypothesis model with both the memory model and the perception model (the proposed system). The writing results of the three methods, each practicing the pecking, horsewhip, passing lightly, sideways, jump, dismemberment, bridle, and crossbow strokes 500 times, are shown in Tables 4 and 5, respectively. With the memory model, the writing processes of the two proposed systems are more stable than that of the original HGN in [8]. These two proposed cognitive systems can remember past mistakes by retaining and consolidating memories, preventing them from making the same mistakes. With the loss function in (1), the loss curves of the writing processes are shown in Fig. 16 for (a) pecking, (b) horsewhip, (c) passing lightly, (d) sideways, (e) jump, (f) dismemberment, (g) bridle, and (h) crossbow, demonstrating the learning performance for Chinese calligraphy tasks. We compared the learning abilities of the three methods.
With reference to Fig. 16, the first method is the original HGN in [8] (in green), which has neither psychological memory processing (long-term memory, short-term memory, and the memory consolidation process) nor the perception model. The second method (in red) is the proposed system with the memory model, which has psychological processing but no perception model. The third method (in blue) is the proposed system, which has both the memory model with psychological processing and the perception model. The original HGN in [8] sometimes converges to a local minimum (Fig. 16 (d) and (e)) because the memory in the HGN is unclear. With the memory model, the second and third methods are significantly more stable than the first.
All Chinese words are composed of fundamental strokes in different positions, sizes, and angles. In order to show the learning ability of the proposed cognitive system, we combined these eight strokes by linear transformation to draw the full word ''yong'' (永), as shown in Table 6. Accordingly, we first let the cognitive system learn the 8 fundamental strokes of calligraphy. We then assign each stroke a specific position, size, and angle through a linear transformation whose parameters are determined through trial and error. Finally, we can complete drawing a full Chinese word with the parameters of these strokes. To allow a fair comparison, all methods are evaluated using the same linear transformation parameters. Figure 17 shows the MSE loss curves of the writing process for ''yong'', where the two proposed methods produce good writing results and both converge. The proposed system (in blue, the third method) has a better learning ability (lower mean squared error) than the proposed system without the perception model (in red, the second method). Table 6 shows the writing results at epochs 1, 250, and 500, so that the final results of the two methods can be compared. Note that the writing result of the proposed system with the perception model at epoch 500 is very similar to the human writing sample.
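Composing a word from learned strokes by linear transformation amounts to scaling, rotating, and translating each stroke's 2D points into place. The parameters below are illustrative stand-ins, not the trial-and-error values used for ''yong'' in the paper.

```python
import numpy as np

# Place a learned stroke with a scale-rotate-translate linear transformation.
def place_stroke(points, scale, angle_rad, offset):
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rot = np.array([[c, -s],
                    [s,  c]])               # 2D rotation matrix
    return (scale * points) @ rot.T + offset

stroke = np.array([[0.0, 0.0],
                   [1.0, 0.0]])             # a unit horizontal stroke

placed = place_stroke(stroke, scale=2.0, angle_rad=np.pi / 2,
                      offset=np.array([10.0, 5.0]))
print(placed)   # stroke doubled, rotated to vertical, shifted to (10, 5)
```

Applying one such transformation per stroke, with per-stroke parameters, assembles the full character while reusing exactly what the robot learned for each fundamental stroke.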

VIII. DISCUSSION
A. LIMITATION
The proposed cognitive system is able to accomplish a given goal through self-learning. By combining deep learning and cognitive psychology, the cognitive system has the ability to think on its own and generate appropriate motion coordinates to accomplish a calligraphy writing challenge. However, the proposed cognitive system currently has certain limitations, making it unable to deal with overly complex tasks. If the complexity of the Hypothesis Net does not match the complexity of the challenge to be solved, the cognitive system may fail to generate appropriate motions. This means that different challenges require different suitable network structure designs, so the system is not yet a truly universal problem solver.

B. THREATS TO VALIDITY
The experiment in this paper applies the proposed cognitive system to a virtual robot that learns to write calligraphy in a simulated environment. Unlike a real environment, the simulation contains few uncertainties, so fewer factors affect validity. Nevertheless, two factors may threaten the validity of the experimental results in this paper: (i) Inappropriate hyperparameters: Unlike most deep learning research, the proposed cognitive system is internally composed of four individual deep neural networks. Therefore, the number of hyperparameters is at least four times that of a conventional deep learning method. In addition, the hyperparameters of the individual networks affect each other. For example, memory consolidation in the cognitive system involves the networks of both the memory model and the perception model, and memory retrieval simultaneously involves the hypothesis network, the perception network, and the memory network. These cognitive processes resemble human cognition, in which memory, perception, and hypothesis generation all influence one another. In this paper, we set the learning rates to the same value to allow a fair comparison.
(ii) Inappropriate initial network weights: Since the deep neural networks in the cognitive system have a large number of weights, the initial weights affect the behavior of the cognitive system to a certain extent. Fortunately, as long as the learning rate is sufficiently small, the cognitive system eventually converges stably during learning. That is, even if the initial state of the cognitive system is undesirable, the system can still complete the task through acquired experiences. Nevertheless, different initial weights generally lead to different convergence times, which may affect the validity of the experiment. To reduce the influence of this uncertainty, we use Xavier initialization [49] to initialize the weights of the deep neural networks in the cognitive system. Xavier initialization limits the initial weights to a moderate range, so that the initial state of each deep neural network does not deviate too much.
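For reference, Xavier (Glorot) uniform initialization draws each weight from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)), which keeps activation variance roughly constant across layers. The following is a minimal sketch; the layer sizes are illustrative, not the dimensions used in the paper:

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, rng=None):
    """Xavier/Glorot uniform initialization: sample weights from
    U(-limit, limit), limit = sqrt(6 / (fan_in + fan_out))."""
    if rng is None:
        rng = np.random.default_rng(0)
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

# Hypothetical layer: 256 inputs, 128 outputs.
W = xavier_uniform(256, 128)
```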

C. CONTRIBUTION
The contribution of this method is to add a memory model and a hypothesis generation model to cognitive computing, which usually deals with the perception process only. The original HGN [8] can already learn action generation, but the convergence of its learning process is sometimes unstable. Adding the memory model and the perception model to the original HGN allows it to converge stably. Beyond supervised learning from given training data, the ability to formulate hypotheses is also a focus of current and future research on artificial neural networks. Cognitive science combined with deep learning has great potential and may serve as a foundation for the reasoning capabilities of future robots.

IX. CONCLUSION
Cognitive systems are an important area of research in the field of artificial general intelligence (AGI). According to research on modeling human learning processes, robots could eventually learn to solve different tasks through self-learning abilities. In this paper, we have presented a cognitive system that can learn to solve tasks. By controlling switches, we can implement mental processes to perform different kinds of tasks. The simulation results show that the proposed cognitive system has good learning abilities in generating complex motions for a virtual writing robot that learns to write Chinese calligraphy. The proposed cognitive framework can be applied not only to Chinese calligraphy writing but also to other tasks. However, the neural networks in the proposed system (Hypothesis Net, Memory Net, encoder, and decoder) are not universal; scenarios other than Chinese calligraphy writing will require appropriate network architectures. Moreover, the four artificial neural networks interact with each other, and many parameters have to be designed. Therefore, the cognitive system sometimes cannot produce good results because of sub-optimal parameters or initial weights.