Purposeful Communication in Human–Robot Collaboration: A Review of Modern Approaches in Manufacturing

The uncertainties that arise from an imperfect shared understanding between humans and robots during human-robot collaboration (HRC) are a critical challenge in the development of real-world robots, and they have attracted attention in various fields, especially in the manufacturing sector. Many research efforts have explored key components of HRC to reduce these uncertainties. However, these efforts are mostly isolated from each other, and few attempts have been made to combine them into generic HRC frameworks. This article addresses this issue by reviewing the components of HRC research and developing a generic framework of purposeful communication to resolve HRC uncertainties holistically. The HRC components that can affect the shared understanding of humans and robots include the type of collaborative task, the communication modalities, and the decision-making of robots. After examining these aspects, we repositioned the central problem around the cause of these uncertainties and, because communication strategies should be the main focus for reducing them, proposed a new categorization of the available HRC scenarios based on communication channels. This categorization will help to design better HRC frameworks that improve shared understanding and task performance, reduce uncertainties, and establish trust, transparency, and safety. This article presents a comprehensive literature review and a new categorization of the currently used communication approaches, based on an analysis of forty-nine articles selected from several databases.


I. INTRODUCTION
Human-in-the-loop (HITL) systems are hybridization strategies that anticipate, evolve with, and maintain a current understanding of the elements with which they interact, be they human, mechanical, environmental, or otherwise [1]. One possible scenario for HITL systems is human-robot collaboration (HRC), which can be applied in various sectors such as healthcare, education, and manufacturing. In the manufacturing sector, advances in collaborative robotics mean that humans now work side-by-side with robotic peers, each contributing complementary skills to manufacturing productivity. The benefits of this approach are well recognized and include flexibility, since non-experts can quickly train collaborative robots, and efficiency, as humans can work directly with the robot rather than through a series of fixtures, cages, and conveyance mechanisms [2]. Although these updated practices improve manufacturing productivity, collaborative robots depend on interaction with humans and need to become more intelligent. A collaborative robot does not have all the capabilities required to perform many tasks independently, so a human collaborator is needed in the process.
The associate editor coordinating the review of this manuscript and approving it for publication was Marco Martalo.
In HITL systems focused on HRC, the goal of research is to equip robots with human-like intelligent capabilities; however, automating robots to emulate human intelligence and flexibility is one of the most difficult unsolved problems in robotics [3]. Given this difficulty, providing flexibility and efficiency in HRC requires careful implementation of the physical embodiment of sensory and motor capabilities in robots, safety concepts, communication, the collaborative task, and robot decision-making. In other words, to guarantee flexibility, intelligence, and efficiency in HRC, it is necessary to design appropriate HRC frameworks by adding modern, intelligent components and reducing the uncertainties in each of these components, so that humans and robots can collaborate efficiently; as a result, the physiological safety and well-being of the human user will improve [4], [5].
This article reviews forty-nine articles selected from several databases to identify the most important components of HRC frameworks, and it proposes a new categorization of these frameworks that will lead to better design approaches in HRC. Our study identified three critical components: 1) the type of collaborative task; 2) the communication modality and approach; and 3) robot decision-making. Each of these components requires some basic considerations in its implementation; however, developing and maturing more advanced technologies for each component will reduce uncertainties and improve the physical and physiological safety of human users. Furthermore, based on our analysis, we concluded that communication strategies should mature and be the main focus for reducing these uncertainties. In other words, this study highlights the importance of human-robot communication in building a trustworthy relationship in automated processes. As a result, a new categorization of the available HRC scenarios was proposed based on communication channels. The results of this study have important implications for future researchers who seek to develop human-robot prototypes in the new era of advanced manufacturing.
The remainder of the paper is structured as follows: Section II presents the research questions; Section III describes the materials and methods used for the database search; Section IV describes the types of collaborative tasks; Section V discusses the communication modalities used in the selected articles; Section VI presents the decision-making algorithms used by robots in HRC; Section VII provides a new categorization of the articles; and Sections VIII and IX present the discussion and conclusion, respectively.

II. RESEARCH QUESTIONS
This paper aims to identify the critical components in building an effective HRC framework, with respect to the physiological safety of human users in the manufacturing sector, by reviewing the available frameworks of HITL systems in this sector, proposing a new categorization of these frameworks, and presenting an appropriate framework of HITL systems in the context of HRC with communication as the main factor. The following research questions were selected to motivate this article and address these goals.
1) What are the available frameworks for HITL systems in the context of HRC in manufacturing scenarios?
2) What are the key components in building efficient HRC frameworks?
3) Which component is the most critical source of uncertainty in HRC with respect to conveying a feeling of physiological safety to human users?
To answer these research questions, we conducted a systematic review of HITL systems in manufacturing scenarios, with the particular goal of identifying sources of uncertainty during HRC. For Question 2, our systematic review revealed three critical components for building an efficient HRC framework: the type of collaborative task, communication modalities, and robot learning algorithms. The type of collaborative task helps the HRC designer determine the scope of the parameter space and the limits of the exploration of uncertainties. Communication modalities not only define how the exchanged information is represented but also introduce constraints and tension into the communication channels. To answer Question 3, we also reviewed existing research on robot learning that aims to reduce the uncertainties mentioned above to facilitate safe and trustworthy HRC. These algorithms significantly expand the robots' ability to estimate human-related uncertainties, such as changes in intentions, preferences, and goals, to ensure that both parties share the same mental model during collaboration. Finally, we summarized these critical components and the knowledge resulting from our review and developed a novel categorization of HITL systems to answer Question 1.

III. MATERIALS AND METHOD
This section describes the article selection criteria and the search strategy.

A. ARTICLE SELECTION CRITERIA
The particular focus of this systematic review was to find articles whose main contribution lies in the development of HRC frameworks that could be used in manufacturing scenarios, that is, frameworks that improve HRC efficiency by optimizing components such as robot embodiment, physical safety, physiological safety, implicit communication, explicit communication, and robot decision-making. To revisit some of these aspects and propose a new categorization of the available frameworks, the following selection criteria were defined. The selected articles should focus on:
• Human-robot collaboration or human-robot cooperation
• Experimental or simulated setups in industrial settings
• Direct collaboration between human and robot
• Model- and knowledge-based robot learning techniques
• Online decision-making for the robot (in-the-loop decision-making)
Based on these criteria, the human and the robot should be involved in a scenario in which they collaboratively or cooperatively complete a task. Both experimental and simulated studies could be included in the final list of selected articles. The collaborative task should be one that could be implemented in a manufacturing environment or generalized to a manufacturing workspace. HRC frameworks that have a task allocator for both humans and robots (i.e., indirect collaboration) were not our focus, so articles that study direct collaboration between humans and robots (i.e., the robot and the human make their own decisions) were selected for analysis. In addition, collaborative robots can be controlled through different methods, including classic control algorithms or the integration of physiological signals such as electroencephalography (EEG) and electromyography (EMG) with robot control. However, this article focuses on articles with model- and knowledge-based robot learning techniques. Also, robot decision-making should occur in the loop (i.e., pre-programmed robots with no in-the-loop decision-making did not fit the defined criteria).

B. SEARCH STRATEGY
The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed to perform the literature review [6]. The review was performed by searching databases including IEEE Xplore, ISI Web of Knowledge, Scopus, and the journal of communication studies. The keywords were the combination: 'human-robot collaboration,' OR 'robot task planning,' OR 'decision making in human-robot collaboration,' OR 'robot learning in collaborative tasks,' AND 'communication.' The titles, abstracts, and keywords of journal articles and conference proceedings were searched, and articles written in English from 2010 to 2021 were selected according to the selection criteria described above. The article selection process is shown in Figure 1. As Figure 1 shows, a total of 176 articles from ISI Web of Knowledge, 382 from IEEE Xplore, 289 from Scopus, and 13 from the journal of communication studies were identified. The next step in the identification phase was the elimination of duplicates (53 articles); the titles and abstracts of the remaining articles were then screened to remove unrelated ones (screening phase). In the next phase, the remaining articles were evaluated against the eligibility criteria. One hundred articles were thoroughly reviewed, and forty-nine articles were finally included in the quantitative analysis.
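The record counts reported above can be checked with simple arithmetic. The Python sketch below recomputes the flow of Figure 1. It assumes that the one hundred thoroughly reviewed articles are exactly the full texts that passed title/abstract screening, so the per-stage exclusion counts are derived from the reported totals rather than read from the figure. It also writes the keyword combination with explicit parentheses, assuming the intended grouping is (any of the four topic phrases) AND 'communication', because many search engines give AND higher precedence than OR.

```python
# Records identified per source, as reported in the text.
identified = {
    "ISI Web of Knowledge": 176,
    "IEEE Xplore": 382,
    "Scopus": 289,
    "journal of communication studies": 13,
}
duplicates_removed = 53
full_text_reviewed = 100  # "thoroughly reviewed"
included = 49             # included in the quantitative analysis

total_identified = sum(identified.values())        # records before de-duplication
screened = total_identified - duplicates_removed   # titles/abstracts screened

# Derived counts (not reported): assume every record surviving title/abstract
# screening was among the 100 full texts reviewed.
excluded_at_screening = screened - full_text_reviewed
excluded_at_eligibility = full_text_reviewed - included

# Assumed grouping of the keyword combination: (any topic phrase) AND "communication".
topic_phrases = [
    "human-robot collaboration",
    "robot task planning",
    "decision making in human-robot collaboration",
    "robot learning in collaborative tasks",
]
query = "(" + " OR ".join(f'"{p}"' for p in topic_phrases) + ') AND "communication"'

print(total_identified, screened, excluded_at_screening, excluded_at_eligibility)  # 860 807 707 51
print(query)
```

Making the grouping explicit in the query string avoids relying on each database's operator precedence, which differs between search interfaces.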

IV. COLLABORATIVE TASKS
The level of human-robot interaction (HRI) has been categorized as fully programmed, co-existence, assistance, cooperation, collaboration, and fully autonomous. The fully programmed level refers to traditional work cells in which robots are located inside cages; no shared tasks or shared workspace are defined, and physical contact is not allowed. At the co-existence level, shared tasks and workspaces are still not defined, and physical contact is not allowed. However, compared to the fully programmed level, robots are not fenced, and technologies such as laser scanners are used to separate the robot workspace from the human workspace. At the assistance level, there is no shared task between the human and the robot, since the robot has no independent task to perform, while the workspace is shared and physical contact is allowed depending on the nature of the assisting task. Both the cooperation and collaboration levels have shared tasks and shared workspaces. However, physical contact is not allowed at the cooperation level, since the human and the robot perform decoupled (sequential) tasks, whereas at the collaboration level physical contact is allowed, since the task is accomplished simultaneously by the human and the robot. Finally, the fully autonomous level is operator-independent, with a shared workspace for the human and the robot [7]. Among these levels of HRI, this article reviews articles that focus on the collaboration, cooperation, and assistance levels, depending on the nature of the task.
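The six levels above differ along three attributes (shared task, shared workspace, physical contact), which makes them easy to encode as a small lookup table. The Python sketch below is an illustrative encoding of the description in this section, following [7]; attributes the text leaves unspecified for the fully autonomous level are stored as None, and the helper function names are hypothetical.

```python
from typing import NamedTuple, Optional


class InteractionLevel(NamedTuple):
    name: str
    shared_task: Optional[bool]       # None: not specified in the description
    shared_workspace: Optional[bool]
    physical_contact: Optional[bool]  # True: allowed (for assistance, depending on the task)


LEVELS = [
    InteractionLevel("fully programmed", False, False, False),  # caged work cell
    InteractionLevel("co-existence", False, False, False),      # unfenced, separated by e.g. laser scanners
    InteractionLevel("assistance", False, True, True),
    InteractionLevel("cooperation", True, True, False),         # decoupled, sequential tasks
    InteractionLevel("collaboration", True, True, True),        # simultaneous work on the task
    InteractionLevel("fully autonomous", None, True, None),     # operator-independent
]

# Levels covered by this review.
REVIEWED = {"assistance", "cooperation", "collaboration"}


def levels_in_scope(levels):
    """Names of the levels included in the review, in table order."""
    return [lvl.name for lvl in levels if lvl.name in REVIEWED]


def levels_with_shared_workspace(levels):
    """Names of the levels whose workspace is explicitly shared."""
    return [lvl.name for lvl in levels if lvl.shared_workspace]


print(levels_in_scope(LEVELS))  # ['assistance', 'cooperation', 'collaboration']
```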
The type of collaborative task is a determining factor in designing an HRC framework: depending on the type of task, the robot and the human need access to different kinds of information about the workspace. As a result, the lack of an appropriate technique for the defined task and workspace will cause uncertainty in the framework. Therefore, the selected articles were reviewed according to the type of collaborative task performed in them [8]. Based on the selected articles, collaborative tasks between humans and robots can be classified as shared workspace, shared manipulation, handover, sequential (e.g., assembly, where the objective of the task is known), or leader-assistant tasks, which are discussed in this section (as shown in Tables 1 and 2).

A. SEQUENTIAL COLLABORATIVE TASK/ASSEMBLY TASK
Sequential collaborative tasks/assembly tasks were the focus of most of the selected articles, since these types of tasks have applications in many fields, such as healthcare and education, in addition to manufacturing. Normally, a sequential collaborative task is completed through various sub-tasks, each of which is taken on by either the human user or the robot, depending on the task design. The completion of sequential tasks is ensured through appropriate frameworks in which the robot can communicate with human users and anticipate their behavior/latent states [8]; the robot has access to the observable status of both the human user and the environment and leverages it for decision-making; and the robot's status can be visualized for the human user using methods such as an interactive GUI, through which the human user selects a suitable action for the next step from a set of possible next steps that the system proactively proposes [9]. Furthermore, depending on the nature of sequential/assembly tasks, human users may have different preferences in the assembly process, so the HRC framework should equip the robot with the ability to predict the preferences of human users and assist in assembly tasks accordingly [10], [11]. Besides assembly, other types of collaborative tasks could be used in manufacturing settings, such as a maze game, a collaborative motor task in which the human and the robot jointly navigate a ball to a goal [12]. In addition, joint construction can be considered another example of an assembly task [13], [14].
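The sequential structure described above, sub-tasks with ordering constraints plus a user-specific preference over the next step, can be sketched in a few lines of Python. Everything here is a hypothetical illustration: the sub-task names, the precedence table, and the first-order transition counts used as a preference model are assumptions, not the methods of [8] to [11]. The feasible steps play the role of the proactively proposed options in a GUI such as the one in [9], and the ranking orders them by how often this user chose each one after the previous step.

```python
from collections import Counter, defaultdict

# Hypothetical assembly: each sub-task maps to the sub-tasks that must be finished first.
PRECEDENCE = {
    "base": set(),
    "left_bracket": {"base"},
    "right_bracket": {"base"},
    "cover": {"left_bracket", "right_bracket"},
}


def feasible_next_steps(done):
    """Sub-tasks not yet done whose prerequisites are all complete (sorted for stable output)."""
    return sorted(t for t, pre in PRECEDENCE.items() if t not in done and pre <= done)


def learn_transitions(demonstrations):
    """Count, for one user, how often each sub-task was followed by each other sub-task."""
    counts = defaultdict(Counter)
    for sequence in demonstrations:
        for prev, nxt in zip(sequence, sequence[1:]):
            counts[prev][nxt] += 1
    return counts


def rank_next_steps(done, last_step, counts):
    """Feasible steps, the ones this user chose most often first; ties broken by name."""
    return sorted(feasible_next_steps(done), key=lambda t: (-counts[last_step][t], t))


# Hypothetical past assembly sequences of one user.
history = [
    ["base", "right_bracket", "left_bracket", "cover"],
    ["base", "right_bracket", "left_bracket", "cover"],
    ["base", "left_bracket", "right_bracket", "cover"],
]
counts = learn_transitions(history)
print(rank_next_steps({"base"}, "base", counts))  # ['right_bracket', 'left_bracket']
```

Keeping the precedence check separate from the preference ranking means a learned preference can reorder the options but never propose a step that violates the task procedure.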

B. JOINT REACHING TASK
In joint reaching tasks, the human user and the robot perform the task jointly [15]. For example, table clearing is a joint reaching task in which a robot and a human clear objects from a table. In this type of task, a supervisory role can be defined for the human user, and human latent states such as trust can be incorporated into the process: if the human trusts the robot, they let the robot do its task; otherwise, the human performs the task jointly with the robot [16], [17]. Joint reaching tasks can also be implemented in inventory scenarios [18].
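The supervisory rule described for [16], [17] can be illustrated with a deliberately simple trust model. In the Python sketch below, trust is assumed to be the mean of a Beta distribution whose counts grow with observed robot successes and failures, and the 0.6 threshold is an arbitrary placeholder; the cited works may model trust quite differently, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TrustEstimate:
    """Trust in the robot as a Beta distribution; alpha/beta start from a uniform prior."""
    alpha: float = 1.0  # prior + observed robot successes
    beta: float = 1.0   # prior + observed robot failures

    def update(self, robot_succeeded: bool) -> None:
        if robot_succeeded:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def human_action(trust: TrustEstimate, threshold: float = 0.6) -> str:
    """Supervisory rule: enough trust -> let the robot act; otherwise work jointly."""
    return "let_robot_act" if trust.mean >= threshold else "intervene_jointly"


trust = TrustEstimate()
print(human_action(trust))  # intervene_jointly (mean 0.5)
for outcome in (True, True):
    trust.update(outcome)
print(human_action(trust))  # let_robot_act (mean 0.75)
```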

C. OBJECT HANDLING TASK
Object handling is another form of human-robot collaborative task, in which the human and the robot move a jointly grasped object from one location to another. In this type of task, it is important for the robot to predict the intention of the human user or to inform the human of its own intention through different communication modalities (e.g., implicit or explicit communication), depending on the design of the HRC framework [19], [20].
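One widely used way for a robot to follow its partner's intention while both hold the same object is admittance control, in which the force the human applies is mapped to a commanded robot velocity through virtual mass-damper dynamics, M dv/dt + D v = F. The one-dimensional Python sketch below is not taken from [19] or [20]; the mass, damping, and time step are assumed values chosen only so that the discrete update is stable.

```python
def admittance_step(v: float, force: float, mass: float, damping: float, dt: float) -> float:
    """One explicit-Euler step of M*dv/dt + D*v = F; returns the new commanded velocity."""
    accel = (force - damping * v) / mass
    return v + accel * dt


def simulate(force_profile, mass=10.0, damping=40.0, dt=0.01):
    """Commanded velocities (m/s) for a sequence of measured human forces (N), starting at rest."""
    v = 0.0
    velocities = []
    for f in force_profile:
        v = admittance_step(v, f, mass, damping, dt)
        velocities.append(v)
    return velocities


velocities = simulate([20.0] * 500)  # constant 20 N push for 5 s
print(round(velocities[-1], 3))      # 0.5
```

Under a constant 20 N push the commanded velocity settles at F/D = 0.5 m/s; a larger damping D makes the robot feel stiffer to the human, and a larger mass M makes it respond more slowly.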

D. OBJECT HAND-OVER TASK
Object hand-over is the last category, in which a robot and a human exchange an item within the collaborative framework [21]. Object hand-over is considered a simple task in HRC but requires some cognitive abilities in a robot, since the robot needs to predict the intention of the human user while performing the task. Human intent can change during the task, and the robot must adjust its behavior accordingly [22]. Object hand-over tasks can also be part of an assembly task that must be completed in several steps (e.g., one that includes several hand-over steps). In this type of scenario, the robot may need to select the correct object to hand over at each step according to the human user's wishes, so the capability of estimating human intent should be added to the framework [23].
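Selecting the right object at each hand-over step, while allowing the human's intent to change, is often framed as recursive Bayesian estimation over a discrete set of goals. The Python sketch below shows only the update rule; the object names and likelihood values are hypothetical, and in a real system the likelihoods would come from an observation model (for example, of gaze or reach direction) that this section does not specify.

```python
def update_intent(belief, likelihood):
    """One Bayes step over discrete goals: posterior is proportional to prior times P(obs | goal)."""
    unnormalized = {g: p * likelihood.get(g, 0.0) for g, p in belief.items()}
    total = sum(unnormalized.values())
    if total == 0.0:
        return dict(belief)  # observation inconsistent with every goal: keep the prior
    return {g: p / total for g, p in unnormalized.items()}


def most_likely(belief):
    return max(belief, key=belief.get)


# Hypothetical hand-over candidates, initially equally likely.
belief = {"screwdriver": 1 / 3, "bracket": 1 / 3, "bolt": 1 / 3}
# Hypothetical per-step likelihoods from an unspecified observation model.
observations = [
    {"screwdriver": 0.2, "bracket": 0.6, "bolt": 0.2},
    {"screwdriver": 0.1, "bracket": 0.3, "bolt": 0.6},
]
for obs in observations:
    belief = update_intent(belief, obs)
print(most_likely(belief))  # bracket
```

Because the belief is updated at every step rather than fixed once, evidence that favors a different object (here the bolt gains probability in the second step) gradually shifts the robot's choice, which is how a change in human intent can be accommodated.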

V. COMMUNICATION MODALITIES
The robot's ability to build and maintain mental models of other team members (human and robot) facilitates collaborative manufacturing processes. In HRC, the human and the robot must establish a shared mental model (SMM) to improve outcomes such as team performance and safety in manufacturing processes [24]. Therefore, to build an effective human-robot team, as in human-human teaming, and to establish an SMM, communication is one of the essential processes that must occur between the human and the robot, along with coordination and collaboration. Communication is essential in HRC, since purposeful communication helps build a shared mental model, transparency, and trust in a collaborative workspace [25].
In general, explicit and implicit communication are the two common approaches used in HRC. In explicit communication, which takes place through speech or gestures, intent is conveyed directly. In implicit communication, a nonverbal method, intent is instead estimated from observed and predicted human behavior. In addition, combinations of explicit and implicit modalities (multi-modal communication) are used in the HRC field. The communication modality or modalities for human-to-robot (HTR) and robot-to-human (RTH) communication are usually chosen based on their reliability, robustness, cognitive load, and delay. Furthermore, task-related factors, such as the type of task, extent of use, flexibility, duration, and additional classification, are other critical factors in the choice of communication technology in HRC [26].
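One simple way to make the selection criteria above explicit is a weighted multi-criteria score. In the Python sketch below, the weights, the [0, 1] rating scale, and the modality ratings are placeholders with no empirical basis; they only show how benefit criteria (reliability, robustness) and cost criteria (cognitive load, delay) can be combined, and task-related factors could be added as further criteria in the same way.

```python
# criterion -> (weight, higher_is_better); placeholder weights that sum to 1.
CRITERIA = {
    "reliability": (0.3, True),
    "robustness": (0.3, True),
    "cognitive_load": (0.2, False),
    "delay": (0.2, False),
}


def score(ratings):
    """Weighted sum of ratings in [0, 1]; cost criteria are inverted so higher is always better."""
    total = 0.0
    for criterion, (weight, higher_is_better) in CRITERIA.items():
        r = ratings[criterion]
        total += weight * (r if higher_is_better else 1.0 - r)
    return total


def rank_modalities(candidates):
    """Modality names ordered from best to worst score."""
    return sorted(candidates, key=lambda m: score(candidates[m]), reverse=True)


# Placeholder ratings for three unnamed modalities (not measurements).
candidates = {
    "modality_a": {"reliability": 0.6, "robustness": 0.5, "cognitive_load": 0.6, "delay": 0.5},
    "modality_b": {"reliability": 0.7, "robustness": 0.6, "cognitive_load": 0.4, "delay": 0.3},
    "modality_c": {"reliability": 0.5, "robustness": 0.4, "cognitive_load": 0.1, "delay": 0.2},
}
print(rank_modalities(candidates))  # ['modality_b', 'modality_c', 'modality_a']
```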
Our analyses revealed that communication, both HTR and RTH, is another key component of HRC frameworks, and that advances in communication modalities and techniques can lead to efficient collaboration with a greater sense of safety for the human user. However, once the need for communication channels is established, three important questions must be addressed to promote the HRC framework [...]. In this section, the communication modalities used in the selected articles are examined and described in two categories: human-to-robot (HTR) and robot-to-human (RTH) communication. The methods used to address the problems of communication time and communication type are discussed in Section VII.

A. HUMAN-TO-ROBOT COMMUNICATION
Human-to-robot (HTR) communication is an important factor in making HRI possible. The artificial agent, the robot, must access the information required to complete tasks in collaborative spaces, and this information can be provided by the human user. Therefore, various techniques are used to establish communication channels from humans to robots, as discussed in the following subsections.

1) VERBAL COMMUNICATION
Verbal communication is the most straightforward method of explicit communication in HRC. The human and the robot can communicate through speech/verbal commands, so that the human gives commands to the robot or the robot replies to the human user. Bidirectional (two-way) communication through speech is also possible in HRC; however, verbal communication in HRC poses some challenges. For example, it is difficult to establish grounding through verbal communication alone [19]; grounding refers to speakers understanding each other's messages as intended [28]. Moreover, verbal communication is considered costly in terms of time and cognitive resources [29].
Verbal communication can also be used for HTR communication to establish bidirectional communication, so that human users can ask robots to act at any time [18].

2) NON-VERBAL COMMUNICATION
Vision is the most commonly used non-verbal communication channel, through which humans and robots can communicate both explicitly (e.g., gesture, text) and implicitly (e.g., gaze). In the final list of articles, most studies chose vision as a channel of communication and information exchange [30]. The vision system is used primarily for object detection, human body tracking, or information display. For example, a vision system could consist of a set of RGB-D sensors and an interactive GUI used to perform an assembly task: the RGB-D sensors are used for action recognition and help the robot understand the state of the environment, while the interactive GUI is used to interact with the user [9]. The vision system can monitor the workspace and recognize objects so that the robot can reach them and move them to a different location [16], [17]. Furthermore, different types of information about the environment and workspace can be obtained using the vision system, including the position of the assembly part, the position of the user in the environment, the physical characteristics of the environment, and the position and orientation of the robot end effector [11]. Human user behavior can also be sensed through a vision system (e.g., a webcam and a Kinect RGB-D sensor), in addition to tracking the positions and orientations of various objects [31].
Although direct communication (e.g., speech or gestures) provides reliable methods to establish a joint intention [23], it can require a human to stop performing another task to communicate with the robot, reducing team efficiency. In contrast, intent estimation allows the human to focus on task completion, resulting in a more intuitive and efficient relationship. However, it requires the robot to interpret intent from information obtained via measurements (e.g., physiological states). This interpreted information can be used to answer binary yes/no questions about human intent, to select between discrete modes of intent, or to estimate intent along a set of continuous variables, such as future limb trajectories and an approximate level of intent (e.g., based on speed or exerted physical force).
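The three forms of interpreted intent mentioned above (binary, discrete, and continuous) can be illustrated with hand-position measurements from a tracker. The Python sketch below uses deliberately simple kinematic heuristics, a speed threshold, a direction-alignment test, and constant-velocity extrapolation, as hypothetical stand-ins for the learned models used in the reviewed articles; the threshold value and target names are assumptions.

```python
import math


def velocity(p_prev, p_curr, dt):
    """Finite-difference hand velocity from two tracked positions."""
    return tuple((c - p) / dt for p, c in zip(p_prev, p_curr))


def is_reaching(vel, speed_threshold=0.2):
    """Binary intent: is the hand moving faster than a threshold (m/s)?"""
    return math.hypot(*vel) > speed_threshold


def likely_target(p_curr, vel, targets):
    """Discrete intent: the target best aligned with the hand's direction of motion."""
    def alignment(name):
        to_target = tuple(t - p for p, t in zip(p_curr, targets[name]))
        norm = math.hypot(*to_target) * math.hypot(*vel)
        return sum(a * b for a, b in zip(to_target, vel)) / norm if norm else -1.0
    return max(targets, key=alignment)


def predict_position(p_curr, vel, horizon):
    """Continuous intent: constant-velocity extrapolation of the hand position."""
    return tuple(p + v * horizon for p, v in zip(p_curr, vel))


# Hypothetical 2-D tracker samples (metres) taken 0.1 s apart.
p_prev, p_curr, dt = (0.0, 0.0), (0.05, 0.0), 0.1
vel = velocity(p_prev, p_curr, dt)
targets = {"left_bin": (0.05, 0.4), "right_bin": (0.6, 0.0)}
print(is_reaching(vel), likely_target(p_curr, vel, targets), predict_position(p_curr, vel, 0.5))
```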
Additional research on task environments and interactions has shown that intent estimation may be improved by leveraging the context of spoken commands or by interpreting human-object interactions based on object affordances. Although video-based sensing is often sufficient to recognize the gestures, objects, and approximate motions required to estimate intent, shared control over objects in HRC typically requires more accurate coordination than video processing permits. To this end, HRC systems often estimate intent using inertial measurement units (IMUs), force sensors, and physiological monitoring equipment such as EEG and muscle activation measurement devices (e.g., myography). Physiological signals, such as EMG signals, which record muscle activity and carry information about body movements, or EEG signals, which record brain activity, have recently been used to give commands to a robot, control robot actions and movements, and create shared control architectures in the context of HRC [32], [33]. However, such measurements can be invasive or uncomfortable and often provide noisy signals that may require significant processing and machine learning (ML) effort to extract useful information. In contrast, force sensors and IMUs provide more reliable but less instantaneous information than neurological signals [34]. In addition, haptic communication is another commonly used approach to accomplishing an HRC task.
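The processing burden of physiological signals is visible even in the most basic EMG pipeline: rectify and smooth the raw signal into an envelope, then threshold the envelope to obtain a command. The Python sketch below computes a moving root-mean-square envelope and applies on/off thresholds with hysteresis. It omits the band-pass filtering and per-user calibration that practical systems normally require, and the window length, thresholds, and synthetic signal are hypothetical.

```python
import math


def emg_envelope(samples, window):
    """Moving root-mean-square envelope of a raw EMG signal (rectify-and-smooth step)."""
    envelope = []
    for i in range(len(samples)):
        chunk = samples[max(0, i - window + 1): i + 1]
        envelope.append(math.sqrt(sum(x * x for x in chunk) / len(chunk)))
    return envelope


def activation_commands(envelope, on_threshold, off_threshold):
    """Binary command with hysteresis, so noise near a single threshold does not cause chattering."""
    active = False
    commands = []
    for value in envelope:
        if not active and value >= on_threshold:
            active = True
        elif active and value <= off_threshold:
            active = False
        commands.append(active)
    return commands


# Synthetic signal: rest (small oscillation), a muscle contraction, then rest again.
rest = [0.05 if i % 2 == 0 else -0.05 for i in range(50)]
burst = [1.0 if i % 2 == 0 else -1.0 for i in range(50)]
signal = rest + burst + rest
env = emg_envelope(signal, window=10)
cmds = activation_commands(env, on_threshold=0.5, off_threshold=0.2)
print(cmds.index(True))  # first sample at which the command switches on
```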

3) MULTI-MODAL COMMUNICATION
Using multiple communication modalities (e.g., explicit and implicit) is another communicative approach in HRC that improves flexibility and robustness, quality of communication, task performance, human safety, and production efficiency [35], [36]. Furthermore, it has been shown that the grounding issue associated with using verbal communication alone can be mitigated when verbal communication is combined with other modalities, such as haptic communication [19].
Multi-modal HTR communication can combine various communication channels, such as speech-to-text (STT), a feedback channel, and an error channel, to create a collaborative environment and complete joint tasks [37]. Furthermore, the integration of force sensors with other communication modalities, such as web interfaces, feedback channels, STT systems, and emergency channels, is another example of multi-modal communication [14]. In addition, image-based (e.g., depth cameras, RGB cameras, body tracking cameras) and non-image-based (e.g., inertial measurement units, force sensors) modalities have been used to provide smooth and efficient information sharing in HRC frameworks [38].
The combination of vision and speech is another way of sharing information in a multi-modal fashion in HRC. In this type of setup, the robot camera can provide information about objects in the environment to initiate the activity and support appropriate action selection in the workspace. Verbal commands are then used to transfer additional information, such as whether an action selected by the robot or the human is correct, that is, whether it follows the task procedure [15], [21].
Another way to establish multi-modal communication in HRC frameworks is to combine human gaze, human hand/body gestures, and the vision system [39]. Hand gestures, which can be divided into manipulative and communicative gestures and can be used to instruct robots, are an effective means of communication, both because they are closely related to the semantic content of verbal language and because they provide spatial information about the user's hand [23]. Furthermore, human physiological signals, such as muscle activity obtained from force sensors, can be used together with other sensing and communication modalities, such as speech and vision systems, to complete a collaborative task [40].
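A common way to combine channels such as vision and speech is late fusion, in which each modality produces its own probability over candidate objects or actions and the results are multiplied and renormalized, assuming the modalities are conditionally independent. The Python sketch below illustrates that idea; it is not necessarily the scheme used in [15], [21], [23], or [39], and the candidate names, scores, and floor value are hypothetical.

```python
def fuse(candidates, *modality_scores, floor=1e-3):
    """Late fusion: multiply per-modality probabilities (conditional independence assumed) and renormalize."""
    fused = {}
    for c in candidates:
        p = 1.0
        for scores in modality_scores:
            # The floor keeps one noisy modality from vetoing a candidate outright.
            p *= max(scores.get(c, 0.0), floor)
        fused[c] = p
    total = sum(fused.values())
    return {c: p / total for c, p in fused.items()}


objects = ["red_block", "blue_block", "green_block"]
vision = {"red_block": 0.5, "blue_block": 0.4, "green_block": 0.1}  # hypothetical detector scores
speech = {"red_block": 0.1, "blue_block": 0.8, "green_block": 0.1}  # e.g. "hand me the blue one"
posterior = fuse(objects, vision, speech)
print(max(posterior, key=posterior.get))  # blue_block
```

Here vision alone slightly favors the red block, but the spoken reference resolves the ambiguity; without the floor, a single zero score from any channel would eliminate a candidate regardless of the others.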

B. ROBOT-TO-HUMAN COMMUNICATION
In HRI, the human user's understanding of the robot's intentions is important because it can improve trust in the robot. However, RTH communication methods have been studied less widely than HTR communication modalities, and recent research has introduced new communication technologies, such as extended reality (XR), into the HRC context to fill this gap [26].

1) VERBAL COMMUNICATION
In some of the selected articles, verbal commands were used to establish RTH communication and provide different types of information to the human user, such as how the human needs to do a task or why the human user should follow a specific method, to support human decision-making [19]. Additionally, RTH verbal communication can be used to inform the human user about the state of the environment, the goal, and the plan or action, and to ask the human partner to act [18], [41]. Furthermore, using verbal commands for both HTR and RTH communication can facilitate bidirectional communication channels in HRC [8].

2) NON-VERBAL COMMUNICATION
A display installed on the robot, an interactive GUI, robot gaze, robot hand gestures, and robot body gestures are other ways of establishing RTH communication [9], [23]. Hand gestures are not limited to the human user; robots can use them as a means of communication alongside other techniques such as gaze. The robot's gaze can indicate its readiness to execute a task [42] or signal a planned action before the action is performed [43]. The robot's gaze can serve two primary purposes: establishing mutual belief (that is, informing the user about the action to be taken) and indicating readiness for the next instruction. For example, whenever a robot decides to close or open its hand or to reach for an object, it can look at its hand or at the object involved in the task [23]. Robot gestures are an appropriate and informative communication medium in HRC, and more innovative methods, such as zoomorphic gestures, have been introduced in the field [44].

3) MULTI-MODAL COMMUNICATION
RTH communication through multiple modalities can also bring more robustness and clarity to HRC. A text-to-speech (TTS) channel combined with a web interface [14], or TTS combined with a vision system that displays information to human users [37], [45], [46], has been used in the execution of collaborative tasks. Communication modalities such as speech, gaze, and gesture are also used to establish a common understanding of the environment between the human user and the robot [39]. Robots can also be equipped with human-like features that represent different emotions (such as animated eyes) to make interactions more natural and user-friendly [46].

4) COMMUNICATION VIA EXTENDED REALITY TECHNIQUES
Extended reality (XR) techniques (i.e., virtual reality (VR), augmented reality (AR), and mixed reality (MR)) bridge the virtual environment and the physical environment, and there have been many attempts to integrate these techniques with HRC. XR techniques provide four types of solutions for HRC: 1) operator support, 2) simulation, 3) instruction, and 4) manipulation [47]. In this categorization, operator support is provided by enabling communication between the human and the robot through XR techniques. Simulation solutions give users the opportunity to understand the collaborative task and environment. Instruction solutions exploit virtual and augmented environments to help the human user follow the hierarchy of tasks. Finally, manipulation solutions enable the operator to operate and manipulate the robot remotely.
Given these solutions, XR techniques can be used to establish communication between the robot and the human and thereby support human users. It has been argued that these technologies solve some communication problems, including poor information exchange and difficulty understanding intent [48]. XR technologies provide alternative communication paradigms and can be used to project dynamic visual cues into the environment [49] and to display task information in the workspace [50]. Besides serving as a communication method, XR technologies are used as a platform to compare and evaluate other communication methods [51], improve communication in HRC [52], improve trust [53], evaluate robot decision-making algorithms, or even assess whether a robot equipped with a communication panel can be a good partner for humans [54].
MR has been introduced as a new communication paradigm that can combine a vision-based object tracking algorithm (OT) with a context-sensitive projection mapping technique (PM), through which robots communicate with humans and instruct them in collaborative tasks [55]. Additionally, AR visualizations through HoloLens have been used to perform collaborative tasks. The human partner receives information about robot states and plans that matter for human safety and trust, such as the intended movement of the robotic arm or the navigation plan of the mobile platform. The AR visualizations display the planned navigation path of the robot as a sequence of green 3D spheres, the planned grasping motion as a sequence of 3D spheres, and the robot workspace as a semi-transparent red sphere; when a potential collision with the robot workspace is detected, a warning message is displayed to the user together with a view of the detected collision [56].

VI. ROBOT DECISION-MAKING
HRC frameworks are becoming more intelligent with the introduction of robots that can make decisions on their own and adapt to the environment, which facilitates the human-robot partnership and promotes human safety. However, the choice of decision-making algorithm for an HRC framework depends on multiple factors, such as the collaborative task and the available communication channels, which all affect the robot learning algorithm, the decision-making model, and interaction planning [57], [58]. It is necessary to know the characteristics of the task because they determine which parameters of the environment are defined and which are undefined. For example, in assembly tasks, the tolerance or completion time may be well defined, while other details, such as individual preferences, vary from one operator to another. Therefore, the robot must learn the preferences of each operator and adapt to the situation. This section describes the robot decision-making algorithms used in the selected articles as one of the key components of HRC frameworks.

A. MACHINE LEARNING
Conventional machine learning (ML) techniques are mostly used to create mapping functions that link human physical capacities with HRC design, for example, mapping human actions to robot inputs using support vector machines (SVM) [59].
To build connections between humans and robots, existing HRC research has proposed many ideas that link human physical capabilities (e.g., vision, voice, motion, and haptics) to the computational paradigm of robot design. Most of these links are represented by machine learning algorithms, which take human physical behavior data as input and produce robot computational commands as output. In addition, the most recent variants of these techniques, dynamic neural systems (DNS), explore the potential to mimic human cognitive abilities, such as the architecture of the human brain. A DNS is a time-variant system that tries to mimic the firing dynamics of actual neurons [60]. The cognitive functionalities of robots can be provided by a DNS that applies brain-like computations to learn the sequential order of object transfers needed to complete a collaborative task. For example, a neuro-inspired DNS model for action selection in a human-robot joint action scenario can be implemented with several DNS layers, each consisting of different pools of neurons, that represent information about object locations, action goals, and context in the form of self-sustained activation patterns. Connected populations provide the input that triggers these patterns, which then evolve continuously over time under the influence of recurrent interactions [46].
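The idea of a self-sustained activation pattern can be illustrated with a one-dimensional dynamic neural field governed by Amari's field equation, a common formalism for such neuro-inspired models. The sketch below is a minimal, hypothetical illustration and not the model of [46]: the field size, the kernel (local Gaussian excitation minus constant global inhibition), the sigmoid steepness, and all other parameter values are assumptions chosen so that a transient localized input leaves behind a persistent "bump" of activation, i.e., a working memory of, for instance, an object location.

```python
import math
from typing import List, Tuple

def simulate_field(n: int = 51, steps_on: int = 100, steps_off: int = 200,
                   stim: Tuple[int, int] = (22, 29), s_amp: float = 4.0,
                   h: float = -2.0, tau: float = 10.0, dt: float = 1.0,
                   a_exc: float = 2.0, sigma: float = 3.0, g_inh: float = 0.5,
                   beta: float = 4.0) -> List[float]:
    """Euler integration of a 1-D Amari-type neural field:
    tau * du/dt = -u + h + s(x, t) + sum_x' w(x - x') * f(u(x'))."""
    # Lateral interaction kernel: local Gaussian excitation minus global inhibition.
    w = [[a_exc * math.exp(-((i - j) ** 2) / (2 * sigma ** 2)) - g_inh
          for j in range(n)] for i in range(n)]
    u = [h] * n  # every site starts at the resting level h < 0
    for t in range(steps_on + steps_off):
        # Transient localized input (e.g., an object seen at one location),
        # present only during the first steps_on steps.
        s = [s_amp if t < steps_on and stim[0] <= i < stim[1] else 0.0
             for i in range(n)]
        # Sigmoidal output (firing rate) of each site.
        f = [1.0 / (1.0 + math.exp(-beta * ui)) for ui in u]
        u = [ui + dt / tau * (-ui + h + s[i] + sum(w[i][j] * f[j] for j in range(n)))
             for i, ui in enumerate(u)]
    return u

u = simulate_field()
print(u[25] > 0, u[0] < 0)  # a bump persists around the former stimulus
```

With these parameters, the activation around the stimulated sites stays above zero long after the input is switched off, whereas a run with `s_amp=0.0` leaves the whole field at its negative resting level.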

B. DEEP LEARNING
Besides creating mapping functions between human physical capacities and robot computation, another challenge in HRC design is the computational representation of human-related factors that are embedded in multi-modal and multi-scale sensor data. Therefore, there have been attempts to develop new ML methods that act as feature extractors that can ''transform the raw data of human capacities into a suitable internal representation or a feature vector from which the learning subsystem can be derived'' [61]. As a result, deep learning (DL) algorithms were developed, which are ''representation learning methods with multiple levels of representation, obtained by composing simple but nonlinear modules that each transform the representation at one level (starting with the raw input) into a representation at a higher, slightly more abstract level'' [61]. Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are commonly used DL methods. CNNs are designed to process data that come in the form of multiple arrays, such as a color image composed of three 2D arrays containing the pixel intensities of the three color channels, and are widely applied in image processing. Numerous studies have shown that CNNs are a promising method for learning representations from human visual perception in the HRC context [62]. RNNs, on the other hand, process their input one element at a time while maintaining a hidden state, which makes them suitable for tasks with sequential input, such as speech and language; like CNNs, they are trained with backpropagation. DL methods are also combined with other neural networks [61] or with reinforcement learning (RL) methods to create deep reinforcement learning (DRL) architectures in real-world HITL settings to study collaborative learning between humans and robots [12].
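The structural difference between the two families can be sketched in a few lines of plain Python. This is a toy illustration with hypothetical, hand-picked weights and no training. A CNN layer applies the same small kernel at every position of its input (implemented, as in most DL frameworks, as cross-correlation). An RNN instead carries a hidden state from one time step to the next, so its output depends on the order of the inputs.

```python
import math
from typing import List

def conv1d_valid(x: List[float], kernel: List[float]) -> List[float]:
    """1-D 'valid' convolution layer (cross-correlation, no padding, stride 1)."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) for i in range(len(x) - k + 1)]

def rnn_final_state(xs: List[float], w_x: float = 1.0, w_h: float = 0.5,
                    b: float = 0.0) -> float:
    """Scalar Elman-style RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b)."""
    h = 0.0
    for x in xs:
        h = math.tanh(w_x * x + w_h * h + b)
    return h

# The same weights are applied at every position of the input.
print(conv1d_valid([1.0, 2.0, 3.0, 4.0], [1.0, -1.0]))  # [-1.0, -1.0, -1.0]
# The hidden state makes the result depend on the order of the inputs.
print(rnn_final_state([1.0, 0.0]), rnn_final_state([0.0, 1.0]))
```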

C. PROBABILISTIC GRAPHICAL MODELS
Another challenge of representation learning in HRC concerns the mental models that are supposed to be shared between humans and robots. In particular, the states and actions of both parties should be learned and represented in a uniform theoretical framework to achieve shared mental models. Furthermore, the representation of these mental models should be able to integrate uncertainties so that it is robust and tolerant to changes, such as changes in human preferences or in the task. Probabilistic graphical models (PGMs) are currently considered one of the promising methods to solve this problem. PGMs are defined as follows: ''Probabilistic graphical models (PGM) comprise any model that uses the language of graphs to facilitate the representation and resolution of complex problems that use probability as a representation of uncertainty'' [63]. In such a graph, the nodes represent variables, and an edge between two nodes indicates a conditional dependency between the two variables, whereas the absence of an edge indicates conditional independence. PGMs are used for different purposes in HRC, including supervised classification, clustering, abductive reasoning, decision-making, and optimization [63].
Bayesian network models (BNMs) are an important class of graphical models in which the joint probability distribution of the variables is represented through a directed acyclic graph. Future states and robot actions have been predicted using a multi-time-slice dynamic Bayesian network (DBN) [21]. ''DBNs are multi-time-slice Bayesian networks where variables are connected over adjacent time steps as well as within the same time step. They are a computationally efficient generalization of hidden Markov models and have been used to model multi-modal robot behavior in uncertain'' [21]. Bayesian sequential optimal decision-making models were used to allocate autonomy between humans and robots during a collaborative assembly task, integrating unobservable human states, such as human regret in decision-making, into the Bayesian strategy [64]. The hidden Markov model (HMM) is another commonly used graphical model, in which the nodes represent the time steps of a sequence of the same variable. Its primary assumption is that each time step depends only on the previous one. Furthermore, these hidden variables can only be observed indirectly, through observations that correspond to a node connected only to the time step being considered. Variations of the Baum-Welch algorithm are used to learn the parameters of an HMM, and patterns of human actions can be recognized with an HMM in two steps. First, a training set consisting of observations of human workers performing the task considered in the simulation was used for the model learning step. Then, this model was used to predict the type of supportive behavior that a robot should choose while working with a human partner [65]. Both DBNs and HMMs are popular computational methods in HRC scenarios for representing the mental models of humans and robots.
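The recognition step can be sketched with the standard normalized HMM forward algorithm, which computes the belief over the hidden state given the observations so far. In this minimal sketch the hidden states, observation symbols, probabilities, and the mapping from the recognized state to a supportive robot behavior are all hypothetical. In a system such as [65], the parameters would instead be learned from recorded human demonstrations, for example with a Baum-Welch variant.

```python
from typing import Dict, List

def forward_filter(observations: List[str], states: List[str],
                   start: Dict[str, float],
                   trans: Dict[str, Dict[str, float]],
                   emit: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Normalized HMM forward pass: P(hidden state at time t | observations up to t)."""
    alpha = {s: start[s] * emit[s][observations[0]] for s in states}
    z = sum(alpha.values())
    alpha = {s: a / z for s, a in alpha.items()}
    for obs in observations[1:]:
        # Predict with the transition model, then weight by the emission likelihood.
        alpha = {s2: emit[s2][obs] * sum(alpha[s1] * trans[s1][s2] for s1 in states)
                 for s2 in states}
        z = sum(alpha.values())
        alpha = {s: a / z for s, a in alpha.items()}
    return alpha

# Hypothetical two-state model of the human co-worker.
STATES = ["assembling", "waiting"]
START = {"assembling": 0.5, "waiting": 0.5}
TRANS = {"assembling": {"assembling": 0.8, "waiting": 0.2},
         "waiting": {"assembling": 0.3, "waiting": 0.7}}
EMIT = {"assembling": {"hands_busy": 0.9, "hands_free": 0.1},
        "waiting": {"hands_busy": 0.2, "hands_free": 0.8}}
# Hypothetical mapping from the recognized human state to a supportive robot behavior.
SUPPORT = {"assembling": "hold_workpiece", "waiting": "hand_over_next_part"}

posterior = forward_filter(["hands_busy", "hands_free", "hands_free"],
                           STATES, START, TRANS, EMIT)
most_likely = max(posterior, key=posterior.get)
print(most_likely, SUPPORT[most_likely])  # waiting hand_over_next_part
```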
Another category of PGMs is timed Petri nets (TPNs), an extension of Petri nets with additional modeling of timing. ''A basic Petri net, or place/transition net, is a bipartite multi-graph comprising two finite disjoint sets of nodes, places, and transitions. A multi-set of directed arcs connects the node types in an alternating fashion. The places contain a natural number of tokens; control is transferred through token movement throughout the graph'' [39]. TPNs are not widely used in robotics, but some researchers argue that TPNs provide a natural representation for multi-modal interactions in HRC [39]. For example, in [39], a real-time turn-taking TPN framework was selected as the computational representation that monitors resources and generates multi-modal reciprocal behavior for a robot engaged in a cooperative activity with a human.
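The token game described in this quotation can be made concrete with a small simulator. The sketch below is a generic timed place/transition net in which each transition consumes its input tokens when it fires and deposits its output tokens after a fixed delay. It is not the turn-taking framework of [39], and the class name, the place and transition names, and the durations in the example are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

class TimedPetriNet:
    """Place/transition net whose transitions take a fixed time to complete."""

    def __init__(self, marking: Dict[str, int]):
        self.marking = dict(marking)  # place -> number of tokens
        self.transitions: Dict[str, Tuple[Dict[str, int], Dict[str, int], float]] = {}

    def add_transition(self, name: str, inputs: Dict[str, int],
                       outputs: Dict[str, int], duration: float) -> None:
        if not inputs:
            raise ValueError("each transition needs at least one input place")
        self.transitions[name] = (inputs, outputs, duration)

    def enabled(self, name: str) -> bool:
        inputs = self.transitions[name][0]
        return all(self.marking.get(p, 0) >= k for p, k in inputs.items())

    def run(self, until: float) -> List[Tuple[float, str]]:
        """Fire enabled transitions until no events remain or time exceeds `until`."""
        clock, pending, trace = 0.0, [], []
        while True:
            for name in sorted(self.transitions):
                while self.enabled(name):
                    # Firing starts: input tokens are consumed immediately.
                    for p, k in self.transitions[name][0].items():
                        self.marking[p] -= k
                    heapq.heappush(pending, (clock + self.transitions[name][2], name))
                    trace.append((clock, name))
            if not pending or pending[0][0] > until:
                return trace
            # Firing completes: output tokens appear after the transition's duration.
            clock, name = heapq.heappop(pending)
            for p, k in self.transitions[name][1].items():
                self.marking[p] = self.marking.get(p, 0) + k

# Hypothetical turn-taking: the robot speaks, then the human places one of two parts.
net = TimedPetriNet({"robot_turn": 1, "human_turn": 0, "parts": 2})
net.add_transition("robot_says", {"robot_turn": 1}, {"human_turn": 1}, duration=2.0)
net.add_transition("human_places", {"human_turn": 1, "parts": 1},
                   {"robot_turn": 1}, duration=3.0)
trace = net.run(until=100.0)
print(trace)
```

Running the example alternates robot and human turns until the parts run out. At that point the human's turn token is left in place, but no transition is enabled.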

D. LEARNING FROM DEMONSTRATION
Learning from demonstration (LFD), which can be implemented through kinesthetic teaching, teleoperation, or passive observation, is a robot learning method in which the robot learns to perform a task by imitating a demonstrator. It can be considered a supervised learning method, since an expert demonstrator provides the information that the robot tries to follow. With this method, even non-experts can interact with the robot; learning can be achieved from a small number of demonstrations (i.e., data efficiency); the robot learns under safe conditions; the method is reliable; and the learned task can be transferred to a different platform (i.e., platform independence). However, there are some limitations: complex behaviors are difficult to demonstrate, labeled data are needed, and learning from sub-optimal or inappropriate demonstrators is not accurate [66], [67].
In HRC frameworks, LFD is used to learn and infer the preferences of human users regarding the sequence of sub-tasks and actions in collaborative tasks. Learning and inference can be done with a two-stage approach. In the training phase, a set of user demonstrations of the entire task is obtained as input, and the users' preferences are learned and clustered on the basis of these demonstrations. The execution phase is performed online: a new user's preference is probabilistically classified into one of the pre-learned clusters on the basis of the observed actions, and the robot's next action is predicted [10]. LFD is a data-driven approach in which the robot's behavior and interaction model are learned during training from demonstrations of the various sub-tasks. The robot's motion can then be continuously coordinated with the human partner, both spatially and temporally, at run time [68]. LFD is also used for direct policy learning in combination with other formulations, such as relational activity processes (RAP), a semi-Markov decision process (semi-MDP) formulation of relational concurrent activity processes in relational cooperation scenarios [69].
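The two-stage approach can be sketched as follows, under loudly simplified assumptions. In the training phase, each distinct sub-task ordering is treated as its own preference cluster; a real system would use a proper clustering method. In the execution phase, a posterior over clusters is computed from the observed prefix of the new user's actions, with a small hypothetical deviation probability `eps`. The sketch predicts the user's next sub-task, from which the robot could choose a complementary action. The function names and the example sub-tasks are illustrative, not those of [10].

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

Order = Tuple[str, ...]

def train(demos: List[Sequence[str]]) -> Dict[Order, float]:
    """Training phase (simplified): each distinct sub-task ordering is one
    preference cluster; its prior is the fraction of demonstrations in it."""
    counts = Counter(tuple(d) for d in demos)
    total = sum(counts.values())
    return {order: n / total for order, n in counts.items()}

def posterior(clusters: Dict[Order, float], observed: Sequence[str],
              eps: float = 0.05) -> Dict[Order, float]:
    """Execution phase: P(cluster | observed prefix), allowing a small
    probability eps that an observed step deviates from the cluster."""
    scores = {}
    for order, prior in clusters.items():
        likelihood = 1.0
        for i, action in enumerate(observed):
            matches = i < len(order) and order[i] == action
            likelihood *= (1.0 - eps) if matches else eps
        scores[order] = prior * likelihood
    z = sum(scores.values())
    return {order: s / z for order, s in scores.items()}

def predict_next(clusters: Dict[Order, float], observed: Sequence[str]) -> Optional[str]:
    """Predict the user's next sub-task from the most probable cluster."""
    post = posterior(clusters, observed)
    best = max(post, key=post.get)
    return best[len(observed)] if len(observed) < len(best) else None

# Hypothetical demonstrations of a table assembly by five users.
demos = [["legs", "top", "screws"]] * 3 + [["top", "legs", "screws"]] * 2
clusters = train(demos)
print(predict_next(clusters, ["top"]))  # legs
```

Before any action is observed, the prediction follows the larger cluster. After the user starts with "top", the evidence overrides that prior.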

E. REINFORCEMENT LEARNING
Reinforcement learning (RL) is one of the most commonly used learning methods in HRC. In RL, the robot is modeled as an agent that receives observations from the environment and uses them to infer the current state of the environment. The relationship between states and observations can be modeled with three variants of the Markov decision process: the Markov decision process (MDP), in which all states are observable by the agent; the partially observable Markov decision process (POMDP), in which the states of the system are not directly observable and the agent has to maintain a belief over states; and the mixed observability Markov decision process (MOMDP), in which some components of the state are observable even though the state as a whole is not [70], [71]. RL algorithms are divided into model-based and model-free RL; since obtaining a model of a system with many states is difficult, model-free RL is the most commonly used.
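After every action a and observation o, a POMDP agent updates its belief with the standard Bayes filter, b'(s') ∝ O(o | s', a) Σ_s T(s' | s, a) b(s). A minimal sketch follows. The two hidden human states ("busy" and "idle"), the robot action "wait", the observations, and all probabilities are hypothetical values chosen for illustration.

```python
from typing import Dict

Belief = Dict[str, float]

def belief_update(belief: Belief, action: str, obs: str,
                  T: Dict[str, Dict[str, Dict[str, float]]],
                  O: Dict[str, Dict[str, Dict[str, float]]]) -> Belief:
    """Bayes filter: b'(s') is proportional to O[a][s'][o] * sum_s T[a][s][s'] * b(s)."""
    new = {}
    for s2 in belief:
        predicted = sum(T[action][s][s2] * belief[s] for s in belief)
        new[s2] = O[action][s2][obs] * predicted
    z = sum(new.values())
    if z == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / z for s, p in new.items()}

# Hypothetical model: the human co-worker is either busy or idle (not directly observable).
T = {"wait": {"busy": {"busy": 0.8, "idle": 0.2},
              "idle": {"busy": 0.1, "idle": 0.9}}}
O = {"wait": {"busy": {"looks_at_robot": 0.2, "keeps_working": 0.8},
              "idle": {"looks_at_robot": 0.7, "keeps_working": 0.3}}}

b = belief_update({"busy": 0.5, "idle": 0.5}, "wait", "looks_at_robot", T, O)
print(b)  # belief in "idle" rises from 0.5 to about 0.81
```

In an MOMDP, the same update would only be needed for the hidden components of the state, while the observable components are tracked directly.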
Q-learning is one of the RL algorithms widely used in HRC during the robot learning phase [9]. It is used in combination with other ML techniques and within deep reinforcement learning (DRL) architectures. In one DRL architecture for training the robot policy, multiple graph convolutional network (GCN) layers, long short-term memory (LSTM) blocks, and a variational autoencoder (VAE), a type of neural network that learns alternative representations of its input data [60], were combined to extract a representation of short-term human action data and to learn a recurrent deep Q-function [72]. In other words, the VAE provides the input to the LSTM blocks, and the whole pipeline produces the Q-values used by the framework.
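The tabular form of Q-learning that underlies these architectures repeatedly applies the update Q(s, a) ← Q(s, a) + α[r + γ max_a' Q(s', a') - Q(s, a)]. The sketch below is hypothetical. It uses a deterministic toy "assembly chain" in which the robot can either "advance" (complete the next sub-task) or "wait", with a reward of 1 only when the final sub-task is completed. The learning rate, discount factor, and exploration rate are illustrative choices. In DRL architectures such as [72], the table would be replaced by a neural network that approximates the Q-function.

```python
import random
from typing import Dict, Tuple

ACTIONS = ("advance", "wait")
GOAL = 3  # hypothetical: the assembly is complete after three sub-tasks

def step(state: int, action: str) -> Tuple[int, float]:
    """Deterministic toy task: 'advance' completes the next sub-task, 'wait' does nothing."""
    nxt = state + 1 if action == "advance" else state
    return nxt, (1.0 if nxt == GOAL else 0.0)

def q_learning(episodes: int = 500, alpha: float = 0.5, gamma: float = 0.9,
               epsilon: float = 0.2, seed: int = 0) -> Dict[Tuple[int, str], float]:
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        for _ in range(50):  # cap on episode length
            # epsilon-greedy action selection with random tie-breaking
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                best = max(q[(s, b)] for b in ACTIONS)
                a = rng.choice([b for b in ACTIONS if q[(s, b)] == best])
            s2, r = step(s, a)
            # the goal state is terminal, so it contributes no future value
            future = 0.0 if s2 == GOAL else max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (r + gamma * future - q[(s, a)])
            s = s2
            if s == GOAL:
                break
    return q

q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
print(policy)  # {0: 'advance', 1: 'advance', 2: 'advance'}
```

Because rewards are discounted, waiting is never optimal here: Q(s, "wait") converges to γ times Q(s, "advance").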
In addition to Q-learning, inverse RL (IRL) is another popular RL-based algorithm. In IRL, an agent's behavior is observed in order to recover the reward function. This is the reverse of the Q-learning setting, in which the reward is given and a policy is learned. IRL can be regarded as an unsupervised or semi-supervised approach in which all kinds of data are valuable. It is used in HRC because some problems are too difficult and complex to be specified with a hand-designed reward function [60]. The preferences of the human user in a collaborative task can be learned through an IRL framework in a MOMDP, where the hidden state of the MOMDP models the user's preferences, which are adjusted in real time depending on the particular type of user [73]. IRL can also be integrated with LFD: teaching is done through human demonstration and natural language, while the robot learns through IRL algorithms that compute the reward used to develop a collaboration algorithm [40]. IRL has also been used to optimize the reward function of non-parametric motion flow models trained from human demonstrations [22], and as one of the learning methods in the RAP formalization of relational concurrent cooperation domains [69]. The relational activity processes (RAP) framework, which relies on relational MDPs, is used to model concurrent tasks.
RAP allows concurrent execution of multiple actions, which may be initiated or terminated asynchronously, and uses relational representations for both the state and action spaces in the decision-making process [11]. RL methods are considered interactive robot learning systems because their training and execution phases can be combined.
In HRC, the collaborative framework must be designed with a transparent architecture to instill trust. As discussed previously, transparent and purposeful communication is one way to convey trust and a feeling of safety, while researchers try to enrich robot decision-making through RL methods. Human mental states such as trust have been integrated into robot decision-making through the trust-POMDP, which models the evolution of human trust while different collaborative or cooperative tasks are performed [16], [17]. Furthermore, some studies integrate decision-making for both action and communication within an RL framework. CommPlan is a framework that integrates decision-making for action and communication in sequential decision-making under uncertainty. This computational framework, which consists of a model specification process and an execution-time POMDP planner, was designed to address whether, when, and what to communicate during human-robot collaboration. In that study, a multi-agent MDP (MMDP) was used to represent the sequential task model, a parametric model to specify communication costs, and the agent Markov model (AMM) to represent the sequential decision-making behavior of humans [8].
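The core trade-off behind deciding whether to communicate can be illustrated with a one-step value-of-information calculation. This is a deliberately simplified, hypothetical sketch and not CommPlan's POMDP planner. The robot holds a belief over which of two sub-tasks the human will take next. It receives a reward of 1 if it picks the other sub-task and 0 if they collide, and it can ask the human first at a fixed communication cost. It asks only when the expected reward after asking, minus that cost, exceeds the expected reward of acting on its current belief.

```python
from typing import Dict, Tuple

SUBTASKS = ("left_panel", "right_panel")  # hypothetical sub-tasks

def reward(robot_choice: str, human_choice: str) -> float:
    """1 if the robot and the human work on different sub-tasks, 0 on a conflict."""
    return 1.0 if robot_choice != human_choice else 0.0

def should_ask(belief: Dict[str, float], comm_cost: float) -> Tuple[bool, float, float]:
    """Compare acting on the current belief with asking the human first."""
    # Without communication: pick the single action with the best expected reward.
    v_silent = max(sum(p * reward(r, h) for h, p in belief.items()) for r in SUBTASKS)
    # With communication: the answer reveals the human's choice, so the robot
    # can pick the best response for each possible answer, minus the cost of asking.
    v_ask = sum(p * max(reward(r, h) for r in SUBTASKS)
                for h, p in belief.items()) - comm_cost
    return v_ask > v_silent, v_silent, v_ask

print(should_ask({"left_panel": 0.5, "right_panel": 0.5}, comm_cost=0.2))
print(should_ask({"left_panel": 0.95, "right_panel": 0.05}, comm_cost=0.2))
```

In the first call, the robot is maximally uncertain about the human's choice, and asking is worth its cost. In the second call, it is confident enough to act without communicating. A full planner makes this comparison over entire action sequences, under a model of how the human responds.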

F. OTHER TECHNIQUES
Other techniques are also used to handle decision-making and learning in HRC scenarios and to improve communication between humans and robots. These methods cannot strictly be considered ML techniques, but they use other means to narrow the scope of uncertainties during information exchange.
One group of these methods focuses on providing feedback that instructs humans. An example is visual signaling frameworks, which provide visual instructions to human users based on real-time 3D object recognition and tracking. Model-based object recognition and tracking (OBT) algorithms, combined with a projection mapping system and mixed-reality cues, have been used to instruct human partners in collaborative assembly tasks [55], [88].
Another group of these methods focuses on resolving uncertainties through task planning. For example, high-level task planning using a hierarchical task network (HTN) is a learning and decision-making approach for accomplishing a task in a shared workspace in an HRC framework. In this method, tasks are divided into primitive and compound tasks, which are represented in an initial task network according to the hierarchy of their execution. The goal is to decompose all compound tasks in the initial task network, and the solution is a ''plan which equals a set of primitive tasks applicable to the initial world state.'' In addition to the initial task network, which is the objective to be achieved, an initial state description and domain knowledge consisting of networks of primitive and compound tasks are required [89]. Such task planners can be used in architectures consisting of modules such as a situation assessment module, a theory of mind (ToM) manager, a high-level task planner, a geometric action and motion planner, a dialogue manager, and a supervisor. In such architectures, the robot is able to estimate other agents' mental states about the environment and about the state of goals, plans, and actions while interacting with humans [18]. Furthermore, a task planner based on the hierarchical agent-based task planner (HATP), an extension of HTN planning, has been integrated with referring expression generation (REG) to include communication actions in the planning process [41].
VOLUME 10, 2022
TABLE 2. Extracted features from the articles: task type, communication modality, robot decision-making, categorization, and paper focus.
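The decomposition process can be sketched as a small total-order HTN planner. Compound tasks are replaced by the ordered sub-tasks of one of their methods, and primitive tasks are kept only if their preconditions hold in the current state. If a method leads to a dead end, the planner backtracks and tries the next method. The toy assembly domain below (task names, preconditions, and effects) is hypothetical. Planners such as HATP additionally reason about which agent executes each task.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

State = FrozenSet[str]

# Hypothetical domain knowledge for a toy shared-workspace assembly.
# Primitive task -> (preconditions, add effects, delete effects)
PRIMITIVES: Dict[str, Tuple[Set[str], Set[str], Set[str]]] = {
    "robot_fetch_part": (set(), {"part_at_station"}, set()),
    "human_hold_part": ({"part_at_station"}, {"part_held"}, set()),
    "robot_screw": ({"part_held"}, {"part_fixed"}, set()),
    "human_release": ({"part_held", "part_fixed"}, set(), {"part_held"}),
}
# Compound task -> alternative methods, each an ordered list of sub-tasks.
# The first method for "prepare" only works if the part is already at the station.
METHODS: Dict[str, List[List[str]]] = {
    "assemble": [["prepare", "fix"]],
    "prepare": [["human_hold_part"], ["robot_fetch_part", "human_hold_part"]],
    "fix": [["robot_screw", "human_release"]],
}

def plan(tasks: List[str], state: State) -> Optional[List[str]]:
    """Decompose the task network depth-first, backtracking over methods."""
    if not tasks:
        return []
    head, rest = tasks[0], tasks[1:]
    if head in PRIMITIVES:
        pre, add, delete = PRIMITIVES[head]
        if not pre <= state:
            return None  # primitive task not applicable in this state
        tail = plan(rest, (state - delete) | add)
        return None if tail is None else [head] + tail
    for subtasks in METHODS.get(head, []):
        result = plan(subtasks + rest, state)
        if result is not None:
            return result
    return None  # no method of this compound task leads to a valid plan

print(plan(["assemble"], frozenset()))
# ['robot_fetch_part', 'human_hold_part', 'robot_screw', 'human_release']
```

If the initial state already contains `part_at_station`, the first method for "prepare" succeeds and the fetch step is omitted from the plan.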
In contrast to task-planning methods, HRC has also been cast as a set of combinatorial search problems. Finding the trajectory from an initial state to a goal state is defined as an optimal search problem that identifies combinatorial rules and actions to resolve uncertainties during the search process [90]. Answer set programming (ASP) is one of the approaches used in HRC. ''ASP is a form of knowledge representation and reasoning paradigm oriented toward solving combinatorial search problems, as well as knowledge-intensive problems. The idea of an ASP is to represent a problem as a program whose models (called answer sets) correspond to the solutions. The set of answers for the given program can be computed using specially implemented systems called ASP solvers, such as Clingo'' [91]. Hybrid conditional planning based on ASP (HCP-ASP) has also been used for planning in collaborative tasks [45]. Finally, graph search (GS) and trajectory optimization (TO) have been combined in a novel bilevel optimization formulation for HRC [74].

VII. NOVEL CATEGORIZATION OF HITL SYSTEMS
The reviewed articles shared some similarities and differences, and two distinguishing factors were identified and used to categorize the available HITL systems: 1) the level of communication and 2) adaptability in communication and task execution. The first factor concerns the level of communication in HRC, that is, whether information related to the task or to the human's and robot's intentions is shared. As discussed in Section V, information should be shared through an appropriate communication modality, via both HTR and RTH communication, so that humans and robots are informed of each other's intentions before the execution step. Moreover, the timing and type of communication affect the shared understanding of the task and of the user's intent. Information sharing also allows humans to act with caution in a collaborative workspace. The second factor concerns adaptability in task execution, which also affects human trust in the robot and team performance [20].
Consequently, using the factors mentioned, a new categorization of the available HITL systems was proposed in the context of HRC. Figure 2 shows a schematic of the DHITL and PCHITL architectures.
• Delayed Human-in-the-loop (DHITL): The robot starts performing an action based on a prediction of human intent; the human observes the robot's action and then provides feedback to the system.
• Pre-cautious Human-in-the-loop (PCHITL): In addition to estimating intent at some steps, the human and robot inform each other before the task is executed, in a purposeful way embedded in an intelligent architecture.
According to Tables 1 and 2, of the 49 articles reviewed in this article, 12 were classified as PCHITL, one as semi-PCHITL, and the remaining 36 as DHITL. This section discusses the frameworks identified as PCHITL, treating communication as the driving factor. In this categorization, the communication mode was not considered a factor in itself; instead, the focus was on how the communication modalities were used to promote purposeful communication by addressing how, when, and what to communicate.
1) Some HRC frameworks allow the robot to decide whether, when, and what to communicate while performing collaborative tasks with humans. The robot informs the human of the action it is going to take (e.g., I am going to do an action at a landmark), asks the human counterpart about her intention for the next step (e.g., Where are you going?), and tells the human counterpart what to do (e.g., please make the next sandwich at landmark).
In this type of framework, there is bidirectional communication between humans and robots, which is considered an indicator of PCHITL [8]. These studies generally provide a decision-making algorithm for different types of collaborative tasks, including sequential tasks, that generalizes across tasks and communication modalities [8]. Decision-making for communication in HRC faces several challenges, including modeling human teammates, estimating the benefit of communication, the inherently decentralized nature of multi-agent tasks, and the need for execution-time communication decisions [92]. Additionally, it is crucial to study the feasibility and cost of verbal communication in the task planning step [41].
2) Verbal communication can also be combined with actions to tell the human user how to perform a task. When there are multiple ways to perform a task, the robot can choose the one it prefers, for example, because that option gives the robot more information about the environment owing to the location of its sensors. In this case, verbal communication can be used in two ways: 1) the robot issues verbal commands that explain to the human how it wants the task to be done, and 2) the robot informs the human why it chose to act in a specific way (state-conveying actions). Since the robot informs the human both how and why, this framework is classified as PCHITL [19].
3) Furthermore, defining different types of communication actions based on the nature of the collaborative task improves the fluency of teamwork and task performance. For example, in a collaborative table assembly task, communication actions such as 'Confirm Attach', 'Ask Help', 'After Help', 'Request To Unhold', and 'Request To Attach' allow the robot to request actions from a human teammate, initiate and end conversations, and provide explanations.
In this type of collaborative assembly planning, the robot can decide when and how to communicate with a human teammate, and the human teammate is fully aware of the robot's decisions, intentions, and desires [45].
4) Human users may also have multiple options for performing a task, with different behaviors. In these cases, equipping a robot with verbal communication, used when needed, can promote efficiency, fluency, and acceptability. For example, in a block-building task, two types of behavior can be defined for the human partner: 1) an adaptable human, who adapts to the robot's verbal commands, and 2) a non-adaptable human, who does not adapt to them (i.e., does not change a decision that has already been made). The robot adapts its behavior to the human and their decisions; verbal communication is used only when needed and improves task efficiency and team fluency [13].
5) In some collaborative tasks, the robot is unaware of the final goal. The human user knows the task's goal, so the robot needs to communicate with the human user to learn its role and to ask its partner which sub-task to do. This communication can be verbal, by directly asking or informing the human, or non-verbal, such as gestures that point out a specific object or area to the human. In both cases, since the robot informs the human user of its intention, the framework is classified as PCHITL [41].
6) In general, most architectures try to create a PCHITL by adding different communication modalities to the HRC framework. For example, through a dialogue manager module, the robot can verbalize information to the human and recognize basic vocal commands. Whenever verbal communication takes place before task execution, the entire architecture creates a pre-cautious situation for the human partner [18].
In addition to a dialogue manager module, multiple communication channels (e.g., TTS, STT, and a head display) have been added to HRC architectures to give the robot the ability to interact verbally and non-verbally with the human, and to provide the robot with feedback about the internal states and intents of the human user [37]. A force sensor is another non-verbal communication channel that can be added to TTS to enable bidirectional communication in a joint construction task, resulting in PCHITL [14]. Gaze and gesture have also been used to share intentions and task information, with speech serving as an active communication channel in the process [39]. Physiological signals, such as muscular activity, can be combined with various communication channels in a teaching-learning-collaboration (TLC) model in which robots learn to complete a collaborative task, such as assembly, from natural language instructions and respond using speech [40]. Vision systems are commonly combined with verbal commands or speech synthesizers in the design of cognitive control architectures for joint action, to share information about sub-tasks or about the robot's intent or decision for the next step of the task [46].
7) XR techniques are an alternative approach to creating PCHITL frameworks. AR techniques are used to present the robot's status and intended movements to the human partner as visual cues [56], or to display task information, such as the pose and state of physical objects in the shared workspace, ahead of time, together with timely instructions for the human teammate performing a collaborative task [55].
In conclusion, a PCHITL framework in HRC is developed either by using one or more communication modalities or by proposing computational algorithms that make RTH and HTR communication need-based.
Table 3 summarizes the reviewed articles belonging to the PCHITL category and divides them according to whether they focused on the necessity of communication (if communication is needed), the timing of communication (when to communicate), or the type of communication (what to communicate) to achieve a PCHITL framework. As shown in Table 3, PCHITL can be achieved simply by choosing an appropriate communication modality, without adding complicated computational algorithms to make communication more purposeful and efficient [37], [39], [40], [46], [56].

VIII. DISCUSSION AND FUTURE OF WORK IN HRC
Collaborative robots work side by side with human partners in various fields; however, their applications vary from industry to industry, and not all of them require the same type or extent of human interaction. Ensuring the human user's safety during interactions is essential and requires trade-offs in performance. Typically, physical safety in HRC is achieved by a combination of mechanical design (e.g., low-inertia links and soft structures), actuator selection (e.g., variable impedance actuators), sensor selection (e.g., vision systems and tactile skins), and planning/control strategies (e.g., collision avoidance, velocity scaling, and impedance or passivity-based control) to reduce quantitative risks of injury [93], [94]. In addition to physical safety, physiological safety is very important in HRC and should also be considered [95].
To guarantee both the physical and physiological safety of the human user in HRC frameworks, 1) the key components and elements of HRC frameworks should be recognized, and 2) the sources of uncertainty in each of these components should then be identified. This article examined the available HITL frameworks in the context of HRC in manufacturing settings with a new approach, recognized the key components of these frameworks, and identified the most important factor, communication, in which any kind of limitation can cause a high level of uncertainty. After reviewing key components such as the type of collaborative task, communication modalities, and robot decision-making, the HRC frameworks were classified into two classes, DHITL and PCHITL.
Among these factors, we posit that in an era in which miscommunication contributes to widening distrust among humans, there is a need for HRC frameworks that enhance communication and promote bidirectional communication between the robot and the human user. In HRC frameworks, bidirectional communication can be established to exchange the information that the human user and the robot need about the task or each other's intent, at the proper time and with a proper method; this is the PCHITL case. However, when there is not enough information sharing between humans and robots, a feeling of distrust and a lack of physiological safety are conveyed to the human user; this is the DHITL case. Tables 1 and 2 show that the articles in the PCHITL category use verbal / natural language / speech commands as their primary communication modality or as one of them. Based on this result, verbal communication appears to be the standard, or at least the most straightforward, approach to creating bidirectional communication in HRC. According to a recent study, effective spoken language interaction with robots could benefit the research area in several ways: spoken language interaction is considered one of the fastest communication methods between humans and robots; interactions through spoken language are more motivating, satisfying, and reassuring; people will expect robots to talk in the future; talking robots are liked more; and speech can be combined with other methods, such as robot gestures and actions, to reinforce or clarify a message [96]. Emotional expression is another factor that enhances human-robot communication and can take verbal or non-verbal forms. In non-verbal communication, emotional expression has been shown to have a positive impact on human-robot collaboration [97]. In addition to verbal communication, communication through visual or haptic devices can also result in bidirectional communication in HRC.
Regarding the other components, most of the articles focused on assembly tasks, and, as mentioned before, the type of task influences how, what, and when to communicate. However, bidirectional communication in a PCHITL framework improves both the quality of task execution, for any type of task, and the human user's feeling of safety. In particular, for tasks in which the robot's goal is not known in advance, bidirectional communication conveys a feeling of security to the human user, since the robot's next action may otherwise be unclear to them. The choice of a robot decision-making and learning algorithm in an HRC framework also depends on the characteristics of the task, the environment, and the available information (i.e., observable or non-observable states). Our results indicated that RL algorithms are the most commonly used approach in HRC, especially when the working environment is uncertain. RL algorithms make it possible to integrate the human user's latent states, such as trust and stress, as well as communication states, into the task execution model. This integration facilitates bidirectional communication and, as a result, yields a PCHITL framework.
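To make the idea of folding a latent human state and a communication action into an RL task model more concrete, the following is a minimal sketch, not a method taken from the reviewed articles. It assumes a toy problem in which the state is a pair (task sub-step, human trust level), trust is discretized as low or high, and the robot can either execute the next sub-task or communicate its intent. The dynamics and reward values in `step` are invented for illustration, and all names (`step`, `train`, `greedy_policy`, `N_STEPS`) are hypothetical. Plain tabular Q-learning is used as a stand-in for whatever RL algorithm a real framework would adopt.

```python
import random

N_STEPS = 3                      # hypothetical number of assembly sub-steps
EXECUTE, COMMUNICATE = 0, 1
ACTIONS = (EXECUTE, COMMUNICATE)


def step(state, action):
    """Hypothetical dynamics; state = (task_step, trust), trust in {0: low, 1: high}."""
    task_step, trust = state
    if action == COMMUNICATE:
        # Explaining intent costs time but raises the human's trust.
        return (task_step, 1), -1.0
    if trust == 1:
        # With high trust the human lets the robot complete the sub-step.
        return (task_step + 1, trust), 5.0
    # With low trust the human hesitates or intervenes; the task does not advance.
    return (task_step, trust), -3.0


def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration and exploring starts."""
    rng = random.Random(seed)
    states = [(s, t) for s in range(N_STEPS) for t in (0, 1)]
    q = {(s, a): 0.0 for s in states for a in ACTIONS}
    for _ in range(episodes):
        state = rng.choice(states)           # random start so every state is visited
        for _ in range(50):                  # cap the episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward = step(state, action)
            done = nxt[0] == N_STEPS         # all sub-steps completed
            target = reward if done else reward + gamma * max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            if done:
                break
            state = nxt
    return q


def greedy_policy(q):
    """Map each (task_step, trust) state to its highest-valued action."""
    states = {s for (s, _) in q}
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in states}
```

With these made-up rewards, the learned greedy policy communicates whenever trust is low and executes whenever trust is high, so the communication decision and the task-execution decision come out of a single learned model. In practice, trust and stress are not directly observable, which would call for a POMDP-style belief state rather than the fully observed state assumed here.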
Some attempts have been made to integrate physiological signals with HRC frameworks in different ways, such as closed-loop robotic control [98], shared control of a robotic platform [33], supervisory control tasks combining EMG and EEG [32], and collaborative task performance in which the robot's physical behavior adapts to human motor fatigue [99]. Physiological signals are also used in HRC to create frameworks for "physiologically aware human-robot collaboration", in which robot actions are adjusted according to the mental states of the human user [100]. We propose a PCHITL system that integrates robot decision-making for communication with robot decision-making for task execution, in which human brain signals serve as the source for detecting the human's need for communication during collaboration with a robot. The robot will be able to infer this need from brain signals, communicate on the basis of that information, and thereby improve the efficiency of both communication and collaboration. This research aims to leverage the real-time availability of brain data to control the robot's communication with a human in physically interacting scenarios, in order to achieve cost-effective, bidirectional verbal communication. In addition, the effect of adding stressors, such as time pressure, to the framework could be investigated to further improve robot decision-making in a PCHITL framework.
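The proposed communication trigger can be illustrated with a simple expected-cost rule. This is a sketch under stated assumptions, not the authors' design: it presumes some brain-signal decoder (not specified here) that outputs, for each time window, a probability `p_need` that the human needs information. The sketch smooths these noisy estimates with an exponential moving average and speaks only when the expected cost of staying silent exceeds the cost of interrupting. The time stressor mentioned in the text is modeled, again hypothetically, as time pressure that increases the cost of speaking. The class name `CommunicationPolicy` and all parameter values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class CommunicationPolicy:
    cost_miss: float = 10.0   # hypothetical cost of leaving an information need unanswered
    cost_speak: float = 2.0   # hypothetical cost (time, interruption) of one utterance
    smoothing: float = 0.3    # EMA weight given to each new decoder output
    belief: float = 0.0       # smoothed probability that the human needs information

    def update(self, p_need: float, time_pressure: float = 0.0) -> bool:
        """Fold in one decoder estimate and decide whether the robot should speak.

        p_need: hypothetical per-window probability from a brain-signal decoder.
        time_pressure: value in [0, 1]; scales up the cost of interrupting the human.
        """
        # Exponential moving average damps single noisy windows.
        self.belief = (1 - self.smoothing) * self.belief + self.smoothing * p_need
        speak_cost = self.cost_speak * (1.0 + time_pressure)
        # Speak only if the expected cost of silence exceeds the cost of speaking.
        return self.belief * self.cost_miss > speak_cost
```

For example, a fresh `CommunicationPolicy()` fed decoder outputs 0.1, 0.1, 0.9, 0.9, 0.9 stays silent for the first two windows and speaks from the third onward; with `time_pressure=1.0`, the same inputs trigger speech only from the fourth window, because interrupting is treated as twice as costly. A learned policy, such as an RL agent with the smoothed belief in its state, could replace this fixed threshold.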

IX. CONCLUSION
The general objective of this document was to identify current trends in HRC regarding how human users are involved in the collaborative task and how their role in the loop is defined, because we believe that this affects human physical and physiological safety as well as team performance. Therefore, a systematic review was performed, and forty-nine articles were selected according to the defined criteria. This systematic review studied different aspects of HRC scenarios, including the type of collaborative task, communication modalities, and robot decision-making algorithms. The selected articles were classified into two categories, DHITL and PCHITL; most of the articles belonged to DHITL, as they lacked communication and information sharing before task execution. Most of the selected articles focused on assembly tasks, and vision systems were a commonly used method of communication in these HRC frameworks. ML and RL algorithms, mostly RL, were implemented for robot decision-making in the frameworks to ensure the human user's physical and physiological safety and to make collaboration possible. Communication was identified as the most important factor in the development of the PCHITL framework in HRC. Although researchers are using various types of communication channels and methods, further research is warranted to improve communication in HRC. Future work should examine how various communication modalities and decision-making for communication affect human safety in collaboration.