Autonomous Social Distancing in Urban Environments Using a Quadruped Robot

The Coronavirus Disease 2019 (COVID-19) pandemic has become a global challenge faced by people all over the world. Social distancing has proven to be an effective practice for reducing the spread of COVID-19. Against this backdrop, we propose that surveillance robots can not only monitor but also promote social distancing. Robots can be flexibly deployed, and they can take precautionary actions to remind people to practice social distancing. In this paper, we introduce a fully autonomous surveillance robot based on a quadruped platform that can promote social distancing in complex urban environments. Specifically, to achieve autonomy, we mount multiple cameras and a three-dimensional light detection and ranging sensor (3D LiDAR) on the legged robot. The robot then uses an onboard real-time social distancing detection system to track nearby pedestrian groups. Next, the robot uses a crowd-aware navigation algorithm to move freely in highly dynamic scenarios. Finally, the robot uses a crowd-aware routing algorithm to promote social distancing effectively, using human-friendly verbal cues to send suggestions to over-crowded pedestrians. We validate that our robot can operate autonomously by conducting several experiments in various urban scenarios.


I. INTRODUCTION
The COVID-19 pandemic has quickly become the most dramatic and disruptive event experienced by people all over the world. People may need to live with the virus for a long time. One of the most effective measures to minimize the spread of the coronavirus is to promote social distancing. To this end, some related applications have been developed for existing on-site closed-circuit television (CCTV) systems to detect social distancing violations. However, on-site monitoring systems are not ubiquitous in some areas, and they may not be able to cover all public corners. Furthermore, although this sort of monitoring system can detect social distancing violations, it fails to take any proactive action to promote social distancing.
Compared to on-site monitoring systems, surveillance robots can be flexibly deployed to patrol the desired public areas. Moreover, a robot can take precautionary actions to promote social distancing rather than merely monitoring it. These potential applications have been validated by teleoperated robots [1] and hybrid systems [2]. A hybrid system introduces external devices such as CCTV to help robots monitor social distancing. However, developing a fully autonomous surveillance robot for complex urban environments without any external device still encounters several challenges. First, to monitor the social distance between pedestrians without any external device, a robot-centric real-time perception system must run on on-board devices with limited computation. Second, in many urban scenarios, the robot needs to navigate safely through unstructured and highly dynamic environments. Third, more intelligent interactions with humans need to be designed to improve the efficiency of promoting social distancing.
In this paper, we introduce a fully autonomous surveillance robot to promote social distancing in complex urban environments. To achieve this autonomy, we first build the surveillance system with multiple cameras and a 3D LiDAR on a legged robot, which gives the robot omnidirectional perception and extends its traversability in complex urban terrain with uneven ground and stairs, which are challenging for ordinary wheeled mobile robots. Then, we develop an on-board real-time social distancing detection system with the ability to track pedestrian groups near the robot. Next, the CrowdMove [3] algorithm is used to navigate the robot in highly dynamic environments. Finally, we develop a crowd-aware routing algorithm that allows the robot to approach over-crowded pedestrian groups and effectively promote social distancing using verbal cues. We also investigate the influence of human voices on the effectiveness and acceptability of quadruped surveillance and social distancing, because it has been reported that a robotic patrolling inspector can be terrifying for the general public. We demonstrate that this surveillance robot can be operated autonomously with satisfactory human response by conducting experiments in various urban scenarios.
The rest of this paper is organized as follows. Section II reviews the related work. Section III describes the hardware platform that the surveillance robot builds on. Section IV presents the robot's tracking algorithm used for social distancing detection. Section V illustrates the robot's navigation in urban scenarios. Section VI discusses the robot's interactions with humans through verbal communication. Section VII presents the experiments conducted to validate the proposed algorithms. Section VIII concludes this paper.

II. RELATED WORK
In this section, we will give a brief overview of algorithms related to our system, including the robotic perception, navigation, and interaction for surveillance robots.

A. Perception for Surveillance Systems
Pedestrian tracking has been widely applied in surveillance video analysis and is well developed based on research on multi-object tracking problems [4]-[7].
Discrete velocities are used to model pedestrians' motion [8], [9]. Although discretization improves the efficiency of prediction, this approach cannot fully capture continuous real-life motion. Chung et al. developed cognitive models to improve the performance of their model [10], but they did not consider people's facing direction because they modeled pedestrians only as circles.
Helbing et al. proposed the social force model to predict people's movement according to an energy potential induced by people and obstacles [11]. Tracking performance was then improved by detecting abnormal events among pedestrians [12]. Pellegrini et al. developed Linear Trajectory Avoidance (LTA) to improve the accuracy of motion prediction [13].
The works in [14]-[16] modeled social interactions among pedestrians to improve the accuracy of behavior models. Sheng et al. proposed the Robust Local Effective Matching Model to solve the issue of partial detection of objects [17]. However, these approaches cannot describe pedestrians' dynamics in dense situations because they only use linear models. In our system, a nonlinear model, Frontal RVO (F-RVO) [4], is used to simulate motion in crowds and to model dynamic behaviors while considering pedestrians' facing directions.
With the rise of deep learning, convolutional neural networks (CNNs) have been developed to extract the trajectory of a single object [18]-[20]. Chu et al. developed STAM to detect more objects [21]. Fang et al. improved tracking performance by using a recurrent neural network (RNN) [6]. The authors of [7] developed the SORT model to track pedestrians. However, by tightly coupling detection and tracking, these approaches cannot always provide satisfactory performance in pedestrian detection. Mask R-CNN [22] and YOLO [23] are two state-of-the-art detection networks with sufficient performance for detection purposes; YOLO is much faster than Mask R-CNN and is thus more suitable for real-time tracking tasks.

B. Navigation in Urban Environments
Compared to a fixed video surveillance system, a surveillance robot not only has the above perception capabilities but also endows the surveillance camera with mobility. However, navigating a robot in urban environments is nontrivial.
First, the robot inevitably interacts with dynamic obstacles such as pedestrians and bicycles. Several studies have addressed collision avoidance in such dynamic scenarios. The works in [24], [25] proposed that each agent in a dynamic scenario should take half of the responsibility for collision avoidance and, based on that, developed a multi-agent collision avoidance algorithm with zero communication. The works in [26], [27] presented interacting Gaussian processes to capture cooperative collision avoidance behavior and introduced a cooperative planner for robot navigation. However, these algorithms fail to track a moving pedestrian without the assistance of external devices. The works in [28], [29] deployed a LiDAR with multiple cameras on a robot to track surrounding pedestrians; to navigate the robot in crowds, they used reinforcement learning to train a socially aware collision avoidance policy. Different from the above algorithms, [30]-[32] proposed a sensor-level collision avoidance policy learned via reinforcement learning, which directly processes raw LiDAR data to generate collision-free actions.

C. Robot Interaction
Human-like characteristics of social robots influence users' responses. Among various social traits, gender is important for interpersonal relationships and evokes social stereotypes [33]. Previous research has pointed out that participants were more accepting of robots whose perceived gender conformed to the gender-role stereotypes of their occupation (e.g., male security robots or female healthcare robots). However, perceived trust of the social robots was

Fig. 2: Our system contains functional modules for tracking, mapping, localization, patrol planning, routing, and motion planning. The tracking module uses YOLO [23] and F-RVO [4] to associate similar detected objects across consecutive frames and to keep track of people. Mapping is achieved with the LeGO-LOAM algorithm, which is based on the 3D LiDAR sensor. For localization, we use the NDT localization algorithm to match LiDAR data and localize the robot in the generated map. According to the detected crowds and the map, the crowd-aware routing algorithm and the patrol planning algorithm help the robot determine the current target to approach. With all the information needed for motion planning, an end-to-end algorithm, CrowdMove, is used to drive the robot toward the goal position. During the approach, if the robot detects that its distance to the crowd is less than 5 meters, it starts to play a recorded vocal command to remind people to keep a proper social distance.

In contrast, Kuchenbrandt et al. [35] found that participants, regardless of gender, evaluated the male and female robots as equally competent while performing a stereotypically female task but, in the context of a stereotypically male task, the female robot was rated as more competent compared to the male robot.Another study examining the effects of robot gender on human behavior found that participants were more likely to rate the robot of the opposite gender as more credible, trustworthy, and engaging [36].Thus, the effects of users and robot attributes, as well as gender-role stereotypes, are still open questions.

III. HARDWARE PLATFORM
First, we introduce the hardware setup of our surveillance robot, which includes three components as shown in Figure 1: the mobile platform, the perception sensor-kit, and the computational platform.
• Mobile Platform: We deploy the Laikago (a dog-like legged robot) as our mobile platform for navigating in complex urban environments. Compared to a wheeled robot, a legged robot has superior traversability and is thus more suitable for uneven and unstructured urban scenarios with stairs and bumps.
• Perception Sensor-Kit: To effectively detect and track pedestrians, we mount four color cameras evenly in the horizontal plane of the robot. Each camera is equipped with a short-focal-length lens with a horizontal FOV of 80°, and thus the combination of four cameras can cover almost all directions around the robot. Moreover, for better spatial perception, we use a RoboSense 3D LiDAR with 16 channels to measure the social distance between pedestrians. The 3D LiDAR also serves the navigation applications of mapping, localization, and collision avoidance.
• Computational Platform: Two on-board computers are mounted to process the aforementioned sensor data for different tasks. We use an NVIDIA Jetson AGX Xavier as the vision computational module; it supports up to six CSI cameras as input and uses 512 CUDA cores to GPU-accelerate the processing of images captured by the cameras. Since other tasks such as mapping and localization mostly consume CPU resources, we also deploy an Intel NUC with an Intel i5-8259U CPU. The two computers are connected by a wired network, and the processed data is shared through the Robot Operating System (ROS).

IV. SOCIAL DISTANCING DETECTION
The tracking algorithm used in our system consists of object detection, bounding box prediction, feature extraction, and sparse feature matching. We use YOLO to detect pedestrians and match sparse features with the help of the motion modeling algorithm F-RVO to update the traces of pedestrians.

A. F-RVO
Modeling pedestrians' behavior in crowds from the front view is challenging, not only because of the non-linear, varied motions (turning shoulders, side walking, back stepping, etc. [37]), but also because of the occlusions that a front view may encounter. In this work, we use a velocity-obstacle based algorithm, F-RVO [4], to model the pedestrians' motion.
In F-RVO, each pedestrian $p_i$ is represented by an 8-dimensional vector $\Psi_t = [x, v, v_{pref}, l, w]$, where $x$ is the current position, $v$ is the velocity, and $v_{pref}$ is the preferred velocity, for which we assume that people prefer to walk along their frontal direction. $l$ and $w$ are the length and width of the person's shoulders. For each frame $\tau$, a half-plane constraint is used to determine the range parameter in F-RVO. Within this range, each pedestrian $p_i$ has a velocity obstacle area $VO^{\tau}_{p_i|p_j}$ with respect to a neighboring pedestrian $p_j$. The convex region of velocity obstacles considering all neighbors can then be computed as

$$FRVO^{\tau}_{p_i} = \mathrm{Conv}\Big(\bigcup_{p_j \in H_i} VO^{\tau}_{p_i|p_j}\Big),$$

where $H_i$ is the set of all neighbors of pedestrian $p_i$. Outside the velocity obstacle area, the best velocity is chosen as the one nearest to the preferred velocity $v_{pref}$:

$$v_{new} = \arg\min_{v} \lVert v - v_{pref} \rVert,$$

where $v \notin FRVO^{\tau}_{p_i}$.
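The velocity selection described above can be illustrated with a minimal sketch. It is not the F-RVO formulation itself: as a simplifying assumption, pedestrians are modeled as discs rather than shoulder-shaped agents, the velocity obstacle is tested by a time-to-collision check instead of a half-plane construction, and candidate velocities are sampled on a grid. The names `collides` and `choose_velocity` and the parameters `radius`, `horizon`, `max_speed`, and `step` are hypothetical.

```python
import math


def collides(rel_pos, rel_vel, combined_radius, horizon):
    """Return True if two discs collide within `horizon` seconds.

    rel_pos is the neighbor position minus our position; rel_vel is our
    velocity minus the neighbor velocity (assumed constant).
    """
    px, py = rel_pos
    vx, vy = rel_vel
    c = px * px + py * py - combined_radius ** 2
    if c <= 0.0:
        return True  # already overlapping
    a = vx * vx + vy * vy
    if a == 0.0:
        return False  # no relative motion
    # Solve |rel_pos - rel_vel * t|^2 = combined_radius^2 for the first root.
    b = -2.0 * (px * vx + py * vy)
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return False  # paths never come close enough
    t = (-b - math.sqrt(disc)) / (2.0 * a)
    return 0.0 <= t <= horizon


def choose_velocity(pos, v_pref, neighbors, radius=0.3, horizon=3.0,
                    max_speed=1.5, step=0.1):
    """Pick the sampled velocity outside all velocity obstacles that is
    closest to v_pref. neighbors is a list of (position, velocity) pairs.
    Returns None if every sampled velocity is inadmissible."""
    best, best_dist = None, math.inf
    n = int(round(max_speed / step))
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            v = (i * step, j * step)
            if math.hypot(v[0], v[1]) > max_speed + 1e-9:
                continue
            blocked = any(
                collides((q[0] - pos[0], q[1] - pos[1]),
                         (v[0] - u[0], v[1] - u[1]),
                         2.0 * radius, horizon)
                for q, u in neighbors
            )
            if blocked:
                continue
            dist = math.hypot(v[0] - v_pref[0], v[1] - v_pref[1])
            if dist < best_dist:
                best, best_dist = v, dist
    return best
```

With no neighbors the function returns the preferred velocity itself (up to the grid resolution); with a stationary neighbor directly ahead it returns a nearby velocity that steers around or slows down.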

B. DensePeds
The tracking algorithm, DensePeds, includes three components to track pedestrians: object detection, feature extraction, and feature matching, as shown in the upper part of Figure 2. In each time step, we first use YOLO to detect pedestrians and generate bounding boxes for them. These detected pedestrians form a set $P$. Then we use F-RVO to predict another set of bounding boxes around the pedestrians $p_i \in P$. Given the bounding boxes computed in two adjacent time steps, we use the DeepSort CNN [7] to extract binary feature vectors from the sub-images determined by the bounding boxes. Then we perform matching over these sparse features to find, in frame $t + 1$, the best matches for the pedestrians of frame $t$, and we assign IDs to the pedestrians accordingly. In particular, the sparse features are matched in two steps. First, we find the detected pedestrian most similar to a predicted pedestrian using the cosine metric, i.e.,

$$\min_{h_j \in H_i} d\big(f(p_i), f(h_j)\big),$$

where $d(\cdot, \cdot)$ is the cosine metric, $f(\cdot)$ is the feature extraction function, $p_i$ is one pedestrian in the set $P$ of all pedestrians in a frame, and $h_j$ is one detected pedestrian in the set $H_i$, which is the set of detected neighbors around the pedestrian $p_i$. In the second step, we maximize the IoU overlap, i.e., the overlapped area between the predicted boxes and the original YOLO-detected boxes,

$$\mathrm{IoU}(p_i, h_j) = \frac{|B_{p_i} \cap B_{h_j}|}{|B_{p_i} \cup B_{h_j}|},$$

where $B_{p_i}$ and $B_{h_j}$ are the bounding boxes around $p_i$ and $h_j$, respectively. Matching a set of detected pedestrians to a set of predicted pedestrians with maximum overlap eventually becomes a max-weight matching problem over the matrix indexed by $(i, j)$, which can be solved efficiently using the Hungarian algorithm [38].
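The two-step matching can be sketched as follows. This is a minimal illustration under stated assumptions: boxes are `(x1, y1, x2, y2)` tuples, features are plain lists of floats, and the appearance step is applied as a gate with a hypothetical threshold `max_cos_dist`. For brevity, the assignment is found by brute-force enumeration, which yields the same optimum as the Hungarian algorithm on small inputs but does not scale; it is a stand-in, not the method used in the system.

```python
import math
from itertools import permutations


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot / (na * nb)


def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    iw = min(box_a[2], box_b[2]) - max(box_a[0], box_b[0])
    ih = min(box_a[3], box_b[3]) - max(box_a[1], box_b[1])
    inter = max(0.0, iw) * max(0.0, ih)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0.0 else 0.0


def match(predicted, detected, max_cos_dist=0.5):
    """Match predicted to detected pedestrians, each given as (box, feature).

    Pairs whose appearance distance exceeds max_cos_dist get weight 0;
    the remaining pairs are weighted by IoU and the total is maximized.
    Returns {predicted_index: detected_index}.
    """
    score = [[iou(pb, db) if cosine_distance(pf, df) <= max_cos_dist else 0.0
              for (db, df) in detected]
             for (pb, pf) in predicted]
    n_p = len(predicted)
    # None entries let a predicted pedestrian stay unmatched.
    cols = list(range(len(detected))) + [None] * n_p
    best_total, best_perm = -1.0, ()
    for perm in permutations(cols, n_p):
        total = sum(score[i][j] for i, j in enumerate(perm) if j is not None)
        if total > best_total:
            best_total, best_perm = total, perm
    return {i: j for i, j in enumerate(best_perm)
            if j is not None and score[i][j] > 0.0}
```

In a real-time setting, the enumeration would be replaced by a polynomial-time assignment solver.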
From the computed bounding boxes, we can roughly estimate the range and bearing between the robot and the pedestrians. To estimate the crowds more accurately, we reproject the bounding boxes into the LiDAR coordinate frame to query the depth of each pedestrian. The RANSAC algorithm [39] is applied to filter out possible outlier points. If there is a large inconsistency between the LiDAR and visual estimates due to occlusion between pedestrians, the visual estimates are trusted. Finally, we obtain the social distance between pedestrians.
In addition, we will describe the routing algorithm that enables the robot to effectively select a crowded region to approach in order to accomplish the patrol.

A. Mapping and Localization
Since LiDAR-based SLAM approaches have been well developed in recent years, we do not develop a new SLAM approach in this paper. To achieve the best performance for this legged robot, we choose the LeGO-LOAM algorithm, which is a lightweight system optimized for ground platforms [40]. The generated map is shown in Figure 3a.
After the robot obtains the 3D point cloud map of the scenario, the Normal Distributions Transform (NDT) scan matching algorithm is used for localization [41]; it has been demonstrated in [42] to provide more reliable results than other matching methods such as Iterative Closest Point [43].
Although we can compute the 3D point cloud map and the robot's location, it is not easy for the robot to determine the traversable region in the 2D plane. Therefore, we transform the 3D point cloud into a 2D laser scan by taking the closest point within a certain height range as a 2D laser point. Note that during navigation the robot may encounter uneven terrain such as stairs or steps. Thus, the transform ignores the points on the ground plane by filtering out points lower than 30 cm. After the transform, we obtain a 2D occupancy map for the subsequent navigation algorithm, as shown in Figure 3b.
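The point-cloud-to-scan transform described above can be sketched as follows, assuming the points are given as `(x, y, z)` tuples in a robot-centered frame whose `z` axis measures height above the ground. Only the 30 cm lower bound comes from the text; the upper bound `max_height`, the number of angular bins, and `max_range` are hypothetical parameters chosen for illustration.

```python
import math


def cloud_to_scan(points, min_height=0.30, max_height=1.5,
                  num_bins=360, max_range=30.0):
    """Project a 3D point cloud onto a 2D scan.

    Each angular bin keeps the range of the closest point whose height lies
    in [min_height, max_height]; bins without any such point stay at inf.
    """
    scan = [math.inf] * num_bins
    bin_width = 2.0 * math.pi / num_bins
    for x, y, z in points:
        if z < min_height or z > max_height:
            continue  # drop ground, low steps, and overhead structure
        r = math.hypot(x, y)
        if r == 0.0 or r > max_range:
            continue
        angle = math.atan2(y, x) % (2.0 * math.pi)
        idx = int(angle / bin_width) % num_bins
        if r < scan[idx]:
            scan[idx] = r
    return scan
```

A low step at 10 cm height is ignored by this filter, so it does not appear as an obstacle in the resulting scan, which matches the intent of letting the legged robot walk over it.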

B. Patrol and Routing
Based on the generated map and the current position, we propose a patrol planning algorithm that navigates the robot around the mapped area. As shown in Figure 3, at different crossings and corners, the robot chooses the navigation direction that is optimal for social distancing. In particular, the robot prefers the direction in which a crowd is most likely to appear.
When the robot detects gathered crowds, it suspends the patrol algorithm and switches to the routing algorithm to find an optimal way to approach the crowds. Considering the time constraints and the sizes of the crowds, we propose a crowd-aware routing algorithm based on depth-first search to find a sequence of intermediate waypoints for the robot to follow.
We formulate the routing problem as follows. Assume that there are $N$ groups of people within the robot's perception range. Each crowd is denoted as a node $n_i$ with its specific time-window constraint $t_i$ and its location relative to the robot. Each crowd is assigned a weight $w_i$ according to the number of persons in the group. The routing algorithm aims to find an optimal path for the robot to approach as many crowds as possible with the least energy consumption. The optimization objective is defined over the current trajectory $p_c$, which contains a set of points and edges denoting the positions of the crowds and the paths connecting them. Each edge $p_j \in p_c$ between two positions has an energy cost $e_j$, and the number of crowds explored in $p_c$ is denoted as $n_c$. Given the directions and positions of the crowds produced by the routing algorithm, we use the SBPL lattice planner [44] to generate a smooth patrol route passing through these waypoints.
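A depth-first search over crowd visiting orders can be sketched as below. The exact objective is not reproduced here, so the score used in this sketch is an assumption: the sum of the weights of the visited crowds minus `energy_weight` times the travelled distance, with the time window treated as an arrival deadline and the robot moving at constant `speed`. The function name `route` and all parameters are hypothetical.

```python
import math


def route(start, crowds, speed=1.0, energy_weight=0.1):
    """Depth-first search for a visiting order of crowds.

    crowds is a list of (position, weight, deadline) tuples. A crowd can be
    visited only if the robot arrives no later than its deadline.
    Assumed score: sum of weights minus energy_weight * travelled distance.
    Returns (best_order, best_score).
    """
    best = {"order": [], "score": 0.0}

    def dfs(pos, time, visited, order, score):
        if score > best["score"]:
            best["order"], best["score"] = list(order), score
        for i, (c_pos, weight, deadline) in enumerate(crowds):
            if i in visited:
                continue
            dist = math.dist(pos, c_pos)
            arrival = time + dist / speed
            if arrival > deadline:
                continue  # time window violated: prune this branch
            visited.add(i)
            order.append(i)
            dfs(c_pos, arrival, visited, order,
                score + weight - energy_weight * dist)
            order.pop()
            visited.remove(i)

    dfs(start, 0.0, set(), [], 0.0)
    return best["order"], best["score"]
```

The exhaustive search is exponential in the number of crowds, which is acceptable only because few groups are within perception range at any time; the returned order would then serve as the waypoint sequence handed to the path planner.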

C. Learning-based Collision Avoidance
During patrol, the robot not only encounters static obstacles but also interacts with moving pedestrians. For this case, we deploy the learning-based collision avoidance approach CrowdMove [3] for robotic navigation in crowds.
The main training framework follows our previous work [30]; it takes a 2D laser scan as input and outputs a velocity command. Multiple training scenarios are designed with multiple robots in the Stage simulator, as shown in Figure 4. We adopt the centralized-learning, decentralized-execution paradigm, in which all robots share the same navigation policy during training. We thereby obtain a multi-robot collision avoidance policy with zero communication. Furthermore, we validate that the trained policy can be transferred from simulation to the real world without any re-tuning, and that it is also suitable for single-robot navigation in crowds [31], [32]. To make the training framework work for our hardware platform, we take the transformed laser scan, which represents the local traversable area, as the input.

VI. VOICE INTERACTION
In our surveillance scenario, we use verbal cues to send suggestions from the robot to humans. As mentioned before, the user's gender and the robot's gender may influence the user's acceptance of and trust in the robot. Thus, to reach an effective surveillance result, we gave our robot four types of voices and designed a user study to select the best one. In this section, we introduce the study, which investigates (1) the effects of the user's gender on responses to the autonomous robot and (2) the user's attitude, acceptance, trust, and perceived trust toward robots with different voices.
A. Method
1) Gender of the robot: We manipulated the gender of the robot through non-verbal cues by changing the vocal characteristics. Because we aim to find the robot voice with the best performance, the voice selection is not strictly limited to robot gender effects. In this experiment, we prepared four types of voices: three gendered voices and a child voice. The gendered voices comprise a computer-generated neutral voice and a male and a female voice recorded by adult humans; the child voice was recorded by a girl.
2) Procedures: Among the various issues in human-robot interaction, trust has been nominated as one of the primary factors to be considered. In this particular task, trust is reflected in how closely a human would follow the advice sent by the surveillance robot. This factor crucially influences the performance of the robot. To better measure the users' experience of the robot, we use four dependent measures, which include the users' attitude towards the robot, perceived trust, and acceptance of the robot. The details of the measures are shown in Table I.
As part of a larger study investigating users' perceptions of an autonomous surveillance robot, the participants filled out a survey measuring the factors shown in Table I. Each measure was assessed on a 5-point Likert scale (1 = strongly disagree, 5 = strongly agree).
This experiment used a between-subjects design to minimize learning and transfer across conditions. Each participant in this study viewed two videos and then responded to survey items related to the videos. Both videos demonstrate the same scenario with the same robot voice. One video was recorded from a third-person perspective, in which the robot walks towards the crowd while asking them to keep the social distance. The other video was recorded from a first-person perspective, in which the robot walks towards the camera while asking the human to keep the social distance. In both scenarios, the robot starts to play the voice at about 5 meters away from the crowd. Screenshots of the two videos are shown in Figure 5. In this way, a total of 8 videos were recorded: 2 perspectives times 4 types of robot voices. For each scenario, we add the description "The robot shown in the videos is a surveillance robot working on keeping a low density of humans during COVID-19. When the robot finds a crowd, he/she/it will walk toward the crowd while asking them to keep a proper social distance. Please watch these two videos, and imagine you were one of the humans in the video, then answer the following questions."
3) Participants: A total of 218 adults (119 males; 99 females) aged 20-55 years (M = 29.49, SD = 12.02) participated in the between-subjects experiment. Participants were mostly students and staff from the Southern University of Science and Technology. The participants were recruited through posters and links shared in a social media app. Each participant was required to read and sign a consent form before starting the questionnaire.

B. Data analyses
A manipulation check was performed to ensure that the robot voices conveyed gender and age successfully. The perceived gender was measured with a sliding bar ranging from 0 (most feminine) to 100 (most masculine). The perceived age was measured with a sliding bar ranging from 5 to 70. A one-way ANOVA showed that participants perceived the male voice as more masculine (M = 76.14), the female voice as more feminine (M = 49.25), and the neutral voice in between (M = 64.28), with F = 9.902 and p < 0.0001. The participants also perceived the robot with the child's voice (M = 20.93) as significantly younger than the others (M = 28.02, p = 0.008).
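For readers unfamiliar with the test, the F statistic of a one-way ANOVA is the ratio of the between-group to the within-group mean square. The sketch below computes it for lists of ratings; the function name `one_way_anova_f` is hypothetical, any data passed to it here is illustrative rather than the study's data, and the p value (which requires the F distribution) is left out.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups is a list of lists, one list of observations per condition.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within
```

A large F value indicates that the condition means differ more than would be expected from the spread within each condition.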
We calculated Cronbach's alpha values to assess the internal consistency of each psychometric measure. The reported alpha values were between 0.8 and 0.9, indicating that the items have relatively high internal consistency. To assess the significance of the effects of user gender and robot voice type, a one-way ANOVA was conducted. The robot voice and the user gender were treated as independent variables. For factors that reached significant differences across conditions, we used the least significant difference (LSD) test to make pairwise comparisons.
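Cronbach's alpha for a measure with $k$ items is $\alpha = \frac{k}{k-1}\big(1 - \sum_i \sigma_i^2 / \sigma_T^2\big)$, where $\sigma_i^2$ is the variance of item $i$ and $\sigma_T^2$ is the variance of the participants' total scores. A minimal sketch of this computation follows; the function name `cronbach_alpha` and any responses passed to it are hypothetical and do not come from the study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for one measure.

    responses is a list with one entry per participant; each entry is a
    list with that participant's score on every item of the measure.
    """
    k = len(responses[0])
    item_vars = [variance([r[i] for r in responses]) for i in range(k)]
    total_var = variance([sum(r) for r in responses])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)
```

When all items rise and fall together across participants, the variance of the totals dominates the sum of the item variances and alpha approaches 1.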

VII. EXPERIMENTS
In this section, we first validate the effectiveness of each proposed module individually. Then, we integrate all the modules to realize the autonomous surveillance robot. Finally, to further investigate the performance of the surveillance robot in promoting social distancing, we conduct real-world experiments.

A. Crowd Gathering Detection
We first record vision and LiDAR data to better analyze and tune the social distancing detection system. The recorded dataset includes a wide variety of pedestrian group behaviors, such as walking, standing, gathering, and scattering.
Crowd gathering is not easy to quantify, especially because occlusion between pedestrians makes it difficult or even impossible for the robot to accurately acquire the location of each pedestrian. To detect each possible crowd gathering, we establish a graph-based pedestrian network called the social graph, with one example illustrated in Figure 6a. In the social graph, each node represents a pedestrian's position. The green, yellow, and red edges represent safe, warning, and dangerous social distances, respectively. We group the nodes connected by red and yellow edges into a subgraph called the crowd graph, which is considered a possible crowd gathering. In this way, we reduce the dependence of crowd gathering detection on the accuracy of the estimated pedestrian positions.
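The social graph and its crowd subgraphs can be sketched as follows, assuming 2D pedestrian positions in meters. The distance thresholds `danger` and `warning` are hypothetical placeholders, since the text does not state the values used; crowd graphs are obtained as connected components over the red and yellow edges.

```python
import math


def crowd_graphs(positions, danger=1.0, warning=2.0):
    """Build the social graph and extract crowd graphs.

    Returns (edges, crowds): edges maps each pair (i, j), i < j, to
    "red", "yellow", or "green"; crowds lists the connected components
    (sorted node indices) formed by red and yellow edges only.
    """
    n = len(positions)
    edges = {}
    adjacent = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(positions[i], positions[j])
            if d < danger:
                label = "red"
            elif d < warning:
                label = "yellow"
            else:
                label = "green"
            edges[(i, j)] = label
            if label != "green":
                adjacent[i].add(j)
                adjacent[j].add(i)
    seen, crowds = set(), []
    for start in range(n):
        if start in seen or not adjacent[start]:
            continue  # already grouped, or an isolated pedestrian
        seen.add(start)
        stack, component = [start], []
        while stack:
            u = stack.pop()
            component.append(u)
            for v in adjacent[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        crowds.append(sorted(component))
    return edges, crowds
```

Because membership in a crowd depends only on whether a chain of close pairs exists, a moderate error in one estimated position rarely changes the detected grouping.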

B. Navigation in Urban
Urban navigation mainly encounters two challenges: unstructured environments and dynamic obstacles. Thanks to the superior mobility of the quadruped platform, our robot can navigate over uneven terrain such as steps without extra visual estimation effort, as shown in Figure 7, and can thus handle unstructured environments easily.
To validate the dynamic collision avoidance performance among pedestrians, we create a crowded and narrow indoor scenario in the lab, as shown in Figure 8. In this experiment, the robot is required to track a specified target (a bone in this work). We install several ultra-wideband (UWB) tags in the lab for indoor localization. In experiments lasting about 30 minutes each, the robot dog equipped with a 3D LiDAR achieved nearly zero collisions in this scenario. This experiment indicates that our learning-based collision avoidance policy can be successfully transferred and deployed to the real-world robotic dog.

C. Voice Preference
Table II shows the means and standard deviations of all measures for the different robot voice types and user genders. The score of each factor was calculated by averaging all the related items. It can be seen that the male voice type received the lowest average score on all the measures. Also, female users gave the highest ratings on all the measures for the neutral voice type. Table III reports the F and p values. The results show that for male users there is no significant difference among the robot voice types, while for female users the robot voice type significantly influences the user's acceptance (p = 0.021) and perceived trust (p = 0.092). To find which conditions differ for female users, we used the least significant difference (LSD) test to make pairwise comparisons between the robot voice types. Surprisingly, for female users, the acceptance, perceived trust, and attitude toward the robot in the neutral voice condition are higher than in the other voice conditions, especially the male voice condition (p = 0.003, 0.013, and 0.061, respectively).

We also compare the effects of the users' gender in different conditions. Female users report higher perceived ability than male users (p = 0.024), especially in the male voice condition (p = 0.027). In the neutral voice condition, female users' acceptance of and attitude toward the robot are higher than male users' (p = 0.028 and p = 0.057, respectively).
There is no significant difference in male users' ratings across the robot voice types. However, it is quite surprising that female users rated the neutral robot voice very highly. In addition, although surveillance is stereotypically regarded as a masculine job, both male and female users rated all four factors lowest in the male voice condition. Therefore, we do not suggest using the male robot voice. Considering the means of the four factors across all robot voice conditions, we selected the neutral voice for our surveillance robot.

D. Real-world experiment on Promoting Social Distancing
Finally, we integrate all the above modules and investigate whether the robot can navigate in complex urban environments and promote social distancing effectively without terrifying the general public. The real-world experiment was conducted in two public areas: a university campus and a park. Figure 9 shows some examples from the real-world experiment.
The results show that our robot successfully fulfills the task of promoting social distancing. Among the people who interacted with our robot, about half followed the robot's suggestions. Of the others, most glanced at the robot and then just walked away, and some stopped and looked at the robot. It is worth noting that at the time of our experiment there were no existing COVID-19 patients in the test city, which tends to reduce pedestrians' compliance with the verbal social distancing commands. During the experiment, we selected some people randomly and asked them about their attitude towards the robot and why they followed or did not follow the robot's advice. Some people reported that they felt it was a great idea to use the surveillance robot and that they thought the robot's advice was reasonable. Besides, many people reported that the robot looked like it came from the world of science fiction, so they were very curious about it. However, some people felt the robot was not friendly enough, so they just wanted to walk away. Most of the people who ignored the robot's advice said that the pandemic was not severe, so they felt it was unnecessary to keep the distance.

VIII. CONCLUSION
In the context of the COVID-19 pandemic, we develop an autonomous surveillance robot system to promote social distancing. The robot system is mainly composed of social distance detection, urban navigation, and intelligent voice interaction. The legged robot adapts well to different terrain and can therefore work well in everyday human environments. The real-world experiment also demonstrates that our robot successfully helps people maintain social distance. In the end, we successfully deploy the system in a real environment to help prevent the spread of COVID-19.

Fig. 1: Overview of our hardware and software system.
Fig. 3: (a) We used LeGO-LOAM for mapping. The blue arrows in (b) show the potential directions at each crossing and corner that the robot can take. Based on the map and the current position, the patrol algorithm chooses for the robot the navigation direction with the maximum probability of crowd appearance.

Fig. 4: Multi-robot multi-scenario training environments in the Stage simulator.

Fig. 5: Screenshots of the two videos in the questionnaire. Top: third-person perspective; bottom: first-person perspective.

Fig. 6: Illustration of crowd gathering detection within the right camera's field of view (FOV). Although the estimated positions of the pedestrians are not very accurate, we can still detect possible crowd gatherings by establishing the crowd subgraph.

Fig. 9: Examples from the real-world experiment. The top and bottom images show two different scenarios. Left: the robot detected and approached the crowd, then persuaded them to keep social distance. Right: the crowd density decreased.

TABLE I: Dependent measures in the user questionnaire
• ... the surveillance robot, if he/she gave me advice.
• I trust that the surveillance robot can keep me away from health risks.
• I would follow the advice that the surveillance robot gives me.
• If given a chance, I'll use this robot in a college campus in the near future.
• If given a chance, I'll use this robot in a park in the near future.
• If given a chance, I'll use this robot in a shopping mall in the near future.

TABLE II: Means and standard deviations of all the measures

TABLE III: F value and significance of robot voice effect and user gender effect