An Open-Source Low-Cost Mobile Robot System with an RGB-D Camera and Efficient Real-Time Navigation Algorithm

Currently, mobile robots are developing rapidly and are finding numerous applications in the industry. However, several problems remain related to their practical use, such as the need for expensive hardware and high power consumption levels. In this study, we build a low-cost indoor mobile robot platform that does not include a LiDAR or a GPU. Then, we design an autonomous navigation architecture that guarantees real-time performance on our platform with an RGB-D camera and a low-end off-the-shelf single board computer. The overall system includes SLAM, global path planning, ground segmentation, and motion planning. The proposed ground segmentation approach extracts a traversability map from raw depth images for the safe driving of low-body mobile robots. We apply both rule-based and learning-based navigation policies using the traversability map. Running sensor data processing and other autonomous driving components simultaneously, our navigation policies perform rapidly at a refresh rate of 18 Hz for control command, whereas other systems have slower refresh rates. Our methods show better performances than current state-of-the-art navigation approaches within limited computation resources as shown in 3D simulation tests. In addition, we demonstrate the applicability of our mobile robot system through successful autonomous driving in an indoor environment.


I. INTRODUCTION
Recently, mobile robots have been navigating cluttered environments such as buildings and roads. Implementation of these devices into various industrial fields has been accelerating [1]. Depending on the design and purpose, they are utilized in various areas, such as for delivery, guidance, searches, and inspections. Therefore, robot navigation in crowded environments has been studied as a key topic in many research fields. Furthermore, the demand for mobile robots is increasing not only in industrial fields but also for individual uses. Examples include social robots, home service robots, and cleaning robots.
However, expensive hardware and high power consumption are hindering the practical application of mobile robots [2]. For safe driving, the ability to recognize traversal areas and to detect obstacles is critical in an advanced motion * These authors contributed equally 1   planning strategy. LiDARs have been used as a dominant sensor to ensure accurate distance measurements and have been combined with cameras for deep learning recognition. However LiDARs are far more expensive than other sensors and thus increase the price of the robot. Simultaneous localization and mapping (SLAM) [3], mainly used for indoor positioning, requires a high-performance computer. Meanwhile, with the rapid development of deep reinforcement learning (DRL), numerous studies have focused on the use of neural networks for robot navigation [4]- [6]. Although the inference of a DRL model can be done in a short time, environment recognition requires heavy iterations for sensor data processing. This should be supported by a high performance computer, and the process ends up draining the battery more rapidly. Furthermore, software components for environmental interactions such as image-based object detection should be followed by navigation components to serve a socially interactive robot. These image-based recognition algorithms rely highly on the graphics processing unit (GPU). Therefore, navigation algorithms with low GPU utilization are greatly welcomed in mobile robot systems, even if they have an on-board GPU.
We build an open-source low-cost autonomous mobile robot system without a need for a high-performance GPU or LiDARs that successfully overcomes the aforementioned problems. We also propose a real-time navigation approach designed for low-cost indoor mobile robots. Only an RGB-D camera is used for environment recognition, and real-time performance is achieved on a low-end single-board computer (SBC) without external computing aids. The robot can build a point cloud map and perform real-time positioning by means of lightweight RGB-D SLAM. We use a modified A* algorithm that generates a stable path while maintaining sufficient distances from adjacent obstacles. In addition, we propose a ground segmentation approach that provides a compact traversability map in real time using an RGB-D camera. This approach enables the robot to navigate among pedestrians safely. We demonstrate the feasibility of our ground segmentation method using both rule-based and learning-based navigation policies with the traversability information. All of the software for fully autonomous driving is integrated on our mobile robot platform, DPoom (see Fig. 1). For human-computer interactions, friendly expressions are displayed on the front screen. It also has an appropriate exterior design for educational and socially interactive purposes. We deployed DPoom as a social robot in a crowded residential environment. All of our materials, including the hardware and software, are released under an open-source license. 1

II. RELATED WORK
System design of a robot is important, as the design determines its purpose, function and price. Most traveling mobile robots are designed for mission automation. For full automation, several essential functions should be realized synchronously with cross-interaction capabilities. In modern autonomous driving, the task is developed with separately divided modules that are integrated in a pipeline. Localization is the most basic module for all control tasks with closedloop feedback. SLAM is generally used for indoor robot localization. In order to drive to a certain location in a wide area, the robot should generate a trajectory through global path planning via, for instance, the A* algorithm [7]. When obstacles not on the prior map or moving objects appear on the planned trajectory, the robot avoids them by motion planning.
Collision avoidance and safe navigation are particularly important for stable robot operation [8]. Reciprocal velocity obstacles [9] and optimal reciprocal collision avoidance (ORCA) [10] have been commonly used for dynamic robot navigation. However, given that they are based on handcrafted functions, they do not work well in more complex environments. Recent works applied DRL to navigation in crowds [4], [5], [11]. These approaches assume that the robot is aware of objects in a 360 • field of view (FOV) and that it accurately measures the positions of objects in real time with LiDAR. Unlike those assumptions, point cloud processing is computationally expensive and lowers the decision frequency of navigation algorithm when running on an onboard computer. A slow decision often causes frozen robot situations or even collisions. Therefore, it is necessary to choose a navigation policy that guarantees realtime execution according to the robot's computational performance. Furthermore, if procuring a 360 • FOV considering the assumptions above, the price of the required sensor increases greatly. This also places negative constraints on the mechanical design and on the design of the robot body's exterior components. Meanwhile, successful navigation is coupled with the ability to estimate traversable areas, not merely depending on the navigation policy. Recent ground segmentation methods based on a convolutional neural network (CNN) incur a high computational cost [12], [13].
In this paper, we use an RGB-D camera, which is generally much less expensive than a 3D LiDAR. The depth data provide robustness for localization and the direct distances to obstacles without estimating them with heavy algorithms. The robot body design was convenient because the sensor does not need to be mounted on top and an empty layer is not required in the middle of the body for laser range scanning. Real-time navigation is possible using our RGB-D ground segmentation approach.

A. DPoom indoor mobile robot platform
DPoom (see Fig. 1) is an open-source indoor autonomous mobile robot designed to interact with people while traveling around indoor environments. It was developed while focusing on three factors: cost performance, human-robot interaction, and ease of use.
DPoom is built for fully autonomous driving using only a low-end SBC (LattePanda Alpha 864, LattePanda) and an RGB-D camera (Realsense D435i, Intel). The low-end SBC consists of an Intel dual-core m3-8100y processor, 8 GB RAM and Intel HD 615 on-board graphics. The controller board (OpenCR, ROBOTIS) for our system is is embedded with a robot operating system (ROS) [14] and has a nineaxis IMU Sensor MPU9250. The robot uses the front RGB-D camera D435i with a 1280×720 resolution to recognize the environment. The camera has 85.2 • × 58 • FOV. Two actuators (Dynamixel XM430-W210-T, ROBOTIS) are used for wheel driving, and two ball casters at the bottom of the rear structure support differential driving. The software interface is built on Ubuntu-based ROS Kinetic. The hardware is 33.0×33.5×35.0 mm (width×depth×height), weighs approximately 4 kg, and can achieve speeds up to 0.26 m/s.

B. 3D SLAM
Mapping should be preceded before deploying a robot to an unknown area. Mapping enables the robot to plan its trajectory to the goal and to perform localization to estimate its pose. 3D mapping using RGB-D data is known to be capable of higher accuracy than monocular vision-based mapping as it provides additional robust features for scan matching. We used RTAB-Map [15] to perform mapping and localization simultaneously with RGB-D data and wheel odometry. RGB image frames and depth frames were obtained from the Intel Realsense D435i with Intel Realsense SDK 2.0. Wheel-based odometry was calculated in the OpenCR controller using data from the built-in IMU sensor and motor encoders. Environments are represented as a point cloud map or a grayscale occupancy grid map after SLAM. Fig. 2a presents the result when mapping around a building hallway. Point cloud maps are saved in local memory and are used for scan matching during localization. The projected occupancy grid map can be used as prior knowledge of global path planning. The localization results are synchronously published to the ROS middleware in our system.

C. Global path planning
Under a 2D environment, the deterministic planning can generate more accurate paths in less time than a probabilistic path planner [16]. In this study, we use the A* algorithm [7], a deterministic path planning algorithm, as the global path planner. The A* algorithm searches for the path by adding information about the goal node to the Dijkstra algorithm [17], which is the most basic path planner.
By introducing the distance cost d(n) to the cost calculation of the A* algorithm as shown in Equation 1, it is possible to generate a path with more stability, maintaining a proper distance from obstacles [18]. We used the fast marching method (FMM) to calculate the distance cost, as this approach solves the boundary value problem of the Eikonal equation [19]. The distance cost is designed to have a larger value as the node becomes closer to nearby obstacles. Hence, the modified A* algorithm generates a smoothed path with a tendency to keep a distance from nearby obstacles.

D. Fast raw depth image ground segmentation
When a low floor mobile robot follows a globally planned path, it is necessary not only to detect obstacles for collision avoidance but also to recognize whether the floor surface is traversable. Along with the rapid growth in the field of deep learning, research on ground traversability estimation with RGB-D cameras has been also conducted [20]. Yang et al. demonstrated the robustness of CNN-based models [12]. Paigwar et al. presented a real-time preprocessing method and a CNN model for application to robot navigation [13]. However, these methods require GPUs for real-time operation. In addition, deep-learning-based approaches can take into account floor thresholds, but they are unable to adjust the height of ground threshold on a deployed model depending on the situation. Floor thresholds are disastrous to low floor robots, as they can cause a malfunction or cause the robot to overturn. Therefore, there is a need for an analytic algorithm capable of adjusting the threshold height according to the robot platform and driving situation.
Mathematically derived estimation algorithms have been developed at the same time. For a mobile robot system designed at a low price point, the algorithm must be computationally efficient and must support real-time implementation. Holz et al. showed that real-time plane segmentation is possible on a CPU by clustering and merging points in normal space [21]. However, this requires an additional analysis to determine whether each plane area is actually drivable when considering physical constraints such as the vehicle's width and rotation angle.
Here, we propose a concept known as milestone over rendered paths (MORP), a real-time ground segmentation algorithm that can robustly recognize the forward traversal area with an RGB-D camera and that is designed to avoid obstacles effectively. With this algorithm, motion planning can be solved with a very small amount of computation by separating the area in front of the robot into virtual lanes and using the information of the closest non-traversable point recognized in each lane. Ground segmentation is performed for the center path of each lane area, and the first encountered non-traversable point is saved as a dead-end. This procedure is similar to 2D line extraction for fast segmentation of 3D point clouds [22]. A raw depth image is used for ground segmentation, which is converted from a depth point cloud into a 2D gray-scale image. Holz et al. showed that considering the pixel neighborhoods instead of spatial neighborhoods leads to a significant increase in the point cloud processing speed at the cost a small degree of accuracy [23]. It is possible to represent a large lane area by performing single column segmentation in a raw depth image. The score of each pixel is a weighted sum of the gradient from both the assigned start pixel depth and the adjacent pixel depth relative to the current pixel depth. Let (w 1 , w 2 ) denote the weighting factors of the gradients. The first pixel exceeding the score limit will be the dead-end in that area. Let (d (i,j),r , d (i,j),z ) denote the actual position from the sensor of pixel P (i,j) in the N × M size depth image, as shown in Fig. 3. Let n denote the number of virtual lanes to scan. The dead-ends D(n) consist of the set of the largest indices to be segmented as the ground in each column j: where s j is the start index of the ground on column j, and C is the empirically determined threshold to segment as the ground. When this process is done sparsely over the entire image, milestones are generated for a traversability map containing robust but very compact data. It is easy to implement parallel computing because the process is explicit and the execution time is virtually consistent on each lane. The implementation results are visualized in Fig. 3. More dense segmentation will be done on the front area of the robot as n becomes larger. However, there is a trade-off relationship between the density and the computation cost. An appropriate n should be selected in consideration of the width and the driving performance of the robot.

E. Robot navigation based on traversable areas
Given that each dead-end in D(n) contains direct information about non-traversable areas, collision avoidance is possible just with simple rule-based decisions. Avoidance in this case works in the same manner as a vehicle lane change on a road. On the other hand, this type of data pre-processing to obtain compact and representative features D(n) can be used as the observation space (o) for reinforcement learning. One powerful and sample-efficient approach of reinforcement learning is imitation learning. Behavior cloning (BC) in particular has been successfully used in many robotics applications given its simplicity and efficiency [24]. In this work, we also implement a neural network policy based on BC to verify the feasibility of our lightweight ground segmentation method aside from rule-based navigation policy. Both policies are evaluated in Section IV.
We used a feedforward fully-connected neural network for imitation learning. Here, we denote the robot position relative to the goal point as x and y, and denote the robot direction as θ. The robot state is defined as s = [x, y, θ]. The action command is defined as a = [v x , ω z ], representing the longitudinal velocity and angular velocity of the robot, respectively. Then, we can consider a policyπ that takes x t = [s t , o t , a t ] at time t as input and results in the next action a t+1 as output. If the policy is given only a single time step information, capturing high-level intentions from the human demonstrations may be ambiguous. Therefore, we provide the state, observation and action history here as the input of the policy. In this case, the observation history contains implicit information about the velocities of the moving obstacles. This approach has been introduced in relation to applications of helicopters, autonomous vehicles, and quadruped legged robots [26]- [28]. The length of the history is denoted as H. The overall structure of our neural network policy a t+1 =π(x t−H+1 , x t−H+2 , . . . x t ) is shown in Table I.
Finally, all modules are synchronously integrated into the autonomous driving system. In order to navigate to the goal in a complex environment, the ideal trajectory should be generated from global path planning as part of the proposed method. The orientation toward the way point should be updated while avoiding obstacles via localization. A realtime traversability analysis is required to avoid local obstacles successfully. The overall system architecture is shown in Fig. 4.

A. 3D SLAM
We tested our SLAM in a residential lobby on the DPoom platform. The 3D point cloud map was saved in local memory for localization. Fig. 5a depicts the obtained occupancy grid map. We added artificial grass and tables as confined areas on the map. Compared with Fig. 5b, the map shows that a loop closure was performed to correct distortion and elevation issues.

B. Global path planning
The modified A* algorithm uses a binarized 2D occupancy grid map. It is also used to obtain a distance cost map by FMM. The modified A* algorithm generates a path by calculating the fitness cost based on the binarized map and the distance cost map.
We compared the generated paths from the original A* algorithm and the modified method. These results are shown in Fig. 6. The original A* algorithm generates a path close to obstacles, whereas the modified method generates a smoothed path with keeping distances from the adjacent obstacles.  given trajectory is too long, it will cause overfitting. In contrast, if the history is too short, it becomes difficult for the model to find the optimal policy. We collected a human demonstration dataset in a simulated environment that included static and dynamic obstacles. We used Gazebo [29] simulation, and we implemented the hardware and driving characteristics of DPoom in the simulation. The RGB-D camera specifications are described on the Gazebo plugin for a realistic simulation. The demonstration data were collected in environments containing [0, 1, . . . 5] moving obstacles and [0, 3, 5] static obstacles. 2 Moving obstacles used the DPoom 3D model as well and were controlled by ORCA [10]. We trained the neural network policy using the dataset with different H values. We used the mean squared error (MSE) loss and Adam optimization fotraining. The results are shown in Fig. 7, demonstrating the fastest convergence speed and lowest test loss when H = 4 (in red).

C. Training neural network navigation policy
For further training, we used the DAgger [30] method, which is a basic on-policy approach of imitation learning [24]. During the training procedure, we leveraged the previously aggregated dataset as the initial dataset as a warm start. Moreover, we used the pre-trained policy network as the initial policy for bootstrapping rather than using a randomly initialized policy. This strategy can also be found in recent studies of imitation learning and its effectiveness has been proven [31], [32]. The best policy model during training is saved and used in the experiments. 2 The dataset is available in: https://github.com/ SeunghyunLim/Dpoom_gazebo D. Fast raw depth image ground segmentation and robot navigation In this section, we denote 'MORP-RB' as our rule-based navigation policy and denote 'MORP-IL' as our neural network policyπ trained in the imitation learning manner, coupled with ground segmentation.
1) Training existing policies: We implemented several existing state-of-the-art navigation methods in Gazebo for a comparison: ORCA [10], CADRL [4], and SARL with Local Map [6]. The obstacles were detected with the RGB-D camera and fed into the policies. The motion commands of the policy were published to the integrated system via ROS to actuate the motors. The DRL policies designed for navigation in dynamic environments, in this case CADRL and SARL, were trained using parameters suggested by the authors [4], [6]. The limited FOV and depth range of the RGB-D camera were applied to the observable area of the agents. In cases where no object was detected by the DRL agents, we fed a dummy pedestrian with a zero velocity and radius into the network, which did not affect the navigation [1]. ORCA served as the policies of the moving obstacles.
2) Gazebo simulation results: The point cloud of the RGB-D camera was downsampled by voxel grid filtering and detected obstacles were fed into the ORCA and DRL policies as observed inputs. We evaluated the runtime of each navigation method by measuring the one-step time to return a motion command. We specified the runtime as three parts: the communication delay on ROS, the depth data pre-processing time (abbreviated as "Depth."), and the decision delay of the navigation policy (abbreviated as "Nav."). It was tested on a laptop with an Intel Core i7-8565U CPU. The results are the average of more than 100 iterations. Measurements of our methods showed that it is up to 20 times faster than the other methods compared here (see Table II). Unlike existing methods that require CPU and memory-intensive point cloud preprocessing, our approaches use an exteroception optimized for indoor mobile robots, greatly reducing the processing time. The compact information after ground segmentation also reduces the complexity of motion planning. We compared the following metrics for a performance evaluation: the success rate, collision rate, and average time to reach the goal. Tests were conducted separately in two different randomized environments with and without static obstacles. Goal distances from our robot were sampled from [7,11] m. The first environment had only pedestrians and no static obstacles. Moving obstacles used DPoom 3D models of identical sizes. Their start and goal points were empirically determined to avoid collisions with each other. There were one to five moving obstacles that were visible to all agents, but they were not able to perceive our robot (see Fig. 8a).
In the second environment, a unit cube and cylinder-shaped obstacles each with a 0.5 m radius were added to the first environment, and the number of obstacles was varied from one to nine (see Fig. 9a). We modeled ten worlds for each case and ran the test three times per world. Accordingly, 60 tests were conducted for each navigation method in total. MORP-IL shows the highest success rate while retaining a short time to reach the goal in both environments (see Tables III). The collision rate of CADRL is lower than that by our method with static obstacles because it tends to take large detours, causing it to spend twice as much time compared to the others. Apart from CADRL, our methods also show the lowest collision rate because it guarantees real- time execution. The collision rate for ORCA should be zero in an ideal 2D simulation with holonomic constraints by its design, but collisions occurred due to slow decisions by the robot and were occasionally caused by pedestrians outside of the robot's FOV. A slow refresh rate also has a disadvantage when the robot has non-holonomic constraints because the robot is unable immediately to rotate or move backwards. ORCA assumes that all pedestrians are observing the robot and avoiding it actively regardless of their FOV, which is not practical in the real world. This assumption causes the ORCA agents to take less time to reach the goal in the first environment and causes collisions with pedestrians who cannot observe the robot. Moreover, SARL has the longest one-step runtime due to its complex model architecture (see Table II). Akin to ORCA, slow decisions of SARL resulted in a high collision rate.
3) Real-world experiments: Our navigation method was integrated into an autonomous driving system on our DPoom platform via ROS. We deployed the robot in the DGIST student dormitory lobby. For human interaction, tiny-YOLOv3 [33] was used for object detection. The robot was able to estimate its pose by localization and navigate to the desired locations in the wide, crowded environment without collisions. Face emotions were displayed on the front screen depending on the situation. Our robot was able to interact in a friendly manner with people as a social robot (see Fig. 10).

V. CONCLUSION
In this paper, we built an open-source low-cost mobile robot platform with a single RGB-D camera. In addition, we designed a software architecture of fully autonomous navigation system for a low-cost mobile robot without Li-DARs or high-end computers. For global path planning, we developed the modified A* algorithm, applying FMM to generate collision-free trajectories. For motion planning, we proposed a new RGB-D ground segmentation method that represents the traversability of the front area in the form of compact information, which is well-suited for mobile robots. This enables depth data processing in real time on a low-end SBC. We combined this idea with both rulebased and learning-based motion planners, validating that our methods can successfully navigate in crowded environments. Unlike current state-of-the-art DRL navigation approaches were slowing down while executed simultaneously with other autonomous driving functions, our approaches operated at 18Hz in real time. We also demonstrated that our methods had lower collision rates and higher success rates in a 3D simulation compared to the other methods. Finally, we deployed our autonomous driving system on our platform in a real-world residential lobby, proving the applicability of the proposed system. We tackled practical issues associated with current mobile robots and contributed to the universal use of this technology through the presentation of a price-efficient mobile robot.