Event Camera Based Real-Time Detection and Tracking of Indoor Ground Robots

This paper presents a real-time method to detect and track multiple mobile ground robots using event cameras. The method uses density-based spatial clustering of applications with noise (DBSCAN) to detect the robots and a single k-dimensional ($k - d$) tree to accurately keep track of them as they move in an indoor arena. Robust detections and tracks are maintained in the face of event camera noise and lack of events (due to robots moving slowly or stopping). An off-the-shelf RGB camera-based tracking system was used to provide ground truth. Experiments including up to 4 robots are performed to study the effect of i) varying DBSCAN parameters, ii) the event accumulation time, iii) the number of robots in the arena, iv) the speed of the robots, and v) variation in ambient light conditions on the detection and tracking performance. The experimental results showed 100% detection and tracking fidelity in the face of event camera noise and robots stopping for tests involving up to 3 robots (and upwards of 93% for 4 robots). When the lighting conditions were varied, a graceful degradation in detection and tracking fidelity was observed.


INTRODUCTION
The commercial availability of dynamic vision sensor (DVS) based cameras, also known as event cameras, has provided researchers and practitioners with an attractive modality for high-speed computer vision applications. By recording changes in light intensity asynchronously, event-based cameras offer several advantages over their frame-based counterparts: high-speed vision, low perception latency, and relatively low power requirements [1]. The earliest event-based systems were designed in 1986 at Caltech [2] and have recently been the focus of significant commercial development by companies such as Prophesee, iniVation, Samsung, and Insightness [3]- [6].
Cooperative systems based on multiple ground robots are widely used in indoor applications such as warehouse automation, surveillance and security, and payload transportation [7]- [9]. The task of detecting robots in indoor environments, estimating their pose with respect to the environment, and tracking their motion in real-time is crucial to developing autonomous systems [10]- [12]. Developments in this field have been made using a combination of traditional frame-based cameras and inertial measurements of the robot [10]- [13]. Tracking systems with the traditional modality of frame-based cameras suffer from frame rate limitations and have trouble with motion blur and dynamic range. Another tracking modality is infrared camera-based motion capture systems such as OptiTrack and Vicon [14], [15]. While these infrared systems provide fast and high-quality tracking data, they come at a significant price, require multiple infrared cameras, are marker-based, and need powerful computing infrastructure for data acquisition and processing. By contrast, an event camera negates the need to read an entire frame of image data since it only provides intensity information (positive/negative polarity) at a given location at any time, does not require markers on objects of interest, and can be operated at relatively lower computational complexity and power [16]. Stationary event cameras do not require background modeling as moving objects can be easily detected against a static background. This significantly improves data throughput and processing, making event cameras beneficial for robotics applications [16].
The authors are with the Robotics and Data Laboratory (RADLab) and Departments of Electrical and Computer Engineering and Computer Science, New Jersey Institute of Technology, Newark, NJ. E-mail: pva23@njit.edu
The benefits mentioned above form our primary motivation for developing a real-time detection and tracking method using an event-based camera for indoor robot operations. The event camera used in this study featured the Prophesee Gen3S VGA-CD dynamic vision sensor.
The method uses density-based spatial clustering of applications with noise (DBSCAN) to detect the robots and a single k-dimensional (k − d) tree to accurately keep track of them as they move in an indoor arena [17], [18]. DBSCAN is a powerful and popular clustering algorithm first proposed in 1996 and has found use in a plethora of data-driven applications [17]. The main idea is that the neighborhood of a point in a cluster should contain a minimum number of points within a given radius. A point is considered part of a cluster as long as it has at least a minimum number of points (minPts) in its neighborhood of radius eps. DBSCAN is an effective clustering algorithm, especially when the clusters are arbitrarily shaped and noisy [19]. In recent years, there have been debates about its effectiveness in cases with 3 or more dimensions, and competing methods have been proposed [20]. However, the original authors of DBSCAN have shown that with effective indexes and reasonably chosen parameter values, DBSCAN performs competitively in higher dimensions [21].
Fig. 1
Top: The Prophesee Gen3S VGA-CD event camera was mounted on the ceiling facing downward. A Logitech webcam was mounted next to the event camera to provide ground truth data. Bottom left: Multi-robot mission (top view) with 4 ground robots. The robots moved in a 183 cm × 183 cm arena on white-colored foam mats. Bottom right: Event data visualization for 4 ground robots and annotations representing the detection and tracking results.

The k − d tree is a multi-dimensional binary search tree used widely to store information that is retrieved by associative searches [18]. This data structure is useful in several applications involving multi-dimensional search keys (e.g., range searches and nearest neighbor searches). k − d trees for tracking applications have been explored in several studies [22]- [26]. While clustering approaches have been used in conjunction with tree-type data structures in applications such as 3D SLAM and mapping, to the best of our knowledge this paper presents the first implementation of DBSCAN and a k − d tree to detect and track multiple indoor mobile robots with event cameras [27]- [34].
An off-the-shelf RGB camera-based tracking system provides ground truth (with human corrections when necessary). The experiments featured two-treaded, differentially driven ground robots with an accelerometer, gyroscope, magnetometer, and encoder sensors onboard. The event camera used in this study was affixed to a stationary mount on the ceiling to provide a fixed frame of reference. When an event camera moves, the background suffers from clutter, making it difficult to distinguish the object of interest [35].
A limitation of an event camera is its inability to detect a stationary object as no new events are generated. Another challenge of working with event cameras is the amount of background noise they can generate. In a multi-robot system, both these issues can cause spurious detections and loss of real-time tracks [36]. The k − d tree-based tracking method presented here maintained robust tracks in the face of event camera noise and lack of events (due to robots moving slowly or stopping). This tracking method is named IDTrack.
This study's main contributions are as follows:
1) A density-based spatial clustering of applications with noise (DBSCAN) based real-time method to detect multiple ground robots operating in an indoor environment using a stationary event camera.
2) A single k-dimensional (k − d) tree-based robust tracking technique called IDTrack to ensure that robot IDs are not lost/mislabeled during their operations due to background noise or lack of robot motion.
3) Experiments including up to 4 robots to study the effect of i) varying DBSCAN parameters, ii) the event accumulation time, iii) the number of robots in the arena, and iv) the speed of the robots on the detection and tracking performance. The performance was evaluated using precision, recall, Mean Absolute Error, and Multi-Object Tracking Accuracy metrics [37].
4) Event-camera data, ground truth data, and key Python functionalities have been open-sourced for the benefit of the community [38].
The remainder of this paper is organized as follows. Section 2 discusses existing literature on the use of event cameras for robotic systems. Section 3 describes the hardware and software architecture used in the experimentation. Section 4 provides a detailed discussion of the detection and tracking method developed in this study. Section 5 covers the experimental results in depth. Section 6 concludes the paper and provides some future directions.

RELATED WORK
The use of event-based cameras continues to grow across a plethora of applications. Event-based cameras have been used in object/pedestrian tracking, surveillance and monitoring, and object/gesture recognition [39]- [45]. They have also been shown to be beneficial for depth estimation, structured light 3D scanning, visual odometry, optical flow estimation, HDR image reconstruction, and Simultaneous Localization and Mapping (SLAM) [46]- [56]. For aerial robotics, safe navigation has been accomplished using the low perception latency afforded by event cameras [57], [58]. Readers interested in gaining an exhaustive understanding of event cameras are referred to the comprehensive survey paper [16] and its references.

Robotic systems with event cameras
Event camera-based algorithms for single or multiple object detection, pose estimation, and tracking (MOT) can be classified into three categories: feature-based, artificial neural network-based, and time surface-based [35]. Studies focusing on robot pose estimation using event cameras have been reported in the literature [59]- [61]. In [59], the authors validated a method to estimate the 6-DOF pose of an iniVation Dynamic and Active-pixel Vision Sensor (DAVIS346) event camera given a photometric 3D map of the scene and improved upon the results of a similar study [60]. In [61], a DAVIS camera-based approach was successfully evaluated for tracking the motion of a quadrotor performing high-speed maneuvers like flips with rotational speeds up to 1200°/s. In [62], the authors validated an event-based iterative closest point (EICP) algorithm to estimate pose and track microgripper position at a frequency of 4 kHz using a DVS camera.

Event camera-based robotic systems control
One of the earliest applications of event cameras for feedback control was the pencil balancing platform presented in [63]. Since then, several studies have shown how event cameras can be used for control of unmanned aerial systems, ground robots, robotic arms, and industrial robotic platforms [57], [58], [63]- [67]. In the field of unmanned aerial systems, event cameras are a viable solution for feedback control and optical flow [57], [58], [65], [68]. In [58], the authors showcased an event-based feedback control for a quadrotor. The approach was evaluated on a dual copter platform for one-dimensional attitude control. In [57], the authors proposed dynamic obstacle avoidance for quadrotors using an event camera. The approach was evaluated in outdoor experiments where the quadrotor was capable of avoiding obstacles moving at relative speeds up to 10 m/s. In [68], a DAVIS camera-based approach was successfully evaluated for tracking a quadrotor performing high-speed maneuvers like flips with rotational speeds up to 1200°/s. In [65], the authors compared nine optical flow algorithms that used events generated from a dynamic vision sensor. The study highlighted the problems faced by standard optical flow algorithms such as Lucas-Kanade and local plane fit due to the noise in the event data stream and motion discontinuities. Event-based cameras in industrial robotics have recently become an active field of research [66]. In [66], the authors compared the performance of a 2-axis servo-controlled robot based on the data acquisition and motion tracking from an event camera and a frame-based camera. The results showed that the robotic arm using the event-based camera could follow the object using image recognition while achieving up to 85% data reduction and providing position detection that was 99 ms faster on average than the frame-based camera.

Multi-object detection and tracking using event cameras
A relatively small yet growing body of work underscores the value of event-based cameras for multi-object detection and tracking [69]- [73].
In [69], the authors validated an approach for monitoring intruders using a DAVIS 346 DVS attached to a DJI Flamewheel F550 hexarotor. The approach included techniques to differentiate moving objects from the static objects in the moving background and had low computational cost. The scheme was implemented in the robot operating system (ROS) and validated using an unmanned aerial system for experiments performed in complex and unstructured scenarios during day and night. In [70], the authors presented an approach for long-term tracking of objects using event cameras, even if the detected object left the scene and reappeared later. This method used an event-based local sliding window technique that performed reliably in scenes with a cluttered and textured background. In [71], the authors presented an approach for moving object detection and tracking using event cameras, which used information about the dynamic component of the event stream. The 3D geometry of the event stream was approximated with a parametric model to motion-compensate for the camera.
Moving objects that did not conform to the model were detected in an iterative process. In [72], the authors proposed an approach for object tracking that leveraged both frame-based and event-based camera sensors. The tracking algorithm was based on a conventional Convolutional Neural Network (CNN) based tracker combined with regions of interest from a cluster-based DVS tracker. The tracking system was evaluated on the Ulster dataset to solve the task of tracking an object of interest in a cluttered background with ego-motion. The results showed 90% tracking accuracy with 20 pixel precision for the Ulster dataset. In [73], the authors validated a pedestrian detector system based on multi-cue event information fusion. The system leveraged three different event-stream encoding methods: Frequency, Surface of Active Event (SAE), and Leaky Integrate-and-Fire (LIF).
Closest to the technique presented in this paper is the clustering-based object detection and tracking explored in [74] for a single object and in [75] for multiple objects. In [74], the authors used event cameras to preserve a pedestrian's privacy while detecting his/her presence. The authors validated a proof-of-concept approach to cluster a single human in a cluttered environment, calculate the cluster centroid, and track it over time. In [75], the authors presented a multi-object tracking technique that pre-filtered event data to reduce computational complexity and identified event clusters (representing multiple objects) using spatial variance. They tracked the identified clusters using a partial update Gaussian Mixture Probability Hypothesis Density (GMPHD) filter. The authors tested their approach on a simulated dataset only. The simulated dataset featured multiple virtual small Unmanned Aerial Vehicles (sUAVs) created using Blender 3D design software [76].
This paper extends this growing body of work by implementing a DBSCAN and k − d tree-based approach to experimentally validate real-time detection and tracking of up to 4 ground robots operating in an indoor environment.

SYSTEM ARCHITECTURE
This section elucidates the hardware and software setup for capturing real-time event information from the camera.

Fig. 2
The ground robot used in this study consisted of two treads that were differentially driven. The robot featured a 9-degrees-of-freedom inertial measurement unit (IMU) with an accelerometer, gyroscope, and magnetometer. The robot also had motor encoders that measured the drive motor position and rotational speed.

Fig. 3
The C++/Python software application handled 1) data acquisition, 2) detection and tracking, and 3) data visualization in real-time. Two data buffers were used: one for accumulating event data for a specified amount of time $t_a$ and the other for buffering data for annotated visualizations.

Hardware
The hardware consisted of the event camera, an RGB webcam, and the ground robots.

Camera specifications
The event camera used in the experiments was a VGA-resolution contrast-detection vision sensor from Prophesee. This camera features a CMOS vision sensor with a resolution of 640 × 480 (VGA) pixels with 15 µm × 15 µm event-based pixels and a high dynamic range (HDR) beyond 120 dB. The camera ran on 1.8 V supplied via USB, with a 10 mW power dissipation rating in low power mode. The camera was interfaced using USB for communication and was mounted on the ceiling of the experiment area looking down. A Logitech C920 HD PRO webcam was also mounted next to the event camera to capture the mission and provide ground truth measurements. Both cameras are shown in Fig. 1.

Robot specifications
The ground robot used in this study was a tracked robot based on the Arduino-compatible ATmega32U4 MCU and is depicted in Fig. 2. It featured two 150:1 high-powered micro metal gear motors with integrated dual motor drivers, a ring of RGB LEDs, quadrature encoders, accelerometer, gyroscope, and magnetometer. At 100% motor power, the robots moved at approximately 0.46 m/s (and approximately 0.23 m/s at 50% motor power). The ground robots were networked using a Bluetooth connection and were programmed using a Python API.

Fig. 4
Positive event pixel locations were displayed using black color, negative event pixel locations were displayed using blue color, and pixels with no events were displayed using white color. Also shown here are the full robot cluster $CL^i$ corresponding to the robot $i$ with centroid $C^i$, the positive events cluster $CL^i_+$ with centroid $C^i_+$, and the negative events cluster $CL^i_-$ with centroid $C^i_-$.

Software
All software was developed in C++ and Python according to a modular architecture shown in Figure 3.

Listener service
The listener service used a set of data acquisition functionalities (API calls) provided by Prophesee to read event data. To reduce noise and decrease event data processing time, manufacturer-recommended parameter tuning was performed. The parameters set the operating point of the photoreceptor feedback amplifier, the bandwidth of the post-photoreceptor source follower buffer stage, the refractory period between events, contrast sensitivity, and high-pass filtering [77]. The event data contained information about the (x, y) location of the pixel where the event occurred and the event type (positive or negative). An event was positive when there was a positive change in light intensity, and negative when there was a negative change in light intensity. A data buffer continuously collected event objects from the event listener service. This information was then encoded into address-events that were asynchronously transmitted at the periphery via a mechanism called Address-Event Representation (AER). This process was repeated up to 30 times a second. The accumulation time parameter $t_a$ specified the time window, in microseconds, reaching back from the current time over which events were fetched. These events were then stored in the data buffer before being passed to the detection and tracking service.
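The accumulation step can be illustrated with a minimal Python sketch. The `Event` record, its field names, and the helper functions below are hypothetical and are not part of the Prophesee API; the sketch only assumes that each event carries a pixel location, a polarity, and a timestamp in microseconds.

```python
from collections import namedtuple

# Hypothetical event record (not the Prophesee data type):
# pixel location, polarity (+1 or -1), timestamp in microseconds.
Event = namedtuple("Event", ["x", "y", "polarity", "t_us"])


def accumulate_events(events, now_us, t_a_us):
    """Keep the events whose timestamps lie in the window (now_us - t_a_us, now_us]."""
    start_us = now_us - t_a_us
    return [e for e in events if start_us < e.t_us <= now_us]


def split_by_polarity(window):
    """Return (positive, negative, full) lists of (x, y) pixel locations."""
    positive = [(e.x, e.y) for e in window if e.polarity > 0]
    negative = [(e.x, e.y) for e in window if e.polarity < 0]
    full = [(e.x, e.y) for e in window]
    return positive, negative, full


# Usage: with t_a = 100,000 microseconds, only the two most recent events survive.
stream = [
    Event(10, 20, +1, 50_000),
    Event(11, 20, -1, 160_000),
    Event(12, 21, +1, 240_000),
]
window = accumulate_events(stream, now_us=250_000, t_a_us=100_000)
positive, negative, full = split_by_polarity(window)
```

A longer window yields denser clusters at the cost of responsiveness, which is the trade-off examined in the accumulation time experiments.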

Detection and tracking service
The detection and tracking service featured a set of functionalities that ingested the event data from the AER data buffer and provided the real-time (x, y) location and heading angle θ of each moving robot in the scene. This service was run 24 times per second to create a smooth track for each moving robot. The robots' detected locations, along with their respective IDs, were stored in a data buffer to be passed on to the display service.

Display service
The display service consisted of visualization functionalities with a variable display frame rate and a fixed frame matrix (640 × 480 pixels). The frame rate specified the number of times the display service was run in one second. The final image was created by reading every event in the display buffer and assigning an RGB value of (0, 0, 0) or (255, 255, 255) to the pixel location where the event occurred. If the event was positive, the pixel at that location was assigned black color. If the event was negative, the pixel at that location was assigned white color. Image pixels that remained the same were assigned gray color. Additional visualizations were created (to add clarity for the readers) with black-colored positive events, blue-colored negative events, and white color for unchanged pixels as depicted in Figure 4.
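The rendering rule described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name and the exact gray value (128, 128, 128) are assumptions, and the frame is held as a plain nested list rather than whatever image type the original display service used.

```python
WIDTH, HEIGHT = 640, 480
BLACK, WHITE = (0, 0, 0), (255, 255, 255)
GRAY = (128, 128, 128)  # assumed mid-gray; the exact value is not given in the text


def render_frame(events, width=WIDTH, height=HEIGHT):
    """Build an RGB frame from (x, y, polarity) events.

    Positive events become black pixels, negative events become white pixels,
    and pixels without events stay gray.
    """
    frame = [[GRAY] * width for _ in range(height)]
    for x, y, polarity in events:
        if 0 <= x < width and 0 <= y < height:  # ignore out-of-frame events
            frame[y][x] = BLACK if polarity > 0 else WHITE
    return frame


# Usage on a tiny 4 x 3 frame.
frame = render_frame([(1, 2, +1), (3, 0, -1)], width=4, height=3)
```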

Homography Service
As the two cameras above the indoor testing arena were mounted next to each other, homography transformations were necessary to align their reference frames for accurate comparison [62]. A homography step was performed at start-up to account for the distance and angle between camera lenses. Pixel coordinates of the four corners of the foam mat were noted from both the RGB and event cameras. A homography matrix was constructed from these eight 2D points, mapping corresponding corner locations. Finally, a perspective transform was performed on the RGB camera's captured image such that the event and RGB coordinate planes were aligned. The homography service was run at the beginning of each experiment session.
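To make the homography step concrete, the sketch below estimates the 3 × 3 matrix from four corner correspondences by solving the standard 8 × 8 linear system with the last matrix entry fixed to 1. It is a dependency-free illustration with made-up corner coordinates, not the authors' implementation; in an OpenCV-based pipeline, `cv2.getPerspectiveTransform` and `cv2.warpPerspective` would typically play this role.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def find_homography(src, dst):
    """Homography H (3x3 nested list) mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def apply_homography(H, point):
    """Map a pixel (x, y) through H, including the perspective division."""
    x, y = point
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


# Illustrative (made-up) mat corners as seen by the RGB and event cameras.
rgb_corners = [(100, 80), (540, 90), (530, 420), (110, 410)]
event_corners = [(60, 40), (580, 45), (575, 440), (65, 435)]
H = find_homography(rgb_corners, event_corners)
```

Once `H` is known, any RGB-camera centroid can be mapped into event-camera coordinates with `apply_homography` before the two tracks are compared.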

DETECTION AND TRACKING
Detection and tracking of multiple robots was performed using DBSCAN to create positive and negative event clusters and a single k-dimensional (k − d) tree to keep track of robot locations in the arena. Considering $i = 1, \ldots, n$ robots in the experimental arena, the following discussion elucidates the process depicted in Figure 5.

Detection: Position (x, y) and Heading Angle θ estimation
The event-objects stored in the data buffer from the Listener service were used to create three arrays. The first array contained all the positive events in the data buffer, the second array contained all the negative events, and the third array contained all events. Each array was passed through the DBSCAN algorithm with eps and minPts values as critical parameters. The DBSCAN algorithm returned clusters of partial (positive and negative) events and full events. For each full cluster $CL^i$ corresponding to robot $i$, the coordinates $(C^i_x, C^i_y)$ of the cluster centroid $C^i$ were determined by calculating the mean of the $(x_j, y_j)$ coordinates of all the points $j$ in that cluster:
$$C^i_x = \frac{1}{|CL^i|} \sum_{j \in CL^i} x_j, \qquad C^i_y = \frac{1}{|CL^i|} \sum_{j \in CL^i} y_j \tag{1}$$
The calculated $(C^i_x, C^i_y)$ locations for each centroid $C^i$ of the full clusters $CL^i$ were then used to create robot objects with an empty positive event cluster $CL^i_+$ with centroid $C^i_+$ and an empty negative event cluster $CL^i_-$ with centroid $C^i_-$, and added to a 2-dimensional k − d tree $\tau$. Figure 4 depicts an example of these clusters. $C$, $C_+$, and $C_-$ represent the set of all full, positive, and negative event cluster centroids, respectively, for each timestep.
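The clustering and centroid steps can be illustrated with a compact, dependency-free DBSCAN in Python. This is a sketch for exposition rather than the implementation used in the study: it uses a brute-force neighborhood query, counts a point as its own neighbor, and the toy parameter values in the usage example are far smaller than the minPts values tuned in the experiments.

```python
NOISE = -1


def region_query(points, idx, eps):
    """Indices of all points within distance eps of points[idx] (itself included)."""
    px, py = points[idx]
    eps_sq = eps * eps
    return [j for j, (qx, qy) in enumerate(points)
            if (qx - px) ** 2 + (qy - py) ** 2 <= eps_sq]


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster index 0, 1, ... or NOISE."""
    labels = [None] * len(points)  # None marks an unvisited point
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # expand only through core points
                queue.extend(j_neighbors)
    return labels


def cluster_centroids(points, labels):
    """Mean (x, y) of each cluster, as in the centroid definition above."""
    sums = {}
    for (x, y), label in zip(points, labels):
        if label == NOISE:
            continue
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


# Usage: two 3 x 3 blobs of event pixels plus one isolated noise event.
blob_a = [(x, y) for x in (10, 11, 12) for y in (10, 11, 12)]
blob_b = [(x, y) for x in (50, 51, 52) for y in (40, 41, 42)]
events = blob_a + blob_b + [(200, 200)]
labels = dbscan(events, eps=1.5, min_pts=4)
centroids = cluster_centroids(events, labels)
```

In the pipeline described above, the same routine would be run three times per time step: once on the positive events, once on the negative events, and once on all events.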
The ground robot's heading angle calculations relied on the positive and negative event data captured from the event camera. The robot's heading angle was calculated from the inverse tangent of the displacement between the positive and negative cluster centroids for each robot, as described in (2):
$$\theta = \tan^{-1}\left(\frac{C^i_{y+} - C^i_{y-}}{C^i_{x+} - C^i_{x-}}\right) \tag{2}$$
where $(C^i_{x+}, C^i_{y+})$ are the coordinates of $C^i_+$, and $(C^i_{x-}, C^i_{y-})$ are the coordinates of $C^i_-$. In this manner, the location and heading angle of the robot $(C^i_x, C^i_y, \theta)$ were estimated. Figure 5 depicts the resulting visualization with the frame of reference, bounding box, tracking id, and heading angle displayed on the image.
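A short Python sketch of the heading computation follows. It assumes that the heading points from the negative-event centroid toward the positive-event centroid; that direction convention, the function name, and the use of degrees are assumptions made for illustration. `math.atan2` is used in place of a plain inverse tangent so that the quadrant is resolved and a zero x-difference does not cause a division by zero.

```python
import math


def heading_angle(c_pos, c_neg):
    """Heading in degrees of the vector from the negative to the positive centroid.

    c_pos and c_neg are (x, y) centroids in image coordinates. Because the image
    y-axis usually points downward, the sign of the angle follows that convention.
    """
    dx = c_pos[0] - c_neg[0]
    dy = c_pos[1] - c_neg[1]
    return math.degrees(math.atan2(dy, dx))


# Usage: positive centroid directly to the right of the negative centroid.
theta = heading_angle((120.0, 80.0), (100.0, 80.0))
```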

IDTrack Technique for Robust Tracking
The detection and heading angle processes were repeated continuously for the entire duration of the mission $t = 0, \ldots, T_{mission}$. At each time step $t$, the k − d tree $\tau(t)$ was updated with information about available robot clusters $CL^i\ \forall i \in \{1, \ldots, n\}$ and their corresponding centroids $C^i$ as shown in Algorithm 1.
The tracking suffered from spurious noise effects that caused DBSCAN to assign new IDs to clusters. Additionally, when the robots would slow down or rotate in place, the event camera would report fewer events (and hence sparser clusters), thereby causing lost tracks/IDs or mislabeling of robots. To address these issues, IDTrack leveraged nearest neighbor searches between the robot cluster centroids.
IDTrack was initiated as soon as information about the first full detected cluster centroid $C^{i=1}$ was added to the empty k − d tree $\tau(0)$.
Algorithm 1: IDTrack for Robust Tracking

After this initiation step, a nearest neighbor search $\tau(t).NN$ was conducted for all other detected full event cluster centroids $C^j(t)$ at a given time step $t$. The Euclidean distance $d_{ij}(t)$ between $C^j(t)$ and its nearest neighbor $C^i(t) \in \tau(t)$, $\forall i, j \in \{1, \ldots, n\}$, $i \neq j$, was used to define two possible cases:
1) New Robot Discovered: If this distance $d_{ij}(t)$ was greater than the width of the physical robot chassis $\sigma$, the new cluster was inferred as a distinct robot. A new node corresponding to this newly discovered robot was created in $\tau(t)$ with $(C^j_x(t), C^j_y(t))$ coordinates. A counter variable called nextID was incremented by one each time a new node corresponding to a distinct robot was added to $\tau(t)$. This counter helped keep track of the sequence of IDs being assigned to the newly discovered cluster centroids.
2) Same Robot Rediscovered: On the other hand, if $d_{ij}(t)$ was less than or equal to $\sigma$, the centroid $C^j(t)$ was inferred to belong to the same robot represented by centroid $C^i(t) \in \tau$. In this case, $C^i(t) \in \tau$ was overwritten by $C^j(t)$ as the latest centroid information about the corresponding robot.
Since the positive and negative event streams captured from the camera were not necessarily in the same order as the full event stream, additional data processing was performed to ensure that the positive and negative cluster centroids were assigned to the correct robot. This was achieved by running nearest neighbor searches between $C(t) \in \tau(t)$ and $C_+(t)$, and between $C(t) \in \tau(t)$ and $C_-(t)$, at each time step $t$.
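The two IDTrack cases can be sketched with a small 2-dimensional k − d tree in Python. The sketch is a simplified illustration, not the authors' code: the tree is rebuilt from the current tracks before each query instead of being updated in place, the names `build_kdtree`, `nearest`, and `idtrack_update` are hypothetical, and the starting value of the ID counter is an assumption.

```python
import math


def build_kdtree(items, depth=0):
    """Build a 2-D k-d tree from (point, robot_id) pairs as nested tuples."""
    if not items:
        return None
    axis = depth % 2
    items = sorted(items, key=lambda item: item[0][axis])
    mid = len(items) // 2
    point, robot_id = items[mid]
    return (point, robot_id,
            build_kdtree(items[:mid], depth + 1),
            build_kdtree(items[mid + 1:], depth + 1))


def nearest(node, target, depth=0, best=None):
    """Return (distance, point, robot_id) of the stored point closest to target."""
    if node is None:
        return best
    point, robot_id, left, right = node
    dist = math.dist(point, target)
    if best is None or dist < best[0]:
        best = (dist, point, robot_id)
    axis = depth % 2
    diff = target[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, depth + 1, best)
    if abs(diff) < best[0]:  # the other half-plane may still hold a closer point
        best = nearest(far, target, depth + 1, best)
    return best


def idtrack_update(tracks, detections, sigma, next_id):
    """Update tracks (dict: robot_id -> centroid) in place; return the new next_id."""
    for centroid in detections:
        tree = build_kdtree([(p, rid) for rid, p in tracks.items()])
        hit = nearest(tree, centroid)
        if hit is None or hit[0] > sigma:
            tracks[next_id] = centroid  # case 1: new robot discovered
            next_id += 1
        else:
            tracks[hit[2]] = centroid   # case 2: same robot rediscovered
    return next_id


# Usage: two robots appear, both move, then robot 2 stops generating events.
tracks = {}
next_id = idtrack_update(tracks, [(100, 100), (300, 200)], sigma=40, next_id=1)
next_id = idtrack_update(tracks, [(110, 105), (295, 210)], sigma=40, next_id=next_id)
next_id = idtrack_update(tracks, [(112, 106)], sigma=40, next_id=next_id)
```

Because a track is only overwritten when a detection falls within $\sigma$ of it, a robot that stops moving keeps its last known centroid and ID until events reappear nearby.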

Fig. 7
Left image: 4 robots circular paths. Right image: 2 robots square paths. The only feedback control applied to the robots was a PID controller using motor encoders. As such, the tracing of the shapes was not perfect, as seen here.

RGB ground truth data
An RGB webcam-based detection, robot heading angle estimation, and tracking system was developed in Python to provide ground truth data. An output frame of this system is depicted in Figure 6. This was a frame-based system developed using OpenCV libraries to detect and identify robots based on the color of their 3D printed shell [78]. The algorithm provided a k − d tree containing the centroid locations of each robot in the frame. The use of a k − d tree along with OpenCV libraries has been studied in the literature [79]- [81]. This system first converted the input RGB image to HSV representation. Using relevant OpenCV library functions such as findContours(), the system detected each robot's area in the frame and calculated the robot's centroid. This process was repeated at 24 frames per second to generate tracking information about the robots in the arena. These results were manually cross-checked and corrected for any labeling errors/missed detections to obtain a high-quality ground truth dataset.
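The color-based ground truth idea can be illustrated without OpenCV. The sketch below swaps in Python's standard `colorsys` module for the HSV conversion and a plain mask centroid for the contour-based centroid, so it is a simplified stand-in rather than the system described above; the threshold values and function names are assumptions.

```python
import colorsys


def color_mask(image, hue_range, min_sat=0.5, min_val=0.2):
    """Boolean mask of pixels whose HSV hue lies in hue_range.

    image is a list of rows of (r, g, b) tuples with 0-255 channels. colorsys
    reports hue, saturation, and value in the range 0.0-1.0.
    """
    lo, hi = hue_range
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            mask_row.append(lo <= h <= hi and s >= min_sat and v >= min_val)
        mask.append(mask_row)
    return mask


def mask_centroid(mask):
    """Centroid (x, y) of the True pixels, or None when the mask is empty."""
    sum_x = sum_y = count = 0
    for y, row in enumerate(mask):
        for x, hit in enumerate(row):
            if hit:
                sum_x += x
                sum_y += y
                count += 1
    return (sum_x / count, sum_y / count) if count else None


# Usage: a red 2 x 2 "robot shell" on a white 6 x 5 background.
WHITE, RED = (255, 255, 255), (255, 0, 0)
image = [[WHITE] * 6 for _ in range(5)]
for y in (1, 2):
    for x in (2, 3):
        image[y][x] = RED
centroid = mask_centroid(color_mask(image, hue_range=(0.0, 0.05)))
```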

EXPERIMENTAL SETUP AND RESULTS
The event-based camera and the webcam were mounted on the ceiling directly above a 183 cm × 183 cm area. The area was covered with white foam mats. At the start of each experiment, the robots were placed on the white foam mat, as depicted in Figure 1. In all experiments, the robots were programmed to trace predefined paths (circle or square) on the mat via the Python interface, as shown in Figure 7. The event-based and RGB frame-based software trackers were executed simultaneously. The host system was configured with an Intel i7 8th generation processor @ 1.8 GHz and 16 GB RAM. The runs lasted between 15 seconds and 1 minute.

Key Metrics
The key metrics used in the study are discussed next.

Detection Metrics
The detection performance was assessed using Precision, Recall, and Mean Absolute Error (MAE). Precision is the ratio of the number of correct detections to the total number of detections, Recall is the ratio of the number of correct detections to the total number of true objects in the data, and MAE is a measure of the error between paired measurements:
$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad \text{MAE} = \frac{1}{n_m} \sum_{i=1}^{n_m} |b_i - a_i|$$
where $TP$, $FP$, $FN$, $b_i$, $a_i$, and $n_m$ are the number of True Positives, False Positives, False Negatives, event data measurements, ground truth data measurements, and the number of measurements, respectively.
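The three detection metrics translate directly into a few lines of Python. The function names and the convention of returning 0.0 for an empty denominator are choices made for this sketch.

```python
def precision(tp, fp):
    """Correct detections divided by all detections."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Correct detections divided by all true objects."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def mean_absolute_error(measured, truth):
    """Mean of |b_i - a_i| over paired event-based and ground truth measurements."""
    if len(measured) != len(truth) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(b - a) for b, a in zip(measured, truth)) / len(measured)


# Usage with made-up counts and measurements.
p = precision(tp=9, fp=1)
r = recall(tp=9, fn=3)
mae = mean_absolute_error([1.0, 2.0, 4.0], [1.5, 2.0, 3.0])
```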

Tracking Metric
The tracking system performance was assessed using the multiple object tracking accuracy (MOTA) metric proposed by Bernardin [37]. MOTA is defined as
$$\text{MOTA} = 1 - \frac{\sum_k (M_k + FP_k + mme_k)}{\sum_k g_k}$$
where $M_k$, $FP_k$, and $mme_k$ represent the number of missed detections, false positives, and mismatches in frame $k$, respectively, and $g_k$ is the number of ground truth objects in frame $k$.
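As a worked illustration of the MOTA definition, the sketch below accumulates per-frame error counts in Python. The tuple layout of each frame record is an assumption made for this example.

```python
def mota(frames):
    """Multiple object tracking accuracy.

    frames is a list of (misses, false_positives, mismatches, ground_truth_count)
    tuples, one per frame k.
    """
    errors = sum(m + fp + mme for m, fp, mme, _ in frames)
    total_objects = sum(g for _, _, _, g in frames)
    if total_objects == 0:
        raise ValueError("no ground truth objects in any frame")
    return 1.0 - errors / total_objects


# Usage: three frames with 4 robots each and three errors in total.
score = mota([(0, 0, 0, 4), (1, 0, 0, 4), (0, 1, 1, 4)])
```

With 3 errors over 12 ground truth objects, the example evaluates to 1 − 3/12 = 0.75; a run without misses, false positives, or mismatches gives exactly 1.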

Clustering Metrics
The clustering performance was assessed using $n_{\text{avg-clusters}}$ and $A_{ratio}$. Robot area analysis was performed with $t_a$ = 100,000 µs.
1) $n_{\text{avg-clusters}}$: the average number of clusters detected per robot. Figure 8 illustrates this metric.
$$n_{\text{avg-clusters}} = \frac{n_{clusters}}{n}$$
where $n_{clusters}$ is the number of clusters detected and $n$ is the number of robots.

2) $A_{ratio}$: the ratio of the detected cluster area $A_d$ to the actual robot chassis area $A_{actual}$. Figure 9 illustrates this metric.
$$A_{ratio} = \frac{A_d}{A_{actual}}$$

Fig. 9
An example of $A_{ratio}$ metric calculations as applied to full event clusters for DBSCAN. The detected area $A_d$ is enclosed by the blue rectangle. The actual physical robot chassis area $A_{actual}$ is represented by the pink rectangle. Ideally, $A_{ratio}$ should be as close to 1 as possible for a given minPts value. ($t_a$ = 100,000 µs)

Fig. 10
To understand the effect of minPts, the average number of clusters detected per robot ($n_{\text{avg-clusters}}$) and the ratio of detected cluster area and robot body area ($A_{ratio}$) were noted for a single robot for two scenarios (50% motor power and 100% motor power). The minPts value (or range of values) that provided the values of $n_{\text{avg-clusters}}$ and $A_{ratio}$ closest to 1 were used for subsequent experiments involving more robots. ($t_a$ = 100,000 µs)
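The two clustering metrics can be computed as in the following Python sketch. It assumes that the detected area is the axis-aligned bounding box of the cluster's pixels and that the chassis area is supplied in the same pixel units; both assumptions, and the function names, are illustrative.

```python
def n_avg_clusters(n_clusters, n_robots):
    """Average number of detected clusters per robot (ideal value: 1)."""
    return n_clusters / n_robots


def bounding_box_area(points):
    """Area of the axis-aligned rectangle enclosing a cluster of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


def area_ratio(cluster_points, actual_area):
    """Detected cluster area divided by the actual chassis area (ideal value: 1)."""
    return bounding_box_area(cluster_points) / actual_area


# Usage: a cluster spanning 10 x 5 pixels against a 50-pixel-squared chassis.
cluster = [(0, 0), (10, 0), (10, 5), (0, 5), (4, 2)]
ratio = area_ratio(cluster, actual_area=50)
```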

Experimental results
The experiments focused on studying the detection and tracking performance of the proposed method in scenarios 1) with varying values of the DBSCAN minPts parameter, 2) with varying accumulation time $t_a$ of the camera, 3) involving 2, 3, and 4 robots, 4) with varying robot speed (50% and 100% motor power), and 5) with varying ambient light conditions. Each experiment was conducted 3 times and the average values over these 3 runs are noted in the following discussions.

Effect of changing $minPts$ on robot detection
The effect of changing the $minPts$ parameter value on the detection performance was evaluated using the metrics $n_{avg-clusters}$ and $A_{ratio}$. Figure 10 depicts the values of these two metrics averaged across 3 runs for a single robot moving at 50% speed and 100% speed, respectively. Ideally, for a single robot, DBSCAN should detect one full-body cluster on average per robot, i.e., $n_{avg-clusters} = 1$. DBSCAN should also provide $A_{ratio} = 1$.
Ideally, for partial positive (or partial negative) clusters, DBSCAN should detect one partial positive (or partial negative) cluster on average per robot. The robot's partial cluster area was measured to be one-third of the overall area of the robot chassis. Figure 10 shows that at 100% robot speed, $minPts = 45$ provided the best values for $n_{avg-clusters}$ and $A_{ratio}$ for the partial positive clusters. For 50% robot speed, this value was $minPts = 35$ for the partial positive clusters. The same value was used for the partial negative clusters. Similarly, a $minPts$ value between 220 and 230 provided the best $n_{avg-clusters}$ and $A_{ratio}$ values for the full robot clusters. The $minPts$ values noted here were used for all subsequent experiments.
The number of events captured by the event camera is affected by the speeds of the robots. This dependence affects the overall cluster quality, as observed in Figure 10. Of note is the 50% speed and full robot cluster scenario, where changing $minPts$ between 150 and 170 dramatically affected the $n_{avg-clusters}$ and $A_{ratio}$ values.
Key Insight: $minPts$ is a critical parameter for the DBSCAN algorithm. As reported in Figure 10, comprehensive tests can provide a $minPts$ value or range of $minPts$ values that lead to the best results for $n_{avg-clusters}$ and $A_{ratio}$.
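The sensitivity to $minPts$ can be illustrated with a toy parameter sweep. The sketch below uses a minimal pure-Python DBSCAN written for illustration, not the implementation used in the experiments. The synthetic points, the `eps` value, and the small $minPts$ values are assumptions chosen to fit the toy data; they are far smaller than the values reported above.

```python
from math import hypot


def region_query(points, i, eps):
    """Indices of all points within eps of point i (including i itself)."""
    xi, yi = points[i]
    return [j for j, (x, y) in enumerate(points) if hypot(x - xi, y - yi) <= eps]


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one label per point, with -1 marking noise."""
    NOISE = -1
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
    return labels


def count_clusters(points, eps, min_pts):
    """Number of distinct clusters, ignoring noise."""
    return len(set(dbscan(points, eps, min_pts)) - {-1})


# Synthetic events: one dense 6x6 "robot" blob plus three stray noise events.
robot = [(x, y) for x in range(6) for y in range(6)]
noise = [(20, 20), (21, 20), (30, 5)]
events = robot + noise

sweep = {m: count_clusters(events, eps=1.5, min_pts=m) for m in (2, 5, 10)}
print(sweep)  # {2: 2, 5: 1, 10: 0}
```

In this toy data, a threshold that is too low groups two stray events into a spurious cluster, and one that is too high rejects the robot itself. Only the intermediate value yields the single expected cluster.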

Effect of changing camera accumulation time $t_a$
The effect of changing the $t_a$ value on detection performance was evaluated using the MAE distance metric. Figure 11 depicts the values of this metric averaged across 3 runs for a single robot moving at 100% speed for 4 different $t_a$ values.
A $t_a$ value of 100,000 µs yielded better detection results than lower $t_a$ values in otherwise equivalent experiments. MAE distance results improved for both circular and square patterns as $t_a$ increased. MAE results for circular paths were lower than for square paths. This difference between the two path patterns is attributed to the constant motion along circular paths, where the robot did not pause to turn and detections were consistent. When the robot made 90° zero-point turns in the square path pattern, fewer events were generated than when it traced the edges of the square. This reduced the detection quality and resulted in higher MAE.
$t_a$ = 100,000 µs was selected for all subsequent experiments.
Key Insight: At a given robot speed, lower accumulation times led to fewer events buffered by the listener service. This caused sparser clusters and higher MAE. Accumulation time is thus a pivotal parameter for tuning the sensitivity of an event camera to brightness changes of objects in its field of view.
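The relationship between $t_a$ and the number of buffered events can be sketched as follows. This is a simplified stand-in for the listener service, assuming events arrive as `(timestamp_us, x, y, polarity)` tuples. The `accumulate` function and the synthetic event rate are hypothetical.

```python
from collections import defaultdict


def accumulate(events, t_a_us):
    """Group (t_us, x, y, polarity) events into consecutive windows of t_a_us microseconds.

    Windows that contain no events are omitted from the result.
    """
    windows = defaultdict(list)
    for t_us, x, y, polarity in events:
        windows[t_us // t_a_us].append((x, y, polarity))
    return [windows[k] for k in sorted(windows)]


# Synthetic stream: one event every 500 microseconds for 0.2 s (hypothetical rate).
events = [(t, 0, 0, 1) for t in range(0, 200_000, 500)]

for t_a in (10_000, 50_000, 100_000):
    first_window = accumulate(events, t_a)[0]
    print(t_a, len(first_window))
# 10000 20
# 50000 100
# 100000 200
```

At a fixed event rate, the number of events available to the clustering step grows in proportion to $t_a$, which is consistent with the sparser clusters observed at lower accumulation times.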

Effect of increasing the number of robots
Fig. 11 The effects of changing accumulation time $t_a$ on Mean Absolute Error (100% motor power) for a single robot. $t_a$ = 100,000 µs provided the lowest MAE results and was used for subsequent experiments.
The effect of increasing the number of robots on detection performance was evaluated using Precision, Recall, MAE distance, and MOTA metrics. Table 1 presents the values of these metrics averaged across 3 runs for scenarios with 1, 2, 3, and 4 robots.
Key Insight: The robots were operating on white-colored foam mats. Different robot bodies generated varied numbers of events depending on the color of their 3D printed shell. For example, the robot with a black-colored shell resulted in denser positive and negative event clusters compared to the robot with the yellow-colored shell. As the number of robots increased, a wider range of body colors was introduced into the experiments, leading to an increase in MAE distance.
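For reference, the four metrics can be computed as in the sketch below. It assumes the standard definitions: Precision and Recall from true positive, false positive, and false negative counts, MOTA in the CLEAR MOT form, and MAE distance as the mean Euclidean distance between detected and ground-truth positions. These may differ in detail from the definitions used in the experiments, and all counts and coordinates are hypothetical.

```python
from math import hypot


def precision(tp, fp):
    """Fraction of detections that correspond to a real robot."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of ground-truth robot instances that were detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def mota(fn, fp, id_switches, n_ground_truth):
    """MOTA = 1 - (FN + FP + ID switches) / number of ground-truth objects."""
    return 1.0 - (fn + fp + id_switches) / n_ground_truth


def mae_distance(detected, truth):
    """Mean Euclidean distance between paired detected and ground-truth positions."""
    errors = [hypot(dx - gx, dy - gy) for (dx, dy), (gx, gy) in zip(detected, truth)]
    return sum(errors) / len(errors)


# Hypothetical counts accumulated over a run with 100 ground-truth instances.
print(f"{precision(tp=90, fp=5):.2f}")                                # 0.95
print(f"{recall(tp=90, fn=10):.2f}")                                  # 0.90
print(f"{mota(fn=10, fp=5, id_switches=2, n_ground_truth=100):.2f}")  # 0.83
print(mae_distance([(0, 0), (3, 4)], [(0, 0), (0, 0)]))               # 2.5
```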

Effect of changing motor speeds
Motor speeds were changed by changing the power supplied to the robot drive motors. The effect of motor speed on detection performance is reported in Table 1, using Precision, Recall, MAE, and MOTA as the key metrics. Table 1 reports the detection and tracking results of experiments with 1, 2, 3, and 4 robots. Two sets of experiments were conducted: one at 100% motor power (∼0.46 m/s robot speed) and one at 50% motor power (∼0.23 m/s robot speed).
It is observed that Precision, Recall, and MOTA metrics remained high regardless of motor speed in both path patterns. For square patterns, MAE increased as the number of robots increased. Square patterns produced higher MAE than circular path patterns. At 50% motor power, MAE distance was greater than at 100% motor power with otherwise equivalent parameters.
Key Insight: Event cameras report events as per-pixel brightness changes. Slow-moving robots or robots that stop moving create less dramatic changes in brightness (and hence sparser clusters) than robots moving faster. Detection of slow-moving objects, therefore, leads to higher MAE.

Effect of varying ambient lighting conditions
Three ambient lighting settings were created by using lighting dimmers and LED light strips, as depicted in Figure 12. The three conditions featured fluorescent lights at full intensity, fluorescent lights at dimmed intensity, and the use of LED light strips. Key metrics used to evaluate the effect of ambient lighting conditions on performance are Precision, Recall, MAE, and MOTA. A graceful degradation in these metrics was observed as the ambient lighting was modulated from the brightest to the darkest settings. The Precision, Recall, MAE, and MOTA metrics for full brightness are noted in Table 1. For dimmed brightness, the (Precision, Recall, MAE, MOTA) values degraded to (1, 1, 3.01, 1) for 1 robot, (1, 1, 2.59, 1) for 2 robots, (1, 1, 3.12, 1) for 3 robots, and (1, 0.73, 10.05, 0.72) for 4 robots. All robots were commanded to move in square path patterns. Finally, for the darkest condition, the event camera struggled to detect motion consistently.
Key Insight: While event cameras can operate in varying lighting conditions, their performance is dependent on the overall ambient light intensity. As such, environmental lighting conditions should be considered when evaluating the detection and tracking performance of event-camera-based systems.

A note about heading angle calculations
Heading angle calculations were performed using the positive and negative cluster centroids as described in Eqn. 2. For a single robot moving in a circular pattern, the minimum $MAE_\theta$ recorded was 5.76°, and the maximum $MAE_\theta$ recorded was 13.01°. By contrast, the minimum $MAE_\theta$ for a single robot moving in a square pattern was 30.52°, and the maximum $MAE_\theta$ was 50.56°. A similar trend for $MAE_\theta$ was observed for the multi-robot case. The following two key reasons contribute to the $MAE_\theta$ results:
1) As mentioned earlier, during the 90° zero-point turns in the square path pattern, significantly fewer events were generated than when the robot traced the edges of the square. This reduced the detection quality and resulted in higher MAE.
2) Additionally, the color of the 3D printed shell of the robot also affected the number of events generated (and hence $MAE_\theta$).
Further reductions in $MAE_\theta$ may require the use of probabilistic or optical flow techniques; this is a topic of further investigation [16].
Fig. 13 In-flight quadrotor visualized as pixel events. Future work will focus on detecting and tracking concurrent moving quadrotors.
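A centroid-based heading estimate and its angular error can be sketched as follows. Eqn. 2 is not reproduced in this section, so the sketch assumes the heading is the direction of the vector from the negative-cluster centroid to the positive-cluster centroid; the actual sign convention depends on the contrast between robot and floor and may be reversed. The error is wrapped so that, for example, 350° and 10° differ by 20° rather than 340°. All names and sample points are hypothetical.

```python
from math import atan2, degrees


def centroid(points):
    """Mean (x, y) position of a cluster of event coordinates."""
    xs, ys = zip(*points)
    return sum(xs) / len(xs), sum(ys) / len(ys)


def heading_deg(pos_cluster, neg_cluster):
    """Angle of the vector from the negative to the positive centroid (assumed convention)."""
    px, py = centroid(pos_cluster)
    nx, ny = centroid(neg_cluster)
    return degrees(atan2(py - ny, px - nx))


def angular_error_deg(a, b):
    """Smallest absolute difference between two angles in degrees, in [0, 180]."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def mae_theta(estimates, truths):
    """Mean absolute heading error with wraparound handled per sample."""
    return sum(angular_error_deg(e, t) for e, t in zip(estimates, truths)) / len(estimates)


# Hypothetical clusters: positive events ahead of negative events along +y.
pos = [(0, 10), (0, 12)]
neg = [(0, 0), (0, 2)]
print(heading_deg(pos, neg))                     # 90.0
print(angular_error_deg(350.0, 10.0))            # 20.0
print(mae_theta([350.0, 90.0], [10.0, 80.0]))    # 15.0
```

When few events are generated, as during the zero-point turns noted above, the two centroids lie close together and small shifts in either one produce large changes in the estimated angle.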

CONCLUSION
This study presented a method to detect and track mobile indoor ground robots using event cameras. Using DBSCAN and $k$-d trees, this method achieved performance comparable to existing frame-based detection and tracking methods without the need for any training. With high detection and tracking fidelity in the face of event camera noise and robots stopping, experimental evaluations point to this method's suitability for real-time robot control applications. Future work will aim to extend this study to detecting and tracking multiple quadrotors, as displayed in Figure 13.