A Two-Layer Control Framework for Persistent Monitoring of a Large Area With a Robotic Sensor Network

The deployment of a group of robots equipped with sensors for monitoring, also known as a robotic sensor network, is a promising technological solution to solve time-critical societal and environmental issues. This paper considers the problem of deploying a robotic sensor network to persistently and effectively monitor multiple locations of interest in a large field represented by grids. To this end, we propose a novel two-layer control framework where the first layer (i.e., task allocation strategy) encapsulates the targeted grids into a set of tasks (small regions of interest) followed by optimally allocating the robots to each task based on their initial position (location of their base stations) and sensing capabilities. The second layer (i.e., persistent monitoring algorithm) generates each robot’s motion control input to ensure persistent monitoring over the designated region of interest. The proposed framework is demonstrated and evaluated via numerical simulations. It is shown that the proposed control framework improves real-time monitoring in terms of both the coverage performance and travel distance of the robots.


I. INTRODUCTION
The deployment of a group of robots equipped with sensors for monitoring, also known as a robotic sensor network, is a promising technological solution to time-critical societal and environmental issues, such as infrastructure surveys, environmental monitoring, and search-and-rescue missions [1], [2], [3], [4], [5], [6]. Notably, a robotic sensor network promises a multitude of benefits, such as the capability to cover multiple places at the same time, robustness to individual robot failure, and flexibility in completing a complex task. To make the most of these benefits in ensuring a successful monitoring mission, a robotic sensor network requires a coordinated control scheme to effectively cover the targeted areas of interest.
The associate editor coordinating the review of this manuscript and approving it for publication was Renato Ferrero.
The focus of this paper is on the deployment of a robotic sensor network to persistently and effectively monitor multiple locations spread across a large field, as depicted in Fig. 1a. To this end, coverage control has been demonstrated as one of the most promising approaches for coordinating a robotic sensor network [6], [7], [8], [9], [10], [11], [12], [13], [14], [15]. Briefly speaking, coverage control is a distributed control strategy for deploying a robotic sensor network by maximizing an objective function that represents the information gained over a given domain. Here, the importance of each location of interest is emphasized by a spectrum of values denoted as the density function. Coverage control has been studied for various sensor footprints [10], [11] and has also been demonstrated on ground robots [10], flying robots [12], [13], and a combination of both [14], [15]. Originally, the coverage control algorithm was designed to drive the robots to a stationary configuration of optimal positions. However, when the region to be monitored is much larger than the total sensing capability (footprint) of the robotic sensor network, a stationary sensor configuration may leave some important areas uncovered.
An extension of coverage control that ensures that the whole region is covered at least once has often been categorized as dynamic coverage control [12], [16], [17], [18], [19], [20]. The idea is to introduce an exogenous input for the robotic sensor network given by a time-varying density function. Specifically, the density function on a sub-region is updated, marking it less important, once a robot senses it. Built on top of this idea, the persistent coverage control scheme [15], [20], [21] introduces the concept of information decay, which restores the density function after the robot has left the sub-region for a certain time. This addition generates a persistent patrolling motion over a given region of interest. The literature has often demonstrated persistent coverage control via full-domain exploration, where the whole domain initially has equal importance. Note that specifications for monitoring vary depending on the application scenario. In some practical cases, the full-domain patrolling motion may not be required. Instead, patrolling only over several locations within the domain can be sufficient. However, a discussion of persistent coverage control that focuses on only a spread of locations in a large region of interest remains lacking in the literature. This particular case is especially important, as a poorly designed controller may leave the robots stuck in local optima, as shown in [22], and also make the robots waste their energy monitoring unimportant areas.
A potential strategy to remove the unimportant areas is to restructure the domain into a number of smaller regions, where each region represents a cluster of important areas to be persistently covered. To this end, one needs to solve the following problems: i) how to effectively divide the domain and ii) how to optimally allocate the robotic sensor network to the clusters of areas, which is a combinatorial optimization problem. Swarm intelligence is a class of meta-heuristic approaches that is able to find good (but not necessarily optimal) solutions of combinatorial optimization problems within a reasonable amount of time. In particular, the Ant-Colony Optimization (ACO) algorithm [23] is a well-known swarm intelligence algorithm for solving the Traveling Salesman Problem that has also been adapted to the Task Allocation problem in [24], [25], [26], [27], and [28]. Nevertheless, a thorough and extensive study is required to explore the potential use of ACO in persistent monitoring using a robotic sensor network.
In this paper, we propose a novel two-layer control framework to persistently monitor the targeted locations in a large field, here formulated as a set of targeted grids, to improve the coverage performance and reduce the robots' travel distance. Specifically, the first layer (task allocation strategy) encapsulates the targeted grids as several smaller regions via K-means clustering [29], [30] followed by assigning the robots to monitor the regions accordingly via a modified ant colony optimization (M-ACO) [31]. Then, the second layer (persistent monitoring algorithm) models the importance of the area in each smaller region as a density function and further generates the control input for each robot based on a persistent coverage control algorithm [20], [21]. In particular, the main contributions of this paper are listed as follows:
• We propose a modified ant colony optimization (M-ACO) to find the most suitable agents (robots) to cover the tasks (regions), an earlier version of which was presented in [31]. The M-ACO proposed in [31] prioritizes the compatibility of the task area with the total capability of the agent(s) allocated to that particular task, resulting in a limitation where task(s) with a small area may not be covered by any agent. In this work, modifications to the M-ACO are introduced to guarantee that each task is covered by at least one agent.
• A novel design of the density function is introduced to model the area of importance within the region. The design is also a novel feature of our coverage algorithm, as most of the literature on coverage control assumes that the density function is given. Furthermore, compared to our previous framework in [31], we improve the encapsulation of the targeted grids by using a convex hull (formerly a rectangular box), which can further improve the coverage performance. Collision avoidance between the robots is also explicitly considered by imposing additional constraints using the control barrier function approach [6], [32], [33], [34].
• We present an extensive simulation for the proposed two-layer control framework. It highlights the benefits of dividing the targeted grids into several smaller regions, which helps improve coverage performance and reduce the overall travel distance.
The rest of this article is organized as follows. In Section II, we provide the problem settings considered in this paper. We present our two-layer control framework in Section III, followed by a more detailed formulation of the first layer (Task Allocation Strategy) in Section IV and the second layer (Persistent Monitoring Algorithm) in Section V. Section VI presents the numerical simulation and discussion of the proposed framework. Finally, Section VII presents concluding remarks.

II. PROBLEM STATEMENT
In this paper, we consider a scenario of monitoring a specified field, formally denoted as a convex polytope-shaped region (compact set) $F \subset \mathbb{R}^2$. The field F is further discretized into square grids with equal width $D_s$, where a number of grids of interest (also referred to as targeted grids) need to be monitored, as depicted in Fig. 1. Let the set M denote the collection of indices of the m targeted grids in the field F, whose center locations are represented by $h_i \in F$, $i \in M$. The collection of targeted grids over the field F is denoted by H.
To monitor the targeted grids H, we consider a solution with n autonomous robots equipped with sensors working in cooperation, often referred to as a robotic sensor network.
Let the set of identifiers for the robots be denoted by $I = \{1, \dots, n\}$, and let $p_i(t) = [x_i(t) \;\; y_i(t)]^T \in F$ describe the position of robot $i \in I$ at time t. Here, we assume that each robot starts from its own base station $p_i^b$, i.e., $p_i(0) = p_i^b$. Furthermore, it is assumed that the position of each robot is updated according to the kinematic model
$$\dot{p}_i = u_i, \tag{1}$$
where $u_i \in \mathbb{R}^2$ denotes the velocity input to be designed. Each robot i's sensor is assumed to be able to sense a subset of the area in the field within a fixed sensing radius $R_i > 0$, with the sensing region $B_i$ being the disc of radius $R_i$ centered at $p_i$. Within this paper, let us define the size of a given region R as A(R). We can then quantify the sensing capability of each robot as the size of its sensing region, i.e., $A(B_i) := \pi R_i^2$.
To this end, the objective of this paper is to design a cooperative control framework for the robotic sensor network to maximize coverage over the targeted grids, i.e., H, at all times. Note that in most cases, the energy consumption of the robot is proportional to its travel distance [33]. Thus, to ensure a long duration of the monitoring mission, the proposed solution needs to specifically account for the distribution of the targeted grids to reduce the required travel distance from the base station as well as during the monitoring mission.
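As a minimal sketch of the robot model described above, the following Python snippet advances a robot position with a velocity input and evaluates the sensing capability A(B_i) = πR_i². The explicit-Euler discretization, the time step `dt`, and all function names are illustrative assumptions and are not taken from the paper.

```python
import math

def step(position, velocity, dt):
    """One explicit-Euler step of the single-integrator model p_dot = u (assumed discretization)."""
    return (position[0] + dt * velocity[0], position[1] + dt * velocity[1])

def in_sensing_region(q, position, radius):
    """True if point q lies inside the sensing disc of the given radius centered at the robot."""
    return math.hypot(q[0] - position[0], q[1] - position[1]) <= radius

def sensing_capability(radius):
    """Size of the sensing region, A(B_i) = pi * R_i^2."""
    return math.pi * radius ** 2

# Hypothetical example: a robot leaves its base station at the origin.
p = step((0.0, 0.0), (1.0, 0.5), dt=0.1)
```

With a heterogeneous team, `sensing_capability` would simply be evaluated once per robot with that robot's own radius.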
An example of the considered scenario is the monitoring of a field, where the grids correspond to areas that each static sensor needs to cover [31]. However, coverage holes (unmonitored areas) may appear, e.g., due to malfunction or lack of sensors [35], [36]. As a complementary solution, a group of quadrotors equipped with sensors (the robotic sensor network) can then be deployed for monitoring the coverage holes (the targeted grids), resulting in a hybrid monitoring system that comprises static and mobile sensors [37], [38], [39].

III. PROPOSED TWO-LAYER CONTROL FRAMEWORK
Given the presented problem statement and objectives, we formulate our framework for solving the following problems: 1) to encapsulate the targeted grids H to monitor as a set of tasks and provide an optimal allocation based on the robots' initial positions and sensing capabilities, and 2) to design a control input $u_i$ in (1) for each robot $i \in I$ to ensure persistent monitoring over the designated tasks' regions. The separation of these two problems allows us to approach the solution in two separate layers: the first layer (i.e., a higher layer), which divides the original targeted grids into smaller tasks followed by allocating each robot to one of the tasks while accounting for the expected distance each robot needs to travel, and the second layer (i.e., a lower layer), which directly governs the motion of the robot to ensure persistent monitoring. Throughout this paper, we refer to these higher and lower layers as the task allocation strategy and the persistent monitoring algorithm, respectively. Fig. 2 illustrates our proposed two-layer control framework.
The task allocation strategy comprises dividing the targeted grids H into ℓ smaller regions via K-means clustering and then assigning the n robots accordingly via a modified ant colony optimization (M-ACO). The process of dividing the targeted grids H into ℓ smaller regions reduces the total area to be covered and, at the same time, localizes the area that each robot needs to monitor. Here, the K-means clustering algorithm is chosen since it is a straightforward clustering method that can decide whether an object belongs to a cluster or not [29], [30]. Since K-means clustering aims to minimize the pairwise squared deviations of points (grids) in the same regions (clusters), it helps to reduce the travel distance of the robot for monitoring all the grids within its region. Moreover, the requirement to define an explicit number of generated clusters allows a thorough study of how these region divisions impact the coverage performance, as will be presented later in Section VI. We further impose that ℓ ≤ n, that each region needs to be monitored by at least 1 robot, and that all n robots need to be allocated. The M-ACO is then used to allocate the robots to regions by considering the size of each region, the sensing capability of the robot, and the distance between the robot's initial position and the region. Our proposed M-ACO method is a modified version of the ACO algorithm that aims at finding the most suitable agents (robots) to cover the tasks (regions). Here, some modifications are introduced to fit the intended scenario and objective.
Finally, the persistent monitoring algorithm comprises modeling the importance of the area in each smaller region as a density function and deploying the robots based on a time-varying Voronoi-based coverage controller [20], [21]. Here, we introduce a novel approach to construct a density function for fields in grids. Then, the control input $u_i$ for each robot is computed via a gradient ascent algorithm to maximize the coverage within each robot's sensing area. The density value across each region varies over time, i.e., the value at a certain position decreases when a robot passes through it and increases again once it is outside any robot's sensing area. These time-varying properties of the density function, along with the gradient ascent algorithm, ensure a continuous motion to travel to all the targeted grids. Additionally, obstacle avoidance via control barrier functions [6], [34] is implemented to ensure no collision between robots. Our approaches are described in more detail in the subsequent sections.

IV. TASK ALLOCATION STRATEGY
In this section, we detail our proposed approach for the higher layer, which introduces ℓ smaller regions and then accordingly allocates all n robots to the regions. Each smaller region is identified by an index $j \in C = \{1, \dots, \ell\}$. The output of the modified ant colony optimization (M-ACO) is the sets of robots $I_j \subseteq I$ that monitor each region $j \in C$.

A. K-MEANS CLUSTERING
The K-Means clustering requires the explicit number of clusters to be generated and the data to be clustered, i.e., ℓ and $h_i$, $\forall i \in M$. In practice, ℓ is chosen by considering the number of available robots, the distribution of the targeted grids, and the size of the field F. The summarized procedure of K-Means clustering is given as follows:
1) First, initialize ℓ random points inside F as the initial centroids of the clusters.
2) Each point located at $h_i$, $\forall i \in M$, is then assigned to the closest centroid, by calculating the Euclidean distances between the point and the centroids.
3) The centroid positions are then updated by taking the average of all points that were assigned to them.
4) Steps 2 and 3 are repeated until the centroid positions converge, i.e., do not change anymore.
We direct the interested readers to [29] for more details on the algorithm.
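The four steps above can be sketched in plain Python as follows. This is only an illustrative implementation under stated assumptions: the function name `kmeans`, the rectangular `bounds` used to draw the initial centroids, the `seed` and `max_iter` parameters, and the handling of empty clusters are not specified in the text.

```python
import random

def kmeans(points, n_clusters, bounds, seed=0, max_iter=100):
    """Cluster 2-D points with plain K-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    (xmin, xmax), (ymin, ymax) = bounds
    # Step 1: random initial centroids inside the (assumed rectangular) field.
    centroids = [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
                 for _ in range(n_clusters)]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each point to its closest centroid
        # (squared Euclidean distance gives the same ordering).
        new_labels = [
            min(range(n_clusters),
                key=lambda j: (x - centroids[j][0]) ** 2 + (y - centroids[j][1]) ** 2)
            for x, y in points
        ]
        # Step 4: stop once the assignment, and hence the centroids, no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of its assigned points.
        for j in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its previous centroid
                centroids[j] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    return labels, centroids

# Hypothetical targeted-grid centers forming two groups.
grid_centers = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
labels, centroids = kmeans(grid_centers, n_clusters=2, bounds=((0, 11), (0, 11)))
```

Because the initial centroids are random, the result depends on the seed; fixing the seed keeps the clustering reproducible across runs.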
The output from K-Means clustering is ℓ groups of targeted grids. For each group $j \in C$, the cluster of the targeted grids is denoted as $M_j \subseteq M$. Then, the region to be monitored in each group j, denoted by $Q_j$, is constructed as the convex hull (conv) that contains each grid $i \in M_j$, with the region's centroid denoted as $c_j$. To this end, the union of the regions covers the targeted grids H, i.e., $\bigcup_{j \in C} Q_j \supseteq H$. The resulting ℓ regions from this procedure are illustrated in Fig. 2, where each region for ℓ = 4 is denoted by a colored polygon and its centroid is denoted by a colored circle.
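As an illustration of how a region could be built from a cluster of grids, the sketch below computes the convex hull of the corner points of the square cells (Andrew's monotone chain) and its area (shoelace formula). Taking the hull over the cell corners, as well as the helper names `cell_corners`, `region_from_cells`, and `polygon_area`, are assumptions made for this example.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def cell_corners(center, width):
    """Four corners of a square grid cell with the given center and width."""
    cx, cy = center
    half = width / 2.0
    return [(cx - half, cy - half), (cx + half, cy - half),
            (cx + half, cy + half), (cx - half, cy + half)]

def region_from_cells(centers, width):
    """Convex region enclosing all cells of one cluster (assumed construction)."""
    corners = [c for center in centers for c in cell_corners(center, width)]
    return convex_hull(corners)

def polygon_area(vertices):
    """Shoelace formula for the area of a simple polygon."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

region = region_from_cells([(0.0, 0.0), (1.0, 1.0)], width=1.0)
area = polygon_area(region)  # hexagon enclosing two diagonal cells
```

For two diagonally adjacent unit cells, the hull is a hexagon of area 3, which is smaller than the area 4 of the enclosing axis-aligned rectangle.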

B. MODIFIED ANT COLONY OPTIMIZATION
In this subsection, we describe our proposed modified ant colony optimization (M-ACO) method. The goal is to obtain an equal distribution of n robots in covering the ℓ regions, where ℓ ≤ n. Referring to the terminology of ''agents'' and ''tasks'' as commonly used in ACO [23], in this work, the agents are the robots and the tasks are the regions with targeted grids. To this end, both the robots (agents) and the regions (tasks) are represented as nodes, while the ants are the computational units that search for the optimum coalition of robots. Given N ants and M iterations, the general procedure of the M-ACO method is summarized as follows:
1) An ant $k \in \{1, \dots, N\}$ randomly chooses a pairing of region and robot based on a given probability function $p^k_{ij}(t)$. Here, $p^k_{ij}(t)$ defines the probability of ant k pairing robot i with region j based on the pheromone concentration and heuristic function at time t.
2) Ant k repeats step 1 until all regions are allocated at least 1 robot and one of the following conditions is satisfied: each robot has been allocated to a region, or all regions have been covered with enough robots.
3) Repeat steps 1-2 for all N ants.
4) Update the pheromone concentration based on the chosen pairings from all N ants.
5) Repeat steps 1-4 for M iterations.
In this work, the probability $p^k_{ij}(t)$ of ant k pairing robot i with region j is defined in (2) in terms of $\tau_{ij}$, the pheromone concentration of pair ij, $\eta^k_{ij}$, the heuristic function of pair ij for ant k, and α and β, which denote the importance of the pheromone and the heuristic value, respectively. Here, $\mathrm{allowed}_k$ is the set of robots and regions that can be chosen by ant k. In our previous work [31], $\mathrm{allowed}_k$ is defined as the robots that have not been allocated to any region and the regions that still have an ''uncovered'' area. Here, we modify the definition of the allowed regions in $\mathrm{allowed}_k$ into two conditions:
1) If there are still regions that have not been paired with any robot, then the regions that can be selected are the regions that have not been assigned any robot;
2) If all regions have been allocated at least one robot, then the regions that can be selected are the regions that have not been fully covered.
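The pairing step can be sketched as below. The two-condition rule for the allowed regions follows the text, whereas the probability rule shown is the standard ACO form (pheromone to the power α times heuristic to the power β, normalized over the allowed pairs), which is assumed here rather than taken from (2). The dictionary-based data layout and all function names are likewise hypothetical.

```python
import random

def allowed_regions(assigned, fully_covered):
    """Regions an ant may select next.

    assigned: dict region -> list of robots already paired with it
    fully_covered: dict region -> True once the region is fully covered
    """
    unassigned = [j for j, robots in assigned.items() if not robots]
    if unassigned:
        # Condition 1: some regions still have no robot, so only those are allowed.
        return unassigned
    # Condition 2: every region has a robot, so allow regions not yet fully covered.
    return [j for j in assigned if not fully_covered[j]]

def pairing_probabilities(pairs, tau, eta, alpha=1.0, beta=1.0):
    """Assumed standard ACO rule: weight = tau^alpha * eta^beta, normalized."""
    weights = {pair: (tau[pair] ** alpha) * (eta[pair] ** beta) for pair in pairs}
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}

def choose_pair(probabilities, rng):
    """Roulette-wheel selection of one (robot, region) pair."""
    pairs = list(probabilities)
    return rng.choices(pairs, weights=[probabilities[p] for p in pairs], k=1)[0]

# Hypothetical example: two robots competing for region 0.
tau = {(0, 0): 1.0, (1, 0): 1.0}
eta = {(0, 0): 1.0, (1, 0): 3.0}
probs = pairing_probabilities(list(tau), tau, eta)
pair = choose_pair(probs, random.Random(0))
```

The sketch assumes strictly positive heuristic values, which is the property the modified heuristic function is meant to guarantee.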
The heuristic function $\eta^k_{ij}$ is defined in (3), where $I_j \subset I$ is the set of robots already selected for the region $j \in C$. The heuristic function in (3) considers the following terms: 1) the size of each region, $A(Q_j)$, $j \in C$; 2) the sensing capability of each robot, $A(B_i)$, $i \in I$; 3) the distance between the robot's initial position (the base station) and the centroid of the region, $\|p_i^b - c_j\|$, $i \in I$, $j \in C$; such that the ants tend to select a region that has not been fully covered yet and then pair it with the closest robot. Note that due to the change of definition in $\mathrm{allowed}_k$, which regulates how the ants pair a region with a robot, the heuristic function introduced in [31] may give a negative value when the size of the region is smaller than the sensing capability of the robot assigned to it. This would result in a negative probability (see (2)). Thus, an additional term is added to avoid a negative value in the heuristic function (see (3)).
One iteration of pairing the robots and regions for a single ant (steps 1-2) is defined as one tour. At the end of one tour, each ant k records the change it imposes on the pheromone of each pair ij (pairing of robot i to region j), denoted by $\Delta\tau^k_{ij}$, which is determined by the pheromone strength P and the efficiency factor $\epsilon_k$ of ant k in one tour. The definition of the efficiency factor involves the sum $\sum_{j \in C} \sum_{k \in I_j} d_{kj}$ over the allocated robot-region pairs. Once all ants finish their pairings (step 3), the pheromone of each pair ij is updated according to (4), where ρ is the pheromone evaporation coefficient. Here, the pheromone concentration is iteratively updated by minimizing the following two costs: (i) the difference between the size of the region and the total sensing capability of the allocated robots; (ii) the distance between the paired robots and regions. At the end of M iterations, the chosen solution is the ant tour that yields the maximum efficiency factor $\epsilon_k$. The output of the M-ACO algorithm is the set of indexes $I_j \subseteq I$, $j \in C$, which ensures a distinct selection of robots for each given region, i.e., $I_j \cap I_k = \emptyset$ for $j \neq k$. The information of each region $Q_j$, $j \in C$, the locations of the targeted grids within it (i.e., $h_i$, $\forall i \in M_j$), and the selected robots (i.e., $I_j$) will be used to generate each robot's control input $u_i$, as described in the next section.
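A minimal sketch of the pheromone bookkeeping is given below. It assumes the textbook ACO update, in which every pair first evaporates by a factor (1 − ρ) and each ant then deposits an amount equal to the pheromone strength times its efficiency factor on the pairs of its tour; the exact expressions used in the paper may differ. The function names and the representation of a tour as a list of (robot, region) pairs are hypothetical.

```python
def update_pheromone(tau, tours, efficiencies, rho=0.2, strength=1.0):
    """Assumed standard ACO update: evaporation followed by deposits from all ants."""
    deposits = {pair: 0.0 for pair in tau}
    for tour, efficiency in zip(tours, efficiencies):
        for pair in tour:
            # Assumption: the deposit is proportional to the ant's efficiency factor.
            deposits[pair] += strength * efficiency
    return {pair: (1.0 - rho) * tau[pair] + deposits[pair] for pair in tau}

def best_tour(tours, efficiencies):
    """Select the tour with the maximum efficiency factor."""
    best = max(range(len(tours)), key=lambda k: efficiencies[k])
    return tours[best]

# Hypothetical example: two candidate pairs, two ants.
tau = {(0, 0): 1.0, (1, 0): 1.0}
tours = [[(0, 0)], [(1, 0)]]
efficiencies = [0.5, 0.1]
tau_next = update_pheromone(tau, tours, efficiencies)
chosen = best_tour(tours, efficiencies)
```

Pairs that no ant selects only evaporate, so their pheromone shrinks by the factor (1 − ρ) in every iteration.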

V. PERSISTENT MONITORING ALGORITHM
In this section, we describe our proposed approach for the lower layer, which formulates each robot's control input to ensure that all the assigned robots persistently cover their designated region. For simplicity, in the remainder of the discussion we focus on a single region $Q_j$ and the robots in the set $I_j$, where $j \in C$.

A. MODELING AREA OF IMPORTANCE
In this subsection, we first describe the procedure of defining the degree of importance for all locations within the region $Q_j$. Here, we define a range of values [0, 1], where 1 specifies the location with the highest importance to monitor and 0 the lowest. Thus, the locations near $h_i$, $\forall i \in M_j$ (targeted grids within $Q_j$), need to be represented with values close to 1. To achieve that, we design a density function $\bar{\phi}_j : Q_j \to [0, 1]$, which is formulated using a mixture of Gaussian functions:
$$\bar{\phi}_j(q) = c_f + k \sum_{i \in M_j} \phi^j_i(q), \qquad \phi^j_i(q) = \exp\left(-\tfrac{1}{2}(q - h_i)^T \Sigma^{-1} (q - h_i)\right), \tag{5}$$
where $\Sigma \in \mathbb{R}^{2 \times 2}$ is the covariance matrix, $c_f \ge 0$ is a fixed minimum value of the density function, and $k \in [0, 1]$ is a fixed gain to adjust the maximum density value. In this paper, we consider a covariance matrix of the form $\Sigma = \sigma^2 I_2$ with σ > 0. Note that each selection of $c_f$, k, and σ provides unique characteristics in highlighting the contrast of importance between the targeted grids and the remaining area in $Q_j$, as will be shown later. Additionally, the parameters need to be selected such that the maximum density value over the region $Q_j$ does not exceed 1.
Given the formulation in (5), notice that any targeted grid $i \in M_j$ increases the density values around its grid center $h_i$. Note that this influence diminishes as the distance from $h_i$ increases. With $d^j_i = \|q - h_i\|$, the term $\phi^j_i(q)$ can be written as
$$\phi^j_i(q) = \exp\left(-\frac{(d^j_i)^2}{2\sigma^2}\right).$$
Then, the computed values of $\phi^j_i(q)$ for any point q around $h_i$ whose $d^j_i$ equals σ, 2σ, 3σ, and 4σ are 0.607, 0.135, 0.011, and 0.0003, respectively. This series of values serves as a basic benchmark in guiding the selection of appropriate values for σ as well as k for a given field with grid distance $D_s$.
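The benchmark values quoted above can be reproduced with a few lines of Python. The sketch assumes an isotropic Gaussian influence exp(−d²/(2σ²)) for each targeted grid and a density formed as the floor c_f plus the gain k times the sum of these influences; the function names and the example parameters are illustrative.

```python
import math

def gaussian_influence(q, h, sigma):
    """Influence of one targeted grid centered at h on point q (isotropic Gaussian)."""
    d2 = (q[0] - h[0]) ** 2 + (q[1] - h[1]) ** 2
    return math.exp(-d2 / (2.0 * sigma ** 2))

def density(q, centers, c_f, k, sigma):
    """Assumed mixture form: floor c_f plus gain k times the summed influences."""
    return c_f + k * sum(gaussian_influence(q, h, sigma) for h in centers)

sigma = 1.0
for multiple in (1, 2, 3, 4):
    value = gaussian_influence((multiple * sigma, 0.0), (0.0, 0.0), sigma)
    print(f"d = {multiple} sigma -> {value:.4f}")
```

The loop prints 0.6065, 0.1353, 0.0111, and 0.0003, matching the rounded values 0.607, 0.135, 0.011, and 0.0003 given in the text.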
To illustrate this further, we present two examples for a field with 49 targeted grids (a 7 m × 7 m field with $D_s = 1$ m), as shown in Fig. 3. Both examples share the same value of $c_f = 0.3$ with two different sets of k and σ. Fig. 3a presents a case where σ = 0.25 ($D_s = 4\sigma$), which results in an almost non-existent influence from $\phi^j_i(q)$ on the adjacent grids. Hence, k can be set close to the remaining range of 0.7. On the other hand, Fig. 3b presents a case with σ = 0.5 ($D_s = 2\sigma$) that provides a significant influence from $\phi^j_i(q)$ on the adjacent grids, hence resulting in a smaller range of permissible values for k.
In both examples in Fig. 3, the maximum value of k that ensures $\bar{\phi}_j(q) \le 1$ for any $q \in Q_j$ can be approximated by considering the influence from the directly adjacent targeted grids, i.e., the four neighbors at distance $D_s$ and the four diagonal neighbors at distance $\sqrt{2} D_s$, such as by solving
$$c_f + k\left(1 + 4\exp\left(-\frac{D_s^2}{2\sigma^2}\right) + 4\exp\left(-\frac{D_s^2}{\sigma^2}\right)\right) < 1.$$
This results in k < 0.6991 and k < 0.4335 for the cases in Fig. 3a and Fig. 3b, respectively. Note that with $\sigma \le 0.5 D_s$ the influence from targeted grids farther than $\sqrt{2} D_s$ is significantly smaller, and hence it is sufficient to choose a value slightly smaller than the approximated k, e.g., 0.433 for Fig. 3b. However, with a larger value of σ, the designer needs to assess the degree of influence from each Gaussian component $\phi^j_i(q)$ when deciding the appropriate value of k.
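The two bounds on k can be checked numerically. The sketch below assumes that the density peaks at the center of a targeted grid surrounded by eight targeted neighbors (four at distance D_s and four at distance √2·D_s) and that each neighbor contributes an isotropic Gaussian term; under this assumption it reproduces the reported values. The function name `max_gain` is hypothetical.

```python
import math

def max_gain(c_f, sigma, grid_width):
    """Upper bound on k such that c_f + k * peak stays below 1 at a grid center."""
    # Four direct neighbors at distance D_s.
    direct = math.exp(-grid_width ** 2 / (2.0 * sigma ** 2))
    # Four diagonal neighbors at distance sqrt(2) * D_s.
    diagonal = math.exp(-2.0 * grid_width ** 2 / (2.0 * sigma ** 2))
    peak = 1.0 + 4.0 * direct + 4.0 * diagonal
    return (1.0 - c_f) / peak

bound_3a = max_gain(c_f=0.3, sigma=0.25, grid_width=1.0)  # case of Fig. 3a
bound_3b = max_gain(c_f=0.3, sigma=0.5, grid_width=1.0)   # case of Fig. 3b
print(round(bound_3a, 4), round(bound_3b, 4))
```

The script prints 0.6991 and 0.4335. Dropping the diagonal term would give roughly 0.454 for the second case, which indicates that the diagonal neighbors matter once σ reaches 0.5 D_s.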
Finally, let us focus our discussion on the selection of $c_f$. Consider a case where $M_j$ comprises multiple clusters of targeted grids that are apart from each other, e.g., when the number of available robots is smaller than the number of clusters of targeted grids. An example of this is shown in Fig. 4. Here, the gaps between the groups may result in an area with $\bar{\phi}_j(q) = c_f$. Setting $c_f = 0$ can potentially hinder the robot from crossing from one area to another, as often observed for Voronoi-based coverage controllers [22] (described later). Thus, a positive $c_f$, i.e., $c_f > 0$, assigns a small importance to the regions where $\bar{\phi}_j(q) = c_f$ and promotes exploration, which potentially results in the robot traversing to all targeted grids. Note that a high value of $c_f$ further promotes the exploration, but with the drawback of reducing the relative importance of the targeted grids.

B. VORONOI-BASED COVERAGE CONTROL WITH OBSTACLE AVOIDANCE
Finally, in this subsection, we present the computation of each robot's control input $u_i$ based on the provided density map. First, let us denote the collection of the positions $p_i$ for all $i \in I_j$ as $\mathbf{p}_j$. Then, the Voronoi partition of each region $Q_j$ [7], namely the collection of the sets $\{V_i(\mathbf{p}_j)\}_{i \in I_j}$, is defined as
$$V_i(\mathbf{p}_j) = \{ q \in Q_j : \|q - p_i\| \le \|q - p_k\|, \ \forall k \in I_j \}.$$
Moreover, let us define the feasible sensing area $S_i(\mathbf{p}_j)$ as
$$S_i(\mathbf{p}_j) = V_i(\mathbf{p}_j) \cap B_i.$$
It is shown in [7] that the set $S_i(\mathbf{p}_j)$ depends solely on the positions of the robots that lie within a radius $2R_i$ from $p_i$. Therefore, $S_i(\mathbf{p}_j)$ can be computed in a distributed fashion, e.g., by allowing robots to exchange position information within a radius of $2R_i$.
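On a discretized region, the feasible sensing area of each robot can be approximated by assigning every sample point to its nearest robot and keeping it only if it also lies within that robot's sensing radius. The sketch below illustrates this idea; the sampling of the region into points, the tie-breaking towards the lower robot index, and the function name are assumptions made for this example.

```python
import math

def feasible_sensing_cells(points, robots, radii):
    """Map each robot index to the sample points in its Voronoi cell and sensing disc."""
    cells = {i: [] for i in range(len(robots))}
    for q in points:
        distances = [math.dist(q, p) for p in robots]
        # Voronoi assignment: the closest robot owns the point (ties go to the lower index).
        nearest = min(range(len(robots)), key=distances.__getitem__)
        # Limited sensing range: keep the point only if the owner can actually sense it.
        if distances[nearest] <= radii[nearest]:
            cells[nearest].append(q)
    return cells

# Hypothetical example: two robots on a line of sample points.
samples = [(float(x), 0.0) for x in range(5)]
cells = feasible_sensing_cells(samples, robots=[(0.0, 0.0), (4.0, 0.0)], radii=[1.5, 1.5])
```

In this example the midpoint (2, 0) belongs to a Voronoi cell but lies outside both sensing discs, so it is part of no feasible sensing area.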
In the remaining discussion, let us consider a practical case where the total sensing capability of the assigned robots is much less than the total area to be covered, which requires the robots to periodically visit the broken sensors' locations. To that end, we consider a time-varying density function $\phi_j(q, t)$ governed by the update rule (6), with $\underline{\delta}, \bar{\delta} > 0$ and $\bar{\phi}_j(q)$ referring to the density map designed via (5). The update rule (6) implies that the importance of a point being monitored by a robot decreases with rate $\underline{\delta}$ and increases again with rate $\bar{\delta}$ once it is left unmonitored, thus requiring the robot to revisit that point to maintain persistent monitoring of the targeted grids. Note that the parameters $\underline{\delta}$ and $\bar{\delta}$ are assigned by the designer by considering the characteristics of the sensor attached to the robot. For example, if the sensor requires more time to take measurements, then $\underline{\delta}$ can be set to be small.
The control input $u_i$ for all $i \in I_j$ can then be computed based on the gradient ascent algorithm to maximize an objective function $J(\mathbf{p}_j, t)$. To that end, let us consider a (partially) distributed computation of the gradient of $J(\mathbf{p}_j, t)$, which is expressed using $\mathrm{mass}(S_i(\mathbf{p}_j))$. During deployment, the initial position of each robot may be outside the designated region $Q_j$. Thus, proportional control is introduced for the robot to navigate towards the centroid of $Q_j$ (i.e., $c_j$) until it enters the region $Q_j$. To summarize, the computation of $u^{cov}_i$ for $i \in I_j$ switches between the gradient-based input when the robot is inside $Q_j$ and the proportional control towards $c_j$ with proportional gain γ > 0 otherwise.
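To make the decay-and-recovery behavior concrete, the sketch below updates density samples over one time step. It assumes a specific form that is common in persistent coverage control and is not confirmed by the text: a sensed point decays proportionally to its current value, and an unsensed point recovers towards its designed value at a rate proportional to the remaining gap. The explicit-Euler step, the argument names, and the function name are also assumptions.

```python
def update_density(phi, phi_bar, sensed, decay, recovery, dt):
    """One explicit-Euler step of an assumed decay/recovery density update.

    phi: current density values at the sample points
    phi_bar: designed (maximum) density values at the same points
    sensed: True where the point lies inside some robot's sensing region
    """
    updated = []
    for value, target, is_sensed in zip(phi, phi_bar, sensed):
        if is_sensed:
            rate = -decay * value                # importance drops while monitored
        else:
            rate = recovery * (target - value)   # importance recovers when left alone
        updated.append(value + dt * rate)
    return updated

# Hypothetical example: the first point is being sensed, the second is not.
phi_next = update_density(phi=[1.0, 0.2], phi_bar=[1.0, 1.0],
                          sensed=[True, False], decay=0.5, recovery=0.1, dt=1.0)
```

Under this assumed form, a smaller decay rate keeps a sensed point important for longer, which corresponds to a robot dwelling longer before moving on.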
Additionally, collision avoidance and actuator limits can be explicitly incorporated by solving a quadratic program with control barrier function constraints [6], [33], [34] and additional input-bound constraints. In this formulation, D is the considered collision distance, γ > 0 is the obstacle avoidance gain, and $u^{lb}_i$ and $u^{ub}_i$ are the lower and upper bounds of the actuator range.
Finally, in practice, the computation of the density update in (6) will be performed by a central system for each region j, since each robot can hardly know whether other robots have visited each $q \in Q_j$. On the other hand, provided with the information of $Q_j$, $c_j$, and $\phi_j(q, t)$ in $S_i$ from region j's central system, each robot $i \in I_j$ can compute $u_i$ in a distributed manner.

VI. NUMERICAL SIMULATIONS AND DISCUSSION
In this section, we demonstrate and evaluate the proposed two-layer control framework in simulations. In particular, we are interested in investigating the effects of the number of generated regions on the overall performance of the monitoring mission.

A. SIMULATION SETUP
Throughout the simulation, we consider a field F of size [−0.5, 19.5] m × [−0.5, 19.5] m discretized into 400 grids with grid size $D_s = 1$ m. Here, we consider that 20% of the grids are targeted grids, i.e., m = 80, and we prepare 2 cases of targeted grid distribution, which are 1) a uniform distribution throughout F, denoted by $H_u$, and 2) a clustered distribution towards the 4 corners of the field F, denoted by $H_c$. Then, we consider that the number of robots to be deployed is n = 6 with 2 different sets of sensing capabilities (homogeneous and heterogeneous), as shown in Table 1.
In total, we investigate these 3 scenarios:
A. $H_u$ (uniform distribution) with homogeneous robots
B. $H_c$ (clustered distribution) with homogeneous robots
C. $H_c$ (clustered distribution) with heterogeneous robots
Additionally, we prepare 50 sets of random initial positions for the 6 robots within the field F. Given that the number of generated regions should not exceed the number of robots, i.e., ℓ ≤ n, we simulate each scenario for all $\ell \in \{1, \dots, 6\}$. Thus, we generate and analyze 300 simulation runs (50 sets of initial positions for each of the 6 values of ℓ) for each scenario.
The parameter values for both the task allocation and persistent monitoring algorithms are summarized in Table 2. For the M-ACO algorithm, we utilized values based on [24]. The number of ants corresponds to the number of computational units that try possible region-robot pairings based on the pheromone and heuristic values (see (2)). Here, we utilized 10 ants, where in every iteration each ant paired all regions with at least one robot. At the end of each iteration, the pheromone value is updated according to (4) before the next iteration begins. When the number of iterations reaches 1000, the best region-robot pairing is decided. Note that the initial pheromone value (strength) was set to 1, which means that all region-robot pairs were given the same pheromone strength of 1 before being updated in each iteration. The pheromone of pairs that are not chosen by any ant evaporates with a pheromone evaporation coefficient of 0.2. In this work, we give the same importance to the pheromone value and the heuristic value. Thus, both parameter values (α and β) are set to 1. For the modeling of the area of importance, we use the same parameters as in the example in Fig. 3b, as these parameters generate steady slopes for the gaps between targeted grids (as compared to Fig. 3a).
Remark: For deployment in other scenarios, Table 2 can be used as a reference for the selection of the parameter values. For the M-ACO algorithm, the same set of parameters can be used irrespective of the number of robots (including their sensing range) and regions (clusters). The modeling of the area of importance can use the same $c_f$ and k, while σ can be scaled linearly to a different grid size. Otherwise, Section V-A provides guidelines to compute these parameters. For the persistent monitoring algorithm, $\underline{\delta}$ and $\bar{\delta}$ are dictated by the specification of how long the robot needs to stay in order to gather the information and how often the information needs to be revisited. The parameters D, $u^{lb}_i$, and $u^{ub}_i$ rely on the robot's physical features, while γ can be used as it is or adjusted if a more reactive response is needed by making the gain smaller.

B. EVALUATION OF TASK ALLOCATION STRATEGY
In this subsection, we evaluate each step of the task allocation strategy, namely K-means clustering and the Modified Ant-Colony Optimization (M-ACO) algorithm, using 300 simulation runs for each of the 3 scenarios.

1) DIVIDED REGIONS FROM K-MEANS CLUSTERING
As described in Subsection IV-A, K-means clustering requires as input the desired number of clusters, i.e., $\ell$, and the initial values of the cluster centroids. In our tests, we generate the initial centroid values randomly with a fixed seed. The fixed seed minimizes the variation from K-means clustering and results in a consistent shape of the clustered regions for each $\ell$ value. Examples of the resulting clustered regions are shown in Fig. 5 for both $H_u$ (uniform distribution) and $H_c$ (clustered distribution). Note that K-means clustering does not take into account the robots' initial positions and sensing capabilities, and thus there is no difference in the results for homogeneous or heterogeneous robots. The areas of the regions after K-means clustering are summarized in Table 3.
From Table 3, we observe that as $\ell$ increases, the reduction of the total area of the generated regions is more significant for $H_c$ than for $H_u$. Moreover, as $H_c$ is intentionally generated from 4 clustered distributions, the sum of the region sizes is significantly reduced up to $\ell = 4$, with diminishing improvement from $\ell = 5$ onward. Thus, K-means clustering helps in condensing the area to be monitored, but has the most impact on targeted grids whose distribution is more clustered.
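The role of the fixed seed in this subsection can be illustrated with a minimal Python sketch of Lloyd's K-means iteration. It is a generic implementation, not the authors' code: drawing the initial centroids from the targeted grid points themselves, the function name `kmeans`, and the iteration cap are assumptions made for illustration.

```python
import math
import random

def kmeans(points, n_clusters, seed=0, n_iter=100):
    """Lloyd's K-means with seeded random initial centroids (illustrative)."""
    rng = random.Random(seed)  # fixed seed -> repeatable initial centroids
    # Assumption: initial centroids are drawn from the points themselves.
    centroids = rng.sample(points, n_clusters)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each targeted grid joins its nearest centroid.
        new_labels = [
            min(range(n_clusters), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(n_clusters):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, centroids
```

Calling `kmeans(points, n_clusters, seed=s)` twice with the same `s` returns identical labels and centroids, which is the property that keeps the clustered regions consistent for each $\ell$ across simulation runs.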

2) TASK ALLOCATION PERFORMANCE FROM M-ACO ALGORITHM
Next, we evaluate the task allocation performance of the M-ACO algorithm. Here, we define the metric to evaluate the task allocation as the occupancy ratio between the sum of the capabilities of the allocated robots and the size of region $j$ as
$$\phi_j := \min\left(1, \frac{\sum_{k \in I_j} A(B_k)}{A(Q_j)}\right).$$
The case where $\sum_{k \in I_j} A(B_k)/A(Q_j) > 1$ denotes the condition where the sum of the allocated robots' capabilities exceeds the area to be monitored. Thus, we capped the value at 1 using the minimization function. For each scenario at each value of $\ell$, we compute the mean ± variance of $\phi_j$ over all 50 simulations with different initial positions of the robots. The resulting graphs for the 3 scenarios are shown in Fig. 6. Additionally, we evaluate the distance of each robot to the allocated region's centroid. The mean ± variance of the initial distances, together with the deviation from the true minimum, i.e., the shortest distance of each robot to any region's centroid, are shown in Fig. 7.
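As a minimal sketch of the capped occupancy ratio described above, the Python function below divides the summed sensing capability of the robots allocated to a region by the region's area and caps the result at 1. It assumes that $A(B_k)$ is the area of a disk whose radius is robot $k$'s sensing radius; the function and argument names are illustrative and do not come from the paper.

```python
import math

def occupancy_ratio(sensing_radii, region_area):
    """Capped ratio of allocated sensing area to region area (phi_j)."""
    # Assumption: A(B_k) is the area of a disk with robot k's sensing radius.
    covered = sum(math.pi * r ** 2 for r in sensing_radii)
    # Cap at 1 when the allocated capability exceeds the region's area.
    return min(1.0, covered / region_area)
```

For example, a single robot with a sensing radius of 1 allocated to a region of area $4\pi$ gives a ratio of 0.25, while any allocation whose summed sensing area exceeds the region's area gives exactly 1.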
Note that the regions generated by K-means clustering are consistent for each $\ell$ value in each scenario. Thus, the variations of $\phi_j$ in Fig. 6 result solely from the M-ACO algorithm when allocating the robots to each region. For scenario $H_u$ (uniform distribution) with homogeneous robots in Fig. 6a, the ratios $\phi_j$ are relatively consistent across all regions for each value of $\ell$. For $\ell = 6$ specifically, the M-ACO algorithm reduces to a one-to-one robot-region allocation based on the shortest pairing distance. Due to the homogeneity of the sensing capabilities, the allocation ratio is consistent, i.e., the variance is 0. A similar observation holds for scenario $H_c$ (clustered distribution) with homogeneous robots in Fig. 6b. In comparison to Fig. 6a, the ratio is slightly larger for $\ell \geq 3$ due to the smaller sizes of the regions after K-means clustering in $H_c$ compared to $H_u$. Finally, when the sensing capabilities of the robots are heterogeneous, as shown in Fig. 6c, the resulting means of $\phi_j$ closely follow those in Fig. 6b, with slightly larger variance.
FIGURE 6. Occupancy ratios (mean ± variance) between the allocated robots' sensing capabilities and the region's size, i.e., $\phi_j$, for 50 simulations on each scenario. A larger value of $\phi_j$ (with a maximum of one) means better occupancy of region $j$. The color indexes for regions $j = 1$ to 6 are blue, orange, green, red, purple, and brown.
From Fig. 7, in each scenario, we observe a consistent distribution of the mean and variance of each robot's distance to the allocated region's centroid, i.e., $\|p_i - c_j\|$, $i \in I_j$, $j \in C$, suggesting an optimal distribution in terms of distances. Additionally, we can observe the deviation from the minimum-distance pairing of a robot with any region's centroid. With $\ell = 1$, the only possible pairing is with the single region, so no deviation occurs. The deviation grows as $\ell$ increases. This illustrates the trade-off in distance required to ensure that each region is allocated at least one robot and that the allocation is balanced, as also illustrated by the occupancy ratio $\phi_j$. From these observations, we conclude that the M-ACO algorithm provides an optimal allocation based on the robots' initial positions and sensing capabilities.
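The two distance quantities discussed above can be sketched in a few lines of Python. The function below returns a robot's distance to its allocated centroid and the excess of that distance over the distance to the nearest centroid; the function name and the index-based `allocated` argument are illustrative choices, not part of the paper.

```python
import math

def distance_and_deviation(position, centroids, allocated):
    """Distance to the allocated centroid and its excess over the nearest one."""
    dists = [math.dist(position, c) for c in centroids]
    # The deviation is zero when the robot is allocated to its nearest region.
    return dists[allocated], dists[allocated] - min(dists)
```

With a single region, the allocated centroid is necessarily the nearest one, so the deviation is zero, matching the $\ell = 1$ case.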

C. EVALUATION OF PERSISTENT MONITORING ALGORITHM
Finally, in this subsection, we evaluate the persistent monitoring algorithm using the allocation produced by the task allocation strategy in the previous subsection. Here, we evaluate the monitoring performance over a simulation duration of 120 s, with a sampling time of 20 ms. An example of the persistent monitoring algorithm in action is shown in Fig. 8 for the first 45 s.

1) COVERAGE PERFORMANCE
Here, we define the ratio of the total density value at time $t$ to the modeled density value as
$$\zeta(t) := \frac{\int_{H} \varphi_j(q, t)\, dq}{\int_{H} \varphi_j(q)\, dq}.$$
This value roughly indicates the performance of the coverage control algorithm over time: a smaller coverage ratio $\zeta$ means that the targeted grids overall are covered more often by the robots. For each scenario at each value of $\ell$, we compute the mean ± variance of the coverage ratio $\zeta$ over time for all 50 simulations with different initial positions of the robots. The resulting graphs for the 3 scenarios and for $\ell \in \{1, 3, 6\}$ are shown in Fig. 9.
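On a discretized field, the coverage ratio can be approximated by replacing both integrals with sums over grid cells. The Python sketch below assumes equal-size cells and density maps stored as nested lists; these representation choices and the function name are assumptions for illustration only.

```python
def coverage_ratio(current_density, modeled_density):
    """Approximate zeta(t) as a ratio of summed density values over the grid."""
    # With equal-size cells the cell area cancels, so plain sums suffice.
    num = sum(sum(row) for row in current_density)
    den = sum(sum(row) for row in modeled_density)
    return num / den
```

Evaluating this function at every sampling instant yields a time series of $\zeta(t)$ for one simulation run.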
From Fig. 9, we observe that the coverage ratio drops significantly within the first 20 s and then oscillates within a bounded range; we refer to the middle of this range as the steady value. For scenario $H_u$ (uniform distribution) with homogeneous robots in Fig. 9a, the steady value remains similar across the various $\ell$ values. On the other hand, for scenario $H_c$ (clustered distribution) with homogeneous robots in Fig. 9b, the steady value decreases as $\ell$ increases. For the case of heterogeneous robots in Fig. 9c, the results do not vary much from the homogeneous counterpart in Fig. 9b. Recall from the previous subsection that K-means clustering provides a more significant reduction of the sum of the regions' sizes with increasing $\ell$ for $H_c$ than for $H_u$. Thus, we conclude that the steady value decreases with the sum of the regions' sizes, meaning that the coverage performance improves as the regions become smaller.

2) TOTAL TRAVELING DISTANCES
Last, we investigate the total travel distance of each robot. For each scenario at each value of $\ell$, we compute the mean total travel distance over all 50 simulations with different initial positions of the robots. The resulting graphs for the 3 scenarios are shown in Fig. 10. Here, we again observe that increasing $\ell$ results in a larger decrease in the robots' travel distance for $H_c$ (Fig. 10b) than for $H_u$ (Fig. 10a). Note that despite the similar steady value of the coverage ratio across the various $\ell$ values for $H_u$, we can observe a small reduction of the final travel distances as $\ell$ increases. Finally, by comparing the travel distance for the heterogeneous case (Fig. 10c) with the homogeneous case (Fig. 10b), we can observe a larger variation of the final travel distance in the heterogeneous case. Here, a robot with a larger sensing radius has a larger movement gain (mass), as described in (7) and (8), and thus travels a longer total distance. However, the general trend that increasing $\ell$ decreases the robots' travel distance remains.
FIGURE 10. Total travel distance (mean ± variance) for each robot for 50 simulations on each scenario. The color indexes for robots 1 to 6 are blue, orange, green, red, purple, and brown.
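As a minimal sketch, a robot's total travel distance can be computed from its sampled trajectory by summing the straight-line steps between consecutive positions. The trajectory representation (a list of coordinate tuples, one per sampling instant) and the function name are assumptions for illustration.

```python
import math

def total_travel_distance(trajectory):
    """Sum of straight-line steps between consecutive sampled positions."""
    return sum(math.dist(a, b) for a, b in zip(trajectory, trajectory[1:]))
```

With the 20 ms sampling time and 120 s duration used here, each trajectory would contain 6001 samples, assuming both endpoints are recorded.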

VII. CONCLUSION AND FUTURE WORK
This paper presents a novel two-layer control framework for a robotic sensor network to persistently monitor targeted grids in a given field with improved coverage performance and reduced traveling distance of the robots. The first layer (task allocation strategy) encapsulates the targeted grids into a set of tasks and optimally allocates the robots based on their initial positions and sensing capabilities, while the second layer (persistent monitoring algorithm) generates each robot's control input to ensure persistent monitoring over the designated task area. From our numerical simulations, we conclude that dividing the originally targeted grids into several smaller regions helps improve persistent monitoring in terms of coverage performance and travel distance. Moreover, this benefit has more impact on targeted grids whose distribution is clustered than on those whose distribution is uniform. However, as is evident from the analysis, dividing the grids into too many clusters will increase the travel distance from/to the base station, as the number of robots that can be allocated to a cluster decreases. Hence, there is a trade-off between the system's performance measures, which needs to be considered when choosing the number of clusters.
That said, several limitations remain in the proposed framework. The K-means clustering method requires the number of regions to be specified explicitly. Moreover, the proposed framework neglects the issue of energy limitation and is not fully adaptive to environmental changes in the monitoring regions, as the task allocation strategy is limited to offline computation before the monitoring mission is deployed. Nonetheless, our proposed framework is modular in the sense that each approach can be independently replaced if improvements to the mission performance are needed. To alleviate these limitations, our future work will investigate: a clustering method that optimizes some metric to remove the need to specify the number of clusters; an online task allocation algorithm that allows adaptation to changes in the monitored environment; energy management that accommodates charging scheduling of the robots; and implementation of the proposed control framework in a real-world experiment (e.g., via Crazyflie nano-quadcopters in our networked robotic laboratory) to investigate possible deviations from the current simulation results.

FIGURE 1. The scenario of a field divided into a collection of grids monitored by a robotic sensor network.

FIGURE 2. An overview of the proposed two-layer control framework. The targeted grids are divided into 4 regions using K-means clustering. The Modified Ant-Colony Optimization algorithm assigns 6 robots to monitor the 4 regions. The importance of area in each region is modeled as a density function, where the coverage hole has a high value (yellow). Finally, the Voronoi-based coverage control provides the control input for each robot to maximize the coverage of the region.

FIGURE 3. Examples of density functions for a field with 49 broken sensors (7 × 7 m$^2$ with $D_s = 1$ m). (Left) The 3D plot of the density functions. (Right) A vertical cut of the density function at $y = 0$.

FIGURE 4. An example scenario where the areas of importance are distant from each other. The value of $c_f = 0.3$ highlights the small importance of the remaining regions, hence promoting exploration.

FIGURE 5. Results of K-means clustering for targeted grids generated with a uniform distribution ($H_u$) and a clustered distribution ($H_c$). The colored polygons and circles denote the resulting clustered regions and their centroids, respectively. The color indexes for regions 1 to 6 are blue, orange, green, red, purple, and brown.

FIGURE 7. The upper figure shows the distance (mean ± variance) between each robot's initial position and the allocated region's centroid, i.e., $\|p_i - c_j\|$, $i \in I_j$, $j \in C$. The lower figure shows the deviation (mean ± variance) of $\|p_i - c_j\|$ from the minimum distance of robot $i$ to any region's centroid. The figures are results from 50 simulations on each scenario, where the color indexes for robots $i = 1$ to 6 are blue, orange, green, red, purple, and brown.

FIGURE 8. An example of the robots' trajectories (dashed lines) and the changes in the density map for the first 45 s of the simulation for scenario $H_c$ (clustered distribution) with heterogeneous robots and $\ell = 3$.

FIGURE 9. Coverage ratio (mean ± variance) over the coverage hole $H$, i.e., $\zeta(t)$, for 50 simulations on each scenario. A smaller value of $\zeta$ indicates better coverage performance.

TABLE 1. The sets of homogeneous and heterogeneous sensing capabilities of the robots.

TABLE 2. The simulation parameter values.

TABLE 3. Sizes of partitioned coverage holes (in m$^2$) from K-means clustering.