Reliable Task Planning of Networked Devices as a Multi-Objective Problem Using NSGA-II and Reinforcement Learning

Tracking and detecting multiple objects is difficult for a single radar device, as it may lack capabilities such as anti-interference and anti-stealth. However, if radar devices of diverse capabilities are combined to realize collaborative networked operation, the reliability and performance of a radar system in a complex environment can be significantly improved. This paper formulates networked radar-based multi-objective task planning as a constrained combinatorial optimization problem and abstracts a distributed multi-agent system (MAS) model from a networked radar system. A node-selection algorithm based on a greedy policy is designed to narrow down the solution space for subsequent networked radar task planning, reduce the amount of calculation, and improve the efficiency of the proposed algorithm. Moreover, NSGA-II is modified using self-adaptive operators and reinforcement learning. A dual-population strategy is introduced to allow exchanges of multiple individuals between populations during migration, and the number of individuals exchanged is obtained through reinforcement learning. Five algorithms are compared and analyzed, and statistical analyses are conducted from four perspectives: the average evaluation value of energy consumption in the Pareto front solutions, the average evaluation value of bandwidth in the Pareto front solutions, the time consumption of the algorithm, and the diversity of the population. The results indicate that, among these algorithms, the reinforcement-learning-based RNSGA-II algorithm produces the best outcomes for networked radar task planning.


I. INTRODUCTION
Target tracking and detection by a single radar is challenging [1,2]. If radars with different advantages are networked to work collaboratively, the target tracking and detection capability of the system can be significantly improved, thanks to performance advantages in systems, polarization modes, and frequency of different radars [3][4][5].
Resource allocation is a key factor in a networked radar system. Reasonable resource allocation enables the corresponding radar system to perform at the maximum observation efficiency at the least cost. To achieve the maximum overall performance of networked radar task planning in a complicated environment nowadays, the characteristics of different radars and the detection modes and constraints of each radar in a radar system should be taken into account [6][7][8][9][10].
For networked radar task planning, Chavali et al. proposed hybrid Bayesian network-based methods of task planning and power distribution for cognitive radar networks performing multitarget tracking [11]. Mir et al. proposed a planning method for tasks with variable parameters and conducted radar task planning from two perspectives: task time and the available resources required to execute the task [12]. Gu and Shi explored networking issues of shipborne search radars, proposed a general flow of radar station distribution, and discussed the detection probability and geometrical dilution of precision of networked radars [13]. Zhang and Dong et al. analyzed the major techniques of radar networking and discussed the prospects for further development of radars [14]. For resource management problems of networked radars, Yang built a system resource allocation model with multi-index constraints to address the problem of considering multiple functions simultaneously while performing routine tasks [15]. In addition, he proposed an optimal allocation method based on multi-objective particle swarm optimization with crowding distance (MOPSO-CD) and a resource-scheduling algorithm based on reinforcement learning. Ren analyzed each procedure in a networked radar system and designed the contents and relationships of every submodule [16].
Networked radar task planning primarily faces the following technical problems:
(1) The existing literature on radar task planning is based mostly on theoretical studies and rarely involves practical applications. Networked radar task planning is even less frequently investigated or used, making it difficult for current research to be applied in actual scenarios [17].
(2) Networked radar task planning can be abstracted into multi-objective optimization task allocation. Because the solution space expands exponentially as the numbers of radars and tasks increase, this is an NP-hard problem [18].
(3) The traditional genetic algorithm, which is easy to implement and has a simple form, has been widely applied in radar task planning. However, it is easily trapped in local optima when used to solve highly complex, multi-objective networked radar task-planning problems. NSGA-II employs fast nondominated sorting and an elitist strategy. Both the accuracy and the efficiency of the algorithm are enhanced compared with the traditional genetic algorithm; however, it still has poor search performance on large-scale problems [19].
This paper focuses on networked radar task planning, giving full consideration to the various constraints, the modeling, and the algorithm design involved. An enhanced NSGA-II algorithm using self-adaptive operators and reinforcement learning is proposed. An optimal task plan is produced and then verified by experimental analysis through simulation. The experimental data indicate that the proposed algorithm is effective for networked radar task planning.
The remaining sections of the paper are arranged as follows: Section 2 introduces the design and modeling of networked radar task planning. Section 3 presents the proposed solution and algorithmic approach for networked radar task planning. The experimental design and result analysis are explained in Section 4. Finally, we conclude the paper and discuss potential future directions in Section 5.

A. PROBLEM ANALYSIS
A networked radar system is composed of multiple radar stations. In this system, a radar station can be a receiver or a transmitter, and the number of transmitters is approximately equal to that of receivers. When signals are sent from any transmitting node, all receiving nodes simultaneously receive the signals. It is assumed that a networked radar system has m transmitting nodes and n receiving nodes, as shown in Figure 1.

B. NETWORKED RADAR TASK-PLANNING MODELING
In essence, intelligent networked radar task planning pairs tasks with observation resources under various constraints. The different types of targets proposed by users form a target set, which is preprocessed through plot filtering, tracking filtering, data association, and path prediction. In this way, a task set is generated accordingly. Furthermore, search or tracking is performed by reasonably allocating space-time resources (e.g., wave beam time and beam dwell) and utilizing some nodes in a node resource set. The association between them is shown in Figure 2.

FIGURE 2. A Mathematical Model of the Networked Radar System
During scheduling decisions for the networked radar system, the following constraints need to be considered:
(1) A task is deemed fully executed only when it is executed over its complete time window; otherwise, the task is considered not executed.
(2) A task must be executed consecutively and must not be interrupted during execution.
(3) The start time of each task is unique.
(4) Mutually exclusive tasks do not share detection resources; that is, for any two mutually exclusive tasks i, j ∈ Task, the sets of detection resources they occupy must not intersect (Equation 1).
(5) Execution time constraint: The start time and end time of task execution by a transmitting or receiving node must lie within the range from the earliest start time of the task to its last start time (Equation 2).
(6) Pitch angle constraint: The pitch angle of a task must be a subset of the pitch angles of its transmitting or receiving nodes (Equation 3).
(7) Course angle constraint: The azimuth angle of a task must be a subset of the course angles of its transmitting or receiving nodes (Equation 4).
(8) Transmitting-receiving capability constraint of radars: A transmitting node must be able to transmit signals, and a receiving node must be able to receive signals.

C. OBJECTIVE FUNCTION
(1) Fitness-value-evaluation function for energy consumption: The energy consumed by the networked radar system to fulfill a task should reach its minimum value. The numerator of the evaluation value is a constant, whereas its denominator is equal to the sum of the energy consumption of all nodes. The smaller the denominator, the greater $f_1(A)$ and the lower the total energy consumption incurred by all nodes executing tasks.
(2) Fitness-value-evaluation function for bandwidth: The overall bandwidth consumption of radar nodes should reach its minimum value on the premise of meeting the relevant bandwidth constraints for the respective tasks. Here, $D_i$ refers to the bandwidth utilized by the $i$th radar node, $D_{\max}$ is the maximum bandwidth, and m and n represent the quantities of transmitting and receiving nodes of each task, respectively. When the bandwidth constraint is satisfied, a greater $f_2(A)$ corresponds to lower bandwidth consumption caused by task execution.
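The two evaluation functions can be illustrated with a minimal Python sketch. The exact equations are not reproduced here, so the sketch rests on stated assumptions: the constant numerator is taken to be the number of nodes multiplied by the maximum power and the longest execution time (for energy) or by the maximum bandwidth (for bandwidth). The function names `energy_fitness` and `bandwidth_fitness` and all parameter names are hypothetical.

```python
from typing import Sequence


def energy_fitness(powers: Sequence[float], times: Sequence[float],
                   p_max: float, t_max: float) -> float:
    """Constant numerator over total energy; a larger value means less energy.

    ASSUMPTION: the constant is (number of nodes) * p_max * t_max, and the
    energy of a node is its power multiplied by its execution time.
    """
    if len(powers) != len(times) or len(powers) == 0:
        raise ValueError("powers and times must be non-empty and of equal length")
    total_energy = sum(p * t for p, t in zip(powers, times))
    return len(powers) * p_max * t_max / total_energy


def bandwidth_fitness(bandwidths: Sequence[float], d_max: float) -> float:
    """Constant numerator over total bandwidth; a larger value means less bandwidth.

    ASSUMPTION: the constant is (number of nodes) * d_max.
    """
    if len(bandwidths) == 0:
        raise ValueError("bandwidths must be non-empty")
    return len(bandwidths) * d_max / sum(bandwidths)
```

Under these assumptions, both values grow as the selected m + n nodes consume less, which matches the "greater is better" reading of the two evaluation values.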

A. NETWORKED RADAR TASK PLANNING AS A MULTI-OBJECTIVE OPTIMIZATION PROBLEM
At present, optimization of networked radar generally considers only a single factor, such as location, power, or time. In practice, however, optimization across multiple aspects needs to be conducted. Here, the multi-objective optimization model can be expressed as follows:
$\min F(A) = [f_1(A), f_2(A), \ldots, f_M(A)]$ (8)
In Equation 8, min indicates that each objective in $F(A)$ is to reach its minimum value, and M represents the number of objective functions. An ideal optimal solution would attain the optimum of every objective function simultaneously, which is nearly impossible in practice. Generally, a multi-objective optimization problem produces a Pareto front solution set. This study takes the tasks, resources, constraints, and numbers of receiving and transmitting nodes as input, invokes the improved NSGA-II algorithm, and applies it to the networked radar task-planning model to obtain the final planning scheme. The overall framework design is presented in Figure 3.
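To make the notion of a Pareto front concrete, the following Python sketch extracts the nondominated points from a list of objective vectors. It assumes all objectives are minimized, as in Equation 8; fitness values for which larger is better would need to be negated first. The names `dominates` and `pareto_front` are illustrative, and the quadratic-time scan is a simplification rather than the fast nondominated sorting used by NSGA-II.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Sequence[float]]) -> List[int]:
    """Return the indices of points that no other point dominates."""
    front = []
    for i, p in enumerate(points):
        dominated = any(dominates(q, p) for j, q in enumerate(points) if j != i)
        if not dominated:
            front.append(i)
    return front
```

For example, among the vectors (1, 4), (2, 2), (3, 3), and (4, 1), only (3, 3) is dominated (by (2, 2)), so the other three form the front.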

B. NODE-SELECTION ALGORITHM BASED ON A GREEDY POLICY
To perform multi-objective task planning for networked radar, preliminary radar node selection was carried out to reduce the solution space and lower algorithm complexity. In this paper, several qualified node sets were selected from 200 nodes based on the policies below together with a greedy policy, and the node sets were then used as node input for task planning. The specific node-selection policies are as follows:
Policy 1: Whether the node is occupied, that is, whether the device is already in service according to the attributes of the radar node (Equation 9).
Policy 2: Whether the node has a transmitting or receiving function (Equation 10).
Policy 3: Whether the detection height of the node conforms to the detection height required by the corresponding task.
Policy 4: The most compact principle, that is, preference is given to radar nodes in the vicinity of the task.
The flow of node selection is portrayed in Figure 4.
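The four policies can be sketched in Python as a filter followed by a greedy nearest-first pick. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Node` and `Task` classes, their fields, the reading of Policy 3 as "node height limit at least the required height", the planar Euclidean distance, and the function `select_nodes` are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Node:
    node_id: int
    occupied: bool                    # Policy 1 attribute (assumed field)
    can_transmit: bool                # Policy 2 attributes (assumed fields)
    can_receive: bool
    max_height: float                 # Policy 3 attribute (assumed field)
    position: Tuple[float, float]     # used by Policy 4 (assumed planar coordinates)


@dataclass
class Task:
    required_height: float
    position: Tuple[float, float]


def select_nodes(nodes: List[Node], task: Task, count: int,
                 transmit: bool) -> List[Node]:
    """Filter by Policies 1-3, then greedily keep the `count` nearest nodes."""
    candidates = [
        n for n in nodes
        if not n.occupied                                        # Policy 1
        and (n.can_transmit if transmit else n.can_receive)      # Policy 2
        and n.max_height >= task.required_height                 # Policy 3
    ]
    # Policy 4: prefer nodes closest to the task.
    candidates.sort(key=lambda n: math.dist(n.position, task.position))
    return candidates[:count]
```

Calling `select_nodes` once with `transmit=True` and once with `transmit=False` would yield the transmitting and receiving candidate sets for a task.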

1) CHROMOSOME CODING DESIGN
For multi-objective task planning for networked radar, 2-D integer chromosome coding is selected. Each gene locus indicates that a certain radar node transmits or receives for a task, as presented in Figure 5. The i th task has $m_i$ transmitting nodes and $n_i$ receiving nodes. A chromosome is divided into two parts: the first half is the set of transmitting nodes of the tasks (its length equals the sum of $m_i$ over all tasks), and the second half is the set of receiving nodes (its length equals the sum of $n_i$ over all tasks).
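As an illustration of this coding, the Python sketch below builds a random chromosome whose first part holds the transmitting nodes of every task and whose second part holds the receiving nodes, and shows how the genes of one task can be read back. The flat list layout, the rule that a node is not repeated within a single task, and the names `random_chromosome`, `transmitters_of`, and `receivers_of` are assumptions made for the example.

```python
import random
from typing import List, Sequence


def random_chromosome(tx_counts: Sequence[int], rx_counts: Sequence[int],
                      tx_pool: List[int], rx_pool: List[int],
                      rng: random.Random) -> List[int]:
    """Transmitter genes for all tasks, followed by receiver genes for all tasks."""
    tx_part: List[int] = []
    for m_i in tx_counts:
        tx_part.extend(rng.sample(tx_pool, m_i))   # distinct nodes within a task
    rx_part: List[int] = []
    for n_i in rx_counts:
        rx_part.extend(rng.sample(rx_pool, n_i))
    return tx_part + rx_part


def transmitters_of(chromosome: List[int], tx_counts: Sequence[int],
                    task: int) -> List[int]:
    """Slice of the first part that belongs to the given task index."""
    start = sum(tx_counts[:task])
    return chromosome[start:start + tx_counts[task]]


def receivers_of(chromosome: List[int], tx_counts: Sequence[int],
                 rx_counts: Sequence[int], task: int) -> List[int]:
    """Slice of the second part that belongs to the given task index."""
    start = sum(tx_counts) + sum(rx_counts[:task])
    return chromosome[start:start + rx_counts[task]]
```

With two tasks that each use 2 transmitters and 3 receivers, the chromosome has 4 + 6 = 10 genes.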

2) CHROMOSOME EVALUATION
(1) Fitness-Value-Evaluation Function for Energy Consumption: This function involves the power of the i th node, the time taken by the i th node to execute a task, the quantities m and n of the transmitting and receiving nodes of each task, the maximum power, and the longest time required by task execution. The greater $f_1(A)$ is, the lower the total energy consumed by all nodes to execute tasks.
(2) Fitness-Value-Evaluation Function for Bandwidth: Here, $D_i$ is the bandwidth used by the i th node, $D_{\max}$ represents the maximum bandwidth, and m and n are the quantities of the transmitting and receiving nodes of each task. On the premise of meeting bandwidth constraints, a higher value of $f_2(A)$ corresponds to lower bandwidth consumed by task execution.

1) NSGA-II ALGORITHM DESIGN BASED ON SELF-ADAPTIVE OPERATOR
In this paper, a Pareto-based enhanced NSGA-II algorithm with dynamic crossover and mutation operators based on the crowding degree is proposed and called ANSGA-II. The smaller the crowding distance of an individual, the higher its crowding degree. Such individuals are more likely to mutate but less likely to be selected for crossover to generate the next generation. Through this method, the diversity of Pareto sets and populations can be improved, preventing the algorithm from being trapped in a local optimum or converging prematurely. The mutation probability and the crossover probability of the i th individual in a Pareto set are each computed from the default mutation or crossover probability of the algorithm, the crowding distance of the i th individual, the maximum crowding distance in the current Pareto set, and the average crowding distance in the current Pareto set.
A constant is also included to ensure that each mutation or crossover probability lies strictly between 0 and 1.
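The qualitative behaviour described above can be sketched in Python as follows. The rule used here is an illustrative assumption and not the paper's equations: each probability is the default value scaled by a term proportional to the difference between the average crowding distance and the individual's crowding distance, normalized by the maximum distance, with a hypothetical constant `k` and clipping to [0, 1]. Crowding distances are assumed to be finite and non-negative.

```python
from typing import List, Sequence, Tuple


def adaptive_probabilities(distances: Sequence[float], p_mut0: float,
                           p_cross0: float, k: float = 0.5
                           ) -> List[Tuple[float, float]]:
    """Return (mutation, crossover) probabilities for each individual.

    ASSUMED rule: crowded individuals (small distance) get a higher mutation
    probability and a lower crossover probability than the defaults.
    """
    d_max = max(distances)
    d_avg = sum(distances) / len(distances)
    if d_max <= 0:                      # all individuals coincide; keep defaults
        return [(p_mut0, p_cross0)] * len(distances)
    result = []
    for d in distances:
        shift = k * (d_avg - d) / d_max          # positive for crowded individuals
        p_mut = min(1.0, max(0.0, p_mut0 * (1 + shift)))
        p_cross = min(1.0, max(0.0, p_cross0 * (1 - shift)))
        result.append((p_mut, p_cross))
    return result
```

With distances 0.1, 0.5, and 1.0, the most crowded individual (0.1) receives the largest mutation probability and the smallest crossover probability, in line with the behaviour described for ANSGA-II.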

(1) Mutation Strategy
Mutation is performed to enhance the diversity of chromosomes. In this paper, the mutation probability was designed to be self-adaptive. In the same Pareto set, the higher the crowding degree, the poorer the diversity of the chromosomes, and the more likely a mutation is to occur. Here, a multipoint mutation strategy is adopted, as shown in Figure 6.
(2) Crossover Strategy
The purpose of crossover is to preserve genes that perform well in chromosomes. In this way, new individuals with good attributes can be generated effectively. In this paper, the crossover probability of chromosomes was set to be self-adaptive. In the same Pareto set, chromosomes with a low crowding degree are deemed comparatively good individuals; they are more likely to be selected for crossover, and the probability that they generate a superior next generation is also greater, as presented in Figure 7.
Two populations were set for the algorithm and initialized randomly. Crossover, mutation, and selection of the two populations were conducted independently. Considering that the NSGA-II algorithm may converge prematurely in the early stages, migration policies were designed for the first 110 generations (out of 200 generations in total) so that m individuals can be exchanged between the populations to maintain the internal diversity of both populations. The number of individuals exchanged has a direct influence on population diversity, so the migration parameter m was trained by reinforcement learning in this study to ensure that the algorithm maintains population diversity. The specific flow is shown in Figure 8.
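The exchange of m individuals between the two populations can be sketched in Python as below. How the migrating individuals are chosen is not specified here, so uniform random choice is an assumption, as is the function name `migrate`.

```python
import random
from typing import List, Tuple, TypeVar

T = TypeVar("T")  # an individual, in whatever representation is used


def migrate(pop_a: List[T], pop_b: List[T], m: int,
            rng: random.Random) -> Tuple[List[T], List[T]]:
    """Swap m individuals between two populations and return the new lists.

    ASSUMPTION: the m migrants of each population are picked uniformly at random.
    """
    if m < 0 or m > min(len(pop_a), len(pop_b)):
        raise ValueError("m must be between 0 and the smaller population size")
    idx_a = rng.sample(range(len(pop_a)), m)
    idx_b = rng.sample(range(len(pop_b)), m)
    new_a, new_b = list(pop_a), list(pop_b)
    for i, j in zip(idx_a, idx_b):
        new_a[i], new_b[j] = pop_b[j], pop_a[i]   # pairwise swap keeps sizes fixed
    return new_a, new_b
```

Because individuals are swapped pairwise, both population sizes stay constant and no individual is lost or duplicated by the exchange.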

3) REINFORCEMENT LEARNING ALGORITHM DESIGN
(1) State-space sets are listed in Table I: $D_{ij}$ is the diversity value of the j th generation in the i th population. A diversity value reveals the distribution uniformity of solutions in a population. The greater $D_{ij}$ is, the more uniformly the solutions are distributed. In this case, the algorithm is less likely to be trapped in a local optimum or to converge prematurely.
(2) Action Space Design
$m(t) = m(t-1) + a \cdot N$ (17)
Here, $m(t)$ is the value of the migration parameter for the generation-t population, $a$ is taken from the action space, and N represents the number of populations.
Three actions were designed here: keeping the value of the migration parameter unchanged, increasing the value of the migration parameter, and reducing the value of the migration parameter.
Once the population diversity value of generation t is above that of the initial generation, a reward of 0.5 is granted. However, if the diversity value of generation t is below that of the initial generation or remains unchanged, the reward values are -1 and 0, respectively.
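The action update and the reward rule can be sketched in Python as follows. The encoding of the three actions as -1, 0, and +1, the clamping of the migration parameter to a fixed range (with default bounds of 5 and 25 chosen for illustration), and the function names are assumptions; the reward values 0.5, -1, and 0 are those stated above.

```python
ACTIONS = (-1, 0, 1)  # ASSUMED encoding: reduce, keep unchanged, increase


def next_migration(m_prev: int, action: int, n_populations: int = 2,
                   m_min: int = 5, m_max: int = 25) -> int:
    """Apply m(t) = m(t-1) + action * N and clamp to [m_min, m_max] (assumed)."""
    if action not in ACTIONS:
        raise ValueError("action must be -1, 0, or 1")
    return max(m_min, min(m_max, m_prev + action * n_populations))


def reward(diversity_t: float, diversity_initial: float) -> float:
    """Reward 0.5 if diversity rose above the initial value, -1 if it fell, else 0."""
    if diversity_t > diversity_initial:
        return 0.5
    if diversity_t < diversity_initial:
        return -1.0
    return 0.0
```

For instance, with two populations, increasing from m = 16 gives m = 18, while increasing from m = 25 leaves it at the upper bound.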

4) RNSGA-II ALGORITHM BASED ON REINFORCEMENT LEARNING
The migration parameter m was trained by calculating population diversity values in each generation while the algorithm ran; the obtained diversity values were then compared with those of the previous generation for reward or punishment. From the results of repeated runs of the algorithm, the value of the migration parameter m corresponding to the highest reward value was selected. A corresponding abstract mathematical model was built, in which population diversity is defined in Equation (22). According to this definition, the lower V is, the higher D will be. In that case, the solutions in a population are distributed more uniformly, which helps the evolutionary computation avoid prematurity and local optima.
The flow chart of the reinforcement-learning-based NSGA-II (RNSGA-II) is shown in Figure 9.

FIGURE 9. A Flow Chart of RNSGA-II
In summary, the procedure for applying the algorithm can be described as follows:
Step 1: Node and task objects are randomly created according to the characteristics of radar nodes and detection tasks, and the numbers of transmitting and receiving nodes, m and n, are input.
Step 2: Parameter setting: the crossover probability, the mutation probability, the number of iterations G, and the population size N are set.
Step 3: The two populations are initialized through 2-D coding.
Step 4: Crossover and mutation are performed independently for the two populations to generate two progeny populations.
Step 5: In both populations, the parental population is combined with its progeny population, generating a new population, which is further subjected to nondominated sorting and crowding degree calculations; the next-generation population is then produced by a selection strategy.
Step 6: If the exchange of individuals should be performed for the current generation, proceed to Step 7; otherwise, skip to Step 8.
Step 7: m individuals in the respective populations are exchanged.
Step 8: Check whether the current generation number has reached the maximum value G; if so, terminate the iteration; otherwise, go back to Step 4.
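The control flow of these steps can be sketched in Python as a small driver loop. This is a skeleton under stated assumptions: `evolve` stands for Steps 4 and 5 applied to one population, `exchange` stands for Step 7, both are hypothetical callables supplied by the caller, and migration is assumed to take place in every generation up to `migration_generations`.

```python
from typing import Callable, Tuple, TypeVar

P = TypeVar("P")  # a population, in whatever representation the operators use


def run_dual_population(pop_a: P, pop_b: P,
                        evolve: Callable[[P], P],
                        exchange: Callable[[P, P], Tuple[P, P]],
                        generations: int = 200,
                        migration_generations: int = 110) -> Tuple[P, P]:
    """Evolve two populations independently, exchanging individuals early on."""
    for generation in range(1, generations + 1):
        pop_a = evolve(pop_a)        # Steps 4-5 for the first population
        pop_b = evolve(pop_b)        # Steps 4-5 for the second population
        if generation <= migration_generations:       # Step 6
            pop_a, pop_b = exchange(pop_a, pop_b)      # Step 7
        # Step 8: the for-loop bound acts as the termination check.
    return pop_a, pop_b
```

The default values of 200 generations and 110 migration generations mirror the settings mentioned for the dual-population strategy; everything else in the skeleton is illustrative.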

IV. EXPERIMENTAL VALIDATION AND ANALYSIS
To validate the accuracy and efficiency of the improved NSGA-II algorithm and the proposed multi-objective planning model for networked radar, comparative analyses were made on the following algorithms: ANSGA-II, RNSGA-II, NSGA-II, multi-objective particle swarm optimization (MOPSO), and multi-objective artificial algae algorithm (MOAAA).

A. EXPERIMENTAL DATA DESIGN
Six sets of data were constructed according to differences in the numbers of tasks and of transmitting and receiving nodes to test the applicability and validity of the algorithms for networked radar task planning as a multi-objective problem. Tasks and radar nodes were generated based on the longitude and latitude of a real location. The number of tasks was 5, 10, or 20. The number of transmitting nodes was 2 or 3 for each task, and the number of receiving nodes was 3 or 5 for each task. Before node selection, there were 200 radar nodes, as presented in Table II:

B. ALGORITHM PARAMETER SETTINGS
Parameter settings for the optimization algorithm have a direct influence on networked radar multi-objective task-planning results. Through analysis and comparison of task-planning results based on different parameter values, different parameters were selected for the optimization algorithm, as shown in Tables III-VII.

C. ALGORITHM PERFORMANCE TESTING AND ANALYSIS
In this study, the self-adaptive operator-based NSGA-II algorithm is named ANSGA-II, and the reinforcement-learning-based NSGA-II algorithm is named RNSGA-II. The results of the six groups are comparatively analyzed. The migration parameter m was first trained 1,000 times through reinforcement learning; that is, the task-planning algorithm was invoked 1,000 times. Here, the range of m was set to 5 ≤ m ≤ 25. After the training results were averaged, they were sorted by average reward value. The first nine values are listed in Table VIII. Based on the experimental result, the migration parameter was set to 16 (i.e., m = 16) for the dual-population strategy. The average reward values corresponding to the various migration parameter values are plotted as a line chart in Figure 10.

FIGURE 10. A Line Chart of Average Reward Values for Different Migration Parameter Values
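The averaging and sorting of training results by migration parameter value can be sketched in Python as follows. The function name `rank_migration_values` and the representation of each training run as a `(m, reward)` pair are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def rank_migration_values(runs: Iterable[Tuple[int, float]],
                          top: int = 9) -> List[Tuple[int, float]]:
    """Average the reward per migration value and return the best `top` entries."""
    totals: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for m, r in runs:
        totals[m] += r
        counts[m] += 1
    averages = [(m, totals[m] / counts[m]) for m in totals]
    averages.sort(key=lambda item: item[1], reverse=True)   # highest reward first
    return averages[:top]
```

The first entry of the returned list is the migration value with the highest average reward, which is the selection criterion described for choosing m.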
By substituting each group of data in Table II into the algorithm model, the maximum, minimum, and mean values of the energy-consumption and bandwidth-evaluation values in the Pareto front of the last generation were obtained and recorded for each group, together with the average diversity of populations and the time consumed by planning with each algorithm. As can be seen in Figure 11, task planning based on RNSGA-II has the highest time consumption but the best problem-solving effect, and its objective-function fitness values and diversity are the highest. According to Figure 12, MOPSO consumes the least time, produces a mediocre solving effect, and has comparatively low diversity and fitness values. Figure 13 demonstrates that ANSGA-II performs best on the bandwidth-evaluation values in this group, slightly ahead of RNSGA-II. In Figure 14, the problem-solving abilities of both RNSGA-II and ANSGA-II are better than those of NSGA-II and MOAAA, followed by those of MOPSO. As shown in Figure 15, the fitness values of NSGA-II and MOAAA both decline to a certain extent, as does their diversity; RNSGA-II still holds a superior position as far as diversity is concerned. Figure 16 indicates that the data in this group have the maximum computational complexity; the searching performance of MOPSO is the worst, while those of NSGA-II and MOAAA are comparatively superior. The best diversity is found in RNSGA-II and ANSGA-II. In summary, the RNSGA-II and ANSGA-II algorithms proposed in this paper produce good searching results even with large numbers of tasks and of transmitting and receiving nodes.
For a more intuitive comparison, the algorithms were compared across Groups 1-6 in terms of the energy-consumption fitness values, the bandwidth fitness values, the average diversity value, and the time consumption in the Pareto front, as shown in Figures 17-20.

FIGURE 20. A Comparison of Average Energy-Consumption-Evaluation Values for Different Algorithms
Among all five algorithms, the time consumed increases significantly as computational complexity rises. Although the average diversity value of RNSGA-II shows a minor decline, it remains largely stable. The diversity of the other four algorithms is inferior to that of RNSGA-II. Furthermore, ANSGA-II performs best among the remaining algorithms, that is, all except RNSGA-II. MOPSO consumes the least time but produces the worst problem-solving result. In the time-consumption comparison chart, the time consumption for Group 3 is higher than that for Group 4 because the complexity of the data in Group 3 (number of tasks * number of transmitting nodes * number of receiving nodes = 75) is higher than that in Group 4 (number of tasks * number of transmitting nodes * number of receiving nodes = 60). Comparing the data of Group 1 (number of tasks * number of transmitting nodes * number of receiving nodes = 30) with those of Group 6 (number of tasks * number of transmitting nodes * number of receiving nodes = 300), the diversity of NSGA-II, ANSGA-II, RNSGA-II, MOAAA, and MOPSO decreases by 36%, 18%, 10%, 23%, and 27%, respectively. Thus, as the solution space increases, RNSGA-II is shown to perform best in maintaining diversity, followed by ANSGA-II. This indicates that, in contrast to the other algorithms, RNSGA-II has the potential to produce good results as computational complexity increases.
The first group of data is used to run ANSGA-II, and fitness values from different generations are presented in Figure 21. The red and blue dots represent all individuals of Populations 1 and 2, respectively. For the initial generation, the fitness values of energy-consumption evaluation range from 9 to 16, averaging 11.59, and those of bandwidth evaluation lie between 20.1 and 40.9, averaging 27.22.
For the last generation, the fitness values of energy-consumption evaluation range between 9.9 and 27, averaging 17.56, an improvement of 51.3% over the initial generation. The fitness values of bandwidth evaluation are distributed from 25 to 60, with an average of 44.56, an increase of 63.6% over the initial generation.
The experimental results demonstrate that ANSGA-II is feasible for networked radar task planning as a multi-objective problem.
Regarding the convergence of the various algorithms, comparisons are made in Figures 23 and 24. According to these figures, RNSGA-II has the best convergence. The constant fluctuations in the figures are due to the dual-population strategy: during migration, m individuals are continually exchanged between the two populations. In this paper, one of the two populations is selected as the object of comparison. The convergence of ANSGA-II ranks second, followed by those of NSGA-II and MOAAA, with MOPSO showing the worst convergence.

V. CONCLUSION
The paper focused on networked radar task planning as a multi-objective problem, presented the basic constraints that should be taken into account during planning, and summarized the technology foundations for radar task planning. Then, a multi-objective task-planning model was constructed for the networked radar system. The validity of the proposed model and algorithms was further verified through experiments. Moreover, the improved algorithm was applied in practice, and the following conclusions were drawn through analysis.
(1) RNSGA-II and ANSGA-II algorithms proposed in this paper generate stable multi-objective task-planning results for the networked radar system. With different numbers of tasks and transmitting and receiving nodes, both improved algorithms are feasible and effective in solving the multi-objective problem of networked radar task planning as computation complexity increases.
(2) When the numbers of tasks and of transmitting and receiving nodes are increased, the mean energy-consumption-evaluation value and the mean bandwidth-evaluation value in the Pareto front of the improved algorithms are higher than those obtained by NSGA-II, MOAAA, or MOPSO. Moreover, the improved algorithms also outperform NSGA-II, MOAAA, and MOPSO in searching performance and population diversity.
(3) The time consumed by RNSGA-II and ANSGA-II is higher than the time consumed by NSGA-II, MOAAA, or MOPSO. However, the searching performance and population diversity of the RNSGA-II and ANSGA-II algorithms are significantly enhanced. Specifically, the time consumption of ANSGA-II is similar to that of NSGA-II. RNSGA-II has the highest time consumption but the best searching performance, because it incorporates a dual-population strategy and reinforcement learning.
In light of this paper, further research can be done on the following aspects. First, the multi-objective task-planning model should be refined for the networked radar system. For instance, it is of great significance to propose a more universal task-planning model that considers the differences in constraints and target objects of heterogeneous radar nodes. Second, the multi-objective optimization algorithm needs to be further modified. In this study, only two objectives were taken into account. In future application scenarios, more objectives may be introduced; in that case, consideration should be given to more effective multi-objective optimization algorithms and to more complicated constraints. Finally, the improved algorithms are time-consuming; because chromosomes can be processed in parallel, parallel computing may be adopted in subsequent improvements to reduce the time consumption of the algorithm.
DONGCHENG LI received the BS degree in computer science from University of Illinois at Springfield and the MS degree in software engineering from the University of Texas at Dallas. He is currently working toward the PhD degree at the University of Texas at Dallas. His research focus is on search-based software testing and intelligent optimization algorithms.
QIANG HOU obtained a bachelor's degree in computer science and technology from China University of Geosciences (Wuhan), and also received a master's degree in computer technology at China University of Geosciences (Wuhan). His main research direction is intelligent dispatching and information engineering.
MAN ZHAO is an associate professor at the School of Computer, China University of Geosciences (Wuhan). Her research field is artificial intelligence and engineering intelligence optimization algorithms, including research on complex resource scheduling problems such as satellite mission planning, ocean observation mission planning, UAV coordinated mission planning, and space debris observation mission planning.
ZHIMING WU received the bachelor's degree in computer science and technology from China University of Geosciences (Wuhan) and is currently studying for a master's degree in computer technology at China University of Geosciences (Wuhan). His main research direction is intelligent dispatching and information engineering.