Risk-Aware Service Routes Planning for System Protection Communication Networks of Software-Defined Networking in Energy Internet

Energy Internet (EI) is regarded as the advanced stage of the smart grid (SG); it aims to facilitate the utilization and sharing of energy through a highly flexible and efficient grid by leveraging renewable energy and Internet technology. Ensuring coordinated operation of the subsystems in EI requires a high-performance system protection communication network (SPCN). As a promising networking technology, software-defined networking (SDN) is applied to the construction of the SPCN communication architecture because of the global view and programmability of the SDN controller. At the same time, given the significance of control services to EI and the need for continuous and reliable transmission, dual routes must be preplanned so that the SDN controller can freely switch control services from the primary routes to the alternate routes in case of failures. However, existing routing strategies face a conflict between the concentration of network risk and the service latency requirement. Moreover, few studies consider both network operation risk and the latency requirement during route planning. Hence, the problem of simultaneously minimizing the effect of network risk and the potential delay of routes is formulated as a multi-objective optimization problem, which is NP-hard. We propose a novel risk-aware routes planning mechanism (RSRM) based on an evolutionary algorithm. Extensive emulation results demonstrate that the proposed approach performs better across different network topologies. In particular, the balancing risk of the network under RSRM is the lowest among the compared schemes, while near-minimal service latency is guaranteed.

carried on the SDH-based private lines which are characterized by low latency and high controllability through the interface of 2 Mbit/s [4]. However, with such a delivery mode it is difficult to share real-time information among different devices because there is no unified network, and a lack of coordination among different subsystems easily follows. Additionally, aging equipment, low bandwidth utilization and inconsistent processing latency across different infrastructures further limit its application in the communication network of the EI. Thus, a high-performance communication network with a global coordination ability needs to be constructed.
EI integrates multiple subsystems over a vast area, and the point-to-point distance may be up to hundreds of kilometers. Taking the AC/DC coordinated control as an example, the whole operation is limited to 300 ms, which includes the failure detection time, the communication time, the notification and confirmation time, the decision time, and the action time of the power facilities. Moreover, the latency in a single link of the SPCN should not exceed 8 ms [2], [5], and thus the end-to-end latency of control services is a key concern in the construction of the SPCN [2]. How to design a reasonable communication architecture that improves routing strategy performance, however, remains an open problem.
The logically centralized control and the flow-based forwarding rules in the SDN paradigm greatly facilitate routing computation and network management [6]. In SDN-based heterogeneous networks, the minimal round trip time (RTT) is about 30 ms with a PLC modem [7], which motivates us to integrate the SDN paradigm into the communication construction of the SPCN. Besides, there is an ongoing trend of moving control services from private lines to SDN-based IP networks in smart grid communication networks [6], [7].
However, the design of a reliable, fast and robust route planning scheme for control services in the SDN-based SPCN is still a challenge. On the one hand, given the significance of control services to the stable operation of EI along with the N-1 security criterion [8], dual routes must be preplanned so that services can be freely switched from the primary paths to the alternate paths in case of failures. On the other hand, services in the SPCN are aggregated: they usually originate at the executive stations and are then transferred to the master stations or the control center. This inevitably overloads some critical links and devices and increases the operation risk accordingly.
Therefore, we focus on how to design dual routes for services that take into account both the balancing risk of the network and service QoS in the SPCN. References [9]-[12] proposed various multi-path routing algorithms based on energy efficiency, resource allocation or variations of the network topology, without considering the impact of services. To the best of our knowledge, few studies so far address route planning that integrates service distribution with the service latency requirement. We therefore propose a novel risk-aware routes planning mechanism (RSRM) to ensure the deliverability and timeliness of critical services in EI. To summarize, the key contributions of our work are as follows.
1) Considering the nature of risk propagation, we combine the factors of electrical characteristics of the physical network with communication networking connectivity to design the node risk model and the edge risk model, respectively.
2) We formulate the risk-aware routes planning as a multi-objective optimization problem with the purpose of simultaneously minimizing the balancing risk of the network and the total end-to-end delay for all control services.
3) We propose a novel routing mechanism to solve the optimization problem in combination with the system operation. The effectiveness of the mechanism is validated in a field network with an intelligent algorithm. Simulation results demonstrate that it makes a better tradeoff between the balancing risk of the network and service QoS than other approaches.
The remainder of this paper is organized as follows. Section II reviews the related work. Section III describes the communication architecture of EI based on SDN and formulates the system model. Section IV provides explicit modules of RSRM. Section V presents the details of NSGAII. Simulation results and comparative analysis are presented in Section VI and this paper is concluded in Section VII.

II. RELATED WORK

A. ROUTING ALGORITHMS IN OPTICAL NETWORKS
To guarantee service reliability in traditional power communication networks, the self-healing ring based on link switching and multiplex section switching in SDH is a popular approach for rapid restoration. Various ring-based protection approaches are studied in [13], [14]. However, latency discrepancies in bidirectional links restrict their application in sparse networks. Moreover, the ring structure is vulnerable to multiple link failures, which increases the service interruption probability and decreases service dependability.
At present, researchers concentrate on improved algorithms based on Dijkstra, Floyd, and Bellman-Ford, or on other intelligent algorithms, to obtain more reliable service routing. Considering the impact of the physical network on the cyber network, Dong et al. presented a service routing construction approach based on the genetic algorithm for services in the cyber-physical power system (CPPS) [15]. Cai et al. proposed a route planning approach for the overall services that considers the average network risk and the risk balancing of the network but neglects service QoS requirements [16]; for control services, however, these requirements are critical to the stable operation of the smart grid. Zeng et al. analyzed communication service importance, the average risk of the network and the balanced risk of the network, and then calculated the top k shortest paths for each individual service; afterward, the path with the most balanced risk in the network was chosen as the primary route according to the Min-Max approach [17]. However, the authors did not provide a definite way to determine the upper bound of k. Hence, it is hard to ensure that the selected path is the global optimum.
Additionally, considering different modulation methods, spectrum fragmentation awareness, and resource utilization, [18]-[20] proposed various routing algorithms for elastic optical networks, without considering the balancing risk of the network, however. In fact, the network balancing risk is related not only to the distribution of carried services but also to the underlying network structure. The latter is the critical factor that determines how different services are delivered between the source and destination pairs. In our early work [3], we proposed a risk balancing route planning scheme for critical services from the perspective of service, without considering the impacts of network connectivity and network topologies. Furthermore, due to the potentially dynamic risk propagation, it is highly important to combine the factors of network structure with service QoS when calculating the network risk.

B. ROUTING ALGORITHMS IN SDN
At present, there has been an upsurge in the study of routing algorithms for the SDN paradigm. Aiming at energy-aware routing in the scenario of progressive migration from legacy to SDN hardware, the authors of [10] designed the SENAtoR algorithm to reduce the energy consumption of ISP networks. Frequent interactions between the controller and SDN switches usually cause intolerable failure recovery time. To address this issue, [21] presented a fast congestion-aware rerouting approach for the single link failure scenario. All disrupted flows were aggregated onto the same local reroute path, which reduced the number of reconfigured forwarding rules during the restoration process. The authors formulated the problem as an ILP model and designed a heuristic algorithm for route recovery and link congestion avoidance.
Taking the response time of routing requests, queue stability, and resource constraints into consideration, Hu et al. presented a tree-like hierarchical routing algorithm that employed the divide-and-conquer strategy to enhance routing performance for the distributed network [22]. To avoid silent failures, an SDN-based routing method analogous to binary search was proposed in [12], where suspicious nodes and links are gradually pruned until an alternative route is found within the allowable delay. Reference [23] compared routing algorithms with static and dynamic link costs in the context of software-defined networking. To avoid packet loss and long delay, [24] proposed a loop audit algorithm to solve the loop problem in SDN. To mitigate link congestion and inefficient bandwidth allocation caused by manual configuration, [25] presented an intelligence-driven experiential network architecture for automatic routing in software-defined networking.
The above work concentrates on energy efficiency, resource allocation, characteristics of networking topology, etc. without considering service features in the SDN paradigm, however.

C. RISK RESEARCH IN SGs
In view of the dynamic and probabilistic nature of natural disasters, to improve network resilience and reduce the number of services disrupted by natural disasters in communication networks, [26] proposed a risk-adaptive preventive scheme to provide an appropriate level of protection. The authors first studied the time-varying destructive characteristics of disasters and identified the high-risk paths in advance, and then dynamically adjusted the rerouting decision parameters, so that services could be switched to more reliable routing paths to achieve proactive protection.
Aiming for the risk propagation in the smart grid, [27] leveraged the asymmetrical balls-into-bins allocation method and established the coupling relationship between the physical layer and the cyber layer in the CPS. The authors quantified the impact of risk propagation according to the percolation theory. However, they ignored the fact that the control center makes decisions in terms of the received messages from the cyber nodes (links), and performs monitoring and protection for relevant substations (nodes located) or power lines. Moreover, substations in the power system only provide energy support for fixed cyber nodes, and the node correlation in independent networks is determined. Thus, the random mapping approach has limitations to establish the coupling relation in the power CPS.
Different from the related work, we focus on reliable service routes planning based on service risk as well as service QoS, which plays an important role in the smooth operation of EI.

III. PROBLEM FORMULATION
In this section, we first introduce SDN-based communication architecture for EI. Then, we analyze the node risk, edge risk of carrying services and the overall balancing risk of the network from the perspective of network connectivity and service characteristics. After that, we formulate the system model for SPCN based on risk

A. COMMUNICATION ARCHITECTURE FOR EI BASED ON SDN
EI is a typical cyber-physical system (CPS) [1], and it features wide coverage as well as long transmission distances. However, the distributed control architecture currently in use cannot efficiently guarantee end-to-end QoS and emergency cooperation. This further restricts service deployment. Moreover, it has a poor ability regarding service collaboration, protection collaboration, and operation and maintenance, which impacts the performance of the entire communication network of EI.
Software-Defined Networking (SDN) is an emerging network paradigm that decouples the software from the hardware devices. It can flexibly and dynamically configure service routing according to service requirements. The logically centralized communication architecture makes it easier to grasp the status of the whole network thanks to the global view of the network. Therefore, it supports monitoring of the network state [28], resource allocation, path deployment and various intelligent decisions. SDN-based architectures applied to smart grid communications have been widely studied in [29]-[31].
In our early work [3], we designed a layered SDN-based communication architecture for EI from the perspective of energy supply and user demands, as shown in Figure 1 [30]-[32]. In the data plane, a wide area grid is constructed to enable the exchange of energy and information among different regions. Various energy demands and energy supplies, connected through end gateways, act as data requesters; their data are sent to and received from the control unit, i.e., a local controller center (LCC), via SDN switches. At the back end, a master control center (MCC) and the data unit (data servers) are deployed to analyze, control and monitor the data. The control plane provides flow forwarding rules, e.g., routing decisions and access control strategies. A global SDN controller is responsible for the following functions: topology discovery, network service management, and routing management. The uppermost part is the application plane. For EI, it refers to various service subsystems, for instance, wide area measurements (WAMS), the distribution management system (DMS), and the meter data management system (MDMS). In Figure 1, the bidirectional control messages and data flows are drawn as dark blue and light blue dotted lines with arrows. For ease of understanding, a dual routing path from LCC2 to LCC3 is shown in different colors.
For clarity, all the notations used in the paper are summarized in Table 1.

B. SYSTEM MODEL
The SPCN is represented as an undirected graph $G = (V, E)$. Various secondary SDN devices deployed in substations or the control center are abstracted to the node set $V$.
For each service $s_k$, $v_s(k)$ and $v_d(k)$ are the source node and the destination node, and $v_s(k), v_d(k) \in V$. Moreover, different services are distinguished by service importance: $I_k$ indicates the importance of $s_k$, and the more important $s_k$ is, the larger $I_k$ is. For the adjacency matrix $X = [x_{ij}]$, $x_{ij}$ equals 1 if there is an edge between $v_i$ and $v_j$, and 0 otherwise. The service route set is $P_S$, and the route for any $s_k$ is denoted by $p_s(k)$.

1) NODE RISK MODEL
EI has a vast geographic coverage that spreads over several thousand kilometers, and thus it involves dozens or hundreds of secondary devices. Due to exposure to harsh environments, the nodes are inevitably threatened by human damage or natural disasters.
Node risk describes the influence of a node failure on the network. It is associated with node fault probability, node importance, and carried service importance. The probability of node failure, $P_{v_i}$, is computed from the number of faults of node $v_i$ in the unit interval $T$, which can be acquired from the statistics of the network management system. $T$ is the observation time (e.g., one year, one month).
In SPCN, most of the nodes are deployed in substations with different voltage levels, and the communication links are laid along the power lines. Therefore, they have some electric properties which are distinguished from those in the conventional communication network. For the convenience of illustration, we employ the node voltage level to represent the corresponding substation voltage where the node is deployed.
Generally, the higher the voltage level and the larger the scale of a substation, the larger the risk it causes [33].
In the event of node or link failures, the impacts may propagate through the whole network via neighboring nodes, which might further worsen the network performance. Therefore, network connectivity is a critical concern in the assessment of node importance. The statistics in [34] indicate that the smart grid communication network has the features of complex networks. The node degree and node betweenness are commonly used to identify critical nodes in a complex network [35]. Here, the node betweenness is the ratio of the number of shortest paths traversing a node to the total number of shortest paths for any node pair in the network. Accordingly, the larger the betweenness of a node, the greater its impact on the network once a failure occurs. The betweenness of $v_i$ is formulated following [36] in terms of $B_{v_{s,d}}$, the number of shortest paths between nodes $v_s$ and $v_d$, and $B_{v_{s,d}}(v_i)$, the number of those shortest paths that pass through node $v_i$ for services from $v_s$ to $v_d$.
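To make the shortest-path counting behind the node betweenness concrete, the following minimal Python sketch counts shortest paths with breadth-first search on an unweighted graph. It assumes the common form in which, for every node pair, the fraction of shortest paths passing through $v_i$ is accumulated; the function names and the hop-count metric are illustrative assumptions, not the exact formulation of [36].

```python
from collections import deque


def shortest_path_counts(adj, src):
    """BFS from src. Returns (dist, sigma): hop distance and number of shortest paths."""
    dist = {src: 0}
    sigma = {src: 1}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:            # first time w is reached
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:   # u is a predecessor of w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma


def node_betweenness(adj):
    """For each node, sum over unordered pairs (s, d) the share of shortest s-d paths through it."""
    nodes = sorted(adj)
    info = {v: shortest_path_counts(adj, v) for v in nodes}
    score = {v: 0.0 for v in nodes}
    for idx, s in enumerate(nodes):
        dist_s, sigma_s = info[s]
        for d in nodes[idx + 1:]:
            if d not in dist_s:          # s and d are disconnected
                continue
            for v in nodes:
                if v in (s, d) or v not in dist_s:
                    continue
                dist_v, sigma_v = info[v]
                # v lies on a shortest s-d path iff the distances add up.
                if d in dist_v and dist_s[v] + dist_v[d] == dist_s[d]:
                    score[v] += sigma_s[v] * sigma_v[d] / sigma_s[d]
    return score


# Hypothetical 4-node ring used only for illustration.
RING = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [1, 3]}
RING_BETWEENNESS = node_betweenness(RING)
```

In the ring, each pair of opposite nodes has two shortest paths, so every node carries half of the paths of exactly one pair.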
Besides, assume that the risk impact propagates along the shortest path. The following conclusions can then be drawn: the higher the degree of a node, the more devices are connected to it, and the larger the propagation scope accordingly. Furthermore, a node with higher betweenness carries more services, and thus impacts the network even more in the case of failures. To closely relate the degree of the node with the betweenness, according to [37], the unbiased betweenness is defined to describe the node centrality. It is formulated in terms of $\delta_{v_i}$, the degree of node $v_i$, and an empirical value $\tau$.
Meanwhile, considering that the importance of a node in the SPCN is associated with the substation voltage level, the node importance $N_{v_i}$ is formulated as (3), where $\alpha_1$ and $\beta_1$ are weight coefficients used to adjust the contributions of the node voltage level and the unbiased betweenness. These indicators have different physical units and scales: the substation voltage level is expressed in volts and is a large number, whereas the unbiased betweenness is dimensionless. Normalization is one method to deal with this problem, as it maps the metrics to a common scale. Taking the normalization of the node substation voltage as an example, the voltage is rescaled with respect to $\xi_{\min}$ and $\xi_{\max}$, the minimal and maximal voltages of the voltage set in the smart grid, respectively. In particular, the control center is considered to be the most important node in the network. Additionally, the service types and the number of services carried on a node (link) also have an obvious influence on the network in case of a failure. In other words, the greater the number of services and the more important the services carried, the larger the risk potentially caused. Service importance can be computed with the analytic hierarchy process (AHP). Assume the service set carried on node $v_i$ is $S_{V_i}$; then the node risk model can be formulated as follows [38].
Here, $D_{v_i}$, $P_{v_i}$ and $N_{v_i}$ represent node risk, node fault probability, and node importance, respectively.
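As a rough illustration of how these quantities could be combined, the sketch below applies min-max normalization to the voltage level, forms the node importance as a weighted sum, and multiplies fault probability, importance, and total carried service importance. All three functional forms, the default weights, and the function names are assumptions made for illustration only; they are not taken from (3) or from the node risk model of [38].

```python
def min_max(value, lowest, highest):
    """Min-max normalisation to [0, 1] (assumed form of the voltage normalisation)."""
    if highest == lowest:
        raise ValueError("normalisation range is empty")
    return (value - lowest) / (highest - lowest)


def node_importance(voltage_norm, betweenness_norm, alpha1=0.5, beta1=0.5):
    """Assumed weighted sum of normalised voltage level and unbiased betweenness."""
    return alpha1 * voltage_norm + beta1 * betweenness_norm


def node_risk(fault_prob, importance, service_importances):
    """Assumed multiplicative risk: P * N * (sum of importance of carried services)."""
    return fault_prob * importance * sum(service_importances)


# Hypothetical node: a 220 kV substation in a grid whose levels span 110-330 kV.
voltage_norm = min_max(220.0, 110.0, 330.0)
importance = node_importance(voltage_norm, betweenness_norm=1.0)
risk = node_risk(fault_prob=0.1, importance=importance, service_importances=[1.0, 2.0])
```

The weights play the role of $\alpha_1$ and $\beta_1$; in practice they would be tuned to the network at hand.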

C. EDGE RISK MODEL
Similar to the node risk model, the edge risk refers to the impact of an edge failure on the EI. It is related to the edge importance, the edge failure probability, and the importance of the carried services. Suppose the service set carried on edge $e_{ij}$ is $S_{E_{ij}}$. In the edge risk formulation, $D_{e_{ij}}$ and $P_{e_{ij}}$ represent the edge risk and the edge failure probability, and $N_{e_{ij}}$ denotes the edge importance, which is related to the edge betweenness and the voltage level of the two ends of a link. The latter is taken as the higher voltage level of the two ends. $N_{e_{ij}}$ is computed with $\alpha_2$ and $\beta_2$, which are adjustable parameters like those of the node importance. Likewise, the edge betweenness is defined in terms of $B_{v_{l,m}}$, the total number of shortest paths between nodes $v_l$ and $v_m$, and $B_{v_{l,m}}(e_{ij})$, the number of those shortest paths passing through the edge $e_{ij}$.

D. OVERALL BALANCING RISK OF THE NETWORK
So far, there have been no definite indicators to measure risk in the network, and two statistics, the variance and the standard deviation, are commonly used to describe the degree of dispersion of risk. Here, we introduce the concept of the balancing risk of the network to measure risk in the SPCN. Generally, the smaller the balancing risk of the network, the more uniform the risk distribution in the network, and vice versa.
To compute the network overall balancing risk, the average risk of the network should be calculated. In general, the average risk of the network consists of the average node risk and the average edge risk, which are formulated as follows.
Accordingly, the overall balancing risk of the network can be expressed below:

E. SERVICE QoS
Considering the fast-forwarding capability of SDN switches, we ignore the impacts of the packet loss ratio and the bit error rate (BER). The evaluation of the end-to-end latency and service bandwidth requirements is our main concern.

F. ALLOWABLE END-TO-END LATENCY
The end-to-end latency includes the processing time of the SDN controller, the average queuing and forwarding delay in the switches, and the propagation latency on links. In the formula shown in (11), $T_k^p$ is the end-to-end latency on route $p$ for $s_k$, $c$ is a constant denoting the speed of light in the optical fiber, $l_{ij}$ is the distance from node $v_i$ to node $v_j$, $T_{PRO}$ is the processing time of the SDN controller, and $T_{QUE}$ is the average queuing and forwarding delay. $\eta_{i,j}^{s,k,p}$ is a binary decision variable, and $C_{SWIT}$ is the number of switches in the route $p_s(k)$.
Finally, the total end-to-end latency is accumulated over all services whose routes need to be planned.
Furthermore, each service should be assigned a unique routing path. Thus, the constraint can be described as
$$\sum_{m=1}^{\omega} y_{mk} = 1, \quad \forall s_k \in S. \tag{15}$$
In addition, the paths should satisfy the cycle avoidance constraints.
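The latency components listed for (11) can be sketched in Python as follows. The sketch assumes a simple additive form (controller processing time, one queuing delay per switch, and link length divided by the speed of light in fiber), counts every node on the route as a switch, and uses a fiber speed of about 200 km per millisecond; these choices and all names are illustrative assumptions rather than the exact expression of (11).

```python
FIBRE_SPEED_KM_PER_MS = 200.0  # assumed: roughly 2e5 km/s in optical fibre


def route_latency_ms(route, length_km, t_pro_ms, t_que_ms, c=FIBRE_SPEED_KM_PER_MS):
    """Assumed additive latency: T_PRO + C_SWIT * T_QUE + sum(l_ij) / c."""
    hops = zip(route, route[1:])
    propagation = sum(length_km[frozenset(hop)] for hop in hops) / c
    switches = len(route)  # assumption: every node on the route is an SDN switch
    return t_pro_ms + switches * t_que_ms + propagation


def total_latency_ms(routes, length_km, t_pro_ms, t_que_ms):
    """Total latency accumulated over the routes of all planned services."""
    return sum(route_latency_ms(r, length_km, t_pro_ms, t_que_ms) for r in routes)


# Hypothetical link lengths in kilometres, keyed by undirected node pair.
LENGTHS = {frozenset((1, 2)): 100.0, frozenset((2, 3)): 300.0}
latency = route_latency_ms([1, 2, 3], LENGTHS, t_pro_ms=1.0, t_que_ms=0.5)
```

A candidate route would then be checked against the allowable latency threshold of the service before being accepted.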

G. BANDWIDTH CONSTRAINT
The bandwidth of a control service is approximately 2 Mbit/s. However, the total bandwidth of all services carried on a link should not exceed 60% of its capacity due to the N-1 security criterion. Thus, the corresponding constraint should be satisfied for every link of the routing path, where $\zeta$ is the link capacity.
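A minimal sketch of this check is given below, using the 2 Mbit/s per service and the 60% utilization bound stated above. The data layout (routes as node lists, capacities keyed by undirected node pair) and the function names are assumptions for illustration.

```python
SERVICE_BW_MBPS = 2.0     # bandwidth of one control service, from the text
MAX_UTILISATION = 0.6     # at most 60% of link capacity may be used


def link_loads(routes, service_bw=SERVICE_BW_MBPS):
    """Accumulate the bandwidth carried by every undirected link over all routes."""
    loads = {}
    for route in routes:
        for u, v in zip(route, route[1:]):
            link = frozenset((u, v))
            loads[link] = loads.get(link, 0.0) + service_bw
    return loads


def satisfies_bandwidth(routes, capacity_mbps, max_util=MAX_UTILISATION):
    """True if no link carries more than max_util of its capacity."""
    loads = link_loads(routes)
    return all(load <= max_util * capacity_mbps[link] for link, load in loads.items())


# Two hypothetical services sharing the link between nodes 2 and 3.
ROUTES = [[1, 2, 3], [4, 2, 3]]
WIDE = {frozenset((1, 2)): 10.0, frozenset((4, 2)): 10.0, frozenset((2, 3)): 10.0}
NARROW = dict(WIDE)
NARROW[frozenset((2, 3))] = 6.0   # 4 Mbit/s load exceeds 60% of 6 Mbit/s
```

With these values the shared link carries 4 Mbit/s, which fits a 10 Mbit/s link but violates the bound on a 6 Mbit/s link.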

H. DUAL ROUTE INTERSECTION CONSTRAINT
In theory, the primary route and the alternate route are physically independent and correspond to the optimal and suboptimal routes in the network. Here, we introduce the concept of dual route intersection, which counts the common elements between the primary route and the alternate route, excluding the source and the destination. In particular, if the SDN controller fails to allocate fully physically disjoint dual routes for services in a certain network topology, the dual route with the minimal intersection can be planned instead.
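The counting of common elements can be sketched as follows. The sketch assumes that the "elements" are the intermediate nodes and the links of a route, and that the intersection degree is simply their number; both the interpretation and the function names are illustrative assumptions.

```python
def route_edges(route):
    """Undirected links traversed by a route given as a list of nodes."""
    return {frozenset(pair) for pair in zip(route, route[1:])}


def dual_route_intersection(primary, alternate):
    """Count shared intermediate nodes and shared links of two routes.

    The common source and destination are excluded from the node count.
    """
    shared_nodes = set(primary[1:-1]) & set(alternate[1:-1])
    shared_edges = route_edges(primary) & route_edges(alternate)
    return len(shared_nodes) + len(shared_edges)


# Hypothetical routes between node 1 and node 4.
disjoint = dual_route_intersection([1, 7, 4], [1, 3, 4])
overlapping = dual_route_intersection([1, 2, 3, 4], [1, 2, 5, 4])
```

A fully disjoint pair yields zero, while the second pair shares node 2 and the link between nodes 1 and 2.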
Let $p_s^a(k)$ be the primary route for $s_k$ with the minimal end-to-end latency, and let $p_s^b(k)$ denote the alternate route of $s_k$ with the minimal balancing risk of the network. The dual route intersection degree of $s_k$ is defined over these two routes. For each service $s_k$, to separate the primary route from the alternate route as fully as possible, the intersection degree should not exceed a preset threshold.
In the case of a batch of service requests, to obtain rapid route planning and satisfy the end-to-end latency of services as quickly as possible, a route planning scheme for all services at once is more feasible than a one-by-one method. Therefore, minimizing the total end-to-end latency for all services is one of the objectives. At the same time, services should be distributed over the network as uniformly as possible for the purpose of risk avoidance. Given the inequality and equality constraints, we define a vector to represent the optimal solution, $p_s(k)^* = [p_s(1)^*, p_s(2)^*, \ldots, p_s(k)^*]^T$, and formulate the optimization problem (20) following [39], where $p_s(k) = [p_s(1), p_s(2), \ldots, p_s(k)]^T$, $\forall p_s(k) \in P_S$, is a vector of $k$ decision variables. The vector function $f_{1,2}(p_s(k))$ in (20) comprises the two objectives of the optimization problem. C1 bounds the end-to-end latency on route $p_s(k)$. C2 limits the maximal intersection degree of the dual route to a preset threshold. C3 is the bandwidth constraint. $t_0$, $\lambda$, $\phi$ and $\zeta$ denote the corresponding preset thresholds, respectively. C4 is the cycle avoidance constraint. C5 ensures a unique path for each service. According to the regulations of the smart grid communication system [40], the number of protection and control services on each edge is in the range of [6, 8]. Thus, C6 limits the maximal number of services on each edge. The above constraints define the "feasible region" $M$, and any route set $p_s(k) \in M$ is a feasible solution.

IV. RISK-AWARE ROUTES PLANNING MECHANISM
The risk-aware routes planning mechanism (RSRM) is proposed to solve the system model. The triggers and the execution of this mechanism are illustrated in Figure 2. As shown in Figure 2, RSRM is executed in response to external events, e.g., fault occurrence or regular inspections. First, data processing servers execute on-line data analysis and monitor service QoS, and then they evaluate the networking risk. Once some abrupt traffic is detected, the optimization process is activated, which consists of two modules: system model establishment and model solution. The networking and system model has been established in Section III. Next, the system model solution and the system performance become the focus of RSRM, which are addressed in Sections V and VI, respectively.

V. ALGORITHM DESIGN
According to the characteristics of the system model illustrated in Subsection III.C, an evolutionary algorithm, namely NSGAII (Non-dominated Sorting Genetic Algorithm II), is leveraged to handle the multi-objective route planning problem. We first analyze the established model in Subsection V.A. Then a brief introduction to the algorithm is given in Subsection V.B. In addition, the specific procedures of NSGAII for RSRM are described in Subsection V.C.

A. MODEL ANALYSIS
The multi-objective function in equation (20) is nonlinear and nonconvex because it lacks continuity, monotonicity, and strict convexity [41], [42]. Moreover, multi-objective optimization with multiple constraints is proven to be NP-hard [33]. Various evolutionary algorithms are exploited to provide optimal or sub-optimal solutions to the multi-objective optimization problem through a generalized strategy. Taking the genetic algorithm as an example, it first converts the multiple objectives into a single objective function by linear combination, that is, scalarization, without considering the relationship between the objectives. To acquire the minimal total end-to-end latency, services tend to converge on the links belonging to the shortest paths, while some other links may be idle. This inevitably increases the overall balancing risk of the network. Since the two objective functions in equation (20) conflict with each other, scalarization that combines multiple objectives into a single objective is not preferable.

B. ALGORITHM DESCRIPTION
NSGAII is widely utilized to handle the multi-objective optimization problem; it manipulates the entire population to iteratively improve the solutions [47]-[50]. Common genetic algorithm operations such as 'selection', 'crossover' and 'mutation', along with fast non-dominated sorting and an elitist strategy, are employed to improve the algorithm performance. For the convenience of the reader, some basic concepts, namely the nondominant relation and the Pareto front, are introduced below.
Definition 1 (Nondominant Relation): Suppose there are two route plans $r_1$ and $r_2$. If they satisfy the conditions $f_i(r_1) \le f_i(r_2)$ for all $i \in \{1, \ldots, m\}$ and $f_j(r_1) < f_j(r_2)$ for at least one $j \in \{1, \ldots, m\}$, where $m$ denotes the number of objectives, then the route plan $r_1$ dominates $r_2$, or $r_2$ is dominated by $r_1$. This relation can also be represented as $r_1 \prec r_2$. If no solution dominates $r_1$, it is defined as a non-dominated solution, namely, a Pareto optimal solution. The other, dominated solutions are regarded as feasible solutions. The fast non-dominated sorting approach reduces the algorithm complexity from $O(mN^3)$ to $O(mN^2)$, where $N$ is the population size.
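The dominance test and the fast non-dominated sorting mentioned above can be sketched in Python as follows, assuming that both objectives are minimized and that each candidate route plan has already been reduced to a tuple of objective values. The sample objective values are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a dominates b (all objectives are minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def fast_non_dominated_sort(objs):
    """Return fronts as lists of indices; front 0 holds the non-dominated solutions."""
    n = len(objs)
    dominated_by = [[] for _ in range(n)]   # solutions that p dominates
    counts = [0] * n                        # number of solutions dominating p
    fronts = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(objs[p], objs[q]):
                dominated_by[p].append(q)
            elif dominates(objs[q], objs[p]):
                counts[p] += 1
        if counts[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        next_front = []
        for p in fronts[i]:
            for q in dominated_by[p]:
                counts[q] -= 1
                if counts[q] == 0:          # all dominators of q are already ranked
                    next_front.append(q)
        i += 1
        fronts.append(next_front)
    return fronts[:-1]                      # drop the trailing empty front


# Hypothetical (balancing risk, total latency) values for five route plans.
OBJECTIVES = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
FRONTS = fast_non_dominated_sort(OBJECTIVES)
```

The pairwise comparisons cost $O(mN^2)$, which matches the complexity quoted for the fast non-dominated sorting approach.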

Definition 2 (Pareto Front):
If the feasible solution $x$ is a Pareto optimal solution of the multi-objective problem, then no other feasible solution dominates $x$. This is represented as $O = \{x \in F \mid \nexists\, x' \in F : x' \prec x\}$, where $O$ is the Pareto optimal set and $F$ is the feasible solution set. The Pareto front is the representation of the Pareto optimal set in the objective space. The elitist strategy guarantees that eminent parent individuals are preserved in the optimization process, while enlarging the sample space and enhancing population quality. Besides, the crowding distance maintains the diversity of the population and supports the convergence of the algorithm. Due to the conflicting relation among the objectives, the planner can make a tradeoff along the Pareto front according to the requirements of the service and obtain the most preferable solution.

C. RSRM

1) CHROMOSOME ENCODING AND DECODING
The encoding and decoding of chromosomes strongly affect the algorithm performance. First, the nodes in the network are numbered. To improve encoding efficiency, we adopt an integer coding method with fixed length. Here, every individual chromosome represents a complete route planning solution for all services. Each service has its own independent chromosome segment, whose length equals the number of nodes in the network. Each segment consists of random genes, one per node, and each gene denotes the superiority (priority) of the node whose number matches the gene location. The entire chromosome is the concatenation of the service chromosome segments. Therefore, the chromosome length is the product of the number of services and the number of nodes in the network.
In the decoding process, the source node of each service is taken as the first node of the route. The second node of the route is decoded from the genes in the chromosome segment and the adjacency matrix of the network topology. This process is repeated until the destination node is reached, which yields a complete route for the service.
Figure 3 gives a simple example of the encoding process. For simplicity, suppose there are two services in Figure 3(a), s_1 (N_1 → N_4) and s_2 (N_2 → N_6). A randomly generated chromosome is shown in Figure 3(b). The first node N_1 of s_1 has three neighboring nodes, N_2, N_3, and N_7. According to the encoding method based on position superiority, N_7 is the next hop because it has the largest gene, 7. Subsequently, N_3, N_4, and N_6 are the candidate next hops of N_7, and their genes in the chromosome are 5, 6, and 2, respectively, so N_4 is selected. Therefore, the route for s_1 is N_1 → N_7 → N_4. Similarly, the route for s_2 is N_2 → N_1 → N_7 → N_6. Note that every node may appear in a route only once to avoid routing loops.
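The decoding rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 7-node adjacency list and the gene values below are hypothetical (Figure 3 is not reproduced here) and were chosen so that the genes of N_7, N_3, N_4, and N_6 match the values 7, 5, 6, and 2 quoted in the text. The function name decode_route and the choice to return None at a dead end are also assumptions, not part of the original method.

```python
def decode_route(adj, genes, src, dst):
    """Decode one service segment into a loop-free route.

    adj   : dict mapping a node number to the list of its neighbours
    genes : list where genes[n - 1] is the priority of node n (1-based nodes)
    Returns the route as a list of nodes, or None if decoding hits a dead end.
    """
    route = [src]
    visited = {src}
    current = src
    while current != dst:
        # Only unvisited neighbours are candidates, which prevents routing loops.
        candidates = [n for n in adj[current] if n not in visited]
        if not candidates:
            return None  # assumption: such a segment would be discarded or repaired
        # The neighbour with the largest gene value becomes the next hop.
        current = max(candidates, key=lambda n: genes[n - 1])
        route.append(current)
        visited.add(current)
    return route


# Hypothetical topology and segment for service s_1 (N_1 -> N_4).
adj = {1: [2, 3, 7], 2: [1, 3], 3: [1, 2, 7], 4: [5, 7],
       5: [4, 6], 6: [5, 7], 7: [1, 3, 4, 6]}
genes_s1 = [3, 1, 5, 6, 4, 2, 7]  # priorities of N_1 .. N_7

print(decode_route(adj, genes_s1, 1, 4))
```

With these hypothetical inputs the decoded route follows the same hops as the s_1 example in the text, N_1 → N_7 → N_4.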

2) POPULATION INITIALIZATION
First, the population is randomly generated. Then, the chromosomes that do not satisfy the bandwidth and latency constraints are removed. The remaining chromosomes form the initial population.

3) FITNESS FUNCTION AND NON-DOMINATED SORTING
The fitness of every chromosome comprises its non-dominated level and its crowding distance, both computed from the multiple objective values. Individual chromosomes are first ranked according to the non-dominance relation. Fast non-dominated sorting quickly separates the chromosomes into levels and assigns the corresponding non-dominated levels, so that the better chromosomes lie closer to the Pareto front. To maintain the diversity of the population, the crowding distance of a chromosome is computed from the local distance determined by the chromosome and its two adjacent neighbors on the same level.
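As an illustration of the two fitness components, the following minimal sketch implements the textbook NSGA-II fast non-dominated sorting and crowding distance for minimization objectives. It is not taken from the paper: the function names and the sample objective vectors are assumptions made for the example.

```python
def dominates(a, b):
    """a dominates b if a is no worse in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def fast_non_dominated_sort(objs):
    """Return a list of fronts; each front is a list of indices into objs."""
    n = len(objs)
    dominated = [[] for _ in range(n)]  # indices that solution i dominates
    counts = [0] * n                    # number of solutions dominating i
    fronts = [[]]
    for i in range(n):
        for j in range(n):
            if dominates(objs[i], objs[j]):
                dominated[i].append(j)
            elif dominates(objs[j], objs[i]):
                counts[i] += 1
        if counts[i] == 0:
            fronts[0].append(i)         # level 1: not dominated by anyone
    while fronts[-1]:
        nxt = []
        for i in fronts[-1]:
            for j in dominated[i]:
                counts[j] -= 1
                if counts[j] == 0:
                    nxt.append(j)
        fronts.append(nxt)
    return fronts[:-1]                  # drop the trailing empty front


def crowding_distance(objs, front):
    """Crowding distance of each index in one front (boundary points get infinity)."""
    dist = {i: 0.0 for i in front}
    for m in range(len(objs[0])):
        order = sorted(front, key=lambda i: objs[i][m])
        lo, hi = objs[order[0]][m], objs[order[-1]][m]
        dist[order[0]] = dist[order[-1]] = float("inf")
        if hi == lo:
            continue
        # Each interior point accumulates the normalized gap between its two neighbours.
        for prev, cur, nxt in zip(order, order[1:], order[2:]):
            dist[cur] += (objs[nxt][m] - objs[prev][m]) / (hi - lo)
    return dist


# Hypothetical (delay, risk) pairs for five chromosomes.
objs = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
fronts = fast_non_dominated_sort(objs)
print(fronts, crowding_distance(objs, fronts[0]))
```

Chromosomes are then compared first by level (lower is better) and, within the same level, by crowding distance (larger is better).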

4) SELECTION, CROSSOVER, AND MUTATION
These three operators are the same as in the conventional genetic algorithm. For the 'selection' operation, the elitism strategy and the tournament approach retain the best chromosomes so that the optimal solution is not lost.
Crossover, also known as recombination, mimics the biological process of nature to generate a new individual. The conventional genetic algorithm randomly chooses two chromosomes in the population, generates a crossover location, and then exchanges the corresponding genes. After that, two offspring chromosomes are bred. In fact, there are a variety of crossover operations, such as multi-point crossover, partially matching crossover, cycle-based crossover, and position-based crossover. Determining the crossover location and the exchanged genes is the key to crossover operator design. We adopt the position-based crossover operation proposed by Syswerda [50]. The crossover operation can be described as follows.
[...]
The first gene of NewOddChrom is Ogene(i).
6: else
7: Compare the first gene of EvenChrom, Egene(i) [...] Egene(j) is the first gene of NewEvenChrom.
[...]
12: Repeat 3-11 to complete the remaining genes of OddChrom and EvenChrom.
13: Repeat 3-12 for the remaining chromosomes.
Figure 4 shows an example of child chromosome segments produced by the position-based crossover operation. The mutation operation selects some chromosomes according to the mutation probability and interchanges two genes within a service chromosome segment.
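For readers unfamiliar with the operator, the sketch below shows the textbook form of Syswerda's position-based crossover on permutation-coded segments, together with a swap mutation of the kind described above. It is an assumption that each service segment is a permutation of priorities; the authors' own variant, which compares genes of odd and even chromosomes, may differ in detail. All function names are hypothetical.

```python
import random


def position_based_crossover(p1, p2, rng):
    """Textbook position-based crossover for two permutations of the same genes.

    Genes at randomly chosen positions are copied from p1; the remaining
    positions are filled with the missing genes in the order they occur in p2.
    """
    n = len(p1)
    keep = set(rng.sample(range(n), rng.randint(1, n - 1)))
    kept_values = {p1[i] for i in keep}
    fill = iter([g for g in p2 if g not in kept_values])
    return [p1[i] if i in keep else next(fill) for i in range(n)]


def swap_mutation(segment, rng):
    """Interchange two randomly chosen genes within one service segment."""
    i, j = rng.sample(range(len(segment)), 2)
    out = list(segment)
    out[i], out[j] = out[j], out[i]
    return out


rng = random.Random(0)            # fixed seed only to make the example repeatable
parent_a = [3, 1, 5, 6, 4, 2, 7]  # hypothetical segments of two parents
parent_b = [7, 2, 4, 6, 5, 1, 3]
child = position_based_crossover(parent_a, parent_b, rng)
mutant = swap_mutation(child, rng)
print(child, mutant)
```

Both operators keep the segment a valid permutation, so every node still has exactly one priority after recombination.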
The algorithm and the corresponding flowchart shown in Figure 5 are provided as follows.

VI. PERFORMANCE EVALUATION AND RESULT ANALYSIS
In this section, we evaluate the performance of RSRM. Various network topologies, service deployment and parameter settings are provided in subsection VI.A.

Algorithm 2 Risk-Aware Routes Planning Mechanism (RSRM)
Input: Service requirements set S, network parameters, τ, ϕ, population size, the maximal iteration, the mutation probability
Output: A dual route planning scheme for services
1: Initialize the network topology G = (V, E) and the population P_1. Solve the routes and delete the chromosomes that do not satisfy the bandwidth and latency constraints. Set g = 1 and calculate the objective values.
2: Execute 'selection', 'crossover' and 'mutation' on P_1 and generate a new population N_1.
3: Merge P_g and N_g into R_g. Calculate the objectives, perform fast non-dominated sorting and crowding distance calculation, choose the best m chromosomes, and generate the population P_{g+1}.
4: Perform 'selection', 'crossover' and 'mutation' on P_{g+1} and obtain the population N_{g+1}.
5: If g < g_max, repeat 3 and 4. Otherwise, go to 6, 7, 8.
6: Select the chromosome with the minimal total end-to-end delay for s_k as the primary route planning according to the value of f_1(p_s(k)).
7: Calculate the service routing intersection degree between the primary route planning and the other solutions in the Pareto front. Store the candidate alternate routing in the set A if max _ s (k) ≤ ϕ.
8: Select the alternate route planning from A according to the value of f_2(p_s(k)).

[Figure: Network topology for SPCN [45].]
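Steps 6 to 8 of Algorithm 2 can be sketched as follows. Several details are assumptions made only for this illustration: the intersection degree of a service is taken to be the number of links shared by its primary and candidate alternate routes, the threshold is applied to the maximum over all services, f_1 is read as the total delay and f_2 as the balancing risk, and both are assumed to be stored with each Pareto-front solution. The data layout and function names are hypothetical.

```python
def links(route):
    """Undirected links traversed by a route given as a list of nodes."""
    return {frozenset(edge) for edge in zip(route, route[1:])}


def intersection_degree(primary_routes, candidate_routes):
    """Largest number of links shared by the two routes of any single service."""
    return max(len(links(primary_routes[k]) & links(candidate_routes[k]))
               for k in primary_routes)


def select_dual_plan(front, phi):
    """Pick the primary and alternate plans from a Pareto front.

    front : list of dicts with keys 'routes' (service -> route), 'delay', 'risk'
    phi   : threshold of the intersection degree
    """
    primary = min(front, key=lambda s: s["delay"])                  # step 6
    candidates = [s for s in front if s is not primary and          # step 7
                  intersection_degree(primary["routes"], s["routes"]) <= phi]
    alternate = min(candidates, key=lambda s: s["risk"]) if candidates else None  # step 8
    return primary, alternate


# Hypothetical Pareto front with a single service s1.
front = [
    {"routes": {"s1": [1, 2, 3]},    "delay": 1.0, "risk": 0.9},
    {"routes": {"s1": [1, 2, 4, 3]}, "delay": 2.0, "risk": 0.5},
    {"routes": {"s1": [1, 5, 3]},    "delay": 3.0, "risk": 0.7},
]
primary, alternate = select_dual_plan(front, phi=0)
print(primary["routes"], alternate["routes"])
```

A smaller ϕ shrinks the candidate set A, so the alternate plan may be absent or carry a higher risk than the best solution on the front.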

A. SIMULATION SETUP
The performance of RSRM is assessed on a real topology, which is derived from the smart grid communication network of a province in China, as shown in Fig. 5.
There are 17 nodes and 25 links in total, and the average node degree is 2.94. Node 16 is the provincial control center, and nodes 15 and 17 are the municipal control centers. Nodes 1, 2, 3, 4, and 5 are 500 kV, and the rest are 220 kV. Here, the voltage level of a node is the voltage level of the substation where the node is located. Because control services usually travel from one substation to other substations and the optical fibers are laid along the power lines, the corresponding network topology is built with the point-to-point method. For comparison, we also evaluate the availability and feasibility of RSRM in NSFnet [16] and a real network with 29 nodes [51]. NSFnet has 14 nodes and 21 links, and its average node degree is 3. Node 9 is the control center. We assume that nodes 5, 12, and 14 are 500 kV and the remaining nodes are 220 kV. The distance of any node pair is randomly generated in the range [300 km, 600 km]. The real 29-node network has 47 links, and its average node degree is 3.2. Node 14 is the control center. Nodes 5, 20, and 29 are 500 kV, nodes 1, 7, 12, and 17 are 110 kV, and the remaining nodes are 220 kV. Some emulation parameters are set as follows. The speed of light in fibers is 2×10^5 km/s, and T_PRO , T_QUE are 0.01 ms and 0.1 ms, 0.85, respectively [52]. The intersection degree threshold ϕ is 3. According to [47], the initial services in SGCN are deployed as follows: 80% of the services run from 500 kV substations or the control center (the sources) to nodes no more than five hops away from the sources, or vice versa. The sources and destinations of the remaining services are randomly selected from ν. The bandwidth requirement is 2 Mbps. The results of the proposed approach are compared with the classical Dijkstra algorithm (DA) and the conventional genetic algorithm (GA). For the genetic algorithm, the population size and the number of evolution iterations are set the same as for RSRM. 
Both the selection probability and the crossover probability are 0.9, and the mutation probability is 0.05. Figure 7 shows how the Pareto front ratio in the 17-node network changes over the iterations until convergence when the number of services equals 10. Here, the Pareto front ratio is the number of chromosomes whose non-dominated sorting level equals 1 divided by the population size. The average Pareto front ratio remains steady at 65% from the 145th iteration onward, which demonstrates a good convergence property. Figure 8 illustrates the distribution of the optimal solutions in the population. Almost all the optimal solutions are uniformly distributed along the Pareto front, which provides more available routing schemes for services according to specific requirements.
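Some of the setup quantities in subsection VI.A can be checked with a few lines of arithmetic. The sketch below recomputes the average node degree of the three topologies (2E/V for an undirected graph), the propagation delay of a single fiber link at the stated speed of light in fiber, and the Pareto front ratio as defined above. The function names and the sample list of levels are illustrative assumptions; the complete end-to-end delay model, which also involves T_PRO and T_QUE, is not reproduced here.

```python
def average_node_degree(num_nodes, num_links):
    """Average degree of an undirected graph: every link adds 2 to the degree sum."""
    return 2 * num_links / num_nodes


def propagation_delay_ms(distance_km, speed_km_per_s=2e5):
    """Propagation delay of one fiber link in milliseconds."""
    return distance_km / speed_km_per_s * 1e3


def pareto_front_ratio(levels):
    """Share of chromosomes whose non-dominated level equals 1."""
    return sum(1 for level in levels if level == 1) / len(levels)


for name, nodes, link_count in [("17-node", 17, 25), ("NSFnet", 14, 21), ("29-node", 29, 47)]:
    print(name, round(average_node_degree(nodes, link_count), 2))

# Link lengths are drawn from [300 km, 600 km], so one hop adds roughly 1.5 to 3 ms.
print(propagation_delay_ms(300), propagation_delay_ms(600))

# Hypothetical levels of a population of 20 chromosomes, 13 of them on level 1.
print(pareto_front_ratio([1] * 13 + [2] * 5 + [3] * 2))
```

The recomputed degrees agree with the values 2.94, 3, and 3.2 quoted in the setup.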

VII. RESULT ANALYSIS
Here, the chromosome in the purple rectangle is chosen as the primary route planning because it has the minimal total end-to-end delay. Accordingly, the chromosome in the red rectangle is selected as the alternate route planning when the intersection degree is 3. Figure 9 shows the Pareto front ratio versus the number of services for the different network topologies. The results are the average values of 50 random experiments, and the figure shows the 95% confidence interval. The ratio of optimal solutions in all three network topologies decreases as the number of services deployed in the network increases. The reason is that when the number of services is small, more routing paths are available, which leads to more non-dominated solutions. As the number of services grows, more reachable routing paths become unavailable because of the constraints in (15), so the number of non-dominated solutions decreases. At the same time, the network with 29 nodes has the largest number of optimal solutions among the three networks. This is because a larger network scale and higher network connectivity lead to a larger search space for chromosomes, which generates more candidate solutions that satisfy the service requirements. Similarly, Figure 10 depicts the average number of iterations needed to converge in the three topologies versus the number of services, with the 95% confidence interval. In accord with the results shown in Fig. 8, the more services there are, the more iterations are needed to converge. Meanwhile, the network with 29 nodes needs the largest average number of iterations among the three topologies. The average number of iterations for NSFnet is close to that of the 17-node network because of the similar network structure, that is, they have a similar number of nodes, links, and average node degree.
Various routing algorithms are widely used for routing planning in smart grid communication networks [6], [13]. We compare the proposed approach with the shortest route strategy based on the Dijkstra algorithm (DA) and with the genetic algorithm (GA). Since T_TOT and D_BRS differ markedly in magnitude, we normalize them in advance with min-max normalization. For GA, a linear combination is adopted: f = αT_TOT + βD_BRS, where f is the objective of GA, and α and β are the weight coefficients of T_TOT and D_BRS, respectively. We conduct 10 repetitions with different weight coefficient combinations to obtain an approximately optimal solution. Table 2 reports the objective values for the different weight coefficients in GA. For this algorithm, the 10th combination is chosen as the primary route planning because of the minimal total end-to-end delay. The seventh is selected as the alternate route planning, with the minimal intersection and the approximately minimal balancing risk of the network. With RSRM, the primary route planning is taken from the Pareto front on the basis of the minimal end-to-end latency, and the alternate route planning is determined from the rest of the Pareto front according to the preset threshold of the intersection degree. In other words, the intersection degree only affects the choice of the alternate route planning. Figure 11 shows the balancing risk of the network in the alternate route planning of the 17-node topology versus the number of services for different thresholds of the intersection degree. Most of the control services in the SPCN start from the substation or the execution station and end at the master station or the control center, or vice versa. The balancing risk of the network increases with the number of services: more services reduce the available routing paths, and the balancing risk of the network rises accordingly. 
Furthermore, note that the balancing risk of the network increases faster when the number of services is less than 20. After that, the balancing risk of the network remains stable when ϕ equals 1 or 2. That is because some services fail to find disjoint dual routes due to a small ϕ. However, this value continues to increase when ϕ is 3 or 4.
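The weighted-sum objective used for the GA baseline, f = αT_TOT + βD_BRS after min-max normalization, can be sketched as follows. It is an assumption of this example that normalization is taken over the set of candidate solutions being compared; the sample values and function names are hypothetical.

```python
def min_max(values):
    """Min-max normalization to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def weighted_objective(t_tot, d_brs, alpha, beta):
    """f = alpha * T_TOT + beta * D_BRS on the normalized values of each candidate."""
    t_norm, d_norm = min_max(t_tot), min_max(d_brs)
    return [alpha * t + beta * d for t, d in zip(t_norm, d_norm)]


# Hypothetical delay and balancing-risk values of three candidate plans.
t_tot = [10.0, 20.0, 30.0]
d_brs = [3.0, 1.0, 2.0]
scores = weighted_objective(t_tot, d_brs, alpha=0.5, beta=0.5)
best = min(range(len(scores)), key=scores.__getitem__)
print(scores, best)
```

Because a single scalar f is minimized, each choice of (α, β) yields one solution, which is why several weight combinations have to be tried to approximate the trade-off that RSRM obtains from one Pareto front.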
As the bandwidth requirement of every control service is 2 Mb/s, we use the number of services on each link to represent the service distribution in the network. Figure 12 shows the service-link distribution in the primary route planning with the different algorithms. The more services a link carries, the larger its risk. Furthermore, more idle links and an uneven distribution of services mean a lower efficiency of network resources. With RSRM, there are eight links carrying one service, nine links carrying two services, and seven idle links in total. In contrast, with DA there are 12 links without any service, while three links carry five services. This is because DA finds paths with the objective of minimal end-to-end latency based on a greedy strategy, which easily leads to a concentrated distribution of services. At the same time, the optimal scheme with GA is slightly worse than that of RSRM owing to the subjective choice of weights. As expected, the service distribution is more uniform with RSRM than with the other two approaches.
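The service-link distribution discussed above is simply a count of how many routes traverse each link, followed by a histogram of those counts. A minimal sketch is given below; the small topology and the routes are hypothetical and do not correspond to the 17-node network, and the function names are assumptions.

```python
from collections import Counter


def link_loads(routes, all_links):
    """Number of services carried by every link, including idle links."""
    loads = Counter({frozenset(link): 0 for link in all_links})
    for route in routes:
        for u, v in zip(route, route[1:]):
            loads[frozenset((u, v))] += 1
    return loads


def load_histogram(loads):
    """Map 'services per link' to the number of links carrying that many services."""
    return Counter(loads.values())


# Hypothetical 4-node ring with two planned service routes.
all_links = [(1, 2), (2, 3), (3, 4), (1, 4)]
routes = [[1, 2, 3], [2, 3, 4]]
hist = load_histogram(link_loads(routes, all_links))
print(dict(hist))
```

In such a histogram, many links at zero together with a few links at high counts indicate the concentrated distribution observed for DA, while counts clustered at one or two services per link indicate the more even distribution reported for RSRM.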
Likewise, the service-link distributions in the alternate route planning are shown in Figure 13. The results indicate that the service distribution with GA is slightly better than that of RSRM. However, there is no obvious discrepancy, as both aim to minimize the overall balancing risk of the network. Figure 14 depicts the balancing risk of the network for dual route planning with the different algorithms. The risk in the primary route planning with RSRM decreases by 8.24% and 29.83% compared with GA and DA, respectively. For the alternate routes, it is reduced by 4.95% and 57.48%, respectively, which demonstrates the effectiveness of the proposed approach. Furthermore, for every approach the balancing risk of the network in the primary route planning is higher than that in the alternate route planning. This is because the primary route planning preferentially guarantees the minimal end-to-end latency requirement, whereas the alternate route planning aims at the minimal balancing risk of the network. Figure 15 shows the total end-to-end latency in the dual routes. The total end-to-end latency in the primary route planning with RSRM approximates that of DA, and both are lower than that of GA, because the primary route planning with RSRM is chosen from the Pareto front according to the minimal total end-to-end latency. For the alternate route planning, the constraint of minimal intersection tends to place services on detour routes, which increases the total end-to-end latency by 32.08% compared with the primary route planning with RSRM. However, the total end-to-end delay with RSRM is still lower than that of GA.
For the convenience of the reader, the dual routes for service s_1 with the different approaches (the source and the destination are nodes 2 and 7, respectively) are shown in Figure 16. The primary route of s_1 with RSRM is the same as that of DA, and the alternate route of s_1 is the same as that of GA. Additionally, similar to the multi-objective optimization based on a quantum genetic algorithm proposed in [54] to distribute military resources, and the multi-objective routing planning under a reinforcement learning framework used to design a navigation system for a smart city in [55], the proposed multi-objective routing planning scheme, which simultaneously minimizes the total end-to-end routing time and the balancing risk of the network, is generic. That is, it is also suitable for other multi-objective problems without networking topology constraints. To test the effectiveness of RSRM and its consistency with real situations, we conducted a series of experiments on real network topologies and some classical topologies, as reported in the aforementioned experiments. However, some modifications need to be made according to the specific requirements of different scenarios. For instance, service routing planning can be obtained after computing the node importance, failure probability, and service requirement parameters from statistics and network connectivity.

VIII. CONCLUSION
Due to the dynamic characteristics of risk propagation, we combined the theory of complex networks with services to design preplanned dual routing schemes for control services against a single failure in the system protection communication network of EI. We first formulated the problem as a multi-objective optimization, and then proposed the RSRM mechanism based on NSGA-II, which is specialized to deal with multiple conflicting objectives. Extensive emulation results demonstrated that this mechanism guarantees service latency and reduces the balancing risk of the network. Besides, NSGA-II has O(mN^2) time complexity, which is associated with the number of services and the population size. Therefore, as the network scale expands and the service requirements increase, the population size may explode, which greatly affects the efficiency of encoding and the convergence of the algorithm. In our future work, we will focus on improving the coding scheme and designing more efficient routing algorithms to effectively satisfy control service requirements in the SPCN of EI.
Supervisor with the Beijing University of Posts and Telecommunications. He is the author of more than 100 SCI/EI indexed articles. He presides over a series of key research projects on network and service management, including projects supported by the National Natural Science Foundation and the National High-Tech Research and Development Program of China. His current research interests include network management and service management. His awards and honors include 13 national and provincial scientific and technical awards, including the national scientific and technical award (second class) twice.
LEI SHI received the master's degree in computer science from Uppsala University, Sweden, in 2006, and the Ph.D. degree from the University of Göettingen, Germany, in 2010. He was with Dell-EMC and Huawei Technologies. He was a Visiting Professor with Technion, Israel. He was also a Researcher with the University of Massachusetts, USA. He is currently a Lecturer with the Institute of Technology Carlow, Ireland. His research interests include network management and cloud computing. VOLUME 8, 2020