Impact of ICT Latency, Data Loss and Data Corruption on Active Distribution Network Control

The ongoing changes in modern power systems towards increasingly decentralized systems render the coordination of generation assets and the corresponding dependency on Information and Communication Technology highly relevant. This work demonstrates the impact of three types of ICT errors, namely delayed data, data loss and data corruption, on the control of distributed energy resources in an active distribution network. The settling time of the active power response at the interconnection point between the distribution and transmission system is investigated in the simulations. Additionally, two fallback strategies to mitigate the impact of data loss are proposed and evaluated with regard to their impact on the controller’s response. Finally, a generalized, aggregated service state description is proposed in order to capture the performance of the active distribution network service. It is meant to improve the interpretability of the results, which can be used to compare service designs and setups.


I. INTRODUCTION
A. MOTIVATION
Power systems are undergoing fundamental changes that, among other effects, significantly increase their level of decentralization. Government incentives lead to the installation of many Distributed Energy Resources (DERs) while large conventional power plants are being decommissioned, and both these trends are expected to continue [1]. This transition, from a few easily controllable central power plants to a decentralized system with many DERs that usually lack any means of direct control by system operators, has implications for the stable operation of the power system. Transmission System Operators (TSOs) control power generation and power flow to keep generation and consumption in balance as well as to prevent and manage potential congestion in their networks. Currently, this option is primarily enabled by conventional plants that offer ancillary services in the form of operational flexibility to the TSO [2]. These ancillary services are more challenging to realize in a decentralized power system. On the one hand, many generation units are connected to low voltage (LV) or medium voltage (MV) levels that are controlled by Distribution System Operators (DSOs) and not directly by the TSO. On the other hand, the large number of DERs requires more sophisticated coordination in order to maintain the required levels of performance and efficiency of ancillary services [3]. A prominent approach to decrease the complexity of coordination is to cluster DERs, which is typically done by defining distribution networks as clusters. From a topological perspective, this is due to the fact that the majority of DERs are located in distribution grids. A TSO could then, for instance, request a specific power flow from any cluster, while each cluster's dedicated controller would coordinate its assigned DERs accordingly. Following [4], this concept of a distribution grid acting as a flexibility-providing cluster of DERs is labelled as an Active Distribution Network (ADN). Hence, when it comes to the provision of ancillary services, centralized conventional power plants could potentially be partially replaced by ADNs.
The associate editor coordinating the review of this manuscript and approving it for publication was Qiang Li.
These ADNs can respond within seconds and thus potentially provide the following ancillary services:
• Redispatch
• Frequency control reserve (via the provision of active power control by the underlying ADNs)
• Voltage control in Transmission Systems (via the provision of reactive power control by the underlying ADNs)
With regard to the timescales of various phenomena in power systems presented in [5], the ancillary services mentioned above primarily unfold on the same timescale as electromechanical phenomena, i.e., [0.1-100] s. The provision of frequency control, for example, is a power system service with similar timescale characteristics. This means that both electromechanical disturbances and corresponding remedial actions typically settle in under 100 seconds.
The coordination and control of DERs in an ADN can be done either by using control signals transmitted via the communication network or by using autonomous DER controllers that act solely on local measurements [6]. Although the latter does not carry ICT-induced risks, it also lacks the coordinated decision-making and flexibility of the communication-based design. In the context of the paper at hand, the controllers and the communication network together constitute the Information and Communication Technology (ICT) system. Detailed requirements for the corresponding ICT system are well-known for high-voltage (HV) grids. In contrast, the dependency of MV and LV grids on ICT is a critical topic [7]. The low ICT penetration on these grid levels implies that only little data is generated and transmitted. However, the increasing penetration of ICT in power systems in the future can lead to potential new risks due to the controllers' dependency on communication as well as the propagation of disturbances between the domains [8].
As with any complex system, faults in ICT systems (e.g., breakage, hardware malfunction, software bugs) are inevitable and numerous [9]. Analyzing the impact of all possible faults on the application is practically impossible. In this regard, [10] discusses the aggregation of faults in ICT systems in terms of the resulting errors. Most commonly known ICT faults can thus be analyzed by considering three error categories: delayed data (latency), unavailable data (data loss), and data corruption. For example, the failure of a sensor may cause a loss of the corresponding measurements. Similarly, noise interference or even cyber-attacks may cause data corruption in both measurement and control data.

B. RELATED WORK
In order to coordinate a large number of DERs in distribution grids, many automated control methods have been researched to regulate the power flow at the interconnection point (IP) between two voltage levels, especially at the IP between distribution and transmission grids [11], [12]. Control methods can be differentiated by their timescale of operation. In [11] and [12], a management system and an optimization-based framework, both with a focus on the interface between the distribution and transmission system, are proposed. In both approaches, the control of distribution grids is based on scheduling DERs with a resolution of 15 minutes, and therefore ICT latency is less critical for these concepts. Hence, these concepts are not elaborated in the work at hand.
The concepts in [4], on the other hand, enable fast real-time control of active and reactive power flows between voltage levels by controlling a large number of DERs connected to the distribution system. The work shows a hierarchical control scheme that focuses on controlling active power at the IP between extra-high voltage, HV, and MV systems. Despite some structural differences in these control concepts, the approaches are equally reliant on an ICT connection. These fast real-time control concepts include ICT systems to broadcast information from the controller to the DERs.
Analyses about the impact of ICT errors on ADN control are not available in the literature so far. While some publications exist on communication latency in smart grids, they typically relate to other specific smart grid functions and neglect other forms of ICT errors besides latency. For example, in [13] and [14], the impact of communication latency on bus voltages in centrally controlled microgrids has been investigated, and [15] and [16] demonstrate the positive impact of software-defined communication networks on the critical sensitivity of multi-agent-based distribution grid control towards ICT latency.
In [17], the ICT requirements of several smart grid applications are considered. The potential communication technology performance with regard to data loss and latency is assessed on a high level, while the exact impact on the said applications is out of scope. Furthermore, in [18], the impact of latency and packet loss on balancing generation and consumption in smart grids is demonstrated. The research in [19] and [20] shows the impact of data corruption (in this case, intentional manipulation) and data loss, respectively, on smart grid applications. It is argued that, in the case of data corruption, the control room operator could potentially get incorrect awareness of the system. This could then result in incorrect control actions. An analysis of all three ICT error categories and their impact on the exchange of synchrophasor measurement data and the wide-area monitoring application is made in [21]. Data loss can impact the stability of the smart grid and its economical operation. In [22], the impact of data loss on the cost of power supply is investigated. It shows that data loss can lead to an incorrect power demand estimation, leading to incorrect planning.
In [8], the operational states of ICT-enabled grid services were introduced as a means to capture their performance in a generalized manner. The authors in [7] and [23] discuss the operational state classification for two services, namely state estimation and on-load tap changer control. Additionally, they analyze the impact of state degradation of these services on the interconnected power system. Although these research works presented the states of services, the impact of the ICT error categories, particularly data delay and data corruption, has only been discussed on a conceptual level and has not been analyzed or simulated. Consequently, detailed thresholds for state classification based on these error categories are also not covered.

C. CONTRIBUTION
This paper investigates critical interdependencies between future power systems and ICT as described in [10] and [7]. It does so by investigating the impact of three ICT error categories, i.e., delayed data (latency), unavailable data (data loss), and data corruption, on an ADN controller's response. Particularly, the active power control of an exemplary ADN is analysed based on root mean square (RMS) simulations. The preceding work [24] considered the impact of increased ICT latency. The present paper extends this by analyzing the impact of all three ICT error categories on the behavior and stability of DER clusters based on an exemplary ADN control from [4]. For this purpose, the settling time of the active power response at the interface of MV and high voltage (HV) grids is chosen as the key performance indicator (KPI). Additionally, two exemplary fallback strategies to mitigate the consequences of data loss are analyzed. Finally, the results are used to identify concrete thresholds for three operational service states (i.e., normal, limited, and failed) in order to summarize the performance of the ADN control service as described in [8].
The paper is structured as follows: After an initial literature review in Section I-B, the adapted ADN simulation model is described in Section II with special attention to the adaptations over [24] for implementing the new ICT error categories. In Section III the influence of latency, data loss, and corrupted data on the behavior of the modeled ADN is investigated through time-domain simulations, partially under consideration of mitigating fallback strategies. In Section IV, the generalized service state description is introduced and demonstrated, and, finally, Section V concludes this work with a summary and an outlook.

II. SIMULATION MODEL
The following section describes the MATLAB Simulink simulation models used in this work to evaluate the impact of ICT errors on a control system based on [4], which initially was complemented by an ICT system for conducting data latency studies in [24]. In the paper at hand, this ICT system was further extended in order to also include data loss and data corruption, as well as two fallback strategies that are meant to mitigate the effects of data loss.
With regard to the aforementioned timescale of electromechanical phenomena in power system control, the 100 s threshold was chosen as the indicator for a sufficiently fast response of ADNs in this work's context. Phenomena that unfold within one to ten milliseconds or even faster, on the other hand, usually do not rely on any means of remote communication or coordination and are, therefore, irrelevant in the context of the proposed ICT error categories.
The corresponding grid model, the simulation parameters, and the modifications that were required for simulating all ICT errors are outlined in this section.

A. STRUCTURE AND FUNCTIONALITY OF THE CONTROL SYSTEM
The control system enables the ADN to follow power flow setpoints at the IP to the next higher voltage level by controlling a large number of DERs. Fig. 1 shows an overview of the control system. The control system measures the power flow at the IP (P IP) between the HV and MV grid. The measurements are transmitted via the ICT system, which results in a latency denoted by T meas. Based on the measured power flow P meas and a reference (or desired) value P ref, a PI controller determines a control output P Y,1, which is then transmitted via the same ICT system to the DERs in the distribution grid. This latency is denoted by T con. The DERs then adjust their active power output P DERs depending on the setpoint P Y,2 multiplied by an individual participation factor C DER, so that P IP follows P ref. Each C DER is set to its corresponding DER's installed capacity. For simplicity, it is assumed that all DERs have the same dynamic behavior, which is implemented by adding a delay with the time constant T DER and a first-order lag element with the time constant T PT1. In summary, the total latency is described by T total = T meas + T con + T DER. The delay T DER remains constant, whereas T meas and T con are varied in the simulations.
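The loop described above can be illustrated with a minimal discrete-time sketch. It is not the Simulink model used in this work: the DERs are lumped into a single normalized unit (C DER = 1), the grid is reduced to P IP equal to the DER output, and the gains `kp`, `ki` and the time constant `t_pt1` are hypothetical values chosen only so that the example is easy to reason about.

```python
from collections import deque


def simulate_adn_loop(t_meas, t_con, t_der=0.01, t_pt1=1.0, kp=0.5, ki=0.5,
                      p_ref=1.0, t_end=100.0, dt=0.01):
    """Return the P_IP trace for a setpoint step from 0 to p_ref.

    All numeric defaults except t_der are illustrative assumptions.
    """

    def delay_line(delay):
        # FIFO buffer holding the samples that are still "in transit".
        n = max(1, round(delay / dt))
        return deque([0.0] * n, maxlen=n)

    meas_line = delay_line(t_meas)
    con_line = delay_line(t_con)
    der_line = delay_line(t_der)
    integral, p_ip, trace = 0.0, 0.0, []

    for _ in range(round(t_end / dt)):
        p_meas = meas_line[0]             # measurement delayed by T_meas
        meas_line.append(p_ip)

        error = p_ref - p_meas
        integral += ki * error * dt
        p_y1 = kp * error + integral      # PI controller output

        p_y2 = con_line[0]                # control signal delayed by T_con
        con_line.append(p_y1)

        u = der_line[0]                   # DER-internal dead time T_DER
        der_line.append(p_y2)

        p_ip += dt * (u - p_ip) / t_pt1   # first-order lag, explicit Euler
        trace.append(p_ip)
    return trace
```

With the assumed gains, `simulate_adn_loop(0.02, 0.02)` settles at the reference, whereas a much larger latency such as `simulate_adn_loop(2.0, 2.0)` produces growing oscillations. The latency at which this toy loop becomes unstable depends entirely on the assumed parameters and should not be compared with the thresholds reported in this paper.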

B. SIMULATION OF DELAYED DATA
The ICT model for analysing the impact of latency on the performance of the control of ADNs consists of a measurement part and a control part, both of which are designed identically. The measurement part of the ICT system model consists of three blocks: delay, moving average, and sample and hold, as described in [24]. The moving average block is described by Equation 1 and Table 1. The sample and hold block takes samples of the output signal of the moving average block with a sample time T sh. The variable T sw describes the sliding window of the moving average block.
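The three-block chain can be sketched as follows for a uniformly sampled signal. The moving average is assumed here to be a plain arithmetic mean over the last T sw seconds, and the function name `ict_channel` is hypothetical; the block definitions in [24] may differ in detail.

```python
def ict_channel(signal, dt, delay, t_sw, t_sh):
    """Apply delay, moving average and sample-and-hold to a sampled signal."""
    n_delay = round(delay / dt)
    n_sw = max(1, round(t_sw / dt))   # sliding window length in samples
    n_sh = max(1, round(t_sh / dt))   # sample-and-hold period in samples

    # Delay block: shift the signal, padding with its initial value.
    delayed = ([signal[0]] * n_delay + list(signal))[:len(signal)]

    output = []
    for k in range(len(delayed)):
        # Moving average block (assumed: arithmetic mean over the window).
        window = delayed[max(0, k - n_sw + 1):k + 1]
        averaged = sum(window) / len(window)
        # Sample-and-hold block: update only every n_sh-th sample.
        if k % n_sh == 0:
            held = averaged
        output.append(held)
    return output
```

For example, a unit step that starts at the third sample, passed through a one-sample delay, a two-sample window and a two-sample hold, reaches the output only at the fifth sample.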

C. SIMULATION OF UNAVAILABLE DATA
An extension was made to the control system model (Fig. 1) in order to include varying data loss in the simulation. The control data is extracted at P Y,2 and manipulated if a generated uniformly distributed random signal is less than the desired data loss rate, as shown in Fig. 2. In this case, P Y,2 is replaced by its previous value, and thus the control data is not updated. If the random signal is greater than the data loss rate, P Y,2 is forwarded unchanged to the DERs. The random seed for the random signal generator is variable.
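The loss mechanism described above can be sketched in a few lines. The function name `apply_data_loss` and the `initial` value used when the very first sample is lost are assumptions made for this illustration.

```python
import random


def apply_data_loss(control_values, loss_rate, seed=0, initial=0.0):
    """Hold the previous control value whenever a sample is 'lost'."""
    rng = random.Random(seed)      # seeded for reproducible loss patterns
    held = initial                 # assumed value before any data arrives
    received = []
    for value in control_values:
        # A sample is lost if the uniform random draw is below the loss rate.
        if rng.random() >= loss_rate:
            held = value           # forwarded unchanged
        received.append(held)      # otherwise the previous value is kept
    return received
```

Running the function with 100 different `seed` values per loss rate would correspond to the Monte Carlo style evaluation used for the data loss study.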

D. SIMULATION OF CORRUPTED DATA
Similar to Section II-C, the control system model in Fig. 1 was modified in order to include data corruption in the simulation. While noise interference would typically affect the transmitted (encoded) bitstream, in this paper, the basic raw information is assumed to be directly affected. Bitstream corruption caused by noise interference can easily be detected (e.g., via checksums) and retransmitted or prevented with forward error correction. In contrast, corrupted raw information is much harder to detect and filter, which is why this work focuses on corrupted raw information. Depending on whether data corruption is to be simulated on measurement or control data, P meas or P Y,2 is extracted, manipulated, and then fed back into the PI controller or the DERs. Fig. 3 demonstrates how corrupted control data is simulated. The corrupted measurement data is modeled in the same way but takes P meas as input. The extracted data P Y,2 is multiplied by a Gaussian random signal with a mean value of 0 and a variable standard deviation σ con . This is then added to P Y,2 , thus yielding a P Y,2,corrupt , which represents the corrupted data. For corrupted measurement data, σ meas is used instead. The random seed for the random signal generator is varied as required.
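The corruption model amounts to multiplicative Gaussian noise, i.e., the corrupted value is the original value plus the original value times a zero-mean Gaussian sample. A minimal sketch, with the hypothetical function name `corrupt`, could look as follows.

```python
import random


def corrupt(values, sigma, seed=0):
    """Return values corrupted by multiplicative zero-mean Gaussian noise."""
    rng = random.Random(seed)   # seeded so that each run is reproducible
    # corrupted = value + value * N(0, sigma)
    return [v + v * rng.gauss(0.0, sigma) for v in values]
```

Because the noise scales with the signal, a value of zero is never corrupted, and `sigma` can be read as a relative error level (e.g., `sigma=0.05` for a standard deviation of 5 % of the signal value). The same function applies to measurement data with σ meas and to control data with σ con.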

E. TEST SYSTEM AND SIMULATION PARAMETERS
The proposed control scheme is presented in [4] and is tested with the rural 20 kV SimBench benchmark network, which is described in [25]. The considered 97-bus distribution network is connected to a 110 kV transmission grid via one IP. Fig. 4 shows an overview of the network topology of this benchmark system. In this radial network, the upstream 110 kV grid is modeled as an ideal voltage source, and the loads are modeled as constant impedances. The DERs are depicted with the model from [24], which consists of a delay, a first-order lag, a limiter and a 3-phase static generator. The total load connected to the grid amounts to 16.79 MW, and the total installed generation capacity amounts to 50.61 MW. In the initial state of all scenarios, the total generation in the test system is 33.74 MW, so that the control system has 16.86 MW positive flexibility, and 33.74 MW negative flexibility available.
To investigate the impact of static latency on the dynamic behavior of the modeled system, both measurement and communication latencies are varied from 20 ms to 600 ms. For each simulation, the latency is assumed to be constant. The lower boundary of this range, the baseline latency of 20 ms, was chosen based on the average internet latency in Europe at the time of the simulations. The upper limit of two times 600 ms was chosen based on a preliminary assessment that showed system instabilities and exponential growth of settling times beyond an aggregated static latency of roughly 1.1 s. The analysis for unavailable data is run by varying the ratio of dropped control signals from 5 % to 90 %. Finally, data corruption is analyzed by simulating noise levels in the range of σ = 0.001 to σ = 0.2 for both measurement and control data. For the unavailability and corruption analyses, 100 different random seeds are simulated, each with a baseline latency of T meas = T con = 0.02 s, i.e., 20 ms for collecting measurements and a further 20 ms for sending control data. The control parameters, shown in Table 1, are kept constant for all simulations.

III. SIMULATION RESULTS
This section presents the simulation results in terms of the dynamic active power response of the ADN controller, measured at the IP, to either load jumps or changed setpoints. The dynamic responses for the three ICT error categories, i.e., latency, data loss, and data corruption, are analysed using the settling time T s as the KPI. The results aim to identify the thresholds of these ICT errors above which the ADN controller exhibits unstable behavior. Note that the red line in Figs. 6, 7 and 8 marks a T s of 30 s, which is relevant later on in Section IV.

A. IMPACT OF DELAYED DATA
For the exemplary result in Fig. 5, a +5 MW jump of the desired active power setpoint at the IP is simulated with varying total latency T total. This total latency, as shown in Fig. 1, consists of a static measurement latency (T meas), a static control latency (T con) and a fixed latency of the DERs (T DER), which represents the DER's internal data processing delay and is assumed to be 0.01 s. T meas and T con are varied between different simulation runs. The green and blue curves show stable behavior, where the oscillations settle to P ref. In contrast, the grey curve shows unstable behavior, where the oscillations do not settle but rather grow. This demonstrates how increasing latency can lead to unstable control behavior of the ADN controller.
Fig. 6 shows the T s corresponding to different jumps in loads and setpoints for varying total latency (T total). In general, it can be seen that there is a positive correlation between the latency T total and the settling time T s of ADNs after a jump in load or in the setpoint P ref. This correlation follows an exponential trend. The type (P ref or load) and the direction (positive or negative) of the jumps do not seem to impact T s, as already shown in the preceding work [24].
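Since T s is the KPI for all results in this section, a short sketch of how a settling time can be extracted from a sampled response may be helpful. It assumes that the tolerance band is defined relative to the size of the initial jump and centred on the target value; the function name `settling_time` and this band definition are assumptions, not the exact evaluation procedure used for the figures.

```python
def settling_time(times, values, target, step_size, tolerance=0.05):
    """Return the first time after which the response stays inside the band.

    Returns None if the response is still outside the band at the end of
    the simulated horizon (i.e., it did not settle).
    """
    band = tolerance * abs(step_size)
    last_outside = -1
    for k, value in enumerate(values):
        if abs(value - target) > band:
            last_outside = k
    if last_outside == len(values) - 1:
        return None                     # never settled within the horizon
    return times[last_outside + 1]      # first sample of the settled phase
```

A growing oscillation, like the grey curve in Fig. 5, would leave the band until the end of the simulation and therefore yield `None` in this sketch.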
It is also found that the share of T meas in T total has a negligible influence on T s and, consequently, on the stability of the response. This resulted from two different simulations with a T total of 0.52 s but different values of T meas and T con. The first simulation was based on a T meas of 0.50 s and a T con of 0.01 s, and for the second simulation these values were switched, while T DER = 0.01 s in both cases. Only minor differences between these two simulation results can be found. The results also show that the size of the initial jump only has a minor influence on T s.
FIGURE 6. Impact of delayed data on T s.
VOLUME 11, 2023
As mentioned in Section I, ADNs can be used to remedy dynamic phenomena with varying timescales. For instance, if the phenomenon of interest has a timing requirement of 10 s, the acceptable total latency threshold that would still result in stable behavior would be around 0.7 s. Yet, if the timing requirement is 50 s, the acceptable latency threshold increases to 1.1 s. In summary, the stricter the timing requirement of the dynamic phenomenon, the lower the acceptable latency threshold. T s also increases exponentially beyond a latency of 0.9 s. This shows the importance of considering latency when designing DER-based control schemes for different dynamic phenomena, while also accounting for the diminishing returns on investments in faster ICT.

B. IMPACT OF UNAVAILABLE DATA
Fig. 7 shows the box plots for the simulations with varying data unavailability, which is modeled using varying loss rates. For each data loss rate, 100 simulations with different random seeds for the pattern of data loss are performed. The figure also shows the impact of two fallback strategies, which are proposed in order to improve the robustness of the ADN controller against data loss. Note that, in these simulations, a tolerance of 5 % is used to measure the settling time T s, as is common in control theory according to [26].
It can be seen that, for the default case with no fallback (No FB), T s is almost unaffected up to loss rates of 50%, beyond which T s starts increasing. The reason is the high sampling rate used for this model, which implies that changes in the system can be monitored sufficiently fast even if the initial measurement(s) are lost. At a loss rate of 85%, there is a relatively high increase in T s with more uncertainty, which can be seen from the spread of the outlier points. The system becomes unstable at a loss rate of 90%, where the settling time increases drastically. This shows that the high sampling rate cannot compensate for such high loss rates.
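The box plot reading used above (median plus spread of outliers) can be reproduced for any set of per-seed settling times. The sketch below assumes the common convention that points more than 1.5 times the interquartile range beyond the quartiles count as outliers; the function name and the sample values in the usage note are invented for illustration and are not results from this paper.

```python
import statistics


def summarize_settling_times(ts_values):
    """Return median, quartiles and outliers of a list of settling times."""
    q1, median, q3 = statistics.quantiles(ts_values, n=4)
    iqr = q3 - q1
    # Assumed whisker convention: 1.5 * IQR beyond the quartiles.
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [t for t in ts_values if t < low or t > high]
    return {"median": median, "q1": q1, "q3": q3, "outliers": outliers}
```

For the made-up sample `[8, 9, 9, 10, 10, 11, 88]`, the median stays at 10 s while the single value of 88 s is flagged as an outlier, which mirrors the situation of a low median combined with a wide outlier spread.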
Two fallback strategies are proposed to potentially improve the controller's robustness against unavailable data. The first fallback strategy (FB1) is implemented inside the controllers of DERs. If a DER is affected by unavailable data, then the DER uses the current power infeed P DER as a new setpoint for the period in which the data is unavailable. This causes the DER to remain in its present state until new control data is available. The second fallback strategy (FB2) extends FB1 by adjusting the ADN controller. If wide-area unavailability of control data or measurement data is detected, the integral term of the PI-controller is blocked, keeping it in its current state. The integral term is unblocked again once new measurement or control data is available.
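The two strategies can be sketched as follows. FB1 is shown as a setpoint selection inside the DER, and FB2 as a PI controller whose integral term is frozen while data is unavailable. The class and function names are hypothetical, using `None` to signal lost control data is an assumption of this sketch, and how unavailability is detected in practice is left open.

```python
def der_setpoint_fb1(received_setpoint, current_infeed):
    """FB1: if no control data arrived, hold the current power infeed."""
    if received_setpoint is None:      # None marks lost control data
        return current_infeed
    return received_setpoint


class PIControllerFB2:
    """PI controller whose integral term can be blocked (FB2)."""

    def __init__(self, kp, ki):
        self.kp = kp
        self.ki = ki
        self.integral = 0.0

    def step(self, error, dt, data_available=True):
        # The integrator only accumulates while data is available, which
        # avoids wind-up during wide-area data unavailability.
        if data_available:
            self.integral += self.ki * error * dt
        return self.kp * error + self.integral
```

In this sketch the proportional term is still evaluated while the integrator is blocked; whether the proportional path should also be held is a design choice that the description above does not fix.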
The impact of the two fallback strategies (FB1 and FB2) is also shown in Fig. 7. While FB1 can easily be implemented in the system, it only shows a marginal improvement in T s when compared to the default case (no FB). At a loss rate of 90%, FB1 has a median T s of 58 s, while that of the default case is 70 s. The primary problem with FB1 is the wind-up effect that occurs in the integral term of the PI-controller if many DERs suffer from unavailable data simultaneously. FB2 counters this by locking the integral term during high data loss rates (i.e., wide-area data unavailability) and shows stable behavior in terms of T s for all loss rates up to 90%. When compared to the other two cases, FB2 has fewer outliers in all cases, which implies a less uncertain controller response. It can also be seen that T s increases slowly with the loss rate. However, the drawback of FB2 is that it is harder to implement for unavailable control data, as this would require the ADN controller to know or be informed about data not being received by the DER units. Gathering this information when the ICT system is impaired is challenging but can be solved by adding a heartbeat feature to the DERs. Nevertheless, considering potential data loss and corresponding fallback strategies during the design phase is strongly recommended due to the improved stability.
With respect to the timescale of power system dynamic phenomena (see Section I), if the phenomena have a settling time requirement of 30 s, a loss rate of up to 70% (without outliers) would be considered acceptable for the default case as well as with FB1, while 85% would be acceptable with FB2. Although the default and FB1 cases could still meet a settling time requirement of 40 s with an 85% loss rate (the median T s is around 9 s for both cases), there is an increased uncertainty, as some outliers result in a T s of up to 88 s. For these two cases, the increase in T s is fairly low up to a loss rate of 70%. FB2 gives a guaranteed settling time of up to 30 s even with a loss rate of 90% (although the median is much lower at 16 s). These results show the importance of considering data loss in the design of such a controller. The identified thresholds can be used by system designers to decide on the acceptable loss rate.

C. IMPACT OF CORRUPTED DATA
The simulation results corresponding to corrupted measurement and control data are illustrated in Fig. 8. Again, 100 simulations with different random seeds are performed for each considered level of corruption (in terms of the standard deviation σ) for both measurement and control data. In these simulations, a tolerance of 50 % is used for measuring T s in order to prevent the data corruption itself from exceeding the tolerance. It can be seen that for measurement corruption levels of σ meas ≥ 0.05, there is a drastic increase in T s, implying an unstable controller response. For corrupted control data, it can be seen that the system becomes unstable only from σ con = 0.075. Although the median T s for σ con = 0.05 is around 6 s, there is increased uncertainty in the T s response, as a few outliers extend up to 100 s.
Regarding the different timescales of power system dynamic phenomena (see Section I), if the phenomenon has a settling time requirement of 10 s, the acceptable corruption thresholds are σ meas = 0.01 and σ con = 0.025. However, with a requirement of 80 s, the threshold for σ meas increases to 0.025. In summary, the stricter the timing requirement of the dynamic phenomenon, the lower the acceptable level of data corruption for both measurement and control data. T s increases exponentially for σ meas ≥ 0.05 and σ con ≥ 0.025. These results also show that this ADN controller is more sensitive to corruption in measurements than to corruption in control data.

IV. OPERATIONAL STATES OF ADN CONTROL SERVICE
The simulation results analyzing the response of the ADN controller with regard to the three ICT error categories can be generalized into the operational states of the ADN control service. This improves the interpretability of the results and provides system operators and planners with an overview of the performance of the service. The operational state of a service, presented in [8] and demonstrated in [7], can be classified as normal, limited or failed. These operational states denote the performance of the service.
As explained in Section I, ADNs can be used to provide various ancillary services in the power grid. Since the operational state definitions are specific to each service, the provision of frequency control reserve (containment or restoration) using the ADN is chosen as an example for the operational state classification. The service has a maximum T s requirement of 30 s [27], which is marked as the red dotted horizontal line in Figs. 6, 7 and 8. The simulation results in Section III show that for certain ranges of data loss, data delay, and data corruption, T s of the ADN controller stays under 30 s, assuming that only one of these error categories is increased at a time while the other two remain unchanged. Consequently, the ADN controller is said to be in the normal state within these ranges, as its intended behavior can be guaranteed. While other ranges of data loss and corruption still result in a low median for T s, the intended behavior cannot be guaranteed, as outliers are spread over a wide range exceeding the 30 s threshold for the frequency reserve service. An example of this is the data loss rates of 80% to 85% in Fig. 7. With FB2 active, the service would stay in its normal state, but without any fallback strategy in place (no FB), T s may exceed the threshold. In these ranges, the ADN controller is said to be in the limited state, in which satisfying the 30 s threshold can no longer be guaranteed. Finally, other ranges of data loss, delay, and corruption result in a high T s with a median value over 30 s. In these ranges, the service is said to be in the failed state. Regarding data corruption (Fig. 8), this is the case for σ meas ≥ 0.05 or σ con ≥ 0.075. The service is also in the failed state for data loss rates of ≥ 90%, where the median T s is above the 30 s threshold, unless FB2 is applied.
The thresholds of the three ICT error categories corresponding to the three operational states with regard to the aforementioned simulation results are summarized in Fig. 9.
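The classification rule described in this section can be condensed into a small sketch that maps the settling times obtained for one ICT error level (e.g., the 100 seeds of one loss rate) to an operational state. The function name is hypothetical, the 30 s default follows the frequency control reserve example, and treating the maximum observed T s as the "guaranteed" value is an assumption of this sketch.

```python
import statistics


def classify_service_state(ts_values, ts_limit=30.0):
    """Map a sample of settling times to normal, limited or failed."""
    if statistics.median(ts_values) > ts_limit:
        return "failed"     # even the typical response is too slow
    if max(ts_values) > ts_limit:
        return "limited"    # median is fine, but outliers exceed the limit
    return "normal"         # all observed responses meet the requirement
```

A non-settling run would have to be encoded as a value above `ts_limit` (for instance, the simulation horizon) before calling this function.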
The goal of a robust ICT system design should be to ensure that the service stays in the normal state by keeping the ICT errors within the aforementioned thresholds, even in the presence of disturbances. Due to cost constraints, the thresholds could also be relaxed to allow for partial degradation to the limited state, especially in case of adverse disturbances. However, this should only be done for non-critical services, which do not directly impact the operation of the power system. The failed state should be avoided as much as possible.

V. CONCLUSION AND OUTLOOK
This paper investigates the impact of three ICT errors, namely delayed data (latency), data unavailability (data loss), and data corruption, on an ADN controller. Dynamic simulations regarding the stability of a SimBench rural distribution grid control were done with settling time as the KPI for the ADN controller response. The simulation results show a significant correlation between adverse ICT errors and the behavior of the DERs in the ADN. An exponential increase in the settling time of the controller's response can be observed for the latency as well as data loss. Furthermore, the simulation results for settling time indicate a threshold from guaranteed to uncertain controller responses around standard deviations of 0.05 and 0.075 for corrupted measurement and control data, respectively. While these results are specific to the chosen network and controller design, the existence of such thresholds and the exponential correlation between data delay or data loss and the corresponding settling time of the ADN controller are relevant insights for analyzing the response of ADN controllers with respect to ICT errors.
The presented fallback strategies mitigate the impact of data loss. In particular, FB2 shows a strong positive impact, despite the difficulties in its implementation. This demonstrates the added value of considering such fallback strategies in ADN controller design.
Finally, with regard to a more generalized, aggregated presentation of the sensitivity of a power system service towards latency, data loss, and data corruption, an operational service state definition for an ADN-based frequency control reserve service was outlined.
For future work, laboratory and field investigations to show the ADN concept and the influence of ICT errors on ADN control are recommended. Furthermore, the impact of latency variation (i.e., jitter) in ICT networks on the controller response needs to be investigated in detail.