Assessing Interdependencies and Congestion Delays in the Aviation Network

Air transport networks exhibit variable and stochastic performance with respect to air traffic delays. A delay incident at one airport may affect the operational efficiency of others and generate side effects across the whole aviation network. Flight delays are widespread today, costing billions to the air transportation economy and degrading passengers' quality of service. Dependency graphs have been proposed in the past to understand delay propagation and to analyze such cascading events through dependency chains. In this work, we propose a risk-based method to analyze interdependencies and congestion in the aviation network. The methodology and the developed tool assess delay incidents at airports and produce weighted risk dependency graphs, showing how a delay that occurs at one airport may affect other interconnected airports. Based on data collected from the US Bureau of Transportation Statistics, we analyze how flight delay risk propagates through the aviation network. In addition, using historical flight performance data, we predict which flight chains are prone to delays. We implement a tool that detects the most critical airports and congested connections based on their delay contribution in dependency chains. It also identifies the n-order dependency chains that airline flight planners should avoid to reduce delay impacts in the aviation network.


I. INTRODUCTION
The US Department of Homeland Security identifies aviation as a critical subsector of the transportation system [1]. Aviation provides a fast worldwide transportation network, which generates economic growth and facilitates international connectivity, trade, and tourism [2]. With increasing globalization, the aviation industry has grown at a fast pace, while flight delays have become a serious challenge that degrades travelers' quality of service.
The United States is the world's largest aviation market, and future air transport growth requires improved traffic flow to reduce congestion [3]. High airport delays have negative impacts on passengers, airlines, and the air transport economy. Delays undermine the aviation industry's efforts to maintain high customer satisfaction and increase productivity while preserving resilience to disruptions. Unnecessary flight delays, which are often the result of outdated technology and procedures, cost the US more than $25 billion per year [2].
The associate editor coordinating the review of this manuscript and approving it for publication was Poki Chen.
A flight delay is usually reported as the late arrival or late departure of an inbound or outbound flight. It can be attributed to several causes, such as organizational issues of the air carrier or airport handling, aircraft technical problems, extreme weather, air traffic control, and security [4]. Moreover, a propagated delay may occur because resources are interconnected; the most important such resource is the aircraft, which flies multiple flight legs, very often more than five per day [5]. Hence, a delay of an earlier flight can affect subsequent flights. Waiting for transit passengers from delayed connecting flights is also known to delay upcoming flights [6], and flight crews switching between aircraft may cause further delays in the network [7]. For these reasons, a small initial delay may cascade, creating larger delays and worse conditions in the downstream flight connections. Research on the mechanism of delay propagation is therefore a challenging area.
Researchers have extensively studied dependency modeling, simulation, and analysis of infrastructures. Several methodologies and tools for dependency analysis estimate the impact [8], [9] or the risk derived from the dependencies within a critical infrastructure or among interdependent infrastructures [10]-[14]. Risk usually depends on two factors: i) the likelihood (or probability) that a negative event, usually called a disruption, occurs, and ii) the impact (consequences) of that event. Due to dependencies in infrastructure networks, such impact may take the form of incomplete operations (flight cancelations) or service degradation (flight delays).

A. CONTRIBUTION
In this work, we use a previously presented time-based dependency analysis methodology for critical infrastructure dependency modeling [11], [12] to analyze delay risk propagation in the US aviation network. We apply the methodology and the developed tool to a dataset of commercial flight routes and per-flight delays for 2018-19, as provided by the US Bureau of Transportation Statistics (BTS) [15]. Our contributions are:
1) A methodology for analyzing congestion in aviation networks, which: i) models aviation networks as dependency graphs; ii) assesses the dependency risk of delay incidents between interconnected airports; iii) produces weighted risk dependency chains that show how a delay at one airport may affect other interconnected airports; and iv) calculates the impact and likelihood of delays in congested airports using methods such as the min-max algorithm, standard deviation timeframes, and statistical dynamic averages.
2) A comparative analysis of the arrival on-time performance of US domestic flights, as provided by BTS [15], for two consecutive years. Specifically, we provide a congestion risk analysis for the summer months (July-August) of 2018 and 2019, detecting the worst nth-order dependencies and the worst airports in terms of congestion delay propagation. Moreover, we detect the most congested paths, schedules, and airports, which flight planners, airline marketing managers, and other air transportation stakeholders should avoid.
3) A software implementation of the proposed methodology, which can:
a. Indicate the flight connections with the highest delay risk for the defined period.
b. Identify the worst n-order airport dependencies by calculating the overall risk of cascading congestion.
c. Indicate airports that frequently appear in the worst n-order airport dependencies and introduce delays to downstream flights.
d. Analyze what-if scenarios for a congested airport's connections.
e. Propose the n-order dependency chains that flight planners should avoid to reduce delay impacts in the aviation network.
To the best of our knowledge, a risk-based methodology and software implementation for analyzing flight interdependencies of congested aviation networks and indicating worst dependency connection chains has never been introduced before.

B. STRUCTURE
The remainder of this paper is structured as follows: Section II presents related work on modeling infrastructure interdependencies and flight delay propagation. Section III presents the dependency analysis methodology, and Section IV explains the dataset used for the airport congestion analysis. The results of applying the proposed methodology and software tool are analyzed in Section V. Finally, Section VI presents the conclusions and evaluates the results.

II. RELATED WORK
During the past decade, modeling of critical infrastructures along with the flow of information and risks between them has been a major topic of interest in scientific research. This section summarizes models and methodologies already proposed and focuses on similar work on aviation networks and critical air transportation infrastructures.
Several approaches exist for modeling infrastructure dependencies and information flow. Infrastructure modeling is generally associated with simulation techniques and mathematical models, such as (i) continuous time-step simulation; (ii) discrete time-step simulation; (iii) Monte Carlo simulation; (iv) decision trees; (v) geographical information systems; and (vi) risk management tools [16]. According to Ouyang [17], critical infrastructure protection methodologies and tools are categorized as (i) Empirical; (ii) System Dynamics; (iii) Agent-based; and (iv) Network-based. Empirical models are based on historical events. System Dynamics approaches use top-down methods, such as stocks and flows, to manage and analyze complex adaptive systems with interdependencies. Agent-based approaches model infrastructure components as agents and analyze their interactions based on sets of rules, while network-based approaches model infrastructures as graphs whose nodes represent infrastructure components.
Our approach is purely network-based, driven by historical data and applying a risk management model for simulating airport dependencies. A graphical dependency model of the worst performing airports in the US aviation network is developed. Airports are modeled as nodes, while flight routes between airports are portrayed as graph links, using the methodology previously presented for urban and maritime transportation networks to predict high-risk nodes and propose traffic congestion mitigation mechanisms [13], [14].
Academic literature on flight delays can be classified into three main categories: i) statistical models, which explore the effects of various components of travel time, ii) econometric models that analyze the economic drivers of flight delays, and iii) operations management models, which investigate the operational impacts of delays in air transportation.
Since air transportation is highly stochastic, various aspects of flight scheduling have been explored in the past. Several researchers have developed statistical models for forecasting different components of air travel time.
Deshpande and Arıkan [18] analyzed empirical flight data to model the total travel time distribution without dividing it into individual segments, and developed a model of total travel time for all flights flown at all US airports, using the BTS dataset. Wang et al. [7] used empirical statistics of departure delays to form complementary cumulative distribution functions (CCDF), along with transmission delay functions, and proposed a novel approach to interpreting big temporal data. Takeichi [19] proposed a mathematical delay analysis model that estimates delay accumulation from arrival delay statistics and nominal flight time optimization formulas for aircraft arriving at an airport.
Econometric models quantify the impacts of various factors on the initiation and progression of propagated delay. Kafle & Zou [5] developed a joint discrete-continuous econometric model to reveal the effects of various influencing factors, considering the buffer time that airlines insert into flight schedules. As a result, they were able to quantify the propagated and newly formed delays in each sequence of flights that an aircraft flies in a day.
Quantitative approaches such as statistical [7], [18], [19] and econometric [5] methods focus mainly on flight delays at a single airport. Furthermore, although some of them take the series of flights into account [5], [19], others [7], [18] neglect the sequence of predecessor flights from upstream airports. Either way, delay propagation needs to be analyzed from a broader, network-based perspective, since airline flight scheduling and airport operations are oriented towards network performance optimization.
Pyrgiotis et al. [20] investigated how delay propagates, how it reduces daily airport operational efficiency, and how it pushes more demand into late evening hours, introducing the approximate network delays (AND) model, which computes delays at the 34 busiest US airports. Zhang and Nayak [21] used the multivariate simultaneous equation regression (MSER) model to study the impact of one airport on others and concluded that major airports have a higher impact on the average delay. Hao et al. [22] used the MSER model to quantify the impact of New York's airports on delays throughout the airport network. They concluded that the delays at NY airports, analyzed with two different models, were lower than expected. Fleurquin et al. [23] developed the maximum connected subgraph of congested airports to assess the level of delays across the entire system. Campanelli et al. [24] compared models of the US and European air traffic networks to assess the effect of delay disruptions, and proposed how slot reallocation and swapping can mitigate flight delay propagation in the US aviation network. A comparative analysis of models for predicting delays in air transportation is presented in [25], comparing the performance of different approaches to predicting delays in air traffic networks. The authors consider three classes of models: i) dynamic network models using Markov jump linear systems (MJLS); ii) classical machine learning techniques such as classification and regression trees (CART); and iii) artificial neural network (ANN) architectures, using classification factors such as time of day, day of the week, season, and previously realized delays.
Despite the advances in understanding flight delay propagation [20]-[25], few studies have investigated delay propagation by considering the interdependence of delays. The approaches in [20]-[25] focus on the propagation processes between sequential flights and on congestion at airports, while ignoring the actual relationships among airports, the network structure, and airport properties. Moreover, the factors used to analyze delay propagation are often limited: the delay time is either calculated directly as the difference between actual and scheduled travel time [26], or arrival and departure delay times are analyzed separately [5], [18], [20], [22]-[24], while other works consider only departure delays [7] or only arrival delays [19], [21]. None of these represents the actual congestion delay of a flight caused by both the origin and the destination airport. Finally, approaches such as those in [5], [18], [20]-[24] use multivariate simulation methods to analyze the social and economic impacts of delays on the operational performance of the airports examined. These methods are more complex and require a variety of data from airlines, internal business information, and airport authorities to produce results.
Since the air transportation system is a typical large-scale complex system, the mechanisms of delay propagation are not fully understood, especially regarding the interdependencies of different airports; this has created a growing interest in inferring causal interactions in complex systems [27]. Du et al. [28] proposed a delay causality network (DCN), based on the Granger causality test, and analyzed its topological and temporal properties to better understand the mechanism of flight delay propagation at the system level. The approach in [28] considers the delay propagation problem from the perspective of delay interdependencies using network analysis. While this method can capture the interaction patterns of delay, it is limited in identifying them between airport pairs. Moreover, most studies of air traffic networks use network graph theory to classify the network topology [29]-[31], while other network-based approaches use measures of the importance of individual airports [32] or focus on network optimization and efficiency [33]-[35].
Our approach differs from most studies of air traffic congestion delays, since it introduces a risk-based approach to assessing the interdependencies of delay propagation in the aviation network. Similar to [13], [14], the presented implementation is a cyber-physical, deterministic, long-term optimization model that uses risk assessment, statistical analysis, and graph theory to support decision making by proposing the n-order dependency chains, paths, and airports that flight planners should avoid to reduce delay impacts in the aviation network.
Our model relies on congestion delays caused by both origin and destination airports. It also adopts an all-airports, all-flights approach, similar to [18], [23], [24], whereas other works focus on a single airport [19], [22], specific airlines [5], [7], or only the busiest US airport hubs [20], [21]. We use data exclusively from the BTS database to analyze historical air traffic data for a selected period at US airports, while other methods [5], [18], [20]-[24] require data from airlines, internal business information, and airport authorities to produce results. Using statistical analysis, our model converts delay times into risk impact and the frequency of delay occurrence into likelihood, calculating the congestion risk of each flight route.
Finally, our approach utilizes network graph theory similar to [29]- [35]. While the aforementioned studies utilize graph theory either to classify the topology of the network, measure the importance of the individual airport, or for optimization, we utilize known graph theory algorithms for computing all possible paths of an aviation network, thus calculating all possible n-order airport dependencies, in order to assess the cumulative congestion risk of the aviation network dependency paths.
Among the related work presented, prediction methodologies are developed in [20], [22]-[24], [28], but none of them produces a projection of the worst dependency chains of flight connections, as our model does. Understanding flight delay propagation is a hard problem: few studies have investigated delay propagation by considering the interdependence of delay time series, and none has explored delay propagation and congestion analysis through airport chains (n-order airport dependencies) in large, complex aviation networks. Therefore, our effort to produce dependency graphs that assess congestion risk and delay propagation to interconnected airports introduces a new approach to this stochastic problem. To the best of our knowledge, there is currently no solution able to calculate the individual risk (in terms of delay impact and the likelihood of a flight delay occurring) of interdependencies between airports in wide aviation networks, along with the cumulative risk of n-order airport dependency chains.

III. DEPENDENCY ANALYSIS METHODOLOGY
Our methodology extends a multi-risk dependency analysis previously developed in [13] to model automobile traffic flow in the UK transportation system. The same team has also analyzed the congestion interdependencies of ports and container ship routes in the maritime network infrastructure [14]. Our implementation is based on the CIDA tool [36], which we modified and extended to model and analyze aviation congestion interdependencies between airports.
The methodology uses five fundamental building blocks:
A. An algorithm that models historical air traffic data as an airport dependency graph.
B. A congestion delay calculation methodology for aircraft flights.
C. A likelihood calculation algorithm for graph connections.
D. An impact calculation algorithm that uses two different methods for calculating the delay impact.
E. A multi-risk dependency analysis methodology for assessing the risk of the graph's dependency paths.
Each building block is briefly presented below.

A. AIRPORT DEPENDENCY GRAPHS
In this methodology, we denote:
• $A$: the set of airports in the aviation network,
• $C$: the set of connections between airports, and
• $F$: the set of flights between airport nodes for each connection (i.e., the aircraft flights that connect these airports).
Dependencies are modeled as directed, weighted graphs $G = (V, E)$, where the nodes $V$ represent the airports of the network and the edges $E$ represent the connections between them (Fig. 1). The graph is directed to represent a flight dependency from one airport to another within the aviation network. An edge $A_x \to A_y$ depicts a connection $C_{x \to y}$ from airport $A_x$ to airport $A_y$, and each connection is related to many flights $F_i$, performed by various commercial carriers using different aircraft types.
Every day, scheduled flights depart from or arrive at each airport $A_x$, based on the submitted Flight Plans (FP) and the commercial carriers' Computerized Reservations Systems (CRS). Each aircraft has a unique Tail Number (TN), which identifies the individual aircraft and thereby its type/model, load capacity, and speed performance. An aircraft usually serves several flights to many airports in a single day. By modeling all possible connections, we create chains of multiple flight legs, forming n-order dependencies $A_0 \to A_1 \to A_2 \to \dots \to A_n$ of connected airports.
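The chain construction above can be sketched as a bounded depth-first enumeration of simple paths in the directed airport graph. This is a minimal Python sketch, not the tool's Java/Neo4j implementation: it assumes flights are supplied as hypothetical (origin, destination) pairs, and that a chain never revisits an airport.

```python
from collections import defaultdict


def build_graph(flights):
    """Directed airport graph: origin -> set of destinations with direct flights.

    `flights` is an iterable of (origin, destination) IATA pairs, a hypothetical
    input shape used only for illustration.
    """
    graph = defaultdict(set)
    for origin, dest in flights:
        graph[origin].add(dest)
    return graph


def dependency_chains(graph, start, max_order):
    """Yield every simple chain start -> ... -> A_k with 1 <= k <= max_order."""

    def extend(chain):
        if len(chain) - 1 >= max_order:  # chain already has max_order legs
            return
        for nxt in sorted(graph.get(chain[-1], ())):
            if nxt in chain:  # skip cycles such as JFK -> ATL -> JFK (assumption)
                continue
            longer = chain + [nxt]
            yield longer
            yield from extend(longer)

    yield from extend([start])


g = build_graph([("JFK", "ATL"), ("ATL", "ORD"), ("ORD", "DEN"), ("ATL", "JFK")])
for chain in dependency_chains(g, "JFK", 3):
    print(" -> ".join(chain))
```

The number of chains grows roughly exponentially with the order n, which is why such an enumeration is usually capped at a small maximum order.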

B. CONGESTION DELAY CALCULATION
Based on the FP and CRS, every flight has a scheduled departure time and a scheduled arrival time. Arrival performance is based on the aircraft's arrival time at the airport gate (in-block time). Departure performance is based on the aircraft's departure time from the gate (off-block time). We define congestion delay as an arrival delay exceeding 15 minutes beyond the scheduled in-block time, whatever its cause (air carrier, extreme weather, NAS, late-arriving aircraft, etc.).
VOLUME 8, 2020
For each tail number TN performing a connection flight $F_i$, the congestion delay is split into an arrival part and a departure part. For every flight $F_i$, we calculate the arrival congestion delay $CA$ and the departure congestion delay $CD$.

C. LIKELIHOOD CALCULATION
Each relationship is assigned a likelihood value, which expresses how likely the route is to be delayed or congested. Intuitively, this value is a probability, from which we can make predictions about each airport's congestion state at different times. For each connection $C_{x \to y}$ from an airport $A_x$ to an airport $A_y$, every flight $F_{x \to y}$ is rated either as "Good" or as "Bad". The connection likelihood $L_{x \to y}$ is calculated from all individual flights $F_i$ departing from airport $A_x$ and arriving at airport $A_y$.
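The per-flight rating and the connection likelihood can be sketched as follows. This is an assumption-laden sketch, since the exact $CA$/$CD$ equations and the Good/Bad rule are not reproduced in this text: a flight is rated "Bad" here when its arrival delay is 15 minutes or more (the on-time convention used in Section III-D), and $L_{x \to y}$ is taken as the share of "Bad" flights on the connection. The `Flight` record is hypothetical.

```python
from dataclasses import dataclass

ON_TIME_THRESHOLD_MIN = 15  # flights less than 15 min late count as on time


@dataclass
class Flight:
    origin: str
    dest: str
    arr_delay_min: float  # actual minus scheduled (CRS) arrival time, in minutes


def is_bad(flight):
    """Assumed rule: 'Bad' when the flight arrives 15 minutes late or more."""
    return flight.arr_delay_min >= ON_TIME_THRESHOLD_MIN


def connection_likelihood(flights, origin, dest):
    """Assumed L_{x->y}: relative frequency of 'Bad' flights on origin -> dest."""
    on_route = [f for f in flights if f.origin == origin and f.dest == dest]
    if not on_route:
        return 0.0  # no flights observed on this connection
    return sum(is_bad(f) for f in on_route) / len(on_route)


flights = [Flight("JFK", "ATL", d) for d in (0, 20, 45, 10)]
print(connection_likelihood(flights, "JFK", "ATL"))
```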

D. IMPACT CALCULATION
Each flight $F_{x \to y}$ is assigned an impact value. This metric expresses how severe the congestion delay is and how much it will affect a flight connection's punctuality and the airport's operational efficiency. Since no standard is available in the US for evaluating arrival delay time and its socio-economic impact on passengers, we propose two different methods for impact calculation. The first (Min-Max method) is proportional: for each connection, it determines the minimum and maximum delay performance and rescales this range to a 10-step scale of equal-size timeframes. The method therefore evaluates the relative delay performance of each connection, since the upper and lower range limits are derived from the actual performance data of all flights on that connection during the period examined. For example, if all aircraft on a connection arrived on time or were never delayed by more than one hour, the scale is stretched so that a one-hour delay receives the maximum impact. Conversely, for a connection whose flights are consistently delayed (sometimes by more than 15 hours), impact is evaluated over the range between that connection's minimum and maximum delays. As a result, in this method impact values are not tied to fixed delay intervals, and the maximum impact may represent different delay performances on different connections.
The second method (Standard deviation timeframes method) is based on fixed deviation time intervals: it assigns the lowest impact value (I = 1) when a flight arrives on time, and the highest when the arrival delay exceeds 900 min (i.e., more than 15 hours late). In this rating, the delay timeframes reflect travelers' tolerance of delays, because passenger dissatisfaction intensifies as the duration of a flight delay increases. This method applies the same objective criteria to all US domestic flights, based on the deviation in minutes from the scheduled arrival time. Since air travelers use aviation as the fastest means of transportation, they expect to reach their destination on schedule. While they may tolerate delays of less than 1-2 hours, longer delays may harm the airline's reputation, and dissatisfied passengers may claim compensation. The following paragraphs describe the impact calculation of the proposed methods in more detail.

1) MIN-MAX METHOD
For each connection $C_{x \to y}$ from an airport $A_x$ to an airport $A_y$, a congestion delay may occur during either the departure or the arrival phase, and its impact is calculated from the best-case and worst-case congestion delays of the connection. The minimum and maximum departure congestion delays are calculated over all flights departing from airport $A_x$ to airport $A_y$. Likewise, the minimum and maximum arrival congestion delays are calculated over all flights arriving at airport $A_y$ from airport $A_x$.
Based on the $MinDelay_{x \to y}$ and $MaxDelay_{x \to y}$ values, an impact scale from 1 to 10 is built in equal timeframes, where the maximum impact corresponds to the maximum congestion delay observed. Each connection flight $F_{x \to y}$ is assigned the impact value of the timeframe into which its $CongestionDelay_{x \to y}$ falls.
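The Min-Max scaling just described can be sketched as equal-width binning of a connection's own delay range. This is a minimal sketch under stated assumptions: bins are half-open, the connection's maximum delay is placed in the top bin, and a connection whose minimum and maximum delays coincide gets impact 1 (a choice the text does not specify).

```python
import math


def minmax_impact(delay, min_delay, max_delay, steps=10):
    """Impact 1..steps from equal-width bins over [min_delay, max_delay]."""
    if max_delay <= min_delay:
        return 1  # degenerate range: every flight had the same delay (assumed)
    width = (max_delay - min_delay) / steps
    level = math.floor((delay - min_delay) / width) + 1
    return max(1, min(steps, level))  # clamp so max_delay lands in the top bin


# The same 30-minute delay scores very differently on two connections:
print(minmax_impact(30, 0, 30))    # connection whose worst delay is 30 min
print(minmax_impact(30, 0, 1000))  # connection whose worst delay is 1000 min
```

The two calls reproduce the fairness issue discussed for Fig. 2: the score depends on the connection's own delay range, not on the absolute delay.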

2) STANDARD DEVIATION TIMEFRAMES METHOD
We calculate the delay impact over standard deviation timeframes, based on the difference between the actual arrival time and the scheduled CRS arrival time (Tab. 1). If the arrival delay is less than 15 min, the flight is considered on time and the impact takes its minimum value (I = 1).
At the other extreme, if a flight arrives more than 900 min (i.e., more than 15 hours) after its scheduled arrival time, the impact takes its maximum value (I = 10), indicating an unacceptable delay for an air transportation service.
The interval steps are presented in Tab. 1. Due to the lack of standards for setting arrival delay intervals, we spread arrival delays of up to 180 minutes over the first half of the impact scale, since after a 3-hour delay compensation liabilities may arise, depending on the applicable aviation law. Thereafter, the time intervals increase by a multiplier of 1.5 until the maximum impact value is reached.
Neither method takes cancellations into account, although a flight leg may be canceled due to previous congestion delays. Having calculated the impact of each individual flight, we evaluate the total impact of a connection $C_{x \to y}$ from airport $A_x$ to airport $A_y$ as the average impact of the flights $F_i$ performed on that connection by various commercial carriers using different aircraft types, as identified by their TN.
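A sketch of the standard deviation timeframes method and of the per-connection average follows. Tab. 1 is not reproduced in this text, so the interval bounds below are hypothetical: only 15, 180 and 900 min come from the text, levels 2-5 split 15-180 min evenly (an assumption), and 270, 405 and 607.5 assume the bounds grow by a factor of 1.5 after 180 min, capped at 900. Substitute the actual Tab. 1 values before use.

```python
import bisect
from statistics import mean

# HYPOTHETICAL upper bounds (minutes) for impact levels 1..9; level 10 is >= 900.
# Only 15, 180 and 900 come from the text; the other values are illustrative.
ILLUSTRATIVE_BOUNDS = [15, 56.25, 97.5, 138.75, 180, 270, 405, 607.5, 900]


def static_impact(delay_min, bounds=ILLUSTRATIVE_BOUNDS):
    """Impact 1 (less than 15 min late) up to 10 (900 min late or more).

    A delay equal to a bound falls into the higher level (an assumption).
    """
    return bisect.bisect_right(bounds, delay_min) + 1


def connection_impact(delays_min, bounds=ILLUSTRATIVE_BOUNDS):
    """Connection impact: mean of per-flight impacts (cancellations excluded upstream)."""
    return mean(static_impact(d, bounds) for d in delays_min)


print([static_impact(d) for d in (0, 14, 200, 1000)])
print(connection_impact([0, 1000]))
```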

E. CONGESTION-DEPENDENCY ANALYSIS
The product of the impact and likelihood values gives the delay risk $R_{x \to y}$ of a connection $C_{x \to y}$ from an airport $A_x$ to an airport $A_y$:

$$R_{x \to y} = L_{x \to y} \cdot I_{x \to y} \qquad (4)$$

Potential congestion delay is transferred from the previous connection to the next flight leg, as the late arrival of one flight may propagate to the late departure of the next scheduled flight of the same aircraft. To calculate the dependency risk of a delay propagated through a series of airports, we use the following method. Let $A_0 \to A_1 \to \dots \to A_n$ be a chain of n-th order airport dependencies based on specific aircraft routes, where $L_{A_0,\dots,A_n}$ is the likelihood of the n-th order cascading congestion and $I_{A_{n-1},A_n}$ is the impact of the $A_{n-1} \to A_n$ dependency. Following [11], [12], the chain likelihood is the product of the individual connection likelihoods, and the cascading risk $R_{A_0,\dots,A_n}$ of the chain due to the n-th order dependency is computed as

$$R_{A_0,\dots,A_n} = L_{A_0,\dots,A_n} \cdot I_{A_{n-1},A_n} = \left(\prod_{i=1}^{n} L_{A_{i-1},A_i}\right) \cdot I_{A_{n-1},A_n} \qquad (5)$$

The cumulative dependency risk, denoted $DR_{A_0,\dots,A_n}$, considers the overall risk exhibited by all the critical infrastructures (here, airports) in the sub-chains of the n-th order dependency. It represents the overall risk produced by the n-th order dependency and is defined as

$$DR_{A_0,\dots,A_n} = \sum_{i=1}^{n} R_{A_0,\dots,A_i} = \sum_{i=1}^{n} \left(\prod_{j=1}^{i} L_{A_{j-1},A_j}\right) \cdot I_{A_{i-1},A_i} \qquad (6)$$

Equation (6) computes the overall dependency risk as the sum of the dependency risks of the affected nodes in the chain, due to delay incidents realized at the origin airport $A_0$ of the dependency chain. Interested readers may refer to [11] and [12] for additional details about dependency risk estimation.
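Equations (4)-(6) translate directly into a few lines of code. In this sketch, a chain $A_0 \to \dots \to A_n$ is represented by two hypothetical parallel lists: the connection likelihoods $L_{A_{i-1},A_i}$ and the connection impacts $I_{A_{i-1},A_i}$, in chain order.

```python
from math import prod


def connection_risk(likelihood, impact):
    """Eq. (4): R_{x->y} = L_{x->y} * I_{x->y}."""
    return likelihood * impact


def chain_risk(likelihoods, impacts):
    """Eq. (5): product of all connection likelihoods times the last impact."""
    if len(likelihoods) != len(impacts) or not likelihoods:
        raise ValueError("need one likelihood and one impact per connection")
    return prod(likelihoods) * impacts[-1]


def cumulative_risk(likelihoods, impacts):
    """Eq. (6): sum of chain risks of the prefixes A0->A1, ..., A0->...->An."""
    if len(likelihoods) != len(impacts):
        raise ValueError("need one likelihood and one impact per connection")
    return sum(chain_risk(likelihoods[:i], impacts[:i])
               for i in range(1, len(likelihoods) + 1))


# Two-leg chain A0 -> A1 -> A2 with L = (0.5, 0.25) and I = (4, 8):
print(chain_risk([0.5, 0.25], [4, 8]))       # 0.5 * 0.25 * 8
print(cumulative_risk([0.5, 0.25], [4, 8]))  # 0.5 * 4 + 0.5 * 0.25 * 8
```

Because each extra leg multiplies in a likelihood of at most 1, the contribution of distant airports shrinks along the chain, which is what lets the cumulative risk rank long chains meaningfully.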

IV. DATA SET DETAILS & VALIDATION
Since 2010, the US Department of Transportation (DOT) has published air traffic data sets on its website [15]. We collected US flight data for two consecutive years, 2018 and 2019, focusing our analysis on July and August. These months have the highest traffic due to the summer holiday season, while arrival on-time performance is worse than in the other months of the year. Moreover, these summer months are more likely to suffer from late-arriving aircraft and air carrier delays, in addition to weather delays, reported either as extreme weather events or indirectly as National Aviation System (NAS) delays [37]. Therefore, July and August better reflect aviation stakeholders' efficiency in managing heavy airport traffic. We refrained from using 2020 data, since in summer 2020 aviation traffic shrank dramatically due to the Coronavirus global health crisis (Covid-19 pandemic), and on-time performance consequently improved significantly [15].
From the data set retrieved from the BTS database, we used the following information in our experiments:
To ensure the quality of the dataset, we removed rows with inconsistent entries, such as flights lacking arrival time, arrival delay, or actual elapsed time information. These flights were considered canceled, since their arrival reporting fields were missing.
In addition, canceled flights reported in the dataset were excluded from our experiments, since they could not cause any congestion at airports, although these cancelations may have occurred due to enormous delays on previous flight connections, which sometimes exceeded 1440 minutes (i.e., 24 hours). The validated data used for each month in our experiments are presented in Tab. 2.
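The two cleaning rules above (drop reported cancellations, and treat rows with missing arrival fields as canceled) can be sketched as a row filter. The column names used here (`CANCELLED`, `ARR_TIME`, `ARR_DELAY`, `ACTUAL_ELAPSED_TIME`) are assumptions modeled on BTS-style CSV exports and should be checked against the actual download; rows are plain dicts such as those produced by `csv.DictReader`.

```python
# Assumed BTS-style column names; verify against the actual export.
REQUIRED_ARRIVAL_FIELDS = ("ARR_TIME", "ARR_DELAY", "ACTUAL_ELAPSED_TIME")


def _missing(value):
    """True for absent or blank fields (a numeric 0 is a valid value)."""
    return value is None or str(value).strip() == ""


def clean_rows(rows):
    """Keep completed flights with full arrival reporting."""
    kept = []
    for row in rows:
        if float(row.get("CANCELLED") or 0) == 1.0:
            continue  # cancellation reported in the dataset
        if any(_missing(row.get(col)) for col in REQUIRED_ARRIVAL_FIELDS):
            continue  # missing arrival data: treated as a canceled flight
        kept.append(row)
    return kept
```

With a CSV export, `clean_rows(csv.DictReader(f))` would return the validated rows for one month.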

A. SOFTWARE TOOL
The tool was developed in Java using the Neo4j graph database [38]. It took as input the collected US flight data for July and August of the two consecutive years 2018-2019 and modeled all US airports as nodes and flight connections as edges. For each connection of the modeled graph, all flights performed by the various aircraft were processed and assigned an impact and a likelihood value according to the presented methodology.
The tool generated all dependency chains for the worst-performing aircraft and for each airport and flight route. The output was then imported into the Neo4j graph database for risk dependency chain analysis.
We ran the experiments with both proposed methods for calculating impact and connection delay risk, to evaluate which method is better suited to congestion delay analysis. The results analyzed below involve the busiest US airports, which we denote by their three-letter IATA codes.
For readers unfamiliar with IATA codes, all US airports discussed in this subsection are listed in Appendix A. Table 9 in Appendix A gives each airport's official name and the main city served, along with its annual passenger traffic, which indicates the airport's importance in the aviation network.

B. IMPACT CALCULATION METHODS COMPARISON
The first step of our analysis was to present the differences between the two proposed impact calculation methods. Fig. 2 shows the variations in average risk between the Static Impact method (i.e., the standard deviation timeframes method) and the Min-Max Impact method for all connections arriving at New York's JFK airport. JFK is one of three airports serving New York City and is connected to 64 other US airports by domestic flights. The average risk of each connection to JFK is the total risk accumulated by all flights on that connection divided by the number of flights; it thus denotes each connection's tendency toward delays.
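The per-connection average used for Fig. 2 (total risk of a connection's flights divided by its number of flights) can be sketched as below. The input shape, one hypothetical (origin, destination, risk) tuple per flight, is an assumption for illustration.

```python
from collections import defaultdict


def average_risk_into(flight_risks, hub):
    """Average delay risk of every connection arriving at `hub`.

    `flight_risks` holds one (origin, dest, risk) tuple per flight, a
    hypothetical input shape. Returns {origin: total risk / number of flights}.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for origin, dest, risk in flight_risks:
        if dest != hub:
            continue  # only connections arriving at the hub are averaged
        totals[origin] += risk
        counts[origin] += 1
    return {origin: totals[origin] / counts[origin] for origin in totals}


flights = [("ATL", "JFK", 0.5), ("ATL", "JFK", 0.25), ("ORD", "JFK", 1.0), ("JFK", "ATL", 2.0)]
print(average_risk_into(flights, "JFK"))
```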
As the graph in Fig. 2 shows, although the Min-Max method provides a differential analysis of the deteriorations and improvements of each airport connection examined, it can be relatively unfair, since the impact values it assigns do not correspond to the same deviation from the scheduled arrival time. For example, I = 10 may indicate a 30-min delay on a connection that otherwise has almost no delays, while the same impact value may be attributed, on a connection with major delays, to a delay of more than 1000 minutes.
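To make the fairness argument concrete, the sketch below contrasts a static scale with a per-connection min-max scale. Both functions are assumptions for illustration: the 15-minute band width, the 0-10 range, and the linear min-max mapping are not taken from the paper's definitions. Under min-max scaling, the largest delay on each connection receives the top impact, however long that delay is.

```python
def static_impact(delay_min, band_min=15, max_impact=10):
    """Hypothetical static scale: same thresholds for every connection,
    one impact level per `band_min` minutes of delay, capped at `max_impact`."""
    if delay_min <= 0:
        return 0
    return min(max_impact, 1 + int(delay_min // band_min))


def minmax_impact(delay_min, connection_delays, max_impact=10):
    """Hypothetical min-max scale: map the delay linearly onto [0, max_impact]
    using the minimum and maximum delay observed on the same connection."""
    lo, hi = min(connection_delays), max(connection_delays)
    if hi == lo:
        return 0.0
    return max_impact * (delay_min - lo) / (hi - lo)


quiet = [0, 5, 10, 30]      # connection with almost no delays
congested = [0, 200, 1000]  # connection with major delays

for delay, conn in ((30, quiet), (1000, congested)):
    print(delay, static_impact(delay), minmax_impact(delay, conn))
```

Both delays receive a min-max impact of 10, while the static scale separates them (3 versus 10), which is the objectivity argument made for the Static method.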
On the other hand, the Static method is more objective, as it evaluates the impact on all connections with the same criteria, namely the deviation in minutes of the actual from the scheduled arrival time. As we concluded from all the experiments we performed, both methods were able to distinguish highly congested airports, reproduce the ranking of the worst airports, and produce similar aggregated risk values for heavy-traffic connections.
However, for the rest of our analysis and graph production, we decided to use the Static method, as it is more objective for calculating airport delay performance.

C. AIRPORT CONGESTION RISK ANALYSIS
The results of the average risk analysis for the 30 busiest US airports are presented graphically in Fig. 3. We depict the average risk of heavy-traffic airport connections during the peak traffic months (July-August) of 2019. The graph is rendered in GIS format; the reported aviation hubs served 788 domestic flight connections in total. Red lines represent the connections with a high average delay risk (>0.6), orange lines those with an average risk between 0.4 and 0.6, and green lines those with an average risk below 0.4.
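The color coding of Fig. 3 amounts to a simple thresholding of each connection's average risk. Since the text does not say which band the boundary values 0.4 and 0.6 belong to, the sketch assumes both fall in the orange band.

```python
def risk_band(avg_risk):
    """Map a connection's average delay risk to the Fig. 3 color bands
    (boundary values 0.4 and 0.6 are assumed to be orange)."""
    if avg_risk > 0.6:
        return "red"
    if avg_risk >= 0.4:
        return "orange"
    return "green"


# Average risks quoted in the text for the worst and best connections.
print(risk_band(1.33), risk_band(0.06))  # CLT->PDX, SLC->TPA
```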
For the examined period, these airports handled 439,846 domestic flights, accounting for 34% of all national flights reported. In other words, 8% of US airports served one-third of commercial domestic flights. The most interconnected airports in Fig. 3 are Atlanta (ATL, 160 connections), Dallas (DFW, 179), Chicago (ORD, 174), and Denver (DEN, 165), while the least interconnected of the busiest airports is Portland Airport (PDX), with 47 domestic connections.
In Fig. 3, the airports with the highest average risk include BOS, EWR, FLL, JFK, LGA, and MCO. Congestion delays appear to cluster at East Coast airports. Specifically, delay risk is higher for air traffic departing from southeastern airports (MIA, FLL, MCO) toward northeastern airports such as those in New York, Philadelphia, and Washington, and vice versa.
The highest average risk value is 1.33, for the CLT→PDX connection (the worst connection, with a 67% delay probability), while the lowest is 0.06, for the SLC→TPA connection (the best connection, with only a 5% delay probability). The average likelihood value is 0.37 for both summer months, which means that 37% of flights are likely to arrive delayed (more than 15 min later than the scheduled arrival) during July and August.
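The likelihood figures above follow from counting late arrivals. A minimal sketch, assuming a flight counts as delayed when it arrives more than 15 min after schedule (as stated in the text; BTS's own on-time statistics count a flight as late at 15 minutes or more) and that the network-wide value is the plain mean of per-connection likelihoods. Airport names and delay lists are hypothetical.

```python
from statistics import mean

DELAY_THRESHOLD_MIN = 15  # "delayed" = more than 15 min after scheduled arrival


def delay_likelihood(arrival_delays):
    """Share of flights on a connection that arrived late."""
    late = sum(1 for d in arrival_delays if d > DELAY_THRESHOLD_MIN)
    return late / len(arrival_delays)


# Hypothetical arrival delays (minutes, negative = early) per connection.
connections = {
    ("A", "B"): [0, 20, 45, -3],
    ("C", "D"): [-5, 2, 16, 10],
}
likelihoods = {conn: delay_likelihood(d) for conn, d in connections.items()}
network_avg = mean(likelihoods.values())
print(likelihoods, network_avg)
```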
Furthermore, we examined the 30 busiest US airports for i) incoming congestion risk, based on arrival delays, and ii) outgoing congestion risk, based on departure delays, for July and August 2018. Afterward, we compared the 2018 data with the same period in 2019; the results are presented graphically in Fig. 4 and 5, respectively. The data are also provided in tabular form in Appendix B (Tab. 10). Fig. 4 depicts the average congestion risk for the 30 busiest airports in 2018. For each airport, we present its inbound delay risk (the average arrival delay risk of all flight connections arriving at the airport) and its outbound delay risk (the average departure delay risk of all flights departing from the airport), plotted as blue and orange bars, respectively. On the secondary (right-hand) axis, the red line shows the number of connections with other domestic airports, which represents the airport's degree centrality. In graph theory, the degree centrality of a node is the number of edges incident to it; airports with high degree centrality are generally more central and of greater importance in the aviation network.
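The per-airport quantities plotted in Fig. 4 and 5 can be sketched as follows. The per-connection arrival and departure risks are hypothetical inputs, and degree centrality is taken here as the number of distinct airports linked to an airport in either direction, which is one reasonable reading of "number of connections" rather than the tool's confirmed definition.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical inputs: (origin, dest) -> (arrival delay risk, departure delay risk)
CONNECTIONS = {
    ("ATL", "EWR"): (0.45, 0.30),
    ("EWR", "ATL"): (0.25, 0.40),
    ("ATL", "MCO"): (0.20, 0.35),
    ("MCO", "EWR"): (0.60, 0.50),
}


def airport_profiles(connections):
    """Inbound risk = mean arrival risk of connections arriving at the airport;
    outbound risk = mean departure risk of connections leaving it;
    degree = number of distinct neighbouring airports."""
    inbound = defaultdict(list)
    outbound = defaultdict(list)
    neighbours = defaultdict(set)
    for (origin, dest), (arr_risk, dep_risk) in connections.items():
        inbound[dest].append(arr_risk)
        outbound[origin].append(dep_risk)
        neighbours[origin].add(dest)
        neighbours[dest].add(origin)
    return {
        airport: {
            "inbound": mean(inbound[airport]) if inbound[airport] else 0.0,
            "outbound": mean(outbound[airport]) if outbound[airport] else 0.0,
            "degree": len(neighbours[airport]),
        }
        for airport in neighbours
    }


def delay_amplifiers(profiles):
    """Airports whose outbound risk exceeds their inbound risk."""
    return sorted(a for a, p in profiles.items() if p["outbound"] > p["inbound"])


profiles = airport_profiles(CONNECTIONS)
print(delay_amplifiers(profiles))
```

In this toy input EWR absorbs more risk than it passes on, so it is not listed, mirroring the distinction the text draws between airports that amplify and airports that mitigate delays.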
As Fig. 4 shows, some airports create more departure delays than the arrival delays they receive; for these airports, the outbound average risk exceeds the inbound average risk. Such airports include ATL, CLT, DAL, DEN, DFW, DTW, MCO, MDW, MIA, ORD, PHX, SEA, and SLC. On the other hand, some airports manage to mitigate arrival delays, such as BOS, EWR, JFK, IAD, and SFO. These airports handle their operations better, absorb arrival delays, and propagate fewer delays to downstream connections. Overall, the best-performing airports in avoiding delay risk are ATL, DTW, IAH, MSP, and SLC, which have the lowest average inbound and outbound risk. At the other end, the most congested airports are EWR, FLL, and MCO.
In Fig. 5, the same 30 busiest airports are presented for 2019, with blue and orange bars for inbound and outbound delay risk, respectively, while the red line shows the number of connections with other airports. Comparing Fig. 5 with Fig. 4, we see an increase in connections for the majority of airports in 2019 versus 2018, in line with the increase in flights reported by BTS. Moreover, the graph shows that all airports improved their on-time performance and decreased their delay risk versus the previous year, even though connections to most airports increased on average by 2.5% (2797 more flights vs. 2018). The set of airports where departure delays exceed arrival delays changed slightly. The airports that manage to mitigate arrival delays are BOS, EWR, FLL, JFK, SAN, SFO, and TPA. The best-performing airports in avoiding delay risk remain ATL, DTW, MSP, and SLC, while the most congested ones remain EWR, FLL, and MCO. Despite the improvement in on-time performance, the airports that propagated the most delays in the aviation network remain the same.
To reveal the most congested routes, we sorted the airport connections in descending order of accumulated total risk, where risk is the product of delay impact and delay likelihood, as shown in equation (2).
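Ranking connections by total risk, with each flight's risk taken as the product of impact and likelihood as in equation (2), reduces to a group-and-sort. The (impact, likelihood) pairs below are hypothetical, and the helper name is illustrative only.

```python
from collections import defaultdict

# Hypothetical per-flight inputs: (origin, dest, impact, likelihood).
FLIGHTS = [
    ("EWR", "MCO", 4, 0.5),
    ("EWR", "MCO", 4, 0.5),
    ("CLT", "PDX", 2, 0.5),
    ("SLC", "TPA", 1, 0.25),
]


def rank_connections_by_total_risk(flights, top_k):
    """Sum impact * likelihood per connection and return the top_k connections."""
    totals = defaultdict(float)
    for origin, dest, impact, likelihood in flights:
        totals[(origin, dest)] += impact * likelihood
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


print(rank_connections_by_total_risk(FLIGHTS, top_k=2))
```

Summing rather than averaging favors connections that are both risky and busy, which is why total risk surfaces heavy-traffic routes.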
In Fig. 6, we present the average risk of the eighty most congested connections between airports, i.e., those with the highest average delay risk during the peak traffic months (July-August) of 2018; the detailed data are given in Tab. 11. The same graph was also produced for the same period in 2019: in Fig. 7, we exhibit the most congested connections with heavy traffic and high average delay risk, with the detailed data given in the accompanying table.

D. DEPENDENCY CHAIN ANALYSIS
Our tool generated all dependency chains for each aircraft's tail number (TN) and for each route traveled within the same day, with airports as nodes and flights as edges. All data were imported as CSV files into the Neo4J graph database for risk path calculation. First, we used our tool to identify the aircraft chains that accumulated the highest dependency risks. The results for 2018 and 2019 are presented in Tables 3 and 5, respectively, which list: i) the worst aircraft routes visiting 3-6 airports in a single day; ii) their dependency risk values; and iii) the accumulated arrival delays, in minutes.
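Building one chain per aircraft and day amounts to grouping flights by tail number and date and ordering the legs by departure time. A minimal sketch, assuming each record carries a departure timestamp and that consecutive legs are contiguous (the destination of one leg is the origin of the next); grouping by departure date is a simplification, since a heavily delayed leg may in practice land the following day. All records are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical records: (tail_number, departure_time, origin, dest, arrival_delay_min).
FLIGHTS = [
    ("TN1", datetime(2019, 7, 1, 8, 0), "ATL", "EWR", 20),
    ("TN1", datetime(2019, 7, 1, 12, 0), "EWR", "MCO", 40),
    ("TN1", datetime(2019, 7, 1, 6, 0), "MCO", "ATL", 0),
    ("TN2", datetime(2019, 7, 1, 9, 0), "ORD", "LGA", 75),
]


def daily_chains(flights):
    """Return {(tail, date): (airport_sequence, accumulated_arrival_delay_min)}."""
    legs = defaultdict(list)
    for tail, dep, origin, dest, delay in flights:
        legs[(tail, dep.date())].append((dep, origin, dest, delay))
    chains = {}
    for key, day_legs in legs.items():
        day_legs.sort()  # order legs by departure time
        # Assumes contiguous legs: first origin, then every destination in turn.
        route = [day_legs[0][1]] + [leg[2] for leg in day_legs]
        chains[key] = (route, sum(leg[3] for leg in day_legs))
    return chains


chains = daily_chains(FLIGHTS)
print(chains)
```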
Although an aircraft can fly to up to 9 destinations in a day, depending on distance and airport congestion, the results indicated that the worst-performing aircraft were those that visited congested airports, usually with fewer flight legs. The findings also showed that cascading effects beyond the fifth order rarely affect subsequent infrastructures: since the product of likelihoods tends to zero, so does the cascading congestion risk after 5 flight legs.
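The decay argument can be checked numerically. The sketch assumes the cumulative dependency risk of an n-order chain takes the form commonly used in dependency-graph risk analysis, R = sum over i of (L1·L2·...·Li)·Ii, where leg i contributes its impact weighted by the product of the likelihoods of legs 1 to i. This is an assumption, not a transcription of the paper's equation, and the (likelihood, impact) values are hypothetical.

```python
def order_contributions(legs):
    """legs: ordered (likelihood, impact) pairs, one per flight leg.
    The i-th contribution is impact_i times the product of likelihoods 1..i."""
    contributions = []
    chain_likelihood = 1.0
    for likelihood, impact in legs:
        chain_likelihood *= likelihood
        contributions.append(chain_likelihood * impact)
    return contributions


def cumulative_dependency_risk(legs):
    """Sum of the per-order contributions along the chain."""
    return sum(order_contributions(legs))


# Hypothetical 7-leg chain with a constant 40% delay likelihood and impact 5.
chain = [(0.4, 5)] * 7
for order, c in enumerate(order_contributions(chain), start=1):
    print(f"order {order}: {c:.4f}")
```

With these values the first leg contributes 2.0, while the sixth and seventh add only about 0.02 and 0.008, which is the vanishing tail the text describes.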
The top 10 worst aircraft dependency routes for July and August 2018 are presented in Tab. 3. The table highlights aircraft that accumulate higher delay risk when flying between congested airports (e.g., PHL↔SJU, JFK/EWR↔MCO, ORD↔LGA). In some cases, delays exceeded 24 hours and the aircraft finally arrived the following day. For example, TN N410UA was delayed by 483 minutes on the first leg and by another 1190 minutes on the next, for a total of 1673 minutes of delay; clearly, the aircraft could not fly to other destinations on the same day because of the accumulated delays. For TN N342AN, the delays were shorter (679 min); however, the congestion likelihood of the connections involved was significantly higher on each flight leg, so the dependency risk of this flight chain took a higher value. To further analyze what went wrong with these worst-performing dependency routes, we provide a causal delay analysis: for all TNs presented in Tab. 3, the accumulated arrival delays and their causes are given in Tab. 4. The results in Tab. 4 indicate that most delays were attributed to late-arriving aircraft, followed by air carrier deficiencies, while extreme weather delays were negligible and NAS delays less important.
The top 10 worst dependency chain routes for the same summer period of 2019 are presented in Tab. 5, to compare congestion and delay risk performance versus the previous year. The worst two-leg flights accumulated fewer delay minutes than in the previous year, while the same busiest airports appear in the dependency chains. Comparing the worst dependency chains with more flight legs in Tab. 5, we notice a lower aggregated dependency risk than in the previous year, and likewise a lower sum of accumulated delays. However, in both years, the same congested hubs appear in the worst chains, such as the New York airports (EWR, JFK) and the Florida airports (MCO, MIA).
Tab. 6 provides the percentage breakdown of delay causes for each TN, to analyze the causes of delay on the worst-performing dependency aircraft routes in 2019. Most of the delays were attributed to late-arriving aircraft and air carrier deficiencies, while extreme weather and NAS delays introduced fewer disruptions in the aviation network.
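Cause percentages of this kind can be derived per tail number from the BTS on-time data, which attribute delay minutes to five cause categories (air carrier, extreme weather, NAS, security, and late-arriving aircraft; in the BTS files these appear as columns such as CarrierDelay, WeatherDelay, NASDelay, SecurityDelay, and LateAircraftDelay). The sketch below uses shortened, hypothetical dictionary keys for these categories and made-up minute totals.

```python
CAUSES = ("carrier", "weather", "nas", "security", "late_aircraft")


def cause_shares(minutes_by_cause):
    """Percentage of an aircraft's total attributed delay per cause."""
    total = sum(minutes_by_cause.get(cause, 0) for cause in CAUSES)
    if total == 0:
        return {cause: 0.0 for cause in CAUSES}
    return {cause: 100.0 * minutes_by_cause.get(cause, 0) / total for cause in CAUSES}


# Hypothetical summed delay minutes for one tail number over its daily chain.
shares = cause_shares({"carrier": 30, "late_aircraft": 60, "nas": 10})
dominant = max(shares, key=shares.get)
print(shares, dominant)
```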
Both dependency analysis experiments (Tab. 3 and 5) also include some of the worst-performing connections, shown in red in Fig. 3. Comparing the chains in both tables, we can distinguish the improved delay performance in the summer months of 2019 versus 2018: when delays are shorter, the likelihood of lateness is lower, and so is the accumulated dependency risk. Out of the 370 airports we examined, some appeared more often than others in congested routes and seemed to greatly affect the network by adding delays. Since the full set of computed data is too large to present in detail, indicative results for the worst-performing aircraft are presented.
In the last phase of our analysis, we used our tool to predict future airport congestion caused mainly by interdependencies, by extracting patterns and trends from the historic flight data analyzed. To calculate dependency risk, we used, for each connection, the average impact of all flights examined during the summer period of 2019. In the experiments, we distinguished two major airport categories: i) the busiest airports, as presented in Fig. 3, which serve 300-1500 flights per day; and ii) regional airports, which usually have lower daily traffic but, owing to summer seasonality, serve more flights than their yearly average.
To evaluate the worst dependency chains for the busiest airports, we used total risk and selected the eighty worst dependency paths, which are shown in the graph in Fig. 8.
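Selecting the worst dependency paths over the connection graph can be sketched as a depth-limited search that scores every path with a cumulative dependency risk and keeps the k highest. Assumptions: each edge carries a per-connection (average likelihood, average impact) pair as in the prediction step above, the path score is the likelihood-product accumulation commonly used in dependency-risk analysis, paths do not revisit an airport, and the 5-leg depth limit mirrors the observation that risk becomes negligible beyond the fifth order. The graph is a toy example.

```python
import heapq


def worst_paths(graph, max_legs=5, top_k=80):
    """graph: {origin: {dest: (likelihood, impact)}}.
    Returns the top_k simple paths (up to max_legs legs) by cumulative dependency risk."""
    results = []

    def extend(path, risk, chain_likelihood):
        if len(path) > 1:
            results.append((risk, tuple(path)))
        if len(path) - 1 == max_legs:
            return
        for nxt, (likelihood, impact) in graph.get(path[-1], {}).items():
            if nxt in path:  # skip airports already on the path
                continue
            new_likelihood = chain_likelihood * likelihood
            extend(path + [nxt], risk + new_likelihood * impact, new_likelihood)

    for start in graph:
        extend([start], 0.0, 1.0)
    return heapq.nlargest(top_k, results)


# Hypothetical per-connection (likelihood, impact) averages.
GRAPH = {
    "A": {"B": (0.5, 4), "C": (0.1, 1)},
    "B": {"C": (0.5, 4)},
}
print(worst_paths(GRAPH, top_k=2))
```

For the regional-airport ranking, the same search could score paths by an average instead of a total; that variant is likewise an illustrative choice, not the tool's documented behavior.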
In the graph in Fig. 8, one can distinguish the airports that can produce the greatest delays in the aviation network: ATL, EWR, DFW, ORD, LAX, MCO, and SFO. The connections presented in the graph are more likely to introduce delays into aircraft's scheduled flight routes. The red lines represent the worst dependency flight connections, with the highest cumulative risk, which are presented in detail in Tab. 7.
To evaluate the worst dependency chains for the regional airports, on the other hand, we used average risk (instead of the total risk used for the busiest airports) and again selected the eighty worst dependency paths, shown in the graph in Fig. 9. In this graph, we can distinguish the airports that can produce the greatest delays in the aviation network and act as hubs for the regional airports, propagating delays into the peripheral aviation network due to the summer seasonality of travel. These hubs are EWR, FLL, and MCO. The connections presented in the graph are more likely to introduce delays into aircraft's scheduled flight routes. The red lines represent the worst dependency connections, which are presented in detail in Tab. 8.
Finally, comparing the worst dependency paths in Tab. 7 and 8, one can notice a large difference in aggregated dependency risk between the paths involving connections of the busiest airports and those involving regional airports. This is expected, since large airports, which serve domestic traffic as the nation's hubs, may introduce significant delays into the network. However, they are better equipped to handle heavy traffic, minimize average delay risk, and remain resilient when unexpected delays occur, especially during the summer period.
As a result, their on-time performance is better than that of regional airports, which is reflected in connection chains with lower cumulative dependency risk values.

VI. CONCLUSION AND RESULTS EVALUATION
In this work, we propose a risk-based dependency method to analyze congestion in the aviation network. The methodology and the developed tool can assess the risk of delay incidents in airports and produce weighted risk dependency graphs, showing how a delay that occurred in one airport may affect other interconnected airports. Using real data collected from the US Bureau of Transportation Statistics, we analyzed how flight delay risk propagates through the aviation network. Based on historic flight performance data, we also provided predictions of congested connections and high dependency risk chains.
We were able to detect the worst airports in terms of their effect on the aggregated delay of an aircraft route, along with airports that perform better and mitigate delay propagation in the aviation network. The tool we developed for congestion analysis can be used to identify key airports that are prone to delays and have great influence on the network due to (i) their number of connections and (ii) their likelihood of congestion, as well as (iii) the airports most affected by delays that occurred at previous airports.
TABLE 11. Airport connections with higher average delay risk (2018), data related to Fig. 6.
Comparing the two consecutive years examined (July and August of 2018 and 2019), the simulation results indicated that delay risk was significantly mitigated in summer 2019 versus the previous year. The results were cross-validated against the BTS air travel consumer reports issued for the same period to verify the performance and efficiency of the developed tool.
In general, our tool can detect: i) the flights with the highest overall risk of becoming congested and creating a major impact through delays propagated to downstream flights in the aviation network; ii) the dependency paths with the highest overall impact for specific connections per calculation period (week, month, year); iii) the airports that create delays in the aviation network and exert a higher influence on other airports in terms of both impact (how much delay they introduce into other flight connections) and centrality (how many other flights they may affect); and iv) the worst n-order airport dependency chains.
Simulation results can aid airlines, operators, flight planners, and decision-makers in assessing the congestion risks of routes towards busy airports and in analyzing large-scale congestion scenarios. The model can also be used to run specific scenarios of interest to airlines concerning particular airport connections, including "what-if" scenarios that consider only delays affecting one or a few of the airports. By analyzing nth-order dependency paths, we can: i) identify which dependencies should have high priority for applying mitigation controls to reduce risk in the aviation network; ii) propose alternative connection paths; and iii) indicate flight connections to be avoided or rescheduled.

APPENDIX A
See Table 9.