Contagion Modeling and Simulation in Transport and Air Travel Networks During the COVID-19 Pandemic: A Survey

The COVID-19 pandemic has generated a huge volume of research from various disciplines, such as health sciences, social sciences, mathematical modeling, social network analysis, complex systems, decision-making processes, computer simulation, and economics, among many others. One of the key problems has been to understand the diffusion processes of the virus, which quickly spread worldwide through transport networks, mainly air flights. Almost two years after the start of the pandemic, it is necessary to collect and synthesize the existing work on this matter. This work focuses on studies related to the simulation of COVID-19 contagion through transport networks. In particular, we are especially interested in the different datasets and epidemiological models used. The search methodology consists of four exhaustive searches in Google Scholar covering articles dated between January 2020 and January 2021. Of the 1786 findings, we chose 54 articles related to COVID-19 contagion modeling and simulation through transport networks. The results show 30 different data sources for the collection of air flights and 11 additional sources for maritime and land transport. These datasets are usually complemented with other data sources, local and international, with demographic information, economic data, and pandemic traceability statistics. We also found 15 spread models of contagion, with the SEIR model being the most widely used, followed by mathematical-based risk models. This diversity of results validates the need for these types of compilation efforts since researchers do not have a single centralized repository to collect air flight data.


I. INTRODUCTION
It is not news to say that the coronavirus disease, identified in late 2019 and declared a pandemic in early 2020, changed the lives of people worldwide. Since the first months of 2020, numerous scientists and researchers around the world have focused on the study, understanding, and prevention of the phenomenon. In turn, the governments and public administrations of the various countries have developed different public policies to face the crisis [1].
The associate editor coordinating the review of this manuscript and approving it for publication was Derek Abbott.
VOLUME 9, 2021. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
The first outbreaks were identified in Wuhan, China, a city of more than 10 million inhabitants, which has the largest airport in the region and therefore handles many daily flights for commercial and tourist purposes [2], [3]. Air flights were quickly identified as high risk due to the high probability of contagion within them [4]-[6]. This risk was known, since there were already studies on the risks of contagion on airplanes with other diseases, such as tuberculosis [7]. Besides, China is the third largest country on the planet. By also considering international flights, it is easy to understand the rapid spread of the disease and how it reached Europe so quickly [8], Europe being one of the main tourist destinations of Chinese people. Despite the above, the first confirmed case of contagion outside of China was in Thailand, generated by 16 people who arrived by direct flight from Wuhan less than two weeks after the virus was discovered [9]. Thus, air traffic was considered one of the main causes of global contagion [10], [11]. Sooner or later, the virus would reach the other continents, and predictive simulations of when that would happen did not take long to appear [12], [13]. At the same time, some studies focused on identifying the most critical airports (that is, those posing the highest risk for the contagion and spread of the virus) [14], and various sanitary restrictions were proposed and implemented within some airports [15]. However, some early studies asserted that these measures would not be sufficient due to the incubation periods of the virus and the difficulties of traceability and early diagnosis [16]. In addition, some political models have also been considered aggravating factors for the proper management of the pandemic [17]. Due to the above, many countries closed their borders and decreed curfews and long quarantines. Meanwhile, many travelers had to resort to evacuation flights (not exempt from contagion risk) to return from their destinations [18], [19].
As a result of these measures, many researchers turned to studying the economic impact of the crisis and its effects on tourism [20], [21]. Many studies also appeared on the physical and mental health problems derived from the lockdown and social distancing [22], [23]. Other studies addressed the environmental effects (positive and negative) [24], [25] and changes in the habits and transportation of citizens [26], [27]. Furthermore, among the scientific community, there seems to be a relative consensus that neither quarantines nor the closure of airports alone solves a health crisis of this magnitude [28]. Moreover, after the opening of the borders, it is considered necessary to take special care of the health of passengers who enter a destination from other countries [29].
Due to the enormous amount and diversity of research carried out on the pandemic, its causes and effects, surveys and systematic reviews are helpful tools to gather, synthesize, and order existing scientific advances on specific topics. Some of these works appeared early, addressing contagion risks inside airplanes [30] or behavioral models of contagious diseases [31]. From computer science, there are surveys on big data and artificial intelligence techniques in the pandemic context [32]-[34], as well as other studies more focused on virus traceability: tracking apps [35], [36], and geolocated [37] and spatio-temporal [38] data analysis techniques, among others.
In this article, we present a survey of COVID-19 contagion modeling and simulation through transport networks. Although there are studies related to land transport networks, such as cars [39], public transport [40], road networks [41], or rail systems [42], we focus on air flight networks, as they are considered, as noted above, the most critical for the spread of the virus. Additionally, we consider some studies on maritime and land transport networks. This study focuses mainly on articles published during 2020.
This article attempts to shed light on how applied science currently seeks to model contagion networks at a pandemic level and simulate contagion spread processes through these networks. These models and simulations are relevant to establish public policies and make decisions related to risk management and prevention of possible outbreaks. However, as we shall see below, there is still no consensus on the most appropriate ways to address these problems.
Likewise, there is a wide variety of data sources with different formats and levels of accessibility, which make experimentation and comparison of models more complex in terms of sensitivity and trade-off analysis.
The rest of the article is structured as follows. Section II describes the methodology used for the collection and review of scientific articles. Section III presents the works related to the contagion modeling and simulation of COVID-19 through air transport networks. The datasets, epidemiological models, and main analysis methods used are surveyed and discussed. Section IV is devoted to presenting additional work related to other types of transportation networks. Finally, Section V discusses our main findings, Section VI presents some limitations of the study, and Section VII ends with the main conclusions of this work.

II. METHODOLOGY
The main aim of this work is to understand how applied science currently models epidemiological processes at a global level, using transport networks, in particular air flight networks.
This objective is achieved through a bibliographic review process carried out considering the following steps: research questions statement, searching process, data selection, data extraction, and data analysis. These steps are described below.

A. RESEARCH QUESTION STATEMENT
The research questions of this work are the following:
Q1. What are the main data sources for modeling (air) transport networks?
Q2. What are the main spread models used to simulate contagion processes on (air) transport networks?
These research questions will guide the following steps. The answers to these questions allow us to fulfill the main aim of this work and obtain the main conclusions.

B. SEARCHING PROCESS
The search for scientific articles was carried out through Google Scholar instead of databases such as WoS or Scopus so as not to exclude preprints or scientific reports. Indeed, several manuscripts were published in open access repositories such as arXiv or medRxiv, due to the urgency required by the context. In some cases, these kinds of documents were later published in journals or conferences. In such cases, it was decided to cite the most recent version, prioritizing peer review over the number of citations.
Four searches were carried out, for articles dated between January 1, 2020, and January 18, 2021. Table 1 illustrates the characteristics of each type of search. Searches 1 and 2 were made using the search terms "flight network"+"covid-19" and "flight"+"contagion"+"covid", respectively. The search was then continued recursively through the citations of the articles already found. That is, if a paper A is cited by another paper B, then B is also included in the search results. In turn, if B is cited by another article C, then C is also included in the search results, and the same process is repeated for C. This recursive process ends when there are no more citations to add. Note that the citations for each article are also provided by Google Scholar. With recursive searches 1 and 2, we obtained a total of 1473 search results.
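The recursive procedure described above amounts to a breadth-first traversal of the "cited by" relation. The following is only a minimal sketch under stated assumptions: the function name `snowball` and the toy dictionary `cited_by` are hypothetical, and in practice the citing articles would have to be obtained from Google Scholar rather than from an in-memory dictionary.

```python
from collections import deque


def snowball(seeds, cited_by):
    """Collect every paper reachable from the seeds by following 'cited by' links."""
    found = set(seeds)
    queue = deque(seeds)
    while queue:
        paper = queue.popleft()
        # cited_by[paper] lists the papers that cite `paper`.
        for citing in cited_by.get(paper, ()):
            if citing not in found:  # skip papers that were already collected
                found.add(citing)
                queue.append(citing)
    return found


# Hypothetical toy data: A is cited by B, B is cited by C, D is cited by A.
cited_by = {"A": ["B"], "B": ["C"], "C": [], "D": ["A"]}
results = snowball(["A"], cited_by)
```

The traversal stops when no new citing article is found, which matches the termination condition stated in the text. Paper D is not collected in this toy example because it is cited by A but does not cite it.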
For search 3, the 262 citations that the article [43] had received by January 18, 2021 were considered. We chose this article because of its early impact on the scientific community in terms of citations. In this case, a recursive search was not carried out since many articles that cited this article had, in turn, several hundred citations, thus increasing the search space too much. Furthermore, in a cursory review, these new results were already far from the research questions.
Finally, search 4 includes those articles obtained in a non-systematic way from isolated searches of relevant literature. In this way, 51 additional articles were included in the search results.

C. DATA SELECTION
For all search results, their titles and abstracts were considered. The selection criterion depends on whether or not the article refers to the main objective of this work. Table 1 shows the number of papers selected for each search. Among the first three searches, only three matches were detected between the selected articles. Thus, a total of 54 articles were selected. Some of the results not selected in these searches were used in the Introduction of this work.

D. DATA EXTRACTION
The 54 selected articles were subsequently downloaded and collected in a single database.
The following attributes were saved for each article: search number from which the article was found, DOI or URL of the article, title, type of transport considered, data sources, spread models, keywords, and optional comments related to the main variables and analysis methods used.
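As an illustration only, the attributes listed above could be stored in a record such as the one sketched below. The class name, field names, and the example values are all hypothetical placeholders, not the actual database schema or content used in this work.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ArticleRecord:
    search_number: int       # which of the four searches found the article
    doi_or_url: str
    title: str
    transport_type: str      # e.g. "air", "maritime", "land"
    data_sources: List[str] = field(default_factory=list)
    spread_models: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    comments: str = ""       # optional notes on main variables and analysis methods


# Placeholder record; every value here is invented for illustration.
record = ArticleRecord(
    search_number=1,
    doi_or_url="https://example.org/placeholder",
    title="Placeholder title",
    transport_type="air",
    data_sources=["OpenFlights"],
    spread_models=["SEIR"],
)
```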

E. DATA ANALYSIS
The 54 articles collected were read carefully, always seeking to answer the research questions initially proposed. These articles are extensively reviewed in the following sections. The data analysis is mainly qualitative. However, in addition to listing the different data sources and spread models considered, we also calculate their frequencies of use.
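The frequency computation mentioned above is a simple tally over the extracted attributes. A minimal sketch follows, assuming that each article contributes a (possibly empty) list of spread models; the list `models_per_article` contains invented values and does not reproduce the counts reported in this survey.

```python
from collections import Counter

# Invented example: one list of spread models per reviewed article.
models_per_article = [["SEIR"], ["SEIR", "network"], ["SIR"], []]

# Flatten the per-article lists and count how often each model appears.
model_counts = Counter(m for models in models_per_article for m in models)

# Models sorted from most to least frequently used.
ranking = model_counts.most_common()
```

The same tally can be applied to the data-source attribute to obtain the number of uses of each source.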

III. CONTAGION MODELING AND SIMULATION ON AIR FLIGHT NETWORKS
The articles found related to simulation or modeling of contagion in air flight networks are illustrated in Table 2. The results are ordered by the last name of the first author. For each paper, we focus on two aspects: data sources and spread models, which are detailed below.

A. DATASETS
There are some surveys on COVID-19 datasets. Although they are not focused on transport networks, they were retrieved by our search methodology and are worth mentioning. In [91], various open-source datasets are gathered, including some with mobility data. A research domain taxonomy is also proposed to identify the key features of open-source datasets in terms of their type, applications, and methods. Moreover, in [92], various COVID-19 data sources for researchers and epidemiologists are mentioned. The authors note that there is no international standard for the collection, documentation, and dissemination of COVID-19 data, so each country does it in its own way. This generates problems of usability, quality, interoperability, and completeness of the data. To face these difficulties, the authors propose policies and guidelines for sharing epidemiological data. The aim is to help early detection of epidemics, minimize deaths, and make informed decisions. Finally, they also propose standards for the data and infrastructures (hardware and software) necessary for open access data.
In what follows, we focus on the air transport network data sources used in COVID-19 spread research. It is necessary to clarify that air transport is governed by various public and private entities, which must coexist to establish regulations and minimum operating standards. Among the various international organizations, the International Civil Aviation Organization (ICAO) stands out, as it is the only one with international authority among the signatory states. Other organizations include the International Air Transport Association (IATA), a trade association representing airlines; the Civil Air Navigation Services Organization (CANSO), for air navigation service providers (ANSPs); and the Airports Council International (ACI), a trade association of airport authorities. Additional international trade groups are The Airline Cooperative (ACO), Air Transport Action Group (ATAG), International Association of Travel Agents Network (IATAN), and International Society of Transport Aircraft Trading (ISTAT).
The data sources regarding air transport networks are shown in Table 2. In the 50 investigations, 61 uses of air transport network data sources were recorded, corresponding to 30 different data sources. Table 3 shows the data sources used more than once, along with their access links.
Note that the most used data source (eight uses) is the one provided by the International Air Transport Association (IATA), through the World Air Transport Statistics (WATS). This source collects information on flight restrictions and the number of passengers traveling from one airport to another. In particular, the World Air Transport Statistics 2019 report released in February 2019 was of great value for the first investigations. In second place, with five uses, is OpenFlights, a website that provides datasets of airports, airlines, planes, and routes. These data are available under the Open Database License. It is closely followed by Flightradar24, a real-time global flight tracking service capable of tracking more than 180 000 flights at any given time, from more than 1200 airlines, flying to or from more than 4000 airports. Flightradar24 also offers a series of search filters, statistics, and weather information. Unlike OpenFlights, it is a paid service based on deferred subscriptions for data collection. Also with four uses is the World Bank. Overall, the World Bank website is a trusted and freely accessible repository of economic and financial data worldwide. In particular, we are interested in its efforts to provide data about the numbers of passengers carried by air transport. In this sense, the World Bank interacts directly with the ICAO, jointly providing data on registered carrier departures worldwide.
Next, we have four data sources, each used in three of the identified investigations. The Official Aviation Guide (OAG) provides global air traffic data through a database of millions of flights since 2004. This database is paid, with the possibility of a free trial. It includes not only information on flights but also on cancellations, boarding gates, and baggage. In addition, it provides a statistics service specialized in COVID-19.
OpenSky is another real-time global flight tracking service, like Flightradar24, but one which, like OpenFlights, offers open traffic data for research. In addition to a historical database and various datasets ready for download, OpenSky has an API (downloadable for the Python and Java programming languages) to retrieve live airspace information for research and non-commercial purposes. The third data source is OurAirports. It offers information on almost 67 thousand airports around the world, including departure and arrival flights. It also provides airport and flight datasets for free download, with millions of frequently updated records. Finally, VariFlight is another paid platform which, unlike Flightradar24, is not based on subscription services but on personalized payment services, depending on the data required.
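To make concrete how route datasets such as those offered by OpenFlights can be turned into a flight network, the following sketch builds a directed airport graph from comma-separated route rows. It is only an illustration under stated assumptions: the sample rows are invented, the column order (airline, airline ID, source airport, source airport ID, destination airport, destination airport ID, codeshare, stops, equipment) reflects our understanding of the OpenFlights routes file and should be checked against its current documentation, and the helper name `build_route_graph` is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Invented rows following the assumed column order of the routes file:
# airline, airline ID, source, source ID, destination, destination ID,
# codeshare, stops, equipment
SAMPLE_ROUTES = """\
XX,1,WUH,101,BKK,201,,0,320
XX,1,WUH,101,PEK,301,,0,738
YY,2,PEK,301,BKK,201,,0,333
YY,2,BKK,201,WUH,101,,0,333
"""


def build_route_graph(rows):
    """Map each source airport code to the set of destination airport codes."""
    graph = defaultdict(set)
    for row in rows:
        source, destination = row[2], row[4]
        graph[source].add(destination)
    return graph


graph = build_route_graph(csv.reader(io.StringIO(SAMPLE_ROUTES)))

# Number of distinct destinations served from each airport.
out_degree = {airport: len(dests) for airport, dests in graph.items()}
```

Route files of this kind only describe connectivity. Passenger volumes, which many of the surveyed studies rely on, have to come from other sources such as those described in this section.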
There are also four data sources, each used in two investigations. The Bureau of Transportation Statistics (BTS), part of the United States Department of Transportation, offers a series of datasets for download, related not only to aviation but also to maritime transport, highways, transit, rail, pipelines, bikes, and pedestrians, among others. Eurostat is the statistical office of the European Union, whose mission is to provide high-quality statistics and data on Europe. Its website offers databases on multiple themes, including different types of transportation, among them air transport. The World Factbook (by the CIA) also has some global data available on request. Lastly, FLIRT is a network analysis tool developed by the EcoHealth Alliance, which provides simulated passenger data from hundreds of airlines. FLIRT has been developed to identify where infected travelers and contaminated goods are likely to travel. The system calculates the connectedness between airports using passenger, cargo, and network data. The aim is to predict locations at risk of exposure to infected travelers and goods after an infectious disease outbreak has been detected.
Additionally, 21 other data sources used only once were found. Several of these data sources are local. From Australia, the Australian Bureau of Statistics (ABS) is used. From Canada, Statistics Canada. From China, Ding Xiang Yuan, an online Chinese community for healthcare professionals, and the Baidu Migration Big Data Platform, from Baidu, a Chinese multinational technology company specializing in Internet-related services and products and artificial intelligence. From South Korea, the website of Incheon International Airport, the largest airport in the country.
From the United States of America, the Automatic Dependent Surveillance-Broadcast (ADS-B) system, whose exchanged data are used to construct an airline transportation network; the Airline Origin and Destination Survey DB1B, collected by the Office of Airline Information of the BTS; the U.S. Department of Homeland Security; and the Federal Aviation Administration (FAA).
Besides, some specific initiatives appear, such as GISAID, a global science initiative that seeks to facilitate fast and open access to epidemic and pandemic virus data; the Global Epidemic and Mobility Model Simulator system (GleamViz); SafeGraph, an application with social distancing metrics based on mobility data collected by smartphones; Umetrip, a Chinese mobile application that provides real-time information on more than 700 thousand flights around the world; and VenPath, a holistic global provider of compliant smartphone data. Of course, large companies such as Apple and Google are also present, through their Apple Maps Mobility Trend data set and Google COVID-19 Community Mobility Reports initiatives, respectively.
Lastly, there are three investigations with private data sources and two with simulation-generated data. Among the private sources, one is the company Saber Data & Analytics Market Intelligence. Regarding the simulated data, in [60] the authors use random graphs crossed with real-world country-level network data, empirical data on the global spread of the COVID-19 outbreak at a country level, and confirmed cases of COVID-19 obtained from the World Health Organization. Furthermore, in [53] the authors use a mathematical model from [54] to generate a simulation to study the global risk of outbreak by airport from available data.
To finish this section, it is necessary to clarify that in most studies, the data extracted from the aforementioned sources were complemented with other data, for example, census data, traceability data, infections, recoveries, and deaths that each country manages at a national level.
Among the international data sources mentioned in the works, we can highlight the World Health Organization (WHO) (www.who.int/data), the European Centre for Disease Prevention and Control (ECDC) (www.ecdc.europa.eu), the EU Open Data Portal (www.data.europa.eu/data/), City Population (www.citypopulation.de), and Worldometer (www.worldometers.info/coronavirus/). The Research and Expertise on the World Economy (CEPII) (www.cepii.fr) was also used in [65] as a data source to determine the closure of international borders. Some global indices that have been used as complementary data are the Global Health Security (GHS) Index (www.ghsindex.org) [52] and the Fragile States Index (www.fragilestatesindex.org) [53].
Some institutions have also created very useful data repositories for academic purposes. A data source widely used by various investigations is the Center for Systems Science and Engineering (CSSE) of Johns Hopkins University (https://coronavirus.jhu.edu). Also notable are the Socioeconomic Data and Applications Center (SEDAC) (https://sedac.ciesin.columbia.edu), from NASA, and some government platforms, such as the UK Government Disease surveillance data (https://coronavirus.data.gov.uk). In addition, several general-purpose dataset repositories have been used to upload COVID-19 databases. Among the most used examples are Figshare (www.figshare.com), Opendatasoft (https://public.opendatasoft.com), Socrata (www.dev.socrata.com/data/), and Zenodo (www.zenodo.org), among many others.

B. EPIDEMIOLOGICAL MODELS
Many of the initially collected works focus on simulating and modeling general-purpose spread processes, not necessarily referring to contagion networks or the COVID-19 phenomenon. For example, in [93], an architecture was proposed to study diffusion processes in multiplex networks using agent-based simulation. The authors used the communication network used in a nuclear emergency plan as a case study. The methods used in that case were DEVS (discrete-event system specification) and agent-based modeling for multiplex networks. There is also work on space-time networks using graph neural networks [94], mathematical models based on Monte-Carlo simulations [95], [96], and contagion models based on genetic algorithms and cellular automata [97].
A spread model or diffusion model can be defined as a mathematical or algorithmic model that makes it possible to quantitatively evaluate the evolution of nodes on a network over time or through a sequence of discrete steps. Therefore, the spread models represent network dynamics. In the contagion networks context, these models are also called epidemiological models, where what is ''spread'' is an epidemic or disease.
One of the epidemiological models most used to study the spread of COVID-19 has been SIR and its variants [98]. There are several simulation works based on mobility data and SIR adaptations in the COVID-19 context. However, many of them do not seek to model contagion dynamics in air transport networks [99]-[104]. On the other hand, some works cover this type of network, but not to study contagion processes. For example, in [105], the authors use an airport-based Susceptible-Infected-Recovered-Susceptible (ASIRS) epidemic model on a 2015 air traffic network (data provided by the Civil Aviation Administration of China, CAAC) to study the propagation of delays caused by air traffic.
There is also a wide variety of agent-based simulation studies. These models have been used for different purposes, distinct from those sought here but equally helpful and interesting. For example, they have been used to help establish adequate contagion prevention policies within educational establishments [106] or to evaluate containment strategies, including air flight restrictions [107]. Other studies have combined agent-based models with SIR-based models to simulate economic and health effects derived from social distancing [108].
Next, we focus on the studies in Table 2 based on modeling or simulation of contagion in air transport networks in the COVID-19 context. Note that of the 50 works, 16 neither use spread models nor consider network dynamics. In most of these cases, a descriptive analysis of the data or a statistical analysis of accumulated data is carried out. Table 4 summarizes the frequency of use of the different models for the remaining 34 works. We have classified these models into five types:

1) SI-BASED MODELS
The Susceptible-Infective (SI) model is a stochastic spread model based on simple differential equations [109]. The model acts on a population whose members or actors must be in one of two possible states: infected (I) or healthy but susceptible (S) to being infected. In each unit of time, a fixed number of interactions between pairs of actors may occur. If an interaction involves an infected actor and a susceptible one, then the former can infect the latter with a certain probability, which is also fixed by a parameter. The SI model was originally defined as a continuous model, but there are also discrete versions [110].
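The pairwise-interaction description above can be illustrated with a small discrete-time simulation. This is a minimal sketch under our own assumptions rather than the formulation of [109] or [110]: pairs are drawn uniformly at random from a well-mixed population, infections take effect at the end of each step, and the function name `simulate_si` and all parameter values are hypothetical.

```python
import random


def simulate_si(population, initially_infected, contacts_per_step, p_infect, steps, seed=0):
    """Return the number of infected actors after each step of a discrete SI process."""
    rng = random.Random(seed)
    infected = set(range(initially_infected))
    history = [len(infected)]
    for _ in range(steps):
        newly_infected = set()
        for _ in range(contacts_per_step):
            a, b = rng.sample(range(population), 2)  # one random pairwise interaction
            # Transmission is only possible when exactly one of the two is infected.
            if (a in infected) != (b in infected) and rng.random() < p_infect:
                newly_infected.add(b if a in infected else a)
        infected |= newly_infected  # new infections take effect at the end of the step
        history.append(len(infected))
    return history


history = simulate_si(population=100, initially_infected=1,
                      contacts_per_step=50, p_infect=0.5, steps=20)
```

Because the SI model has no recovery, the infected count in `history` can never decrease. Replacing the uniformly random pairs with pairs drawn along the edges of a network is one way to bring transport structure into such a sketch.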
There are several variations. The best known is the Susceptible-Infectious-Recovered (SIR) model, which includes an additional possible state for actors: recovered (R), representing a recovered, immunized, or deceased actor. In our context, variations of the SIR model are used twice: in [55], where the model is applied simultaneously on a topological network of cities interconnected by regular flights between them, and in [14], where the authors include mobility variables (including travel restrictions) as well as demographic and geographic distribution.
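For reference, the standard deterministic form of the SIR model is commonly written as the following system of ordinary differential equations. The symbols are our notation and do not come from the reviewed works: $\beta$ is the transmission rate, $\gamma$ the recovery rate, and $N$ the (constant) total population. The variations used in [55] and [14] extend this baseline and are not reproduced here.

```latex
\begin{aligned}
\frac{dS}{dt} &= -\beta \frac{S I}{N}, \\
\frac{dI}{dt} &= \beta \frac{S I}{N} - \gamma I, \\
\frac{dR}{dt} &= \gamma I,
\end{aligned}
\qquad N = S + I + R .
```

Summing the three equations gives $dN/dt = 0$, so the total population is conserved.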
The most used model in this review is the Susceptible-Exposed-Infectious-Recovered (SEIR) model [56], [67], [71], [82], [84], [85]. The exposed actors (E) are those who, during a latency period, have been infected but are not yet infectious themselves [111]. In [84], the authors slightly modify the model to include exported infected cases. In [85], the authors complement the model results with network analysis and Bayesian analysis to represent the transmission of contagion to other geographical areas. In [82], the authors also complement the model with other mathematical models, and in [56] they modify the differential equations of the standard SEIR model to include demographic dynamics derived from the flight network.
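To make the role of the exposed compartment concrete, the sketch below integrates a textbook deterministic SEIR model with a forward Euler scheme. It is an assumption-laden illustration, not the implementation of any reviewed work: the parameter names `beta` (transmission rate), `sigma` (inverse latency period), and `gamma` (recovery rate), the function name `simulate_seir`, and all numeric values are hypothetical.

```python
def simulate_seir(s, e, i, r, beta, sigma, gamma, dt, steps):
    """Integrate a single-population SEIR model with forward Euler steps."""
    n = s + e + i + r  # total population, constant in this closed model
    trajectory = [(s, e, i, r)]
    for _ in range(steps):
        new_exposed = beta * s * i / n * dt   # S -> E: new infections
        new_infectious = sigma * e * dt       # E -> I: end of the latency period
        new_recovered = gamma * i * dt        # I -> R: recovery (or removal)
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
        trajectory.append((s, e, i, r))
    return trajectory


# Arbitrary illustrative values, not fitted to any COVID-19 data.
trajectory = simulate_seir(s=9990.0, e=0.0, i=10.0, r=0.0,
                           beta=0.5, sigma=0.2, gamma=0.1, dt=0.1, steps=1000)
```

A network version would hold one such set of compartments per city or airport and add flows between them proportional to passenger traffic, which is the general direction taken by the flight-network adaptations mentioned above.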
Finally, there are two works that include different variations of the SI model. In [57], the authors use the Susceptible-Exposed-Infectious-Recovered-Susceptible (SEIRS) model. The SEIRS model is similar to the SEIR model, but it does not guarantee the immunity of the recovered, so they could become susceptible to infection again. This is a valid assumption since not even current COVID-19 vaccines guarantee absolute immunity. The authors complement this model with a graph diffusion model to capture the clusteredness of the population distribution. In [46], the authors use the Susceptible-Infectious-Susceptible (SIS) model, a simpler model that omits the exposed and recovered states. The authors also use an airport network model to include the congestion of U.S. airports.
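The difference between these variants can be seen in their textbook equations. In the notation assumed here (not taken from [57] or [46]), $\beta$ is the transmission rate, $\sigma$ the inverse latency period, $\gamma$ the recovery rate, $\omega$ the rate at which immunity is lost, and $N$ the total population. In SEIRS, the loss of immunity appears as a flow $\omega R$ from R back to S:

```latex
\begin{aligned}
\text{SEIRS:}\quad
\frac{dS}{dt} &= -\beta \frac{S I}{N} + \omega R, &
\frac{dE}{dt} &= \beta \frac{S I}{N} - \sigma E, \\
\frac{dI}{dt} &= \sigma E - \gamma I, &
\frac{dR}{dt} &= \gamma I - \omega R, \\[6pt]
\text{SIS:}\quad
\frac{dS}{dt} &= -\beta \frac{S I}{N} + \gamma I, &
\frac{dI}{dt} &= \beta \frac{S I}{N} - \gamma I .
\end{aligned}
```

Setting $\omega = 0$ in the SEIRS system recovers the SEIR model, while in SIS recovered actors return directly to the susceptible state.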

2) NETWORK-BASED MODELS
Some studies define spread models based directly on air transport networks (as potential contagion networks). In [44], the network dynamics are modeled using passenger flow, the effective distance between origins and destinations, and arrival times. In [49], the authors use centrality measures, a social network analysis tool, to identify the most relevant actors (e.g., sources of contagion) within the transport network. In [74], the authors model the spread of the virus through Zipf's law, a discrete power-law distribution associated with social networks and information retrieval. Finally, in [60], exponential random graph models (ERGM) are used together with statistical models to model contagion dynamics.
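As an illustration of how a centrality measure singles out relevant nodes in a transport network, the sketch below computes normalized degree centrality on a small undirected graph. The choice of degree centrality, the function name, and the toy edge list are our assumptions; the text above does not specify which centrality measures are used in [49].

```python
def degree_centrality(edges):
    """Return the normalized degree centrality of every node in an undirected graph."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    # Normalize by the maximum possible degree, n - 1 (requires at least two nodes).
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Invented toy network: one hub airport connected to three others.
edges = [("HUB", "A"), ("HUB", "B"), ("HUB", "C"), ("A", "B")]
centrality = degree_centrality(edges)
most_central = max(centrality, key=centrality.get)
```

In this toy network, `HUB` is connected to every other node and therefore has the highest possible centrality. Other measures, such as betweenness or closeness centrality, rank nodes by their position on shortest paths instead of by their number of connections.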

3) STATISTICAL-BASED MODELS
Regarding statistical models, a wide variety of statistical methods is observed. In [73], regression models are used. In [48], the authors use ordinary least-squares (OLS), a type of standard multiple linear regression (MLR). In [53], the authors use a probabilistic branching process that considers the volume of air travelers between airports and the reproduction number in each location, taking into account the local population density. Other works focus on models based on specific probability distributions. In [64], the authors use the full probability distribution of arrival times; in [70], negative binomial regression; in [90], a binomial distribution; and in [81], different statistical distributions (exponential, Poisson, and geometric) to estimate the geographical paths of COVID-19. Additional statistical models are based on the difference-in-differences technique [68], the Hazards model [77], and time series [80].
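For readers less familiar with OLS, the following sketch fits a single-predictor linear regression in closed form. It is a deliberately simplified illustration: the works cited above use multiple predictors and real data, whereas the function name `ols_fit` and the toy values of `xs` and `ys` below are invented.

```python
def ols_fit(xs, ys):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Invented, perfectly linear toy data: y = 1 + 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
intercept, slope = ols_fit(xs, ys)
```

With several explanatory variables, the same least-squares criterion is solved in matrix form, typically with a numerical library.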

4) MATHEMATICAL-BASED MODELS
While statistical models are based mainly on probabilistic methods, probability distributions, and regressions, mathematical models consider mathematical equations and formulas to obtain measurements and scores. In particular, five mathematical models of contagion risk are identified. In [59], the model considers all the end destinations of flights from four big cities of China involving 168 territories worldwide. The authors calculate the total risk of transmission into a country by aggregating the risk associated with all the entry airports of the country. In [62], the model assesses the risk of infectious diseases by considering the relative mobility-interaction effect and travel-specific risk. In [69], the authors assess the risk of contagion based on historical data from multiple sources obtained from more than 640 thousand flight routes over 1491 airports. The model in [76] considers the risk of case importation across 162 countries, in the context of local epidemic growth rates. The model in [87] considers an "imported case risk index" based on an ordinary differential equation that uses the potentially infected population and air connectivity between Chinese provinces and foreign countries.
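The idea of aggregating airport-level risk into a country-level score can be sketched as follows. This is a hedged illustration, not the model of [59]: we assume that aggregation is a plain sum, and the function name, airport codes, country labels, and risk values are all invented.

```python
from collections import defaultdict


def country_risk(airport_risk, airport_country):
    """Sum the risk scores of all entry airports belonging to each country."""
    totals = defaultdict(float)
    for airport, risk in airport_risk.items():
        totals[airport_country[airport]] += risk
    return dict(totals)


# Invented airport-level risk scores and airport-to-country mapping.
airport_risk = {"AAA": 0.25, "BBB": 0.5, "CCC": 0.125}
airport_country = {"AAA": "X", "BBB": "X", "CCC": "Y"}
totals = country_risk(airport_risk, airport_country)
```

Other aggregation rules (for example, a maximum or a passenger-weighted average) would lead to different country rankings, so the rule itself is a modeling decision.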
In addition, in [52], a simple scoring tool is defined to produce a stratified estimate of the relative risk of COVID-19 importation to Pacific island countries and territories. Finally, other kinds of mathematical models are also proposed. In [47], the authors define a ratio of infected individuals to the estimated number of travelers. In [50], the expected proportion of under-ascertainment of cases and an age-specific deterministic model are calculated.

5) PHYSICAL-BASED MODELS
To finish with the spread models, in [45], the epidemic Renormalization Group (eRG) approach is used. It is based on the renormalization group (RG), a technique from theoretical physics for performing calculations on systems with a large number of simple interacting elements. The model consists of first-order differential equations that consider the number of flights between pairs of airports.

IV. CONTAGION MODELS ON OTHER TRANSPORT NETWORKS
In what follows, we describe the works found that relate to transport networks other than air travel. These articles are summarized in Table 5. Note that of the ten works included here, six were already described in Section III. Therefore, in this section, we only explain the part of their studies referring to other means of transport. Besides the air transport networks, the second most studied networks are maritime transport (seven appearances), followed by different forms of land transport (five appearances). In [112], [113], the authors refer to the relevance of the Automatic Identification System (AIS) as a data source to study contagion networks in the context of maritime transport. AIS is an automatic tracking system based on ship transceivers, used by Vessel Traffic Services (VTS). It provides real-time and historical data. In [113], the authors also mention several alternatives for ship tracking: ExactEarth (www.exactearth.com), FleetMon (www.fleetmon.com), MarineTraffic (www.marinetraffic.com), OrbComm (www.orbcomm.com), Vessel Tracker (www.vesseltracker.com), and VesselFinder (www.vesselfinder.com).
Continuing with maritime transport, in [68], the authors use data from cruise ports provided by the U.S. Department of Homeland Security (DHS) (www.dhs.gov). Note that the SafeGraph application, also mentioned in Section III, can be used in this context to collect mobility data through smartphones. In [114], the authors created a database consisting of 43 cruise ships with passengers infected with COVID-19. The data was extracted from news reports and cruise ship alerts obtained through Cruise Mapper (www.cruisemapper.com). In [115], the authors use the Princess Cruises' official website (www.princess.com), which includes data about infected travelers, quarantine updates, and other news and notices. The authors cross-checked this information with information from the National Institute of Infectious Diseases, Japan. The two remaining papers on maritime transport also include land travel data. In [67], the authors use data provided by the Ding Xiang Yuan community of health professionals, mentioned earlier. In addition to airline schedules for 90 cities, they consider around 19 passenger ship lines in China. Regarding land transport data, they take 11 primary railroads in China, which carry most of the railway passenger flow and connect around 120 cities. They also collect bus data for connections between cities at a mid-range geographical distance (150 km) and between central cities and other cities in the province. In the case of [81], the authors use maritime and land transport data from a Geographical Information System (GIS). However, these data are not detailed enough.
Three other works use land transport data. In [63], the aim was to analyze the changes in mobility patterns in Europe during the COVID-19 pandemic. Besides the OpenSky data source used to collect flight data, the authors also use an Apple Maps Mobility Trend data set (https://covid19.apple.com/mobility). In [86], in addition to the air data provided by OAG, the authors use China Railway (www.12306.cn) for train travel and Xinxin Travel (www.cncn.com) for bus travel as data sources. In [89], the authors use the Baidu Migration Big Data Platform for air and land data (car, train, bus). This data source was already mentioned in Section III.
Finally, half of the works use spread models or analysis methods considering the network dynamics. In [67], [115], the authors use the SEIR model, which we recall was also the most used for air transport networks. Statistical models already described in Section III are used in [68], [81]. In [114], they also use a statistical model, but based on correlation and regression analysis to determine risk factors for COVID-19 attack rates on cruise ships worldwide.

V. DISCUSSION
After reviewing the 53 articles selected from the 1786 search results, we can answer the research questions posed in Section II. Regarding question Q1 (What are the main data sources for modeling (air) transport networks?), Table 3 shows 30 air flight data sources used 61 times in the different studies.
The most used data source is the International Air Transport Association (IATA), followed by flight tracking services such as OpenFlights and Flightradar24. OpenFlights provides free but pre-pandemic datasets, while Flightradar24 provides paid data. On the one hand, using data from old air flights can be justified for the design of contagion models and simulations, where it is assumed that current routes do not vary much from previous years. However, the volume of trips (and therefore the number of passengers) between origins and destinations can vary significantly from year to year. Clearly, the ideal would be to use updated data, reflecting how flights decreased considerably between March and July 2020 due to border closures. On the other hand, buying air flight data is expensive, reaching around €3500 for a month of flight data in Europe alone. For these reasons, a good option is OpenSky, which offers an open-access dataset with trips made during the pandemic. The data has been updated throughout the pandemic since January 1, 2019, and it contains all the flights seen by more than 2500 volunteer members of the OpenSky network. Nevertheless, this dataset is not the ideal solution, for example, to simulate the evolution of COVID-19 worldwide. Indeed, the dataset is incomplete, as it does not contain data for some small countries and airports. Furthermore, it includes just a few flights from some major countries such as China. In addition, the data is somewhat unbalanced since most of the flights collected have their origin or destination in the United States.
Besides these data sources, numerous complementary data sources were identified, which relate not to transport but to the evolution of COVID-19 in different countries and regions. Undoubtedly, international organizations such as the World Health Organization (WHO) have been key to providing valuable data related to COVID-19, such as information and advice on good practices for applying methods and processes. However, these organizations cannot intervene in the standards in which data related to air flights and transport networks are stored and preprocessed. The difficulty of achieving standards in this regard lies in the complex system of public and private organizations in charge of the administration, regulation, and coordination of these transport networks. Thus, in practice, according to the literature found, there is also no standardized way of modeling infection processes through air transport networks (nor through maritime or land transport networks).
Despite the above, the International Air Transport Association (IATA) and the International Civil Aviation Organization (ICAO) provide standard codings to identify the different air flights and airports. Thus, the various air flight data sources generally consider quite similar attributes. The desired minimums are the flight code, the origin, the destination, and the flight date. Although it could be a useful variable, the number of flight passengers is not usually available at the research level.
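To illustrate the minimal attribute set named above (flight code, origin, destination, and flight date), the following sketch defines a flight record and derives a weighted route list from it. Because passenger counts are usually unavailable, it uses the number of flights per route as a proxy for travel volume, which is an assumption of this example rather than a method taken from the surveyed studies. The class name Flight, the function route_weights, and all sample values are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Flight:
    code: str         # flight code (hypothetical values below)
    origin: str       # origin airport code
    destination: str  # destination airport code
    day: date         # flight date


def route_weights(flights):
    """Count flights per directed (origin, destination) pair."""
    return Counter((f.origin, f.destination) for f in flights)


# Hypothetical sample records, not taken from any real dataset.
flights = [
    Flight("XX100", "MAD", "CDG", date(2020, 3, 1)),
    Flight("XX101", "MAD", "CDG", date(2020, 3, 2)),
    Flight("XX200", "CDG", "MAD", date(2020, 3, 2)),
]

weights = route_weights(flights)
```

The resulting counts can serve as edge weights of a directed airport network, on which a spread model is then run.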
For other means of transport, the Automatic Identification System (AIS) stands out as a data source for maritime transport. As Table 5 shows, most data sources are local. In general, the same diversity problems and lack of standardized repositories for data extraction are observed.
Regarding question Q2, the main spread model used to simulate contagion processes on air transport networks seems to be the SEIR model. Table 3 shows 6 articles, and Table 5 another 2 using this model. Furthermore, Table 3 shows another 4 studies based on SIR models and their variations. As mentioned in Section III-B, the SEIR model allows an additional state of "exposed actors," that is, actors who have been infected but are not yet infectious themselves during a latency period. Exposed actors exist in the COVID-19 pandemic since, according to the Centers for Disease Control and Prevention (CDC), the coronavirus incubation period may last between 2 and 14 days.
Notwithstanding the preceding, note that the statistics-based models are the most widely used as a whole, with 11 uses on air transport networks and another 3 uses on maritime transport networks. Although these models are more diverse, those based on probability distributions stand out. Finally, the risk models were used in 5 different articles, and therefore they are only surpassed in use by the SEIR model. Although the risk models differ from one another, their common idea is to mathematically model the risk of contagion from people on air transport networks.
One aspect to highlight is that the studies do not include comparisons between different spread models. Also, there are very few exhaustive analyses of which variables (data) are most relevant when modeling contagion processes. We believe that researchers should consider these kinds of comparative studies in future work. The absence of widely used datasets, as mentioned above, can be an additional impediment to this.
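The role of the exposed state can be illustrated with a minimal single-population SEIR sketch, integrated here with forward Euler steps. This is the standard textbook formulation, not the implementation of any surveyed study, and it ignores the transport network entirely. All parameter values are hypothetical; in particular, the latency of 5 days (sigma = 1/5) is only an illustrative choice within the 2 to 14 day incubation range cited above.

```python
def simulate_seir(s, e, i, r, beta, sigma, gamma, days, dt=0.1):
    """Integrate the standard SEIR equations with forward Euler steps.

    beta:  transmission rate (per day)
    sigma: 1 / latency period (per day), rate of leaving the exposed state
    gamma: 1 / infectious period (per day)
    Returns one (S, E, I, R) tuple per day, including day 0.
    """
    n = s + e + i + r
    steps_per_day = round(1 / dt)
    history = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n * dt   # S -> E
            new_infectious = sigma * e * dt       # E -> I
            new_recovered = gamma * i * dt        # I -> R
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        history.append((s, e, i, r))
    return history


# Hypothetical parameters chosen only for illustration.
history = simulate_seir(
    s=9990.0, e=0.0, i=10.0, r=0.0,
    beta=0.5, sigma=1 / 5, gamma=1 / 7, days=120,
)
```

Setting sigma to a very large value approximates an SIR model, since exposed actors then become infectious almost immediately.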

VI. LIMITATIONS OF THE STUDY
This study includes an extensive review of articles published during 2020. Some of the papers may appear published in 2021 but initially appeared as preprints in 2020. However, we do not consider publications that appeared after January 18, 2021. Since there are many related works on the subject, we leave the possibility open to expand this survey in the future.
Another limitation is that we have focused on articles written in English. Although most scientific papers are written in this language, some publications on this topic are likely written in other languages, such as Chinese.

VII. CONCLUSION
The COVID-19 pandemic took everyone by surprise. Its rapid diffusion forced the scientific community to generate valuable knowledge in a short time. This acquired knowledge can be useful in application development and strategic decision-making processes. A major challenge has been contagion modeling and simulation on transport networks, particularly in air transport. This survey reflects the great diversity of related scientific publications on the subject during the first year of the pandemic alone. As far as we know, this is the first survey related to this topic.
A wide diversity of data sources is observed. This diversity implies that researchers do not have a single centralized repository to collect air flight data. Likewise, there is also a wide variety of repositories with traceability data, statistics, and reports of people affected by the pandemic. A problem that emerges from this study is the lack of standardized data sources. The data formats are also very diverse, and this makes the work of researchers difficult.
Various models of contagion spread through transport networks were analyzed, with SI-based models being the most widely used, although not by much. In particular, the SEIR model is the most widely used since it allows modeling not only healthy or infected actors but also recovered actors and actors exposed to the virus, who are incubating the disease and are therefore not yet infectious. It is striking that no other types of spread models have been found, such as the Linear threshold model or the Independent cascade model [116], originally applied to collective behavior [117] and viral marketing [118], respectively. In addition, several statistical models are used, based on probability distributions and regression models, among others. The mathematical models are mostly risk models. Furthermore, we found other network-based models and even a model coming from theoretical physics.
As possible lines of future work, the bibliographic review could continue to be expanded, including 2021 research. In this case, it is suggested to carry out a systematic non-recursive search to face the (possibly explosive) growth in the number of publications on the subject. It would also be beneficial to build a dataset (as complete as possible) of air flights between January and June 2020 and make it openly accessible to the scientific community. The datasets provided by OpenSky can be a good starting point, to be expanded, for example, with more flights from China and smaller countries. Such a dataset would allow comparing simulations and contagion models and would expand the possibilities of experimentation and validation.