Edge Computing: The Computing Infrastructure for the Smart Megacities of the Future

This article contributes a research vision for using edge computing to deliver the computing infrastructure for emerging smart megacities, with use cases, key requirements, and reflections on the state of the art. We also address edge server placements, a key challenge for edge computing adoption.


INTRODUCTION
The continuing growth of cities poses challenges in ensuring their continued smartness while taking advantage of technological advances. As illustrated in Figure 1, future megacities are expected to integrate a wide range of sensors, such as wearables, diverse cameras (for example, surveillance, thermal, and hyperspectral cameras), and environmental sensors. Indeed, according to forecasts, the number of Internet of Things (IoT) and sensor devices is expected to reach 30 billion by 2030. 3 While this data growth provides opportunities for optimizing the city and its inhabitants' lives, it introduces significant computing infrastructure challenges for processing and analyzing the measurements.
Currently, the most common smart city solution is to externalize computing needs to a cloud provider. While this offers scalability, the costs of using the cloud for data storage and processing will become prohibitive as the city and the number of sensors within it grow. This is particularly pressing as we transition from smart cities to smart megacities with millions of inhabitants and potentially billions of sensors that require persistent storage and constant processing. Another limitation of the cloud-centered approach is its strong centralization, which cannot exploit the spatial nature of the data (for example, processing data for individual neighborhoods) and can result in imbalanced processing, significant overhead, and, ultimately, high cost. As a result, there is a clear need for alternative processing and storage solutions that address the needs of megacities. 4
This article contributes a vision for using edge computing as smart megacities' computing and storage infrastructure. Edge computing is well suited for smart megacities as it supports distributed processing and allows the spatial characteristics of the data to be exploited. Edge computing also integrates support for minimizing latency, conserving bandwidth, and providing resilience against security and privacy threats. 5 Ongoing 5G (and future 6G) deployments are further expected to support edge computing by providing the infrastructure for scalable deployment of edge computing nodes and by offering additional capabilities that benefit megacities. 6 We derive use cases, identify key requirements, and reflect on current state-of-the-art solutions to analyze the potential of the edge to deliver a scalable and cost-effective infrastructure for the computing needs of emerging smart megacities. We also address infrastructure scalability through the placement of edge nodes.
We use air quality data drawn from a real-world deployment to demonstrate how optimizing deployment locations helps optimize the computing resources available to the city while meeting practical constraints (such as budget and energy).

APPLICATION USE CASES
Future smart megacities are expected to contain large-scale pervasive deployments of sensor-enabled devices (see Figure 1). These deployments are expected to grow to tens of thousands or even billions of sensors, generating massive data streams and requiring flexible processing and storage support from the underlying computing infrastructure. The deployments offer significant opportunities for enhancing the functionality of cities and delivering services that support inhabitants; at the same time, megacities require sufficient (computing) infrastructure to support these applications. We next discuss representative application use cases to highlight the range of opportunities and the diversity of the processing needs they place on edge computing. We note that many of these use cases are already being explored in smart cities, but as we transition to smart megacities, the scale of sensor deployments increases by several orders of magnitude, demanding more flexibility and scalability from the underlying infrastructure.

Surveillance and hyperspectral cameras
Smart megacities are expected to integrate massive numbers of surveillance, hyperspectral, and other cameras, which can offer a wide range of services. For example, processing video streams at the network edge from densely deployed surveillance cameras in cities supports counting people and vehicles to model people's mobility and estimate health risks. 7,8 Thermal cameras can be installed at points of interest (for example, at the entrances of shopping malls) to scan for potential signs of illness and to mitigate health risks for others. Surveillance cameras can also offer security by detecting suspicious activities and criminals. Surveillance cameras' resolution and frame rate are continually improving, resulting in each camera producing larger volumes of data at a higher velocity than before. At the same time, alternative imaging modalities, such as thermal and hyperspectral cameras, are also being integrated, further increasing data processing needs. For example, hyperspectral cameras generate images of 30-300 MB each. 6 Processing the data centrally becomes increasingly difficult as the scale and diversity of the deployments increase. As envisioned in this article, edge computing offers the flexibility to scale the infrastructure together with the deployments and to harness the spatial nature of the data when they are processed. Naturally, these scenarios also raise security and privacy issues, but taking advantage of the edge instead of the cloud mitigates these issues by providing local storage and allowing privacy protection to be handled close to where the data are generated. Edge servers also reduce communication costs and the overall latency.

High-resolution air pollution monitoring
Air pollution is considered one of the grand challenges of our time, and it is particularly important in megacities where pollutant concentrations tend to be the highest. Low-cost air quality sensors are emerging as a solution to increase the resolution of air quality monitoring, but they require massive-scale deployments that link with high-quality monitoring stations. Transmission and processing of mobile and geographically distributed data at the network edge enable new applications that can provide real-time and hyperlocal exposure data. 2 Edge servers are also crucial for improving data quality as low-cost sensors are prone to measurement drift and require periodic recalibration. 9,10 Edge servers can support recalibration and ensure that accurate and real-time air quality information can be offered to the public. 6

Unmanned aerial vehicles
Unmanned aerial vehicles (UAVs) are expected to be a significant source of data generation and an important tool for enabling smart megacities. Among others, UAVs can enhance network connectivity, deliver emergency services in disaster situations, support pollution or power line monitoring, and offer security and surveillance services. In smart megacities, the number of UAV flights will increase, which necessitates controlling UAVs in real time to avoid collisions between them and other obstacles. The amount of data that UAVs generate can also be very high (up to terabytes 11 ), and many applications require near real-time processing of these masses of data (for example, for collision detection). 2 Edge servers are expected to play a crucial role in providing the platform for coordinating and controlling UAV operations and delivering services that utilize the data the UAVs collect.

Health care
Health-care delivery is a central concern for the governance of future smart megacities. Future health-care services are expected to take advantage of wearables and in-home medical (IoT) devices, increasing the complexity of health-care services and the volume of data being generated (thousands of terabytes per year). These data can enable personalized health services and optimize health-care delivery. Enabling personalized health care requires intelligent and privacy-preserving processing of the collected data. Edge servers provide a natural point for offering this functionality while meeting service-level quality requirements, including high reliability, low latency, and mobility support. 12

Extended reality
Extended reality (XR) is an umbrella term for immersive technologies, including augmented reality (AR), virtual reality (VR), and mixed reality. AR enhances the physical world with virtual annotations, with environments typically analyzed using computer vision algorithms. 13 In the context of smart megacities, XR technology is essential for analyzing the sensor streams in context, thus providing visualization support for different applications. The algorithms needed for processing often cannot be deployed on the AR hardware (for example, smartphones or head-mounted displays), which lacks the necessary computational resources. Therefore, the most resource-hungry processes should be offloaded. Edge servers are preferable to the cloud as the edge is closer to end users, providing lower-latency communication as well as privacy and security benefits. One of the most pressing challenges for VR is that headsets must currently be wired to transmit high-resolution video at high frame rates, as current wireless technologies (LTE and Wi-Fi) are not capable of doing so. 14 Thus, improved communication technology combined with efficient edge resources is needed to route and analyze such data.

REQUIREMENTS AND ENABLERS
Providing communication and computation capacities to support massive-scale IoT applications in smart megacities is challenging because of the limitations of the existing technologies and infrastructure. This section highlights the key requirements for massive IoT deployments in smart megacities. The requirements are summarized in Table 1.

Ubiquitous connectivity and massive IoT connections
Reliable network connections and extensive computational resources provide the foundation for massive-scale, diverse, and highly dense IoT deployments in megacities. To offer smart services across the city and accommodate the expected 30 billion connected IoT devices by 2030, 3 ubiquitous connections, optimizations of the existing network infrastructure, the deployment of upcoming 5G and beyond (6G) networks, and IoT gateway solutions are required. Currently, IoT connectivity in cities is accomplished through wide area network (WAN) technologies, such as LoRa, Sigfox, LTE-4G, and narrow-band (NB)-IoT, but these technologies have several limitations. For example, they are incapable of supporting massive numbers of connections in dense and hyperlocal (for example, indoor) environments. Upcoming 5G (and beyond) networks are expected to support extremely massive numbers of connections and provide high-speed connectivity almost anywhere: for example, on the ground and sea, in the sky, and even in spaces not covered by existing technologies.

Extreme high-capacity communications
Many megacity use cases require a considerably larger network capacity than current smart city deployments offer to transmit large amounts of data at a fast rate. Examples include continuous image streaming from surveillance cameras with 16K video resolution in 360° at a refresh rate of 240 Hz, transmitting hyperspectral camera images (such as images of 30-300 MB taken in less than a second for estimating air quality over open areas), and data transmission in multisensory applications (for example, autonomous driving requires data rates and communication capacity exceeding 1 Tbps). Supporting these data rates requires enhanced communication technologies, which in turn require denser deployments of communication infrastructure that provide natural integration points for edge computing support. The necessary improvements are foreseen to be offered by 5G millimeter-wave (mmWave) communications, which offer much higher bandwidths. While the bandwidth provided by mmWave is promising, the overall loss resulting from path loss, rain attenuation, and atmospheric absorption is more significant for a point-to-point link than in other communication standards. In the near future, terahertz (THz) communications are expected to evolve with 6G communication systems to overcome these performance limitations of mmWave systems.
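To make these capacity demands concrete, the following back-of-envelope calculation sketches the sustained rates the hyperspectral examples above imply. The figures assume uncompressed transfer and decimal megabytes; they are illustrative only, not measurements from an actual deployment.

```python
# Rough link-capacity estimates for the hyperspectral examples above.
# Assumptions: uncompressed transfer, decimal units (1 MB = 10^6 bytes).
BITS_PER_MB = 8 * 10**6

# One 300-MB hyperspectral image captured and transmitted per second
# requires this sustained rate.
hyperspectral_bps = 300 * BITS_PER_MB

# A 30-MB image transmitted in under a second needs at least this rate.
burst_bps = 30 * BITS_PER_MB / 1.0

print(hyperspectral_bps / 10**9, "Gbit/s")  # 2.4 Gbit/s
print(burst_bps / 10**6, "Mbit/s")          # 240.0 Mbit/s
```

Even the low end of the range thus exceeds typical LTE-4G uplink capacity, which is why denser infrastructure (and, with it, co-located edge servers) becomes necessary.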

Ultrareliable and low-latency communications
Real-time and mission-critical IoT use cases, such as autonomous cars and UAVs, require extremely low-latency communications and effective data processing. These use cases are characterized by fast processing, analytics, and intelligent decision making at the network edge and include scenarios such as avoiding accidents and hazards. The current network technologies used in IoT communications, including 4G and NB-IoT, are generally incapable of fulfilling the extremely low-latency requirements of mission-critical IoT use cases, even when combined with edge computing. Fortunately, 5G networks satisfy most of the requirements, including offering ultrahigh network reliability (>99.99%) and ultrareliable low-latency communications of 1-ms delay per packet transmission for IoT applications. 6G networks are further expected to decrease latency (eventually to <100 μs), balancing reliability, latency, and scalability and providing the extremely low latency that massive IoT connections require. 15 Thus, emerging communication standards already cover most of the communication requirements, leaving the provisioning of computing infrastructure as the main challenge for supporting megacity-scale deployments. Edge computing can offer this infrastructure while offering flexibility in scaling the deployments to reflect differences in population density across the city.

Localization
Offering accurate and reliable service in megacities also requires knowing the precise position and elevation of where data are collected. The location of an IoT device also helps to identify the best and nearest edge server to connect with. For example, a sensor mounted on a UAV or used on the higher floors of a building produces location information in 3D space. Current techniques, such as GPS and experimental low-power WAN (LPWAN) localization methods, provide satisfactory solutions for today's IoT deployments, but they do not meet the demands of future smart megacities when massive numbers of IoT devices are deployed. 5G technologies integrate mmWave communications, and future 6G networks propose THz communications together with supporting techniques (such as massive deployments of infrastructure, that is, edge nodes; extended use of beamforming; and dedicated positioning algorithms) that enhance localization accuracy and are expected to achieve centimeter-level precision. 16

Data management and analytics
In future smart megacities, massive deployments of IoT devices intended for diverse applications will produce massive volumes of heterogeneous, geographically dispersed real-time data. Managing these data will require agile data management architectures, platforms, and data governance policies and practices. Existing smart city IoT data management platforms utilize a layered structure, storing and processing data locally on the device, at the network edge, and on centralized clouds. Upcoming 5G and 6G networks will provide enhanced edge computing platforms that overcome the limitations of existing IoT management platforms. Indeed, in 5G, real-time analytics improve the latency of IoT applications by enhancing the processing and storage capacities at the edge, and 6G is expected to further improve the analytics by making AI an integral part of the communication infrastructure.

EXPERIMENTS
The previous sections addressed the technical capabilities of edge servers to support emerging IoT application scenarios tailored to smart megacities. In practice, taking advantage of edge computing also requires a sufficient proliferation of edge servers within the megacity. Next, we present experiments to demonstrate how optimizing the placement can ensure the necessary processing while minimizing resource costs.

Experimentation setup
City-district sensing and data collection. We draw on an air quality dataset collected during a measurement campaign in the Pakila district of Helsinki, Finland, from 30 October 2019 to 15 January 2020. Pakila is a dense residential area containing many detached houses, where air quality is affected mainly by wood burning, fireplace usage, and street dust. For the measurements, research participants used 40 portable air quality sensor devices.
After instructions on how to use the devices were provided, the devices were delivered to citizens in the area, who were expected to carry the sensors with them in their daily lives. The collected data cover an area of circa 6 km² that approximately maps to the Pakila district. The details of the measurement campaign and the air pollution profile of the district are given in Kortoçi et al. 17 Naturally, we expect smart megacities to deploy significantly more sensors, resulting in larger volumes and higher velocity for the data streams. Nevertheless, our experiments are based on data from real users and reflect actual movements in a city district; thus, they indicate the geographical needs of the processing infrastructure.
Portable sensor devices. The portable sensor devices used for the measurements are based on a BMD-340 module and connect to a smartphone over Bluetooth Low Energy. The accuracy of the air quality sensors has been verified both in the laboratory and in the field. 17 The device is designed as a low-cost smart city solution for monitoring air quality, with an approximate price of US$250 per unit that decreases to <US$100 when produced in large quantities. The devices are equipped with a Sensirion SPS30 sensor that measures particulate matter masses, such as PM2.5 and PM10; meteorology sensors for temperature, relative humidity, and air pressure; gas sensors, including carbon monoxide (CO), nitrogen dioxide, and ozone; and an ambient light sensor. 18 After the measured variables are transmitted to the smartphone, time stamps and coordinate (positioning) information are combined with the sensor measurements, and the data are transferred to nearby cell towers.
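To illustrate the fusion step on the smartphone, the following sketch shows the kind of record that might be assembled before upload. The field names, schema, and values are hypothetical, chosen for illustration, and are not the device's actual data format.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class AirQualitySample:
    """One fused measurement record (illustrative schema only)."""
    timestamp: datetime   # added by the phone
    lat: float            # positioning information added by the phone
    lon: float
    pm2_5: float          # µg/m³, particulate matter from the SPS30
    pm10: float           # µg/m³
    temperature_c: float
    humidity_pct: float
    pressure_hpa: float

# A hypothetical sample: coordinates roughly in the Pakila area.
sample = AirQualitySample(
    timestamp=datetime(2019, 11, 1, 12, 0, tzinfo=timezone.utc),
    lat=60.254, lon=24.937,
    pm2_5=8.4, pm10=12.1,
    temperature_c=2.5, humidity_pct=81.0, pressure_hpa=1013.2)

payload = asdict(sample)  # dict ready for serialization and upload
```

A record of this shape carries both the measurement and the spatial context the edge server needs to allocate processing by location.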

Cell towers.
We collect the coordinates of cell towers deployed in the Pakila district using CellMapper, a crowdsourced website for mapping cellular towers and coverage. We consider the LTE-4G and 5G cell towers of the three main operators in Finland and collect 25 tower coordinates for the Pakila area. All three operators offer high-speed connectivity at low prices, even in remote regions, and they also provide 5G New Radio (NR) services (at 3,500 MHz) in cities. According to OpenSignal, 5G NR services are available to 11% of the population in Finland, whereas LTE-4G availability across the three mobile operators is around 93%.

Simulations.
We conduct a series of simulations to optimize the number and placement of edge servers. As the optimization criterion, we consider a utility function that linearly combines workload balance and latency and allows weights to be assigned to the two criteria. We assume the servers are placed at local cell towers, ensuring connectivity. We use the EDISON method to find the optimal edge server placement as it has been shown to improve on other state-of-the-art methods. 19 We first divide the area into a grid of 100 × 100 cells and count the number of observations in each cell. We then assign the grid cells to their nearest base station, summing the observation counts to obtain the workload passing through each base station. Finally, we place the edge nodes using the PACK capacitated clustering algorithm, which is capable of balancing the cluster allocation. 20 The PACK method minimizes an objective function that sums the distances between the base stations, weighted by their workloads, and the locations of the edge servers. The minimization is conducted under constraints for workload balance (that is, the sum of the workloads of the base stations allocated to each edge node) and server homogeneity (ensured by a minimum workload per edge node). The workload balance relates to the processing needs of the deployment, whereas homogeneity determines the latency of the deployment. These two parameters are key for smart megacities as they determine the cost (including energy) of running the edge servers and their capability to react to data generated by sensors. We consider three scenarios with different tradeoffs between latency and workload balance: 1) workload balance prioritized over latency, 2) equal importance for workload balance and latency, and 3) latency prioritized over workload balance. Figure 2(a) shows the values of the PACK objective function for different numbers of edge servers.
A lower objective function value indicates a better placement in terms of the distance from sensors to edge servers and translates to lower latency for edge-based analytics on the sensor data. The knee point of the curve, that is, the point where the objective function stabilizes, is at 14 servers; additional servers provide only marginal improvements. Figure 2(b) shows the final deployment of the 14 edge servers when latency and workload balance are weighted equally. In the figure, the base station to the immediate left of each edge server serves as the deployment location, and the colored dots show the locations where data are generated. In the resulting deployment, the edge servers are placed along the major streets: 11 on the main street intersecting the area and running from north to southeast, and the remaining three along the main horizontal street running from east to southwest. The sensor data are concentrated along these streets, forming hotspots that benefit from edge servers.
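The placement principle can be sketched as follows: pick k edge server sites from the candidate base stations so that the workload-weighted distance from each base station to its nearest server is minimized. This is a deliberate simplification, using a greedy selection on synthetic data; the actual experiments use the EDISON method with the PACK capacitated clustering algorithm, which additionally enforces the workload-balance and homogeneity constraints described above.

```python
import math

def dist(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def objective(servers, stations, workloads):
    """Sum of each base station's workload times the distance to its
    nearest edge server (the core of a PACK-style objective)."""
    return sum(w * min(dist(s, e) for e in servers)
               for s, w in zip(stations, workloads))

def greedy_placement(stations, workloads, k):
    """Greedily pick the k base-station sites that most reduce the
    objective. Unlike PACK, no balance constraints are enforced."""
    chosen, remaining = [], list(stations)
    for _ in range(k):
        best = min(remaining,
                   key=lambda c: objective(chosen + [c],
                                           stations, workloads))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Synthetic example: a light-traffic cluster near the origin and a
# heavy hotspot around (5, 5) - (6, 5).
stations = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5)]
workloads = [10, 10, 10, 50, 50]
servers = greedy_placement(stations, workloads, k=2)
print(servers)  # the hotspot site (5, 5) is selected first
```

As in the experiments, servers gravitate toward where the workload is concentrated, which is why the final deployment follows the major streets.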

Results
The three scenarios are further detailed in Table 2. Latency is calculated as the mean distance between the edge servers and their allocated base stations, and balance is the standard deviation of the edge nodes' workloads in each scenario. The tradeoff between latency and balance is clearly visible, with improvements in latency decreasing workload balance and vice versa. Indeed, compared to the scenario where latency and workload are weighted equally, the scenario focusing on balance has a very homogeneous node distribution, reducing the standard deviation by 96% while increasing the latency by 31%. Conversely, the scenario emphasizing latency more than halves (-58%) the mean distance between base stations and edge servers, but at the cost of a very high standard deviation in the resulting edge server workload (+689%).
The 40 citizens carrying the portable sensors were recruited through random sampling, and the resulting mobility patterns reflect the movements of the people in the sample. Generally, the patterns follow the main roads in areas that contain the main concentrations of shops, transportation options, and other functionality and hence can be argued to be representative of the mobility patterns within the district. Nevertheless, the server deployment is always relative to data availability; therefore, the deployment should be reoptimized if the mobility patterns of people within the district change (for example, due to new construction). In practice, the dominant mobility patterns are likely to persist, and scaling deployments to changes would likely only require placing new servers in expanding areas. Should more significant changes occur, as in the case of the COVID-19 pandemic, the deployment would need to be reoptimized, and any application components running on the edge servers would need to migrate to other servers.
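The two metrics reported in Table 2 can be computed directly from a placement. The following is a minimal sketch assuming nearest-server allocation; the function and variable names are illustrative, not taken from our actual analysis pipeline.

```python
import math
from statistics import mean, pstdev

def placement_metrics(servers, stations, workloads):
    """Latency proxy: mean distance from each base station to its
    nearest edge server. Balance: standard deviation of the total
    workload allocated to each server."""
    nearest = [min(range(len(servers)),
                   key=lambda i: math.hypot(s[0] - servers[i][0],
                                            s[1] - servers[i][1]))
               for s in stations]
    latency = mean(math.hypot(s[0] - servers[i][0],
                              s[1] - servers[i][1])
                   for s, i in zip(stations, nearest))
    per_server = [0.0] * len(servers)
    for i, w in zip(nearest, workloads):
        per_server[i] += w
    return latency, pstdev(per_server)

# Two servers and four base stations: a light pair near the origin
# and a heavy pair near (5, 5).
servers = [(0, 0), (5, 5)]
stations = [(0, 0), (1, 0), (5, 5), (6, 5)]
workloads = [10, 10, 50, 50]
latency, balance = placement_metrics(servers, stations, workloads)
print(latency, balance)  # 0.5 40.0
```

Reweighting the utility function shifts the optimizer between these two quantities, which is exactly the tradeoff visible across the three scenarios.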
We also note that the optimization was solely based on data from mobile air quality sensors, and, in practice, the smart megacity would contain a mixture of mobile and static sensors, as envisioned in Figure 1. Adding sensors into (or removing them from) the district would naturally also require recalculating the optimal deployment, but, in practice, the resulting model would likely prioritize areas that are close to main intersections as they are natural points of focus for activity in urban areas.

Circular smart cities
Computing for environmental sustainability is currently an active area of research, and many sensing technologies for detecting the potential ecological impacts of human behavior are emerging, ranging from climate to litter to overall resource use. This suggests that the transition to circular economies is likely to be an important source of sensor data for emerging megacities. Carefully designing edge server placement is thus important for supporting sensor data analysis while ensuring the analysis can be carried out cost-effectively and with minimal resources. The edge servers can further take advantage of public infrastructure, for example, by being deployed at transportation hubs or on public transport vehicles to provide ubiquitous computing support for citizens.

Benefits to city stakeholders
Municipal authorities, organizations, and companies have specific use cases and use the collected data to develop applications that improve the city and serve its inhabitants. For example, an organization that operates a massive air quality sensor deployment can use the collected data to identify green travel routes and even offer pollution exposure analytics for citizens. Similarly, UAVs delivering parcels to their destinations avoid generating CO2, and the operating organization can use onboard air quality sensors or even perform video streaming during flight for real-time analysis of ground activities for the benefit of the smart city as a whole.

Intelligence on the edge
AI is bound to become an integral part of future 6G networks. Edge servers provide opportunities to deploy AI and machine learning models and to enable analysis and predictions based on sensor data. This can increase the smartness of future megacities, for example, by predicting traffic on a specific route.

Extensions
Our experiments focused on air quality monitoring as a representative example of smart city applications that require access to computing resources. Naturally, other application domains should also be investigated to better understand the requirements posed on smart city infrastructure. We plan to further explore this dimension and consider multimodal data combining traffic, weather, mobility, and surveillance data. Another potential extension is to investigate more complex utility functions that also consider parameters other than cost and latency, such as interference, bandwidth, average connectivity, and overall network load. This would not only benefit city planners but also provide network operators insights on how to further optimize network and edge placements.
The development of future smart megacities is far from straightforward as cities continue their unprecedented growth, both in the number of inhabitants and in the number of networked devices available within the city. Current cloud-centered data processing models are insufficient as they rapidly become too costly for the needs of cities, even if they can offer sufficient elasticity. We presented a vision for using edge computing to satisfy the computing needs of emerging smart megacities while better supporting scalability and cost-effectiveness than cloud-centered models. We also described how optimizing edge server placement is necessary to ensure the best possible availability of computing resources. Overall, our research demonstrates how edge computing deployments are essential for fulfilling smart megacities' increasing data processing requirements and for ensuring the business models remain feasible for offering the broadest range of services to citizens and the cities themselves.