A Large-Scale Simulator for NB-IoT

This paper presents a comprehensive, large-scale machine-to-machine NB-IoT (narrowband IoT) traffic simulator designed to study IoT application performance in large-scale environments, such as smart cities. The simulation system uses real geographical data to define a wide range of devices characterized by location, packet generation pattern, and network access properties. Key performance indicator (KPI) metrics are collected during simulations to evaluate how various factors affect the "machine quality of experience."


I. INTRODUCTION
Machine-to-machine (M2M) communication, a type of communication between devices without human intervention, enables the development of new applications that improve the quality of life in smart cities through the use of Internet of Things (IoT) devices. Although characterized by low individual data throughput, these machines can collectively generate high traffic that puts pressure on the telecommunications infrastructure. With few IoT devices, operators may rely on 4G technologies, such as LTE and LTE-A. However, as we are witnessing the dawn of a totally connected world, the number of IoT applications is expected to grow sharply in the near future. This means that we are no longer referring to a few devices per base station, but to thousands of devices geographically distributed all over the city that contribute to increasingly complex IoT applications. The danger of running all these applications over standard 4G networks is that IoT traffic will eventually hinder human traffic.
In response, the 3GPP proposed two technologies that enable the use of separate channels for IoT traffic. One is LTE-M [1]-[3], which is mostly used for applications requiring mobility and high bandwidth, while the other is narrowband IoT (NB-IoT). At the time of writing, many operators offered both technologies alongside 4G and even 5G services, and allowed individual clients to choose which technology to use.
NB-IoT is a robust technology for very large-scale, low-bandwidth applications, and it is optimized for very low device power consumption. The technology has been defined to address the needs of IoT devices for outdoor and indoor coverage, dense connectivity, long battery life, and low cost [4]. Some have even tried underground coverage [5]. Moreover, while 3GPP Release 17 will provide recommendations to interface 5G NR with non-terrestrial networks (NTN), the feasibility of adapting NB-IoT to NTN has already been considered [6]. This confirms the rapid adoption of NB-IoT and its use in large-scale settings.
A key problem with the proliferation of IoT applications is having the ability to detect that they are doing well. This is particularly true for distributed applications over a large geographical area, such as a smart city setting. A few questions we ask include: Is the system working as expected? Is data entering at the correct pace? Is there a problem somewhere in the network? In a human network, these types of questions are often answered after customers complain to operators. However, in machine or fully automated networks, it is much more difficult to evaluate performance and detect problems.
It is in this context that we coined the term "Machine Quality of Experience" (MQoE) to define how well machine applications are doing based on KPIs (key performance indicators) that correspond to particular applications. In our previous work [7]-[10], we presented a way to assess MQoE based on applications that work on a large-scale 4G network. We built a large-scale 4G network simulator that mapped applications on the city infrastructure and considered the real location of hundreds of base stations all over the city. This work introduces the NB-IoT enhanced simulator, which specifically considers the NB-IoT technology in very large-scale networks that span an entire city.
Although NB-IoT is gaining acceptance all over the world, few works deal with the modeling and simulation of NB-IoT networks (some exceptions are [11]-[16]).
In [11], the authors proposed modeling random access in the NB-IoT using Markov chains to compute the system throughput and compared the analytical results with simulations. In [12], an NB-IoT simulator based on OPNET was introduced. In a different work [13], an open source NB-IoT simulation tool based on LTE-Sim was presented.
More recently, [14] proposed a model for an NB-IoT uplink scheduler based on a state machine. The authors evaluated the performance of the proposed model using the Simulink state-flow toolbox in MATLAB. The work considered four standard data rates and used four KPIs for performance analysis.
The common ground in the aforementioned proposals is that simulations and analyses have been performed for a single cell with a single base station (eNB) and for a moderate amount of user equipment (UEs).
Other possible avenues to assess the NB-IoT performance are testbeds, such as those presented in [17]-[19]. Another attempt to implement a standalone NB-IoT using a testbed is presented in [15]. The work compared the performance of the network in terms of uplink and downlink throughputs with the device specifications. Other projects have used testbeds to evaluate IoT application performance for technologies other than NB-IoT, either with real-world experimentation [20] or with virtualized networks [21].
Even though testbeds use real devices, they are typically used to set an NB-IoT test network for some specific deployment scenarios. Furthermore, similar to the reviewed simulation tools, these testbeds are only used for very small networks consisting of a single eNB and few UEs.
In contrast to the abovementioned simulators, tools and testbeds, we present in this paper a large-scale NB-IoT simulator that is capable of dealing with a smart city consisting of hundreds of eNBs and thousands of UEs. Our simulator is designed based on realistic databases obtained from smart city open data projects. Even though the data we use here are provided by the open portal of the city of Montreal [22] and the real positions of eNBs are available from Industry Canada [23], the methodology used to create the simulator is transferable to any city in the world. The goal of the simulator is to efficiently generate realistic results that show how IoT applications perform in urban settings to better understand their behavior and predict potential issues. For example, the previous version of the simulator was used to detect eNodeB failures [10].
The remainder of the paper is organized as follows. Section II describes the components of the simulator and the collected data. Simulation details are given in Section III. Section IV demonstrates the test cases and discusses the results. Conclusions and observations are given in Section V.

II. SYSTEM DESCRIPTION
A key element of our large-scale simulation system is the use of real geographical data about the city environment (i.e., location of urban elements) and the telecommunications infrastructure (base station locations and features). By using these two data sources as input to the simulator, we create a large-scale virtual setting where devices that are located on urban elements all over the city exchange information with the telecommunications infrastructure. Another important aspect is that, to accelerate the simulation procedure and account for large networks, the physical layer is abstracted, and an emphasis is placed on the connection procedures and the packet transmissions, both in the uplink and downlink directions. Moreover, only the user plane functionalities are simulated. The program allows comparing the NB-IoT with other LTE devices, for example, in terms of network resource usage efficiency or connection duration. The third element is that applications that require the usage of different base stations can also be simulated so that the results can be used to analyze the network behavior at the city scale.
The following subsections describe the initial databases, the propagation models and how simulations are created, executed, and analyzed.

A. INITIAL DATABASES
As mentioned above, two distinct databases are created from publicly available data:
1) The city infrastructure database
2) The telecommunication infrastructure database
The city infrastructure database contains the geographical location of the following elements extracted from [22]:
• bus stops
• cameras
• fire alarms
• houses
• parking spots
• pedestrian crossings
• traffic lights
• traffic signs
For example, in Fig. 1 we present a snapshot of the bus stop locations all over the city of Montreal. This simple snapshot can convey to the reader the large-scale features of the simulation system, as one can imagine that each one of those tiny dots is producing IoT traffic, alongside other elements that can be added to the study (security cameras, parking spots, etc.).
The telecommunications infrastructure information is, however, kept by Industry Canada [23].
The infrastructure database that we create contains the following information:

B. PROPAGATION TOOL
With the information about the telecommunication infrastructure, we employ our own propagation tool called GGTool (for Grid Generation Tool), which was developed for this project. It divides the simulated area into small rectangular regions (called grid points), where the size of each grid point is parametrized and represents the granularity. Then, the path loss, based on the COST-HATA empirical radio propagation model [24], is used to compute the received power from every antenna at the center of each grid point. Afterwards, the list of the first N antennas (in decreasing received power), where N is a parameter selected by the user, is associated with the corresponding grid point, and any UE located at that grid point will use that list of antennas for connection. Initially, the UE connects to the antenna at the top of the list. If this antenna becomes unavailable due to a failure, the UE disconnects from it and selects the next one in the list. The coverage of the city of Montreal by the best antenna/eNB from one service provider, colored by eNodeB, is shown in Figure 3. In this figure, we have 478 different eNBs with a total of 3828 antennas.
FIGURE 3: Coverage of the city by base station
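The grid-based ranking described above can be sketched as follows. This is only an illustration under stated assumptions, not the GGTool implementation: it uses the standard COST-231 Hata formula with hypothetical default values (1800 MHz carrier, 30 m base station height, 1.5 m device height), treats antennas as omnidirectional, and works on flat coordinates in kilometres. The function names `cost231_hata_db` and `best_antennas` are ours.

```python
import numpy as np

def cost231_hata_db(d_km, f_mhz=1800.0, h_base_m=30.0, h_mobile_m=1.5,
                    metropolitan=True):
    """COST-231 Hata median path loss in dB.

    The model is nominally valid for 1500-2000 MHz and distances of 1-20 km;
    all default values here are illustrative assumptions.
    """
    log_f = np.log10(f_mhz)
    # Mobile antenna height correction for small and medium cities
    a_hm = (1.1 * log_f - 0.7) * h_mobile_m - (1.56 * log_f - 0.8)
    c_m = 3.0 if metropolitan else 0.0
    return (46.3 + 33.9 * log_f - 13.82 * np.log10(h_base_m) - a_hm
            + (44.9 - 6.55 * np.log10(h_base_m)) * np.log10(d_km) + c_m)

def best_antennas(grid_xy_km, antenna_xy_km, tx_power_dbm, n_best):
    """For each grid point, return the indices of the n_best antennas,
    sorted by decreasing received power."""
    diff = grid_xy_km[:, None, :] - antenna_xy_km[None, :, :]
    dist = np.maximum(np.linalg.norm(diff, axis=2), 0.01)  # avoid log10(0)
    rx_dbm = tx_power_dbm[None, :] - cost231_hata_db(dist)
    return np.argsort(-rx_dbm, axis=1)[:, :n_best]

# Two grid-point centres and three antennas (hypothetical positions, in km)
grid = np.array([[0.5, 0.5], [4.0, 4.0]])
antennas = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
power = np.array([43.0, 43.0, 43.0])  # transmit power in dBm (assumed)
ranking = best_antennas(grid, antennas, power, n_best=2)
```

Each row of `ranking` plays the role of the per-grid-point antenna list: a UE would first try the antenna in column 0 and fall back to column 1 if that antenna fails.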

C. SIMULATED NETWORK CREATION
1) Preparing the network
To create the simulator input network, the following types of real geographical data and propagation information described above are needed:
• Device positions: The device types and positions all over the city are extracted from the city infrastructure database, allowing the creation of potential UEs that will be defined for particular applications.
• Base station positions and features: The telecommunications infrastructure database is used for detailed propagation modeling all over the city.
• City topology: This refers to the resulting coverage that is obtained once the propagation modeling tool (GGTool) is run all over the city. It provides the signal strength all over a city grid predefined by the simulation user. The final coverage determines which UEs are attached to which antenna of the network.

2) Definition and connections of IoT applications
Simulated IoT applications are defined by selecting the appropriate devices and their positions from the infrastructure database. For example, one application can be created by randomly selecting a specific number of cameras, while another application may be related to bus stops. This allows for a realistic estimation of the number of devices assigned to a base station in a real network. Additional parameter sets are required to model an application, such as the connection and transmission parameters presented in Tables 1 and 2. Connection parameters, such as the available RACH preambles, the maximum number of RACH attempts and the backoff time, determine how devices connect to the base stations.
TABLE 1: Connection parameters.
Available preambles: RACH preambles that can be sent by application devices. The NB-IoT devices rely on a simplified procedure [4].
Maximum attempt number: Maximum number of RACH attempts before declaring that a connection has failed.
Backoff indicator: Maximum wait time after a collision.
Transmission parameters, in turn, set the way that packets are transmitted and received. As shown in Table 2, we can vary the packet length and generation patterns, the scheduling strategies, the number of repeated transmissions needed to reach the devices and the channel quality indicator. We can also establish different features for the uplink and the downlink transmission. With the application-specific connection and transmission parameters, an arbitrary number of applications can be created to simulate a wide range of IoT devices. Note that the NB-IoT devices and the devices relying on conventional LTE can coexist, but as they use different connection procedures, they cannot collide on RACH attempts.

D. IMPLEMENTATION
The software is a discrete-event simulator: all actions (sending a packet, connecting to a base station, disconnecting, etc.) are modeled as discrete events in time. To improve performance, the simulator assumes that no state change occurs in a device between two actions. Therefore, instead of updating all devices on fixed time increments, the simulator skips operations that would not lead to state changes. Another performance optimization is multiprocessing. The software splits the area into base station cells and simulates them in distinct processes. This enables scaling simulations up in terms of number of devices and the total area. The software is programmed in Python and makes extensive use of the NumPy and SciPy libraries to manipulate arrays and generate numbers from statistical distributions.
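To illustrate the discrete-event principle described above, here is a minimal sketch of an event loop built on a priority queue. It is not the simulator's code: the event names, the handler functions, and the 1 s packet period are hypothetical, and the 50 ms connection delay is simply borrowed from the average RACH delay reported in [26]. The point is that the clock jumps directly from one event to the next, so nothing is computed for devices whose state does not change.

```python
import heapq
import itertools

def run_events(initial_events, handlers, end_time):
    """Process (time, name, payload) events in time order until end_time.

    Each handler returns a list of new events to schedule.
    """
    counter = itertools.count()  # tie-breaker so payloads are never compared
    queue = [(t, next(counter), name, payload)
             for t, name, payload in initial_events]
    heapq.heapify(queue)
    log = []
    while queue:
        time, _, name, payload = heapq.heappop(queue)
        if time > end_time:
            break
        log.append((time, name, payload))
        for new_time, new_name, new_payload in handlers[name](time, payload):
            heapq.heappush(queue, (new_time, next(counter), new_name, new_payload))
    return log

def on_generate(time, device):
    # A packet is created: transmit after an assumed 50 ms connection delay,
    # and schedule the next packet of this device 1 s later (hypothetical period).
    return [(time + 0.05, "transmit", device), (time + 1.0, "generate", device)]

def on_transmit(time, device):
    # The transmission itself triggers no further event in this sketch.
    return []

handlers = {"generate": on_generate, "transmit": on_transmit}
log = run_events([(0.0, "generate", "ue-1"), (0.3, "generate", "ue-2")],
                 handlers, end_time=1.9)
```

Because cells are independent in such a loop, one `run_events` call per base station cell could be dispatched to separate processes, which is the kind of split the multiprocessing optimization relies on.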

III. SIMULATION DETAILS
Devices perform connections and transmissions as configured by the input parameters. When the simulation clock reaches a point that matches the transmission time of a packet, the device, if not connected, connects to a base station and then starts a transmission.

A. RACH ATTEMPTS
Connections between UEs and antennas (base stations) can be contention-based or contention-free. During the contention-based attempts, the devices randomly select one preamble in a list of available ones and send it to their associated antenna. If two or more devices send the same RACH preamble to the same antenna at the same time, a collision occurs. Following an exchange of four messages between the antenna and the devices (random access preamble, random access response, scheduled transmission, and contention resolution), the collision is detected, and no connection is granted to any of the UEs involved in the collision. Each device then waits for a random delay that is lower than a predefined backoff indicator before performing another RACH attempt [25].
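The contention-based procedure above can be sketched as a slotted simulation for a single antenna. This is a simplified illustration, not the simulator's implementation: time is counted in abstract attempt slots, the backoff is drawn as a whole number of slots, and the parameter values (54 preambles, 10 attempts, 20 backoff slots) are hypothetical. The function name `simulate_rach` is ours.

```python
import numpy as np

def simulate_rach(n_devices, n_preambles, max_attempts, backoff_slots, rng):
    """Contention-based access for devices that all start in slot 0.

    Returns (connected, attempts): a boolean array and the number of RACH
    attempts each device made. backoff_slots must be at least 1.
    """
    next_slot = np.zeros(n_devices, dtype=int)
    attempts = np.zeros(n_devices, dtype=int)
    connected = np.zeros(n_devices, dtype=bool)
    failed = np.zeros(n_devices, dtype=bool)
    slot = 0
    while not np.all(connected | failed):
        active = np.flatnonzero(~connected & ~failed & (next_slot == slot))
        if active.size:
            choices = rng.integers(0, n_preambles, size=active.size)
            counts = np.bincount(choices, minlength=n_preambles)
            unique = counts[choices] == 1  # nobody else sent this preamble
            attempts[active] += 1
            connected[active[unique]] = True
            collided = active[~unique]
            exhausted = attempts[collided] >= max_attempts
            failed[collided[exhausted]] = True
            retry = collided[~exhausted]
            # Wait a random delay bounded by the backoff indicator
            next_slot[retry] = slot + 1 + rng.integers(0, backoff_slots,
                                                       size=retry.size)
        slot += 1
    return connected, attempts

rng = np.random.default_rng(0)
connected, attempts = simulate_rach(n_devices=200, n_preambles=54,
                                    max_attempts=10, backoff_slots=20, rng=rng)
```

The average of `attempts` over connected devices is the kind of quantity that a RACH-related KPI would report.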
Contention-free attempts rely on reserved preambles that prevent collisions from occurring. They require devices to be in the RRC_CONNECTED state.
The NB-IoT devices rely on a procedure similar to that used by LTE devices to gain access to a network. A device randomly selects one subcarrier among those available in a resource block. Using the initial transmission time, the subcarrier number and the cell identity, the device then transmits a pseudorandom sequence to its associated base station. This sequence is used by the base station to resolve the origin of a request; thus, if two devices in the same cell perform a random access procedure at the same time and using the same subcarrier, a collision will occur, and both devices must start over with a new connection procedure. Since the NB-IoT uses a specific channel for connection (Narrowband-PRACH, or NPRACH), no collision between regular LTE and the NB-IoT devices is possible.
The start subframe at which devices of an application are allowed to begin a RACH attempt is configured before the simulation. The default number of subcarriers in a bandwidth of 180 kHz (one NB-IoT channel) is 48, but the range of available subcarriers for an application can also be configured. Therefore, it is possible to reserve some subcarriers for a specific application if it must avoid collisions with other devices (for example, low latency applications).
In the simulator, RACH attempts are performed when a device in the RRC_IDLE state must switch to the RRC_CONNECTED state to transmit or receive packets. In real networks, such attempts occur in five other cases:
• to reconnect after a link failure,
• to perform a handover,
• to resynchronize upon UL arrival,
• to resynchronize upon DL arrival,
• and for positioning.
However, these cases are not considered, as we chose not to simulate the control plane in order to improve performance. Thus, antenna selection is solely based on the signal strength received at the location of the devices.
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2022.3186365
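As a rough worked example of why reserving subcarriers matters, assume that n devices in the same cell start a random access at the same time and that each one independently picks one of K subcarriers with equal probability (an idealization, not a result from the simulator). The probability that a given device collides with at least one other device is then:

```latex
P_{\text{collision}} = 1 - \left(1 - \frac{1}{K}\right)^{n-1},
\qquad
K = 48,\; n = 10 \;\Rightarrow\; P_{\text{collision}} \approx 0.17
```

Under the same assumptions, restricting an application to fewer subcarriers lowers K and raises this probability for its own devices, while a reserved range shields it from the devices of other applications.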
The RACH delay, which averages 50 milliseconds [26], is modeled as a duration that can be parameterized as a constant or uniformly distributed within a margin of error. These delays can be set for specific applications.

B. TRANSMISSIONS
Machine-type traffic designates traffic generated without human interaction. It differs from human traffic and exhibits a large variety of transmission patterns [27]. The simulator uses two distribution functions for each application to determine packet interarrival time and packet size. Table 3 lists the distributions used, as defined in the SciPy library.
The smallest time-frequency unit in LTE is a resource block (RB) defined as a subchannel of 180 kHz spanning one subframe of 1 ms [28]. The simulator determines the number of available RBs per antenna from their frequency bandwidth, which is retrieved from the database. When packets are received by the antenna, they are scheduled according to one of the strategies listed below:
• FIFO (First In First Out): Packets are scheduled according to the order of arrival.
• RR (Round Robin): Packets are scheduled with equal portions of time in a circular manner.
• BET (Blind Equal Throughput): Packets with the lowest average throughput are prioritized.
• MT (Maximum Throughput): Packets with the highest throughput are prioritized.
• PF (Proportional Fair): Similar to BET, but weighted with the maximum throughput.
The amount of data that can be sent in one RB depends on the quality of the transmission channel, which is measured by the channel quality indicator (CQI). This indicator is defined for each application and can be constant, random, or determined with a linear interpolation from the received signal strength. The modulation and coding scheme (MCS) is computed as a linear function of the CQI and is used to determine the transport block size (TBS) from the tables found in 3GPP standards [29]. This parameterized procedure determines the data throughput for the simulated applications. LTE devices can use several resource blocks, but NB-IoT devices can only use one at a time, which leads to a lower data throughput.
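The traffic model described at the start of this subsection (one distribution for packet interarrival times and one for packet sizes, both taken from SciPy) can be sketched as follows. The choice of an exponential interarrival time with a 10 s mean and of packet sizes uniform between 20 and 200 bytes is purely illustrative; the actual distributions and parameters are those of the paper's tables, which are not reproduced here. The helper name `generate_packets` is ours.

```python
import numpy as np
from scipy import stats

def generate_packets(interarrival_dist, size_dist, duration_s, rng):
    """Draw packet creation times and sizes for one device.

    interarrival_dist and size_dist are frozen scipy.stats distributions.
    """
    times = []
    t = 0.0
    while True:
        t += float(interarrival_dist.rvs(random_state=rng))
        if t >= duration_s:
            break
        times.append(t)
    sizes = size_dist.rvs(size=len(times), random_state=rng)
    return np.array(times), np.asarray(sizes, dtype=int)

rng = np.random.default_rng(42)
interarrival = stats.expon(scale=10.0)   # mean of 10 s between packets (assumed)
packet_size = stats.randint(20, 201)     # 20 to 200 bytes inclusive (assumed)
times, sizes = generate_packets(interarrival, packet_size,
                                duration_s=150.0, rng=rng)
```

Swapping in another frozen distribution (for example `stats.uniform` or `stats.norm`) changes the traffic pattern without touching the generation loop, which is what makes a per-application parameterization convenient.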
The NB-IoT devices can repeat transmissions to increase the reception probability, which is useful for devices located in challenging positions [4]. Depending on the depth of the devices, which is configured at the creation of the simulation, specific repetition numbers can be set.
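The combined effect of the one-RB limit and of repetitions can be illustrated with a small calculation. This is a sketch under simplifying assumptions, not the simulator's procedure: the capacity of one RB is passed in as a single hypothetical number (`bits_per_rb`) instead of being looked up from the CQI, MCS, and TBS tables, and every repetition is assumed to cost as many RBs as the original transmission.

```python
import math

def resource_blocks_needed(packet_bits, bits_per_rb, repetitions=1):
    """RBs consumed by one packet: ceil(size / capacity) for each copy sent."""
    return math.ceil(packet_bits / bits_per_rb) * repetitions

def transmission_time_ms(packet_bits, bits_per_rb, repetitions=1, parallel_rbs=1):
    """Number of 1 ms subframes needed when at most parallel_rbs RBs
    can be used in each subframe."""
    rbs = resource_blocks_needed(packet_bits, bits_per_rb, repetitions)
    return math.ceil(rbs / parallel_rbs)

# A 1000-bit packet with an assumed capacity of 144 bits per RB
nbiot_indoor = transmission_time_ms(1000, 144, repetitions=8, parallel_rbs=1)
nbiot_outdoor = transmission_time_ms(1000, 144, repetitions=1, parallel_rbs=1)
lte = transmission_time_ms(1000, 144, repetitions=1, parallel_rbs=6)
```

With these made-up numbers the packet needs 7 RBs, so the LTE device finishes in 2 subframes, the outdoor NB-IoT device in 7, and the NB-IoT device with 8 repetitions in 56.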

C. RESULT COLLECTION
Key performance indicators (KPIs) assess the quality of experience observed by human users or, in the context of IoT applications, by automatically operated devices. As the concept of quality of experience has been mostly applied to human-type communication, the simulator aims at providing insight into machine-type communication quality of experience by collecting KPI metrics for the simulated IoT applications. KPIs are obtained for each application and each base station. That way, the quality of experience can be characterized in any region of the simulated area. Each KPI consists of an averaged value, a minimum, and a maximum. Table 4 presents the list of KPI metrics collected to evaluate the performance associated with the RACH attempt. Table 5 presents the list of KPI metrics collected to evaluate the performance associated with the packet transmissions.
Different statistics and statistical granularities can be extracted from the collected measures. These statistics, added to data mining features, allow us to assess applications, base stations and network performances. Moreover, the distributed nature of our modeling permits the evaluation of entire geographical regions, as we can see in the results section.
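The aggregation of a KPI into an average, a minimum, and a maximum per application and base station can be sketched as below. The record layout (application, base station, value) and the function name `summarize_kpi` are assumptions made for the example; the simulator's internal data structures are not described in the text.

```python
from collections import defaultdict

import numpy as np

def summarize_kpi(records):
    """Group (application, base_station, value) records and return
    {(application, base_station): {"avg": ..., "min": ..., "max": ...}}."""
    groups = defaultdict(list)
    for application, base_station, value in records:
        groups[(application, base_station)].append(value)
    return {
        key: {"avg": float(np.mean(values)),
              "min": float(np.min(values)),
              "max": float(np.max(values))}
        for key, values in groups.items()
    }

# Hypothetical transmission delays in ms
records = [
    ("lte-houses", "enb-1", 2.0),
    ("lte-houses", "enb-1", 4.0),
    ("nbiot-houses", "enb-1", 10.0),
    ("nbiot-houses", "enb-2", 12.0),
]
summary = summarize_kpi(records)
```

Keeping the base station in the key is what later allows a KPI to be drawn cell by cell, for instance as a heatmap over the city.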

D. NB-IOT AND LTE DIFFERENCES
TABLE 5: KPI metrics collected for packet transmissions.
Transmitted packet count: total number of packets transmitted to the network.
Total traffic: traffic volume in bits.
Used RBs: number of used resource blocks.
Used RBs percentage: ratio of used RBs over available RBs.
Throughput: ratio of transmission volume over transmission time.
Waiting delay: time between packet creation and transmission start.
Transmission delay: total duration of a transmission between a device and a base station.
To summarize, the behavior of NB-IoT and LTE devices in the simulator differs in the following three main ways:
• NB-IoT and LTE devices use different channels for RACH attempts.
• NB-IoT can use at most one resource block at a time, while LTE does not have this limit.
• NB-IoT devices can repeat transmissions to increase the reception probability.
Given the lack of public large-scale NB-IoT data needed to validate our NB-IoT simulator, we first ensured that the results that it produces for LTE devices are similar to the ones previously obtained [7]. We also performed systematic tests, in particular with boundary conditions, to make sure the results are sound.

IV. SIMULATIONS
This section illustrates the capabilities of the simulator with the simulation results.

A. SETTINGS
The simulation is composed of five main applications (collections of similar devices). They were selected to highlight the differences between LTE and the NB-IoT.
1) Regular LTE devices located on houses (2500 devices)
2) Outdoor NB-IoT devices located on houses (2500 devices)
3) Indoor NB-IoT devices located on houses (2500 devices)
4) Regular LTE devices located on traffic lights (1804 devices)
5) Outdoor NB-IoT devices located on traffic lights (1804 devices)
The devices are placed by randomly sampling locations from real objects. For instance, applications 1, 2, and 3 are independently created by placing the devices on houses. Thus, one location hosts at most three devices (one of each application). Table 6 shows their packet transmission parameters. They are identical for all applications to ease comparisons and are selected to match the behavior of IoT devices (a large number of small packets).
The simulation is divided into three phases of 150 seconds. During the first phase, only the five applications listed above operate. During the second phase, a sixth application begins to transmit packets using regular LTE to simulate sudden traffic caused by a large number of devices. These 27,000 devices are randomly placed at house positions. Table 7 shows the transmission parameters of this "traffic" application. The exponential distribution is selected for the packet generation times to introduce more randomness. During the third simulation phase, the traffic application ceases to operate, which makes more resources available for the other applications. Each phase is performed as a distinct simulation. The number of connected devices at the end of one phase is used as the initial parameter for the subsequent simulation, which ensures continuity.
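The placement rule above (each application samples its devices independently from the same set of real objects, so that a location hosts at most one device per application) can be sketched with NumPy. The coordinates below are synthetic placeholders, not the Montreal data, and `place_devices` is a hypothetical helper name.

```python
import numpy as np

def place_devices(object_xy, n_devices, rng):
    """Pick n_devices distinct object locations for one application."""
    idx = rng.choice(len(object_xy), size=n_devices, replace=False)
    return object_xy[idx]

rng = np.random.default_rng(7)
# Placeholder for house coordinates: 10,000 distinct points
houses = np.column_stack([np.arange(10_000), np.arange(10_000)]).astype(float)

# Applications 1 to 3 are sampled independently from the same houses
lte_houses = place_devices(houses, 2500, rng)
nbiot_outdoor_houses = place_devices(houses, 2500, rng)
nbiot_indoor_houses = place_devices(houses, 2500, rng)
```

Sampling without replacement inside an application, but independently across applications, is what allows a house to carry up to three devices in this setup.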

B. RESULTS
The simulations took 601 seconds to complete using a 12-core processor with a 3.2 GHz frequency and 64 GB of RAM. They involved, in total, 12,912 devices during phases 1 and 3 and 39,912 devices during phase 2. Approximately 7.95 million packets were generated in total. The results highlight the differences related to technology type, location, and repetitions.
Technology type: LTE and the NB-IoT devices do not exhibit the same behavior. Since LTE can use more bandwidth than the NB-IoT during transmissions, LTE devices have a higher throughput, which leads to smaller transmission delays. The connection procedures of LTE and the NB-IoT have similar durations but rely on different channels, so they cannot collide with each other. Figure 4 shows the value of two KPIs, the average transmission delay and the average RRC delay over time.
The average RRC delay of the LTE application increases by approximately 1.5 ms, while the same KPI for the NB-IoT application does not vary. This difference is due to collisions with the "traffic" application, which lead to repeated RACH attempts and thus to a longer procedure duration. The transmission delay of the LTE application increases by approximately 0.5 ms when the traffic application starts, while the same KPI for the NB-IoT increases by approximately 0.25 ms. The first reason for this increase is scheduling. The base stations in these simulations use FIFO as the scheduling strategy. As more packets need to be sent, the transmission of some packets is delayed, which increases the transmission delay of both LTE and the NB-IoT. The second reason is that high traffic reduces the bandwidth available for LTE devices, which decreases the throughput. Since the NB-IoT uses one subchannel, its throughput does not decrease further during periods of high traffic. Hence, the performance loss for LTE is more severe than for the NB-IoT.
FIGURE: Average transmission delay of the NB-IoT devices at two different types of locations. Since the "house" application shares its distribution with the "traffic" application, the devices are located in cells with more traffic, which explains the higher delays when the additional traffic starts at 150 seconds.
Location: One feature of the simulator is its ability to model devices located at the positions of real IoT devices. Figure 5 presents the average transmission delay of two NB-IoT applications, one with device locations that are sampled from house positions and another with locations sampled from traffic lights. Since the "traffic" application is also generated from the house positions, the devices of the house application tend to be located in cells with more traffic during phase 2. The application created from the traffic light positions exhibits a different distribution pattern, and thus, its devices are less likely to be placed in cells with high traffic. Figure 5 presents the global results of the applications over time (i.e., the cumulative results obtained in all cells). Data can also be presented with heatmaps to visualize how performance varies depending on the cells. The regions correspond to base station cells, and the color intensity indicates the value of one KPI. Figure 6 illustrates the performance measured in different regions. The distribution of devices affects the regional performance of the applications.
FIGURE 6: Average transmission delay for devices located around (a) houses and (b) traffic lights. Since the devices are distributed differently, performances vary according to the application. For instance, as traffic lights are less numerous in the west, more resources are available, which decreases the delays.
Repetitions: In the simulations, the devices of an NB-IoT application can transmit each packet 1, 4, or 8 times to improve reception. The number of repetitions is assigned randomly to each device. Table 8 lists the KPIs obtained from the first phase of the simulation, which highlight the loss of performance caused by repetitions. The table presents the results obtained from the applications with the devices located on the houses. Repetitions cause the indoor NB-IoT application to use more resource blocks (which impacts other applications as fewer resources become available for them) and spend more time transmitting the data. Despite these disadvantages, repetitions are necessary to ensure that the packets of the IoT devices are well-received. Thus, the simulator can be used to estimate how much the performance is affected by the repetitions.

V. CONCLUSION
This work presents a large-scale NB-IoT simulator that realistically models IoT applications all over a city. To build the simulation system, publicly available data were gathered and incorporated into databases to create virtual base stations and devices. Connection procedures and data transmissions were simulated according to 3GPP specifications. The results show that the simulator is quite versatile and can be used for different purposes, such as assessing the performance of NB-IoT machines against that of other LTE devices. Moreover, although out of the scope of this paper, simulation data are being used as input for data mining and machine learning algorithms to provide more insights into the behavior of large-scale IoT networks.