Accuracy Versus Complexity for mmWave Ray-Tracing: A Full Stack Perspective

The millimeter wave (mmWave) band will provide multi-gigabits-per-second connectivity in the radio access of future wireless systems. The high propagation loss in this portion of the spectrum calls for the deployment of large antenna arrays to compensate for the loss through high directional gain, thus introducing the need for a spatial dimension in the channel model to accurately represent the performance of a mmWave network. In this perspective, ray tracing can characterize the channel in terms of Multi Path Components (MPCs) to provide a highly accurate model, at the price of extreme computational complexity (e.g., for processing detailed environment information about the propagation), which may limit the scalability of the simulations. In this paper, we present possible simplifications to improve the trade-off between accuracy and complexity in ray-tracing simulations at mmWaves by reducing the total number of MPCs. The effect of such simplifications is evaluated from a full-stack perspective through end-to-end simulations, testing different configuration parameters, propagation scenarios, and higher-layer protocol implementations. We then provide guidelines on the optimal degree of simplification, for which it is possible to reduce the complexity of simulations with a minimal reduction in accuracy for different deployment scenarios.


I. INTRODUCTION
Recent developments have paved the way towards 5th generation (5G) cellular networks and enhanced Wireless Local Area Network (WLAN) designs, to address the traffic demands of the 2020 digital society [2]. In particular, 5G systems will support very high data rates (with a peak of 20 Gbps in ideal conditions), ultra-low latency (around 1 ms for ultra-reliable communications), and a 100x increase in energy efficiency with respect to previous wireless generations [2]. To meet those requirements, the 3rd Generation Partnership Project (3GPP) has released a set of specifications for NR, the new 5G Radio Access Network (RAN). Similarly, the Institute of Electrical and Electronics Engineers (IEEE) has developed amendments to 802.11 networks, namely 802.11ad and 802.11ay [3], which operate at millimeter waves (mmWaves). The 3GPP NR carrier frequency can be as high as 52.6 GHz for Release 15 (even though future Releases may include extensions up to 71 GHz [4]), while IEEE 802.11ad and 802.11ay exploit the unlicensed spectrum at 60 GHz [3]. The vast amount of available spectrum at mmWave frequencies will enable multi-Gbps transmission rates [5]. Moreover, the short wavelength makes it practical to build large antenna arrays to establish highly directional communications, thus boosting the network performance via beamforming or spatial multiplexing [6]. Despite these promising characteristics, propagation at mmWaves raises several challenges for the design and performance of the whole protocol stack [7]. First, the communication suffers from severe path loss, which prevents long-range omni-directional transmissions. Second, mmWave links are highly sensitive to blockage from common materials (e.g., brick and mortar), which may result in more than 40 dB of attenuation at 28 GHz when losing Line-of-Sight (LoS) [8].
Third, the delay and the Doppler spread (which determine the temporal and frequency selectivity of the channels) are particularly strong at these frequencies and may lead to network disconnections [9]. Finally, directional communications require the precise alignment of the transmitter and receiver beams, hence implying an increased control overhead for channel estimation and mobility management [10], [11].
The combination of these phenomena makes the mmWave channel extremely volatile to mobile users. Although some early performance evaluations have suggested that mmWave networks can offer orders of magnitude greater capacity than legacy systems (e.g., [12]), a deeper understanding of the propagation channel is required to reliably characterize such networks. In this sense, experimental testbeds make it possible to examine the network performance in real-world environments with extreme accuracy [13]. However, the prohibitive cost and limited flexibility of these platforms make this approach impractical for most of the research community [14].
Therefore, theoretical analyses and computer-aided simulations have emerged as important tools in evaluating the performance of novel solutions and the interplay between the mmWave channel and the deployment and protocol design. Both analysis and simulation, however, require proper modeling of signal propagation to accurately reproduce the behavior of mmWave systems [15], [16]. On one side, analytical studies model the channel using a Nakagami-m or Rayleigh distribution (e.g., [17]). This approach, while simplifying the analysis significantly, assumes a rich multipath channel when in fact it is sparse at mmWaves [18]. Similarly, stochastic Spatial Channel Models (SCMs), e.g., [19] for 3GPP NR, characterize the channel as a combination of random variables fitted from real-world measurements, providing a more realistic assessment of the mmWave network performance compared to their analytical counterparts [20]. Still, the stochastic nature of these models may prevent researchers from evaluating the impact of the channel dynamics in specific environments, and may respond poorly to the need for an accurate characterization of the spatio-temporal evolution of the channel's Multi Path Components (MPCs) [21].
(This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.)
Conversely, Ray Tracers (RTs) can be used to precisely model the propagation of mmWave signals in specific scenarios [22], [23]. Unlike analytical or stochastic models, RTs are based on the geometry of the scenario and characterize the different propagation properties of each MPC, including time delay, Doppler shift, polarization, Angle of Departure (AoD) at the Transmitter (TX), and Angle of Arrival (AoA) at the Receiver (RX), thus providing higher accuracy [24]. Moreover, simulators can use ray tracing to model the temporal and spatial evolution of the channel, a necessary feature for a proper planning of wireless systems. However, the generation of the MPCs can be computationally expensive, limiting the scalability of simulations. It is thus fundamental to find a compromise between accuracy and complexity, a research challenge that, to date, has not yet been thoroughly addressed in the literature. This paper represents the first comprehensive study of how simplifications to mmWave RT-based channel modeling can be used to reduce the computational complexity of system-level simulations without compromising their accuracy. With respect to our previous contribution [1], which only focused on physical-layer metrics, this paper considers a full-stack approach, investigating how the simplifications impact the higher layers of the 3GPP NR protocol stack. Specifically, we target the following objectives:
• We propose simplification techniques for ray tracing based on the Method of Images (MoI) to speed up network simulations and the RT itself. Specifically, expanding on our previous work [1], we consider a simplified RT implementation that processes only MPCs whose received power is above a certain threshold (relative to the strongest MPC) and another that limits the maximum number of reflections for each MPC.
• We showcase a publicly available and open-source RT [25] supporting a Quasi Deterministic (QD) model for mmWave simulation at 60 GHz applying the aforementioned simplification strategies.
• As our main contribution, we carry out an extensive end-to-end, system-level simulation campaign to quantify the impact of the RT simplifications on several network metrics:
  - We consider a full-stack performance evaluation and the impact of the simplifications on different simulators. To do so, we integrate the RT implementation in ns-3 [26], [27], a network simulator with a complete TCP/IP protocol stack that makes it possible to simulate the end-to-end network performance, and in a custom MATLAB simulator, for link-level metrics. Specifically, we characterize the Signal-to-Interference-plus-Noise Ratio (SINR), throughput, and latency for different traffic regimes at the application layer as a function of the degree of simplification introduced in the RT. We also consider both indoor and outdoor scenarios, to characterize different mobility and propagation characteristics, as well as different antenna array configurations. Additionally, with ns-3, we simulate the interaction of the simplifications with different transport layer protocols, namely the User Datagram Protocol (UDP) and the Transmission Control Protocol (TCP), and with different applications. Our results show that there exist some configurations for which the impact of the RT simplifications is limited, especially when higher-layer performance metrics are considered, with a reduction in simulation complexity of up to 4 times.
  - We assess the effect of the QD model on the network performance, showing that the additional stochastic diffuse components make the throughput and latency fluctuate more over time, while the RT simplifications generally result in more stable channels.
• We provide guidelines on the optimal working points, corresponding to the best combinations of simplifications and system configurations, for which it is possible to decrease the computation time and complexity of simulations with a minimal reduction in accuracy with respect to the baseline RT implementation (i.e., without simplifications).
The rest of this article is structured as follows. In Sec. II we review the most relevant analytical, stochastic, and deterministic models for the mmWave channel, while in Sec. III we present the QD model we adopted for ray-tracing operations at mmWaves. In Sec. IV we describe the proposed simplifications to be applied to the RT, while performance results are provided in Sec. V. Finally, Sec. VI concludes the paper with suggestions for future work.

II. CHANNEL MODELING AT MMWAVES
The modeling of the mmWave channel, in terms of propagation and fading, has been a key research topic in recent years [15]. Multiple measurement campaigns have tried to characterize the properties of the mmWave spectrum in diverse scenarios and environments (e.g., urban [5], rural [28], and indoor [24]), and have led to the definition of different channel models.
Comprehensive reviews, discussing propagation, fading, and beamforming models, can be found in [18], [29], while [30] specifically focuses on propagation loss. These efforts have identified the key factors for an accurate modeling of the mmWave channel. First, multipath components are sparse in the angular domain, thus an accurate model should explicitly characterize the AoA and AoD of the different taps. Moreover, blockage at mmWaves has a more pronounced impact on the link dynamics than at sub-6 GHz frequencies, which should be accounted for. Finally, rough surfaces could generate more diffuse scattering than at longer wavelengths.
The aforementioned measurement campaigns have led to different modeling approaches for the mmWave channel, which have various degrees of complexity and accuracy, and can be applied to different contexts and evaluations. In the next paragraphs, we will review three broad families, i.e., analytical, stochastic, and quasi-deterministic channel models.

A. Analytical Channel Models
MmWave analytical studies generally use simplified channel models, based on propagation and a single random variable for fading, with Rayleigh or Nakagami-m distributions [17]. Moreover, these are usually combined with a sectorized beamforming model for directional transmissions. This accounts for the beamforming gain G by assigning a maximum gain G_M to a main lobe of angular width θ_b, and a lower gain G_m to the simplified side lobes in the complementary angular space [31]. This simplified model limits the accuracy in the representation of the interaction between the mmWave propagation, the realistic antenna arrays, and the beamforming strategies [15].
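As an illustration, the sectorized model reduces to a simple two-level lookup. The following sketch returns the gain as a function of the angle from boresight; the gain and beamwidth values are illustrative placeholders, not taken from [31]:

```python
import math

def sectorized_gain(theta, g_max_db=18.0, g_min_db=-2.0, theta_b=math.radians(30)):
    """Sectorized ("flat-top") beamforming gain in dB: the main lobe of
    angular width theta_b gets the gain G_M, everything else gets the
    side-lobe gain G_m. All numeric values here are illustrative."""
    return g_max_db if abs(theta) <= theta_b / 2 else g_min_db
```

The discontinuity at the lobe edge is exactly what limits this model's accuracy compared to a realistic array pattern.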

B. Stochastic Spatial Channel Models
An improved characterization can be achieved using stochastic Spatial Channel Models (SCMs) [32]. These are based on a channel matrix H ∈ C^(U×S), with S (U) being the number of antenna array elements at the transmitter (receiver). Each entry (i, j) of the matrix H models the channel between two specific antenna elements, and represents the joint effect of the different MPCs. Each MPC is characterized by angles of departure and arrival, power, and delay. The interaction with the antenna arrays can be modeled by pre- and post-multiplying H by the beamforming vectors of the transmitter and receiver, respectively [20].
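The pre- and post-multiplication step can be sketched as follows; the array sizes and the i.i.d. complex-Gaussian channel entries are purely illustrative, whereas a real SCM would build H from the MPC angles, powers, and delays:

```python
import numpy as np

rng = np.random.default_rng(0)
U, S = 16, 64   # RX and TX array elements (illustrative sizes)

# Toy channel matrix: in a real SCM each entry (i, j) aggregates the MPCs'
# angles, powers, and delays; here we draw i.i.d. complex Gaussian entries.
H = (rng.standard_normal((U, S)) + 1j * rng.standard_normal((U, S))) / np.sqrt(2)

# Unit-norm beamforming vector at the TX (f) and combining vector at the RX (w).
f = np.ones(S, dtype=complex) / np.sqrt(S)
w = np.ones(U, dtype=complex) / np.sqrt(U)

# Effective scalar channel after pre- and post-multiplication: w^H H f.
h_eff = w.conj() @ H @ f
```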
For stochastic SCMs, the MPCs are generated from a set of random distributions, whose parameters are determined by statistical fits on channel measurements. The channel matrix thus has a stochastic nature, with the advantage that multiple instances can be randomly generated for generic, large scale scenarios. Notable examples of stochastic channel models are those derived from WINNER and WINNER-II models [33], e.g., the 3GPP channel model for 5G deployments [19]. These models have been extensively used in the performance evaluations of mmWave networks [34], and are also integrated with popular network simulators [35]. The NYU channel model for 28 GHz and 73 GHz is also based on a stochastic SCM [20].

C. Quasi-Deterministic Channel Models
The stochastic nature of the aforementioned channel models makes them generic: they can model a common rural or urban scenario, but not a specific scenario (e.g., Times Square in NYC). Therefore, they do not accurately model the interactions of the mmWave signal in a peculiar deployment, and cannot be used for detailed planning and capacity studies in real-world contexts.
As discussed in Sec. I, RTs can, instead, provide extremely accurate propagation results in a given environment, provided that its characterization in the simulation is accurate enough. With respect to stochastic channels, an RT generates the exact MPCs that arise from a direct or reflected propagation path in the scenario [22]. Ray Tracers have thus been the basis for several performance evaluation studies at mmWaves, e.g., [23], [36]. RT-based channel models have also been adopted as candidates for the evaluation of IEEE 802.11ay networks, with a QD extension [24], [37] that combines the geometry-based MPCs and a number of random diffuse components that model the interaction of the mmWave signal with rough surfaces.
However, while being extremely precise, RTs and QD models are also more computationally intensive than stochastic models for the generation of a single channel instance, especially if the number of scattering and reflecting surfaces in the scenario is large. A number of optimizations have been studied for RTs in general [38]. Two main aspects are generally considered when trying to enhance the RT performance [39]: (i) reducing the number of objects on which the ray-object intersection test has to be performed, and (ii) accelerating the ray-object intersection test. Space division methods [40] partition the elements of the simulation space into clusters and perform the intersection test with the clusters rather than the single objects. Transmitter-dependent methods exploit the knowledge of the transmitter node location to pre-process the scene and discard obstacles that are not relevant to the simulation, such as non-illuminated elements [41], obstructed angular sectors [42], and those reached by a negligible amount of power [43]. Similarly, receiver-dependent methods reduce the number of required operations based on the spatial distribution of the receivers [44]. Notice that both transmitter- and receiver-dependent methods require the pre-processing to be performed whenever the reference node moves, thus leading to a significant overhead in dynamic environments. Other techniques improve the RT's performance by either preemptively performing an exhaustive simulation to produce a coverage map for all the receivers' positions [45], or interpolating the metrics observed between a grid of simulated points [46]. Finally, [47] analyzes the performance of an RT as a function of the number of reflections, focusing only on the mean square error with respect to measurements in different indoor locations.
With respect to the state of the art, this paper focuses on the complexity analysis and reduction for an open-source mmWave RT and QD model, with specific focus on preserving the accuracy of comprehensive system-level simulation results, based on the integration of the RT with a realistic, full-stack simulator for 5G cellular networks.

III. RAY TRACING AT MILLIMETER WAVES
As previously discussed, ray tracing is widely used to accurately simulate electromagnetic (EM) propagation in an environment whose geometry is described by a Computer-Aided Design (CAD) model [48]. This technique is based on the solution of Maxwell's equations for the far field when the operating frequency tends to infinity, where EM waves exhibit ray-like properties, i.e., the flow of power propagates along straight lines and reflects specularly on flat surfaces. At mmWaves, where the wavelength of the signal is much shorter than the typical size of the reflecting obstacles, ray tracing can therefore be used as a first approximation to model the propagation effects, whereas secondary effects such as diffraction, diffuse scattering, polarization, and refraction, should also be accounted for if a better accuracy is desired.
Some of these effects can be particularly significant in the propagation of mmWave signals. In this frequency band, the shorter wavelength leads to a higher effective roughness of the surfaces, thus increasing the amount of scattered power and, consequently, the reflection loss. The effect is twofold: on one side, higher-order reflections are expected to be weaker and thus affect the communication less than at lower frequency but, on the other side, proper modeling of the scattered rays should be taken into consideration [49]. Conversely, higher penetration loss will reduce the received power in the cluttered areas, improving frequency reuse and reducing the cross-interference between close-by radiators. Finally, diffraction shadows are deeper at higher frequency, making diffraction a less prominent means of propagation in the mmWave band.
In this section, we describe the tools we use in this work to simulate the mmWave channel. Specifically, in Sec. III-A we describe the architecture and the assumptions of the RT we adopt, while in Sec. III-B we briefly describe the diffuse scattering model.

A. The Millimeter Wave Ray Tracer
For the results of this paper, we use an open-source RT developed jointly by the SIGNET group at the University of Padova and the U.S. National Institute of Standards and Technology (NIST). It uses triangles described in CAD files as the basic 3D surface element units, which can then be combined to define complex shapes.
The RT only supports specular reflections and, optionally, diffuse scattering, ignoring effects such as diffraction, penetration, and polarization. Specifically, polarization is not considered in order to avoid the full Fresnel equations and further simplify the software. Thus, reflected rays experience a 180° phase rotation and a random reflection loss RL, whose detailed distribution has been experimentally characterized and is reported in the material library available at [25].
Multiple network nodes, playing the roles of TXs and RXs, are modeled as points, and can be deployed simultaneously, allowing the calculation of interfering channels. Furthermore, trace-based mobility is supported, making it possible to create complex scenarios with multiple base stations and mobile users. Given N nodes and t time steps, the simulator computes a channel instance for each time step and for each node pair. Considering a symmetric channel for a given node pair, tN (N − 1)/2 channel instances have to be calculated.
The RT uses the Method of Images (MoI) [48], a well-known method for ray tracing originally adapted from geometrical optics [50] to be used in computer graphics and in the early works on channel modeling for cellular networks [51]. The MoI computes specular reflections, assumed to be independent across time and node pairs. For the simplest case, i.e., first-order reflections, it defines the virtual image of a node, for example the RX, as its specular image with respect to a surface. Formally, RX^(1) is the specular image of the RX, denoted as RX^(0), across the surface S. The specular reflection point P^(1) between the RX and the TX coincides with the intersection of the segment (RX^(1), TX) with the surface S, as shown in Fig. 1a.
Since triangles are used as the basic surface units of the CAD environment, S_i is the plane generated by a given triangle T_i, i = 1, ..., T, where T is the total number of triangles of the CAD environment, and it must be verified that P^(1) is a point within the area shaped by T_i, otherwise the reflection is not valid and is thus discarded. Finally, every segment of the ray, namely (RX^(0), P^(1)) and (P^(1), TX), has to be checked for obstruction against the remaining triangles of the environment T_j, j = 1, ..., T, j ≠ i. If any segment of the ray is obstructed, the whole ray is considered obstructed and thus discarded.
The MoI applies recursively when multiple reflections are considered, computing the n-th virtual image of the RX, RX^(n), and the respective specular reflection point P^(n), as shown in Fig. 1a. Thus, for each ray of reflection order r > 0, r geometrical operations must be performed to compute the ray path and, if the ray is valid, each of the r + 1 segments needs to be checked for obstruction against the remaining T − 1 triangles of the CAD environment. Summing the geometrical and the obstruction-check operations, the number of operations per ray is thus

n_ray(r) = r + (r + 1)T,  T ≫ 1.    (1)
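A minimal sketch of the first-order case described above, assuming the reflecting plane is given by a point p0 and a normal n; the triangle-containment and obstruction checks performed by the RT are omitted:

```python
import numpy as np

def first_order_reflection(tx, rx, p0, n):
    """Method of Images, first order: mirror the RX across the plane through
    p0 with normal n, then intersect the segment (TX, RX_image) with the
    plane to obtain the specular reflection point P.  This sketch skips the
    triangle-containment and obstruction checks the RT also performs."""
    n = n / np.linalg.norm(n)
    rx_img = rx - 2.0 * np.dot(rx - p0, n) * n    # virtual image RX^(1)
    d = rx_img - tx
    t = np.dot(p0 - tx, n) / np.dot(d, n)         # parameter along (TX, RX^(1))
    if not 0.0 < t < 1.0:
        return None                               # plane not between the nodes
    return tx + t * d                             # reflection point P^(1)

# Reflection off the floor (z = 0) between two nodes 2 m above it:
P = first_order_reflection(np.array([0.0, 0.0, 2.0]), np.array([4.0, 0.0, 2.0]),
                           np.array([0.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0]))
# By symmetry, P is the midpoint on the floor, [2, 0, 0]
```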
To compute all possible reflections between a given node pair, all possible paths have to be computed and tested for obstruction. Considering increasing reflection orders, the first one to be tested is the direct ray, i.e., the segment (RX, TX). Subsequently, all first-order reflections are computed, i.e., those rays starting from the TX, reflecting off a triangle T_i, i = 1, ..., T, and reaching the RX. Then come the second-order reflections, starting from the TX, reflecting first on triangle T_{i1}, i_1 = 1, ..., T, and then on triangle T_{i2}, i_2 = 1, ..., T, i_2 ≠ i_1, to finally reach the RX, and so on up to a maximum reflection order R.
All the reflections can be encoded in a reflection tree, as represented in Fig. 1b, where each node of the tree corresponds to a possible ray, the node depth corresponds to the reflection order r, starting from 0 at the root of the tree, and the path starting from the root describes the ordered tuple of reflecting triangles to be tested. The reflection tree is a simplified concept with respect to the visibility tree described in [52]. The added flexibility, and thus complexity, introduced by the triangular tessellation makes it possible to support fine details in a scene, but computing whether all pairs of triangles are in mutual LoS requires a significant overhead and was thus not implemented. Instead, we implement only a basic visibility algorithm, able to pre-process the scenario and discard interactions between triangles which are trivially not in LoS. The model discards all interactions between a given triangle T_i and all other triangles {T_j}, j ≠ i, which are fully behind T_i, exploiting the directionality of a surface as is typical for CAD models.
The complexity of a single channel instance, then, is determined by the total number of operations required for all the nodes of the reflection tree, where we consider the upper bound given by the total number of triangles, ignoring the visibility pre-processing step. At depth r = 0 we consider only the direct ray, at depth r = 1 we consider the T possible first-order reflections and, in general, at depth r ≥ 2 we consider T(T − 1)^(r−1) < T^r possible r-order reflections.
Thus, combining the geometrical complexity n_ray(r) derived in (1) with the upper limit on the number of reflection-tree nodes at depth r (T^r), the number of operations per channel instance n_ch(r) considering only reflection order r is upper bounded as

n_ch(r) ≤ n_ray(r) T^r = (r + (r + 1)T) T^r.    (2)

Then, considering reflection orders up to R, the overall number of operations n_ch required by the MoI to compute a channel instance between a pair of nodes is:

n_ch = Σ_{r=0}^{R} n_ch(r) ≤ Σ_{r=0}^{R} (r + (r + 1)T) T^r = Σ_{r=0}^{R} T^{r+1} + (T + 1) Σ_{r=0}^{R} r T^r.    (3)

The first term is a truncated geometric series and the second is a special case of a (truncated) arithmetic-geometric series, known as Gabriel's staircase, that can be solved as follows:

n_ch ≤ T (T^{R+1} − 1) / (T − 1) + (T + 1) T (R T^{R+1} − (R + 1) T^R + 1) / (T − 1)^2 = O(R T^{R+1}),    (4)

thus denoting a complexity per channel instance equal to O(R T^{R+1}), and a total complexity for N nodes and t time steps equal to O(t N^2 R T^{R+1}). The last step in (4) is justified by considering that typical values for R and T are in the order of 1-4 and 100-10 000, respectively, thus making T the dominating term in the formula.
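The operation counts above can be checked numerically; a short sketch, with an illustrative scene size:

```python
def n_ray(r, T):
    """Operations per ray of order r, as in (1): r image computations
    plus (r + 1) segments checked against ~T triangles."""
    return r + (r + 1) * T

def n_ch_bound(R, T):
    """Upper bound on the operations per channel instance: at most T^r
    rays exist at each reflection order r = 0..R."""
    return sum(n_ray(r, T) * T**r for r in range(R + 1))

# With T = 1000 triangles and up to R = 2 reflections the bound is ~3e9
# operations, dominated by the highest-order term, as predicted by O(R T^(R+1)).
print(n_ch_bound(2, 1000))
```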

B. Quasi-Deterministic Model Integration
In Sec. V, the contribution of the diffuse components is also evaluated; these alone can account for up to 40% of the total received power according to measurement campaigns [24]. The stochastic model for diffuse rays is based on the specifications proposed for IEEE 802.11ay channel modeling [53] and its parameters are obtained from accurate measurement campaigns [24]. Further details are given in [54]. In this section, boldface letters denote random variables while non-boldface letters denote deterministic variables or realizations of random variables.
The QD model is built upon the deterministic channel, as described in Sec. III-A. For first-order reflections, the path gain of the Deterministic Ray (D-Ray) is given by the contributions of the free-space propagation loss and the reflection loss of the reflecting surface, as follows:

PG_dB = 20 log10 ( λ_c / (4π ℓ_ray) ) − RL,    (5)

where λ_c is the wavelength of the carrier frequency, ℓ_ray is the total ray length, and RL ∼ R(s_RL, σ_RL) is the Rician-distributed random reflection loss factor given by the reflecting surface's material. The Rician distribution was chosen as a good model for RL according to [24]. The computed D-Rays will then be the baseline for the multipath components generated by the QD model. If present, the direct ray is treated separately, as it does not generate any diffuse component. For first-order reflections, a cluster can then be defined as the set of rays including a D-Ray and its diffuse components. Considering the pre-cursors (i.e., diffuse components that are received before the D-Ray), the main cursor (i.e., the D-Ray), and the post-cursors (i.e., diffuse components received after the D-Ray), the total number of MPCs of a given cluster is N_MPC = N_pre + 1 + N_post. In general, diffuse components are generated as long as they are above a power threshold, but empirically we found that setting N_pre = 3 and N_post = 16 generally satisfies this condition without significantly compromising the accuracy of the model. As described in [54], the arrival time of the i-th MPC, τ_i, is modeled as a Poisson process, meaning that the inter-arrival times are independent and exponentially distributed, while the power of each diffuse MPC decays exponentially with its delay offset from the main cursor, i.e.,

PG_{i,dB} = PG_dB − K_dB − 10 log10(e) |τ_i − τ_0| / γ + S,    (6)

where K_dB ∼ R(s_K, σ_K) is a loss factor, τ_0 is the D-Ray arrival delay, γ ∼ R(s_γ, σ_γ) is the power-delay decay constant, S ∼ N(0, σ_s^2) is the power-delay decay deviation, and σ_s ∼ R(s_σs, σ_σs). The Rician distribution was again chosen as a good model for the data collected from the measurement campaign according to [54].
A graphic representation of these parameters is shown in Fig. 2.
Finally, as observed in [55], [56], the departure and arrival angles follow a Laplacian distribution around the D-Ray, while the phase shift φ_i, due to both diffusion and Doppler shift, is assumed to be uniform in [0, 2π), independently for each diffuse MPC.
In general, for the r-th reflection order, with r > 1, the definitions presented above are heuristically adapted, considering independent statistics for the different reflectors. To reduce the computational complexity of the complete model, the multi-bounce model neglects diffuse rays beyond the first order, given their fast-increasing attenuation. Instead, only diffuse rays generated directly by the deterministic ray are taken into account, each generated with the QD parameters relative to the impinging reflecting surface. Moreover, we assume that every diffuse component closely follows the main cursor, thus reflecting on the same reflectors. Consequently, every reflector produces N_pre + N_post diffuse components, thus yielding N_MPC ≈ r(N_pre + N_post) + 1 MPCs. Finally, diffuse components are independently extracted at each time step.
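The cluster construction around a single D-Ray can be sketched as follows. All numeric parameters (the loss factor K, the decay constant γ, the shadowing σ_s, and the arrival rate) are illustrative placeholders: in the model they are Rician-distributed, material-dependent random variables, as described above.

```python
import numpy as np

rng = np.random.default_rng(1)

def qd_cluster(pg_dray_db, tau0_ns, n_pre=3, n_post=16,
               k_db=10.0, gamma_ns=5.0, sigma_s_db=2.0, rate_per_ns=0.5):
    """Sketch of one QD cluster: pre-/post-cursor delays are Poisson
    arrivals (i.i.d. exponential inter-arrival times) around the D-Ray
    delay tau0_ns, and powers decay exponentially with the delay offset,
    shifted by a loss factor k_db and a normal shadowing term S.
    All numeric defaults are illustrative, not the model's fitted values."""
    delays, powers = [tau0_ns], [pg_dray_db]
    for n_comp, sign in ((n_pre, -1), (n_post, +1)):
        t = tau0_ns
        for _ in range(n_comp):
            t += sign * rng.exponential(1.0 / rate_per_ns)
            s = rng.normal(0.0, sigma_s_db)
            p = pg_dray_db - k_db - 10 * np.log10(np.e) * abs(t - tau0_ns) / gamma_ns + s
            delays.append(t)
            powers.append(p)
    return np.array(delays), np.array(powers)

delays, powers = qd_cluster(pg_dray_db=-80.0, tau0_ns=50.0)
# N_MPC = N_pre + 1 + N_post = 20 components in this cluster
```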

IV. MMWAVE CHANNEL SIMPLIFICATIONS
The accuracy of the MoI-based ray-traced channels, especially when considering the QD model, comes at a high computational cost, which may limit the scalability of the simulations, in particular when considering a very large number of devices. In this perspective, the main objective of this work is to evaluate how channel simplifications affect the results of link-level and network-level simulations while speeding up the overall simulation runtime. In this section, we present two techniques that were designed with this objective in mind [21]: limiting the maximum reflection order (Sec. IV-A) and thresholding the MPCs based on their path gain (Sec. IV-B). Both aim at reducing the number of rays between a pair of nodes. Specifically, given a set of M rays connecting two nodes, we propose and evaluate selection criteria that decrease the number of MPCs to M' < M, to reduce the overall simulation time. This operation is applied on a time-step basis. The rationale behind both strategies is to decrease the number of MPCs by removing the least significant ones, i.e., those with the lowest power, as they are expected to provide a limited contribution to the overall received signal strength.

A. Maximum Reflection Order Reduction
Each reflection of an MPC on a surface is associated with a partial power loss and an increased path length, translating into a higher path loss. Namely, from (5), the path gain for a ray reflected on r surfaces is

PG_dB(r) = 20 log10 ( λ_c / (4π Σ_{i=1}^{r+1} ℓ_i) ) − Σ_{i=1}^{r} RL_i,    (7)

where ℓ_i is the length of the segment associated with the i-th reflection. The expression is decomposed into two terms to underline the different contributions: both the path length and the reflection losses degrade the path gain when the reflection order increases. Therefore, it is reasonable to assume that MPCs that bounce across multiple scattering surfaces have a low contribution to the overall received power, and can be omitted from the RT computations. Setting the maximum reflection order to R' < R, the RT complexity is decreased to O(t N^2 R' T^{R'+1}) < O(t N^2 R T^{R+1}), with significant savings in terms of computation time, given the super-exponential dependency of the complexity on R.
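A quick numerical check of the savings, reusing the operation count of Sec. III-A (the scene size is illustrative):

```python
def n_ch_bound(R, T):
    """Upper bound on the MoI operations per channel instance (Sec. III-A):
    at most T^r rays of order r, each costing r + (r + 1) * T operations."""
    return sum((r + (r + 1) * T) * T**r for r in range(R + 1))

T = 1000
full, reduced = n_ch_bound(3, T), n_ch_bound(2, T)
# Dropping a single reflection order (R = 3 -> R' = 2) cuts the bound by
# a factor on the order of T:
print(full // reduced)
```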

B. MPC Thresholding
Besides the reflection order, other elements contribute to reducing the MPC path gain. For example, even if R is small, in large scenarios the surfaces located far from the TX and RX nodes are associated with MPCs with longer path lengths (i.e., a larger first term in (7)). As the path gain from these scatterers is much smaller than that from close-by reflecting surfaces, it is possible to prune them from the list of MPCs to compute. At the same time, some MPCs can be pruned regardless of the distance between the signal source and the detector, if the path gain falls below a minimum threshold as a result of an excessive reflection loss (i.e., a larger second term in (7)), e.g., when the signal propagates in Non-Line-of-Sight (NLoS).
For these MPCs, the path gain plays a key role and can thus be used as an indicator to select the most significant rays. Specifically, the selection is performed at each time step considering a threshold γ_th in dB units, according to the following rules:
• the strongest ray is identified, regardless of whether it is obstructed, and the corresponding path gain PG_strong,dB is computed;
• all the other MPCs are identified; note that PG_i,dB < PG_strong,dB for every MPC i;
• every MPC i such that PG_i,dB − PG_strong,dB < γ_th is discarded.
While being more general than the previous approach, as it directly identifies the weakest, least significant rays, this strategy requires the computation of all the geometric paths in order to obtain the path gain list. However, this method, which removes (M − M') rays, eases the load on the subsequent RT operations, i.e., the obstruction check, reducing the complexity of each time step by a factor of (M − M')/M. In fact, following the same logic as in Sec. III-A, for a ray of reflection order r, after the r geometrical operations required to compute the path of the ray, none of the (r + 1)T obstruction checks are performed if the ray is discarded. Updating (3) with r instead of r + (r + 1)T operations per ray, the complexity can be reduced to O(t N² R T^R), and thus by a factor of up to T. Note that, whereas this approach achieves a constant-factor improvement, T can be in the order of tens to thousands, depending on the level of detail of the CAD file and on the adopted triangulation, and thus dominates the complexity expression.
Absolute thresholding can also be used to limit the number of extremely weak rays, similarly to the previous technique. This approach can be useful when considering high values of R, low values of γ_th, and especially when using the QD model. In this case, setting a conservative threshold Γ_th, every MPC i such that PG_i,dB < Γ_th is discarded.
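The selection rules above can be sketched as follows. The list-of-dictionaries ray format and the example path gains are illustrative assumptions, not the actual ray-tracer data structures.

```python
# Sketch of the MPC selection rules above: a relative threshold gamma_th
# (w.r.t. the strongest ray) and an absolute threshold Gamma_th, both in dB.
# The list-of-dicts ray format is a simplifying assumption for illustration.

def select_mpcs(rays, gamma_th_db, abs_th_db):
    """Keep MPC i only if PG_i - PG_strong >= gamma_th and PG_i >= Gamma_th.

    rays: list of dicts with a 'pg_db' path-gain entry (dB).
    gamma_th_db: relative threshold, e.g., -25 (dB below the strongest ray).
    abs_th_db: conservative absolute floor, e.g., -200 dB.
    """
    if not rays:
        return []
    pg_strong = max(r['pg_db'] for r in rays)  # strongest ray, obstructed or not
    return [r for r in rays
            if r['pg_db'] - pg_strong >= gamma_th_db   # relative rule
            and r['pg_db'] >= abs_th_db]               # absolute rule

rays = [{'pg_db': -70}, {'pg_db': -90}, {'pg_db': -100}, {'pg_db': -210}]
kept = select_mpcs(rays, gamma_th_db=-25, abs_th_db=-200)
print(len(kept))  # 2: the -100 dB ray (relative) and -210 dB ray (absolute) are pruned
```

Only rays that survive this filter would then undergo the per-triangle obstruction checks, which is where the runtime saving comes from.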
The complexity of the RT can be significantly reduced thanks to the removal of MPCs and to the reduction of the maximum reflection order R. On the other hand, these simplifications degrade the accuracy of the simulation results at the different levels of the network stack. In the remainder of this paper, we will quantify this trade-off for three realistic propagation scenarios. The overall end-to-end network performance and the runtime of the simplified RT settings will be compared with those of the complete, non-simplified channel traces.

V. PERFORMANCE EVALUATION
This section reports the details of an extensive performance evaluation aimed at understanding the impact that the simplifications introduced in Sec. IV have at different layers of the protocol stack. We first describe the scenarios and tools used for the performance evaluation (Sec. V-A), then the link and higher layer performance (Secs. V-B and V-C, respectively), and conclude with the computational performance given by the simplifications (Sec. V-D) and guidelines for the most efficient design configurations (Sec. V-E).

A. Simulation Scenarios
Three representative scenarios with distinctive features have been selected to make the performance evaluation as general as possible. Their main characteristics are hereby described and summarized in Table I. Without loss of generality, only downlink channels are considered.
1) Indoor1: The most basic scenario, with a rectangular room (see Fig. 3a) of size 10 m × 19 m × 3 m. The TX is positioned close to the ceiling at (5, 0.1, 2.9) m. The RX, at height 1.5 m, moves away from the TX at a speed of 1.2 m/s along a straight line. This scenario was deliberately designed to be simple, to analyze the propagation characteristics simulated by the RT, focusing on the received power pattern when different simplifications are used;

2) L-Room: An L-shaped hallway (see Fig. 3b). A static TX, placed at (0.2, 3, 2.5) m, transmits to the reference RX, which moves away from it at a speed of 1.2 m/s across the corridor. The shape of the room is such that the RX is in NLoS condition for a significant portion of the path. In order to analyze the impact of interference on the network performance, a second TX, placed at (8, 18.8, 2.5) m and acting as interferer, communicates with an RX at (9, 3, 1.5) m. Furthermore, the shape of the room plays an important role when comparing the proposed simplification techniques, as it may create blind spots where little or no signal is received;

3) Parking Lot: The only outdoor scenario, representing a parking area of about 120 m × 70 m enclosed by buildings (see Fig. 3c), where the reference TX is placed at a fixed position. Moreover, the reference RX moves at a much higher speed than in the previous scenarios, as in a basic vehicular scenario.

The statistical quantities used by the QD model mentioned in Sec. III-B, namely the reflection loss RL, the Rician factor K, the inter-arrival times τ, the power-delay decay constant γ, and the power-delay decay standard deviation S, are obtained from detailed measurement campaigns for the Indoor1 [24] and Parking Lot scenarios. For each scenario, the RT and QD model software described in Sec. III has been used to generate the channel instances at 60 or 28 GHz for the specified devices and mobility patterns, sampled every 5 ms.
These traces were then integrated in a custom MATLAB simulator [1], [21] to evaluate metrics at the link layer (e.g., the SINR), and with the mmWave module [27] of the ns-3 network simulator [26] to investigate the performance of the full protocol stack.
Both simulators rely on the computation of the channel matrix H to describe the channel obtained from the MPCs provided by the RT and the QD model. Given M rays, with path power gain PG_m, phase shift Φ_m, delay τ_m, angle of departure AoD_m, and angle of arrival AoA_m, the matrix for the carrier frequency f_c is computed as [29]

H = Σ_{m=1}^{M} √(PG_m) e^{jΦ_m} e^{−j2π f_c τ_m} a_rx(AoA_m) a_tx(AoD_m)^H,    (8)

where a_rx(θ) and a_tx(θ) are the receiver and transmitter array responses in the 3D angle θ, (·)* is the conjugate operator, and (·)^H is the Hermitian operator. The SINR for the link between the transmitter t and the receiver r is

SINR_{t,r} = P_tx,t |w_{r,t}^H H_{t,r} w_{t,r}|² / ( Σ_{i≠t} P_tx,i |w_{r,t}^H H_{i,r} w_{i,*}|² + N_0 B F ),    (9)

where P_tx,t is the transmit power of device t, w_{i,j} is the beamforming vector used by device i to communicate with device j (and w_{i,*} is used with abuse of notation to indicate the beamforming vector used by device i to transmit towards a connected device, or 0 if i is not transmitting), N_0 is the noise power spectral density, B is the bandwidth, and F = 10^(NF/10) is the noise figure of the receiver.
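A rough sketch of how such a channel matrix can be assembled from the MPC parameters is given below. For brevity it uses λ/2-spaced uniform linear arrays instead of the planar arrays employed in the paper, and the ray parameters are arbitrary examples rather than ray-traced values.

```python
import numpy as np

# Minimal sketch of the per-MPC channel matrix construction described above,
# using uniform linear arrays (the paper uses planar arrays) and made-up rays.

def ula_response(n_elem, angle_rad):
    """lambda/2-spaced ULA steering vector for a given angle, unit norm."""
    k = np.arange(n_elem)
    return np.exp(1j * np.pi * k * np.sin(angle_rad)) / np.sqrt(n_elem)

def channel_matrix(rays, n_rx, n_tx, fc):
    """H = sum_m sqrt(PG_m) e^{j Phi_m} e^{-j 2 pi fc tau_m}
           a_rx(AoA_m) a_tx(AoD_m)^H, one rank-one term per MPC."""
    H = np.zeros((n_rx, n_tx), dtype=complex)
    for pg, phi, tau, aod, aoa in rays:
        H += (np.sqrt(pg) * np.exp(1j * phi) * np.exp(-2j * np.pi * fc * tau)
              * np.outer(ula_response(n_rx, aoa), ula_response(n_tx, aod).conj()))
    return H

# Two example MPCs: (linear path gain, phase, delay [s], AoD [rad], AoA [rad])
rays = [(1e-8, 0.0, 50e-9, 0.1, -0.2), (1e-9, 1.0, 80e-9, 0.6, 0.4)]
H = channel_matrix(rays, n_rx=4, n_tx=8, fc=60e9)
print(H.shape)  # (4, 8)
```

Each surviving MPC contributes one rank-one term, so pruning MPCs with the techniques of Sec. IV directly shrinks this summation.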
ns-3 Configuration: For ns-3, we extended the channel model implementation described in [35] to account for a generic channel matrix computed, in this case, as expressed in (8). 5 In the performance evaluation, this channel model has been combined with the 3GPP-like protocol stack of the 5G mmWave module for ns-3 [27], which features physical and Medium Access Control (MAC) layers with an Orthogonal Frequency Division Multiplexing (OFDM)-based frame structure, dynamic Time Division Duplexing (TDD), Adaptive Modulation and Coding (AMC), and several scheduler implementations. The channel model implementation influences the protocol stack performance through an error model that maps the SINR to the capacity of the physical layer. Besides, the protocol stacks of the User Equipments (UEs) and base stations are completed by 3GPP Radio Link Control (RLC) and Packet Data Convergence Protocol (PDCP) layers, together with a realistic control plane based on the Radio Resource Control (RRC) layer, which supports mobility-related procedures [57]. We consider two configurations for the uniform planar antenna arrays: large arrays, comprising 8 × 8 elements for the TXs and 4 × 4 elements for the RXs, and small arrays, comprising 2 × 2 elements for both TXs and RXs. All the arrays feature omni-directional elements spaced by λ/2. The planes on which all planar arrays lie are parallel to the y-z plane, with a fixed orientation throughout the simulation. The beamforming is based on the Singular Value Decomposition (SVD) of the channel matrix H_{t,r}, i.e., the beamforming vector w_{t,r} is the right singular vector associated with the largest singular value of H_{t,r} [58]. Finally, thanks to the integration with ns-3, it is possible to equip the UEs with the TCP/IP stack and applications which connect to remote servers in the Internet.
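A minimal sketch of this SVD-based beamforming step, using a random complex matrix in place of an actual ray-traced channel:

```python
import numpy as np

# Sketch of SVD beamforming as described above: the precoding/combining
# vectors are the dominant right/left singular vectors of H. The channel
# here is a random placeholder, not a ray-traced matrix.

rng = np.random.default_rng(0)
H = rng.standard_normal((4, 8)) + 1j * rng.standard_normal((4, 8))  # 4 RX x 8 TX

U, s, Vh = np.linalg.svd(H)
w_rx = U[:, 0]           # dominant left singular vector (combining at the RX)
w_tx = Vh[0, :].conj()   # dominant right singular vector (precoding at the TX)

# The resulting beamforming gain equals the largest singular value of H.
gain = abs(w_rx.conj() @ H @ w_tx)
print(np.isclose(gain, s[0]))  # True
```

The same vectors plugged into (9) give the effective signal power term P_tx |w_rx^H H w_tx|².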
The results shown in the following sections are benchmarked, unless stated otherwise, against the most complete and accurate baseline possible, where only very conservative simplifications are introduced, which produce negligible differences with respect to the measurements [24], allowing us to consider these baselines as equivalent to the actual measurements. Specifically, we set a maximum reflection order R = 3 for the Parking Lot scenario and R = 4 for the others (to consider several reflections of the MPCs), a relative threshold γ_th = −∞ (thus implying that no MPCs are discarded, regardless of their power level at the detector), a conservative absolute threshold Γ_th = −200 dB, the large antenna array configuration, only deterministic rays (i.e., no QD model), and a UDP stream with an offered traffic of 800 Mbps.

B. Link-Level Performance Results
The first step towards proper protocol design is gaining a deep understanding of how the proposed ray-tracing simplifications impact the link-level performance of the network, neglecting, at this stage, the effects at the upper layers. In this perspective, we are interested in investigating how the strategies described in Sec. IV result in different SINR regimes.

Fig. 4. Evolution of the SINR experienced when the test RX moves in the L-Room scenario along the path described in Fig. 3b.
1) SINR Evolution: From Fig. 4a, which plots the temporal evolution of the SINR experienced when the RX moves in the L-Room scenario along the path described in Fig. 3b, we see that the impact of R is certainly non-negligible: the trend of the SINR visibly changes when progressively reducing the number of reflections per MPC. Moreover, we see that the SINR evolves consistently with the mobility of the RX. The SINR indeed drops by more than 30 dB when the RX loses the LoS (position B in Fig. 3b), while the SINR degradation experienced at time t = 3.4 s (position A in Fig. 3b) is due to the interference from TX interf . Rapid fluctuations within the SINR trace are then due to the fact that different MPCs travel different paths. At 60 GHz, where the wavelength is as short as λ = 5 mm, even small variations of the path length between the direct ray and the reflected ones from the back wall (behind the TX), side walls, ceiling, and floor of the room, may result in strong fading. Fig. 4a also shows that the impact of R is particularly evident when the RX operates in NLoS: in this region, in fact, the received power drops to zero when first-order (R = 1) and second-order (R = 2) reflections are removed (positions C and D in Fig. 3b, respectively).
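To see why mm-scale path-length variations matter at 60 GHz, consider a toy two-ray sum; the ray amplitudes below are illustrative, not ray-traced values.

```python
import numpy as np

# Why mm-scale path-length changes cause deep fades at 60 GHz (lambda = 5 mm):
# a toy sum of a direct ray (amplitude 1) and one reflected ray (amplitude 0.5)
# whose path is delta meters longer. Amplitudes are illustrative only.

lam = 5e-3                       # wavelength at 60 GHz

def two_ray_power(delta):
    """Received power of the two-ray sum vs. extra reflected-path length delta."""
    phase = 2 * np.pi * delta / lam
    return abs(1 + 0.5 * np.exp(1j * phase)) ** 2

print(two_ray_power(0.0))        # in-phase: (1.5)^2 = 2.25
print(two_ray_power(lam / 2))    # half-wavelength offset: (0.5)^2 = 0.25
```

A path-length change of only 2.5 mm thus swings the received power by almost 10 dB, which is exactly the rapid-fluctuation behavior visible in the SINR traces.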
2) Impact of Interference: For completeness, in Fig. 4b we compare the metrics from the MATLAB (solid lines) and ns-3 (dots) simulations. The lower bound (red line) assumes an always-on interferer, while the upper bound (blue line) assumes an interference-free channel. On the other hand, ns-3 models a realistic transmission pattern for the primary and interferer links, which could occupy the channel in overlapping, partially overlapping, or non-overlapping time intervals. Therefore, for each time interval, the SINR generated by the ns-3 simulations is lower and upper bounded by the two other curves, with the ns-3 SINR much closer to the upper bound when the two transmitters use non-overlapping slots for communications with their receivers. We observe that the SINR, after steadily decreasing until time 17 s, starts increasing again. This happens because the TX interf is pointing its beam towards its own receiver RX interf , and as RX ref approaches TX interf it becomes less and less aligned with the interfering beam, which makes the interfering power actually decrease as a result.
3) SINR Distribution: Fig. 5 further investigates the impact of the parameter R on the SINR, reporting its Cumulative Distribution Function (CDF) for the three scenarios described in Sec. V-A as a function of R. In the L-Room scenario, the shape of the CDF of the SINR changes significantly when varying the value of R. This eventually has an impact on the capacity and packet error rate simulated at the physical layer. Specifically, reducing the number of MPCs of the channel may imply that the rays are not able to reach the RX position with sufficiently high power, thus resulting in a complete outage: for example, the CDF terminates at an SINR of about −10 dB for R = 1 and about −23 dB for R = 2. On the other hand, both the Indoor1 and the Parking Lot scenarios preserve the LoS for the whole duration of the simulation, thereby making it possible for the signal to propagate with a minor impact on the received power even when limiting the number of reflections R per MPC. Notice that the 20 dB SINR gap between the Parking Lot and Indoor1 configurations is due to the larger distance between the TX and the RX, and to the reflecting surfaces (e.g., buildings) in the outdoor scenario.
4) Impact of Array Size on the SINR: Finally, in Fig. 6 we plot the SINR vs. the relative threshold γ_th as a function of the antenna size at the TX and the RX. As expected, the SINR increases with the number of antenna elements, which increases the beamforming gain. Also, while R severely affects the SINR in the L-Room scenario, increasing the relative threshold γ_th to reduce the number of MPCs to be processed by the RT results in negligible deterioration at the link level, while speeding up the simulation, as will be discussed in Sec. V-D.

Fig. 7. End-to-end performance vs. R for the L-Room scenario with an offered UDP CBR traffic of 800 Mbps.

C. End-to-End Performance Results
Many of the conclusions we derived from the link-level performance in Sec. V-B can be extended to the end-to-end metrics, i.e., throughput and delay at the PDCP layer. Statistics have been collected at this layer since they can easily profile both UDP and TCP traffic and are very close to the application-layer performance, without the addition of extra delays due to the specific architecture of the simulation scenario. In this section we study three types of traffic, namely full-buffer TCP traffic and UDP traffic with a Constant Bit Rate (CBR) of 100 Mbps or 800 Mbps. The latter is the default for the results shown in this section, unless stated otherwise. Both throughput and delay are averaged over 100 ms windows.

1) Throughput and Delay Analysis With UDP Traffic: Fig. 7 reports the end-to-end metrics over time for the L-Room scenario as a function of the maximum number of reflections R. First of all, we notice that the 800 Mbps data rate for UDP was chosen to saturate the channel capacity, as the physical layer only supports a peak rate of 630 Mbps. The interference starts to impact the UDP performance at 7 s, followed by a rapid performance degradation when the direct ray is lost in point B (see Fig. 3), at about 9.65 s. Between point B and point C (11.33 s), the signal is still strong enough to allow for some transmissions, resulting, however, in a rapid increase of the delay of the received packets due to buffering and retransmissions. When RX_ref gets closer to TX_interf (i.e., around 17 s), as already observed (see the discussion about Fig. 4), the misalignment of the interfering signal results in a slight increase of the SINR, which allows the enqueued packets to be transmitted, producing some non-zero throughput and a reduction of the delay (even though the transmission conditions remain quite severe and the performance is rather poor). We also observe that this effect is visible only for R ≥ 3, since more simplified models have too few reflections, thus failing to reach the final part of the corridor.

Fig. 8a shows the corresponding simulations using a full-buffer TCP traffic stream, which reaches a peak rate of 536 Mbps. As expected, sudden jumps in the channel quality lead to sudden performance drops for TCP. This is the case in point A at 3.415 s (see Fig. 3), where the strong first-order rays of TX_interf are received by RX_ref; at the beginning of the interfering regime at about 7 s; when the direct ray is lost in point B at 9.65 s; and, finally, when the strong first-order reflections are lost in point C at 11.33 s.

2) Throughput and Delay Analysis With TCP Traffic:
For these reasons, Fig. 8b highlights a slightly positive correlation between the average throughput and the maximum reflection order R, although only a 9% increase is observed with γ_th = −∞, going from 246 Mbps for R = 1 to 268 Mbps for R = 4. In general, instead, the delay statistics do not show a clear trend with respect to the reflection order, nor to the relative threshold, probably due to the extra complexity created by retransmissions and queues. An example is shown in Fig. 8c, where most statistics follow a very similar trend, separating only towards extreme values of the delay, corresponding to the portion of the scenario after point B, i.e., when the direct ray is lost. Notice that the CDFs of the delay do not reach 1, since windows where no packets were received were considered to have infinite average delay.
Unlike for the UDP case at 800 Mbps, TCP decreases the congestion window when strong interference affects the communication for both the reference and the interfering streams. For this reason, packets sent during the interfering regime do not always collide with each other. The reduced interference greatly increases the perceived SINR, as explained in Sec. V-B for Fig. 4b, thus triggering transmissions even after point B for R ≥ 2. Although not shown here, similar conclusions can be drawn for UDP traffic at 100 Mbps, which is able to transmit after point B as well, sending data at a rate that depends on the small scale fading affecting the communication.

3) Throughput Analysis for Different Scenarios and Antenna
Array Configurations: The average throughput for different configurations is shown in Fig. 9, including the 95% Confidence Interval (CI). The Indoor1 scenario shows virtually no variation across different values of R for all three types of traffic considered. Minor variations can only be observed for the Parking Lot and L-Room scenarios. For the latter, the NLoS regime sets apart simulations with R ≤ 2 from those with R ≥ 3: the former cannot exploit the last part of the path, where the interference is lower, and thus show slightly lower performance.
Similar results are shown in Fig. 10, where two sets of antenna configurations are considered. When smaller antenna arrays, and thus smaller antenna gains, are simulated, the average performance of the system decreases for all scenarios. The largest performance hit is experienced by the Parking Lot scenario, since the stronger path loss caused by the larger propagation distances of the outdoor scenario can no longer be mitigated by the antenna gain. The performance drop observed in Fig. 10 can also be due to stronger interference. Smaller arrays, in fact, are not able to create narrow beams, making TX_interf interfere more strongly with RX_ref.
Moreover, we see that, even though a wider beam captures more rays than a narrower one (which typically collects only the strongest rays), both small and large antenna arrays show a minor impact of the reflection order R on the system performance: e.g., for the L-Room case, the throughput changes by as little as 4% when R is reduced from 4 (our baseline) to 1. In fact, the reflected rays do not contribute much to the overall received power budget, and can be pruned with minimal performance degradation. Small throughput variations are also experienced when increasing γ_th from −∞ (our baseline) to −15 dB, except when R ≥ 2 in the L-Room and Parking Lot scenarios, i.e., when reflected rays may still carry significant power.

4) Throughput Analysis With and Without QD Model:
Finally, a comparison between a purely-deterministic channel and a quasi-deterministic channel with the QD model described in Sec. III-B is shown in Fig. 11. Results are given for R = 4 (to consider as many detectable reflections of the MPCs as possible) and γ_th = −∞ (thus implying that no MPCs are discarded, regardless of their power level at the detector), i.e., the most complete and accurate baseline possible, where only very conservative simplifications are introduced in the ray-tracing computation. In general, from Fig. 11a it is possible to notice that the added random rays from the QD model tend to (i) increase the average received power and (ii) increase the frequency and amplitude of the power fluctuations due to small-scale fading, which is considered independent across subsequent time steps of 5 ms at a speed of 1.2 m/s. These fluctuations can also affect the end-to-end performance, making it significantly less stable.

Fig. 11. Throughput performance for the L-Room scenario with a purely-deterministic and a quasi-deterministic channel model, with a UDP CBR traffic of 800 Mbps.

Fig. 12. Simulation runtime vs. R and γ_th for the L-Room scenario. The total simulation campaign runtime (Fig. 12c) accounts for an RT simulation (Fig. 12a) and 100 sequential ns-3 simulator runs (Fig. 12b). A purely-deterministic channel and a quasi-deterministic channel are considered.
To further study these random fluctuations, the CDF of the standard deviation of the throughput over 100 ms windows has been computed. To do so, we first computed the average throughput over 5 ms sub-windows, i.e., the sampling period chosen for the ray-traced channel, and subsequently the standard deviation over 20 consecutive sub-windows. This approach captures the deviation of the throughput over short time intervals, where it can be considered roughly constant. Computing the standard deviation over the whole simulation, in fact, would yield a misleading metric, given the extreme differences over the almost 20 s long scenario. Fig. 11b shows how an increasing number of rays tends to increase the standard deviation of the throughput due to an increased small scale fading, especially when a QD model with random diffuse components is considered: the gap is as large as 10% when transitioning from R = 1 to R = 4. This effect should be taken into account when evaluating the performance of protocols for mmWave communications which adapt to the channel conditions, e.g., TCP [7].
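The windowing procedure can be sketched as follows, on synthetic throughput samples rather than actual simulation output.

```python
import numpy as np

# Sketch of the windowed standard-deviation metric described above: average
# throughput over 5 ms sub-windows, then the standard deviation over 20
# consecutive sub-windows (i.e., one value per 100 ms window). Data synthetic.

def windowed_std(tput_5ms, subwin_per_win=20):
    """tput_5ms: per-5-ms throughput samples; returns one std per 100 ms window."""
    n_win = len(tput_5ms) // subwin_per_win
    x = np.asarray(tput_5ms[:n_win * subwin_per_win]).reshape(n_win, subwin_per_win)
    return x.std(axis=1)

rng = np.random.default_rng(1)
tput = 500 + 50 * rng.standard_normal(4000)   # ~20 s of 5 ms samples, Mbps
stds = windowed_std(tput)
print(stds.shape)  # one value per 100 ms window: (200,)
```

The CDF of these per-window standard deviations is then what Fig. 11b reports.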

D. Computational Performance
The simplification techniques proposed in Sec. IV offer a trade-off between the simulation speedup given by the lower complexity and a corresponding loss of accuracy. Secs. V-B and V-C analyzed in depth the impact of the simplifications on the network metrics at two distinct levels. Here, we compare the proposed simplifications from a computational-complexity point of view, and then draw guidelines on the combinations of parameters that offer the best trade-off between accuracy and simulation runtime.

1) Simulation Time:
For completeness, we need to distinguish the different contributions to the total runtime T_camp required by a campaign of network simulations. The first is the RT runtime T_RT, required by the RT to generate the MPCs for the channel matrix (Fig. 12a). The second contribution, T_ns, is due to the network simulator (either MATLAB or ns-3 in this work), and includes the computation of the channel matrix from the RT data as well as the simulation overhead (Fig. 12b). Usually, a simulation campaign aiming at proving a new result requires running simulations multiple times with different random seeds (i.e., a Monte Carlo analysis), for a total number of simulations in the order of thousands. Running such simulations over tens of parallel processes is equivalent to running only hundreds of simulations sequentially, thus requiring a total of about T_camp = T_RT + 100 T_ns to first obtain the RT traces, and then re-use them for the whole simulation campaign (Fig. 12c). Fig. 12 shows the RT, the ns-3, and the corresponding total simulation campaign runtimes for the L-Room scenario. The figure compares the impact on the computational complexity of the reduction of the maximum reflection order R. We consider the two extreme values of γ_th, i.e., −15 dB and −∞ dB, and compare a purely-deterministic channel with a quasi-deterministic channel that includes the random diffuse components introduced by the QD model. First, it can be observed that R has the greatest impact on the runtime: the RT runtime T_RT increases by more than 2 orders of magnitude when increasing R, and the ns-3 runtime T_ns experiences a similar effect. The impact of the QD model is clearly visible in Fig. 12b and Fig. 12c.

Fig. 14. Trade-off between the throughput performance and the speedup obtained with the different simplification parameters for the three scenarios. As in Sec. V-C, the throughput has been obtained with ns-3.
Nevertheless, in this case, increasing γ_th to −15 dB effectively reduces the gap between the runtime with and without the QD model, for every reflection order. However, whether and how this combination of parameters, while speeding up the ray tracer, affects the accuracy of the simulations is discussed in the following paragraphs.
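As a back-of-the-envelope illustration of the campaign runtime model T_camp = T_RT + 100 T_ns, the following sketch uses made-up placeholder runtimes, not the measured values of Fig. 12.

```python
# Back-of-the-envelope for the campaign runtime model T_camp = T_RT + 100 * T_ns
# (thousands of seeded runs over tens of parallel workers ~ 100 sequential runs).
# The runtimes below are made-up placeholders, not measured values.

def campaign_runtime(t_rt, t_ns, n_runs=1000, n_workers=10):
    """RT traces are computed once and re-used by every network-simulator run."""
    sequential_runs = n_runs // n_workers
    return t_rt + sequential_runs * t_ns

# Example: shaving the per-run T_ns can matter as much as reducing T_RT itself.
print(campaign_runtime(t_rt=3600, t_ns=60))   # 3600 + 100*60 = 9600 s
print(campaign_runtime(t_rt=3600, t_ns=30))   # 3600 + 100*30 = 6600 s
```

This is why both T_RT and T_ns reductions (Figs. 12a and 12b) contribute to the total campaign speedup of Fig. 12c.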
2) Accuracy Metric: To summarize the conclusions from Secs. V-B and V-C quantitatively, we compare the network performance of the simplified models and the baseline considering the Normalized Root Mean Square Error (NRMSE), computed as [1]

NRMSE = sqrt( (1/N) Σ_{n=1}^{N} (x[n] − x̂[n])² ) / σ_x̂,

where x is the metric with the configuration of interest, x̂ is the baseline metric, and σ_x̂ is the standard deviation of the baseline metric. This metric evaluates the distance between each baseline-simplified pair of a given simulated metric in the time domain. As in [1], we compare it with a speedup metric, defined as the factor by which the overall simulation campaign runtime T_camp is reduced compared to the baseline. For the remainder of this section, we will consider 0.05 as the maximum acceptable value for the NRMSE, meaning that we deem acceptable an RMSE equal to 5% of the standard deviation of the considered metric.
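A direct implementation of this metric, with a synthetic baseline trace for illustration:

```python
import numpy as np

# Sketch of the accuracy metric above: RMSE between the simplified and baseline
# traces of a metric, normalized by the standard deviation of the baseline.

def nrmse(x, x_baseline):
    """NRMSE = sqrt(mean((x - x_baseline)^2)) / std(x_baseline)."""
    x, x_baseline = np.asarray(x, float), np.asarray(x_baseline, float)
    return np.sqrt(np.mean((x - x_baseline) ** 2)) / np.std(x_baseline)

baseline = np.sin(np.linspace(0, 10, 1000))        # synthetic baseline metric
simplified = baseline + 0.02                       # small constant bias
print(nrmse(simplified, baseline) < 0.05)          # within the 5% acceptance rule
```

An NRMSE of 0.05 thus means the time-domain error is one twentieth of the baseline's own variability.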

3) Link-Level Performance:
The variations of the SINR due to the simplifications are shown in Fig. 13. In general, we notice that, for markers with the same shape (i.e., the same R), the NRMSE grows with increasing steepness as the relative threshold increases, thus showing diminishing returns for the largest values of γ_th.
For the Indoor1 scenario (Fig. 13a), the largest deviations from the baseline occur with R = 1 and with γ_th = −15 dB. Nevertheless, in all the considered cases the NRMSE is smaller than 0.07, confirming what was anticipated in Sec. V-B, i.e., that only minor changes take place even with the most aggressive simplifications. Within the maximum accepted NRMSE, it is possible to accelerate the simulator by a factor of almost 6 with an NRMSE equal to 0.048, choosing R = 1 and γ_th = −25 dB.
Good performance is also obtained in the L-Room scenario (Fig. 13b). In this case, choosing γ_th < −15 dB and R > 1 makes it possible for the NRMSE to remain below 0.05, and an overall speedup by a factor of 9.2 is obtained with R = 2 and γ_th = −40 dB. Here, using R = 1 even with γ_th = −25 dB might still be acceptable, with an NRMSE of only 0.057 but a speedup factor of 14.3. Finally, good performance is also obtained in the Parking Lot scenario (Fig. 13c), with a behavior extremely similar to the L-Room scenario but with significantly higher speedup factors, due to the higher complexity of this scenario. In fact, for γ_th < −15 dB the NRMSE stays below the 0.05 threshold while the campaign runs orders of magnitude faster. Specifically, choosing R = 1 with γ_th = −25 dB yields an NRMSE of 0.017 with a speedup factor of more than 2000.

4) End-to-End Performance: Fig. 14 reports the NRMSE of the throughput vs. the speedup for end-to-end simulations. As for the link-level performance, γ_th shows diminishing returns, especially for the L-Room and Parking Lot scenarios. Fig. 14a shows that virtually no variation occurs when introducing simplifications in the Indoor1 scenario. This suggests that setting R = 1 and γ_th = −15 dB can speed up the simulation by a factor of almost 4 with negligible accuracy loss with respect to the baseline configuration.
Good results are also obtained for the L-Room scenario in Fig. 14b: with R = 2 and γ_th = −25 dB it is possible to achieve a speedup factor of 3.9 with an NRMSE of 0.033, or even a speedup of 5.1 if an NRMSE of 0.054 is still considered acceptable, using R = 1.
Conversely, the end-to-end simulations for the Parking Lot scenario are much more severely affected by the simplifications introduced in this paper. In fact, Fig. 14c shows that the proposed simplifications are not able to achieve a significant speed-up without sacrificing the fidelity of the results, unlike in the previous scenarios.

E. Design Guidelines
Without focusing on the exact numbers, which are specific to the scenarios and the simulation tools employed in this work, it is still possible to draw some general guidelines for an efficient use of RT simulations. In particular, we do not only focus on link-level metrics, as generally considered in the literature, but also identify which combinations of simplifications affect the system-level performance (which typically drives network design choices).
1) Scenario: The simulation scenario plays a key role in the simplification choice. Specifically, when considering indoor LoS simulations, the secondary rays can be neglected with good approximation and very significant time savings. When considering also NLoS conditions in indoor scenarios, instead, less flexibility should be expected, although it is still possible to considerably reduce the runtime with a minor accuracy loss. Finally, outdoor scenarios should be treated carefully, as aggressive simplifications may have detrimental effects on the fidelity of the simulations. Nevertheless, a working point can usually be identified which offers a significant speedup.
2) Simplification Strategy: Although their effect can vary substantially depending on the considered scenario, some general considerations can be drawn regarding the simplification techniques. The link-level metrics, such as the SINR, benefit, in terms of runtime, more from a reduction of the maximum reflection order than from an increase of the MPC threshold. Conversely, full-stack metrics, such as end-to-end throughput and delay, are largely insensitive to the ray tracer's simplifications. In any case, an overly aggressive thresholding policy leads to a performance degradation that is not justified by the corresponding speedup improvement. Finally, end-to-end results require a balanced use of both simplification techniques to achieve an optimal working point. In particular, for the L-Room scenario we showed that considering more than two reflections, i.e., R = 3 or R = 4, would not decrease the NRMSE significantly, while reducing the speedup gain by a factor of 2. At the same time, for the Parking Lot scenario, it may be fundamental to decrease γ_th below −25 dB to maintain a good trade-off between accuracy and speedup, even though an NRMSE below the 0.05 threshold could be guaranteed only by setting γ_th = −∞. Conversely, for the Indoor1 scenario even the most aggressive simplifications would have virtually no effect on the RT accuracy. In light of this, setting R = 2 and a relative threshold γ_th ∈ [−40, −25] dB consistently yields a good balance of accuracy and speedup in almost all cases, and for this reason it is the suggested configuration. Moreover, end-to-end evaluations of protocols that adapt to the channel behavior should consider combining the ray-tracing process with a QD model.

VI. CONCLUSION AND FUTURE WORK
Accurate and scalable simulations are the basis for the design and evaluation of future 5G networks operating at mmWave frequencies. The usage of large antenna arrays, however, requires an accurate representation of the spatial dimension of the channel, for example through ray tracing, which can model the propagation of the different multipath components of a mmWave signal based on the geometry of the scenario. In this paper, we discussed two possible simplification techniques to decrease the computational complexity of RT-based network simulations and improve their scalability. The impact of those simplifications on the physical-layer and end-to-end network performance has been numerically evaluated for different scenarios, applications, and antenna array configurations. Notably, we showed that the optimal trade-off is achieved when neglecting third-order (and beyond) reflections and multipath components with a path gain lower than −40 dB compared to the strongest ray. We believe that the insights resulting from this extensive profiling and performance evaluation can guide researchers in designing accurate, yet scalable, simulations of mmWave networks.
As future work, we plan to investigate optimization techniques that can automatically tune the simplification parameters to obtain the maximum speedup while minimizing the accuracy degradation. Additionally, we will extend the profiling and comparison by considering end-to-end measurements in actual mmWave deployments, reproducing the scenario in the RT. Finally, we also plan to study the impact of the simplifications on different beamforming architectures (e.g., hybrid beamforming).