Real-Time Rendering of Point Clouds with Photorealistic Effects: A Survey

Readily available RGB-D cameras in smartphones and improving 3D scanning technologies have made it possible to produce detailed point cloud and point-based models of real-world objects, even in real time. Rendering such models in high quality and at satisfactory frame rates is needed for realistic extended reality (XR) applications. This publication reviews real-time photorealistic point cloud rendering methods which directly ray trace or rasterize point cloud models, with an emphasis on ray tracing and real-time performance. We found that real-time direct point cloud ray tracing research has focused on static, non-animated content. Thus, open research possibilities include adapting modern dedicated ray tracing hardware to increase performance for animated and live-captured scenes, and adding path tracing techniques to increase the photorealistic effects in the rendering result. A categorization and discussion of the capabilities of state-of-the-art photorealistic point cloud rendering methods is presented by surveying both real-time and offline methods, the latter of which are assumed to become real-time capable with advances in near-future hardware. Challenges and future trends are derived by comparing different rasterization and ray tracing methods, as well as acceleration structures for point clouds, in terms of the rendering effects they produce and their speed.


I. INTRODUCTION
Point cloud and point-based visualization have been studied for more than three decades. Levoy and Whitted suggested using points as a geometric primitive for rendering instead of polygonal meshes or parametric surfaces in 1985 [1]. Their argument was that the number of geometric primitives in rendered scenes would keep increasing until their projected sizes shrank to sub-pixel areas in screen space, which would justify using point primitives instead. Since then, research on the visualization of point cloud and point-based models has produced techniques such as splatting [2], [3] and direct ray tracing of point cloud and point-based models [4], [5].
The central challenge in point cloud visualization has been the lack of a continuous surface representation, which has both benefits and drawbacks. On the one hand, point clouds are more flexible than connected meshes and can be used to store and visualize huge data sets with up to 100 million points in real time [6]. On the other hand, generating correct surface attributes between point cloud points in a rasterization pipeline needs special handling compared to interpolating values between the vertices of a triangle or another polygon-based model. The same holds when finding an intersection point between a viewing ray and the implied surface of a point cloud.
Direct point cloud rendering challenges can be circumvented by reconstructing the whole point cloud into a renderable surface representation such as a triangulated mesh. The capturing and reconstruction of real-world scenes, especially with RGB-depth (RGB-D) cameras (reviewed in various publications [7]-[9]), has seen a lot of research since the seminal publication introducing KinectFusion [10]. Many fusion-based systems reconstruct scenes at capture time directly into mesh representations via an intermediate scene description based on signed distance fields (SDF) [10]-[13]. However, full global reconstruction is computationally expensive, especially for large models, and it can be wasteful if only parts of the scene are viewed or if models become obsolete, e.g., in animated or streamed point cloud scenarios. Furthermore, recent advances in point cloud compression by the MPEG standardization group [14] have made it possible to transfer raw point clouds efficiently, e.g., from point cloud capture sites to end users. This opens the possibility of performing the direct rendering at the user's end. Thus, the motivation behind this survey is to assess the availability of methods for direct real-time photorealistic rendering of surface point clouds.
FIGURE 1. An overview schema of the different pipeline steps from point cloud capture to rendering. Within solid lines are the subjects covered and in dashed lines are the subjects omitted in this survey. The arrows represent the point cloud data flow. Generally, acceleration can be considered either preprocessing or rendering, but in our case, we only consider methods performant enough to reconstruct acceleration structures at rendering time.
A key aspect of photorealism is the detail level of the model which, in the case of point clouds, means the number of points used to represent an object of a given real-world size. Models available in repositories such as The Stanford 3D Scanning Repository [15] typically have point counts on the order of 10⁶, and single detailed human-like models, such as Lucy, have up to 10⁷ points. The previously mentioned KinectFusion and its successors work with grid structures of up to 512³ cells, corresponding to almost 135 million potential data points, but most of the entries are empty space, which makes their effective point resolution hard to state. Level of detail (LOD) systems and culling techniques can further reduce the number of points considered at rendering time, easing computationally expensive rendering methods like ray tracing. However, lowering detail and omitting points outside the view frustum must be done carefully in conjunction with ray tracing, because global illumination (GI) effects might be negatively impacted. Reviewing LOD and culling techniques is out of scope in this survey.
For applications such as holoportation [16] and other telepresence systems [17], [18], the end-to-end latency from scene capture to the rendered image frame at the client is a motivating factor. Multiple human models may be transferred, leading to even more complex mesh or point cloud models with many data points. These systems have shown real-time capabilities on large device clusters of up to 8 high-end PCs with multiple simultaneous capturing devices. Furthermore, these applications have focused on delivering a coherent scene model from the capturing site to the client rendering site. However, producing photorealistic lighting effects on the transferred models has not been considered. High-quality mesh and other surface generation methods for point clouds have been thoroughly surveyed [19]-[27]. Thus, the subject of surface reconstruction for point clouds is out of scope.

A. DEFINING THE RESEARCH PROBLEM
In the context of this survey, live-captured point clouds refer to streams of point clouds that may not have any coherence between consecutive frames. Contrary to this, synthetic skeletal animated models can reuse lower-level parts of an acceleration structure by applying the respective affine transforms directly to subsets of the data structure. However, such assumptions cannot be made for an unpredictable stream of points in a live scenario, which leads to the reconstruction of the acceleration structure for each frame.
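One common way to realize this reuse for rigidly or skeletally animated parts is to leave a subtree's contents untouched and only recompute its bounding volume under the part's affine transform. The sketch below is an illustration under that assumption, using axis-aligned bounding boxes and the classic Graphics Gems box-transform technique by Arvo; the function name `transform_aabb` is hypothetical and not taken from any surveyed method. A live-captured stream provides no such transform, which is why the hierarchy has to be rebuilt there instead.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
AABB = Tuple[Vec3, Vec3]          # (min corner, max corner)
Mat3 = List[List[float]]          # row-major 3x3 linear part


def transform_aabb(box: AABB, m: Mat3, t: Vec3) -> AABB:
    """Return the axis-aligned box enclosing `box` after x -> m @ x + t.

    Each output coordinate is a sum of terms m[i][j] * x[j]; its minimum
    (maximum) is reached by picking, per term, the smaller (larger) of the
    products with the input box's min and max along axis j.
    """
    lo, hi = box
    new_lo = [float(v) for v in t]
    new_hi = [float(v) for v in t]
    for i in range(3):            # output axis
        for j in range(3):        # input axis
            a = m[i][j] * lo[j]
            b = m[i][j] * hi[j]
            new_lo[i] += min(a, b)
            new_hi[i] += max(a, b)
    return tuple(new_lo), tuple(new_hi)


# Example: rotate a subtree's box by 90 degrees about z and move it up by 1.
rot_z = [[0.0, -1.0, 0.0],
         [1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0]]
print(transform_aabb(((0.0, 0.0, 0.0), (1.0, 2.0, 3.0)), rot_z, (0.0, 0.0, 1.0)))
# → ((-2.0, 0.0, 1.0), (0.0, 1.0, 4.0))
```

The resulting box is conservative but tight for the transformed box, so only the bounds of the upper levels above an animated part need refitting.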
A frame rate of over 75 FPS and an HD resolution of 1080p or more are needed for interactive XR content to provide a comfortable viewing experience on head-mounted displays [28]. Modern virtual reality headsets support these requirements with refresh rates of 80 to 144 Hz and up to 2K resolution per eye [29], [30]. Furthermore, as soft shadows and reflections tend to be ubiquitous in rendering applications, we additionally demand photorealistic rendering. All of this should preferably be done in an end-to-end fashion, meaning that the rendering pipeline starts after the point cloud is captured and ends with the photorealistically rendered frame. Thus, we derive the following research question for our survey: What is the state-of-the-art technique for photorealistic end-to-end direct point cloud rendering of a high-quality human-sized model (10⁷ points) at 75 FPS and a resolution of 1080p on consumer hardware?
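The targets in the research question can be turned into a rough per-frame budget. The sketch below assumes one primary sample per pixel and a single 1920 × 1080 view; the helper name `frame_budget` is hypothetical, and the numbers ignore secondary rays, which multiply the sample count further.

```python
def frame_budget(fps: float, width: int, height: int, spp: int = 1):
    """Return (ms per frame, pixels per frame, primary samples per second, ns per sample)."""
    frame_ms = 1000.0 / fps
    pixels = width * height
    samples_per_second = pixels * spp * fps
    ns_per_sample = 1e9 / samples_per_second
    return frame_ms, pixels, samples_per_second, ns_per_sample


# Research-question target: 1080p at 75 FPS, one primary sample per pixel.
frame_ms, pixels, sps, ns_per_sample = frame_budget(75, 1920, 1080)
print(f"{frame_ms:.2f} ms/frame, {pixels:,} px, {sps:,} samples/s, {ns_per_sample:.2f} ns/sample")
# → 13.33 ms/frame, 2,073,600 px, 155,520,000 samples/s, 6.43 ns/sample
```

Stereo rendering for a head-mounted display would double these sample counts, and for live-captured point clouds the same 13.33 ms must also cover the per-frame acceleration structure rebuild.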

B. INCLUSION CRITERIA
We found no objective method in the literature for classifying applications as real-time or interactive. For this survey, we therefore defined the lower bound of a real-time frame rate as 10 FPS and that of an interactive frame rate as 1 FPS; all slower methods are considered offline. These definitions ensure that methods measured on older hardware are included in the survey.
Point cloud ray tracing methods, including offline methods, were exhaustively surveyed, and they were deemed real-time or interactive if they achieved these frame rates with at least a medium resolution of 512 × 512 and a small point cloud on the order of 10³ points. Computationally demanding effects, such as caustics, were not required of the methods because they require precalculation even in mesh-based ray tracing. Similar requirements were applied to rasterization papers, except that the point cloud size had to be on the order of 10⁴ points (to accommodate the resources needed for photorealistic effects implemented with rasterization) and at least some photorealistic effects, such as reflections or refractions, had to be supported. We estimated that these requirements, although considerably smaller than those of the research question, correspond to the performance gain obtained when extrapolating from older to modern hardware.
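The inclusion rules above can be read as a small decision procedure. The sketch below is one interpretation, not a procedure stated by the survey: "on the order of" is treated as a minimum, the resolution bound is compared as a pixel count, a frame rate measured below the minimum workload is treated as not qualifying (labeled "offline" here), and the rasterization requirement of at least one photorealistic effect is not modeled. The function `categorize` and its constants are hypothetical.

```python
MIN_PIXELS = 512 * 512
# Minimum point counts, reading "on the order of" as a lower bound (assumption).
MIN_POINTS = {"ray_tracing": 10**3, "rasterization": 10**4}


def categorize(fps: float, width: int, height: int, num_points: int, technique: str) -> str:
    """Label a reported measurement as 'real-time', 'interactive', or 'offline'."""
    # A frame rate only counts if it was measured under the minimum workload.
    if width * height < MIN_PIXELS or num_points < MIN_POINTS[technique]:
        return "offline"
    if fps >= 10.0:
        return "real-time"
    if fps >= 1.0:
        return "interactive"
    return "offline"


print(categorize(55, 512, 512, 10**6, "ray_tracing"))  # → real-time
```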

C. CONTRIBUTIONS
This survey provides the following novel contributions to the existing body of point cloud rendering literature:
• An exhaustive survey on ray and path tracing methods for surface point cloud rendering (Section III).
• A review of real-time and interactive rasterization, hybrid (combining rasterization and ray tracing), and point-based neural rendering methods that exhibit photorealistic rendering effects (Sections IV, V, and VI).
Additionally, we provide an overview of acceleration methods designed and applicable for real-time and interactive photorealistic point cloud rendering that could be utilized in an animated or live-captured point cloud scene (Appendix A). Furthermore, the computational capabilities of older surveyed methods are analyzed on and extrapolated to modern hardware in Appendix B.

D. STRUCTURE OF THE SURVEY
The rest of the survey is structured in the following way. Section II covers surveys and reviews related to real-time photorealistic point cloud rendering with justifications on what this survey provides compared to previous work on the subject. The main contribution of this survey, namely, reviewing real-time, interactive, and near interactive methods for photorealistic point cloud rendering is presented in Sections III, IV, V, and VI. A discussion on current capabilities and trends in real-time photorealistic point cloud rendering as well as on future research possibilities is given in Section VII together with answers to the research question. We conclude the publication in Section VIII.
Additionally, Appendix A surveys acceleration methods designed or easily applicable for point cloud rendering, and Appendix B extrapolates the computational performance of older surveyed methods to modern hardware.

II. RELATED SURVEYS AND REVIEWS
In this section, surveys and reviews that relate to point cloud rendering and acceleration structures for point clouds are covered. The goal is to briefly summarize what the previous survey and review publications have covered and what this survey contributes to the existing survey literature. Furthermore, parallel surveys related to the point cloud processing pipeline from capture to pre-processing (depicted inside dashed lines in Figure 1) are presented with justifications why the subject matter in those surveys is not covered in this survey. Finally, the contributions of this survey compared to the existing survey literature are reiterated.
Related surveys and reviews include general point cloud visualization, mostly focused on rasterization-based methods and techniques [31]. Particularly close to this survey is the publication comparing different rasterization-based methods to meshed model rendering for real-time point cloud visualization [32]. Compared to [32], which is from 2004, our contribution is the review of photorealistic point cloud rendering methods and of modern methods. Additionally, a book on point-based graphics is available [33]. Participating media use an underlying particle- or point-based schema, but they generally approximate the effect of particles in the air as a density texture instead of a point cloud (survey available in [34]). RGB-D image registration techniques are surveyed in [7] and simultaneous localization and mapping (SLAM) methods are reviewed in [35], both of which are out of scope in this survey. The subject of capturing and scanning real-world objects has been extensively surveyed with a focus on RGB-D images in [8] and VR-centered capturing in [9]. The capturing and scanning of point clouds is also out of scope in this survey. Methods concerning smoothed-particle hydrodynamics (SPH) fluid simulation and rendering have been reviewed in [36]. SPH fluid rendering is closely related to surface point cloud rendering, as SPH methods consider the fluid as individual particles inside a fluid volume interacting with each other. Compared to surface point clouds, SPH fluid particles have attributes from the simulation, such as mass and velocity, that are rarely present in generic point clouds. Thus, by-product information like particle radii can be used in the rendering of particles. Although this survey considers specific SPH simulation techniques out of scope, the rendering of SPH is reviewed.
In [36], an overview of the SPH rendering methods is given, whereas our survey concentrates on methods with applicability to point cloud rendering and provides a more in-depth analysis of the rendering features and computational performance of these methods.
Augmented (AR) and mixed reality (MR) rendering systems strive to visualize real and virtual elements seamlessly together. Immersion is enhanced by producing realistic interactions between real and virtual objects, e.g., correct occlusions and lighting effects between objects. Generally, point cloud rendering also tries to visualize objects in locations and lighting different from their original environment. However, the virtual objects placed inside AR and MR environments are almost exclusively in a mesh representation and, as such, are rendered by mesh-based methods. Thus, AR and MR methods in general have been excluded from this survey, and we refer the reader to two state-of-the-art surveys on the subject [37], [38].
VOLUME X, 2021

III. RAY TRACING POINT CLOUDS
In the literature, three main methods for direct point cloud ray tracing have been established. These methods are illustrated in Figure 2. Cone and cylinder/beam tracing circumvent the problem of individual points having no explicit surface to intersect by expanding rays into volumes that capture the points inside segments of the ray. Implicit and isosurface approaches use a metric, such as Euclidean distance, to evaluate the proximity of a surface implied by the points near an evaluation point along a ray. Finally, the moving least squares (MLS) based method fits a local polynomial, such as a second-order curved surface, to the points near an evaluation point. This surface can be constructed iteratively or non-iteratively and intersected by a viewing ray. The main difference between general implicit surfaces and MLS-based surfaces is that implicit surfaces are locally linear, whereas MLS surfaces typically use a higher-order local approximation.
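To make the implicit approach concrete, the sketch below uses one simple locally linear choice, which is an assumption and not the definition used by any specific surveyed paper: f(x) = n(x) · (x − a(x)), where a(x) is a Gaussian-weighted centroid of the points, n(x) is the normalized Gaussian-weighted average of their normals, and h is a hypothetical kernel width. A ray is marched at fixed steps, and the first sign change of f is refined by bisection, similar in spirit to the interval subdivision used in [5]. All names are hypothetical, and the brute-force loop over all points stands in for an acceleration structure.

```python
import math
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Sequence[float], b: Sequence[float]) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _along(o: Vec3, d: Vec3, t: float) -> Vec3:
    return (o[0] + d[0] * t, o[1] + d[1] * t, o[2] + d[2] * t)


def implicit_value(x: Vec3, points: Sequence[Vec3], normals: Sequence[Vec3],
                   h: float) -> Optional[float]:
    """Signed distance from x to a Gaussian-weighted local tangent plane."""
    wsum = 0.0
    centroid = [0.0, 0.0, 0.0]
    normal = [0.0, 0.0, 0.0]
    for p, n in zip(points, normals):
        d = _sub(x, p)
        w = math.exp(-_dot(d, d) / (h * h))
        wsum += w
        for k in range(3):
            centroid[k] += w * p[k]
            normal[k] += w * n[k]
    length = math.sqrt(_dot(normal, normal))
    if wsum == 0.0 or length == 0.0:
        return None  # no support: x is too far from every point
    c = [v / wsum for v in centroid]
    nn = [v / length for v in normal]
    return _dot(nn, _sub(x, c))


def intersect_ray(origin: Vec3, direction: Vec3, points, normals, h: float,
                  t_max: float, step: float, tol: float = 1e-6) -> Optional[float]:
    """March along the ray; refine the first sign change of f by bisection."""
    t_prev, f_prev = 0.0, implicit_value(origin, points, normals, h)
    t = step
    while t <= t_max:
        f = implicit_value(_along(origin, direction, t), points, normals, h)
        if f is not None and f_prev is not None and f_prev * f <= 0.0:
            lo, hi, f_lo = t_prev, t, f_prev
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                f_mid = implicit_value(_along(origin, direction, mid), points, normals, h)
                if f_mid is None:
                    return None
                if f_lo * f_mid <= 0.0:
                    hi = mid
                else:
                    lo, f_lo = mid, f_mid
            return 0.5 * (lo + hi)
        t_prev, f_prev = t, f
        t += step
    return None
```

For example, with points sampled on the plane z = 0 with upward normals, a ray from (0.1, 0.2, 5) pointing along −z returns t ≈ 5. The fixed step size trades speed against the risk of stepping over thin features.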
The earlier book on point-based graphics from 2007 [33] already covers many of the earlier ray tracing methods, which we briefly summarize first. Beam tracing of point clouds was one of the first methods to produce Whitted-style reflective and refractive renderings of point clouds [4]. A predefined beam radius based on point density was used; the points were stored in an octree, and those falling inside a beam section within a leaf node were weighted by their orthogonal distance to the beam center. Gaussian weighting ensured smoothly blended attribute values. In [39], the idea was expanded to cone tracing in order to support advanced effects such as soft shadows. The authors used a dual-resolution model of the scene with high-resolution triangles and lower-resolution point primitives in an octree data structure. A more reconstruction-oriented approach intersected the point cloud, represented as locally reconstructed MLS surfaces, with actual rays instead of beams or cones [40]. Points were stored in a bounding sphere hierarchy (BSH), and the MLS surfaces were reconstructed using the centers of the leaf node spheres as reference points, providing a cached reconstruction for multiple intersection tests. These methods supported only static point clouds, a limitation addressed in [41] for animated and deforming scenes. Surfels (points with radius and surface normal attributes) were lazily updated based on the underlying simulation nodes, i.e., only surfels visible to primary and secondary rays were updated at rendering time. Finally, the first interactive ray tracing method for static point clouds represented as surfels was introduced in [5]. A k-d tree was first traversed to a leaf node, where implicit surface values were evaluated at constant intervals. The interval in which a sign change occurred was subdivided further until a threshold was reached, and an intersection was registered at the interpolated position.
A commonality between these methods was a CPU implementation without utilization of the GPU.
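The Gaussian-weighted beam lookup described for [4] can be sketched as follows. This is a simplified illustration, not the original implementation: all points are tested by brute force instead of within octree leaf nodes, a single color attribute is blended, the Gaussian width σ = radius / 2 is an assumption, and the beam section length is a hypothetical parameter that keeps only points near the first occupied part of the beam.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def beam_sample(origin: Vec3, direction: Vec3, radius: float, section_length: float,
                points: Sequence[Vec3], colors: Sequence[Vec3]) -> Optional[Vec3]:
    """Blend the colors of points in the first beam section they occupy.

    `direction` must be unit length. Points inside the cylinder of `radius`
    around the beam axis are candidates; only those within `section_length`
    of the nearest candidate (along the axis) are blended, each weighted by a
    Gaussian of its orthogonal distance to the axis.
    """
    candidates: List[Tuple[float, float, Vec3]] = []
    for p, c in zip(points, colors):
        v = (p[0] - origin[0], p[1] - origin[1], p[2] - origin[2])
        t = v[0] * direction[0] + v[1] * direction[1] + v[2] * direction[2]
        if t <= 0.0:
            continue  # behind the beam origin
        d2 = max(0.0, v[0] ** 2 + v[1] ** 2 + v[2] ** 2 - t * t)  # squared distance to axis
        if d2 <= radius * radius:
            candidates.append((t, d2, c))
    if not candidates:
        return None
    t_near = min(t for t, _, _ in candidates)
    sigma = 0.5 * radius  # assumption: Gaussian width tied to the beam radius
    acc = [0.0, 0.0, 0.0]
    wsum = 0.0
    for t, d2, c in candidates:
        if t <= t_near + section_length:
            w = math.exp(-d2 / (2.0 * sigma * sigma))
            wsum += w
            for k in range(3):
                acc[k] += w * c[k]
    return (acc[0] / wsum, acc[1] / wsum, acc[2] / wsum)
```

Points farther along the beam than the first section are ignored, which gives a simple front-to-back visibility; the blending weights fall off smoothly toward the beam rim.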
In the rest of this section we review the various ray tracing methods not covered in other literature that are capable of producing photorealism in a scene by rendering point clouds in real time. The focus is on surveying purely ray-tracing-based methods exhibiting one or more photorealistic effects such as reflections, hard or soft shadows, refractions, and GI.

A. OFFLINE METHODS
The methods introduced in this section do not achieve interactive frame rates (at least 1 FPS) with any of their reported point cloud sizes or screen resolutions. These include the methods already covered in the point-based graphics book [33], namely [39]-[41]. However, most of these methods are from the early 2000s and might be implementable in real time on modern hardware. We discuss the possible application of these methods further in Sections III-D and VII. In [42], a splat-based ray tracing scheme was used. A raw point cloud, possibly accompanied by surface normals, was used to generate intersectable surface splats. Splats were created iteratively by expanding a linear plane fit over the point cloud until an error bound was violated. A normal gradient field on each splat was obtained by solving a linear system for the two unknown major gradient directions, given the point normals and locations within the neighborhood of the splat. If the raw point cloud had no normals, a least squares plane was fit to the nearest neighbors of each point as a preprocess, and the plane's normal direction was used as the surface normal of that point. Splat generation took 4 seconds for the Fuel model, whose 35 thousand points were transformed into 28 thousand splats, and 85 seconds for the Buddha model, whose 544 thousand points were transformed into 384 thousand splats. The actual Whitted-style ray tracing with hard shadows, reflections, and refractions, accelerated by a dynamic splat octree, took 84 seconds for 28 thousand splats and 408 seconds for 384 thousand splats. In both cases, a resolution of 1200 × 1200 with 1 spp and two ray bounces was used. A 3.06 GHz Intel Xeon CPU was used for all processing.
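The iterative splat growth described for [42] can be illustrated with a greatly simplified greedy version. Under the assumptions of this sketch, the seed's normal is kept fixed instead of refitting the plane as the splat grows, growth stops at the first neighbor that violates the error bound, and the normal gradient field is omitted. The function `grow_splat` and its parameters are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def grow_splat(seed: int, points: Sequence[Vec3], normal: Vec3,
               max_error: float) -> Tuple[List[int], float]:
    """Greedily grow a circular splat around points[seed].

    Neighbors are visited in order of increasing distance from the seed and
    accepted while their distance to the seed's tangent plane (unit `normal`)
    stays within `max_error`. Returns the member indices and the splat radius.
    """
    s = points[seed]

    def offset(i: int) -> Vec3:
        p = points[i]
        return (p[0] - s[0], p[1] - s[1], p[2] - s[2])

    order = sorted(range(len(points)), key=lambda i: sum(c * c for c in offset(i)))
    members: List[int] = []
    radius = 0.0
    for i in order:
        o = offset(i)
        plane_error = abs(o[0] * normal[0] + o[1] * normal[1] + o[2] * normal[2])
        if plane_error > max_error:
            break  # the planar approximation no longer holds
        members.append(i)
        radius = math.sqrt(o[0] ** 2 + o[1] ** 2 + o[2] ** 2)
    return members, radius


pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (3.0, 0.0, 0.5), (4.0, 0.0, 0.0)]
print(grow_splat(0, pts, (0.0, 0.0, 1.0), max_error=0.1))  # → ([0, 1, 2], 2.0)
```

A looser error bound yields fewer, larger splats, which is the trade-off between splat count and surface accuracy that such a bound controls.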

B. INTERACTIVE METHODS
In [43], a k-d-tree-accelerated LOD system was generated to ray trace splat primitives acquired from a point cloud. The same authors' later compression-based method achieved ray tracing at interactive frame rates of 0.5-2 FPS with up to 28 million points [44]. The compression method generated a dictionary of MLS patches, whose instances, each with an orientation and location, were placed in an instanced k-d tree for accelerated ray-patch intersection. Once a k-d tree leaf node containing a few patches was reached, each patch was intersected by iteratively refining MLS surface intersections until a sufficient error tolerance was achieved, and all patch intersections were interpolated based on the distance to the patch centers. The performance with huge point cloud models was due to the instanced patch-based compression, which fit the compressed point cloud into GPU memory and thereby reduced CPU-GPU memory transfers at rendering time. The number of spp and the supported rendering effects were not reported. However, the rendered images only show correct direct lighting without stochastic effects, which implies 1 spp. Several hours of preprocessing were reported for the compression encoding and the patch-based k-d tree construction.

C. REAL-TIME METHODS
As discussed before, the method in [5] was the first CPU-based method to achieve interactive frame rates on plausibly sized point clouds. However, the first parallel real-time GPU-based point cloud ray tracer was published in [45]. As the authors did not use any acceleration structure for the secondary rays, they had to limit the ray depth of reflection ray traversal to achieve reflections and refractions at interactive frame rates. Contrary to the implicit surface approach, a local MLS surface reconstruction was used to construct an intersectable surface along the primary rays, which is an algorithm better suited to parallel execution. For secondary rays, a coarse intersection point was generated with a single pass of the MLS algorithm. A follow-up publication [46] improved the method by introducing a grid-based acceleration structure and refraction rays.
A more traditional approach, using splats as the intersectable primitive, was taken in [47] with a performant splat octree method executed on the GPU. Up to 4 spp for primary rays and 10 bounces for GI effects were used. However, the octree was built in a preprocessing step whose execution time was not discussed. The authors in [48] used the same method of textured splats stored in an octree, with normal shading for primary lighting and traced rays to generate shadows. The splat octree differs from the implicit and MLS surface methods in that intersections are queried against the splats themselves rather than by ray marching implicit values or reconstructing local surfaces. However, the challenge of producing the input splats from a raw point cloud was not discussed.
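Querying intersections against the splats themselves reduces to a ray-disk test. The sketch below assumes circular splats given by a center, a unit normal, and a radius; the textured splats of [47] and [48] are not modeled, and the brute-force `closest_hit` loop stands in for the octree traversal those methods used. All names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Splat = Tuple[Vec3, Vec3, float]  # (center, unit normal, radius)


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def intersect_splat(origin: Vec3, direction: Vec3, splat: Splat,
                    eps: float = 1e-9) -> Optional[float]:
    """Return the ray parameter t of the hit with a circular splat, or None."""
    center, normal, radius = splat
    denom = _dot(normal, direction)
    if abs(denom) < eps:
        return None  # ray parallel to the splat plane
    to_center = (center[0] - origin[0], center[1] - origin[1], center[2] - origin[2])
    t = _dot(normal, to_center) / denom
    if t <= eps:
        return None  # plane behind the ray origin
    hit = (origin[0] + t * direction[0], origin[1] + t * direction[1], origin[2] + t * direction[2])
    d = (hit[0] - center[0], hit[1] - center[1], hit[2] - center[2])
    return t if _dot(d, d) <= radius * radius else None


def closest_hit(origin: Vec3, direction: Vec3,
                splats: Sequence[Splat]) -> Optional[Tuple[float, int]]:
    """Brute-force nearest hit; an octree or k-d tree would prune this loop."""
    best = None
    for i, s in enumerate(splats):
        t = intersect_splat(origin, direction, s)
        if t is not None and (best is None or t < best[0]):
            best = (t, i)
    return best
```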
The same authors later improved their splat octree method by transforming the splats into an implicit representation, which enabled real-time performance with reflections, refractions, and shadows [49]. Exactly as in [5], isosurface values of the implicit function were stored in the leaf node corners and interpolated when ray intersections were queried. Despite the real-time performance claims, constructing the octree and the implicit surface values at the corner points was a preprocessing step that took tens of seconds, making the method real-time capable only for static point clouds. Precomputed caustic effects were later added to a similar octree ray tracing structure in [50]. A caustic sample map was stored in the octree leaf node corners in the same way as the implicit function values, and the luminance values were interpolated at the intersection point to produce realistic caustics. However, dynamic point clouds and lighting were not supported in real time.
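Once a ray has been clipped to an octree leaf cell, storing implicit values at the cell corners amounts to trilinear interpolation plus a root search along the clipped segment. The sketch below assumes the entry and exit points are already given in the cell's local [0, 1]³ coordinates and uses fixed sub-steps followed by bisection; the exact interpolation and search used in [5] and [49] may differ, and all names are hypothetical.

```python
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]
Corners = List[List[List[float]]]  # corners[i][j][k]: value at local corner (i, j, k)


def trilinear(c: Corners, u: float, v: float, w: float) -> float:
    """Interpolate the corner values at local coordinates (u, v, w) in [0, 1]^3."""
    c00 = c[0][0][0] * (1 - u) + c[1][0][0] * u
    c10 = c[0][1][0] * (1 - u) + c[1][1][0] * u
    c01 = c[0][0][1] * (1 - u) + c[1][0][1] * u
    c11 = c[0][1][1] * (1 - u) + c[1][1][1] * u
    c0 = c00 * (1 - v) + c10 * v
    c1 = c01 * (1 - v) + c11 * v
    return c0 * (1 - w) + c1 * w


def cell_hit(c: Corners, entry: Vec3, exit_: Vec3,
             steps: int = 16, iters: int = 40) -> Optional[float]:
    """Return s in [0, 1] where the interpolated field crosses zero on the
    segment entry + s * (exit_ - entry), or None if no sign change is found."""
    def f(s: float) -> float:
        p = [entry[k] + s * (exit_[k] - entry[k]) for k in range(3)]
        return trilinear(c, p[0], p[1], p[2])

    s_prev, f_prev = 0.0, f(0.0)
    for i in range(1, steps + 1):
        s = i / steps
        fs = f(s)
        if f_prev * fs <= 0.0:
            lo, hi, f_lo = s_prev, s, f_prev
            for _ in range(iters):
                mid = 0.5 * (lo + hi)
                f_mid = f(mid)
                if f_lo * f_mid <= 0.0:
                    hi = mid
                else:
                    lo, f_lo = mid, f_mid
            return 0.5 * (lo + hi)
        s_prev, f_prev = s, fs
    return None
```

If no sign change is found in a cell, traversal would continue to the next leaf along the ray. A caustic map stored at the same corners could be interpolated at the returned position with the same `trilinear` helper.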
Finally, two methods for point cloud ray tracing were published, one using splat primitives [51] and the other using sphere tracing [52]. The former used a k-d tree to accelerate intersection testing, iteratively yielding an intersection point and an average surface normal from neighboring points at interactive frame rates. The latter did not explicitly use geometric primitives but instead used sphere tracing, measuring the weighted average distance to the points stored in a BSH. Its authors also introduced an advanced method that reconstructed a mesh between the points of a BSH leaf node, but both the advanced and the basic method ran offline.
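The basic sphere tracing loop can be sketched as follows. Note that this swaps in a simpler distance field than the one in [52]: a weighted average distance to points is not guaranteed to be a safe step size, so the sketch uses the exact distance to a union of spheres of a fixed radius around the points. The points are searched by brute force instead of through a BSH, and all names are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def union_of_spheres_distance(x: Vec3, points: Sequence[Vec3], radius: float) -> float:
    """Exact distance from x to the union of spheres of `radius` around the points."""
    return min(math.dist(x, p) for p in points) - radius


def sphere_trace(origin: Vec3, direction: Vec3, points: Sequence[Vec3], radius: float,
                 t_max: float = 100.0, eps: float = 1e-4,
                 max_steps: int = 256) -> Optional[float]:
    """March along a unit-length ray, stepping by the distance bound each time."""
    t = 0.0
    for _ in range(max_steps):
        x = (origin[0] + t * direction[0], origin[1] + t * direction[1],
             origin[2] + t * direction[2])
        d = union_of_spheres_distance(x, points, radius)
        if d < eps:
            return t  # close enough to the surface: report a hit
        t += d        # safe step: no surface lies within distance d of x
        if t > t_max:
            break
    return None
```

Because every step is bounded by the true distance, the ray can never skip over a sphere; the cost is many small steps for rays that graze the surface.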

D. SUMMARY AND DISCUSSION
Real-time direct surface point cloud ray tracing methods mostly require oriented points with radii (splats) as input to their rendering pipelines, which implies a splat generation step for surface point clouds. The authors in [44] generate their MLS-based surface from raw point clouds but require heavy preprocessing and compression taking up to several hours. All methods rely on preprocessing, with reported times ranging from 5.9 seconds (0.17 FPS) for point-set surface construction on a 32³ = 32,768 voxel grid [45] to approximately 20 to 180 seconds for isosurface value octree construction [48], [50]. Consequently, the methods were only capable of ray tracing static scenes in real time on the reported hardware.
We identify the state-of-the-art methods in real-time photorealistic point cloud rendering in terms of rendering speed and the number of supported photorealistic effects, and summarize all of the methods in Table 1. To make the results as comparable as possible, we account for the three major factors affecting rendering time: the number of points/particles in the scene, the screen resolution, and the number of spp. All methods use 1 spp except [53] and [47], which use 2 spp for particle inter-reflections and 4 spp for supersampling, respectively. This makes the spp counts of most methods fairly comparable when assessing performance. We have categorized the methods as real-time (≥ 10 FPS) or interactive (≥ 1 FPS), mutually exclusively, and report the largest number of points still renderable at real-time or interactive frame rates. Further details on scalability with resolution and point count, as well as the rendering hardware used, are presented in Table 5.
Surface point cloud ray tracing methods struggle to achieve higher real-time frame rates. Generally, the methods use a medium resolution of approximately 512 × 512 and work on point clouds ranging from 10³ to 10⁶ points. The most performant method in this category is presented in [47], which achieves 55 FPS at a resolution of 512 × 512 with 10⁶ points. Furthermore, all the methods in this category omit the time needed for preprocessing the point cloud, including surface normal and point radius generation.

IV. RASTERIZATION
In this section, rasterization-based methods from the 2010s for real-time photorealistic point cloud rendering are surveyed. With the increasing computational power of GPUs and the introduction of more programmable processing units for general-purpose GPU (GPGPU) computing, sophisticated real-time rendering algorithms for point clouds could be developed further in that decade. Nevertheless, the requirement for photorealism is relaxed compared to ray tracing methods, as rasterization does not inherently support, e.g., shadows and reflections, and separate shading passes have to be executed to generate these effects. The goal is to review methods that support at least some realistic shading model. Additionally, early general point cloud rasterization methods from the 2000s are briefly reviewed for context.

1) Early Point Cloud Splatting in the 2000s
Seminal methods for the popular splatting technique (Figure 3) were developed in both [2] and [54]. Even though they did not focus on photorealistic effects as such, they paved the way for other splat-based methods. Subsequently, in [55], radiance (direct lighting and shadows) and irradiance (indirect lighting) were stored in a splat cache on the CPU/RAM and in an octree splat cache on the GPU, respectively. The method was designed for splatting radiance and irradiance onto a triangle mesh model, but it could be extended to point cloud models if a method for point cloud splat generation was available. For proper realistic GI, splat radii, normals, normal gradients/interpolation, and material properties (BRDF definitions) were needed. The irradiance cache was rasterized on a 60 × 60 unit hemisphere covering a single splat. Instead of targeting photorealistic effects as such, [56] focused on fast splat rendering. The method assumed a readily available elliptical splat model with surface normals and ellipse axis radii, but the time needed to generate such a model from a raw point cloud was not discussed. Compared to [55], elliptical weighted average (EWA) filtering between the splats in object space and approximate screen space anti-aliasing for edges were added to enhance rendering quality. However, the method only supported traditional shading and shadow maps, and extendability to other photorealistic rendering effects was not discussed. The method achieved a rendering speed of up to 17.5 million splats per second. A similar splatting method implemented on the GPU added transparent, reflective, and refractive effects with deferred blending [57] and already achieved interactive frame rates with up to 2 · 10⁶ points. To reach real-time frame rates, a LOD system was introduced for rendering point clouds of up to 10⁷ points [58].
Instead of the accurate elliptical splats of the previous methods, the authors of [58] used screen space splatting with pixel-sized splats and a nested octree structure implemented as a parallel, memory-optimized sequential point tree, which stored a subset of the octree vertices, sorted by importance, sequentially in a buffer. Nevertheless, applicability to dynamic scenes was hindered by the construction time of the sequential point tree, which ranged from minutes to hours on the CPU for point clouds of 10⁷ to 10⁸ points. Furthermore, only crude primary color shading was used due to the lack of point normals. These problems were alleviated in [59] with a fully real-time splat-based rendering method. The method used a variation of the point set surface (PSS) definition called the algebraic point set surface (APSS), which fits spheres with varying curvature instead of polynomials to the surface defined by the circular splats. To further enhance rendering quality, the splats were upsampled before projection by placing a constant number of sub-splats at regular intervals inside a square plane fit around the super-splat. Several acceleration structures were utilized for efficient local reconstruction, and the authors' optimal redundant pyramid data structure yielded a throughput of 100 thousand points in about 2 milliseconds. Compared to using splats directly, APSS provided a smoother reconstruction than, e.g., the previously popular EWA filtering of splats.
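The splatting methods above share a common structure: a visibility pass that finds the front-most surface per pixel, an accumulation pass that blends all splats close to that depth, and a normalization pass. The CPU sketch below illustrates that structure under stated assumptions: splats are already projected to screen space with a radius in pixels, an isotropic pixel-space Gaussian replaces EWA filtering, and `depth_eps` is a hypothetical blending tolerance. It is not the GPU implementation of any surveyed method.

```python
import math
from typing import Iterator, List, Optional, Sequence, Tuple

RGB = Tuple[float, float, float]
# A projected splat: screen x, screen y, depth, radius in pixels, color.
Splat = Tuple[float, float, float, float, RGB]


def footprint(x: float, y: float, r: float, width: int, height: int) -> Iterator[Tuple[int, int]]:
    """Yield pixels whose centers lie inside the splat's screen-space disk."""
    for py in range(max(0, math.floor(y - r)), min(height, math.ceil(y + r) + 1)):
        for px in range(max(0, math.floor(x - r)), min(width, math.ceil(x + r) + 1)):
            if (px + 0.5 - x) ** 2 + (py + 0.5 - y) ** 2 <= r * r:
                yield px, py


def splat_image(splats: Sequence[Splat], width: int, height: int,
                depth_eps: float) -> List[List[Optional[RGB]]]:
    # Pass 1 (visibility): nearest splat depth per pixel.
    zbuf = [[math.inf] * width for _ in range(height)]
    for x, y, z, r, _ in splats:
        for px, py in footprint(x, y, r, width, height):
            zbuf[py][px] = min(zbuf[py][px], z)
    # Pass 2 (accumulation): blend splats within depth_eps of the front surface.
    acc = [[[0.0, 0.0, 0.0, 0.0] for _ in range(width)] for _ in range(height)]
    for x, y, z, r, rgb in splats:
        for px, py in footprint(x, y, r, width, height):
            if z <= zbuf[py][px] + depth_eps:
                d2 = (px + 0.5 - x) ** 2 + (py + 0.5 - y) ** 2
                w = math.exp(-2.0 * d2 / (r * r))  # Gaussian falloff toward the rim
                cell = acc[py][px]
                for k in range(3):
                    cell[k] += w * rgb[k]
                cell[3] += w
    # Pass 3 (normalization): divide by the accumulated weight; None marks a hole.
    return [[(c[0] / c[3], c[1] / c[3], c[2] / c[3]) if c[3] > 0.0 else None
             for c in row] for row in acc]
```

Pixels left as `None` are the holes that the screen space hole filling mentioned for several methods would have to close.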

2) Modern Point Cloud Rasterization Pipelines in the 2010s
Adopting the previous ideas of using splat-based rendering for point data, a multi-layered splatting method with spheres for particle-based liquid rendering was introduced in [60]. Effectively, using spheres provided a similarly smoothed effect for liquid surfaces as the APSS approach, but needed additional depth map smoothing with an iterative curvature This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. flow filter. The authors rendered several layers of the liquid in order to produce separated foam and normal perturbed refractions. In [61], the authors took the idea of multi-layer depth map rendering further by using up to four layers for transparency with hard and soft shadows at real-time frame rates. However, they returned to splat primitives and, thus, needed screen space hole and occlusion filling as well as edge-aware smoothing and anti-aliasing for consistent results. Another application area for the splat primitive was the detailed rendering of facial scans [62]. The authors adaptively distributed the splats depending on detail level and adjusted the splat sizes accordingly with screen space hole filling applied accordingly. Self-shadowing with hard shadows on the face and up to six layers of sub-surface scattering was achieved in real time partly due to the reduced rendering effort with the adaptive splat distribution, but the preprocessing work entailed static models. With increasing point and splat counts in more intricate models, a LOD structure based on multi-way k-d trees (MWKD) was introduced [63] (depicted in Figure 4). 
The multi-way nature of the data structure was due to several feature discrimination strategies, such as normal deviation clustering, entropy-based point reduction, and modified k-clustering. These strategies reduced the splat count by replacing groups of splats with a representative MWKD node, and hashing as well as compression further reduced the resulting model size. Additionally, hierarchical interpolation between LOD levels increased smoothness, and temporal reuse decreased costly out-of-core fetches. This approach was applied in an archaeological setting [64], supporting relighting and shadows with a torch tool as well as laser pointers implemented with intersection testing.
The computational challenges of preprocessing for dynamic scenes were tackled in the AutoSplats method, which did not assume precomputed point radii or surface normals [3]. The authors generated object space oriented splats with a screen space k-NN search algorithm that iteratively grew or shrank the considered area around each point. The surface normals were estimated with principal component analysis (PCA) on the k nearest neighbors, after which a weighted average of the points was used for a plane fit. Basic shading was incorporated, but other realistic rendering effects such as shadows were not present.
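PCA-based normal estimation on a k-NN neighborhood, as used in AutoSplats [3], takes the eigenvector of the neighborhood covariance matrix with the smallest eigenvalue as the normal. The sketch below is a minimal CPU illustration, assuming a brute-force object space k-NN search (AutoSplats searched in screen space) and a power iteration on a shifted matrix instead of a full eigensolver; `knn`, `covariance`, `smallest_eigenvector`, and `estimate_normal` are hypothetical helpers.

```python
import math

def knn(points, query, k):
    """Brute-force k nearest neighbors of query (includes query if it is in points)."""
    return sorted(points, key=lambda p: sum((p[i] - query[i]) ** 2 for i in range(3)))[:k]

def covariance(pts):
    """3x3 covariance matrix of a list of 3D points."""
    n = len(pts)
    mean = [sum(p[i] for p in pts) / n for i in range(3)]
    c = [[0.0] * 3 for _ in range(3)]
    for p in pts:
        d = [p[i] - mean[i] for i in range(3)]
        for r in range(3):
            for s in range(3):
                c[r][s] += d[r] * d[s] / n
    return c

def smallest_eigenvector(c, iters=200):
    """Eigenvector of the smallest eigenvalue of a symmetric PSD 3x3 matrix.

    Power iteration on M = trace(C) * I - C: C's eigenvalues are non-negative and sum
    to trace(C), so M's largest eigenvalue belongs to C's smallest one. Each basis
    vector is tried as a start, and the result with the lowest variance v^T C v wins.
    """
    tr = c[0][0] + c[1][1] + c[2][2]
    m = [[(tr if r == s else 0.0) - c[r][s] for s in range(3)] for r in range(3)]
    best, best_q = None, math.inf
    for start in ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)):
        v = list(start)
        for _ in range(iters):
            w = [sum(m[r][s] * v[s] for s in range(3)) for r in range(3)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient: variance of the neighborhood along direction v.
        q = sum(v[r] * c[r][s] * v[s] for r in range(3) for s in range(3))
        if q < best_q:
            best, best_q = v, q
    return best

def estimate_normal(points, query, k=8):
    """Unoriented surface normal at query from its k nearest neighbors."""
    return smallest_eigenvector(covariance(knn(points, query, k)))
```

The returned normal is unoriented (its sign is arbitrary); a renderer would typically flip it toward the camera before shading.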

A. SUMMARY AND DISCUSSION
Surface point cloud rasterization methods typically rely on pregenerated splats, i.e., on both pregenerated surface normals and point radii. The most computationally efficient method is the one in [64], which achieves up to 100 FPS at a resolution of 1920 × 1080 with 14 · 10^6 points, of which 2 · 10^6 are visible. However, as discussed previously, the rasterization-based methods, with the exception of [61], only provide hard shadows and Phong shading, which are considerably less photorealistic than the effects produced by the ray tracing methods.

V. HYBRID RENDERING METHODS
Methods combining both rasterization and ray tracing in their rendering pipeline are reviewed in this section. Specifically, methods from the subject areas of PBGI, point-based indirect illumination, and SPH fluid rendering are able to produce photorealistic effects for point-based models with a hybrid approach that combines the benefits of rasterization-based and ray-tracing-based techniques.

A. POINT-BASED GLOBAL ILLUMINATION
A precursor to the PBGI method produced meshless radiosity and GI effects [65]. An intersectable model was subsampled with splats and stored in a multi-level R-tree with fine-to-coarse basis function approximations. Several ray traced bounces, including support for three bounces of glossy reflections or multi-light irradiance, were used to generate GI effects with irradiance gathering, as depicted in Figure 5. The authors later improved rendering performance by constructing the R-tree and ray tracing single-bounce GI in a preprocessing step [66]. The original method [65] was also reimplemented on the GPU with a parallel evaluation of the R-tree basis function hierarchy and ray tracing [67]. This implementation rendered meshless radiosity GI an order of magnitude faster, but due to the limitations of the GPU implementation, only the indirect lighting was splatted and ray traced, whereas the direct lighting was rasterized.
The original technical report on approximate PBGI introduced GI effects with environment maps and spherical harmonics projected onto surfels (surface splats) [68]. Surfels were stored in an octree for accelerated environment mapping and spherical harmonics evaluation. The method was reported to be extendable to any surface representation that can be transformed into surfels with the attributes necessary for photorealistic rendering effects. Because spherical harmonics only approximate GI, only smooth effects such as glossy reflections, blurry refractions, and soft shadows were supported. Actual ray tracing was suggested as an extension to the rendering pipeline for sharp reflections, refractions, and shadows. Preprocessing took on the order of minutes, of which surfel generation took 18 seconds. An enhanced version significantly sped up the generation of surfels and the octree structure from mesh models [69]. Furthermore, the extendability of the method to pure point cloud models with rasterized primary visibility was explicitly discussed, but this approach assumed that surface normals and point radii were readily available in the raw point cloud.
The method in [70] further improved the image quality of the PBGI indirect lighting method by introducing light voxels for volume data points, which resembled probe-based lighting. The authors added inward gathering and outward scattering of indirect light for realistic light interactions inside a volume point cloud and between the volume and the rest of the scene. Metric results showed a 2.1 to 4.3 percent deviation from a reference rendered with a Monte Carlo ray tracer, compared to 5.6 to 14.4 percent with the method in [68].
The authors of [71] continued the work on PBGI by developing an out-of-core point sample octree construction algorithm based on efficient Morton code sorting, which tackled the computational overhead of constructing the surfel octree on the CPU. The octree provided a data structure utilized by several CPU cores in parallel for LOD ray tracing of GI effects like single-bounce diffuse inter-reflections, ambient occlusion, and high-dynamic-range environment map lighting. A similar PBGI implementation was presented in a short paper [72], which focused on reducing the memory footprint of the point clouds accompanying PBGI methods. The authors achieved up to two orders of magnitude smaller point clouds with a multi-resolution implementation of the octree structure. Instead of using explicit ray tracing for the indirect lighting, an intricate progressive rasterization with hemispherical microbuffers was used for radiosity gathering on surfels [73]. The idea was that small microbuffers of 32 × 32 pixels were sufficient to capture indirect lighting and glossy-to-glossy reflections, with additional glossy environment lobes. Furthermore, surfels were clustered via k-means such that random samples could be used as representatives of a cluster's contribution for performance gains. The microbuffer approach was further advanced with LOD-like k-clustering of near and far surfels based on attribute mean values [74]. Far away surfels were approximated with the cluster mean values, whereas near surfels were refined using their exact attribute values. The method was offline but reduced frame times by a factor of 2 to 3 compared to the original PBGI method [68].
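Morton code sorting, the basis of the octree construction in [71], interleaves the bits of quantized x, y, and z coordinates so that sorting by code places points of the same octree node next to each other. The sketch below shows the common 30-bit variant (10 bits per axis); the out-of-core chunking and parallel construction of [71] are not shown, and the helper names are hypothetical.

```python
def expand_bits(v):
    """Spread the lower 10 bits of v so that two zero bits separate consecutive bits."""
    v &= 0x3FF
    v = (v | (v << 16)) & 0x030000FF
    v = (v | (v << 8)) & 0x0300F00F
    v = (v | (v << 4)) & 0x030C30C3
    v = (v | (v << 2)) & 0x09249249
    return v

def morton3d(x, y, z):
    """30-bit Morton code of integer grid coordinates in [0, 1023]."""
    return expand_bits(x) | (expand_bits(y) << 1) | (expand_bits(z) << 2)

def quantize(p, lo, hi, bits=10):
    """Map a float point inside the bounding box [lo, hi] to integer grid coordinates."""
    cells = (1 << bits) - 1
    return tuple(min(cells, max(0, int((p[i] - lo[i]) / (hi[i] - lo[i]) * cells)))
                 for i in range(3))

def sort_by_morton(points, lo, hi):
    """Order points along the Z-order curve of the bounding box."""
    return sorted(points, key=lambda p: morton3d(*quantize(p, lo, hi)))

def octree_node_key(code, level, bits=10):
    """Code prefix identifying the octree node at the given level (0 = root)."""
    return code >> (3 * (bits - level))
```

After sorting, all points sharing the same `octree_node_key` at a given level form one contiguous run, so the nodes of each level can be found with a single linear scan instead of repeated recursive subdivision.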

[Figure panel labels: Scattering radiance; Gathering irradiance]
A return to explicit ray tracing for PBGI in [75] featured a GPU-based implementation with accurate surfel occlusion evaluation. The authors used a fixed budget of 500 thousand surfels, generated in a preprocessing step and inserted into a tree structure in a few seconds. The novelty of the method was a reduction step that traversed the PBGI tree up and down to find the optimal level in the hierarchy in terms of rendering speed and GI quality. Ideas from photon mapping, PBGI, and radiosity caching were combined in an approximate GI ray tracing pipeline [76]. In addition to a traditional radiance- and irradiance-based PBGI solution (shown in Figure 5), the authors proposed a photon-mapping-like stage in which light rays were traced from emissive objects to pregenerated radiosity particle locations, weighted by physically based BRDF and normal-based coefficients. To accelerate the scattering and gathering of light into the radiance particles, the authors generated geometry lists between the most contributing particles in a preprocessing step.
For applications with purely static scenes, precomputed PBGI was a more efficient approach. Volumetric shadows for participating media represented as point clouds were presented in [77]. The authors generated a height field occlusion map in 2D light space, stored in a quad-tree for acceleration. At rendering time, the view rays were transformed into light space, and shadows were cast in the participating media wherever the sampled view ray locations were below the occlusion height field values. Similarly, in [78], real-time GI relighting of real-world scanned cave scenes was achieved by storing a point cloud and its direct illumination in a sparse voxel octree in an offline preprocessing step. In [79], the result of radiosity gathering and scattering inside a homogeneous volume was stored in a precomputed PBGI octree, which was used at rendering time for accelerated light gathering. The system supported multiple scattering events and light bounces inside the volumes. A 60-fold decrease in computation time and a 6-fold decrease in memory consumption compared to a reference solution were reported. Later, a GPU version of the method was implemented, which achieved real-time frame rates with up to 50 million particles [80]. Furthermore, [81] extended the original offline method for single scattering by transforming the angle-dependent light transport values from RGB-angle space into the frequency domain with a covariance matrix eigenvector representation. The authors adopted the idea from [82], in which surface and participating media events, such as free space transport and reflections, alter the covariance space eigenvectors through an appropriate matrix transformation.
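The height field shadow test of [77] can be illustrated with a minimal sketch. It assumes a directional light shining straight down the -z axis, so that light space is simply the xy-plane and "height" is the z coordinate, and it uses a Python dict keyed by grid cell in place of the quad-tree acceleration structure of [77]; `build_height_field`, `in_shadow`, and `lit_fraction` are hypothetical names.

```python
import math

def build_height_field(occluders, cell_size):
    """Store the highest occluder z per (x, y) light-space cell.

    Assumes a directional light along -z, so light space is the xy-plane.
    """
    hf = {}
    for x, y, z in occluders:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        hf[key] = max(z, hf.get(key, -math.inf))
    return hf

def in_shadow(p, hf, cell_size):
    """A sample is shadowed if it lies below the occluder height of its cell."""
    key = (math.floor(p[0] / cell_size), math.floor(p[1] / cell_size))
    return p[2] < hf.get(key, -math.inf)

def lit_fraction(origin, direction, t_max, steps, hf, cell_size):
    """March a view ray through the medium and return the fraction of lit samples."""
    lit = 0
    for i in range(steps):
        t = (i + 0.5) * t_max / steps  # midpoint of each ray segment
        p = tuple(origin[k] + t * direction[k] for k in range(3))
        if not in_shadow(p, hf, cell_size):
            lit += 1
    return lit / steps
```

In a full renderer, the lit fraction would weight the in-scattered light accumulated along the view ray; with an arbitrary light direction, the samples would first be transformed into the light's coordinate frame, as described for [77].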
An incremental improvement to the PBGI ray traversal was proposed in [83]. A BSH tree traversal was implemented with 16-wide SIMD operations, which achieved a 2- to 3-fold decrease in rendering time compared to a non-parallel implementation. The authors elaborated their methods further in [84] by introducing a hybrid parallel BVH traversal scheme that switched between packet-based and single-thread SIMD traversal depending on the BSH tree detail level. They examined microbuffer sizes of 32 × 32 and 128 × 128 for radiosity gathering at point samples and also used an adaptive resolution method to refine glossy reflection areas. Recently, in [85], the surfel-based global illumination scheme was further enhanced with an adaptive ray tracing heuristic and dynamic surfel spawning for constant screen space surfel coverage. Local variance and surfel visibility frequency were used together with a global ray count limit to adaptively send more rays to more frequently used and higher-variance surfels. Also, advanced techniques such as importance sampling with ray guiding, local surfel irradiance sharing, and light cuts for multiple light sampling were used to produce fast-converging and robust irradiance results even in complex scenes with varying lighting conditions.
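The adaptive heuristic in [85] distributes a global ray budget using local variance and visibility frequency. As a hedged illustration only, the sketch below splits the budget in proportion to the product of variance and frequency with a per-surfel minimum, and rounds with the largest remainder method; the exact weighting and rounding used in [85] may differ, and `allocate_rays` is a hypothetical helper.

```python
def allocate_rays(surfels, budget, min_rays=1):
    """Split a global ray budget among surfels in proportion to variance * frequency.

    surfels: list of (variance, visibility_frequency) pairs. Every surfel gets at least
    min_rays (so the total can exceed budget if budget < min_rays * len(surfels)); the
    rest is distributed proportionally, and rounding leftovers go to the surfels with
    the largest fractional shares.
    """
    n = len(surfels)
    alloc = [min_rays] * n
    remaining = budget - min_rays * n
    if remaining <= 0:
        return alloc
    weights = [v * f for v, f in surfels]
    total = sum(weights)
    if total == 0:
        # No variance information yet: fall back to a uniform split.
        weights, total = [1.0] * n, float(n)
    shares = [remaining * w / total for w in weights]
    alloc = [a + int(s) for a, s in zip(alloc, shares)]
    leftover = budget - sum(alloc)
    order = sorted(range(n), key=lambda i: shares[i] - int(shares[i]), reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc
```

For example, two surfels with weights 1 and 3 and a budget of 10 rays receive 3 and 7 rays: one guaranteed ray each plus 2 and 6 of the remaining 8.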

B. INDIRECT LIGHTING WITH POINT-BASED METHODS
Using a point-based method solely for indirect lighting has been explored both for shadow generation with imperfect shadow maps (ISM) and for reflection generation with reflective shadow maps (RSM). These methods use a reduced representation of the scene and splat it into second-bounce screen space microbuffers for sufficiently accurate effects, as depicted in Figure 6.
Real-time indirect lighting with virtual point lights (VPL) and ISMs was presented in [86]. As in standard VPL methods, the scene was importance sampled via cube maps from the viewpoint of the light sources to generate points that gather virtual light contributions. Holes in the ISMs were filled with a push-pull filtering kernel, and finally the individual VPL contributions were gathered and blurred in a G-buffer-aware manner. The authors later improved the quality of their method by substituting the VPLs with a full splat-based surface representation stored in a BSH binary tree in [87]. The splat-based scene representation was used for both indirect and direct lighting, and a suitable tree cut for LOD was located based on solid angle coverage. The novel contribution was that the preconstructed BSH was updated in real time even in dynamic scenes. Several publications took the ideas of ISM, RSM, and instant radiosity with VPLs and applied them in various real-time rendering applications. The method in [88] produced multi-bounce GI effects between real and virtual objects in real time. Its contribution to the point-based ISM approach was the use of surface-normal-aligned quad splats and parabolic projection to achieve depth-based splat sizes instead of the original inverse-depth splat sizes. In a later publication [89], the problem of unsupported double occlusions and incorrect color bleeding between real and virtual objects was resolved by rasterizing two depth maps: one for real objects and one for virtual objects. Furthermore, the authors achieved interactive scene reconstruction by integrating the KinectFusion [10] method into their rendering pipeline.
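Push-pull hole filling, used for the ISMs in [86], first "pulls" valid depth samples into successively coarser levels and then "pushes" the coarse values back down into the holes of the finer levels. The sketch below assumes a square, power-of-two depth map in which holes are marked with `None`, and uses a plain 2 × 2 box average; actual ISM implementations use weighted filter kernels, and `push_pull` is a hypothetical name.

```python
def push_pull(img):
    """Fill None holes in a square, power-of-two sized depth map.

    Pull: average the valid pixels of each 2x2 block into a coarser level.
    Push: fill holes at the finer level from the recursively filled coarser level.
    Holes can only remain if the entire map is empty.
    """
    n = len(img)
    if n == 1:
        return [row[:] for row in img]
    coarse = []
    for i in range(n // 2):
        row = []
        for j in range(n // 2):
            vals = [img[2 * i + a][2 * j + b] for a in (0, 1) for b in (0, 1)
                    if img[2 * i + a][2 * j + b] is not None]
            row.append(sum(vals) / len(vals) if vals else None)
        coarse.append(row)
    filled = push_pull(coarse)
    # Valid pixels are kept unchanged; only holes take the coarser value.
    return [[img[i][j] if img[i][j] is not None else filled[i // 2][j // 2]
             for j in range(n)] for i in range(n)]
```

Because valid pixels are never overwritten, the filter only affects the holes, which keeps the cost proportional to the map size while preserving the splatted depth values.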
In [90], the ISM technique was extended to imperfect volumes by rasterizing stochastic point samples, distributed according to triangle sizes, into a voxel grid, generating cached radiance samples per voxel via ISMs, and finally rendering the voxels with ray marching in screen space. Similarly, [91] presented a voxelized shadow mapping technique that synthesized VPL shadow maps into a 3D grid, which the authors also extended to ISMs in a later publication [92]. Instead of using VPLs directly, [93] generated virtual area lights by clustering VPLs with importance-sampling-based warping, thus producing fewer ISMs with quality similar to VPLs in real time. The remaining challenge of distributing VPL samples in RSM and sampling the scene with points in ISM was tackled in [94]. The authors generated a bidirectional RSM to place VPLs in the locations contributing the most to the final lighting of the scene. Furthermore, they improved the point sample distribution in ISM by placing stochastic samples based on triangle solid angle contribution and distance in view space. Finally, the method in [95] used GPU-accelerated tessellation shaders with a point generation mode to produce point samples for fast ISM rasterization based on triangle areas. Furthermore, the adaptive VPL placement was based on Metropolis-Hastings sampling.

C. REAL-TIME PARTICLE- AND FLUID-BASED METHODS
The ubiquitous screen space meshless rendering pipeline for realistic real-time visualization of particle-based simulations, utilized by almost all methods surveyed in this section, was established in [96]. An overview of the method is depicted in Figure 7. The authors utilized a sphere rasterization method inspired by splatting [2], [54] for depth map generation from point-based models and combined it with curvature-flow-based depth map smoothing and fluid thickness extraction. From these, they composited fluid effects such as Fresnel-coefficient-weighted environment map reflections and refractions, as well as thickness-based transparency and light color attenuation.

[Figure 7 panel labels: Sphere primitives; Rasterized depth maps; Depth map filtering; Final depth maps; Environment map; Rasterize; Ray trace]
FIGURE 7. The ubiquitous particle-based fluid rendering pipeline for refractive and reflective effects consists of the following steps: generation of one or multiple depth maps with sphere splatting (rasterization), depth map smoothing or reconstruction with screen space filter kernels, and ray casting or rasterizing the depth maps and environment map with single or multiple reflective or refractive fluid surface events. Optionally, fluid-thickness-based color attenuation and alpha transparency can be added.
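The first stage of the pipeline in [96], splatting particles as spheres into a depth map, can be sketched on the CPU as follows. This is a minimal illustration assuming an orthographic projection in which pixel (x, y) samples the view space position (x, y) and depth grows along +z; a GPU implementation would instead compute the same per-fragment depth in a fragment shader for screen-aligned point sprites. `splat_spheres` is a hypothetical name, and the subsequent smoothing and shading stages are not shown.

```python
import math

def splat_spheres(spheres, width, height):
    """Rasterize spheres (cx, cy, cz, r) into a depth map of their front surfaces."""
    depth = [[math.inf] * width for _ in range(height)]
    for cx, cy, cz, r in spheres:
        # Only visit pixels inside the sphere's screen-space bounding box.
        for y in range(max(0, math.floor(cy - r)), min(height, math.ceil(cy + r) + 1)):
            for x in range(max(0, math.floor(cx - r)), min(width, math.ceil(cx + r) + 1)):
                d2 = ((x - cx) ** 2 + (y - cy) ** 2) / (r * r)
                if d2 > 1.0:
                    continue  # pixel lies outside the projected disk
                z = cz - r * math.sqrt(1.0 - d2)  # depth of the front hemisphere
                if z < depth[y][x]:  # depth test keeps the nearest surface
                    depth[y][x] = z
    return depth
```

Unlike flat splats, each sphere contributes a curved depth profile, which is what the later curvature flow or bilateral smoothing stages flatten into a continuous fluid surface.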
In [97], existing SPH fluid simulation and rendering methods were improved by implementing fully GPU-based voxelized SDF rendering for particles. A subset of surface particles was extracted from the fluid by comparing the distances of particles to the mass centers of their respective neighborhoods. The surface particles and their proximities were rasterized into a 3D voxelized SDF texture with a preset resolution, and the resulting SDF was ray cast from the camera and composited into a final transparent image of the fluid and the surrounding scene. In [98], actual refractive effects were produced with a hybrid approach that combined rasterized sphere sprites and bilateral smoothing to generate smoothed front and back face depth maps of the fluid. Refraction rays were cast from the front face pixel depth locations, and an iterative secant technique refined the ray exit point in the back face depth map. Later, the authors improved the image quality and computational performance of the method by separating splash and surface particles and generating four layers of front and back depth maps in total [99]. Four-layered refractions with realistically blended specular reflections were achieved in real time, albeit at a relatively low resolution. Furthermore, a lightweight light attenuation model based on the Beer-Lambert law was added with negligible computation time to increase photorealism through wavelength-dependent absorption.
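Two of the steps above can be written compactly: the secant refinement of the refraction exit point [98] and Beer-Lambert attenuation [99]. In the sketch below, the caller is assumed to supply a function f(t) giving the depth of the refracted ray at parameter t minus the back face depth map value at the ray's screen position, so that the root of f approximates the exit point; the attenuation applies I = I0 · exp(-k · d) per RGB channel. `secant_exit` and `beer_lambert` are hypothetical names.

```python
import math

def secant_exit(f, t0, t1, tol=1e-9, max_iter=50):
    """Find t with f(t) = 0 by the secant method, starting from guesses t0 and t1."""
    f0, f1 = f(t0), f(t1)
    for _ in range(max_iter):
        if f1 == f0:
            break  # flat secant: cannot take another step
        t2 = t1 - f1 * (t1 - t0) / (f1 - f0)
        t0, f0, t1, f1 = t1, f1, t2, f(t2)
        if abs(f1) < tol:
            break
    return t1

def beer_lambert(color, absorption, thickness):
    """Attenuate an RGB color by exp(-k * d) per channel (Beer-Lambert law)."""
    return tuple(c * math.exp(-k * thickness) for c, k in zip(color, absorption))
```

For a ray whose depth grows as 1 + t against a flat back face at depth 5, `secant_exit(lambda t: (1.0 + t) - 5.0, 0.0, 1.0)` returns 4.0 after a single step; for curved back faces, a few iterations typically suffice. Choosing a larger absorption coefficient for red than for blue gives the bluish tint of thick water.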
A similar technique was also applied in a deferred shading pipeline [100], but only with single-layer depth maps. The particles were quantized into a fixed-resolution grid in order to fit more points into memory and speed up rendering. The authors generated an approximate refraction effect by distorting the background seen through the transparent fluid based on the fluid surface normals. A simple ray casting technique for adaptive rendering of transparent fluids was also published in [101]. The method aimed at accelerating the rendering of fluid simulations by using a perspective grid acceleration structure (consisting of pyramids with cut-off tops) and representative rays for each grid cell for adaptive ray casting. Thus, ray casting was only used to adapt the sample rate, not to produce photorealistic effects other than aggregate transparency.
Further optimizations for GPU utilization were presented in several later publications. A GPU hashing scheme was used for accelerated neighborhood queries and real-time rendering of dynamic particle-based fluid simulations in [102]. The method used the familiar screen space splatting with sphere primitives but, instead of depth maps, generated a ray-marchable isosurface. Only the front-most surface of the fluid volume was rendered, and an alpha transparency technique extracted an approximate fluid thickness to produce fluid depth and refraction effects. Multiple GPUs were utilized in a distributed renderer in [103]. Instead of using curvature flow, the authors used a more lightweight bilateral filter for depth map smoothing. A view space aligned voxelized sphere particle method fully implemented on the GPU was published in [53], showing that photorealistic interparticle refractions and reflections between up to seven fluid layers are possible at interactive frame rates. The voxels were traversed to find up to seven levels of continuous fluid surfaces based on neighboring voxel cells, and finally the extracted and linked surface cells were smoothed with an iterative curvature flow kernel to produce continuous, smooth, intersectable surfaces. Finally, a modern implementation with the Nvidia OptiX ray tracing framework [104] and a CUDA-based GPU rendering system was presented in [105].
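Bilateral depth smoothing, used in [103] as a lighter alternative to curvature flow, averages each depth value with its neighbors using weights that decay with both spatial distance and depth difference, so that silhouettes and depth discontinuities are preserved. The sketch below is a minimal CPU version; the kernel radius and the sigma values are illustrative assumptions rather than parameters from [103], infinite depth marks background pixels, and `bilateral_filter` is a hypothetical name.

```python
import math

def bilateral_filter(depth, radius=2, sigma_s=1.5, sigma_r=0.1):
    """Edge-preserving smoothing of a 2D depth map given as a list of rows.

    Weights fall off with spatial distance (sigma_s) and depth difference (sigma_r).
    Background pixels (infinite depth) are left unchanged and excluded.
    """
    h, w = len(depth), len(depth[0])
    out = [row[:] for row in depth]
    for y in range(h):
        for x in range(w):
            zc = depth[y][x]
            if math.isinf(zc):
                continue
            num = den = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if not (0 <= yy < h and 0 <= xx < w):
                        continue
                    z = depth[yy][xx]
                    if math.isinf(z):
                        continue
                    ws = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma_s ** 2))
                    wr = math.exp(-((z - zc) ** 2) / (2.0 * sigma_r ** 2))
                    num += ws * wr * z
                    den += ws * wr
            out[y][x] = num / den  # den > 0: the center pixel always has weight 1
    return out
```

The range term is what distinguishes this from a Gaussian blur: with a small `sigma_r`, samples on the far side of a depth step contribute almost nothing, so neighboring fluid blobs at different depths are not smeared into each other.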

D. SUMMARY AND DISCUSSION
Particle-based ray tracing methods are a viable substitute for surface point cloud rendering methods. They typically outperform the more sophisticated local reconstruction and splatting methods in speed, resolution, photorealistic effects, and the number of rendered points. However, due to the depth map rasterization approach, these methods are naturally limited to effects produced by particles inside the view frustum, which makes them similar in extent to screen space ray tracing. The basic rendering pipeline for particle-based homogeneous fluid rendering, popularized in [96] and utilized in practically all of the surveyed particle-based methods, is depicted in Figure 7. It consists of three steps: sphere splatting into one or more screen space depth maps; smoothing and filtering the generated depth maps with sphere-flattening and edge-preserving filter kernels; and rasterizing or ray tracing the front-most depth map while generating photorealistic effects with layered depth map refractions, environment map reflections, and fluid color attenuation.
The focus in the particle-based rendering literature has been on generating real-time rendering results with photorealistic fluid effects from SPH-based simulations. This implies support for dynamic scenes and, thus, either real-time acceleration structure construction for ray tracing or direct depth map rasterization. The SPH simulations usually yield per-particle densities, masses, and motion vectors, which are available for rendering-time effects such as foam generation and velocity-based deformation of particles. However, these attributes can be omitted, in which case the surveyed methods remain applicable to raw point/particle clouds with only coordinate information and other optional attributes used in traditional rendering pipelines (e.g., color and material properties).
All surveyed particle-based methods exhibit real-time frame rates with single-layer depth maps [96], [102], [103], [105] as well as with double-layer front- and back-face depth maps [98]. Additionally, separate foam and fluid particle depth maps [100], up to four-layer depth maps [99], and up to seven-layer depth maps with particle interreflections [53] are supported at real-time or interactive frame rates. All methods generate translucency, thickness-based light attenuation, Phong shading, environment map specular reflections, approximated or exact refractions, and a final color composited with the Fresnel equations.
Particle-based volume methods support larger frame resolutions than the ray tracing methods. Most of them achieve relatively high frame rates even at a resolution of 1280 × 720. The fastest method achieved a frame rate of 97 FPS at a resolution of 1024 × 1024 with 16 · 10^3 points [97]. However, this point count is comparatively low, especially given that particle-based methods have a lower number of actual surface points. Thus, a more comparable method with a larger point cloud size is the one in [99], which achieved 48 FPS at a resolution of 1024 × 768 with 4 · 10^6 points.
Of the particle-based fluid volume rendering methods, the one in [53] exhibited the most photorealistic effects, at low interactive frame rates with up to 7 · 10^6 volume particles. Supported effects included 2 spp reflections for particle inter-reflection-based GI, multi-layered (up to 7) refraction, and extendability to other ray tracing effects such as shadows and path tracing. In terms of rendering performance, the method in [100] achieves real-time performance on particle-based fluid volumes with up to 5 · 10^8 particle points. It used a binning strategy with up to 2 · 10^3 brick bins for accelerated rendering with single-layer approximate refractions and specular reflections. As discussed earlier, the number of points/particles is not directly comparable to that of surface point cloud rendering methods. We consider the relationship between particle-based volume rendering and surface point cloud rendering, in this regard as well as in terms of method applicability and scalability in each domain, a fruitful subject for future research.

VI. POINT-BASED NEURAL RENDERING
We briefly review the latest state-of-the-art neural network solutions for point cloud and point-based model rendering.
Only a handful of current solutions can produce plausibly photorealistic point cloud renderings in real time, and all of them require scenes with static geometry. Furthermore, many of the neural networks have to be retrained on target-specific point cloud inputs and require ground truth images for training, which are not readily available in applications such as holoportation or teleconferencing. However, as the area of point-based neural rendering is emerging, more general and computationally efficient solutions independent of domain-specific training might become possible in the near future.
A neural network solution for upsampling and filling a splatted point cloud was published in [106]. Instead of using a rendering network to learn a mapping from a point cloud to final image, it used a GAN-based network to learn a post-processing step from an incomplete and low-frequency splatted rendering to a high quality final frame. The splatting method worked in real time by using a k-d tree to fit a suitable point-wise normal and splat radius based on k-NN in 12 sectors around the splat with similar surface normals. However, the actual upsampling and filling GAN network ran at interactive frame rates only for low resolution images. Thus, the authors used the network to render or pre-render high quality key locations and viewpoints in a scene and used the lower quality splatted results for interactive 3D scene navigation.
One of the first neural network-based methods attempting full scene capture (capturing or deducing scene geometry, material properties, and lighting information from multiple unstructured image sets in a single system) was proposed in [107]. The authors used a point cloud, a depth map, and a segmentation image constructed from multiple images of a single landmark scene as input to a two-stage network. The first stage learned a descriptor vector set from the input point cloud, depth map, and segmentation to further encode various aspects of the scene. The descriptor set, together with the inputs, was fed to the second stage (the rendering network), which mapped the inputs to a final rendered image from a novel viewpoint with new lighting conditions. Furthermore, the semantic segmentation was used to mark non-relevant foreground and non-stationary objects in the view of the rendered landmark so that the network could learn both to remove and to re-instantiate these elements in novel views. The inference time during rendering and generation was reported to be 330 ms at a resolution of 512 × 512 on an Nvidia Titan V GPU. Training time was not reported.
In [108], a two-part neural network based on the convolutional U-Net architecture was used to render a point cloud input from novel viewpoints. The first part consisted of a set of 8-dimensional point descriptor vectors and camera parameters, which were rasterized at multiple resolutions and given to the second part, a neural rendering network that learned the mapping of the descriptors to an RGB image. Both the point-based neural rendering network and the descriptor vector set were trained in a two-step fashion. First, the rendering network and the descriptor set were pre-trained on a general input sequence, after which the descriptor set was reset and retrained on a local input sequence (closely resembling the validation set) with the pre-trained rendering network. The authors reported better image quality compared to similar neural rendering methods based on mesh primitives. However, their method was also limited to scenes with static models and lighting, making it non-transferable to scenarios like dynamic point cloud streaming and relighting.
To alleviate the problems of rasterizing the descriptor vectors directly onto the image plane, a voxelization-based method was suggested in [109]. Instead of using a multi-resolution approach on the same image plane with nearest depth pixel selection, the authors rasterized the descriptor vectors onto different depth planes of the view frustum, called frustum voxels. Additionally, they blended all fragments falling on a single pixel using weights based on the distance to the pixel center on the image plane and on the depth along the viewing direction, weighting closer fragments more. This improved quality on sparse point cloud inputs and decreased background bleeding through holes in the rasterized point clouds. Nevertheless, the rendering times were not disclosed. Furthermore, the method was similarly designed for static models and lighting.
The method in [110] used a radiance field (i.e., light field) representation of the scene to render images from novel viewpoints via a volume rendering technique. The radiance field, produced from images of multiple viewpoints, was given as input to two parallel fully connected neural networks, a coarse and a fine one, which output color and density to a volume renderer for final composition. Entries from the radiance field were encoded in a higher-dimensional space in order to preserve fine geometric details. The final rendering time was measured in tens of seconds.
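The two non-learned parts of [110] described above, the higher-dimensional input encoding and the volume rendering of network outputs, can be sketched as follows. We assume a sinusoidal frequency encoding γ(x) = (sin(2^k πx), cos(2^k πx)) for k < L and the standard emission-absorption quadrature C = Σ_i T_i (1 - exp(-σ_i δ_i)) c_i with transmittance T_i = exp(-Σ_{j<i} σ_j δ_j); the networks themselves are omitted, and `positional_encoding` and `composite` are hypothetical names.

```python
import math

def positional_encoding(p, num_freqs):
    """Map each coordinate x to (sin(2^k pi x), cos(2^k pi x)) for k < num_freqs."""
    out = []
    for x in p:
        for k in range(num_freqs):
            out.append(math.sin((2.0 ** k) * math.pi * x))
            out.append(math.cos((2.0 ** k) * math.pi * x))
    return out

def composite(samples):
    """Front-to-back volume rendering of (density, rgb, segment_length) samples."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed by earlier samples
    for sigma, c, delta in samples:
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        for i in range(3):
            color[i] += transmittance * alpha * c[i]
        transmittance *= 1.0 - alpha
    return color
```

The high-frequency sine and cosine terms let a fully connected network represent fine geometric detail that it would otherwise smooth out, while the compositing step explains why rendering requires many network evaluations per pixel and, hence, the reported rendering times in the tens of seconds.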
Neural rendering with a sphere-based geometry representation was proposed in the Pulsar method [111]. Its fully differentiable rendering pipeline supports both forward rendering and backpropagation, which jointly refines the scene representation, the projection operation, and the neural shading. The sphere representation included radii, transparencies, and feature vectors describing local geometry and lighting, which were all learned during training and applied at rendering time.
A three-stage differentiable point renderer, called ADOP, was able to beat OpenGL point primitive rendering in both image quality and rendering speed [112]. The system consisted of a differentiable rasterizer (inspired by the efficient compute shader point rendering in [113]) and a tonemapper, with a neural renderer in between that took the output of the rasterizer as input and produced an HDR image for the tonemapper. The system refined the input point cloud and camera model as well as both the neural renderer weights and the tonemapper HDR parameters. To increase both speed and image quality compared to native OpenGL point primitive rendering, a stochastic point discard heuristic was employed, with the geometry of discarded pixels utilized for spatial gradients.

A. SUMMARY AND DISCUSSION
Various methods for point-based neural network rendering have been proposed. Both image-based post-processing of splatted point clouds [106] and direct model-to-rendering mappings [107]-[110] have emerged. Specific solutions tackled the problem of background bleeding in sparse point clouds with a multi-resolution image-space solution [108] and an object space view frustum voxelization [109]. In [110], a custom encoding of spatial and photometric data stored information in a radiance field (light field), and the authors of [107] were able to relight scenes based on unorganized images of landmark locations and to remove transient foreground objects with segmentation. However, only a few of these methods were able to perform in real time, and all of them rely on external algorithms to produce inputs such as point clouds, depth maps, segmentations, or splats, which were not included in the reported rendering times.
Nevertheless, we highlight two prominent novel viewpoint point cloud renderers, achieving 23.7 ms with 10^6 spheres at a 1000 × 1000 resolution [111] and 5.7 ms with 10^7 points in full HD [112], which are highly usable neural point cloud renderers in scenes with static lighting and geometry. The former can learn the sphere representation from multi-view images, whereas the latter only needs coarsely triangulated textured point clouds from RGB images with rough camera parameter estimates as input. A summary of the real-time performance of the methods is given in Table 5.

VII. DISCUSSION
In this section, we discuss the state of the art in real-time photorealistic point cloud rendering based on the surveyed methods, focusing on the capabilities of producing photorealistic rendering effects in real time in Section VII-A. The posed research question is answered in Section VII-B. Additionally, we discuss the current capabilities of dynamic acceleration structure construction and updating for point clouds. This section is concluded with a discussion on the possibilities of future research in Section VII-C.

A. STATE OF THE ART IN PHOTOREALISTIC POINT CLOUD RENDERING
As shown in Table 5, 23 point cloud rendering methods achieve interactive or real-time frame rates. Particle-based volume visualization methods have seen the most active research. Particle-based volume methods consider point clouds that represent an object within a volume with possible interactions, as discussed in Section V. This sets them apart from surface point clouds in two ways.
On the one hand, the number of points in a particle system depicting the surface of the volume is much smaller than the total number of points. As photorealistic effects, such as reflections and refractions, are mostly concerned with the interaction of light at surface boundaries, the visualization effort is largely concentrated on the surface points. Thus, the points within a volume can be ignored if the surface is opaque or the volume is homogeneous in color and material. The latter statement is true for all of the surveyed particle-based methods as they work with homogeneous liquids. It should be noted, however, that more accurate photorealistic refractions VOLUME X, 2021 would benefit from taking into account the effect of varying volume densities if such information was available.
On the other hand, particle-based systems almost always have an underlying simulation governing the interactions between particles and their movement based on particle characteristics. These simulations usually produce extra per-particle attributes that ease visualization. One such attribute is the particle radius, which is used in most of the hybrid methods: particles are first projected into screen space as spheres with the accompanying radius and then ray traced in screen space.
The screen-space nature of the surveyed particle-based volume methods means that all photorealistic effects are inherently limited to the parts of the scene currently in the view frustum and to the pre-rendered environment map. Surface point cloud ray tracing methods therefore provide more realism, at least in theory, because their reflections and refractions, for example, can include interactions outside the view frustum. The screen-space representation is thus both a benefit, in terms of rendering speed, and a hindrance, as it leads to less realistic effects in real-time particle-based visualization.
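The two per-particle steps of the hybrid approach can be sketched as follows, assuming a pinhole camera at the origin looking along +z with its focal length given in pixels. The function names and the parameter `focal_px` are hypothetical, and the surveyed methods differ in how they bound and rasterize each sphere before the per-pixel intersection.

```python
import math


def projected_radius_px(radius, depth, focal_px):
    """Approximate on-screen radius, in pixels, of a particle sphere.

    Uses the small-sphere perspective approximation r_px = f * r / z, where
    f is the focal length in pixels and z the view-space depth of the center.
    """
    return focal_px * radius / depth


def ray_sphere(origin, direction, center, radius):
    """Nearest positive hit distance along a normalized ray, or None on a miss."""
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(d * v for d, v in zip(direction, oc))  # half the linear coefficient
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None  # the ray misses the sphere
    root = math.sqrt(disc)
    t = -b - root
    if t <= 0.0:
        t = -b + root  # the ray starts inside the sphere
    return t if t > 0.0 else None


def shade_particle_pixel(origin, direction, center, radius):
    """Per-pixel step: ray trace the particle sphere covered by this pixel.

    Returns (depth, normal) of the visible surface point, or None.
    """
    t = ray_sphere(origin, direction, center, radius)
    if t is None:
        return None
    hit = [o + t * d for o, d in zip(origin, direction)]
    normal = [(h - c) / radius for h, c in zip(hit, center)]
    return t, normal
```

The projected radius only bounds the screen region to process; the exact depth and normal come from the per-pixel ray-sphere test, which is what gives these methods smooth sphere silhouettes instead of flat sprites.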
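The limitation can be illustrated with a simplified screen-space reflection march. Assumptions of this sketch: a view-space depth buffer indexed as `depth[v][u]`, a pinhole camera looking along +z, and fixed step and thickness heuristics; all names and parameters are hypothetical. Whenever the reflected ray leaves the frustum, only the environment map remains as a source of reflected color, which is the loss of off-screen interactions described above.

```python
def project(p, width, height, focal_px):
    """Project a view-space point (camera at origin, looking along +z) to a pixel."""
    x, y, z = p
    if z <= 0.0:
        return None
    u = int(round(width / 2 + focal_px * x / z))
    v = int(round(height / 2 + focal_px * y / z))
    if 0 <= u < width and 0 <= v < height:
        return u, v, z
    return None  # outside the view frustum


def trace_reflection(origin, direction, depth, focal_px,
                     step=0.1, max_steps=200, thickness=0.2):
    """March a reflected ray in view space against a screen-space depth buffer.

    Returns ("screen", (u, v)) if the ray hits a surface visible on screen,
    otherwise ("environment", None): only the environment map can then
    supply the reflected color.
    """
    height, width = len(depth), len(depth[0])
    for i in range(1, max_steps + 1):
        p = [o + d * step * i for o, d in zip(origin, direction)]
        proj = project(p, width, height, focal_px)
        if proj is None:
            return ("environment", None)  # the ray left the frustum
        u, v, z = proj
        if depth[v][u] <= z <= depth[v][u] + thickness:
            return ("screen", (u, v))  # the ray passed just behind a visible surface
    return ("environment", None)
```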

1) Supported Photorealism
The supported photorealistic effects of real-time point cloud rendering methods are summarized in Table 1. It should be noted that all the methods concentrate on some photorealistic aspects, but none of them report real-time capabilities with a path tracing approach where actual GI effects would be present. As such, it is still an open question whether point clouds can be path traced in real time or whether GI effects, in general, can be produced for point clouds in real time.
In general, the effects produced by the different methods can be categorized as follows. All surface point cloud ray tracing methods except [47] use a Whitted-style ray tracing approach, which means that they support hard shadows and sharp specular reflections. Furthermore, single-layer refractions are supported in [46], [48]–[50]. The method in [47] provides soft shadows from multiple light sources and sharp specular reflections with multiple bounces.
Real-time surface point cloud rasterization methods provide only hard shadows [56], [62], [64] or soft shadows [61] with Phong shading as their photorealistic effects. Thus, even though rasterization methods are computationally more efficient than ray tracing, the photorealistic effects they support for point cloud rendering are very limited.
All of the particle-based volume methods are concerned with refractions through liquids and reflections on liquid surfaces. In all methods except [101], the reflections are sharp specular reflections of the accompanying pre-rendered environment map or of other objects in the scene. The dividing factor between these methods, however, is their approach to refraction: [96], [97], [100], [102], [103] produce refractions only by sampling the environment map, whereas [53], [98], [99], [105] also refract light from other objects in the scene, including intra-refractions. Furthermore, the number of interaction layers used for refraction varies between the methods, with [97], [98] supporting dual-layer refractions and [53], [99] supporting up to four-layer refractions.
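The difference between the hard shadows of Whitted-style ray tracing and soft shadows can be illustrated with a minimal sketch using sphere occluders: a single shadow ray toward a point light versus visibility averaged over random samples on a square area light. This is a generic Monte Carlo illustration, not the technique of [47]; the square light lying in a plane of constant y, the sample count, and all function names are assumptions.

```python
import math
import random


def ray_sphere(origin, direction, center, radius):
    """Nearest positive hit distance along a normalized ray, or None."""
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(d * v for d, v in zip(direction, oc))
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    t = -b - root if -b - root > 0.0 else -b + root
    return t if t > 0.0 else None


def visible(point, light_pos, spheres):
    """True if the segment from point to light_pos hits no occluding sphere."""
    to_light = [l - p for l, p in zip(light_pos, point)]
    dist = math.sqrt(sum(v * v for v in to_light))
    direction = [v / dist for v in to_light]
    for center, radius in spheres:
        t = ray_sphere(point, direction, center, radius)
        if t is not None and t < dist:
            return False
    return True


def hard_shadow(point, light_pos, spheres):
    """Whitted style: one shadow ray to a point light, visibility is 0 or 1."""
    return 1.0 if visible(point, light_pos, spheres) else 0.0


def soft_shadow(point, light_center, light_size, spheres, samples=64, seed=0):
    """Fraction of random samples on a square area light that are visible."""
    rng = random.Random(seed)
    unblocked = 0
    for _ in range(samples):
        sample = (light_center[0] + (rng.random() - 0.5) * light_size,
                  light_center[1],
                  light_center[2] + (rng.random() - 0.5) * light_size)
        unblocked += visible(point, sample, spheres)
    return unblocked / samples
```

Each extra light sample costs one more shadow ray per shaded point, which is why soft shadows are considerably more expensive than the single-ray hard shadows of Whitted-style tracers.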
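For reference, the Phong model these rasterizers rely on can be sketched per point or splat as below, with a binary shadow flag standing in for a hard-shadow lookup (for example from a shadow map, an assumption here). The coefficient values and function names are illustrative only.

```python
import math


def normalize(v):
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def phong(normal, to_light, to_eye, in_shadow=False,
          ka=0.05, kd=0.8, ks=0.2, shininess=32.0):
    """Phong intensity for one point or splat with a binary (hard) shadow flag.

    A hard shadow simply removes the diffuse and specular terms, leaving
    only the ambient term.
    """
    n, l, v = normalize(normal), normalize(to_light), normalize(to_eye)
    if in_shadow:
        return ka
    n_dot_l = dot(n, l)
    if n_dot_l <= 0.0:
        return ka  # the light is behind the surface
    r = [2.0 * n_dot_l * ni - li for ni, li in zip(n, l)]  # l mirrored about n
    specular = max(dot(r, v), 0.0) ** shininess
    return ka + kd * n_dot_l + ks * specular
```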
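Multi-layer refraction can be sketched as repeated applications of Snell's law, here for the dual-layer case: refract into the liquid at the front surface, then out at the back surface, and use the exit direction to sample the environment map. The index of refraction 1.33 (water) and the availability of a back-surface normal (for example from a second depth/normal layer) are assumptions of this sketch, not details taken from [97], [98].

```python
import math


def normalize(v):
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def refract(incident, normal, eta):
    """Snell refraction of a unit incident direction.

    normal is a unit surface normal facing against the incident ray and
    eta = n_from / n_to. Returns None on total internal reflection.
    """
    cos_i = -dot(normal, incident)
    k = 1.0 - eta * eta * (1.0 - cos_i * cos_i)
    if k < 0.0:
        return None  # total internal reflection
    coeff = eta * cos_i - math.sqrt(k)
    return [eta * i + coeff * n for i, n in zip(incident, normal)]


def dual_layer_refraction(incident, front_normal, back_normal, ior=1.33):
    """Refract into the liquid at the front surface and out at the back surface.

    Both normals must face against the ray. The returned direction would be
    used for the environment-map (or scene) lookup; None means the exit ray
    was totally internally reflected.
    """
    inside = refract(incident, front_normal, 1.0 / ior)  # air -> liquid
    if inside is None:
        return None
    return refract(normalize(inside), back_normal, ior)  # liquid -> air
```

For a slab with parallel faces the exit direction equals the incident one, so the visible effect of refraction comes from the curvature of real liquid surfaces; each additional layer adds one more `refract` call and one more surface lookup.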
Photorealism in point cloud rendering is also affected by the level of detail of the point cloud model. The point cloud capturing literature has not established a definitive point resolution; it varies with the application, ranging from lower point densities for urban and outdoor environments to high-resolution detail for indoor scenes and human capture. For the use case of most of the methods surveyed in this publication, namely human-sized objects, some estimates can be made of typical point cloud resolutions. For example, KinectFusion and its successors typically use a grid of up to 512^3 entries, i.e., over 10^8 potential data points, in an indoor environment. However, most of these grid entries are empty, which makes the total number of relevant points difficult to evaluate. Models in The Stanford 3D Scanning Repository [15] give a rough estimate of the model detail needed to represent scanned objects: the typical size is around 10^6 points, whereas more detailed models, such as the human-like Lucy model, can have up to 10^7 points. Urban and man-made structures typically exhibit less geometric detail, and areas such as floors and ceilings can be expressed with less data. Hybrid rendering-primitive approaches, such as using meshes for planar static areas of the scene and point clouds for more detailed areas, might yield better results in these cases.
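The orders of magnitude above can be checked with a few lines of arithmetic. The per-point layout (three float32 coordinates plus 8-bit RGB, 15 bytes) is an assumption for illustration only; real capture pipelines may also store normals, splat radii, or compressed attributes.

```python
GRID_RESOLUTION = 512
grid_entries = GRID_RESOLUTION ** 3  # dense 512^3 grid: 134,217,728 entries

# Hypothetical per-point layout used only for this estimate:
# three float32 coordinates (12 bytes) plus 8-bit RGB color (3 bytes).
BYTES_PER_POINT = 3 * 4 + 3 * 1


def point_cloud_megabytes(num_points, bytes_per_point=BYTES_PER_POINT):
    """Uncompressed size of a point cloud in megabytes (10^6 bytes)."""
    return num_points * bytes_per_point / 1e6


for label, n in [("typical Stanford scan", 10**6), ("Lucy-scale model", 10**7)]:
    print(f"{label}: {n:.0e} points, {point_cloud_megabytes(n):.0f} MB")
```

Under this layout, a 10^7-point model is roughly 150 MB uncompressed, which is relevant when such content must be captured and transferred every frame.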

B. RESEARCH QUESTION
What is the state-of-the-art technique for photorealistic end-to-end direct point cloud rendering of a high-quality human-sized model (10^7 points) at 75 FPS and a resolution of 1080p on consumer hardware? As discussed in Section III-D, none of the surveyed direct point cloud rendering methods is able to produce all of the photorealistic effects under the established requirements on its respective hardware. Therefore, no direct point cloud rendering method has demonstrated photorealistic rendering at 1080p and 75 FPS for at least 10^7 points. The method closest to these demands is presented in [47]; it renders soft shadows from multiple light sources and sharp reflections with multiple bounces for 10^6 points at a resolution of 512 × 512 at 55 FPS. Moreover, as discussed in Appendix B, projecting the measurement obtained on an Nvidia GTX 275 to a modern RTX 2080 Ti suggests a ray tracing performance of 75 FPS to 130 FPS at the 1080p resolution posed in the research question (for 10^6 points). Adding the needed acceleration structure, namely the octree, presents a negligible computational overhead to the end-to-end pipeline for 10^6 points, and possibly even for up to 10^7 points on modern GPUs. However, as splats are assumed as the input of the ray tracing method, it is difficult to estimate how much preprocessing effort is needed to generate them. The methods in [53] and [88] achieve real-time performance but lack the required photorealistic effects.
Point-based neural rendering has shown promising results for point cloud rendering from multi-view image inputs. The Pulsar system [111] achieves 42 FPS with 10^6 spheres at a 1000 × 1000 resolution, and the ADOP method [112] achieves 175 FPS with 10^7 points at 1080p. However, both systems are primarily used for novel view synthesis and are thus limited to static geometry and lighting.
Furthermore, we highlight the method in [114] as the most efficient acceleration structure for the purposes of the research question. As discussed in Appendix A, the authors provided a state-of-the-art construction method for both octrees and BVHs that handles up to 1.77 · 10^6 points in real time. Specifically, the octree construction took 0.88 ms, an almost negligible performance cost in an end-to-end pipeline. Extrapolating this result to the desired 10^7 points, however, already yields a significant cost of approximately 5.6 ms. As discussed in Appendix B, an optimistic lower bound on the construction time on an Nvidia RTX 2080 Ti was estimated at 0.24 ms for BVHs and 0.62 ms for octrees with 10^7 points. However, an acceleration structure method for volumetric meshes showed a more conservative construction time of 50 ms, and a reference OptiX acceleration structure construction took 200 ms. Thus, the method may meet the posed requirements on modern hardware, but an actual implementation is needed to verify its capabilities.
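The frame rates follow directly from the per-frame rendering times reported for these systems (23.7 ms and 5.7 ms); the same conversion gives the per-frame budget implied by the 75 FPS requirement of the research question. A trivial sketch with illustrative function names:

```python
def ms_to_fps(frame_time_ms):
    """Frame rate corresponding to a per-frame rendering time in milliseconds."""
    return 1000.0 / frame_time_ms


def frame_budget_ms(target_fps):
    """Per-frame time budget in milliseconds for a target frame rate."""
    return 1000.0 / target_fps


pulsar_fps = ms_to_fps(23.7)  # about 42.2 FPS: 10^6 spheres at 1000 x 1000
adop_fps = ms_to_fps(5.7)     # about 175.4 FPS: 10^7 points at 1080p
budget = frame_budget_ms(75)  # about 13.3 ms per frame for the 75 FPS target
```

Everything in an end-to-end pipeline, including acceleration structure construction, has to fit within the roughly 13.3 ms budget, which is why build times of a few milliseconds are already significant.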

C. FUTURE RESEARCH
The latest research on real-time point cloud ray tracing methods was published almost a decade ago. There is research to be done especially on extending ray tracing with path tracing for point clouds. Based on this survey, combining both rasterization and ray tracing techniques into a hybrid approach similar to the particle-based volume rendering could be highly beneficial in terms of computational efficiency. As particle-based approaches have worked on volumes, novel approaches applying the same techniques on surface point clouds would be interesting. Nevertheless, as discussed before, these methods would inherently work in screen space and suffer from similar problems as other methods such as screen space ray tracing.
Further research could establish the level of detail sufficient for captured scenes and models. Researching the trade-offs between model detail level and network bandwidth, as well as their impact on photorealism and rendering speed, is another interesting direction. Establishing the level of photorealism, relative to point cloud size, that can be captured, transferred, and rendered in real time with current hardware is a potentially fruitful avenue for research. Also, modern point-based neural rendering methods have built-in capabilities for point cloud and scene reconstruction due to learnable geometry features. Utilizing the scene understanding of deep learning methods may provide insight into the sufficient level of detail of point clouds and geometry for photorealistic rendering in varying scene and lighting settings.
With the recent arrival of GPUs with dedicated ray tracing hardware, such as the Nvidia RTX series with the Turing [115] and Ampere [116] architectures, harnessing the support for custom intersection functions to intersect point primitives could yield interesting results. Combining this with acceleration structures specifically designed for point clouds and comparing them to the existing supported acceleration structures could also be useful. Based on our evaluation in Appendix B, meeting the requirements posed in the research question may be feasible under optimistic assumptions. The publication dates of the reviewed methods span two decades, and consequently, many generations of hardware architectures have been released since. Implementing and testing the highlighted methods for point cloud rendering on identical modern hardware would provide a fair comparison between them and definitively answer our research question.
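In the ray tracing APIs exposed on such GPUs (for example DXR, Vulkan ray tracing, and OptiX), non-triangle primitives are typically registered as axis-aligned bounding boxes and resolved by a user-written intersection shader or program. The plain-Python sketch below does not use any of those APIs; it only shows the two pieces a point-splat primitive would need, assuming splats are disks with a center, unit normal, and radius: a tight bounding box for the acceleration structure build, and the ray-disk test an intersection program would run. All names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def disk_aabb(center, normal, radius):
    """Tight axis-aligned bounds of a disk splat with a unit normal.

    Along axis i the disk extends radius * sqrt(1 - n_i^2) from its center.
    """
    ext = [radius * math.sqrt(max(0.0, 1.0 - n * n)) for n in normal]
    lo = [c - e for c, e in zip(center, ext)]
    hi = [c + e for c, e in zip(center, ext)]
    return lo, hi


def intersect_disk(origin, direction, center, normal, radius, t_min=1e-6, eps=1e-12):
    """Ray-disk test for a splat: hit distance t, or None."""
    denom = dot(direction, normal)
    if abs(denom) < eps:
        return None  # the ray is parallel to the splat plane
    t = dot(sub(center, origin), normal) / denom
    if t < t_min:
        return None  # the splat plane is behind the ray origin
    hit = [o + t * d for o, d in zip(origin, direction)]
    offset = sub(hit, center)
    if dot(offset, offset) > radius * radius:
        return None  # the plane is hit outside the disk
    return t
```

The hardware would traverse the bounding volume hierarchy built over such boxes and invoke the disk test only for candidate splats, so the point-specific work is confined to a small function.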

VIII. CONCLUSION
In this survey, we reviewed real-time photorealistic rendering methods for point cloud visualization. Specifically, ray tracing methods for point cloud rendering were exhaustively surveyed and real-time rasterization and hybrid methods for realistic rendering were reviewed for comparison.
We found that direct photorealistic point cloud ray tracing could reach approximately 130 FPS at 1080p resolution for 10^6 points [47] when its performance is projected to modern hardware. For the desired point clouds on the order of 10^7 points, an acceleration structure could be constructed in negligible time on modern GPUs [114]. Furthermore, point-based neural rendering can achieve novel viewpoint rendering for point clouds in static scenes with 10^7 points at 1080p resolution at 175 FPS.
Based on our findings, we highlighted that photorealistic rendering of live captured point cloud content has open research problems, such as utilizing path tracing for point clouds directly. Extending the state-of-the-art methods with path tracing to verify the performance of the methods on the current dedicated ray tracing hardware is left as future work.
To conclude, the performance achieved with state-of-the-art methods does not yet satisfy the requirements of photorealistic point cloud rendering when considering an end-to-end frame rate of at least 75 FPS, a minimum high-quality screen resolution of 1080p, and an adequately detailed point cloud on the order of 10^7 points. Direct point cloud ray tracing is an order of magnitude behind in terms of point cloud size and resolution. However, based on our estimations, the posed requirements may be achievable on modern desktop-scale GPUs under optimistic assumptions, though likely not yet on mobile-scale GPUs, where the AR use case is more typical.

APPENDIX A ACCELERATION DATA STRUCTURES FOR POINT CLOUD DATA
In this section, we briefly review acceleration structures designed for, or applicable to, point cloud rendering, in particular acceleration for point cloud ray tracing. This serves to estimate the performance of dynamic acceleration structure construction in an end-to-end system with temporally incoherent point cloud input. For a more exhaustive review of acceleration for animated triangle-based ray tracing, we refer to [117]. Based on our findings, the most prominent acceleration structures specifically for point cloud rendering are k-d trees and octrees, both of which are depicted in Figure 8. Apart from these, we also review general spatial data structures and k-nearest neighbor (k-NN) search. One of the first fully parallelized octree construction methods implemented completely on the GPU was presented in [118]; it achieved approximately a tenfold speedup compared to the seminal QSplat system [2]. The method provided efficient LOD queries and selection with a node structure sorted by rendering distance, and a hole-free result with slight overlap between hereditary nodes. Similarly, a layered point cloud structure with a hierarchical LOD system for the GPU was introduced in [119]. This multiresolution structure precomputed and hierarchically binned the points into world-space nodes and, instead of rendering distance, used coarse-to-fine sorting based on sampling densities. The system also supported culling techniques and compression for further computational efficiency. In [120], a k-d tree acceleration structure was used for k-means clustering and quantization, which could also be applied to LOD support. However, their method was applied to storing spherical harmonics (SH) coefficients rather than point clouds, and clustered, averaged SH coefficients provided LOD support in a point-based global illumination (PBGI) rendering system focused on quality rather than real-time speed.
K-d trees have also been used directly for ray tracing acceleration. A real-time GPU k-d tree construction method that took triangles or points with color, without any extra attributes, as input was presented in [121]. The method used a greedy surface area heuristic (SAH) scheme, which outperformed a thorough SAH-based CPU implementation in smaller scenes. Furthermore, its ray tracing and traversal speeds indicated better tree quality in practice for all types of scenes compared to the exhaustive CPU implementation. In [122], k-d tree structures were used to support volume ray tracing. The authors extended the OpenVDB software [122] with fast dynamic updating and ray traversal operations for a GPU implementation of an accelerated volume ray tracing framework.
An extensive system that built several different acceleration structures for ray tracing, such as octrees and BVHs, from a common binary radix tree representation was presented in [114]. Tree construction started with a Morton code generation and sorting step, which placed the 3D points along a space-filling curve in lexicographic Morton code order. Lastly, a data structure for general k-NN querying on point-based data implemented a random ball cover (RBC) algorithm, which uses a subset of representative points to bin a set of 3D points [123]. Instead of the exhaustive search that iterates over and sorts the points of each representative's bin, their GPU implementation iterated over the 3D points and placed each in its respective representative's bin, which sacrificed completeness for an efficient parallel algorithm. Even though the authors did not mention ray tracing as an application, k-NN querying is an essential part of all the covered intersection testing algorithms for point cloud data.
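Hierarchical world-space binning with distance-based level selection can be sketched minimally as follows, in the spirit of the LOD structures of [118], [119] but not reproducing either. The one-representative-per-cell rule and selecting a level by comparing its cell size with the world-space pixel footprint (roughly the distance times the angular size of a pixel) are assumptions, and the function names are hypothetical.

```python
import math


def build_lod_levels(points, root_cell, num_levels):
    """Bin points into successively finer world-space grids.

    Level 0 uses cells of edge length root_cell; each finer level halves the
    cell size. One representative point (the first encountered) is kept per
    occupied cell, so coarse levels hold few points and fine levels many.
    """
    levels = []
    for level in range(num_levels):
        cell = root_cell / (2 ** level)
        reps = {}
        for p in points:
            key = tuple(math.floor(c / cell) for c in p)
            reps.setdefault(key, p)
        levels.append(list(reps.values()))
    return levels


def select_level(footprint, root_cell, num_levels):
    """Coarsest level whose cell size fits inside the world-space pixel footprint."""
    for level in range(num_levels):
        if root_cell / (2 ** level) <= footprint:
            return level
    return num_levels - 1  # the finest available level
```

Distant geometry, whose pixel footprint is large, is then drawn from a coarse level with few points, while nearby geometry uses finer levels.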
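The core of a greedy SAH builder is choosing, for each node, the split plane with the lowest estimated cost C = C_trav + C_isect · (A_L/A · N_L + A_R/A · N_R), where A, A_L, and A_R are the surface areas of the node and its children and N_L, N_R the primitive counts. The sketch below evaluates this for point primitives along one axis. The candidate set (midpoints between sorted distinct coordinates) and the unit cost constants are assumptions, not the exact scheme of [121].

```python
from bisect import bisect_left


def surface_area(lo, hi):
    dx, dy, dz = (h - l for h, l in zip(hi, lo))
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def best_sah_split(points, lo, hi, axis, c_trav=1.0, c_isect=1.0):
    """Greedy SAH split choice for one k-d tree node along one axis.

    Returns (cost, split); split is None when keeping the node as a leaf is
    cheaper than every candidate split.
    """
    coords = sorted(p[axis] for p in points)
    n = len(coords)
    parent_area = surface_area(lo, hi)
    best_cost, best_split = c_isect * n, None  # leaf: test every point
    distinct = sorted(set(coords))
    for a, b in zip(distinct, distinct[1:]):
        split = 0.5 * (a + b)
        n_left = bisect_left(coords, split)  # points strictly below the plane
        n_right = n - n_left
        left_hi = list(hi)
        left_hi[axis] = split
        right_lo = list(lo)
        right_lo[axis] = split
        cost = c_trav + c_isect * (
            surface_area(lo, left_hi) / parent_area * n_left
            + surface_area(right_lo, hi) / parent_area * n_right)
        if cost < best_cost:
            best_cost, best_split = cost, split
    return best_cost, best_split
```

A greedy builder would evaluate this for each axis, take the cheapest result, and recurse into both children until a leaf is preferred.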
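Two ingredients named above can be sketched compactly. The Morton step uses the widely known bit-interleaving trick for 10-bit quantized coordinates (a 30-bit code), assuming points normalized to the unit cube; sorting by this key yields the space-filling-curve order on which radix tree construction builds. The RBC part follows the one-shot, point-parallel binning described for [123], with randomly chosen representatives (an assumption); because a query searches only one bin, its result is approximate. All names are hypothetical.

```python
import random


def expand_bits(v):
    """Spread the low 10 bits of v so that two zero bits separate consecutive bits."""
    v = (v * 0x00010001) & 0xFF0000FF
    v = (v * 0x00000101) & 0x0F00F00F
    v = (v * 0x00000011) & 0xC30C30C3
    v = (v * 0x00000005) & 0x49249249
    return v


def morton3d(x, y, z):
    """30-bit Morton code of a point with coordinates in the unit cube."""
    def quantize(c):
        return int(min(max(c * 1024.0, 0.0), 1023.0))
    return ((expand_bits(quantize(x)) << 2)
            | (expand_bits(quantize(y)) << 1)
            | expand_bits(quantize(z)))


def dist2(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest_index(q, reps):
    return min(range(len(reps)), key=lambda k: dist2(q, reps[k]))


def rbc_build(points, num_reps, seed=0):
    """Pick random representatives and put each point in its nearest rep's bin.

    The loop runs over points rather than representatives, so iterations are
    independent and map naturally to one GPU thread per point.
    """
    reps = random.Random(seed).sample(points, num_reps)
    bins = [[] for _ in reps]
    for p in points:
        bins[nearest_index(p, reps)].append(p)
    return reps, bins


def rbc_query(q, reps, bins):
    """Approximate nearest neighbor: search only the bin of q's nearest rep."""
    return min(bins[nearest_index(q, reps)], key=lambda p: dist2(q, p))
```

Sorting a point list with `sorted(points, key=lambda p: morton3d(*p))` produces the Morton order, in which spatially close points tend to be adjacent.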

A. SUMMARY AND DISCUSSION
K-d tree solutions in [121] for k-NN search and PCA-based surface normals as well as in [120] for the storage of SH coefficients of a surfel-based scene have shown their effectiveness on point data. The former method achieved a performance of 21 ms for static and 310 ms for dynamic deforming scenes with 171 thousand points, yielding a frame rate of 48 FPS and 3.2 FPS, respectively. The latter method was quality-oriented and not specifically suitable for real-time acceleration. Octrees were utilized for LOD-based rasterization acceleration in [118], but only the final rendering speed of 50 to 80 million splats per second was disclosed without mentioning the construction or update time of the data structure.
The most prominent acceleration method was presented in [114]. Even though the method was primarily aimed at triangle-based scenes, its use of triangle centroids to organize the data structures makes it inherently point-based. For scenes with up to 1.77 million points, construction times of 0.34 ms for BVHs and 0.88 ms for octrees were reported. Extrapolating with the asymptotic O(n log n) behavior of acceleration structure construction yields approximately 2.2 ms for BVH construction and 5.6 ms for octree construction with 10^7 points. Furthermore, on a modern Nvidia RTX 2080 Ti GPU, the construction times could decrease to a negligible 0.24 ms and 0.62 ms, respectively, in the optimistic case that the construction algorithm fully parallelizes.
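The extrapolation can be reproduced with a short script, assuming O(n log n) construction cost (the logarithm base cancels) and perfect scaling with core count from the GTX 480 (480 CUDA cores) used for [114] to the RTX 2080 Ti (4352 CUDA cores). With the exact core ratio 4352/480 ≈ 9.07, the octree estimate comes out at about 0.61 ms; rounding the ratio to 9× gives 0.62 ms. Function and constant names are illustrative.

```python
import math


def extrapolate_nlogn(t_ref_ms, n_ref, n_target):
    """Scale a measured build time assuming O(n log n) construction cost."""
    return t_ref_ms * (n_target * math.log(n_target)) / (n_ref * math.log(n_ref))


def scale_by_cores(t_ms, cores_ref, cores_target):
    """Optimistic speed-up assuming perfect parallel scaling with core count."""
    return t_ms * cores_ref / cores_target


N_REF, N_TARGET = 1.77e6, 1e7
GTX_480_CORES, RTX_2080_TI_CORES = 480, 4352

bvh_ms = extrapolate_nlogn(0.34, N_REF, N_TARGET)     # about 2.15 ms on the GTX 480
octree_ms = extrapolate_nlogn(0.88, N_REF, N_TARGET)  # about 5.57 ms on the GTX 480
bvh_rtx_ms = scale_by_cores(bvh_ms, GTX_480_CORES, RTX_2080_TI_CORES)        # about 0.24 ms
octree_rtx_ms = scale_by_cores(octree_ms, GTX_480_CORES, RTX_2080_TI_CORES)  # about 0.61 ms
```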
The covered methods are mostly software-based methods for constructing the acceleration structures. Dedicated hardware solutions for further accelerating the construction have been proposed [124], [125]. Utilizing hardware-accelerated BVH and other data structure construction and updating for point cloud ray and path tracing could yield even more performance and support for larger point clouds.

APPENDIX B COMPUTATIONAL PERFORMANCE ANALYSIS
To compare methods implemented on various GPUs on equal terms, we scale by the increase in processing units/cores from the GPU used in each original publication to a state-of-the-art desktop GPU, namely an Nvidia RTX 2080 Ti. This gives a crude estimate of the capabilities of the different methods on modern hardware, assuming that the algorithms fully parallelize and utilize all available cores, which is an optimistic assumption. Also, the comparison is difficult for older hardware with more fixed graphics pipelines, as such hardware does not have processing units comparable to those of modern GPUs. Nevertheless, this method was chosen to give the reader a general idea of how the methods would compare on equal terms. Furthermore, actually implementing all of the methods on modern hardware is very time-consuming and out of the scope of this survey.
The different GPU platforms and their respective processing unit or core counts are summarized in Table 3. For the target platform, the RTX 2080 Ti, we omitted the RT cores from the total core count, as it is not clear how they compare to cores on older hardware. As an example calculation, we evaluate the performance of the direct point cloud ray tracing method in [47]. This evaluation assumes that the method is fully parallelizable and that memory transfer and storage do not present a bottleneck in the pipeline, which is admittedly unrealistically optimistic. Consequently, this approximation should be treated as an estimate of the optimistic potential of the method. We approximate the number of processing elements in the Nvidia GTX 275 used in [47] with the numbers for the Nvidia GTX 280 in [128], which has a total of 240 streaming processors, i.e., processing units. The Nvidia RTX 2080 Ti has 4352 CUDA "cores", comparable to processing units, and additionally 68 RT cores that could further accelerate ray tracing. Using only the CUDA core count for the estimate, the increase is 18×. Assuming that the workload grows linearly with the number of pixels, increasing the resolution from 512 × 512 to 1080p requires almost 8 times as much computation. This yields a net increase of about 2.3× in achievable frame rate and raises the reported 55 FPS in [47] to approximately 130 FPS on an RTX 2080 Ti. The frame rate projections of the rest of the methods in Table 5 are done in a similar way.
The frame rate projections for the methods in Table 5 do not consider the number of rendering primitives used in ray tracing. Acceleration structure construction and updating are most affected by the number of rendering primitives. Similarly, we derive an optimistic (best-case) estimate for the acceleration structure method in [114] highlighted in Appendix A-A. The original method was implemented on the Nvidia GTX 480, which means 9× more cores on the Nvidia RTX 2080 Ti, excluding the RT cores (see Table 3). Assuming an O(n log(n)) increase in computation time
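The example calculation can be written as a small function. It encodes only the two stated assumptions (perfect scaling with core count and a workload proportional to the pixel count) and ignores memory bandwidth and scene size. With the exact ratios (4352/240 ≈ 18.1 and 1920 · 1080 / 512² ≈ 7.9) it gives about 126 FPS, which rounds to the approximately 130 FPS quoted above. The function name and argument layout are illustrative.

```python
def project_fps(fps_ref, cores_ref, cores_target, res_ref, res_target):
    """Optimistic frame-rate projection across GPUs and resolutions.

    Assumes perfect scaling with core count and a workload proportional to
    the number of pixels; memory bandwidth and scene size are ignored.
    """
    core_gain = cores_target / cores_ref
    pixel_ratio = (res_target[0] * res_target[1]) / (res_ref[0] * res_ref[1])
    return fps_ref * core_gain / pixel_ratio


# [47]: 55 FPS at 512 x 512 on a GTX 275 (approximated as 240 processing units),
# projected to an RTX 2080 Ti (4352 CUDA cores) at 1920 x 1080.
fps_2080ti = project_fps(55.0, 240, 4352, (512, 512), (1920, 1080))
```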