Understanding 3DGS Rendering Speed: Tiling, Rasterization, and Optimization

The Speed of 3D Gaussian Splatting: Leveraging GPU Rasterization for Real-Time Rendering

3D Gaussian Splatting (3DGS) has rapidly emerged as one of the fastest methods for rendering novel views in neural graphics. In fact, it has become an industry standard for real-time 3D scene representation, powering systems from NVIDIA’s COSMOS to Meta’s AR glasses. The key reason behind 3DGS’s speed is that it builds on decades of GPU rasterization optimizations. Instead of ray-marching through a neural field like NeRF, which must query a neural network at many sample points along every camera ray, 3DGS represents the scene explicitly as millions of small 3D Gaussians and projects them onto the screen in a rasterization-like fashion. Modern GPUs are exceptionally good at parallelizing this kind of work, so by aligning its rendering approach with the GPU’s strengths, 3DGS achieves real-time performance where older methods struggled.

How 3DGS Leverages Rasterization for Speed

For decades, graphics hardware has been designed to quickly transform 3D primitives (like triangles or points) into pixels. 3DGS takes advantage of this by treating each Gaussian as a splat (a small ellipse on screen) that can be rendered in parallel. Crucially, 3DGS parallelizes work over image pixels, grouped into screen-space tiles, rather than processing one Gaussian at a time. Nearby pixels are handled together, so they can share work and skip unnecessary computation: if a Gaussian does not overlap a tile, none of that tile’s pixels ever evaluate it, and each Gaussian only costs work in the tiles it actually covers. This strategy was inspired by the Pulsar renderer (Lassner & Zollhöfer, 2021), which showed that dividing the image into tiles and rasterizing pixel-parallel enables real-time rendering of millions of primitives. In essence, 3DGS adopts a classic rasterization mindset: organize the computation around pixels and screen-space tiles, so the GPU can do what it was built to do, crunching through visible pixels quickly while skipping work for anything unseen.
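A minimal sketch of the tile-culling idea, in plain Python rather than CUDA: given a splat’s screen-space center and a conservative radius, compute which 16×16 tiles it can touch. The function name `tiles_touched` and the square footprint are our own illustrative choices; to our understanding this mirrors the rectangle computation in the original 3DGS rasterizer, but it is not that code.

```python
TILE = 16  # tile side in pixels; 16x16 matches the original 3DGS rasterizer


def tiles_touched(cx, cy, radius, width, height, tile=TILE):
    """Return the IDs of all screen tiles that a splat's square footprint touches.

    The footprint is the square [cx - radius, cx + radius] x [cy - radius, cy + radius]
    in pixel coordinates, clamped to the image's tile grid.
    Tile IDs are row-major: tile_id = ty * grid_w + tx.
    """
    grid_w = (width + tile - 1) // tile   # number of tile columns
    grid_h = (height + tile - 1) // tile  # number of tile rows

    def clamp(v, hi):
        return min(hi, max(0, v))

    # Half-open tile ranges [x0, x1) and [y0, y1) covered by the footprint.
    x0 = clamp(int((cx - radius) // tile), grid_w)
    y0 = clamp(int((cy - radius) // tile), grid_h)
    x1 = clamp(int((cx + radius + tile - 1) // tile), grid_w)
    y1 = clamp(int((cy + radius + tile - 1) // tile), grid_h)
    return [ty * grid_w + tx for ty in range(y0, y1) for tx in range(x0, x1)]
```

For a 1920×1080 image, a splat centered at pixel (100, 100) with radius 10 lands in just four of the 120×68 tiles; pixels in every other tile never see it, and a splat entirely off-screen gets no tiles at all.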

3DGS Rendering Pipeline: From Gaussians to Pixels

Under the hood, 3D Gaussian Splatting’s rendering pipeline consists of a preprocessing stage followed by a tile-based compositing stage. Here’s a step-by-step breakdown of how a 3DGS scene is rendered at high speed:

1. Project Gaussians to Screen: All 3D Gaussians (each defined by a mean position and a covariance matrix) are projected into the 2D image plane using the camera parameters. This gives each Gaussian’s screen-space position and a 2D covariance describing its projected size and shape (an ellipse). We also record each Gaussian’s depth (its view-space distance along the camera axis) for the sorting step.
2. Depth Sort (Visibility Ordering): To ensure correct translucent blending (Gaussians overlap and accumulate color and opacity), the Gaussians are sorted by depth from nearest to farthest, matching the front-to-back alpha compositing used in NeRF. This sorted order lets us accumulate colors in the correct order.
3. Tile Binning: The screen is divided into a grid of small tiles (16×16 pixel blocks in the original implementation), and each Gaussian is assigned to every tile that its projected ellipse overlaps. This tiling step is what makes parallel rendering efficient: by grouping work per tile, each pixel only considers Gaussians that can actually reach it. The idea is borrowed from the Pulsar renderer. In the original implementation, steps 2 and 3 are fused: every (tile, Gaussian) pair gets a key combining the tile ID and the Gaussian’s depth, and a single GPU radix sort orders all pairs by tile and then by depth.
4. Tile-wise Volumetric Rendering: Now the actual rendering happens per tile. For each pixel in a tile, we iterate through the Gaussians assigned to that tile and evaluate each one’s contribution. The 2D Gaussian function, evaluated at the pixel and scaled by the Gaussian’s opacity, gives an alpha value, and the Gaussian’s color is weighted by that alpha. These contributions are accumulated with the volumetric rendering equation via alpha compositing, effectively simulating light passing through many translucent Gaussians along the pixel’s ray. Because the Gaussians were sorted by depth in step 2, they can be composited front to back (near to far), and a pixel can stop early once it becomes effectively opaque.
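The projection step is the most math-heavy part of preprocessing. The sketch below follows the local affine (EWA-style) approximation used by 3DGS, Σ_2D = J R Σ Rᵀ Jᵀ, where R is the world-to-camera rotation and J is the Jacobian of the perspective projection at the Gaussian’s center. It is plain Python for readability. The helper names and the pinhole convention (camera looks down +z, intrinsics fx, fy, cx, cy in pixels) are our assumptions, and it omits details of the reference CUDA code such as the small constant (0.3) added to the 2D covariance’s diagonal as a low-pass filter.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def project_gaussian(mean, cov3d, R, t, fx, fy, cx, cy):
    """Project one 3D Gaussian to the image plane.

    mean:   world-space center (x, y, z)
    cov3d:  3x3 world-space covariance
    R, t:   world-to-camera rotation (3x3) and translation; camera looks down +z
    fx, fy, cx, cy: pinhole intrinsics in pixels
    Returns (pixel center, view-space depth, 2x2 screen-space covariance).
    """
    # World -> camera coordinates.
    x, y, z = (sum(R[i][k] * mean[k] for k in range(3)) + t[i] for i in range(3))
    # Pinhole projection of the center.
    center = (fx * x / z + cx, fy * y / z + cy)
    # Jacobian of the perspective projection, evaluated at the center (2x3).
    J = [[fx / z, 0.0, -fx * x / (z * z)],
         [0.0, fy / z, -fy * y / (z * z)]]
    # Local affine approximation: Sigma_2D = (J R) Sigma_3D (J R)^T.
    JR = matmul(J, R)
    cov2d = matmul(matmul(JR, cov3d), transpose(JR))
    return center, z, cov2d
```

With an isotropic Gaussian of variance 0.01 five units in front of a camera with fx = fy = 100, the footprint has a screen-space variance of 4 pixels²; moving it twice as far away shrinks that to 1, the perspective foreshortening one would expect.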

By organizing the pipeline in this way, 3DGS “front-loads” the work in a preprocessing step and then lets highly parallel GPU kernels handle the pixel-wise blending. The preprocessing (steps 1–3) projects and culls Gaussians so that each pixel only deals with a small subset of them, avoiding redundant computations. The final rendering (step 4) runs in parallel for all tiles/pixels, making full use of the GPU’s thousands of cores.
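The preprocessing trick of turning tile binning and depth sorting into a single sort can be sketched as follows. Each Gaussian is duplicated once per overlapped tile, and each copy gets a 64-bit key with the tile ID in the upper 32 bits and the float32 bit pattern of its depth in the lower 32 bits; sorting the keys groups copies by tile and orders each tile’s list near to far. To our understanding this matches what the original implementation does with a GPU radix sort. Here Python’s `list.sort` stands in for that sort, the function names are hypothetical, and the per-tile dictionary simplifies the reference design, which keeps one sorted array plus start/end offsets per tile.

```python
import struct
from collections import defaultdict


def depth_bits(depth):
    """Reinterpret a positive float32 depth as a uint32.

    For positive IEEE-754 floats, the integer order matches the numeric order,
    so these bits can be sorted directly.
    """
    return struct.unpack("<I", struct.pack("<f", depth))[0]


def bin_and_sort(gaussians):
    """gaussians: list of (gaussian_id, depth, tile_ids), with depth > 0.

    Returns {tile_id: [gaussian_id, ...]} with each list ordered near to far.
    """
    keyed = []
    for gid, depth, tiles in gaussians:
        for tile in tiles:  # one copy of the Gaussian per overlapped tile
            keyed.append(((tile << 32) | depth_bits(depth), gid))
    keyed.sort()  # stands in for the single GPU radix sort over all keys
    per_tile = defaultdict(list)
    for key, gid in keyed:
        per_tile[key >> 32].append(gid)  # upper 32 bits recover the tile ID
    return dict(per_tile)
```

A single global sort avoids launching a separate sort per tile, and after it every tile’s Gaussians sit in one contiguous, depth-ordered run.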

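It’s worth noting that an alternative approach would be to parallelize over Gaussians instead, i.e. assign each thread a Gaussian and have it write its contribution into every pixel it covers. That would be far less efficient: many threads would contend for the same pixels, and order-dependent alpha blending would need extra synchronization to apply those contributions in depth order. 3DGS follows the pixel-parallel tiling approach so that nearby pixels can quickly reject Gaussians that don’t affect them, keeping the workload manageable even as Gaussian sizes vary.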
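In the pixel-parallel design, each thread owns one pixel and walks its tile’s depth-sorted Gaussian list. A plain-Python sketch of that per-pixel work: each 2D Gaussian is turned into an alpha value, and colors are accumulated front to back as C = Σᵢ cᵢ αᵢ Tᵢ, where the transmittance Tᵢ is the product of (1 − αⱼ) over all nearer Gaussians j. The constants (alpha capped at 0.99, contributions below 1/255 skipped, stopping once transmittance would drop below 10⁻⁴) follow our reading of the original CUDA rasterizer; the function name and input layout are hypothetical.

```python
import math


def composite_pixel(px, py, splats, background=(0.0, 0.0, 0.0)):
    """Front-to-back alpha compositing for a single pixel (one GPU thread's job).

    splats: the tile's Gaussians, sorted nearest first, each given as
            (center_xy, cov2d, opacity, rgb) with cov2d = ((a, b), (b, c)).
    Returns the final RGB color and the remaining transmittance.
    """
    color = [0.0, 0.0, 0.0]
    T = 1.0  # transmittance: fraction of light not yet absorbed by nearer splats
    for (mx, my), ((a, b), (_, c)), opacity, rgb in splats:
        det = a * c - b * b
        # Inverse of the 2D covariance ("conic"), used to evaluate the Gaussian.
        ia, ib, ic = c / det, -b / det, a / det
        dx, dy = px - mx, py - my
        power = -0.5 * (ia * dx * dx + 2.0 * ib * dx * dy + ic * dy * dy)
        alpha = min(0.99, opacity * math.exp(power))
        if alpha < 1.0 / 255.0:
            continue  # negligible contribution at this pixel
        next_T = T * (1.0 - alpha)
        if next_T < 1e-4:
            break  # pixel is effectively opaque: stop early
        for k in range(3):
            color[k] += rgb[k] * alpha * T
        T = next_T
    # Whatever light is left comes from the background.
    return [color[k] + T * background[k] for k in range(3)], T
```

Two half-transparent splats centered on the pixel, red in front of green, give (0.5, 0.25, 0) with transmittance 0.25; swapping their order gives (0.25, 0.5, 0), which is why the depth sort matters.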

Tiling Bottleneck and the “Speedy Splat” Solution

While 3DGS’s tile-based rasterization is fast, one part of this pipeline became a performance bottleneck: determining which tiles each Gaussian overlaps. The original 3DGS implementation takes a very conservative approach here. A projected 3D Gaussian has an elliptical footprint on screen, but the original algorithm approximates that ellipse with a circle whose radius is three standard deviations along the ellipse’s longest axis (three times the square root of the larger eigenvalue of its 2D covariance), then assigns the Gaussian to every tile inside the square bounding that circle. This overestimates each Gaussian’s footprint: for elongated Gaussians the circle is much larger than the true ellipse, so the Gaussian is assigned to many tiles it never contributes to. In practice, the original 3DGS allocated far too many tiles per Gaussian, and a large fraction of the tile-pixel checks were wasted work that never affected the final image.
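To put a number on the overestimate, the sketch below computes the circle radius the way we read the original CUDA code (ceil of 3√λ_max, with a 0.1 clamp inside the square root of the eigenvalue formula) and compares the area of its bounding square with the 3σ ellipse it is meant to cover. The function names are hypothetical; the 2D covariance is passed as its entries (a, b, c) of [[a, b], [b, c]].

```python
import math


def original_3dgs_radius(a, b, c):
    """Conservative footprint radius: 3 standard deviations along the major axis."""
    mid = 0.5 * (a + c)
    det = a * c - b * b
    lam_max = mid + math.sqrt(max(0.1, mid * mid - det))  # larger eigenvalue
    return math.ceil(3.0 * math.sqrt(lam_max))


def footprint_overestimate(a, b, c):
    """Return (radius, area of the circle's bounding square / area of the 3-sigma ellipse)."""
    r = original_3dgs_radius(a, b, c)
    box_area = (2 * r) ** 2
    # The ellipse d^T S^-1 d <= 9 has semi-axes 3*sqrt(l1) and 3*sqrt(l2),
    # so its area is 9 * pi * sqrt(l1 * l2) = 9 * pi * sqrt(det S).
    ellipse_area = 9.0 * math.pi * math.sqrt(a * c - b * b)
    return r, box_area / ellipse_area
```

Even a perfectly round Gaussian (a = c = 4, b = 0) pays a factor of about 1.7 here, from the square versus the disk plus rounding up the radius, and the waste grows with elongation: a Gaussian with a 10:1 ratio of standard deviations gets a box roughly 12.7 times the area of its ellipse.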

[Figure: Comparison of the original 3DGS bounding circle and the Speedy Splat SnugBox]

Speedy Splat (Hanson et al., 2025) introduced a clever fix for this tile assignment bottleneck. Instead of using an oversized bounding circle, Speedy Splat computes an analytically tight axis-aligned bounding box for each Gaussian’s projected ellipse, a method the authors call SnugBox. By solving for the extreme horizontal and vertical extents of the ellipse, SnugBox finds the smallest axis-aligned box that fully contains the Gaussian’s footprint. As a result, each Gaussian is assigned to far fewer tiles, much closer to the set its ellipse actually overlaps (an axis-aligned box can still include some corner tiles for diagonally oriented ellipses). This sharply reduces the number of tile-pixel operations without degrading the image, because the tiles that are dropped are ones the Gaussian would not have contributed to. Combining this more efficient tile localization with pruning of Gaussian primitives (the “sparse primitives” in the paper’s title), Speedy Splat renders on average 6.7× faster than the original 3DGS across various scenes, while using an order of magnitude fewer Gaussians.
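The core of the tight-box idea fits in a few lines. For an ellipse dᵀΣ⁻¹d ≤ k², the extreme x and y extents are k√Σxx and k√Σyy, so the tight axis-aligned box comes straight from the diagonal of the 2D covariance. The sketch below compares it with the circle-based box under the same 3σ cutoff (k = 3), without the original’s rounding and clamping, so the two are directly comparable. This is our simplification: the paper’s exact SnugBox formulation may set the cutoff differently, and the function names are hypothetical.

```python
import math

TILE = 16  # pixels per tile side


def tile_span(lo, hi, tile=TILE):
    """Number of tiles covering the pixel interval [lo, hi]."""
    return math.floor(hi / tile) - math.floor(lo / tile) + 1


def circle_tiles(cx, cy, a, b, c, k=3.0):
    """Tiles touched by the square around a circle of radius k * sqrt(lambda_max)."""
    mid = 0.5 * (a + c)
    lam_max = mid + math.sqrt(max(0.0, mid * mid - (a * c - b * b)))
    r = k * math.sqrt(lam_max)
    return tile_span(cx - r, cx + r) * tile_span(cy - r, cy + r)


def snug_tiles(cx, cy, a, b, c, k=3.0):
    """Tiles touched by the tight axis-aligned box of the ellipse d^T S^-1 d <= k^2.

    The box's half-widths are k * sqrt(S_xx) and k * sqrt(S_yy); the off-diagonal
    entry b does not change the extents.
    """
    hx, hy = k * math.sqrt(a), k * math.sqrt(c)
    return tile_span(cx - hx, cx + hx) * tile_span(cy - hy, cy + hy)
```

For a Gaussian at pixel (200, 200) with variances 400 and 1 along the x and y axes, the tight box needs 9 tiles where the circle’s square needs 81. Rotate the same Gaussian by 45° (a = c = 200.5, b = 199.5) and the tight box still needs 49, because an axis-aligned box cannot hug a diagonal ellipse.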

Conclusion and Next Steps

3D Gaussian Splatting (3DGS) is now everywhere because it strikes a rare balance between rendering speed and visual quality. Its speed comes from decades of GPU engineering: GPUs were originally designed to accelerate the rasterization of primitives like points and triangles, and 3DGS structures its rendering process around that same hardware-efficient design.

For a deeper dive into how 3D Gaussian Splatting works (with actual code!), be sure to check out my step-by-step 3DGS tutorial, where we implement the full pipeline from scratch in PyTorch. You can also sign up for the 3DGS newsletter to get updates on new tutorials. Happy splatting!

📘 Learn 3DGS Step-by-Step (PyTorch Only)

Want to truly understand 3D Gaussian Splatting—not just run a repo? My 3D Gaussian Splatting Course teaches the full pipeline from first principles in PyTorch only (no C++, no CUDA). You’ll learn initialization, densification, rendering, and how to experiment with recent papers.


💼 Research & Engineering Consulting

We help teams integrate 3D Gaussian Splatting techniques, build custom pipelines, and prototype new splatting research. If you need expertise, we can help.

Contact:
contact@qubitanalytics.be

References

Hanson et al. (2025). Speedy-Splat: Fast 3D Gaussian Splatting with Sparse Pixels and Sparse Primitives. CVPR 2025. speedysplat.github.io
3D Gaussian Splatting Tutorial – 100 Lines of Code Implementation. Medium, 2025.
Accelerating 3D Gaussian Splatting with Speedy Splat. 2025.