3D Gaussian Splatting achieves real-time speed by aligning with how GPUs are designed: a tile-based rendering pipeline rooted in decades of rasterization acceleration.
3D Gaussian Splatting (3DGS) has rapidly emerged as one of the fastest methods for rendering novel views in neural graphics. In fact, it’s become an industry standard for real-time 3D scene representation, powering systems from NVIDIA’s COSMOS to Meta’s AR glasses. The key reason behind 3DGS’s blazing speed is that it builds on decades of GPU rasterization optimizations. Instead of ray-marching through a neural field like NeRF, 3DGS represents the scene explicitly as millions of tiny 3D Gaussians and projects them onto the screen in a rasterization-like fashion. Modern GPUs are exceptionally good at parallelizing such rasterization tasks, so by aligning its rendering approach with the GPU’s strengths, 3DGS achieves real-time performance where older methods struggled.
For decades, graphics hardware has been designed to quickly transform 3D primitives (like triangles or points) into pixels. 3DGS takes advantage of this by treating each Gaussian as a splat (a small ellipse on screen) that can be rendered in parallel. Crucially, 3DGS parallelizes work over image pixels (grouped into tiles) rather than processing one Gaussian at a time. This design means that groups of nearby pixels are handled together, allowing them to share work and cull unnecessary computations. If a given Gaussian doesn’t overlap a particular region of the screen, that region’s pixels can ignore it entirely, so each pixel only pays for the Gaussians that actually reach its tile. This strategy was inspired by the Pulsar renderer (Lassner & Zollhöfer, 2021), which demonstrated how dividing the image into tiles and doing pixel-parallel rasterization enables real-time rendering of millions of primitives. In essence, 3DGS adopts a classic rasterization mindset: organize the computation around pixels and screen-space tiles, so the GPU can do what it was built to do, crunching through visible pixels fast while skipping work for anything unseen.
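To make the idea of a splat concrete, here is a minimal Python sketch of how one projected Gaussian’s contribution at a single pixel can be evaluated. It assumes projection has already produced the Gaussian’s 2D mean and 2×2 screen-space covariance. The 0.99 clamp and the 1/255 cutoff mirror constants in the reference 3DGS CUDA rasterizer, while the name `splat_alpha` and its argument layout are hypothetical.

```python
import math


def splat_alpha(px, py, mean2d, cov2d, opacity):
    """Opacity-weighted contribution of one projected 2D Gaussian at pixel (px, py).

    cov2d = (a, b, c) encodes the symmetric covariance [[a, b], [b, c]].
    (Hypothetical helper for illustration.)
    """
    a, b, c = cov2d
    det = a * c - b * b
    if det <= 0.0:
        return 0.0  # degenerate footprint: contributes nothing
    # Inverse covariance (the "conic"), so the exponent is -0.5 * d^T Sigma^{-1} d
    inv_a, inv_b, inv_c = c / det, -b / det, a / det
    dx, dy = px - mean2d[0], py - mean2d[1]
    power = -0.5 * (inv_a * dx * dx + 2.0 * inv_b * dx * dy + inv_c * dy * dy)
    # Clamp below 1 so a single splat never makes the pixel fully opaque
    alpha = min(0.99, opacity * math.exp(power))
    # Contributions below 1/255 cannot change an 8-bit pixel, so they are skipped
    return alpha if alpha >= 1.0 / 255.0 else 0.0
```

Far from the mean the exponent collapses and the function returns 0.0, which is exactly the situation in which a pixel (or a whole tile of pixels) can ignore a Gaussian.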
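Under the hood, 3D Gaussian Splatting’s rendering pipeline consists of a preprocessing stage followed by a tile-based compositing stage. Here’s a step-by-step breakdown of how a 3DGS scene is rendered at high speed:
1. Project: each 3D Gaussian is transformed into camera space and projected to a 2D ellipse (a 2D mean and a 2×2 covariance); Gaussians outside the view frustum are culled.
2. Assign to tiles: the screen is split into 16×16-pixel tiles, and each Gaussian is duplicated once for every tile its footprint’s bounding rectangle overlaps, tagged with a key that combines the tile ID and the Gaussian’s depth.
3. Sort: a single GPU radix sort over these keys groups the Gaussians by tile and orders each tile’s list from front to back.
4. Blend: one thread block is launched per tile and one thread per pixel; each pixel walks its tile’s sorted list, alpha-blending colors front to back and stopping early once the pixel is effectively opaque.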
By organizing the pipeline in this way, 3DGS “front-loads” the work in a preprocessing step and then lets highly parallel GPU kernels handle the pixel-wise blending. The preprocessing (steps 1–3) projects and culls Gaussians so that each pixel only deals with a small subset of them, avoiding redundant computations. The final rendering (step 4) runs in parallel for all tiles/pixels, making full use of the GPU’s thousands of cores.
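The tile-assignment and sorting part of the preprocessing can be sketched in plain Python. Each (Gaussian, tile) pair becomes a 64-bit key with the tile ID in the high 32 bits and the depth’s float32 bit pattern in the low 32 bits, which is how the reference rasterizer builds its keys; because positive IEEE-754 floats compare like unsigned integers, one sort orders everything by tile and then by depth. The input format and the names `depth_bits` and `build_tile_lists` are assumptions, and Python’s `sort` stands in for the GPU radix sort.

```python
import struct
from collections import defaultdict


def depth_bits(depth):
    """Reinterpret a positive depth as float32 bits; for positive floats this preserves order."""
    return struct.unpack("<I", struct.pack("<f", depth))[0]


def build_tile_lists(gaussians, tiles_x):
    """gaussians: list of (depth, (x0, y0, x1, y1)) with half-open tile ranges.

    Depths are assumed positive (Gaussians behind the camera were already culled).
    Returns {tile_id: [Gaussian indices ordered front to back]}.
    """
    keys = []
    for gid, (depth, (x0, y0, x1, y1)) in enumerate(gaussians):
        for ty in range(y0, y1):
            for tx in range(x0, x1):
                tile_id = ty * tiles_x + tx
                # One entry per (Gaussian, tile) pair: tile ID high, depth bits low
                keys.append(((tile_id << 32) | depth_bits(depth), gid))
    keys.sort()  # stands in for the single GPU radix sort over all keys
    tiles = defaultdict(list)
    for key, gid in keys:
        tiles[key >> 32].append(gid)  # the high 32 bits recover the tile ID
    return dict(tiles)
```

The reference code keeps one sorted array and records, for each tile, a start/end range into it instead of building separate lists; the dictionary here is just easier to read.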
It’s worth noting that an alternative approach would be to parallelize over Gaussians instead, i.e., give each thread one Gaussian and have it write into every pixel it covers, but that would be far less efficient: many threads would compete to update the same pixels, and because alpha blending depends on order, those updates would still have to be applied in depth order. 3DGS follows the pixel-parallel tiling approach so that nearby pixels can quickly reject Gaussians that don’t affect them, keeping the workload manageable even as Gaussian sizes vary.
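In the pixel-parallel design, each pixel runs an independent loop over its tile’s depth-sorted list, so no two threads ever write the same output value. A minimal Python sketch of that per-pixel loop (front-to-back alpha blending with early termination) might look like this. The 1e-4 transmittance cutoff and the final background term match the reference rasterizer, while the name `composite_pixel` and its pre-evaluated `(alpha, rgb)` input are assumptions made for illustration.

```python
def composite_pixel(contributions, background=(0.0, 0.0, 0.0), t_min=1e-4):
    """Front-to-back alpha blending for one pixel (hypothetical helper).

    contributions: iterable of (alpha, (r, g, b)) already sorted nearest-first,
    i.e. the tile's depth-sorted Gaussians evaluated at this pixel.
    Accumulates C = sum_i c_i * alpha_i * T_i with T_i = prod_{j<i} (1 - alpha_j).
    """
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for alpha, rgb in contributions:
        if alpha <= 0.0:
            continue  # this Gaussian does not reach the pixel
        next_t = transmittance * (1.0 - alpha)
        if next_t < t_min:
            break  # pixel is effectively opaque: this thread stops early
        for k in range(3):
            color[k] += rgb[k] * alpha * transmittance
        transmittance = next_t
    # Whatever light is left over shows the background
    return tuple(color[k] + transmittance * background[k] for k in range(3))
```

Because every thread reads the same shared list but writes only its own pixel, the loop needs no atomics or locks.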
While 3DGS’s tile-based rasterization is fast, one particular part of this pipeline became a performance bottleneck: determining which tiles each Gaussian overlaps. The original 3DGS implementation took a very conservative approach to this. A 3D Gaussian projected onto the screen has an elliptical footprint. The original algorithm approximated that ellipse with a circle whose radius is three standard deviations along the ellipse’s longest axis, and then used the square bounding box around that circle to select all potentially affected tiles. This overestimates the area of each Gaussian: for a thin, elongated ellipse the circle is much larger than the true footprint, so the Gaussian gets assigned to many tiles it doesn’t actually contribute to. In practice, the original 3DGS was allocating far too many tiles per Gaussian, and a large fraction of the tile-pixel checks were wasted work that never affected the final image.
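The conservative bound can be reproduced in a few lines. This minimal sketch takes the larger eigenvalue of the 2D covariance, uses three standard deviations along that axis as a circle radius, and converts the circle’s square bounding box into a half-open range of 16-pixel tiles. The eigenvalue formula (including its 0.1 floor on the discriminant) and the 3σ radius follow the reference 3DGS rasterizer; the rounding of the tile range is simplified, and `circle_tile_rect` is a hypothetical name.

```python
import math

TILE = 16  # tile size in pixels (the reference 3DGS rasterizer uses 16x16 tiles)


def circle_tile_rect(mean2d, cov2d, width, height, tile=TILE):
    """Half-open tile range (x0, y0, x1, y1) covered by the square around a 3-sigma circle."""
    a, b, c = cov2d  # 2D covariance [[a, b], [b, c]]
    det = a * c - b * b
    mid = 0.5 * (a + c)
    # Larger eigenvalue of the covariance; the reference code floors the discriminant at 0.1
    lambda_max = mid + math.sqrt(max(0.1, mid * mid - det))
    # One radius for every direction: three standard deviations along the major axis
    radius = math.ceil(3.0 * math.sqrt(lambda_max))
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    x0 = min(tiles_x, max(0, math.floor((mean2d[0] - radius) / tile)))
    y0 = min(tiles_y, max(0, math.floor((mean2d[1] - radius) / tile)))
    x1 = min(tiles_x, max(0, math.ceil((mean2d[0] + radius) / tile)))
    y1 = min(tiles_y, max(0, math.ceil((mean2d[1] + radius) / tile)))
    return x0, y0, x1, y1
```

For a thin Gaussian at (100, 100) in a 256×256 image with σx = 10 px and σy = 1 px, the 30-pixel circle yields tiles x ∈ [4, 9) and y ∈ [4, 9), i.e. 25 tiles, even though the ellipse itself is only a few pixels tall.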
Speedy Splat (Hanson et al., 2024) introduced a clever fix for this tile-assignment bottleneck. Instead of using an oversized bounding circle, Speedy Splat computes an analytically tight axis-aligned bounding box for each Gaussian’s projected ellipse, a method the authors call SnugBox. By solving for the extreme horizontal and vertical extents of the ellipse, SnugBox finds the smallest pixel-aligned box that fully contains the Gaussian’s footprint. In effect, the rasterizer no longer over-assigns tiles: each Gaussian is mapped only to tiles inside this tight box rather than to every tile under the much larger circle’s box. This dramatically reduces the number of tile-pixel operations without sacrificing any image accuracy (the rendering remains bit-identical to the original). The result is a substantial speedup: with more efficient tile localization and other optimizations, Speedy Splat renders on average 6.7× faster than the original 3DGS across various scenes, all while using an order of magnitude fewer Gaussian primitives.
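The geometric fact that makes a tight box cheap to compute is that, for the ellipse dᵀΣ⁻¹d ≤ k², the largest horizontal offset is k·√Σxx and the largest vertical offset is k·√Σyy, so no eigen-decomposition or enclosing circle is needed. Below is a minimal sketch of that idea, not Speedy Splat’s actual code. It keeps the same 3σ level as the original circle so the comparison isolates the shape of the bound (the paper’s exact cutoff may differ), and `tight_tile_rect` and `tile_count` are hypothetical names.

```python
import math

TILE = 16  # tile size in pixels (the reference 3DGS rasterizer uses 16x16 tiles)


def tight_tile_rect(mean2d, cov2d, width, height, k=3.0, tile=TILE):
    """Half-open tile range (x0, y0, x1, y1) covered by the tight axis-aligned box
    of the ellipse d^T Sigma^{-1} d <= k^2, where Sigma = [[a, b], [b, c]]."""
    a, _, c = cov2d  # the off-diagonal term does not change the axis-aligned extents
    half_w = k * math.sqrt(a)  # max |dx| over the ellipse
    half_h = k * math.sqrt(c)  # max |dy| over the ellipse
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    x0 = min(tiles_x, max(0, math.floor((mean2d[0] - half_w) / tile)))
    y0 = min(tiles_y, max(0, math.floor((mean2d[1] - half_h) / tile)))
    x1 = min(tiles_x, max(0, math.ceil((mean2d[0] + half_w) / tile)))
    y1 = min(tiles_y, max(0, math.ceil((mean2d[1] + half_h) / tile)))
    return x0, y0, x1, y1


def tile_count(rect):
    """Number of tiles in a half-open tile rectangle."""
    x0, y0, x1, y1 = rect
    return max(0, x1 - x0) * max(0, y1 - y0)
```

For a thin axis-aligned Gaussian (σx = 10 px, σy = 1 px) at (100, 100) in a 256×256 image, the tight box spans 5 tiles, whereas a square around a 3σ circle of radius 30 px would span 25. For a rotated ellipse the box is still larger than the ellipse itself, so a few corner tiles that the footprint never touches can remain.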
3D Gaussian Splatting (3DGS) is now everywhere because it strikes a rare balance between rendering speed and visual quality. Its speed comes from decades of GPU engineering. GPUs were initially designed to accelerate rasterizing primitives like points and triangles, and 3DGS adapts its rendering process to follow this same hardware-efficient design.
For a deeper dive into how 3D Gaussian Splatting works (with actual code!), be sure to check out my step-by-step 3DGS tutorial, where we implement the full pipeline from scratch in PyTorch. You can also sign up for the 3DGS newsletter to get updates on new tutorials. Happy splatting!