Custom Gradients vs Autograd in 3D Gaussian Splatting

Custom gradients can match autograd bit-for-bit, while using dramatically less VRAM.

In a previous blog post, I showed how to write custom backward passes for 3D Gaussian Splatting (3DGS), and it turned out to be surprisingly easy to do.

In this post, we compare these custom gradients against PyTorch's built-in autograd.

The short answer: exact same gradients, radically different memory behavior.

Gradient correctness: bitwise identical

Let’s start with the most important question:
> Are custom gradients “approximate” or “less correct” than autograd?

The answer is a very strong no.

The figure below compares the gradients produced by PyTorch autograd with those produced by the custom backward pass.

The two curves overlap exactly: they are not just visually similar, they are bitwise identical.
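This property is easy to verify on a small example. The sketch below uses a toy op (`exp`, not the 3DGS backward itself): a `torch.autograd.Function` whose hand-written backward encodes the same derivative expression autograd uses, and whose gradient matches autograd's bit for bit.

```python
import torch

# Toy example (not the 3DGS kernel): a custom backward that encodes
# the same derivative autograd would apply for torch.exp.
class ManualExp(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        out = x.exp()
        ctx.save_for_backward(out)
        return out

    @staticmethod
    def backward(ctx, grad_output):
        (out,) = ctx.saved_tensors
        # d/dx exp(x) = exp(x): the same expression, applied by hand.
        return grad_output * out

x = torch.randn(1024)

xa = x.clone().requires_grad_(True)
xa.exp().sum().backward()          # autograd backward

xm = x.clone().requires_grad_(True)
ManualExp.apply(xm).sum().backward()  # custom backward

# Not just close: the two gradient tensors are bitwise identical.
assert torch.equal(xa.grad, xm.grad)
```

Because the custom backward performs the identical floating-point operations in the identical order, `torch.equal` (exact equality) passes, not merely `torch.allclose`.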


This is a crucial point: a hand-written backward pass is not an approximation. When it encodes the correct derivatives, it is exactly as correct as autograd.

Why this makes sense in 3DGS

When the math is known and structured (as in 3DGS), writing the backward pass manually gives the same result as autograd.

3D Gaussian Splatting is a particularly good candidate for custom gradients because:

- the forward pass is fixed, closed-form math (projection, covariance transformation, alpha compositing), not a learned black box;
- every operation has a known analytic derivative that can be written out by hand.

In other words, 3DGS is closer to a graphics pipeline than to a black-box neural network.

Autograd reconstructs the backward pass dynamically.
Custom gradients simply encode the same derivatives directly.
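For instance, the per-pixel weight of a splatted Gaussian has a closed-form derivative that can be written down directly. A minimal sketch with an isotropic 2D Gaussian (the real pipeline uses a full anisotropic covariance; `gaussian_weight` is an illustrative helper, not code from the post's implementation):

```python
import torch

def gaussian_weight(d, inv_sigma2):
    # w = exp(-0.5 * |d|^2 / sigma^2) for a pixel offset d from the mean
    return torch.exp(-0.5 * inv_sigma2 * (d * d).sum(-1))

d = torch.tensor([0.3, -0.7], requires_grad=True)  # pixel offset (assumed values)
inv_sigma2 = torch.tensor(2.0)                     # 1 / sigma^2 (assumed)

w = gaussian_weight(d, inv_sigma2)
w.backward()  # autograd's gradient

# Hand-derived gradient: dw/dd = -w * d / sigma^2
manual = -w.detach() * inv_sigma2 * d.detach()
assert torch.allclose(d.grad, manual)
```

The derivative autograd reconstructs step by step is the same expression you can write in one line by hand.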

Memory: where things get interesting

The figure below tells a very different story.

It compares VRAM usage vs image resolution for autograd and for the custom gradients, on a 24 GB GPU.

What happens with autograd?

Autograd must keep per-pixel intermediate tensors alive for the backward pass, so VRAM grows with the number of pixels until the 24 GB GPU runs out of memory.

This makes high-resolution training essentially impossible.
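To see where autograd's memory goes, we can count the bytes it saves for backward in a toy per-pixel pipeline (a stand-in for the real 3DGS forward) using PyTorch's saved-tensor hooks; `saved_activation_bytes` is an illustrative helper of ours:

```python
import torch
from torch.autograd.graph import saved_tensors_hooks

def saved_activation_bytes(h, w):
    # Count the bytes autograd stashes for backward in a toy
    # per-pixel pipeline of shape (h, w).
    saved = []
    def pack(t):
        saved.append(t.nelement() * t.element_size())
        return t
    def unpack(t):
        return t
    x = torch.randn(h, w, requires_grad=True)
    with saved_tensors_hooks(pack, unpack):
        y = x.sigmoid().exp()  # each op saves a full (h, w) tensor
    return sum(saved)

b_small = saved_activation_bytes(256, 256)
b_large = saved_activation_bytes(512, 512)
ratio = b_large / b_small
assert 3.9 < ratio < 4.1  # 4x the pixels -> ~4x the saved-for-backward memory
```

Every elementwise op in the forward keeps a full-resolution tensor alive, which is exactly the linear-in-pixels growth the figure shows.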

What happens with custom gradients?

VRAM stays almost flat as resolution grows, and at the highest resolutions it even decreases slightly.

At first glance, this looks counter-intuitive.

Figure: VRAM usage vs number of pixels, autograd vs custom gradients in 3DGS.

Why custom gradients use (almost) constant memory

The key design choice is how parallelism is expressed.

In our custom 3DGS backpropagation, we parallelize over tiles, not over pixels.

This is not new: the official 3DGS CUDA rasterizer is organized around 16×16-pixel tiles as well.

Tile-based parallelization

Each tile:

- loads only the Gaussians that overlap it;
- computes its pixels' contributions and gradients locally;
- accumulates the results into the global gradient buffers.

As a result, peak memory is governed by the per-tile working set (the Gaussians overlapping one tile), not by the total number of pixels.

This is the core reason custom gradients scale to high resolutions.
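A minimal sketch of the idea (a hypothetical data layout, not the real CUDA kernel): each tile contributes gradients only for the Gaussians it overlaps, so the temporaries are sized by Gaussians-per-tile, while the only full-size buffer is the persistent gradient output.

```python
import torch

def accumulate_tile_grads(n_gaussians, tiles):
    # tiles: list of (gauss_idx, per_gauss_grad) pairs, one entry per tile
    grad = torch.zeros(n_gaussians)          # persistent output buffer
    for gauss_idx, per_gauss_grad in tiles:  # one tile's worth at a time
        # temporaries here scale with len(gauss_idx), i.e. Gaussians/tile
        grad.index_add_(0, gauss_idx, per_gauss_grad)
    return grad

tiles = [
    (torch.tensor([0, 2]), torch.tensor([1.0, 0.5])),   # tile 0 overlaps G0, G2
    (torch.tensor([2, 3]), torch.tensor([0.25, 2.0])),  # tile 1 overlaps G2, G3
]
g = accumulate_tile_grads(4, tiles)
# G2 receives contributions from both tiles: 0.5 + 0.25
assert torch.equal(g, torch.tensor([1.0, 0.0, 0.75, 2.0]))
```

No per-pixel intermediate ever has to survive past its tile, which is what keeps peak memory independent of image size.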

Why memory can even go down with higher resolution

This is the most surprising observation.

As resolution increases:

- the image is divided into more tiles;
- each tile covers a smaller fraction of the image;
- so fewer Gaussians overlap any individual tile.

So while the total number of pixels, and the total amount of work, keeps growing, the number of Gaussians per tile decreases.

Because memory is proportional to Gaussians per tile, the net effect is that peak VRAM stays essentially flat, and can even drop slightly at the highest resolutions.
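This scaling argument can be put into a back-of-envelope sketch (a toy overlap model with assumed numbers, purely illustrative): the expected number of Gaussians touching a tile shrinks toward the per-point overlap count as tiles cover a smaller fraction of the image.

```python
N = 1_000_000     # Gaussians in the scene (assumed)
FOOTPRINT = 1e-5  # average Gaussian footprint as a fraction of the image (assumed)

def gaussians_per_tile(n_pixels, tile_px=16):
    # Tile area as a fraction of the whole image shrinks as resolution grows.
    tile_frac = tile_px * tile_px / n_pixels
    # Toy overlap model: expected Gaussians intersecting one tile.
    return N * (FOOTPRINT + tile_frac)

low = gaussians_per_tile(1280 * 720)    # ~288 Gaussians per tile
high = gaussians_per_tile(3840 * 2160)  # ~41 Gaussians per tile
assert high < low  # the per-tile working set shrinks at higher resolution
```

Since peak memory tracks this per-tile working set, higher resolution can mean a smaller peak, exactly the counter-intuitive dip in the figure.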

Takeaways

- Custom gradients are not approximations: in 3DGS they match autograd bitwise.
- The real difference is memory: autograd's VRAM grows with the number of pixels, while the tile-based custom backward stays nearly constant.
- Tile-based custom gradients are what make high-resolution 3DGS training feasible on a 24 GB GPU.

Want more posts like this?
Subscribe to my newsletter for future posts, updates, and practical guides on PyTorch, 3DGS, and differentiable rendering.

📘 Learn 3DGS Step-by-Step (PyTorch Only)

Want to truly understand 3D Gaussian Splatting—not just run a repo? My 3D Gaussian Splatting Course teaches the full pipeline from first principles in PyTorch only (no C++, no CUDA). You’ll learn initialization, rasterization, backward passes, training loops, and how to experiment with recent papers.

Explore the Course →

💼 Research & Engineering Consulting

We help teams integrate 3D Gaussian Splatting techniques, build custom pipelines, and prototype new splatting research. If you need expertise, we can help.

Contact:
contact@qubitanalytics.be