Custom Gradients vs Autograd in 3D Gaussian Splatting

Custom gradients can match autograd bit-for-bit, while using dramatically less VRAM.

In a previous blog post, I showed how to write custom backward passes for 3D Gaussian Splatting (3DGS), and it turned out to be surprisingly easy to do.

In this post, we compare these custom gradients against PyTorch's built-in autograd.

The short answer: exact same gradients, radically different memory behavior.

Gradient correctness: bitwise identical

Let’s start with the most important question:
> Are custom gradients “approximate” or “less correct” than autograd?

The answer is a very strong no.

The figure below compares the gradients produced by PyTorch autograd with those produced by the custom backward pass.

The two curves overlap exactly: they are not just visually similar, they are bitwise identical.
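This property is easy to verify on a small example. The sketch below uses a toy op (`exp`, not the 3DGS backward itself): a `torch.autograd.Function` whose hand-written backward encodes the same derivative expression autograd uses, and whose gradient matches autograd's bit for bit.

```python
import torch

# Toy example (not the 3DGS kernel): a custom backward that encodes
# the same derivative autograd would apply for torch.exp.
class ManualExp(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        out = x.exp()
        ctx.save_for_backward(out)
        return out

    @staticmethod
    def backward(ctx, grad_output):
        (out,) = ctx.saved_tensors
        # d/dx exp(x) = exp(x): the same expression, applied by hand.
        return grad_output * out

x = torch.randn(1024)

xa = x.clone().requires_grad_(True)
xa.exp().sum().backward()          # autograd backward

xm = x.clone().requires_grad_(True)
ManualExp.apply(xm).sum().backward()  # custom backward

# Not just close: the two gradient tensors are bitwise identical.
assert torch.equal(xa.grad, xm.grad)
```

Because the custom backward performs the identical floating-point operations in the identical order, `torch.equal` (exact equality) passes, not merely `torch.allclose`.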


This is a crucial point: a hand-written backward pass is not an approximation. When it encodes the correct derivatives, it is exactly as correct as autograd.

Why this makes sense in 3DGS

When the math is known and structured (as in 3DGS), writing the backward pass manually gives the same result as autograd.

3D Gaussian Splatting is a particularly good candidate for custom gradients because:

- the forward pass is fixed, closed-form math (projection, covariance transformation, alpha compositing), not a learned black box;
- every operation has a known analytic derivative that can be written out by hand.

In other words, 3DGS is closer to a graphics pipeline than to a black-box neural network.

Autograd reconstructs the backward pass dynamically.
Custom gradients simply encode the same derivatives directly.
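For instance, the per-pixel weight of a splatted Gaussian has a closed-form derivative that can be written down directly. A minimal sketch with an isotropic 2D Gaussian (the real pipeline uses a full anisotropic covariance; `gaussian_weight` is an illustrative helper, not code from the post's implementation):

```python
import torch

def gaussian_weight(d, inv_sigma2):
    # w = exp(-0.5 * |d|^2 / sigma^2) for a pixel offset d from the mean
    return torch.exp(-0.5 * inv_sigma2 * (d * d).sum(-1))

d = torch.tensor([0.3, -0.7], requires_grad=True)  # pixel offset (assumed values)
inv_sigma2 = torch.tensor(2.0)                     # 1 / sigma^2 (assumed)

w = gaussian_weight(d, inv_sigma2)
w.backward()  # autograd's gradient

# Hand-derived gradient: dw/dd = -w * d / sigma^2
manual = -w.detach() * inv_sigma2 * d.detach()
assert torch.allclose(d.grad, manual)
```

The derivative autograd reconstructs step by step is the same expression you can write in one line by hand.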

Memory: where things get interesting

The figure below tells a very different story.

It compares VRAM usage vs image resolution for autograd and for the custom gradients, on a 24 GB GPU.

What happens with autograd?

Autograd must keep per-pixel intermediate tensors alive for the backward pass, so VRAM grows with the number of pixels until the 24 GB GPU runs out of memory.

This makes high-resolution training essentially impossible.
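To see where autograd's memory goes, we can count the bytes it saves for backward in a toy per-pixel pipeline (a stand-in for the real 3DGS forward) using PyTorch's saved-tensor hooks; `saved_activation_bytes` is an illustrative helper of ours:

```python
import torch
from torch.autograd.graph import saved_tensors_hooks

def saved_activation_bytes(h, w):
    # Count the bytes autograd stashes for backward in a toy
    # per-pixel pipeline of shape (h, w).
    saved = []
    def pack(t):
        saved.append(t.nelement() * t.element_size())
        return t
    def unpack(t):
        return t
    x = torch.randn(h, w, requires_grad=True)
    with saved_tensors_hooks(pack, unpack):
        y = x.sigmoid().exp()  # each op saves a full (h, w) tensor
    return sum(saved)

b_small = saved_activation_bytes(256, 256)
b_large = saved_activation_bytes(512, 512)
ratio = b_large / b_small
assert 3.9 < ratio < 4.1  # 4x the pixels -> ~4x the saved-for-backward memory
```

Every elementwise op in the forward keeps a full-resolution tensor alive, which is exactly the linear-in-pixels growth the figure shows.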

What happens with custom gradients?

VRAM stays almost flat as resolution grows, and at the highest resolutions it even decreases slightly.

At first glance, this looks counter-intuitive.

Figure: VRAM usage vs number of pixels, autograd vs custom gradients in 3DGS.

Why custom gradients use (almost) constant memory

The key design choice is how parallelism is expressed.

In our custom 3DGS backpropagation, we parallelize over tiles, not over pixels.

This is not new: the official 3DGS CUDA rasterizer is organized around 16×16-pixel tiles as well.

Tile-based parallelization

Each tile:

- loads only the Gaussians that overlap it;
- computes its pixels' contributions and gradients locally;
- accumulates the results into the global gradient buffers.

As a result, peak memory is governed by the per-tile working set (the Gaussians overlapping one tile), not by the total number of pixels.

This is the core reason custom gradients scale to high resolutions.
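A minimal sketch of the idea (a hypothetical data layout, not the real CUDA kernel): each tile contributes gradients only for the Gaussians it overlaps, so the temporaries are sized by Gaussians-per-tile, while the only full-size buffer is the persistent gradient output.

```python
import torch

def accumulate_tile_grads(n_gaussians, tiles):
    # tiles: list of (gauss_idx, per_gauss_grad) pairs, one entry per tile
    grad = torch.zeros(n_gaussians)          # persistent output buffer
    for gauss_idx, per_gauss_grad in tiles:  # one tile's worth at a time
        # temporaries here scale with len(gauss_idx), i.e. Gaussians/tile
        grad.index_add_(0, gauss_idx, per_gauss_grad)
    return grad

tiles = [
    (torch.tensor([0, 2]), torch.tensor([1.0, 0.5])),   # tile 0 overlaps G0, G2
    (torch.tensor([2, 3]), torch.tensor([0.25, 2.0])),  # tile 1 overlaps G2, G3
]
g = accumulate_tile_grads(4, tiles)
# G2 receives contributions from both tiles: 0.5 + 0.25
assert torch.equal(g, torch.tensor([1.0, 0.0, 0.75, 2.0]))
```

No per-pixel intermediate ever has to survive past its tile, which is what keeps peak memory independent of image size.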

Why memory can even go down with higher resolution

This is the most surprising observation.

As resolution increases:

- the image is divided into more tiles;
- each tile covers a smaller fraction of the image;
- so fewer Gaussians overlap any individual tile.

So while the total number of pixels, and the total amount of work, keeps growing, the number of Gaussians per tile decreases.

Because memory is proportional to Gaussians per tile, the net effect is that peak VRAM stays essentially flat, and can even drop slightly at the highest resolutions.
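This scaling argument can be put into a back-of-envelope sketch (a toy overlap model with assumed numbers, purely illustrative): the expected number of Gaussians touching a tile shrinks toward the per-point overlap count as tiles cover a smaller fraction of the image.

```python
N = 1_000_000     # Gaussians in the scene (assumed)
FOOTPRINT = 1e-5  # average Gaussian footprint as a fraction of the image (assumed)

def gaussians_per_tile(n_pixels, tile_px=16):
    # Tile area as a fraction of the whole image shrinks as resolution grows.
    tile_frac = tile_px * tile_px / n_pixels
    # Toy overlap model: expected Gaussians intersecting one tile.
    return N * (FOOTPRINT + tile_frac)

low = gaussians_per_tile(1280 * 720)    # ~288 Gaussians per tile
high = gaussians_per_tile(3840 * 2160)  # ~41 Gaussians per tile
assert high < low  # the per-tile working set shrinks at higher resolution
```

Since peak memory tracks this per-tile working set, higher resolution can mean a smaller peak, exactly the counter-intuitive dip in the figure.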

Takeaways

- Custom gradients are not approximations: in 3DGS they match autograd bitwise.
- The real difference is memory: autograd's VRAM grows with the number of pixels, while the tile-based custom backward stays nearly constant.
- Tile-based custom gradients are what make high-resolution 3DGS training feasible on a 24 GB GPU.

Want more posts like this?
Subscribe to my newsletter for future posts, updates, and practical guides on PyTorch, 3DGS, and differentiable rendering.

📘 Learn 3DGS Step-by-Step (PyTorch Only)

Want to truly understand 3D Gaussian Splatting—not just run a repo? My 3D Gaussian Splatting Course teaches the full pipeline from first principles in PyTorch only (no C++, no CUDA). You’ll learn initialization, rasterization, backward passes, training loops, and how to experiment with recent papers.

Explore the Course →

💼 Research & Engineering Consulting

We help teams integrate 3D Gaussian Splatting techniques, build custom pipelines, and prototype new splatting research. If you need expertise, we can help.

Contact:
contact@qubitanalytics.be