SpectralSplats: Robust Differentiable Tracking via Spectral Moment Supervision

¹Technion - Israel Institute of Technology, ²NVIDIA
Figure panels: SCENE SETUP (SOURCE, TARGET), PIXEL, OURS.
Starting from a SOURCE and a spatially distant TARGET with zero initial overlap, SpectralSplats (OURS) enables robust tracking by shifting supervision from the spatial domain to spectral moments, whereas pixel-only optimization (PIXEL) fails to converge.

Abstract

3D Gaussian Splatting (3DGS) enables real-time, photorealistic novel view synthesis, making it a highly attractive representation for model-based video tracking. However, leveraging the differentiability of the 3DGS renderer "in the wild" remains notoriously fragile. A fundamental bottleneck lies in the compact, local support of the Gaussian primitives. Standard photometric objectives implicitly rely on spatial overlap; if severe camera misalignment places the rendered object outside the target's local footprint, gradients strictly vanish, leaving the optimizer stranded. We introduce SpectralSplats, a robust tracking framework that resolves this "vanishing gradient" problem by shifting the optimization objective from the spatial to the frequency domain. By supervising the rendered image via a set of global complex sinusoidal features (Spectral Moments), we construct a global basin of attraction, ensuring that a valid, directional gradient toward the target exists across the entire image domain, even when there is no pixel overlap at all. To harness this global basin without introducing the periodic local minima associated with high frequencies, we derive a Frequency Annealing schedule from first principles, gracefully transitioning the optimizer from global convexity to precise spatial alignment. We demonstrate that SpectralSplats acts as a seamless, drop-in replacement for spatial losses across diverse deformation parameterizations (from MLPs to sparse control points), successfully recovering complex deformations even from severely misaligned initializations where standard appearance-based tracking fails catastrophically.

Breaking the Locality Trap

Standard photometric objectives implicitly rely on spatial overlap between rendered primitives and the target. As illustrated in the 1D optimization analysis below, standard L2 gradients strictly vanish when pulses have no spatial overlap, leaving the optimizer stranded in a "locality trap". While static spectral supervision can provide a non-vanishing signal, high frequencies introduce severe phase-wrapping that creates false local minima.
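
The locality trap can be reproduced in a few lines. The sketch below is a toy 1D setup, not the authors' implementation: two narrow Gaussian pulses stand in for the rendered primitive and the target, and the pulse width, positions, and frequencies are illustrative assumptions. It compares the gradient of an L2 pixel loss with the gradient of a loss on a single complex sinusoidal moment, both taken with respect to the source position s.

```python
import numpy as np

def pulse(x, center, sigma=0.02):
    """Narrow Gaussian pulse, a 1D stand-in for a rendered primitive."""
    return np.exp(-0.5 * ((x - center) / sigma) ** 2)

def pixel_loss_grad(x, s, d, sigma=0.02):
    """Analytic d/ds of the L2 loss: integral of (pulse_s - pulse_d)^2."""
    residual = pulse(x, s, sigma) - pulse(x, d, sigma)
    dpulse_ds = pulse(x, s, sigma) * (x - s) / sigma**2
    return 2.0 * np.sum(residual * dpulse_ds) * (x[1] - x[0])

def moment(x, signal, omega):
    """Complex sinusoidal moment: integral of signal(x) * exp(-i*omega*x)."""
    return np.sum(signal * np.exp(-1j * omega * x)) * (x[1] - x[0])

def spectral_loss_grad(x, s, d, omega, sigma=0.02):
    """Analytic d/ds of |M_s(omega) - M_d(omega)|^2."""
    diff = moment(x, pulse(x, s, sigma), omega) - moment(x, pulse(x, d, sigma), omega)
    dpulse_ds = pulse(x, s, sigma) * (x - s) / sigma**2
    return 2.0 * np.real(np.conj(diff) * moment(x, dpulse_ds, omega))

x = np.linspace(0.0, 1.0, 4001)
s, d = 0.2, 0.8  # source and target centers: no overlap (illustrative values)

g_pixel = pixel_loss_grad(x, s, d)                       # numerically zero
g_low = spectral_loss_grad(x, s, d, omega=np.pi)         # phase gap 0.6*pi
g_wrapped = spectral_loss_grad(x, s, d, omega=2.5 * np.pi)  # phase gap 1.5*pi

print(f"pixel: {g_pixel:.2e}, low freq: {g_low:.2e}, high freq: {g_wrapped:.2e}")
```

In this toy setup the pixel gradient is zero to machine precision, while the low-frequency moment gives a negative gradient, so a descent step moves s toward the target. At the higher frequency the phase difference exceeds pi and the sign flips, which is the phase-wrapping failure described above.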

SpectralSplats (Ours) resolves this by supervising spectral moments through a principled frequency annealing schedule. By restricting initial supervision to low frequencies, we construct a globally convex basin of attraction that provides a valid, directional gradient from any initialization. As the spatial error decreases, our schedule seamlessly expands the active bandwidth to achieve high-frequency spatial precision without phase-wrapping, ensuring robust convergence.
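
A coarse-to-fine schedule of this kind can be sketched on the same 1D toy problem. The code below is an illustration under stated assumptions rather than the schedule derived in the paper: it uses the closed-form moments of a Gaussian pulse, a hypothetical stepwise schedule that holds each bandwidth for a fixed number of iterations, and a step size set from the loss curvature at perfect alignment. The helper names (`amplitude_sq`, `spectral_grad`, `curvature`, `track`) are made up for this example.

```python
import numpy as np

SIGMA = 0.02  # pulse width (illustrative)

def amplitude_sq(omega):
    """|F(omega)|^2 of a Gaussian pulse of width SIGMA (closed form)."""
    return 2.0 * np.pi * SIGMA**2 * np.exp(-(omega * SIGMA) ** 2)

def spectral_grad(s, d, omegas):
    """d/ds of sum_k |M_s(w_k) - M_d(w_k)|^2 for two shifted copies of the pulse."""
    return np.sum(2.0 * amplitude_sq(omegas) * omegas * np.sin(omegas * (s - d)))

def curvature(omegas):
    """Second derivative of the same loss at perfect alignment (s == d)."""
    return np.sum(2.0 * amplitude_sq(omegas) * omegas**2)

def track(s, d, schedule):
    """Curvature-scaled gradient descent on the source position s.

    `schedule` is a sequence of frequency arrays, one per iteration.
    """
    for omegas in schedule:
        s = s - spectral_grad(s, d, omegas) / curvature(omegas)
    return s

all_omegas = np.pi * np.arange(1, 9)  # pi, 2*pi, ..., 8*pi

# Hypothetical coarse-to-fine schedule: widen the band every 10 iterations.
annealed = [all_omegas[:k] for k in range(1, 9) for _ in range(10)]
# Static baseline that supervises only the highest frequency.
static_high = [all_omegas[-1:]] * 80

s_annealed = track(0.2, 0.8, annealed)
s_static = track(0.2, 0.8, static_high)
print(f"annealed: {s_annealed:.4f}, static high-frequency: {s_static:.4f}")
```

Under these assumptions the annealed run reaches the target at 0.8, while the high-frequency-only run settles at 0.3, a phase-wrapped false minimum half a unit away. In this pure-translation toy the lowest frequency alone already recovers the shift; the wider band matters for spatial precision in realistic settings.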

Figure: 1D optimization analysis.

Experimental Results

We evaluate robustness to initial alignment by shifting the initial 3DGS model in random directions with increasing radii, progressively reducing the spatial overlap between the rendering and target supervision. We conduct experiments on the controlled SC4D animations and the challenging, real-world GART Dog dataset, which exhibits noticeable deviations in pose and appearance between the model and supervision even at zero displacement due to inherent reconstruction challenges. We observe that Pixel loss fails to recover the correct pose as the shift increases, often drifting the asset out of frame. In contrast, SpectralSplats (Ours) remains stable and maintains a coherent structure across all displacement levels. These observations are further reflected in our quantitative evaluations of PSNR, SSIM, and LPIPS metrics for both training and novel views.
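
The displacement protocol and one of the reported metrics can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: it assumes the shift is a rigid 3D translation applied to all Gaussian centers, that directions are sampled uniformly on the sphere, and that images are scaled to [0, 1]. The function names `random_shift`, `shift_model`, and `psnr` are hypothetical.

```python
import numpy as np

def random_shift(radius, rng):
    """Offset of length `radius` in a uniformly random 3D direction."""
    direction = rng.normal(size=3)
    direction /= np.linalg.norm(direction)
    return radius * direction

def shift_model(centers, radius, rng):
    """Rigidly translate all Gaussian centers (N x 3) by one random offset."""
    return centers + random_shift(radius, rng)

def psnr(rendered, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, max_val]."""
    mse = np.mean((rendered - target) ** 2)
    return 10.0 * np.log10(max_val**2 / mse)

rng = np.random.default_rng(0)
centers = np.zeros((4, 3))          # placeholder model with 4 Gaussians
radii = [0.0, 0.2, 0.4, 0.6, 0.8]   # shift radii (units are an assumption)

# Distance moved by the first center at each radius.
offsets = [np.linalg.norm(shift_model(centers, r, rng)[0]) for r in radii]
print(offsets)
```

Each shifted copy would then serve as the initialization for tracking, with PSNR (alongside SSIM and LPIPS) computed between the final renders and the target frames.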

SC4D Tracking

Sequences: SpiderMan, Astronaut, Kitten, Aurorus. Each sequence compares Pixel and Ours at shift radii 0.0, 0.2, 0.4, 0.6, and 0.8.

Metric evaluations (PSNR, SSIM, and LPIPS) for training views (top row) and novel side views (bottom row) across the increasing shift radii.
SC4D shift experiment metrics for training and novel views

GART (Dog Show) Tracking

Sequences: Shiba, French, English, Hound. Each sequence compares Pixel and Ours at shift radii 0.0, 0.2, 0.4, 0.6, and 0.8.

Metric evaluations (PSNR, SSIM, and LPIPS) for training views across the increasing shift radii.
GART shift experiment metrics for training and novel views

Citation

@misc{rimon2026spectralsplatsrobustdifferentiabletracking,
      title={SpectralSplats: Robust Differentiable Tracking via Spectral Moment Supervision}, 
      author={Avigail Cohen Rimon and Amir Mann and Mirela Ben Chen and Or Litany},
      year={2026},
      eprint={2603.24036},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2603.24036}, 
}