VisDom: Sparse Novel View Synthesis
with Visible Domain Constraint

Mariia Gladkova*1,2, Tarun Yenamandra*1,2, Edmond Boyer, Robert Maier, Tony Tung, Daniel Cremers1,2

1TU Munich  ·  2MCML  ·  * equal contribution

[Figure: VisDom teaser on the bonsai and guitar scenes with 4 input views. Columns: Train images, ZipNeRF, 3DGS-GO, ZipNeRF + VisDom, 3DGS-GO + VisDom, Ground truth.]

VisDom enables high-quality reconstruction from as few as 4 input views.
Without VisDom, general-purpose methods fail catastrophically at this sparsity; our constraint recovers photorealistic results.

Abstract

Sparse novel view synthesis (NVS) remains challenging due to the ambiguity of recovering 3D geometry from few input views. While NeRF- and Gaussian Splatting (GS)-based methods perform well with dense supervision, they often overfit in sparse settings, producing floating artifacts and inconsistent geometry. Silhouette consistency is commonly used as a regularizer, but it remains insufficient, as silhouette-consistent regions can extend beyond the true object geometry.

We introduce VisDom, a learning-free geometric constraint that augments classical carving-based visual hull reconstruction by enforcing a minimum multi-view visibility requirement. Specifically, we define a visible domain as the subset of 3D space observed by at least K views and use it as an additional filtering criterion on top of standard silhouette-based reconstruction. This provides a stronger spatial prior in sparse-view settings.

We integrate VisDom into both implicit (NeRF) and explicit (GS) pipelines by restricting volumetric sampling and guiding Gaussian placement during optimization. Experiments on three challenging datasets show consistent improvements in sparse-view NVS, enabling high-quality object-centric reconstruction from as few as four input images. Our method is domain-agnostic, requires only silhouettes, and introduces no learned parameters, making it a simple complement to existing approaches. Applying VisDom on top of GaussianObject further improves performance on Omni3D and MipNeRF360, while a plain 3DGS pipeline with VisDom matches or surpasses GaussianObject at 22× lower training cost.

Key Contributions

~90% PSNR gain at 4 views (ZipNeRF on MipNeRF360)
22× faster training than GaussianObject
2 s preprocessing time, zero learned parameters
5 NVS frameworks improved

🔷

Visible Domain Constraint

Augments the classical visual hull by retaining only 3D regions observed by at least K cameras — removing ambiguous space that silhouettes alone cannot resolve.

🔗

Plug-and-play Integration

Integrates into any NeRF (via ray-sampling bounds) or 3DGS pipeline (via Gaussian placement and interpolated-view supervision) with a single modification.

📐

Learning-free

Zero learned parameters, no domain-specific training data, and no generative priors — relies solely on silhouettes extracted by off-the-shelf segmentation models.

Method

The Problem with Silhouette-only Constraints

NeRF and 3DGS allocate density or place Gaussians without global geometric regularization, leading to floaters and structural inconsistencies in sparse settings. Adding a silhouette loss only carves out space excluded by silhouettes — but with few views the resulting visual hull is enormous, providing weak guidance and sometimes degrading performance (ZipNeRF+mask drops from 12.44 to 11.95 dB at 4 views on MipNeRF360).

[Figure: visual hull comparison, traditional vs. VisDom. Panel "Visual Hull in 2D": Capture Setup, Traditional, VisDom. Panel "Visual Hull in 3D (top view)": Posed silhouettes, Traditional, Unit sphere, VisDom.]
Given 2 silhouettes, the traditional visual hull retains large ambiguous regions. VisDom's visible domain constraint removes regions visible from fewer than K cameras, yielding a tighter, more reliable shape. Top-view 3D comparison shows that unit-sphere clipping assumes object centering, while VisDom is data-driven and position-agnostic.

Visible Domain — The Core Idea

We define the visible domain as the subset of 3D space jointly observed by at least K cameras. This is applied on top of standard voxel carving: a voxel is retained only if (i) its occupancy votes exceed 95% of its visibility votes (standard hull) and (ii) it is visible from at least K cameras (our constraint). This removes weakly constrained voxels that are a primary source of spurious density in sparse reconstruction.
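
The two retention rules above can be sketched as a small NumPy routine. This is a minimal illustration rather than the authors' implementation: the function name `carve_visible_domain`, the pinhole projection matrices, and the simplification that "visible" means "projects inside the image and in front of the camera" (occlusion is ignored) are all assumptions made for the example.

```python
import numpy as np

def carve_visible_domain(points, projections, silhouettes, k=3, occ_ratio=0.95):
    """Boolean mask over `points` (N, 3): True where a point passes both rules.

    projections: list of 3x4 pinhole matrices (hypothetical input format).
    silhouettes: list of boolean (H, W) masks, True on the object.
    """
    n = points.shape[0]
    homog = np.hstack([points, np.ones((n, 1))])  # homogeneous coordinates, (N, 4)
    vis_votes = np.zeros(n, dtype=np.int64)       # cameras that see the point
    occ_votes = np.zeros(n, dtype=np.int64)       # cameras whose silhouette covers it

    for P, sil in zip(projections, silhouettes):
        h, w = sil.shape
        uvw = homog @ P.T
        depth = uvw[:, 2]
        in_front = depth > 1e-9
        safe_depth = np.where(in_front, depth, 1.0)  # avoid dividing by <= 0
        u = np.floor(uvw[:, 0] / safe_depth).astype(np.int64)
        v = np.floor(uvw[:, 1] / safe_depth).astype(np.int64)
        visible = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        inside = np.zeros(n, dtype=bool)
        inside[visible] = sil[v[visible], u[visible]]
        vis_votes += visible
        occ_votes += inside

    standard_hull = occ_votes > occ_ratio * vis_votes  # rule (i)
    visible_domain = vis_votes >= k                    # rule (ii)
    return standard_hull & visible_domain
```

With `k=1` the second rule only removes points that no camera sees, so the result is close to the standard hull; raising `k` additionally discards points that are seen by too few cameras to be carved reliably.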

[Figure: camera setup and the resulting visible domains for K = 1, K = 2, and K = 3.]
Visible domain (purple) for K = 1, 2, 3 in a 3-camera setup. Larger K narrows the covered region, leading to more precise shapes.
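
The shrinking of the visible domain with K can be reproduced in a toy 2D setting. The sketch below is an illustration under assumed values (three cameras on a circle of radius 3, a 60° field of view, no occlusion); `visibility_count_2d` is a hypothetical helper, not part of the method's code.

```python
import numpy as np

def visibility_count_2d(points, cam_positions, cam_dirs, half_fov_deg=30.0):
    """For each 2D point, count the cameras whose field of view contains it."""
    cos_half = np.cos(np.deg2rad(half_fov_deg))
    counts = np.zeros(len(points), dtype=np.int64)
    for c, d in zip(cam_positions, cam_dirs):
        d = d / np.linalg.norm(d)
        rel = points - c
        dist = np.linalg.norm(rel, axis=1)
        cos_angle = (rel @ d) / np.maximum(dist, 1e-12)
        counts += cos_angle >= cos_half
    return counts

# Three cameras on a circle of radius 3, all looking at the origin.
angles = np.deg2rad([0.0, 120.0, 240.0])
cam_positions = 3.0 * np.stack([np.cos(angles), np.sin(angles)], axis=1)
cam_dirs = -cam_positions

# Dense grid over the square [-2, 2]^2.
axis = np.linspace(-2.0, 2.0, 81)
grid = np.stack(np.meshgrid(axis, axis, indexing="xy"), axis=-1).reshape(-1, 2)
counts = visibility_count_2d(grid, cam_positions, cam_dirs)

# Fraction of the grid covered by the visible domain for each K.
coverage = {k: float((counts >= k).mean()) for k in (1, 2, 3)}
print(coverage)
```

Because the K-visible sets are nested, the covered fraction can only stay the same or decrease as K grows.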

Integration into NeRF and Gaussian Splatting

NeRF Integration

The VisDom visual hull is used to restrict each ray's sampling range [tn, tf] by intersecting it with the visual hull mesh. Rays only sample within the tightly constrained region, preventing density accumulation in ambiguous space. Combined with a silhouette loss (λ=0.1), this turns previously unusable methods into competitive ones.
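
A rough sketch of the per-ray bound computation is given below. Note the substitution: the text intersects rays with the visual hull mesh, whereas this example marches through a boolean occupancy grid, which is simpler to show in a few lines. The function name, the grid layout (`grid_min`, `voxel_size`), and the step count are assumptions for illustration.

```python
import numpy as np

def ray_bounds_in_hull(origin, direction, occupancy, grid_min, voxel_size,
                       t_max=10.0, n_steps=512):
    """Approximate (t_near, t_far) of a ray inside occupied voxels, or None on a miss."""
    direction = direction / np.linalg.norm(direction)
    ts = np.linspace(0.0, t_max, n_steps)                      # candidate depths
    pts = origin[None, :] + ts[:, None] * direction[None, :]   # (n_steps, 3)
    idx = np.floor((pts - grid_min) / voxel_size).astype(np.int64)
    shape = np.array(occupancy.shape)
    in_grid = np.all((idx >= 0) & (idx < shape), axis=1)
    hit = np.zeros(n_steps, dtype=bool)
    hit[in_grid] = occupancy[idx[in_grid, 0], idx[in_grid, 1], idx[in_grid, 2]]
    if not hit.any():
        return None            # ray never enters the hull: skip sampling it
    hit_ts = ts[hit]
    return float(hit_ts[0]), float(hit_ts[-1])
```

A NeRF sampler would then draw its samples only from the returned interval instead of the scene-wide near and far planes. The bounds are accurate to one marching step, so a mesh intersection, as described above, gives tighter limits.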

Gaussian Splatting Integration

For 3DGS, we (1) initialize Gaussians from the VisDom visual hull mesh instead of COLMAP, and (2) penalize Gaussians that appear opaque outside the visual hull when rendered from interpolated camera views. This removes ghost-like floaters and confines the reconstruction to the visible domain.
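
One plausible form of the penalty in step (2) is the mean rendered opacity that falls outside the hull's silhouette in an interpolated view. The exact loss is not specified here, so the sketch below is an assumption: `rendered_alpha` stands for the accumulated opacity map of the splatted Gaussians and `hull_mask` for the hull mesh rendered from the same interpolated camera.

```python
import numpy as np

def outside_hull_opacity_penalty(rendered_alpha, hull_mask):
    """Mean opacity rendered outside the hull silhouette for one interpolated view.

    rendered_alpha: float array (H, W) in [0, 1], accumulated opacity per pixel.
    hull_mask: boolean array (H, W), True where the hull covers the pixel.
    """
    outside = ~hull_mask
    return float((rendered_alpha * outside).mean())

alpha = np.array([[1.0, 0.5],
                  [0.0, 0.2]])
mask = np.array([[True, False],
                 [False, True]])
penalty = outside_hull_opacity_penalty(alpha, mask)
```

In a training loop the same expression would be written in an autodiff framework so that its gradient pushes the opacity of offending Gaussians toward zero.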

Ablation: Choice of K

We ablate K ∈ {1, 2, 3, 4} on MipNeRF360. K=1 degrades to near-vanilla (unconstrained hull). K=2 eliminates most ambiguous space. K=3 achieves the best mean for ZipNeRF (25.99 dB) and is chosen as the default — it balances hull tightness against the risk of over-carving surface regions visible from only a few cameras at the hardest 4-view setting.

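PSNR ↑ for ZipNeRF + VD and 3DGS-GO + VD at each K:

| Views | ZipNeRF + VD K=1 | ZipNeRF + VD K=2 | ZipNeRF + VD K=3 | ZipNeRF + VD K=4 | 3DGS-GO + VD K=1 | 3DGS-GO + VD K=2 | 3DGS-GO + VD K=3 | 3DGS-GO + VD K=4 |
|---|---|---|---|---|---|---|---|---|
| 4 views | 13.55 | 22.71 | 24.10 | 23.79 | 13.23 | 23.98 | 24.06 | 24.00 |
| 6 views | 16.21 | 21.47 | 25.80 | 22.04 | 18.23 | 27.17 | 26.72 | 26.56 |
| 9 views | 18.66 | 28.06 | 28.06 | 28.71 | 24.16 | 29.06 | 28.45 | 28.54 |
| Mean | 16.14 | 24.08 | 25.99 | 24.84 | 18.54 | 26.74 | 26.41 | 26.36 |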

Ablation of K on MipNeRF360. K=3 provides the best mean across all view counts for ZipNeRF; K=2 peaks for 3DGS-GO but leaves residual floaters. We use K=3 as default.
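
The choice of K can be checked directly from the per-view numbers in the ablation. The snippet below simply averages the reported PSNR values over the three view counts and picks the best K per method; the dictionary layout and the helper name `best_k` are illustrative only.

```python
# PSNR (dB) at 4, 6, and 9 views, copied from the ablation table.
psnr = {
    "ZipNeRF + VD": {
        1: [13.55, 16.21, 18.66],
        2: [22.71, 21.47, 28.06],
        3: [24.10, 25.80, 28.06],
        4: [23.79, 22.04, 28.71],
    },
    "3DGS-GO + VD": {
        1: [13.23, 18.23, 24.16],
        2: [23.98, 27.17, 29.06],
        3: [24.06, 26.72, 28.45],
        4: [24.00, 26.56, 28.54],
    },
}

def best_k(per_k):
    """Return the K with the highest mean PSNR across view counts."""
    return max(per_k, key=lambda k: sum(per_k[k]) / len(per_k[k]))

best = {method: best_k(table) for method, table in psnr.items()}
print(best)
```

Means recomputed from the rounded per-view entries can differ from the table's Mean row in the last digit, which does not change the ranking.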

Results

VisDom on General-purpose NeRF & 3DGS Methods

We evaluate vanilla, +mask (silhouette loss only), and +VisDom on two NeRF methods and one 3DGS pipeline. This isolates VisDom's effect from any sparse-specific inductive biases.

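| Dataset | Views | Instant-NGP Vanilla | Instant-NGP +mask | Instant-NGP +VD | ZipNeRF Vanilla | ZipNeRF +mask | ZipNeRF +VD | 3DGS-GO Vanilla* | 3DGS-GO +VD |
|---|---|---|---|---|---|---|---|---|---|
| MipNeRF360 | 4 | 13.67 | 13.73 | 22.15 | 12.44 | 11.95 | 24.10 | 23.61 | 24.06 |
| MipNeRF360 | 6 | 14.73 | 17.36 | 24.14 | 11.89 | 12.78 | 25.80 | 26.30 | 26.72 |
| MipNeRF360 | 9 | 16.04 | 20.74 | 25.43 | 14.85 | 19.87 | 28.06 | 27.93 | 28.45 |
| Omni3D | 4 | 18.88 | 18.16 | 27.44 | 15.04 | 15.02 | 29.49 | 29.80 | 30.32 |
| Omni3D | 6 | 19.95 | 22.01 | 29.67 | 16.61 | 20.00 | 32.28 | 33.09 | 33.37 |
| Omni3D | 9 | 20.86 | 22.31 | 31.06 | 23.16 | 25.30 | 35.21 | 35.49 | 35.69 |
| ActorsHQ | 5 | 12.01 | 13.87 | 23.53 | 10.85 | 10.33 | 24.55 | 25.12 | 25.69 |
| ActorsHQ | 8 | 13.86 | 14.96 | 23.35 | 11.43 | 10.76 | 26.72 | 27.44 | 27.99 |
| ActorsHQ | 12 | 20.98 | 24.86 | 25.67 | 11.32 | 28.38 | 28.61 | 28.80 | 29.13 |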

PSNR ↑ on three datasets. Green = best per-method variant. +VD consistently wins over Vanilla and +mask. *Vanilla = GaussianObject initialization stage only.
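
The headline "~90% PSNR gain at 4 views" appears to correspond to the relative increase in PSNR of ZipNeRF + VD over vanilla ZipNeRF on MipNeRF360. The short calculation below makes that reading explicit; interpreting the gain as a ratio of dB values is an assumption, and since PSNR is logarithmic the figure is a summary statistic rather than a linear error reduction.

```python
def relative_gain_percent(baseline_db, improved_db):
    """Relative PSNR increase in percent, measured against the baseline value."""
    return 100.0 * (improved_db - baseline_db) / baseline_db

# MipNeRF360, 4 views: vanilla vs. +VD, values taken from the table above.
zipnerf_gain = relative_gain_percent(12.44, 24.10)
ingp_gain = relative_gain_percent(13.67, 22.15)
print(f"ZipNeRF: {zipnerf_gain:.1f}%  Instant-NGP: {ingp_gain:.1f}%")
```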

Comparison Against Sparse-specific Methods

VisDom is applied to CoR-GS and GaussianObject (GO) and benchmarked against state-of-the-art sparse NVS methods. CoR-GS+VD leads on MipNeRF360; 3DGS+VD leads on ActorsHQ. GO+VD achieves the best Omni3D mean. Importantly, 3DGS+VD trains in just 2 minutes per scene — 22× faster than GO — while remaining competitive.

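| Dataset | Cams | VaxNeRF | ZeroRF | SplatFields | FSGS | CoR-GS | GO | INGP+VD | ZipNeRF+VD | 3DGS+VD | CoR-GS+VD | GO+VD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Train time | | 2h | 2h | 1h | 14m | 10m | 45m | 20m | 40m | 2m ⚡ | 10m | 45m |
| MipNeRF360 | 4 | 18.14 | 14.17 | 22.24 | 23.38 | 24.04 | 24.02 | 22.15 | 24.10 | 24.06 | 24.64 | 24.16 |
| MipNeRF360 | 6 | 20.39 | 24.14 | 24.57 | 26.17 | 25.81 | 26.23 | 24.14 | 25.80 | 26.72 | 27.35 | 26.50 |
| MipNeRF360 | 9 | 21.53 | 27.78 | 26.58 | 28.16 | 28.58 | 27.94 | 25.43 | 28.06 | 28.45 | 29.32 | 28.14 |
| MipNeRF360 | Mean | 20.02 | 22.03 | 24.46 | 25.91 | 26.14 | 26.06 | 23.91 | 25.99 | 26.41 | 27.10 | 26.27 |
| Omni3D | 4 | 18.35 | 27.78 | 28.49 | 27.31 | 28.97 | 30.37 | 27.44 | 29.49 | 30.32 | 29.82 | 30.71 |
| Omni3D | 6 | 19.60 | 31.94 | 32.05 | 29.74 | 32.51 | 33.26 | 29.67 | 32.28 | 33.37 | 32.87 | 33.29 |
| Omni3D | 9 | 20.91 | 32.93 | 34.66 | 33.46 | 34.94 | 35.56 | 31.06 | 35.21 | 35.69 | 35.56 | 35.59 |
| Omni3D | Mean | 19.62 | 30.88 | 31.73 | 30.17 | 32.14 | 33.06 | 29.39 | 32.33 | 33.12 | 32.75 | 33.20 |
| ActorsHQ | 5 | 13.14 | 25.13 | 22.16 | 24.44 | 24.94 | 24.91 | 23.53 | 24.55 | 25.69 | 25.27 | 24.85 |
| ActorsHQ | 8 | 14.51 | 26.47 | 24.67 | 26.48 | 26.93 | 26.98 | 23.35 | 26.72 | 27.99 | 27.06 | 26.87 |
| ActorsHQ | 12 | 15.27 | 27.59 | 26.90 | 27.96 | 28.21 | 28.17 | 25.67 | 28.61 | 29.13 | 28.22 | 28.10 |
| ActorsHQ | Mean | 14.31 | 26.40 | 24.58 | 26.29 | 26.69 | 26.69 | 24.18 | 26.63 | 27.60 | 26.85 | 26.61 |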

PSNR ↑ comparison. Green = best, Blue = second best per row. Lavender-shaded columns = our VisDom variants. VaxNeRF and ZeroRF both train in 2h (shown as a grouped label).

Qualitative Results

Visual Hull Improvement with the Visible Domain Constraint

As the minimum visibility threshold K increases, the visible domain tightens, carving out more ambiguous space. The traditional visual hull (K=1) retains large, poorly constrained regions; VisDom at K=3 yields a compact, reliable shape.

[Figure: shared camera setup. Top row (Visible Domain): K = 1, K = 2, K = 3. Bottom row (Visual Hull): K = 1 (traditional), K = 2, K = 3 (VisDom).]

Top row: the visible domain (region jointly observed by at least K cameras) shrinks as K increases. Bottom row: the resulting visual hull becomes progressively tighter and more accurate. The traditional hull (K=1) is overly permissive; VisDom at K=3 carves out all ambiguous space.

ZipNeRF: Silhouette Mask vs. VisDom Constraint

Adding a silhouette mask loss alone is insufficient, and sometimes harmful, at low camera counts. VisDom restricts ray sampling to the jointly visible region, enabling faithful reconstruction from very few views.

[Videos: ZipNeRF + mask only vs. ZipNeRF + VisDom, shown as side-by-side pairs at 5 cameras (one scene), 8 cameras (two scenes), and 12 cameras (two scenes).]

Each pair shows a 360° render from ZipNeRF trained with silhouette mask only (left) vs. with VisDom (right). Floaters and geometry collapse are eliminated by VisDom across all camera counts.

3DGS-GO: Vanilla vs. VisDom Constraint

VisDom guides Gaussian placement during optimization, suppressing floaters that 3DGS-GO places in ambiguous free space. The improvement is most pronounced at 4 cameras and remains consistent as coverage increases.

[Videos: 3DGS-GO vs. 3DGS-GO + VisDom, shown as side-by-side pairs at 4, 6, and 9 cameras.]

360° renders from 3DGS-GO (left) vs. 3DGS-GO + VisDom (right) at 4, 6, and 9 input cameras. VisDom removes Gaussian floaters by restricting placement to the jointly-visible domain.

Conclusion & Limitations

We presented VisDom, a learning-free geometric constraint that tightens the classical visual hull by restricting reconstruction to the region jointly visible in at least K views. A key finding is that silhouette supervision alone is insufficient at extreme sparsity — and can actively harm convergence — because the resulting visual hull is too large. VisDom resolves this by enforcing K-view co-visibility, removing ambiguous volume that silhouettes cannot resolve. The constraint adds only a 2-second preprocessing step and zero learned parameters, and integrates with any NeRF or 3DGS pipeline via a single modification.

Across five reconstruction frameworks and three real-world datasets, VisDom consistently improves quality — enabling general-purpose methods that completely fail without it (up to ~90% PSNR gain at 4 views), advancing sparse reconstruction models, and delivering competitive results without any additional learned parameters.

Limitations. In scenes with strong inter-view lighting variation (e.g., MipNeRF360), geometric constraints alone are insufficient without a generative prior. Below 4 views, silhouettes become too sparse for faithful reconstruction; combining VisDom with pre-trained models in this extreme regime is a natural future direction.

Citation

@misc{gladkova2026visdomsparsenovelview,
  title={VisDom: Sparse Novel View Synthesis with Visible Domain Constraint},
  author={Mariia Gladkova* and Tarun Yenamandra* and Edmond Boyer and Robert Maier and Tony Tung and Daniel Cremers},
  year={2026},
  eprint={2606.20531},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2606.20531},
}