Diffusion-Guided Relighting for Single-Image SVBRDF Estimation
Proceedings of SIGGRAPH Asia 2025
-
Youxin Xing
Shandong University
-
Zheng Zeng
University of California, Santa Barbara
-
Youyang Du
Shandong University
-
Lu Wang†
Shandong University
-
Beibei Wang†
Nanjing University
Abstract
Recovering high-fidelity spatially varying bidirectional reflectance distribution function (SVBRDF) maps from a single image remains an ill-posed and challenging problem, especially in the presence of saturated highlights. Existing methods often fail to reconstruct the underlying texture in regions overwhelmed by intense specular reflections. Such bake-in artifacts caused by highlight corruption can be greatly alleviated by providing a series of images of the material under different lighting conditions. To this end, our key insight is to leverage the strong priors of diffusion models to generate images of the same material under varying lighting conditions. These generated images are then used to aid a multi-image SVBRDF estimator in recovering highlight-free reflectance maps. However, strong highlights in the input image lead to inconsistencies across the relighting results. Moreover, texture reconstruction becomes unstable in saturated regions, with variations in background structure, specular shape, and overall material color. These artifacts degrade the quality of SVBRDF recovery. To address these issues, we propose a shuffle-based background consistency module that extracts stable background features and implicitly identifies saturated regions. This guides the diffusion model to generate coherent content while preserving material structures and details. Furthermore, to stabilize the appearance of generated highlights, we introduce a lightweight specular prior encoder that estimates highlight features and then performs grid-based latent feature translation, injecting consistent specular contour priors while preserving material color fidelity. Both quantitative analysis and qualitative visualization demonstrate that our method enables stable neural relighting from a single image and can be seamlessly integrated into multi-input SVBRDF networks to estimate highlight-free reflectance maps.
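As a concrete, purely illustrative picture of the grid-based latent feature translation mentioned above, the short PyTorch sketch below shifts a latent highlight-feature map toward a novel light position by resampling it on a shifted grid. The function name, tensor shapes, and the way light positions are parameterized are assumptions made here for illustration, not the paper's implementation.

# Hypothetical sketch: translate latent highlight features toward a novel light
# position by resampling them on a shifted grid. Shapes and conventions are
# assumptions for illustration only.
import torch
import torch.nn.functional as F

def translate_specular_latent(spec_feat, src_light_xy, dst_light_xy):
    # spec_feat: (B, C, H, W) latent highlight features.
    # src_light_xy, dst_light_xy: (B, 2) light positions projected onto the
    # normalized image plane, with coordinates in [-1, 1].
    B, _, H, W = spec_feat.shape
    ys, xs = torch.meshgrid(
        torch.linspace(-1.0, 1.0, H, device=spec_feat.device),
        torch.linspace(-1.0, 1.0, W, device=spec_feat.device),
        indexing="ij",
    )
    base = torch.stack([xs, ys], dim=-1).unsqueeze(0).expand(B, H, W, 2)
    # Moving the highlight toward the new light position means sampling the
    # source features at locations shifted by -(dst - src).
    grid = base - (dst_light_xy - src_light_xy).view(B, 1, 1, 2)
    return F.grid_sample(spec_feat, grid, mode="bilinear",
                         padding_mode="zeros", align_corners=True)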
Pipeline
Overview of our training and inference pipeline. Training: Given a batch of input images, we extract stable background features using the proposed (b) shuffle-based background consistency module (Section 4.1). In parallel, we use (c) a specular prior reuse strategy to translate the highlight features 𝑙0 (encoded from the input 𝐼0) to novel lighting positions (Section 4.2). These two types of features are fused via a channel attention mechanism (AFS [Guo et al. 2021]) and injected into ControlNet [Zhang et al. 2023], providing structured and disentangled guidance for diffusion-based generation. Additionally, explicit information about the light vector, view vector, and their half vector is embedded into the cross-attention layers of the diffusion model via (d) an IP-Adapter mechanism [Ye et al. 2023], enabling precise and controllable illumination conditioning. Inference: Our method starts from a single input image 𝐼. For each novel light and view position, we use (a) the consistency material encoder (without feature shuffling) to extract features, which guide the diffusion model to generate diverse neural material relighting results. Feeding these results into (e) a multi-input SVBRDF estimator [Luo et al. 2024b] enables the reconstruction of high-quality SVBRDF maps.
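The data flow above can be summarized in the hedged pseudo-PyTorch sketch below; every module name and signature is a placeholder invented for illustration (the released code may be organized quite differently), and only the ordering of the steps follows the figure.

# Hypothetical outline of one training step. Placeholder modules stand in for
# the components in the figure: (b) background encoder, (c) specular prior
# reuse, AFS fusion, ControlNet guidance, and (d) IP-Adapter conditioning.
import torch.nn.functional as F

def training_step(batch, modules, diffusion):
    I0, I_targets, light, view = batch            # input image, relit targets, conditions
    half = F.normalize(light + view, dim=-1)      # half vector for illumination conditioning

    # (b) Shuffle-based background consistency module: stable background features.
    bg_feat = modules.background_encoder(I_targets, shuffle=True)

    # (c) Specular prior reuse: encode highlight features l0 from I0 and
    # translate them to the novel lighting positions (cf. the grid sketch above).
    l0 = modules.specular_prior_encoder(I0)
    spec_feat = modules.translate_specular(l0, light)

    # Fuse the two feature streams with channel attention (AFS) and feed the
    # result to ControlNet as structured, disentangled guidance.
    control = modules.afs_fusion(bg_feat, spec_feat)

    # (d) IP-Adapter-style conditioning: embed light/view/half vectors into the
    # cross-attention layers of the diffusion model.
    cond = modules.ip_adapter(light, view, half)

    # Standard latent-diffusion denoising loss on the relit targets.
    return diffusion.denoising_loss(I_targets, control=control, cond=cond)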
Architecture of our proposed shuffle-based background consistency module. The figure illustrates the full pipeline for extracting stable background features (left), which consists of the HA-Branch and the MD-Branch. It also depicts the intra-batch shuffling process, used only during training, which improves learning capacity and enforces consistency across features extracted under different lighting conditions (middle). The right part shows the details of the MD convolution. Here, ⊗ denotes element-wise multiplication and IN denotes instance normalization; X𝑙 and X𝑙+1 denote the input and output of the 𝑙-th layer, respectively.
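A minimal sketch of two of the ingredients described above, intra-batch feature shuffling (training only) and a masked MD-style convolution followed by instance normalization, is given below. It is a simplification under assumed shapes, and the class and function names are placeholders rather than the actual HA-Branch/MD-Branch implementation.

# Hypothetical simplification: (1) intra-batch shuffling permutes features
# across batch entries (images of the same material under different lightings)
# so the extracted background features must stay consistent; (2) an MD-style
# block applies a learned soft mask by element-wise multiplication (the ⊗ in
# the figure) followed by instance normalization (IN).
import torch
import torch.nn as nn

def intra_batch_shuffle(features):
    # features: (B, C, H, W); applied only during training.
    perm = torch.randperm(features.shape[0], device=features.device)
    return features[perm]

class MDConvBlock(nn.Module):
    # Roughly: X_{l+1} = IN(conv(X_l) ⊗ mask(X_l)).
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.mask = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),  # soft mask that can down-weight saturated regions
        )
        self.norm = nn.InstanceNorm2d(channels)

    def forward(self, x):
        return self.norm(self.conv(x) * self.mask(x))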
Comparison on Neural Material Relighting
Visual comparison of neural material relighting on real and synthetic datasets. We compare our method against Bieron et al. [2023] under view/lighting settings consistent with those used during training. For synthetic data, LPIPS scores are reported below each image, with the lowest value highlighted in bold.
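For reference, LPIPS values of this kind can be computed with the publicly available lpips package as in the snippet below; the AlexNet backbone and the file names are assumptions, not necessarily the configuration used for the reported numbers.

# Example of computing an LPIPS distance between a relit result and its ground
# truth with the public `lpips` package. Backbone choice and paths are placeholders.
import lpips
import torch
from torchvision.io import read_image

loss_fn = lpips.LPIPS(net="alex")

def load(path):
    img = read_image(path).float() / 255.0   # (3, H, W) in [0, 1]
    return img.unsqueeze(0) * 2.0 - 1.0      # LPIPS expects inputs in [-1, 1]

with torch.no_grad():
    score = loss_fn(load("ours_relit.png"), load("ground_truth.png"))
print(f"LPIPS: {score.item():.4f}")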
Improvements in Single-Image SVBRDF Recovery
Visual comparison of SVBRDFs and re-rendered results between our method, MatFusion [Sartor and Peers 2023], and LGD [Luo et al. 2024a] on real captured data. Our method effectively avoids specular bake-in artifacts in the SVBRDFs and produces clean renderings under novel lighting conditions.
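For context, the re-rendered results are obtained by evaluating the recovered SVBRDF maps under novel point lighting. The sketch below uses the Cook-Torrance/GGX shading model that is standard in single-image SVBRDF work and may differ in details from the renderer actually used for these figures.

# Hedged sketch of re-rendering an SVBRDF (albedo, normal, roughness, specular
# maps) under one point light with a GGX microfacet BRDF. Standard formulation,
# not necessarily the exact renderer used in the paper; all maps are (H, W, 3)
# or (H, W, 1) tensors, and l, v are normalized per-pixel directions.
import torch
import torch.nn.functional as F

def ggx_render(albedo, normal, roughness, specular, l, v):
    h = F.normalize(l + v, dim=-1)
    n_dot_l = (normal * l).sum(-1, keepdim=True).clamp(min=1e-6)
    n_dot_v = (normal * v).sum(-1, keepdim=True).clamp(min=1e-6)
    n_dot_h = (normal * h).sum(-1, keepdim=True).clamp(min=1e-6)
    v_dot_h = (v * h).sum(-1, keepdim=True).clamp(min=1e-6)

    a2 = (roughness ** 2) ** 2                                        # alpha^2, with alpha = roughness^2
    D = a2 / (torch.pi * ((n_dot_h ** 2) * (a2 - 1.0) + 1.0) ** 2)    # GGX normal distribution
    k = (roughness ** 2) / 2.0
    G = (n_dot_l / (n_dot_l * (1 - k) + k)) * (n_dot_v / (n_dot_v * (1 - k) + k))  # Smith shadowing
    F_term = specular + (1.0 - specular) * (1.0 - v_dot_h) ** 5       # Schlick Fresnel

    diffuse = albedo / torch.pi
    spec = D * G * F_term / (4.0 * n_dot_l * n_dot_v)
    return (diffuse + spec) * n_dot_l                                 # outgoing radiance for a unit-intensity light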
Visual comparison of SVBRDFs and re-rendered results on real and synthetic datasets. We compare our method with Bieron et al. [2023] under view/lighting settings consistent with those used during training. For synthetic data, LPIPS scores are reported below each image, with the lowest and second-lowest values highlighted in bold and underlined, respectively.
Ablation Study
Visual comparison of our full method and ablated variants for neural relighting.
Citation
Acknowledgements
The website template was borrowed from BakedSDF.