Haidong Zhu1,* | Tianyu Ding2,*,† | Tianyi Chen2 | Ilya Zharkov2 | Ram Nevatia1 | Luming Liang2,† |
1University of Southern California | 2Microsoft |
Novel view synthesis for novel scenes using ONE reference view on Shiny, LLFF, and MVImgNet (top to bottom). Each pair of images corresponds to the results from GNT (left) and CaesarNeRF (right).
Generalizability and few-shot learning are key challenges in Neural Radiance Fields (NeRF), often due to the lack of a holistic understanding in pixel-level rendering. We introduce CaesarNeRF, an end-to-end approach that leverages scene-level CAlibratEd SemAntic Representation along with pixel-level representations to advance few-shot, generalizable neural rendering, facilitating a holistic understanding without compromising high-quality details. CaesarNeRF explicitly models pose differences of reference views to combine scene-level semantic representations, providing a calibrated holistic understanding. This calibration process aligns various viewpoints with precise location and is further enhanced by sequential refinement to capture varying details. Extensive experiments on public datasets, including LLFF, Shiny, mip-NeRF 360, and MVImgNet, show that CaesarNeRF delivers state-of-the-art performance across varying numbers of reference views, proving effective even with a single reference image.
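The abstract does not spell out how the pose-based calibration is computed. As a rough illustration only, and not the authors' implementation, the sketch below assumes each reference view produces a flat scene-level feature that can be read as a stack of 3-vectors, rotates those vectors by the relative camera rotation into the target view's frame, and averages the results. The function names (`rot_z`, `calibrate`, `fuse`), the camera-to-world rotation convention, and the averaging step are all assumptions made for this example.

```python
import math

def rot_z(theta):
    """3x3 rotation about the z-axis; a stand-in for a camera-to-world rotation."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def transpose3(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]

def matmul3(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def calibrate(feature, r_ref, r_tgt):
    """Rotate a flat feature, read as consecutive 3-vectors, from the
    reference camera frame into the target camera frame (hypothetical scheme)."""
    if len(feature) % 3 != 0:
        raise ValueError("feature length must be divisible by 3")
    # Relative rotation reference -> target, assuming camera-to-world inputs.
    rel = matmul3(transpose3(r_tgt), r_ref)
    out = []
    for i in range(0, len(feature), 3):
        v = feature[i:i + 3]
        out.extend(sum(rel[r][k] * v[k] for k in range(3)) for r in range(3))
    return out

def fuse(features, r_refs, r_tgt):
    """Average the calibrated per-view features into one scene-level vector."""
    calibrated = [calibrate(f, r, r_tgt) for f, r in zip(features, r_refs)]
    n = len(calibrated)
    return [sum(col) / n for col in zip(*calibrated)]

# Toy check: two views observe the same world-frame vectors from different poses.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
world = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]
r1, r2, r_tgt = rot_z(0.3), rot_z(-1.1), rot_z(0.7)
f1 = calibrate(world, IDENTITY, r1)   # world vectors expressed in view 1
f2 = calibrate(world, IDENTITY, r2)   # world vectors expressed in view 2
fused = fuse([f1, f2], [r1, r2], r_tgt)
expected = calibrate(world, IDENTITY, r_tgt)
```

In this toy setup the two per-view features disagree before calibration but coincide once both are rotated into the target frame, so their average matches the target-frame representation. That is the intuition behind aligning viewpoints before combining them.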
We incorporate holistic scene understanding alongside pixel-level rendering for neural radiance fields.
IBRNet | GPNR | NeuRay | Ours | Ground-truth |
@article{zhu2023caesarnerf,
author = {Zhu, Haidong and Ding, Tianyu and Chen, Tianyi and Zharkov, Ilya and Nevatia, Ram and Liang, Luming},
title = {CaesarNeRF: Calibrated Semantic Representation for Few-shot Generalizable Neural Rendering},
journal = {arXiv preprint arXiv:2311.15510},
year = {2023},
}