SAMPLING: Scene-adaptive Hierarchical Multiplane Images Representation for Novel View Synthesis from a Single Image

ICCV 2023

Xiaoyu Zhou1, Zhiwei Lin1, Xiaojun Shan1, Yongtao Wang1, Deqing Sun2, Ming-Hsuan Yang3

1Wangxuan Institute of Computer Technology, Peking Univerisity, 2 Google Research, 3University of California, Merced


Recent novel view synthesis methods obtain promising results for relatively small scenes, e.g., indoor environments and scenes with a few objects, but tend to fail for unbounded outdoor scenes with a single image as input. In this paper, we introduce SAMPLING, a Scene-adaptive Hierarchical Multiplane Images Representation for Novel View Synthesis from a Single Image based on improved multiplane images (MPI). Observing that depth distribution varies significantly for unbounded outdoor scenes, we employ an adaptive-bins strategy for MPI to arrange planes in accordance with each scene image. To represent intricate geometry and multi-scale details, we further introduce a hierarchical refinement branch, which results in high-quality synthesized novel views. Our method demonstrates considerable performance gains in synthesizing large-scale unbounded outdoor scenes using a single image on the KITTI dataset and generalizes well to the unseen Tanks and Temples dataset. The code and models will be made public.

Material selection evaluation on real data

An array of results on the real image dataset for our method and the baselines and ablations are presented on this webpage:

Material selection in videos

Given a on the first frame of, our method can be applied to select the material at the query point in each frame. Note how the selections are robust to lighting variations including specularity and shadows.

First frame with query in red
Input video
Predicted material selection
Output scores

Demo video: Enabling multiple material selection

To empower artists, in an interactive demo, we allow users to select multiple positive (first video) and negative (second video) query points. The resulting score map for positive query points are combined by taking the maximum of the individually predicted similarity scores for each pixel, and thresholded by user defined value in [0, 1]. The predicted scores corresponding to negative samples are combined by computing a per-pixel maximum across all predicted scores of negative regions, and are then thresholded by the user using a separate threshold value. The intersection of the resulting mask with the mask computed using positive query points is removed from the final selection.

Multiple selection

Negative selection

Image editing

Outputs from our method can be used as input for material selection based image editing using Photoshop, Image-Based Material Editing (Khan et al. [2006]), and Stable Diffusion Inpainting [Rombach et al . 2021].

Results on grayscale images

We further evaluate our method on grayscale images and see that if textures are clearly distinct, our method can select the relevant regions despite the lack of color, showing it also considers texture to make its selection.

Selection consistency

The method produces consistent segmentation masks for different pixel selections within image regions that belong to the same material. In the query image, we show 5 different pixel selections (marked in different colors) with the resulting masks overlayed with the respective color.

Robustness to lighting variation

Our method is robust to lighting variations including specularity and shadows. The first row shows all the input images. There is a query selected in the first image with a selection marked with a red square. The selected pixel is at the center of the red square. The query embedding at the selection at the red square is used to select materials in subsequent images. The results show the robustness of our method to different lighting scenarios.

We also present the same experiment to demonstrate the robustness of our method to different lighting scenarios using the Multi-Illumination Dataset by Murmann et al.

Analysis: Albedo analysis

To analyze the behavior of the model with respect to changing albedo, we change the hue and saturation value on a diffuse sphere. The hue is sampled in range [0, 2*pi] and saturation is sampled in [0, 1]. We select the central pixel on a grid of spheres with varying albedo. The scores are thresholded at 0.5 which results in selections of regions in spheres with neighboring albedo. As expected, our model selects sphere with colors closer to the selected one first. As we vary the threshold, the selection becomes limited to the central sphere (higher threshold > 0.9), or extends to further spheres (lower threshold).

Analysis: Blending between materials

We now evaluate the sensitivity of our model to gradual material changes. To do so we render nine spheres covered by a blend of two different materials, a stone wall and roof tiles. We use the Blender "shader mix" node to interpolate between the SVBRDFs and apply a different mixing factor for each sphere, this mixing is shown below. Where 1.0 means 100% wall and 0.0 means 100% tiles.

We then proceed to select similar materials based on a query pixel on each extreme case. As we can see our method selects materials that are close to the query but not exactly similar and discriminates well when the mix of materials is visually far from the query.