We propose VISTA+, a synthetic data generation pipeline based on single-image novel view synthesis (NVS). The method separates the 3D scene into foreground and background, representing the background as a 3D textured mesh reconstructed after inpainting the foreground. Foreground images are generated by rendering CAD assets in Blender and are then composited with the background to create new images. We address inpainting artifacts on non-smooth foreground objects, particularly roads, to improve the quality of the novel-view images. Novel views are rendered by selecting the captured images spatially closest to the required poses, which makes the method well suited to scenarios where vehicles repeatedly traverse the same road sections, such as buses and Robotaxis. We validate the quality of the synthesized images and demonstrate that the synthetic data improves downstream detection tasks on a public dataset.
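The nearest-pose selection step can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes camera poses are represented by their 3D positions and that plain Euclidean distance is the selection criterion (the hypothetical helper name `select_closest_image` is ours).

```python
import numpy as np

def select_closest_image(captured_positions, target_position):
    """Return the index of the captured image whose camera position is
    spatially closest to the requested novel-view position."""
    captured = np.asarray(captured_positions, dtype=float)  # shape (N, 3)
    target = np.asarray(target_position, dtype=float)       # shape (3,)
    # Euclidean distance from each captured camera to the target pose
    dists = np.linalg.norm(captured - target, axis=1)
    return int(np.argmin(dists))

# Example: three captured camera positions along a road (x, y, z in meters)
positions = [(0.0, 0.0, 1.5), (5.0, 0.2, 1.5), (10.0, -0.1, 1.5)]
# A requested novel-view pose shifted roughly 1 m laterally from the second capture
idx = select_closest_image(positions, (5.0, 1.2, 1.5))
print(idx)  # → 1
```

In a repeated-route setting (buses, Robotaxis), the pool of captured positions grows with each pass, so the closest capture to any requested pose tends to be very close, which is what makes this simple criterion effective.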
We show an example of video generation on the nuScenes dataset.
The front-view camera is moved 1 m and 2 m to the left, and 1 m and 2 m to the right.
NVS images among multi-view images. After inpainting the original foreground objects, new vehicles are added in the novel views.
We propose VISTA+, a synthetic data pipeline based on single-image novel view reconstruction that enables real-time image generation. We evaluate the NVS method on the KITTI and nuScenes datasets, and we evaluate its effect on downstream tasks on nuScenes. The results show strong NVS performance as well as consistent gains on downstream detection tasks. Our validation also revealed limitations: the method does not attempt to reconstruct the whole scene, and content missing in invisible areas at the margins of the novel-view images is not recovered. Nevertheless, the approach is surprisingly simple to apply to downstream detection tasks, giving it something of an AK-47 character: simple, robust, and powerful. We hope this scheme can inspire the community.