GeoWorld: Unlocking the Potential of Geometry Models to Facilitate High-Fidelity 3D Scene Generation

¹Nankai University, ²ByteDance Inc., ³Renmin University of China
Corresponding author
TL;DR: GeoWorld obtains high-quality, full-frame geometric features, unlocking the potential of the geometry model and improving both geometric consistency and visual clarity.

GeoWorld synthesizes novel-view images along arbitrary camera trajectories; these images can then be used to reconstruct high-quality 3D scenes. It produces high-quality results even under large viewpoint changes.

Video Results under Different Trajectories

Comparisons to other methods

[Figure: side-by-side comparison of DimensionX, See3D, FlexWorld, and GeoWorld (ours). Rows: novel view synthesis, 3DGS render.]

Abstract

Previous works that leverage video models for image-to-3D scene generation tend to suffer from geometric distortions and blurry content. In this paper, we renovate the image-to-3D scene generation pipeline by unlocking the potential of geometry models, and present GeoWorld. Instead of exploiting geometric information obtained from a single-frame input, we propose to first generate consecutive video frames and then use the geometry model to extract full-frame geometry features, which contain richer information than the single-frame depth maps or camera embeddings used in previous methods. These geometry features then serve as geometric conditions that aid the video generation model. To enhance the consistency of geometric structures, we further propose a geometry alignment loss, which provides the model with real-world geometric constraints, and a geometry adaptation module, which ensures that the geometry features are used effectively. Extensive experiments show that GeoWorld can generate high-fidelity 3D scenes from a single image and a given camera trajectory, outperforming prior methods both qualitatively and quantitatively.
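The exact form of the geometry alignment loss is not given on this page. As a purely illustrative sketch, one common way to align two sets of features is to penalize the cosine distance between each predicted feature vector and its reference counterpart. The function names, the flat-list feature representation, and the choice of cosine distance below are all assumptions made for illustration, not the formulation used by GeoWorld.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def geometry_alignment_loss(
    predicted: Sequence[Sequence[float]],
    reference: Sequence[Sequence[float]],
) -> float:
    """Hypothetical alignment loss: mean (1 - cosine similarity) over paired features.

    `predicted` stands in for features derived from the generated frames and
    `reference` for features of real-world frames; both are assumptions.
    """
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("expected two non-empty feature lists of equal length")
    total = sum(1.0 - cosine_similarity(p, r) for p, r in zip(predicted, reference))
    return total / len(predicted)
```

Under this assumed form, the loss is zero when every predicted feature points in the same direction as its reference and grows as the two diverge.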

[Figure: overview image]

Methods

GeoWorld employs a pipeline that differs from previous methods. It first runs a geometry condition generation procedure to obtain condition views, from which rich geometric information is then extracted. Combined with the geometry alignment loss and the geometry adaptation module, this unlocks the potential of the geometry model and yields results with clear geometry and sharp visual content.
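The data flow described above can be sketched as a small orchestration class. This is only a schematic under stated assumptions: the class name `GeoPipeline`, its four callables, and the list-of-floats stand-ins for images, camera poses, and feature maps are hypothetical placeholders, not GeoWorld's actual interfaces.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Frame = List[float]    # stand-in for an image (assumption)
Feature = List[float]  # stand-in for a per-frame geometry feature map (assumption)
Pose = float           # stand-in for one camera pose on the trajectory (assumption)


@dataclass
class GeoPipeline:
    """Hypothetical wiring of the stages named in the text."""

    # Stage 1: geometry condition generation (input image + trajectory -> condition views).
    generate_condition_views: Callable[[Frame, Sequence[Pose]], List[Frame]]
    # Stage 2: geometry model applied to all condition views (full-frame features).
    extract_geometry: Callable[[List[Frame]], List[Feature]]
    # Stage 3: geometry adaptation module applied to each feature.
    adapt: Callable[[Feature], Feature]
    # Stage 4: video generation conditioned on the adapted geometry features.
    generate_video: Callable[[Frame, Sequence[Pose], List[Feature]], List[Frame]]

    def run(self, image: Frame, trajectory: Sequence[Pose]) -> List[Frame]:
        views = self.generate_condition_views(image, trajectory)
        features = self.extract_geometry(views)
        if len(features) != len(views):
            raise ValueError("expected one geometry feature per condition view")
        conditions = [self.adapt(f) for f in features]
        return self.generate_video(image, trajectory, conditions)
```

The point of the sketch is the ordering: geometry features are extracted from every generated condition view rather than from the single input image, and only then passed to the final video generation step.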

[Figure: pipeline image]

Visual results

Our GeoWorld is capable of producing high-quality videos under various camera trajectories.
Thanks to the reliable geometric consistency across frames, the 3DGS renderings also show high-quality visual results.

BibTeX