[CVPR 2024] GenNBV: Generalizable Next-Best-View Policy for Active 3D Reconstruction

GenNBV: Generalizable Next-Best-View Policy for Active 3D Reconstruction

CVPR 2024
Xiao Chen, Quanyi Li, Tai Wang, Tianfan Xue, Jiangmiao Pang
Shanghai AI Laboratory | The Chinese University of Hong Kong

arXiv

📋 Contents

  1. About
  2. Getting Started
  3. Model and Benchmark
  4. Citation
  5. License

🏠 About

While recent advances in neural radiance fields enable realistic digitization of large-scale scenes, the image-capturing process is still time-consuming and labor-intensive. Previous works attempt to automate this process using a Next-Best-View (NBV) policy for active 3D reconstruction. However, existing NBV policies heavily rely on hand-crafted criteria, limited action spaces, or per-scene optimized representations. These constraints limit their cross-dataset generalizability. To overcome them, we propose GenNBV, an end-to-end generalizable NBV policy. Our policy adopts a reinforcement learning (RL)-based framework and extends the typical limited action space to 5D free space. It empowers our agent drone to scan from any viewpoint, and even to interact with geometries unseen during training. To boost cross-dataset generalizability, we also propose a novel multi-source state embedding, including geometric, semantic, and action representations. We establish a benchmark using the Isaac Gym simulator with the Houses3K and OmniObject3D datasets to evaluate this NBV policy. Experiments demonstrate that our policy achieves coverage ratios of 98.26% and 97.12% on unseen building-scale objects from these two datasets, respectively, outperforming prior solutions.

📚 Getting Started

Installation

We tested our code in the following environment:

  • Ubuntu 20.04
  • NVIDIA Driver: 545.29.02
  • CUDA 11.3
  • Python 3.8.12
  • PyTorch 1.11.0+cu113
  • PyTorch3D 0.7.5
  1. Clone this repository.
git clone https://github.com/zjwzcx/GenNBV
cd GenNBV
  2. Create an environment and install PyTorch.
conda create -n gennbv python=3.8 -y  # pytorch3d needs python>3.7
conda activate gennbv
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
  3. Install NVIDIA Isaac Gym (download from https://developer.nvidia.com/isaac-gym/download).
cd isaacgym/python
pip install -e .
  4. Install GenNBV.
pip install -r requirements.txt
pip install -e .
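After installation, a quick sanity check can confirm that the installed packages match the tested versions listed above. The helper below is a stdlib-only sketch (not part of the repository); it compares version strings such as those reported by `torch.__version__` and `pytorch3d.__version__`:

```python
def parse_version(version: str) -> tuple:
    """Turn a version string like '1.11.0+cu113' into (1, 11, 0)."""
    core = version.split("+")[0]  # drop the local build tag, e.g. '+cu113'
    return tuple(int(part) for part in core.split("."))

# Versions this repository was tested against (from the list above).
TESTED = {"torch": "1.11.0+cu113", "pytorch3d": "0.7.5"}

def matches_tested(name: str, installed: str) -> bool:
    """True when the installed (major, minor) matches the tested version."""
    return parse_version(installed)[:2] == parse_version(TESTED[name])[:2]

print(matches_tested("torch", "1.11.0+cu113"))  # True
```

In practice you would pass `torch.__version__` as the `installed` argument; an exact match is not required, but staying within the same minor series avoids ABI surprises with Isaac Gym.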

Data Preparation

We provide all the preprocessed data used in our work, including mesh files and ground-truth surface points. We recommend downloading it from the Google Drive link [HERE].

The directory structure should be as follows:

gennbv
├── active_reconstruction
├── data_gennbv
│   ├── houses3k
│   │   ├── gt
│   │   ├── obj
│   │   ├── urdf
│   ├── omniobject3d
│   ├── ...

Training

Run the following command to reproduce the training setup of GenNBV:

python active_reconstruction/train/train_gennbv_houses3k.py --sim_device=cuda:0 --num_envs=256 --stop_wandb=True

We recommend Weights & Biases (wandb) for analyzing the training logs. To use wandb with our codebase, paste your wandb API key into wandb_utils/wandb_api_key_file.txt, then launch training with:

python active_reconstruction/train/train_gennbv_houses3k.py --sim_device=cuda:0 --num_envs=256 --stop_wandb=False
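The API-key file mentioned above can be created in a few lines of stdlib Python; `"YOUR_WANDB_API_KEY"` is a placeholder to replace with your real key from the wandb account settings page:

```python
from pathlib import Path

# Write the wandb API key to the file the codebase reads.
key_file = Path("wandb_utils/wandb_api_key_file.txt")
key_file.parent.mkdir(parents=True, exist_ok=True)
key_file.write_text("YOUR_WANDB_API_KEY\n")  # placeholder: paste your real key
```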

Customized Training Environments

To customize a new training environment, create your environment and configuration files under active_reconstruction/env, then register the task in active_reconstruction/__init__.py.
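Since GenNBV builds on a Legged Gym-style codebase, task registration likely follows a name-to-(environment, config) registry pattern. The sketch below is illustrative only; the actual API in active_reconstruction/__init__.py may differ, and every class and task name here is hypothetical:

```python
# Minimal registry mapping a task name to an environment class and its config.
class TaskRegistry:
    def __init__(self):
        self._tasks = {}

    def register(self, name, env_cls, cfg):
        self._tasks[name] = (env_cls, cfg)

    def make(self, name):
        env_cls, cfg = self._tasks[name]
        return env_cls(cfg)

class MyReconEnv:
    """Stand-in for a new environment file under active_reconstruction/env."""
    def __init__(self, cfg):
        self.cfg = cfg

registry = TaskRegistry()
registry.register("my_recon_task", MyReconEnv, {"num_envs": 256})
env = registry.make("my_recon_task")
print(env.cfg["num_envs"])  # 256
```

The config dictionary here stands in for the configuration file the repository expects; keeping the environment class and its config registered under one task name lets the training script look both up by the task's string identifier.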

📝 TODO List

  • Release the paper and training code.
  • Release preprocessed dataset.
  • Release the evaluation scripts.

📦 Model and Benchmark

Model Overview

Benchmark Overview

🔗 Citation

If you find our work helpful, please cite it:

@inproceedings{chen2024gennbv,
  title={GenNBV: Generalizable Next-Best-View Policy for Active 3D Reconstruction},
  author={Chen, Xiao and Li, Quanyi and Wang, Tai and Xue, Tianfan and Pang, Jiangmiao},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2024},
}

If you use the preprocessed datasets (Houses3K and OmniObject3D), please cite them:

@inproceedings{peralta2020next,
  title={Next-best view policy for 3d reconstruction},
  author={Peralta, Daryl and Casimiro, Joel and Nilles, Aldrin Michael and Aguilar, Justine Aletta and Atienza, Rowel and Cajote, Rhandley},
  booktitle={Computer Vision--ECCV 2020 Workshops: Glasgow, UK, August 23--28, 2020, Proceedings, Part IV 16},
  pages={558--573},
  year={2020},
  organization={Springer}
}
@inproceedings{wu2023omniobject3d,
  title={Omniobject3d: Large-vocabulary 3d object dataset for realistic perception, reconstruction and generation},
  author={Wu, Tong and Zhang, Jiarui and Fu, Xiao and Wang, Yuxin and Ren, Jiawei and Pan, Liang and Wu, Wayne and Yang, Lei and Wang, Jiaqi and Qian, Chen and others},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={803--814},
  year={2023}
}

We are grateful to the authors of Legged Gym (https://github.com/leggedrobotics/legged_gym) for their codebase.

📄 License

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
