[NeurIPS 2024 D&B] VideoGUI: A Benchmark for GUI Automation from Instructional Videos

showlab/VideoGUI

VideoGUI: A Benchmark for GUI Automation from Instructional Videos

Kevin Qinghong Lin, Linjie Li, Difei Gao, Qinchen Wu, Mingyi Yan, Zhengyuan Yang, Lijuan Wang, Mike Zheng Shou

Project Website

📢 News

  • [2024.6] We released the arXiv paper.
  • [2024.9] Accepted by NeurIPS 2024 D&B.
  • [2024.10] We released the data as a Hugging Face dataset. Please stay tuned for further updates.

📝 TODO

  • Upload the evaluation code and metric implementation.
  • Upload the missing metadata.

📖 Introduction

TL;DR: A Multi-modal Benchmark for Visual-centric GUI Automation from Instructional Videos.

Visual-centric software and tasks: VideoGUI focuses on professional and novel software, such as Adobe Premiere Pro (PR) and After Effects (AE) for video editing, or Stable Diffusion and Runway for visual creation. In addition, task queries emphasize visual previews rather than textual instructions.

Instructional videos with human demonstration: We source novel tasks from high-quality instructional videos, and annotators replicate them to reproduce the demonstrated effects.

Hierarchical planning and actions: We provide detailed annotations with planning procedures and recorded actions for hierarchical evaluation.
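
To make the hierarchy concrete, here is a minimal Python sketch of how a task could be broken into planning steps and recorded actions. This is an illustration only: the class names (`Task`, `Subtask`, `Action`), their fields, and the action kinds are assumptions, not the released annotation schema, which may be organized differently.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


# Hypothetical schema for illustration only; the released annotation
# format may differ from what is sketched here.
@dataclass
class Action:
    kind: str                          # assumed action kinds, e.g. "click", "drag", "type"
    target: Tuple[int, int] = (0, 0)   # assumed screen coordinates (x, y)
    text: str = ""                     # text payload for "type" actions


@dataclass
class Subtask:
    description: str                   # one step of the plan
    actions: List[Action] = field(default_factory=list)


@dataclass
class Task:
    query: str                         # task query (visual preview described in text here)
    subtasks: List[Subtask] = field(default_factory=list)

    def num_actions(self) -> int:
        """Count the recorded actions across all subtasks."""
        return sum(len(s.actions) for s in self.subtasks)


# A made-up example task, not taken from the dataset.
task = Task(
    query="Reproduce the title animation shown in the preview",
    subtasks=[
        Subtask(
            "Create a text layer",
            [Action("click", (120, 48)), Action("type", text="Hello")],
        ),
        Subtask("Add a fade-in effect", [Action("drag", (300, 410))]),
    ],
)

print(len(task.subtasks), task.num_actions())
```

With a structure like this, planning can be scored at the subtask level and execution at the action level, which is the sense in which the evaluation is hierarchical.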

🔨 Online Environment

To set up the online environment, refer to the GUI-Thinker tutorial.

🎓 BibTeX

If you find our work helpful, please consider citing our paper. Thank you!

@inproceedings{linvideogui,
  title={VideoGUI: A Benchmark for GUI Automation from Instructional Videos},
  author={Lin, Kevin Qinghong and Li, Linjie and Gao, Difei and Wu, Qinchen and Yan, Mingyi and Yang, Zhengyuan and Wang, Lijuan and Shou, Mike Zheng},
  booktitle={The Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year={2024}
}
