This repository presents the current standings of various web agents evaluated on the WebVoyager benchmark (paper). The WebVoyager benchmark comprises 643 tasks across 15 popular websites, assessing agents' abilities to perform diverse web navigation and interaction tasks.
Steel is an open-source browser API purpose-built for AI agents.
| Rank | Model | Organization | WebVoyager Score | Source | Open Source | New | SOTA |
|---|---|---|---|---|---|---|---|
| 1 | Browser Use | Browser Use | 89.1% | Source | Yes | Yes | Yes |
| 2 | Operator | OpenAI | 87% | Source | No | Yes | |
| 3 | Kura | Kura | 87% | Source | No | Yes | |
| 4 | Skyvern 2.0 | Skyvern | 85.85% | Source | Yes | Yes | |
| 5 | Project Mariner | Google | 83.5% | Source | No | | |
| 6 | Proxy | Convergence AI | 82% | Source | No | | |
| 7 | Agent-E | Emergence AI | 73.1% | Source | No | | |
| 8 | Runner H 0.1 | H Company | 67% | Source | No | | |
| 9 | WILBUR | Academic Research | 60.6% | Source | No | | |
| 10 | WebVoyager | Academic Research | 59.1% | Source | Yes | | |
| 11 | Computer Use | Anthropic | 52% | Source | No | | |
Notes:
- Open Source: Indicates whether the agent's source code is publicly available.
- New: Denotes recently introduced models.
- SOTA: Marks the model that currently holds the state-of-the-art (highest) score on this leaderboard.
We encourage contributions to keep this leaderboard up-to-date. If you have information about new models or updated scores, please submit a pull request or open an issue.
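When preparing a pull request, it can help to re-rank the entries before editing the table by hand. The following is a minimal sketch, not tooling that ships with this repository: the `Entry` class, `rank_entries`, and `to_row` are hypothetical names, the "Example Agent" row is made-up illustration data, and the generated rows omit the Source, New, and SOTA columns for brevity. It assumes ties keep their existing order, as the two 87% entries do in the table.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    model: str
    organization: str
    score: float        # WebVoyager score, in percent
    open_source: bool


def rank_entries(entries):
    """Return (rank, entry) pairs sorted by score, highest first.

    sorted() is stable, so entries with equal scores keep their input order.
    """
    ordered = sorted(entries, key=lambda e: e.score, reverse=True)
    return list(enumerate(ordered, start=1))


def to_row(rank, entry):
    """Format one entry as a simplified markdown table row."""
    open_source = "Yes" if entry.open_source else "No"
    return (
        f"| {rank} | {entry.model} | {entry.organization} "
        f"| {entry.score:g}% | {open_source} |"
    )


entries = [
    Entry("Operator", "OpenAI", 87.0, False),
    Entry("Example Agent", "Example Org", 86.0, True),  # hypothetical entry
    Entry("Browser Use", "Browser Use", 89.1, True),
    Entry("Kura", "Kura", 87.0, False),
]

ranked = rank_entries(entries)
rows = [to_row(rank, entry) for rank, entry in ranked]

if __name__ == "__main__":
    print("\n".join(rows))
```

The printed rows can be pasted into the table and then completed with the remaining columns.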
This project is licensed under the MIT License.