ReasoningAgent benchmarking with SimpleBench #293

Open: 18 commits to be merged into base: main

Hk669
Collaborator

@Hk669 Hk669 commented Dec 26, 2024

Why are these changes needed?

This is a draft PR for running SimpleBench with ReasoningAgent; it is not meant to be merged.
Source: https://simple-bench.com/

The benchmark result on the sample data (10 prompts) with gpt-4o-mini is 20%.
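
For readers who want to see how a figure like this is computed, here is a minimal scoring sketch. It is not the code from this PR: the `Final Answer: X` reply convention, the `prompt`/`answer` item keys, and the names `extract_choice`, `accuracy`, and `answer_fn` are all assumptions made for illustration, and the agent is replaced by a stub.

```python
import re
from typing import Callable, Iterable, Mapping, Optional

# Assumption: the agent ends its reply with "Final Answer: X", X being a letter A-F.
_FINAL_ANSWER = re.compile(r"final answer:\s*([A-F])\b", re.IGNORECASE)


def extract_choice(reply: str) -> Optional[str]:
    """Return the last 'Final Answer' letter found in the reply, or None."""
    matches = _FINAL_ANSWER.findall(reply)
    return matches[-1].upper() if matches else None


def accuracy(
    items: Iterable[Mapping[str, str]],
    answer_fn: Callable[[str], str],
) -> float:
    """Fraction of items whose extracted choice equals the reference answer."""
    items = list(items)
    if not items:
        raise ValueError("no benchmark items to score")
    correct = sum(
        1
        for item in items
        if extract_choice(answer_fn(item["prompt"])) == item["answer"]
    )
    return correct / len(items)


# Stand-in data: 10 hypothetical prompts, 2 of which have reference answer "A".
sample = [
    {"prompt": f"question {i}", "answer": "A" if i < 2 else "B"}
    for i in range(10)
]


def always_a(prompt: str) -> str:
    """Stub agent that always picks A; a real run would call the agent here."""
    return "Some reasoning. Final Answer: A"


print(f"{accuracy(sample, always_a):.0%}")  # prints 20%
```

With only 10 prompts, each question moves the score by 10 percentage points, so 20% corresponds to 2 correct answers.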

Related issue number

Checks

@sonichi
Collaborator

sonichi commented Dec 26, 2024

Thanks. How about adding the test into the contrib-openai CI?

Collaborator

@BabyCNM BabyCNM left a comment

LGTM. There are some small issues (a filename mismatch) that prevent the code from running.

@Hk669 Hk669 marked this pull request as ready for review January 1, 2025 10:55
@Hk669
Collaborator Author

Hk669 commented Jan 1, 2025

> Thanks. How about adding the test into the contrib-openai CI?

Can you please clarify whether this is for the ReasoningAgent or for the benchmark?
FYI: the CI tests for the ReasoningAgent are in progress in PR #294.

@Hk669 Hk669 requested a review from BabyCNM January 1, 2025 10:58
@sonichi
Collaborator

sonichi commented Jan 1, 2025

> > Thanks. How about adding the test into the contrib-openai CI?
>
> can you please mention if it is for the ReasoningAgent or for the Benchmark? fyi: the ci tests for the reasoningagent are under process in the PR #294

I mean we can add a SimpleBench performance check as an optional CI job for ReasoningAgent. It would be triggered only when necessary and would require approval.
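
As a rough illustration of what the check step in such a job might do, here is a small sketch. The manual trigger and approval would be handled in the CI configuration and are not shown; the function name `gate_exit_code` and the 0.2 threshold are hypothetical and not part of this PR.

```python
def gate_exit_code(accuracy: float, threshold: float) -> int:
    """Return 0 if accuracy meets the threshold, 1 otherwise.

    The return value is meant to be used as a process exit code, so a CI
    step can fail the job when benchmark performance regresses.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError(f"accuracy must be in [0, 1], got {accuracy}")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"threshold must be in [0, 1], got {threshold}")
    return 0 if accuracy >= threshold else 1


# Hypothetical threshold; a real job would choose its own baseline.
print(gate_exit_code(accuracy=0.2, threshold=0.2))  # prints 0
print(gate_exit_code(accuracy=0.1, threshold=0.2))  # prints 1
```

A CI script could pass the result to `sys.exit` so that a drop below the baseline marks the run as failed.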

@Hk669
Collaborator Author

Hk669 commented Jan 2, 2025

> Thanks. How about adding the test into the contrib-openai CI?

Added. Please let me know if I missed anything. Thanks!
cc @sonichi

@Hk669 Hk669 mentioned this pull request Jan 2, 2025
17 tasks
@sonichi
Collaborator

sonichi commented Jan 2, 2025

> > Thanks. How about adding the test into the contrib-openai CI?
>
> added. please let me know if i missed anything. thanks! cc @sonichi

It's better than before. An even better approach is to make a separate workflow so that it's not bundled with other contrib-openai tests.
@marklysze @BabyCNM @qingyun-wu what do you think is a good balance between convenience and cost control?

@davorrunje davorrunje self-requested a review January 10, 2025 15:04
@davorrunje
Collaborator

@Hk669 What is the status with this PR?

@Hk669
Collaborator Author

Hk669 commented Feb 13, 2025

This is just an experimental PR for anyone who wants to run SimpleBench on any agent.

It should be a good starting point for the benchmark.

@CLAassistant

CLAassistant commented Feb 26, 2025

CLA assistant check
All committers have signed the CLA.

6 participants