ReasoningAgent benchmarking with SimpleBench #293
base: main
Thanks. How about adding the test to the contrib-openai CI?
LGTM. There are some small issues (a filename mismatch) that would prevent the code from running.
Can you please clarify whether it is for the ReasoningAgent or for the benchmark?
I mean we can add a SimpleBench performance check as an optional CI job for the ReasoningAgent. It would only be triggered when necessary and would require approval.
Signed-off-by: Mark Sze <[email protected]>
Added. Please let me know if I missed anything. Thanks!
It's better than before. An even better approach is to make a separate workflow so that it's not bundled with other contrib-openai tests.
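A minimal sketch of the pass/fail step such a separate, approval-gated workflow might run once the benchmark has produced a score. The script name `check_simplebench.py`, the `BASELINE_ACCURACY` constant, and the idea of passing the accuracy as a CLI argument are all hypothetical; the baseline simply reuses the 20% sample-set score reported in this PR.

```python
import sys

# Hypothetical baseline: the 20% score reported in this PR for gpt-4o-mini
# on the 10-prompt SimpleBench sample set.
BASELINE_ACCURACY = 0.20


def check_regression(accuracy: float, baseline: float = BASELINE_ACCURACY,
                     tolerance: float = 0.0) -> bool:
    """Return True if accuracy has not dropped below baseline - tolerance."""
    return accuracy >= baseline - tolerance


def main(argv: list) -> int:
    # Expect the measured accuracy (0.0 to 1.0) as the first CLI argument.
    if len(argv) < 2:
        print("usage: check_simplebench.py ACCURACY", file=sys.stderr)
        return 2
    accuracy = float(argv[1])
    if check_regression(accuracy):
        print(f"OK: accuracy {accuracy:.0%} >= baseline {BASELINE_ACCURACY:.0%}")
        return 0
    print(f"FAIL: accuracy {accuracy:.0%} < baseline {BASELINE_ACCURACY:.0%}")
    return 1


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

A non-zero exit code fails the CI job, so keeping this in its own workflow means a benchmark regression would not be reported as a failure of the other contrib-openai tests.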
@Hk669 What is the status of this PR?
This is just an experimental PR for anyone who wants to run SimpleBench on any agent. It should be a good starting point for the benchmark.
Why are these changes needed?
A draft PR for running SimpleBench with the ReasoningAgent. This PR is not meant to be merged.
source: https://simple-bench.com/
The benchmark result on the sample data (10 prompts) with gpt-4o-mini is 20%, i.e. 2 of the 10 prompts answered correctly.
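A sketch of how a score like this could be computed from the agent's replies. It assumes, without confirmation from this PR, that each SimpleBench question is multiple choice with letters A to F and that the agent is prompted to end its reply with a line such as `Final Answer: B`. The helper names `extract_answer` and `accuracy` are hypothetical, not part of the ReasoningAgent API or the official SimpleBench harness.

```python
import re
from typing import List, Optional

# Assumption: replies end with "Final Answer: X", where X is a choice letter A-F.
ANSWER_PATTERN = re.compile(r"Final Answer:\s*([A-F])", re.IGNORECASE)


def extract_answer(response: str) -> Optional[str]:
    """Return the last choice letter found in the response, or None."""
    matches = ANSWER_PATTERN.findall(response)
    return matches[-1].upper() if matches else None


def accuracy(responses: List[str], gold: List[str]) -> float:
    """Fraction of responses whose extracted letter matches the gold answer."""
    if len(responses) != len(gold):
        raise ValueError("responses and gold answers must have the same length")
    if not gold:
        return 0.0
    # A missing answer (None) never equals a gold letter, so it counts as wrong.
    correct = sum(extract_answer(r) == g.upper() for r, g in zip(responses, gold))
    return correct / len(gold)
```

With 10 prompts, 2 correct extractions give `accuracy(...) == 0.2`, matching the 20% figure above. Taking the last match means a reply that mentions several letters while reasoning is scored only on its final choice.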
Related issue number
Checks