[WIP] Benchmark #24
To test the model's performance, run the experiment script and supply the input configurations. The uploaded data is a random sample from the 7400+ example dev set used in the paper. Note: I have average step counts because the ReAct agent reports them, but that is not included in this version of the HotpotQA code. TODO:
Merging now; will work on improvements later.
Let's merge for now and improve it later.
The previous PR was rather messy. In case it causes trouble when you use the experiment script, I recreated my branch from main and opened a new PR.