[WIP] Benchmark #24

Merged
merged 11 commits into from
Jun 6, 2024

Conversation

Alleria1809
Contributor

The previous PR was rather messy. In case it causes trouble for you when using the experiment script, I recreated my branch from main and opened a new PR.

@Alleria1809 Alleria1809 changed the title Xiaoyi benchmark [WIP] Benchmark May 24, 2024

Alleria1809 commented May 24, 2024

To test the model's performance, run the experiment and supply the configurations.

The uploaded data is randomly sampled from the paper's dev dataset of 7400+ records.
I also tested the paper's approach on this dataset.

Note:
In the performance section at the bottom,
"random 10" means the first 10 records of the uploaded hotpot_dev_v1_simplified_random_100.json, and
"first 10 records in the dataset" means the first 10 records of the original 7400+ dev dataset.
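
The difference between the two 10-record subsets can be sketched roughly as follows. This is an illustration only, not the script in this PR: the helper names `sample_records` and `first_n`, the seed, the record fields, and the in-memory stand-in for the dev set (sized 7400 as a placeholder) are all assumptions.

```python
import random


def sample_records(records, k, seed=0):
    """Draw k records at random without replacement (hypothetical helper; seed is an assumption)."""
    rng = random.Random(seed)
    return rng.sample(records, k)


def first_n(records, n):
    """Return the first n records in their stored order (hypothetical helper)."""
    return records[:n]


# Stand-in for the original dev set; the real data would be loaded from JSON.
dev_set = [{"id": i, "question": f"question {i}"} for i in range(7400)]

# The uploaded 100-record file corresponds to a random sample like this one.
random_100 = sample_records(dev_set, 100)

random_10 = first_n(random_100, 10)  # "random 10": first 10 of the sampled file
first_10 = first_n(dev_set, 10)      # "first 10 records in the dataset"
```

Because the 100-record sample is already random, taking its first 10 records gives a random subset, whereas the first 10 records of the full dev set depend on how the original file is ordered.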

I report the average steps because I get that number from the ReAct agent, but I didn't include it in this version of the HotpotQA code.

TODO:

  • Test on other models once the agent is updated
  • Think about optimization

Member

@liyin2015 liyin2015 left a comment

Merging now; I will work on improvements later.

Member

@liyin2015 liyin2015 left a comment

Let's merge for now; we will improve it later.

@Alleria1809 Alleria1809 merged commit 6c12209 into main Jun 6, 2024
2 checks passed
@Alleria1809 Alleria1809 deleted the xiaoyi_benchmark branch July 2, 2024 22:05