[WIP] Benchmark #24

Alleria1809 · 2024-05-24T23:04:00Z

The previous PR was rather messy. In case it caused you trouble using the experiment script, I recreated my branch from main and opened a new PR.

Alleria1809 · 2024-05-24T23:21:36Z

To test the model's performance, run the experiment script and supply the configuration.

The uploaded data is a random sample from the paper's dev set of 7400+ records.
I also tested the paper's code on this dataset.
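
As a rough illustration of the sampling step (not the code in this PR), the sketch below assumes the dev set is a JSON list of records. The helper names, the fixed seed, and the source path argument are hypothetical; only the output file name hotpot_dev_v1_simplified_random_100.json comes from this PR.

```python
import json
import random


def sample_records(records, k=100, seed=0):
    """Return k records drawn uniformly at random, without replacement."""
    rng = random.Random(seed)  # fixed seed is an assumption, used for reproducibility
    return rng.sample(records, k)


def write_sample(src_path, dst_path, k=100, seed=0):
    """Read a JSON list from src_path and write a k-record random sample to dst_path."""
    with open(src_path) as f:
        records = json.load(f)
    with open(dst_path, "w") as f:
        json.dump(sample_records(records, k, seed), f)


# Hypothetical usage; the source path is a placeholder for the paper's dev file:
# write_sample("<path to dev set>", "hotpot_dev_v1_simplified_random_100.json")
```

Seeding the generator means the same 100 records are selected on every run, which keeps results comparable across models.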

Note:
In the performance section at the bottom:
"random 10" means the first 10 records from the uploaded hotpot_dev_v1_simplified_random_100.json.
"first 10 records in the dataset" means the first 10 records from the original 7400+ dev dataset.
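
To make the two labels concrete, here is a minimal sketch assuming both datasets are already loaded as Python lists. The function name and its arguments are hypothetical and are not part of the benchmark script.

```python
def labelled_subsets(sampled_100, full_dev, n=10):
    """Map the two labels used in the performance section to record slices."""
    return {
        # first n records of the uploaded random 100-record sample
        "random 10": sampled_100[:n],
        # first n records of the original dev set
        "first 10 records in the dataset": full_dev[:n],
    }
```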

I have the average step counts because the ReAct agent reports them, but I didn't include this in this version of the HotpotQA code.

TODO:

Test on other models once the agent is updated
Think about optimization

liyin2015

Merge now; we will work on improvements later.

liyin2015

Let's merge for now and improve it later.

Alleria1809 added 3 commits May 24, 2024 15:56

random sampled 100 questions from the paper's related dev data

9d51d60

tranformed tools from paper's code

7738d98

script to test the hotpotqa performance

1b863b9

Alleria1809 changed the title ~~Xiaoyi benchmark~~ [WIP] Benchmark May 24, 2024

Alleria1809 added 8 commits May 24, 2024 16:25

update the performance description section

f3d4a9a

add fever benchmark

74e3232

add benchmark data--fever

7162ea5

adjust the tools string

b41fa5e

Merge remote-tracking branch 'origin/main' into xiaoyi_benchmark

1ed75d7

move benchmark(react) to benchmarks and update code accordingly

b9ee779

move the react benchmarks

4b56cc3

move the react benchmarks

e9bf6b2

liyin2015 approved these changes Jun 6, 2024

Alleria1809 merged commit 6c12209 into main Jun 6, 2024
2 checks passed

Alleria1809 deleted the xiaoyi_benchmark branch July 2, 2024 22:05
