
Automated regression tests #187

Open · jl-wynen opened this issue Jun 7, 2024 · 5 comments

jl-wynen (Member) commented Jun 7, 2024

We need a way to test whether our workflows still produce the accepted 'correct' results after we make changes, e.g., in scipp/esssans#135 and scipp/esssans#143. However, there are changes that should change the result, such as adding a new correction or tuning a parameter. In Mantid, the accepted results are written to file and loaded to compare against results from a new version of the code. This requires extra infrastructure to store and provide the files, plus extra work to update them. Here is a potential alternative.

Have a test script that does this procedure on each PR:

  1. Check out main.
  2. Run tests with a specific mark and save results of those tests to a folder results_main.
  3. Check out the head of the PR branch.
  4. Run tests with the same mark and save results to results_branch.
  5. For each file that exists in both results_main and results_branch, load both files and compare them with sc.testing.assert_identical or sc.testing.assert_allclose (see the sketch below).
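
A minimal sketch of step 5, assuming the test results are scipp objects saved as HDF5 files in the two folders and compared with scipp's testing helpers (the folder layout and the file suffix are assumptions for illustration):

```python
# Sketch of step 5: compare results produced by the two test runs.
from pathlib import Path

import scipp as sc
import scipp.testing

main_dir = Path('results_main')
branch_dir = Path('results_branch')

# Only files present in both folders are compared; results that were
# added or removed on the branch are skipped by this step.
common = sorted({p.name for p in main_dir.glob('*.h5')}
                & {p.name for p in branch_dir.glob('*.h5')})
for name in common:
    expected = sc.io.load_hdf5(main_dir / name)
    actual = sc.io.load_hdf5(branch_dir / name)
    # assert_identical for exact equality, or assert_allclose to tolerate
    # small numerical differences.
    sc.testing.assert_allclose(actual, expected)
```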

The tests run this way can still contain assertions, e.g., to make sure that the result has the expected shape. But their main purpose is writing data, and that data can be any scipp object, e.g., the result of running a workflow.
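
For illustration, one such result-writing test could look roughly like the sketch below. The `regression` mark, the dummy workflow, and the use of `tmp_path` as the output folder are hypothetical; in the proposed setup the script would direct the output into results_main or results_branch, and the workflow would be a real one, e.g. from esssans.

```python
# Hypothetical result-writing regression test (names are illustrative only).
import pytest
import scipp as sc


def run_workflow() -> sc.DataArray:
    # Stand-in for a real workflow, e.g. an esssans pipeline.
    return sc.DataArray(sc.arange('Q', 5.0))


@pytest.mark.regression  # hypothetical mark used to select these tests
def test_workflow_result(tmp_path):
    result = run_workflow()
    # Ordinary assertions are still possible, e.g. on the expected dims.
    assert result.dims == ('Q',)
    # The main purpose: save the result so the script can compare branches.
    result.save_hdf5(str(tmp_path / 'workflow_result.h5'))
```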

This procedure would perform regression tests against main, which we assume contains the accepted 'correct' code, but it does not require storing result files in a public location.

What do you think? Does this make sense?

SimonHeybrock (Member) commented Jun 10, 2024

Basically something similar to what ASV does (but for timings instead of results)?

But in general I am not sure this is a viable approach:

  • You would need a mechanism for changing tests (or for manually ignoring failing test runs) in case an existing test needs to be updated.
  • I expect this to break a lot from minor changes in workflows, leading to too many false positives?

Should this be moved to https://github.com/scipp/ess_template, as it is ESS-specific?

nvaytet (Member) commented Jun 10, 2024

> I expect this to break a lot from minor changes in workflows, leading to too many false positives?

If we make minor changes in a workflow, I think we should know about it. See for example scipp/scippneutron#514.

But yes, if we often make changes that modify the results, we do need a mechanism to easily ignore failed tests (or to say "I accept this as the new reference solution").

MridulS (Member) commented Jun 10, 2024

Sounds similar to something like https://github.com/matplotlib/pytest-mpl, which adds baseline tests to compare matplotlib plots?

YooSunYoung (Member) commented

> Sounds similar to something like https://github.com/matplotlib/pytest-mpl, which adds baseline tests to compare matplotlib plots?

But that one seems to save plots as a baseline and compare new results against the existing files.
What JL suggested is more like regression tests between branches, so that we don't have to keep those baseline results as files, I think...?
And I would like to avoid keeping results as files if possible.

jl-wynen (Member, Author) commented

> What JL suggested is more like regression tests between branches, so that we don't have to keep those baseline results as files, I think...?

Correct
