Update README.md #785
Added some tips for setting the hyperparameters, based on recent experience running PySR on multiple datasets. Feel free to edit, or to reject the pull request if you think it is not appropriate for the README.
Maybe this should go on the tuning page instead? https://ai.damtp.cam.ac.uk/pysr/tuning/
```python
juliapkg.require_julia("~1.10")
```
2. Another memory issue can occur when using a large `maxsize` parameter **(Miles: is there any explanation for that?)**. Since PySR is mainly used to discover scientific equations, this value should be kept small anyway. Anything beyond $50$ seems to create a significant slowdown and memory usage.
Anything beyond $50$ seems to create a significant slowdown and memory usage.
I think we can turn this off now. It was basically only there because some beginners were running with like 10,000 maxsize, so I wanted it to warn them 😄
3. Using a single population (`populations=1`) often makes the algorithm unstable, with high variance in the results. A reasonable starting value for `populations` is $10$.
4. It can be good practice to set `optimizer_nrestarts` to something larger than $1$, depending on the computational budget. Minimizing the error of a nonlinear regression model is a multimodal problem, so multiple restarts may be required to assess the quality of an equation.
5. The default model selection may not work well in many situations. It may be worth implementing your own, or using the following code to predict with the most accurate equation on the Pareto front:
The default model selection may not work well in many situations. It may be worth implementing your own, or using the following code to predict with the most accurate equation on the Pareto front:
You can do `model.model_selection = "accuracy"` for this btw
It didn't work for me, and I don't know why. I can rerun some experiments to see whether it was related to using a single population instead of multiple islands. I'll let you know.
OK, just tested here again. The instability issue was due to using `populations=1`; `model_selection = "accuracy"` works.
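The settings discussed so far in this thread can be collected in one place. The snippet below is a minimal sketch, not part of the pull request: the helper name `tips_to_kwargs` is hypothetical, and the values chosen for `optimizer_nrestarts` and `maxsize` are illustrative assumptions (the thread only says "larger than $1$" and "small"). It only builds a dictionary of keyword arguments, so it runs without PySR installed.

```python
def tips_to_kwargs(populations=10, optimizer_nrestarts=3, maxsize=30):
    """Collect the settings from this thread as PySRRegressor keyword arguments.

    `tips_to_kwargs` is a hypothetical helper. The defaults for
    `optimizer_nrestarts` and `maxsize` are illustrative assumptions,
    not values recommended by the PySR maintainers.
    """
    if populations < 2:
        # populations=1 was reported above as the cause of unstable results.
        raise ValueError("use more than one population to reduce variance")
    if optimizer_nrestarts < 1:
        raise ValueError("optimizer_nrestarts must be at least 1")
    return {
        "populations": populations,
        "optimizer_nrestarts": optimizer_nrestarts,
        "maxsize": maxsize,
        # Predict with the lowest-loss equation on the Pareto front.
        "model_selection": "accuracy",
    }


kwargs = tips_to_kwargs()
print(kwargs)

# With PySR installed, the dictionary can be passed to the regressor:
#   from pysr import PySRRegressor
#   model = PySRRegressor(**kwargs)
#   model.fit(X, y)
```

Keeping the settings in a plain dictionary makes it easy to log them alongside results when comparing runs across datasets.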
### Tips
1. When running PySR with Julia `1.11`, the process seems to hit a memory leak bug that halts execution by exceeding the available memory. Forcing version `1.10` can help avoid this problem:
This memory leak is a Julia bug and will be fixed once 1.11.3 is released (as well as 1.10.8 hopefully) – JuliaLang/julia#56801. So we probably don't need to provide this guidance as it will only be temporary until the new Julia is out?
Then maybe add it as a highlighted issue (like the "have you used PySR in your paper" one) while this is not fixed.
Good idea!
I think it would be more appropriate there, yes.
It's linked in the last sentence of the Quickstart: https://github.com/MilesCranmer/PySR?tab=readme-ov-file#quickstart
But I guess people miss this. How should it be made more prominent? Maybe a special "docs/tuning" badge at the top of the README or something?
yes!!!
Added a tips section to the README on how to set the hyperparameters, based on recent experience running PySR on multiple datasets. Feel free to edit, or to reject the pull request if you think it is not appropriate for the README.