Add criterion based benchmarks #356
base: main
Vec, nalgebra, ndarray
This benchmark has been moved to ParticleSwarm so that the backends can be benchmarked as a group
which is already included in the LBFGS bench
I'm not sure the Vec benchmark is doing any work: it runs in 500 nanoseconds, whereas ndarray takes ~100 microseconds
This was the reason why the Vec version of LBFGS was running in 500 ns (it just aborted)
This makes investigations with flamegraphs easier
Hi @jonboh, I am very sorry for the very, very late reply. I have been rather busy over the last couple of months and currently only have limited access to the Internet (#360). This is an excellent addition and I'm highly thankful for all the work you have put into this. Unfortunately I don't have time right now to give this PR the attention it deserves. I do hope that I will be able to respond adequately and give a detailed review soon. Thanks a lot!!
Nothing to worry about, whenever you are ready we can continue with this PR
Hi @jonboh, thanks again for the amazing work! :)
This sounds like a good approach!
Ah yes, the old "Search direction must be a descent direction" error :( I believe this is due to numerical instabilities when the gradient vanishes. This problem definitely needs to be investigated (but ideally not as part of this PR).
I've had a look at these benchmarks. I was able to identify a couple of issues and I have a few ideas of what may be going on.

Regarding ParticleSwarm: This one doesn't really use any linear algebra, so I would expect all backends to perform similarly. The only computationally challenging part is probably sorting the population (particularly given that there are only two parameters to optimize). The fact that the populations are randomly initialized doesn't make benchmarking easier. It should be possible to provide the same initial population for all runs via […]

Regarding LBFGS: Here I've also identified a couple of problems. Firstly, in the case of ndarray, the parameter vectors are transformed to […]

    let solver = LBFGS::new(linesearch, m)
        .with_tolerance_grad(0.0)?
        .with_tolerance_cost(0.0)?;

However, this means the solver will continue even if the gradient vanishes, leading to the "Search direction must be ...." error. I was able to run it without errors when reducing the number of iterations to 10. I'm not sure how to solve this properly. I suspect that a different test function might be better: Rosenbrock is known for its long, flat valley where most solvers get stuck or progress slowly. Moving the initial parameter further away from the optimum (in order to allow for more iterations) does not help either in my experience, because most solvers find the valley very quickly.

Here are my results: (Note that my machine is quite old and I had other programs running)
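A minimal sketch of why a vanishing gradient can trigger that error, assuming the line search rejects any direction d whose directional derivative g·d is not strictly negative. The helpers below (`directional_derivative`, `is_descent_direction`, `steepest_descent`) are hypothetical and not argmin's API. Once the gradient components become tiny, their products underflow to zero, so even the steepest-descent direction -g fails the check:

```rust
/// Directional derivative g·d of the cost along search direction d.
fn directional_derivative(grad: &[f64], dir: &[f64]) -> f64 {
    grad.iter().zip(dir).map(|(g, d)| g * d).sum()
}

/// Hypothetical descent check: d is accepted only if g·d < 0.
fn is_descent_direction(grad: &[f64], dir: &[f64]) -> bool {
    directional_derivative(grad, dir) < 0.0
}

/// Steepest-descent direction, d = -g.
fn steepest_descent(grad: &[f64]) -> Vec<f64> {
    grad.iter().map(|g| -g).collect()
}
```

With a well-scaled gradient such as `[1.0, 2.0]` the check passes. With `[1e-170, 1e-170]` each product (about 1e-340) is below the smallest representable `f64` and rounds to zero, so the same direction is rejected. Because `with_tolerance_grad(0.0)` keeps the solver running even as the gradient vanishes, it can reach this state.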
Thanks, this was super helpful :)
In general I agree with this; however, I'm afraid this may be difficult given the different properties of different solvers. I'd still strive for this as much as possible. What I would also find useful are real-world problems (ideally higher-dimensional problems), as long as this doesn't blow the benchmark times out of proportion.
👍
This would be very interesting indeed, in particular since the observers interface has caused performance degradation in the past (#112).
👍
I love it! This is an excellent basis for the upcoming benchmarking journey :)
Hi @stefan-k, happy to see you back :)

On the point about early termination and setting the cost and gradient tolerances to 0: if the algorithm performs at least a small number of iterations, I don't think it is really necessary for the benchmark to set those to 0. Once we have a baseline of the algorithm's performance in solving the problem to a given threshold, any change from that baseline would be significant (as long as the threshold is not changed).

Regarding the standard problems, I agree that it would be good to have higher-dimensional problems to test (ideally based on the real world). Your response gave me the final push to publish a crate that I had abandoned some time ago, a rewrite of the GKLS generator. Something like this would allow us to parametrize the benchmarks by the number of dimensions of the problem or by its complexity, although these problems are as synthetic as they come 😅. I've used this generator in the past to compare algorithms based on the number of cost function evaluations.

For the purpose of this PR, I think it is OK not to address the issues with the backends, so we can keep it focused on benchmarking and preventing performance regressions: we generate the baselines to characterize the algorithms, and address the performance peculiarities later, once we have them characterized.
I'll add them 👍
Thanks :) I was unfortunately only sort-of back, but now I should be able to be more responsive :)
I agree in principle. At least for some solvers I'm a bit afraid that having an insufficient number of iterations may lead to certain code paths not being part of the benchmark. Also, for solvers with line searches, the time spent in the line search may depend on the iteration number. However, I agree that having a baseline is the important part and I'm sure that these concerns aren't something we should bother too much about, just something we should keep in mind in case something isn't as expected.
👍
This is amazing! To be frank, my main motivation for having real-world problems is not so much for benchmarking as for having them as an educational resource for people starting out with argmin (i.e. examples). For benchmarks I think synthetic problems are great, so feel free to add your library as a dev dependency!
Good point, I absolutely agree! Thanks again for the work and patience! :) I'll strive to be more responsive from now on :)
Hi! Following up on #10.
I've transformed the existing examples into benchmarks. I've kept the original examples separate: benchmarks need to be more stable than examples so that performance can be compared over time. The two may therefore diverge, and changes to the examples should not affect the benchmark metrics.
I've, for the most part, kept the optimization problem parameters as they were in the examples.
Most of the benchmarks don't do much more than run the (rewritten) example. However, in the case of ParticleSwarm, LBFGS and BFGS, I've expanded the example to run the optimization with the different backends, and in the case of LBFGS and BFGS, to run with multiple dimensions. Hopefully this can serve as a starting point to refine these benchmarks further. I wanted to get some feedback before going any further.
There are two issues that I wanted to discuss (maybe I'm doing something wrong with nalgebra and ndarray):
This is easily seen in the ParticleSwarm benchmark, which is run on the three backends; it can also be seen in the LBFGS one (although that one lacks the nalgebra backend).
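Because ParticleSwarm initializes its population randomly, backend timings are easier to compare when every run starts from identical particles. A minimal std-only sketch, assuming a hypothetical `fixed_population` helper built on the SplitMix64 generator. In practice a seeded generator such as `rand::rngs::StdRng::seed_from_u64` would serve the same purpose. How the population would then be passed to argmin's solver is left open here:

```rust
/// SplitMix64: a small, deterministic 64-bit PRNG (same seed -> same sequence).
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform sample in [0, 1) built from the top 53 bits.
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Hypothetical helper: `n` particles, each sampled inside per-dimension bounds.
fn fixed_population(seed: u64, n: usize, bounds: &[(f64, f64)]) -> Vec<Vec<f64>> {
    let mut rng = SplitMix64::new(seed);
    (0..n)
        .map(|_| {
            bounds
                .iter()
                .map(|&(lo, hi)| lo + rng.next_f64() * (hi - lo))
                .collect::<Vec<f64>>()
        })
        .collect()
}
```

Calling `fixed_population(42, 40, &bounds)` twice yields identical particles, so each backend would start from the same state. The seed, particle count, and bounds are illustrative values only.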
ParticleSwarm:
LBFGS (the axis input represents the number of dimensions):
To run all benchmarks do:
To run just a benchmark file:
To run just a benchmark in a group:
The generated report can be found at:
target/criterion/report/index.html
I think the benchmarks are already useful as they are. However, to avoid code duplication, it might be worthwhile to define a common problem shared by all benchmarks (for example, the Rosenbrock problem is defined very similarly in at least four benchmarks), ideally one that doesn't cause convergence problems.
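A minimal sketch of such a shared problem, assuming a hypothetical `benches/common` module that every benchmark would import. It implements the n-dimensional Rosenbrock function (with the usual a = 1, b = 100) and its analytic gradient on plain slices. Each backend would then only need a thin wrapper implementing argmin's `CostFunction` and `Gradient` traits; those wrappers are not shown:

```rust
/// Rosenbrock parameters; a = 1, b = 100 is the common choice.
const A: f64 = 1.0;
const B: f64 = 100.0;

/// n-dimensional Rosenbrock:
/// f(x) = sum_{i=0}^{n-2} [ b (x_{i+1} - x_i^2)^2 + (a - x_i)^2 ]
fn rosenbrock(x: &[f64]) -> f64 {
    x.windows(2)
        .map(|w| B * (w[1] - w[0] * w[0]).powi(2) + (A - w[0]).powi(2))
        .sum()
}

/// Analytic gradient of `rosenbrock`.
fn rosenbrock_gradient(x: &[f64]) -> Vec<f64> {
    let n = x.len();
    let mut g = vec![0.0; n];
    for i in 0..n.saturating_sub(1) {
        let t = x[i + 1] - x[i] * x[i];
        // Term i contributes to both d/dx_i and d/dx_{i+1}.
        g[i] += -4.0 * B * x[i] * t - 2.0 * (A - x[i]);
        g[i + 1] += 2.0 * B * t;
    }
    g
}
```

Since the dimension is just the slice length, the same definition can back the multi-dimensional LBFGS and BFGS runs. The global minimum is at x = (1, …, 1), where both the cost and the gradient are zero.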
In the case of an error, I've made the benchmark panic in order to prevent a failing solver from reporting an incredibly small execution time.
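A minimal std-only sketch of that guard, assuming a hypothetical `run_optimization` stand-in for a solver run. The timed closure unwraps the result with `expect`, so an early abort (like the Vec LBFGS run that finished in about 500 ns) fails loudly instead of being recorded as a fast run. In a Criterion benchmark, the same `expect` would go inside the closure passed to `b.iter`:

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

#[derive(Debug)]
struct SolverError(&'static str);

/// Hypothetical stand-in for a solver run: does some work, or bails out early.
fn run_optimization(iters: u64, fail_early: bool) -> Result<f64, SolverError> {
    if fail_early {
        // An immediate error would otherwise look like an incredibly fast run.
        return Err(SolverError("Search direction must be a descent direction"));
    }
    let mut x = 0.0_f64;
    for i in 0..iters {
        x += black_box(i as f64).sqrt();
    }
    Ok(x)
}

/// Time one run, panicking on error so a failure is never reported as a timing.
fn time_run(run: impl FnOnce() -> Result<f64, SolverError>) -> Duration {
    let start = Instant::now();
    let result = run().expect("solver failed; refusing to record a timing");
    black_box(result);
    start.elapsed()
}
```

Silently ignoring the `Err` would make the failing configuration look like the fastest one in the report. Panicking trades that misleading number for an obvious failure.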
Another thing that might be worth benchmarking is the impact of the loggers on optimization performance. I might add something in that respect after going through your feedback on this.
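A minimal std-only sketch of how that overhead could be isolated. The same toy iteration loop runs once with a no-op observer and once with an observer that formats a log line per iteration, so any timing difference is attributable to the observer. The `Observer` trait, `NoOp`, `Collect`, and `run_with_observer` are all hypothetical and much simpler than argmin's actual observer interface:

```rust
use std::time::{Duration, Instant};

/// Hypothetical, simplified observer interface.
trait Observer {
    fn observe(&mut self, iter: u64, cost: f64);
}

/// Does nothing: the baseline.
struct NoOp;
impl Observer for NoOp {
    fn observe(&mut self, _iter: u64, _cost: f64) {}
}

/// Formats and stores one line per iteration, similar in spirit to a logger.
struct Collect(Vec<String>);
impl Observer for Collect {
    fn observe(&mut self, iter: u64, cost: f64) {
        self.0.push(format!("iter {iter}: cost = {cost:e}"));
    }
}

/// Toy "solver" loop: halves the cost each iteration and notifies the observer.
fn run_with_observer(obs: &mut dyn Observer, iters: u64) -> (f64, Duration) {
    let start = Instant::now();
    let mut cost = 1.0_f64;
    for i in 0..iters {
        cost *= 0.5;
        obs.observe(i, cost);
    }
    (cost, start.elapsed())
}
```

Both variants compute the same result, so in a Criterion benchmark group they could sit side by side as two benchmarks, with only the observer differing between them.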
I've also modified the bench profile in the top-level Cargo.toml to include debug symbols. This isn't strictly necessary, but it provides the information needed to generate flamegraphs from these benchmarks.
Let me know what you think of the benchmark approach and the issues; any feedback is welcome :)