WIP: complete overhaul #17
Conversation
Hi @Jutho, wasn't sure where to ask, but this seems like the most natural place. What is the current status of this overhaul, and could it be merged as is already? The reason I'm asking is the known issue that the maximum number of iterations in the linesearch is not enforced in the current tagged version of the package, causing optimizations to stall indefinitely for badly conditioned problems. I don't know what the intended scope of this overhaul was, but since this issue is already fixed in this PR, I wanted to check if you would be willing to merge this and leave further changes for dedicated PRs? Additionally, I think it would be good to expose the linesearch …
Hi @leburgel. This can probably be merged. I had some more changes locally, but remember that I wanted to push large commits that were themselves fully functional, so I guess this first commit overhauling the linesearch is good to go. I will try to go through it one more time asap. We have kind of concluded that it would be beneficial to move over to the Manifolds.jl/Manopt.jl ecosystem, rather than maintaining our own optimization library, so I have not really been continuing with this PR recently. Another stumbling block was that I really started to question how to deal with hyperparameters like `maxiter` for the linesearch at the interface level: should they be part of the linesearch algorithm struct or not? Here I moved it out for the linesearch specifically, but at the level of the optimisation algorithms (LBFGS), and in e.g. everything in KrylovKit.jl, it is encoded in the algorithm structure. So that is not very consistent.
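To make the contrast concrete, here is a minimal sketch of the two interface styles being weighed; the type and field names below are made up for illustration and are not OptimKit's actual definitions:

```julia
# Option 1 (KrylovKit.jl style): all hyperparameters, including the linesearch
# ones, live directly in the optimization algorithm struct.
struct LBFGSFlat
    m::Int            # history length
    maxiter::Int      # outer iterations
    gradtol::Float64  # convergence tolerance
    ls_maxiter::Int   # linesearch iterations, stored next to everything else
end

# Option 2: the linesearch owns its own hyperparameters and is nested
# inside the algorithm struct.
struct HagerZhangLS
    maxiter::Int
end

struct LBFGSNested{L}
    m::Int
    maxiter::Int
    gradtol::Float64
    linesearch::L     # e.g. HagerZhangLS(10)
end
```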
Codecov Report
Attention: Patch coverage is …

Additional details and impacted files:

@@            Coverage Diff             @@
##           master      #17      +/-   ##
==========================================
- Coverage   80.26%   79.73%    -0.54%
==========================================
  Files           5        7        +2
  Lines         446      528       +82
==========================================
+ Hits          358      421       +63
- Misses         88      107       +19

View full report in Codecov by Sentry.
@leburgel and @lkdvos, I continued the work here a bit further. You can now individually control the verbosity of the optimization algorithm and of the linesearch. In addition, you can now fully customize both the convergence criterion and an independent non-convergence termination criterion, which furthermore also receives the total runtime as an input, so that you can exit gracefully after a certain amount of time. Let me know if there are other requests; if not, this can be tagged as OptimKit v0.4 after merging.
src/cg.jl (Outdated)

    linesearch::L
    ls_maxiter::Int
    ls_verbosity::Int
I would somehow expect `ls_maxiter` and `ls_verbosity` to be parameters in `linesearch`.
Yes, I should probably move them back in. I wanted to move all of these parameters out, also out of the algorithms, as part of the major overhaul. I started doing this with the linesearch, but it didn't make it to the algorithms yet. And then I followed @leburgel's suggestion to make this value modifiable from the optimisation algorithm, but probably I should just have moved them back into the linesearch algorithm?

However, it seems more cumbersome having to write `LBFGS(; linesearch = HagerZhangLinesearch(; maxiter = ...))` than `LBFGS(; ls_maxiter = ...)`, as there is anyway only one linesearch algorithm and users probably don't care about that. Furthermore, as the `optimize` method now has custom `hasconverged` and `shouldstop` keywords, the `maxiter` and `gradtol` fields in the algorithm structs are just for backward compatibility and could be removed in the future.

I wouldn't mind trying to get this right with maximal flexibility once and for all, but I am struggling to come up with the proper interface. I think we definitely want default values for all of these parameters that are taken from (module-wide?) `ScopedValue`s, such that the defaults can be overwritten from some outer scope. Or should the actual value being used immediately be a scoped value, so that using `with(...)` becomes the only official way of modifying the parameters? That also doesn't seem right.
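For reference, a minimal sketch of what module-wide `ScopedValue` defaults look like in practice; the names here mirror the constants that show up later in this PR, but the helper function is purely illustrative:

```julia
using Base.ScopedValues  # Julia >= 1.11; the ScopedValues.jl package provides the same API on older versions

# a module-wide default, read with LS_MAXITER[]
const LS_MAXITER = ScopedValue(10)

default_ls_maxiter() = LS_MAXITER[]   # returns 10 outside of any `with` block

# the default can be overridden for the duration of a dynamic scope
with(LS_MAXITER => 30) do
    default_ls_maxiter()              # returns 30 inside this block
end
```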
For the first remark, I think the cleanest way is to simply define `LBFGS(; kwargs...)` to take a wider range of keywords, which can automatically be routed to the correct structs. In that sense, you could still have that constructor take the keywords `maxiter` and `ls_verbosity`, but then store them in whatever way is most convenient. These aren't the easiest functions to get right, but that should provide the necessary functionality in my opinion.
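A rough sketch of the kind of keyword routing suggested here, with hypothetical type and field names (not the actual OptimKit constructors):

```julia
# Hypothetical linesearch struct with its own defaults.
Base.@kwdef struct HagerZhangSketch
    maxiter::Int = 10
    verbosity::Int = 1
end

struct LBFGSSketch{L}
    m::Int
    maxiter::Int
    gradtol::Float64
    linesearch::L
end

# Outer constructor that accepts both algorithm-level and linesearch-level
# keywords and routes the ls_* ones into the linesearch struct.
function LBFGSSketch(m::Int = 8; maxiter = 1000, gradtol = 1e-8,
                     ls_maxiter = 10, ls_verbosity = 1,
                     linesearch = HagerZhangSketch(; maxiter = ls_maxiter,
                                                   verbosity = ls_verbosity))
    return LBFGSSketch(m, maxiter, Float64(gradtol), linesearch)
end

# usage: LBFGSSketch(; ls_maxiter = 30) and
#        LBFGSSketch(; linesearch = HagerZhangSketch(maxiter = 30)) are equivalent
```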
> However, it seems more cumbersome having to write `LBFGS(; linesearch = HagerZhangLinesearch(; maxiter = ...))` than `LBFGS(; ls_maxiter = ...)`.

I don't have a strong opinion on where these settings should be stored in the end, but I would really like to be able to specify a linesearch maxiter without having to explicitly choose a linesearch, so I fully agree with this.
Ok, the last commit is about the most flexible interface I can come up with: the linesearch parameters can be specified as keyword arguments in the linesearch call, and take their defaults from the values encoded in the `HagerZhangLinesearch` struct, which takes its values from the keyword arguments of the method that constructs the optimization algorithm, which in turn takes its default values from a `ScopedValue` defined at the `OptimKit` top level.
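Spelled out as code, the default cascade described here might look roughly as follows; this is a sketch under assumed names and signatures, not the actual implementation in this PR:

```julia
using Base.ScopedValues  # or the ScopedValues.jl package on Julia < 1.11

# level 1: a module-wide scoped default (name is illustrative)
const DEFAULT_LS_MAXITER = ScopedValue(10)

# level 2: the linesearch struct picks up the scoped default at construction time,
# i.e. when the optimization-algorithm constructor builds it from its keyword arguments
Base.@kwdef struct CascadeLinesearch
    maxiter::Int = DEFAULT_LS_MAXITER[]
end

# level 3: the linesearch call defaults its keyword to the stored field,
# so a per-call override always takes precedence
function (ls::CascadeLinesearch)(fg, x, d; maxiter::Int = ls.maxiter)
    return maxiter  # stand-in for the actual linesearch iteration
end
```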
The only catch with the current approach is that if you already constructed `alg = LBFGS()`, then doing

    with(OptimKit.LS_MAXITER => 20) do
        optimize(fg, x0, alg)
    end

will have no effect, since the linesearch settings from `alg` are being used. In contrast,

    with(OptimKit.LS_MAXITER => 20) do
        alg = LBFGS()
        optimize(fg, x0, alg)
    end

should take effect, although I have to admit I have not tested this.
Looks perfect to me!
Project.toml (Outdated)

      [deps]
      LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
      Printf = "de0858da-6303-5e67-8744-51eddeeeb8d7"

      [compat]
    - julia = "1"
    + julia = "1.6"
This probably also needs a compat for the stdlibs
I guess you need something like Aqua to capture this in the tests.
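For reference, a typical Aqua-based quality test that would flag missing compat entries might look like this (assuming Aqua.jl is added as a test dependency):

```julia
using Aqua, OptimKit, Test

@testset "Aqua.jl quality checks" begin
    # Aqua.test_all runs a battery of checks, including one that verifies
    # every dependency has a corresponding [compat] entry in Project.toml
    Aqua.test_all(OptimKit)
end
```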
Yes, I also just think OptimKit predates the change where this was a requirement
Apart from the one docstring, good to go for me!
Other comments are not really that important
    test-nightly:
      needs: test
      name: Julia nightly - ${{ matrix.os }} - ${{ matrix.arch }}
      runs-on: ${{ matrix.os }}
      strategy:
        fail-fast: false
        matrix:
          version:
            - 'nightly'
          os:
            - ubuntu-latest
            - macOS-latest
            - windows-latest
          arch:
            - x64
      steps:
        - uses: actions/checkout@v4
        - uses: julia-actions/setup-julia@v2
          with:
            version: ${{ matrix.version }}
            arch: ${{ matrix.arch }}
        - uses: julia-actions/cache@v2
        - uses: julia-actions/julia-buildpkg@latest
        - uses: julia-actions/julia-runtest@latest
Does this really make sense to test on nightly versions of Julia? I would say that since there are no strange language features we are using, simply testing on the regular versions could be sufficient. (of course, it also does not hurt that much, so I'm definitely okay with leaving this in)
    # Default values for the keyword arguments using ScopedValues
    const LS_MAXITER = ScopedValue(10)
    const LS_MAXFG = ScopedValue(20)
    const LS_VERBOSITY = ScopedValue(1)
While this is definitely the most convenient, I think the recommended way of working with several related options is to bundle them in a single struct. The main reason is that the underlying data structure scales (logarithmically, if I'm not mistaken) with depth, and these two are equivalent:

    with(LS_MAXITER => 3, LS_MAXFG => 3) do
    end

    with(LS_MAXITER => 3) do
        with(LS_MAXFG => 3) do
        end
    end

i.e. the first would also create a nested scope under the hood. Bundling them is slightly more annoying as a user, since you would have to specify all three even when changing a single one. Again, this is not something crucial, and I'm definitely okay with leaving this as is.
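A sketch of the bundled alternative being suggested, with hypothetical names: a single `ScopedValue` holding one options struct, so that overriding several settings only creates one scope:

```julia
using Base.ScopedValues

# all linesearch defaults bundled in one immutable struct
Base.@kwdef struct LinesearchDefaults
    maxiter::Int = 10
    maxfg::Int = 20
    verbosity::Int = 1
end

const LS_DEFAULTS = ScopedValue(LinesearchDefaults())

# overriding two settings now creates a single scope; fields that are not
# specified fall back to the @kwdef defaults rather than the currently scoped values
with(LS_DEFAULTS => LinesearchDefaults(maxiter = 3, maxfg = 3)) do
    LS_DEFAULTS[].maxiter, LS_DEFAULTS[].maxfg, LS_DEFAULTS[].verbosity  # (3, 3, 1)
end
```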
src/OptimKit.jl (Outdated)

    function optimize(fg, x, alg;
                      precondition=_precondition,
                      (finalize!)=_finalize!,
                      hasconverged=DefaultHasConverged(alg.gradtol),
                      shouldstop=DefaultShouldStop(alg.maxiter),
                      retract=_retract, inner=_inner, (transport!)=_transport!,
                      (scale!)=_scale!, (add!)=_add!,
                      isometrictransport=(transport! == _transport! && inner == _inner))
Bit of a strange syntax; I would probably vote for documenting it simply as

    optimize(fg, x, alg; kwargs...) -> x, f, g, numfg, history