
WIP: complete overhaul #17

Merged · 14 commits · Jan 24, 2025
Conversation

@Jutho (Owner) commented Feb 9, 2024

No description provided.

@leburgel commented Jan 2, 2025

Hi @Jutho, wasn't sure where to ask but this seems like the most natural place. What is the current status of this overhaul, and could it be merged as is already?

I'm asking because of the known issue that the maximum number of linesearch iterations is not enforced in the current tagged version of the package, which causes optimizations to stall indefinitely for badly conditioned problems. I don't know what the intended scope of this overhaul was, but since this issue is already fixed in this PR, I wanted to check whether you would be willing to merge it and leave further changes for dedicated PRs?

Additionally, I think it would be good to expose the linesearch maxiter and maxfuneval to users via the algorithm constructors. I did this for LBFGS here a while back and have been using that since. I'm definitely willing to add this to other relevant algorithms, either on top of this or as a separate PR.

@Jutho (Owner, Author) commented Jan 3, 2025

Hi @leburgel. This can probably be merged. I had some more changes locally, but I remember that I wanted to push large commits that were each fully functional, so I guess this first commit overhauling the linesearch is good to go. I will try to go through it one more time ASAP.

We have kind of concluded that it would be beneficial to move over to the Manifolds.jl/Manopt.jl ecosystem, rather than maintaining our own optimization library, so I have not really been continuing with this PR recently. Another stumbling block was that I really started to question, at the interface level, how to deal with hyperparameters like maxiter for the linesearch: should they be part of the linesearch algorithm struct or not? Here I moved it out for the linesearch specifically, but at the level of the optimisation algorithms (LBFGS) and in e.g. everything in KrylovKit.jl, it is encoded in the algorithm structure. So that is not very consistent.

@codecov-commenter commented Jan 18, 2025

Codecov Report

Attention: Patch coverage is 78.09917% with 53 lines in your changes missing coverage. Please review.

Project coverage is 79.73%. Comparing base (8135d1c) to head (fefcb76).

Files with missing lines Patch % Lines
src/linesearches.jl 86.29% 17 Missing ⚠️
src/tangentvector.jl 0.00% 15 Missing ⚠️
src/lbfgs.jl 80.00% 8 Missing ⚠️
src/OptimKit.jl 12.50% 7 Missing ⚠️
src/cg.jl 83.87% 5 Missing ⚠️
src/gd.jl 94.44% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master      #17      +/-   ##
==========================================
- Coverage   80.26%   79.73%   -0.54%     
==========================================
  Files           5        7       +2     
  Lines         446      528      +82     
==========================================
+ Hits          358      421      +63     
- Misses         88      107      +19     


@Jutho (Owner, Author) commented Jan 20, 2025

@leburgel and @lkdvos, I continued the work here a bit further. You can now individually control the verbosity of the optimization algorithm and of the linesearch. In addition, you can now fully customize both the convergence criterion and an independent termination criterion for non-converged runs, which also receives the total runtime as an input, so that you can exit gracefully after a certain amount of time. Let me know if there are other requests; if not, this can be tagged as OptimKit v0.4 after merging.
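For illustration, a sketch of what exiting after a fixed runtime could look like with these keywords. The predicate argument lists below are assumptions based on this description, not the documented OptimKit API:

```julia
# Illustrative sketch only: the exact arguments that `optimize` passes to
# `hasconverged` and `shouldstop` are assumed here, not taken from the docs.
using OptimKit, LinearAlgebra

fg(x) = (sum(abs2, x), 2x)   # objective value and gradient
x0 = randn(10)

# Quit gracefully (non-converged) after 60 seconds of total runtime,
# assuming the elapsed time `t` is supplied as the last argument:
myshouldstop(x, f, g, numfg, numiter, t) = t > 60
myhasconverged(x, f, g, numfg, numiter, t) = norm(g) < 1e-8

x, fx, gx, numfg, history = optimize(fg, x0, LBFGS();
                                     hasconverged = myhasconverged,
                                     shouldstop = myshouldstop)
```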

src/cg.jl Outdated
Comment on lines 38 to 40
linesearch::L
ls_maxiter::Int
ls_verbosity::Int
I would somehow expect ls_maxiter and ls_verbosity to be parameters in linesearch

Owner Author
Yes, I should probably move them back in. I wanted to move all of these parameters out, also out of the algorithms, as part of the major overhaul. I started doing this with the linesearch, but it didn't make it to the algorithms yet. And then I followed @leburgel's suggestion to make this value modifiable from the optimisation algorithm, but probably I should just have moved them back into the linesearch algorithm?

However, it seems more cumbersome having to write LBFGS(; linesearch = HagerZhangLinesearch(; maxiter = ...)) than LBFGS(; ls_maxiter = ...), as there is anyway only one linesearch algorithm and users probably don't care about that. Furthermore, as the optimize method now has custom hasconverged and shouldstop keywords, the maxiter and gradtol fields in the algorithm structs are there just for backward compatibility and could be removed in the future.

I wouldn't mind trying to get this right, with maximal flexibility, once and for all, but I am struggling to come up with the proper interface. I think we definitely want default values for all of these parameters that are taken from (module-wide?) ScopedValues, such that the defaults can be overwritten from some outer scope. Or should the actual value being used itself be a scoped value, so that using with(...) becomes the only official way of modifying the parameters? That also doesn't seem right.


For the first remark, I think the cleanest way is to simply define LBFGS(; kwargs...) to take a wider range of keywords, which can automatically be moved to the correct structs. In that sense, you could still have that constructor take the keywords maxiter and ls_verbosity, but then store them in whatever way that is most convenient. These aren't the easiest functions to get right, but that should provide the necessary functionality in my opinion.
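As an illustration of this suggestion, a minimal sketch of such a keyword-forwarding constructor. The struct fields and keyword names below are hypothetical, chosen to mirror the discussion, not the actual OptimKit definitions:

```julia
# Hypothetical stand-ins for the OptimKit types, for illustration only.
struct HagerZhangLinesearch
    maxiter::Int
    maxfg::Int
    verbosity::Int
end
HagerZhangLinesearch(; maxiter=10, maxfg=20, verbosity=1) =
    HagerZhangLinesearch(maxiter, maxfg, verbosity)

struct LBFGS{L}
    m::Int
    linesearch::L
end
# Flat keywords like `ls_maxiter` are accepted and forwarded into the
# linesearch struct, so users never have to name the linesearch explicitly.
function LBFGS(; m=8, ls_maxiter=10, ls_maxfg=20, ls_verbosity=1,
               linesearch=HagerZhangLinesearch(; maxiter=ls_maxiter,
                                               maxfg=ls_maxfg,
                                               verbosity=ls_verbosity))
    return LBFGS(m, linesearch)
end

alg = LBFGS(; ls_maxiter=50)
alg.linesearch.maxiter    # 50
```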

However, it seems more cumbersome having to write LBFGS(; linesearch = HagerZhangLinesearch(; maxiter = ...)) than LBFGS(; ls_maxiter = ).

I don't have a strong opinion on where these settings should be stored in the end, but I would really like to be able to specify a linesearch maxiter without having to explicitly choose a linesearch so I fully agree with this.

Owner Author

Ok, the last commit is about the most flexible interface I can come up with: the linesearch parameters can be specified as keyword arguments in the linesearch call, and take their defaults from the values encoded in the HagerZhangLinesearch struct, which takes its values from the keyword arguments of the method that constructs the optimization algorithm, which in turn takes its default values from a ScopedValue defined at the OptimKit top level.

Owner Author

The only catch with the current approach is that if you already constructed alg = LBFGS(), then doing

with(OptimKit.LS_MAXITER => 20) do
    optimize(fg, x0, alg)
end

will have no effect, since the linesearch settings from alg are being used.

In contrast,

with(OptimKit.LS_MAXITER => 20) do
    alg = LBFGS()
    optimize(fg, x0, alg)
end

should take effect, although I have to admit I have not tested this.


Looks perfect to me!

Project.toml Outdated

[deps]
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
Printf = "de0858da-6303-5e67-8744-51eddeeeb8d7"

[compat]
julia = "1"
julia = "1.6"

This probably also needs a compat for the stdlibs

Owner Author

I guess you need something like Aqua to capture this in the tests.


Yes, I also just think OptimKit predates the change where this was a requirement

@lkdvos left a comment

Apart from the one docstring, good to go for me!
Other comments are not really that important

Comment on lines +48 to +71
test-nightly:
  needs: test
  name: Julia nightly - ${{ matrix.os }} - ${{ matrix.arch }}
  runs-on: ${{ matrix.os }}
  strategy:
    fail-fast: false
    matrix:
      version:
        - 'nightly'
      os:
        - ubuntu-latest
        - macOS-latest
        - windows-latest
      arch:
        - x64
  steps:
    - uses: actions/checkout@v4
    - uses: julia-actions/setup-julia@v2
      with:
        version: ${{ matrix.version }}
        arch: ${{ matrix.arch }}
    - uses: julia-actions/cache@v2
    - uses: julia-actions/julia-buildpkg@latest
    - uses: julia-actions/julia-runtest@latest

Does this really make sense to test on nightly versions of Julia? I would say that since there are no strange language features we are using, simply testing on the regular versions could be sufficient. (of course, it also does not hurt that much, so I'm definitely okay with leaving this in)

Comment on lines +8 to +11
# Default values for the keyword arguments using ScopedValues
const LS_MAXITER = ScopedValue(10)
const LS_MAXFG = ScopedValue(20)
const LS_VERBOSITY = ScopedValue(1)

While this is definitely the most convenient, I think the recommended way of working with several related options is to bundle them in a single struct. The main reason is that the underlying data structure scales (logarithmically, if I'm not mistaken) with the nesting depth, and these two are equivalent:

with(LS_MAXITER => 3, LS_MAXFG => 3) do
end

with(LS_MAXITER => 3) do
    with(LS_MAXFG => 3) do
    end
end

i.e. it would internally create a doubly nested scope. Bundling them is slightly more annoying for a user, since you would have to specify all three even when changing a single one. Again, this is not crucial, and I'm definitely okay with leaving this as is.
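A minimal sketch of the bundling alternative described above, assuming Julia ≥ 1.11 (which ships `Base.ScopedValues`; the ScopedValues.jl package provides the same API on older versions). The name `LS_DEFAULTS` and its fields are hypothetical:

```julia
using Base.ScopedValues

# One ScopedValue holding a NamedTuple of all linesearch defaults, so
# several options can be overridden within a single dynamic scope.
const LS_DEFAULTS = ScopedValue((maxiter=10, maxfg=20, verbosity=1))

# `merge` keeps the unspecified fields, easing the "must specify all three"
# annoyance mentioned above:
result = with(LS_DEFAULTS => merge(LS_DEFAULTS[], (maxiter=3, maxfg=3))) do
    (LS_DEFAULTS[].maxiter, LS_DEFAULTS[].verbosity)  # (3, 1)
end
```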

src/OptimKit.jl Outdated
Comment on lines 34 to 41
function optimize(fg, x, alg;
                  precondition=_precondition,
                  (finalize!)=_finalize!,
                  hasconverged=DefaultHasConverged(alg.gradtol),
                  shouldstop=DefaultShouldStop(alg.maxiter),
                  retract=_retract, inner=_inner, (transport!)=_transport!,
                  (scale!)=_scale!, (add!)=_add!,
                  isometrictransport=(transport! == _transport! && inner == _inner))

Bit of a strange syntax, I would probably vote

optimize(fg, x, alg; kwargs...) -> x, f, g, numfg, history

@Jutho merged commit a41542a into master on Jan 24, 2025
12 checks passed
@Jutho deleted the jh/overhaul branch on January 24, 2025, 21:02
4 participants