Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential speedups #14

Open
milankl opened this issue May 10, 2023 · 3 comments
Open

Potential speedups #14

milankl opened this issue May 10, 2023 · 3 comments

Comments

@milankl
Copy link
Contributor

milankl commented May 10, 2023

  1. Diffusive terms only once

Move

https://github.com/DJ4Earth/DJ-barotropic-gyre/blob/7237a9fc785d545e1d46fdf014c18b5b8b716264/explicit_solver/compute_time_deriv.jl#LL32C1-L33C94

to the end of all substeps and time step forward with dt, in ShallowWaters.jl we do this like

https://github.com/milankl/ShallowWaters.jl/blob/d1609c0be39f279d23d237887870405156c4b0ca/src/time_integration.jl#L263-L265

so after the 4 RK4 substeps have been computed, compute the diffusive terms and add them to u,v with dt in an Euler forward fashion. The reason for this is that the current time step is constrained by the gravity wave speed and the diffusive terms can have larger time steps.

  1. Include PV advection in the RK substeps but only update them once every time step

https://github.com/milankl/ShallowWaters.jl/blob/d1609c0be39f279d23d237887870405156c4b0ca/src/time_integration.jl#L251C2-L257

Especially the Arakawa and Lamb scheme is expensive, so instead of calculating $qhu, qhv$ on every substep, you can do $qh_0u_0, qh_0v_0$ where the 0 is the state of the model at $t=t_0$ and therefore the term is kept constant till $t = t_0 + dt$. The reason is that because it's a fully explicit model you don't have to update the slow terms as often as those that contain the fast gravity waves.

  1. Move to a baroclinic system

in the barotropic system the time step is determined by resolving the gravity waves at speed $\sqrt{gH}$, so reducing g will slow down the gravity waves, but not affect the non-linear terms. In ShallowWaters.jl I went for $g=0.1m/s^2$, which reduces the speed from 70m/s to 7m/s, such that similar in speed as eddies. You basically just get more turbulence for the same compute time.

https://github.com/milankl/ShallowWaters.jl/blob/d1609c0be39f279d23d237887870405156c4b0ca/src/default_parameters.jl#LL15C1-L16C70

  1. Combinations

You can probably combine 1) and 3) but I'd be careful to combine 2) and 3) because with 3) the physical argument in 2) doesn't hold anymore

@milankl milankl mentioned this issue May 10, 2023
@milankl
Copy link
Contributor Author

milankl commented May 10, 2023

With dab70cc I'm timing a 10 day 100x100 as

julia> ExplicitSolver.integrate(10,100,100);
 47.614535 seconds (717.91 k allocations: 26.462 GiB, 11.49% gc time)

for comparison, ShallowWaters.jl is about 60x faster

julia> using ShallowWaters
julia> run_model(Ndays=10,g=10,H=500,nx=100,L_ratio=1);
Starting ShallowWaters on Wed, 10 May 2023 17:25:03 without output.
100% Integration done in 0.75s.

so there's clearly tons of things we can do for a bit more 🚀

@milankl
Copy link
Contributor Author

milankl commented May 11, 2023

I forgot that ShallowWaters.jl uses Float32 as default, with Float64 it's

julia> run_model(Float64,Ndays=10,g=10,H=500,nx=100,L_ratio=1);
Starting ShallowWaters on Thu, 11 May 2023 15:48:54 without output.
100% Integration done in 1.63s.

I'd recommend you to also go to Float32, as there's nothing in this shallow water model that would suffer from lower precision But the way how the algorithm is written, it's memory-bounded, meaning that you'd get very easily another 2x performance increase if you go to Float32.

The one thing you may want to consider anyway is to non-dimensionalise the biharmonic diffusion term as this scales with $1/\Delta x^4$, which for $\Delta x = 100km$ is 1e-20 (and so technically still well within the Float32 range). So

  1. leave out the/dx^4 factor in LLu, LLv
  2. include them directly in nu

Typical nu values are huge >1e15, but the 1/dx^4 tiny, so you see there's some back and forth going on here. As you want to optimise the nu anyway, scaling the LLu LLv operators with dx^4 (so that they are order 1) would make your nu order 1e-3 to 1e-6 or so, consequently it'll have units of 1/s and you can interpret its inverse as a time scale of ~hours typically. More intuitively if you ask me...

@milankl
Copy link
Contributor Author

milankl commented May 11, 2023

As a comparison, the python swm code takes about 20s for 10 days at 100x100

milan@dhcp-10-29-47-236 swm % python swm_run.py
Starting shallow water model on Thu May 11 16:22:32 2023
Model integration will take approximately 19.31s,
Integration done in 19.11s on Thu May 11 16:22:51 2023

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant