Speedups #15
With the last commit, the RK4 integration (excl. the …) runs in

julia> ExplicitSolver.integrate(10,100,100);
0.284759 seconds

however, computing the tendencies currently allocates 16 MB on every call:

julia> using BenchmarkTools
julia> @btime ExplicitSolver.advance($a...)
16.884 ms (424 allocations: 16.08 MiB)
I think I'm starting to understand: operations like

rhs.h .= rhs.eta1 .+ params.H
rhs.U .= rhs.u1 .* rhs.h_u
rhs.V .= rhs.v1 .* rhs.h_v
rhs.kinetic_sq .= sqrt.(rhs.kinetic)

can be done in-place thanks to the broadcasting, but

rhs.h_u .= interp.ITu * rhs.h

conceptually does something more like this:

temporary_array = interp.ITu * rhs.h
rhs.h_u .= temporary_array
garbage_collect(temporary_array)

I know the Python swm code does the same, but I'd need to benchmark that to see how fast this gets in comparison.
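For illustration (a small sketch with made-up arrays and a stand-in interpolation operator, not code from this PR), the two cases can be compared directly:

using SparseArrays, BenchmarkTools

h, u, U = rand(100), rand(100), zeros(100)
ITu = spdiagm(rand(100))                 # stand-in for an interpolation operator

f1!(U, u, h)   = (U .= u .* h)           # fully fused broadcast, writes in place
f2!(U, ITu, h) = (U .= ITu * h)          # ITu*h allocates a temporary vector first

@btime f1!($U, $u, $h);                  # 0 allocations
@btime f2!($U, $ITu, $h);                # 1 allocation: the temporary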
One way to avoid that is to use mul! directly:

julia> using SparseArrays, BenchmarkTools
julia> M = spdiagm(rand(100));
julia> v = ones(100);
julia> o = zero(v);
julia> function f!(o,M,v) # what is currently done
o .= M*v
end
f! (generic function with 1 method)
julia> function g!(o,M,v) # in-place version calling mul! directly
SparseArrays.mul!(o,M,v,true,false)
end
g! (generic function with 1 method)
julia> @btime f!($o,$M,$v); # currently
215.717 ns (1 allocation: 896 bytes)
julia> @btime g!($o,$M,$v); # allocation free
158.479 ns (0 allocations: 0 bytes)

The speed difference here is not massive, but given that this is done over and over again, you can quickly allocate the >20 GB in ~40 s mentioned in #15, so the overall savings are likely >2x.
@swilliamson7 I thought it might come in handy to have a macro that translates out = A * x assignments into allocation-free in-place mul! calls.
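A minimal sketch of what such a macro could look like (an assumption about its shape, not the actual implementation in the repository; it only handles plain out = A * x assignments):

using LinearAlgebra

macro inplacemul(ex)
    ex.head == :(=) || error("expected an assignment like out = A * x")
    out, rhs = ex.args[1], ex.args[2]
    rhs isa Expr && rhs.head == :call && rhs.args[1] == :* ||
        error("expected a product A * x on the right-hand side")
    A, x = rhs.args[2], rhs.args[3]
    # mul!(C, A, B, α, β) computes C = α*A*B + β*C in place, so α=true, β=false
    # overwrites out with A*x without allocating (out must already exist)
    return :(LinearAlgebra.mul!($(esc(out)), $(esc(A)), $(esc(x)), true, false))
end

# usage as in the commit discussed below:
# @inplacemul adv_v = Iqv * q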
The rhs (without advection) is now allocation free:

julia> M = ExplicitSolver.integrate(10,100,100);
6.769857 seconds
The last commit implements the Sadourny advection scheme, just to illustrate how this can be done allocation free too. However, this line

@inplacemul adv_v = Iqv * q    # v-component -qhu

is currently commented out as I'm getting a DimensionMismatch error, could you check the Iqv/Ivq interpolation operators?
With the last commit, moving the dissipative terms from RK4 to Euler (option 1 in #15), we're at

julia> M = ExplicitSolver.integrate(10,100,100);
5.513422 seconds

So almost 10x faster than before. Sure, Arakawa and Lamb would make it a bit slower again.
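To illustrate the structure of that change (a hedged sketch with hypothetical names rhs_fast! and rhs_diff!, not the actual time stepping code of this PR): the dissipative tendency is evaluated once per time step with forward Euler instead of at every RK4 stage.

# k is a vector of four preallocated stage arrays, du a scratch array;
# allocations from the stage sums are ignored here for brevity.
function timestep!(u, dt, rhs_fast!, rhs_diff!, k, du)
    rhs_fast!(k[1], u)                     # classical RK4 for the fast terms
    rhs_fast!(k[2], u .+ dt/2 .* k[1])
    rhs_fast!(k[3], u .+ dt/2 .* k[2])
    rhs_fast!(k[4], u .+ dt .* k[3])
    @. u += dt/6 * (k[1] + 2k[2] + 2k[3] + k[4])

    rhs_diff!(du, u)                       # dissipative terms: once per step
    @. u += dt * du                        # forward Euler update
end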
Sorry it's taken so long for me to get to this, other work came up. From what I can see the code actually runs substantially faster now, which is amazing! I'm going to try running the changes on my computer, and am specifically curious whether Enzyme can differentiate through the macro you wrote (pretty sure it should be possible, I don't see any reason why it wouldn't be). Referring to the issue with the interpolation operators, that should be fixed now (I had flipped the order in which they were passed to the structure by accident), thanks for catching it. Another question: is there a reason this was only a draft pull request? Or, better phrased, I'm not quite sure what it means that it was written as a draft, other than that it's a work in progress...
No worries!
Ah, that makes sense! I fixed it a couple of days ago but I'll double check to be safe.
If you didn't do it here in this PR then git will throw a conflict, let's merge it correctly.
Hmm, there are no conflicts with the base branch, which branch did you apply the changes to?
I'd done it in a different branch and merged to main, if that makes sense? It's here: #16. I'm still learning GitHub, if you can't tell... :)
Ah yeah, sorry, I didn't affect the lines you changed in that PR, that's why git doesn't see a conflict - so smart ;)
Then feel free to merge it!
Not sure this is the best place to ask, but it seems that something in the speedups made the model unstable; I was getting NaNs for both velocity fields and the displacement. For now I've reverted to the code as it was before, but do you know what might have caused it / was it also diverging when you ran it on your computer? I saw that the assertion checks had been commented out, so that might have hidden the problem.
Sorry, yeah, put them back in again! What's your default diffusion? Maybe that's too low? If you were at the edge of stability before, then evaluating the diffusion only once every time step may push you over the edge, that's true. What does the vorticity field look like?
It's all good! For diffusion I mimicked the choice in your Python code, which ultimately chose … The vorticity is small; after 10 days on a 70 by 70 grid it's on the order of 1e-7.
Viscosity and diffusion are often used interchangeably in the ocean/atmosphere modelling community. However, there's quite a nuance to it:

When we simulate fluid motion at resolutions larger than the viscous scale, where the actual viscosity of water/air would remove energy, we need something that balances the energy cascade at the grid scale. Otherwise we'd pile up energy there and the model blows up due to numerical instability. One way of doing this is adding a Laplacian with a coefficient that, in most cases (unless you do DNS), would need to be several orders of magnitude larger than the viscosity coefficient that would represent the viscous forces inside water/air. So in simulations we need diffusion for numerical stability, which however has conceptually little to do with viscosity, unless you take honey as a good approximation for water/air. But there's somewhat of an equivalence here, hence the ambiguity most people have between viscosity and diffusion. I try to separate these terms by what they are supposed to do: represent viscous forces or ensure numerical stability.

For viscosity there's no reason to use a biharmonic Laplacian: it's the divergence of the stress tensor, not the Laplacian of the divergence of the stress tensor. For numerical stability, however, there are all sorts of good arguments why you'd want a higher power of your Laplacian: it's stronger on smaller scales and leaves the larger scales more unaffected. And you may want to choose your diffusion coefficient as a function of the state of your flow (what you're optimising ;) to only have higher diffusion where you actually need it for stability. But if you think of the diffusion in your model as viscosity, then there's little reason to go from a harmonic to a biharmonic operator.
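To make the "stronger on smaller scales" point concrete (a standard textbook argument added here for illustration, not taken from the thread), compare the damping rates of the two operators on a single Fourier mode:

\partial_t \hat{u} = -\nu k^2 \,\hat{u} \quad \text{(harmonic)}, \qquad \partial_t \hat{u} = -\nu_4 k^4 \,\hat{u} \quad \text{(biharmonic)}.

If the coefficients are matched to give the same damping at the grid scale, k_max ≈ π/Δx, i.e. ν₄ = ν/k_max², then at any resolved scale k < k_max the biharmonic damping is weaker by a factor (k/k_max)², so it removes grid-scale noise while leaving the larger scales comparatively untouched.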
Regarding the scaling of the diffusion coefficient, I took …
For comparison with ShallowWaters.jl,

using ShallowWaters
run_model(nx=100,output=true,L_ratio=1,Lx=3840e3,g=9.81,bc="nonperiodic",wind_forcing_x="double_gyre",seasonal_wind_x=false,topography="flat",Ndays=200)

Starting ShallowWaters run 1 on Wed, 17 May 2023 23:00:17
Model integration will take approximately 21.0s,
and is hopefully done on Wed, 17 May 2023 23:00:38
100% Integration done in 16.8s.
All data stored.

produces out.mp4. So interesting stuff should only happen after 10 days (the simulation runs for 200)! I can always recommend looking at videos. I use ncvis to create the frames and then ffmpeg to turn them into a video. Unfortunately, ncvis does not seem to adapt to the square aspect ratio...
This was extremely helpful! I think I'm starting to understand a little bit. I'm going to be confusing and move back to the notation you'd used in your original Python code from here. If I'm understanding correctly, what you're saying is that the choice of … I'm going to re-add the changes and see if moving to …
Maybe we've introduced another typical source of error: I usually think of the diffusion coefficients as being positive, but if you go from harmonic to biharmonic you'd need to flip the sign. The 1D discrete Laplacian has a 1 -2 1 stencil, so you take away from the middle and put it to the sides, which is what you want for diffusion, so you'd do +ν∇²u. Applying the Laplacian twice gives a 1 -4 6 -4 1 stencil that adds to the middle instead, so the biharmonic term has to enter with the opposite sign, -ν∇⁴u, to stay dissipative.
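A quick numerical sanity check of that sign convention (a self-contained sketch, not code from either repository):

using LinearAlgebra, SparseArrays

n = 64                                   # 1D periodic grid, unit spacing
L = spdiagm(-1 => ones(n-1), 0 => fill(-2.0, n), 1 => ones(n-1))
L[1, end] = 1.0; L[end, 1] = 1.0         # periodic wrap-around of the 1 -2 1 stencil
B = L * L                                # discrete biharmonic, stencil 1 -4 6 -4 1

println(extrema(eigvals(Symmetric(Matrix(L)))))   # all ≤ 0: du/dt = +ν*L*u damps
println(extrema(eigvals(Symmetric(Matrix(B)))))   # all ≥ 0: need du/dt = -ν₄*B*u to damp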
Just to say this is numerically definitely possible, as this is what I'm doing in ShallowWaters.jl too ☝🏼 so the question is just how to find the bug 😃
I don't think it's a sign issue in the Laplacian operator, I've like quadruple checked that my time-derivatives matched the ones in the original Python code 😶 so at least for the implementation of constant …
Ah yes, the best part of coding.
Adding that I also quadruple checked every discrete operator as I was building them!
I don't know how you've checked equalities between the Python swm code and your tendencies, but note that the diffusion is really small in the first 10 days and only really kicks in when you start to develop strong gradients in u, v, which isn't the case in the first 10 days...
The discrete operators are constant, so they're easy to compare. For the time derivatives, I just ran the model for some amount of time, I believe maybe 30 days, and then compared to my code for the same setup. I can do this again and see, but I'm confused about why they would match for a set amount of time and only diverge after X days; that seems unlikely to me.
But you're dealing with a chaotic system, that's exactly one of its properties! Unless you are executing exactly the same float operations in exactly the same order, your solutions will always diverge eventually; the question is just when. The first tendencies you compute will be dominated by waves: if you were to output every time step you should see gravity waves bouncing around that eventually die out, then there are Kelvin waves that will zoom around the boundaries (where they are confined to) and kick off Rossby waves in the East, which you can even see in the video propagating westward throughout the domain. All of these waves require only a correct implementation of the pressure gradient and Coriolis terms, but the non-linear terms are small and the diffusion usually too. So if you compare the tendencies before eddies actually start to develop, you may think they're the same apart from rounding errors even though something is still wrong!
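A toy example of that sensitivity (the logistic map, used here purely for illustration and unrelated to the shallow water code): a single rounding difference from the order of a sum is enough for two trajectories of a chaotic map to separate completely.

function float_order_demo()
    r = 3.9                       # logistic map in its chaotic regime
    x = (0.1 + 0.2) + 0.3         # 0.6000000000000001
    y = 0.1 + (0.2 + 0.3)         # 0.6 -- same sum, different rounding
    for _ in 1:100
        x = r * x * (1 - x)
        y = r * y * (1 - y)
    end
    return abs(x - y)             # the initial ~1e-16 gap has grown to order one
end
println(float_order_demo())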
Can you point me to the code that you are running that blows up? I struggle to find that version in your repo...
Yes yes, I understand that, but if all my forcings, parameters, discrete operators, and initial conditions match then I still don't see why they would diverge. I don't see how the physics in the model would be different if the setups are the same
Not surprised, it's not currently on the repo, give me a couple of minutes to send it.
Okay, I'm finished. Could you do me a favor and open a new draft pull request where we can discuss the code and keep track of changes? I can email a file with all the scripts and stuff that I'd want to use; it has all your changes, I needed to move things around a bit. Putting a couple of notes here on what I've just learned: (1) moving my old code to a harmonic Laplacian and … (2) the speedup code with a harmonic Laplacian and … I knew that the diffusion term affected stability, I just always begin to worry when I see my code diverge, I don't like when things break.
If you're okay with my request, could you send your email?
Implements some ideas of #14.