Caching for Repeated Vector Assembly #1029
There are in-place assembly functions provided in the API. In particular, you should have a look at `assemble_vector!`.

Hi @JordiManyer, thanks for the quick reply! I noticed the in-place `assemble_vector!` function, but the timings and allocations are the same between the two methods. Am I missing something here?
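For reference, a minimal sketch of the two call styles being compared; the 2D setup and all names here are illustrative assumptions, not from the thread:

```julia
using Gridap

# Illustrative setup (assumed): a scalar test space and a simple linear form.
model = CartesianDiscreteModel((0,1,0,1), (32,32))
V = TestFESpace(model, ReferenceFE(lagrangian, Float64, 1))
Ω = Triangulation(model)
dΩ = Measure(Ω, 2)
f(x) = x[1]
l(v) = ∫( f*v )*dΩ

# Allocating version: builds a fresh vector on every call.
b = assemble_vector(l, V)

# In-place version: reuses the storage of `b` across calls.
v = get_fe_basis(V)
assem = SparseMatrixAssembler(V, V)
vecdata = collect_cell_vector(V, l(v))
assemble_vector!(b, assem, vecdata)
```

Timing both with, e.g., BenchmarkTools' `@btime` reproduces the observation above: the per-call allocations are essentially the same, for the reason explained in the next comment.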
Those "caches" are allocations for a single cell. We allocate the elemental matrix/vector for a single cell (and all intermediary steps) and reuse that for every cell. Not much more can be optimised, unless you go very deep into the code. I am quite sure that if you increase the size of your problem, those allocations will stay the same (or they should). It's a constant overhead, so to speak.

Something you can try is using "performance mode", which deactivates all checks in the code. Those checks can be quite expensive, especially for small problems. Try setting `Gridap.Helpers.set_performance_mode()`, then close Julia and open it again. The code should recompile, without any checks.
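A minimal sketch of that workflow; `set_debug_mode` as the counterpart switch is an assumption based on recent Gridap versions:

```julia
using Gridap

# Persist the execution mode as a package preference. It only takes
# effect after restarting Julia, which triggers recompilation.
Gridap.Helpers.set_performance_mode()  # disable internal checks
# ...restart Julia and rerun the timings...
Gridap.Helpers.set_debug_mode()        # assumed counterpart: re-enable checks
```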
I see. You're right that the memory allocations for the in-place method stay the same as I increase the problem size.

I'm okay with this computation taking a while the first time, but since it needs to be called at least once every timestep I was hoping there would be some way to store a cache that would improve performance... This should be possible, since it's really just doing the same computations over and over (see the integral in the original post). Alternatively, if you know of a way to parallelize this function, please let me know.

I'll give performance mode a try. Thanks again!
If you are willing to go low-level, these are the functions you have to look at:

```julia
function numeric_loop_vector!(b,a::SparseMatrixAssembler,vecdata)
  strategy = get_assembly_strategy(a)
  for (cellvec, _cellids) in zip(vecdata...)
    cellids = map_cell_rows(strategy,_cellids)
    if length(cellvec) > 0
      rows_cache = array_cache(cellids)
      vals_cache = array_cache(cellvec)
      vals1 = getindex!(vals_cache,cellvec,1)
      rows1 = getindex!(rows_cache,cellids,1)
      add! = AddEntriesMap(+)
      add_cache = return_cache(add!,b,vals1,rows1)
      caches = add_cache, vals_cache, rows_cache
      _numeric_loop_vector!(b,caches,cellvec,cellids)
    end
  end
  b
end
```

The caches you are looking for are created by

```julia
rows_cache = array_cache(cellids)
vals_cache = array_cache(cellvec)
add_cache = return_cache(add!,b,vals1,rows1)
caches = add_cache, vals_cache, rows_cache
```

In order, these are the caches for adding the entries into the global vector, fetching the elemental vector of each cell, and fetching its global row ids.
When all checks are disabled, most allocations should come from those caches. If nothing changes in your problem, those caches could be reused.
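A rough sketch of what reusing those caches could look like, under strong assumptions: the mesh, FE spaces, and structure of the cell data do not change between calls; there is a single cell contribution in `vecdata`; and `b`, `assem`, and `vecdata` are as in the earlier sketch. `assemble_with_caches!` is a hypothetical helper mirroring Gridap's internal loop, not part of the API, and the internal names imported below may move between modules across Gridap versions:

```julia
using Gridap
using Gridap.Arrays: array_cache, getindex!, return_cache, evaluate!
using Gridap.Algebra: AddEntriesMap
using Gridap.FESpaces: get_assembly_strategy, map_cell_rows

# Build the caches once, mirroring the internal code quoted above.
strategy = get_assembly_strategy(assem)
cellvec, _cellids = first(zip(vecdata...))  # assume a single contribution block
cellids = map_cell_rows(strategy, _cellids)
rows_cache = array_cache(cellids)
vals_cache = array_cache(cellvec)
add! = AddEntriesMap(+)
add_cache = return_cache(add!, b,
  getindex!(vals_cache, cellvec, 1), getindex!(rows_cache, cellids, 1))
caches = (add_cache, vals_cache, rows_cache)

# Hypothetical helper: rerun the assembly loop reusing the caches.
function assemble_with_caches!(b, caches, cellvec, cellids)
  fill!(b, zero(eltype(b)))
  add_cache, vals_cache, rows_cache = caches
  add! = AddEntriesMap(+)
  for cell in 1:length(cellvec)
    vals = getindex!(vals_cache, cellvec, cell)  # elemental vector for this cell
    rows = getindex!(rows_cache, cellids, cell)  # its global row ids
    evaluate!(add_cache, add!, b, vals, rows)    # accumulate into b
  end
  b
end

assemble_with_caches!(b, caches, cellvec, cellids)
```

Whether this pays off in practice is exactly the point made in the next comment: the cache allocations are a constant overhead that shrinks relative to the loop itself as the problem grows.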
Still, I maintain that when dealing with real problems (i.e. $10^5$–$10^6$ cells, non-Cartesian meshes) those allocations should be negligible compared to the weak-form evaluation, integration, and assembly.
It's new to Gridap v0.18, and it's still not fully taken advantage of (I have a branch somewhere with more changes to come soon).
Yes, I think you're right; after timing each of the individual steps with a "real" problem, it looks like the cache allocations are indeed negligible compared to the weak-form evaluation, integration, and assembly.

I still feel that, given the repetitive nature of this problem, there must be some clever way to save time here, but at the moment I'm not familiar enough with the code to see how.
Hello,

I was wondering if there is any way to save a cache for `assemble_vector(l, V)` calls, in a similar way to how the `evaluate!(cache, f, x...)` function works?

To explain what I mean, let me set up an example. Suppose you are trying to solve an advection problem for a tracer $c(x, t)$ where the flow field is a function of space and time, $u = u(x, t)$. Then you might want to be able to assemble a "right-hand-side" vector $A$ for the linear form

$$l(v) = \int_\Omega v \, (u \cdot \nabla c) \, \mathrm{d}\Omega.$$

Since $u$ and $c$ change with time and the advection term is non-linear, we'll need to recompute `assemble_vector(l, D)` quite frequently in order to step $c$ forward in time. This is very costly for large systems.

Since $u$ is also a `FEFunction`, shouldn't we be able to store some kind of cache to compute $A$? Naively, writing $c = \sum_j c_j \varphi_j$ and $u = \sum_k u_k \psi_k$, one could compute the rank-3 tensor

$$T_{ijk} = \int_\Omega \varphi_i \, (\psi_k \cdot \nabla \varphi_j) \, \mathrm{d}\Omega$$

for each of the basis functions $\varphi_i$, $\varphi_j$, and $\psi_k$, and then multiply this by $c_j$ and $u_k$ to get our vector, $A_i = T_{ijk} c_j u_k$, but the tensor is probably too large for realistic systems. Maybe there is something cheaper memory-wise to save so that `assemble_vector` does not become such a bottleneck for these problems? I suppose one could throw `GridapDistributed` at the problem, though I'm not sure how that would work in practice.

Anyways, perhaps this is something you all have already thought of and I just haven't read the docs well enough... let me know what you think!
-Henry
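For concreteness, a minimal sketch of the repeated-assembly pattern the post describes; the spaces, the velocity field, and the explicit update are all illustrative assumptions:

```julia
using Gridap

# Illustrative advection setup (all names assumed, not from the thread).
model = CartesianDiscreteModel((0,1,0,1), (64,64))
D  = TestFESpace(model, ReferenceFE(lagrangian, Float64, 1))  # tracer test space
D0 = TrialFESpace(D)
U  = FESpace(model, ReferenceFE(lagrangian, VectorValue{2,Float64}, 1))
Ω  = Triangulation(model)
dΩ = Measure(Ω, 2)

Δt, nsteps = 1.0e-3, 10
ch = interpolate(x -> exp(-100*((x[1]-0.5)^2 + (x[2]-0.5)^2)), D0)
M  = assemble_matrix((c,v) -> ∫( c*v )*dΩ, D0, D)  # mass matrix, fixed in time

for n in 1:nsteps
  uh = interpolate(x -> VectorValue(-(x[2]-0.5), x[1]-0.5), U)  # stand-in for u(x, t)
  l(v) = ∫( v*(uh ⋅ ∇(ch)) )*dΩ
  A = assemble_vector(l, D)  # fresh allocations on every step: the bottleneck
  global ch = FEFunction(D0, get_free_dof_values(ch) - Δt*(M \ A))  # forward Euler
end
```

Every pass through `assemble_vector` rebuilds the cell arrays and the global vector from scratch, which is the cost the in-place and cache-reuse suggestions earlier in the thread aim at.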