MutableArithmetics #113
The function I'm looking for from MutableArithmetics is From a developer in that discussion:
|
I usually avoid |
I see. Yes the thought was to have a macro users could prepend to expressions containing the mutable struct, so that a pre-allocated temporary buffer could be used instead for the temporaries generated, while keeping the syntax simple and transparent to other types. The current At least at the time, I couldn't see any major benefits to satisfying the MutableArithmetics interface other than for |
Looking at the Another thing that seems like a drawback of the
It'd allow users to, e.g., move allocations outside of loops or outright eliminate them in a generic and thread-safe manner, but at the cost of uglier and less convenient code. Take this loop as an example:

for x ∈ vec
    s += f(x)
end

Assuming

y = ... # preallocate
for x ∈ vec
    y = MA.operate_to!!(y, f, x)
    s = MA.operate!!(+, s, y)
end

Furthermore, if

y = ... # preallocate
buf = MA.buffer_for(f, eltype(vec))
for x ∈ vec
    y = MA.buffered_operate_to!!(buf, y, f, x)
    s = MA.operate!!(+, s, y)
end

Other MA functions to implement would be The purpose of using the |
You are correct, with this approach internally, the macro in its current state is definitely not thread safe. The current plan is to switch to a thread-safe memory pool of TPSs which the temporaries can draw from. This would make the macro thread safe, but is not implemented yet.
Kind of. Right now there are only 2 types of TPSs ( I do feel that in the long term, MutableArithmetics would be a good interface to implement here. However I would really like to see some kind of equivalent |
A thread-safe memory pool will have its costs, but it seems like a reasonable solution. Implementing the MA interfaces would enable users to squeeze out every last ounce of performance, when necessary, using (hopefully) familiar APIs like I guess it's true that |
For now, I think I will wait until MA has better documentation and an equivalent I also am not the best with writing macros, |
@nsajko can you point me to any documentation, or describe how to actually implement |
Does this example MA implementation help: kalmarek/Arblib.jl#178? |
Thanks! I thought about this a lot, but decided it seems best to stick with and improve my current implementation: I have made

julia> d = Descriptor(3, 7); x = vars(d);
julia> y = rand(3);
julia> @btime @FastGTPSA begin # 2 allocations per TPS, no temporaries generated:
       t1 = $x[1]^3*sin($x[2])/log(2+$x[3])-exp($x[1]*$x[2])*im;
       t2 = $x[1]^3*sin($x[2])/log(2+$x[3])-exp($x[1]*$x[2])*im;
       z = $y[1]^3*sin($y[2])/log(2+$y[3])-exp($y[1]*$y[2])*im;
       end;
11.251 μs (4 allocations: 3.88 KiB)
julia> t3 = ComplexTPS64(); t4 = ComplexTPS64(); @gensym w; # pre-allocated TPSs and some scalar number
julia> @btime @FastGTPSA! begin # zero allocations
       $t3 = $x[1]^3*sin($x[2])/log(2+$x[3])-exp($x[1]*$x[2])*im;
       $t4 = $x[1]^3*sin($x[2])/log(2+$x[3])-exp($x[1]*$x[2])*im;
       $w = $y[1]^3*sin($y[2])/log(2+$y[3])-exp($y[1]*$y[2])*im;
       end;
10.930 μs (0 allocations: 0 bytes)

This basically does everything I could do with MA (and more actually, since On that note, I think my solution for |
Originally posted by @mattsignorelli in #46 (comment)
It's not true that MA "only works for simple arithmetic"; MA works for arbitrary functions. If you have questions, I'd be happy to help.