getindex overhead #166
Comments
For the record, I obtain the same run times:
julia> @btime arr_sum($X);
6.528 μs (0 allocations: 0 bytes)
julia> @btime arr_sum($XO);
6.535 μs (0 allocations: 0 bytes)
This is odd, as indexing is definitely faster with X than with XO:
julia> @btime getindex($X, 1, 1, 1, 1, 1, 1);
3.768 ns (0 allocations: 0 bytes)
julia> @btime getindex($XO, 3, 2, 1, 2, 3, 4);
14.857 ns (0 allocations: 0 bytes)
julia> getindex_inbounds(X, inds...) = @inbounds X[inds...];
julia> @btime getindex_inbounds($X, 1, 1, 1, 1, 1, 1);
2.694 ns (0 allocations: 0 bytes)
julia> @btime getindex_inbounds($XO, 3, 2, 1, 2, 3, 4);
4.960 ns (0 allocations: 0 bytes)
Sorry, I wasn't aware there's a performance regression in Julia 1.6-dev; see also JuliaLang/julia#38073.
Is the "raw" indexing time relevant? Any performance-sensitive operation is presumably in a loop, and in a loop won't the compiler hoist all the slow parts out of the loop? As evidence, with arr_sum and arr_sum_bc defined, I get:
julia> @btime arr_sum($X)
5.992 μs (0 allocations: 0 bytes)
2069.986834289989
julia> @btime arr_sum($XO)
4.093 μs (0 allocations: 0 bytes)
2069.986834289989
julia> @btime arr_sum_bc($X)
7.791 μs (0 allocations: 0 bytes)
2069.986834289989
julia> @btime arr_sum_bc($XO)
5.022 μs (0 allocations: 0 bytes)
2069.986834289989
Despite the fact that an isolated [...] Bad inlining could nix this, but since we've annotated with [...] I don't understand why OffsetArrays are faster than Arrays, though.
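To make the hoisting argument concrete, here is a minimal sketch of the kind of loop being discussed. The names loop_sum and loop_sum_checked are hypothetical stand-ins, not the arr_sum and arr_sum_bc timed above, whose definitions are not shown here. The only difference between the two is whether the per-element bounds check is allowed inside the loop.

```julia
# Hypothetical stand-ins for illustration; NOT the arr_sum / arr_sum_bc from this thread.

# Sum every element, with bounds checks elided inside the loop.
function loop_sum(A::AbstractArray)
    s = zero(eltype(A))
    @inbounds for I in CartesianIndices(A)
        s += A[I]
    end
    return s
end

# Same loop, but every access keeps its bounds check.
function loop_sum_checked(A::AbstractArray)
    s = zero(eltype(A))
    for I in CartesianIndices(A)
        s += A[I]
    end
    return s
end

A = rand(4, 4, 4)
B = reshape(collect(1:24), 2, 3, 4)
```

Both functions accept any AbstractArray, so the same code could presumably be timed against an Array and an OffsetArray with @btime to compare in-loop cost rather than isolated getindex cost.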
I've recently been building a cache array with OffsetArrays and realized that the performance bottleneck is getindex(::OffsetArray, I).
The benchmark result looks interesting; I'm unsure why arr_sum runs faster on OffsetArray 🤔 Any ideas?
The default checkbounds implementation definitely takes too long here. I believe the additional time is spent on the construction of IdOffsetRange and its generic and thus slower getindex.
These might be benchmark artifacts, though.
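As a rough illustration of where the per-access cost can come from, the sketch below uses a toy offset wrapper. ShiftedVector, first_index and last_index are hypothetical names made up for this example; this is not how OffsetArrays.jl is implemented, and it does not involve IdOffsetRange. It only shows the two pieces of work on each access: a bounds check against the shifted range, and the translation back to the parent's index.

```julia
# Toy illustration only: NOT the OffsetArrays.jl implementation.
struct ShiftedVector{T}
    parent::Vector{T}
    offset::Int          # index i maps to parent[i - offset]
end

# Valid indices are (1:length(parent)) .+ offset.
first_index(v::ShiftedVector) = 1 + v.offset
last_index(v::ShiftedVector) = length(v.parent) + v.offset

@inline function Base.getindex(v::ShiftedVector, i::Int)
    # This check can be skipped when the caller uses @inbounds and the call is inlined.
    @boundscheck (first_index(v) <= i <= last_index(v)) || throw(BoundsError(v, i))
    @inbounds return v.parent[i - v.offset]
end

v = ShiftedVector([10, 20, 30], -2)   # valid indices are -1:1
```

With this wrapper, v[-1] reads the first parent element and v[1] the last, while v[2] throws a BoundsError. Wrapping the access in @inbounds from a calling function removes the check but not the index subtraction.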