-
Notifications
You must be signed in to change notification settings - Fork 326
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Zero masked arithmetic operations #2426
base: master
Are you sure you want to change the base?
Zero masked arithmetic operations #2426
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
All ops in this section return `0` for `mask=false` lanes. These are equivalent | ||
to, and potentially more efficient than, `IfThenElseZero(m, Add(a, b));` etc. | ||
|
||
* <code>V **MaskedMaxOrZero**(M m, V a, V b)</code>: returns `Max(a, b)[i]` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As in the other PR, how about just MaskedMax(m, a, b)?
or `0` if `m[i]` is false. | ||
* <code>V **MaskedMulOrZero**(M m, V a, V b)</code>: returns `a[i] * b[i]` | ||
or `0` if `m[i]` is false. | ||
* <code>V **MaskedDivideOrZero**(M m, V a, V b)</code>: returns `a[i] / b[i]` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MaskedDiv for consistency?
false. | ||
* <code>V **MaskedMulAddOrZero**(M m, V a, V b, V c)</code>: returns `a[i] * | ||
b[i] + c[i]` or `0` if `m[i]` is false. | ||
* <code>V **MaskedNegMulAddOrZero**(M m, V a, V b, V c)</code>: returns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we are adding Masked variants for almost all ops, but not MulSub. Do we actually have use-cases for all of these? I'd prefer to add as we go, rather than pre-emptively add ops just in case.
return sv##OP##_##CHAR##BITS##_m(m, a, b); \ | ||
} | ||
// User-specified mask. Mask=false value is zero. | ||
#define HWY_SVE_RETV_ARGMVVZ(BASE, CHAR, BITS, HALF, NAME, OP) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Underscore before the Z (MVV_Z) for consistency?
#define HWY_NATIVE_ZERO_MASKED_ARITH | ||
#endif | ||
|
||
#define HWY_SVE_FMAZ(BASE, CHAR, BITS, HALF, NAME, OP) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could move this next to HWY_SVE_RETV_ARGMVVV and call it HWY_SVE_RETV_ARGMVVV_Z?
(Note also the P/M naming suggestion from PR 2428)
|
||
#undef HWY_SVE_FMAZ | ||
|
||
template <class V, class M> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For ops that just call detail::op, we can move them out of namespace detail and avoid the wrapper function.
HWY_SVE_FOREACH(HWY_SVE_RETV_ARGMVV_M, MaskedMul_M, mul); | ||
} | ||
|
||
// ------------------------------ MulLower |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not documented yet?
Also, Lower() is generally the lower half. There is a GetLane which returns the lowest. How about naming this MulLane?
Finally, this can be efficiently implemented via MaskedMulOr (which we have) plus FirstN(d, 1) or perhaps a new First1(d) op which wraps the svptrue. That seems like it would be more widely useful and no less efficient - how about we add that instead?
ForAllTypes(ForPartialVectors<TestMulAddOrZero>()); | ||
} | ||
|
||
struct TestWidenMulPairwiseAddOrZero { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a lot of duplication here. What do you think of adding the MaskedNegMulAdd() call into the existing test for NegMulAdd? That avoids duplicating the test setup.
Introduces:
Max(a, b)[i]
orzero
ifm[i]
is falsea[i] + b[i]
or0
ifm[i]
is false.a[i] - b[i]
or0
ifm[i]
is false.a[i] * b[i]
or0
ifm[i]
is false.a[i] / b[i]
or0
ifm[i]
is false.a[i] + b[i]
saturated to the minimum/maximum representable value, or0
if m[i]` is false.a[i] - b[i]
saturated to the minimum/maximum representable value, or0
if m[i]` is false.0
ifm[i]
is false.a[i] * b[i] + c[i]
or0
ifm[i]
is false.-a[i] * b[i] + c[i]
or0
ifm[i]
is false.a
andb
toTFromD<D>
and computesa[2*i+1]*b[2*i+1] + a[2*i+0]*b[2*i+0]
, or0
ifm[i]
is false.Testing is included for all operations where both the masking and the underlying operation is tested.