Amortize sha2 compression loop #231
Looks fine to me. I went through the logic on paper and it seems sound but it is 3am so...
%stack (shifted, rot, value) -> (rot, value, shifted)
// stack: value >> rot, value
SWAP1
PUSH $rot
Yes, I'm pushing again instead of calling DUP because $rot always fits in 4 bytes at most anyway, and the savings on the number of CPU cycles are worth the extra overhead on BytePackingStark.
@muursh You may want to review the latest changes, I've pushed some additional optimizations after your review.
Will do this evening.
@@ -27,6 +27,6 @@
 // stack: c, a, b, Sigma_0(a)
 %sha2_majority
 // stack: Maj(c, a, b), Sigma_0(a)
-%add_u32
+ADD
There is no need for a 32-bit reduction here, as the result of this macro is reduced afterwards (see the compression section). Similar reasoning applies to %sha2_temp_word1.
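The reasoning above relies on a property of modular addition: skipping the reduction of an intermediate sum does not change the final value, provided a reduction modulo 2^32 still happens at the end and the unreduced sum fits in the machine word (256 bits in the EVM). A minimal Python sketch of that property follows; the helper names `add_u32`, `reduce_every_step` and `reduce_once` are hypothetical and not taken from the kernel code.

```python
MASK32 = 0xFFFFFFFF  # 2**32 - 1


def add_u32(a: int, b: int) -> int:
    """Add two values and reduce modulo 2**32.

    Hypothetical stand-in for what a macro like %add_u32 is assumed to do.
    """
    return (a + b) & MASK32


def reduce_every_step(words):
    """Reduce after each addition (the pre-change behaviour, as assumed here)."""
    acc = 0
    for w in words:
        acc = add_u32(acc, w)
    return acc


def reduce_once(words):
    """Use plain additions and reduce a single time at the end."""
    return sum(words) & MASK32


words = [0xFFFFFFFF, 0x80000000, 0x12345678, 0xDEADBEEF]
# Both strategies agree because (x mod m + y) mod m == (x + y) mod m.
assert reduce_every_step(words) == reduce_once(words)
```

The single final reduction is what makes a plain ADD safe in the intermediate macros; if no later reduction existed, the two strategies would differ.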
LGTM, thank you!
evm_arithmetization/src/cpu/kernel/asm/hash/sha2/message_schedule.asm
Co-authored-by: Linda Guiga <[email protected]>
Applies the tricks initially specified in the SHA1 specs for the macros sha2_choice and sha2_majority, as well as reducing u32 addition chain overhead for already reduced inputs.
Yields ~4.6% savings for SHA2 precompile.
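The description does not spell out which identities are used for sha2_choice and sha2_majority. As an illustration only, the Python sketch below checks two well-known cheaper forms against the textbook definitions of Ch and Maj; whether these are the exact forms adopted in the kernel macros is an assumption, and all function names are hypothetical.

```python
from itertools import product

MASK32 = 0xFFFFFFFF


def choice_textbook(x: int, y: int, z: int) -> int:
    # Ch(x, y, z) = (x AND y) XOR (NOT x AND z)
    return ((x & y) ^ (~x & z)) & MASK32


def choice_short(x: int, y: int, z: int) -> int:
    # Equivalent form with one fewer operation: z XOR (x AND (y XOR z))
    return (z ^ (x & (y ^ z))) & MASK32


def majority_textbook(x: int, y: int, z: int) -> int:
    # Maj(x, y, z) = (x AND y) XOR (x AND z) XOR (y AND z)
    return ((x & y) ^ (x & z) ^ (y & z)) & MASK32


def majority_short(x: int, y: int, z: int) -> int:
    # Equivalent form with one fewer operation: (x AND y) OR (z AND (x OR y))
    return ((x & y) | (z & (x | y))) & MASK32


# Both functions act bit by bit, so checking every combination of single
# bits is enough to establish equality for any word width.
for x, y, z in product((0, 1), repeat=3):
    assert choice_textbook(x, y, z) == choice_short(x, y, z)
    assert majority_textbook(x, y, z) == majority_short(x, y, z)
```

Each short form saves one bitwise operation per call, which adds up because both functions run once per compression round.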