Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support hw-accelerated AES on Thumb and Arm #8326

Merged
merged 23 commits into from
Nov 27, 2023

Conversation

daverodgman
Copy link
Contributor

@daverodgman daverodgman commented Oct 8, 2023

Description

Extend support for AESCE to 32-bit builds.

Saves around 450b for Thumb 2 -Oz (but not for TF-M, because our reference target doesn't have Neon).

Manually tested via qemu, for arm, thumb and aarch64 with gcc and clang.

Do we want to put this in the Changelog as Security, because it enables const-time AES for these targets?

PR checklist

Please tick as appropriate and edit the reasons (e.g.: "backport: not needed because this is a new feature")

  • changelog provided
  • backport no - not a bugfix
  • tests provided - build test only. Manually tested with qemu

Copy link
Contributor

@yuhaoth yuhaoth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments

library/aesce.c Outdated Show resolved Hide resolved
library/aesce.c Outdated Show resolved Hide resolved
library/aesce.c Outdated Show resolved Hide resolved
library/aesce.c Outdated Show resolved Hide resolved
@yuhaoth
Copy link
Contributor

yuhaoth commented Oct 9, 2023

I just push a commit to remove plain c code. mtest reports no code size improvement because mbedtls_internal_aes_[en|de]crypt use plain c .

I test it with below commands. Clang version is 14.0

# Plain C only
../mbedtls-docs/tools/bin/mtest -s -MBEDTLS_AESCE_C -MBEDTLS_AES_USE_HARDWARE_ONLY armv8-thumb2
# AESCE only
../mbedtls-docs/tools/bin/mtest -s MBEDTLS_AESCE_C MBEDTLS_AES_USE_HARDWARE_ONLY armv8-thumb2
# Both
../mbedtls-docs/tools/bin/mtest -s MBEDTLS_AESCE_C -MBEDTLS_AES_USE_HARDWARE_ONLY armv8-thumb2

And the result is

with commit Without commit
Plain C only 260410 260410
AESCE only 258328 260652
Both 262023 261979

Copy link
Contributor

@yuhaoth yuhaoth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just check the clang versions. poly64*_t are available in clang 11.0 on aarch32

tests/scripts/all.sh Outdated Show resolved Hide resolved
library/aesce.c Outdated Show resolved Hide resolved
@daverodgman
Copy link
Contributor Author

@yuhaoth I think the existing implementation of vget_low_p64 for GCC 5 is wrong - looking at arm_neon.h different behaviour is needed for big/little endian. I think that the implementation I've done for clang is correct (it is just a trivial wrapper around vget_low_u64 which already handles endianness) - probably we should use this for gcc as well. WDYT?

@daverodgman daverodgman added needs-review Every commit must be reviewed by at least two team members, needs-ci Needs to pass CI tests size-s Estimated task size: small (~2d) priority-medium Medium priority - this can be reviewed as time permits component-crypto Crypto primitives and low-level interfaces labels Oct 9, 2023
@daverodgman
Copy link
Contributor Author

I just push a commit to remove plain c code. mtest reports no code size improvement because mbedtls_internal_aes_[en|de]crypt use plain c .

This looks good - please raise as a separate PR when this is in - let's avoid scope creep on this one.

@daverodgman
Copy link
Contributor Author

@bensze01 please could you add the linux-headers package to the CI (at least on the Ubuntu image) so that we can test the runtime-detection here? thanks

@gilles-peskine-arm
Copy link
Contributor

please could you add the linux-headers package to the CI (at least on the Ubuntu image) so that we can test the runtime-detection here? thanks

It's easy to add the package to the docker images, but should we? Compiling a Linux application should not require kernel headers. If we do something that relies on them (and we aren't testing mbedtls as a Linux kernel component), it's good that it fails to compile on the CI: it tells us we're doing something wrong.

@daverodgman
Copy link
Contributor Author

daverodgman commented Oct 9, 2023

It's easy to add the package to the docker images, but should we? Compiling a Linux application should not require kernel headers. If we do something that relies on them (and we aren't testing mbedtls as a Linux kernel component), it's good that it fails to compile on the CI: it tells us we're doing something wrong.

That is what I thought. On my local Ubuntu 22.04, it's OK to only include <sys/auxv.h> because (AIUI) it pulls in a copy of the necessary defines that is intended for user-space to use (in <bits/hwcap.h>, provided by libc6-dev), so I removed our pre-existing include of <asm/hwcap.h> and it worked fine. It didn't work on the CI though - maybe that is lacking libc6-dev? So I reverted to including <asm/hwcap.h>, but I'm open to suggestions as to the preferred way to do it.

Currently in development we include <asm/hwcap.h>, and we don't build-test it in the CI. That also seems to be the case for SHA256 (see somewhat related #7198 ). #8298 adds a build test for SHA256 - unsure why that is able to pass (without needing hwcap.h), but this PR is not. Maybe HWCAP_SHA2 is defined by auxv.h whereas the ones we check here are not?

@gilles-peskine-arm
Copy link
Contributor

sys/auxv.h is in the libc6-dev package on on an arm64 Ubuntu. More generally, it's in the glibc dev package on architectures that have this header (it doesn't exist for x86). On other architectures, it's in an extra libc package that's part of the cross-compilation toolchain, e.g. linux-libc-dev-armel-cross, which libc6-dev-armel-cross. depends on.

I don't know why the cross-compilation toolchain packages are split this way. The native packages have a similar split, but sys/auxv.h is in libc6-dev, not linux-libc-dev.) linux-libc6-dev-xxx contains some configuration-agnostic and backward-compatible (but architecture-dependent) kernel headers repackaged by glibc and intended to compile user mode applications, as opposed to linux-headers-xxx containing configuration-specific and version-specific kernel headers intended to compile kernel modules. My earlier comment was about the linux-headers-xxx packages. I don't see any reason not to install the linux-libc-dev-xxx packages.

In fact, we do have the linux-libc-dev-xxx packages, because the libc6-dev-xxx packages depend on them. But we only have libc6-dev-armel-cross, i.e. armv7 without SIMD. What I think we're missing here is arm64 packages, for armv8 support.

@yuhaoth
Copy link
Contributor

yuhaoth commented Oct 10, 2023

@yuhaoth I think the existing implementation of vget_low_p64 for GCC 5 is wrong - looking at arm_neon.h different behaviour is needed for big/little endian. I think that the implementation I've done for clang is correct (it is just a trivial wrapper around vget_low_u64 which already handles endianness) - probably we should use this for gcc as well. WDYT?

Yes, it is wrong.

library/aesce.c Outdated
# if __clang_major__ < 4
# error "Minimum version of Clang for MBEDTLS_AESCE_C is 4.0."
# if defined(MBEDTLS_ARCH_IS_ARM32) && (__clang_major__ < 11)
# error "Minimum version of Clang for MBEDTLS_AESCE_C on 32-bit Arm or Thumb is 111.0."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrong typo

Suggested change
# error "Minimum version of Clang for MBEDTLS_AESCE_C on 32-bit Arm or Thumb is 111.0."
# error "Minimum version of Clang for MBEDTLS_AESCE_C on 32-bit Arm or Thumb is 11.0."

library/sha256.c Outdated
asm ("sha256h q0, q0, v0.4s" : : : "v0");
#else
asm ("sha256h.32 q0, q0, q0" : : : "q0");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to put the volatile on these two lines? (I was going to raise a PR to do this, but it would conflict with this one)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DId you mean to make this comment on the SHA PR? I can do that there if you like.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, yes, but we should always have asm volatile

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for asm that does not have optimisable-out side effects - for this (i.e. the normal case where the asm is just used to compute something) we can allow the compiler to re-order this with respect to surrounding code if it wants.

@daverodgman
Copy link
Contributor Author

daverodgman commented Oct 10, 2023

sys/auxv.h is in the libc6-dev package on on an arm64 Ubuntu. More generally, it's in the glibc dev package on architectures that have this header (it doesn't exist for x86). On other architectures, it's in an extra libc package that's part of the cross-compilation toolchain, e.g. linux-libc-dev-armel-cross, which libc6-dev-armel-cross. depends on.

I don't know why the cross-compilation toolchain packages are split this way. The native packages have a similar split, but sys/auxv.h is in libc6-dev, not linux-libc-dev.) linux-libc6-dev-xxx contains some configuration-agnostic and backward-compatible (but architecture-dependent) kernel headers repackaged by glibc and intended to compile user mode applications, as opposed to linux-headers-xxx containing configuration-specific and version-specific kernel headers intended to compile kernel modules. My earlier comment was about the linux-headers-xxx packages. I don't see any reason not to install the linux-libc-dev-xxx packages.

In fact, we do have the linux-libc-dev-xxx packages, because the libc6-dev-xxx packages depend on them. But we only have libc6-dev-armel-cross, i.e. armv7 without SIMD. What I think we're missing here is arm64 packages, for armv8 support.

Thanks. I think that it's not the arm64 packages that we are missing, but the 32-bit Armv8 packages, i.e. the hard-float ABI packages?

On further investigation on my system (aarch64 Ubuntu 22.04), linux-libc-dev provides the aarch64 definitions in asm/hwcap. The 32-bit definitions are also in a (different) asm/hwcap, which is provided by linux-libc-dev-armhf-cross. There is a second copy of the native 64-bit definitions in bits/hw_cap which is automatically pulled in via sys/auxv.h.

In searching the web, asm/hwcap seems to be how these flags are normally included - see e.g. https://community.arm.com/arm-community-blogs/b/operating-systems-blog/posts/runtime-detection-of-cpu-features-on-an-armv8-a-cpu#:~:text=include%20%3Csys/auxv.h%3E-,%23include%20%3Casm/hwcap.h%3E,-int%20main()%0A%7B%0A%C2%A0%C2%A0%C2%A0%20long%20hwcaps2

So in this case, I think we're missing linux-libc-dev-armhf-cross from the CI, and possibly linux-libc-dev-arm64-cross. And I think the code should include asm/hwcap.h.

Features
* Support Armv8 Crypto Extension acceleration for SHA-256
when compiling for Thumb or 32-bit Arm.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(a) extra blank line at the end of the file
(b) PR title references AES, this is about SHA-256 - isn't that #8298?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, you've commented on the wrong PR :-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this PR missing needs-preceding-pr?

Copy link
Contributor Author

@daverodgman daverodgman Oct 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's based on 8298 in order to pick up the definition of MBEDTLS_ARCH_IS_ARMV8. Yes, I'll add the label.

@daverodgman daverodgman added the needs-preceding-pr Requires another PR to be merged first label Oct 10, 2023
@daverodgman daverodgman removed the needs-ci Needs to pass CI tests label Oct 10, 2023
@daverodgman
Copy link
Contributor Author

CI failure is spurious - OpenCI passed

@daverodgman daverodgman added needs-work and removed needs-review Every commit must be reviewed by at least two team members, labels Oct 24, 2023
@yuhaoth yuhaoth removed the needs-preceding-pr Requires another PR to be merged first label Oct 25, 2023
@daverodgman daverodgman added needs-review Every commit must be reviewed by at least two team members, needs-reviewer This PR needs someone to pick it up for review and removed needs-work needs-reviewer This PR needs someone to pick it up for review labels Oct 25, 2023
Copy link
Contributor

@yuhaoth yuhaoth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tom-cosgrove-arm tom-cosgrove-arm requested review from tom-cosgrove-arm and removed request for tom-cosgrove-arm November 7, 2023 11:09
Copy link
Contributor

@tom-cosgrove-arm tom-cosgrove-arm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tom-cosgrove-arm tom-cosgrove-arm added approved Design and code approved - may be waiting for CI or backports and removed needs-review Every commit must be reviewed by at least two team members, labels Nov 27, 2023
@daverodgman daverodgman added this pull request to the merge queue Nov 27, 2023
Merged via the queue into Mbed-TLS:development with commit 9fbac38 Nov 27, 2023
@daverodgman daverodgman mentioned this pull request Nov 28, 2023
3 tasks
support_build_aes_armce() {
# clang >= 4 is required to build with AES extensions
ver="$(clang --version|grep version|sed -E 's#.*version ([0-9]+).*#\1#')"
[ "${ver}" -ge 11 ]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bah, missed that this doesn't match the comment two lines above. I have a PR touching this function that I'll fix this in

make -B library/aesce.o CC=clang CFLAGS="--target=arm-linux-gnueabihf -mcpu=cortex-a32+crypto -mthumb -S"
not grep -E 'aes[0-9a-z]+.[0-9]\s*[qv]' library/aesce.o
msg "clang, test aarch64 crypto instructions not built"
make -B library/aesce.o CC=clang CFLAGS="--target=aarch64-linux-gnu -march=armv8-a -S"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tom-cosgrove-arm I'm confused as to exactly in what setting we test the absence of AES instructions. Specifically, why are we testing that on armv8-a and not armv8-a+crypto? Is that just copypasta or is there a reason?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think the errors are on lines 4441 and 4444. Here we are checking that we don't emit AES instructions when we don't ask for them, so that (for example) we won't get SIGILL when run on an older Raspberry Pi (IIRC the RPi5 is the first with crypto extensions). If we passed +crypto to the compiler it would be allowed to use any of those instructions itself, so could give false positives. So not having +crypto here is correct, but having +crypto in the two cases above is incorrect

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks! I'll fix it in #9286.

Hmmm...

we don't emit AES instructions when we don't ask for them

But if we're telling the compiler not to use AES instructions, how would the test ever fail? If we were enabling assembly that calls AES instructions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And below with sha256:

            make -B library/sha256.s CC=clang CFLAGS="--target=aarch64-linux-gnu -march=armv8-a"
            grep -E 'sha256[a-z0-9]+\s+[qv]' library/sha256.s

So we do find sha256 instructions on armv8-a, presumably from our assembly?

Copy link
Contributor

@davidhorstmann-arm davidhorstmann-arm Jun 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just following this thread for interest, but I just noticed and was about to raise: does this grepping work? It seems like we are grepping for the assembler mnemonic in the object file, which doesn't seem like it would work, even if we have intrinsics.

Surely we need to objdump it first?

[I have checked on x86 in case I'm missing something and the aeskeygenassist intrinsic doesn't leave a symbol in the object file, for example]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The -S in CFLAGS means "output assembly, not object code, into the output file" (which we still name with.o, but hey-ho)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah okay, makes sense, thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commit acbdbc7c2b32368b07a5b49a170b202d880afa12
Author: Gilles Peskine <[email protected]>
Date:   2024-06-19 14:16:05 +0200

    Use .s extension for assembly
    
    Having assembly files called *.o was confusing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Design and code approved - may be waiting for CI or backports component-crypto Crypto primitives and low-level interfaces priority-medium Medium priority - this can be reviewed as time permits size-s Estimated task size: small (~2d)
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

5 participants