Using ChaCha20-Poly1305 for encryption is NOT FIPS140-3 compliant and not justified and is considered unprotected plaintext #780
On Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz I get:
On Qualcomm Snapdragon 888 2.84GHz (SM8350, Kryo 680) I get:
So, for blocks of size 64, 20319820 for ChaCha20 vs. 43635690 for AES-256-GCM on a mobile phone shows that AES is actually faster on that device.
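As a quick check on the two figures quoted above, the ratio can be computed directly. This is only arithmetic on the numbers as posted: the unit is not stated in the thread, and `speedup` is a name introduced here for illustration.

```python
# Figures as quoted in the comment above; the unit is not stated there.
chacha20 = 20319820
aes_256_gcm = 43635690

# How many times larger the AES-256-GCM figure is than the ChaCha20 one.
speedup = aes_256_gcm / chacha20
print(f"AES-256-GCM / ChaCha20 = {speedup:.2f}")
```

On these numbers the AES-256-GCM figure is roughly twice the ChaCha20 one.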
Useful overview: https://csrc.nist.gov/Projects/fips-140-3-transition-effort

Probably the easiest approach is to look for the approved algorithms in the OpenSSL build of some large vendor: which leads to and which leads e.g. to or which leads e.g. to

I think a more interesting read would be on which of the block cipher modes of operation should be used for genomic data, given that every personal genome assembly will start with chr1 and continue with telomeric repeats, and on whether a similar, if not exactly the same, organization and ordering of data in SAM/BAM/FASTQ makes some modes more exploitable than others. And yes, there are also the TLS file-size limits and the security implications enforced by the different algorithm modes. I know, it is tough.
crypt4gh is a standard for file storage at rest. If you want to transmit files securely, you should use a protocol designed for that (i.e. TLS). Yes, this does mean the data may be encrypted twice if you transmit it.

ChaCha20-Poly1305 was chosen because it was already used in existing standards, has good library support and is relatively easy to use. AES-GCM mode was considered at the time, but was rejected mainly because of the limit on the amount of data that can be encrypted under a single key (around 64 GB, see also NIST Special Publication 800-38D section 8.3). As genomic data files are often bigger than this, it would have introduced some complication around the need to encrypt large files using more than one key. There have been calls for an improved cipher that avoids this problem, but I would imagine that it will be a while before anything gets approved.

AES-256-CBC does not provide authentication and is vulnerable to padding oracle attacks. While these problems can be worked around, it is easier to use an AEAD construct that does not have them, like AES-GCM or ChaCha20-Poly1305.

Unfortunately, as noted, ChaCha20-Poly1305 does not have FIPS approval, which does mean that users who need FIPS compliance cannot currently use crypt4gh. The specification could be extended to support compliant encryption (it was designed to allow this sort of extension) if there is demand for such an upgrade. This is more likely to happen if anyone who would like to see the change is willing to help implement it.
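To put a number on the AES-GCM size limit mentioned above, here is a minimal sketch of the arithmetic. It assumes the limit meant is the bound NIST SP 800-38D places on the plaintext of a single GCM invocation, 2^39 - 256 bits; the function name `pieces_needed` and the 200 GB example file are hypothetical and only illustrate the scale.

```python
# Assumption: the limit in question is the SP 800-38D bound on the plaintext
# length of one GCM invocation, 2**39 - 256 bits.
GCM_MAX_PLAINTEXT_BITS = 2**39 - 256
GCM_MAX_PLAINTEXT_BYTES = GCM_MAX_PLAINTEXT_BITS // 8  # just under 64 GiB


def pieces_needed(file_size_bytes: int) -> int:
    """Number of separately encrypted pieces a file of this size would need
    if each piece must stay within the single-invocation bound
    (hypothetical helper, for illustration only)."""
    if file_size_bytes < 0:
        raise ValueError("file size must be non-negative")
    # Ceiling division; even an empty file counts as one piece.
    return max(1, -(-file_size_bytes // GCM_MAX_PLAINTEXT_BYTES))


print(GCM_MAX_PLAINTEXT_BYTES)          # bound in bytes
print(pieces_needed(200 * 10**9))       # hypothetical 200 GB file
```

Under that assumption, a file of a few hundred gigabytes would have to be split into several pieces, which is the kind of complication the comment describes.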
Hi,
I wondered for a while why ChaCha20-Poly1305 was selected by GA4GH for data encryption. The closest I could find were notes that a fixed block size makes it possible to jump into the middle of a stream to decrypt the content (after indexing), and that TLS would need to re-encrypt the data. To some extent, there were also concerns about the maximum file (stream) size transferable during a single network connection.
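The point about jumping into the middle of a stream can be sketched as follows. The sizes used here (64 KiB plaintext segments, each stored with a 12-byte nonce and a 16-byte authentication tag) are my reading of the crypt4gh layout and should be checked against the specification; `locate` and the header length of 124 bytes are hypothetical and only for illustration.

```python
# Assumed segment layout; verify these sizes against the crypt4gh specification.
SEGMENT_PLAINTEXT = 65536   # plaintext bytes per segment
NONCE_LEN = 12              # per-segment nonce stored before the ciphertext
TAG_LEN = 16                # Poly1305 tag stored after the ciphertext
SEGMENT_STORED = NONCE_LEN + SEGMENT_PLAINTEXT + TAG_LEN


def locate(plain_offset: int, header_len: int) -> tuple[int, int, int]:
    """Map a plaintext offset to (segment index, file offset of that segment,
    offset inside the decrypted segment). header_len is the size of the file
    header, which varies from file to file (hypothetical helper)."""
    if plain_offset < 0 or header_len < 0:
        raise ValueError("offsets must be non-negative")
    index, within = divmod(plain_offset, SEGMENT_PLAINTEXT)
    file_offset = header_len + index * SEGMENT_STORED
    return index, file_offset, within


# Example with a made-up header length of 124 bytes.
print(locate(65536 + 10, header_len=124))
```

Because every segment has the same stored size, a reader can seek straight to the one segment it needs and decrypt only that, without touching the rest of the file.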
The document http://samtools.github.io/hts-specs/crypt4gh.pdf is still quite sparse on this, and I wonder whether an explicit explanation of why AES-256-GCM was not selected can be found elsewhere. The data will be transferred via servers (Intel/AMD based), so the rumor that ChaCha20 is faster on mobile devices is beside the point, IMO. From reading some docs on the internet, the key to speed is actually not the AES instructions: the mere availability of SSE, SSE2 and AVX registers already provides most of the benefit to ciphers that operate on tuples of matching sizes, see section 4.1 in Gimli 20190927.
Moreover, on my mobile phone is only marginally slower. On the same phone, AES-256-GCM is faster than ChaCha20. What am I missing?
I think none of these really justify use of an untested/non-certified algorithm: https://csrc.nist.gov/pubs/fips/140-3/final
Anyway, the major argument is from an auditing and compliance perspective. The algorithm is not accepted by FIPS-140, not even in FIPS-140-3.
Let me quote from the FIPS-140-3 Cryptographic Module Validation Program CMVP addendum:
Use of Non-validated Cryptographic Modules by Federal Agencies and Departments
Non-validated cryptography is viewed by NIST as providing no protection to the information or data—in effect the data would be considered unprotected plaintext. If the agency specifies that the information or data be cryptographically protected, then FIPS 140-2 or FIPS 140-3 is applicable. In essence, if cryptography is required, then it must be validated. Should the cryptographic module be revoked, use of that module is no longer permitted.
If LocalEGA and FEGA and other tools including samtools, htsget are to be deployed world-wide, then commonly accepted and certified crypto must be enabled by default. Are we allowed at all to store data using an uncertified algorithm, which is officially regarded as providing no protection? I propose switching to AES.

Hundreds of software tools, Linux distros, etc., undergo validation. Obviously, tools exposing uncertified ChaCha20 break eventual certification; see some examples and search for ChaCha20:
:https://csrc.nist.gov/CSRC/media/projects/cryptographic-module-validation-program/documents/security-policies/140sp4046.pdf
https://csrc.nist.gov/CSRC/media/projects/cryptographic-module-validation-program/documents/security-policies/140sp4284.pdf
https://csrc.nist.gov/CSRC/media/projects/cryptographic-module-validation-program/documents/security-policies/140sp3820.pdf
More can be found at https://www.nist.gov/search?s=ChaCha20&from=2&index=all-meta-engine&order=r&rpp=10
Same applies to BLAKE2, Salsa20 and successors. BLAKE2 was also selected by GA4GH. I am not certain on compliance of X25519, based on Curve25519.
I am not a crypto expert, but it seems that in multi-user settings AES-256-CBC would be advantageous over -GCM. And the submission hosts will encrypt data concurrently, right?
For ChaCha20 security comparison against AES-GCM, see
https://dl.acm.org/doi/abs/10.1145/3460120.3484814 , and from https://dl.acm.org/action/downloadSupplement?doi=10.1145%2F3460120.3484814&file=CCS21-fp593.mp4 I quote two slides: