
Creating Encrypted File

Selfeer edited this page Dec 13, 2024 · 1 revision

Encryption

Parquet encryption secures the data stored in Parquet files by protecting its confidentiality and integrity. This is particularly useful when sensitive or critical data is shared, stored, or transferred. It ensures that only parties holding the correct keys can read the file's content.

Key Features of Parquet File Encryption:

  1. Column-level Encryption: Parquet allows encrypting specific columns instead of the entire file. This means you can protect sensitive data (e.g., a column with credit card numbers) while leaving less sensitive data unencrypted for easier access.

  2. File-level Encryption: Encrypts the entire file, including metadata and all columns, for complete protection.

  3. Metadata Protection: Parquet encryption can also secure file metadata, ensuring even schema information isn't exposed unless authorized.

Example:

Full example here

{
  "options": {
    "encryption": {
      "footerKey": "1234567890123456",
      "aadPrefix": "key",
      "storeAadPrefixInFile": true,
      "footerKeyMetadata": "something",
      "encryptedColumns": [
        {
          "path": "id"
        },
        {
          "path": "name"
        }
      ]
    }
  }
}

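footerKey: A byte-encoded key used to encrypt the file's footer, protecting the confidentiality and integrity of the file metadata. Parquet encryption is AES-based, so the key is expected to be 16, 24, or 32 bytes long; the example above uses a 16-byte key.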

aadPrefix: A byte-encoded string that is prepended to the Additional Authenticated Data (AAD) used in encryption. It binds the encrypted data to a specific file or dataset identity, so that an encrypted file swapped in from elsewhere fails verification.

storeAadPrefixInFile: A boolean setting that determines whether the AAD prefix is stored directly inside the file. If false, the AAD prefix must be managed externally and supplied by the reader when the file is decrypted.
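To illustrate how the AAD prefix takes part in encryption, the sketch below assembles the AAD for a single encrypted module. It is a simplified reading of the Parquet modular encryption layout (prefix, then a per-file identifier, a one-byte module type, and 2-byte little-endian ordinals), not code from this project. The function name `module_aad`, the constants, and the 8-byte `file_unique` value are assumptions made for illustration.

```python
import struct
from typing import Optional

# Module type codes assumed from the Parquet encryption specification.
FOOTER = 0
DATA_PAGE = 2


def module_aad(aad_prefix: bytes, file_unique: bytes, module_type: int,
               row_group: Optional[int] = None,
               column: Optional[int] = None,
               page: Optional[int] = None) -> bytes:
    """Build the AAD for one module: aad_prefix + AAD suffix (hypothetical helper)."""
    # The suffix starts with the per-file identifier and the module type byte.
    suffix = file_unique + bytes([module_type])
    # Ordinals, when present, are appended as 2-byte little-endian shorts.
    for ordinal in (row_group, column, page):
        if ordinal is not None:
            suffix += struct.pack("<h", ordinal)
    return aad_prefix + suffix


prefix = b"key"          # the aadPrefix value from the example configuration
file_unique = bytes(8)   # placeholder; a writer would generate this randomly

footer_aad = module_aad(prefix, file_unique, FOOTER)
page_aad = module_aad(prefix, file_unique, DATA_PAGE, row_group=1, column=2, page=3)
```

Because the prefix is part of every module's AAD, a reader that supplies a different prefix cannot authenticate any module of the file.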

footerKeyMetadata: Metadata about the footer key that is stored in the file, such as a key identifier. A reader can use it to determine which key is needed to decrypt the footer.

encryptedColumns: A list of column paths that should be encrypted using the specified encryption properties, ensuring that sensitive data in those columns remains secure.
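The options described above can also be generated programmatically. The following is a minimal sketch that builds the same JSON structure and rejects a footer key whose length is not a valid AES key size. The helper name `build_encryption_options` and its validation rule are assumptions for illustration; the tool that consumes this JSON may apply different checks.

```python
import json

# AES-128, AES-192, and AES-256 key sizes in bytes.
VALID_AES_KEY_LENGTHS = (16, 24, 32)


def build_encryption_options(footer_key, encrypted_columns, aad_prefix=None,
                             store_aad_prefix_in_file=False,
                             footer_key_metadata=None):
    """Return the "options" object for an encrypted file (hypothetical helper)."""
    key_length = len(footer_key.encode("utf-8"))
    if key_length not in VALID_AES_KEY_LENGTHS:
        raise ValueError(
            f"footerKey is {key_length} bytes; expected one of {VALID_AES_KEY_LENGTHS}"
        )

    encryption = {"footerKey": footer_key}
    if aad_prefix is not None:
        encryption["aadPrefix"] = aad_prefix
        encryption["storeAadPrefixInFile"] = store_aad_prefix_in_file
    if footer_key_metadata is not None:
        encryption["footerKeyMetadata"] = footer_key_metadata
    # Each encrypted column is referenced by its path, as in the example above.
    encryption["encryptedColumns"] = [{"path": path} for path in encrypted_columns]
    return {"options": {"encryption": encryption}}


config = build_encryption_options(
    footer_key="1234567890123456",
    encrypted_columns=["id", "name"],
    aad_prefix="key",
    store_aad_prefix_in_file=True,
    footer_key_metadata="something",
)
print(json.dumps(config, indent=2))
```

Validating the key length up front turns a malformed key into an immediate, readable error instead of a failure later, when the file is written.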

Note

Here length is the specified length of the FIXED_LEN_BYTE_ARRAY which is 16 for the given uuid values.
