Creating Encrypted File
Parquet encryption secures the data stored in Parquet files by protecting its confidentiality and integrity. This is particularly useful when sensitive or critical data is shared, stored, or transferred, because only parties that hold the right keys can read the file's content.
- Column-level Encryption: Parquet allows encrypting specific columns instead of the entire file. This means you can protect sensitive data (e.g., a column with credit card numbers) while leaving less sensitive data unencrypted for easier access.
- File-level Encryption: Encrypts the entire file, including metadata and all columns, for complete protection.
- Metadata Protection: Parquet encryption can also secure file metadata, ensuring even schema information isn't exposed unless authorized.
Example:
{
  "options": {
    "encryption": {
      "footerKey": "1234567890123456",
      "aadPrefix": "key",
      "storeAadPrefixInFile": true,
      "footerKeyMetadata": "something",
      "encryptedColumns": [
        {
          "path": "id"
        },
        {
          "path": "name"
        }
      ]
    }
  }
}
footerKey: A byte-encoded key used to encrypt the file's footer, which protects the confidentiality and integrity of the file metadata. AES keys are 16, 24, or 32 bytes long; the example above uses a 16-character key.
aadPrefix: A byte-encoded string that is prepended to the Additional Authenticated Data (AAD) used in encryption. It binds the encrypted content to a file identity, so a reader can detect when a file has been swapped for another one encrypted with the same key.
storeAadPrefixInFile: A boolean setting that determines whether the AAD prefix is stored directly inside the file. If false, the AAD prefix must be managed externally and supplied by the reader when the file is decrypted.
footerKeyMetadata: Metadata stored in the file for the footer key, such as a key identifier, that readers can use to find the right key. It carries information about the key, not the key itself.
encryptedColumns: A list of column paths to encrypt using the specified encryption properties, so that sensitive data in those columns remains protected.
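The options described above can also be assembled programmatically. The following is a minimal Python sketch, not part of the tool itself: the helper name `build_encryption_options` is hypothetical, and the key-length check assumes that `footerKey` is interpreted as UTF-8 bytes and must match an AES key size (16, 24, or 32 bytes).

```python
import json

# AES key sizes in bytes; assumes the tool uses the footerKey bytes as an AES key.
VALID_KEY_LENGTHS = (16, 24, 32)


def build_encryption_options(footer_key, encrypted_columns, aad_prefix=None,
                             store_aad_prefix_in_file=True,
                             footer_key_metadata=None):
    """Hypothetical helper that assembles the "encryption" options block."""
    # Assumption: the key string is converted to bytes as UTF-8.
    key_length = len(footer_key.encode("utf-8"))
    if key_length not in VALID_KEY_LENGTHS:
        raise ValueError(
            f"footerKey must be 16, 24, or 32 bytes long, got {key_length}"
        )

    encryption = {"footerKey": footer_key}
    if aad_prefix is not None:
        encryption["aadPrefix"] = aad_prefix
        encryption["storeAadPrefixInFile"] = store_aad_prefix_in_file
    if footer_key_metadata is not None:
        encryption["footerKeyMetadata"] = footer_key_metadata
    # Each encrypted column is written as an object with a "path" entry.
    encryption["encryptedColumns"] = [{"path": path} for path in encrypted_columns]
    return {"options": {"encryption": encryption}}


# Rebuild the example configuration shown earlier.
config = build_encryption_options(
    footer_key="1234567890123456",
    encrypted_columns=["id", "name"],
    aad_prefix="key",
    footer_key_metadata="something",
)
print(json.dumps(config, indent=2))
```

Validating the key length before writing the configuration should surface a malformed key early, rather than when the file is being generated.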
Note: Here length is the specified length of the FIXED_LEN_BYTE_ARRAY, which is 16 for the given uuid values.
Developed and maintained by the Altinity team.