Home
Selfeer edited this page Dec 18, 2024 · 43 revisions
Versions | Releases |
---|---|
License | Apache-2.0 |
Parquetify is a lightweight tool that uses the parquet-java library to generate Apache Parquet files from a file definition provided in JSON.
Feature | Description |
---|---|
Physical Data Types: | All physical data types: INT32, INT64, BOOLEAN, FLOAT, DOUBLE, BINARY, FIXED_LEN_BYTE_ARRAY. |
Logical Data Types: | Most logical types: UTF8, DECIMAL, DATE, TIME_MILLIS, TIME_MICROS, TIMESTAMP_MILLIS, TIMESTAMP_MICROS, ENUM, NONE, MAP, LIST, STRING, MAP_KEY_VALUE, TIME, INTEGER, JSON, BSON, UUID, INTERVAL, UINT_8, UINT_16, UINT_32, UINT_64, INT_8, INT_16, INT_32, INT_64, FLOAT16. |
Precision & Scale: | Precision and scale for DECIMAL types. |
Compression: | NONE, SNAPPY, GZIP, LZO, BROTLI, LZ4, ZSTD. |
Encodings: | Automatically set by the writer for a given column. |
Bloom Filter: | Apply a bloom filter to specific columns or all columns (including those within groups). |
Writer Version: | Specify writer version (1.0, 2.0). |
Customizable Sizes: | Row group and page sizes. |
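
As a rough illustration of how the options in the table above fit together, the sketch below builds a file definition as a Python dictionary, checks it against the values listed in the table, and serializes it to JSON. The key names (`fileName`, `options`, `compression`, `writerVersion`, `schema`, `physicalType`, `logicalType`, `precision`, `scale`) and the `check_definition` helper are assumptions made for this example, not Parquetify's documented format; the pages listed below describe the keys the tool actually expects.

```python
import json

# Allowed values copied from the feature table above.
PHYSICAL_TYPES = {
    "INT32", "INT64", "BOOLEAN", "FLOAT", "DOUBLE",
    "BINARY", "FIXED_LEN_BYTE_ARRAY",
}
COMPRESSION_CODECS = {"NONE", "SNAPPY", "GZIP", "LZO", "BROTLI", "LZ4", "ZSTD"}
WRITER_VERSIONS = {"1.0", "2.0"}


def check_definition(definition):
    """Return a list of problems found in a hypothetical file definition."""
    problems = []
    options = definition.get("options", {})

    codec = str(options.get("compression", "NONE")).upper()
    if codec not in COMPRESSION_CODECS:
        problems.append(f"unsupported compression: {codec}")

    version = str(options.get("writerVersion", "1.0"))
    if version not in WRITER_VERSIONS:
        problems.append(f"unsupported writer version: {version}")

    for column in definition.get("schema", []):
        name = column.get("name", "<unnamed>")
        if column.get("physicalType") not in PHYSICAL_TYPES:
            problems.append(f"{name}: unknown physical type {column.get('physicalType')}")
        # DECIMAL columns carry a precision and a scale (see the table above).
        if column.get("logicalType") == "DECIMAL" and not {"precision", "scale"} <= column.keys():
            problems.append(f"{name}: DECIMAL needs precision and scale")

    return problems


# Hypothetical definition: every key name here is illustrative only.
definition = {
    "fileName": "example.parquet",
    "options": {"compression": "SNAPPY", "writerVersion": "2.0"},
    "schema": [
        {"name": "id", "physicalType": "INT64"},
        {"name": "price", "physicalType": "FIXED_LEN_BYTE_ARRAY",
         "logicalType": "DECIMAL", "precision": 9, "scale": 2},
    ],
}

problems = check_definition(definition)
if problems:
    raise ValueError("; ".join(problems))
print(json.dumps(definition, indent=2))
```

Checking a definition before handing it to the tool is only a convenience; the checks mirror the table and do not replace whatever validation Parquetify itself performs.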
- Parquet File Name
- Options of the File
- File Compression
- Writer Version
- Row and Page Size
- Bloom Filter
- Configure with Hadoop
- Integer Columns
- Unsigned Integer Columns
- UTF8 Columns
- Decimal Columns
- Date Columns
- Time and Timestamp Columns
- JSON and BSON Columns
- String Columns
- Enum Columns
- UUID Columns
- Float16 Column
- Array Columns
- Nested Array Columns
- Tuple Columns
- Nested Tuple Columns
- Schema Types
- Encodings
- File Encryption
- Extra Metadata Entries
Developed and maintained by the Altinity team.