Schema Types

Selfeer edited this page Nov 15, 2024 · 1 revision

Schema Types in Parquet-Java

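When you create a Parquet file with parquet-java, every field in the schema is declared with a repetition (required, optional, or repeated) and is either a primitive field or a group of nested fields. Combining the two gives six schema types: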

  • requiredGroup
  • repeatedGroup
  • required
  • repeated
  • optionalGroup
  • optional

Below is a brief explanation of each schema type.

requiredGroup

A requiredGroup is a group of fields (a nested schema) that must be present in every record and cannot be null. The fields within this group can have their own repetition levels (required, optional, or repeated).

repeatedGroup

A repeatedGroup represents a group that can occur zero or more times, effectively modeling a list or array of nested records.

required

A required field must be present in every record and cannot be null. This ensures that the field always contains a value.

repeated

A repeated field can have zero or more values, modeling a list or array of values of the same type.
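
The repetition levels differ only in how many values a record may hold for a field: exactly one for required, zero or one for optional, and any number for repeated. The sketch below models that rule in plain Java. It does not use the parquet-java library; the `Repetition` enum and its `permits` method are hypothetical names chosen for illustration.

```java
/** Hypothetical stand-in for the three repetition levels (not parquet-java API). */
enum Repetition {
    REQUIRED(1, 1),                  // exactly one value per record
    OPTIONAL(0, 1),                  // absent (null) or one value
    REPEATED(0, Integer.MAX_VALUE);  // zero or more values

    private final int min;
    private final int max;

    Repetition(int min, int max) {
        this.min = min;
        this.max = max;
    }

    /** Returns true if a record may hold this many values for the field. */
    boolean permits(int valueCount) {
        return valueCount >= min && valueCount <= max;
    }
}

final class RepetitionSketch {
    public static void main(String[] args) {
        System.out.println(Repetition.REQUIRED.permits(0)); // false: a required field cannot be missing
        System.out.println(Repetition.OPTIONAL.permits(0)); // true: a missing optional field is null
        System.out.println(Repetition.OPTIONAL.permits(2)); // false: at most one value
        System.out.println(Repetition.REPEATED.permits(3)); // true: behaves like a list
    }
}
```

The same counting rule applies to groups: a repeatedGroup may hold any number of nested records, while a requiredGroup holds exactly one.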

optionalGroup

An optionalGroup is a group of fields that may or may not be present in a record. When it is absent, the entire group is null.

optional

An optional field may or may not be present in a record. If the field is not present, it is considered null.
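
To see all six schema types side by side, it can help to write one schema out in Parquet's text syntax, where each field reads as repetition, type, name. The sketch below builds such a string with plain Java and no library dependency. The `SchemaTextSketch` class, its helper methods, and the `Order` schema with its field names are all hypothetical. A string in this form is the kind of input parquet-java's `MessageTypeParser.parseMessageType` is meant to accept, but that call is not made here.

```java
/** Hypothetical helper that renders a schema in Parquet's text syntax. */
final class SchemaTextSketch {

    /** Renders a primitive field, e.g. "required int64 id;". */
    static String field(String repetition, String type, String name) {
        return repetition + " " + type + " " + name + ";";
    }

    /** Renders a nested group with the given repetition. */
    static String group(String repetition, String name, String... children) {
        return block(repetition + " group " + name, children);
    }

    /** Renders the top-level message that wraps all fields. */
    static String message(String name, String... children) {
        return block("message " + name, children);
    }

    /** Wraps children in braces, indenting every child line by two spaces. */
    private static String block(String header, String... children) {
        StringBuilder sb = new StringBuilder(header).append(" {\n");
        for (String child : children) {
            sb.append("  ").append(child.replace("\n", "\n  ")).append("\n");
        }
        return sb.append("}").toString();
    }

    /** One example schema that uses each of the six schema types once. */
    static String orderSchema() {
        return message("Order",
            field("required", "int64", "id"),        // required
            field("optional", "binary", "note"),     // optional
            field("repeated", "int32", "tags"),      // repeated
            group("required", "customer",            // requiredGroup
                field("required", "binary", "name")),
            group("optional", "shipping",            // optionalGroup
                field("required", "binary", "city")),
            group("repeated", "items",               // repeatedGroup
                field("required", "binary", "sku"),
                field("required", "int32", "quantity")));
    }

    public static void main(String[] args) {
        System.out.println(orderSchema());
    }
}
```

Note that the fields inside each group carry their own repetition, independent of the repetition of the group that contains them.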