
Add the functionality of the Iceberg rewrite_manifests procedure (e.g. in OPTIMIZE) #14821

Open
alexjo2144 opened this issue Oct 28, 2022 · 4 comments · May be fixed by #24678
Labels: enhancement New feature or request

@alexjo2144
Member

alexjo2144 commented Oct 28, 2022

Relates to: #9340
The Spark implementation is documented here.

When using the append operation, manifests are merged automatically once their count reaches the threshold defined by `commit.manifest.min-count-to-merge`, which defaults to 100. However, if write latency is important, a user may want to skip the automatic compaction and run it asynchronously, separate from the writers.

This may be done as a separate procedure, or as part of the OPTIMIZE command.
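For comparison, the Spark side exposes this both as an explicit procedure and through table properties. A rough sketch based on the linked Iceberg docs; the catalog and table names are placeholders:

```sql
-- Explicit manifest rewrite via the documented Spark procedure
CALL spark_catalog.system.rewrite_manifests('db.sample');

-- The automatic merge-on-append behavior is driven by table properties
ALTER TABLE db.sample SET TBLPROPERTIES (
    'commit.manifest.min-count-to-merge' = '100',   -- default threshold
    'commit.manifest-merge.enabled'      = 'false'  -- skip automatic compaction entirely
);
```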

@findepi
Member

findepi commented Oct 28, 2022

The optimize should do this; I'm not yet convinced we need a separate procedure.
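For reference, file compaction is already exposed through ALTER TABLE ... EXECUTE, so a manifest rewrite could presumably hang off the same statement. A sketch of the existing Trino syntax only, not of any new behavior:

```sql
-- Current Trino syntax for compacting data files of an Iceberg table;
-- a manifest rewrite could conceivably run as part of this same command
ALTER TABLE iceberg.db.sample EXECUTE optimize(file_size_threshold => '128MB');
```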

@alexjo2144 alexjo2144 changed the title Add an Iceberg rewrite_manifests procedure Add the functionality of the Iceberg rewrite_manifests procedure Nov 3, 2022
@alexjo2144
Member Author

The optimize should do this; I'm not yet convinced we need a separate procedure.

I'm not sure yet either. Updated the description.

@findepi findepi changed the title Add the functionality of the Iceberg rewrite_manifests procedure Add the functionality of the Iceberg rewrite_manifests procedure (e.g. in OPTIMIZE) Nov 4, 2022
@findinpath
Contributor

When a table’s write pattern doesn’t align with the query pattern, metadata can be rewritten to re-group data files into manifests

Taken from Iceberg Spark Procedures Docs

Here is a relatively lengthy article about Iceberg which includes the reasoning behind using rewrite_manifests:

https://blog.developer.adobe.com/taking-query-optimizations-to-the-next-level-with-iceberg-6c968b83cd6f

A key metric to keep track of is the count of manifests per partition.

The health of the dataset would be tracked based on how many partitions cross a pre-configured threshold of acceptable values for these metrics. The trigger for a manifest rewrite can express the severity of the unhealthiness based on these metrics.

We rewrote the manifests by shuffling data file entries across manifests based on a target manifest size. Here is a plot of one such rewrite with a target manifest size of 8 MB. Notice that any day partition spans a maximum of 4 manifests.

  • Before, a partition used to span up to 300 manifests.

I'm not yet convinced we need a separate procedure

In light of the above arguments, I'm inclined to say that this metadata-related functionality would need its own procedure, instead of squeezing it under OPTIMIZE.
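To get a feel for the metric mentioned above, the manifest list of the current snapshot can already be inspected in Trino through the $manifests metadata table. A sketch only; "sample" is a placeholder table name and the column set may vary by version:

```sql
-- Rough health check: how many manifests the current snapshot carries
-- and how much metadata they hold in total
SELECT
    count(*)                    AS manifest_count,
    sum(length)                 AS total_manifest_bytes,
    sum(added_data_files_count) AS added_data_files
FROM "sample$manifests";
```

Per-partition counts would need the partition summaries as well, but even a table-level count shows when manifests are piling up faster than they are merged.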

@ebyhr ebyhr self-assigned this Dec 2, 2024
@mtofano

mtofano commented Dec 26, 2024

Thank you for looking into this! This is something I am interested in as well.

In my particular use case, my write pattern for back-populating a table does not align with the read and update patterns. Rewriting the manifests is something that I think would increase read performance.

I see that this was assigned to @ebyhr and am curious about the current state / roadmap for this feature.

@ebyhr ebyhr linked a pull request Jan 10, 2025 that will close this issue