
Fix writing of compressed ORC files with large stripe footers #17700

Open: wants to merge 11 commits into base branch branch-25.02


vuule (Contributor) commented Jan 8, 2025

Description

In ORC, stripe footers are compressed the same way as the data. This means that, in a compressed file, a footer larger than the maximum compression block size must be written as multiple blocks. This applies even if the footer is stored uncompressed (in that case, the "original" flag in the block header is set).
Currently, the ORC writer does not account for a footer that is larger than the maximum block size and writes the entire footer as a single block, which is not valid.
The issue only affects compressed files. In uncompressed files, footers are not split into blocks, so this limit does not apply.

This PR changes the way stripe footers are written to handle this case. The output is unchanged for files whose stripe footers fit in a single block.
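For reference, the ORC specification frames each chunk of a compressed stream with a 3-byte header that stores (chunk_length * 2 + is_original) as a little-endian value, so a single chunk can hold at most 2^23 - 1 bytes. The following is a minimal host-side sketch, assuming a footer that is already serialized and stored uncompressed, of how it could be split into chunks no larger than the block size, each flagged as original. The names encode_chunk_header and frame_uncompressed_footer are hypothetical and are not the libcudf implementation in cpp/src/io/orc/writer_impl.cu.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Largest chunk length representable in a 3-byte header (23 bits; the low bit is the flag).
constexpr std::size_t max_chunk_length = (std::size_t{1} << 23) - 1;

// Hypothetical helper: encodes an ORC compression chunk header, i.e.
// (chunk_length * 2 + is_original) stored as a 24-bit little-endian value.
std::vector<std::uint8_t> encode_chunk_header(std::size_t chunk_length, bool is_original)
{
  assert(chunk_length <= max_chunk_length);
  auto const value = static_cast<std::uint32_t>(chunk_length * 2 + (is_original ? 1 : 0));
  return {static_cast<std::uint8_t>(value & 0xFF),
          static_cast<std::uint8_t>((value >> 8) & 0xFF),
          static_cast<std::uint8_t>((value >> 16) & 0xFF)};
}

// Hypothetical helper: frames a serialized stripe footer that is stored uncompressed
// inside a compressed file. The footer is split into chunks of at most max_block_size
// bytes, and each chunk gets its own header with the "original" flag set.
std::vector<std::uint8_t> frame_uncompressed_footer(std::vector<std::uint8_t> const& footer,
                                                    std::size_t max_block_size)
{
  assert(max_block_size > 0 && max_block_size <= max_chunk_length);
  std::vector<std::uint8_t> out;
  for (std::size_t offset = 0; offset < footer.size(); offset += max_block_size) {
    std::size_t const len = std::min(max_block_size, footer.size() - offset);
    auto const header     = encode_chunk_header(len, true);
    out.insert(out.end(), header.begin(), header.end());
    auto const first = footer.begin() + static_cast<std::ptrdiff_t>(offset);
    out.insert(out.end(), first, first + static_cast<std::ptrdiff_t>(len));
  }
  return out;
}
```

With a 4-byte block size, a 10-byte footer becomes three chunks of 4, 4 and 2 bytes with headers 0x09 0x00 0x00, 0x09 0x00 0x00 and 0x05 0x00 0x00, for 19 bytes in total. Writing the same footer as one 10-byte chunk would exceed the block size, which is the invalid output described above.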

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.


copy-pr-bot bot commented Jan 8, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

github-actions bot added the libcudf label Jan 8, 2025
vuule added the bug and non-breaking labels Jan 8, 2025
vuule (Contributor, Author) commented Jan 9, 2025

/ok to test

vuule marked this pull request as ready for review January 9, 2025 17:42
vuule requested a review from a team as a code owner January 9, 2025 17:42
vuule requested review from bdice and nvdbaranec January 9, 2025 17:42
Review thread on cpp/src/io/orc/writer_impl.cu (outdated, resolved)
Review thread on cpp/src/io/orc/writer_impl.cu (resolved)
vuule requested review from ttnghia and nvdbaranec January 10, 2025 21:30
vuule (Contributor, Author) commented Jan 15, 2025

Looks like recent changes broke this. Will check tomorrow.

vuule added the 5 - Ready to Merge label Jan 15, 2025
Labels: 5 - Ready to Merge, bug, libcudf, non-breaking
Projects
Status: Burndown

3 participants