[feat]: CGO packed writer api #160

Merged: 9 commits, Jan 9, 2025

shaoting-huang (Collaborator) commented on Jan 3, 2025

related: #158

  1. Move the Parquet writer properties into the storage config.
  2. Make the packed writer's close() API return the column offset mapping.
  3. Add CGO and Go APIs for the packed writer.

sre-ci-robot requested review from sunby and tedxu on January 3, 2025 09:25

sre-ci-robot (Collaborator) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: shaoting-huang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

}
}

void DeletePackedWriter(CPackedWriter c_packed_writer) {
Collaborator

Any particular reason to separate the Close and Delete... functions?

Collaborator Author

Close still writes the buffered data to files; Delete just frees the memory.
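
The split described in this reply can be sketched as follows. This is a minimal illustration, not the PR's code: `ToyWriter`, `CToyWriter`, `NewToyWriter`, `ToyWrite`, `ToyClose` and `ToyDelete` are hypothetical names, and a `std::vector` stands in for the output file. The point is only that the flush step can fail and reports a status, while the free step has nothing to report.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the packed writer; NOT the milvus-storage class.
class ToyWriter {
 public:
  explicit ToyWriter(std::vector<std::string>* sink) : sink_(sink) {}

  // Buffers a row in memory; fails once the writer has been closed.
  int Write(std::string row) {
    if (closed_) return -1;
    buffer_.push_back(std::move(row));
    return 0;
  }

  // Flushes buffered rows to the sink (standing in for a file) and
  // reports a status, because a flush can fail.
  int Close() {
    if (closed_) return -1;
    for (auto& row : buffer_) sink_->push_back(std::move(row));
    buffer_.clear();
    closed_ = true;
    return 0;
  }

 private:
  std::vector<std::string>* sink_;   // not owned
  std::vector<std::string> buffer_;  // rows not yet flushed
  bool closed_ = false;
};

// Opaque handle, as a C/CGO boundary would see it (assumed shape).
using CToyWriter = void*;

CToyWriter NewToyWriter(std::vector<std::string>* sink) { return new ToyWriter(sink); }

int ToyWrite(CToyWriter h, const char* row) { return static_cast<ToyWriter*>(h)->Write(row); }

// Close does I/O and returns a status code.
int ToyClose(CToyWriter h) { return static_cast<ToyWriter*>(h)->Close(); }

// Delete only frees memory, so it returns nothing.
void ToyDelete(CToyWriter h) { delete static_cast<ToyWriter*>(h); }
```

With this shape the caller can inspect the result of `ToyClose` before calling `ToyDelete`, and can still free the handle when the close fails.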

auto writer =
std::make_unique<milvus_storage::PackedRecordBatchWriter>(buffer_size, trueSchema, trueFs, truePath, conf);

*packed_writer = writer.release();
Collaborator

Why release here?

Collaborator Author

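To initialize the packed_writer pointer: release() hands ownership of the writer to the caller, so it is not destroyed when the unique_ptr goes out of scope.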
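
A minimal sketch of this ownership hand-off through an out-parameter, assuming C++14 for `std::make_unique`. `Widget`, `CWidget`, `NewWidget` and `DeleteWidget` are hypothetical names used only for illustration; they are not part of the PR.

```cpp
#include <memory>

// Hypothetical object exposed across a C boundary.
struct Widget {
  explicit Widget(int v) : value(v) {}
  int value;
};

using CWidget = void*;  // opaque handle (assumed shape)

// Builds the object under unique_ptr so that any early return before the
// hand-off cleans up automatically, then releases it into the out-parameter.
int NewWidget(int value, CWidget* out) {
  if (out == nullptr) return -1;
  auto widget = std::make_unique<Widget>(value);
  // release() gives up ownership without deleting; the caller now owns the
  // raw pointer and must pass it to DeleteWidget later.
  *out = widget.release();
  return 0;
}

void DeleteWidget(CWidget handle) { delete static_cast<Widget*>(handle); }
```

Without `release()`, the `unique_ptr` destructor would delete the object at the end of `NewWidget` and the handle would dangle.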


int WriteRecordBatch(CPackedWriter c_packed_writer, struct ArrowArray* array, struct ArrowSchema* schema);

int Close(CPackedWriter c_packed_writer);
Collaborator

Close needs to return additional information, including but not limited to: i) the files written, ii) the schema mapping for those files, and iii) statistics and tracing data.
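
One possible shape for such a result is sketched below, purely as an assumption. `ColumnOffset`, `CloseSummary` and `BuildCloseSummary` are hypothetical names, and the row count is a placeholder for whatever statistics the real API would expose. The sketch maps each original column index to the file it landed in and its position inside that file.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Where one original column ended up (hypothetical layout).
struct ColumnOffset {
  int file_index;    // index into files_written, -1 if the column was not written
  int column_index;  // position of the column inside that file
};

// Hypothetical summary a Close call could hand back.
struct CloseSummary {
  std::vector<std::string> files_written;
  std::vector<ColumnOffset> column_offsets;  // one entry per original column
  std::size_t rows_written;                  // placeholder statistic
};

// groups[f] lists the original column indices stored in file f, in order.
CloseSummary BuildCloseSummary(const std::vector<std::string>& paths,
                               const std::vector<std::vector<int>>& groups,
                               std::size_t num_columns, std::size_t rows) {
  CloseSummary summary;
  summary.files_written = paths;
  summary.column_offsets.assign(num_columns, ColumnOffset{-1, -1});
  summary.rows_written = rows;
  for (std::size_t f = 0; f < groups.size(); ++f) {
    for (std::size_t c = 0; c < groups[f].size(); ++c) {
      const int original = groups[f][c];
      // Ignore indices outside the declared schema width.
      if (original >= 0 && static_cast<std::size_t>(original) < num_columns) {
        summary.column_offsets[original] =
            ColumnOffset{static_cast<int>(f), static_cast<int>(c)};
      }
    }
  }
  return summary;
}
```

For example, with groups `{0, 2}` and `{1}`, original column 2 maps to file 0, position 1.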

Signed-off-by: shaoting-huang <[email protected]>
arrow::fs::FileSystem& fs,
const std::string& file_path,
const StorageConfig& storage_config,
const parquet::WriterProperties& props);
Collaborator

Why remove the Parquet properties support?

Comment on lines 46 to 47
const int pk_index,
const int ts_index,
Collaborator

How do we use these two arguments?

Signed-off-by: shaoting-huang <[email protected]>
tedxu (Collaborator) commented on Jan 9, 2025

/lgtm

sre-ci-robot merged commit f51fd09 into milvus-io:main on Jan 9, 2025
3 checks passed
3 participants