Port TimeColumn to arrow-rs #8638

Merged: 9 commits into main from emilk/arrow-time-column, Jan 10, 2025

Conversation

emilk
Member

@emilk emilk commented Jan 10, 2025

@emilk emilk added the "🏹 arrow" and "exclude from changelog" labels Jan 10, 2025

github-actions bot commented Jan 10, 2025

Web viewer built successfully. If applicable, you should also test it:

  • I have tested the web viewer
Result Commit Link Manifest
d4105c9 https://rerun.io/viewer/pr/8638 +nightly +main

Note: This comment is updated whenever you push a commit.

Comment on lines -1612 to -1621
if *times.data_type() != timeline.datatype() {
    return Err(ChunkError::Malformed {
        reason: format!(
            "Time data for timeline {} has the wrong datatype: expected {:?} but got {:?} instead",
            timeline.name(),
            timeline.datatype(),
            *times.data_type(),
        ),
    });
}
Member Author

No longer needed

Comment on lines +291 to +297
let times = self
    .times_raw()
    .iter()
    .chain(rhs.times_raw())
    .copied()
    .collect_vec();
let times = ArrowScalarBuffer::from(times);
Member Author

Using Rust to do the concat instead of using a computation kernel.

Member

Any particular reason? Maybe worth a comment and/or TODO?

Member Author

To use the compute kernel I would have to convert the buffers into dynamic arrays, and then dynamic-cast them back again. A lot of work for little gain.

Member

Ahh, I forgot the compute functions like concat don't support generics. Annoying.
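
For readers following the thread, the trade-off can be sketched with the standard library alone. This is not the arrow-rs API: `concat_dyn` is a hypothetical stand-in for a non-generic compute kernel that works on type-erased inputs, and `Vec<i64>` stands in for the time buffers. The typed path mirrors the iterator chain in the diff; the erased path shows the extra erase-then-downcast round trip the author wanted to avoid.

```rust
use std::any::Any;

/// Typed path: chain both slices and collect, as the diff does with `times_raw()`.
fn concat_typed(lhs: &[i64], rhs: &[i64]) -> Vec<i64> {
    lhs.iter().chain(rhs).copied().collect()
}

/// Hypothetical type-erased "kernel": it only learns the element type at runtime.
/// Returns `None` if any input is not a `Vec<i64>`.
fn concat_dyn(parts: &[&dyn Any]) -> Option<Box<dyn Any>> {
    let mut out: Vec<i64> = Vec::new();
    for part in parts {
        out.extend_from_slice(part.downcast_ref::<Vec<i64>>()?);
    }
    Some(Box::new(out))
}

/// Erase the inputs, call the kernel, then downcast the result back to a typed buffer.
fn concat_via_dyn(lhs: Vec<i64>, rhs: Vec<i64>) -> Option<Vec<i64>> {
    let erased = concat_dyn(&[&lhs as &dyn Any, &rhs as &dyn Any])?;
    erased.downcast::<Vec<i64>>().ok().map(|boxed| *boxed)
}

fn main() {
    let a = vec![1_i64, 2, 3];
    let b = vec![4_i64, 5];
    let typed = concat_typed(&a, &b);
    let erased = concat_via_dyn(a, b).expect("both inputs are Vec<i64>");
    // Both paths produce the same buffer; the typed one needs no runtime casts.
    assert_eq!(typed, erased);
    println!("{typed:?}");
}
```

Both paths yield the same values; the typed version simply skips the two fallible casts, which matches the "a lot of work for little gain" reasoning above.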

---
[
"1048576 scalars",
"37.1 MiB in total",
"37.0 MiB in total",
Member Author

🥳

impl<T: ArrowNativeType> SizeBytes for ScalarBuffer<T> {
    #[inline]
    fn heap_size_bytes(&self) -> u64 {
        self.inner().capacity() as _
Member Author

@emilk emilk Jan 10, 2025

In most places we take .len() instead of .capacity(), but this isn't codified. What is the preference here @teh-cmc? I would think .capacity() is the more interesting thing, as that will catch failure to call .shrink_to_fit(), for instance 🤔
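
A minimal sketch of the difference, using a plain `Vec<i64>` rather than the arrow `ScalarBuffer` from the diff. The helper names `len_bytes` and `capacity_bytes` are made up for illustration; they are not part of `SizeBytes`. The point is that capacity-based accounting makes over-allocation visible, while length-based accounting hides it.

```rust
use std::mem::size_of;

/// Bytes occupied by the elements actually stored.
fn len_bytes<T>(values: &[T]) -> u64 {
    (values.len() * size_of::<T>()) as u64
}

/// Bytes reserved by the allocation, including unused slack.
fn capacity_bytes<T>(values: &Vec<T>) -> u64 {
    (values.capacity() * size_of::<T>()) as u64
}

fn main() {
    // Reserve room for 1024 timestamps but only store three.
    let mut times: Vec<i64> = Vec::with_capacity(1024);
    times.extend([10, 20, 30]);

    println!("len-based:      {} bytes", len_bytes(&times));
    println!("capacity-based: {} bytes", capacity_bytes(&times));

    // After shrinking, the reserved size drops towards the used size
    // (the allocator may still keep a little slack).
    times.shrink_to_fit();
    println!("after shrink:   {} bytes", capacity_bytes(&times));
}
```

With length-based accounting the forgotten `shrink_to_fit()` would go unnoticed, since the first number is the same before and after shrinking.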

@emilk
Member Author

emilk commented Jan 10, 2025

@rerun-bot full-check

@emilk emilk marked this pull request as ready for review January 10, 2025 13:51
@emilk
Member Author

emilk commented Jan 10, 2025

@rerun-bot full-check

Started a full build: https://github.com/rerun-io/rerun/actions/runs/12714439344

@emilk emilk requested a review from teh-cmc January 10, 2025 18:28
#[derive(Debug, thiserror::Error)]
pub enum TimeColumnError {
    #[error("Time columns had nulls, but should be dense")]
    Nulls,
Member

Nit: ContainsNulls would be slightly more descriptive
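
To illustrate how the suggested name might read at a call site, here is a std-only sketch. It assumes the rename to `ContainsNulls` (the thread does not show whether it was adopted), replaces the `thiserror` derive with a hand-written `Display` impl, and uses a hypothetical `densify` helper over `Option<i64>` values instead of an arrow array.

```rust
use std::fmt;

#[derive(Debug, PartialEq)]
enum TimeColumnError {
    /// Assumed rename of the `Nulls` variant, per the review nit.
    ContainsNulls,
}

impl fmt::Display for TimeColumnError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::ContainsNulls => f.write_str("Time columns had nulls, but should be dense"),
        }
    }
}

impl std::error::Error for TimeColumnError {}

/// Hypothetical helper: accept only dense time data, rejecting any null entry.
fn densify(times: &[Option<i64>]) -> Result<Vec<i64>, TimeColumnError> {
    times
        .iter()
        .copied()
        .collect::<Option<Vec<i64>>>()
        .ok_or(TimeColumnError::ContainsNulls)
}

fn main() {
    println!("{:?}", densify(&[Some(1), Some(2)]));
    match densify(&[Some(1), None]) {
        Ok(times) => println!("dense: {times:?}"),
        Err(err) => println!("error: {err}"),
    }
}
```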

@emilk emilk merged commit cdf0181 into main Jan 10, 2025
31 checks passed
@emilk emilk deleted the emilk/arrow-time-column branch January 10, 2025 19:31
Labels
🏹 arrow, exclude from changelog
2 participants