Skip to content

Commit

Permalink
feat: run VortexFileArrayStream on dedicated IoDispatcher (#1232)
Browse files Browse the repository at this point in the history
This PR separates IO for the `LayoutBatchStream` from the calling
thread's runtime. This is beneficial for a few reasons

1. Now decoding into Arrow will not block the executor where IO occurs,
which previously could lead to timeouts
2. It allows us to use non-Tokio runtimes such as compio which uses
io_uring

To enable (2), we revamp the `VortexReadAt ` trait to return `!Send`
futures that can be launched on thread-per-core runtimes. We also
enforce the `VortexReadAt` implementers are clonable, again so they can
be shared across several concurrent futures. We implement a clonable
`TokioFile` type, and un-implement for things like `Vec<u8>` and `&[u8]`
that don't have cheap cloning.


Although read operations are now `!Send`, it is preferable
for`LayoutBatchStream` to be `Send` so that it can be polled from Tokio
or any other runtime. To achieve this, we add an `IoDispatcher`, which
sits in front of either a tokio or compio executor to dispatch read
requests.

A couple of things to consider:

* **Cancellation safety**: On Drop, the dispatcher will join its worker
threads, gracefully awaiting any already-submitted tasks before it
completes. It's probably more preferable to find a way to force an
immediate shutdown, so we may want to ferry a `shutdown_now` channel and
`select!` over both of them inside of the inner loop of the dispatcher
* **Async runtime safety**: The way you can implement cross-runtime IO
is that you provide both a reader (`VortexReadAt`) and a `IoDispatcher`
to the `LayoutBatchStreamBuilder`. There is nothing at compile time that
enforces that the reader and the dispatcher are compatible. Perhaps we
should parameterize `VortexReadAt` on a runtime and then we can do
type-level checks to ensure that all IO operations can only be paired
with the right runtime. That would likely inflate the PR a good bit
  • Loading branch information
a10y authored Nov 14, 2024
1 parent 437392b commit 42b7527
Show file tree
Hide file tree
Showing 36 changed files with 1,174 additions and 530 deletions.
13 changes: 6 additions & 7 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -2,15 +2,15 @@ name: CI

on:
push:
branches: [ "develop" ]
pull_request: { }
workflow_dispatch: { }
branches: ["develop"]
pull_request: {}
workflow_dispatch: {}

permissions:
actions: read
contents: read
checks: write # audit-check creates checks
issues: write # audit-check creates issues
checks: write # audit-check creates checks
issues: write # audit-check creates issues

env:
CARGO_TERM_COLOR: always
Expand Down Expand Up @@ -140,7 +140,7 @@ jobs:
name: "miri"
runs-on: ubuntu-latest
env:
MIRIFLAGS: -Zmiri-strict-provenance -Zmiri-symbolic-alignment-check -Zmiri-backtrace=full
MIRIFLAGS: -Zmiri-strict-provenance -Zmiri-symbolic-alignment-check -Zmiri-backtrace=full -Zmiri-disable-isolation
steps:
- uses: rui314/setup-mold@v1
- uses: actions/checkout@v4
Expand Down Expand Up @@ -182,4 +182,3 @@ jobs:
- name: "Make sure no files changed after regenerating"
run: |
test -z "$(git status --porcelain)"
10 changes: 10 additions & 0 deletions .zed/tasks.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
// Static tasks configuration.
//
// Example:
[
{
"label": "clippy (workspace)",
"command": "cargo clippy --all-targets --all-features",
"shell": "system"
}
]
Loading

0 comments on commit 42b7527

Please sign in to comment.