Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: teach ALPArray to store validity only in the encoded array #1951

Draft
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

danking
Copy link
Member

@danking danking commented Jan 14, 2025

The patches are now always non-nullable.

This required PrimitiveArray::patch to gracefully handle non-nullable patches when the array is nullable.

I modified the benchmarks to include patch manipulation time, but notice that the test data has no patches. The benchmarks measure the overhead of is_valid. If we had test data where the invalid positions contained exceptional values, I would expect a modest improvement in both decompression and compression time.

As discussed in slack, the is_valid is expensive for two reasons: (a) creation of a Scalar and (b) conversion to an Arrow BooleanBuffer. @gatesn is working on a FilterMask change that should subsume all our boolean compression strategies into a new "BoolMask". With that in place, we can require Validity::Array to hold a "BoolMask" and thus check validity without either of the issues above.

This PR

The parameters are: (number_of_array_elements, fraction_valid).

Timer precision: 41 ns
alp_compress               fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ compress_alp                          │               │               │               │         │
│  ├─ f32                                │               │               │               │         │
│  │  ├─ (100000, 0.25)    170.2 µs      │ 442.1 µs      │ 170.7 µs      │ 173.6 µs      │ 100     │ 100
│  │  ├─ (100000, 0.95)    169.3 µs      │ 202.2 µs      │ 170.4 µs      │ 171.7 µs      │ 100     │ 100
│  │  ├─ (100000, 1.0)     136.3 µs      │ 147.1 µs      │ 136.9 µs      │ 137.2 µs      │ 100     │ 100
│  │  ├─ (10000000, 0.25)  13.6 ms       │ 16.46 ms      │ 14.39 ms      │ 14.23 ms      │ 100     │ 100
│  │  ├─ (10000000, 0.95)  13.64 ms      │ 17.7 ms       │ 14.44 ms      │ 14.38 ms      │ 100     │ 100
│  │  ╰─ (10000000, 1.0)   13.55 ms      │ 14.97 ms      │ 14.35 ms      │ 14.23 ms      │ 100     │ 100
│  ╰─ f64                                │               │               │               │         │
│     ├─ (100000, 0.25)    240.7 µs      │ 385.7 µs      │ 247.8 µs      │ 249.1 µs      │ 100     │ 100
│     ├─ (100000, 0.95)    240.2 µs      │ 253.2 µs      │ 243.5 µs      │ 244.4 µs      │ 100     │ 100
│     ├─ (100000, 1.0)     172.6 µs      │ 184.4 µs      │ 175.2 µs      │ 175.5 µs      │ 100     │ 100
│     ├─ (10000000, 0.25)  15.95 ms      │ 24.24 ms      │ 16.61 ms      │ 17.25 ms      │ 100     │ 100
│     ├─ (10000000, 0.95)  15.95 ms      │ 21.34 ms      │ 16.39 ms      │ 16.85 ms      │ 100     │ 100
│     ╰─ (10000000, 1.0)   15.92 ms      │ 23.41 ms      │ 16.46 ms      │ 17.04 ms      │ 100     │ 100
╰─ decompress_alp                        │               │               │               │         │
   ├─ f32                                │               │               │               │         │
   │  ├─ (100000, 0.25)    12.2 µs       │ 34.7 µs       │ 12.29 µs      │ 12.52 µs      │ 100     │ 100
   │  ├─ (100000, 0.95)    12.12 µs      │ 12.74 µs      │ 12.35 µs      │ 12.37 µs      │ 100     │ 100
   │  ├─ (100000, 1.0)     12.16 µs      │ 12.95 µs      │ 12.37 µs      │ 12.4 µs       │ 100     │ 100
   │  ├─ (10000000, 0.25)  2.117 ms      │ 4.544 ms      │ 2.637 ms      │ 2.674 ms      │ 100     │ 100
   │  ├─ (10000000, 0.95)  2.085 ms      │ 4.458 ms      │ 2.362 ms      │ 2.504 ms      │ 100     │ 100
   │  ╰─ (10000000, 1.0)   2.097 ms      │ 3.875 ms      │ 2.229 ms      │ 2.338 ms      │ 100     │ 100
   ╰─ f64                                │               │               │               │         │
      ├─ (100000, 0.25)    23.41 µs      │ 25.16 µs      │ 23.66 µs      │ 23.68 µs      │ 100     │ 100
      ├─ (100000, 0.95)    23.2 µs       │ 24.7 µs       │ 24.06 µs      │ 24.05 µs      │ 100     │ 100
      ├─ (100000, 1.0)     22.79 µs      │ 26.08 µs      │ 22.91 µs      │ 22.95 µs      │ 100     │ 100
      ├─ (10000000, 0.25)  4.216 ms      │ 6.862 ms      │ 4.416 ms      │ 4.568 ms      │ 100     │ 100
      ├─ (10000000, 0.95)  4.242 ms      │ 7.647 ms      │ 4.59 ms       │ 4.827 ms      │ 100     │ 100
      ╰─ (10000000, 1.0)   4.236 ms      │ 8.129 ms      │ 4.377 ms      │ 4.507 ms      │ 100     │ 100

Develop

I patched develop with this PR's benchmark code

Timer precision: 41 ns
alp_compress               fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ compress_alp                          │               │               │               │         │
│  ├─ f32                                │               │               │               │         │
│  │  ├─ (100000, 0.25)    136 µs        │ 279.7 µs      │ 136.8 µs      │ 138.5 µs      │ 100     │ 100
│  │  ├─ (100000, 0.95)    136.2 µs      │ 149.7 µs      │ 136.9 µs      │ 137.6 µs      │ 100     │ 100
│  │  ├─ (100000, 1.0)     136 µs        │ 148.5 µs      │ 136.6 µs      │ 137.1 µs      │ 100     │ 100
│  │  ├─ (10000000, 0.25)  13.68 ms      │ 15.06 ms      │ 14.07 ms      │ 14.14 ms      │ 100     │ 100
│  │  ├─ (10000000, 0.95)  13.67 ms      │ 19.4 ms       │ 14.05 ms      │ 14.18 ms      │ 100     │ 100
│  │  ╰─ (10000000, 1.0)   13.66 ms      │ 14.73 ms      │ 13.87 ms      │ 14.04 ms      │ 100     │ 100
│  ╰─ f64                                │               │               │               │         │
│     ├─ (100000, 0.25)    167.7 µs      │ 301.4 µs      │ 172.7 µs      │ 173.2 µs      │ 100     │ 100
│     ├─ (100000, 0.95)    167.7 µs      │ 187.7 µs      │ 170.7 µs      │ 171.1 µs      │ 100     │ 100
│     ├─ (100000, 1.0)     167.6 µs      │ 183.7 µs      │ 170.9 µs      │ 171.1 µs      │ 100     │ 100
│     ├─ (10000000, 0.25)  15.54 ms      │ 21.9 ms       │ 15.92 ms      │ 16.05 ms      │ 100     │ 100
│     ├─ (10000000, 0.95)  15.61 ms      │ 16.74 ms      │ 15.97 ms      │ 16.03 ms      │ 100     │ 100
│     ╰─ (10000000, 1.0)   15.64 ms      │ 18.85 ms      │ 16.1 ms       │ 16.23 ms      │ 100     │ 100
╰─ decompress_alp                        │               │               │               │         │
   ├─ f32                                │               │               │               │         │
   │  ├─ (100000, 0.25)    12.37 µs      │ 85.49 µs      │ 12.49 µs      │ 13.22 µs      │ 100     │ 100
   │  ├─ (100000, 0.95)    12.29 µs      │ 12.74 µs      │ 12.43 µs      │ 12.44 µs      │ 100     │ 100
   │  ├─ (100000, 1.0)     11.7 µs       │ 11.95 µs      │ 11.83 µs      │ 11.81 µs      │ 100     │ 100
   │  ├─ (10000000, 0.25)  2.081 ms      │ 3.003 ms      │ 2.175 ms      │ 2.263 ms      │ 100     │ 100
   │  ├─ (10000000, 0.95)  2.082 ms      │ 2.228 ms      │ 2.109 ms      │ 2.124 ms      │ 100     │ 100
   │  ╰─ (10000000, 1.0)   2.082 ms      │ 2.904 ms      │ 2.13 ms       │ 2.202 ms      │ 100     │ 100
   ╰─ f64                                │               │               │               │         │
      ├─ (100000, 0.25)    23.66 µs      │ 25.66 µs      │ 23.83 µs      │ 23.86 µs      │ 100     │ 100
      ├─ (100000, 0.95)    23.16 µs      │ 24.62 µs      │ 24.04 µs      │ 23.89 µs      │ 100     │ 100
      ├─ (100000, 1.0)     22.87 µs      │ 23.49 µs      │ 23.04 µs      │ 23.03 µs      │ 100     │ 100
      ├─ (10000000, 0.25)  4.221 ms      │ 5.59 ms       │ 4.326 ms      │ 4.469 ms      │ 100     │ 100
      ├─ (10000000, 0.95)  4.242 ms      │ 5.319 ms      │ 4.545 ms      │ 4.536 ms      │ 100     │ 100
      ╰─ (10000000, 1.0)   4.228 ms      │ 5.652 ms      │ 4.342 ms      │ 4.519 ms      │ 100     │ 100

The patches are now always non-nullable.

This required PrimitiveArray::patch to gracefully handle non-nullable patches when the array is
nullable.

I modified the benchmarks to include patch manipulation time, but notice that the test data has no
patches. The benchmarks measure the overhead of `is_valid`. If we had test data where the invalid
positions contained exceptional values, I would expect a modest improvement in both decompression
and compression time.

As discussed [in slack](https://spiraldb.slack.com/archives/C07BV3GKAJ2/p1736894376100079), the
`is_valid` is expensive for two reasons: (a) creation of a Scalar and (b) conversion to an Arrow
BooleanBuffer. @gatesn is working on a FilterMask change that should subsume all our boolean
compression strategies into a new "BoolMask". With that in place, we can require Validity::Array to
hold a "BoolMask" and thus check validity without either of the issues above.

This PR
-------

```
Timer precision: 41 ns
alp_compress               fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ compress_alp                          │               │               │               │         │
│  ├─ f32                                │               │               │               │         │
│  │  ├─ (100000, 0.25)    170.2 µs      │ 442.1 µs      │ 170.7 µs      │ 173.6 µs      │ 100     │ 100
│  │  ├─ (100000, 0.95)    169.3 µs      │ 202.2 µs      │ 170.4 µs      │ 171.7 µs      │ 100     │ 100
│  │  ├─ (100000, 1.0)     136.3 µs      │ 147.1 µs      │ 136.9 µs      │ 137.2 µs      │ 100     │ 100
│  │  ├─ (10000000, 0.25)  13.6 ms       │ 16.46 ms      │ 14.39 ms      │ 14.23 ms      │ 100     │ 100
│  │  ├─ (10000000, 0.95)  13.64 ms      │ 17.7 ms       │ 14.44 ms      │ 14.38 ms      │ 100     │ 100
│  │  ╰─ (10000000, 1.0)   13.55 ms      │ 14.97 ms      │ 14.35 ms      │ 14.23 ms      │ 100     │ 100
│  ╰─ f64                                │               │               │               │         │
│     ├─ (100000, 0.25)    240.7 µs      │ 385.7 µs      │ 247.8 µs      │ 249.1 µs      │ 100     │ 100
│     ├─ (100000, 0.95)    240.2 µs      │ 253.2 µs      │ 243.5 µs      │ 244.4 µs      │ 100     │ 100
│     ├─ (100000, 1.0)     172.6 µs      │ 184.4 µs      │ 175.2 µs      │ 175.5 µs      │ 100     │ 100
│     ├─ (10000000, 0.25)  15.95 ms      │ 24.24 ms      │ 16.61 ms      │ 17.25 ms      │ 100     │ 100
│     ├─ (10000000, 0.95)  15.95 ms      │ 21.34 ms      │ 16.39 ms      │ 16.85 ms      │ 100     │ 100
│     ╰─ (10000000, 1.0)   15.92 ms      │ 23.41 ms      │ 16.46 ms      │ 17.04 ms      │ 100     │ 100
╰─ decompress_alp                        │               │               │               │         │
   ├─ f32                                │               │               │               │         │
   │  ├─ (100000, 0.25)    12.2 µs       │ 34.7 µs       │ 12.29 µs      │ 12.52 µs      │ 100     │ 100
   │  ├─ (100000, 0.95)    12.12 µs      │ 12.74 µs      │ 12.35 µs      │ 12.37 µs      │ 100     │ 100
   │  ├─ (100000, 1.0)     12.16 µs      │ 12.95 µs      │ 12.37 µs      │ 12.4 µs       │ 100     │ 100
   │  ├─ (10000000, 0.25)  2.117 ms      │ 4.544 ms      │ 2.637 ms      │ 2.674 ms      │ 100     │ 100
   │  ├─ (10000000, 0.95)  2.085 ms      │ 4.458 ms      │ 2.362 ms      │ 2.504 ms      │ 100     │ 100
   │  ╰─ (10000000, 1.0)   2.097 ms      │ 3.875 ms      │ 2.229 ms      │ 2.338 ms      │ 100     │ 100
   ╰─ f64                                │               │               │               │         │
      ├─ (100000, 0.25)    23.41 µs      │ 25.16 µs      │ 23.66 µs      │ 23.68 µs      │ 100     │ 100
      ├─ (100000, 0.95)    23.2 µs       │ 24.7 µs       │ 24.06 µs      │ 24.05 µs      │ 100     │ 100
      ├─ (100000, 1.0)     22.79 µs      │ 26.08 µs      │ 22.91 µs      │ 22.95 µs      │ 100     │ 100
      ├─ (10000000, 0.25)  4.216 ms      │ 6.862 ms      │ 4.416 ms      │ 4.568 ms      │ 100     │ 100
      ├─ (10000000, 0.95)  4.242 ms      │ 7.647 ms      │ 4.59 ms       │ 4.827 ms      │ 100     │ 100
      ╰─ (10000000, 1.0)   4.236 ms      │ 8.129 ms      │ 4.377 ms      │ 4.507 ms      │ 100     │ 100
```

Develop
-------

I patched develop with this PR's benchmark code

```
Timer precision: 41 ns
alp_compress               fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ compress_alp                          │               │               │               │         │
│  ├─ f32                                │               │               │               │         │
│  │  ├─ (100000, 0.25)    136 µs        │ 279.7 µs      │ 136.8 µs      │ 138.5 µs      │ 100     │ 100
│  │  ├─ (100000, 0.95)    136.2 µs      │ 149.7 µs      │ 136.9 µs      │ 137.6 µs      │ 100     │ 100
│  │  ├─ (100000, 1.0)     136 µs        │ 148.5 µs      │ 136.6 µs      │ 137.1 µs      │ 100     │ 100
│  │  ├─ (10000000, 0.25)  13.68 ms      │ 15.06 ms      │ 14.07 ms      │ 14.14 ms      │ 100     │ 100
│  │  ├─ (10000000, 0.95)  13.67 ms      │ 19.4 ms       │ 14.05 ms      │ 14.18 ms      │ 100     │ 100
│  │  ╰─ (10000000, 1.0)   13.66 ms      │ 14.73 ms      │ 13.87 ms      │ 14.04 ms      │ 100     │ 100
│  ╰─ f64                                │               │               │               │         │
│     ├─ (100000, 0.25)    167.7 µs      │ 301.4 µs      │ 172.7 µs      │ 173.2 µs      │ 100     │ 100
│     ├─ (100000, 0.95)    167.7 µs      │ 187.7 µs      │ 170.7 µs      │ 171.1 µs      │ 100     │ 100
│     ├─ (100000, 1.0)     167.6 µs      │ 183.7 µs      │ 170.9 µs      │ 171.1 µs      │ 100     │ 100
│     ├─ (10000000, 0.25)  15.54 ms      │ 21.9 ms       │ 15.92 ms      │ 16.05 ms      │ 100     │ 100
│     ├─ (10000000, 0.95)  15.61 ms      │ 16.74 ms      │ 15.97 ms      │ 16.03 ms      │ 100     │ 100
│     ╰─ (10000000, 1.0)   15.64 ms      │ 18.85 ms      │ 16.1 ms       │ 16.23 ms      │ 100     │ 100
╰─ decompress_alp                        │               │               │               │         │
   ├─ f32                                │               │               │               │         │
   │  ├─ (100000, 0.25)    12.37 µs      │ 85.49 µs      │ 12.49 µs      │ 13.22 µs      │ 100     │ 100
   │  ├─ (100000, 0.95)    12.29 µs      │ 12.74 µs      │ 12.43 µs      │ 12.44 µs      │ 100     │ 100
   │  ├─ (100000, 1.0)     11.7 µs       │ 11.95 µs      │ 11.83 µs      │ 11.81 µs      │ 100     │ 100
   │  ├─ (10000000, 0.25)  2.081 ms      │ 3.003 ms      │ 2.175 ms      │ 2.263 ms      │ 100     │ 100
   │  ├─ (10000000, 0.95)  2.082 ms      │ 2.228 ms      │ 2.109 ms      │ 2.124 ms      │ 100     │ 100
   │  ╰─ (10000000, 1.0)   2.082 ms      │ 2.904 ms      │ 2.13 ms       │ 2.202 ms      │ 100     │ 100
   ╰─ f64                                │               │               │               │         │
      ├─ (100000, 0.25)    23.66 µs      │ 25.66 µs      │ 23.83 µs      │ 23.86 µs      │ 100     │ 100
      ├─ (100000, 0.95)    23.16 µs      │ 24.62 µs      │ 24.04 µs      │ 23.89 µs      │ 100     │ 100
      ├─ (100000, 1.0)     22.87 µs      │ 23.49 µs      │ 23.04 µs      │ 23.03 µs      │ 100     │ 100
      ├─ (10000000, 0.25)  4.221 ms      │ 5.59 ms       │ 4.326 ms      │ 4.469 ms      │ 100     │ 100
      ├─ (10000000, 0.95)  4.242 ms      │ 5.319 ms      │ 4.545 ms      │ 4.536 ms      │ 100     │ 100
      ╰─ (10000000, 1.0)   4.228 ms      │ 5.652 ms      │ 4.342 ms      │ 4.519 ms      │ 100     │ 100
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant