feat(inverted_index.search): add index applier #2868

Merged: 35 commits, Dec 5, 2023
Changes from 32 commits

Commits
3bf69bb
feat(inverted_index.search): add fst applier
zhongzc Nov 30, 2023
8fdd24f
fix: typos
zhongzc Nov 30, 2023
738ebe2
feat(inverted_index.search): add fst values mapper
zhongzc Dec 1, 2023
cfbdf6b
chore: remove meta check
zhongzc Dec 1, 2023
1cc4b2c
fix: fmt & clippy
zhongzc Dec 1, 2023
5bc2f9e
refactor: one expect for test
zhongzc Dec 1, 2023
fb57baf
feat(inverted_index.search): add index applier
zhongzc Dec 1, 2023
65efce3
refactor: bitmap_full -> bitmap_full_range
zhongzc Dec 1, 2023
98e1a40
feat: add check for segment_row_count
zhongzc Dec 2, 2023
654aa98
fix: remove redundant code
zhongzc Dec 2, 2023
c906353
fix: reader test
zhongzc Dec 4, 2023
83fb712
Merge remote-tracking branch 'origin/develop' into zhongzc/index-fst-…
zhongzc Dec 4, 2023
40534ab
chore: match error in test
zhongzc Dec 4, 2023
eef919f
fix: fmt
zhongzc Dec 4, 2023
08fd331
refactor: add helper function to construct fst value
zhongzc Dec 4, 2023
63a6fb9
Merge remote-tracking branch 'zhongzc/zhongzc/index-fst-values-mapper…
zhongzc Dec 4, 2023
52b3ae4
refactor: polish unit tests
zhongzc Dec 4, 2023
69cf378
refactor: bytemuck to extract offset and size
zhongzc Dec 4, 2023
65e47b7
fix: toml format
zhongzc Dec 4, 2023
42b9c21
Merge remote-tracking branch 'zhongzc/zhongzc/index-fst-values-mapper…
zhongzc Dec 4, 2023
6c8a2ed
refactor: use bytemuck
zhongzc Dec 4, 2023
6b59dc9
refactor: reorg value in unit tests
zhongzc Dec 4, 2023
7488fa2
chore: update proto
zhongzc Dec 4, 2023
480f1c4
Merge remote-tracking branch 'origin/develop' into zhongzc/index-applier
zhongzc Dec 4, 2023
094b3e4
Merge remote-tracking branch 'origin/develop' into zhongzc/index-applier
zhongzc Dec 5, 2023
6a89045
chore: add a TODO reminder to consider optimizing the order of apply
zhongzc Dec 5, 2023
ba48218
refactor: InList predicates are applied first to benefit from higher …
zhongzc Dec 5, 2023
3c5b58c
chore: update proto
zhongzc Dec 5, 2023
33610c5
Merge remote-tracking branch 'origin/develop' into zhongzc/index-applier
zhongzc Dec 5, 2023
d87becb
feat: add read options to control the behavior of index not found
zhongzc Dec 5, 2023
ec319b0
refactor: polish
zhongzc Dec 5, 2023
91a1338
refactor: move read options to implementation instead of trait
zhongzc Dec 5, 2023
3f444a4
feat: add SearchContext, refine doc comments
zhongzc Dec 5, 2023
ccac231
feat: move index_not_found_strategy as a field of SearchContext
zhongzc Dec 5, 2023
c05c99e
chore: rename varient
zhongzc Dec 5, 2023
2 changes: 1 addition & 1 deletion Cargo.lock

2 changes: 1 addition & 1 deletion Cargo.toml
@@ -88,7 +88,7 @@ etcd-client = "0.12"
fst = "0.4.7"
futures = "0.3"
futures-util = "0.3"
- greptime-proto = { git = "https://github.com/GreptimeTeam/greptime-proto.git", rev = "2aaee38de81047537dfa42af9df63bcfb866e06c" }
+ greptime-proto = { git = "https://github.com/GreptimeTeam/greptime-proto.git", rev = "b1d403088f02136bcebde53d604f491c260ca8e2" }
humantime-serde = "1.1"
itertools = "0.10"
lazy_static = "1.4"
10 changes: 9 additions & 1 deletion src/index/src/inverted_index/error.rs
@@ -64,6 +64,9 @@ pub enum Error {
payload_size: u64,
},

+ #[snafu(display("Unexpected zero segment row count"))]
+ UnexpectedZeroSegmentRowCount { location: Location },

#[snafu(display("Failed to decode fst"))]
DecodeFst {
#[snafu(source)]
@@ -109,6 +112,9 @@ pub enum Error {
location: Location,
predicates: Vec<Predicate>,
},

+ #[snafu(display("index not found, name: {name}"))]
+ IndexNotFound { name: String, location: Location },
}

impl ErrorExt for Error {
@@ -118,6 +124,7 @@ impl ErrorExt for Error {
Seek { .. }
| Read { .. }
| UnexpectedFooterPayloadSize { .. }
+ | UnexpectedZeroSegmentRowCount { .. }
| UnexpectedOffsetSize { .. }
| UnexpectedBlobSize { .. }
| DecodeProto { .. }
@@ -128,7 +135,8 @@ impl ErrorExt for Error {
| ParseDFA { .. }
| KeysApplierWithoutInList { .. }
| IntersectionApplierWithInList { .. }
- | EmptyPredicates { .. } => StatusCode::InvalidArguments,
+ | EmptyPredicates { .. }
+ | IndexNotFound { .. } => StatusCode::InvalidArguments,
}
}
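
The two new variants and their status-code mapping can be sketched without `snafu`, using only the standard library. This is a simplified illustration, not the crate's actual definition: the `Location` field is dropped, and the `StatusCode` enum here is a hypothetical stand-in with a single variant.

```rust
use std::fmt;

/// Hypothetical stand-in for the project's `StatusCode` type.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum StatusCode {
    InvalidArguments,
}

/// Simplified mirror of the two variants added in this diff (no `Location`).
#[derive(Debug)]
pub enum Error {
    UnexpectedZeroSegmentRowCount,
    IndexNotFound { name: String },
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::UnexpectedZeroSegmentRowCount => {
                write!(f, "Unexpected zero segment row count")
            }
            Error::IndexNotFound { name } => write!(f, "index not found, name: {name}"),
        }
    }
}

impl std::error::Error for Error {}

impl Error {
    /// Both variants map to `InvalidArguments`, matching the diff above.
    pub fn status_code(&self) -> StatusCode {
        match self {
            Error::UnexpectedZeroSegmentRowCount | Error::IndexNotFound { .. } => {
                StatusCode::InvalidArguments
            }
        }
    }
}
```

The display strings are copied from the `#[snafu(display(...))]` attributes in the diff; everything else is scaffolding for the sketch.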

2 changes: 1 addition & 1 deletion src/index/src/inverted_index/format/reader.rs
@@ -25,7 +25,7 @@ use crate::inverted_index::FstMap;
/// InvertedIndexReader defines an asynchronous reader of inverted index data
#[mockall::automock]
#[async_trait]
- pub trait InvertedIndexReader {
+ pub trait InvertedIndexReader: Send {
/// Retrieve metadata of all inverted indices stored within the blob.
async fn metadata(&mut self) -> Result<InvertedIndexMetas>;
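
The `Send` supertrait matters because it makes every `dyn InvertedIndexReader` trait object `Send`, which is presumably what lets a `&mut dyn InvertedIndexReader` be held inside the `Send` futures that `async_trait` produces by default. A minimal synchronous sketch of the same effect, using a thread instead of an async runtime; `Reader`, `FixedReader`, and `read_on_worker` are hypothetical names for illustration only:

```rust
use std::thread;

/// Simplified, synchronous stand-in for `InvertedIndexReader` (hypothetical).
/// The `Send` supertrait makes every `dyn Reader` usable from another thread.
pub trait Reader: Send {
    fn total_row_count(&mut self) -> u64;
}

/// Hypothetical in-memory reader used only for this sketch.
pub struct FixedReader(pub u64);

impl Reader for FixedReader {
    fn total_row_count(&mut self) -> u64 {
        self.0
    }
}

/// Moves a boxed reader into a worker thread. This compiles only because
/// `Reader: Send`; without that bound, `Box<dyn Reader>` would not be `Send`
/// and `thread::spawn` would reject the closure.
pub fn read_on_worker(mut reader: Box<dyn Reader>) -> u64 {
    thread::spawn(move || reader.total_row_count())
        .join()
        .expect("worker thread panicked")
}
```

Removing `: Send` from the trait definition in this sketch turns the `thread::spawn` call into a compile error, which is the same class of failure the bound in the diff avoids.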

6 changes: 5 additions & 1 deletion src/index/src/inverted_index/format/reader/blob.rs
@@ -143,7 +143,11 @@ mod tests {
};

// metas
- let mut metas = InvertedIndexMetas::default();
+ let mut metas = InvertedIndexMetas {
+ total_row_count: 10,
+ segment_row_count: 1,
+ ..Default::default()
+ };
metas.metas.insert(meta.name.clone(), meta);
metas.metas.insert(meta1.name.clone(), meta1);
let mut meta_buf = Vec::new();
16 changes: 10 additions & 6 deletions src/index/src/inverted_index/format/reader/footer.rs
@@ -21,7 +21,7 @@ use snafu::{ensure, ResultExt};

use crate::inverted_index::error::{
DecodeProtoSnafu, ReadSnafu, Result, SeekSnafu, UnexpectedFooterPayloadSizeSnafu,
- UnexpectedOffsetSizeSnafu,
+ UnexpectedOffsetSizeSnafu, UnexpectedZeroSegmentRowCountSnafu,
};
use crate::inverted_index::format::FOOTER_PAYLOAD_SIZE_SIZE;

@@ -85,6 +85,11 @@ impl<R: AsyncRead + AsyncSeek + Unpin> InvertedIndeFooterReader<R> {

/// Check if the read metadata is consistent with expected sizes and offsets.
fn validate_metas(&self, metas: &InvertedIndexMetas, payload_size: u64) -> Result<()> {
+ ensure!(
+ metas.segment_row_count > 0,
+ UnexpectedZeroSegmentRowCountSnafu
+ );

for meta in metas.metas.values() {
let InvertedIndexMeta {
base_offset,
@@ -116,7 +121,10 @@ mod tests {
use super::*;

fn create_test_payload(meta: InvertedIndexMeta) -> Vec<u8> {
- let mut metas = InvertedIndexMetas::default();
+ let mut metas = InvertedIndexMetas {
+ segment_row_count: 1,
+ ..Default::default()
+ };
metas.metas.insert("test".to_string(), meta);

let mut payload_buf = vec![];
@@ -131,7 +139,6 @@ mod tests {
async fn test_read_payload() {
let meta = InvertedIndexMeta {
name: "test".to_string(),
- segment_row_count: 4096,
..Default::default()
};

@@ -145,14 +152,12 @@ mod tests {
assert_eq!(metas.metas.len(), 1);
let index_meta = &metas.metas.get("test").unwrap();
assert_eq!(index_meta.name, "test");
- assert_eq!(index_meta.segment_row_count, 4096);
}

#[tokio::test]
async fn test_invalid_footer_payload_size() {
let meta = InvertedIndexMeta {
name: "test".to_string(),
- segment_row_count: 4096,
..Default::default()
};

@@ -171,7 +176,6 @@ mod tests {
name: "test".to_string(),
base_offset: 0,
inverted_index_size: 1, // Set size to 1 to make it exceed the blob size
- segment_row_count: 4096,
..Default::default()
};
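
The new `segment_row_count > 0` check in `validate_metas` can be illustrated with a small sketch. The diff does not show how `segment_row_count` is consumed, so the division below is an assumption: it supposes the value is used as a divisor when mapping rows to segments, in which case rejecting zero at read time avoids a divide-by-zero later. `MetaError` and `segment_count` are hypothetical names.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum MetaError {
    ZeroSegmentRowCount,
}

/// Returns how many segments are needed to cover `total_row_count` rows,
/// rejecting a zero segment size up front instead of dividing by zero.
pub fn segment_count(total_row_count: u64, segment_row_count: u64) -> Result<u64, MetaError> {
    if segment_row_count == 0 {
        return Err(MetaError::ZeroSegmentRowCount);
    }
    // Ceiling division: a partially filled last segment still counts.
    let full = total_row_count / segment_row_count;
    let partial = u64::from(total_row_count % segment_row_count != 0);
    Ok(full + partial)
}
```

With the values used in the updated `blob.rs` test (`total_row_count: 10`, `segment_row_count: 1`), this sketch yields one segment per row.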

1 change: 1 addition & 0 deletions src/index/src/inverted_index/search.rs
@@ -14,4 +14,5 @@

pub mod fst_apply;
pub mod fst_values_mapper;
+ pub mod index_apply;
pub mod predicate;
1 change: 1 addition & 0 deletions src/index/src/inverted_index/search/fst_apply.rs
@@ -22,6 +22,7 @@ use crate::inverted_index::FstMap;

/// A trait for objects that can process a finite state transducer (FstMap) and return
/// associated values.
+ #[mockall::automock]
pub trait FstApplier: Send + Sync {
/// Retrieves values from an FstMap.
///
@@ -89,7 +89,7 @@ impl KeysFstApplier {
fn get_list(p: &Predicate) -> &HashSet<Bytes> {
match p {
Predicate::InList(i) => &i.list,
- _ => unreachable!(), // `in_lists` is filtered by `split_at_in_lists
+ _ => unreachable!(), // `in_lists` is filtered by `split_at_in_lists`
}
}

32 changes: 32 additions & 0 deletions src/index/src/inverted_index/search/index_apply.rs
@@ -0,0 +1,32 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

mod predicates_apply;

use async_trait::async_trait;
pub use predicates_apply::PredicatesIndexApplier;

use crate::inverted_index::error::Result;
use crate::inverted_index::format::reader::InvertedIndexReader;

/// A trait for processing and transforming indices obtained from an inverted index.
///
/// Applier instances are reusable and work with various `InvertedIndexReader` instances,
/// avoiding repeated compilation of fixed predicates such as regex patterns.
#[async_trait]
pub trait IndexApplier {
/// Applies the predefined predicates to the data read by the given index reader, returning
/// a list of relevant indices (e.g., post IDs, group IDs, row IDs).
async fn apply(&self, reader: &mut dyn InvertedIndexReader) -> Result<Vec<usize>>;
}
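
To show how an applier might be reused across readers, here is a synchronous, heavily simplified sketch. The real trait is async and takes `&mut dyn InvertedIndexReader`; the `Reader`, `Applier`, `EqApplier`, `MemReader`, and `ApplyError` types below are hypothetical, and intersecting the per-predicate results is an assumption about how multiple predicates combine.

```rust
use std::collections::HashMap;

/// Hypothetical error mirroring the `IndexNotFound` variant from this PR.
#[derive(Debug, PartialEq, Eq)]
pub enum ApplyError {
    IndexNotFound { name: String },
}

/// Synchronous stand-in for `InvertedIndexReader`: returns the ids that
/// match `value` in the index called `index`.
pub trait Reader {
    fn lookup(&mut self, index: &str, value: &str) -> Result<Vec<usize>, ApplyError>;
}

/// Synchronous analogue of `IndexApplier::apply`.
pub trait Applier {
    fn apply(&self, reader: &mut dyn Reader) -> Result<Vec<usize>, ApplyError>;
}

/// Hypothetical applier holding fixed (index name, value) equality predicates.
pub struct EqApplier {
    pub predicates: Vec<(String, String)>,
}

impl Applier for EqApplier {
    fn apply(&self, reader: &mut dyn Reader) -> Result<Vec<usize>, ApplyError> {
        let mut result: Option<Vec<usize>> = None;
        for (index, value) in &self.predicates {
            let ids = reader.lookup(index, value)?;
            // Keep only ids matched by every predicate seen so far.
            result = Some(match result {
                None => ids,
                Some(prev) => prev.into_iter().filter(|id| ids.contains(id)).collect(),
            });
        }
        Ok(result.unwrap_or_default())
    }
}

/// In-memory reader for the sketch: index name -> value -> ids.
pub struct MemReader(pub HashMap<String, HashMap<String, Vec<usize>>>);

impl MemReader {
    pub fn from_entries(entries: &[(&str, &str, &[usize])]) -> Self {
        let mut map: HashMap<String, HashMap<String, Vec<usize>>> = HashMap::new();
        for (index, value, ids) in entries {
            map.entry(index.to_string())
                .or_default()
                .insert(value.to_string(), ids.to_vec());
        }
        MemReader(map)
    }
}

impl Reader for MemReader {
    fn lookup(&mut self, index: &str, value: &str) -> Result<Vec<usize>, ApplyError> {
        let values = self.0.get(index).ok_or_else(|| ApplyError::IndexNotFound {
            name: index.to_string(),
        })?;
        Ok(values.get(value).cloned().unwrap_or_default())
    }
}
```

Because `apply` takes `&self` and borrows the reader only for the duration of the call, one `EqApplier` can be built once and run against any number of readers, which is the reuse property the trait's doc comment describes.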