布隆过滤器
每个SST的bloom filter，使用布隆过滤器可以大幅减少不必要的磁盘IO。在bits_per_key为10的情况下，bloom filter错误率约为1%

rocksdb::ColumnFamilyOptions cf_opt;
rocksdb::BlockBasedTableOptions table_options;
table_options.filter_policy.reset(rocksdb::NewBloomFilterPolicy(/*bits_per_key*/10, /*use_block_based_builder*/false));
cf_opt.table_factory.reset(rocksdb::NewBlockBasedTableFactory(table_options));

Ribbon Filter
全称是Rapid Incremental Boolean Banding ON the fly ，能够基于真值矩阵用增量方式构建过滤器。相比于传统的Bloom Filter 来说，它的优势主要是对空间更友好，压缩率更高，相比布隆过滤器节省1/4空间。但是会消耗更多的CPU(构建和查询)，因为Ribbon filter 需要针对二维矩阵的后台计算（高斯消元），相比于传统一维数组来说计算量更大。
```
rocksdb::ColumnFamilyOptions cf_opt;
rocksdb::BlockBasedTableOptions table_options;
table_options.filter_policy.reset(rocksdb::NewRibbonFilterPolicy(bloom_equivalent_bits_per_key, /*bloom_before_level*/ 0));
cf_opt.table_factory.reset(rocksdb::NewBlockBasedTableFactory(table_options));
```

Partitioned Index Filters(分区索引缓存)

1.更高的缓存命中率:分区允许以更细的粒度加载索引/过滤器，从而有效地利用缓存空间，而不是用大索引/块污染缓存空间
2.更少的IO util:当索引/过滤器分区缓存丢失时，只需要从磁盘加载一个分区，这与读取SST文件的整个索引/过滤器相比，磁盘负载更轻

rocksdb::BlockBasedTableOptions table_options;
table_options.block_cache = rocksdb::NewLRUCache(100 * 1048576); // 100MB uncompressed cache
table_options.index_type = rocksdb::BlockBasedTableOptions::IndexType::kTwoLevelIndexSearch;
table_options.partition_filters = true;
table_options.metadata_block_size = 4096;
table_options.cache_index_and_filter_blocks = true;
table_options.cache_index_and_filter_blocks_with_high_priority = true;
table_options.pin_top_level_index_and_filter = true;
table_options.pin_l0_filter_and_index_blocks_in_cache = true;

Block Cache使用Hash索引在 Data Block 上使用 Hash 索引来提升点查询的效率。二分查找会导致 CPU Cache Miss，增加 CPU 使用率。官方的测试数据显示：该特性可降低 21.8% 的 CPU 利用率，提升 10% 的吞吐，但会增加 4.6% 的空间占用。

rocksdb::BlockBasedTableOptions table_options;
// 默认 table_options.data_block_index_type = DataBlockIndexType::kDataBlockBinarySearch;
table_options.data_block_index_type = DataBlockIndexType::kDataBlockBinaryAndHash;
// 如果此值小的话，说明 Hash 桶多，冲突就比较小。
table_options.data_block_hash_table_util_ratio = 0.75;

前缀过滤
当设置了Options.prefix_extractor，前缀哈希会被加入到bloom中。优点是节省内存，缺点是会导致更高的假阳性概率。另外，在使用Seek和SeekForPrefix的时候，前缀bloom也可以用（通过在构建迭代器的时候设置check_filter），而全键过滤只能被用于点查询
```
Options options;
options.prefix_extractor.reset(NewFixedPrefixTransform(3));
```

注意：iterator 顺序 scan 所有的数据，需要将 total_order_seek 设置为 true，不然 scan 出来的数据顺序就不对

ReadOptions read_options;
read_options.total_order_seek = true;
auto iter = db->NewIterator(read_options);
Slice key = "foobar";
iter->Seek(key);  // Seek "foobar" in total order

optimize_filters_for_hits
True 表示关闭 LSM 最底层的 Bloom Filter。这个选项主要是因为最底层的 Bloom Filter 总大小比较大，比较占用 Block Cache 空间。

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

05.Index-Filter.md

05.Index-Filter.md

Files

05.Index-Filter.md

Latest commit

History

05.Index-Filter.md

File metadata and controls