Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Indexes for key/value tables in layers #30

Open
joto opened this issue Nov 15, 2017 · 2 comments
Open

Indexes for key/value tables in layers #30

joto opened this issue Nov 15, 2017 · 2 comments

Comments

@joto
Copy link
Collaborator

joto commented Nov 15, 2017

This issue documents the current state of affairs concerning indexes for key/value tables. Other issues can refer back to this for context.

When building a new layer you need to populate the key/value tables while adding new properties. There are several ways of doing this:

  1. Use the builtin index. This is the easiest choice, but performance isn't great. It uses a flat vector with linear search for small numbers of entries or an std::unordered_map when there are more. Some quick benchmarks show that the current value of 20 entries beyond which the implementation switches to std::unordered_map is reasonable.
  2. Use the property_mapper. This is the best choice when copying some features (or some properties of some features) from one existing layer, it is more than twice as fast as the builtin index.
  3. Use one of the supplied indexes in index.hpp.
  4. Do everything yourself. You know the data best that you are adding and can choose the right strategy.

Medium term we should think about a better implementation for (1), but this needs more benchmarks with real data and different implementations to find the best one. Because this is hidden from the user of the library, we can always improve on this later.

It is unlikely that we'll find a much better approach for (2) than the current one. But this is only usable in very specific circumstances.

We can always add to (3), for instance adding vector-based flat maps with linear search.

@springmeyer
Copy link
Contributor

It is unlikely that we'll find a much better approach for (2) than the current one. But this is only usable in very specific circumstances.

👍 property_mapper has been working great. /cc @GretaCB here as this ticket would be a nice place to link back to from your recent changelogs mentioning property_mapper.

vector-based flat maps

Interested to hear more about what you mean by flat maps? Are you revering to the optimized/cpu cache/locality friendly efforts to provide alternative containers? I just checked to see if the google one is open sourced yet, but don't see it in https://github.com/abseil/abseil-cpp/tree/master/absl/container. Some others I've noticed are detailed at https://tessil.github.io/2016/08/29/benchmark-hopscotch-map.html.

@joto
Copy link
Collaborator Author

joto commented Jan 30, 2018

What I meant with "flat maps" here is basically a vector of key-value-pairs. (An alternative is to use two vectors, one for keys, one for values and keep them in sync.) The vector can be either unsorted, which means linear search and works well for small maps, or sorted. Looking up values in a sorted vector is very simple, space efficient and fast. The drawback is that adding to the data structure means re-sorting it. So this only makes sense if you can first add all entries, then sort them, then do lots of lookups. In our context this might make sense if you have layers which always have the same keys and values.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants