-
Notifications
You must be signed in to change notification settings - Fork 100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Can size
be made O(1)?
#138
Comments
I've thought about it in the past but haven't found time to work on it. Basically we'd have to wrap the current
All functions that potentially change the size would have their internal implementations return an
The normal The question is how much this costs in terms of performance and implementation complexity. Someone would have to try it out. |
Okay, thanks! If someone at my work tries it out, I'll post an update here. |
@tibbe I've started working on this, and I've gotten to
which is required by I think changing the
similar to what is done for What do you think about this? |
I think that makes the constructors bigger, which is a definite performance trade-off. That doesn't mean we shouldn't do it, but it's not free. Benchmark! Just what sort of trouble did you run into with the union? I'd fear more trouble from intersection and difference operations, but I still don't know much about these structures. |
@treeowl I forgot to take performance into consideration, you're right. A bit of background: as a first step I've added copies of every function that may change a hashmap's size and suffixed their names with Internal - this will be for the final refactoring, or at least that's my plan. This line in and the cases after it in particular (rockbmb@5a6f098#diff-11321ff356efb0247b49763554dc8620R1176) are where I think this approach becomes complex to implement: because go no longer returns a HashMap and now returns (Int, HashMap), a new unionArrayBy function is required such that its first argument is of type (a -> a -> (Int, a)) and not (a -> a -> a) The reason I'm concerned is because unionArrayBy is used elsewhere, and duplicating this nontrivial function for what is not even the hardest part of the task this early on is a bit of a concern, in my opinion. I'll continue with this approach until I have definite proof it won't work. As such, considering the alternative might not be a waste of time. |
I don't think storing the size in the tree is worth it. It will affect performance of much more than size (and in an insidious way, by increasing GC times). Per-element overheads for Haskell data structures are already a bit too high. |
If O(1) is not viable, is there any way to make size O(logn) at least? |
@pepeiborra it's been a few calendars since I've last attempted to work on this, but from what I recall, it's straightforward to make it O(1) (making sure there are no regressions, not so much), while it's unclear to me right now how to make it O(log n) in a way that makes it an improvement over that approach. |
Here is an idea for storing the sizes in the tree without increasing node sizes (by much): unordered-containers/Data/HashMap/Internal.hs Lines 207 to 212 in b6bde46
|
Interesting .... I'm a bit nervous about having a different big-O for different word sizes. Should we use 64-bit always to work around that? On the flip side, if we don't to this, would we (now or in the future) benefit by narrowing that field to a The |
Could you expand a bit on this? Maybe give an example?
That would be one possibility. Alternatively we could make B ( But frankly I'm entirely unconcerned about performance on 32-bit systems. I think just about anyone who cares about performance will use a 64-bit system.
I don't believe we can save space by this. If we can, I'd like to hear about it. I think it might be useful for constant folding in some rare cases. Roughly speaking, GHC could possibly simplify
Yeah, at least after inlining, the generated code might be a good bit larger.
For some of the participants in this thread they seem to be quite important. I also agree with this bit from the issue description:
There's a similar confusion about the complexity of |
It is somewhat unfortunate that
size
is O(n) forHashMap
andHashSet
:If a faster
size
is desired, I have to store it separately in my code and make sure that I update it whenever I do aninsert
ordelete
. With filtering functions,union
, etc. it gets even more cumbersome.Even experienced programmers often assume that
size
is O(1) without checking, which leads to performance bugs :((I have looked at the code, but I have to confess that I don't understand how
HashMap/Set
work and so it's hard for me to assess how much of a performance/memory penalty adding asize
field to the constructors would be, or whether it is possible at all.)The text was updated successfully, but these errors were encountered: