-
-
Notifications
You must be signed in to change notification settings - Fork 140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Calling dump on a SignedData object took 30 seconds #256
Comments
Hi, Also, the result is only 14151 bytes, your My system: Python 3.11.2 (Debian), asn1crypto from git. Could you provide more info about your system (OS, Python version, asn1crypto version) and the needed steps to reproduce your issue? So far I'm not convinced that there is an issue. |
Yes, I said I also failed to reproduce the issue locally in particular I cannot - in a reasonable timeframe - have the same setup as where the problem is, but through ad-hoc debugging I narrowed it down to asn1crypto dump function. From what I read in the source code loading and dumping will have significantly different logic than creating an object and dumping. But creating would be a lot of manual labour that I am not willing to do. Note that reconstruct.py dump does not perfectly match the cms as given in cms.txt Versions where the problem exists: That being said I do not think the problem is with the os, because I did explicitly check that the dump function gets called too many times. The asn1crypto is used through pyHanko. I understand that this ticket is not very actionable as a bug report without clear steps to reproduce the problem. I was hoping that someone with reasonably good understanding of asn1crypto internals would magically know where the problem is. Since this is not the case you can close this one and I will open a new one when I have time to investigate in-depth. Alternatively you can take this to be feature request and consider cacheing anyway. |
Dump being called 200_000 times is very strange. You could record the class and optionally the ID of the object being dumped and then determine which class/object is dumped how often. |
Seconding @joernheissler's PoV; both pyHanko and Hypothesis: 200000 is a lot, but given that If you can find a way to reliably reproduce the slowness with the pyHanko API, feel free to start a discussion over there. I may be able to assist in narrowing down the root cause once you give me a way to reproduce the problem. TL;DR: |
I managed to reproduce locally and it seems the problem is not with pyHanko. But pyhanko will be needed to reproduce the problem. Attaching zip with two files. One very interesting thing I noticed is that when the |
Also calling dump for the second time will take twice as long as as the first time. |
Here's a simpler reproducer that sidesteps pyHanko (pardon the wall of text):
This takes 20 seconds on my machine, whereas @peteris-zealid's example only takes about 7. Sure, I used EDIT: hang on, the process is suddenly massively faster (<0.01s) if I omit the |
I haven't been able to pin down exactly why, but for some reason It's not the full story, but I hope it's a useful start. |
Thanks, that's very useful.
|
I might be a step closer: But a code change like this is risky, I didn't think it through yet. |
Hoping to take a look at this soon. Sounds like an interesting issue to debug. |
Yeah, I ran line profiler on above and saw |
I tried really hard to make a reproducible example, but failed at that. While investigating I did write some interesting code.
The problem seems to be related to a specific certificate. In most cases the dump function works fine.
While investigating I noticed that the dump method was called around 200000 times. The recursion depth was not big it peaked at depth of 11 which is completely normal. I could not figure out from reading the code where exactly the other dump calls were coming from but I suspect it is from x509.py
I changed the dump code to cache the results it returns and that solved the problem. I understand that cacheing is not there for two reasons 1) the objects are mutable and dump can return different results 2) cacheing would create unneeded memory consumption where in most cases it is not needed.
Since there is (as far as I can tell) a trade off between memory consumption and performance I would suggest to have some sort of flag that would allow cacheing the result of the dump function.
Here is some code I wrote to try to reproduce the problem. It worked well but slight manual editing was needed to make it construct the python object. Only removing the parts that did not de-serialize was needed. So the python object was created, but the slow serialization was not reproduced.
asn_to_py.patch
reconstructed.py.txt
Here is the particular CMS.
cms.txt
The text was updated successfully, but these errors were encountered: