API to get size of compiled regex #943
Comments
Could you elaborate on this? If you're processing untrusted regexes, then wouldn't |
@BurntSushi I'm referring to a case where an untrusted user submits a number of patterns individually, and you want to make sure they do not collectively exceed a "budget". For example, if you wanted to set a total limit of 10,000 bytes and allow users to submit up to 100 patterns, you'd need to set the per-pattern limit to 100 bytes, which is not ideal. |
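To make the trade-off above concrete, here is a minimal sketch in plain Rust (no regex crate involved). `even_split` reproduces the 10,000 / 100 arithmetic, and `Budget` shows how a shared budget could be charged if each compiled pattern reported its size. Both names are hypothetical, and the `size` argument is a stand-in for whatever a function like the proposed Regex::approximate_size would return; no such value is available from the current API.

```rust
/// Per-pattern limit when a total budget must be split evenly up front.
/// Returns `None` when `patterns` is zero.
pub fn even_split(total: usize, patterns: usize) -> Option<usize> {
    total.checked_div(patterns)
}

/// Hypothetical shared byte budget for a set of compiled patterns.
pub struct Budget {
    remaining: usize,
}

impl Budget {
    pub fn new(total: usize) -> Self {
        Budget { remaining: total }
    }

    /// Try to charge `size` bytes against the budget.
    /// `size` is assumed to come from a size-reporting API that does not
    /// exist today. Returns `false` and leaves the budget unchanged when
    /// the pattern does not fit.
    pub fn charge(&mut self, size: usize) -> bool {
        match self.remaining.checked_sub(size) {
            Some(rest) => {
                self.remaining = rest;
                true
            }
            None => false,
        }
    }

    /// Bytes still available.
    pub fn remaining(&self) -> usize {
        self.remaining
    }
}
```

With a shared budget, one large pattern and many small ones can coexist, whereas the even split caps every pattern at the same small limit.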
I can probably make this happen once #656 lands. Although, the precise relationship between the size limit and the bytes reported by this function may be tricky to get right. It probably does make sense from a certain perspective, but the problem is in how the limits are enforced. The limit tends to be enforced on a per-internal-NFA-graph basis, whereas the natural thing to do for an "approximate memory usage" function would be to report the sum of everything using heap memory. Then there is the mutable scratch space to think about as well. Now we could just say, "no the approximate memory usage function should be specifically scoped to precisely what |
Perhaps a clearer name would be |
This API would have been very helpful for me. In my case, I had a sample of a few thousand regex patterns that would be used, and I needed to determine what size limit should be used. To do that, I wanted to know how large all of these sample patterns were to see how close to the limit they were. I did it with a binary search of the builder (to see if it compiled or not) but that obviously wasn't ideal. |
@fuchsnj I don't think it will ever be possible for an API like the one you want to exist in such a granular way. The size limit really is just an approximation. And the binary search thing that you've done is exactly what I do in such cases as well. (Although I tend to just set a very high size limit if I'm trying thousands of patterns. You're likely in for a pretty bad time in that case no matter what you do.) |
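The binary search described in the two comments above can be sketched roughly as follows. This is a generic illustration, not code from the thread: `min_passing_limit` is a hypothetical helper, and `compiles` is a caller-supplied predicate. With the regex crate, the predicate would presumably be something like `|n| RegexBuilder::new(pattern).size_limit(n).build().is_ok()`. The sketch assumes the predicate is monotonic, meaning that once a limit is large enough, every larger limit also succeeds.

```rust
/// Smallest limit in `lo..=hi` for which `compiles(limit)` returns true,
/// or `None` if the range is empty or even `hi` fails.
///
/// Assumption: `compiles` is monotonic (false below some threshold,
/// true at and above it).
pub fn min_passing_limit<F>(mut lo: usize, mut hi: usize, compiles: F) -> Option<usize>
where
    F: Fn(usize) -> bool,
{
    if lo > hi || !compiles(hi) {
        return None;
    }
    // Invariant: `compiles(hi)` is true, and every limit below `lo` fails.
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        if compiles(mid) {
            hi = mid; // mid is enough; the answer is at or below it
        } else {
            lo = mid + 1; // mid is too small; the answer is above it
        }
    }
    Some(hi)
}
```

Each probe costs one compilation attempt, so searching a range up to 2^20 bytes takes about 20 attempts per pattern. As noted above, the result is only as precise as the size limit itself, which is an approximation.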
As an extension of the RegexBuilder::size_limit function, it might be useful to know the actual size taken by a compiled regular expression. A possible use for this is for when you want to process a number of untrusted regular expressions, and be assured that collectively they don't exceed a limit.
I propose that the size of the compiled program (the same one used when checking against the size_limit of the builder) be stored inside a regular expression struct, and be accessible through a new function Regex::approximate_size.