Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase default max allowed String value length from 20 megs to 1 gig #1082

Closed
dan2097 opened this issue Aug 20, 2023 · 7 comments
Closed

Comments

@dan2097
Copy link

dan2097 commented Aug 20, 2023

The expected out of the box behaviour of Jackson is that it can read JSON, and the JSON specification does not specify a maximum length. Violating this expectation should only be done with extremely good reason. (EDIT: For balance, RFC 7159 does say "An implementation may set limits on the size of texts that it accept")

The previous security related limits kick-in in situations where JSON is being used in odd ways e.g. extreme nesting depth, and hence there would typically be a more idiomatic way to write the JSON. In the case of having a large amount of data this will inevitably result in a correspondingly large JSON file

One takeway for me, too, is that I should have heeded my own "never add a feature just for sake of consistency; only implement things that have been specifically requested".
That'd have worked well here, addressing just number length & nesting depth -- both of which have specific known threats.

Having the ability to set a maximum string length is a good change, but it's not obvious to me why the default should differ from the JSON specification if there's not a specific security concern.

As a compromise I propose 1 gig, based rather arbritrarily on #1014 (comment) and the limit in MySQL https://dev.mysql.com/blog-archive/how-large-can-json-documents-be/ but I think a default limit of unlimited is defensible. For more than a gigabyte a user would almost certainly want to stream the JSON, although as computers continue to gain memory, the amount of JSON that can reasonable be read into an in-memory tree continues to increase.
This would make the limit more analogous to the other security limits in that it would only kick-in for what is an unusual use case. Loading 100 MB of JSON on a machine with over 10 gigabytes of ram to me doesn't seem so unusual...

@pjfanning
Copy link
Member

The limits are configurable. I'm afraid you will need to configure StreamReadConstraints and use numbers that suit your use case. We can't just keep changing the limits in every release.

@dan2097
Copy link
Author

dan2097 commented Aug 20, 2023

The limits are configurable. I'm afraid you will need to configure StreamReadConstraints and use numbers that suit your use case. We can't just keep changing the limits in every release.

I think your points are slightly contradictory; if users that have specific use cases configure the limit they would be unaffected by any change to the default limit in a subsequent release.

Given that this limit is a breaking change, and there isn't an associated CVE, personally I don't think it should have been introduced in a minor release of the library in the first place.

In a world where even web browsers can read over 100 MB of JSON (https://www.joshzeigler.com/technology/web-development/how-big-is-too-big-for-json) I remain of the opinion that having a backend library default to 20 MB is bizarre, and that you'll get spurious bug reports as users upgrade.

@cowtowncoder
Copy link
Member

Will not change to 1 gig, closing.

If someone truly needs to have JSON STRING VALUES -- not JSON Document per se -- above 20 megs, they need to configure limits appropriately.

@dan2097
Copy link
Author

dan2097 commented Aug 21, 2023

Will not change to 1 gig, closing.

If someone truly needs to have JSON STRING VALUES -- not JSON Document per se -- above 20 megs, they need to configure limits appropriately.

Sorry I erroneously thought this was a limit on the length of the input JSON string.
...as this would make sense as a quick estimate for the amount of the memory the JSON will take when deserialized,
but a limit on the individual JSON string values obviously doesn't achieve this e.g. JSON with a 25 MB string is disallowed, but a document with a thousand 5 MB strings is allowed!
(and in a streaming scenario, unless you're running in an extremely low memory environment, then 20 MB still isn't that much)

Does Jackson's performance degrade significantly more than would be expected with larger strings compared to many smaller strings?

@cowtowncoder
Copy link
Member

Jackson does not read the whole input document into memory at all, assuming input source is InputStream, Reader or such. Hence maximum document size may not be a big issue in and of itself. That said, there should be configuration limit for max input doc size too (with default value of "unlimited", I think... based on experiences we've gained :) ).

As to Big Strings: it's more a general JVM challenge as Strings need to have big backing char[] (or maybe with later JVMs, byte[] for ASCII?) arrays.
Jackson does not deeply care; input is handled using segmented buffers. It's really the resulting string that might be problematic esp. for highly concurrent systems.
I agree that a single 20 meg String should not be a problem, but typically servers have more than a single request to server (wrt DoS concerns at least).

But why would large arrays be problematic? (over a set of smaller ones that add up to same memory amount).
My understanding is that they can become problems for GC as there is certain size of Objects that will be directly allocated from old generation, increasing pressure for full GCs over incremental ones (for young gen).

@dan2097
Copy link
Author

dan2097 commented Aug 24, 2023

The severity of this issue still seems a bit fuzzy to me, compared to the more concrete fact that this was a breaking change in a minor release (technically even going to a 1 gig limit would still be breaking albeit only a fraction of the fraction of users that use JSON with large strings would be impacted). Personally I think this should have required a version bump to v3 given how fundamental JSON reading is to the library. Introducing new limits with a default value of unlimited avoids any backward compatability issues.

It may be worth putting the default limits on https://github.com/FasterXML/jackson-core (or at least in the wiki) in the same way as https://github.com/ngs-doo/dsl-json does (which incidentally also has a string limit, albeit a higher one).

@cowtowncoder
Copy link
Member

There are all kinds of changes that turn out to be breaking changes to someone, somewhere. While I emphatize with breakages (and would not want cause any), I think this issue is overblown: most users, developers are not affected in any way. But ones who are always most vocal.

I agree that perhaps this particular limit would have been safest to leave undefined -- in practice there is the 2 gig limit due to String and Java array limit, fwtw -- I would not think change would warrant 3.0 bump in itself... but as we already have 3.0 (master), change to actual limitation definitely could have (in hindsight) waited for that.

Now that limit has been imposed, however, I see no benefit from backpedaling. Issue exists with some 2.15 versions and 2.15.2 will represent the new normal.

+1 for documentation changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants