TOML support #248
Conversation
FWIW the TOML GitHub repository shows MIT as license. You can also see MIT at the bottom of https://toml.io/en/
Excellent. I haven't yet looked at code (will get to that later), but I can give some suggestions in the meantime.
I was able to build this after adding a copy of PackageVersion.java.in. Now to some play :-)
I've changed the code to do state transitions inside the parser instead of the lexer now, since the parser has contextual information.
One more thought on Date/Time: while embedding the value is easy enough, there are potential challenges in doing that at the streaming parser level, as most handling usually is at jackson-databind and not below. And so I guess the alternative would be to simply expose these values as

Also note on Jackson 3.0.0-SNAPSHOT -- this may be somewhat volatile due to my still making wide-ranging changes in a non-atomic manner. If so, just ping me. I will try to keep snapshots working of course, but may sometimes miss things for a bit until finding out about problems.
@cowtowncoder I'm not sure about the config parts. The TOML datetime types are fixed in the grammar (see the ABNF), so allowing customization of the parsing does not make sense. The format is also limited enough so that we can just do LocalDateTime.parse and friends, without having to implement much decoding ourselves (except for handling a space instead of the T separator).

You're right about interaction with the deserializer though. What would happen if a user has an Instant in their data model, but the parser returns an OffsetDateTime with UTC+0, for example? InstantDeserializer handles this fine if it's a string, but I doubt it can do the conversion if it's a raw value instead. Another thing is symmetry with the serializer... Not sure about that.
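The parsing approach described here -- LocalDateTime.parse and friends, plus normalizing the space that TOML allows in place of the T separator -- might look roughly like this. The class name and classification logic are an illustrative sketch, not the PR's code, and the lowercase 'z'/'t' separators TOML also permits are ignored for brevity:

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.OffsetDateTime;
import java.time.temporal.Temporal;

public class TomlDateTimeSketch {
    // Classify and parse a TOML date-time token into the matching java.time
    // type. TOML permits a space in place of the RFC 3339 'T' separator,
    // so that is normalized first.
    static Temporal parseTemporal(String text) {
        if (text.length() > 10 && text.charAt(10) == ' ') {
            text = text.substring(0, 10) + 'T' + text.substring(11);
        }
        if (text.indexOf('T') < 0) {
            // no date/time separator: either a local time or a local date
            if (text.indexOf(':') >= 0) {
                return LocalTime.parse(text);      // e.g. 07:32:00
            }
            return LocalDate.parse(text);          // e.g. 1979-05-27
        }
        // an offset ('Z', '+hh:mm' or '-hh:mm' after the time part)
        // makes it an offset date-time
        if (text.indexOf('Z') >= 0 || text.indexOf('+', 10) >= 0
                || text.lastIndexOf('-') > 10) {
            return OffsetDateTime.parse(text);
        }
        return LocalDateTime.parse(text);
    }
}
```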
@yawkat Right, I don't mean changing the definition of low-level data semantics at all, just one technical aspect relevant to piping within the Jackson pipeline. If the parser exposes something as

I think the one switch that might prove useful is really just "expose as String vs Java 8 Date/Time" and nothing more (and no extension points for more complex handling). But for now that's just an idea, no need to implement. Probably best to do parsing to strict type first and see how well that works. Should be easy to change if need be.
Re using something else than ObjectNode as the intermediate: I got away with creating a subclass of ArrayNode and ObjectNode with additional boolean flags. That sounds like a reasonable solution and doesn't require creating a whole class structure. It does remove the possibility of a custom JsonNodeFactory, but that's not so bad.

@cowtowncoder re Temporal: Adding an option sounds reasonable and poses no difficulty, so I've gone ahead and done that. It has the added benefit of making the related tests work :) There's also an option for Long/Double-based number parsing now.
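The actual change subclasses Jackson's ObjectNode/ArrayNode; as a dependency-free sketch of the same idea (the class name, field names, and flag semantics here are hypothetical, not taken from the PR), the extra per-node bookkeeping might look like:

```java
import java.util.LinkedHashMap;

// Hypothetical sketch: an intermediate map-based node that carries extra
// boolean flags a TOML parser needs, e.g. whether a [table] header was
// already seen for this path, or whether this is an inline table that is
// closed to further keys. The real PR attaches such flags to subclasses
// of Jackson's ObjectNode/ArrayNode instead.
class TomlNode extends LinkedHashMap<String, Object> {
    boolean defined;     // explicit [table] header already seen
    boolean inlineTable; // inline tables may not receive more keys later
}
```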
@yawkat Ok: if using a custom sub-class, maybe we'll make sure this model is not directly exposed to users and is just used for internal processing. The node system is designed as extensible to allow adding arbitrary data, although that idea was never fully developed as it turns out rather tricky to make actually work.

I probably should look at the code first to make sure I understand the implementation, before more comments. :) But progress sounds good so far!
@cowtowncoder I've created a read feature flag for java.time parsing. This works fine for parsing a field of type Object, which is then set to LocalDate as expected, but for a field of type LocalDate, it errors. This is obviously because there's no deserializer for LocalDate, and it looks for one even before seeing the next token.
@yawkat Interesting... the "safety" feature of blocking FasterXML/jackson-modules-java8#207

I can probably figure something out, will file an issue. My first instinct is to change handling so that, instead of failing on lookup, the code should probably construct a "potentially failing" deserializer that can handle

Luckily 2.13 requires Java 8, so testing should be easy. I'll first file an issue, and then hopefully have time to tackle it.
Also: sort of vaguely related: a new toy project I wrote (but not yet published): https://github.com/cowtowncoder/Jackformer allows "blind" NxN transformations and if we get this going, I'd really like to check out "YAML to TOML" case :) |
 */
public enum TomlReadFeature
    implements FormatFeature {
    USE_BIG_INTEGER_FOR_INTS(false),
Maybe brief javadoc explanation?
Still not sure if there's a better way to do this, e.g. the heuristic JsonParser uses for choosing between BigInteger and smaller int types.
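The JsonParser-style heuristic mentioned here -- use the smallest of int/long/BigInteger that can represent the value -- can be sketched as follows (the method name is illustrative, not from the codebase):

```java
import java.math.BigInteger;

public class NumberSizing {
    // Pick the narrowest of Integer/Long/BigInteger that can hold the
    // parsed integer, mirroring the JsonParser convention for JSON numbers.
    static Object narrowestInteger(String text) {
        BigInteger big = new BigInteger(text);
        // bitLength() excludes the sign bit, so < 32 fits an int and
        // < 64 fits a long, for both positive and negative values.
        if (big.bitLength() < 32) {
            return big.intValue();
        }
        if (big.bitLength() < 64) {
            return big.longValue();
        }
        return big;
    }
}
```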
long v;
try {
    if (text.startsWith("0x") || text.startsWith("0X")) {
        v = Long.parseLong(text.substring(2), 16);
May want to extract this, as well as the cases above, into a helper method that takes a String and a base, but then also catches the NumberFormatException and range checks that the JDK throws. This is because JDK exceptions are pretty unhelpful if I remember correctly, and we want to change these into something more meaningful.
It is unfortunate that helper methods from ParserMinimalBase are not available (as we can't quite use that without implementing a token stream, something that JsonNode provides us), like:
_reportOverflowLong()
_reportInvalidNumber()
etc.
These are probably easier to add when there are tests for overflow, misformatted numbers etc.
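A minimal sketch of the suggested helper, assuming a plain IllegalArgumentException stands in for whatever parse exception the module would actually throw (class and method names are hypothetical):

```java
public class ParseHelper {
    // Parse an integer token with the given radix, converting the JDK's
    // terse NumberFormatException into a message that names the offending
    // token and base. A real parser would throw its own exception type
    // with location information instead.
    static long parseLong(String text, int base) {
        try {
            return Long.parseLong(text, base);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                "Invalid base-" + base + " integer token: '" + text
                + "' (malformed or out of long range)");
        }
    }
}
```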
Looks good so far. Some suggestions:
I'd be happy to merge this whenever; probably makes sense to have a generator impl first?
I just added a test case for the format from https://github.com/toml-lang/compliance . However, I did not add any actual test files. There are some in forks of that repo, but they still have outstanding issues: for example, sometimes temporal types have their precision truncated, and sometimes strings are double-escaped.
@cowtowncoder Re BigI/D: I'm still not sure how to proceed here. Precedent in JsonParser is to determine whether a number is in range for int/long and use those types where possible. Maybe copying that behavior is a better solution than adding parser options for it. I'd have to adapt my tests to work with different number types, but that's a small issue.

Re splitting up ParserTest: Right now, 90% of those tests are copied examples from the manual; not sure by which criterion those could be split.

Re generator: Easy to implement in principle. Some decisions that will come up:
I've added some exception message testing. I used the JUnit ExpectedException mechanism, since it also allows checking the message. I may also try to add some more meaningful parse errors for some of these.
Okay, I've built a generator, mostly copied from the properties module. Not many tests yet, but it seems to work. Right now, it emits arrays inline, and objects as dotted keys (except when inside arrays), because that was straightforward. Newlines are added where necessary (namely after every dotted key-value pair). There is no indentation. Other whitespace is "normal", i.e. spaces around key-value separators and after commas.

Tables are problematic, and may not be possible using a streaming generator. This is because when you use a table, you lose any ability to add scalar values to the root object.
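The table limitation can be seen in TOML itself (an illustrative sample, not from the PR's test suite):

```toml
# Root-level scalars must be written before any table header.
title = "example"

[server]
host = "localhost"
port = 8080
# From here on, every key/value belongs to [server]: a one-pass streaming
# generator that has already emitted the header can never go back to add
# another root-level scalar. Dotted keys (server.host = "localhost")
# avoid the problem, which is why the generator emits those instead.
```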
Ah. Yes, makes sense that tests were copied with minimal changes; a good way to start. I can probably help with number handling: agreed that for now it's not critical. I'd probably suggest something like:
On generator, good questions. With 3.0, in particular, it should be possible to simply use

Then again... when serializing with ObjectMapper, I think the underlying generator will NOT see anything but

On arrays... yeah, starting with inline sounds reasonable as a starting point.
Ok, good progress wrt generators! I think that getting reading/parsing right is probably the most valuable thing anyway; and getting some valid output (even if missing configurability and the ability to produce constructs like tables) is better than not having anything.

The question then would be when to merge. I am happy to give you push access to this repo to simplify changes. I think I would like a 2.13 backport too, and can help getting that to work. Just need to think of how to make things forward-mergeable (2.13 -> master) when the initial merge is the "wrong way around".
Implemented in bfc3ff3. It sounds like the java.time serialization will take more work, so I'll leave that be for now. It's not all that important anyway, especially for the serialization part. One todo I still have in the source is newline handling. The TOML spec says for multi-line string parsing:
I think it's fine to just not do that for now. Maybe an optional feature in the future.

Beyond that, the only thing I can think of before merging is the license question. Since TOML is MIT, and the samples for the unit tests are from their website, I guess a LICENSE file in the project folder (not in the pom / final artifact) should be sufficient.

For 2.13, I can do the backport; it may take a few days before I have time though. I think the branch merge should be fairly painless; there's no conflict with other modules after all, so it should be possible to just commit either the 2.13 or the 3.0 changeset and then add the other version as the merge commit for the branches.
Rebased onto master for ContentReference change |
@cowtowncoder I've created https://github.com/yawkat/jackson-dataformats-text/tree/toml-2.13 , which is a squashed version of this PR cherry-picked onto 2.13, with an additional commit 4df60a4 that addresses the compat issues. The tests pass, but I did a fairly stupid "fix errors as they come up" approach, so there may still be stuff that's broken.
@yawkat Ok; so how should I merge these? Does ordering matter? I think I'm happy as long as nothing there blocks work on other format modules (that is, passing of the test suite by whatever means is important :) )
Sorry, a bit busy with moving atm :) My suggestion would be merging this branch into master manually (should be safe), then merging master into 2.13 (should be no conflicts), and finally cherry picking 4df60a4 to move the toml module to 2.13. I think this shouldn't impact the ability to merge future changes in this repo between 2.13 and master.
@yawkat no prob wrt moving. I am pretty overloaded with work right now too. Question on merging: not sure on the meaning of "manual" here -- do you mean just copying new files from here?
By 'manual' I just mean doing it locally instead of through the GitHub web UI, in case it breaks somehow :P I'm not sure about the mechanics of merging. Will merging master into 2.13 include the changes to other modules as well, even if you exclude them from the merge commit? Not sure how that works.

Anyway, merging in the other direction could work too. I've done it over in my repo. The layout is like this: the "TOML data format" commit 91625f5 is all the squashed changes in this PR. The "2.13 compatibility" commit fefd694 changes the code to 2.13, and that is the HEAD of the toml-merge-2.13 branch. Then, that branch is merged into toml-merge-master in commit 58239b4, and the 2.13 compat commit is reverted in "3.0 compatibility for toml module" (5d479e8). Squashing the dev history of this PR makes handling the merge easier, and it's not a big issue. The duplicate compat commits are a bit ugly, but it works.
Right, local in that sense works: I pretty much never use the GitHub GUI for anything except merging PRs. As to merging from master to 2.13: yes, the whole repo and every change would be "brought back", so that does not work; I need to use cherry-picking. But that should be fine. So I'll start with this PR.
Jackson is licensed under [Apache License 2.0](http://www.apache.org/licenses/LICENSE-2.0.txt).
The TOML grammar and some parts of the test suite are adapted from the TOML project, which is licensed under the MIT license: |
This is fine, although I wonder if some processing systems barf on it.
One thing we could do, if we want to, would be to make this module MIT licensed; that should be fine.
But I don't feel strongly about it, up to you.
There's some precedent for "weird" LICENSE files in other projects: https://github.com/rzwitserloot/lombok/blob/master/LICENSE
One solution would be dual-licensing this module as ASL2.0 and MIT. That way, if tools parse this incorrectly as MIT they'll still be fine, and at the same time there's no license inconsistency under the jackson umbrella.
Ok, merged this PR, squashed, into master. So far so good. I can see the 2.13 commit you mention but am not sure how to merge it (it's on your forked repo?). Is it easy to create another PR targeting 2.13 with it, or to create a diff I can just patch?
I've created a separate PR for 2.13 in #251. That one is independent of master and should be safe to merge. Though I'm not sure how well the merge into master will work with this PR already included there.
This is WIP, creating a PR for discussion on the approach.
As discussed in #219, the TOML parser landscape in Java is mediocre. For this reason, I decided it was worth rolling our own, though I'm open to feedback on that decision.
TOML has an ABNF grammar available that I assume to be authoritative. The ABNF parser generator landscape in Java is also not great, so I decided to transform the grammar by hand (potentially with errors?) into a jflex lexer file (toml.jflex), which generates a fairly efficient lexer using the jflex maven plugin. On top of that, I implemented a basic recursive descent parser (Parser.java). This approach also has the benefit of not requiring an external runtime dependency like ANTLR, unlike some existing parsers.
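As a rough illustration of the recursive-descent shape (hypothetical names, not the PR's actual Parser.java), each grammar production becomes a method that consumes tokens produced by the lexer:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Toy sketch of a recursive-descent parser over a pre-lexed token stream;
// the real implementation consumes jflex tokens, not plain strings.
class DescentSketch {
    private final Deque<String> tokens;

    DescentSketch(String... toks) {
        this.tokens = new ArrayDeque<>(Arrays.asList(toks));
    }

    // keyval = key '=' val  (simplified from the TOML ABNF)
    String parseKeyVal() {
        String key = tokens.poll();
        expect("=");
        String value = tokens.poll();
        return key + "->" + value;
    }

    private void expect(String tok) {
        if (!tok.equals(tokens.poll())) {
            throw new IllegalStateException("expected '" + tok + "'");
        }
    }
}
```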
I have created a test class that includes all examples from the TOML docs. Most already work, but there are some samples that don't yet conform, mostly parsing successfully when parsing should fail.
I do not expect serialization to be a big problem, so I am keeping that for the end.
A few things are still outstanding: