A "General Considerations" section #58

Closed
gregsdennis opened this issue Mar 9, 2021 · 5 comments

Collaborator

gregsdennis commented Mar 9, 2021

JSON Schema 2020-12, Section 6, specifies a set of general considerations that I think we can "borrow" from. I'll copy it here for reference and add comments.

6. General Considerations

6.1. Range of JSON Values

An instance may be any valid JSON value as defined by JSON. JSON Schema imposes no restrictions on type: JSON Schema can describe any JSON value, including, for example, null.

JSON Schema uses the word "instance" to describe the JSON data that's being validated. We seem to have landed on the word "data." Regardless, I think the gist of this also holds true for JSON Path: input values can be of any JSON type (although only certain types may return results depending on the Path).

6.2. Programming Language Independence

JSON Schema is programming language agnostic, and supports the full range of values described in the data model. Be aware, however, that some languages and JSON parsers may not be able to represent in memory the full range of values describable by JSON.

This, I think, is of utmost importance. We definitely should not favor a specific language. Doing so would inhibit inclusivity and shun people who work in incompatible frameworks.

This statement also makes it clear that it's understood that some frameworks inherently have limitations that may prevent them from being able to implement the full expression of JSON Schema. Declaring this outright allows such frameworks to have partial "best effort" implementations and still be compliant with the specification.
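As a rough illustration of the "cannot represent the full range" caveat, here is a minimal Python sketch. It assumes a hypothetical parser that stores every number as an IEEE 754 double and emulates it with the standard `json` module's `parse_int` hook; the sample value is my own choice, not something from the spec.

```python
import json

# 2**53 + 1: a valid JSON number that a double cannot hold exactly.
text = "9007199254740993"

# Python's own parser keeps arbitrary-precision integers.
exact = json.loads(text)

# Emulate a double-only parser (an assumption for illustration)
# by routing every JSON integer through float().
as_double = json.loads(text, parse_int=float)

print(exact)           # the integer as written
print(int(as_double))  # the nearest value a double can represent
print(exact == int(as_double))
```

Both parsers accept the same document, yet they disagree on the value, which is the kind of limitation the quoted paragraph allows for.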

6.3. Mathematical Integers

Some programming languages and parsers use different internal representations for floating point numbers than they do for integers.

For consistency, integer JSON numbers SHOULD NOT be encoded with a fractional part.

I'm not sure whether this would apply for us, except maybe for array indices.
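To make the array-index concern concrete, here is a small Python sketch. It only shows how one particular parser (Python's standard `json` module) treats `1` and `1.0`; the helper `index_or_error` and the sample list are hypothetical names for illustration, and other languages may behave differently.

```python
import json

plain = json.loads("1")         # parsed as int
fractional = json.loads("1.0")  # parsed as float

items = ["a", "b", "c"]

def index_or_error(seq, idx):
    """Return seq[idx], or the string 'TypeError' if idx is not usable as an index."""
    try:
        return seq[idx]
    except TypeError:
        return "TypeError"

print(type(plain).__name__, type(fractional).__name__)
print(plain == fractional)                # numerically equal
print(index_or_error(items, plain))       # usable as an index
print(index_or_error(items, fractional))  # rejected as an index
```

The two values compare equal, but only the one without a fractional part works as an index here, so an implementation would need to decide how to treat `1.0` if it ever appears where an index is expected.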

6.4. Regular Expressions

Keywords MAY use regular expressions to express constraints, or constrain the instance value to be a regular expression. These regular expressions SHOULD be valid according to the regular expression dialect described in ECMA-262, section 21.2.1.

Regular expressions SHOULD be built with the "u" flag (or equivalent) to provide Unicode support, or processed in such a way which provides Unicode support as defined by ECMA-262.

Furthermore, given the high disparity in regular expression constructs support, schema authors SHOULD limit themselves to the following regular expression tokens:

  • individual Unicode characters, as defined by the JSON specification;
  • simple character classes ([abc]), range character classes ([a-z]);
  • complemented character classes ([^abc], [^a-z]);
  • simple quantifiers: "+" (one or more), "*" (zero or more), "?" (zero or one), and their lazy versions ("+?", "*?", "??");
  • range quantifiers: "{x}" (exactly x occurrences), "{x,y}" (at least x, at most y, occurrences), {x,} (x occurrences or more), and their lazy versions;
  • the beginning-of-input ("^") and end-of-input ("$") anchors;
  • simple grouping ("(...)") and alternation ("|").

Finally, implementations MUST NOT take regular expressions to be anchored, neither at the beginning nor at the end. This means, for instance, the pattern "es" matches "expression".
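The unanchored rule can be sketched in Python as follows. Note that Python's `re` module is not the ECMA-262 dialect named above; it is used here only because the simple pattern behaves the same way, and the variable names are my own.

```python
import re

pattern = "es"
text = "expression"

# Unanchored: the pattern may match anywhere in the input.
unanchored = re.search(pattern, text) is not None

# Implicitly anchored at both ends: what the quoted text forbids.
anchored = re.fullmatch(pattern, text) is not None

# Explicit anchors remain available to the pattern author.
explicit = re.search("^es$", text) is not None

print(unanchored, anchored, explicit)
```

An implementation that used the equivalent of `fullmatch` by default would reject "expression" and so would not follow the quoted requirement.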

Not sure if we're planning on supporting regular expressions. It appears that some implementations do have some support, but it's all an extension of the original syntax at this point. Still, this is a good declaration of support.

It also ties in closely to section 6.2 regarding framework limitations as not all frameworks support the same flavor of regular expression syntax.

6.5. Extending JSON Schema

Additional schema keywords and schema vocabularies MAY be defined by any entity. Save for explicit agreement, schema authors SHALL NOT expect these additional keywords and vocabularies to be supported by implementations that do not explicitly document such support. Implementations SHOULD treat keywords they do not support as annotations, where the value of the keyword is the value of the annotation.

Implementations MAY provide the ability to register or load handlers for vocabularies that they do not support directly. The exact mechanism for registering and implementing such handlers is implementation-dependent.

This is good to have because, invariably, implementations will want to extend functionality beyond what's in the spec. It basically relieves other implementations of having to provide the same extensions; they are required to support only what is stated in the spec.

It also mentions "vocabularies," which are a spec-defined mechanism by which an implementation can extend functionality via new keywords in such a way that they can optionally be supported in other implementations. Furthermore, this mechanism allows other implementations to refuse to process a schema that requires a given vocabulary if the implementation doesn't understand it. This bit, I think, is good for later when we eventually get to spec-defined extension mechanisms, but I don't expect that'll be in the first draft.


That's it. Just some declarations that I think would be good to have. This is neither an exclusive nor "all or nothing" list. I think we should pick and choose as we see fit. If you think of something that's not in this list, let us know.


danielaparker commented Mar 9, 2021

@gregsdennis wrote:

JSON Schema uses the word "instance" to describe the JSON data that's being validated. We seem to have landed on the word "data."

Goessner uses "root object", which I've always thought of as a JSON value (not restricted to be an object), and somewhat analogous to the JSON Schema "instance". In online JSONPath articles, "root" is often described as the "root object or array", which I understand, or "root member of a JSON structure", which I don't understand. The draft uses "root item" (once) and "root node" (five times), and talks about the "root node which is the input document." I'm not sure if it's trying to make a distinction between "root node" and the JSON value passed to a JSONPath evaluator, but from the quoted sentence, it doesn't sound like it.

I also note that it's unclear what the draft means by "node", which occurs 60 times in the draft. In section 3.2, it says "Each node holds a JSON value", but it doesn't say what else the node holds (a position or path to that point?) And then we have "root node which is the input document", which suggests the root node is a value. My own understanding of a node is a path/value pair, and I think the draft needs to be more clear about this term, and to distinguish between root and current nodes, and the corresponding root and current values.
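To show what a path/value reading of "node" could look like, here is a minimal Python sketch. It is only one possible interpretation, not the draft's definition: the `Node` class, the `children` helper, and the bracketed path format are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Iterator

@dataclass(frozen=True)
class Node:
    path: str   # where the value sits relative to the root (assumed notation)
    value: Any  # the JSON value found at that position

def children(node: Node) -> Iterator[Node]:
    """Yield the direct children of a node, each with its own path."""
    if isinstance(node.value, dict):
        for key, val in node.value.items():
            yield Node(f"{node.path}['{key}']", val)
    elif isinstance(node.value, list):
        for i, val in enumerate(node.value):
            yield Node(f"{node.path}[{i}]", val)

# The root node pairs the path "$" with the value passed to the evaluator.
root = Node("$", {"a": [10, 20]})

for child in children(root):
    for grandchild in children(child):
        print(grandchild.path, grandchild.value)
```

Under this reading, the "root value" is `root.value` and the "root node" is the pair, which keeps the two terms distinct.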

I do think it would help to have a consistently used term to represent the thing that we pass to the evaluator. And avoid having text like "the JSON data item to which the query is applied to" embedded in a sentence.

Daniel

Member

cabo commented Mar 10, 2021 via email

Collaborator Author

gregsdennis commented Mar 10, 2021

Not sure this is the right issue to discuss this, but the issues are currently all over the place.

To be sure, this issue was meant to cover the necessity of this sort of section more so than the specific declarations. If we agree that such a section is ideal or even required, I'm fine with that consensus for this issue, and we can split out the specific topics to other issues.

@gregsdennis
Collaborator Author

It looks like the terminology discussion is now happening over in #66. That's one topic split out.

Member

cabo commented Jan 17, 2022

Text on regular expressions is useful input to #70, which now references this.
OBE otherwise, I'd say.

cabo closed this as completed Jan 17, 2022