JSONPath expression and bracket-notation #57

Closed
goessner opened this issue Mar 6, 2021 · 7 comments

Comments

@goessner
Collaborator

goessner commented Mar 6, 2021

I have been thinking for a while about [(...)] expressions (we may need another term here: 'evaluatable' or something) and filter expressions [?(...)] (or simply 'filters'?).

In order to discuss bracket-notation inside of JSONPath expressions – leaving out unions for now, we can address implicitly either JSON objects by using a string indicating a member name ...

$..['name']  // (1) ... array elements can't be selected this way ...
             //     Prefer single quotes, so that JSONPath
             //     expressions can be embedded in JSON text strings
             //     as defined in [RFC8259].

or JSON arrays by using an integer as array index.

$..[42]  // (2) ... object members can't be selected this way ...

I'm not sure, how strict or forgiving implementations handle $..['42'] here ... I assume the latter.

$..[(#)]

Now, when applying $..[(#)], then abstract '#' should evaluate to a string in case of (1) and to an integer in case of (2), if it wants to have a chance to select anything.

Now consider the root selector '$' being part of the evaluatable in question ...

{ "key":"name", "o": {"name":"x"} }

$..[($.key)]  // ... "x"

and ...

{ "idx": 2, "a": ["v","w","x"] }

$..[($.idx)]  // ... "x"

JSON documents containing such a level of indirection may be treated well by JSONPath this way. It's hard for me to imagine other use cases, since the expressions must resolve to a string and an integer respectively in order to make sense.
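A minimal Python sketch of this root-based indirection may help. It assumes that `$..[($.key)]` means "evaluate `$.key` once against the root, then use the result as a member name or array index at every level of the document". The helper names `descendants` and `select_indirect` are hypothetical and not taken from any implementation.

```python
def descendants(node):
    """Yield the node itself and everything nested below it (the '..' step)."""
    yield node
    if isinstance(node, dict):
        for value in node.values():
            yield from descendants(value)
    elif isinstance(node, list):
        for item in node:
            yield from descendants(item)


def select_indirect(root, pointer_name):
    """Sketch of $..[($.<pointer_name>)]: the selector is read from the root once."""
    selector = root.get(pointer_name)
    results = []
    for node in descendants(root):
        # A string can only select an object member (case 1).
        if isinstance(selector, str) and isinstance(node, dict) and selector in node:
            results.append(node[selector])
        # An integer can only select an array element (case 2).
        elif (isinstance(selector, int) and not isinstance(selector, bool)
              and isinstance(node, list) and 0 <= selector < len(node)):
            results.append(node[selector])
    return results


print(select_indirect({"key": "name", "o": {"name": "x"}}, "key"))  # ['x']
print(select_indirect({"idx": 2, "a": ["v", "w", "x"]}, "idx"))     # ['x']
```

Any other result type (boolean, null, object, array) selects nothing in this sketch, which matches the observation that only strings and integers have a chance to select anything.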

This time consider the current element/node indicator '@' being used ...

{ "o": {"key":"name", "name":"x"} }

$..[(@.key)]  // ... "x"

and ...

{ "a": [3,"v","w","x"] }

$..[(@[0])]  // ... "x"

Again, both expressions must resolve to a string and an integer respectively. So the current item '@' needs to reference its container, i.e. object 'o' and array 'a' in the examples above. The benefit is significantly smaller here than with the root selector '$'. Again we can perform some kind of 'pointing' from one location to another in a single JSON document. But this time an object member is addressed more restrictively: from its direct descendant, from its sibling, or from that sibling's descendant.
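The container-relative reading can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: '@' is bound to each container visited by `..`, the pointer is looked up in that same container, and array indices are zero-based. The names `descendants` and `select_via_container` are hypothetical.

```python
def descendants(node):
    """Yield the node itself and everything nested below it (the '..' step)."""
    yield node
    if isinstance(node, dict):
        for value in node.values():
            yield from descendants(value)
    elif isinstance(node, list):
        for item in node:
            yield from descendants(item)


def is_index(value):
    """True for integers that are not booleans (bool is a subclass of int)."""
    return isinstance(value, int) and not isinstance(value, bool)


def select_via_container(root, pointer):
    """Sketch of $..[(@.<pointer>)] and $..[(@[<pointer>])]: '@' is the container."""
    results = []
    for node in descendants(root):
        if isinstance(node, dict) and isinstance(pointer, str):
            selector = node.get(pointer)  # evaluate @.<pointer> on the object
            if isinstance(selector, str) and selector in node:
                results.append(node[selector])
        elif isinstance(node, list) and is_index(pointer) and 0 <= pointer < len(node):
            selector = node[pointer]      # evaluate @[<pointer>] on the array
            if is_index(selector) and 0 <= selector < len(node):
                results.append(node[selector])
    return results


print(select_via_container({"o": {"key": "name", "name": "x"}}, "key"))  # ['x']
# With zero-based indexing, the pointer element at position 0 must hold 3 to reach "x".
print(select_via_container({"a": [3, "v", "w", "x"]}, 0))                # ['x']
```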

My former example $..book[(@.length-1)] addresses the array book via '@' and implies that it has a member named length. This is true for JavaScript, but does not apply generally, which makes it a bad example.

Daniel largely agrees with the uselessness of this case in his post "@ and the current element" on the list https://mailarchive.ietf.org/arch/browse/jsonpath/, but he and others may have some other practical uses in mind ... ?

$..[#]

Daniel points to another variant, found on Stack Overflow, that evaluates the expression $..[@.address.city]. Basically

{ "a":{"b":{"c":"x"} } }

$..[@.b.c]  // ... "x", which yields identical results to
$..b.c 

that use is questionable, as there exists another more intuitive JSONPath query delivering the same result, as shown in the example. Interesting is the idea, that the expression inside the brackets is evaluated, as if it were outside. Current item '@' here references its container as in $..[(#)] expressions above. A benefit (?) of that might come in combination with unions, which we won't discuss here (but possibly in another later issue).

$..[?(#)]

JSONPath filters seem to be of most practical use. They allow tests to be performed while iterating over a JSON container (object or array). If the boolean test results in true, the object member or array element is selected, thus filtering data items out. In tests the symbols '$' and '@' may be used, meaning the root node and current node (is node a valid term in the draft?), as in ...

{ "find":"red",
   "a": ["yellow","green","red"],
   "o": { "size":12, "color":"red"}
}

$.a[?(@ == $.find)]      // ... "red"
$.o[?(@ == 'red')]       // ... no match, '@' holds a 'name/value' pair.
$.o[?(@.color == 'red')] // ... member "color":"red".

It's important to note, that symbol '@' herein does not reference the JSON container as before with [(...)] expressions, but the members or elements contained, while iterating.
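For the array case, where implementations largely agree, the iteration can be sketched in Python like this. The function name `filter_array` is hypothetical, the predicate stands in for the filter expression, and treating evaluation errors as "no match" is an assumption. The object case is deliberately left out because its semantics are the open question here.

```python
def filter_array(array, predicate):
    """Sketch of [?(...)] on an array: '@' is bound to each element in turn."""
    selected = []
    for element in array:
        try:
            keep = predicate(element)
        except (KeyError, TypeError, IndexError):
            # Assumption: an expression that cannot be evaluated selects nothing.
            keep = False
        if keep:
            selected.append(element)
    return selected


root = {"find": "red", "a": ["yellow", "green", "red"]}

# $.a[?(@ == $.find)]: 'cur' plays the role of '@', 'root' the role of '$'.
print(filter_array(root["a"], lambda cur: cur == root["find"]))  # ['red']

# Elements that cannot be evaluated (here the integer 7) are skipped, not fatal.
items = [{"color": "red", "size": 12}, {"color": "blue"}, 7]
print(filter_array(items, lambda cur: cur["color"] == "red"))
```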

Interestingly there is good agreement with arrays, but obviously confusion regarding objects among the implementations. See enlightening issue #47 for discussion.

Conclusion

Workgroup members seem to be very reluctant regarding those evaluatable expressions (see Glyn's comment). One reason might be security considerations #25. I find Greg's comment remarkable, where he proposes two of the following options:

  1. not supporting evaluatable expressions at all.
  2. only support filters (for now?).
  3. only support filters on arrays (for now?).
  4. ... "SHOULD NOT evaluate objects" and leave it to implementations.
  5. ... "MUST be behind an option that defaults to 'not supported.'"

I think, whatever option we will choose – my favorite would be 2. possibly combined with 4. and/or 5. – there should be a clear definition, how '@' symbol is to be interpreted, when used by implementations. I'm not sure, if this requires an explicit grammar and if so, how detailed it has to be then.

I'm also not sure whether it's possible to announce the addition of such features to later versions of the draft (like TC39 handles proposals).

@danielaparker

danielaparker commented Mar 6, 2021

@goessner wrote

I'm not sure, how strict or forgiving implementations handle $..['42'] here ... I assume the latter.

There is no need to guess, the answer is in Christoph Burgmer's comparisons. The answer is "forgiving".

Daniel

@danielaparker

danielaparker commented Mar 6, 2021

@goessner wrote

Conclusion

I find Greg's comment remarkable, where he proposes two of the following options:

  1. not supporting evaluatable expressions at all.
  2. only support filters (for now?).
  3. only support filters on arrays (for now?).
  4. ... "SHOULD NOT evaluate objects" and leave it to implementations.
  5. ... "MUST be behind an option that defaults to 'not supported.'"

I think, whatever option we will choose – my favorite would be 2. possibly combined with 4. and/or 5. – there should be a clear definition, how '@' symbol is to be interpreted, when used by implementations. I'm not sure, if this requires an explicit grammar and if so, how detailed it has to be then.

The only example of a "[()]" expression that I've ever seen is the one in the original JSONPath article,

$.store.book[(@.length-1)].title

and I believe it was Glyn that noted it can be rewritten without using expressions. My sense is that this construct wouldn't be missed. But expressions are also used within filters.

Whether 3 or 4, I don't have a good enough understanding of the use cases for applying a filter to objects. When applied to an array, it is reasonable to expect a certain amount of homogeneity in the elements, so it makes sense to apply common criteria to them. But that isn't generally the case for object values. And when we select object values, we lose the association with their keys, which makes the results less helpful. But I'd be interested in hearing about use cases.

Daniel

@gregsdennis
Collaborator

gregsdennis commented Mar 8, 2021

The only example of a "[()]" expression that I've ever seen is the one in the original JSONPath article,

$.store.book[(@.length-1)].title

In issue #17 I highlight a similar use case for this type of expression (I was wrong about the StackOverflow questions I originally pointed to as examples of index selection, but the use case stands). I also support this in my library to some extent.


There is a lot of overlap between these two issues. Maybe we should consider consolidating them.

@gregsdennis
Collaborator

gregsdennis commented Mar 8, 2021

Personally, I like having both forms:

[(<expr>)]

<expr> returns

  • an integer (or perhaps a list of integers, or even a slice) that represents the index to select from an array.
  • a string (or list of strings) as key names for objects. This "object" case I hadn't considered before, but it seems a natural extension and provides symmetry.

The expression is evaluated without iterating over the current value: it would take the array/object as an argument instead of iterating over their children.

It may be used less, but it definitely has its place, and I think it caters to more advanced scenarios such as when the data itself contains the indices/keys to select.

[?(<expr>)]

This one we seem more familiar with. The expression is evaluated on each child in turn.

<expr> returns a boolean indicating whether that item should be selected.

Again, I see usefulness in allowing iteration over objects, but whether the key is included as part of the accessible data is to be defined. Perhaps it only makes sense to allow access to the object's value collection. Defining a mechanism that also grants access to the key while still providing symmetry with the array filter expression syntax could prove difficult.

@danielaparker

danielaparker commented Mar 8, 2021

@gregsdennis wrote

Personally, I like having both forms:

Given that both were in the original article, and that there is little extra implementation effort once filters with expressions are implemented, it seems reasonable.

[(<expr>)]

<expr> returns

  • an integer (or perhaps a list of integers, or even a slice) that represents the index to select from an array.
  • a string (or list of strings) as key names for objects. This "object" case I hadn't considered before, but it seems a natural extension and provides symmetry.

More generally, <expr> would either evaluate to an arbitrary JSON value, say result, or raise an error. If it yields a value, rules need to be applied to determine whether @[result] makes sense: if yes, @[result] (if it exists) would be added to the select list; if no, nothing would be added. If an error were raised, nothing would be added to the select list.

If the current node @ is an array, presumably @[result] makes sense if result is an integer, and maybe if result is an array of integers (whether generated by a slice or otherwise), and possibly if result is a string that contains only digits. That needs to be decided.

If the current node @ is an object, presumably @[result] makes sense if result is a string.
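These rules could be sketched in Python roughly as follows. The function name `select_by_result` is hypothetical. The undecided points are resolved by assumption here: a list of integers is accepted for arrays, while digit-only strings and negative indices select nothing.

```python
def select_by_result(current, result):
    """Return the nodes that @[result] would add to the select list."""

    def is_index(value):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)

    if isinstance(current, list):
        if is_index(result):
            indices = [result]
        elif isinstance(result, list) and all(is_index(i) for i in result):
            indices = result  # assumption: a list of integers selects several elements
        else:
            return []         # assumption: digit-only strings do not select
        # Indices that do not exist contribute nothing.
        return [current[i] for i in indices if 0 <= i < len(current)]

    if isinstance(current, dict):
        if isinstance(result, str) and result in current:
            return [current[result]]

    return []


print(select_by_result(["v", "w", "x"], 2))       # ['x']
print(select_by_result(["v", "w", "x"], [0, 2]))  # ['v', 'x']
print(select_by_result(["v", "w", "x"], "2"))     # []
print(select_by_result({"name": "x"}, "name"))    # ['x']
```

Switching the digit-string rule to the "forgiving" behaviour would only need one extra branch that converts such strings with `int(result)`.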

[?(<expr>)]

This one we seem more familiar with. The expression is evaluated on each child in turn.

<expr> returns a boolean indicating whether that item should be selected.

More generally, <expr> would either evaluate to an arbitrary JSON value, say result, or raise an error, in exactly the same way as above. If it yields a value, rules need to be applied to determine whether result could be considered false: if yes, nothing would be added to the select list; otherwise, the current node would be added. If an error were raised, nothing would be added to the select list.

Typical conditions to determine whether an arbitrary JSON value can be considered false are:

  • empty array: [],
  • empty object: {},
  • empty string: "",
  • false boolean,
  • null,
  • zero integer.

For comparison, JMESPath requires all of these conditions except zero integer: in JMESPath, an expression that evaluates to zero is not regarded as false. But in Goessner's JavaScript implementation of JSONPath, $[?(0)] returns nothing, while $[?(1)] returns everything in the store.
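A short Python sketch of such a falsiness test, with the zero rule made switchable so that both behaviours can be compared. The function name `is_falsy` and the `zero_is_false` parameter are hypothetical, and JSON values are assumed to be represented as plain Python values.

```python
def is_falsy(value, zero_is_false=True):
    """Decide whether a JSON value counts as false in a filter.

    zero_is_false=True follows the list above (integer zero is false);
    zero_is_false=False follows the JMESPath-style rule (zero is not false).
    """
    if value is None or value is False:
        return True
    if isinstance(value, (list, dict, str)) and len(value) == 0:
        return True
    if (zero_is_false and isinstance(value, int)
            and not isinstance(value, bool) and value == 0):
        return True
    return False


for sample in ([], {}, "", False, None, 0, 1, "0", [0]):
    print(repr(sample), is_falsy(sample), is_falsy(sample, zero_is_false=False))
```

The two columns differ only for the integer `0`, which is the single point of disagreement described above.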

A similar requirement, to find the effective boolean value of a (more complicated!) value, is present in XPath, and is described here.

Again, I see usefulness in allowing iteration over objects, but whether the key is included as part of the accessible data is to be defined.

Are you thinking about something like

$[?(@.key == 'foo' && @.value == 'bar')]

or perhaps

$[?(key(@) == 'foo' && @ == 'bar')]

or

$[?(stem(@) == 'foo' && @ == 'bar')]

?

Perhaps it only makes sense to allow access to the object's value collection. Defining a mechanism that also grants access to the key while still providing symmetry with the array filter expression syntax could prove difficult.

Perhaps. Although something like key(@) or stem(@) could be regarded as either an index of an array or a key of an object.

(The rationale for the name stem is that it would be the stem of the position of the current node; position(@) could also be supported.)
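As a rough Python illustration of how a key(@) or stem(@) function could treat array indices and object keys uniformly: the filter receives both the stem and the value of each child. The names `children_with_stem` and `filter_with_stem` are hypothetical, and binding '@' to the member value for objects is an assumption.

```python
def children_with_stem(container):
    """Pair each child with its stem: the key for objects, the index for arrays."""
    if isinstance(container, dict):
        return list(container.items())
    if isinstance(container, list):
        return list(enumerate(container))
    return []


def filter_with_stem(container, predicate):
    """Select child values for which predicate(stem, value) holds."""
    return [value for stem, value in children_with_stem(container)
            if predicate(stem, value)]


# $[?(stem(@) == 'foo' && @ == 'bar')] on an object
print(filter_with_stem({"foo": "bar", "baz": "bar"},
                       lambda stem, cur: stem == "foo" and cur == "bar"))  # ['bar']

# The same mechanism on an array, where the stem is the index
print(filter_with_stem(["a", "b", "c"], lambda stem, cur: stem >= 1))      # ['b', 'c']
```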

@gregsdennis
Collaborator

gregsdennis commented Mar 9, 2021

@danielaparker yes, I think you restated what I said. I'm not so worried about writing "spec-y" language in an issue, though, so much as the concept. If we can state it in plain language and get people to agree on the idea, we can develop a more precise language to describe it later.

Typical conditions to determine whether an arbitrary JSON value can be considered false are:

  • empty array: [],
  • empty object: {},
  • empty string: "",
  • false boolean,
  • null,
  • zero integer.

I had a heck of a time implementing "falsiness" and other loose equality things in .NET when I was building my JsonLogic library. In a strongly-typed language/framework (where I spend most of my time), this idea is counterintuitive and just feels wrong. But I understand the roots. If we proceed with this kind of thing, then I suggest we approach it as I describe in #15 where I propose a DSL (I also give reasons there, so I won't repeat them here).

Although something like key(@) or stem(@) could be regarded as either an index of an array or a key of an object.

I like this, but (and?) it opens the door for other functions (possibly an extension point). Perhaps things like array/object length for the [(<expr>)] expression could use this instead of the widely accepted dot-key syntax: [(length(@))].


Also related (and probably another candidate for consolidation): #47. It sounds like we like the idea of supporting it.

@cabo
Member

cabo commented Jan 17, 2022

Interesting discussion that is pretty much implemented in the document now.

@cabo cabo closed this as completed Jan 17, 2022