Selector definitions (basically an analysis and description of the DSL) #15

Closed
gregsdennis opened this issue Sep 6, 2020 · 10 comments

Comments

@gregsdennis
Collaborator

gregsdennis commented Sep 6, 2020

We still don't yet have an official definition of each of the operators ("selectors"). This issue has been created to define all of the things that we expect JSON Path should be able to do.

<NOTE>

In the second comment, I reconsider the interpretation I post in this first comment. However, I'm leaving it here for posterity as it does contain some interesting ideas.

</NOTE>

Basic Operations

$ - Selects the root of the instance. MUST be used at the start of the path; MAY be used in expressions as a reference.
.<name> - If the value is an object selects the value in the property indicated by <name>, if it exists; otherwise an empty array.
.* - If the value is an object, selects the values of all properties; otherwise an empty array.
[<index>] - If the value is an array, selects the values at the positions indicated by <index>. (See array indices later.)
[*] - If the value is an array, selects all of the values in it; otherwise an empty array.
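A minimal Python sketch of this first interpretation, where the dot forms apply only to objects and the bracket forms only to arrays. The function names are hypothetical, JSON objects and arrays are modeled as Python dicts and lists, and "otherwise an empty array" is modeled as an empty list of selected values:

```python
from typing import Any, List


def select_name(value: Any, name: str) -> List[Any]:
    """.<name>: only objects (dicts) have named properties."""
    if isinstance(value, dict) and name in value:
        return [value[name]]
    return []


def select_all_properties(value: Any) -> List[Any]:
    """.*: the values of all properties, objects only."""
    return list(value.values()) if isinstance(value, dict) else []


def select_index(value: Any, index: int) -> List[Any]:
    """[<index>]: arrays only; an out-of-range index selects nothing."""
    if isinstance(value, list) and 0 <= index < len(value):
        return [value[index]]
    return []


def select_all_items(value: Any) -> List[Any]:
    """[*]: all items, arrays only."""
    return list(value) if isinstance(value, list) else []


doc = {"a": [{"b": 1}, {"b": 2}, {"b": 3}]}
print(select_name(doc, "a"))        # [[{'b': 1}, {'b': 2}, {'b': 3}]]
print(select_all_items(doc))        # [] because the root is an object
print(select_index(doc["a"], 1))    # [{'b': 2}]
```

Under this reading, [*] applied to an object and .* applied to an array both select nothing.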

This is where I start to have issues with the way that Goessner defined things. There is an incongruity between the syntax and the operation of the *-syntax and the name-syntax of the property selector and the recursive selector.

As mentioned above, .prop would select the value under the prop property of a local object. Similarly, ..prop would find all objects recursively and select the values under their prop properties. These seem to behave the same, and this makes sense as they have similar syntaxes.

But the *-syntax selectors don't behave the same, even though they look like they should. .* selects the values of all properties if the local value is an object. ..*, on the other hand, recursively selects all values inside both objects and arrays, including the local value (the consensus report says most implementations include it; some select only children). It seems like the logical behavior for ..* should be to recursively find values of object properties.

But this leaves a gap: how do we define the "recursive get everything"? Furthermore, do we need a "local get everything" that would return the current item and array items (if an array) or object values (if an object)?

That confusion aside, defining ..* this way opens the door for another syntax: .[*], which would recursively select all values in arrays. In essence, it defines the recursion operator merely as a . followed by another operator. This means that if we want the "select all" behavior, we can define a new operator (and character) for it.

Selecting Everything

For now, suppose we use ~. This gives these selectors as well:

  • .~ - Selects the current value (maybe) and its local descendants, whether they are property values or array items.
  • ..~ - Selects the current value (maybe) and its recursive descendants.

Does ~ select the current node?

Consider the example currently in the spec:

{
  "a": [
    {"b": 1},
    {"b": 2},
    {"b": 3},
    {"b": 3}
  ]
}

If our path is $.~, then we have two options. If we return the local value (the root), then we also include the intermediate values, like the array and all of the objects in it. If we don't return the local value and only return the children values, then we must apply the same logic at every step so that the result is only the leaf values: [1, 2, 3].

The consensus report shows that most implementations return the intermediate values; therefore, they should also return the local value.
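To make the two options concrete, here is a rough Python sketch of both traversals over the example document. The function names are made up for illustration, and treating an empty object or array as contributing nothing under the second option is an assumption, since the comment does not cover that case:

```python
from typing import Any, Iterator, List


def children(value: Any) -> List[Any]:
    """Property values of an object or items of an array; nothing otherwise."""
    if isinstance(value, dict):
        return list(value.values())
    if isinstance(value, list):
        return list(value)
    return []


def descend_with_self(value: Any) -> Iterator[Any]:
    """Option 1: return the local value, then apply the same rule to each child."""
    yield value
    for child in children(value):
        yield from descend_with_self(child)


def descend_leaves_only(value: Any) -> Iterator[Any]:
    """Option 2: never return a container itself, only what it holds."""
    if isinstance(value, (dict, list)):
        for child in children(value):
            yield from descend_leaves_only(child)
    else:
        yield value


doc = {"a": [{"b": 1}, {"b": 2}, {"b": 3}]}
print(list(descend_leaves_only(doc)))     # [1, 2, 3]
print(len(list(descend_with_self(doc))))  # 8: root, array, 3 objects, 3 numbers
```

With option 1 the root, the array, and each object all appear in the result alongside the numbers; with option 2 only the numbers survive.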

Array Indices

The array selector supports a number of index forms:

  • direct index lists
  • index calculations
  • predicate selections

The consensus shows that not only are single selections (as in Goessner's post) generally supported, but comma-delimited lists of indices are allowed. (Goessner does have a single example with multiple indices: $..book[0,1]) And many also support mixing the various styles.

Direct index lists

Each item can be a single index or a slice-notation index range (couldn't find a spec on it, but this SO answer is pretty thorough, even if it's for python).
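As a sketch of what a direct index list could mean, the following Python parses a comma-delimited list whose items are single indices or slice-notation ranges. It simply borrows Python's own slice semantics (including negative indices), which is an assumption rather than anything settled in this thread; parse_item and select_indices are hypothetical names:

```python
from typing import Any, List, Union


def parse_item(text: str) -> Union[int, slice]:
    """Turn '2' into an int and '1:3' or '::2' into a slice object."""
    text = text.strip()
    if ":" in text:
        # Empty parts mean "use the default", as in Python slicing.
        parts = [int(p) if p.strip() else None for p in text.split(":")]
        return slice(*parts)
    return int(text)


def select_indices(array: List[Any], index_list: str) -> List[Any]:
    """Apply each item of the list in order and concatenate the selections."""
    selected: List[Any] = []
    for item in map(parse_item, index_list.split(",")):
        if isinstance(item, slice):
            selected.extend(array[item])
        elif -len(array) <= item < len(array):
            selected.append(array[item])
        # An out-of-range single index selects nothing.
    return selected


books = ["a", "b", "c", "d"]
print(select_indices(books, "0,1"))    # ['a', 'b']
print(select_indices(books, "0,2:4"))  # ['a', 'c', 'd']
print(select_indices(books, "::2"))    # ['a', 'c']
```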

Index calculations

This index is an expression that evaluates to a valid index, surrounded by parentheses.

It's unclear if Goessner's intent was to have the expression only evaluate to a single value, but we could support it evaluating to a collection or slice-notation.

Predicate selections

This index is also an expression, except that it evaluates to a boolean result. The expression iterates over all of the items and returns true for items to include and false for those to exclude. Like the other expression, this is surrounded by parentheses, however in this case the parentheses are preceded by a ?.

Expression syntax

Expressions bring us to the last selector, @. This selector is context sensitive: it carries a slightly different meaning depending on which type of index it's in.

For index calculation expressions, this symbol represents the array itself. It can be used in the expression to obtain information about the array, as in Goessner's example $..book[(@.length-1)].

For predicate selection expressions, this symbol represents the current item as the selection iterates over the contents of the array.
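The two meanings of @ can be sketched in Python by passing the expression in as a callable instead of parsing it. Everything here is illustrative: the function names, the sample data, and the price predicate are assumptions, and the index calculation is limited to a single resulting index:

```python
from typing import Any, Callable, List


def select_calculated(array: Any, index_of: Callable[[List[Any]], int]) -> List[Any]:
    """[(<expr>)]: '@' is the array itself and the expression yields one index."""
    if not isinstance(array, list):
        return []
    index = index_of(array)
    return [array[index]] if 0 <= index < len(array) else []


def select_predicate(array: Any, keep: Callable[[Any], bool]) -> List[Any]:
    """[?(<expr>)]: '@' is each item in turn and the expression yields a boolean."""
    if not isinstance(array, list):
        return []
    return [item for item in array if keep(item)]


books = [{"price": 8}, {"price": 12}, {"price": 9}]

# Stand-in for [(@.length-1)]: the argument is the whole array.
last = select_calculated(books, lambda at: len(at) - 1)

# Stand-in for a predicate such as [?(@.price < 10)]: the argument is one item.
cheap = select_predicate(books, lambda at: at["price"] < 10)

print(last)   # [{'price': 9}]
print(cheap)  # [{'price': 8}, {'price': 9}]
```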

There are some other details about the expression syntax that would probably be better discussed in another issue.

@gregsdennis
Collaborator Author

gregsdennis commented Sep 11, 2020

An alternative to the above (and probably the more correct interpretation of Goessner's post) is ignoring the difference between objects and arrays except when a property name or numeric index is specified. This gives the following results for the selectors:

  • .name - if an object, gets the value under the name property, if it exists; otherwise an empty array.
  • [1] - if an array, gets the value at index 1 (0-based), if it exists; otherwise an empty array.
  • .* and [*] - if an object, gets the values for all properties; if an array, gets all values; otherwise an empty array. (These are the same.)
  • ..name - recursively selects the name property from all objects.
  • ..* - gets all values (including the local value), recursively selecting both object property values and array item values.

I think this is probably closer to how implementations currently interpret these selectors. The difference here is that we've defined * universally as "select all children of the local value regardless of the local value's type" rather than making a distinction between the object or array syntax.
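A short Python sketch of this unified reading, where * means "all children of the local value" for both syntaxes and ..name visits the local value and every descendant. The function names are hypothetical and the visiting order is an assumption:

```python
from typing import Any, List


def star(value: Any) -> List[Any]:
    """.* and [*]: all children of the local value, whatever its type."""
    if isinstance(value, dict):
        return list(value.values())
    if isinstance(value, list):
        return list(value)
    return []


def recursive_name(value: Any, name: str) -> List[Any]:
    """..name: select the property from every object found while descending."""
    found: List[Any] = []
    if isinstance(value, dict) and name in value:
        found.append(value[name])
    for child in star(value):
        found.extend(recursive_name(child, name))
    return found


doc = {"a": [{"b": 1}, {"b": 2}, {"b": 3}]}
print(star(doc))                 # [[{'b': 1}, {'b': 2}, {'b': 3}]]
print(star(doc["a"]))            # [{'b': 1}, {'b': 2}, {'b': 3}]
print(recursive_name(doc, "b"))  # [1, 2, 3]
```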

Another benefit of this interpretation is that [*] covers the string-index format for property selection, e.g. ['name'], without the need for additional clarification in the spec.

Note that there isn't an array analog for ..name. I think a syntax like .[1] would work to recursively select all values at index 1 in arrays. This reinforces the idea from above that a . followed immediately by another operator indicates a recursive search.

Lastly the expression-style selectors would also be able to select values in objects.

I think this is a better definition for the selectors than what I put above.

@danielaparker

danielaparker commented Mar 9, 2021

@gregsdennis wrote:

An alternative to the above (and probably the more correct interpretation of Goessner's post) is ignoring the difference between objects and arrays except when a property name or numeric index is specified. The following results for the selectors:

  • .name - if an object, gets the value under the name property, if it exists; otherwise an empty array.
  • [1] - if an array, gets the value at index 1 (0-based), if it exists; otherwise an empty array.
  • .* and [*] - if an object, gets the values for all properties; if an array, gets all values; otherwise an empty array. (These are the same.)
  • ..name - recursively selects the name property from all objects.
  • ..* - gets all values (including the local value), recursively selecting both object property values and array item values.

I think this is probably closer to how implementations currently interpret these selectors.

I think it would be more accurate to write

  • .name - if an object, gets the value under the name property and adds it to a collection of selected items, otherwise does nothing.

In the case of no match, I don't think a selector can be said to get an empty array.

If "paths" are in effect, it would need to update the path corresponding to that value as well.

@cabo
Member

cabo commented Mar 9, 2021 via email

@danielaparker

danielaparker commented Mar 9, 2021

@cabo wrote:

On 2021-03-09, at 16:38, Daniel Parker @.***> wrote: In the case of no match, I don't think a selector can be said to get > an empty array.
An array is a data item. I think we are talking about an empty collection (my tentative term for what XPath calls nodeset).
Grüße, Carsten

Right, but the collection wouldn't necessarily be empty when the selector was evaluated. Selectors are chained: an individual selector adds to the collection in the case of a match, but adds nothing in the case of no match. Of course, you already know that. But taken literally, @gregsdennis's wording could be taken to mean that in the case of no match, the selector added an empty array to the collection, hence my comment.

@cabo
Member

cabo commented Mar 9, 2021 via email

@danielaparker

On 2021-03-09, at 16:57, Daniel Parker @.***> wrote: Right, but the collection wouldn't necessarily be empty when the selector was evaluated. Selectors are chained, an individual selector adds to the collection in the case of a match, but adds nothing in the case of no match. Of course, already you know that. But taken literally, @gregsdennis wording could have been taken to mean that the selector added an empty array to the collection, hence my comment.
You lost me. Are you saying that the collection resulting from $.a.b.c is the union of those resulting from $.a $.a.b $.a.b.c ? Grüße, Carsten

No, but I can certainly see how you could have interpreted it that way! By match I meant a match for the expression "$.a.b.c", but worded that poorly. In this case the collection would be empty until evaluating the tail of the path ".c", if it got that far.

But to my point, consider the store example with path $..isbn. In this case .isbn will be applied four times, at the tail end of four paths. In the two cases with a match, an isbn will be added to the collection; in the two others, with no match, nothing will be added to the collection. In the latter cases, I don't think it can be said that the selector got an empty array; rather, it did nothing.
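The distinction can be sketched in Python as a selector that appends to a shared collection on a match and otherwise leaves it untouched. The four books below are a made-up stand-in for the store example (only the "four applications, two matches" shape is taken from the comment), and apply_name is a hypothetical name:

```python
from typing import Any, List


def apply_name(value: Any, name: str, collection: List[Any]) -> bool:
    """Append the property value on a match; on no match, do nothing at all."""
    if isinstance(value, dict) and name in value:
        collection.append(value[name])
        return True
    return False


# Hypothetical data: four books, two of which carry an isbn.
books = [
    {"title": "A"},
    {"title": "B"},
    {"title": "C", "isbn": "isbn-c"},
    {"title": "D", "isbn": "isbn-d"},
]

collection: List[Any] = []
matches = [apply_name(book, "isbn", collection) for book in books]

print(matches)     # [False, False, True, True]
print(collection)  # ['isbn-c', 'isbn-d'], with no empty arrays mixed in
```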

@gregsdennis
Collaborator Author

gregsdennis commented Mar 9, 2021

@danielaparker - if an object, gets the value under the name property and adds it to a collection of selected items, otherwise does nothing.

Yes. This is what I meant. I was considering .name to be the path in its entirety here, so the final result would be empty. In most cases, this kind of operation is one of many successive ones, so the value, if any, would be added to whatever result collection exists.

@goessner
Collaborator

goessner commented Mar 9, 2021

I would like to complete the list above with symmetry in mind:

  • .name and ['name'] - if an object, gets the value under the name property, if it exists; otherwise an empty array.
  • [1] - if an array, gets the value at index 1 (0-based), if it exists; otherwise an empty array. Not ['1'] !
  • .* and [*] - if an object, gets the values for all properties; if an array, gets all values; otherwise an empty array. (These are the same.)
  • ..name and ..['name'] - recursively selects the name property from all objects.
  • ..* and ..[*] - gets all values (including the local value), recursively selecting both object property values and array item values. Not the same as ..['*'] !
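A rough Python sketch of the bracket forms in this list, showing why the quoted and unquoted forms differ. WILDCARD stands for the unquoted * token and select_bracket is a hypothetical name; letting ['1'] and ['*'] select properties literally named "1" and "*" on an object is my reading of the "Not ... !" remarks, so treat it as an assumption:

```python
from typing import Any, List

WILDCARD = object()  # stands for the unquoted * token


def select_bracket(value: Any, key: Any) -> List[Any]:
    """[*], ['name'] and [1]: the key's kind decides what can match."""
    if key is WILDCARD:
        if isinstance(value, dict):
            return list(value.values())
        if isinstance(value, list):
            return list(value)
        return []
    if isinstance(key, str):
        # Quoted keys are property names and only ever match objects.
        return [value[key]] if isinstance(value, dict) and key in value else []
    if isinstance(key, int):
        # Unquoted numbers are array indices and only ever match arrays.
        if isinstance(value, list) and 0 <= key < len(value):
            return [value[key]]
        return []
    return []


obj = {"*": "star property", "1": "one property"}
arr = ["zero", "one"]

print(select_bracket(obj, "*"))       # ['star property']
print(select_bracket(obj, WILDCARD))  # ['star property', 'one property']
print(select_bracket(arr, 1))         # ['one']
print(select_bracket(arr, "1"))       # []
```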

@gregsdennis
Collaborator Author

☝️ that, too, yes, combined with what @danielaparker said about the return values.

@cabo
Member

cabo commented Jan 17, 2022

I think this early discussion is now reflected in the document.
Closing.

@cabo cabo closed this as completed Jan 17, 2022