Filter Expressions #64

Closed
ghost opened this issue Mar 10, 2021 · 31 comments

@ghost

ghost commented Mar 10, 2021

(This was originally raised in #54 and is now being split out.)

Related to that, it would be helpful to determine if JSONPath filters
apply to both JSON objects and arrays, or only to JSON arrays.
(Daniel P)

I would support restricting filters to arrays, if others agree.
(Glyn Normington)

I tend to let implementations and their "normative force of the factual" decide here or, if in doubt, agree to Glyn's restriction to arrays.

I am very unhappy with the confusing $..book[(@.length-1)], where '@' addresses the array itself and implies that the array has a length property. In filter expression examples '@' more consistently addresses the current array element.

The invocation of 'the underlying scripting engine' wasn't meant as a serious normative aspect, but rather as a quick and dirty solution for JavaScript and PHP implementations at that time.

Corner Case

Consider this perfectly legal JSON object

{ "ab": 0, "a.b": 1, "a-b": 2, "a": { "b": 3 } }

So $.ab is 0, $.a.b is 3, $['a.b'] is 1, $['a-b'] is 2. You'd like to say $.a-b but lots of libraries will refuse it because "a-b" is not a legal JavaScript "name" construct, that's why you have to say $['a-b'].

But suppose your library would accept $.a-b. Then $.a-b and $['a-b'] would be synonyms, but $.a.b and $['a.b'] wouldn't.
(Tim Bray)

Hmm ... this seems to be a hint to better exclude '-' from dot-child-selector syntax. I think I have read more discussion about that, currently don't know where.
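To make the corner case concrete, here is a minimal Python sketch (not taken from any implementation; `get_path` is a hypothetical helper) that models each path as a list of member names. It mirrors the JSON object above, with the second key written as a.b, and shows why $.a.b and $['a.b'] address different values while $['a-b'] is a single step.

```python
# Hypothetical model: a path is a list of member names, one step per name.
doc = {"ab": 0, "a.b": 1, "a-b": 2, "a": {"b": 3}}

def get_path(value, names):
    """Follow the member names one step at a time and return the value reached."""
    for name in names:
        value = value[name]
    return value

assert get_path(doc, ["ab"]) == 0       # $.ab
assert get_path(doc, ["a", "b"]) == 3   # $.a.b    (two steps)
assert get_path(doc, ["a.b"]) == 1      # $['a.b'] (one step)
assert get_path(doc, ["a-b"]) == 2      # $['a-b'] (one step)
```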

@gregsdennis
Collaborator

This actually already has an issue: #47.

@gregsdennis
Collaborator

Can we close this?

@goessner
Collaborator

Filter expressions [?(<expr>)] are completely missing in the current draft.

Implementations agree that filter expressions are an essential feature of JSONPath. They are considered much more important than "index expressions" [(<expr>)], given that the term "index" is an interim synonym for "name / index" (#84).

There are already some discussions about filters (historically ascending):

  • Non-property-name-friendly chars #16
  • Query expression language support #17
  • Treating of duplicates #23
  • Security items #25
  • Filter applied to JSON object #47
  • Some observations on the 3 March 2021 draft #56
  • JSONPath expression and bracket-notation #57
  • Regular expressions in filters #70
  • The problem of comparing an evaluated expression with a json literal #74
  • The "current value" or "current node", and the meaning thereof #76

1. Term "filter expression"

The WG still has to decide whether we want to use the term "filter expression" in the draft, as there are some objections to it (#20).

As an alternative the term "filter selector" was proposed, which is used below.

2. Apply Filter Selector to Objects

There seems to be no agreement on whether filters should be applied to objects (#47).

Consider a school,

{
    "students": [...],
    "staff": [...],
    "teachers": [...]
}

while looking for the array(s) containing persons named 'Müller' by '$[?(@..name == 'Müller')]', which is not the same as looking for the persons themselves by '$.*[?(@.name == 'Müller')]'.

I can see value for allowing filter selectors applied to objects and not really a rational reason to exclude them from filtering.
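The difference between the two queries can be sketched in Python. Everything here is illustrative: the person records are made-up sample data, `descendant_names` is a hypothetical helper, and the sketch assumes that `@..name == 'Müller'` holds when any descendant name equals 'Müller' (an existential reading the thread does not settle).

```python
# Made-up sample data for the school object.
school = {
    "students": [{"name": "Schmidt"}, {"name": "Müller"}],
    "staff": [{"name": "Meier"}],
    "teachers": [{"name": "Müller"}, {"name": "Weber"}],
}

def descendant_names(value):
    """Yield every value stored under a 'name' member at any depth (like @..name)."""
    if isinstance(value, dict):
        for key, child in value.items():
            if key == "name":
                yield child
            yield from descendant_names(child)
    elif isinstance(value, list):
        for child in value:
            yield from descendant_names(child)

# Roughly $[?(@..name == 'Müller')]: the filter runs over the object's members
# and selects whole arrays (assumed existential semantics).
arrays = {key: group for key, group in school.items()
          if "Müller" in descendant_names(group)}

# Roughly $.*[?(@.name == 'Müller')]: the filter runs over each array
# and selects the persons themselves.
persons = [person for group in school.values() for person in group
           if person.get("name") == "Müller"]

print(sorted(arrays))   # ['students', 'teachers']
print(len(persons))     # 2
```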

3. How Filter Selectors Work

Filter selectors work via iterating over arrays and objects, provided the latter is allowed by the spec.

3.1 Selecting

During the iteration process, the value of each array element or object member is taken and evaluated against an implicit boolean expression <expr> in '[?(<expr>)]'. If the result is truthy, the item is selected. In case of a falsy result, it is not.

3.2 The Current Value @

In expression <expr>, the special selector '@' represents the JSON value currently visited during the iteration process. The simplest JSONPath expression with a filter selector is '$[?(@)]', which collects all items of the related container whose individual value is interpreted as truthy.

The special selector '@' used outside of filter selectors may have a completely different meaning.
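The iteration described in sections 3.1 and 3.2 can be sketched as follows. This is a minimal illustration, not any implementation's code: `apply_filter` is a hypothetical name, the predicate plays the role of `<expr>` with its argument standing in for '@', and Python's own `bool` is used only as a placeholder for a truthiness definition that the thread has not agreed on.

```python
def apply_filter(container, predicate):
    """Return the child values of an array or object for which predicate(@) holds."""
    if isinstance(container, list):
        children = container
    elif isinstance(container, dict):
        children = container.values()
    else:
        return []  # scalars have no children to iterate over
    return [child for child in children if predicate(child)]

# $[?(@)] with Python truthiness as a stand-in (an assumption, see above).
print(apply_filter([0, 1, "", "x", None, [], [0]], bool))  # [1, 'x', [0]]

# The same selector applied to an object iterates over its member values.
print(apply_filter({"a": 1, "b": 0}, bool))                # [1]
```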

4. Syntax

4.1 Filter Selector Syntax

filter-selector    = "[?(" expr ")]"

There are some discussions / implementations that are leaving out the parentheses. In this case syntax reads:

filter-selector    = "[?" expr "]"

It needs to be discussed whether the latter, more minimal syntax works in all practical cases. If it does and expr is allowed to contain parentheses, the original syntax above would automatically be valid as well.

4.2 Filter Selector Expression Syntax

The implicit boolean expression expr is evaluated and converted to the JSON values true or false. Selection occurs in the true case only.

Greg proposes a minimal set of basic comparison, mathematical, and boolean operators #17:

==   equal
!=   not equal
<    less than
<=   less than or equal to
>    greater than
>=   greater than or equal to

+    addition
-    subtraction
*    multiplication
/    division
%    modulus

&&   and
||   or

Taking this set of operators, excluding arithmetic operators for now, we have something like this:

expr                = *comparisons
comparisons         = ["("] comparison *[logical-operator comparison] [")"]
logical-operator    = "||" / "&&"
comparison          = relative-path [comparison-operator value]
comparison-operator = "==" / "!=" / "<" / ">" / "<=" / ">="
value               = number / quoted-string / true / false / null
relative-path       = "@" *(dot-selector / index-selector)

dot-selector and index-selector are defined elsewhere. number, quoted-string and null are defined analogously to RFC 8259.
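As a rough illustration of how a single `comparison` from the grammar above might be evaluated once parsed, here is a Python sketch. All names (`resolve`, `eval_comparison`, `OPS`) are hypothetical. Two choices in it are assumptions, not decisions from this thread: a bare relative-path without an operator is treated as an existence test, and a comparison that Python cannot order (such as a string against a number) simply yields false.

```python
import operator

# The six comparison operators from the grammar.
OPS = {"==": operator.eq, "!=": operator.ne, "<": operator.lt,
       ">": operator.gt, "<=": operator.le, ">=": operator.ge}

_MISSING = object()  # marks a relative-path that selects nothing

def resolve(current, steps):
    """Follow member names / array indices starting from the current value '@'."""
    for step in steps:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            return _MISSING
    return current

def eval_comparison(current, steps, op=None, value=None):
    """Evaluate: relative-path [comparison-operator value]."""
    found = resolve(current, steps)
    if found is _MISSING:
        return False
    if op is None:
        return True               # assumption: bare path is an existence test
    try:
        return OPS[op](found, value)
    except TypeError:
        return False              # assumption: unordered types compare as false

book = {"price": 8.95, "title": "Sayings of the Century"}
print(eval_comparison(book, ["price"], "<", 10))   # True
print(eval_comparison(book, ["isbn"]))             # False
print(eval_comparison(book, ["title"], "<", 10))   # False
```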

4.3 Arithmetic operator Extension

Two scenarios for using (binary) arithmetic operators op are possible.

  1. Use it on the left side of the comparison, i.e. 'relative-path op (relative-path | number)'. Here it is implied that the values on both sides of the comparison are of type number. If one or both are a string or null, it must be decided what to do.
  2. Using it on the right side of the comparison is trivial for number literals, as the expression can be directly replaced by its arithmetic result.

I can hardly see relevant practical use cases here and would, at first glance, opt for not supporting arithmetic operators.

4.4 in operator Extension

An in operator might be useful. Examples are

[?(@.color in ['red','green','blue'])]
[?(@.type in [1,2,4,8])]

In fact, supporting this would be pure syntactic sugar, as it can always be replaced by

[?(@.color == 'red' || @.color == 'green' || @.color == 'blue')]
[?(@.type == 1 || @.type == 2 || @.type == 4 || @.type == 8)]

What's your opinion? With minimalism in mind I would opt against it.
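The "syntactic sugar" argument can be checked with a small Python sketch over made-up sample items. Python's own `in` and `==` stand in for the proposed JSONPath operators here, which is an assumption about their intended semantics.

```python
# Made-up sample items; the last one has no color member at all.
items = [{"color": "red"}, {"color": "black"}, {"color": "blue"}, {"size": 3}]
allowed = ["red", "green", "blue"]

# [?(@.color in ['red','green','blue'])]
with_in = [item for item in items if item.get("color") in allowed]

# [?(@.color == 'red' || @.color == 'green' || @.color == 'blue')]
with_or = [item for item in items
           if item.get("color") == "red"
           or item.get("color") == "green"
           or item.get("color") == "blue"]

assert with_in == with_or
print(with_in)  # [{'color': 'red'}, {'color': 'blue'}]
```

The rewrite only works because the right-hand side is a literal list known when the query is written.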

4.5 Nested Filter Selectors

Greg pointed us to an SO question. Herein an array element (an object itself) is to be selected, if and only if its subarray property contains an element having a certain index/value pair. Somewhat like

$.arr[?(@.type == 'a' && @.subarr[?(@.prop == 'b')])]

In this case relative-path in 4.2 above should be modified to

relative-path = "@" *(dot-selector / index-selector / filter-selector)

What's the general opinion here?
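A Python sketch of what the nested query would compute, using made-up data. It assumes that the inner filter yields a list of matching sub-elements and that a non-empty list counts as true in the outer expression; neither point is settled in this thread.

```python
# Made-up sample data for $.arr
arr = [
    {"type": "a", "subarr": [{"prop": "b"}, {"prop": "c"}]},
    {"type": "a", "subarr": [{"prop": "c"}]},
    {"type": "z", "subarr": [{"prop": "b"}]},
]

def inner_filter(item):
    """@.subarr[?(@.prop == 'b')]: sub-elements whose prop equals 'b'."""
    return [sub for sub in item.get("subarr", []) if sub.get("prop") == "b"]

# $.arr[?(@.type == 'a' && @.subarr[?(@.prop == 'b')])]
# Assumption: a non-empty inner result is treated as true.
selected = [item for item in arr
            if item.get("type") == "a" and inner_filter(item)]

print(selected)  # only the first element matches both conditions
```

Note that '@' refers to an element of arr in the outer expression and to an element of subarr in the inner one.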

4.6 Regular expressions with Filter Selectors

See Regular expressions in filters #70.

I see usefulness here.

@gregsdennis
Collaborator

gregsdennis commented Apr 30, 2021

First, why post here? This is a duplicate issue, as I noted.

They are considered much more important than "index expressions" [(<expr>)]...

A being "more important" (really meaning "more commonly used") than B does not imply that B should be excluded.

The index expression format has its uses that cannot be replicated with other mechanisms. It should be included.

@gregsdennis
Collaborator

If the result is truthy, the item is selected. In case of a falsy result, it is not.

We still need to define "truthiness." There is still active discussion on this topic.

in operator Extension

I see the utility of this. It does open up the door for other operators. The inverse of this operator would be contains, e.g. [?(@.colors contains 'red')].

I'm sure we can think of others.

Nested Filter Selectors

We need to address scoping. $ can always be used to access the root of the input value, and @ access the "current" element. But once we introduce nested filters, identifying what @ represents gets murky.

Maybe @@ can denote the nested scope instead of just a single @. This leaves the single @ to denote the initial iterated scope. I can't imagine this would be used much, and maybe we include this functionality with a SHOULD or RECOMMENDED so that it's not a hard requirement, which will alleviate pressure on implementors.

@goessner
Collaborator

goessner commented May 2, 2021

Correction: 4.5 Nested Filter Selectors

Having thought about this a little longer, I realized that there is no need for nested filter selectors in the example above ...

$.arr[?(@.type == 'a' && @.subarr[?(@.prop == 'b')])]

... as this might be resolved conventionally by

$.arr[?(@.type == 'a' && @.subarr[*].prop == 'b')]

without nesting. I doubt that significant real-world examples exist that cannot be resolved without nesting.
So let's ignore that feature in the spec and leave it to implementations for now.

@goessner
Collaborator

goessner commented May 2, 2021

in operator Extension

I see the utility of this. It does open up the door for other operators. The inverse of this operator would be contains, e.g. [?(@.colors contains 'red')].

Can be solved without contains by [?(@.colors[*] == 'red')].

We should take care not to keep inventing new operators.

@goessner
Collaborator

goessner commented May 2, 2021

We still need to define "truthiness." There is still active discussion on this topic.

yes, you are right. Or better define falsiness, as Daniel proposes ...

Typical conditions to determine whether an arbitrary JSON value can be considered false are:

  • empty array: [],
  • empty object: {},
  • empty string: "",
  • false boolean,
  • null,
  • zero integer.

adding ...

  • undefined, i.e. not existing !
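The falsiness conditions listed above could be written down as a predicate like the following Python sketch. `UNDEFINED` and `is_falsy` are hypothetical names, and this is one possible reading of the list rather than an agreed definition. In particular, only a zero integer is treated as falsy, exactly as listed; whether 0.0 should count is left open here.

```python
UNDEFINED = object()  # hypothetical marker for "not existing"

def is_falsy(value=UNDEFINED):
    """Apply the falsiness conditions from the list above to one JSON value."""
    if value is UNDEFINED or value is None:    # undefined, null
        return True
    if isinstance(value, bool):                # check bool before int:
        return value is False                  # bool is a subclass of int in Python
    if isinstance(value, int):                 # zero integer
        return value == 0
    if isinstance(value, (str, list, dict)):   # empty string, array, object
        return len(value) == 0
    return False                               # everything else, e.g. 0.5

for sample in ([], {}, "", False, None, 0):
    assert is_falsy(sample)
assert is_falsy()            # undefined
assert not is_falsy("0")     # a non-empty string is not falsy under this list
assert not is_falsy([0])     # neither is a non-empty array
```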

@gregsdennis
Collaborator

gregsdennis commented May 2, 2021

Can be solved without contains by [?(@.colors[*] == 'red')].

What are we saying here? @.colors[*] returns an array. So you're implying that == functions as a contains operator? That's hideous! Operators shouldn't be overloaded like that.

Or better define falsiness, as Daniel proposes ...

My point is that I don't like the idea of loose equality at all, and I expect implementors in other strongly typed languages will have the same reservations.

@goessner
Collaborator

goessner commented May 3, 2021

Can be solved without contains by [?(@.colors[*] == 'red')].

What are we saying here? @.colors[*] returns an array. So you're implying that == functions as a contains operator? That's hideous! Operators shouldn't be overloaded like that.

Hmm ... what have I been thinking? You are right. It doesn't work that way, of course. I did not intend to introduce an implicit 'contains' operator.

Given this, nested filters and/or a contains operator is still worth a discussion.
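The distinction at stake can be shown in a few lines of Python, with a plain list standing in for what `@.colors[*]` selects (the data is made up): comparing the whole list to a scalar with `==` is a different operation from asking whether the list contains the scalar.

```python
colors = ["red", "green"]      # made-up value of @.colors
selected = list(colors)        # stand-in for what @.colors[*] selects

print(selected == "red")       # False: a list is not equal to a string
print("red" in selected)       # True: membership is a separate operation
```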

@goessner
Collaborator

goessner commented May 3, 2021

Or better define falsiness, as Daniel proposes ...

My point is that I don't like the idea of loose equality at all, and I expect implementors in other strongly typed languages will have the same reservations.

We don't address implementors but ordinary users of JSONPath, who are not necessarily programmers using strongly typed languages.

Ok ... what exactly do equality comparisons look like then? What do you propose?

@glyn
Collaborator

glyn commented May 4, 2021

So let's ignore [nested filter expressions] in the spec and leave it to implementations for now.

I think the spec should define whether or not any given JSONPath is syntactically valid rather than leaving this up to implementations. What do others think about this general principle?

Also, I'm not sure how the spec's syntax and compliance test suite could allow implementation-specific variants.

There isn't much of a consensus among existing implementations about the behaviour of nested filter expressions. However, Proposal A neatly disallows nested filter expressions, along with certain other complicated filter expressions. OTOH, I quite like not having apparently arbitrary syntactic restrictions and so my implementation allows nested filter expressions.

@gregsdennis
Collaborator

gregsdennis commented May 4, 2021

Ok ... what exactly do equality comparisons look like then? What do you propose? - @goessner

Doing loose equality in a strongly-typed system is a serious pain. However, strict equality in a loosely-typed system is generally easy. For example, JS uses the === operator to facilitate this. I realize that my opinion is biased, but I speak from (recent) experience of having to do the former.
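As an illustration of strict equality on JSON values, here is a Python sketch. `json_type` and `strict_equals` are hypothetical names and this is not a proposal from the thread. Python's built-in `==` is itself slightly loose for this purpose (`1 == True` is true), so the sketch compares JSON types first.

```python
def json_type(value):
    """Map a decoded JSON value to the name of its JSON type."""
    if value is None:
        return "null"
    if isinstance(value, bool):          # must come before the number check
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {value!r}")

def strict_equals(a, b):
    """Equal only if both values have the same JSON type and the same content."""
    kind = json_type(a)
    if kind != json_type(b):
        return False
    if kind == "array":
        return len(a) == len(b) and all(strict_equals(x, y) for x, y in zip(a, b))
    if kind == "object":
        return a.keys() == b.keys() and all(strict_equals(a[k], b[k]) for k in a)
    return a == b

print(1 == True)                  # True  (Python's built-in equality)
print(strict_equals(1, True))     # False (number vs boolean)
print(strict_equals("1", 1))      # False (string vs number)
print(strict_equals(1, 1.0))      # True  (JSON has a single number type)
```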

I think the spec should define whether or not any given JSONPath is syntactically valid rather than leaving this up to implementations. - @glyn

Definitely, yes, the spec should be as precise and prescriptive as possible on what it covers. However, I do think that it does need to allow for implementations to expand on the domain in order to foster new syntaxes and ideas, so long as those implementations call out those deviations as non-standard.

I think the spec should define its support for nested queries, even if it's to say they're not supported. I also think that it should support index queries.

However, I recognize that it says nothing at all in regard to expressions currently, and I'm on board with getting something in place to cover the things we at least agree on right now so long as it's done in a way to leave the door open to incorporate these other features after we've discussed them further.

@cabo
Member

cabo commented May 4, 2021

The spec should be well-defined, but also provide extension points -- both for third parties to use and for evolution of the spec itself.

@glyn
Collaborator

glyn commented May 11, 2021

Allowing non-scalar comparisons in filter expressions leads to some weirdness. For example, the > operator is no longer total.

Background: vmware-labs/yaml-jsonpath#31
Detailed example: vmware-labs/yaml-jsonpath#3

The JSONPath comparison project has a couple of relevant examples:

I would prefer the spec to disallow non-scalar comparisons. Christoph Burgmer found a neat way to disallow this in the grammar of Proposal A.
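One way to picture a comparison operator that is not total is the Python sketch below. It is only an illustration under my own assumptions, not the behaviour of yaml-jsonpath or of Proposal A: `greater_than` is a hypothetical function that orders two numbers or two strings and returns None for every other pair, so for arrays neither `a > b` nor `b > a` has an answer.

```python
def greater_than(a, b):
    """Return a > b for two numbers or two strings; None where no order is defined."""
    if isinstance(a, bool) or isinstance(b, bool):
        return None                      # booleans are not ordered in this sketch
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return a > b
    if isinstance(a, str) and isinstance(b, str):
        return a > b
    return None                          # arrays, objects, null, mixed types

print(greater_than(2, 1))        # True
print(greater_than([2], [1]))    # None
print(greater_than([1], [2]))    # None
```

Restricting both operands to scalars in the grammar removes the undefined cases for arrays and objects before evaluation starts.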

@gregsdennis
Collaborator

gregsdennis commented Jun 24, 2021

Found a StackOverflow question where the OP is looking to use a unary ! to negate a user-supplied expression.

$[?(!(<expr>))]

@goessner
Collaborator

goessner commented Jun 25, 2021 via email

@gregsdennis
Collaborator

If you look at the rest of the post, he shares his code where you can see that he's literally just interpolating user input into the expression and using the negation operator to invert the filter.

This is why he says the != doesn't work for him. I suppose he could do something like <expr> == false, but that's kinda weird, too.
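The use case can be sketched in Python with predicates standing in for filter expressions. The names (`negate`, `user_filter`) and the data are made up. The point is that the user-supplied expression is opaque, so the only general way to invert it is to wrap the whole thing, which is what a unary ! would do.

```python
def negate(predicate):
    """Model of [?(!(<expr>))]: invert an arbitrary filter predicate."""
    return lambda node: not predicate(node)

# Stand-in for an opaque, user-supplied <expr> (made-up example).
def user_filter(node):
    return node.get("price", 0) > 10 and node.get("category") == "fiction"

books = [
    {"price": 12, "category": "fiction"},
    {"price": 12, "category": "reference"},
    {"price": 8, "category": "fiction"},
]

matching = [b for b in books if user_filter(b)]
excluded = [b for b in books if negate(user_filter)(b)]
print(len(matching), len(excluded))  # 1 2
```

Swapping a single == for != inside such an expression would not produce the complement, because the expression combines several conditions.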

@gregsdennis
Collaborator

gregsdennis commented Jul 18, 2021

We received an SO question that would be helped by the in operator, but in the other direction.

The asker wants a path that can identify paths to arrays that contain two values. Such a query could be

$[?('PERSON A' in @ && 'PERSON B' in @)]

returning paths instead of values (so they can get the keys).

This iterates over the values in the top-level object (which are themselves arrays) and uses the in operator to determine if specific values are in the arrays.
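In Python terms, the query would behave roughly like the sketch below. The object and its keys are made up, and the bracketed path strings are just one plausible way to print result paths.

```python
# Made-up top-level object whose member values are arrays.
groups = {
    "team1": ["PERSON A", "PERSON C"],
    "team2": ["PERSON A", "PERSON B"],
    "team3": ["PERSON B"],
}

# $[?('PERSON A' in @ && 'PERSON B' in @)], returning paths instead of values
paths = [f"$['{key}']" for key, members in groups.items()
         if "PERSON A" in members and "PERSON B" in members]

print(paths)  # ["$['team2']"]
```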

@gregsdennis
Collaborator

Here's another SO question that would benefit from nested expressions.

https://stackoverflow.com/q/68926463/878701

@cabo
Member

cabo commented Jan 17, 2022

This discussion led to useful changes to the draft but does not seem to have actionable items left.
Closing.

@relu91

relu91 commented Aug 26, 2022

I tried to understand the final consensus on 4.5 Nested Filter Selectors. What was the final decision? I am looking for a way to select an object inside an array that has an array with a particular item:

[{"target": ["a", "b"]}, {"target" : ["c", "d"]}]

In my understanding in some implementations I can do:

$[? @.target == "a" ]

but that seems to be ruled out because it is actually operator overloading. Or

$[? "a" in @.target ]

But it is not a valid query for the current grammar.

@glyn
Collaborator

glyn commented Aug 27, 2022

I believe the current consensus is that the only paths allowed inside filter expressions are singular paths, and that rules out nested filters.

@relu91

relu91 commented Aug 27, 2022

I believe the current consensus is that the only paths allowed inside filter expressions are singular paths and that rules out nested filters.

Ok, thank you for your answer! But then a singular path can still resolve to a node that is an array, right? Is there any plan to add particular operators to deal with array items, like the in operator above? Maybe #203 will help?

@glyn
Collaborator

glyn commented Aug 27, 2022

I believe the current consensus is that the only paths allowed inside filter expressions are singular paths and that rules out nested filters.

Ok, thank you for your answer! But then a singular path can still resolve to a node that is an array, right?

Correct.

Is there any plan to add particular operators to deal with array items? like the in operator above?

No, we decided against including the in operator a while back.

Maybe #203 will help?

Perhaps.

@danielaparker

danielaparker commented Aug 28, 2022

Is there any plan to add particular operators to deal with array items? like the in operator above?

No, we decided against including the in operator a while back.

Searching for an element in an array is a common use case, and is readily achieved in both of the foundational JSONPath implementations, Goessner Javascript JSONPath and Jayway Java JSONPath.

Given

[{"target": ["a", "b"]}, {"target" : ["c", "d"]}]       

In Goessner:

$[?(@.target.includes('a'))]

In Jayway:

$[?('a' in @.target)]

Both produce:

[
   {
      "target" : [
         "a",
         "b"
      ]
   }
]
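For comparison, the same selection written directly in Python over the JSON given above. This mirrors what both queries compute; it is not how either implementation evaluates them.

```python
data = [{"target": ["a", "b"]}, {"target": ["c", "d"]}]

# Keep the elements whose "target" array contains "a".
selected = [item for item in data if "a" in item.get("target", [])]

assert selected == [{"target": ["a", "b"]}]
```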

Note also that this capability is supported in the modern competitors to JSONPath: in JMESPath, with the contains function, and in JSONata, with the in operator. Perhaps the committee would reconsider?

Maybe #203 will help?

Perhaps.

If some entity were to register a "contains" function as an extension point, presumably users of that entity's implementation would have this capability. But there wouldn't be interoperability, which would presumably be the goal of the standardization exercise.

Daniel

@gregsdennis
Collaborator

In Goessner:

$[?(@.target.includes('a'))]

This is because Goessner uses the underlying language to process expressions, which is something that we explicitly forbid in favor of interoperability.

@relu91

relu91 commented Aug 28, 2022

If some entity were to register a "contains" function as an extension point, presumably users of that entity's implementation would have this capability. But there wouldn't be interoperability, which would presumably be the goal of the standardization exercise.

Once the syntax of the extension points is settled, I think that the group can define a "core set" of extension points that have to be supported in all compliant implementations. This should help interoperability at least for the common use cases.

@danielaparker

In Goessner:

$[?(@.target.includes('a'))]

This is because Goessner uses the underlying language to process expressions, which is something that we explicitly forbid in favor of interoperability.

Yes, but from the user's point of view, it's cold comfort if a capability that they had with another version of JSONPath isn't available in the committee's interpretation of JSONPath. One of the appeals of JSONPath Javascript/Python has always been that the user could write expressions with all the capabilities of Javascript/Python available. The Jayway implementation, which I believe was the first quality implementation to define its own expression language, introduced operators and functions for some of the most common requirements, including searching for an element in an array. And JSONPath's modern competitors have all introduced a rich set of operators and/or functions to satisfy common search use cases.

Best regards,
Daniel

@gregsdennis
Collaborator

My point, though, is that it needs to be defined in the spec. That's not to say it shouldn't or won't be, but that's a lot of work and not the target for the spec right now.

@danielaparker

My point, though, is that it needs to be defined in the spec.

Indeed :-)
