JEP | 7 |
Author | James Saryerwinnie |
Status | accepted |
Created | 16-Dec-2013 |
This JEP proposes grammar modifications to JMESPath to allow for filter expressions. A filter expression allows list elements to be selected based on matching expressions. A literal expression is also introduced (from JEP-3) so that it is possible to match elements against literal values.
A common request when querying JSON objects is the ability to select elements based on a specific value. For example, given a JSON object:
{"foo": [{"state": "WA", "value": 1},
{"state": "WA", "value": 2},
{"state": "CA", "value": 3},
{"state": "CA", "value": 4}]}
A user may want to select all objects in the foo list that have a state key of WA. There is currently no way to do this in JMESPath. This JEP will introduce a syntax that allows this:
foo[?state == `WA`]
Additionally, a user may want to project additional expressions onto the values matched from a filter expression. For example, given the data above, select the value key from all objects that have a state of WA:
foo[?state == `WA`].value
would return [1, 2].
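As a rough illustration of the intended semantics, the filter and projection above behave much like the following Python list comprehensions. This is only a sketch for this particular data set, not how a JMESPath implementation is written; the variable names are made up for the example.

```python
data = {"foo": [{"state": "WA", "value": 1},
                {"state": "WA", "value": 2},
                {"state": "CA", "value": 3},
                {"state": "CA", "value": 4}]}

# foo[?state == `WA`]: keep each element whose "state" equals the literal "WA".
matched = [item for item in data["foo"] if item.get("state") == "WA"]

# foo[?state == `WA`].value: project the "value" key onto each matched element.
values = [item.get("value") for item in matched]

print(values)  # [1, 2]
```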
The updated grammar for filter expressions:
bracket-specifier = "[" (number / "*") "]" / "[]"
bracket-specifier =/ "[?" list-filter-expression "]"
list-filter-expression = expression comparator expression
comparator = "<" / "<=" / "==" / ">=" / ">" / "!="
expression =/ literal
literal = "`" json-value "`"
literal =/ "`" 1*(unescaped-literal / escaped-literal) "`"
unescaped-literal = %x20-21 / ; space !
%x23-5A / ; # - [
%x5D-5F / ; ] ^ _
%x61-7A / ; a-z
%x7C-10FFFF ; |}~ ...
escaped-literal = escaped-char / (escape %x60)
The json-value rule is any valid JSON value. While it’s recommended that implementations use an existing JSON parser to parse the json-value, the grammar is added below for completeness:
json-value = "false" / "null" / "true" / json-object / json-array /
json-number / json-quoted-string
json-quoted-string = %x22 1*(unescaped-literal / escaped-literal) %x22
begin-array = ws %x5B ws ; [ left square bracket
begin-object = ws %x7B ws ; { left curly bracket
end-array = ws %x5D ws ; ] right square bracket
end-object = ws %x7D ws ; } right curly bracket
name-separator = ws %x3A ws ; : colon
value-separator = ws %x2C ws ; , comma
ws = *(%x20 / ; Space
%x09 / ; Horizontal tab
%x0A / ; Line feed or New line
%x0D ; Carriage return
)
json-object = begin-object [ member *( value-separator member ) ] end-object
member = quoted-string name-separator json-value
json-array = begin-array [ json-value *( value-separator json-value ) ] end-array
json-number = [ minus ] int [ frac ] [ exp ]
decimal-point = %x2E ; .
digit1-9 = %x31-39 ; 1-9
e = %x65 / %x45 ; e E
exp = e [ minus / plus ] 1*DIGIT
frac = decimal-point 1*DIGIT
int = zero / ( digit1-9 *DIGIT )
minus = %x2D ; -
plus = %x2B ; +
zero = %x30 ; 0
The following operations are supported:
- ==, tests for equality.
- !=, tests for inequality.
- <, less than.
- <=, less than or equal to.
- >, greater than.
- >=, greater than or equal to.
The behavior of each operation is dependent on the type of each evaluated expression.
The comparison semantics for each operator are defined below based on the corresponding JSON type:
For string/number/true/false/null types, equality is an exact match. A string is equal to another string if they have the exact same sequence of code points. The literal values true/false/null are only equal to their own literal values. Two JSON objects are equal if they have the same set of keys and values (for each key in the first JSON object there exists a key with an equal value in the second JSON object). Two JSON arrays are equal if they have equal elements in the same order (given two arrays x and y, for each i in x, x[i] == y[i]).
Ordering operators >, >=, <, <= are only valid for numbers. Evaluating any other type with an ordering operator will yield a null value, which will result in the element being excluded from the result list. For example, given:
For example, given:
search('foo[?a<b]', {"foo": [{"a": "char", "b": "char"},
{"a": 2, "b": 1},
{"a": 1, "b": 2}]})
The three elements in the foo list are evaluated against a < b. The first element resolves to the comparison "char" < "char", and because these types are string, the expression results in null, so the first element is not included in the result list. The second element resolves to 2 < 1, which is false, so the second element is excluded from the result list. The third element resolves to 1 < 2, which evaluates to true, so the third element is included in the list. The final result of that expression is [{"a": 1, "b": 2}].
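The comparison rules above can be sketched in Python as follows. This is a minimal illustration under simplifying assumptions, not a reference implementation: the helper names (is_number, json_equal, compare, filter_list) are hypothetical, and both sides of the comparison are assumed to be plain key lookups rather than arbitrary expressions.

```python
import operator

_ORDERING = {"<": operator.lt, "<=": operator.le,
             ">": operator.gt, ">=": operator.ge}


def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def json_equal(left, right):
    """Equality by JSON type, so that true never equals the number 1."""
    if isinstance(left, bool) or isinstance(right, bool):
        return isinstance(left, bool) and isinstance(right, bool) and left == right
    if isinstance(left, list) and isinstance(right, list):
        return len(left) == len(right) and all(
            json_equal(a, b) for a, b in zip(left, right))
    if isinstance(left, dict) and isinstance(right, dict):
        return left.keys() == right.keys() and all(
            json_equal(left[key], right[key]) for key in left)
    if is_number(left) and is_number(right):
        return left == right
    # Strings and null: same type and same value.
    return type(left) is type(right) and left == right


def compare(left, comparator, right):
    """Return True, False, or None (null) for an invalid ordering comparison."""
    if comparator == "==":
        return json_equal(left, right)
    if comparator == "!=":
        return not json_equal(left, right)
    if is_number(left) and is_number(right):
        return _ORDERING[comparator](left, right)
    return None  # ordering operators are only valid for numbers


def filter_list(elements, left_key, comparator, right_key):
    # Keep only elements whose comparison is exactly True; null and false
    # both exclude the element.
    return [element for element in elements
            if isinstance(element, dict)
            and compare(element.get(left_key), comparator,
                        element.get(right_key)) is True]


result = filter_list([{"a": "char", "b": "char"},
                      {"a": 2, "b": 1},
                      {"a": 1, "b": 2}], "a", "<", "b")
print(result)  # [{'a': 1, 'b': 2}]
```

Returning None rather than False for the string comparison mirrors the null result described above; the filter treats both the same way, but keeping them distinct makes the rule visible.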
When a filter expression is matched, the matched element in its entirety is included in the filtered response.
Using the previous example, given the following data:
{"foo": [{"state": "WA", "value": 1},
{"state": "WA", "value": 2},
{"state": "CA", "value": 3},
{"state": "CA", "value": 4}]}
The expression foo[?state == `WA`] will return the following value:
[{"state": "WA", "value": 1},
{"state": "WA", "value": 2}]
This JEP also adds literal expressions. A literal expression is essentially a JSON value surrounded by the “`” character. You can escape the “`” character via “\`”, and if the character “`” appears in the JSON value, it must also be escaped. A simple two pass algorithm in the lexer could first process any escaped “`” characters before handing the resulting string to a JSON parser.
Because string literals are by far the most common type of JSON value, an alternate syntax is supported where the starting and ending double quotes are not required for strings. For example:
`foobar` -> "foobar"
`"foobar"` -> "foobar"
`123` -> 123
`"123"` -> "123"
`123.foo` -> "123.foo"
`true` -> true
`"true"` -> "true"
`truee` -> "truee"
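The two pass approach and the unquoted string shorthand described above could be sketched in Python roughly as follows. This is one possible reading, not the method any particular implementation uses: the function name parse_literal is hypothetical, and the fallback (retrying the text wrapped in double quotes when it is not valid JSON) is an assumption about how the shorthand might be handled.

```python
import json


def _reject_constant(name):
    # Python's json module accepts NaN and Infinity, which are not valid JSON.
    raise ValueError("not a JSON value: %s" % name)


def parse_literal(token):
    """Parse a literal token, including its surrounding backticks."""
    if len(token) < 2 or token[0] != "`" or token[-1] != "`":
        raise ValueError("literal must be delimited by backticks")
    # First pass: turn escaped backticks (\`) into plain backticks.
    body = token[1:-1].replace("\\`", "`")
    # Second pass: hand the result to a JSON parser.
    try:
        return json.loads(body, parse_constant=_reject_constant)
    except ValueError:
        pass
    # Not valid JSON: treat the text as a string whose quotes were omitted.
    try:
        return json.loads('"%s"' % body)
    except ValueError:
        raise ValueError("invalid literal: %r" % token)


print(parse_literal("`123`"))      # 123
print(parse_literal("`123.foo`"))  # 123.foo
print(parse_literal("`truee`"))    # truee
```

Trying the JSON parser first is what makes `true` a boolean while `truee` falls through to a string.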
Literal expressions aren’t allowed on the right hand side of a subexpression:
foo[*].`literal`
but they are allowed on the left hand side:
`{"foo": "bar"}`.foo
They may also be included in other expressions outside of a filter expression. For example:
{value: foo.bar, type: `multi-select-hash`}
The proposed filter expression syntax was chosen to provide sufficient expressive power for any type of filter one might need to perform, while at the same time being as minimal as possible. To help illustrate this, below are a few alternate syntaxes that were considered.
In the simplest case where one might filter a key based on a literal value, a possible filter syntax would be:
foo[bar == baz]
or in general terms: [identifier comparator literal-value]. However this has several issues:
- It is not possible to filter based on two expressions (for example, get all elements whose foo key equals its bar key).
- The literal value is on the right hand side, making it hard to troubleshoot if the identifier and literal value are swapped: foo[baz == bar].
- Without some identifying token, unary filters would not be possible as they would be ambiguous. Is the expression [foo] filtering all elements with a foo key with a truth value, or is it a multiselect-list selecting the foo key from each hash? Starting a filter expression with a token such as [? makes it clear that this is a filter expression.
- This makes the syntax for filtering against literal JSON arrays and objects hard to visually parse. For example, “filter all elements whose foo key is a list with a single integer value of 2” would be written [foo == [2]].
- Adding literal expressions makes them useful even outside of a filter expression. For example, in a multi-select-hash, you can create arbitrary key value pairs: {a: foo.bar, b: `some string`}.
This JEP is purposefully minimal. There are several extensions that can be added in future:
- Support any arbitrary expression within the [? ... ]. This would enable constructs such as or expressions within a filter. This would allow unary expressions.
  In order for this to be useful we need to define what corresponds to true and false values, e.g. an empty list is a false value. Additionally, “or expressions” would need to change their semantics to branch based on the true/false value of an expression instead of whether or not the expression evaluates to null.
  This is certainly a direction to take in the future. Adding arbitrary expressions in a filter would be a backwards compatible change, so it’s not part of this JEP.
- Allow filter expressions as top level expressions. This would potentially just return true/false for any value that it matched.
  This might be useful if you can combine this with something that can accept a list to use as a mask for filtering other elements.