JSON Path Operator Confusion
The CipherDoc service offers a robust secondary key lookup API and search interface powered by JSON/SQL Path queries run against a GIN-indexed JSONB column. SQL/JSON Path, introduced in SQL:2016 and added to Postgres in version 12 in 2019, nicely enables an end-to-end JSON workflow and entity lifecycle. It’s a powerful enabler and fundamental technology underpinning CipherDoc. I’m so happy to have found it.
Confusion
However, the distinction between the SQL/JSON Path operators @@
and @?
confused me. Even as I found that the @?
operator worked for my needs and @@
did not, I tucked the problem into my mental backlog for later study.
The question arose again on a recent work project, and I can take a hint. It’s time to figure this thing out. Let’s see where it goes.
The docs say:
jsonb @? jsonpath → boolean
- Does JSON path return any item for the specified JSON value?
'{"a":[1,2,3,4,5]}'::jsonb @? '$.a[*] ? (@ > 2)' → t
jsonb @@ jsonpath → boolean
- Returns the result of a JSON path predicate check for the specified JSON value. Only the first item of the result is taken into account. If the result is not Boolean, then
NULL
is returned.
'{"a":[1,2,3,4,5]}'::jsonb @@ '$.a[*] > 2' → t
These read quite similarly to me: Both return true if the path query returns an
item. So what’s the difference? When should I use @@
and when @?
? I went so
far as to ask Stack Overflow about it. The one answer directed my attention
back to the jsonb_path_query()
function, which returns the results from a
path query.
So let’s explore how various SQL/JSON Path queries work, what values various expressions return.
Queries
The docs for jsonb_path_query
say:1
jsonb_path_query ( target jsonb, path jsonpath [, vars jsonb [, silent boolean ]] ) → setof jsonb
- Returns all JSON items returned by the JSON path for the specified JSON value. If the
vars
argument is specified, it must be a JSON object, and its fields provide named values to be substituted into the jsonpath expression. If thesilent
argument is specified and is true, the function suppresses the same errors as the@?
and@@
operators do.select * from jsonb_path_query( '{"a":[1,2,3,4,5]}', '$.a[*] ? (@ >= $min && @ <= $max)', '{"min":2, "max":4}' ) → jsonb_path_query ------------------ 2 3 4
The first thing to note is that a SQL/JSON Path query may return more than one
value. This feature matters for the @@
and @?
operators, which return a
single boolean value based on the values returned by a path query. And path queries
can return a huge variety of values. Let’s explore some examples, derived from
the sample JSON value and path query from the docs.2
select jsonb_path_query('{"a":[1,2,3,4,5]}', '$ ?(@.a[*] > 2)');
jsonb_path_query
------------------------
{"a": [1, 2, 3, 4, 5]}
(1 row)
This query returns the entire JSON value, because that’s what $
selects at the
start of the path expression. The ?()
filter returns true because its
predicate expression finds at least one value in the $.a
array greater than
2
. Here’s what happens when the filter returns false:
select jsonb_path_query('{"a":[1,2,3,4,5]}', '$ ?(@.a[*] > 5)');
jsonb_path_query
------------------
(0 rows)
None of the values in the $.a
array are greater than five, so the query
returns no value.
To select just the array, append it to the path expression after the ?()
filter:
select jsonb_path_query('{"a":[1,2,3,4,5]}', '$ ?(@.a[*] > 2).a');
jsonb_path_query
------------------
[1, 2, 3, 4, 5]
(1 row)
Path Modes
One might think you could select $.a
at the start of the path query to get the
full array if the filter returns true, but look what happens:
select jsonb_path_query('{"a":[1,2,3,4,5]}', '$.a ?(@[*] > 2)');
jsonb_path_query
------------------
3
4
5
(3 rows)
That’s not the array, but the individual array values that each match the predicate. Turns out this is a quirk of the Postgres implementation of path modes. From what I can glean, the SQL:2016 standard dictates something like these SQL Server descriptions:
- In lax mode, the function returns empty values if the path expression contains an error. For example, if you request the value $.name, and the JSON text doesn’t contain a name key, the function returns null, but does not raise an error.
- In strict mode, the function raises an error if the path expression contains an error.
But the Postgres lax mode does more than suppress errors. From the docs (emphasis added):
The lax mode facilitates matching of a JSON document structure and path expression if the JSON data does not conform to the expected schema. If an operand does not match the requirements of a particular operation, it can be automatically wrapped as an SQL/JSON array or unwrapped by converting its elements into an SQL/JSON sequence before performing this operation. Besides, comparison operators automatically unwrap their operands in the lax mode, so you can compare SQL/JSON arrays out-of-the-box.
There are a few more details, but this is the crux of it: In lax mode, which is the default, Postgres always unwraps an array. Hence the unexpected list of results.3 This could be particularly confusing when querying multiple rows:
select jsonb_path_query(v, '$.a ?(@[*] > 2)')
from (values ('{"a":[1,2,3,4,5]}'::jsonb), ('{"a":[3,5,8]}')) x(v);
jsonb_path_query
------------------
3
4
5
3
5
8
(6 rows)
Switching to strict mode by preprending strict
to the JSON Path query restores
the expected behavior:
select jsonb_path_query(v, 'strict $.a ?(@[*] > 2)')
from (values ('{"a":[1,2,3,4,5]}'::jsonb), ('{"a":[3,5,8]}')) x(v);
jsonb_path_query
------------------
[1, 2, 3, 4, 5]
[3, 5, 8]
(2 rows)
Important gotcha to watch for, and a good reason to test path queries thoroughly to ensure you get the results you expect. Lax mode nicely prevents errors when a query references a path that doesn’t exist, as this simple example demonstrates:
select jsonb_path_query('{"a":[1,2,3,4,5]}', 'strict $.b');
ERROR: JSON object does not contain key "b"
select jsonb_path_query('{"a":[1,2,3,4,5]}', 'lax $.b');
jsonb_path_query
------------------
(0 rows)
In general, I suggest always using strict mode when executing queries. Better
still, perhaps always prefer strict mode with our friends the @@
and @?
operators, which suppress some errors even in strict mode:
The jsonpath operators
@?
and@@
suppress the following errors: missing object field or array element, unexpected JSON item type, datetime and numeric errors. Thejsonpath
-related functions described below can also be told to suppress these types of errors. This behavior might be helpful when searching JSON document collections of varying structure.
Have a look:
select '{"a":[1,2,3,4,5]}' @? 'strict $.a';
?column?
----------
t
(1 row)
select '{"a":[1,2,3,4,5]}' @? 'strict $.b';
?column?
----------
<null>
(1 row)
No error for the unknown JSON key b
in that second query! As for the error
suppression in the jsonpath
-related functions, that’s what the silent
argument does. Compare:
select jsonb_path_query('{"a":[1,2,3,4,5]}', 'strict $.b');
ERROR: JSON object does not contain key "b"
select jsonb_path_query('{"a":[1,2,3,4,5]}', 'strict $.b', '{}', true);
jsonb_path_query
------------------
(0 rows)
Boolean Predicates
The Postgres SQL/JSON Path Language docs briefly mention a pretty significant deviation from the SQL standard:
A path expression can be a Boolean predicate, although the SQL/JSON standard allows predicates only in filters. This is necessary for implementation of the
@@
operator. For example, the followingjsonpath
expression is valid in PostgreSQL:
$.track.segments[*].HR < 70
This pithy statement has pretty significant implications for the return value
of a path query. The SQL standard allows predicate expressions, which are akin
to an SQL WHERE
expression, only in ?()
filters, as seen previously:
select jsonb_path_query('{"a":[1,2,3,4,5]}', '$ ?(@.a[*] > 2)');
jsonb_path_query
------------------------
{"a": [1, 2, 3, 4, 5]}
(1 row)
This can be read as “return the path $
if @.a[*] > 2
is true. But have a
look at a predicate-only path query:
select jsonb_path_query('{"a":[1,2,3,4,5]}', '$.a[*] > 2');
jsonb_path_query
------------------
true
(1 row)
This path query can be read as “Return the result of the predicate $.a[*] > 2
,
which in this case is true
. This is quite the divergence from the standard,
which returns contents from the JSON queried, while a predicate query returns
the result of the predicate expression itself. It’s almost like they’re two
different things!
Don’t confuse the predicate path query return value with selecting a boolean value from the JSON. Consider this example:
select jsonb_path_query('{"a":[true,false]}', '$.a ?(@[*] == true)');
jsonb_path_query
------------------
true
(1 row)
Looks the same as the predicate-only query, right? But it’s not, as shown by
adding another true
value to the $.a
array:
select jsonb_path_query('{"a":[true,false,true]}', '$.a ?(@[*] == true)');
jsonb_path_query
------------------
true
true
(2 rows)
This path query returns the true
s it finds in the $.a
array. The fact that
it returns values from the JSON rather than the filter predicate becomes more
apparent in strict mode, which returns all of $a
if one or more elements of
the array has the value true
:
select jsonb_path_query('{"a":[true,false,true]}', 'strict $.a ?(@[*] == true)');
jsonb_path_query
---------------------
[true, false, true]
(1 row)
This brief aside, and its mention of the @@
operator, turns out to be key to
understanding the difference between @?
and @@
. Because it’s not just that
this feature is “necessary for implementation of the @@
operator”. No, I would
argue that it’s the only kind of expression usable with the @@
operator
Match vs. Exists
Let’s get back to the @@
operator. We can use a boolean predicate JSON Path
like so:
select '{"a":[1,2,3,4,5]}'::jsonb @@ '$.a[*] > 2';
?column?
----------
t
(1 row)
It returns true because the predicate JSON path query $.a[*] > 2
returns true.
And when it returns false?
select '{"a":[1,2,3,4,5]}'::jsonb @@ '$.a[*] > 6';
?column?
----------
f
(1 row)
So far so good. What happens when we try to use a filter expression that returns
a true
value selected from the JSONB?
select '{"a":[true,false]}'::jsonb @@ '$.a ?(@[*] == true)';
?column?
----------
t
(1 row)
Looks right, doesn’t it? But recall that this query returns all of the
true
values from $.@
, but @@
wants only a single boolean. What happens
when we add another?
select '{"a":[true,false,true]}'::jsonb @@ 'strict $.a ?(@[*] == true)';
?column?
----------
<null>
(1 row)
Now it returns NULL
, even though it’s clearly true that @[*] == true
matches. This is because it returns all of the values it matches, as
jsonb_path_query()
demonstrates:
select jsonb_path_query('{"a":[true,false,true]}'::jsonb, '$.a ?(@[*] == true)');
jsonb_path_query
------------------
true
true
(2 rows)
This clearly violates the @@
documentation claim that “Only the first item of
the result is taken into account”. If that were true, it would see the first
value is true
and return true. But it doesn’t. Turns out, the corresponding
jsonb_path_match()
function shows why:
select jsonb_path_match('{"a":[true,false,true]}'::jsonb, '$.a ?(@[*] == true)');
ERROR: single boolean result is expected
Conclusion: The documentation is inaccurate. Only a single boolean is expected
by @@
. Anything else is an error.
Futhermore, it’s dangerous, at best, to use an SQL standard JSON Path expression
with @@
. If you need to use it with a filter expression, you can turn it into
a boolean predicate by wrapping it in exists()
:
select jsonb_path_match('{"a":[true,false,true]}'::jsonb, 'exists($.a ?(@[*] == true))');
jsonb_path_match
------------------
t
(1 row)
But there’s no reason to do so, because that’s effectively what the @?
operator (and the corresponding, cleverly-named jsonb_path_exists()
function
does): it returns true if the SQL standard JSON Path expression contains any
results:
select '{"a":[true,false,true]}'::jsonb @? '$.a ?(@[*] == true)';
?column?
----------
t
(1 row)
Here’s the key thing about @?
: you don’t want to use a boolean predicate path
query with it, either. Consider this predicate-only query:
select jsonb_path_query('{"a":[1,2,3,4,5]}'::jsonb, '$.a[*] > 6');
jsonb_path_query
------------------
false
(1 row)
But see what happens when we use it with @?
:
select '{"a":[1,2,3,4,5]}'::jsonb @? '$.a[*] > 6';
?column?
----------
t
(1 row)
It returns true even though the query itself returns false! Why? Because false
is a value that exists and is returned by the query. Even a query that returns
null
is considered to exist, as it will when a strict query encounters an
error:
select jsonb_path_query('{"a":[1,2,3,4,5]}'::jsonb, 'strict $[*] > 6');
jsonb_path_query
------------------
null
(1 row)
select '{"a":[1,2,3,4,5]}'::jsonb @? 'strict $[*] > 6';
?column?
----------
t
(1 row)
The key thing to know about the @?
operator is that it returns true if
anything is returned by the path query, and returns false only if nothing is
selected at all.
The Difference
In summary, the difference between the @?
and @@
JSONB operators is this:
@?
(andjsonb_path_exists()
) returns true if the path query returns any values — evenfalse
ornull
— and false if it returns no values. This operator should be used only with SQL-standard JSON path queries that select data from the JSONB. Do not use predicate-only JSON path expressions with@?
.@@
(andjsonb_path_match()
) returns true if the path query returns the single boolean valuetrue
and false otherwise. This operator should be used only with Postgres-specific boolean predicate JSON path queries, that return data from the predicate expression. Do not use SQL-standard JSON path expressions with@@
.
This difference of course assumes awareness of this distinction between
predicate path queries and SQL standard path queries. To that end, I submitted
a patch that expounds the difference between these types of JSON Path
queries, and plan to submit another linking these differences in the docs for
@@
and @?
.
Oh, and probably another to explain the difference in return values between strict and lax queries due to array unwrapping.
Thanks
Many thanks to Erik Wienhold for patiently answering my pgsql-hackers
questions and linking me to a detailed pgsql-general thread in which the
oddities of @@
were previously discussed in detail.
-
Well almost. The docs for
jsonb_path_query
actually say, about the last two arguments, “The optionalvars
andsilent
arguments act the same as forjsonb_path_exists
.” I replaced that sentence with the relevant sentences from thejsonb_path_exists
docs, about which more later. ↩︎ -
Though omitting the
vars
argument, as variable interpolation just gets in the way of understanding basic query result behavior. ↩︎ -
In fairness, the Oracle docs also discuss “implicit array wrapping and unwrapping”, but I don’t have a recent Oracle server to experiment with at the moment. ↩︎