SQL/JSON is here! (kinda “Waiting for Pg 17”)

Amazing. Awesome. Well, but what is it? We have been able to store json data in Pg since PostgreSQL 9.2 – so for over 12 years now. How is the new shiny thing different? What does it allow you to do?

Let's see if I can shed some light on it…

For starters: SQL/JSON is a standard. As in: written down by people not bound by “only for PostgreSQL” rules. This means that whatever is here will (eventually? hopefully?) work in other databases too, once they implement it.

We actually had parts of it in PostgreSQL for some time: the JSON datatype, jsonpath expressions. But now, we get MORE.

Up to this moment, we (generally speaking) had the json and jsonb datatypes, and some functions and operators (including ones that used the JSONPATH datatype). None of this goes away. But we get more stuff. Some of it is kinda redundant with what we could do previously, but it is there now, in this new way, because it is part of the standard. Some of it wasn't possible earlier or, at the very least, wasn't simple.

To do any kind of work on json, we need to, well, have a json value. This means we need some constructors.

Of course, we could have done it previously, using things like:

=$ SELECT '"test"'::json;

but SQL/JSON has a bunch of its own constructors. Let's see how to use them.

Constructors

JSON

The simplest possible thing. Takes something that kinda looks like json, and returns json. Example usage:

=$ SELECT JSON(123::text);
 json 
------
 123
(1 ROW)
 
=$ SELECT JSON( '"depesz"' );
   json   
----------
 "depesz"
(1 ROW)

Basically a simple ::json cast. It will work only if the input is of text, bytea, json, or jsonb datatype. Otherwise, you will get an error:

=$ SELECT JSON( 123 );
ERROR:  cannot CAST TYPE INTEGER TO json
LINE 1: SELECT JSON( 123 );
                     ^

There is one extra bit of functionality: you can make the JSON() call validate that your object doesn't have duplicate keys.

Consider the JSON value {"a": 123, "a": 256}. Depending on how you cast it (to json, or jsonb), you get different things:

=$ SELECT '{"a": 123, "a": 256}'::json;
         json         
----------------------
 {"a": 123, "a": 256}
(1 ROW)
 
=$ SELECT '{"a": 123, "a": 256}'::jsonb;
   jsonb    
------------
 {"a": 256}
(1 ROW)

A plain JSON() call on this value will produce the same output as a cast to json:

=$ SELECT JSON( '{"a": 123, "a": 256}' );
         json         
----------------------
 {"a": 123, "a": 256}
(1 ROW)

but I can make it reject such (invalid?) objects, instead of silently modifying them:

=$ SELECT JSON( '{"a": 123, "a": 256}' WITH UNIQUE KEYS );
ERROR:  duplicate JSON object KEY VALUE

JSON_SCALAR

A function that generates a properly quoted JSON scalar value. For some values – numbers and booleans – it will be unquoted, but for everything else, you will get quotes:

=$ SELECT
    JSON_SCALAR(1),
    JSON_SCALAR(-12.34),
    JSON_SCALAR(TRUE),
    JSON_SCALAR('depesz'),
    JSON_SCALAR(now()),
    JSON_SCALAR(NULL)\gx
-[ RECORD 1 ]-----------------------------------
json_scalar | 1
json_scalar | -12.34
json_scalar | TRUE
json_scalar | "depesz"
json_scalar | "2024-10-11T14:11:44.636294+02:00"
json_scalar |

Please note that the last value is SQL null, not a JSON null value.
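
If the distinction seems fuzzy: JSON('null') produces a json value containing null, while JSON_SCALAR(NULL) produces plain SQL NULL. A quick check (results as I understand them):

=$ SELECT
    JSON_SCALAR( NULL ) IS NULL AS scalar_is_sql_null,
    JSON( 'null' ) IS NULL AS json_null_is_sql_null;
 scalar_is_sql_null | json_null_is_sql_null 
--------------------+-----------------------
 t                  | f
(1 ROW)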

JSON_SERIALIZE

To be honest, it isn't really clear to me what this one is for.

Docs say:

Converts an SQL/JSON expression into a character or binary string. The expression can be of any JSON type, any character string type, or bytea in UTF8 encoding. The returned type used in RETURNING can be any character string type or bytea. The default is text.

While I kinda understand what it says, and I understand that I can use it, for some reason, to generate BYTEA, I don't really understand the functionality.

Just for completeness:

=$ SELECT
    JSON_SERIALIZE( JSON_SCALAR( now() ) ),
    JSON_SERIALIZE( JSON_SCALAR( now() ) RETURNING BYTEA ) \gx
-[ RECORD 1 ]--+-----------------------------------------------------------------------
json_serialize | "2024-10-11T14:18:24.670959+02:00"
json_serialize | \x22323032342d31302d31315431343a31383a32342e3637303935392b30323a303022
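
One thing that might help: the bytea is simply the UTF8 bytes of the serialized text, so (assuming a UTF8 database) you can round-trip it with convert_from():

=$ SELECT convert_from( JSON_SERIALIZE( JSON_SCALAR( 'depesz' ) RETURNING BYTEA ), 'UTF8' );
 convert_from 
--------------
 "depesz"
(1 ROW)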

It might be worth noting that while JSON_SCALAR (and other constructors) return JSON datatype, JSON_SERIALIZE produces text (or bytea):

=$ SELECT
    JSON_SCALAR( now() ),
    JSON_SERIALIZE( JSON_SCALAR( now() ) ),
    JSON_SERIALIZE( JSON_SCALAR( now() ) RETURNING BYTEA ) \gdesc
     COLUMN     | TYPE  
----------------+-------
 json_scalar    | json
 json_serialize | text
 json_serialize | bytea
(3 ROWS)

JSON_ARRAY

This thing has two modes of operation. In the first, it gets a list of values:

=$ SELECT json_array( 'depesz', TRUE, now() );
                      json_array                      
------------------------------------------------------
 ["depesz", true, "2024-10-11T14:21:18.997901+02:00"]
(1 ROW)

In the second, it takes a query:

=$ SELECT json_array( SELECT datname FROM pg_database ORDER BY datname );
                                             json_array                                             
----------------------------------------------------------------------------------------------------
 ["coll", "depesz", "depesz_explain", "pgdba", "postgres", "q", "template0", "template1", "x", "z"]
(1 ROW)

In the mode where you give it a list of values, you can specify what to do with null values:

=$ SELECT json_array( 'depesz', NULL, 'test' );
     json_array     
--------------------
 ["depesz", "test"]
(1 ROW)
 
=$ SELECT json_array( 'depesz', NULL, 'test' ABSENT ON NULL );
     json_array     
--------------------
 ["depesz", "test"]
(1 ROW)
 
=$ SELECT json_array( 'depesz', NULL, 'test' NULL ON NULL );
        json_array        
--------------------------
 ["depesz", NULL, "test"]
(1 ROW)

The query-based form always uses ABSENT ON NULL, and this can't be changed.

Additionally, you can set the returned datatype, just like with JSON_SERIALIZE(), and it works in both modes:

=$ SELECT
    json_array( 1, 2, 3 ),
    json_array( 1, 2, 3 returning jsonb ),
    json_array( SELECT datname FROM pg_database ORDER BY datname ),
    json_array( SELECT datname FROM pg_database ORDER BY datname returning jsonb ) \gdesc
   COLUMN   | TYPE  
------------+-------
 json_array | json
 json_array | jsonb
 json_array | json
 json_array | jsonb
(4 ROWS)

JSON_ARRAYAGG

Basically an aggregate function that produces json arrays. Usage is pretty simple:

=$ SELECT JSON_ARRAYAGG( datname ORDER BY LENGTH(datname) DESC, datname ) FROM pg_database;
                                           json_arrayagg                                            
----------------------------------------------------------------------------------------------------
 ["depesz_explain", "template0", "template1", "postgres", "depesz", "pgdba", "coll", "q", "x", "z"]
(1 ROW)

Just like with any aggregate you can add an ORDER BY clause, but because it's a JSON thing you also get the NULL/ABSENT ON NULL and RETURNING modifiers:

=$ WITH src AS ( SELECT unnest(array['a', NULL, 'c' ]) AS f )
    SELECT
        json_arrayagg(f),
        pg_typeof( json_arrayagg(f) ),
        json_arrayagg(f NULL ON NULL),
        pg_typeof( json_arrayagg(f NULL ON NULL) ),
        json_arrayagg(f NULL ON NULL returning jsonb),
        pg_typeof( json_arrayagg(f NULL ON NULL returning jsonb) )
        FROM src \gx
-[ RECORD 1 ]-+-----------------
json_arrayagg | ["a", "c"]
pg_typeof     | json
json_arrayagg | ["a", NULL, "c"]
pg_typeof     | json
json_arrayagg | ["a", NULL, "c"]
pg_typeof     | jsonb

JSON_OBJECT

This is going to be fun. We already had json_object() functions: one operating on an array of texts, and the other taking two arrays – one for keys, one for values.

This new json_object kinda looks like a function call, but is a different thing. So, explaining the difference on IRC/Slack/Discord will be fun /s.

Anyway. Basic usage looks like this:

=$ SELECT json_object(
    'a' VALUE 123,
    'b': TRUE,
    'c' VALUE now(),
    'd': json_array( 1, 2, 'YEAH' )
);
                                       json_object                                       
-----------------------------------------------------------------------------------------
 {"a" : 123, "b" : TRUE, "c" : "2024-10-11T14:42:34.894498+02:00", "d" : [1, 2, "YEAH"]}
(1 ROW)

Depending on how you feel, you can separate keys from values using the keyword VALUE, or using a colon (:). Both work.

Just like in previous constructors, we can use the NULL ON NULL or ABSENT ON NULL clauses (null handling applies to values, not keys – NULL keys are invalid in any case).

Also, as usual, we have RETURNING clause that allows us to return JSONB.

Just like with JSON(), we can use WITH UNIQUE KEYS which will error out if any key is duplicated.
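
A quick sketch of all three together (output as I'd expect it, based on the earlier examples):

=$ SELECT
    json_object( 'a': NULL, 'b': 1 ABSENT ON NULL ),
    json_object( 'a': NULL, 'b': 1 NULL ON NULL RETURNING jsonb );
 json_object |     json_object     
-------------+---------------------
 {"b" : 1}   | {"a": null, "b": 1}
(1 ROW)
 
=$ SELECT json_object( 'a': 1, 'a': 2 WITH UNIQUE KEYS );
ERROR:  duplicate JSON object KEY VALUE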

JSON_OBJECTAGG

Similarly to JSON_ARRAYAGG, it is basically an aggregate:

=$ SELECT JSON_OBJECTAGG( datname VALUE oid RETURNING JSONB ) FROM pg_database;
                                                                                     json_objectagg                                                                                     
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 {"q": "530457", "x": "530456", "z": "530455", "coll": "529419", "pgdba": "530454", "depesz": "529711", "postgres": "5", "template0": "4", "template1": "1", "depesz_explain": "16414"}
(1 ROW)

Of course we can use NULL ON NULL or ABSENT ON NULL, and WITH/WITHOUT UNIQUE KEYS specifiers.

Which is actually interesting – what will happen if I have the same key MANY times, don't require unique keys, and use the json/jsonb returning types?

=$ WITH s(k,v) AS ( VALUES ('a', 1), ('a', 2), ('b', 3 ) )
SELECT
    JSON_OBJECTAGG( k VALUE v returning json ),
    JSON_OBJECTAGG( k VALUE v returning jsonb )
FROM s;
        json_objectagg         |  json_objectagg  
-------------------------------+------------------
 { "a" : 1, "a" : 2, "b" : 3 } | {"a": 2, "b": 3}
(1 ROW)
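
And, as you'd expect, adding WITH UNIQUE KEYS to the aggregate turns the duplicate into an error, just like with JSON():

=$ WITH s(k,v) AS ( VALUES ('a', 1), ('a', 2), ('b', 3 ) )
SELECT JSON_OBJECTAGG( k VALUE v WITH UNIQUE KEYS ) FROM s;
ERROR:  duplicate JSON object KEY VALUE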

Sweet. And now for…

Testing functions

This is a family of expressions in the format of:

IS (NOT)? JSON _TYPE_

where _TYPE_ is one of: VALUE, SCALAR, ARRAY, OBJECT.

And additionally, you can add WITH/WITHOUT UNIQUE KEYS at the end.

In all of these cases we get a boolean value that tells us whether the given thing is a proper json thing of the requested type.

Specifying WITH UNIQUE KEYS will check the whole thing, not only the top-most level.

Couple of examples:

=$ SELECT
    'depesz' IS JSON VALUE AS depesz,
    NULL IS JSON VALUE AS NULL,
    JSON_SCALAR( now() ) IS JSON SCALAR AS null_scalar,
    JSON_SCALAR( now() ) IS JSON ARRAY AS null_array,
    JSON_SCALAR( now() ) IS JSON OBJECT AS null_object,
    JSON_ARRAY( 1,2,3 ) IS JSON ARRAY AS array_array,
    JSON_ARRAY( 1,2,3 ) IS JSON OBJECT AS array_object,
    JSON_OBJECT( 'k': 'v' ) IS JSON ARRAY AS object_array,
    JSON_OBJECT( 'k': 'v' ) IS JSON OBJECT AS object_object,
    JSON_ARRAY( 1, 2, json_object( 'k': 1, 'k': 2 ) ) IS JSON ARRAY AS non_unique_array,
    JSON_ARRAY( 1, 2, json_object( 'k': 1, 'k': 2 ) ) IS JSON ARRAY WITH UNIQUE KEYS AS non_unique_array_with_unique
\gx
-[ RECORD 1 ]----------------+--
depesz                       | f
NULL                         | 
null_scalar                  | t
null_array                   | f
null_object                  | f
array_array                  | t
array_object                 | f
object_array                 | f
object_object                | t
non_unique_array             | t
non_unique_array_with_unique | f

Query functions

These are so interesting/complicated that they got their own section in the manual. Let's see if I can explain them. A bit.

Before we can dig into them, one should know, and understand, at least to some extent, so-called SQL/JSON path expressions.

These have been available for quite some time now, and could be used in all the functions named jsonb_path_SOMETHING that you can find in the docs.

If you're not familiar with them – please read the blogpost linked above, and/or the docs, or just play a bit with it.

JSON_EXISTS

So this is basic testing whether a given path expression matches anything in the JSON. Basically like the jsonb_path_exists() function.

So, let's see it:

=$ SELECT
    JSON_EXISTS( j, '$.k' ),
    JSON_EXISTS( j, '$.f' ),
    JSON_EXISTS( j, '$.a[5]' )
FROM
    ( VALUES ( json_object( 'k': 'v', 'a': json_array( 1, 2, 5 ) ) ) ) AS s (j);
 json_exists | json_exists | json_exists 
-------------+-------------+-------------
 t           | f           | f
(1 ROW)

Please note that in the third example, I checked existence of the 5th index of the .a array. This returned false, rather than erroring, because by default it's using “lax” processing. I could instead:

=$ SELECT
    JSON_EXISTS( j, 'strict $.a[5]' ERROR ON ERROR )
FROM
    ( VALUES ( json_object( 'k': 'v', 'a': json_array( 1, 2, 5 ) ) ) ) AS s (j);
ERROR:  jsonpath array subscript IS OUT OF bounds

Strict processing also allows us to set other handlers for errors:

=$ SELECT
    JSON_EXISTS( j, 'strict $.a[5]' TRUE ON ERROR ),
    JSON_EXISTS( j, 'strict $.a[5]' FALSE ON ERROR ),
    JSON_EXISTS( j, 'strict $.a[5]' UNKNOWN ON ERROR ),
    JSON_EXISTS( j, 'lax $.a[5]' TRUE ON ERROR ),
    JSON_EXISTS( j, 'lax $.a[5]' FALSE ON ERROR ),
    JSON_EXISTS( j, 'lax $.a[5]' UNKNOWN ON ERROR )
FROM
    ( VALUES ( json_object( 'k': 'v', 'a': json_array( 1, 2, 5 ) ) ) ) AS s (j);
 json_exists | json_exists | json_exists | json_exists | json_exists | json_exists 
-------------+-------------+-------------+-------------+-------------+-------------
 t           | f           |             | f           | f           | f
(1 ROW)

Please note that in lax mode, the * ON ERROR clause doesn't do anything – the out-of-bounds access simply isn't an error there.

And of course, you can pass variables to path expressions, like this:

=$ SELECT
    JSON_EXISTS( j, '$.a[*] ? ( @ >= $good )' PASSING 5 AS good ),
    JSON_EXISTS( j, '$.a[*] ? ( @ >= $good )' PASSING 10 AS good )
FROM
    ( VALUES ( json_object( 'k': 'v', 'a': json_array( 1, 2, 5 ) ) ) ) AS s (j);
 json_exists | json_exists 
-------------+-------------
 t           | f
(1 ROW)

JSON_QUERY

Now, we're entering more complicated features.

To give you some perspective, syntax definition for json_query looks like:

JSON_QUERY (
context_item, path_expression
[ PASSING { VALUE AS varname } [, ...]]
[ RETURNING data_type [ FORMAT JSON [ ENCODING UTF8 ] ] ]
[ { WITHOUT | WITH { CONDITIONAL | [UNCONDITIONAL] } } [ ARRAY ] WRAPPER ]
[ { KEEP | OMIT } QUOTES [ ON SCALAR STRING ] ]
[ { ERROR | NULL | EMPTY { [ ARRAY ] | OBJECT } | DEFAULT expression } ON EMPTY ]
[ { ERROR | NULL | EMPTY { [ ARRAY ] | OBJECT } | DEFAULT expression } ON ERROR ]) → jsonb

So. What it does. It takes a jsonpath (second argument), and applies it to the given data (first argument). Then it returns whatever the path expression returned.

Simple to explain, but there are some interesting details.

Let's start with something simple to see if I can show all the things I'd like to show.

Since passing all the data to each query would be rather tedious, let's make a simple table with one column, and one row, containing test json:

=$ CREATE TABLE t (j jsonb);
=$ INSERT INTO t (j) VALUES ('{"k": "v", "a": [ 1,2,5,15], "n": { "s": "x", "b": [ 5, 10, 15 ] } }' );
=$ SELECT jsonb_pretty(j) FROM t;
   jsonb_pretty   
------------------
 {               +
     "a": [      +
         1,      +
         2,      +
         5,      +
         15      +
     ],          +
     "k": "v",   +
     "n": {      +
         "b": [  +
             5,  +
             10, +
             15  +
         ],      +
         "s": "x"+
     }           +
 }
(1 ROW)

With this in place, we can try some json_query calls.

=$ WITH d AS (
    SELECT json_query( j, '$.k' ) AS r
    FROM t
)
SELECT pg_typeof(r), r FROM d;
 pg_typeof |  r  
-----------+-----
 jsonb     | "v"
(1 ROW)

Here you can see that all I did was get the value of the k key from the object. Simple. I could also get it as normal text:

=$ WITH d AS (
    SELECT json_query( j, '$.k' returning text omit quotes ) AS r
    FROM t
)
SELECT pg_typeof(r), r FROM d;
 pg_typeof | r 
-----------+---
 text      | v
(1 ROW)

Please note that I had to use omit quotes, as if I didn't:

=$ WITH d AS (
    SELECT json_query( j, '$.k' returning text ) AS r
    FROM t
)
SELECT pg_typeof(r), r FROM d;
 pg_typeof |  r  
-----------+-----
 text      | "v"
(1 ROW)

I'd still get text, but instead of getting v I'd get 3 characters: “v”. Which usually isn't what one would want.

Now. Let's assume you'd want to get elements from .a, but only the ones greater than one. Trivial, it should be:

=$ WITH d AS (
    SELECT json_query( j, '$.a[*] ? ( @ > 1 )' ) AS r
    FROM t
)
SELECT pg_typeof(r), r FROM d;
 pg_typeof | r 
-----------+---
 jsonb     | 
(1 ROW)

Whoa. Why is r empty? The problem is that this expression returns multiple values, and that doesn't really work well – a single json value can't be 2, and 5, and 15 at once; it can contain an array of these things, though. This is where wrappers come in handy:

=$ WITH d AS (
    SELECT json_query( j, 'lax $.a[*] ? ( @ > 1 )' WITH array wrapper ) AS r
    FROM t
)
SELECT pg_typeof(r), r FROM d;
 pg_typeof |     r      
-----------+------------
 jsonb     | [2, 5, 15]
(1 ROW)
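
The syntax above also allows a CONDITIONAL wrapper. As far as I can tell, it wraps the result unless it is already a single object or array, while UNCONDITIONAL (the default for WITH WRAPPER) wraps always. A sketch, using the .n object from our test row:

=$ WITH d AS (
    SELECT
        json_query( j, '$.n' WITH CONDITIONAL WRAPPER ) AS cond,
        json_query( j, '$.n' WITH UNCONDITIONAL WRAPPER ) AS uncond
    FROM t
)
SELECT * FROM d;
             cond             |             uncond             
------------------------------+--------------------------------
 {"b": [5, 10, 15], "s": "x"} | [{"b": [5, 10, 15], "s": "x"}]
(1 ROW)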

There is also a way to specify a default value to be returned, in case json_query didn't match anything, or had an error.

In case of error, the clause is X ON ERROR, where X can be one of:

  • ERROR – will raise exception
  • NULL
  • EMPTY ARRAY
  • EMPTY OBJECT
  • DEFAULT 'whatever'

This works the same way as the error handling I described above for JSON_EXISTS, so let's focus on handling empty results.

For example, what will happen if I'd like to get all values from .a that are above 50?

=$ WITH d AS (
    SELECT json_query( j, 'lax $.a[*] ? ( @ > 50 )' WITH array wrapper ) AS r
    FROM t
)
SELECT pg_typeof(r), r FROM d;
 pg_typeof | r 
-----------+---
 jsonb     | 
(1 ROW)

It's null. But I can make it into something else using the magical X ON EMPTY clause:

=$ WITH d AS (
    SELECT json_query( j, 'lax $.a[*] ? ( @ > 50 )' WITH array wrapper DEFAULT '["nothing", "found"]' ON EMPTY) AS r
    FROM t
)
SELECT pg_typeof(r), r FROM d;
 pg_typeof |          r           
-----------+----------------------
 jsonb     | ["nothing", "found"]
(1 ROW)
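
The other ON EMPTY variants work the same way; for example, EMPTY ARRAY ON EMPTY should get you, well, an empty array:

=$ WITH d AS (
    SELECT json_query( j, 'lax $.a[*] ? ( @ > 50 )' WITH array wrapper EMPTY ARRAY ON EMPTY ) AS r
    FROM t
)
SELECT pg_typeof(r), r FROM d;
 pg_typeof | r  
-----------+----
 jsonb     | []
(1 ROW)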

JSON_VALUE

This is very much like JSON_QUERY, with one simple exception – you can't get multiple values out of it. As such, there is no WITH … WRAPPER clause. Also, KEEP/OMIT QUOTES is gone.

Generally this function makes sense if/when you are sure that you will be returning a single scalar value. There is also one other big difference: the default return datatype is TEXT:

=$ WITH d AS (
    SELECT json_value( j, '$.k' ) AS r
    FROM t
)
SELECT pg_typeof(r), r FROM d;
 pg_typeof | r 
-----------+---
 text      | v
(1 ROW)

Of course you can use the RETURNING clause to turn it into whatever you want 🙂

=$ WITH d AS (
    SELECT json_value( j, '$.a[2]' returning int8 ) AS r
    FROM t
)
SELECT pg_typeof(r), r FROM d;
 pg_typeof | r 
-----------+---
 BIGINT    | 5
(1 ROW)
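
One more thing worth showing: since JSON_VALUE can't return multiple values, pointing it at a path that yields more than one item is an error, which you can soften with the familiar ON ERROR clause. A sketch, using the DEFAULT variant:

=$ WITH d AS (
    SELECT json_value( j, '$.a[*]' DEFAULT '?' ON ERROR ) AS r
    FROM t
)
SELECT pg_typeof(r), r FROM d;
 pg_typeof | r 
-----------+---
 text      | ?
(1 ROW)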

JSON_TABLE

This is the BIG thing.

This one function has its own section in the docs.

I checked, and it seems that everything I wrote previously still works/applies. So let's get the description from there (with small changes to reflect the current state of my understanding).

Let's start with a simple example:

=$ SELECT * FROM json_table(
    '[{"a":10,"b":20},{"a":30,"b":40}]'::jsonb,
    '$[*]'
    COLUMNS (
        column_a int4 path '$.a',
        column_b int4 path '$.b'
    )
);
 column_a | column_b 
----------+----------
       10 |       20
       30 |       40
(2 ROWS)

I assume the example is easy to understand.

Now, let's assume we want to add a serial-like column, named id. Also, I'll show two more tricks:

=$ SELECT * FROM json_table(
    '[{"a":10,"b":20},{"a":30,"b":40}]'::jsonb,
    '$[*]'
    COLUMNS (
        id FOR ORDINALITY,
        column_a int4 path '$.a',
        column_b int4 path '$.b',
        a int4,
        b int4,
        c text
    )
);
 id | column_a | column_b | a  | b  | c 
----+----------+----------+----+----+---
  1 |       10 |       20 | 10 | 20 | 
  2 |       30 |       40 | 30 | 40 | 
(2 ROWS)

Adding the serial-like id worked. And please note that I also added columns a and b – without specifying their paths. If the path is simply '$.SOMETHING' and you want the column to be named SOMETHING, you don't need to explicitly state the path.

If there is no such field in the json, it will be returned as null.
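
If null isn't what you want, columns accept the same ON EMPTY / ON ERROR handling as json_query. A sketch, with a key (c) that exists in neither object:

=$ SELECT * FROM json_table(
    '[{"a":10,"b":20},{"a":30,"b":40}]'::jsonb,
    '$[*]'
    COLUMNS (
        a int4,
        c text DEFAULT 'n/a' ON EMPTY
    )
);
 a  |  c  
----+-----
 10 | n/a
 30 | n/a
(2 ROWS)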

Now, pretty commonly one has nested structures. For example, we could have json like this:

=$ SELECT jsonb_pretty(j) FROM sample;
              jsonb_pretty               
-----------------------------------------
 [                                      +
     {                                  +
         "title": "first post",         +
         "author": "depesz",            +
         "comments": [                  +
             {                          +
                 "body": "comment #1",  +
                 "author": "hubert"     +
             },                         +
             {                          +
                 "body": "comment #3",  +
                 "author": "lubaczewski"+
             },                         +
             {                          +
                 "body": "comment #5",  +
                 "author": "someone"    +
             }                          +
         ]                              +
     },                                 +
     {                                  +
         "title": "second post",        +
         "author": "depesz",            +
         "comments": [                  +
             {                          +
                 "body": "comment #2",  +
                 "author": "depesz"     +
             },                         +
             {                          +
                 "body": "comment #6",  +
                 "author": "anyone"     +
             }                          +
         ]                              +
     },                                 +
     {                                  +
         "title": "third post",         +
         "author": "someone else",      +
         "comments": [                  +
             {                          +
                 "body": "comment #4",  +
                 "author": "whoever"    +
             }                          +
         ]                              +
     }                                  +
 ]
(1 ROW)

Single row, nested structure.

With json_table I can:

=$ SELECT jt.* FROM sample,
    lateral json_table(
    j,
    '$[*]'
    COLUMNS (
        id FOR ORDINALITY,
        author text,
        title text,
        NESTED PATH '$.comments[*]'
        COLUMNS (
            comment_author text PATH '$.author',
            comment_body text PATH '$.body'
        )
    )
) AS jt;
 id |    author    |    title    | comment_author | comment_body 
----+--------------+-------------+----------------+--------------
  1 | depesz       | FIRST post  | hubert         | comment #1
  1 | depesz       | FIRST post  | lubaczewski    | comment #3
  1 | depesz       | FIRST post  | someone        | comment #5
  2 | depesz       | SECOND post | depesz         | comment #2
  2 | depesz       | SECOND post | anyone         | comment #6
  3 | someone ELSE | third post  | whoever        | comment #4
(6 ROWS)

There are, as previously, many additional options:

  • DEFAULT clauses for fields
  • EXISTS clauses that return a boolean (or int) depending on whether the given jsonpath matches anything in the given place (example below)
  • FORMAT specifier (json/jsonb) that allows extracting a value from json, as json, even if the column datatype is, for example, text
  • WITH/WITHOUT WRAPPER – working just like in json_query
  • KEEP/OMIT QUOTES
  • error handling, like the X ON ERROR in json_query
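
For example, an exists-type column over the sample data from above (a sketch; the path checks whether a post has a third comment):

=$ SELECT jt.* FROM sample,
    lateral json_table(
    j,
    '$[*]'
    COLUMNS (
        title text,
        has_3rd_comment boolean EXISTS PATH '$.comments[2]'
    )
) AS jt;
    title    | has_3rd_comment 
-------------+-----------------
 first post  | t
 second post | f
 third post  | f
(3 ROWS)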

And that kinda concludes the (long overdue) description. There are certainly aspects that I didn't cover in enough detail, but if you plan on using any of these, I think that reading the official documentation should be on your “todo” list anyway 🙂 And lots of experimentation.

Anyway – thanks a lot to all involved. There have been many patches over many years, authors of code, authors of docs, reviewers, testers. It's an amazing feature, and while I'm not really a fan of using json blobs in databases, I can see that in some cases it makes sense, and having all these features is definitely great.
