Waiting for PostgreSQL 18 – Virtual generated columns

On 7th of February 2025, Peter Eisentraut committed patch:

Virtual generated columns
 
This adds a new variant of generated columns that are computed on read
(like a view, unlike the existing stored generated columns, which are
computed on write, like a materialized view).
 
The syntax for the column definition is
 
    ... GENERATED ALWAYS AS (...) VIRTUAL
 
and VIRTUAL is also optional.  VIRTUAL is the default rather than
STORED to match various other SQL products.  (The SQL standard makes
no specification about this, but it also doesn't know about VIRTUAL or
STORED.)  (Also, virtual views are the default, rather than
materialized views.)
 
Virtual generated columns are stored in tuples as null values.  (A
very early version of this patch had the ambition to not store them at
all.  But so much stuff breaks or gets confused if you have tuples
where a column in the middle is completely missing.  This is a
compromise, and it still saves space over being forced to use stored
generated columns.  If we ever find a way to improve this, a bit of
pg_upgrade cleverness could allow for upgrades to a newer scheme.)
 
The capabilities and restrictions of virtual generated columns are
mostly the same as for stored generated columns.  In some cases, this
patch keeps virtual generated columns more restricted than they might
technically need to be, to keep the two kinds consistent.  Some of
that could maybe be relaxed later after separate careful
considerations.
 
Some functionality that is currently not supported, but could possibly
be added as incremental features, some easier than others:
 
- index on or using a virtual column
- hence also no unique constraints on virtual columns
- extended statistics on virtual columns
- foreign-key constraints on virtual columns
- not-null constraints on virtual columns (check constraints are supported)
- ALTER TABLE / DROP EXPRESSION
- virtual column cannot have domain type
- virtual columns are not supported in logical replication
 
The tests in generated_virtual.sql have been copied over from
generated_stored.sql with the keyword replaced.  This way we can make
sure the behavior is mostly aligned, and the differences can be
visible.  Some tests for currently not supported features are
currently commented out.
 
Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Tested-by: Shlok Kyal <shlok.kyal.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/a368248e-69e4-40be-9c07-6c3b5880b0a6@eisentraut.org

Long time ago we got first part of generated columns: identity columns. Then two versions later, generated, stored columns.

And now, we got generated, but not stored – virtual columns.

The idea is very simple – if you have something that can be calculated, you can have it named, available in table, without having to type the full expression to generate the value. And without storing the data.

Let's see it in action.

First, let's make a test table, with no generated columns, and see what will happen once I will add these generated ones:

=$ create table test (
    id int8 generated always as identity primary key,
    n int4
);
 
=$ insert into test (n) select random() * 200 + 100 from generate_series(1,10000) i;

This made me a table with 10k rows with some random n values:

=$ select * from test limit 5;
 id |  n  
----+-----
  1 | 187
  2 | 150
  3 | 256
  4 | 210
  5 | 237
(5 rows)

Total size of the table, as file(s) on disk is 712kB:

=$ select pg_total_relation_size('test'::regclass);
 pg_total_relation_size 
------------------------
                 729088
(1 row)

Now, let's assume we want to add text column that will have the word “PgSQL" repeated n times. If I'll add it as virtual:

=$ alter table test add column v_text text generated always as ( repeat(‘PgSQL', n) ) virtual;
ALTER TABLE

Table size will not change:

=$ select pg_total_relation_size('test'::regclass);
 pg_total_relation_size 
------------------------
                 729088
(1 row)

But the data will be there (i'll truncate it so that it's not too wide):

=$ select id, n, length(v_text), substr(v_text, 1, 27) || '…'
    from test
    limit 5;
 id |  n  | length |           ?column?           
----+-----+--------+------------------------------
  1 | 187 |    935 | PgSQLPgSQLPgSQLPgSQLPgSQLPg…
  2 | 150 |    750 | PgSQLPgSQLPgSQLPgSQLPgSQLPg…
  3 | 256 |   1280 | PgSQLPgSQLPgSQLPgSQLPgSQLPg…
  4 | 210 |   1050 | PgSQLPgSQLPgSQLPgSQLPgSQLPg…
  5 | 237 |   1185 | PgSQLPgSQLPgSQLPgSQLPgSQLPg…
(5 rows)

Compare this with previous approach, stored:

=$ alter table test add column s_text text generated always as ( repeat('PgSQL', n) ) stored;

where table size changed by quite a lot:

=$ select pg_total_relation_size('test'::regclass);
 pg_total_relation_size 
------------------------
               11788288
(1 row)

There are usecases where this would be very helpful (at the very least so that queries can be a bit simpler).

Also, adding virtual columns is very fast – you just modify metainformation, table data stays the same, so there is no rewrite needed.

With stored columns – table has to be rewritten (if we add the column to existing table), so that the new column values can actually be stored.

Unfortunately it's not without certain limitations. Commit message lists them, so I will just point one – you can't have index on virtual column.

If we could then this would be absolutely awesome solution to indexing with tsearch. Normally people build tsvector in some column, and then add index on it. If we could have index on virtual columns, we could make the virtual column definition be arbitrarily complicated, put gist/gin index on it, and have great index, without having to retype complicated expression on each search, and without “wasting" disk space on storing tsvector data in table.

I assume/hope that these missing bits will be added soon(ish), as it would greatly extend usability of virtual columns.

But, even without it, I love the new functionality, and hope to get to use it in my own things soon.

Thanks a lot to everyone involved.

2 thoughts on “Waiting for PostgreSQL 18 – Virtual generated columns”

  1. @sv:

    Well, no, they don’t:

    postgres=# select version();
                                                               version
    -----------------------------------------------------------------------------------------------------------------------------
     PostgreSQL 17.4 (Ubuntu 17.4-1.pgdg24.10+2) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 14.2.0-4ubuntu2) 14.2.0, 64-bit
    (1 row)
     
    postgres=# create table q (x int4, y int4 generated always as (x + 1) virtual);
    ERROR:  syntax error at or near "virtual"
    LINE 1: ...able q (x int4, y int4 generated always as (x + 1) virtual);
                                                                  ^

    You can see it in docs for create table

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.