On 30th of March 2019, Peter Eisentraut committed patch:
Generated columns This is an SQL-standard feature that allows creating columns that are computed from expressions rather than assigned, similar to a view or materialized view but on a column basis. This implements one kind of generated column: stored (computed on write). Another kind, virtual (computed on read), is planned for the future, and some room is left for it. Discussion: https://www.postgresql.org/message-id/flat/-4019-bdb1-699e-@2ndquadrant.com
This change looks like relatively simple to explain. Let's consider table like:
=$ CREATE TABLE test ( id serial PRIMARY KEY, val_a int4 NOT NULL DEFAULT 0, val_b int4 NOT NULL DEFAULT 0, val_sum int4 generated always AS (val_a + val_b) stored ); CREATE TABLE
Now, I can:
=$ INSERT INTO test (val_a, val_b) VALUES (1, 2); INSERT 0 1 =$ SELECT * FROM test; id | val_a | val_b | val_sum ----+-------+-------+--------- 1 | 1 | 2 | 3 (1 ROW) =$ UPDATE test SET val_b = 5; UPDATE 1 =$ SELECT * FROM test; id | val_a | val_b | val_sum ----+-------+-------+--------- 1 | 1 | 5 | 6 (1 ROW)
If I'd try to update val_sum, or insert row with specified values for it, then:
=$ UPDATE test SET val_sum = 15; ERROR: COLUMN "val_sum" can ONLY be updated TO DEFAULT DETAIL: COLUMN "val_sum" IS a generated COLUMN. =$ INSERT INTO test (val_a, val_b, val_sum) VALUES (1,2,5); ERROR: cannot INSERT INTO COLUMN "val_sum" DETAIL: COLUMN "val_sum" IS a generated COLUMN.
even if val_sum value is correct:
=$ INSERT INTO test (val_a, val_b, val_sum) VALUES (1,2,3); ERROR: cannot INSERT INTO COLUMN "val_sum" DETAIL: COLUMN "val_sum" IS a generated COLUMN.
Generally – generated columns are basically read-only for user.
This kind of functionality was available before with triggers. But there are couple of differences.
First of all, \d of the table shows generated values in easier to read way than digging through triggers:
=$ \d test TABLE "public.test" COLUMN | TYPE | Collation | NULLABLE | DEFAULT ---------+---------+-----------+----------+-------------------------------------------- id | INTEGER | | NOT NULL | NEXTVAL('test_id_seq'::regclass) val_a | INTEGER | | NOT NULL | 0 val_b | INTEGER | | NOT NULL | 0 val_sum | INTEGER | | | generated always AS (val_a + val_b) stored Indexes: "test_pkey" PRIMARY KEY, btree (id)
Secondly – COPY, without mentioning columns, will copy just real columns, skipping all that are generated:
=$ copy test TO stdout WITH (format csv, header TRUE); id,val_a,val_b 1,1,5
This is nice addition that is basically impossible to do with triggers, unless you'll modify every COPY statement everywhere.
What's more – it's faster. I tried, and doing COPY of 100k rows to this table, with GENERATED val_sum took ~ 200ms on my computer, but when I changed it to be calculated by trivial pl/PgSQL trigger, the time was ~ 400ms – twice as long.
And thirdly, though it hasn't been done yet, there will be non-STORED generated columns.
Currently every generated column has to be marked as STORED, which means that table data, on disk, will contain calculated value for the column.
But in future, we will (hopefully) get virtual columns that are calculated on demand.
There are limitations, of course:
- generated column can't be part of partition key (whatever you use to partition your data with)
- generated column can't access other generated columns to calculate its value
- expression used to calculate generated column value has to be immutable – so either plain(ish) expression using columns from current row, or a function call to immutable function.
All in all – it looks pretty cool, and will definitely make certain things easier, or at least easier to read.
Thanks a lot, devs 🙂
I don’t think “immutable” is the correct word in this context. Normally immutability is not something you talk about for an expression or a function, but rather it is something objects can be. I think you mean “pure”, that is without side effects and only depended on its inputs which in this case I guess are the values of the other columns.
And yes, non-stored (but indexable) computed columns would be nice.
“Immutable” is the syntax word used in Postgres (and other SQL database, I believe) to mark a function which is as described. The concept is pretty much the same as “pure”. There is one important difference: to my knowledge Postgres makes no effort to ensure the function really is pure.
Okay, I did not realize that. Then i understand your choice of words better. Thank you.
Whether or not Postgres (or anything else) ensures it, does not change the pureness of a given function, so that is not really a difference 🙂
But the lack of check is probably an indication that you are allowed to make caching and other optimizations, that technically might make the function unpure, while it still exhibit the properties when invoked in practise. Kind of what the “mutable” keyword allows you with regards to const-ness in c++. And in that case it is probably wise to avoid the “pure” word to escape the wrath of the functional programming folks 🙂
Isaac, given that such a test would be equivalent to solving the Halting Problem, we don’t make promises we couldn’t possibly keep.
So generated columns can compute their value based on other stored generated columns? What happens if you create a loop? (E.g.
)
@Aristotle:
Not entirely sure why you asked the question, instead of simply trying it – it took me literally 3 seconds to test:
OK, so stored or not makes no difference.
The reason I asked is that I don’t have a Pg 12 around. 🙂
It’s in the PostgreSQL 12 documentation for the new feature also 🙂
Hi, thanks for the article.
Any idea if the immutable function can be a C function from a C extension? Can you pinky swear you’re immutable when defining a UDF from a C extension?
@Touisteur:
sure it can be from extension. and you define whether function is immutable on creation. If it will not be true then you have to deal with potential problems.