On 3rd of November Andrew Dunstan committed his patch which adds new function to PostgreSQL – suppress_redundant_updates_trigger().
This function is not for using in selects, but it can help you tremendously if your database access matches certain pattern.
To be more precise – if you have a lot of updates that don't really update anything (think: update table set field = field).
This is usually generated by ORMs, but it can also show in hand made queries which were written in suboptimal way:
UPDATE TABLE SET FIELD = 'some_value' WHERE id IN (...)
The problem with above query is that in case some of updated rows already contained ‘some_value' in “field" column – they will be updated without real reason (unless you count on trigger side-effects).
Andrew function is to be used as trigger function, to prevent such updates.
Usage is pretty simple:
CREATE TRIGGER some_name BEFORE UPDATE ON some_table FOR each ROW EXECUTE PROCEDURE suppress_redundant_updates_trigger();
What does it give us?
Let's check, without this trigger:
# CREATE TABLE x (i int4); CREATE TABLE # INSERT INTO x (i) SELECT * FROM generate_series(1, 10000000); INSERT 0 10000000 # SELECT pg_total_relation_size('x'); pg_total_relation_size ------------------------ 321355776 (1 ROW) # UPDATE x SET i = i; UPDATE 10000000 TIME: 135216.299 ms # SELECT pg_total_relation_size('x'); pg_total_relation_size ------------------------ 642695168 (1 ROW)
As you can see size of the table doubled – the same would happen with basically any index on the table.
But, let's check how it will go with new trigger:
# CREATE TABLE x (i int4); CREATE TABLE # CREATE TRIGGER some_name BEFORE UPDATE ON x FOR each ROW EXECUTE PROCEDURE suppress_redundant_updates_trigger(); CREATE TRIGGER # INSERT INTO x (i) SELECT * FROM generate_series(1, 10000000); INSERT 0 10000000 # SELECT pg_total_relation_size('x'); pg_total_relation_size ------------------------ 321355776 (1 ROW) # UPDATE x SET i = i; UPDATE 0 TIME: 83754.151 ms # SELECT pg_total_relation_size('x'); pg_total_relation_size ------------------------ 321355776 (1 ROW)
As you can notice size of the table didn't change. Also – please note that the update was faster. Much faster.
What can I say – it's great that this is bundled in PostgreSQL.
If you need more explanation on the subject – please check previous articles on this topic:
Under what circumstances would creating the MVCC row for a pointless change be a good thing?
@Corey:
define “pointless”.
to check if anything changed postgresql would have to check previous row against new one. doing this for every update seems to be waste of resources – generally it’s not big problem to write code/queries that will not make empty updates.
if you can’t write such queries – there is trigger which adds this check – of course it comes at a price – but now you can choose if you want to pay the price of checking for every row/update or not.
additionally – you might want the pointless change because of trigger side effects.
like trigger which counts updates for statistical purposes.
Wouldn’t this be better as either a configuration option or a flag for the possibly redundant trigger(s)?
@Steve:
configuration to what?
Basically – you can add this to all, some (1?) or no tables.
so this “configuration option” should be per table – what’s the difference then if it’s configuration option or trigger?
Careful if you use this with something like the “merge_db” function (Example 38-1) found in the docs here:
http://www.postgresql.org/docs/current/static/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING
Bascially, if you’re trying to do an “UPSERT”, this will make it appear as if the row does not exist, which may lead to inserting duplicates (or lots of error messages).
Very nice, but how fast is it compared to the homegrown triggers you have written about before? And also compared to no triggers.