On 30th of June, Alvaro Herrera committed patch:
Enable CHECK constraints to be declared NOT VALID This means that they can initially be added to a large existing table without checking its initial contents, but new tuples must comply to them; a separate pass invoked by ALTER TABLE / VALIDATE can verify existing data and ensure it complies with the constraint, at which point it is marked validated and becomes a normal part of the table ecosystem. An non-validated CHECK constraint is ignored in the planner for constraint_exclusion purposes; when validated, cached plans are recomputed so that partitioning starts working right away. This patch also enables domains to have unvalidated CHECK constraints attached to them as well by way of ALTER DOMAIN / ADD CONSTRAINT / NOT VALID, which can later be validated with ALTER DOMAIN / VALIDATE CONSTRAINT. Thanks to Thom Brown, Dean Rasheed and Jaime Casanova for the various reviews, and Robert Hass for documentation wording improvement suggestions. This patch was sponsored by Enova Financial.
Some time ago I wrote about NOT VALID FOREIGN KEYS in 9.1. Now for 9.2, we'll be getting the same mechanism for CHECK constraints.
Normally one would add check constraint with:
ALTER TABLE test ADD CHECK ( FIELD >= 0 );
But now we can do it in two steps:
ALTER TABLE test ADD CHECK ( FIELD >= 0 ) NOT VALID; ALTER TABLE test VALIDATE CONSTRAINT test_field_check;
So. Let's see if, and if yes – why it's cool.
For tests, I created simple tables:
$ CREATE TABLE test (id serial, some_text text, FIELD int4); NOTICE: CREATE TABLE will CREATE implicit SEQUENCE "test_id_seq" FOR serial COLUMN "test.id" CREATE TABLE $ INSERT INTO test (some_text, FIELD) SELECT repeat('depesz', 50), CAST( random() * 120000 AS int4) FROM generate_series(1,1000000); INSERT 0 1000000 $ ALTER TABLE test ADD PRIMARY KEY (id); NOTICE: ALTER TABLE / ADD PRIMARY KEY will CREATE implicit INDEX "test_pkey" FOR TABLE "test" ALTER TABLE $ CREATE TABLE test2 (id serial, some_text text, FIELD int4); NOTICE: CREATE TABLE will CREATE implicit SEQUENCE "test2_id_seq" FOR serial COLUMN "test2.id" CREATE TABLE $ INSERT INTO test2 (id, some_text, FIELD) SELECT * FROM test; INSERT 0 1000000 $ ALTER TABLE test2 ADD PRIMARY KEY (id); NOTICE: ALTER TABLE / ADD PRIMARY KEY will CREATE implicit INDEX "test2_pkey" FOR TABLE "test2" ALTER TABLE
Structure and data in both tables are exactly the same. Each table, with indexes and toast takes ~ 360MB.
So, let's see speed and locks:
$ BEGIN; BEGIN *$ ALTER TABLE test ADD CHECK ( FIELD >= 0 ); ALTER TABLE TIME: 206.138 ms *$ SELECT * FROM pg_locks WHERE pid = pg_backend_pid(); locktype | DATABASE | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | GRANTED ---------------+----------+----------+--------+--------+------------+---------------+---------+--------+----------+--------------------+------+--------------------------+--------- relation | 16404 | 26486 | [NULL] | [NULL] | [NULL] | [NULL] | [NULL] | [NULL] | [NULL] | 2/295 | 9149 | ShareUpdateExclusiveLock | t relation | 16404 | 26486 | [NULL] | [NULL] | [NULL] | [NULL] | [NULL] | [NULL] | [NULL] | 2/295 | 9149 | ShareRowExclusiveLock | t transactionid | [NULL] | [NULL] | [NULL] | [NULL] | [NULL] | 1399 | [NULL] | [NULL] | [NULL] | 2/295 | 9149 | ExclusiveLock | t relation | 16404 | 11000 | [NULL] | [NULL] | [NULL] | [NULL] | [NULL] | [NULL] | [NULL] | 2/295 | 9149 | AccessShareLock | t virtualxid | [NULL] | [NULL] | [NULL] | [NULL] | 2/295 | [NULL] | [NULL] | [NULL] | [NULL] | 2/295 | 9149 | ExclusiveLock | t (5 ROWS) *$ SELECT 11000::regclass, 26486::regclass; regclass | regclass ----------+---------- pg_locks | test (1 ROW)
OK. It took 206ms, and for this time it took ShareUpdateExclusiveLock and ShareRowExclusiveLock locks. Which, thanks to this table we know that it conflicts with everything except Access Share and Row Share – which basically turns table into read-only.
Now. What about new method?
First operation:
$ BEGIN; BEGIN TIME: 0.249 ms *$ ALTER TABLE test2 ADD CHECK ( FIELD >= 0 ) NOT VALID; ALTER TABLE TIME: 0.871 ms *$ SELECT * FROM pg_locks WHERE pid = pg_backend_pid(); locktype | DATABASE | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | GRANTED ---------------+----------+----------+--------+--------+------------+---------------+---------+--------+----------+--------------------+------+--------------------------+--------- virtualxid | [NULL] | [NULL] | [NULL] | [NULL] | 2/296 | [NULL] | [NULL] | [NULL] | [NULL] | 2/296 | 9149 | ExclusiveLock | t relation | 16404 | 26497 | [NULL] | [NULL] | [NULL] | [NULL] | [NULL] | [NULL] | [NULL] | 2/296 | 9149 | ShareUpdateExclusiveLock | t relation | 16404 | 26497 | [NULL] | [NULL] | [NULL] | [NULL] | [NULL] | [NULL] | [NULL] | 2/296 | 9149 | ShareRowExclusiveLock | t transactionid | [NULL] | [NULL] | [NULL] | [NULL] | [NULL] | 1400 | [NULL] | [NULL] | [NULL] | 2/296 | 9149 | ExclusiveLock | t relation | 16404 | 11000 | [NULL] | [NULL] | [NULL] | [NULL] | [NULL] | [NULL] | [NULL] | 2/296 | 9149 | AccessShareLock | t (5 ROWS) *$ SELECT 26497::regclass; regclass ---------- test2 (1 ROW)
OK. So we had the same level of locks, buit only for 0.871ms.
Now. In another transaction, I can run the validation:
$ BEGIN; BEGIN TIME: 0.139 ms *$ ALTER TABLE test2 VALIDATE CONSTRAINT test2_field_check; ALTER TABLE TIME: 201.310 ms *$ SELECT * FROM pg_locks WHERE pid = pg_backend_pid(); locktype | DATABASE | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | GRANTED ---------------+----------+----------+--------+--------+------------+---------------+---------+--------+----------+--------------------+------+--------------------------+--------- relation | 16404 | 26497 | [NULL] | [NULL] | [NULL] | [NULL] | [NULL] | [NULL] | [NULL] | 2/297 | 9149 | ShareUpdateExclusiveLock | t transactionid | [NULL] | [NULL] | [NULL] | [NULL] | [NULL] | 1401 | [NULL] | [NULL] | [NULL] | 2/297 | 9149 | ExclusiveLock | t virtualxid | [NULL] | [NULL] | [NULL] | [NULL] | 2/297 | [NULL] | [NULL] | [NULL] | [NULL] | 2/297 | 9149 | ExclusiveLock | t relation | 16404 | 11000 | [NULL] | [NULL] | [NULL] | [NULL] | [NULL] | [NULL] | [NULL] | 2/297 | 9149 | AccessShareLock | t (4 ROWS)
OK. Time is the same as in traditional way. But the locking is different – we have only ShareUpdateExclusiveLock, and no more ShareRowExclusiveLock. This change means that all modifications on the table will work just fine during the validation phase!
Let's see if it's really the case.
First, I'll drop the constraint:
$ ALTER TABLE test2 DROP CONSTRAINT test2_field_check;
Now, let's re-add invalid check:
$ ALTER TABLE test2 ADD CHECK ( FIELD >= 0 ) NOT VALID;
With this in place, in first psql, I run:
psql-1 $ BEGIN; BEGIN psql-1 1*$ ALTER TABLE test2 VALIDATE CONSTRAINT test2_field_check; ALTER TABLE
and leave it in transaction.
Afterwards, I run another psql, and let's test some updates:
psql-2 $ UPDATE test2 SET FIELD = 50 WHERE id < 20; UPDATE 19 psql-2 $ DELETE FROM test2 WHERE id < 10; DELETE 9
All works. Sweet. That's really cool stuff.
Thanks for the write up. This features is really cool, now we can add new constraints to huge tables with little relative impact!
will we be able to add NOT VALID UNIQUE constraints? or will we have to fake it?
@Caleb: not yet. Maybe in some future patches in 9.2, but not yet.