On 16th of July, Kevin Grittner committed patch:
Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY. This allows reads to continue without any blocking while a REFRESH runs. The new data appears atomically as part of transaction commit. Review questioned the Assert that a matview was not a system relation. This will be addressed separately. Reviewed by Hitoshi Harada, Robert Haas, Andres Freund. Merged after review with security patch f3ab5d4.
Back when Materialized Views where introduced refreshing materialized view data required obtaining AccessExclusiveLock – thus effectively blocking all concurrent access to the view.
Not, luckily, we can do better.
Let's see it:
psql-1> CREATE materialized VIEW test AS SELECT now() AS refreshed, i FROM generate_series(1,2) i; SELECT 2
Then, I can:
psql-1> SELECT now(), * FROM test; now | refreshed | i -------------------------------+-------------------------------+--- 2013-07-22 15:31:20.265253+02 | 2013-07-22 15:31:07.814336+02 | 1 2013-07-22 15:31:20.265253+02 | 2013-07-22 15:31:07.814336+02 | 2 (2 ROWS)
OK. We can see that the refreshed column is lagging, so it's ok.
Now, let's see previous approach to refresh, with 2nd session trying to get access to the view:
psql-1> BEGIN; BEGIN psql-1> refresh materialized VIEW test; REFRESH MATERIALIZED VIEW
I didn't commit it. In other psql:
psql-2> SET statement_timeout = 1000; SET psql-2> SELECT * FROM test; ERROR: canceling statement due TO statement timeout
As expected – refresh locks the view.
But what about doing it concurrently?
I did rollback to the refresh transaction, and then:
psql-1> BEGIN; BEGIN psql-1> refresh materialized VIEW concurrently test; ERROR: cannot refresh materialized VIEW "public.test" concurrently HINT: CREATE a UNIQUE INDEX WITH no WHERE clause ON one OR more COLUMNS OF the materialized VIEW.
OK. So, I have to add index, on the view:
psql-1> CREATE UNIQUE INDEX some_index ON test (i); CREATE INDEX psql-1> BEGIN; BEGIN psql-1> refresh materialized VIEW concurrently test; REFRESH MATERIALIZED VIEW
Afterwards, it just works, which is fine. Now – I didn't commit the transaction at this point, so in 2nd psql I tested access:
psql-2> SELECT now(), * FROM test; now | refreshed | i -------------------------------+-------------------------------+--- 2013-07-22 15:35:30.611278+02 | 2013-07-22 15:31:07.814336+02 | 1 2013-07-22 15:35:30.611278+02 | 2013-07-22 15:31:07.814336+02 | 2 (2 ROWS)
Please note the refreshed is still old. After I did this select, I committed refresh in 1st psql, and re-run the select * from test:
psql-2> SELECT now(), * FROM test; now | refreshed | i -------------------------------+-------------------------------+--- 2013-07-22 15:35:35.817046+02 | 2013-07-22 15:35:08.925637+02 | 2 2013-07-22 15:35:35.817046+02 | 2013-07-22 15:35:08.925637+02 | 1 (2 ROWS)
So – until transaction with refresh ends – I can query, and will see old data. After it will be committed – I will see new data. Just as expected. All fine.
To be honest, I am not entirely sure why there is requirement for unique index – I thought that refresh would just create new “file" with data, and after finishing, it would just switch which “file" pg reads data from. But, I am by no means Pg hacker, so there have to be some important reason why unique index is needed.
Anyway – it's great addition, and one that will make the materialized views much more functional.
The reason a unique index is required is because when refreshing a materialized view concurrently, it creates a temporary table containing the new materialized data, then performs a full outer join against the old data, and this can’t be done if there are any duplicate rows.