On 18th of March, Robert Haas committed patch:
Directly modify foreign tables. postgres_fdw can now sent an UPDATE or DELETE statement directly to the foreign server in simple cases, rather than sending a SELECT FOR UPDATE statement and then updating or deleting rows one-by-one. Etsuro Fujita, reviewed by Rushabh Lathia, Shigeru Hanada, Kyotaro Horiguchi, Albe Laurenz, Thom Brown, and me.
The meaning of this should be pretty obvious, but let's quickly see some test of how it works. First, on some Pg 9.4 I have:
$ \d test TABLE "public.test" COLUMN | TYPE | Modifiers ---------+---------+----------- id | INTEGER | NOT NULL payload | text | Indexes: "test_pkey" PRIMARY KEY, btree (id) (depesz@localhost:5432) 20:36:05 [depesz] $ SELECT * FROM test; id | payload ----+--------- 1 | test 1 2 | test 2 3 | test 3 4 | test 4 5 | test 5 6 | test 6 7 | test 7 8 | test 8 9 | test 9 10 | test 10 (10 ROWS)
Now, in the same database, I create the foreign table, using postgres_fdw:
$ CREATE server SOURCE FOREIGN DATA wrapper postgres_fdw options( dbname 'depesz', host 'localhost' ); CREATE SERVER $ CREATE USER mapping FOR depesz server SOURCE options ( USER 'depesz' ); CREATE USER MAPPING $ CREATE FOREIGN TABLE remote_table (id int4 NOT NULL, payload text) server SOURCE options (TABLE_NAME 'test'); CREATE FOREIGN TABLE $ SELECT * FROM remote_table ; id | payload ----+--------- 1 | test 1 2 | test 2 3 | test 3 4 | test 4 5 | test 5 6 | test 6 7 | test 7 8 | test 8 9 | test 9 10 | test 10 (10 ROWS)
Great. It works.
So, now, let's try to update the table remotely:
$ EXPLAIN analyze UPDATE remote_table SET payload = payload || 'x'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------- UPDATE ON remote_table (cost=100.00..150.33 ROWS=1241 width=42) (actual TIME=3.647..3.647 ROWS=0 loops=1) -> FOREIGN Scan ON remote_table (cost=100.00..150.33 ROWS=1241 width=42) (actual TIME=0.677..0.709 ROWS=10 loops=1) Planning TIME: 0.066 ms Execution TIME: 4.135 ms (4 ROWS)
In postgresql log, I can see:
LOG: connection authorized: USER=depesz DATABASE=depesz LOG: duration: 0.142 ms statement: SET search_path = pg_catalog LOG: duration: 0.280 ms statement: SET timezone = 'UTC' LOG: duration: 0.073 ms statement: SET datestyle = ISO LOG: duration: 0.042 ms statement: SET intervalstyle = postgres LOG: duration: 0.067 ms statement: SET extra_float_digits = 3 LOG: duration: 0.060 ms statement: START TRANSACTION ISOLATION LEVEL REPEATABLE READ LOG: duration: 0.250 ms parse <unnamed>: DECLARE c1 CURSOR FOR SELECT id, payload, ctid FROM public.test FOR UPDATE LOG: duration: 0.380 ms bind <unnamed>: DECLARE c1 CURSOR FOR SELECT id, payload, ctid FROM public.test FOR UPDATE LOG: duration: 0.065 ms EXECUTE <unnamed>: DECLARE c1 CURSOR FOR SELECT id, payload, ctid FROM public.test FOR UPDATE LOG: duration: 0.149 ms statement: FETCH 100 FROM c1 LOG: duration: 0.198 ms parse pgsql_fdw_prep_1: UPDATE public.test SET payload = $2 WHERE ctid = $1 LOG: duration: 0.136 ms bind pgsql_fdw_prep_1: UPDATE public.test SET payload = $2 WHERE ctid = $1 DETAIL: parameters: $1 = '(0,11)', $2 = 'test 1xx' LOG: duration: 0.069 ms EXECUTE pgsql_fdw_prep_1: UPDATE public.test SET payload = $2 WHERE ctid = $1 DETAIL: parameters: $1 = '(0,11)', $2 = 'test 1xx' LOG: duration: 0.091 ms bind pgsql_fdw_prep_1: UPDATE public.test SET payload = $2 WHERE ctid = $1 DETAIL: parameters: $1 = '(0,12)', $2 = 'test 2xx' LOG: duration: 0.044 ms EXECUTE pgsql_fdw_prep_1: UPDATE public.test SET payload = $2 WHERE ctid = $1 DETAIL: parameters: $1 = '(0,12)', $2 = 'test 2xx' LOG: duration: 0.061 ms bind pgsql_fdw_prep_1: UPDATE public.test SET payload = $2 WHERE ctid = $1 DETAIL: parameters: $1 = '(0,13)', $2 = 'test 3xx' LOG: duration: 0.040 ms EXECUTE pgsql_fdw_prep_1: UPDATE public.test SET payload = $2 WHERE ctid = $1 DETAIL: parameters: $1 = '(0,13)', $2 = 'test 3xx' LOG: duration: 0.071 ms bind pgsql_fdw_prep_1: UPDATE public.test SET payload = $2 WHERE ctid = $1 DETAIL: parameters: $1 = '(0,14)', $2 = 'test 4xx' LOG: duration: 0.059 ms EXECUTE pgsql_fdw_prep_1: UPDATE public.test SET payload = $2 WHERE ctid = $1 DETAIL: parameters: $1 = '(0,14)', $2 = 'test 4xx' LOG: duration: 0.074 ms bind pgsql_fdw_prep_1: UPDATE public.test SET payload = $2 WHERE ctid = $1 DETAIL: parameters: $1 = '(0,15)', $2 = 'test 5xx' LOG: duration: 0.043 ms EXECUTE pgsql_fdw_prep_1: UPDATE public.test SET payload = $2 WHERE ctid = $1 DETAIL: parameters: $1 = '(0,15)', $2 = 'test 5xx' LOG: duration: 0.069 ms bind pgsql_fdw_prep_1: UPDATE public.test SET payload = $2 WHERE ctid = $1 DETAIL: parameters: $1 = '(0,16)', $2 = 'test 6xx' LOG: duration: 0.046 ms EXECUTE pgsql_fdw_prep_1: UPDATE public.test SET payload = $2 WHERE ctid = $1 DETAIL: parameters: $1 = '(0,16)', $2 = 'test 6xx' LOG: duration: 0.016 ms bind pgsql_fdw_prep_1: UPDATE public.test SET payload = $2 WHERE ctid = $1 DETAIL: parameters: $1 = '(0,17)', $2 = 'test 7xx' LOG: duration: 0.040 ms EXECUTE pgsql_fdw_prep_1: UPDATE public.test SET payload = $2 WHERE ctid = $1 DETAIL: parameters: $1 = '(0,17)', $2 = 'test 7xx' LOG: duration: 0.018 ms bind pgsql_fdw_prep_1: UPDATE public.test SET payload = $2 WHERE ctid = $1 DETAIL: parameters: $1 = '(0,18)', $2 = 'test 8xx' LOG: duration: 0.042 ms EXECUTE pgsql_fdw_prep_1: UPDATE public.test SET payload = $2 WHERE ctid = $1 DETAIL: parameters: $1 = '(0,18)', $2 = 'test 8xx' LOG: duration: 0.015 ms bind pgsql_fdw_prep_1: UPDATE public.test SET payload = $2 WHERE ctid = $1 DETAIL: parameters: $1 = '(0,19)', $2 = 'test 9xx' LOG: duration: 0.047 ms EXECUTE pgsql_fdw_prep_1: UPDATE public.test SET payload = $2 WHERE ctid = $1 DETAIL: parameters: $1 = '(0,19)', $2 = 'test 9xx' LOG: duration: 0.014 ms bind pgsql_fdw_prep_1: UPDATE public.test SET payload = $2 WHERE ctid = $1 DETAIL: parameters: $1 = '(0,20)', $2 = 'test 10xx' LOG: duration: 0.041 ms EXECUTE pgsql_fdw_prep_1: UPDATE public.test SET payload = $2 WHERE ctid = $1 DETAIL: parameters: $1 = '(0,20)', $2 = 'test 10xx' LOG: duration: 0.018 ms statement: DEALLOCATE pgsql_fdw_prep_1 LOG: duration: 0.021 ms statement: CLOSE c1 LOG: duration: 73.715 ms statement: COMMIT TRANSACTION LOG: duration: 93.146 ms statement: EXPLAIN analyze UPDATE remote_table SET payload = payload || 'x';
That's quite a lot of things. As you can see, true to what commit message said, each row was updated separately.
In 9.6, though, it looks differently:
$ EXPLAIN analyze UPDATE remote_table SET payload = payload || 'x'; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------- UPDATE ON remote_table (cost=100.00..150.33 ROWS=1241 width=42) (actual TIME=0.408..0.408 ROWS=0 loops=1) -> FOREIGN UPDATE ON remote_table (cost=100.00..150.33 ROWS=1241 width=42) (actual TIME=0.407..0.407 ROWS=10 loops=1) Planning TIME: 0.120 ms Execution TIME: 2.400 ms (4 ROWS)
(please note that it's on different server, so times are not really comparable.)
And logs show:
LOG: connection authorized: USER=depesz DATABASE=depesz LOG: duration: 0.065 ms statement: SET search_path = pg_catalog LOG: duration: 0.145 ms statement: SET timezone = 'UTC' LOG: duration: 0.025 ms statement: SET datestyle = ISO LOG: duration: 0.011 ms statement: SET intervalstyle = postgres LOG: duration: 0.018 ms statement: SET extra_float_digits = 3 LOG: duration: 0.014 ms statement: START TRANSACTION ISOLATION LEVEL REPEATABLE READ LOG: duration: 0.170 ms parse <unnamed>: UPDATE public.test SET payload = (payload || 'x'::text) LOG: duration: 0.124 ms bind <unnamed>: UPDATE public.test SET payload = (payload || 'x'::text) LOG: duration: 0.050 ms EXECUTE <unnamed>: UPDATE public.test SET payload = (payload || 'x'::text) LOG: duration: 25.206 ms statement: COMMIT TRANSACTION LOG: duration: 28.431 ms statement: EXPLAIN analyze UPDATE remote_table SET payload = payload || 'x';
Much better, isn't it? Thank you, for all involved 🙂
postgresql9.6 added two other properties about postgres_fdw,which is sort push down and join pushdown.
It is difficult to understand these two properties , can you give some examples to help understand?
Please!
Thank you very much !!!
@puqun:
can you give me link to committers messages about it? or commit hashes?
https://www.postgresql.org/message-id/E1aBS4o-0002l1-Tv@gemulon.postgresql.org
postgres_fdw: Consider requesting sorted data so we can do a merge join.
When use_remote_estimate is enabled, consider adding ORDER BY to the
query we sending to the remote server so that we can use that ordered
data for a merge join.
https://www.postgresql.org/message-id/E1aTDlV-0007xw-Vj@gemulon.postgresql.org
postgres_fdw: Push down joins to remote servers.
If we’ve got a relatively straightforward join between two tables,
this pushes that join down to the remote server instead of fetching
the rows for each table and performing the join locally. Some cases
are not handled yet, such as SEMI and ANTI joins. Also, we don’t
yet attempt to create presorted join paths or parameterized join
paths even though these options do get tried for a base relation
scan. Nevertheless, this seems likely to be a very significant win
in many practical cases.
@puqun:
well, from the description it doesn’t sound overly hard to understand what’s going on.
I’m hesitant to write “waiting for” for new optimizations for fdw cases for the same reason I generally don’t write about optimization changes in other code – it’s hard to show apples-to-apples comparison, and it doesn’t really change functionality. in case of fdw there is also the factor of it being really fringe feature, which might get more used in future, but now, I would say, is definitely not common.
In PostgreSQL9.6, when TRUE or FLASE is set to use_remote_estimate, the flow of new characteristic Push down joins to remote servers processing is different.
*********************
case 1:
*********************
1. Create table for test:
postgres=# CREATE TABLE t0 (i int);
CREATE TABLE
postgres=# INSERT INTO t0 VALUES (generate_series(1, 1000));
INSERT 0 1000
postgres=# create database db2;
CREATE DATABASE
postgres=#
postgres=# \c db2
You are now connected to database “db2” as user “postgres96r”.
db2=# CREATE EXTENSION postgres_fdw;
CREATE EXTENSION
db2=# CREATE SERVER server1 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (dbname ‘postgres’, port ‘55667’);
CREATE SERVER
db2=# CREATE USER MAPPING FOR public SERVER server1 OPTIONS (USER ‘postgres96r’);
CREATE USER MAPPING
db2=# CREATE SCHEMA remote1;
CREATE SCHEMA
db2=# IMPORT FOREIGN SCHEMA public FROM SERVER server1 INTO remote1;
IMPORT FOREIGN SCHEMA
db2=#
db2=# SELECT count(*) FROM remote1.t0;
count
——-
1000
(1 row)
2.Analyze the join operation of the foreign table.
db2=# alter server server1 options (ADD use_remote_estimate ‘true’);
ALTER SERVER
db2=# EXPLAIN (ANALYZE, VERBOSE) SELECT * FROM remote1.t0 a JOIN remote1.t0 b ON a.i = b.i AND a.i = 500;
QUERY PLAN
——————————————————————————————————————–
Foreign Scan (cost=100.00..135.03 rows=1 width=8) (actual time=1.068..1.069 rows=1 loops=1)
Output: a.i, b.i
Relations: (remote1.t0 a) INNER JOIN (remote1.t0 b)
Remote SQL: SELECT r1.i, r2.i FROM (public.t0 r1 INNER JOIN public.t0 r2 ON (((r2.i = 500)) AND ((r1.i = 500))))
Planning time: 3.654 ms
Execution time: 1.272 ms
(6 rows)
db2=# alter server server1 options (ADD use_remote_estimate ‘false’);
ERROR: option “use_remote_estimate” provided more than once
db2=# EXPLAIN (ANALYZE, VERBOSE) SELECT * FROM remote1.t0 a JOIN remote1.t0 b ON a.i = b.i AND a.i = 500;
QUERY PLAN
——————————————————————————————————————–
Foreign Scan (cost=100.00..135.03 rows=1 width=8) (actual time=1.002..1.003 rows=1 loops=1)
Output: a.i, b.i
Relations: (remote1.t0 a) INNER JOIN (remote1.t0 b)
Remote SQL: SELECT r1.i, r2.i FROM (public.t0 r1 INNER JOIN public.t0 r2 ON (((r2.i = 500)) AND ((r1.i = 500))))
Planning time: 2.767 ms
Execution time: 1.193 ms
(6 rows)
Thus,the setting of the parameter “use_remote_estimate” has a certain influence on the execution of the join operation.
*******
case2
******
Just change the SQL statement of join operation:
db2=# alter server server1 options (SET use_remote_estimate ‘false’);
ALTER SERVER
db2=# EXPLAIN (ANALYZE, VERBOSE) SELECT * FROM remote1.t0 a JOIN remote1.t0 b ON a.i = b.i;
QUERY PLAN
——————————————————————————————————————————–
Merge Join (cost=732.29..1388.59 rows=42778 width=8) (actual time=6.878..9.135 rows=1000 loops=1)
Output: a.i, b.i
Merge Cond: (a.i = b.i)
-> Sort (cost=366.15..373.46 rows=2925 width=4) (actual time=3.944..4.344 rows=1000 loops=1)
Output: a.i
Sort Key: a.i
Sort Method: quicksort Memory: 71kB
-> Foreign Scan on remote1.t0 a (cost=100.00..197.75 rows=2925 width=4) (actual time=1.037..3.171 rows=1000 loops=1)
Output: a.i
Remote SQL: SELECT i FROM public.t0
-> Sort (cost=366.15..373.46 rows=2925 width=4) (actual time=2.916..3.292 rows=1000 loops=1)
Output: b.i
Sort Key: b.i
Sort Method: quicksort Memory: 71kB
-> Foreign Scan on remote1.t0 b (cost=100.00..197.75 rows=2925 width=4) (actual time=0.385..2.350 rows=1000 loops=1)
Output: b.i
Remote SQL: SELECT i FROM public.t0
Planning time: 2.085 ms
Execution time: 10.980 ms
(19 rows)
db2=# alter server server1 options (SET use_remote_estimate ‘true’);
ALTER SERVER
db2=# EXPLAIN (ANALYZE, VERBOSE) SELECT * FROM remote1.t0 a JOIN remote1.t0 b ON a.i = b.i;
QUERY PLAN
—————————————————————————————————-
Foreign Scan (cost=127.50..176.25 rows=1000 width=8) (actual time=1.172..3.260 rows=1000 loops=1)
Output: a.i, b.i
Relations: (remote1.t0 a) INNER JOIN (remote1.t0 b)
Remote SQL: SELECT r1.i, r2.i FROM (public.t0 r1 INNER JOIN public.t0 r2 ON (((r1.i = r2.i))))
Planning time: 5.567 ms
Execution time: 3.668 ms
(6 rows)
After the change of SQL operation of the join, the settings of parameter “use_remote_estimate” has no effect on the join operation process.
How does the parameter “use_remote_estimate” affect the operation of the join?
Which factors are related to it?
Please give me some guidance.