Every so often, on irc, someone asks how to get the value of a column whose name is passed as an argument.
This is generally seen as not possible, as pl/PgSQL doesn't have support for dynamic column names.
We can work around it, though. Are the workarounds usable, in terms of performance?
To test it, I will need some sample data. This doesn't have to be big, but let's make it big enough so that I can test performance within Pg itself. Let's say 10 million rows.
=$ create table test (
    id int4 generated always as identity primary key,
    payload int4
);

=$ insert into test (payload)
    select random() * 100000 - 50000
    from generate_series(1,10000000);
This gave me a table with 10M rows, where the payload column is a random integer from -50k to +50k.
To compare the performance sensibly, we will need some baseline. Since I want to measure only the effect of getting a dynamic field, I will not work with triggers, but instead will write a function that gets payload from a row given as an argument.
The baseline will be a function that statically gets the payload column:
=$ CREATE OR REPLACE FUNCTION get_static(in p_row test) RETURNS INT4 as $$
declare
BEGIN
    RETURN p_row.payload;
END;
$$ language plpgsql;
I can see it working like this:
=$ select t.*, get_static(t) from test t limit 2;
 id | payload | get_static
----+---------+------------
  1 |   29712 |      29712
  2 |  -32547 |     -32547
(2 rows)
OK. Now, let's see how to get a specific value dynamically. I will write a set of functions, each testing one approach; each will take a row and a field name, and return the value of that field.
First approach – plain pl/PgSQL, using EXECUTE:
=$ CREATE OR REPLACE FUNCTION get_dynamic_plpgsql(in p_row test, in p_column TEXT) RETURNS INT4 as $$
declare
    v_got INT4;
BEGIN
    execute format('SELECT ($1).%I', p_column) using p_row INTO v_got;
    RETURN v_got;
END;
$$ language plpgsql;
Sanity check:
=$ select t.*, get_dynamic_plpgsql(t, 'payload') from test t where t.id > 10 limit 2;
 id | payload | get_dynamic_plpgsql
----+---------+---------------------
 11 |   40481 |               40481
 12 |  -17427 |              -17427
(2 rows)
This works by building a query like SELECT ($1).payload, and then running EXECUTE on it, passing the whole row (t) as $1.
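If you want to see what EXECUTE actually gets to run, you can call format() on its own; a minimal sketch of that building step (the exact psql formatting of the output may differ):

=$ select format('SELECT ($1).%I', 'payload');
       format
---------------------
 SELECT ($1).payload
(1 row)

The %I placeholder quotes the identifier when needed, so a strange or malicious column name ends up safely double-quoted instead of being injected into the query.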
The next approach is to use hstore:
=$ create extension hstore;

=$ CREATE OR REPLACE FUNCTION get_dynamic_hstore(in p_row test, in p_column TEXT) RETURNS INT4 as $$
declare
BEGIN
    RETURN hstore(p_row) -> p_column;
END;
$$ language plpgsql;
Sanity check:
=$ select t.*, get_dynamic_hstore(t, 'payload') from test t where t.id > 30 limit 2;
 id | payload | get_dynamic_hstore
----+---------+--------------------
 31 |   -6526 |              -6526
 32 |  -34175 |             -34175
(2 rows)
This works by first converting the whole row to the hstore datatype (which is basically a hash/associative array/dictionary), and then extracting a single value from it using the -> hstore operator.
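To illustrate the intermediate step, the row-to-hstore conversion can also be seen directly in SQL; a rough sketch, using the id = 31 row from the sanity check above (key order and exact output formatting may differ):

=$ select hstore(t) from test t where t.id = 31;
            hstore
--------------------------------
 "id"=>"31", "payload"=>"-6526"
(1 row)

The -> operator then just looks up the given key in that structure; the text value it returns gets cast back to int4 by the RETURN in the function.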
Next idea – basically the same, but using JSON instead of hstore:
=$ CREATE OR REPLACE FUNCTION get_dynamic_json(in p_row test, in p_column TEXT) RETURNS INT4 as $$
declare
BEGIN
    RETURN row_to_json(p_row) ->> p_column;
END;
$$ language plpgsql;

=$ select t.*, get_dynamic_json(t, 'payload') from test t where t.id > 40 limit 2;
 id | payload | get_dynamic_json
----+---------+------------------
 41 |   45291 |            45291
 42 |  -37996 |           -37996
(2 rows)
Since we have JSON, let's also try jsonb:
=$ CREATE OR REPLACE FUNCTION get_dynamic_jsonb(in p_row test, in p_column TEXT) RETURNS INT4 as $$
declare
BEGIN
    RETURN to_jsonb(p_row) ->> p_column;
END;
$$ language plpgsql;

=$ select t.*, get_dynamic_jsonb(t, 'payload') from test t where t.id > 50 limit 2;
 id | payload | get_dynamic_jsonb
----+---------+-------------------
 51 |   18689 |             18689
 52 |   19602 |             19602
(2 rows)
Finally, one more thing. When someone asks me how to do it (dynamic column access in plpgsql code), I generally suggest using a different language. For me, the most natural language is Perl, so let's see:
=$ create extension plperl;

=$ CREATE OR REPLACE FUNCTION get_dynamic_perl(test, TEXT) RETURNS INT4 as $$
    my ($test, $field) = @_;
    return $test->{$field};
$$ language plperl;

=$ select t.*, get_dynamic_perl(t, 'payload') from test t where t.id > 60 limit 2;
 id | payload | get_dynamic_perl
----+---------+------------------
 61 |  -39619 |           -39619
 62 |  -28303 |           -28303
(2 rows)
How about performance? I ran each function like this:
=$ select sum(get_*(...)) from test t;
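In concrete terms, that pattern expands to queries along these lines – shown here for the baseline and the hstore variant; the other get_dynamic_* functions were called the same way, with 'payload' as the column name:

=$ select sum(get_static(t)) from test t;
=$ select sum(get_dynamic_hstore(t, 'payload')) from test t;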
Each call was immediately repeated to get the most benefit from the buffer cache, and then I took the better of the two times. Results:
| Function            | Time          | vs. get_static() |
|---------------------|---------------|------------------|
| get_static          | 5,802.037 ms  | –                |
| get_dynamic_hstore  | 9,878.367 ms  | + 70 %           |
| get_dynamic_json    | 16,601.156 ms | + 186 %          |
| get_dynamic_perl    | 17,119.276 ms | + 195 %          |
| get_dynamic_jsonb   | 17,600.991 ms | + 203 %          |
| get_dynamic_plpgsql | 83,334.069 ms | + 1,336 %        |
So, the results are in. It looks like if you want to get a dynamic field, you have to accept at least a 70% performance penalty.
Of course – this doesn't mean that your trigger, or some other function/procedure, will take 70% more time.
In your real-life code there will probably be other things – some calculations, some checks. And, in the case of triggers, there will be non-zero time to actually write the data.
With this in mind – you just have to test your own case to see if it's worth it. But from the numbers above, I can clearly see that if I wanted to get a dynamic field, and I wanted to do it in plpgsql – hstore is the fastest way.