One of the questions that gets asked quite a lot is: how can you use variables in your queries? Other databases have them – does PostgreSQL?
Actually, the answer is "no". But it is very easy to work around, and what's more important – quite often what you need are not variables in queries, but rather variables in the client. What does that mean? Let's see.
It all depends on your use case. We have, basically, these options:
- Variables in psql (the command line tool)
- GUCs
- Per session variables in temp tables
- Per session and global variables in normal tables
Variables in psql
In quite a lot of cases, these will be enough for your usage. They are not evaluated in the PostgreSQL backend, but in the psql client.
Example usage:
$ \prompt 'How many rows? ' rows
How many rows? 10
$ SELECT relname FROM pg_class LIMIT :rows;
             relname
---------------------------------
 pg_statistic
 pg_type
 pg_toast_2619
 pg_toast_2619_index
 pg_authid_rolname_index
 pg_authid_oid_index
 pg_attribute_relid_attnam_index
 pg_attribute_relid_attnum_index
 pg_toast_1255
 pg_toast_1255_index
(10 rows)
Such variables can be set by using \prompt, or by using \set, or by running shell commands (on the client, not on the server). Afterwards you can use them in queries, using notations like:
- :var – simply inserts the value of var into the query at the given location
- :'var' – inserts the value as a properly quoted string – even if it includes ' characters
- :"var" – inserts the value as a properly quoted identifier
So, you can, for example, write something like this:
$ \set table pg_class
$ \set match oid
$ \set limit 10
$ SELECT relname FROM :"table" WHERE relname ~ :'match' LIMIT :limit;
         relname
--------------------------
 pg_authid_oid_index
 pg_am_oid_index
 pg_amop_oid_index
 pg_amproc_oid_index
 pg_aggregate_fnoid_index
 pg_cast_oid_index
 pg_collation_oid_index
 pg_database_oid_index
 pg_proc_oid_index
 pg_operator_oid_index
(10 rows)
In PostgreSQL 9.3, we'll get the ability to set the value of a variable based on a query, which will let us do some cooler stuff in a simpler way than before (it is possible to set the value of a variable from a query in older versions too, but the way of doing it is not nice – an example is in the blog post).
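A minimal sketch of how that will look with 9.3's \gset (the column alias becomes the variable name; the query and alias here are just examples):

$ SELECT count(*) AS n_relations FROM pg_class \gset
$ \echo :n_relations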
So, if you only need some shared value between multiple queries that you run in batch mode via psql – it's perfectly possible, and very simple.
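For example, a batch run could look more or less like this (script name, variable name and the orders table are made up). The shell command

psql -v min_id=1000 -f report.sql

sets min_id for the whole run, and inside report.sql you can then write:

SELECT count(*) FROM orders WHERE id > :min_id;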
The cool thing about psql variables is that you can build whole queries in them. For example:
=$ \set top 'select now() - query_start as runtime, query from pg_stat_activity where state <> ''idle'' order by query_start;'
=$ :top
After pressing enter on the :top line, you'll get information about currently running queries.
GUCs
You might know that there are various Pg settings – like work_mem, shared_buffers, listen_addresses and so on.
What you might not know is that you can relatively easily set up your own variables. To do so, you need to use some kind of prefix (class). In PostgreSQL before 9.2, you had to configure the class in postgresql.conf, using:
custom_variable_classes = depesz
So you could use the "depesz" class. In 9.2 and later, classes are defined on use – you just pick a prefix and start using it.
Usage of these variables is very simple – using functions. There are two functions that you'll need:
- current_setting(variable) – returns the current value of the given variable; raises an exception on an unknown variable
- set_config(variable, value, is_local) – sets the variable to the given value (and declares it if it wasn't declared before). If is_local is true, the change lasts only for the current transaction. Otherwise – for the current session.
So, let's see how it works. First, I'll try to use an unknown variable:
$ SELECT current_setting('depesz.xx');
ERROR:  unrecognized configuration parameter "depesz.xx"
OK. That worked. Now, let's set the value:
$ SELECT set_config('depesz.xx', '123', false);
 set_config
------------
 123
(1 row)

$ SELECT current_setting('depesz.xx');
 current_setting
-----------------
 123
(1 row)
A couple of important notes:
- a variable cannot contain NULL. If you store NULL in a variable, or if it gets declared without a value – it will end up with an empty string as its value
- if you use is_local set to true while you're in autocommit mode – it is basically a no-op (the side effect is that the variable gets declared, but with an empty string as its value). This did bite me a couple of times 🙂
- you can set default values per user and per database using the normal "ALTER USER/DATABASE SET … = …" (see the example below)
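For example, something along these lines should work (database and role names are made up; such defaults only apply to newly opened sessions):

ALTER DATABASE somedb SET depesz.default_limit = '100';
ALTER USER depesz SET depesz.default_limit = '50';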
There is (as far as I know) no way to use GUCs to store a real global variable – shared across multiple sessions. The problem is that while you can use "ALTER … SET …", it is evaluated on connection, so it will not affect existing connections. This might be good enough for you, though.
The value of a variable can also be fetched using the "SHOW" SQL command, and set using the "SET" command, but since the function interface is simpler when writing my own functions, I tend to use the functions to read/write the variables.
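For completeness, a sketch of the same thing done with SQL commands, mirroring the set_config() call from above:

SET depesz.xx = '123';        -- session scope, like is_local => false
SET LOCAL depesz.xx = '123';  -- transaction scope, like is_local => true
SHOW depesz.xx;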
Per session variables in temp tables
In case you're on PostgreSQL 9.1 (or earlier) and you can't modify postgresql.conf to add your own custom_variable_classes, you can do more or less the same thing using some very simple functions.
Using temp tables makes it very fast, and there is no need to care about session identification – you just get what you need.
For example, we can imagine two simple functions, like this:
CREATE OR REPLACE FUNCTION set_variable( IN p_var TEXT, IN p_val TEXT ) RETURNS void AS $$
DECLARE
    v_var TEXT;
BEGIN
    EXECUTE 'CREATE TEMP TABLE IF NOT EXISTS sys_variables ( variable TEXT PRIMARY KEY, value TEXT );';
    LOOP
        EXECUTE 'UPDATE sys_variables SET value = $1 WHERE variable = $2 RETURNING variable' INTO v_var USING p_val, p_var;
        IF v_var IS NOT NULL THEN
            RETURN;
        END IF;
        BEGIN
            EXECUTE 'INSERT INTO sys_variables ( variable, value ) VALUES ( $1, $2 )' USING p_var, p_val;
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- ignore, re-process the loop
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION get_variable( IN p_var TEXT ) RETURNS TEXT AS $$
DECLARE
    v_val TEXT;
BEGIN
    EXECUTE 'CREATE TEMP TABLE IF NOT EXISTS sys_variables ( variable TEXT PRIMARY KEY, value TEXT );';
    EXECUTE 'SELECT value FROM sys_variables WHERE variable = $1' INTO v_val USING p_var;
    RETURN v_val;
END;
$$ LANGUAGE plpgsql;
And then you can use it like:
$ SELECT set_variable('xx', 'qq');
 set_variable
--------------

(1 row)
and
$ SELECT get_variable('xx');
NOTICE:  relation "sys_variables" already exists, skipping
CONTEXT:  SQL statement "CREATE TEMP TABLE IF NOT EXISTS sys_variables ( variable TEXT PRIMARY KEY, value TEXT );"
PL/pgSQL function get_variable(text) line 5 at EXECUTE statement
 get_variable
--------------
 qq
(1 row)
There is a problem with the NOTICEs, though. It can be alleviated in a couple of ways:
- move the "CREATE TABLE" to a separate function, and call it just once, right after connecting to the database
- add some kind of logic that checks if the temp table exists, and run 'CREATE TEMP TABLE' only if it doesn't (see the sketch after this list)
- set client_min_messages = warning; to hide those notices
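A sketch of the second option – the existence check would replace the unconditional EXECUTE 'CREATE TEMP TABLE …' line inside both functions, everything else staying the same:

    -- create the temp table only if this session doesn't have it yet
    PERFORM 1 FROM pg_catalog.pg_class c
        WHERE c.relname = 'sys_variables'
          AND c.relnamespace = pg_my_temp_schema();
    IF NOT FOUND THEN
        EXECUTE 'CREATE TEMP TABLE sys_variables ( variable TEXT PRIMARY KEY, value TEXT )';
    END IF;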
Per session and global variables in normal tables
This is the most complicated, but, on the other hand – the most versatile approach.
For starters, we will need a real table – let's assume it's called sys_variables again. And we'll need to create it. Suggested idea:
CREATE TABLE sys_variables (
    variable        TEXT,
    value           TEXT,
    backend_pid     INT4,
    backend_start   timestamptz
);
CREATE UNIQUE INDEX sys_variables_global_variable  ON sys_variables (variable) WHERE backend_pid IS NULL;
CREATE UNIQUE INDEX sys_variables_session_variable ON sys_variables (variable, backend_pid, backend_start) WHERE backend_pid IS NOT NULL;
CREATE OR REPLACE FUNCTION get_variable( IN p_variable TEXT ) RETURNS TEXT AS $$
DECLARE
    v_value TEXT;
BEGIN
    SELECT sv.value INTO v_value
        FROM pg_stat_get_activity(pg_backend_pid()) AS x
        JOIN sys_variables sv ON x.pid = sv.backend_pid AND x.backend_start = sv.backend_start
        WHERE sv.backend_pid IS NOT NULL AND sv.variable = p_variable;
    IF FOUND THEN
        RETURN v_value;
    END IF;
    SELECT sv.value INTO v_value FROM sys_variables sv WHERE sv.backend_pid IS NULL AND sv.variable = p_variable;
    IF FOUND THEN
        RETURN v_value;
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql STABLE;
CREATE OR REPLACE FUNCTION set_local_variable( IN p_variable TEXT, IN p_value TEXT ) RETURNS void AS $$
DECLARE
    v_data record;
BEGIN
    SELECT pid, backend_start INTO v_data FROM pg_stat_get_activity(pg_backend_pid());
    LOOP
        UPDATE sys_variables
            SET value = p_value
            WHERE variable = p_variable
              AND backend_pid IS NOT NULL
              AND backend_pid = v_data.pid
              AND backend_start = v_data.backend_start;
        IF FOUND THEN
            RETURN;
        END IF;
        BEGIN
            INSERT INTO sys_variables ( variable, value, backend_pid, backend_start )
                VALUES ( p_variable, p_value, v_data.pid, v_data.backend_start );
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- ignore, re-process the loop
        END;
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION set_global_variable( IN p_variable TEXT, IN p_value TEXT ) RETURNS void AS $$
BEGIN
    LOOP
        UPDATE sys_variables SET value = p_value WHERE variable = p_variable AND backend_pid IS NULL;
        IF FOUND THEN
            RETURN;
        END IF;
        BEGIN
            INSERT INTO sys_variables ( variable, value, backend_pid, backend_start )
                VALUES ( p_variable, p_value, NULL, NULL );
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- ignore, re-process the loop
        END;
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION sys_variables_cleanup( ) RETURNS void AS $$
BEGIN
    DELETE FROM sys_variables
        WHERE backend_pid IS NOT NULL
          AND (backend_pid, backend_start) NOT IN ( SELECT a.pid, a.backend_start FROM pg_stat_activity AS a );
    RETURN;
END;
$$ LANGUAGE plpgsql;
Is it scary? Don't worry, it's actually a pretty simple thing. Let's go through it step by step.
First part – the CREATE TABLE and the two CREATE UNIQUE INDEX statements create the storage table and its unique indexes. We need two unique indexes – one for global variables (not assigned to any session, thus having backend_pid NULL), and one for session variables, assigned to some backend. Theoretically we'd need only backend_pid, but the problem is that in a system with lots of connections the pid could be reused rather quickly. So I also add the backend_start time, which has microsecond precision, and together with the pid gives me a truly unique session identifier.
Next, the getter – get_variable(). It first checks whether, for the current session, there is a session variable with the given name. Data for session identification is taken from the pg_stat_get_activity() function – the same one that is used by the commonly used pg_stat_activity view.
If such a variable exists – the function returns it and ends. If it doesn't – it then checks for a global variable with the same name. Simple.
The functions set_local_variable() and set_global_variable() do basically the same thing – insert or update a row, with proper values in the variable, backend_pid and backend_start columns. Nothing worth talking about in here.
The last function is the cleanup. The problem is that we can't delete session variables once the session ends – there are no "on disconnect" triggers. So, after some use of such a system, we would have quite a lot of stale session variables. The solution is to call this function (sys_variables_cleanup) from crontab. It removes old, obsolete rows.
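With all of this in place, usage could look more or less like this (the variable names are just examples):

$ SELECT set_global_variable('app_version', '1.2');
$ SELECT set_local_variable('batch_id', '42');
$ SELECT get_variable('batch_id');
 get_variable
--------------
 42
(1 row)

And the cleanup could be scheduled with a crontab entry along these lines (database name is made up):

*/30 * * * * psql -d somedb -c 'SELECT sys_variables_cleanup()'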
Of course, this can be improved further – for example, you might want to consider using an unlogged table. Or, perhaps, move session variables to temp tables. Or add a function that removes a session variable, so that you could get the value of the global one again – a sketch of that last one is below. But this is not really critical stuff.
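A possible sketch of such a remover (delete_local_variable is not part of the code above, just one way it could be added):

CREATE OR REPLACE FUNCTION delete_local_variable( IN p_variable TEXT ) RETURNS void AS $$
BEGIN
    -- drop only this session's copy; the global row (backend_pid IS NULL) stays untouched
    DELETE FROM sys_variables sv
        USING pg_stat_get_activity(pg_backend_pid()) AS x
        WHERE sv.backend_pid = x.pid
          AND sv.backend_start = x.backend_start
          AND sv.variable = p_variable;
    RETURN;
END;
$$ LANGUAGE plpgsql;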
Summary
So, the final question is: which one is the best? And if I can't answer it, which one do I use?
Well, there is no "best" solution. All solutions have benefits and drawbacks. And as for what I use – I don't. Really. When I'm writing quick batch scripts, I use psql variables, but that happens rarely. And if I need something in Pg itself, I simply end up writing a function anyway, and use variables in functions (or in DO blocks).
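As a tiny illustration of that last point – ordinary PL/pgSQL variables in a DO block are often all that's needed (the logic here is made up):

DO $$
DECLARE
    v_limit INT := 10;
    v_count INT;
BEGIN
    -- v_limit works exactly like a "variable in a query" would
    SELECT count(*) INTO v_count FROM ( SELECT 1 FROM pg_class LIMIT v_limit ) AS s;
    RAISE NOTICE 'counted % rows (limit was %)', v_count, v_limit;
END;
$$;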
All things said – this blog post does not describe all possible ways. For example, if you're writing a lot of functions in PL/Perl (or other PL/* languages, but not PL/pgSQL) – there is some kind of memory shared between queries, within a session. In PL/Perl it's %_SHARED, in PL/Python it's SD, in other languages it's probably named differently.
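A tiny sketch of the PL/Perl variant (assuming the plperl language is installed; the function names are made up):

CREATE OR REPLACE FUNCTION perl_set_var( TEXT, TEXT ) RETURNS void AS $$
    # %_SHARED lives for the whole session and is shared by all plperl functions
    $_SHARED{ $_[0] } = $_[1];
    return;
$$ LANGUAGE plperl;

CREATE OR REPLACE FUNCTION perl_get_var( TEXT ) RETURNS TEXT AS $$
    return $_SHARED{ $_[0] };
$$ LANGUAGE plperl;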
One more caveat: if you have a slave machine with a big shared_buffers value, then creating and dropping temporary tables slows down WAL replay – every drop of a temp table leads to a full shared_buffers scan on the slave. So be careful with the temp table approach; if you create and drop such temp tables very often, it can influence your replication very negatively.