On 6th of December, Alvaro Herrera committed patch:
Background worker processes Background workers are postmaster subprocesses that run arbitrary user-specified code. They can request shared memory access as well as backend database connections; or they can just use plain libpq frontend database connections. Modules listed in shared_preload_libraries can register background workers in their _PG_init() function; this is early enough that it's not necessary to provide an extra GUC option, because the necessary extra resources can be allocated early on. Modules can install more than one bgworker, if necessary. Care is taken that these extra processes do not interfere with other postmaster tasks: only one such process is started on each ServerLoop iteration. This means a large number of them could be waiting to be started up and postmaster is still able to quickly service external connection requests. Also, shutdown sequence should not be impacted by a worker process that's reasonably well behaved (i.e. promptly responds to termination signals.) The current implementation lets worker processes specify their start time, i.e. at what point in the server startup process they are to be started: right after postmaster start (in which case they mustn't ask for shared memory access), when consistent state has been reached (useful during recovery in a HOT standby server), or when recovery has terminated (i.e. when normal backends are allowed). In case of a bgworker crash, actions to take depend on registration data: if shared memory was requested, then all other connections are taken down (as well as other bgworkers), just like it were a regular backend crashing. The bgworker itself is restarted, too, within a configurable timeframe (which can be configured to be never). More features to add to this framework can be imagined without much effort, and have been discussed, but this seems good enough as a useful unit already. An elementary sample module is supplied. Author: Álvaro Herrera This patch is loosely based on prior patches submitted by KaiGai Kohei, and unsubmitted code by Simon Riggs. Reviewed by: KaiGai Kohei, Markus Wanner, Andres Freund, Heikki Linnakangas, Simon Riggs, Amit Kapila
The description in commit message is pretty detailed, so let me just post couple of comments.
In case it's not clear – this patch makes it possible to write extension/module that will make PostgreSQL start (and keep running, hopefully) additional process – basically a daemon.
Of course it is possible to run daemon always, using cronjob, or some kind of init scripts, but thanks to this patch – new process is bound to running PostgreSQL instance. Whenever PostgreSQL is started – it's started too. When it will be shutdown – the daemon will also be killed.
What's also great – such background process will have access to PostgreSQL shared_buffers.
As far as I can tell, it's not possible to write such program in, say, Perl, because the code for background process has to be compiled to “library". That is .so, or .dll.
Because of this, I will not write a test case for you (my C knowledge is, to put it delicately, not up to the task). Luckily, the patch provides also test contrib module – so we can see how it works.
When I just start my PostgreSQL, I see following processes:
=$ pid="$( head -n 1 data/postmaster.pid )"; ps uwf -p $pid --ppid $pid USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND pgdba 24108 0.0 0.1 171284 12840 pts/5 S 12:59 0:00 /home/pgdba/work/bin/postgres pgdba 24109 0.0 0.0 24808 788 ? Ss 12:59 0:00 \_ postgres: logger process pgdba 24111 0.0 0.6 171612 81824 ? Ss 12:59 0:00 \_ postgres: checkpointer process pgdba 24112 0.0 0.0 171284 1760 ? Ss 12:59 0:00 \_ postgres: writer process pgdba 24113 0.0 0.0 171284 5188 ? Ss 12:59 0:00 \_ postgres: wal writer process pgdba 24114 0.0 0.0 172568 2752 ? Ss 12:59 0:00 \_ postgres: autovacuum launcher process pgdba 24115 0.0 0.0 26904 868 ? Ss 12:59 0:00 \_ postgres: archiver process last was 000000010000000000000007 pgdba 24116 0.0 0.0 27456 1352 ? Ss 12:59 0:00 \_ postgres: stats collector process
Enabling is simple, I just change this line in postgresql.conf:
#shared_preload_libraries = '' # (change requires restart)
to:
shared_preload_libraries = 'worker_spi' # (change requires restart)
Afterwards:
=$ pg_ctl -D data/ -w restart waiting for server to shut down.... done server stopped waiting for server to start....2012-12-07 14:15:23.125 CET @ 29448 LOG: registering background worker: SPI worker 1 2012-12-07 14:15:23.126 CET @ 29448 LOG: registering background worker: SPI worker 2 2012-12-07 14:15:23.126 CET @ 29448 LOG: loaded library "worker_spi" done server started 14:15:24 pgdba@h3po4 ~ =$ pid="$( head -n 1 data/postmaster.pid )"; ps uwf -p $pid --ppid $pid USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND pgdba 29448 0.0 0.1 173436 13000 pts/5 S 14:15 0:00 /home/pgdba/work/bin/postgres pgdba 29449 0.0 0.0 26864 792 ? Ss 14:15 0:00 \_ postgres: logger process pgdba 29451 0.0 0.0 173436 976 ? Ss 14:15 0:00 \_ postgres: checkpointer process pgdba 29452 0.0 0.0 173436 984 ? Ss 14:15 0:00 \_ postgres: writer process pgdba 29453 0.0 0.0 173436 976 ? Ss 14:15 0:00 \_ postgres: wal writer process pgdba 29454 0.0 0.0 174812 2628 ? Ss 14:15 0:00 \_ postgres: autovacuum launcher process pgdba 29455 0.0 0.0 28960 652 ? Ss 14:15 0:00 \_ postgres: archiver process pgdba 29456 0.0 0.0 29512 1368 ? Ss 14:15 0:00 \_ postgres: stats collector process pgdba 29457 0.0 0.0 175328 5568 ? Ss 14:15 0:00 \_ postgres: bgworker: SPI worker 2 pgdba 29458 0.0 0.0 175324 5504 ? Ss 14:15 0:00 \_ postgres: bgworker: SPI worker 1
Why there are two of them?
Logs show:
2012-12-07 14:15:23.211 CET @ 29457 LOG: SPI worker 2 initialized WITH our schema2.counted ROWS 2012-12-07 14:15:23.211 CET @ 29458 LOG: SPI worker 1 initialized WITH schema1.counted
Which suggests that it's by design, and this particular pgworker is supposed to have two processes.
I started psql, and ran: select * from pg_stat_activity:
$ SELECT * FROM pg_stat_activity; -[ RECORD 1 ]----+-------------------------------- datid | 12063 datname | postgres pid | 29457 usesysid | 10 usename | pgdba application_name | client_addr | [NULL] client_hostname | [NULL] client_port | [NULL] backend_start | 2012-12-07 14:15:23.211041+01 xact_start | [NULL] query_start | [NULL] state_change | [NULL] waiting | f state | [NULL] query | -[ RECORD 2 ]----+-------------------------------- datid | 12063 datname | postgres pid | 29458 usesysid | 10 usename | pgdba application_name | client_addr | [NULL] client_hostname | [NULL] client_port | [NULL] backend_start | 2012-12-07 14:15:23.2114+01 xact_start | [NULL] query_start | [NULL] state_change | [NULL] waiting | f state | [NULL] query | -[ RECORD 3 ]----+-------------------------------- datid | 16392 datname | depesz pid | 29654 usesysid | 16384 usename | depesz application_name | psql client_addr | [NULL] client_hostname | [NULL] client_port | -1 backend_start | 2012-12-07 14:18:16.075852+01 xact_start | 2012-12-07 14:18:16.086198+01 query_start | 2012-12-07 14:18:16.086198+01 state_change | 2012-12-07 14:18:16.0862+01 waiting | f state | active query | SELECT * FROM pg_stat_activity;
I see that I have 3 connections, which matches what I expected – one connection for psql, and two for bgworkers.
As we can see both bgworkers (side note: they should set application_name, I think) connect to postgres database. Let's see what's there:
\dt *schema?.* List OF relations Schema | Name | TYPE | Owner -------------+--------------+-------+------- our schema2 | counted ROWS | TABLE | pgdba schema1 | counted | TABLE | pgdba (2 ROWS)
Tables are:
\d "our schema2"."counted rows" TABLE "our schema2.counted rows" COLUMN | TYPE | Modifiers --------+---------+----------- TYPE | text | VALUE | INTEGER | Indexes: "counted rows_unique_total" UNIQUE, btree (TYPE) WHERE TYPE = 'total'::text CHECK constraints: "counted rows_type_check" CHECK (TYPE = ANY (ARRAY['total'::text, 'delta'::text])) \d "schema1"."counted" TABLE "schema1.counted" COLUMN | TYPE | Modifiers --------+---------+----------- TYPE | text | VALUE | INTEGER | Indexes: "counted_unique_total" UNIQUE, btree (TYPE) WHERE TYPE = 'total'::text CHECK constraints: "counted_type_check" CHECK (TYPE = ANY (ARRAY['total'::text, 'delta'::text]))
Content of the tables:
SELECT * FROM "our schema2"."counted rows"; TYPE | VALUE ------+------- (0 ROWS) SELECT * FROM schema1.counted; TYPE | VALUE ------+------- (0 ROWS)
Now, according to description in worker_spi.c, I should insert some rows, with type being “delta", but first – single row with “total" type.
I assume that both tables work the same way, just each is kept by different background process.
So, couple of inserts:
$ INSERT INTO schema1.counted (TYPE, VALUE) VALUES ('total', 0); INSERT 0 1 $ INSERT INTO schema1.counted (TYPE, VALUE) VALUES ('delta', 1), ('delta', 5), ('delta', 10); INSERT 0 3 $ SELECT * FROM schema1.counted; TYPE │ VALUE ───────┼─────── total │ 0 delta │ 1 delta │ 5 delta │ 10 (4 ROWS) -- short wait $ SELECT * FROM schema1.counted; TYPE │ VALUE ───────┼─────── total │ 16 (1 ROW)
Nice. Seems to be working. Of course I could get the same effect by using triggers, but that's just an example.
If worker would die, or get killed – PostgreSQL will restart it.
All this is great, and I just can't wait for getting some real tools that use it. For starters – something like pgAgent. Or perhaps pgBouncer.
Great work guys, thank you.
I too was thinking pgAgent, or at least a version of it that doesn’t rely so much on wxWidgets. I may even get round to writing a non wxWidgets version at some stage, I’ve been threatening to do so for a while now, alas, spare time does not grow on trees!
Also, custom log rotation programs? Log file readers (akin to pgBadger perhaps, but in C). The list here is only limited by the imagination. A great patch IMHO, and a good blog as always.
Is there a typo in the select after the inserts:
select * from schema1.counted;
Should that be:
select * from schema2.counted
@Simon B:
no. The test worker summarizes within single table.
hi how do i get it to work if i accidentally drop the entire schema1, schema1.counted. is there a way to recreate the schema or?
@chaooo:
well, you can “create schema …”. but afterwards you’ll likely have to setup background worker again.
oh okay thank you