Couple of days earlier I wrote about tools to run tasks in db at scheduled times.
In writing this I missed pg_cron. So let's look into it now.
Just like previously, I'm on Ubuntu 20.04, with PGDB apt repo, so installing pg_cron is trivial:
=$ sudo apt-get install postgresql-13-cron
Afterwards I need to enable it, by adding pgcron to preloaded libraries:
$ SHOW shared_preload_libraries; shared_preload_libraries ────────────────────────── (1 ROW) $ ALTER system SET shared_preload_libraries = 'pg_cron'; ALTER SYSTEM
This can be changed only on Pg start, so I have to restart my server:
=$ sudo systemctl restart postgresql@13-main.service
afterwards quick check if it's really enabled:
$ SHOW shared_preload_libraries ; shared_preload_libraries ────────────────────────── pg_cron (1 ROW)
All is well. What's more, we can see that it's loaded from shell:
=$ ps f -u postgres PID TTY STAT TIME COMMAND 61445 ? Ss 0:00 /usr/lib/postgresql/13/bin/postgres -D /var/lib/postgresql/13/main -c config_file=/etc/postgresql/13/main/postgresql.conf 61458 ? Ss 0:00 \_ postgres: 13/main: checkpointer 61459 ? Ss 0:00 \_ postgres: 13/main: background writer 61460 ? Ss 0:00 \_ postgres: 13/main: walwriter 61461 ? Ss 0:00 \_ postgres: 13/main: autovacuum launcher 61462 ? Ss 0:00 \_ postgres: 13/main: stats collector 61463 ? Ss 0:00 \_ postgres: 13/main: pg_cron launcher 61464 ? Ss 0:00 \_ postgres: 13/main: logical replication launcher
As you can see, there is a process, pid 61463, that's actually “pg_cron launcher". This should handle all the things that pg_cron should do. So, let's try to use it.
$ CREATE EXTENSION pg_cron; CREATE EXTENSION $ CREATE TABLE testit (id serial PRIMARY KEY, inserted_on timestamptz, inserted_by int4); CREATE TABLE $ SELECT cron.schedule('test-insert', '* * * * *', 'insert into testit (inserted_on, inserted_by) values (now(), pg_backend_pid())'); schedule ────────── 1 (1 ROW)
After two minutes:
$ SELECT * FROM testit; id │ inserted_on │ inserted_by ────┼───────────────────────────────┼───────────── 1 │ 2021-01-28 16:15:00.033298+01 │ 63466 2 │ 2021-01-28 16:16:00.029691+01 │ 63540 (2 ROWS)
Nice. Looks to be working.
What do we have available?
First, tables:
$ \dt cron. List OF relations Schema │ Name │ TYPE │ Owner ────────┼─────────────────┼───────┼────────── cron │ job │ TABLE │ postgres cron │ job_run_details │ TABLE │ postgres (2 ROWS)
Names suggest the content, but let's see it:
$ SELECT * FROM cron.job; jobid │ schedule │ command │ nodename │ nodeport │ DATABASE │ username │ active │ jobname ───────┼───────────┼────────────────────────────────────────────────────────────────────────────────┼───────────┼──────────┼──────────┼──────────┼────────┼───────────── 1 │ * * * * * │ INSERT INTO testit (inserted_on, inserted_by) VALUES (now(), pg_backend_pid()) │ localhost │ 5432 │ postgres │ postgres │ t │ test-INSERT (1 ROW) $ SELECT * FROM cron.job_run_details ; jobid │ runid │ job_pid │ DATABASE │ username │ command │ STATUS │ return_message │ start_time │ end_time ───────┼───────┼─────────┼──────────┼──────────┼────────────────────────────────────────────────────────────────────────────────┼───────────┼────────────────┼───────────────────────────────┼─────────────────────────────── 1 │ 1 │ 63466 │ postgres │ postgres │ INSERT INTO testit (inserted_on, inserted_by) VALUES (now(), pg_backend_pid()) │ succeeded │ INSERT 0 1 │ 2021-01-28 16:15:00.033227+01 │ 2021-01-28 16:15:00.036568+01 1 │ 2 │ 63540 │ postgres │ postgres │ INSERT INTO testit (inserted_on, inserted_by) VALUES (now(), pg_backend_pid()) │ succeeded │ INSERT 0 1 │ 2021-01-28 16:16:00.029662+01 │ 2021-01-28 16:16:00.032796+01 1 │ 3 │ 63614 │ postgres │ postgres │ INSERT INTO testit (inserted_on, inserted_by) VALUES (now(), pg_backend_pid()) │ succeeded │ INSERT 0 1 │ 2021-01-28 16:17:00.035651+01 │ 2021-01-28 16:17:00.038934+01 1 │ 4 │ 63715 │ postgres │ postgres │ INSERT INTO testit (inserted_on, inserted_by) VALUES (now(), pg_backend_pid()) │ succeeded │ INSERT 0 1 │ 2021-01-28 16:18:00.021993+01 │ 2021-01-28 16:18:00.025342+01 (4 ROWS)
Pretty cool. Now, let's see what functions we have:
$ \df cron. List OF functions Schema │ Name │ RESULT DATA TYPE │ Argument DATA types │ TYPE ────────┼──────────────────────┼──────────────────┼────────────────────────────────────────────┼────── cron │ job_cache_invalidate │ TRIGGER │ │ func cron │ schedule │ BIGINT │ job_name name, schedule text, command text │ func cron │ schedule │ BIGINT │ schedule text, command text │ func cron │ unschedule │ BOOLEAN │ job_id BIGINT │ func cron │ unschedule │ BOOLEAN │ job_name name │ func (5 ROWS)
Well, it looks that we can schedule jobs with job name, and without it, and unschedule based on jobid or job_name, if it is there. That's simple. But what is job_cache_invalidate?
This is actually a function called by trigger on cron.job table, that apparently clears some cache about schedule. Nothing that we need to worry about if we don't do anything crazy.
There are couple of interesting points:
- pg_cron makes sure only one copy of a given job runs at the same time
- cronjob definition is lifted from vixie cron, so it should handle things like “3-59/17 * * * *" – which can be great for spreading load.
One side thing – all pg_cron config is in single database, postgres by default, but it looks that you can make cronjobs work in any db. Let's try. First, let's clean the situation:
$ SELECT * FROM cron.unschedule(1); $ DELETE FROM cron.job_run_details WHERE jobid = 1;
Now, in another database, let's make test table:
=$ psql -U depesz -c "create table testit2 (id serial primary key, inserted_on timestamptz, inserted_by int4);"
Please note that it is different name from previously.
Now, I can add the schedule, in postgres db:
$ SELECT cron.schedule('test2-insert', '* * * * *', 'insert into testit2 (inserted_on, inserted_by) values (now(), pg_backend_pid())'); schedule ────────── 2 (1 ROW)
After a short while, I saw in job_run_details:
$ SELECT * FROM cron.job_run_details \gx ─[ RECORD 1 ]──┬──────────────────────────────────────────────────────────────────────────────── jobid │ 2 runid │ 16 job_pid │ 65402 DATABASE │ postgres username │ postgres command │ INSERT INTO testit2 (inserted_on, inserted_by) VALUES (now(), pg_backend_pid()) STATUS │ failed return_message │ ERROR: relation "testit2" does NOT exist ↵ │ LINE 1: INSERT INTO testit2 (inserted_on, inserted_by) VALUES (now()... ↵ │ ^ ↵ │ start_time │ 2021-01-28 16:30:00.025378+01 end_time │ 2021-01-28 16:30:00.026865+01
This failed, because it was ran in database postgres, but the testit2 table is in depesz.
So, let's update the job:
$ UPDATE cron.job SET DATABASE='depesz' WHERE jobid = 2; UPDATE 1 $ SELECT * FROM cron.job \gx ─[ RECORD 1 ]───────────────────────────────────────────────────────────────────────────── jobid │ 2 schedule │ * * * * * command │ INSERT INTO testit2 (inserted_on, inserted_by) VALUES (now(), pg_backend_pid()) nodename │ localhost nodeport │ 5432 DATABASE │ depesz username │ postgres active │ t jobname │ test2-INSERT
And now, after some time:
$ SELECT * FROM testit2; id │ inserted_on │ inserted_by ────┼───────────────────────────────┼───────────── 1 │ 2021-01-28 16:31:00.017937+01 │ 65524 (1 ROW)
Cool. All works. That looks really impressive.
Consider https://github.com/RekGRpth/pg_task