On 24th of March 2021, Peter Eisentraut committed patch:
Add date_bin function Similar to date_trunc, but allows binning by an arbitrary interval rather than just full units. Author: John Naylor <john.naylor@enterprisedb.com> Reviewed-by: David Fetter <david@fetter.org> Reviewed-by: Isaac Morland <isaac.morland@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Artur Zakirov <zaartur@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CACPNZCt4buQFRgy6DyjuZS-2aPDpccRkrJBmgUfwYc1KiaXYxg@mail.gmail.com
This is pretty interesting.
First some background. We have date_trunc function which does:
=$ SELECT 'untruncated' AS spec, now() UNION ALL SELECT spec, date_trunc(spec, now()) FROM unnest('{microseconds,milliseconds,second,minute,hour,day,week,month,quarter,year,decade,century,millennium}'::text[]) AS u(spec); spec │ now ──────────────┼─────────────────────────────── untruncated │ 2021-03-31 20:27:25.923338+02 microseconds │ 2021-03-31 20:27:25.923338+02 milliseconds │ 2021-03-31 20:27:25.923+02 SECOND │ 2021-03-31 20:27:25+02 MINUTE │ 2021-03-31 20:27:00+02 HOUR │ 2021-03-31 20:00:00+02 DAY │ 2021-03-31 00:00:00+02 week │ 2021-03-29 00:00:00+02 MONTH │ 2021-03-01 00:00:00+01 quarter │ 2021-01-01 00:00:00+01 YEAR │ 2021-01-01 00:00:00+01 decade │ 2020-01-01 00:00:00+01 century │ 2001-01-01 00:00:00+01 millennium │ 2001-01-01 00:00:00+01 (14 ROWS)
Nice. So, this new function, date_bin, but it takes (almost) any interval as base for truncation.
Almost as it can't take any interval with units of months or more (due to varying duration).
Let's see:
=$ SELECT date_bin('5 minutes', now()); ERROR: FUNCTION date_bin(UNKNOWN, TIMESTAMP WITH TIME zone) does NOT exist LINE 1: SELECT date_bin('5 minutes', now()); ^ HINT: No FUNCTION matches the given name AND argument types. You might need TO ADD explicit TYPE casts.
hmm … careful reason suggests that there is second argument – basically what is base for the bin calculation. So, to get the same results, I'd need to use epoch start as base:
=$ SELECT date_bin('5 minutes', now(), '1970-01-01'); date_bin ------------------------ 2021-03-31 20:35:00+02 (1 ROW)
Nice. Of course, I can use any length that can be expressed as interval, and any starting point:
=$ SELECT date_bin('17 minutes 31 seconds'::INTERVAL, now(), '2000-01-01'); date_bin ------------------------ 2021-03-31 20:23:16+02 (1 ROW)
This is pretty cool.
Of course I don't expect many people needing to group their data in weird durations, like 17 minutes and 31 seconds, but 5/15 minutes can be pretty easy to imagine to be helpful.
Thanks a lot, everyone 🙂
Hi Depesz,
very cool! I just read about the time_bucket function in TimescaleDB (https://docs.timescale.com/latest/api#time_bucket) two days ago and thought that this is a very cool feature. Nice to see this in PostgreSQL itself as well.
Best regards
Salek
Hi Depesz,
can you just point out what u(spec) in your first query is doing?
never has seen that…
thanks karsten
Hi karsten,
I wondered as well, never saw it before either.
I did not find an explanation in the SELECT docs for this, but it is in the docs as well here: https://www.postgresql.org/docs/current/functions-srf.html (search for “AS s(a)”).
It does not seem to matter if you put s(spec), xx(spec) or just spec, though. So it’s just a different way to specify an ALIAS name.
Best regards
Salek
Thanks Salek, with you hint and a second look its clear to me! thx
Hi @karsten
I think that ‘u’ is the alias name and ‘spec’ is the column name in the resultset. Greetings
this solves this problem, but better right ? https://www.depesz.com/2010/10/22/grouping-data-into-time-ranges/
@Mark:
yes. Built-in way to do the thing 🙂
I needed this some time ago and ended up with more or less the same function in SQL. It still may be used in pre-14 Postgresql versions.