so, there you have a users table, with a very basic structure:
Table "public.users" Column | Type | Modifiers -----------+---------+----------- id | integer | not null birthdate | date | Indexes: "x_pkey" PRIMARY KEY, btree (id)
then, you have a task: find a query that will return all users which have birthday tomorrow. how will you do it?
there are couple of problems with it.
first – let's assume that you want the search using:
SELECT * FROM users WHERE to_char(birthdate, 'MM-DD') = to_char(now() + '1 day'::INTERVAL, 'MM-DD');
this is simple, yet it seq-scans whole table:
# EXPLAIN ANALYZE SELECT * FROM users WHERE to_char(birthdate, 'MM-DD') = to_char(now() + '1 day'::INTERVAL, 'MM-DD'); QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------- Seq Scan ON users (cost=0.00..43674.95 ROWS=7458 width=8) (actual TIME=0.036..5537.293 ROWS=2553 loops=1) FILTER: (to_char((birthdate)::TIMESTAMP WITH TIME zone, 'MM-DD'::text) = to_char((now() + '1 day'::INTERVAL), 'MM-DD'::text)) Total runtime: 5538.686 ms (3 ROWS)
now, you surely can index it with functional index. right? wrong!:
# CREATE INDEX q ON users (to_char(birthdate, 'MM-DD')); ERROR: functions IN INDEX expression must be marked IMMUTABLE
ouch.
to_char() is “stable", but not “immutable" – if you don't know what it means – consult the manual.
of course i can write my own wrapper around to_char:
CREATE OR REPLACE FUNCTION indexable_month_day(DATE) RETURNS TEXT AS $BODY$ SELECT to_char($1, 'MM-DD'); $BODY$ LANGUAGE 'sql' IMMUTABLE STRICT;
ok, now i can:
# CREATE INDEX q ON users ( indexable_month_day(birthdate) ); CREATE INDEX # EXPLAIN analyze SELECT * FROM users WHERE indexable_month_day(birthdate) = indexable_month_day( (now() + '1 day'::INTERVAL)::DATE); QUERY PLAN ------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan ON users (cost=138.71..10719.75 ROWS=7458 width=8) (actual TIME=1.693..5.091 ROWS=2553 loops=1) Recheck Cond: (indexable_month_day(birthdate) = indexable_month_day(((now() + '1 day'::INTERVAL))::DATE)) -> Bitmap INDEX Scan ON q (cost=0.00..136.85 ROWS=7458 width=0) (actual TIME=1.104..1.104 ROWS=2553 loops=1) INDEX Cond: (indexable_month_day(birthdate) = indexable_month_day(((now() + '1 day'::INTERVAL))::DATE)) Total runtime: 6.103 ms
now bad. but is it correct?
let's assume that we have a person that was born on 29th of february, 1980. and let's assume today is 28th of february of 2007.
now() + ‘1 day' will give me 1st of march, and i will not find the guy from 29th of february.
of course – this is technically right, but anyway – it would be cool to wish the guy “happy birthday".
also – if the guy was born on 29th of february – should be greet him on 28th of february or on 1st of march?
of course – if current year is also leap year, the problem doesn't exist. but – the odds are that it is not leap year.
to make the story short, i wrote a function which returns list of dates that match some “now() + x days" criteria:
CREATE OR REPLACE FUNCTION birthdates_soon(in_today DATE, in_after_days INT4) RETURNS TEXT[] AS $BODY$ DECLARE work_date DATE := in_today + in_after_days * '1 day'::INTERVAL; output TEXT[] := ARRAY[ indexable_month_day( work_date ) ]; BEGIN IF NOT is_leap_year(in_today) THEN IF output[1] IN ('02-28', '03-01') THEN output := output || '02-29'::TEXT; END IF; END IF; RETURN output; END; $BODY$ LANGUAGE plpgsql IMMUTABLE STRICT;
this function needs this small function:
CREATE OR REPLACE FUNCTION is_leap_year(DATE) RETURNS bool AS $BODY$ DECLARE use_year INT4 := date_part('year', $1)::INT4; BEGIN IF 0 < use_year % 4 THEN RETURN FALSE; ELSIF 0 < use_year % 100 THEN RETURN TRUE; ELSIF 0 < use_year % 400 THEN RETURN FALSE; END IF; RETURN TRUE; END; $BODY$ LANGUAGE plpgsql IMMUTABLE strict;
what it does? let's see:
# SELECT x.date AS today, birthdates_soon(x.date, 1) AS tomorrow FROM ( VALUES ('2007-02-28'::DATE), ('2007-03-01'::DATE), ('2007-02-27'), ('2000-02-27'), ('2000-02-28'), ('2000-02-29'), (now()::DATE) ) AS x (DATE); today | tomorrow ------------+--------------- 2007-02-28 | {03-01,02-29} 2007-03-01 | {03-02} 2007-02-27 | {02-28,02-29} 2000-02-27 | {02-28} 2000-02-28 | {02-29} 2000-02-29 | {03-01} 2007-10-26 | {10-27} (7 ROWS)
usually we will be calling the function with first argument set to now()::date, so – let's make a wrapper:
CREATE OR REPLACE FUNCTION birthdates_soon(INT4) RETURNS TEXT[] AS $BODY$ SELECT birthdates_soon(now()::DATE, $1); $BODY$ LANGUAGE SQL IMMUTABLE STRICT;
this will allow us to write: … birthdates_soon(1) or … birthdates_soon( ‘2006-01-01'::date, 1).
and now, i can use the function in our “user-finding" query:
# EXPLAIN analyze SELECT * FROM users WHERE indexable_month_day(birthdate) = ANY ( birthdates_soon(1) ); QUERY PLAN ------------------------------------------------------------------------------------------------------------------ Bitmap Heap Scan ON users (cost=138.45..8789.73 ROWS=7458 width=8) (actual TIME=1.442..4.913 ROWS=2553 loops=1) Recheck Cond: (indexable_month_day(birthdate) = ANY ('{10-27}'::text[])) -> Bitmap INDEX Scan ON q (cost=0.00..136.59 ROWS=7458 width=0) (actual TIME=0.870..0.870 ROWS=2553 loops=1) INDEX Cond: (indexable_month_day(birthdate) = ANY ('{10-27}'::text[])) Total runtime: 5.899 ms (5 ROWS)
as you can see we have to use = ANY() construct because our function returns array. if it was returning “setof text", i would have to use … in ().
so, that would be all. hope you'll find it useful.
Minor quibble: using current_date seems simpler and clearer than using now()::date.
Thank, Hubert, for presenting this. The birthday problem is something I’ve wanted to take a stab at and your post prompted me to look at it now.
It’s a personal pet peeve, but I really don’t like casting a datetime value to text. You can get away with it here because you’re relying on the fact that the collation of text representation is going to be the same as the ordering of the dates. As far as I know, you have to do something like this because there isn’t a type to represent just the month and day portion of a datetime value.
To avoid using to_char, I chose to convert the month and day to an integer representation:
It’s the same in spirit as what you’re doing with to_char, but I think it’s a bit clearer what is happening: you’re no longer dealing with something that looks like a date. I haven’t profiled it to see how the two pairs of
and
and a little arithmetic compare to the single
call: I suspect the former may be slower.
I also took advantage of the fact that today always follows yesterday: rather than determine whether or not a leap year is involved and conditionally add February 29, I just compare the date in question to see if it’s after yesterday: during a leap year, yesterday relative to March 1 (301) is February 29 (229) and otherwise it’s February 28 (228). In any year, the
representation of February 29 is going to fall between yesterday and today on March 1.
Here’s the rest of the code:
Query plans are similar (though they look messier due to the inlined SQL):
http://explain-analyze.info/query_plans/1227-birthday-within-one-day
I was a bit surprised that the
call wasn’t inlined and didn’t take advantage of
, rather using a sequential scan with very unfortunate results:
http://explain-analyze.info/query_plans/1228-birthday-in-tomorrow-seq-scan
Inlining the function body in the query worked well, however:
http://explain-analyze.info/query_plans/1230-birthdays-soon
http://explain-analyze.info/query_plans/1231-birthdays-soon-7-days
http://explain-analyze.info/query_plans/1232-birthdays-in-one-week
Granted, my little analysis isn’t very rigorous, but it looks like your array code is a bit faster than the comparisons I’m using.
aargh. andrewsn ranted, and grzm showed me that i made something really bad – used to_char() to get integer value of part of date.
function (is_leap_year()) is now fixed to properly use date_part().
Also per AndrewSN, redefining is_birthday to not be strict allows it to be properly inlined.
I’m confused as to why you would’t just:
create index idx_bday on users( date_part(‘doy’,birthday));
then query as say:
select * from users where date_part(‘doy’,birthday) = date_part(‘doy’,’feb 28, 2008′::date + 1);
Oops, my previous post is obviously wrong. Pardon the noise.
Hello! I am from Tojikiston. Your blog is very intrestin, but it very difficult to read it with on-line translator. Can you make it in my own Tojikiston language?
@Ravshan:
sorry,. but i dont know your language, and i think that writing in it will make it very difficult to read for others.
There’s no need to use all these function magic to make it work for 02-29.
select * from public.users
where
indexable_month_day(birthdate) > indexable_month_day(now()::date) and
indexable_month_day(birthdate) <= indexable_month_day((now() + ‘1 days’::interval)::date);
hmmm, isn’t this simpler?
SELECT … WHERE birthdate+date_trunc (‘years’, age(current_date, birthdate))::date +interval ‘1 year’ BETWEEN CURRENT_DATE AND CURRENT_DATE + N * interval ‘1 day’;
to find if someone’s birthday is in the next N days?
that gnarly expression between the WHERE and BETWEEN calculates the date of their next birthday…