nixternal wrote about boost library for c++.
with it he was able to find the answer to title question in miliseconds (he didn't specify how many, but let's assume that is was less than 10 ms).
so i decided to check how fast can i do it in postgresql …
first, i will need list of all days from given date range (1901-01-01 to 2000-12-31):
SELECT '1901-01-01'::DATE + i AS DATE FROM generate_series(0, '2000-12-31'::DATE - '1901-01-01'::DATE) AS i ;
now i need to check which of these days are 1st days of month:
SELECT '1901-01-01'::DATE + i AS DATE FROM generate_series(0, '2000-12-31'::DATE - '1901-01-01'::DATE) AS i WHERE EXTRACT(DAY FROM '1901-01-01'::DATE + i) = 1 ;
and the “sunday" question:
SELECT '1901-01-01'::DATE + i AS DATE FROM generate_series(0, '2000-12-31'::DATE - '1901-01-01'::DATE) AS i WHERE EXTRACT(DAY FROM '1901-01-01'::DATE + i) = 1 AND EXTRACT(dow FROM '1901-01-01'::DATE + i) = 0 ;
great, the code works ok, so now, let's get count, and time the query:
# SELECT =# COUNT(*) =# FROM =# generate_series(0, '2000-12-31'::DATE - '1901-01-01'::DATE) AS i =# WHERE =# EXTRACT(DAY FROM '1901-01-01'::DATE + i) = 1 AND EXTRACT(dow FROM '1901-01-01'::DATE + i) = 0 =# ; COUNT ------- 171 (1 ROW) TIME: 59.698 ms
not bad, but i'd like to get the time a bit lower.
perhaps i should go with single comparison with “to_char" ?
# SELECT =# COUNT(*) =# FROM =# generate_series(0, '2000-12-31'::DATE - '1901-01-01'::DATE) AS i =# WHERE =# to_char('1901-01-01'::DATE + i, 'D DD') = '1 01' =# ; COUNT ------- 171 (1 ROW) TIME: 113.314 ms
oops. single to_char() is slower than 2 separate extract()s.
what else can i do?
i know. let's use the fact that i can add intervals, and not only days. theoretically if i'll add ‘1 month' i should stay on 1st day of the month. quick test:
# SELECT '1901-01-01'::DATE + '1 month'::INTERVAL * i FROM generate_series(1,10) i; ?COLUMN? --------------------- 1901-02-01 00:00:00 1901-03-01 00:00:00 1901-04-01 00:00:00 1901-05-01 00:00:00 1901-06-01 00:00:00 1901-07-01 00:00:00 1901-08-01 00:00:00 1901-09-01 00:00:00 1901-10-01 00:00:00 1901-11-01 00:00:00 (10 ROWS)
looks cool.
so, how to use it?
basic approach:
# SELECT COUNT(*) FROM generate_series(0, ('2000-12-31'::DATE - '1901-01-01'::DATE)/28) AS i WHERE EXTRACT(DAY FROM '1901-01-01'::DATE + i * '1 month'::INTERVAL) = 1 AND EXTRACT(dow FROM '1901-01-01'::DATE + i * '1 month'::INTERVAL) = 0 AND '1901-01-01'::DATE + i * '1 month'::INTERVAL < '2000-12-31'::DATE ; COUNT ------- 171 (1 ROW) TIME: 6.083 ms
shows success 🙂
now, it would be cool to de-uglify the query, and perhaps make it so it will need to have dates specified only once:
# SELECT COUNT(*) FROM ( SELECT CAST(START.date + '1 month'::INTERVAL * generate_series(0, (finish.date - START.date)/28) AS DATE) AS DATE, finish.date AS "finish" FROM (SELECT '2000-12-31'::DATE AS DATE) AS finish, (SELECT '1901-01-01'::DATE AS DATE) AS START ) x WHERE 1 = EXTRACT(DAY FROM x.date) AND 0 = EXTRACT(dow FROM x.date) AND x.date < x.finish ; COUNT ------- 171 (1 ROW) TIME: 4.905 ms
it's a bit faster this some as date + interval additions are done only once for each month.
in general – i was able to write a 5ms query to calculate “number of sundays since 1901", in interpreted language. and the language/engine is not even calendar centric (designed to work mostly with dates).
it's just one of the things why i love postgresql.
I think you can do this more efficiently (and more correctly) like this. The 1200 comes from the fact that you’re looking at 100 years, each of which has 12 months.
the difference is that i didn’t want to hardcode numbers like this (1200) – i wanted the query to calculate it by its own. that’s why i have this:
generate_series(0, (finish.date – start.date)/28)
and later:
where … AND x.date < x.finish
Fair enough. I think this one’s faster and just as easy to parameterize as yours 🙂
SELECT count(*)
FROM (
SELECT start + generate_series(0,
12 * date_part(‘year’, finish)::integer +
date_part(‘month’, finish)::integer –
12 * date_part(‘year’, start)::integer –
date_part(‘month’, start)::integer
) * INTERVAL ‘1 month’ AS i
FROM
(VALUES(
‘1901-01-01’::date,
‘2000-01-01’::date
)) AS t(“start”, “finish”)
) AS j
WHERE
extract(‘dow’ FROM i)=0
AND
extract(‘day’ FROM i)=1
;
oops. formatted this time:
just re-tested my approach (last sql from post) and davids.
it is tested on different machine, so times will be different, but i wanted to check how much faster davids approach will be.
i ran both queries 5 times, and got results:
(depesz way):
Time: 9.272 ms
Time: 9.316 ms
Time: 9.225 ms
Time: 9.422 ms
Time: 9.224 ms
(david way):
Time: 7.736 ms
Time: 6.622 ms
Time: 3.124 ms
Time: 2.591 ms
Time: 2.600 ms
whoa. that’s some nice speedup in last 3 cases 🙂 and it’s repeatable!
david, kudos to you 🙂
These are very interesting queries. Thanks for the post.
But what about finding the lasts Sundays and not the 1sts ones ?
Wouldn’t this be more challenging as the last Sunday in a month may be the 4th or the 5th one ?
Have fun 🙂
@M.Le_maudit:
Not sure I understand.
What exactly do you want to find?
Your right, it’s not exactly the same kind of challenge…
Well, I’m looking for the dates of all last Sundays in a month during a period of time and the rank of these Sundays in the month.
for example :
– in August 2011, last Sunday will be August 28 and it will be the 4th Sunday in the month.
– in October 2011, last Sunday will be October 30 and it will be the 5th Sunday in the month.
So why not something that would give this kind of result for the year 2011 :
Year | Month | Day | Rank
—–+———-+—–+—–
2011 | January | 30 | 5
2011 | February | 27 | 4
2011 | March | 27 | 4
2011 | April | 24 | 4
2011 | May | 29 | 5
…
2011 | August | 28 | 4
…
2011 | October | 30 | 5
…
2011 |December | 25 | 4
I’m not sure this will be so challenging for you, but it is for me… yes, I have to confess that I’m not a SQL pro. 🙂
Anyway, I need to manage recurring events that occur every first and every last Sundays in a month, so I need to identify the dates of these days.
@M.Le_maudit:
@depesz
Hi, this is my first post on your blog and I must say all kudos to you and commentators.
Truly great readings for postres average like me.
However, while I was reading this page and fiddling with code I did spot few minor errors.
Depesz, in your final ‘select’ dividing with 28 generates unnecessary rows. IMHO it’s more correct to
borrow from DAVID FETTER example; resulting in:
@DAVID FETTER
In your final ‘select’ IMHO it’s more correct to change this part
to
Simplifying even more this could be expressed as: