Who’s Birthday
Finding who’s birthday on a day or over a period of days is always interesting. Finding it from a computer database is even more interesting.
Here it is a table to start with (all works below is on PostgreSQL version 9.1.3):
CREATE TABLE person
(
id integer NOT NULL,
dob date NOT NULL,
CONSTRAINT “id must be unique” PRIMARY KEY (id)
)
WITH (OIDS=FALSE);
Fill it with 500,000 dummy records with random birthday:
INSERT INTO person
WITH RECURSIVE t AS (
SELECT 1 AS id, DATE_PLI(‘1970-01-01’::date, (RANDOM() * 5000)::integer ) AS dob
UNION ALL
SELECT t.id+1 AS id, DATE_PLI(‘1970-01-01’::date, (RANDOM() * 5000)::integer ) AS dob FROM t
WHERE t.id<500000
)
SELECT t.id, t.dob FROM t;
Now let us examine a query to find out who’s birthday on Dec 30th 2012:
EXPLAIN ANALYZE
SELECT * FROM person WHERE DATE_PART(‘MONTH’,dob)=DATE_PART(‘MONTH’,’2012-12-30′::date) AND DATE_PART(‘DAY’,dob)=DATE_PART(‘DAY’,’2012-12-30′::date);
Result:
Seq Scan on person (cost=0.00..14713.00 rows=12 width=8) (actual time=0.226..284.272 rows=1365 loops=1)
Filter: ((date_part(‘MONTH’::text, (dob)::timestamp without time zone) = 12::double precision) AND (date_part(‘DAY’::text, (dob)::timestamp without time zone) = 30::double precision))
Total runtime: 286.369 ms
Can we do it faster? with the help of index.
CREATE INDEX “person_idx_dob1” ON person USING btree
( INT4PL( DATE_PART(‘MONTH’,dob)::integer*100, DATE_PART(‘DAY’,dob)::integer ) );
Let us examine a query in different way so the index works:
EXPLAIN ANALYZE
SELECT * FROM person
WHERE INT4PL(DATE_PART(‘MONTH’,dob)::integer*100,DATE_PART(‘DAY’,dob)::integer)=
INT4PL(DATE_PART(‘MONTH’,’2012-12-30′::date)::integer*100,DATE_PART(‘DAY’,’2012-12-30′::date)::integer);
Result:
Bitmap Heap Scan on person (cost=43.69..2443.18 rows=2500 width=8) (actual time=0.481..3.805 rows=1365 loops=1)
Recheck Cond: (int4pl(((date_part(‘MONTH’::text, (dob)::timestamp without time zone))::integer * 100), (date_part(‘DAY’::text, (dob)::timestamp without time zone))::integer) = 1230)
-> Bitmap Index Scan on person_idx_dob1 (cost=0.00..43.06 rows=2500 width=0) (actual time=0.305..0.305 rows=1365 loops=1)
Index Cond: (int4pl(((date_part(‘MONTH’::text, (dob)::timestamp without time zone))::integer * 100), (date_part(‘DAY’::text, (dob)::timestamp without time zone))::integer) = 1230)
Total runtime: 5.783 ms (previously 286.369 ms)
Yes, the index works and it is faster now.
Finding who’s birthday over a period of days is more complex than what we have done on a single day. Specially when the period starts from a day close to end of year to a day next year.
I have a trick to transform a date-of-birth to the form of:
20000 + (month-part-of-a-date * 100) plus (day-part-of-a-date).
If the whole period is within a single year, then the start and stop date is transformed into:
20000 + (month-part-of-the-start-date * 100) plus (day-part-of-the-start-date)
20000 + (month-part-of-the-stop-date * 100) plus (day-part-of-the-stop-date)
But, if it crosses year border then it has lower and higher range detection. The lower range detection:
10000 + (month-part-of-the-start-date * 100) plus (day-part-of-the-start-date)
20000 + (month-part-of-the-stop-date * 100) plus (day-part-of-the-stop-date)
And for higher range detection:
20000 + (month-part-of-the-start-date * 100) plus (day-part-of-the-start-date)
30000 + (month-part-of-the-stop-date * 100) plus (day-part-of-the-stop-date)
Give it a try. To query who’s birthday between 1) Dec 30 2012 and Dec 31 2012, 2) Dec 30 2012 and Jan 01 2013
Period within single year: Dec 30 2012 and Dec 31 2012.
EXPLAIN ANALYZE
SELECT * FROM person
WHERE (
( INT4PL( INT4PL(20000,(DATE_PART(‘MONTH’,dob)::integer*100)), DATE_PART(‘DAY’,dob)::integer) )
BETWEEN
( INT4PL( INT4PL(20000,(DATE_PART(‘MONTH’,’2012-12-30′::date)::integer*100)),DATE_PART(‘DAY’,’2012-12-30′::date)::integer) )
AND
( INT4PL( INT4PL(20000,(DATE_PART(‘MONTH’,’2012-12-31′::date)::integer*100)),DATE_PART(‘DAY’,’2012-12-31′::date)::integer) )
);
Result:
Seq Scan on person (cost=0.00..32213.00 rows=2500 width=8) (actual time=0.504..611.400 rows=2715 loops=1)
Filter: ((int4pl(int4pl(20000, ((date_part(‘MONTH’::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY’::text, (dob)::timestamp without time zone))::integer) >= 21230) AND
(int4pl(int4pl(20000,((date_part(‘MONTH’::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY’::text, (dob)::timestamp without time zone))::integer) <= 21231))
Total runtime: 614.999 ms
Period crosses years: Dec 30 2012 and Jan 01 2013:
EXPLAIN ANALYZE
SELECT * FROM person
WHERE (
( INT4PL( INT4PL(20000,(DATE_PART(‘MONTH’,dob)::integer*100)), DATE_PART(‘DAY’,dob)::integer) )
BETWEEN
( INT4PL( INT4PL(10000,(DATE_PART(‘MONTH’,’2012-12-30′::date)::integer*100)),DATE_PART(‘DAY’,’2012-12-30′::date)::integer) )
AND
( INT4PL( INT4PL(20000,(DATE_PART(‘MONTH’,’2013-01-01′::date)::integer*100)),DATE_PART(‘DAY’,’2013-01-01′::date)::integer) )
)
OR
(
( INT4PL( INT4PL(20000,(DATE_PART(‘MONTH’,dob)::integer*100)), DATE_PART(‘DAY’,dob)::integer) )
BETWEEN
( INT4PL( INT4PL(20000,(DATE_PART(‘MONTH’,’2012-12-30′::date)::integer*100)),DATE_PART(‘DAY’,’2012-12-30′::date)::integer) )
AND
( INT4PL( INT4PL(30000,(DATE_PART(‘MONTH’,’2013-01-01′::date)::integer*100)),DATE_PART(‘DAY’,’2013-01-01′::date)::integer) )
);
Result:
Seq Scan on person (cost=0.00..57213.00 rows=4988 width=8) (actual time=1.232..1785.936 rows=4054 loops=1)
Filter: (((int4pl(int4pl(20000, ((date_part(‘MONTH’::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY’::text, (dob)::timestamp without time zone))::integer) >= 11230) AND
(int4pl(int4pl(20000, ((date_part(‘MONTH’::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY’::text, (dob)::timestamp without time zone))::integer) <= 20101)) OR
((int4pl(int4pl(20000, ((date_part(‘MONTH’::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY’::text, (dob)::timestamp without time zone))::integer) >= 21230) AND
(int4pl(int4pl(20000, ((date_part(‘MONTH’::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY’::text, (dob)::timestamp without time zone))::integer) <= 30101)))
Total runtime: 1791.009 ms
Can it be faster?
Create index:
CREATE INDEX “person_idx_dob2” ON person USING btree
INT4PL( INT4PL(20000,(DATE_PART(‘MONTH’,dob)::integer*100)), DATE_PART(‘DAY’,dob)::integer) );
Re-run exactly the same query as above and see how the index works:
Period within single year: Dec 30 2012 and Dec 31 2012:
Result:
Bitmap Heap Scan on person (cost=49.96..2518.20 rows=2500 width=8) (actual time=0.829..6.969 rows=2715 loops=1)
Recheck Cond: ((int4pl(int4pl(20000, ((date_part(‘MONTH’::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY’::text, (dob)::timestamp without time zone))::integer) >= 21230) AND
(int4pl(int4pl(20000, ((date_part(‘MONTH’::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY’::text, (dob)::timestamp without time zone))::integer) <= 21231))
-> Bitmap Index Scan on person_idx_dob2 (cost=0.00..49.34 rows=2500 width=0) (actual time=0.549..0.549 rows=2715 loops=1)
Index Cond: ((int4pl(int4pl(20000, ((date_part(‘MONTH’::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY’::text, (dob)::timestamp without time zone))::integer) >= 21230) AND
(int4pl(int4pl(20000, ((date_part(‘MONTH’::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY’::text, (dob)::timestamp without time zone))::integer) <= 21231))
Total runtime: 10.649 ms (previously 614.999 ms)
Period crosses years: Dec 30 2012 and Jan 01 2013:
Result:
Bitmap Heap Scan on person (cost=101.17..2864.17 rows=4988 width=8) (actual time=1.552..9.987 rows=4054 loops=1)
Recheck Cond: (((int4pl(int4pl(20000, ((date_part(‘MONTH’::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY’::text, (dob)::timestamp without time zone))::integer) >= 11230) AND
(int4pl(int4pl(20000, ((date_part(‘MONTH’::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY’::text, (dob)::timestamp without time zone))::integer) <= 20101)) OR
((int4pl(int4pl(20000, ((date_part(‘MONTH’::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY’::text, (dob)::timestamp without time zone))::integer) >= 21230) AND
(int4pl(int4pl(20000, ((date_part(‘MONTH’::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY’::text, (dob)::timestamp without time zone))::integer) <= 30101)))
-> BitmapOr (cost=101.17..101.17 rows=5000 width=0) (actual time=1.210..1.210 rows=0 loops=1)
-> Bitmap Index Scan on person_idx_dob2 (cost=0.00..49.34 rows=2500 width=0) (actual time=0.495..0.495 rows=1339 loops=1)
Index Cond: ((int4pl(int4pl(20000, ((date_part(‘MONTH’::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY’::text, (dob)::timestamp without time zone))::integer) >= 11230) AND
(int4pl(int4pl(20000, ((date_part(‘MONTH’::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY’::text, (dob)::timestamp without time zone))::integer) <= 20101))
-> Bitmap Index Scan on person_idx_dob2 (cost=0.00..49.34 rows=2500 width=0) (actual time=0.710..0.710 rows=2715 loops=1)
Index Cond: ((int4pl(int4pl(20000, ((date_part(‘MONTH’::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY’::text, (dob)::timestamp without time zone))::integer) >= 21230) AND
(int4pl(int4pl(20000, ((date_part(‘MONTH’::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY’::text, (dob)::timestamp without time zone))::integer) <= 30101))
Total runtime: 15.224 ms (previously 1791.009 ms)
What about Feb 29th?
It must be filtered out if the year of period detection is not a leap year. And I do not have time to show you because I am in a hurry to say:
To find whose birthday falls on a given day you can also use :
SELECT * FROM person WHERE to_char(dob, ‘monDD’)= to_char(‘2012-5-30’::date, ‘monDD’);
Finding the people whose birthday is in the next 5 days can be done a lot easier
select *
from person
where (current_date + ((extract(year from age(dob)) + 1)::integer * interval '1' year - age(dob)))::date <= current_date + 5;
The downside is that this expression cannot be indexed and thus this is going to be slow on large tables.
And for the second part( to find bithdays that fall between two days):
SELECT * FROM person WHERE to_char(dob, ‘monDD’) in (select to_char(dob, ‘monDD’) from person where dob between ‘1975-11-28’::date and ‘1975-11-30’::date);
By the Postgres doesnt even allow Feb 29 in its date based columns so that case can be ruled out.
Thanks Aziz, my engineering background has driven me to think in numbers. Regarding Feb 29, what I want to tell you is that “to strip out someone’s birthday whose date-of-birth of [Feb 29, 1996] from query result with period starts from [Feb 28, 2013] to [Mar 1, 2013]”.
Why not do
SELECT * FROM person WHERE dob=’2012-12-30′::date;
or
SELECT * FROM person WHERE dob between ‘2012-12-30’::date AND ‘2013-01-01’::date;
or for everyone born in January of 1970
SELECT * FROM person WHERE date_trunc(‘month’, dob) = date_trunc(‘month’, ‘1970-01-01’::date);
Also 5000 days after 1970 01 01 is 1983 09 10 so I had no one born after that in my data set so the first two queries returned no rows.
(with a) SELECT ‘1970-01-01′::date + 5000;
And Feb 29?
SELECT * FROM person WHERE dob=’1980-02-29’::date;
A small English correction. “Who’s” is short for “who is”. It is never a possessive. The possessive is “whose”. Thus: “Who’s on first?” but “Whose book is this?”