Who’s Birthday

Finding who’s birthday on a day or over a period of days is always interesting. Finding it from a computer database is even more interesting.

Here it is a table to start with (all works below is on PostgreSQL version 9.1.3):

CREATE TABLE person
(
id integer NOT NULL,
dob date NOT NULL,
CONSTRAINT “id must be unique” PRIMARY KEY (id)
)
WITH (OIDS=FALSE);

Fill it with 500,000 dummy records with random birthday:

INSERT INTO person
WITH RECURSIVE t AS (
SELECT 1 AS id, DATE_PLI(‘1970-01-01′::date, (RANDOM() * 5000)::integer ) AS dob
UNION ALL
SELECT t.id+1 AS id, DATE_PLI(‘1970-01-01′::date, (RANDOM() * 5000)::integer ) AS dob FROM t
WHERE t.id<500000
)
SELECT t.id, t.dob FROM t;

Now let us examine a query to find out who’s birthday on Dec 30th 2012:

EXPLAIN ANALYZE
SELECT * FROM person WHERE DATE_PART(‘MONTH’,dob)=DATE_PART(‘MONTH’,’2012-12-30′::date) AND DATE_PART(‘DAY’,dob)=DATE_PART(‘DAY’,’2012-12-30′::date);

Result:

Seq Scan on person (cost=0.00..14713.00 rows=12 width=8) (actual time=0.226..284.272 rows=1365 loops=1)
Filter: ((date_part(‘MONTH'::text, (dob)::timestamp without time zone) = 12::double precision) AND (date_part(‘DAY'::text, (dob)::timestamp without time zone) = 30::double precision))
Total runtime: 286.369 ms

Can we do it faster? with the help of index.

CREATE INDEX “person_idx_dob1″ ON person USING btree
( INT4PL( DATE_PART(‘MONTH’,dob)::integer*100, DATE_PART(‘DAY’,dob)::integer ) );

Let us examine a query in different way so the index works:

EXPLAIN ANALYZE
SELECT * FROM person
WHERE INT4PL(DATE_PART(‘MONTH’,dob)::integer*100,DATE_PART(‘DAY’,dob)::integer)=
INT4PL(DATE_PART(‘MONTH’,’2012-12-30′::date)::integer*100,DATE_PART(‘DAY’,’2012-12-30′::date)::integer);

Result:

Bitmap Heap Scan on person (cost=43.69..2443.18 rows=2500 width=8) (actual time=0.481..3.805 rows=1365 loops=1)
Recheck Cond: (int4pl(((date_part(‘MONTH'::text, (dob)::timestamp without time zone))::integer * 100), (date_part(‘DAY'::text, (dob)::timestamp without time zone))::integer) = 1230)
-> Bitmap Index Scan on person_idx_dob1 (cost=0.00..43.06 rows=2500 width=0) (actual time=0.305..0.305 rows=1365 loops=1)
Index Cond: (int4pl(((date_part(‘MONTH'::text, (dob)::timestamp without time zone))::integer * 100), (date_part(‘DAY'::text, (dob)::timestamp without time zone))::integer) = 1230)
Total runtime: 5.783 ms (previously 286.369 ms)

Yes, the index works and it is faster now.

More Challenges?

Finding who’s birthday over a period of days is more complex than what we have done on a single day. Specially when the period starts from a day close to end of year to a day next year.
I have a trick to transform a date-of-birth to the form of:

20000 + (month-part-of-a-date * 100) plus (day-part-of-a-date).

If the whole period is within a single year, then the start and stop date is transformed into:

20000 + (month-part-of-the-start-date * 100) plus (day-part-of-the-start-date)
20000 + (month-part-of-the-stop-date * 100) plus (day-part-of-the-stop-date)

But, if it crosses year border then it has lower and higher range detection. The lower range detection:

10000 + (month-part-of-the-start-date * 100) plus (day-part-of-the-start-date)
20000 + (month-part-of-the-stop-date * 100) plus (day-part-of-the-stop-date)

And for higher range detection:

20000 + (month-part-of-the-start-date * 100) plus (day-part-of-the-start-date)
30000 + (month-part-of-the-stop-date * 100) plus (day-part-of-the-stop-date)

Give it a try. To query who’s birthday between 1) Dec 30 2012 and Dec 31 2012, 2) Dec 30 2012 and Jan 01 2013

Period within single year: Dec 30 2012 and Dec 31 2012.

EXPLAIN ANALYZE
SELECT * FROM person
WHERE (
( INT4PL( INT4PL(20000,(DATE_PART(‘MONTH’,dob)::integer*100)), DATE_PART(‘DAY’,dob)::integer) )
BETWEEN
( INT4PL( INT4PL(20000,(DATE_PART(‘MONTH’,’2012-12-30′::date)::integer*100)),DATE_PART(‘DAY’,’2012-12-30′::date)::integer) )
AND
( INT4PL( INT4PL(20000,(DATE_PART(‘MONTH’,’2012-12-31′::date)::integer*100)),DATE_PART(‘DAY’,’2012-12-31′::date)::integer) )
);

Result:

Seq Scan on person (cost=0.00..32213.00 rows=2500 width=8) (actual time=0.504..611.400 rows=2715 loops=1)
Filter: ((int4pl(int4pl(20000, ((date_part(‘MONTH'::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY'::text, (dob)::timestamp without time zone))::integer) >= 21230) AND
(int4pl(int4pl(20000,((date_part(‘MONTH'::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY'::text, (dob)::timestamp without time zone))::integer) <= 21231))
Total runtime: 614.999 ms

Period crosses years: Dec 30 2012 and Jan 01 2013:

EXPLAIN ANALYZE
SELECT * FROM person
WHERE (
( INT4PL( INT4PL(20000,(DATE_PART(‘MONTH’,dob)::integer*100)), DATE_PART(‘DAY’,dob)::integer) )
BETWEEN
( INT4PL( INT4PL(10000,(DATE_PART(‘MONTH’,’2012-12-30′::date)::integer*100)),DATE_PART(‘DAY’,’2012-12-30′::date)::integer) )
AND
( INT4PL( INT4PL(20000,(DATE_PART(‘MONTH’,’2013-01-01′::date)::integer*100)),DATE_PART(‘DAY’,’2013-01-01′::date)::integer) )
)
OR
(
( INT4PL( INT4PL(20000,(DATE_PART(‘MONTH’,dob)::integer*100)), DATE_PART(‘DAY’,dob)::integer) )
BETWEEN
( INT4PL( INT4PL(20000,(DATE_PART(‘MONTH’,’2012-12-30′::date)::integer*100)),DATE_PART(‘DAY’,’2012-12-30′::date)::integer) )
AND
( INT4PL( INT4PL(30000,(DATE_PART(‘MONTH’,’2013-01-01′::date)::integer*100)),DATE_PART(‘DAY’,’2013-01-01′::date)::integer) )
);

Result:

Seq Scan on person (cost=0.00..57213.00 rows=4988 width=8) (actual time=1.232..1785.936 rows=4054 loops=1)
Filter: (((int4pl(int4pl(20000, ((date_part(‘MONTH'::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY'::text, (dob)::timestamp without time zone))::integer) >= 11230) AND
(int4pl(int4pl(20000, ((date_part(‘MONTH'::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY'::text, (dob)::timestamp without time zone))::integer) <= 20101)) OR
((int4pl(int4pl(20000, ((date_part(‘MONTH'::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY'::text, (dob)::timestamp without time zone))::integer) >= 21230) AND
(int4pl(int4pl(20000, ((date_part(‘MONTH'::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY'::text, (dob)::timestamp without time zone))::integer) <= 30101)))
Total runtime: 1791.009 ms

Can it be faster?

Create index:

CREATE INDEX “person_idx_dob2″ ON person USING btree
INT4PL( INT4PL(20000,(DATE_PART(‘MONTH’,dob)::integer*100)), DATE_PART(‘DAY’,dob)::integer) );

Re-run exactly the same query as above and see how the index works:

Period within single year: Dec 30 2012 and Dec 31 2012:

Result:

Bitmap Heap Scan on person (cost=49.96..2518.20 rows=2500 width=8) (actual time=0.829..6.969 rows=2715 loops=1)
Recheck Cond: ((int4pl(int4pl(20000, ((date_part(‘MONTH'::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY'::text, (dob)::timestamp without time zone))::integer) >= 21230) AND
(int4pl(int4pl(20000, ((date_part(‘MONTH'::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY'::text, (dob)::timestamp without time zone))::integer) <= 21231))
-> Bitmap Index Scan on person_idx_dob2 (cost=0.00..49.34 rows=2500 width=0) (actual time=0.549..0.549 rows=2715 loops=1)
Index Cond: ((int4pl(int4pl(20000, ((date_part(‘MONTH'::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY'::text, (dob)::timestamp without time zone))::integer) >= 21230) AND
(int4pl(int4pl(20000, ((date_part(‘MONTH'::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY'::text, (dob)::timestamp without time zone))::integer) <= 21231))
Total runtime: 10.649 ms (previously 614.999 ms)

Period crosses years: Dec 30 2012 and Jan 01 2013:

Result:

Bitmap Heap Scan on person (cost=101.17..2864.17 rows=4988 width=8) (actual time=1.552..9.987 rows=4054 loops=1)
Recheck Cond: (((int4pl(int4pl(20000, ((date_part(‘MONTH'::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY'::text, (dob)::timestamp without time zone))::integer) >= 11230) AND
(int4pl(int4pl(20000, ((date_part(‘MONTH'::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY'::text, (dob)::timestamp without time zone))::integer) <= 20101)) OR
((int4pl(int4pl(20000, ((date_part(‘MONTH'::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY'::text, (dob)::timestamp without time zone))::integer) >= 21230) AND

(int4pl(int4pl(20000, ((date_part(‘MONTH'::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY'::text, (dob)::timestamp without time zone))::integer) <= 30101)))
-> BitmapOr (cost=101.17..101.17 rows=5000 width=0) (actual time=1.210..1.210 rows=0 loops=1)
-> Bitmap Index Scan on person_idx_dob2 (cost=0.00..49.34 rows=2500 width=0) (actual time=0.495..0.495 rows=1339 loops=1)
Index Cond: ((int4pl(int4pl(20000, ((date_part(‘MONTH'::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY'::text, (dob)::timestamp without time zone))::integer) >= 11230) AND
(int4pl(int4pl(20000, ((date_part(‘MONTH'::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY'::text, (dob)::timestamp without time zone))::integer) <= 20101))
-> Bitmap Index Scan on person_idx_dob2 (cost=0.00..49.34 rows=2500 width=0) (actual time=0.710..0.710 rows=2715 loops=1)
Index Cond: ((int4pl(int4pl(20000, ((date_part(‘MONTH'::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY'::text, (dob)::timestamp without time zone))::integer) >= 21230) AND
(int4pl(int4pl(20000, ((date_part(‘MONTH'::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY'::text, (dob)::timestamp without time zone))::integer) <= 30101))
Total runtime: 15.224 ms (previously 1791.009 ms)

What about Feb 29th?

It must be filtered out if the year of period detection is not a leap year. And I do not have time to show you because I am in a hurry to say:

Happy Birthday!
About these ads
7 comments
  1. Aziz said:

    To find whose birthday falls on a given day you can also use :
    SELECT * FROM person WHERE to_char(dob, ‘monDD’)= to_char(‘2012-5-30′::date, ‘monDD’);

  2. Thomas said:

    Finding the people whose birthday is in the next 5 days can be done a lot easier


    select *
    from person
    where (current_date + ((extract(year from age(dob)) + 1)::integer * interval '1' year - age(dob)))::date <= current_date + 5;

    The downside is that this expression cannot be indexed and thus this is going to be slow on large tables.

  3. Aziz said:

    And for the second part( to find bithdays that fall between two days):
    SELECT * FROM person WHERE to_char(dob, ‘monDD’) in (select to_char(dob, ‘monDD’) from person where dob between ‘1975-11-28′::date and ‘1975-11-30′::date);

  4. Aziz said:

    By the Postgres doesnt even allow Feb 29 in its date based columns so that case can be ruled out.

  5. Thanks Aziz, my engineering background has driven me to think in numbers. Regarding Feb 29, what I want to tell you is that “to strip out someone’s birthday whose date-of-birth of [Feb 29, 1996] from query result with period starts from [Feb 28, 2013] to [Mar 1, 2013]“.

  6. David said:

    Why not do
    SELECT * FROM person WHERE dob=’2012-12-30′::date;

    or

    SELECT * FROM person WHERE dob between ‘2012-12-30′::date AND ‘2013-01-01′::date;

    or for everyone born in January of 1970

    SELECT * FROM person WHERE date_trunc(‘month’, dob) = date_trunc(‘month’, ‘1970-01-01′::date);

    Also 5000 days after 1970 01 01 is 1983 09 10 so I had no one born after that in my data set so the first two queries returned no rows.
    (with a) SELECT ‘1970-01-01′::date + 5000;

    And Feb 29?
    SELECT * FROM person WHERE dob=’1980-02-29′::date;

  7. A small English correction. “Who’s” is short for “who is”. It is never a possessive. The possessive is “whose”. Thus: “Who’s on first?” but “Whose book is this?”

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: