Home > PostgreSQL > Who’s Birthday

Who’s Birthday

Finding who’s birthday on a day or over a period of days is always interesting. Finding it from a computer database is even more interesting.

Here it is a table to start with (all works below is on PostgreSQL version 9.1.3):

CREATE TABLE person
(
id integer NOT NULL,
dob date NOT NULL,
CONSTRAINT “id must be unique” PRIMARY KEY (id)
)
WITH (OIDS=FALSE);

Fill it with 500,000 dummy records with random birthday:

INSERT INTO person
WITH RECURSIVE t AS (
SELECT 1 AS id, DATE_PLI(‘1970-01-01′::date, (RANDOM() * 5000)::integer ) AS dob
UNION ALL
SELECT t.id+1 AS id, DATE_PLI(‘1970-01-01′::date, (RANDOM() * 5000)::integer ) AS dob FROM t
WHERE t.id<500000
)
SELECT t.id, t.dob FROM t;

Now let us examine a query to find out who’s birthday on Dec 30th 2012:

EXPLAIN ANALYZE
SELECT * FROM person WHERE DATE_PART(‘MONTH’,dob)=DATE_PART(‘MONTH’,’2012-12-30′::date) AND DATE_PART(‘DAY’,dob)=DATE_PART(‘DAY’,’2012-12-30′::date);

Result:

Seq Scan on person (cost=0.00..14713.00 rows=12 width=8) (actual time=0.226..284.272 rows=1365 loops=1)
Filter: ((date_part(‘MONTH'::text, (dob)::timestamp without time zone) = 12::double precision) AND (date_part(‘DAY'::text, (dob)::timestamp without time zone) = 30::double precision))
Total runtime: 286.369 ms

Can we do it faster? with the help of index.

CREATE INDEX “person_idx_dob1″ ON person USING btree
( INT4PL( DATE_PART(‘MONTH’,dob)::integer*100, DATE_PART(‘DAY’,dob)::integer ) );

Let us examine a query in different way so the index works:

EXPLAIN ANALYZE
SELECT * FROM person
WHERE INT4PL(DATE_PART(‘MONTH’,dob)::integer*100,DATE_PART(‘DAY’,dob)::integer)=
INT4PL(DATE_PART(‘MONTH’,’2012-12-30′::date)::integer*100,DATE_PART(‘DAY’,’2012-12-30′::date)::integer);

Result:

Bitmap Heap Scan on person (cost=43.69..2443.18 rows=2500 width=8) (actual time=0.481..3.805 rows=1365 loops=1)
Recheck Cond: (int4pl(((date_part(‘MONTH'::text, (dob)::timestamp without time zone))::integer * 100), (date_part(‘DAY'::text, (dob)::timestamp without time zone))::integer) = 1230)
-> Bitmap Index Scan on person_idx_dob1 (cost=0.00..43.06 rows=2500 width=0) (actual time=0.305..0.305 rows=1365 loops=1)
Index Cond: (int4pl(((date_part(‘MONTH'::text, (dob)::timestamp without time zone))::integer * 100), (date_part(‘DAY'::text, (dob)::timestamp without time zone))::integer) = 1230)
Total runtime: 5.783 ms (previously 286.369 ms)

Yes, the index works and it is faster now.

More Challenges?

Finding who’s birthday over a period of days is more complex than what we have done on a single day. Specially when the period starts from a day close to end of year to a day next year.
I have a trick to transform a date-of-birth to the form of:

20000 + (month-part-of-a-date * 100) plus (day-part-of-a-date).

If the whole period is within a single year, then the start and stop date is transformed into:

20000 + (month-part-of-the-start-date * 100) plus (day-part-of-the-start-date)
20000 + (month-part-of-the-stop-date * 100) plus (day-part-of-the-stop-date)

But, if it crosses year border then it has lower and higher range detection. The lower range detection:

10000 + (month-part-of-the-start-date * 100) plus (day-part-of-the-start-date)
20000 + (month-part-of-the-stop-date * 100) plus (day-part-of-the-stop-date)

And for higher range detection:

20000 + (month-part-of-the-start-date * 100) plus (day-part-of-the-start-date)
30000 + (month-part-of-the-stop-date * 100) plus (day-part-of-the-stop-date)

Give it a try. To query who’s birthday between 1) Dec 30 2012 and Dec 31 2012, 2) Dec 30 2012 and Jan 01 2013

Period within single year: Dec 30 2012 and Dec 31 2012.

EXPLAIN ANALYZE
SELECT * FROM person
WHERE (
( INT4PL( INT4PL(20000,(DATE_PART(‘MONTH’,dob)::integer*100)), DATE_PART(‘DAY’,dob)::integer) )
BETWEEN
( INT4PL( INT4PL(20000,(DATE_PART(‘MONTH’,’2012-12-30′::date)::integer*100)),DATE_PART(‘DAY’,’2012-12-30′::date)::integer) )
AND
( INT4PL( INT4PL(20000,(DATE_PART(‘MONTH’,’2012-12-31′::date)::integer*100)),DATE_PART(‘DAY’,’2012-12-31′::date)::integer) )
);

Result:

Seq Scan on person (cost=0.00..32213.00 rows=2500 width=8) (actual time=0.504..611.400 rows=2715 loops=1)
Filter: ((int4pl(int4pl(20000, ((date_part(‘MONTH'::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY'::text, (dob)::timestamp without time zone))::integer) >= 21230) AND
(int4pl(int4pl(20000,((date_part(‘MONTH'::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY'::text, (dob)::timestamp without time zone))::integer) <= 21231))
Total runtime: 614.999 ms

Period crosses years: Dec 30 2012 and Jan 01 2013:

EXPLAIN ANALYZE
SELECT * FROM person
WHERE (
( INT4PL( INT4PL(20000,(DATE_PART(‘MONTH’,dob)::integer*100)), DATE_PART(‘DAY’,dob)::integer) )
BETWEEN
( INT4PL( INT4PL(10000,(DATE_PART(‘MONTH’,’2012-12-30′::date)::integer*100)),DATE_PART(‘DAY’,’2012-12-30′::date)::integer) )
AND
( INT4PL( INT4PL(20000,(DATE_PART(‘MONTH’,’2013-01-01′::date)::integer*100)),DATE_PART(‘DAY’,’2013-01-01′::date)::integer) )
)
OR
(
( INT4PL( INT4PL(20000,(DATE_PART(‘MONTH’,dob)::integer*100)), DATE_PART(‘DAY’,dob)::integer) )
BETWEEN
( INT4PL( INT4PL(20000,(DATE_PART(‘MONTH’,’2012-12-30′::date)::integer*100)),DATE_PART(‘DAY’,’2012-12-30′::date)::integer) )
AND
( INT4PL( INT4PL(30000,(DATE_PART(‘MONTH’,’2013-01-01′::date)::integer*100)),DATE_PART(‘DAY’,’2013-01-01′::date)::integer) )
);

Result:

Seq Scan on person (cost=0.00..57213.00 rows=4988 width=8) (actual time=1.232..1785.936 rows=4054 loops=1)
Filter: (((int4pl(int4pl(20000, ((date_part(‘MONTH'::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY'::text, (dob)::timestamp without time zone))::integer) >= 11230) AND
(int4pl(int4pl(20000, ((date_part(‘MONTH'::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY'::text, (dob)::timestamp without time zone))::integer) <= 20101)) OR
((int4pl(int4pl(20000, ((date_part(‘MONTH'::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY'::text, (dob)::timestamp without time zone))::integer) >= 21230) AND
(int4pl(int4pl(20000, ((date_part(‘MONTH'::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY'::text, (dob)::timestamp without time zone))::integer) <= 30101)))
Total runtime: 1791.009 ms

Can it be faster?

Create index:

CREATE INDEX “person_idx_dob2″ ON person USING btree
INT4PL( INT4PL(20000,(DATE_PART(‘MONTH’,dob)::integer*100)), DATE_PART(‘DAY’,dob)::integer) );

Re-run exactly the same query as above and see how the index works:

Period within single year: Dec 30 2012 and Dec 31 2012:

Result:

Bitmap Heap Scan on person (cost=49.96..2518.20 rows=2500 width=8) (actual time=0.829..6.969 rows=2715 loops=1)
Recheck Cond: ((int4pl(int4pl(20000, ((date_part(‘MONTH'::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY'::text, (dob)::timestamp without time zone))::integer) >= 21230) AND
(int4pl(int4pl(20000, ((date_part(‘MONTH'::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY'::text, (dob)::timestamp without time zone))::integer) <= 21231))
-> Bitmap Index Scan on person_idx_dob2 (cost=0.00..49.34 rows=2500 width=0) (actual time=0.549..0.549 rows=2715 loops=1)
Index Cond: ((int4pl(int4pl(20000, ((date_part(‘MONTH'::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY'::text, (dob)::timestamp without time zone))::integer) >= 21230) AND
(int4pl(int4pl(20000, ((date_part(‘MONTH'::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY'::text, (dob)::timestamp without time zone))::integer) <= 21231))
Total runtime: 10.649 ms (previously 614.999 ms)

Period crosses years: Dec 30 2012 and Jan 01 2013:

Result:

Bitmap Heap Scan on person (cost=101.17..2864.17 rows=4988 width=8) (actual time=1.552..9.987 rows=4054 loops=1)
Recheck Cond: (((int4pl(int4pl(20000, ((date_part(‘MONTH'::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY'::text, (dob)::timestamp without time zone))::integer) >= 11230) AND
(int4pl(int4pl(20000, ((date_part(‘MONTH'::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY'::text, (dob)::timestamp without time zone))::integer) <= 20101)) OR
((int4pl(int4pl(20000, ((date_part(‘MONTH'::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY'::text, (dob)::timestamp without time zone))::integer) >= 21230) AND

(int4pl(int4pl(20000, ((date_part(‘MONTH'::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY'::text, (dob)::timestamp without time zone))::integer) <= 30101)))
-> BitmapOr (cost=101.17..101.17 rows=5000 width=0) (actual time=1.210..1.210 rows=0 loops=1)
-> Bitmap Index Scan on person_idx_dob2 (cost=0.00..49.34 rows=2500 width=0) (actual time=0.495..0.495 rows=1339 loops=1)
Index Cond: ((int4pl(int4pl(20000, ((date_part(‘MONTH'::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY'::text, (dob)::timestamp without time zone))::integer) >= 11230) AND
(int4pl(int4pl(20000, ((date_part(‘MONTH'::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY'::text, (dob)::timestamp without time zone))::integer) <= 20101))
-> Bitmap Index Scan on person_idx_dob2 (cost=0.00..49.34 rows=2500 width=0) (actual time=0.710..0.710 rows=2715 loops=1)
Index Cond: ((int4pl(int4pl(20000, ((date_part(‘MONTH'::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY'::text, (dob)::timestamp without time zone))::integer) >= 21230) AND
(int4pl(int4pl(20000, ((date_part(‘MONTH'::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY'::text, (dob)::timestamp without time zone))::integer) <= 30101))
Total runtime: 15.224 ms (previously 1791.009 ms)

What about Feb 29th?

It must be filtered out if the year of period detection is not a leap year. And I do not have time to show you because I am in a hurry to say:

Happy Birthday!
About these ads
Categories: PostgreSQL
  1. Aziz
    October 24, 2012 at 10:03 pm

    To find whose birthday falls on a given day you can also use :
    SELECT * FROM person WHERE to_char(dob, ‘monDD’)= to_char(‘2012-5-30′::date, ‘monDD’);

  2. Thomas
    October 24, 2012 at 10:23 pm

    Finding the people whose birthday is in the next 5 days can be done a lot easier


    select *
    from person
    where (current_date + ((extract(year from age(dob)) + 1)::integer * interval '1' year - age(dob)))::date <= current_date + 5;

    The downside is that this expression cannot be indexed and thus this is going to be slow on large tables.

  3. Aziz
    October 24, 2012 at 10:37 pm

    And for the second part( to find bithdays that fall between two days):
    SELECT * FROM person WHERE to_char(dob, ‘monDD’) in (select to_char(dob, ‘monDD’) from person where dob between ‘1975-11-28′::date and ‘1975-11-30′::date);

  4. Aziz
    October 24, 2012 at 10:42 pm

    By the Postgres doesnt even allow Feb 29 in its date based columns so that case can be ruled out.

  5. October 25, 2012 at 12:23 am

    Thanks Aziz, my engineering background has driven me to think in numbers. Regarding Feb 29, what I want to tell you is that “to strip out someone’s birthday whose date-of-birth of [Feb 29, 1996] from query result with period starts from [Feb 28, 2013] to [Mar 1, 2013]“.

  6. David
    October 29, 2012 at 8:10 pm

    Why not do
    SELECT * FROM person WHERE dob=’2012-12-30′::date;

    or

    SELECT * FROM person WHERE dob between ‘2012-12-30′::date AND ‘2013-01-01′::date;

    or for everyone born in January of 1970

    SELECT * FROM person WHERE date_trunc(‘month’, dob) = date_trunc(‘month’, ‘1970-01-01′::date);

    Also 5000 days after 1970 01 01 is 1983 09 10 so I had no one born after that in my data set so the first two queries returned no rows.
    (with a) SELECT ‘1970-01-01′::date + 5000;

    And Feb 29?
    SELECT * FROM person WHERE dob=’1980-02-29′::date;

  7. October 30, 2012 at 2:38 pm

    A small English correction. “Who’s” is short for “who is”. It is never a possessive. The possessive is “whose”. Thus: “Who’s on first?” but “Whose book is this?”

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: