Archive

Monthly Archives: October 2012

Finding who’s birthday on a day or over a period of days is always interesting. Finding it from a computer database is even more interesting.

Here it is a table to start with (all works below is on PostgreSQL version 9.1.3):

CREATE TABLE person
(
id integer NOT NULL,
dob date NOT NULL,
CONSTRAINT “id must be unique” PRIMARY KEY (id)
)
WITH (OIDS=FALSE);

Fill it with 500,000 dummy records with random birthday:

INSERT INTO person
WITH RECURSIVE t AS (
SELECT 1 AS id, DATE_PLI(‘1970-01-01’::date, (RANDOM() * 5000)::integer ) AS dob
UNION ALL
SELECT t.id+1 AS id, DATE_PLI(‘1970-01-01’::date, (RANDOM() * 5000)::integer ) AS dob FROM t
WHERE t.id<500000
)
SELECT t.id, t.dob FROM t;

Now let us examine a query to find out who’s birthday on Dec 30th 2012:

EXPLAIN ANALYZE
SELECT * FROM person WHERE DATE_PART(‘MONTH’,dob)=DATE_PART(‘MONTH’,’2012-12-30′::date) AND DATE_PART(‘DAY’,dob)=DATE_PART(‘DAY’,’2012-12-30′::date);

Result:

Seq Scan on person (cost=0.00..14713.00 rows=12 width=8) (actual time=0.226..284.272 rows=1365 loops=1)
Filter: ((date_part(‘MONTH’::text, (dob)::timestamp without time zone) = 12::double precision) AND (date_part(‘DAY’::text, (dob)::timestamp without time zone) = 30::double precision))
Total runtime: 286.369 ms

Can we do it faster? with the help of index.

CREATE INDEX “person_idx_dob1” ON person USING btree
( INT4PL( DATE_PART(‘MONTH’,dob)::integer*100, DATE_PART(‘DAY’,dob)::integer ) );

Let us examine a query in different way so the index works:

EXPLAIN ANALYZE
SELECT * FROM person
WHERE INT4PL(DATE_PART(‘MONTH’,dob)::integer*100,DATE_PART(‘DAY’,dob)::integer)=
INT4PL(DATE_PART(‘MONTH’,’2012-12-30′::date)::integer*100,DATE_PART(‘DAY’,’2012-12-30′::date)::integer);

Result:

Bitmap Heap Scan on person (cost=43.69..2443.18 rows=2500 width=8) (actual time=0.481..3.805 rows=1365 loops=1)
Recheck Cond: (int4pl(((date_part(‘MONTH’::text, (dob)::timestamp without time zone))::integer * 100), (date_part(‘DAY’::text, (dob)::timestamp without time zone))::integer) = 1230)
-> Bitmap Index Scan on person_idx_dob1 (cost=0.00..43.06 rows=2500 width=0) (actual time=0.305..0.305 rows=1365 loops=1)
Index Cond: (int4pl(((date_part(‘MONTH’::text, (dob)::timestamp without time zone))::integer * 100), (date_part(‘DAY’::text, (dob)::timestamp without time zone))::integer) = 1230)
Total runtime: 5.783 ms (previously 286.369 ms)

Yes, the index works and it is faster now.

More Challenges?

Finding who’s birthday over a period of days is more complex than what we have done on a single day. Specially when the period starts from a day close to end of year to a day next year.
I have a trick to transform a date-of-birth to the form of:

20000 + (month-part-of-a-date * 100) plus (day-part-of-a-date).

If the whole period is within a single year, then the start and stop date is transformed into:

20000 + (month-part-of-the-start-date * 100) plus (day-part-of-the-start-date)
20000 + (month-part-of-the-stop-date * 100) plus (day-part-of-the-stop-date)

But, if it crosses year border then it has lower and higher range detection. The lower range detection:

10000 + (month-part-of-the-start-date * 100) plus (day-part-of-the-start-date)
20000 + (month-part-of-the-stop-date * 100) plus (day-part-of-the-stop-date)

And for higher range detection:

20000 + (month-part-of-the-start-date * 100) plus (day-part-of-the-start-date)
30000 + (month-part-of-the-stop-date * 100) plus (day-part-of-the-stop-date)

Give it a try. To query who’s birthday between 1) Dec 30 2012 and Dec 31 2012, 2) Dec 30 2012 and Jan 01 2013

Period within single year: Dec 30 2012 and Dec 31 2012.

EXPLAIN ANALYZE
SELECT * FROM person
WHERE (
( INT4PL( INT4PL(20000,(DATE_PART(‘MONTH’,dob)::integer*100)), DATE_PART(‘DAY’,dob)::integer) )
BETWEEN
( INT4PL( INT4PL(20000,(DATE_PART(‘MONTH’,’2012-12-30′::date)::integer*100)),DATE_PART(‘DAY’,’2012-12-30′::date)::integer) )
AND
( INT4PL( INT4PL(20000,(DATE_PART(‘MONTH’,’2012-12-31′::date)::integer*100)),DATE_PART(‘DAY’,’2012-12-31′::date)::integer) )
);

Result:

Seq Scan on person (cost=0.00..32213.00 rows=2500 width=8) (actual time=0.504..611.400 rows=2715 loops=1)
Filter: ((int4pl(int4pl(20000, ((date_part(‘MONTH’::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY’::text, (dob)::timestamp without time zone))::integer) >= 21230) AND
(int4pl(int4pl(20000,((date_part(‘MONTH’::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY’::text, (dob)::timestamp without time zone))::integer) <= 21231))
Total runtime: 614.999 ms

Period crosses years: Dec 30 2012 and Jan 01 2013:

EXPLAIN ANALYZE
SELECT * FROM person
WHERE (
( INT4PL( INT4PL(20000,(DATE_PART(‘MONTH’,dob)::integer*100)), DATE_PART(‘DAY’,dob)::integer) )
BETWEEN
( INT4PL( INT4PL(10000,(DATE_PART(‘MONTH’,’2012-12-30′::date)::integer*100)),DATE_PART(‘DAY’,’2012-12-30′::date)::integer) )
AND
( INT4PL( INT4PL(20000,(DATE_PART(‘MONTH’,’2013-01-01′::date)::integer*100)),DATE_PART(‘DAY’,’2013-01-01′::date)::integer) )
)
OR
(
( INT4PL( INT4PL(20000,(DATE_PART(‘MONTH’,dob)::integer*100)), DATE_PART(‘DAY’,dob)::integer) )
BETWEEN
( INT4PL( INT4PL(20000,(DATE_PART(‘MONTH’,’2012-12-30′::date)::integer*100)),DATE_PART(‘DAY’,’2012-12-30′::date)::integer) )
AND
( INT4PL( INT4PL(30000,(DATE_PART(‘MONTH’,’2013-01-01′::date)::integer*100)),DATE_PART(‘DAY’,’2013-01-01′::date)::integer) )
);

Result:

Seq Scan on person (cost=0.00..57213.00 rows=4988 width=8) (actual time=1.232..1785.936 rows=4054 loops=1)
Filter: (((int4pl(int4pl(20000, ((date_part(‘MONTH’::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY’::text, (dob)::timestamp without time zone))::integer) >= 11230) AND
(int4pl(int4pl(20000, ((date_part(‘MONTH’::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY’::text, (dob)::timestamp without time zone))::integer) <= 20101)) OR
((int4pl(int4pl(20000, ((date_part(‘MONTH’::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY’::text, (dob)::timestamp without time zone))::integer) >= 21230) AND
(int4pl(int4pl(20000, ((date_part(‘MONTH’::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY’::text, (dob)::timestamp without time zone))::integer) <= 30101)))
Total runtime: 1791.009 ms

Can it be faster?

Create index:

CREATE INDEX “person_idx_dob2” ON person USING btree
INT4PL( INT4PL(20000,(DATE_PART(‘MONTH’,dob)::integer*100)), DATE_PART(‘DAY’,dob)::integer) );

Re-run exactly the same query as above and see how the index works:

Period within single year: Dec 30 2012 and Dec 31 2012:

Result:

Bitmap Heap Scan on person (cost=49.96..2518.20 rows=2500 width=8) (actual time=0.829..6.969 rows=2715 loops=1)
Recheck Cond: ((int4pl(int4pl(20000, ((date_part(‘MONTH’::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY’::text, (dob)::timestamp without time zone))::integer) >= 21230) AND
(int4pl(int4pl(20000, ((date_part(‘MONTH’::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY’::text, (dob)::timestamp without time zone))::integer) <= 21231))
-> Bitmap Index Scan on person_idx_dob2 (cost=0.00..49.34 rows=2500 width=0) (actual time=0.549..0.549 rows=2715 loops=1)
Index Cond: ((int4pl(int4pl(20000, ((date_part(‘MONTH’::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY’::text, (dob)::timestamp without time zone))::integer) >= 21230) AND
(int4pl(int4pl(20000, ((date_part(‘MONTH’::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY’::text, (dob)::timestamp without time zone))::integer) <= 21231))
Total runtime: 10.649 ms (previously 614.999 ms)

Period crosses years: Dec 30 2012 and Jan 01 2013:

Result:

Bitmap Heap Scan on person (cost=101.17..2864.17 rows=4988 width=8) (actual time=1.552..9.987 rows=4054 loops=1)
Recheck Cond: (((int4pl(int4pl(20000, ((date_part(‘MONTH’::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY’::text, (dob)::timestamp without time zone))::integer) >= 11230) AND
(int4pl(int4pl(20000, ((date_part(‘MONTH’::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY’::text, (dob)::timestamp without time zone))::integer) <= 20101)) OR
((int4pl(int4pl(20000, ((date_part(‘MONTH’::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY’::text, (dob)::timestamp without time zone))::integer) >= 21230) AND

(int4pl(int4pl(20000, ((date_part(‘MONTH’::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY’::text, (dob)::timestamp without time zone))::integer) <= 30101)))
-> BitmapOr (cost=101.17..101.17 rows=5000 width=0) (actual time=1.210..1.210 rows=0 loops=1)
-> Bitmap Index Scan on person_idx_dob2 (cost=0.00..49.34 rows=2500 width=0) (actual time=0.495..0.495 rows=1339 loops=1)
Index Cond: ((int4pl(int4pl(20000, ((date_part(‘MONTH’::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY’::text, (dob)::timestamp without time zone))::integer) >= 11230) AND
(int4pl(int4pl(20000, ((date_part(‘MONTH’::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY’::text, (dob)::timestamp without time zone))::integer) <= 20101))
-> Bitmap Index Scan on person_idx_dob2 (cost=0.00..49.34 rows=2500 width=0) (actual time=0.710..0.710 rows=2715 loops=1)
Index Cond: ((int4pl(int4pl(20000, ((date_part(‘MONTH’::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY’::text, (dob)::timestamp without time zone))::integer) >= 21230) AND
(int4pl(int4pl(20000, ((date_part(‘MONTH’::text, (dob)::timestamp without time zone))::integer * 100)), (date_part(‘DAY’::text, (dob)::timestamp without time zone))::integer) <= 30101))
Total runtime: 15.224 ms (previously 1791.009 ms)

What about Feb 29th?

It must be filtered out if the year of period detection is not a leap year. And I do not have time to show you because I am in a hurry to say:

Happy Birthday!
Advertisements