Parallel Query Inside Function

Parallel query is a big help for large amount of data retrieval. On my project using PostgreSQL 12.2, I found that the feature is enabled in plain SQL but disabled inside PLPGSQL or even SQL function. For years, I rely on user defined function for some considerations. First, to hide sql structure details from front end logic so that (as long as function declaration does not change) the change in internal query statements does not require front-end code rewriting. Second, it is easier to grant execution of function to group or login role while keep tables private rather than assign specific access for any single tables involved in query. How to make parallel query enabled inside function?

I have PostgreSQL 12.2 in my CentOS-7 laptop and create a table with single column:
create table public.test (mark integer not null);

Populate with ten million random numbers:
insert into public.test select (random() * 100)::integer from generate_series(1,10000000);

Update statistics:
analyze public.test;

Parallel query configuration setting is default:
show max_parallel_workers_per_gather;
2

Play with Query

Think of building histogram with 4 bins (25, 50, 75 and 100), let us check query performance using plain SQL versus PLPGSQL and SQL function (query structure here is just an example and you may have better one).

Plain SQL

select 25 as bin, count(*) as freq, avg(mark) from public.test where mark between 0 and 25
union
select 50, count(*), avg(mark) from public.test where mark between 26 and 50
union
select 75, count(*), avg(mark) from public.test where mark between 51 and 75
union
select 100, count(*), avg(mark) from public.test where mark between 76 and 100
order by 1;
bin | freq | avg
-----+---------+---------------------
25 | 2548432 | 12.7408167061157606
50 | 2501760 | 37.9986689370683039
75 | 2496096 | 63.0057958508006102
100 | 2453712 | 87.7504262928982700
(4 rows)

Time: 1917.847 ms (00:01.918)

Runs for 1.9 seconds. Explain the query plan in graphics and see that parallel query works:
Query Plan

PLPGSQL Function

create or replace function public.markstat_plpgsql(OUT bin integer, OUT freq bigint, OUT avg numeric)
returns setof record as
$body$
begin
return query
select 25 as bin, count(*) as freq, avg(mark) from public.test where mark between 0 and 25
union
select 50, count(*), avg(mark) from public.test where mark between 26 and 50
union
select 75, count(*), avg(mark) from public.test where mark between 51 and 75
union
select 100, count(*), avg(mark) from public.test where mark between 76 and 100
order by 1;
end;
$body$
language plpgsql stable;

select * from public.markstat_plpgsql();
bin | freq | avg
-----+---------+---------------------
25 | 2548432 | 12.7408167061157606
50 | 2501760 | 37.9986689370683039
75 | 2496096 | 63.0057958508006102
100 | 2453712 | 87.7504262928982700
(4 rows)

Time: 3444.977 ms (00:03.445)

The same result with longer run 3.4 seconds (1.8 times slower than plain SQL version).

SQL Function

create or replace function public.markstat_sql(OUT bin integer, OUT freq bigint, OUT avg numeric)
returns setof record as
$body$
select 25 as bin, count(*) as freq, avg(mark) from public.test where mark between 0 and 25
union
select 50, count(*), avg(mark) from public.test where mark between 26 and 50
union
select 75, count(*), avg(mark) from public.test where mark between 51 and 75
union
select 100, count(*), avg(mark) from public.test where mark between 76 and 100
order by 1;
$body$
language sql stable;

select * from public.markstat_sql();
bin | freq | avg
-----+---------+---------------------
25 | 2548432 | 12.7408167061157606
50 | 2501760 | 37.9986689370683039
75 | 2496096 | 63.0057958508006102
100 | 2453712 | 87.7504262928982700
(4 rows)

Time: 3448.760 ms (00:03.449)

3.4 seconds, almost as slow as PLPGSQL version.

Discussion

Plain SQL: 1.9 seconds
PLPGSQL function: 3.4 seconds
SQL function: 3.4 seconds

Why query runs in function is slower? CPU usage graph below clears up the situation.
SQL CPU Usage
With plain SQL (1), all of the CPU works indicating parallel query. While PLPGSQL (2) and SQL (3) functions only run on single CPU, not a parallel query.

Not much I can do with SQL function since the query structure is already exactly the same as the plain one. It looks like I can do more with PLPGSQL function. How if instead of ‘RETURN QUERY’ the heavy query directly, the result set is saved into temporary array in memory then export it out once finished.

Create composite type for temporary array:
create type public.stat AS (bin integer, freq bigint, avg numeric);

Create second PLPGSQL function version, call it public.markstat_plpgsql2.
create or replace function public.markstat_plpgsql2(OUT bin integer, OUT freq bigint, OUT avg numeric)
returns setof record as
$body$
declare
_stats public.stat[];
begin
select array_agg(t) into _stats from (
select 25 as bin, count(*) as freq, avg(mark) from public.test where mark between 0 and 25
union
select 50, count(*), avg(mark) from public.test where mark between 26 and 50
union
select 75, count(*), avg(mark) from public.test where mark between 51 and 75
union
select 100, count(*), avg(mark) from public.test where mark between 76 and 100
) t;
return query select s.bin, s.freq, s.avg from unnest(_stats) s order by s.bin;
end;
$body$
language plpgsql stable;

select * from public.markstat_plpgsql2();
bin | freq | avg
-----+---------+---------------------
25 | 2548432 | 12.7408167061157606
50 | 2501760 | 37.9986689370683039
75 | 2496096 | 63.0057958508006102
100 | 2453712 | 87.7504262928982700
(4 rows)

Time: 2063.029 ms (00:02.063)

Now, it is about 2.1 seconds, close to the fastest plain SQL 1.9 seconds.

Please take a look on CPU usage graph below:
SQL CPU Usage

Number 1, 2 and 3 are plain SQL, PLPGSQL function and SQL function, respectively. While number 4 is for my newly crafted PLPGSQL function called public.markstat_plpgsql2, runs on multiple CPU, parallel query is back.

2 comments
  1. fdv said:

    If you add “parallel safe” to your function, it’s working !
    Not need to modify it, plpgsql as sql.

  2. abdulyadi said:

    Thank you for showing the solution.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: