Wednesday, March 20, 2013

Improving a PostgreSQL report performance: Part 1 - RETURN QUERY EXECUTE

I was working on optimising a report which had a very poor performance. The "heart" of the report was a fairly complex query which I will briefly refer to as follows

select Column1 as Name1, Column2 as Name2
from sometable tbl
inner join ...
where ...
and ...
and $1 <= somedate
and $2 >= somedate
group by ...
order by ...;

In fact, the query joined seven tables and several WHERE conditions, grouping on four fields and finally sorting the results. I went through the usual stuff with analyzing the query plan, verifying that all required indexes were in place (a couple of joins on particularly large tables, unfortunately, were on the 'varchar' fields, but modifying the database at this stage is out of the question so I had to do what I could). Eventually what limited amount of tricks I had at my disposal was depleted, and the performance only slightly improved. However, when I measured the performance of the report when called from the application and compared it to running the query directly against the database, there was a significant overhead in case of the report. When the report was ran from code, it sent the following query to the database:

select * from myreportsubfunction ('2013-03-13', '2013-03-20');

And the myreportsubfunction was declared similar to the following:

CREATE OR REPLACE FUNCTION myreportsubfunction(IN from timestamp without time zone, IN to timestamp without time zone)
  RETURNS TABLE(Name1 character varying, Name2 character varying) AS
$BODY$
select Column1 as Name1, Column2 as Name2
from sometable tbl
inner join ...
where ...
and ...
and $1 <= somedate
and $2 >= somedate
group by ...
order by ...;
$BODY$
  LANGUAGE sql VOLATILE

So - what's the trick here? The function seems to return the result of the query, but takes way much longer to execute compared to the raw query. And here is the reason: when the database prepares a query plan for the function, it does not know anything about the parameters. The result is likely to be a bad query plan, especially if the query is complex. The solution is to change sql language to plpgsql and make use of the RETURN QUERY EXECUTE command. Now the myreportsubfunction looks like the following:

CREATE OR REPLACE FUNCTION myreportsubfunction(IN from timestamp without time zone, IN to timestamp without time zone)
  RETURNS TABLE(Name1 character varying, Name2 character varying) AS
$BODY$
BEGIN
RETURN QUERY EXECUTE
'select Column1 as Name1, Column2 as Name2
from sometable tbl
inner join ...
where ...
and ...
and $1 <= somedate
and $2 >= somedate
group by ...
order by ...;' USING $1, $2;
END
$BODY$
  LANGUAGE plpgsql VOLATILE

The function now takes as much time to run as the "raw" query, significantly improving the performance.

References

39.5. Basic Statements
Postgresql Slow on custom function, php but fast if directly input on psql using text search with gin index
by . Also posted on my website

No comments: