I was working on optimising a report which had a very poor performance. The "heart" of the report was a fairly complex query which I will briefly refer to as follows
select Column1 as Name1, Column2 as Name2 from sometable tbl inner join ... where ... and ... and $1 <= somedate and $2 >= somedate group by ... order by ...;
In fact, the query joined seven tables and several WHERE conditions, grouping on four fields and finally sorting the results. I went through the usual stuff with analyzing the query plan, verifying that all required indexes were in place (a couple of joins on particularly large tables, unfortunately, were on the 'varchar' fields, but modifying the database at this stage is out of the question so I had to do what I could). Eventually what limited amount of tricks I had at my disposal was depleted, and the performance only slightly improved. However, when I measured the performance of the report when called from the application and compared it to running the query directly against the database, there was a significant overhead in case of the report. When the report was ran from code, it sent the following query to the database:
select * from myreportsubfunction ('2013-03-13', '2013-03-20');
And the myreportsubfunction was declared similar to the following:
CREATE OR REPLACE FUNCTION myreportsubfunction(IN from timestamp without time zone, IN to timestamp without time zone) RETURNS TABLE(Name1 character varying, Name2 character varying) AS $BODY$ select Column1 as Name1, Column2 as Name2 from sometable tbl inner join ... where ... and ... and $1 <= somedate and $2 >= somedate group by ... order by ...; $BODY$ LANGUAGE sql VOLATILE
So - what's the trick here? The function seems to return the result of the query, but takes way much longer to execute compared to the raw query. And here is the reason: when the database prepares a query plan for the function, it does not know anything about the parameters. The result is likely to be a bad query plan, especially if the query is complex. The solution is to change sql language to plpgsql and make use of the RETURN QUERY EXECUTE command. Now the myreportsubfunction looks like the following:
CREATE OR REPLACE FUNCTION myreportsubfunction(IN from timestamp without time zone, IN to timestamp without time zone) RETURNS TABLE(Name1 character varying, Name2 character varying) AS $BODY$ BEGIN RETURN QUERY EXECUTE 'select Column1 as Name1, Column2 as Name2 from sometable tbl inner join ... where ... and ... and $1 <= somedate and $2 >= somedate group by ... order by ...;' USING $1, $2; END $BODY$ LANGUAGE plpgsql VOLATILE
The function now takes as much time to run as the "raw" query, significantly improving the performance.
References
39.5. Basic StatementsPostgresql Slow on custom function, php but fast if directly input on psql using text search with gin index
by Evgeny. Also posted on my website
No comments:
Post a Comment