Wednesday, March 27, 2013

Improving a PostgreSQL report performance: Part 2 - Temporary Table

The report I was working on still did not live up to expectations. There was something else going on. I had to dig a little deeper.

The report was generated by XTraReports and I have no authority to edit it. The ReportDataSource contains functions for retrieving datasets for main report and subreport.

public class ReportDataSource : BaseDataSource
{
 public DataSet GetPrimaryReportData(DateTime fromDate, DateTime toDate)
 {
  string commandText = "select * from myreportfunction;";
  var reportDataSet = ExecuteQuery("ReportDataSet", commandText, new[] { "MainDataSet" });
  return reportDataSet;
 }

 public DataSet GetSubReportData(DateTime fromDate, DateTime toDate)
 {
  string commandText = String.Format("select * from myreportsubfunction");
  return ExecuteQuery("SubReportDataSet", commandText, new[] { "SubDataSet" });
 }
}

And here's how the PostgreSQL functions looked like.

The myreportsubfunction is the one I worked on already so now it looked like the following

CREATE OR REPLACE FUNCTION myreportsubfunction(IN from timestamp without time zone, IN to timestamp without time zone)
  RETURNS TABLE(Name1 character varying, Name2 character varying) AS
$BODY$
BEGIN
RETURN QUERY EXECUTE
'select Column1 as Name1, Column2 as Name2
from sometable tbl
inner join ...
where ...
and ...
and $1 <= somedate
and $2 >= somedate
group by ...
order by ...;' USING $1, $2;
END
$BODY$
  LANGUAGE plpgsql VOLATILE

And there was the myreportfunction

CREATE FUNCTION myreportfunction ( FROMdate timestamp without time zone, todate timestamp without time zone )
 RETURNS TABLE ( name character varying, somevalue1 integer, somevalue2 real ) AS $body$
 SELECT
    something,
    sum(somevalue1)::int as somevalue1,
    sum(somevalue2)::real as somevalue2
 FROM myreportsubfunction($1, $2)      group by something;
  $body$ LANGUAGE sql;

What's going on here? Well, looks like first the myreportfunction is called, it calls the myreportsubfunction and returns aggregated results. But then the myreportsubfunction is called separately and essentially executes the same huge query again! No wonder the performance is nowhere near acceptable. Anyway, to satisfy the report requirements, I need to have the aggregated data first, which means that I need to save the results of the huge complex query, aggregate the results and return them for the main report, and then return the saved results of the query as subreport, or "detailed" data. My approach is to use the temporary table.

Here is what the functions will do:

myreportfunction

  • if a temptable exists, drop it
  • create a temptable
  • run a complex query and save results in the temptable
  • run a query that returns the aggregated data for the report

myreportsubfunction

  • if the temptable exists, return everything from the table, then drop the table

And the resulting PostgreSQL code

myreportfunction

CREATE OR REPLACE FUNCTION myreportfunction(IN fromdate timestamp without time zone, IN todate timestamp without time zone)
  RETURNS TABLE("name" character varying, somevalue1 character varying, somevalue2 character varying) AS
$BODY$
BEGIN
DROP TABLE IF EXISTS temptable;
CREATE TEMPORARY TABLE temptable("name" character varying, somevalue1 character varying, somevalue2 character varying);
DELETE FROM temptable;

EXECUTE '
insert into temptable(name, somevalue1, somevalue2)  
select Column1 as Name1, Column2 as Name2
from sometable tbl
inner join ...
where ...
and ...
and $1 <= somedate
and $2 >= somedate
group by ...
order by ...;' USING $1, $2;

RETURN QUERY EXECUTE 'SELECT name, somevalue1, somevalue2 FROM temptable group by name;';
END
$BODY$
  LANGUAGE plpgsql VOLATILE

myreportsubfunction

CREATE OR REPLACE FUNCTION myreportsubfunction(IN timestamp without time zone, IN timestamp without time zone)
  RETURNS TABLE(name character varying, somevalue1 integer, somevalue2 real) AS
$BODY$
BEGIN
IF EXISTS (SELECT 1 FROM temptable) THEN 
 RETURN QUERY EXECUTE'select * from temptable';
 DELETE FROM temptable;
 DROP TABLE IF EXISTS temptable;
END IF;
END
$BODY$
  LANGUAGE plpgsql VOLATILE

Now hoping for the performance improvement by at least 50% ...

References

CREATE TABLE
Improving a PostgreSQL report performance: Part 1 - RETURN QUERY EXECUTE
by . Also posted on my website

No comments: