There is a problem generally called 'index bloating' that occurs in PostgreSQL under certain circumstances when there are continuous inserts and deletes happening in a table. It is described as follows "B-tree index pages that have become completely empty are reclaimed for re-use. However, there is still a possibility of inefficient use of space: if all but a few index keys on a page have been deleted, the page remains allocated. Therefore, a usage pattern in which most, but not all, keys in each range are eventually deleted will see poor use of space. For such usage patterns, periodic reindexing is recommended".
This looks exactly like the problem I came across, when a table with ~2K rows had an index of over 120MB and another table with ~80K rows had 4 indexes on it, with total size approaching 3GB. The AUTOVACUUM was running as configured by default but apparently not enough to prevent index bloating.
Eventually, I decided configuring a REINDEX to run monthly on said tables. pgAgent is the job scheduler to use with PostgreSQL, but with PostgreSQL 9.1 I could install it following the documentation - the tables were created and the service was running, but I could not find any UI for it. So here's an example script that I used to create a scheduled job.
SET search_path = pgagent; INSERT INTO pga_jobclass VALUES (6, 'Scheduled Tasks'); INSERT INTO pga_job VALUES (5, 6, 'TableReindex', 'Reindex tables', '', true, '2013-03-27 10:00:00.000+11', --date created '2013-03-07 10:00:00.000+11', --date changed NULL, NULL, NULL); INSERT INTO pga_schedule VALUES (3, 5, 'TableReindexSchedule', 'Reindex tables', true, --enabled '2013-03-27 10:00:00.000+11', --start date NULL, --end (never) '{t,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f}', --minutes: 't' for run on the first minute of an hour '{t,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f}', --hours: 't' to run at 3 AM '{t,t,t,t,t,t,t}', -- weekdays: don't care, all false '{t,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f}', -- monthdays: 't' to run on the first day '{t,t,t,t,t,t,t,t,t,t,t,t}'); -- months: all true to run on the first day on each month INSERT INTO pga_jobstep VALUES (5, 5, 'TableReindexInfo', '', true, 's', 'REINDEX TABLE mytable1;REINDEX TABLE mytable2;', '', '@@DATABASE_NAME@@', 'f', NULL);
To verify, I checked the pga_job table and found '2013-04-01 03:00:00+11' in jobnextrun column - that's when I want to run it, at 3 AM on the first of next month.
I still have a question though - I tried using the
'{f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,t}', -- monthdays: 't'
to run on the first day pattern, because the last value is supposedly used for setting the "last day" - run on the last day of the month.
This, however, returns me quite a bunch of errors which suggest that PostgreSQL has some troubles calculating the next time the job will run.
ERROR: value "32768" is out of range for type smallint CONTEXT: PL/pgSQL function "pga_next_schedule" line 1254 at assignment SQL statement "SELECT MIN(pgagent.pga_next_schedule(jscid, jscstart, jscend, jscminutes, jschours, jscweekdays, jscmonthdays, jscmonths)) FROM pgagent.pga_schedule WHERE jscenabled AND jscjobid=OLD.jobid" PL/pgSQL function "pga_job_trigger" line 24 at SQL statement SQL statement "UPDATE pgagent.pga_job SET jobnextrun = NULL WHERE jobenabled AND jobid=NEW.jscjobid" PL/pgSQL function "pga_schedule_trigger" line 60 at SQL statement ********** Error ********** ERROR: value "32768" is out of range for type smallint SQL state: 22003 Context: PL/pgSQL function "pga_next_schedule" line 1254 at assignment SQL statement "SELECT MIN(pgagent.pga_next_schedule(jscid, jscstart, jscend, jscminutes, jschours, jscweekdays, jscmonthdays, jscmonths)) FROM pgagent.pga_schedule WHERE jscenabled AND jscjobid=OLD.jobid" PL/pgSQL function "pga_job_trigger" line 24 at SQL statement SQL statement "UPDATE pgagent.pga_job SET jobnextrun = NULL WHERE jobenabled AND jobid=NEW.jscjobid" PL/pgSQL function "pga_schedule_trigger" line 60 at SQL statement
The simplest way to reproduce it is to run an UPDATE similar to the following
UPDATE pgagent.pga_schedule SET jscmonthdays = '{f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,t}' WHERE jscid = 3
Since I don't really care about running it on a first or last day, I won't dig deep into it. Could be a bug in PostgreSQL for all I know.
References:
23.2. Routine ReindexingHow To Install PgAgent On Windows (PostgreSQL Job Scheduler)
Setting up PgAgent and Doing Scheduled Backups
Automated Backup on Windows
by Evgeny. Also posted on my website
No comments:
Post a Comment