Wednesday, February 27, 2013

On PostgreSQL Inverse mode and database audit table triggers

While working on a different issue, I noticed something strange about a history table. A history table keeps all changes done to the records in the main table - whenever something was inserted, updated or deleted, a record is added to the history table. However, each time something was inserted, there were two records added to the history table. First one had a null value in the main table foreign key field, and the second one had the correct value.

Duplicates in history table

Turns out that the reason is the way NHibernate maps objects to data. In my case, the parent entity knows about the child entity. The child entity, however, has no knowledge of the parent entity. This can be most easily explained by looking at the following snippet:

public class Order
{
 public virtual int Id { get; set; }
 public virtual ICollection<Detail> Details { get; set; }
}

public class Detail
{
 public virtual int Id { get; set; }
 public virtual string Name { get; set; }
}

// ...
var order = new Order();
var detail = new Detail() {Name = "Widget"};

session.Persist(detail);
order.Details.Add(detail);

While the Order knows about the Details, each Detail has no knowledge about the parent Order. PostgreSQL will perform the following sequence of actions:

  • Insert the Order
  • Insert the Detail with OrderId = null
  • Update the Detail with actual OrderId
which in my case looks like this:

Insert, then Update

How can that be fixed so that only one INSERT statement is executed? Well, here is where Inverse comes into play. Check the references at the end of the post for a proper explanation, but the definition is that setting Inverse to true places responsibility of the saving the relationship on the "other side", in this case - on the child. In the mapping, that will look as a change from

mapping.HasMany(x => x.Children).KeyColumn("inventoryitemid")
 .AsSet();

To

mapping.HasMany(x => x.Children).KeyColumn("inventoryitemid")
 .AsSet()
 .Inverse(); 

And the code snippet should now look this way

public class Order
{
 public virtual int Id { get; set; }
 public virtual ICollection<Detail> Details { get; set; }
}

public class Detail
{
 public virtual Order Order { get; set; }
 public virtual int Id { get; set; }
 public virtual string Name { get; set; }
}

// ...
var order = new Order();
var detail = new Detail() {Name = "Widget", Order = order};

session.Persist(detail);
order.Details.Add(detail);

Note how the Order is added to Detail and also is passed when the Detail object is created. In my case, the profiler now shows the following:

Insert only

So, problem solved? Well, yes and no. While profiling the modified solution, I found that now a lot more actions were logged in the same scenario - most of them were second level cache access. The impact on performance is not immediately obvious, but hardly beneficial.

Second level cache access

It is likely that by adding a reference to the parent I forced NHibernate to retrieve the whole parent entity. Eventually, not having enough time to evaluate full impact of the change, a somewhat "compromise" solution was chosen: modify a trigger that inserts into a "history" table to check if the value of a foreign key is not null.

So, a trigger that looked like this

CREATE OR REPLACE FUNCTION process_emp_audit() RETURNS TRIGGER AS $emp_audit$
    BEGIN
        IF (TG_OP = 'INSERT') THEN
            INSERT INTO emp_audit SELECT 'I', now(), user, NEW.*;
            RETURN NEW;
  ...
        RETURN NULL; -- result is ignored since this is an AFTER trigger
    END;
$emp_audit$ LANGUAGE plpgsql;

would now be modified as follows

CREATE OR REPLACE FUNCTION process_emp_audit() RETURNS TRIGGER AS $emp_audit$
    BEGIN
        IF (TG_OP = 'INSERT' AND NEW.foreignkeyfield IS NOT NULL) THEN
            INSERT INTO emp_audit SELECT 'I', now(), user, NEW.*;
            RETURN NEW;
  ...
        RETURN NULL; -- result is ignored since this is an AFTER trigger
    END;
$emp_audit$ LANGUAGE plpgsql;

Looks like a dirty hack, which falls into the "best we can do in the least time" category.

References:

NHibernate Inverse attribute
Fluent NHibernate and Collections Mapping
NHibernate performance issues #1: evil List (non-inverse relationhip)
NHibernate's inverse - what does it really mean?
39.9. Trigger Procedures
by . Also posted on my website

No comments: