When Your Postgres Database and ORM Collide: Partitioning

Thursday, March 26, 2015 - Jerry Sievert

An ORM (Object-Relational Mapping) has become a near-essential tool of software development. Whether you agree with the model or not, it has become ubiquitous. So, what happens when your ORM is so generic that it can't actually deal with the advanced features of your database? Problem: impedance mismatch. How bad can it be? Really bad, and the workarounds can be just as bad, if not worse.

Let's talk about where things can break down: partitioning. Partitioning is a very specific optimization for databases: tables are broken up into subtables that store the data, and those subtables can be queried either directly or through the parent, where a CHECK constraint lets the planner confine the query to the relevant subtable. Partitioning with Postgres uses a base table, tables that inherit from that table, and a trigger that puts data where it needs to go. It's a rather manual process, but it's extremely powerful and allows for a lot of options.

Setting Up

Let's look at a specific example - create the primary table:

CREATE TABLE t (
  id SERIAL PRIMARY KEY,
  p NUMERIC,
  value VARCHAR
);

And create the partitioned tables, using p as the partition key:

CREATE TABLE t1
  (CHECK (p = 1))
  INHERITS (t);

CREATE TABLE t2
  (CHECK (p = 2))
  INHERITS (t);

We have created two (extra) tables, but on insert we need to be able to put the data where it needs to be. This requires a TRIGGER (which requires a function to run when the INSERT happens):

CREATE OR REPLACE FUNCTION t_insert_trigger()
RETURNS TRIGGER AS $$
BEGIN
    IF ( NEW.p = 1 ) THEN
        INSERT INTO t1 VALUES (NEW.*);
    ELSIF ( NEW.p = 2 ) THEN
        INSERT INTO t2 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'p out of range.  It needs to be 1 or 2!';
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

So here's the trigger:

CREATE TRIGGER trigger_t
  BEFORE INSERT
  ON t FOR EACH ROW
  EXECUTE PROCEDURE t_insert_trigger();

From here, whenever we insert something into the table t, it ends up in the correct child table (or raises an error if p isn't 1 or 2).
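For example, a quick sanity check (using the tables and trigger above):

INSERT INTO t (p, value) VALUES (1, 'first'), (2, 'second');

-- each row lands in its partition, not in the parent
SELECT p, value FROM t1;   -- 1, 'first'
SELECT p, value FROM t2;   -- 2, 'second'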

The Problem

Postgres's trigger-based partitioning has very specific requirements that direct its behavior. We are specifically intercepting the INSERT and changing where the data lives in the database. This is extremely powerful, but the trigger function has to return one of two things. If it returns NULL, the database just continues on and does nothing further, which is generally what we want, since we are already inserting into the other tables ourselves. If it returns a row, that row also gets inserted into the parent table, and suddenly we have two copies: one in the table we want it to be in, and one in the parent table that we don't want it in.
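One way to see the difference is to query the parent table directly with ONLY, which skips the child tables; a hypothetical RETURN NEW at the end of the trigger would leave a copy here as well as in the partition:

-- with RETURN NULL (as above), the parent table itself stays empty
SELECT count(*) FROM ONLY t;
-- count = 0

-- had the trigger ended with RETURN NEW instead, this same check would show
-- one row per insert here, in addition to the copy placed in t1 or t2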

During a normal INSERT, this is fine, but most ORMs try to be smart. They insert the data into the table on your behalf, and attempt to get the results of the INSERT back for you. Considering that the database can make changes to the data, typically adding a unique identifier, among other things, this smartness is usually genuinely useful.

There's a pretty straightforward pattern for ORMs to use with Postgres when trying to get the latest changes on an INSERT: RETURNING *:

INSERT INTO t (p, value) VALUES (1, 'hello world') RETURNING *;

Without partitioning, this would return something like:

id: 5, p: 1, value: 'hello world'

The ORM takes this, converts it into a model, and all is good. But since we are intercepting the INSERT, inserting the data into the correct child table, and returning NULL, the ORM gets nothing back. Oftentimes this causes a bunch of additional inserts, which can cause other issues. Remember, we are no longer returning the data that we inserted, so the result set is empty.
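With the partitioning trigger in place, that same statement comes back empty:

INSERT INTO t (p, value) VALUES (1, 'hello world') RETURNING *;
-- 0 rows returned: the BEFORE trigger returned NULL, so as far as
-- RETURNING is concerned, nothing was inserted into t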

Working Around the Problem

Let's talk about working around the problem. There's a fairly straightforward way we could work through this: override the insert method of whatever ORM we are using and, instead of relying on RETURNING *, fetch the record we just inserted ourselves inside a TRANSACTION.

The standard response is to have the trigger insert the data and return the row anyway, then create an AFTER trigger that deletes the extra copy from the parent table. Whoa. That's a lot of extra work.
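A rough sketch of what that approach might look like (not recommended, just to show the shape of it; the cleanup function and trigger names here are hypothetical):

-- variant of the trigger above: return the row so the INSERT against the
-- parent proceeds and RETURNING * has something to hand back
CREATE OR REPLACE FUNCTION t_insert_trigger()
RETURNS TRIGGER AS $$
BEGIN
    IF ( NEW.p = 1 ) THEN
        INSERT INTO t1 VALUES (NEW.*);
    ELSIF ( NEW.p = 2 ) THEN
        INSERT INTO t2 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'p out of range.  It needs to be 1 or 2!';
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- then clean the duplicate back out of the parent after the fact
CREATE OR REPLACE FUNCTION t_delete_parent()
RETURNS TRIGGER AS $$
BEGIN
    DELETE FROM ONLY t WHERE id = NEW.id;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trigger_t_cleanup
  AFTER INSERT
  ON t FOR EACH ROW
  EXECUTE PROCEDURE t_delete_parent();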

Let's look at how we might improve on that. We can begin a TRANSACTION, do the INSERT, and SELECT the row we just inserted by using CURRVAL():

INSERT INTO t (p, value) VALUES (1, 'hello world');
SELECT * FROM t WHERE id = CURRVAL('t_id_seq');

Or, if your ORM is written in a language like Ruby and uses Sequel:

  # run the INSERT itself; the trigger means RETURNING would come back empty
  execute_insert(sql)

  # so look the row back up via the sequence's current value instead
  tn = self.model.table_name.to_s
  returning_sql = "SELECT * FROM #{tn} WHERE id=currval('#{tn}_id_seq');"
  returning_fetch_rows(returning_sql, &block)

This sucks. No, really, it's not good. Querying against CURRVAL() executes a SEQ SCAN against every table in the partition scheme. If we have 25 tables, that means 25 sequential scans reading everything off of disk (also known as full table scans). When you're dealing with millions of rows of data, this becomes extremely expensive. Seriously, don't filter on CURRVAL(): you would think the lookup would be cheap, but it isn't, and you'll be watching it scan millions of rows of data, possibly adding seconds to every INSERT.
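You can check this yourself with EXPLAIN (the exact plan will vary with your schema and data):

-- currval() is volatile, so the planner can't treat it like a constant;
-- the plan ends up sequentially scanning the parent and every child table
EXPLAIN SELECT * FROM t WHERE id = CURRVAL('t_id_seq');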

So, let's look at something that could be better:

  tn = self.model.table_name.to_s

  # grab the next id from the sequence up front
  row = DB.fetch("SELECT nextval('#{tn}_id_seq') AS next;").server(:default).first
  id = row[:next]

  # assign it explicitly, so the id is known before the INSERT runs
  values[0][:id] = id

  sql = insert_sql(*values)
  execute_insert(sql)

  # fetch the row back by its (now constant) id
  returning_sql = "SELECT * FROM #{tn} WHERE id=#{id};"
  returning_fetch_rows(returning_sql, &block)

It's a lot more work: we're getting the NEXTVAL() of the sequence ourselves, assigning it to the id, and doing the INSERT from there. It requires visibility into both the database and the code, but it ultimately means a huge improvement overall.
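In plain SQL, the same pattern looks something like this (the value 6 is just a stand-in for whatever NEXTVAL() happens to return):

BEGIN;

-- grab the next id up front, instead of asking for it after the fact
SELECT NEXTVAL('t_id_seq');
-- suppose it returns 6

-- insert with the id we already know
INSERT INTO t (id, p, value) VALUES (6, 1, 'hello world');

-- now the lookup is against a plain constant, not a volatile function call
SELECT * FROM t WHERE id = 6;

COMMIT;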

In our case, it meant a change from a constant CPU load of 25% to a CPU load of less than 2%. This is significant.

By understanding the full path of how data is dealt with, we can make major optimizations that can have big impacts on cost and infrastructure.

ORMs are stupid (and so are databases), but they don't have to be. A little bit of insight into both your ORM and your database can make a massive difference in both performance and cost.