Atomic UPSERT with unique constraint on null-able column in PostgreSQL



This post will be probably be very boring for everyone who doesn't have this problem. But since it cost me some time to figure out a solution, I think it is worth sharing.

Problem Statement

Let's consider a PostgreSQL table called foo with an ID column and a unique null-able column called bar:
CREATE TABLE foo (
    id serial PRIMARY KEY,
    bar int UNIQUE
);
Our goal is to perform an operation on that table that given some value $X$ for bar 1. in case $X$ was not in the table, inserts a row with bar set to $X$ and returns the newly inserted ID 2. or otherwise returns the ID of the row where bar contains the value $X$. This is a special case of a so called UPSERT (( combination of "INSERT" and "UPDATE" )) where the actual update should not change anything.

Returning the ID

PostgreSQL supports returning a newly inserted ID through the RETURNING construct (( There are other ways to accomplish this. Namely curval and LASTVAL but they are not feasible in the general case where triggers may insert additional rows (in the same table). )):
INSERT INTO foo (bar) VALUES(42) RETURNING id; -- returns newly inserted id (1 for example)
INSERT INTO foo (bar) VALUES(23) RETURNING id; -- returns newly inserted id (2 for example)
INSERT INTO foo (bar) VALUES(23) RETURNING id; -- this throws an error as expected

Handle Conflicts

We now want to formulate a query that either inserts 23 and returns the ID $2$ or that just returns the ID of 23 if 23 was already present. PostgreSQL supports this through the ON CONFLICT construct. Let's try
INSERT INTO foo (bar) VALUES(23) ON CONFLICT(bar) DO NOTHING RETURNING id;
This correctly returns the newly inserted ID in case $23$ wasn't in the table before, but doesn't return anything if it was. Since RETURNING only returns stuff of rows that have been inserted or updated, we need to simulate an update in case the value already exists:
INSERT INTO foo (bar) VALUES(23) ON CONFLICT(bar) DO UPDATE SET bar=EXCLUDED.bar RETURNING id;
This correctly returns $2$. Please note that one needs to specify on which column the conflict happens now: For DO NOTHING this was optional but according to the docs DO UPDATE requires to specify this explicitly.

The Billion Dollars Mistake

So far so good. Now a well-known (( not to me of course )) detail of SQL is that unique constraints treats NULL to be different from everything else, especially NULL. So the following insert commands lead to two new rows in the table which have the same value in the bar column (at least for some definition of "the same"):
INSERT INTO foo (bar) VALUES(NULL);
INSERT INTO foo (bar) VALUES(NULL);

SELECT * FROM foo;
 id | bar
----+-----
  1 |  42
  2 |  23
  3 |  NULL
  4 |  NULL
In my use case there should at most be one row with a value of NULL. Sadly, there doesn't seem to be a satisfying solution for this. The best thing I could come up with is to replace NULL by some special value in the range of bar that doesn't normally occur in the data. If bar models the age of a person for example -1 may be such a special value. If there is no such value in your range, you are screwed I think. If I am mistaken, please let me know in the comments below! So let's assume the value 0 is special for the bar column in that sense that bar never takes 0 as a value. Sadly, we cannot use the UNIQUE shortcut during table creation any more but need to define the index separately:
CREATE TABLE foo (
    id serial PRIMARY KEY,
    bar int CHECK(bar != 0) -- let's at least enforce that the special value does not appear
);
CREATE UNIQUE INDEX foo_unique_idx ON foo (COALESCE(bar, 0));
Executing the following insert
INSERT INTO foo (bar) VALUES(42) ON CONFLICT(bar) DO UPDATE SET bar=EXCLUDED.bar RETURNING id;
of course causes an error since there is no index on the column bar anymore but on something else (namely either the value of bar or $0$ if it is NULL). According to the documentation ON CONFLICT accepts a constraint name. Sadly technically, an index isn't a constraint. If you are using PostgrSQL version 9.5.3 or lower, you are screwed again now. Because in those versions ON CONFLICT doesn't accept arbitrary expression.

UPSERT at Last

If you can update to more recent version, the following inserts correctly returns three different ids:
INSERT INTO foo (bar) VALUES(42) ON CONFLICT(COALESCE(bar, 0)) DO UPDATE SET bar=EXCLUDED.bar RETURNING id;
INSERT INTO foo (bar) VALUES(23) ON CONFLICT(COALESCE(bar, 0)) DO UPDATE SET bar=EXCLUDED.bar RETURNING id;
INSERT INTO foo (bar) VALUES(23) ON CONFLICT(COALESCE(bar, 0)) DO UPDATE SET bar=EXCLUDED.bar RETURNING id;
INSERT INTO foo (bar) VALUES(NULL) ON CONFLICT(COALESCE(bar, 0)) DO UPDATE SET bar=EXCLUDED.bar RETURNING id;
INSERT INTO foo (bar) VALUES(NULL) ON CONFLICT(COALESCE(bar, 0)) DO UPDATE SET bar=EXCLUDED.bar RETURNING id;
That's it. Hope it helps someone out there.

Last Caveat

The way PostgreSQL handles upserts implemented with ON CONFLICT leads to the sequence corresponding to the ID column increasing even in the conflict (and update) case. So this technique may not be feasible in cases where successful inserts happen rarely but queries like above are executed rapidly.

Tags: - -

5 Replies to “Atomic UPSERT with unique constraint on null-able column in PostgreSQL”

  1. One more upsert problem. Imagine third column in your example "baz TEXT NOT NULL". Now this query become invalid: INSERT INTO foo (bar) VALUES(42) ON CONFLICT(COALESCE(bar, 0)) DO UPDATE SET bar=EXCLUDED.bar RETURNING id; You get [23502] ERROR: null value in column "baz" violates not-null constraint. But why? Because Postgres actually trying to insert before update and not null constraint check happens before ON CONFLICT. So, even if baz filled with values not-null always will be violated.
    1. what value have you entered for baz? CREATE TABLE foo ( id serial PRIMARY KEY, bar int CHECK(bar != 0), baz int NOT NULL ); CREATE UNIQUE INDEX foo_unique_idx ON foo (COALESCE(bar, 0)); INSERT INTO foo (bar) VALUES(42) ON CONFLICT(COALESCE(bar, 0)) DO UPDATE SET bar=EXCLUDED.bar RETURNING id; ERROR: null value in column "baz" violates not-null constraint DETAIL: Failing row contains (1, 42, null). INSERT INTO foo (bar,baz) VALUES(42,1) ON CONFLICT(COALESCE(bar, 0)) DO UPDATE SET bar=EXCLUDED.bar, baz=EXCLUDED.baz RETURNING *; id | bar | baz ----+-----+----- 2 | 42 | 1 INSERT INTO foo (bar,baz) VALUES(42,2) ON CONFLICT(COALESCE(bar, 0)) DO UPDATE SET bar=EXCLUDED.bar, baz=EXCLUDED.baz RETURNING *; id | bar | baz ----+-----+----- 2 | 42 | 2
  2. > Hope it helps someone out there. It helped me today, after quite a while of trying suggestions from a whole number of different places - thank you!

Leave a Reply to Seb Cancel reply

Your email address will not be published. Required fields are marked *