This post will be probably be very boring for everyone who doesn't have this problem. But since it cost me some time to figure out a solution, I think it is worth sharing.
Problem Statement
Let's consider a PostgreSQL table calledfoo
with an ID column and a unique null-able column called bar
:
CREATE TABLE foo (
id serial PRIMARY KEY,
bar int UNIQUE
);
Our goal is to perform an operation on that table that given some value $X$ for bar
1. in case $X$ was not in the table, inserts a row with bar
set to $X$ and returns the newly inserted ID
2. or otherwise returns the ID of the row where bar
contains the value $X$.
This is a special case of a so called UPSERT (( combination of "INSERT" and "UPDATE" )) where the actual update should not change anything.
Returning the ID
PostgreSQL supports returning a newly inserted ID through theRETURNING
construct (( There are other ways to accomplish this. Namely curval
and LASTVAL
but they are not feasible in the general case where triggers may insert additional rows (in the same table). )):
INSERT INTO foo (bar) VALUES(42) RETURNING id; -- returns newly inserted id (1 for example)
INSERT INTO foo (bar) VALUES(23) RETURNING id; -- returns newly inserted id (2 for example)
INSERT INTO foo (bar) VALUES(23) RETURNING id; -- this throws an error as expected
Handle Conflicts
We now want to formulate a query that either inserts23
and returns the ID $2$ or that just returns the ID of 23
if 23
was already present. PostgreSQL supports this through the ON CONFLICT
construct. Let's try
INSERT INTO foo (bar) VALUES(23) ON CONFLICT(bar) DO NOTHING RETURNING id;
This correctly returns the newly inserted ID in case $23$ wasn't in the table before, but doesn't return anything if it was. Since RETURNING
only returns stuff of rows that have been inserted or updated, we need to simulate an update in case the value already exists:
INSERT INTO foo (bar) VALUES(23) ON CONFLICT(bar) DO UPDATE SET bar=EXCLUDED.bar RETURNING id;
This correctly returns $2$. Please note that one needs to specify on which column the conflict happens now: For DO NOTHING
this was optional but according to the docs DO UPDATE
requires to specify this explicitly.
The Billion Dollars Mistake
So far so good. Now a well-known (( not to me of course )) detail of SQL is that unique constraints treatsNULL
to be different from everything else, especially NULL
. So the following insert commands lead to two new rows in the table which have the same value in the bar
column (at least for some definition of "the same"):
INSERT INTO foo (bar) VALUES(NULL);
INSERT INTO foo (bar) VALUES(NULL);
SELECT * FROM foo;
id | bar
----+-----
1 | 42
2 | 23
3 | NULL
4 | NULL
In my use case there should at most be one row with a value of NULL
. Sadly, there doesn't seem to be a satisfying solution for this. The best thing I could come up with is to replace NULL
by some special value in the range of bar
that doesn't normally occur in the data. If bar
models the age of a person for example -1
may be such a special value. If there is no such value in your range, you are screwed I think. If I am mistaken, please let me know in the comments below!
So let's assume the value 0
is special for the bar
column in that sense that bar
never takes 0
as a value. Sadly, we cannot use the UNIQUE
shortcut during table creation any more but need to define the index separately:
CREATE TABLE foo (
id serial PRIMARY KEY,
bar int CHECK(bar != 0) -- let's at least enforce that the special value does not appear
);
CREATE UNIQUE INDEX foo_unique_idx ON foo (COALESCE(bar, 0));
Executing the following insert
INSERT INTO foo (bar) VALUES(42) ON CONFLICT(bar) DO UPDATE SET bar=EXCLUDED.bar RETURNING id;
of course causes an error since there is no index on the column bar
anymore but on something else (namely either the value of bar
or $0$ if it is NULL
). According to the documentation ON CONFLICT
accepts a constraint name. Sadly technically, an index isn't a constraint. If you are using PostgrSQL version 9.5.3 or lower, you are screwed again now. Because in those versions ON CONFLICT
doesn't accept arbitrary expression.
UPSERT at Last
If you can update to more recent version, the following inserts correctly returns three different ids:INSERT INTO foo (bar) VALUES(42) ON CONFLICT(COALESCE(bar, 0)) DO UPDATE SET bar=EXCLUDED.bar RETURNING id;
INSERT INTO foo (bar) VALUES(23) ON CONFLICT(COALESCE(bar, 0)) DO UPDATE SET bar=EXCLUDED.bar RETURNING id;
INSERT INTO foo (bar) VALUES(23) ON CONFLICT(COALESCE(bar, 0)) DO UPDATE SET bar=EXCLUDED.bar RETURNING id;
INSERT INTO foo (bar) VALUES(NULL) ON CONFLICT(COALESCE(bar, 0)) DO UPDATE SET bar=EXCLUDED.bar RETURNING id;
INSERT INTO foo (bar) VALUES(NULL) ON CONFLICT(COALESCE(bar, 0)) DO UPDATE SET bar=EXCLUDED.bar RETURNING id;
That's it. Hope it helps someone out there.
Last Caveat
The way PostgreSQL handles upserts implemented withON CONFLICT
leads to the sequence corresponding to the ID column increasing even in the conflict (and update) case. So this technique may not be feasible in cases where successful inserts happen rarely but queries like above are executed rapidly.
5 Replies to “Atomic UPSERT with unique constraint on null-able column in PostgreSQL”