Atomic UPSERT with unique constraint on null-able column in PostgreSQL



This post will be probably be very boring for everyone who doesn't have this problem. But since it cost me some time to figure out a solution, I think it is worth sharing. <span id="more-4594"></span> <h2>Problem Statement</h2> Let's consider a PostgreSQL table called `foo` with an ID column and a unique null-able column called `bar`: ```sql CREATE TABLE foo ( id serial PRIMARY KEY, bar int UNIQUE ); ``` Our goal is to perform an operation on that table that given some value $X$ for `bar` 1. in case $X$ was not in the table, inserts a row with `bar` set to $X$ and returns the newly inserted ID 2. or otherwise returns the ID of the row where `bar` contains the value $X$. This is a special case of a so called UPSERT (( combination of "INSERT" and "UPDATE" )) where the actual update should not change anything. <h2>Returning the ID</h2> PostgreSQL supports returning a newly inserted ID through the `RETURNING` construct (( There are other ways to accomplish this. Namely `curval` and `LASTVAL` but they are not feasible in the general case where triggers may insert additional rows (in the same table). )): ```sql INSERT INTO foo (bar) VALUES(42) RETURNING id; -- returns newly inserted id (1 for example) INSERT INTO foo (bar) VALUES(23) RETURNING id; -- returns newly inserted id (2 for example) INSERT INTO foo (bar) VALUES(23) RETURNING id; -- this throws an error as expected ``` <h2>Handle Conflicts</h2> We now want to formulate a query that either inserts `23` and returns the ID $2$ or that just returns the ID of `23` if `23` was already present. PostgreSQL supports this through the `ON CONFLICT` construct. Let's try ```sql INSERT INTO foo (bar) VALUES(23) ON CONFLICT(bar) DO NOTHING RETURNING id; ``` This correctly returns the newly inserted ID in case $23$ wasn't in the table before, but doesn't return anything if it was. Since `RETURNING` only returns stuff of rows that have been inserted or updated, we need to simulate an update in case the value already exists: ```sql INSERT INTO foo (bar) VALUES(23) ON CONFLICT(bar) DO UPDATE SET bar=EXCLUDED.bar RETURNING id; ``` This correctly returns $2$. Please note that one needs to specify on which column the conflict happens now: For `DO NOTHING` this was optional but according to the docs `DO UPDATE` requires to specify this explicitly. <h2>The Billion Dollars Mistake</h2> So far so good. Now a well-known (( not to me of course )) detail of SQL is that unique constraints treats `NULL` to be different from everything else, especially `NULL`. So the following insert commands lead to two new rows in the table which have the same value in the `bar` column (at least for some definition of "the same"): ```sql INSERT INTO foo (bar) VALUES(NULL); INSERT INTO foo (bar) VALUES(NULL); SELECT * FROM foo; id | bar ----+----- 1 | 42 2 | 23 3 | NULL 4 | NULL ``` In my use case there should at most be one row with a value of `NULL`. Sadly, there doesn't seem to be a satisfying solution for this. The best thing I could come up with is to replace `NULL` by some special value in the range of `bar` that doesn't normally occur in the data. If `bar` models the age of a person for example `-1` may be such a special value. If there is no such value in your range, you are screwed I think. If I am mistaken, please let me know in the comments below! So let's assume the value `0` is special for the `bar` column in that sense that `bar` never takes `0` as a value. Sadly, we cannot use the `UNIQUE` shortcut during table creation any more but need to define the index separately: ``` CREATE TABLE foo ( id serial PRIMARY KEY, bar int CHECK(bar != 0) -- let's at least enforce that the special value does not appear ); CREATE UNIQUE INDEX foo_unique_idx ON foo (COALESCE(bar, 0)); ``` Executing the following insert ``` INSERT INTO foo (bar) VALUES(42) ON CONFLICT(bar) DO UPDATE SET bar=EXCLUDED.bar RETURNING id; ``` of course causes an error since there is no index on the column `bar` anymore but on something else (namely either the value of `bar` or $0$ if it is `NULL`). According to the documentation `ON CONFLICT` accepts a constraint name. Sadly technically, an index isn't a constraint. If you are using PostgrSQL version 9.5.3 or lower, you are screwed again now. Because in those versions `ON CONFLICT` doesn't accept arbitrary expression. <h2>UPSERT at Last</h2> If you can update to more recent version, the following inserts correctly returns three different ids: ``` INSERT INTO foo (bar) VALUES(42) ON CONFLICT(COALESCE(bar, 0)) DO UPDATE SET bar=EXCLUDED.bar RETURNING id; INSERT INTO foo (bar) VALUES(23) ON CONFLICT(COALESCE(bar, 0)) DO UPDATE SET bar=EXCLUDED.bar RETURNING id; INSERT INTO foo (bar) VALUES(23) ON CONFLICT(COALESCE(bar, 0)) DO UPDATE SET bar=EXCLUDED.bar RETURNING id; INSERT INTO foo (bar) VALUES(NULL) ON CONFLICT(COALESCE(bar, 0)) DO UPDATE SET bar=EXCLUDED.bar RETURNING id; INSERT INTO foo (bar) VALUES(NULL) ON CONFLICT(COALESCE(bar, 0)) DO UPDATE SET bar=EXCLUDED.bar RETURNING id; ``` That's it. Hope it helps someone out there. <h2>Last Caveat</h2> The way PostgreSQL handles upserts implemented with `ON CONFLICT` leads to the sequence corresponding to the ID column increasing even in the conflict (and update) case. So this technique may not be feasible in cases where successful inserts happen rarely but queries like above are executed rapidly.

Tags: - -

5 Replies to “Atomic UPSERT with unique constraint on null-able column in PostgreSQL”

  1. One more upsert problem. Imagine third column in your example "baz TEXT NOT NULL". Now this query become invalid: INSERT INTO foo (bar) VALUES(42) ON CONFLICT(COALESCE(bar, 0)) DO UPDATE SET bar=EXCLUDED.bar RETURNING id; You get [23502] ERROR: null value in column "baz" violates not-null constraint. But why? Because Postgres actually trying to insert before update and not null constraint check happens before ON CONFLICT. So, even if baz filled with values not-null always will be violated.
    1. what value have you entered for baz? CREATE TABLE foo ( id serial PRIMARY KEY, bar int CHECK(bar != 0), baz int NOT NULL ); CREATE UNIQUE INDEX foo_unique_idx ON foo (COALESCE(bar, 0)); INSERT INTO foo (bar) VALUES(42) ON CONFLICT(COALESCE(bar, 0)) DO UPDATE SET bar=EXCLUDED.bar RETURNING id; ERROR: null value in column "baz" violates not-null constraint DETAIL: Failing row contains (1, 42, null). INSERT INTO foo (bar,baz) VALUES(42,1) ON CONFLICT(COALESCE(bar, 0)) DO UPDATE SET bar=EXCLUDED.bar, baz=EXCLUDED.baz RETURNING *; id | bar | baz ----+-----+----- 2 | 42 | 1 INSERT INTO foo (bar,baz) VALUES(42,2) ON CONFLICT(COALESCE(bar, 0)) DO UPDATE SET bar=EXCLUDED.bar, baz=EXCLUDED.baz RETURNING *; id | bar | baz ----+-----+----- 2 | 42 | 2
  2. &gt; Hope it helps someone out there. It helped me today, after quite a while of trying suggestions from a whole number of different places - thank you!

Leave a Reply

Your email address will not be published. Required fields are marked *