There is a subtle difference in how SQL UNIQUE
and DISTINCT
reason about NULL
values. For UNIQUE
they are not the same, but for DISTINCT
they are 🤯
Cheat Sheet
- UNIQUE constraint makes sure no values will be the same in that column except for
null
values. - DISTINCT on the column will return all possible values (including
null
) exactly once.
UNIQUE
Let’s say you have a very simple table customer
that has a primary key id
that is also auto-increment, and two more columns: name
and personal_id
.
Furthermore, you know that personal_id
cannot be the same for any of the customers, so we’ll put the UNIQUE constraint on this column.
DDL for such a table (for Postgresql) would be like this:
1
2
3
4
5
create table customer (
id Serial primary key,
name varchar(50),
personal_id varchar(50) unique
);
Now, let’s populate this table with some data:
1
2
3
4
5
6
7
8
9
10
INSERT INTO customer(name, personal_id)
VALUES
('Darrel Beckett', '875820185015303160')
('Franklyn Paget', null)
('Russ Thompkins', '173662838850888606')
('Jackson Martinson', '459054575053219773')
('Yolanda Frost', null)
('Lindsay Granger', '625931342427148797')
('Corbin Steele', '916300003568751325')
('Christen Hanson', null)
We can notice there are multiple rows with null
value for a personal_id
.
The reasoning behind this is as follows:
With a UNIQUE constraint
null
values are interpreted as unknown, therefore not breaking the rule of the uniqueness of the column.In other words, a
null
is theabscence of the value
rather than the value itself. It’s simply a placeholder for a possible value, which can be filled in later on.
DISTINCT
Now let’s try to get distinct values from personal_id
column in our table:
1
2
SELECT distinct personal_id
from customer;
What exactly Distinct is doing? As the name suggests, it returns distinct values, so any repeated value would be returned only once. Unlike for the UNIQUE constraint, from DISTINCT point of view, all null
values are the same. The reasoning behind this is that there is at least one unknown
value, rather than returning all unknown
values.
The result should not be surprising:
1
2
3
4
5
6
7
personal_id
(null)
173662838850888606
875820185015303160
459054575053219773
916300003568751325
625931342427148797
What you should take care of is how you interpret those results.
number of values in UNIQUE column is not always the same as number of values returned by DISTINCT on the same column. It depends on
null
values, as we’ve seen. If there is maximum onenull
value in the column, they are the same, otherwise they are not.