There is a subtle difference in how SQL
DISTINCT reason about
NULL values. For
UNIQUE they are not the same, but for
DISTINCT they are 🤯
- UNIQUE constraint makes sure no values will be the same in that column except for
- DISTINCT on the column will return all possible values (including
null) exactly once.
Let’s say you have a very simple table
customer that has a primary key
id that is also auto-increment, and two more columns:
Furthermore, you know that
personal_id cannot be the same for any of the customers, so we’ll put the UNIQUE constraint on this column.
DDL for such a table (for Postgresql) would be like this:
1 2 3 4 5 create table customer ( id Serial primary key, name varchar(50), personal_id varchar(50) unique );
Now, let’s populate this table with some data:
1 2 3 4 5 6 7 8 9 10 INSERT INTO customer(name, personal_id) VALUES ('Darrel Beckett', '875820185015303160') ('Franklyn Paget', null) ('Russ Thompkins', '173662838850888606') ('Jackson Martinson', '459054575053219773') ('Yolanda Frost', null) ('Lindsay Granger', '625931342427148797') ('Corbin Steele', '916300003568751325') ('Christen Hanson', null)
We can notice there are multiple rows with
null value for a
The reasoning behind this is as follows:
With a UNIQUE constraint
nullvalues are interpreted as unknown, therefore not breaking the rule of the uniqueness of the column.
In other words, a
abscence of the valuerather than the value itself. It’s simply a placeholder for a possible value, which can be filled in later on.
Now let’s try to get distinct values from
personal_id column in our table:
1 2 SELECT distinct personal_id from customer;
What exactly Distinct is doing? As the name suggests, it returns distinct values, so any repeated value would be returned only once. Unlike for the UNIQUE constraint, from DISTINCT point of view, all
null values are the same. The reasoning behind this is that there is at least one
unknown value, rather than returning all
The result should not be surprising:
1 2 3 4 5 6 7 personal_id (null) 173662838850888606 875820185015303160 459054575053219773 916300003568751325 625931342427148797
What you should take care of is how you interpret those results.
number of values in UNIQUE column is not always the same as number of values returned by DISTINCT on the same column. It depends on
nullvalues, as we’ve seen. If there is maximum one
nullvalue in the column, they are the same, otherwise they are not.