SQL DISTINCT

SQL DISTINCT

Introduction to SQL DISTINCT operator

To remove duplicate rows from a result set, you use the DISTINCT operator in the SELECT clause as follows:

SELECT DISTINCT
    column1, column2, ...
FROM
    table1;Code language: SQL (Structured Query Language) (sql)

If you use one column after the DISTINCT operator, the DISTINCT operator uses values in that column to evaluate duplicates.

If you use two or more columns, the DISTINCT will use the combination of values in those columns to evaluate the duplicate.

Note that the DISTINCT only removes the duplicate rows from the result set. It doesn’t delete duplicate rows in the table.

If you want to select two columns and remove duplicates in one column, you should use the GROUP BY clause instead.

SQL DISTINCT examples

We will use the employees table in the sample database to demonstrate how the DISTINCT operator works.

employees_table

1) Using SQL DISTINCT operator on one column example

The following statement selects the salary data from the salary column of the employees table and sorts them from high to low:

SELECT 
    salary
FROM
    employees
ORDER BY salary DESC;Code language: SQL (Structured Query Language) (sql)

Try It

+----------+
| salary   |
+----------+
| 24000.00 |
| 17000.00 |
| 17000.00 |
| 14000.00 |
| 13500.00 |
| 13000.00 |
| 12000.00 |
| 12000.00 |
| 11000.00 |
| 10000.00 |
|  9000.00 |
|  9000.00 |
...

The result set has some duplicates. For example, 17000, 12000, and 9000.

The following statement uses the DISTINCT operator to select unique values from the salary column of the employees table:

SELECT 
    DISTINCT salary
FROM
    employees
ORDER BY salary DESC;Code language: SQL (Structured Query Language) (sql)

Try It

+----------+
| salary   |
+----------+
| 24000.00 |
| 17000.00 |
| 14000.00 |
| 13500.00 |
| 13000.00 |
| 12000.00 |
| 11000.00 |
| 10000.00 |
|  9000.00 |Code language: plaintext (plaintext)

As you can see, the result set doesn’t contain any duplicate salary values.

2) Using SQL DISTINCT operator on multiple columns example

The following statement selects the job id and salary from the employees table:

SELECT
	job_id,
	salary
FROM
	employees
ORDER BY
	job_id,
	salary DESC;Code language: SQL (Structured Query Language) (sql)

Try It

+--------+----------+
| job_id | salary   |
+--------+----------+
|      1 |  8300.00 |
|      2 | 12000.00 |
|      3 |  4400.00 |
|      4 | 24000.00 |
|      5 | 17000.00 |
|      5 | 17000.00 |
|      6 |  9000.00 |
|      6 |  8200.00 |
...Code language: plaintext (plaintext)

The result set has some duplicate rows e.g., job id 5 salary 17000. It means that there are two employees with the same job id and salary.

The following statement uses the DISTINCT operator to remove the duplicate values in job id and salary:

SELECT DISTINCT
	job_id,
	salary
FROM
	employees
ORDER BY
	job_id,
	salary DESC;Code language: SQL (Structured Query Language) (sql)

Try It

+--------+----------+
| job_id | salary   |
+--------+----------+
|      1 |  8300.00 |
|      2 | 12000.00 |
|      3 |  4400.00 |
|      4 | 24000.00 |
|      5 | 17000.00 |
|      6 |  9000.00 |
|      6 |  8200.00 |
...Code language: plaintext (plaintext)

Note that you still see the duplicate in the job_id column because the DISTINCT operator uses values from both job_id and salary to evaluate the duplicate, not just values in the job_id column.

SQL DISTINCT and NULL

In the database, NULL means unknown or missing data.

Unlike values like numbers, strings, dates, etc. NULL does not equal anything, even itself. The following expression will return unknown (or NULL):

NULL=NULLCode language: PHP (php)

Typically, the DISTINCT operator treats all NULL the same. Therefore, the DISTINCT operator keeps only one NULL in the result set.

Note that this behavior may be different between database products.

For example, the following statement returns the distinct phone numbers of employees:

SELECT DISTINCT phone_number
FROM employees
ORDER BY phone_number;Code language: SQL (Structured Query Language) (sql)

Try It

+--------------+
| phone_number |
+--------------+
| NULL         |
| 515.123.4444 |
| 515.123.4567 |
| 515.123.4568 |
| 515.123.4569 |
| 515.123.5555 |
...Code language: plaintext (plaintext)

Notice that the query returns only one NULL in the result set.

Summary

  • Use DISTINCT operator in the SELECT clause to remove duplicate rows from the result set.

Popular posts from this blog

SQL Syntax

SQL FETCH

SQL LIMIT