Skip to main content

EXCEPT and EXCEPT ALL Operators

The EXCEPT operator returns the rows from the first query that are not present in the result of the second query. It is used to perform set difference operations between two result sets.

EXCEPT ALL returns all rows from the first query that are not present in the second query, including duplicates. This means if a row appears N times in the first query and M times in the second, it will appear max(N-M, 0) times in the result.

Note: EXCEPT ALL is supported in Databricks, but not available in Vertica.

Syntax

query1
EXCEPT [ALL]
query2
  • query1: The first SELECT statement.

  • query2: The second SELECT statement.

  • ALL (optional): If specified, preserves duplicate rows (Databricks only).

Parameters

  • Both query1 and query2 must return the same number of columns, and corresponding columns must have compatible data types.

Returns

  • Returns the set of rows from query1 that are not in query2.

  • If ALL is specified (Databricks only), duplicate rows are preserved according to set difference semantics.

  • Column names in the result are taken from the first query.

Limits / Notes

  • The number and types of columns must match in both queries.

  • EXCEPT ALL is not supported in Vertica. Attempting to use it in Vertica will result in a SQLValidation error.

  • EXCEPT without ALL removes duplicates from the result (set semantics).

  • NULL values are treated as equal when comparing rows.

Examples

Basic EXCEPT usage

-- Given these tables:
-- table_a: id | name
--          1  | 'Alice'
--          2  | 'Bob'  
--          3  | 'Charlie'

-- table_b: id | name
--          2  | 'Bob'
--          4  | 'David'

> SELECT id, name FROM table_a
  EXCEPT
  SELECT id, name FROM table_b;

Result:

id | name
1  | Alice
3  | Charlie

EXCEPT with duplicates (both engines)

-- Given:
-- orders_2023: customer_id | product_id
--              1          | 100
--              1          | 100  -- duplicate
--              2          | 200
--              3          | 300

-- orders_2024: customer_id | product_id  
--              1          | 100
--              4          | 400

> SELECT customer_id, product_id FROM orders_2023
  EXCEPT
  SELECT customer_id, product_id FROM orders_2024;

Result:

customer_id | product_id
2           | 200
3           | 300

Note: The duplicate (1, 100) is removed in the result.

EXCEPT ALL usage (Databricks only)

-- Same data as above
> SELECT customer_id, product_id FROM orders_2023
  EXCEPT ALL
  SELECT customer_id, product_id FROM orders_2024;

Result in Databricks:

customer_id | product_id
1           | 100        -- One instance remains (2 in first - 1 in second = 1)
2           | 200
3           | 300

Result in Vertica: Syntax error - EXCEPT ALL is not supported.

EXCEPT with multiple columns

-- employees_dept_a: emp_id | name     | salary
--                   1     | 'John'   | 50000
--                   2     | 'Jane'   | 60000
--                   3     | 'Mike'   | 55000

-- employees_dept_b: emp_id | name     | salary
--                   2     | 'Jane'   | 60000
--                   4     | 'Sarah'  | 65000

> SELECT emp_id, name, salary FROM employees_dept_a
  EXCEPT  
  SELECT emp_id, name, salary FROM employees_dept_b;

Result:

emp_id | name | salary
1      | John | 50000
3      | Mike | 55000