EXCEPT and EXCEPT ALL Operators
The EXCEPT operator returns the rows from the first query that are not present in the result of the second query. It is used to perform set difference operations between two result sets.
EXCEPT ALL returns all rows from the first query that are not present in the second query, including duplicates. This means if a row appears N times in the first query and M times in the second, it will appear max(N-M, 0) times in the result.
Note:
EXCEPT ALLis supported in Databricks, but not available in Vertica.
Syntax
query1 EXCEPT [ALL] query2
query1: The first SELECT statement.query2: The second SELECT statement.ALL(optional): If specified, preserves duplicate rows (Databricks only).
Parameters
Both
query1andquery2must return the same number of columns, and corresponding columns must have compatible data types.
Returns
Returns the set of rows from
query1that are not inquery2.If
ALLis specified (Databricks only), duplicate rows are preserved according to set difference semantics.Column names in the result are taken from the first query.
Limits / Notes
The number and types of columns must match in both queries.
EXCEPT ALLis not supported in Vertica. Attempting to use it in Vertica will result in a SQLValidation error.EXCEPTwithoutALLremoves duplicates from the result (set semantics).NULL values are treated as equal when comparing rows.
Examples
Basic EXCEPT usage
-- Given these tables: -- table_a: id | name -- 1 | 'Alice' -- 2 | 'Bob' -- 3 | 'Charlie' -- table_b: id | name -- 2 | 'Bob' -- 4 | 'David' > SELECT id, name FROM table_a EXCEPT SELECT id, name FROM table_b;
Result:
id | name 1 | Alice 3 | Charlie
EXCEPT with duplicates (both engines)
-- Given: -- orders_2023: customer_id | product_id -- 1 | 100 -- 1 | 100 -- duplicate -- 2 | 200 -- 3 | 300 -- orders_2024: customer_id | product_id -- 1 | 100 -- 4 | 400 > SELECT customer_id, product_id FROM orders_2023 EXCEPT SELECT customer_id, product_id FROM orders_2024;
Result:
customer_id | product_id 2 | 200 3 | 300
Note: The duplicate (1, 100) is removed in the result.
EXCEPT ALL usage (Databricks only)
-- Same data as above > SELECT customer_id, product_id FROM orders_2023 EXCEPT ALL SELECT customer_id, product_id FROM orders_2024;
Result in Databricks:
customer_id | product_id 1 | 100 -- One instance remains (2 in first - 1 in second = 1) 2 | 200 3 | 300
Result in Vertica: Syntax error - EXCEPT ALL is not supported.
EXCEPT with multiple columns
-- employees_dept_a: emp_id | name | salary -- 1 | 'John' | 50000 -- 2 | 'Jane' | 60000 -- 3 | 'Mike' | 55000 -- employees_dept_b: emp_id | name | salary -- 2 | 'Jane' | 60000 -- 4 | 'Sarah' | 65000 > SELECT emp_id, name, salary FROM employees_dept_a EXCEPT SELECT emp_id, name, salary FROM employees_dept_b;
Result:
emp_id | name | salary 1 | John | 50000 3 | Mike | 55000