SQL best practice checklist

When managing your SQL, use the following best practice checklist:

Category	Rule	Comment
CREATE TABLE/VIEW	Use Views if they are not/seldom accessed in the transformations	The use of views should be limited to DM tables only. For large data model tables with complex definitions (e.g. multiple joins) a table may be created for DM instead, if DM load runtimes are longer than expected.
	Use Tables if many transformations access it	Temporary tables (especially large tables) should not be created as views when used by subsequent transformations.
	Combine tables if possible	Avoid extra tables that carry the same data, e.g. case table and TMP table
	Order a table that is often joined by the columns used in this JOIN and segment by one	This will sort the default projection by those columns and will improve performance of subsequent queries that use the table
	Ensure Table Statistics are refreshed for optimal performance of queries.	Run SELECT ANALYZE_STATISTICS(‘TABLENAME’); after the creation of every temporary/custom table.
	Field sizes are significant from a performance aspect. Review your tableschemas and reduce the size of the fields if it is not used.	Example: A reduction from VARCHAR(200) to VARCHAR(20) might have a significantperformance impact.
SELECT	Avoid DISTINCT	Consider whether a better designed query logic makes the use of DISTINCT unnecessary.
	Reduce SELECT * to needed columns	The fewer columns carried over (in temp tables or DM tables), the more performant the pipeline will be.
	Avoid (nearly) unused JOINS	e.g. if only used for one column that 'might' be used in an analysis
JOINS	Tables should be INNER JOINED in the order of ascending size	The sorted hash table needed for the INNER join will thus be created based on the smaller table. If Table Statistics have been properly refreshed, attention to the join order is not required.
	JOINS fields > 500 characters are significantly slower than JOINS on fields < 500 characters.	Make sure to optimize the field size, example: VARCHAR(60) offers better performance than using VARCHAR(600).
	Avoid INNER JOIN with the purpose of limiting the data set	using WHERE EXISTS in such cases may help to avoid duplications of records in the resulting data set that a JOIN operation would cause, forcing the user to resort to a use of DISTINCT, negatively affecting performance
	Don’t modify columns with functions in the join	e.g. avoid the use of functions such as substring(), right() or cast() in JOIN conditions
UNION	Move conditions to extraction if possible	If same conditions are applied on a table throughout the pipeline, limiting the table contents during extraction will improve runtimes throughout the pipeline, as well as provide APC savings
UNION	Use UNION ALL if possible	Using UNION ALL instead of UNION avoids DISTINCT operations in the background
CONDITIONS	Move WHERE-conditions directly into inner joins	Any WHERE conditions that are applied on tables that are part of INNER JOINS of a query can be incorporated in the JOIN conditions instead.
CONDITIONS	BETWEEN is better than AND	DATE≥1970 AND DATE≤1980 is slower than DATE BETWEEN 1970 AND 1980

SQL best practice checklist

Search results