Basic rule set for performant scripting
When optimizing your scripts, follow these rules:
Category | Rule | Comment |
---|---|---|
CREATE | Use views if they are not/seldom accessed in the transformations (e.g. views solely for the datamodel) | Further information: Table vs. View |
Use tables if many transformations access it (e.g. tmp table for activities) | fFurther information: Table vs. View | |
Combine tables if possible | Noo extra table with the same information, e.g. case table and tmp table | |
Order a table that is often joined by the columns used in this join | ||
OPERATORS | BETWEEN is better than AND | DATE≥1970 AND DATE≤1980 is slower than DATE BETWEEN 1970 AND 1980 |
Avoid using NOT clauses, reformulate the condition to be positive | NOT<1970 is slower than ≥1970 | |
SELECT | Avoid DISTINCT | further information: Usage of DISTINCT |
Reduce SELECT * to needed columns | ||
Avoid clean up by clean code | e.g., write else statements for activities or limit data in the joins sufficiently | |
JOINS | Avoid (nearly) unused joins | e.g., if only used for one column that 'might' be used in an analysis |
Tables should be inner joined in the order of ascending size | ||
Don’t modify columns in the join | e.g., don’t use right() or cast() in the join itself | |
Join columns should be grouped to their respective sides of an equation based on their source table | T.a+T.b+X.b= X.a is slower thanT.a+T.b = X.a-X.b | |
UNION | Use UNION ALL if possible | Using UNION ALL instead of UNION avoids DISTINCT operations in the background (further information about the usage of DISTINCT here) |
CONDITIONS | Move conditions to extraction if possible | e.g., if a table is only used once and filtered in that case |
Move WHERE-conditions directly into inner joins |