SQL troubleshooting
If you're experiencing SQL issues with your transformation tasks, the following troubleshooting advice is available:
Delta extractions execute the merging of the delta chunks into their respective tables in the background. This process executes a DELETE statement to ensure that records that exist both in the table and in the new delta chunk are cleared from the main table before being inserted, to avoid duplicates. On larger tables, and in cases where the delta chunk is too large, this operation may become too costly and fail to complete within the maximum allocated time.
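For illustration, the background merge follows a pattern similar to the sketch below. TARGET_TABLE, DELTA_CHUNK and KEY_COLUMN are placeholders; the actual statements are generated by the extraction process.
-- Illustrative sketch only; table and key names are placeholders.
DELETE FROM TARGET_TABLE
WHERE KEY_COLUMN IN (SELECT KEY_COLUMN FROM DELTA_CHUNK);
INSERT INTO TARGET_TABLE
SELECT * FROM DELTA_CHUNK;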
Suggested solutions
Always run an initial full extraction of a table before proceeding with scheduling/running any delta extractions on that table.
Ensure that delta extractions are scheduled to run frequently (at least once a day), so as to avoid the background operations being run on larger delta chunks.
Avoid running delta extraction tasks for tables that have not recently been extracted (>2 days), either fully or by delta.
The error may appear in queries joining multiple tables, particularly those involving USR02. The table joins are performed in an order that causes the memory needed for execution to exceed the allocated/available resources.
Suggested solutions
Users need to ensure that all temporary join tables used by the query have their Table Statistics refreshed, by including the following command prior to the failing query or at the transformation point that defines each table:
SELECT ANALYZE_STATISTICS('TABLE_NAME');
Source tables that have been extracted through Celonis will have Table Statistics refreshed during their extraction.
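For example, the statistics refresh can directly follow the statement that defines a temporary join table. In this hedged sketch, TMP_USERS and the selected columns are illustrative placeholders:
DROP TABLE IF EXISTS TMP_USERS;
CREATE TABLE TMP_USERS AS
SELECT MANDT, BNAME, USTYP
FROM USR02;
-- Refresh statistics immediately after creating the temporary join table.
SELECT ANALYZE_STATISTICS('TMP_USERS');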
The data model load invokes the materialization of large and costly/complex views, or the size of the tables is too big.
Suggested solutions:
Identify data model views that involve large source tables (e.g. VBAP). As these are defined as views, their definition script needs to run every time they are invoked (e.g. during DM Load).
Check whether these views are using other views in their definition script. Invoking a view that uses a view is a very inefficient way of creating a data set, as sorted projections and statistics are not being used.
Consider whether some of these larger data model views (or the views they are using) can be defined as tables instead.
Consider using staging tables as described in our Vertica SQL Performance Optimization Guide (see the sketch after this list). By using a view to bring together original table fields along with a staging table containing calculated fields, the performance of the DM load will improve, and lower APC consumption can also be achieved.
Ensure that all Data Model View definitions are properly optimized based on the guidance of the Vertica SQL Performance Optimization Guide.
Consider limiting the number of columns in a Data Model View/Table. Is a SELECT * from a transactional table necessary or can the script be limited to only dimensions needed for the analyses?
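As a hedged sketch of the staging-table approach mentioned above, where all object names, columns and the calculated field are illustrative placeholders rather than a prescribed pattern:
-- Staging table holding only the calculated field.
DROP TABLE IF EXISTS VBAP_STAGING;
CREATE TABLE VBAP_STAGING AS
SELECT MANDT, VBELN, POSNR,
       NETWR / NULLIF(KWMENG, 0) AS NET_PRICE_PER_UNIT
FROM VBAP;
SELECT ANALYZE_STATISTICS('VBAP_STAGING');
-- View combining the original table fields with the staged calculated field.
CREATE OR REPLACE VIEW VBAP_VIEW AS
SELECT v.MANDT, v.VBELN, v.POSNR, v.MATNR, s.NET_PRICE_PER_UNIT
FROM VBAP v
JOIN VBAP_STAGING s
  ON v.MANDT = s.MANDT AND v.VBELN = s.VBELN AND v.POSNR = s.POSNR;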
The error may appear in transformations that perform de-duplication using the ROW_NUMBER function. Usually, in the first step, the table is created with a NUM column (used for deleting the duplicates). After deleting the duplicates, when you try to drop the NUM column, you might get the error due to a Vertica projection dependency.
Suggested solution:
The solution is to add the UNSEGMENTED ALL NODES clause to all statements that create tables with a row number column generated by the ROW_NUMBER function. This ensures that the NUM column is not part of the table's segmentation and can be dropped.
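A minimal sketch of this pattern, assuming a hypothetical source table MY_SOURCE with key column KEY_COLUMN and change timestamp CHANGED_AT:
-- Create the table unsegmented so the helper column is not part of a segmentation expression.
CREATE TABLE MY_TABLE AS
SELECT s.*,
       ROW_NUMBER() OVER (PARTITION BY s.KEY_COLUMN ORDER BY s.CHANGED_AT DESC) AS NUM
FROM MY_SOURCE s
UNSEGMENTED ALL NODES;
-- Remove the duplicates, then drop the helper column.
DELETE FROM MY_TABLE WHERE NUM > 1;
ALTER TABLE MY_TABLE DROP COLUMN NUM;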
This error may appear when a set of queries including an ANALYZE_STATISTICS() command is executed within the Transformation Workbench but not when run through a Data Job execution.
Suggested solution
When running ANALYZE_STATISTICS() in the workbench, it has to be an independent operation. Instead of selecting all statements and executing them at the same time, users should execute the ANALYZE_STATISTICS() statement on its own.
For example:
DROP and CREATE the table.
Run SELECT ANALYZE_STATISTICS('TABLE_NAME');
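In practice, this means executing the statements in two separate runs; TABLE_NAME and SOURCE_TABLE below are placeholders for your own tables:
-- Execution 1: run only the DDL statements.
DROP TABLE IF EXISTS TABLE_NAME;
CREATE TABLE TABLE_NAME AS
SELECT * FROM SOURCE_TABLE;
-- Execution 2: run the statistics refresh on its own.
SELECT ANALYZE_STATISTICS('TABLE_NAME');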
The failing transformation is performing a CREATE TABLE statement with a large resulting data set and/or a complex script (i.e. multiple joins on large tables, expensive commands), or it is performing an UPDATE statement on a large table.
Suggested solution
Consider executing the query in smaller parts/chunks. For limiting the query through a condition, we suggest selecting a field that appears towards the beginning of the table field list (i.e. the table projection is sorted by that field) and whose values are close to evenly distributed across the data set, such as a date or document type.
CREATE the table by limiting the query to a smaller data chunk.
Run additional INSERT statements into the new table, each filtering for a different chunk (see the sketch below).
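A hedged sketch of this chunking approach, where TARGET_TABLE, SOURCE_TABLE, DOC_DATE and the date ranges are illustrative placeholders:
-- Create the table from the first, limited chunk.
CREATE TABLE TARGET_TABLE AS
SELECT *
FROM SOURCE_TABLE
WHERE DOC_DATE >= '2022-01-01' AND DOC_DATE < '2022-07-01';
-- Insert each remaining, non-overlapping chunk separately.
INSERT INTO TARGET_TABLE
SELECT *
FROM SOURCE_TABLE
WHERE DOC_DATE >= '2022-07-01' AND DOC_DATE < '2023-01-01';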