Performance estimation
To estimate the performance of a query in Vertica, you can either check the run time or the estimated cost.
Cost (before executing a query): If you state 'explain' before any select/update statement, the query output will show you its 'Access Path'.This 'Access Path'shows the query plan, i.e., the steps to be taken to execute the query with estimated performance costs. This estimation is rather rough and does not necessarily tell you something about the run time of the query.
Exemplary explain query
explain SELECT "LFA1"."VBUND" , "LFA1"."MANDT" , "LFA1"."NAME1" , "LFA1"."LIFNR" , "LFA1"."XCPDK" , "LFA1"."XZEMP" , "LFA1"."MCOD3" , "LFA1"."LAND1" , "LFA1"."ORT01" , "LFA1"."KUNNR" FROM "LFA1" Result: Access Path: +-STORAGE ACCESS for LFA1 [Cost: 272K, Rows: 676K (NO STATISTICS)] (PATH ID: 1)| Projection: 6a1d6a1d-cdeb-4b02-8618-046c39fbdc91_47a9807f-d369-4263-b554-eb160fd8e7b7._CELONIS_TMP_LFA1_v1_b0| Materialize: LFA1.MANDT, LFA1.LIFNR, LFA1.LAND1, LFA1.NAME1, LFA1.ORT01, LFA1.MCOD3, LFA1.KUNNR, LFA1.XCPDK, LFA1.XZEMP, LFA1.VBUND| Execute on: All Nodes
Run time (after executing a query): After running a query, you can either check the run time via the Data Integration logs or you can run the following statement. The result shows you the execution time in seconds for every query that started within the time frame you specified in the where condition.
Check run time
SELECT DATE_TRUNC('second',query_start::TIMESTAMP) as query_start, session_id , transaction_id, statement_id, node_name, LEFT(query,100), ROUND((query_duration_us/1000000)::NUMERIC(10,3),3) duration_sec FROM query_profiles WHERE query_start BETWEEN '2020-01-01 01:00:00' AND '2020-01-09 13:00:00' ORDER BY duration_sec DESC;
Varying run times: As the performance of a query in the Celonis Platform depends on the load of the cluster, it is recommended to execute the query several times and take the average as estimate on how long the run-time is (as visible in the query below). If you know how often you executed the query, you can indicate the quantity in the HAVING COUNT(*) statement to easily find yours. Otherwise, just comment-out this line.
Check run time average
SELECT avg(ROUND((query_duration_us/1000000)::NUMERIC(10,3),3)) AS avg_duration_sec, min(ROUND((query_duration_us/1000000)::NUMERIC(10,3),3)) AS min_duration_sec, max(ROUND((query_duration_us/1000000)::NUMERIC(10,3),3)) AS max_duration_sec, query FROM query_profiles WHERE query_start BETWEEN '2019-11-18 15:15:37' AND '2019-11-18 19:37:26' GROUP BY query HAVING COUNT(*) = '3' ORDER BY avg_duration_sec DESC;