Understanding Differences Between First-Generation and Premium Process Query Engine Behavior
Description
As part of our efforts to enable scalability and support for more complex, data-intensive processes, we are transitioning from the first-generation engine to a new engine architecture. The Premium Process Query Engine lays the groundwork for handling larger data volumes while maintaining reliable performance and responsiveness.
This shift may lead to differences in query results due to updated logic and computation methods that align with modern standards and ensure optimal performance. This document outlines the most significant and commonly encountered differences, along with guidance on how to validate results and adjust queries when needed to match the behavior of the previous engine. Also deprecated behavior from the first-generation engine announced in the documentation will not be implemented in the premium engine.
Please note that other, more subtle differences may exist and are not exhaustively covered here.
Difference in NOT
Filter Behavior for NULL
values
Description
The behavior of the NOT
operator in filters differs due to how each engine handles NULL
values in logical expressions.
First-generation Engine | Premium Engine | |
---|---|---|
Logic Behaviour | Two-valued logic (TRUE or FALSE ) | Three-valued logic (TRUE , FALSE , or UNKNOWN/NULL )* |
NULL in conditions | Treated as FALSE | Evaluated as UNKNOWN (excluded when negated) |
Effect of NOT | NULL values are included in result | NULL values are excluded from result |
*Except for operators IN
and MULTI_IN
.
Example
Query |
---|
FILTER NOT ("mytable"."id" = 1); |
First-generation engine: Returns all rows where
id
is not 1, including rows whereid
isNULL
.Premium Engine: Returns only rows where
id
is not 1, but excludes rows whereid
isNULL
.
Recommendation
If your data may include NULL
s and you want to preserve first-generation engine behavior, adjust the filter to explicitly handle them:
Query |
---|
FILTER ( "mytable"."id" IS NULL OR "mytable"."id" != 1 ); |
This explicitly includes rows where id
is NULL
or not equal to 1
, maintaining compatibility with the previous logic.
Difference in QUANTILE
/ MEDIAN
Approximation
Description
In the first-generation engine, all aggregations - including MEDIAN
and QUANTILE
- were always computed using exact methods.
In contrast, the Premium Engine performs exact aggregations by default for all operators except MEDIAN
and QUANTILE
, which use approximate algorithms to improve scalability and performance. While this enhances speed, it may lead to minor deviations from exact results, particularly on small or edge-case datasets.
To maintain full control, the new engine provides the ability to explicitly choose between exact and approximate modes, allowing users to balance accuracy and performance based on their specific use case.
Example
MEDIAN
(approximate) -> uses QUANTILE 0.5
, which uses APPROX
Dataset: [1, 1, 2, 3, 4, 5]
Quantile: 0.5
( = MEDIAN
)
First-generation engine: 3
Premium Engine: (APPROX, default): 2
Premium Engine: (EXACT): 3
Recommendation
By default, the APPROX
operator is used for both functions. If you require the EXACT
logic for cases where precision is critical - particularly for KPIs that depend on statistical accuracy - please contact our support team. We will be happy to assist you.
Difference in Default Sorting
Description
To ensure high performance with large datasets, the Premium Engine does not apply a default sorting to queries unless explicitly specified by the user with an ORDER BY
clause. This aligns with standard industry practices. The first-generation engine may have provided a seemingly consistent default order, but this behavior was never guaranteed and should not be relied upon. This can lead to different results between the two engines, particularly with functions that rely on order to determine the result, such as FIRST
or LAST
.
Example
Query |
---|
STRING_AGG ( "CASE_TABLE"."ACTIVITY_ID", ', ' ) |
First-generation engine: May return a result like "123, 456, 789"
Premium Engine: May return a different, non-deterministic result like "456, 789, 123" because there is no defined order.
Recommendation
To ensure consistent and predictable results, you should always apply an explicit ORDER BY
clause in your queries, especially when using functions where the order of data is critical. This includes operators like FIRST, LAST, STRING_AGG, PU_FIRST, PU_LAST, PU_STRING_AGG, and all window functions.
Difference in Sorting of Eventlogs
Description
Similar to the previous difference, when querying unaggregated columns from an eventlog, such as in an OLAP table component, the Premium Engine does not apply any default sorting. The order of activities is not guaranteed, which can lead to mixed-up cases or an illogical sequence of events.
Example
Consider an OLAP table showing case id
and activity names
. Without an explicit sorting applied, there is no guarantee that the activities will be displayed in their chronological order based on timestamps. Furthermore, it might also happen that different cases are mixed up.
Recommendation
To view activity sequences correctly within cases, you should:
Always add both the case id and timestamp columns to your view.
Explicitly sort by
case id
first, and then bytimestamp
second.For performance, it is highly recommended to filter on the specific cases you need, as sorting the entire event log can be computationally expensive.
Difference in Allowance of Duplicated Primary Keys
Description
The First-generation engine was more permissive towards duplicate values in primary key columns. To support advanced functionalities like Augmented Attributes & Tasks and ensure delta readiness for data models, the Premium Engine strictly enforces the uniqueness of primary keys directly during the load.
Recommendation
If you encounter a primary key duplication error when loading a data model, you must go back to your data transformations and modify the necessary SQL queries to remove the duplicates. Once the transformations are corrected, reload the data model.