Activity table sorting
Description
This article gives an overview of how to define sorting in the activity table.
Sorting column
An activity table consists of at least a case id, activity, and timestamp column. A sorting column is optional. You need the sorting column if two activities inside a case have the same timestamp. Then the order of the activities is determined by the sorting column. The activity with the lower sorting value is regarded as the activity which happens first.
Example
[1] In this example, the timestamps of activities B and C are equal. Those activities are therefore ordered by their sorting value, which is why C is regarded to happen before B: | |||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||
|
Sorting is important. This might seem like a negligible corner case, but it's actually very important. If two activities belong to the same case and have the same timestamp without any configured sorting, the resulting process graph is undefined.
Without setting the sorting column in the example above, this can result in the following process variants:
A → B → C → D
A → C → B → D
The process variant that's used can vary for each reload. That affects all process-based operators, including Variant and Process Explorer. By using filters based on process edges, it's possible that all KPIs can be affected and become unstable.
If you encounter unexpected results, you should first ensure that, if available, the sorting column is configured in the data model editor. Then, check whether each activity type has a unique sorting id. You may have a well-defined process model even though activity types share the same sorting ids. However, this is much harder to verify.
Day-based activities
Some activity types don't have an exact timestamp because the information is not available. There can be several reasons for this. For example, it happens when the information is not stored in the source system. We call such activities day-based activities. All other activities are named timestamp-based activities. Timestamp-based activities are the default. Day-based activities happen only occasionally.
To identify day-based activity types, we examine all the activities of an activity type and check whether time information is present across the complete eventlog. A day-based activity is an activity that always happens precisely at midnight. In theory, this could lead to a false positive. However, the chances that an activity type always takes place precisely at midnight is negligible.
Example
[2] In this example, activity type A is a day-based activity. In the whole eventlog, A always takes place at midnight. Activity type B is a timestamp-based activity. Activity B does indeed take place at midnight in case 1, however, it has a different timestamp in case 2. As there is no sorting column defined, there is no special treatment for day-based activities. | ||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||
|
Ordering of day-based activities
Day-based activities don't have information about the time of the day that an activity takes place. We have to rely on the sorting column (if it's available) to order day-based activities and timestamp-based activities which happen on the same day. As a result, the timestamps of day-based activities need to be changed. The following example explains this behavior in detail:
Example
[3] Here activity A is a day-based activity. B, C and D also happened on the same day, but they have exact timestamps (timestamp-based activities). Now, Celonis relies on the sorting column to place activity A in the right position. It tries to move activity A further down the process variant, as long as its sorting value is larger than the sorting value of the following activity. In the given example, Celonis would perform the following steps:
Now the day-based activity is ordered according to its sorting value. However, now a calculation of a throughput time between B and A would result in a negative throughput time. To fix this, we assign the timestamp of B to A and obtain the result activity table below: | ||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||
|
Special case - Day-based activity remains the first activity of the day
If the position of a day-based activity doesn't change because it isn't followed by an activity with a lower sorting value, the day-based activity takes the timestamp of the following activity on the same day, if available.
Example
[4] Here, activity A is a day-based activity. It cannot be moved down as its sorting value is already smaller than the sorting value of the following activity B. Therefore, we assign the timestamp of the following activity B to A: | ||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||
|