Creating IDs for objects and events

The IDs that you use in the ID field for objects and events must be unique. If they aren't, you might end up with inaccurate results for metrics and KPIs, such as an incorrect count of events for an event type. Look out for warning messages about this when you load a perspective with your built objects and events (using the Load Data Model button in the data pool). We're soon going to start enforcing this by failing the perspective load if there are any duplicate IDs in an event type.

In your transformations, the best practice to create the ID is to concatenate the name of the object type or event type with the primary key field (or fields) from the source data, and use the result as the ID. Use single quotes around the name of the object type or event type to identify it as a string. For example:

'Report_' || "Report ID" AS "ID"

When concatenating IDs, it is recommended that you use delimiters such as “::” to enable better parsing of that data. For example:

SELECT <%=sourceSystem%> || '::' || "EKKO"."MANDT" || '::' || "EKKO"."EBELN"
   AS "ID",

The query engine uses object and event IDs for partitioning. To optimize performance during data retrieval, your IDs should be meaningful and include expressions found in other columns and tables. That helps the query engine relate the columns to each other in order to produce more specific partitions. If the source system data uses random UUIDs in the ID column, you should avoid using that column by itself and instead create a new ID column by concatenating it with columns that are present in other tables too.

Important

Though meaningful IDs produce useful partitions, try to keep the overall length of the ID as short as you can - don’t include every possible expression. The ID length has a performance impact for running transformations, not only for creating the objects and events themselves, but also when the ID is used as a foreign key in relationship columns and tables for other objects and events.

If you’re using data from more than one source system to create objects and events, you might have duplicated identifiers across the source systems. You can handle this by concatenating a unique name or identifier for each source system instance as part of your event ID. The Celonis catalog transformations use a local parameter sourceSystem, the value of which is prepended to the ID for an object type or event type, for example:

SELECT <%=sourceSystem%> || 'Contract_' || "EKKO"."MANDT" || "EKKO"."EBELN"
  AS "ID",

If you have multiple copies of the Celonis catalog transformations for the same process for different source system connections, set the sourceSystem parameter to a unique name for each source system instance. For custom transformations, you can do the same thing with your IDs, either using a local parameter or hard-coding a source system name.

Important

Though the source system names need to be unique, remember to keep the names you use as short as possible, to minimize the overall length of the ID string. You don't need to use the full name of the system - a single letter will do as long as it's different.

You might still see duplicate IDs if you have either of these situations:

Duplicate records in the source system data.
Including the same data in more than one way.

If you think the cause might be duplicate records in your source system data, Troubleshooting perspectives includes a method to check this. If you can't fix the problem in the source system, you can use filtering during data extraction to remove unwanted duplicate records, or remove the duplicates in pre-processing. See Troubleshooting data extraction and pre-processing for instructions on how to set up a pre-processing stage.

Including the same data in more than one way can happen if, for example:

You include the same source system data set in two different sets of pre-transformed data, and use them both in transformations to create events.
You take data for an attribute of an object type from a different data table, and inadvertently create duplicates during the join - maybe by populating the attribute from the row level data rather than the header data.

Situations like this create a representation of the same real event more than once in the system. This isn't correct, and you should change the setup of your transformations so it doesn't happen - for example, by adding a filter condition. For the instructions to work with the transformation editor, see Creating custom transformations.

Creating IDs for objects and events

Search results