Skip to main content

Celonis Product Documentation

Creating custom transformations

For custom object types and event types, and for custom attributes and relationships that you extend Celonis object types with, you need to build SQL transformations to map the relevant data extracted from your business system into the Celonis database tables. You’ll also need to do this for the Celonis object types and event types for your enabled processes if your source system data isn’t covered by the supplied transformations.

You can also overwrite part or all of a Celonis-supplied transformation with your own custom SQL query. For example, if you don’t have business data for a particular column, you can create a partial overwrite to set that attribute of the object to NULL. Or if you need to change table joins or add filtering for your data, you can create a full overwrite to replace the transformation for an object or event with your own SQL query.

Data extractions and transformations for object-centric process mining has an overview of how the transformations for object-centric process mining work, where they are stored in the OCPM Data Pool, and the naming convention for the tables in the Celonis database.

You can only create a transformation for an object type or event type after you’ve created the OCPM Data Pool and connected it to at least one source system. It’s best to also extract your data before you start creating the transformations, so that you can preview the transformations with your data to make sure they are working. For the steps to provision the OCPM Data Pool, create a data connection, and extract your data, see Quickstart: Extract and transform your data into objects and events.

Tip

Before you start working with custom object types, event types, relationships, and transformations, we recommend you complete the Celonis Academy training track Implement Object-Centric Process Mining, which teaches you how to model objects and events. The course OCPM in Action - Build Object-Centric Data Models (2 hours) has step by step examples for you to work through.

In the SQL editor, you can view and create scripts in these categories:

  • Attribute scripts populate the tables of objects and events, which contain their attributes. For an object type, you populate its table using the data you’ve extracted from your source system tables. For an event type, you can choose to use the extracted business data, or use the transformed data from object type tables and change tables.

  • Change scripts populate the tables of changes to the attributes of objects. Change scripts aren't available for event types, as only object types have change tables. Change tables always have the same set of attributes. You can only populate a change table if you have extracted data that captures changes to your objects, such as a change log or audit log. If you don’t have this data, don’t add a change script.

  • Relationship scripts populate the relationship tables for the relationships between object types and event types. You only need a separate relationship script when there’s a many to many relationship that’s implemented on this object type or event type, so there’s a dedicated relationship table to fill. If the object type or event type doesn’t have any relationships like that, you won’t see a category for these scripts. Relationships between objects and events explains how relationships work.

Important

The SQL dialect you need to use in transformations for the Celonis Platform is Vertica SQL syntax. Using Vertica SQL syntax has Vertica recommendations for Celonis, and here's the official documentation:  docs.vertica.com.

To keep the SQL transformations as generic as possible, the SQL editor in the Objects and Events environment only supports a subset of functions and operators of ANSI SQL, and not all Vertica functions are enabled. See Supported functions for object-centric transformations for the list of supported functions that you can use. It's different to the list for case-centric transformations. The topic includes advice on how to replace Vertica-specific functions with suitable alternatives.

The OCDM Schema (which stands for object-centric data model) is the database in the OCPM Data Pool that contains the tables with the transformed data for your objects and events. See Object-centric process mining database tables for more on the structure of the OCDM Schema.

  • For object types and changes, you map columns in your extracted business data to the attributes of the objects and changes. In the OCDM Schema, each row in the object type tables (with an o_ prefix) and the change tables (with a c_ prefix) holds a single object or change. 

  • For event types, you can use the transformed data in the object type tables and change tables as a data source for your events, if you want to. Or you can map your extracted business data directly to the attributes of event types. Either way, the data goes into the event type tables (with an e_ prefix). Each row in the event type tables holds a single event.

  • For a one to one or one to many relationship, the relationship data is held in a foreign key column in the object type table or event type table. In this case, you populate the column as part of the attribute script. The column is named after the relationship, and you populate it with the ID column of the Celonis database table for the target object type.

  • For a many to many relationship where you’re implementing the transformation on this object type or event type, the relationship data is held in a relationship table with the relationship name. Populate this table with the ID of the object or event where this relationship is implemented (in the first column), and the ID of the target object (in the second column). Because a relationship table needs data from two object type or event type tables, you’ll need to use a left join (JOIN clause) to bring in data from both of the tables.

Follow these steps to use the SQL editor to create a custom transformation or overwrite a Celonis transformation. In the instructions, we’ll assume that you’ve extracted the data from your source system into the OCPM Data Pool. If you have any problems, check the solutions in Troubleshooting transformations.

  1. From the Celonis navigation bar, select Data > Objects and Events.

  2. To see the full list of transformations, select Transformations in the top navigation bar. You can search the list, sort it, and filter it by namespace (Celonis or custom), object type or event type, data source, and whether overwrites are present. Click Create transformation to add a new transformation.

  3. To start with an object type or event type that needs a transformation, select Objects or Events in the top navigation bar. From here, you can search and browse your object types or event types.

    1. Select any object type or event type to see its details.

    2. Click the Transformations icon to see the list of the transformations that the object type or event type already has.

    3. Select the transformation you want to edit, or click Add to add a new transformation. If the object type or event type doesn’t have any transformations yet, click Add Transformations to get started.

  4. Choose the data source that’s supplying the data for this transformation, and click Next.

    Tip

    For event types, it's best practice to use the transformed data from the object type and change tables as the data for creating events. This data is in the OCDM Schema. It's also possible to use your extracted business data as the data source, like with object types. Using the transformed data is the recommended approach, though, because it makes event to object relationships and event attributes system agnostic and easier to debug.

  5. In the SQL editor, for a new transformation, we’ve auto-generated a name for it - click the name if you want to edit it. For a Celonis transformation, you can’t change the name.

  6. To add a new custom script for attributes, relationships, or changes, click the plus sign next to any script category in the left navigation. Type a name for the script and select the check button to create it.

    Important

    If you’re populating custom attributes or relationships that you’ve added to a Celonis object type, each extension script needs to handle all of the custom attributes and relationships that you added for the object type. You can’t use one extension script to populate one custom attribute, and another to populate a second custom attribute.

    You can use different extension scripts for data from different source systems - for example, if you have contracts in two different source systems. But you can’t have more than one extension script contributing to the same instance of the object - so for a contract, you couldn’t populate the discount from one source system, and the name of the signatory from another. Each extension script must cover all the custom attributes and relationships for the Contract object type.

  7. To overwrite all or part of a Celonis script, click the context menu (the three vertical dots) and select Partial overwrite or Full overwrite. For a partial overwrite, choose the items you want to overwrite. Then click Create Overwrite to create the overwrite script.

    Tip

    You don't need to create an overwrite script if:

    • There's a suitable local parameter in the Celonis script that you can change, such as the source system name. In that case, change the value and it'll be included when you publish the transformations.

    • You don't need to use this object type or event type at all, or you don’t have data for it, or you're going to build it with data from a set of transformations for another source system. In that case, disable the whole transformation using the toggle next to its name in the lists in the Objects and Events environment. You can re-enable the transformation at any time if you do need it in the future. Be aware that if you disable a transformation that you previously ran, the data for that object type or event type is removed from the table when you run the data job.

  8. Each custom script or overwrite script that you create appears in a new tab, with a pregenerated template that you can use to help build your SQL statement. In each script, you need to build a SELECT statement that transforms the columns from the source table containing the data, into the columns of your object type table, event type table, change table, or relationship table. Here’s some tips to help you use the SQL editor:

    • The pregenerated template includes the attributes you need to map for this object type, event type, relationship, or change table. To use the template, remove the comment markings at the beginning (/*) and end (*/).

    • We show you the attributes that you still need to map in the Preview window along with their data types. A black key symbol means the attribute is the primary key for the table, and a white key symbol means the attribute is a foreign key for a relationship.

    • If you find one or more attributes that you didn’t model but would be useful to have, you can add them now. Select the Add attributes button in the preview pane in the SQL editor to open a new window. Click Add to add each new attribute, name it, and pick a data type. When you save your new attributes, we’ll update the object type or event type, and warn you if other transformations for it besides the one you’re editing might be affected by your changes.

    • The IDs you use in the ID field for objects and events must be unique, otherwise you might end up with inaccurate results for metrics and KPIs. Creating IDs for objects and events at the end of this topic tells you how to create unique and high-performance IDs for objects and events.

    • Not all Vertica functions are enabled for transformations. See Supported functions for object-centric transformations for the list of supported functions that you can use. It's different to the list for case-centric transformations.

    • Use the Data Source explorer in the left navigation pane to search for a table in your selected data source, and see its columns. When you hover over a column name you’ll see an Insert button - click it to add the column to your script.

    • For event types, the OCDM Schema is available as a data source in the Data Source explorer if you want to use the transformed data from the object type and change tables. Or you can populate an event type table using your extracted business data directly.

    • You can type ahead to get autocomplete options for table and column names from your selected data source. You don’t need to specify the schema in your script.

    • You can use NULL in place of a data source column name if you don’t want to map a table attribute right now, or at all.

    • Use double quotes around column names from a source system data table, so that any spaces, special characters, and reserved words get handled correctly. You shouldn’t usually need to quote column names in Celonis database tables, unless they’re reserved words (for example, “Time”).

    • To create and use local parameters in your script, select Parameters in the left navigation pane. Click the + button to add a new key-value pair, and the check mark to save it. When you hover over a parameter name you’ll see an Insert button - click it to add the parameter to your script. The parameters are local to your script and don’t need to be unique.

    • To materialize common table expressions, click the context menu (the three vertical dots) next to a script name in the left navigation pane, and select Advanced Options. Click the Enable materialization slider then the Apply  button. Now you can create temporary result sets that you can reference as often as you need.

    • Click the Save Script button to save and validate an individual script at any time.

    • Click the Preview Script button to run your SQL query against a sample of your real data from the data source, and check that the attributes are mapped as you expect.

  9. If you need to rename or delete a custom script you already made, click the context menu (the three vertical dots) and select Rename or Delete.

  10. When you’ve completed and previewed all your scripts, click the Save button at the top right to save the whole transformation.

  11. Select Cancel or use the breadcrumbs to exit the SQL editor and go back to Objects and Events.

  12. Select Publish > Publish to development to publish your new or edited transformation scripts to the development environment.

When you publish your transformations, Celonis creates complete transformations, using your scripts as the core of them, to populate the tables in the Celonis database in the OCPM Data Pool. The transformations are stored in the data job test:ocpm-data-job when you publish to the development environment, and in ocpm-data-job when you publish to the production environment.

The naming convention for the transformations includes the type of transformation, the namespace (Celonis or custom), and the name of the target object type, event type, relationship or change table.

  • If you create a partial or full overwrite for a Celonis transformation, your custom SQL is joined to or replaces the Celonis transformation. You won’t see it listed separately in the data job.

  • If you create an extension to a Celonis transformation to populate custom attributes or relationships that you added to the Celonis object, your custom SQL is listed separately in the data job. The name of these transformations includes FIELD_EXTENSION.

Running transformations tells you how to view and run the transformation scripts.

Creating IDs for objects and events

The IDs that you use in the ID field for objects and events must be unique. If they aren't, you might end up with inaccurate results for metrics and KPIs, such as an incorrect count of events for an event type. Look out for warning messages about this when you load a perspective with your built objects and events (using the Load Data Model button in the data pool). We're soon going to start enforcing this by failing the perspective load if there are any duplicate IDs in an event type.

The best practice is to concatenate the name of the object type or event type with the primary key field (or fields) from the source data, and use the result as the ID. Use single quotes around the name of the object type or event type to identify it as a string. For example:

'Report_' || "Report ID" AS "ID"

Our query engine uses object and event IDs for partitioning. To get the best performance during data retrieval, your IDs should be meaningful and should include expressions that are found in other columns and tables. That way we can relate them to each other to produce more specific partitions. If the source system data uses random UUIDs in the ID column, we recommend you avoid using that alone. Instead, create a new ID column by concatenating it with columns that are present in other tables too.

Important

Though meaningful IDs produce useful partitions, try to keep the overall length of the ID as short as you can - don’t include every possible expression. The ID length has a performance impact for running transformations, not only for creating the objects and events themselves, but also when the ID is used as a foreign key in relationship columns and tables for other objects and events.

If you’re using data from more than one source system to create objects and events, you might have duplicated identifiers across the source systems. You can handle this by concatenating a unique name or identifier for each source system instance as part of your event ID. The Celonis catalog transformations have a local parameter sourceSystem, the value of which is prepended to the ID for an object type or event type, for example:

SELECT <%=sourceSystem%> || 'Contract_' || "EKKO"."MANDT" || "EKKO"."EBELN"
   AS "ID",

If you have multiple copies of the Celonis catalog transformations for the same process for different source system connections, set the sourceSystem parameter to a unique name for each source system instance. For custom transformations, you can do the same thing with your IDs, either using a local parameter or hard-coding a source system name.

Important

Though the source system names need to be unique, remember to keep the names you use as short as possible, to minimize the overall length of the ID string. You don't need to use the full name of the system - a single letter will do as long as it's different.

You might still see duplicate IDs if you have either of these situations:

  • Duplicate records in the source system data.

  • Including the same data in more than one way.

If you think the cause might be duplicate records in your source system data, Troubleshooting perspectives includes a method to check this. If you can't fix the problem in the source system, you can use filtering during data extraction to remove unwanted duplicate records, or remove the duplicates in pre-processing. Troubleshooting data extraction and pre-processing has instructions to set up a pre-processing stage.

Including the same data in more than one way can happen if, for example:

  • You include the same source system data set in two different sets of pre-transformed data, and use them both in transformations to create events.

  • You take data for an attribute of an object type from a different data table, and inadvertently create duplicates during the join - maybe by populating the attribute from the row level data rather than the header data.

Situations like this create a representation of the same real event more than once in the system. This isn't correct, and you should change the setup of your transformations so it doesn't happen - for example, by adding a filter condition.