Commit Policy
The connector accumulates data into files before uploading them to Celonis Platform. See the How it works section for details.
The commit policy is the set of rules the connector applies to decide when data is uploaded. The goal is to avoid small files (only kilobytes in size) without delaying records for too long.
Three configuration parameters control this behavior:
Parquet file size
number of records in the file
time since the last write
Once a record has been written to the file associated with its source topic partition, the sink checks whether the file should be committed. The file is uploaded if either of the first two criteria is met.
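The check performed after each write can be sketched as follows. This is a minimal illustration in Python, not the connector's actual code: the names `CommitPolicy`, `OpenFile`, `max_file_size_bytes`, `max_records` and `should_commit_after_write` are hypothetical, and the threshold values are arbitrary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommitPolicy:
    # Hypothetical field names; they mirror the first two criteria listed above.
    max_file_size_bytes: int
    max_records: int


@dataclass
class OpenFile:
    # Running totals for the file tied to one source topic partition.
    size_bytes: int = 0
    records: int = 0

    def append(self, record_size_bytes: int) -> None:
        self.size_bytes += record_size_bytes
        self.records += 1


def should_commit_after_write(policy: CommitPolicy, f: OpenFile) -> bool:
    # Reaching either threshold is enough to trigger the upload.
    return (
        f.size_bytes >= policy.max_file_size_bytes
        or f.records >= policy.max_records
    )


# Arbitrary example values: a large size limit and a tiny record limit.
policy = CommitPolicy(max_file_size_bytes=1_000_000, max_records=3)
f = OpenFile()
decisions = []
for record_size in (200, 200, 200):
    f.append(record_size)
    decisions.append(should_commit_after_write(policy, f))

print(decisions)  # [False, False, True]
```

In this run the file stays far below the size limit, so the upload is triggered by the record count alone once the third record is written.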
The time since the last write is key to bounding how long data waits before it is uploaded. Data does not always arrive on a Kafka topic every few milliseconds or seconds. Depending on the context, there can be a gap of minutes or even hours before new data arrives, and in the extreme case no further record ever arrives. Because such gaps can be common, the first two criteria might take hours to be reached, or might never be reached at all. Accumulated data should not be held back from Celonis Platform for that long. The time since the last write therefore acts as a stop-gap that ensures the data is always uploaded.
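The time-based criterion can be illustrated in the same spirit. The sketch below is an assumption-laden example rather than the connector's implementation: the function name `should_commit_idle`, the millisecond units, the ten-minute limit and the rule that an empty file is never uploaded are all hypothetical, and how the connector schedules this check is not covered here.

```python
def should_commit_idle(
    records_in_file: int,
    last_write_ms: int,
    now_ms: int,
    max_idle_ms: int,
) -> bool:
    # Assumption: an empty file has nothing to upload, so it is never committed.
    if records_in_file == 0:
        return False
    # Commit once the file has been idle for at least the configured time.
    return now_ms - last_write_ms >= max_idle_ms


TEN_MINUTES_MS = 10 * 60 * 1000  # arbitrary example limit
last_write_ms = 0

# Five records are waiting, and the topic has gone quiet.
after_nine_minutes = should_commit_idle(5, last_write_ms, 9 * 60 * 1000, TEN_MINUTES_MS)
after_ten_minutes = should_commit_idle(5, last_write_ms, 10 * 60 * 1000, TEN_MINUTES_MS)
empty_file = should_commit_idle(0, last_write_ms, 60 * 60 * 1000, TEN_MINUTES_MS)

print(after_nine_minutes, after_ten_minutes, empty_file)  # False True False
```

With this rule, the five waiting records are uploaded ten minutes after the last write even if the size and record-count thresholds are never reached.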