Custom configuration in the Python tool

The Python tool allows your Process Copilot to do ad-hoc calculations in addition to what is available in the referenced Knowledge Model. With this tool, Process Copilots can execute a custom Python script using input arguments provided by the Large Language Model (LLM).

Example

The following example shows the YAML configuration of a Python tool that computes the number of unique elements in the material names column (MR.MATERIAL_NAME) and multiplies the result by a number n chosen by the LLM.

  - id: python
    unique_id: count_material_name
    description: Multiply count of distinct material names
    input_schema:
      properties:
        n: <- Becomes an argument for the LLM
          description: Multiplier for the count of distinct material names
          type: integer
    columns:
      - MR.MATERIAL_NAME <- Will be a column in 'df'
      - MR.AMOUNT_IN_STOCK <- Will be a column in 'df'
    adhoc_filtering: false
    code: |
      distinct_material_names_count = df['MR.MATERIAL_NAME'].nunique()
      distinct_material_names_count * n

Set up a custom configuration in the Python tool

You need to provide the following tool-specific arguments in the configuration:

  • input_schema

    The schema of arguments that the LLM can provide to the Python code.

  • columns

    The KPIs and/or Record Attributes that should be available to the Python code as a Pandas DataFrame in the df variable.

  • code

    The Python code to execute when the Python tool is called.

Input schema

In the input schema you define which arguments the LLM will need to provide to the Python code. Each input that is defined can be used in the Python code as a regular variable. You should make sure to give the inputs meaningful names and descriptions so that the LLM understands how those inputs can be used.
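For illustration, the following is a minimal sketch of how declared inputs behave inside the code body. The input names and sample values are hypothetical stand-ins; at runtime the Python tool binds the actual values provided by the LLM.

  # Hypothetical stand-ins for inputs declared in the input_schema:
  # 'material_names' as an Array (-> list) and 'separator' as a String (-> str).
  material_names = ["Steel", "Wood", "Copper"]
  separator = ", "

  # Inside the 'code' argument, these inputs are used like ordinary Python variables:
  separator.join(sorted(material_names))  # final expression -> tool output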

Supported input variable types

The following input variable types are currently supported for the Python code, each mapped to its Python equivalent:

  • Array → list

  • Boolean → bool

  • Integer → int

  • Number → float

  • String → str

Columns

To use data within your Python code, specify the IDs of the desired KPIs and Record Attributes in the columns section. The selected data is then provided as a Pandas DataFrame, accessible in your code via the df variable.
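As a hedged sketch of what this looks like in practice, assume the columns from the example above are configured; the toy DataFrame below only mimics the shape of the data the tool would load, with made-up values.

  import pandas as pd

  # Toy stand-in for the DataFrame the tool would build from the configured
  # columns MR.MATERIAL_NAME and MR.AMOUNT_IN_STOCK (values are made up).
  df = pd.DataFrame({
      "MR.MATERIAL_NAME": ["Steel", "Wood", "Steel", "Copper"],
      "MR.AMOUNT_IN_STOCK": [12, 3, 7, 25],
  })

  # A 'code' body can then apply any Pandas operation, for example the
  # total amount in stock per material:
  df.groupby("MR.MATERIAL_NAME")["MR.AMOUNT_IN_STOCK"].sum().to_dict()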

Ad-hoc filtering

Optionally, the adhoc_filtering flag can be set to true. This allows the LLM to apply ad-hoc filters to the data loaded into df. For example, using the previously defined Python tool, you could instruct the LLM to execute the tool while only considering material names with a specific amount in stock.

Supported ad-hoc filter types

The following filter types are currently available and generated automatically in Process Copilots by the LLM:

  • StringFilter: Filters a string column by an exact string value or a wildcard string, with case-sensitive or case-insensitive matching. Examples: MR.MATERIAL_NAME = 'Steel', MR.MATERIAL_NAME LIKE '%Steel%'

  • DateFilter: Filters a date column by a date range. Example: MR.CREATION_DATE BETWEEN '2021-01-01' AND '2021-03-31'

  • NullFilter: Filters a column by a null check. Examples: MR.MATERIAL_NAME IS NULL, MR.MATERIAL_NAME IS NOT NULL

  • NumericFilter: Filters a numeric column by the '=', '!=', '>', '>=', '<', '<=' operators. Examples: MR.VALUE > 10.12, MR.VALUE <= 5.11

Limitations

The dataframe will be populated with pre-filtered data, which has two benefits:

  1. The data loading with pre-filtering is much faster than filtering in Python code.

  2. The Python tool can only load a maximum of 500,000 rows into df. If the subset of the data you want to access is less than 500,000 rows, but the total amount is more than 500,000, then some of the data you want to access might get cut off. Ad-hoc filtering allows you to only load the rows you need into the dataframe.

In the background the LLM will pass filter arguments to the Python tool, which uses them when loading the dataframe.
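Conceptually, the effect of such filter arguments can be pictured with the Pandas sketch below. In reality the filtering is applied while loading the data, before it ever reaches your code; the column names, filter values, and sample rows here are purely illustrative.

  import pandas as pd

  # Stand-in for the full, unfiltered data (which could exceed 500,000 rows).
  all_rows = pd.DataFrame({
      "MR.MATERIAL_NAME": ["Steel", "Wood", "Steel plate", "Copper"],
      "MR.AMOUNT_IN_STOCK": [12, 3, 7, 25],
  })

  # Rough equivalent of a StringFilter (LIKE '%Steel%') combined with a
  # NumericFilter (>= 5): only the matching rows would be loaded into df.
  df = all_rows[
      all_rows["MR.MATERIAL_NAME"].str.contains("Steel", case=False)
      & (all_rows["MR.AMOUNT_IN_STOCK"] >= 5)
  ]
  len(df)  # 2 rows instead of the full table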

Code

Your Python code can use both the inputs defined in the input_schema and the Pandas DataFrame df. Note that the final line of your code must be an expression that evaluates to a result; otherwise, the tool will not return any output. The libraries available in the code environment are restricted to NumPy, Pandas, and the Python standard library. The code assistant, accessible in the configuration screen, handles these code-related requirements and can be used to implement or draft Python functionality.
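The sketch below illustrates the last-line rule with a made-up DataFrame and column; only the presence of a final expression matters here.

  import pandas as pd

  df = pd.DataFrame({"MR.AMOUNT_IN_STOCK": [12, 3, 7, 25]})  # toy stand-in for df

  total = df["MR.AMOUNT_IN_STOCK"].sum()  # an assignment alone produces no output
  total                                   # final line is an expression, so this value is returned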

Additional Examples

  - id: python
    code: |
      from datetime import datetime
      date_object = datetime.strptime(date, '%Y-%m-%d')
      weekday = date_object.strftime('%A')
      weekday
    unique_id: get_weekday
    description: Given a date, return which weekday it is
    input_schema:
      properties:
        date:
          type: string
          description: The date in 'YYYY-MM-DD' format

Explanation: The LLM can call the Python tool, provide a date, and get the corresponding weekday in return.

  - id: python
    unique_id: count_material_name
    description: Get the count of distinct material names
    columns:
      - MR.MATERIAL_NAME
      - MR.AMOUNT_IN_STOCK
    adhoc_filtering: true
    code: |
      distinct_material_names_count = df['MR.MATERIAL_NAME'].nunique()
      distinct_material_names_count

Explanation: This is the initial example with adhoc_filtering set to true and the multiplication by n removed. When the LLM is asked to count the number of distinct material names containing a specific material (e.g., "wood") with at least five units in stock, the LLM generates a filter on the fly while calling count_material_name, and as a result only the pre-filtered data is loaded into df.