ESTIMATE_CLUSTER_PARAMS
Description
The ESTIMATE_CLUSTER_PARAMS operator estimates appropriate values for the MIN_PTS parameter of the CLUSTER_VARIANTS operator.
Warning
Operator Performance: This operator is computationally very expensive and requires substantial memory and CPU resources. To avoid running out of memory, it is currently limited to 100,000 distinct variants.
Computation Times: While this operator is executing, users may experience long computation times and unresponsive analyses, because it occupies a large share of the available computation capacity.
The result values are very likely to have a major effect on the result of the CLUSTER_VARIANTS operator.
Syntax
ESTIMATE_CLUSTER_PARAMS ( table.variant_column, epsilon, number_of_values, recursion_depth )
variant_column: The column which stores the result of the VARIANT operator.
epsilon: INT value giving the search radius for measuring the variant density. It is quantified by the number of different relations between two subsequent activities in the variants. The value must be an integer in the range [0, 5]. The higher the value, the more likely it is that all variants are assigned to the same cluster. A low value (e.g. 2) is recommended. Note that a value of 0 requires variants to be equal in order to be clustered together.
number_of_values: INT value giving the number of values to estimate within one recursion. The value must be greater than or equal to one.
recursion_depth: INT value giving the maximum number of recursions for the estimation. The value must be greater than or equal to one.
Result: An INT column in which each number represents an estimated value for the MIN_PTS parameter of the CLUSTER_VARIANTS operator. The column contains at most number_of_values * recursion_depth entries. Depending on the clustered data and the values chosen for the parameters number_of_values and recursion_depth, the result column may also contain fewer entries.
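The argument ranges and the upper bound on the result size described above can be checked before the query is sent, which may be worthwhile given how expensive the operator is. The following is a minimal sketch in Python, not part of any official client library: the helper names build_estimate_call and max_result_entries are hypothetical, and table.variant_column is the placeholder from the syntax line rather than a real column.

```python
def build_estimate_call(variant_column: str, epsilon: int,
                        number_of_values: int, recursion_depth: int) -> str:
    """Validate the documented argument ranges and return the operator call as text.

    Hypothetical helper for illustration only.
    """
    numeric_args = (
        ("epsilon", epsilon),
        ("number_of_values", number_of_values),
        ("recursion_depth", recursion_depth),
    )
    for name, value in numeric_args:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{name} must be an INT, got {type(value).__name__}")
    if not 0 <= epsilon <= 5:
        raise ValueError("epsilon must be in the range [0, 5]")
    if number_of_values < 1:
        raise ValueError("number_of_values must be greater than or equal to one")
    if recursion_depth < 1:
        raise ValueError("recursion_depth must be greater than or equal to one")
    return (
        f"ESTIMATE_CLUSTER_PARAMS ( {variant_column}, {epsilon}, "
        f"{number_of_values}, {recursion_depth} )"
    )


def max_result_entries(number_of_values: int, recursion_depth: int) -> int:
    """Upper bound on the number of rows in the result column.

    The actual result may contain fewer entries.
    """
    return number_of_values * recursion_depth


if __name__ == "__main__":
    # Placeholder column name taken from the syntax line; epsilon = 2 as recommended.
    print(build_estimate_call("table.variant_column", 2, 3, 5))
    print(max_result_entries(3, 5))  # at most 15 estimated MIN_PTS values
```

With number_of_values = 3 and recursion_depth = 5, the result column holds at most 15 estimates. The sketch only mirrors the constraints stated on this page; the operator itself may enforce further limits.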