Benchmarking

class evaluation.benchmark.Benchmark(mlmodel, recourse_method, factuals)

The benchmarking class contains all measurements. Individual evaluation metrics can be run separately, or all of them via a single call.

For every given factual, the benchmark object will generate one counterfactual example with the given recourse method.

Parameters
mlmodel: carla.models.MLModel

Black-box model we want to explain.

recourse_method: carla.recourse_methods.RecourseMethod

Recourse method we want to benchmark.

factuals: pd.DataFrame

Instances for which we want to find counterfactuals.

Methods

run_benchmark(measures)

Runs every measurement and returns the results as a pd.DataFrame.

run_benchmark(measures)

Runs every measurement and returns the results as a pd.DataFrame.

Parameters
measures: List[Evaluation]

List of Evaluation measures that will be computed.

Return type

pd.DataFrame
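The flow described above can be sketched with a simplified stand-in for the class. All names here (`ToyBenchmark`, `ToyDistanceMeasure`, the `recourse_fn` callable) are illustrative, not the CARLA API; the sketch only shows how measures are combined into one result frame:

```python
import pandas as pd

class ToyDistanceMeasure:
    """Stand-in for an Evaluation measure: contributes one column per metric."""
    def get_evaluation(self, factuals, counterfactuals):
        delta = counterfactuals.to_numpy() - factuals.to_numpy()
        return pd.DataFrame({"L1": abs(delta).sum(axis=1)})

class ToyBenchmark:
    def __init__(self, recourse_fn, factuals):
        self.factuals = factuals
        # One counterfactual is generated per given factual.
        self.counterfactuals = recourse_fn(factuals)

    def run_benchmark(self, measures):
        # Each measure returns its own columns; concatenate them side by side.
        pieces = [m.get_evaluation(self.factuals, self.counterfactuals)
                  for m in measures]
        return pd.concat(pieces, axis=1)

factuals = pd.DataFrame({"x1": [0.0, 1.0], "x2": [2.0, 3.0]})
bench = ToyBenchmark(lambda df: df + 0.5, factuals)  # toy "recourse method"
result = bench.run_benchmark([ToyDistanceMeasure()])
```

In the real class, the recourse method and the black-box model are passed in at construction and the counterfactual generation happens there, so `run_benchmark` only has to evaluate.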

Distances

class evaluation.catalog.distance.Distance(*args: Any, **kwargs: Any)

Calculates the L0, L1, L2, and L-infinity distance measures.

Methods

__call__(*args, **kwargs)

Call self as a function.

get_evaluation

get_evaluation(factuals, counterfactuals)

evaluation.catalog.distance.l0_distance(delta)

Computes the L0 norm, i.e. the number of non-zero entries.

Parameters
delta: np.ndarray

Difference between factual and counterfactual

Return type

List[float]

evaluation.catalog.distance.l1_distance(delta)

Computes the L1 distance, the sum of absolute differences.

Parameters
delta: np.ndarray

Difference between factual and counterfactual

Return type

List[float]

evaluation.catalog.distance.l2_distance(delta)

Computes the L2 distance, the sum of squared differences.

Parameters
delta: np.ndarray

Difference between factual and counterfactual

Return type

List[float]

evaluation.catalog.distance.linf_distance(delta)

Computes the L-infinity norm, the largest absolute change.

Parameters
delta: np.ndarray

Difference between factual and counterfactual

Return type

List[float]
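All four measures reduce to simple NumPy reductions over `delta = counterfactual - factual`, with one row per instance. A minimal sketch consistent with the descriptions above (not the library's exact implementation):

```python
import numpy as np

def l0_distance(delta):
    # Number of non-zero entries per row.
    return np.sum(delta != 0, axis=1).astype(float).tolist()

def l1_distance(delta):
    # Sum of absolute differences per row.
    return np.sum(np.abs(delta), axis=1).tolist()

def l2_distance(delta):
    # Sum of squared differences per row (squared Euclidean norm).
    return np.sum(delta ** 2, axis=1).tolist()

def linf_distance(delta):
    # Largest absolute change per row.
    return np.max(np.abs(delta), axis=1).tolist()

# One instance: two features changed, one untouched.
delta = np.array([[0.0, -2.0, 1.0]])
```

Note that `l2_distance` as documented is the *sum* of squares, not its square root, so it is the squared Euclidean distance.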

Redundancy

class evaluation.catalog.redundancy.Redundancy(*args: Any, **kwargs: Any)

Computes redundancy for each counterfactual.

Methods

__call__(*args, **kwargs)

Call self as a function.

get_evaluation

get_evaluation(counterfactuals, factuals)
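The idea behind redundancy is to count feature changes that were not actually needed to flip the prediction: revert each changed feature individually and check whether the counterfactual label survives. The sketch below illustrates that idea on a toy classifier; the function and `predict` callable are illustrative, not CARLA's implementation (which operates on encoded DataFrames and the wrapped MLModel):

```python
import numpy as np

def redundancy(factual, counterfactual, predict):
    """Count changed features whose change was not needed for the flip.

    `predict` maps a 1-D feature vector to a class label.
    """
    cf_label = predict(counterfactual)
    redundant = 0
    for i in range(len(factual)):
        if counterfactual[i] == factual[i]:
            continue  # feature was not changed at all
        probe = counterfactual.copy()
        probe[i] = factual[i]  # revert this single change
        if predict(probe) == cf_label:
            redundant += 1  # label unchanged: this change was redundant
    return redundant

# Toy linear classifier: class 1 iff x0 + x1 > 1.
predict = lambda x: int(x[0] + x[1] > 1)
factual = np.array([0.0, 0.0])
counterfactual = np.array([1.5, 0.3])
```

Here reverting `x1` alone keeps the counterfactual class, so that change counts as redundant, while reverting `x0` would undo the flip.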

Violations

class evaluation.catalog.violations.ConstraintViolation(*args: Any, **kwargs: Any)

Computes the constraint violations per counterfactual as a DataFrame.

Methods

__call__(*args, **kwargs)

Call self as a function.

get_evaluation

get_evaluation(factuals, counterfactuals)

evaluation.catalog.violations.constraint_violation(data, counterfactuals, factuals)

Counts constraint violations per counterfactual.

Parameters
data:

Dataset object providing the feature constraints to check against.
counterfactuals:

Normalized and encoded counterfactual examples.

factuals:

Normalized and encoded factuals.

Return type

List[List[float]]
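A constraint violation here means that an immutable feature was changed by the recourse method. Assuming the dataset object exposes the list of immutable column names, the count can be sketched as follows (the `immutables` parameter stands in for what the `data` object provides):

```python
import pandas as pd

def constraint_violation(immutables, counterfactuals, factuals):
    """Per counterfactual, count how many immutable features were changed.

    `immutables` lists the column names recourse must not touch.
    Returns one single-element list per row, matching List[List[float]].
    """
    changed = counterfactuals[immutables] != factuals[immutables]
    return [[float(n)] for n in changed.sum(axis=1)]

factuals = pd.DataFrame({"age": [30, 45], "income": [1.0, 2.0]})
counterfactuals = pd.DataFrame({"age": [30, 40], "income": [1.5, 2.5]})
```

With `age` declared immutable, only the second counterfactual violates a constraint, since it lowers the age.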

yNN

class evaluation.catalog.ynn.YNN(*args: Any, **kwargs: Any)

Computes y-Nearest-Neighbours for generated counterfactuals.

Notes

  • Hyperparams

    • “y”: int

      Number of neighbours to use.

    • “cf_label”: int

      What class to use as a target.

Methods

__call__(*args, **kwargs)

Call self as a function.

get_evaluation

get_evaluation(factuals, counterfactuals)
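The yNN score asks how well a counterfactual blends into the target class: what fraction of its `y` nearest neighbours in the data carry the target label `cf_label`. A self-contained sketch of that idea (CARLA's version queries the wrapped model for neighbour labels; the `ynn` function and its arguments here are illustrative):

```python
import numpy as np

def ynn(counterfactuals, X_train, labels, y, cf_label):
    """Average fraction of each counterfactual's y nearest training
    neighbours that belong to the target class cf_label."""
    scores = []
    for cf in counterfactuals:
        # Euclidean distance from this counterfactual to every training point.
        dists = np.linalg.norm(X_train - cf, axis=1)
        nearest = np.argsort(dists)[:y]  # indices of the y closest points
        scores.append(np.mean(labels[nearest] == cf_label))
    return float(np.mean(scores))

# Two well-separated clusters: class 1 near 0, class 0 near 5.
X_train = np.array([[0.0], [0.1], [5.0], [5.1]])
labels = np.array([1, 1, 0, 0])
```

A counterfactual placed inside the target cluster scores 1.0; one stranded between clusters scores lower, flagging an unrealistic counterfactual.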

Time

class evaluation.catalog.time.AvgTime(*args: Any, **kwargs: Any)

Computes the average time needed to generate a counterfactual.

Methods

__call__(*args, **kwargs)

Call self as a function.

get_evaluation

get_evaluation(factuals, counterfactuals)

Return type

pd.DataFrame