Benchmarking

In this notebook we show how you can use CARLA for benchmarking recourse methods. First, we need to load some factuals and generate counterfactuals for them. For a more detailed explanation of these steps, please take a look at our How to use CARLA notebook.

[1]:
from carla import Benchmark
import carla.evaluation.catalog as evaluation_catalog
from carla.data.catalog import OnlineCatalog
from carla.models.catalog import MLModelCatalog
from carla.models.negative_instances import predict_negative_instances
import carla.recourse_methods.catalog as recourse_catalog

import warnings
warnings.filterwarnings("ignore")

%load_ext autoreload
%autoreload 2
Using TensorFlow backend.
[INFO] Using Python-MIP package version 1.12.0 [model.py <module>]

Generating counterfactuals

Before we can benchmark anything, we need some data, a classification model and a recourse method.

[2]:
data_name = "adult"
dataset = OnlineCatalog(data_name)
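
The catalog dataset exposes the underlying data as a pandas DataFrame (we use it below as dataset.df to select factuals). If you want a quick look at it first, the following is a minimal sketch, assuming the dataset object loaded above:

# peek at the loaded adult dataset
print(dataset.df.head())
print(dataset.df.shape)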
[3]:
# load a pretrained catalog model (ANN with PyTorch backend)
model_type = "ann"
ml_model = MLModelCatalog(
    dataset,
    model_type=model_type,
    load_online=True,
    backend="pytorch"
)
[4]:
hyperparams = {
    "data_name": data_name,
    "vae_params": {
        "layers": [len(ml_model.feature_input_order), 512, 256, 8],
    },
}

# define your recourse method
recourse_method = recourse_catalog.CCHVAE(ml_model, hyperparams)
[INFO] Start training of Variational Autoencoder... [models.py fit]
[INFO] [Epoch: 0/5] [objective: 0.375] [models.py fit]
[INFO] [ELBO train: 0.38] [models.py fit]
[INFO] [ELBO train: 0.13] [models.py fit]
[INFO] [ELBO train: 0.12] [models.py fit]
[INFO] [ELBO train: 0.12] [models.py fit]
[INFO] [ELBO train: 0.12] [models.py fit]
[INFO] ... finished training of Variational Autoencoder. [models.py fit]
[5]:
# get factuals: instances the model predicts as the negative class
factuals = predict_negative_instances(ml_model, dataset.df)
factuals = factuals[:100]

# find counterfactuals
counterfactuals = recourse_method.get_counterfactuals(factuals)

Benchmarking

[6]:
# first initialize the benchmarking class by passing the
# black-box model, recourse method, and factuals into it
benchmark = Benchmark(ml_model, recourse_method, factuals)

# now you can decide if you want to run all measurements
# or just specific ones.
evaluation_measures = [
    evaluation_catalog.YNN(benchmark.mlmodel, {"y": 5, "cf_label": 1}),
    evaluation_catalog.Distance(benchmark.mlmodel),
    evaluation_catalog.SuccessRate(),
    evaluation_catalog.Redundancy(benchmark.mlmodel, {"cf_label": 1}),
    evaluation_catalog.ConstraintViolation(benchmark.mlmodel),
    evaluation_catalog.AvgTime({"time": benchmark.timer}),
]

# now run the chosen measurements and create a
# DataFrame that contains all results
results = benchmark.run_benchmark(evaluation_measures)

display(results.head(5))
   y-Nearest-Neighbours  L0_distance  L1_distance  L2_distance  Linf_distance  Success_Rate  Redundancy  Constraint_Violation  avg_time
0                  0.21         10.0     4.348742     4.036693            1.0           1.0           6                     1   1.52311
1                   NaN          8.0     2.235326     2.016212            1.0           NaN           3                     1       NaN
2                   NaN          7.0     1.564834     1.106475            1.0           NaN           5                     1       NaN
3                   NaN         11.0     5.938143     5.252787            1.0           NaN           6                     2       NaN
4                   NaN          8.0     2.435960     2.054773            1.0           NaN           6                     1       NaN
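
Note that y-Nearest-Neighbours, Success_Rate, and avg_time are reported once per benchmark run (first row only), while the distance, redundancy, and constraint-violation columns are reported per counterfactual, which is why the remaining rows show NaN. If you want a single summary row, you can aggregate the columns yourself with plain pandas; the following is a minimal sketch operating on the results DataFrame from above:

# collapse the benchmark results into one summary row;
# NaNs in the run-level columns are ignored by mean()
summary = results.mean(numeric_only=True)
print(summary)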
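You also do not have to evaluate the full list every time: run_benchmark accepts any subset of the measures defined above. As a minimal sketch, assuming the benchmark object from cell [6], this would compute only the distance metrics:

# re-run the benchmark with a single measure to get
# only the L0/L1/L2/Linf distances per counterfactual
distance_results = benchmark.run_benchmark(
    [evaluation_catalog.Distance(benchmark.mlmodel)]
)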