Examples

To give a better insight into how to use CARLA for different purposes, we provide some short example implementations here.

We also have tutorial notebooks available, which can be found under Contents.

To benchmark an arbitrary recourse method, we provide an example implementation based on the Quickstart in the section Benchmarking.

Quickstart

In this example, we want to use a recourse method to generate counterfactual examples with a pre-implemented dataset and black-box model.

from carla import OnlineCatalog, MLModelCatalog
from carla.recourse_methods import GrowingSpheres

# 1. Load data set from the OnlineCatalog
data_name = "adult"
dataset = OnlineCatalog(data_name)

# 2. Load pre-trained black-box model from the MLModelCatalog
model = MLModelCatalog(dataset, "ann")

# 3. Load recourse model with model specific hyperparameters
hyperparameters = {}
gs = GrowingSpheres(model, hyperparameters)

# 4. Generate counterfactual examples
factuals = dataset.df.sample(10)
counterfactuals = gs.get_counterfactuals(factuals)
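
Note that the returned counterfactuals are a pandas DataFrame with one row per factual; rows for which no counterfactual was found contain NaN (see the Recourse Method section below). A minimal sketch for inspecting the successful ones:

# Drop the rows for which no counterfactual was found
found = counterfactuals.dropna()
print(f"Found counterfactuals for {len(found)} out of {len(factuals)} factuals")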

Customization

The following examples contain some pseudo-code to illustrate what a custom Data, Black-Box-Model, or Recourse Method implementation would look like. The structure for generating counterfactuals with these user-specific implementations still resembles the Quickstart.

Data

The easiest way to use your own data is via the CsvCatalog. For this you need to define the continuous and categorical features, which of those are immutable, and the target. Then you just pass the path to your .csv file and you are good to go!

from carla.data.catalog import CsvCatalog

continuous = ["age", "fnlwgt", "education-num", "capital-gain", "hours-per-week", "capital-loss"]
categorical = ["marital-status", "native-country", "occupation", "race", "relationship", "sex", "workclass"]
immutable = ["age", "sex"]

dataset = CsvCatalog(file_path="adult.csv",
                     continuous=continuous,
                     categorical=categorical,
                     immutables=immutable,
                     target='income')
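
For reference, the CSV file is expected to contain a header row with the feature and target names used above. A hypothetical excerpt of such an adult.csv (the column order and the rows shown here are only illustrative):

age,workclass,fnlwgt,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,income
39,State-gov,77516,13,Never-married,Adm-clerical,Not-in-family,White,Male,2174,0,40,United-States,<=50K
50,Self-emp-not-inc,83311,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,13,United-States,<=50K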

If you want full control over your dataset, you can also implement it from scratch using our API.

from carla import Data

# Custom data set implementations need to inherit from the Data interface
class MyOwnDataSet(Data):
    def __init__(self):
        # The data set can e.g. be loaded in the constructor
        self._dataset = load_dataset_from_disk()

    # List of all categorical features
    @property
    def categorical(self):
        return [...]

    # List of all continuous features
    @property
    def continuous(self):
        return [...]

    # List of all immutable features which
    # should not be changed by the recourse method
    @property
    def immutables(self):
        return [...]

    # Feature name of the target column
    @property
    def target(self):
        return "label"

    # The full dataset
    @property
    def df(self):
        return self._dataset

    # The training split of the dataset
    @property
    def df_train(self):
        return self._dataset_train

    # The test split of the dataset
    @property
    def df_test(self):
        return self._dataset_test

    # Data transformation, for example normalization of continuous features
    # and encoding of categorical features
    def transform(self, df):
        return transformed_df

    # Inverts transform operation
    def inverse_transform(self, df):
        return original_df

For reference, you can always take a look at the data API. In addition, we also have a concrete example implementation of our data API in our DataCatalog.

Black-Box-Model

from carla import MLModel

# Custom black-box models need to inherit from
# the MLModel interface
class MyOwnModel(MLModel):
    def __init__(self, data):
        super().__init__(data)
        # The constructor can be used to load or build an
        # arbitrary black-box-model
        self._mymodel = load_model()

    # List of the feature order the ml model was trained on
    @property
    def feature_input_order(self):
        return [...]

    # The ML framework the model was trained on
    @property
    def backend(self):
        return "pytorch"

    # The black-box model object
    @property
    def raw_model(self):
        return self._mymodel

    # The predict function outputs
    # the continuous prediction of the model
    def predict(self, x):
        return self._mymodel.predict(x)

    # The predict_proba method outputs
    # the prediction as class probabilities
    def predict_proba(self, x):
        return self._mymodel.predict_proba(x)

Below is a concrete example of how to implement and use a custom model in our framework. Note that the tree_iterator property is specific to tree-based methods and is not used by other recourse methods.

from carla import MLModel
import xgboost

class XGBoostModel(MLModel):
    """The default way of implementing XGBoost
    https://xgboost.readthedocs.io/en/latest/python/python_intro.html"""

    def __init__(self, data):
        super().__init__(data)

        # get preprocessed data
        df_train = self.data.df_train
        df_test = self.data.df_test

        # we only use the continuous features here,
        # so you might want to also include the categorical
        # features
        x_train = df_train[self.data.continuous]
        y_train = df_train[self.data.target]
        x_test = df_test[self.data.continuous]
        y_test = df_test[self.data.target]

        # you can not only use the feature input order to
        # order the data but also to e.g. restrict the input
        # to only the continuous features
        self._feature_input_order = self.data.continuous

        param = {
            "max_depth": 2,  # determines how deep the tree can go
            "objective": "binary:logistic",  # determines the loss function
            "n_estimators": 5,
        }
        self._mymodel = xgboost.XGBClassifier(**param)
        self._mymodel.fit(
            x_train,
            y_train,
            eval_set=[(x_train, y_train), (x_test, y_test)],
            eval_metric="logloss",
            verbose=True,
        )

    @property
    def feature_input_order(self):
        # List of the feature order the ml model was trained on
        return self._feature_input_order

    @property
    def backend(self):
        # The ML framework the model was trained on
        return "xgboost"

    @property
    def raw_model(self):
        # The black-box model object
        return self._mymodel

    @property
    def tree_iterator(self):
        # make a copy of the trees, else feature names are not saved
        booster_it = [booster for booster in self.raw_model.get_booster()]
        # set the feature names
        for booster in booster_it:
            booster.feature_names = self.feature_input_order
        return booster_it

    # The predict function outputs
    # the continuous prediction of the model
    def predict(self, x):
        return self._mymodel.predict(self.get_ordered_features(x))

    # The predict_proba method outputs
    # the prediction as class probabilities
    def predict_proba(self, x):
        return self._mymodel.predict_proba(self.get_ordered_features(x))
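
To sanity-check such a wrapper, you can instantiate it with a catalog dataset and query its prediction methods directly. A minimal sketch, assuming the adult dataset from the Quickstart:

from carla import OnlineCatalog

dataset = OnlineCatalog("adult")
model = XGBoostModel(dataset)

# get_ordered_features (inherited from MLModel) restricts and orders the
# columns, so we can pass rows of the full test split directly
some_factuals = dataset.df_test.sample(5)
print(model.predict_proba(some_factuals))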

Recourse Method

This short code example shows you how to implement a recourse method. Any configuration options should be passed to the initializer through the hyperparameters dictionary, not to the get_counterfactuals method. Also note that the recourse method has access to the mlmodel, which in turn has access to the data object. So, for example, if you want some property of the scaler, you can access it via mlmodel.data.scaler.

import pandas as pd

from carla import RecourseMethod

# Custom recourse implementations need to
# inherit from the RecourseMethod interface
class MyRecourseMethod(RecourseMethod):
    def __init__(self, mlmodel, hyperparameters):
        super().__init__(mlmodel)
        # the constructor can be used to load the recourse method,
        # or construct everything necessary

    # Generate and return encoded and
    # scaled counterfactual examples
    def get_counterfactuals(self, factuals: pd.DataFrame):
        # This method is responsible for generating and outputting
        # encoded and scaled (i.e. transformed) counterfactual examples
        # as pandas DataFrames.
        # Concretely this means that e.g. the counterfactuals should have
        # the same one-hot encoding as the factuals, and e.g. they both
        # should be min-max normalized with the same range.
        # It's expected that there is a single counterfactual per factual;
        # however, in case a counterfactual cannot be found it should be NaN.
        [...]
        return counterfactual_examples
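
As an illustration only (this is not part of the library), below is a sketch of a toy recourse method that randomly perturbs the continuous features of each factual until the model's predicted class flips, and returns a NaN row when it gives up. The hyperparameter names max_tries and step are made up for this example:

import numpy as np
import pandas as pd

from carla import RecourseMethod

class RandomSearchRecourse(RecourseMethod):
    def __init__(self, mlmodel, hyperparameters):
        super().__init__(mlmodel)
        self._mlmodel = mlmodel  # stored explicitly for clarity
        # hypothetical hyperparameters for this sketch
        self._max_tries = hyperparameters.get("max_tries", 100)
        self._step = hyperparameters.get("step", 0.1)

    def get_counterfactuals(self, factuals: pd.DataFrame):
        continuous = self._mlmodel.data.continuous
        rows = []
        for _, factual in factuals.iterrows():
            original_class = np.argmax(
                self._mlmodel.predict_proba(factual.to_frame().T)
            )
            found = None
            for _ in range(self._max_tries):
                candidate = factual.copy()
                # perturb only the continuous features; a real method
                # should also respect self._mlmodel.data.immutables
                candidate[continuous] = factual[continuous] + self._step * np.random.randn(len(continuous))
                candidate_class = np.argmax(
                    self._mlmodel.predict_proba(candidate.to_frame().T)
                )
                if candidate_class != original_class:
                    found = candidate
                    break
            if found is None:
                # no counterfactual found: return a NaN row, as the interface expects
                found = pd.Series(np.nan, index=factual.index)
            rows.append(found)
        return pd.DataFrame(rows, index=factuals.index)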

For many different examples of how to do this, you can take a look at the methods in our recourse catalog. The Wachter method, for example, is a clean implementation.

Benchmarking

from carla import Benchmark

# 1. Initialize the benchmarking class by passing the
# black-box model, recourse method, and factuals into it
benchmark = Benchmark(model, gs, factuals)

# 2. Either only compute the distance measures
distances = benchmark.compute_distances()

# 3. Or run all implemented measurements and create a
# DataFrame which contains all results
results = benchmark.run_benchmark()
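
Since run_benchmark returns a pandas DataFrame (as noted above), the results can be inspected or saved with standard pandas operations, for example (the file name is arbitrary):

# 4. Inspect and save the benchmark results
print(results)
results.to_csv("benchmark_results.csv", index=False)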