Black-Box-Model

This black-box model wrapper lets you either

  • use pre-trained PyTorch or TensorFlow models, or

  • plug in a user-specified model by inheriting from the abstract class.

Example implementations for both use cases can be found in the Examples section.

Interface

class models.api.mlmodel.MLModel(data)

Abstract class for implementing a custom black-box model for a given dataset, including its encoding and scaling steps.

Parameters
data: Data

Dataset inherited from Data-wrapper

Returns
None
Attributes
backend

Describes the type of backend which is used for the classifier.

data

Contains the data.api.Data dataset.

feature_input_order

Saves the required order of features as list.

raw_model

Contains the raw ML model built on its framework

Methods

predict:

One-dimensional prediction of ml model for an output interval of [0, 1].

predict_proba:

Two-dimensional probability prediction of ml model.

abstract property backend

Describes the type of backend which is used for the classifier.

E.g., tensorflow, pytorch, sklearn, xgboost

Returns
str
property data: carla.data.api.Data

Contains the data.api.Data dataset.

Returns
carla.data.Data
Return type

Data

abstract property feature_input_order

Saves the required order of features as list.

Prevents confusion about correct order of input features in evaluation

Returns
list of str
get_mutable_mask()

Get mask of mutable features.

For example, with the mutable feature “income” and the immutable feature “age”, the mask would be [True, False] for the feature_input_order [“income”, “age”].

This mask can then be used to index data to only get the columns that are (im)mutable.

Returns
mutable_mask: np.array(bool)
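
The masking idea can be sketched without carla. The snippet below is a minimal illustration, not the library's implementation: the helper names `mutable_mask` and `select` are hypothetical, and plain Python lists stand in for the NumPy boolean array that get_mutable_mask() is documented to return.

```python
from typing import List, Sequence


def mutable_mask(feature_input_order: Sequence[str],
                 immutables: Sequence[str]) -> List[bool]:
    """Return True for every feature that may change, False for immutable ones."""
    immutable_set = set(immutables)
    return [name not in immutable_set for name in feature_input_order]


def select(row: Sequence[float], mask: Sequence[bool]) -> List[float]:
    """Keep only the entries of `row` whose mask value is True."""
    return [value for value, keep in zip(row, mask) if keep]


# Same setting as the example in the text: "income" is mutable, "age" is not.
feature_input_order = ["income", "age"]
mask = mutable_mask(feature_input_order, immutables=["age"])

row = [52000.0, 37.0]  # hypothetical values, ordered like feature_input_order
mutable_values = select(row, mask)
immutable_values = select(row, [not m for m in mask])  # inverted mask
```

Inverting the mask gives the immutable columns, which is the "(im)mutable" indexing mentioned above.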
get_ordered_features(x)

Restores the correct input feature order for the ML model. This also drops every column that is not in the feature order, for example the target column and possibly other features such as categorical ones.

Only works for encoded data.

Parameters
x: pd.DataFrame

Data we want to order

Returns
output: pd.DataFrame

Whole DataFrame with ordered features
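
As a rough illustration of the reordering, the sketch below uses a column-oriented dict as a stand-in for an encoded DataFrame. The function `order_features` and the column names are assumptions made for this example; this is not carla's code.

```python
from typing import Dict, List

Table = Dict[str, List[float]]  # column name -> column values


def order_features(table: Table, feature_input_order: List[str]) -> Table:
    """Return only the model's input columns, in the order the model expects."""
    missing = [name for name in feature_input_order if name not in table]
    if missing:
        raise KeyError(f"missing features: {missing}")
    # Dicts preserve insertion order, so the result follows feature_input_order.
    return {name: table[name] for name in feature_input_order}


# Hypothetical encoded data: a target column "y" plus three feature columns.
encoded = {
    "y": [0.0, 1.0],
    "age": [0.2, 0.7],
    "income": [0.5, 0.1],
    "sex_Male": [1.0, 0.0],
}
ordered = order_features(encoded, ["income", "age"])
```

With pandas, the same selection is usually written as `x[feature_input_order]`, which both reorders and drops columns in one step.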

abstract predict(x)

One-dimensional prediction of ml model for an output interval of [0, 1].

Shape of input dimension has to be always two-dimensional (e.g., (1, m), (n, m))

Parameters
x: np.ndarray or pd.DataFrame

Tabular data of shape N x M (N number of instances, M number of features)

Returns
iterable object

Ml model prediction for interval [0, 1] with shape N x 1

abstract predict_proba(x)

Two-dimensional probability prediction of ml model.

Shape of input dimension has to be always two-dimensional (e.g., (1, m), (n, m))

Parameters
x: np.ndarray or pd.DataFrame

Tabular data of shape N x M (N number of instances, M number of features)

Returns
iterable object

Ml model prediction with shape N x 2

abstract property raw_model

Contains the raw ML model built on its framework

Returns
object

Classifier, depending on used framework
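
To make the shape of the interface concrete, here is a minimal sketch of a user-specified model. It deliberately does not import carla: `MLModelSketch` is a stand-in that only mirrors the abstract members listed above (and omits the `data` constructor argument), and `TinyLogisticModel`, its weights, and the `"custom"` backend label are hypothetical. The column order of predict_proba is also an assumption.

```python
import math
from abc import ABC, abstractmethod
from typing import Dict, List, Sequence

Matrix = Sequence[Sequence[float]]  # N rows, M feature values per row


class MLModelSketch(ABC):
    """Stand-in mirroring the documented abstract members; NOT carla's MLModel."""

    @property
    @abstractmethod
    def backend(self) -> str:
        """Name of the framework behind the classifier."""

    @property
    @abstractmethod
    def feature_input_order(self) -> List[str]:
        """Feature names in the order the model expects them."""

    @property
    @abstractmethod
    def raw_model(self) -> object:
        """The underlying model object."""

    @abstractmethod
    def predict(self, x: Matrix) -> List[List[float]]:
        """Scores in [0, 1] with shape N x 1."""

    @abstractmethod
    def predict_proba(self, x: Matrix) -> List[List[float]]:
        """Class probabilities with shape N x 2."""


class TinyLogisticModel(MLModelSketch):
    """Hypothetical hand-written logistic model, used only for illustration."""

    def __init__(self, weights: Dict[str, float], bias: float) -> None:
        self._weights = dict(weights)
        self._bias = bias

    @property
    def backend(self) -> str:
        return "custom"  # hypothetical label, not one of the documented backends

    @property
    def feature_input_order(self) -> List[str]:
        return list(self._weights)

    @property
    def raw_model(self) -> object:
        return {"weights": self._weights, "bias": self._bias}

    def _positive_probability(self, row: Sequence[float]) -> float:
        z = self._bias + sum(
            self._weights[name] * value
            for name, value in zip(self.feature_input_order, row)
        )
        return 1.0 / (1.0 + math.exp(-z))

    def predict(self, x: Matrix) -> List[List[float]]:
        return [[self._positive_probability(row)] for row in x]

    def predict_proba(self, x: Matrix) -> List[List[float]]:
        # Assumption: column 0 is the negative class, column 1 the positive class.
        return [[1.0 - p, p] for (p,) in self.predict(x)]


model = TinyLogisticModel({"income": 2.0, "age": -1.0}, bias=0.0)
x = [[0.5, 1.0], [1.0, 0.0]]  # two instances, two features: always two-dimensional
scores = model.predict(x)
probabilities = model.predict_proba(x)
```

Note that even a single instance is passed as a list of one row, matching the requirement that the input is always two-dimensional.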

Catalog

class models.catalog.catalog.MLModelCatalog(data, model_type, backend, cache=True, models_home=None, load_online=True, **kws)

Use pretrained classifier.

Parameters
data: data.catalog.DataCatalog

Correct dataset for ML model.

model_type: {‘ann’, ‘linear’, ‘forest’}

The model architecture. Artificial Neural Network, Logistic Regression, and Random Forest respectively.

backend: {‘tensorflow’, ‘pytorch’, ‘sklearn’, ‘xgboost’}

Specifies the framework to use. TensorFlow and PyTorch only support ‘ann’ and ‘linear’. Sklearn and XGBoost only support ‘forest’.

cache: bool, default: True

If True, try to load from the local cache first, and save to the cache if a download is required.

models_home: str, optional

The directory in which to cache data; see get_models_home().

kws: keys and values, optional

Additional keyword arguments are passed through to the read model function.

load_online: bool, default: True

If true, a pretrained model is loaded. If false, a model is trained.

Returns
None
Attributes
backend

Describes the type of backend which is used for the ml model.

feature_input_order

Saves the required order of feature as list.

model_type

Describes the model type

raw_model

Returns the raw ML model built on its framework

tree_iterator

A method needed specifically for tree methods.

Methods

predict:

One-dimensional prediction of ml model for an output interval of [0, 1].

predict_proba:

Two-dimensional probability prediction of ml model
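
The cache parameter describes a "look in the local cache first, otherwise download and store" pattern. The following is a generic sketch of that pattern under stated assumptions: `load_model_bytes`, `fake_fetch`, and the model name `ann_adult` are hypothetical, and the code does not reflect how carla actually names, downloads, or stores its models.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def load_model_bytes(name: str,
                     models_home: Path,
                     fetch: Callable[[str], bytes],
                     cache: bool = True) -> bytes:
    """Return the stored model payload, downloading it only when necessary."""
    target = models_home / name
    if cache and target.exists():
        return target.read_bytes()          # cache hit: no download
    payload = fetch(name)                   # cache miss (or cache disabled)
    if cache:
        models_home.mkdir(parents=True, exist_ok=True)
        target.write_bytes(payload)         # save for the next call
    return payload


calls: List[str] = []


def fake_fetch(name: str) -> bytes:
    """Pretend download that records how often it is invoked."""
    calls.append(name)
    return b"weights for " + name.encode()


with tempfile.TemporaryDirectory() as tmp:
    home = Path(tmp) / "models"
    first = load_model_bytes("ann_adult", home, fake_fetch)
    second = load_model_bytes("ann_adult", home, fake_fetch)  # served from cache
```

The second call never reaches `fake_fetch`, which is the behaviour the cache flag is described as enabling.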

property backend: str

Describes the type of backend which is used for the ml model.

E.g., tensorflow, pytorch, sklearn, …

Returns
backend: str

Used framework

Return type

str

property feature_input_order: List[str]

Saves the required order of feature as list.

Prevents confusion about correct order of input features in evaluation

Returns
ordered_features: list of str

Correct order of input features for ml model

Return type

List[str]

property model_type: str

Describes the model type

E.g., ann, linear

Returns
model_type: str

model type

Return type

str
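
The documented restrictions on combining model_type and backend can be written down as a small lookup table. This is only a sketch of those stated rules; `SUPPORTED_MODEL_TYPES` and `check_combination` are hypothetical names, and the catalog itself may validate its arguments differently.

```python
SUPPORTED_MODEL_TYPES = {
    "tensorflow": {"ann", "linear"},
    "pytorch": {"ann", "linear"},
    "sklearn": {"forest"},
    "xgboost": {"forest"},
}


def check_combination(model_type: str, backend: str) -> None:
    """Raise ValueError if the backend cannot provide the requested model type."""
    if backend not in SUPPORTED_MODEL_TYPES:
        raise ValueError(f"unknown backend: {backend!r}")
    if model_type not in SUPPORTED_MODEL_TYPES[backend]:
        allowed = sorted(SUPPORTED_MODEL_TYPES[backend])
        raise ValueError(
            f"backend {backend!r} supports {allowed}, not {model_type!r}"
        )


check_combination("ann", "pytorch")     # valid: passes silently
check_combination("forest", "sklearn")  # valid: passes silently
```

A call such as `check_combination("forest", "tensorflow")` would raise, since the tree model is only listed for sklearn and xgboost.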

predict(x)

One-dimensional prediction of ml model for an output interval of [0, 1]

Shape of input dimension has to be always two-dimensional (e.g., (1, m), (n, m))

Parameters
x: np.ndarray, pd.DataFrame, or backend specific (tensorflow or pytorch tensor)

Tabular data of shape N x M (N number of instances, M number of features)

Returns
output: np.ndarray, or backend specific (tensorflow or pytorch tensor)

Ml model prediction for interval [0, 1] with shape N x 1

Return type

Union[ndarray, DataFrame, Tensor, Tensor]

predict_proba(x)

Two-dimensional probability prediction of ml model

Shape of input dimension has to be always two-dimensional (e.g., (1, m), (n, m))

Parameters
x: np.ndarray, pd.DataFrame, or backend specific (tensorflow or pytorch tensor)

Tabular data of shape N x M (N number of instances, M number of features)

Returns
output: np.ndarray, or backend specific (tensorflow or pytorch tensor)

Ml model prediction with shape N x 2

Return type

Union[ndarray, DataFrame, Tensor, Tensor]

property raw_model: Any

Returns the raw ML model built on its framework

Returns
ml_model: tensorflow, pytorch, sklearn model type

Loaded model

Return type

Any

train(learning_rate=None, epochs=None, batch_size=None, force_train=False, hidden_size=[18, 9, 3], n_estimators=5, max_depth=5)
Parameters
learning_rate: float

Learning rate for the training.

epochs: int

Number of epochs to train for.

batch_size: int

Number of samples in each batch

force_train: bool

Force training, even if model already exists in cache.

hidden_size: list[int]

hidden_size[i] contains the number of nodes in hidden layer i.

n_estimators: int

Number of estimators in forest.

max_depth: int

Max depth of trees in the forest.
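
To illustrate how hidden_size translates into a network layout, the sketch below derives the (inputs, outputs) size of each dense layer. The helper `layer_shapes`, the input width of 13, and the output width of 2 are assumptions chosen for the example (the latter to match the N x 2 output of predict_proba); the actual architecture built by train() may differ.

```python
from typing import List, Tuple


def layer_shapes(input_dim: int,
                 hidden_size: List[int],
                 output_dim: int) -> List[Tuple[int, int]]:
    """Return (fan_in, fan_out) for every dense layer of a feed-forward network."""
    widths = [input_dim, *hidden_size, output_dim]
    # Pair each width with the next one: layer i maps widths[i] -> widths[i + 1].
    return list(zip(widths[:-1], widths[1:]))


# Default hidden_size from the train() signature; 13 input features is hypothetical.
shapes = layer_shapes(input_dim=13, hidden_size=[18, 9, 3], output_dim=2)
```

Here `shapes` lists one entry per weight matrix, so three hidden layers yield four layers in total once the output layer is included.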

property tree_iterator

A method needed specifically for tree methods. This method should return a list of individual trees that make up the forest.
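
A minimal sketch of what "a list of individual trees" can look like is given below. It assumes a model that exposes its fitted trees through an `estimators_` attribute, as scikit-learn's random forest does; `StubForest` and `tree_iterator` here are hypothetical stand-ins so the snippet runs without any dependency, and an XGBoost model would need a different extraction step.

```python
from typing import Any, List


class StubForest:
    """Stand-in for a fitted ensemble that stores its trees in `estimators_`."""

    def __init__(self, trees: List[Any]) -> None:
        self.estimators_ = list(trees)


def tree_iterator(raw_model: Any) -> List[Any]:
    """Return the individual trees of the forest as a plain list."""
    return list(raw_model.estimators_)


# Five placeholder trees, matching the default n_estimators=5 of train().
forest = StubForest([f"tree_{i}" for i in range(5)])
trees = tree_iterator(forest)
```

Returning a copy of the list keeps callers from accidentally modifying the model's own collection of trees.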