Data

The data wrapper lets you either

  • use pre-processed datasets from our Catalog or

  • implement any custom dataset by inheriting from our Interface

Example implementations for both use cases can be found in the Examples section.

Interface

class data.api.data.Data

Abstract class to implement arbitrary datasets, which are provided by the user.

Attributes
categoricals

Provides the column names of categorical data.

continous

Provides the column names of continuous data.

immutables

Provides the column names of immutable data.

raw

The raw DataFrame without encoding or normalization.

target

Provides the name of the label column.

abstract property categoricals

Provides the column names of categorical data. These are the original column names, not the encoded names produced by one-hot encoding such as pandas get_dummies() (e.g., sex_female).

Label name is not included.

Returns
list of Strings

List of all categorical columns
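To illustrate the naming distinction described above, the following sketch one-hot encodes a small frame with pandas `get_dummies`. The column names and values are made up for illustration; under the description above, `categoricals` would be expected to list the unencoded name `sex` rather than the encoded names.

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({"age": [25, 40, 31], "sex": ["female", "male", "female"]})

# One-hot encoding creates one column per category, named <column>_<value>.
encoded = pd.get_dummies(df, columns=["sex"])

raw_categoricals = ["sex"]               # the form `categoricals` is described as returning
encoded_columns = list(encoded.columns)  # contains sex_female and sex_male instead of sex
print(encoded_columns)
```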

abstract property continous

Provides the column names of continuous data.

Label name is not included.

Returns
list of Strings

List of all continuous columns

abstract property immutables

Provides the column names of immutable data.

Label name is not included.

Returns
list of Strings

List of all immutable columns

abstract property raw

The raw DataFrame without encoding or normalization.

Returns
pd.DataFrame

Tabular data with raw information

abstract property target

Provides the name of the label column.

Returns
str

Target label name
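A minimal sketch of a user-defined dataset implementing the five abstract properties listed above. To keep the block runnable without the library installed, it defines a local stand-in `Data` base class that mirrors the documented interface; in real use you would inherit from the library's `Data` class instead. The class `ToyLoanData`, its columns, and its values are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import List

import pandas as pd


class Data(ABC):
    """Local stand-in for the documented abstract interface (an assumption)."""

    @property
    @abstractmethod
    def categoricals(self) -> List[str]:
        """Names of categorical feature columns, without the label."""

    @property
    @abstractmethod
    def continous(self) -> List[str]:
        """Names of continuous feature columns, without the label."""

    @property
    @abstractmethod
    def immutables(self) -> List[str]:
        """Names of feature columns that must not be changed."""

    @property
    @abstractmethod
    def raw(self) -> pd.DataFrame:
        """Unencoded, unnormalized tabular data."""

    @property
    @abstractmethod
    def target(self) -> str:
        """Name of the label column."""


class ToyLoanData(Data):
    """Hypothetical dataset with made-up values."""

    def __init__(self) -> None:
        self._raw = pd.DataFrame(
            {
                "age": [25, 40, 31],
                "income": [30000.0, 52000.0, 41000.0],
                "sex": ["female", "male", "female"],
                "approved": [0, 1, 1],
            }
        )

    @property
    def categoricals(self) -> List[str]:
        return ["sex"]

    @property
    def continous(self) -> List[str]:
        return ["age", "income"]

    @property
    def immutables(self) -> List[str]:
        return ["age", "sex"]

    @property
    def raw(self) -> pd.DataFrame:
        return self._raw.copy()

    @property
    def target(self) -> str:
        return "approved"


dataset = ToyLoanData()
features = dataset.categoricals + dataset.continous
```

Note that the label column appears only in `target`, never in the feature lists, matching the "Label name is not included" remarks above.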

Catalog

class data.catalog.catalog.DataCatalog(data_name)

Use already implemented datasets.

Parameters
data_name: {‘adult’, ‘compas’, ‘give_me_some_credit’}

Name of the dataset to load from the online repository.

Returns
None
Attributes
categoricals

Provides the column names of categorical data.

continous

Provides the column names of continuous data.

immutables

Provides the column names of immutable data.

raw

The raw DataFrame without encoding or normalization.

target

Provides the name of the label column.

property categoricals: List[str]

Provides the column names of categorical data. These are the original column names, not the encoded names produced by one-hot encoding such as pandas get_dummies() (e.g., sex_female).

Label name is not included.

Returns
list of Strings

List of all categorical columns

Return type

List[str]

property continous: List[str]

Provides the column names of continuous data.

Label name is not included.

Returns
list of Strings

List of all continuous columns

Return type

List[str]

property immutables: List[str]

Provides the column names of immutable data.

Label name is not included.

Returns
list of Strings

List of all immutable columns

Return type

List[str]

property raw: pandas.core.frame.DataFrame

The raw DataFrame without encoding or normalization.

Returns
pd.DataFrame

Tabular data with raw information

Return type

DataFrame

property target: str

Provides the name of the label column.

Returns
str

Target label name

Return type

str
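Because the catalog and the interface expose the same properties, helper code can be written against those properties alone. The sketch below derives the mutable feature columns from `categoricals`, `continous`, and `immutables`. The function `mutable_features` and the stand-in object are hypothetical; the stand-in only mimics the documented properties so that the block runs without downloading a dataset.

```python
from types import SimpleNamespace
from typing import List


def mutable_features(dataset) -> List[str]:
    """Return the feature columns that are not listed as immutable."""
    features = dataset.categoricals + dataset.continous
    immutable = set(dataset.immutables)
    return [name for name in features if name not in immutable]


# Hypothetical stand-in exposing the documented properties with made-up names.
toy = SimpleNamespace(
    categoricals=["sex", "workclass"],
    continous=["age", "hours-per-week"],
    immutables=["age", "sex"],
    target="income",
)

changeable = mutable_features(toy)
print(changeable)
```

Any object providing the same three properties, such as a catalog dataset, should work with the helper in the same way.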

Causal Model

class data.causal_model.causal_model.CausalModel(scm_class)

Class with topological methods given a structural causal model. Uses the StructuralCausalModel and CausalGraphicalModel from https://github.com/ijmbarr/causalgraphicalmodels

Parameters
scm_class: str

Name of the structural causal model

Attributes
scm: StructuralCausalModel
cgm: CausalGraphicalModel
scm_class: str

Name of the structural causal model

structural_equations_np: dict

Contains the equations for the features in NumPy format.

structural_equations_ts: dict

Contains the equations for the features in TensorFlow format.

noise_distributions: dict

Defines the noise variables.

Methods

generate_dataset(size)

Generates a Data object using the structural causal equations

get_ancestors(node)

Returns all nodes having a path to node.

get_children(node)

Returns the set of successor nodes of node.

get_descendents(node)

Returns all nodes reachable from node.

get_non_descendents(node)

Returns all nodes not reachable from node.

get_parents(node[, return_sorted])

Returns the set of predecessor nodes of node.

get_topological_ordering([node_type])

Returns a generator of nodes in topologically sorted order.

visualize_graph([experiment_folder_name])

Visualize the causal graph.

property cgm: causalgraphicalmodels.cgm.CausalGraphicalModel
Returns
CausalGraphicalModel
Return type

CausalGraphicalModel

generate_dataset(size)

Generates a Data object using the structural causal equations

Parameters
size: int

Number of samples in the dataset

Returns
ScmDataset

A Data object filled with samples

Return type

ScmDataset
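The general idea of generating samples from structural causal equations can be sketched as follows. This is not the library's implementation: the layout of the two dictionaries (feature name mapped to a callable) and the equations themselves are assumptions made for illustration. Each feature is computed from its own noise variable and its parents, in an order where parents come first.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical noise distributions: one sampler per exogenous variable.
noise_distributions = {
    "u1": lambda size: rng.normal(0.0, 1.0, size),
    "u2": lambda size: rng.normal(0.0, 0.5, size),
}

# Hypothetical structural equations: x1 has no parents, x2 depends on x1.
structural_equations_np = {
    "x1": lambda u1: u1,
    "x2": lambda u2, x1: 2.0 * x1 + u2,
}


def sample(size: int):
    """Draw noise, then evaluate the equations with parents before children."""
    noise = {name: draw(size) for name, draw in noise_distributions.items()}
    values = {}
    values["x1"] = structural_equations_np["x1"](noise["u1"])
    values["x2"] = structural_equations_np["x2"](noise["u2"], values["x1"])
    return noise, values


noise, values = sample(5)
```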

get_ancestors(node)

Returns all nodes having a path to node.

Parameters
node: str

A node in the graph

Returns
set()

The ancestors of node

Return type

set

get_children(node)

Returns the set of successor nodes of node.

A successor of node is a node m such that there exists a directed edge from node to m.

Parameters
node: str

A node in the graph

Return type

set

get_descendents(node)

Returns all nodes reachable from node.

Parameters
node: str

A node in the graph

Returns
set()

The descendants of node

Return type

set

get_non_descendents(node)

Returns all nodes not reachable from node.

Parameters
node: str

A node in the graph

Returns
set()

The non-descendants of node

Return type

set
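The three reachability notions above (ancestors, descendants, non-descendants) can be illustrated on a small hand-written graph. This is a plain-Python sketch, not the library's code; the graph is hypothetical, and the choice to exclude the node itself from all three sets is an assumption of this sketch.

```python
from typing import Dict, Set

# Hypothetical graph stored as node -> children: x1 -> x2 -> x3 and x4 -> x3.
CHILDREN: Dict[str, Set[str]] = {
    "x1": {"x2"},
    "x2": {"x3"},
    "x3": set(),
    "x4": {"x3"},
}


def descendants(graph: Dict[str, Set[str]], node: str) -> Set[str]:
    """All nodes reachable from node by following directed edges."""
    seen: Set[str] = set()
    stack = list(graph[node])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph[current])
    return seen


def ancestors(graph: Dict[str, Set[str]], node: str) -> Set[str]:
    """All nodes that have a directed path to node."""
    return {other for other in graph if node in descendants(graph, other)}


def non_descendants(graph: Dict[str, Set[str]], node: str) -> Set[str]:
    """All other nodes that cannot be reached from node."""
    return set(graph) - descendants(graph, node) - {node}
```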

get_parents(node, return_sorted=True)

Returns the set of predecessor nodes of node.

A predecessor of node is a node m such that there exists a directed edge from m to node.

Parameters
node: str

A node in the graph

return_sorted: bool

If True, return the parents in sorted order
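The predecessor and successor definitions can be made concrete with an edge list. The sketch below is plain Python on a hypothetical graph and does not call the library; returning a sorted list when `return_sorted` is true is an assumption about what "sorted" means here.

```python
from typing import List, Set, Tuple, Union

# Hypothetical directed edges (source, destination).
EDGES: List[Tuple[str, str]] = [("x1", "x2"), ("x2", "x3"), ("x4", "x3")]


def children(edges: List[Tuple[str, str]], node: str) -> Set[str]:
    """Nodes m with a directed edge from node to m."""
    return {dst for src, dst in edges if src == node}


def parents(
    edges: List[Tuple[str, str]], node: str, return_sorted: bool = True
) -> Union[List[str], Set[str]]:
    """Nodes m with a directed edge from m to node."""
    found = {src for src, dst in edges if dst == node}
    return sorted(found) if return_sorted else found
```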

get_topological_ordering(node_type='endogenous')

Returns a generator of nodes in topologically sorted order.

A topological sort is a nonunique permutation of the nodes such that an edge from u to v implies that u appears before v in the topological sort order.

Parameters
node_type: str

“endogenous” or “exogenous”, i.e. nodes with “x” or “u” prefix respectively

Returns
iterable

An iterable of node names in topological sorted order.
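As an illustration of the ordering property, the sketch below sorts a small hypothetical graph with the standard library's `graphlib.TopologicalSorter` (Python 3.9+) and filters the result by the "x" or "u" prefix mentioned above. It is a stand-alone sketch, not the library's implementation; the graph and the function `topological_ordering` are assumptions.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical graph stored as node -> parents, with noise nodes u1 and u2.
PARENTS: Dict[str, Set[str]] = {
    "u1": set(),
    "u2": set(),
    "x1": {"u1"},
    "x2": {"x1", "u2"},
}


def topological_ordering(node_type: str = "endogenous") -> List[str]:
    """Return nodes of one type so that every parent precedes its children."""
    prefix = {"endogenous": "x", "exogenous": "u"}[node_type]
    order = TopologicalSorter(PARENTS).static_order()
    return [node for node in order if node.startswith(prefix)]


endogenous = topological_ordering("endogenous")
exogenous = topological_ordering("exogenous")
```

Since the ordering is not unique in general, code that consumes it should rely only on the parent-before-child guarantee, not on a specific sequence.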

property noise_distributions: dict

Defines the noise variables.

Returns
dict
Return type

dict

property scm: causalgraphicalmodels.csm.StructuralCausalModel
Returns
StructuralCausalModel
Return type

StructuralCausalModel

property scm_class: str

Name of the structural causal model used to define the CausalModel

Returns
str
Return type

str

property structural_equations_np: dict

Contains the equations for the features in NumPy format.

Returns
dict
Return type

dict

property structural_equations_ts: dict

Contains the equations for the features in TensorFlow format.

Returns
dict
Return type

dict

visualize_graph(experiment_folder_name=None)

Visualize the causal graph.

Parameters
experiment_folder_name: str

Folder in which to save the figure.

Synthetic Data

class data.causal_model.synthethic_data.ScmDataset(scm, size)

Generate a dataset from structural equations

Parameters
scm: CausalModel

Structural causal model

size: int

Number of samples in the dataset

Attributes
categoricals

Provides the column names of the categorical data.

continous

Provides the column names of the continuous data.

immutables

Provides the column names of the immutable data.

raw

The raw DataFrame without encoding or normalization.

target

Provides the name of the label column.

Methods

__call__(*args, **kwargs)

Call self as a function.

property categoricals: List[str]

Provides the column names of the categorical data.

Returns
List[str]
Return type

List[str]

property continous: List[str]

Provides the column names of the continuous data.

Returns
List[str]
Return type

List[str]

property immutables: List[str]

Provides the column names of the immutable data.

Returns
List[str]
Return type

List[str]

property raw: pandas.core.frame.DataFrame

The raw DataFrame without encoding or normalization.

Returns
pd.DataFrame
Return type

DataFrame

property target: str

Provides the name of the label column.

Returns
str
Return type

str