Multidimensional Treatments

Use this guide when treatment is stored in more than one column, such as a numeric dose together with a categorical regime.

Note: When To Use This Guide

Start here if your treatment table has multiple columns and you want one end-to-end example that stays close to the production API.

What we’re estimating

With multiple treatment coordinates, the target is an Average Response Surface rather than a single curve:

\[ \mu(t_1, \ldots, t_k) = \mathbb{E}_P[Y^*(t_1, \ldots, t_k)] \]

In this example the treatment table has two columns:

  • t_0: one continuous dose
  • t_0_bin: one categorical regime label

So the estimand is

\[ \mu(t, c) = \mathbb{E}_P[Y^*(t, c)] \]

The output is one average response per requested row of a treatment grid. Here we will visualize that surface as one dose-response line per categorical level.

Tip

Think in terms of a treatment table, not a special-case estimator API:

  • fit(X, t, y) takes covariates X, a multi-column treatment table t, and outcomes y
  • grid is another treatment table with the same columns and compatible dtypes
  • predict(t=grid) returns one estimated average response per row in grid

This page compares four estimators that accept the same mixed treatment table:

  • DirectNoCovariates: observational baseline that regresses Y on the treatment table only
  • DirectRegressor: outcome regression that models E[Y | X, T] and averages over X
  • GPS: score-adjustment method that adds a stabilized treatment score to the outcome model features
  • DoublyRobustPseudoOutcome: outcome regression plus a score correction, followed by a final smoother over treatment

Note

All causal interpretations below assume consistency, no unmeasured confounding after conditioning on X, and overlap across the treatment combinations you want to evaluate.

Step 1: Import dependencies

Code
import numpy as np
import polars as pl
import matplotlib.pyplot as plt

from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, PolynomialFeatures, SplineTransformer

from skcausal.causal_estimators import (
    DirectNoCovariates,
    DirectRegressor,
    DoublyRobustPseudoOutcome,
    GPS,
)
from skcausal.datasets import Synthetic2MultidimDataset
from skcausal.density.permutation_weighting import PermutationWeighting
from skcausal.plotting import plot_joint_curves, use_theme
from skcausal.utils.lookup import (
    all_causal_average_response_estimators,
    all_density_estimators,
)
from skcausal.utils.treatment_grid import make_cartesian_treatment_grid

use_theme()

Step 2: Load a confounded mixed-treatment dataset

Code
dataset = Synthetic2MultidimDataset(
    n=2000,
    n_features=5,
    n_categorical_treatments=2,
    mutual_info=0.7,
    categorical_effect_scale=1.0,
    random_state=0,
)
X, t, y = dataset.load()

X = X.rename({column: f"x{index}" for index, column in enumerate(X.columns)})
continuous_col, categorical_col = t.columns
y = y.rename({y.columns[0]: "y"})

pl.concat([X.head(5), t.head(5), y.head(5)], how="horizontal")
shape: (5, 8)
x0 x1 x2 x3 x4 t_0 t_0_bin y
f64 f64 f64 f64 f64 f64 cat f64
0.361595 1.304 0.947081 -0.703735 -1.265421 0.69931 "bin_1" -1.216821
-0.623274 0.041326 -2.325031 -0.218792 -1.245911 0.350571 "bin_0" -2.704428
-0.732267 -0.544259 -0.3163 0.411631 1.042513 0.3322 "bin_0" -1.50206
-0.128535 1.366463 -0.665195 0.35151 0.90347 0.391606 "bin_1" -1.141317
0.094012 -0.743499 -0.921725 -0.457726 0.220195 0.406004 "bin_0" -2.443491

The table has five observed covariates, one continuous treatment column, one categorical treatment column, and one outcome column.

Synthetic2MultidimDataset starts from the confounded continuous-treatment benchmark SyntheticDataset2, then adds a categorical treatment coordinate by binning the continuous treatment into n_categorical_treatments levels.

The parameter mutual_info=0.7 weakens the link between the two treatment coordinates without removing it completely, and categorical_effect_scale=1.0 makes the categorical coordinate shift the response surface directly.

This gives a benchmark where:

  • the continuous dose matters
  • the categorical label matters
  • treatment assignment is still confounded through X
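
The dataset's exact generator is internal, but the binning idea behind the categorical coordinate can be sketched in a few lines. This is illustrative only; the real generator additionally weakens the link between the coordinates via mutual_info:

```python
import numpy as np

rng = np.random.default_rng(0)
dose = rng.uniform(0.0, 1.0, size=8)

# Bin the continuous dose into equal-width levels (here: 2 levels).
n_levels = 2
edges = np.linspace(dose.min(), dose.max(), n_levels + 1)[1:-1]
labels = np.array([f"bin_{i}" for i in np.digitize(dose, edges)])
```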
Important

DirectNoCovariates accepts the mixed treatment table, but it does not adjust for confounding. In this dataset, any method that ignores X is still observational.

Step 3: Define practical model builders

The nuisance models below all share the same expressive random-forest defaults. The only deliberate exception is the final doubly robust smoother, which uses a spline basis over the continuous treatment coordinate and a linear regressor over the expanded treatment features.

Code
def make_forest_mixed_treatment_regressor(categorical_column):
    return Pipeline(
        [
            (
                "encode_categorical_treatment",
                ColumnTransformer(
                    [
                        (
                            "categorical_treatment",
                            OneHotEncoder(handle_unknown="ignore", sparse_output=False),
                            [categorical_column],
                        )
                    ],
                    remainder="passthrough",
                    verbose_feature_names_out=False,
                ),
            ),
            (
                "regressor",
                RandomForestRegressor(
                    n_estimators=50,
                    min_samples_leaf=10,
                    random_state=0,
                    n_jobs=-1,
                ),
            ),
        ]
    )
Code
def make_forest_permutation_weighting():
    return PermutationWeighting(
        classifier=RandomForestClassifier(
            n_estimators=50,
            min_samples_leaf=10,
            random_state=0,
            n_jobs=-1,
        ),
        max_trials=50,
        random_state=0,
    )
Code
def make_spline_mixed_treatment_regressor(continuous_column, categorical_column):
    return Pipeline(
        [
            (
                "expand_mixed_treatment",
                ColumnTransformer(
                    [
                        (
                            "expand_continuous_treatment",
                            SplineTransformer(
                                n_knots=8,
                                degree=3,
                                include_bias=False,
                            ),
                            [continuous_column],
                        ),
                        (
                            "encode_categorical_treatment",
                            OneHotEncoder(
                                drop="first",
                                handle_unknown="ignore",
                                sparse_output=False,
                            ),
                            [categorical_column],
                        ),
                    ],
                    remainder="drop",
                    verbose_feature_names_out=False,
                ),
            ),
            (
                "interaction_terms",
                PolynomialFeatures(
                    degree=2,
                    interaction_only=True,
                    include_bias=False,
                ),
            ),
            ("regressor", LinearRegression()),
        ]
    )
Code
def mean_absolute_error(left, right):
    return float(np.mean(np.abs(np.asarray(left) - np.asarray(right))))

Step 4: Build the intervention grid

Code
grid = make_cartesian_treatment_grid(
    t,
    n_continuous_points=10,
)

grid.head(8)
shape: (8, 2)
t_0 t_0_bin
f64 cat
3.3844e-7 "bin_1"
3.3844e-7 "bin_0"
0.111111 "bin_1"
0.111111 "bin_0"
0.222222 "bin_1"
0.222222 "bin_0"
0.333333 "bin_1"
0.333333 "bin_0"

make_cartesian_treatment_grid replaces each continuous treatment column with evenly spaced points, keeps the observed categorical levels, and returns the Cartesian product. In this example that means 10 dose values times 2 categorical levels.

Note

The prediction contract is unchanged from the single-treatment guides: pass a treatment table to predict, and get back one average response per row of that table.

Step 5: Compute the truth

truth_curve = dataset.predict(X, grid).flatten()
truth = grid.with_columns(pl.Series("truth", truth_curve))

Because this is a synthetic dataset, we can evaluate the true average potential outcome surface exactly for every row in grid. That makes the rest of the tutorial a benchmarking exercise, not just an API example.

Step 6: Inspect the raw observational surface

Code
naive = DirectNoCovariates(
    outcome_regressor=make_forest_mixed_treatment_regressor(categorical_col)
)
naive.fit(X, t, y)
naive_curve = naive.predict(t=grid).flatten()
fig, axes = plt.subplots(
    1,
    len(dataset.levels_),
    figsize=(11, 4.5),
    sharey=True,
    constrained_layout=True,
)
axes = np.atleast_1d(axes)
plot_joint_curves(
    grid,
    {
        "Raw observational surface": naive_curve,
        "Truth": truth_curve,
    },
    ax=axes,
    separate_axes_by_group=True,
)
fig.suptitle("Raw observational surface before causal adjustment")
axes[0].legend(ncols=2)
plt.show()

DirectNoCovariates is a useful observational baseline, but its output is not a causal estimate. The model ignores X, so even after splitting by t_0_bin, the continuous dose remains confounded by the covariates.
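
A toy simulation, unrelated to the tutorial dataset, shows the mechanism: when the dose depends on a covariate that also drives the outcome, a regression that ignores the covariate overstates the dose effect.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = rng.normal(size=5000)
t_dose = 0.8 * x + rng.normal(size=5000)              # dose depends on the confounder
y_out = 1.0 * t_dose + 2.0 * x + rng.normal(size=5000)  # true dose effect is 1.0

naive_slope = LinearRegression().fit(t_dose.reshape(-1, 1), y_out).coef_[0]
adjusted_slope = LinearRegression().fit(np.column_stack([t_dose, x]), y_out).coef_[0]
# naive_slope is pulled well above 1.0; adjusted_slope recovers roughly 1.0
```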

Step 7: Fit DirectRegressor

DirectRegressor uses the same mixed-treatment outcome model, but now the model sees both X and the treatment table. Prediction averages over the observed covariate sample at each requested treatment row.
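
The averaging step can be sketched with plain scikit-learn. This is a schematic of the idea, not the library's implementation:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X_cov = rng.normal(size=(200, 2))
t_obs = rng.uniform(0, 1, size=200)
y_obs = np.sin(3 * t_obs) + X_cov[:, 0] + rng.normal(scale=0.1, size=200)

# Outcome model over (X, T).
model = RandomForestRegressor(n_estimators=30, random_state=0)
model.fit(np.column_stack([X_cov, t_obs]), y_obs)

grid_t = np.linspace(0, 1, 5)
curve = np.array(
    [
        # For each grid value, set everyone's treatment to it and average.
        model.predict(np.column_stack([X_cov, np.full(len(X_cov), t_val)])).mean()
        for t_val in grid_t
    ]
)
```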

direct = DirectRegressor(
    outcome_regressor=make_forest_mixed_treatment_regressor(categorical_col)
)
direct.fit(X, t, y)
direct_curve = direct.predict(t=grid).flatten()

Step 8: Fit GPS

GPS augments the outcome model with a treatment score computed from predict_density(X, t). In this example that score comes from PermutationWeighting, which returns a stabilized density-ratio-style quantity. So this is best read as a score-adjustment example, not as a parametric conditional-density example.
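
The score-adjustment idea, stripped to its core: fit the outcome model on the treatment plus a per-row score, then recompute the score at each grid treatment before averaging. This schematic uses a made-up Gaussian score, not the library's internals:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X_cov = rng.normal(size=(300, 2))
t_obs = 0.5 * X_cov[:, 0] + rng.normal(size=300)
y_obs = t_obs + X_cov[:, 0] + rng.normal(scale=0.1, size=300)

# Stand-in for predict_density(X, t): any per-row treatment score works here.
score = np.exp(-0.5 * (t_obs - 0.5 * X_cov[:, 0]) ** 2)

# The outcome model sees the treatment together with the score.
outcome_model = RandomForestRegressor(n_estimators=30, random_state=0).fit(
    np.column_stack([t_obs, score]), y_obs
)

# At predict time: recompute the score at the grid dose for every unit,
# then average the outcome-model predictions (as in Step 7).
t_new = 0.0
score_new = np.exp(-0.5 * (t_new - 0.5 * X_cov[:, 0]) ** 2)
avg_response = outcome_model.predict(
    np.column_stack([np.full(300, t_new), score_new])
).mean()
```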

gps = GPS(
    density_regressor=make_forest_permutation_weighting(),
    outcome_regressor=make_forest_mixed_treatment_regressor(categorical_col),
    cv=2,
    max_samples_predict=500,
    random_state=0,
)
gps.fit(X, t, y)
gps_curve = gps.predict(t=grid).flatten()

Step 9: Fit DoublyRobustPseudoOutcome

DoublyRobustPseudoOutcome uses the same forest outcome model and the same stabilized score model, then smooths the pseudo-outcomes over treatment with a spline basis over the continuous coordinate and encoded indicators for the categorical coordinate.
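
In schematic form, one common pseudo-outcome construction looks like this; the library's exact formula may differ:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
y_obs = rng.normal(size=n)
m_hat = rng.normal(size=n)           # outcome-model prediction at (X_i, T_i)
w_hat = rng.uniform(0.5, 2.0, n)     # stabilized density-ratio-style score

# Pseudo-outcome: outcome-model prediction plus a score-weighted residual.
pseudo = m_hat + w_hat * (y_obs - m_hat)
# A final regressor then smooths `pseudo` over the treatment table,
# which is the role of make_spline_mixed_treatment_regressor here.
```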

dr = DoublyRobustPseudoOutcome(
    density_estimator=make_forest_permutation_weighting(),
    outcome_regressor=make_forest_mixed_treatment_regressor(categorical_col),
    pseudo_outcome_regressor=make_spline_mixed_treatment_regressor(
        continuous_col,
        categorical_col,
    ),
    cv=2,
    cross_fit=True,
    random_state=0,
)
dr.fit(X, t, y)
dr_curve = dr.predict(t=grid).flatten()

Step 10: Compare estimates

comparison = grid.with_columns(
    pl.Series("truth", truth_curve),
    pl.Series("DirectNoCovariates", naive_curve),
    pl.Series("DirectRegressor", direct_curve),
    pl.Series("GPS", gps_curve),
    pl.Series("DoublyRobustPseudoOutcome", dr_curve),
)

comparison.head(8)
shape: (8, 7)
t_0 t_0_bin truth DirectNoCovariates DirectRegressor GPS DoublyRobustPseudoOutcome
f64 cat f64 f64 f64 f64 f64
3.3844e-7 "bin_1" -1.999153 -1.878706 -2.125528 -2.065738 -1.938303
3.3844e-7 "bin_0" -2.593063 -1.872583 -2.642717 -2.570411 -2.380102
0.111111 "bin_1" -2.139635 -2.400981 -2.132084 -2.091885 -2.276765
0.111111 "bin_0" -2.77528 -3.115715 -2.733742 -2.740464 -2.746407
0.222222 "bin_1" -2.180113 -2.249773 -2.117787 -2.084515 -2.311381
0.222222 "bin_0" -2.827783 -2.650613 -2.744197 -2.708539 -2.786131
0.333333 "bin_1" -2.12773 -1.89681 -2.102549 -1.923527 -2.06814
0.333333 "bin_0" -2.759838 -2.003625 -2.710003 -2.752775 -2.665716
Code
fig, axes = plt.subplots(
    1,
    len(dataset.levels_),
    figsize=(12, 5.2),
    sharey=True,
    constrained_layout=True,
)
axes = np.atleast_1d(axes)
plot_joint_curves(
    grid,
    {
        "Truth": truth_curve,
        "DirectNoCovariates": naive_curve,
        "DirectRegressor": direct_curve,
        "GPS": gps_curve,
        "DoublyRobustPseudoOutcome": dr_curve,
    },
    ax=axes,
    separate_axes_by_group=True,
)
handles, labels = axes[0].get_legend_handles_labels()
axis_legend = axes[0].get_legend()
if axis_legend is not None:
    axis_legend.remove()
fig.suptitle("Estimated multidimensional response surface")
fig.legend(
    handles,
    labels,
    loc="outside lower center",
    ncols=3,
    frameon=False,
)
plt.show()

pl.DataFrame(
    {
        "estimator": [
            "DirectNoCovariates",
            "DirectRegressor",
            "GPS",
            "DoublyRobustPseudoOutcome",
        ],
        "mean_absolute_error": [
            mean_absolute_error(naive_curve, truth_curve),
            mean_absolute_error(direct_curve, truth_curve),
            mean_absolute_error(gps_curve, truth_curve),
            mean_absolute_error(dr_curve, truth_curve),
        ],
    }
).sort("mean_absolute_error")
shape: (4, 2)
estimator mean_absolute_error
str f64
"DoublyRobustPseudoOutcome" 0.10216
"DirectRegressor" 0.105087
"GPS" 0.130103
"DirectNoCovariates" 0.278844

Treat the MAE table as a compact summary, not the whole story. The faceted plot is still the best place to see whether an estimator misses one categorical level or only part of the dose range.

Note

With shared forest nuisance models in place, the remaining differences are more about estimator structure and stochastic variation than about an obviously underpowered outcome or score model. The doubly robust estimator still uses a different final smoother by design.

Optional: Verify discovered support with lookup

The main tutorial above uses four estimators because those are the current average-response estimators tagged as supporting both multidimensional treatments and mixed continuous/categorical treatment types.

estimator_catalog = all_causal_average_response_estimators(
    as_dataframe=True,
    filter_tags={"capability:multidimensional_treatment": True},
    return_tags=["capability:t_type", "capability:multidimensional_treatment"],
)

mixed_treatment_estimators = estimator_catalog[
    estimator_catalog["capability:t_type"].apply(
        lambda values: {"continuous", "categorical"}.issubset(set(values))
    )
][["name", "capability:t_type", "capability:multidimensional_treatment"]].reset_index(
    drop=True
)

mixed_treatment_estimators
name capability:t_type capability:multidimensional_treatment
0 DirectNoCovariates [continuous, categorical] True
1 DirectRegressor [continuous, categorical] True
2 DoublyRobustPseudoOutcome [continuous, categorical] True
3 GPS [continuous, categorical] True
density_catalog = all_density_estimators(
    as_dataframe=True,
    filter_tags={"capability:multidimensional_treatment": True},
    return_tags=["capability:t_type", "capability:multidimensional_treatment"],
)

mixed_treatment_densities = density_catalog[
    (density_catalog["name"] != "Pipeline")
    & density_catalog["capability:t_type"].apply(
        lambda values: {"continuous", "categorical"}.issubset(set(values))
    )
][["name", "capability:t_type", "capability:multidimensional_treatment"]].reset_index(
    drop=True
)

mixed_treatment_densities
name capability:t_type capability:multidimensional_treatment
0 CompositeFactorizedDensityEstimator [continuous, categorical] True
1 NaiveDensityEstimator [continuous, categorical] True
2 PermutationWeighting [continuous, categorical] True