Semi-synthetic continuous-treatment dataset from a regression problem.
The dataset starts from a supervised regression sample (X, y) returned by _load_dataset or by the optional load_dataset callable. The features and target are standardized, a clone of the supplied scikit-learn regressor is fit on the standardized data, and its fitted predictions become the observed treatment values. The structural response then adds a random spline effect of the treatment to the fitted regression mean.
If the raw regression sample is \((X_i^{\mathrm{raw}}, y_i^{\mathrm{raw}})\) for \(i = 1, \ldots, n\), the dataset first standardizes each feature column and the target:
$$
X_{ij} =
\frac{X_{ij}^{\mathrm{raw}} - \mu_j}{s_j},
\qquad
y_i = \frac{y_i^{\mathrm{raw}} - \mu_y}{s_y},
$$
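where \(\mu_j\) and \(s_j\) are the empirical mean and standard deviation of feature column \(j\), \(\mu_y\) and \(s_y\) are those of the target, and any zero empirical standard deviation is replaced by 1 in the code.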
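The standardization step can be sketched as follows. This is a minimal illustration, not the dataset's own code: the helper name `standardize` is hypothetical, and the use of the population standard deviation (`ddof=0`) is an assumption, since the text above does not say which convention the implementation uses.

```python
import numpy as np


def standardize(a):
    """Standardize along axis 0, replacing zero standard deviations by 1.

    Hypothetical helper for illustration; ddof=0 is an assumption.
    """
    a = np.asarray(a, dtype=float)
    mean = a.mean(axis=0)
    std = a.std(axis=0)
    # A constant column has std == 0; dividing by 1 instead maps it to zeros.
    std = np.where(std == 0, 1.0, std)
    return (a - mean) / std


# The second column is constant, so it exercises the zero-std guard.
X_raw = np.array([[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]])
y_raw = np.array([10.0, 20.0, 30.0])

X = standardize(X_raw)
y = standardize(y_raw)
```

The constant column becomes all zeros rather than NaN, which is the purpose of replacing a zero standard deviation by 1.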
A cloned regressor is then fit on the standardized regression task, and its fitted predictions define the treatment:
$$
\hat{m} = \operatorname{fit}(\text{regressor}, X, y),
\qquad
T_i = \hat{m}(X_i).
$$
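A minimal sketch of this step, assuming a `LinearRegression` as the supplied regressor and random data in place of the standardized sample (both are illustrative choices, not taken from the dataset code):

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Stand-in for the standardized regression sample (X, y).
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

regressor = LinearRegression()       # the user-supplied estimator (assumed here)
m_hat = clone(regressor).fit(X, y)   # only the clone is fit
T = m_hat.predict(X)                 # in-sample predictions serve as treatments
```

Because `clone` returns a new unfitted estimator with the same parameters, the object passed in by the caller is left unfitted.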
Let \(B(t) \in \mathbb{R}^K\) denote the spline basis produced by SplineTransformer after fitting on the realized treatments \(T_1, \ldots, T_n\). The random spline coefficients are sampled as
$$
\beta_k \stackrel{\mathrm{iid}}{\sim} \mathcal{N}\!\left(0, \frac{1}{K}\right),
\qquad k = 1, \ldots, K,
$$
and the treatment effect is centered over the observed treatment sample:
$$
g(t) = \lambda
\left(
B(t)^\top \beta
- \frac{1}{n} \sum_{i=1}^n B(T_i)^\top \beta
\right),
$$
where \(\lambda =\) treatment_effect_scale. If all realized treatments are identical, the implementation skips the spline fit and uses \(g(t) \equiv 0\).
The noiseless response surface exposed by predict_y is therefore
$$
\mu(x, t) = \hat{m}(x) + g(t).
$$
The observed outcomes returned by load are sampled at the realized treatments with additive Gaussian noise:
$$
Y_i = \mu(X_i, T_i) + \varepsilon_i,
\qquad
\varepsilon_i \stackrel{\mathrm{iid}}{\sim} \mathcal{N}(0, \sigma^2),
$$
where \(\sigma =\) outcome_noise_scale is the noise standard deviation used in the implementation.
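A short sketch of the sampling step. At the realized treatments \(T_i = \hat{m}(X_i)\), so \(\mu(X_i, T_i) = T_i + g(T_i)\). The function `g` below is a hypothetical stand-in for the random spline effect, and the treatments are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
outcome_noise_scale = 0.1          # sigma, the noise standard deviation

T = rng.normal(size=n)             # placeholder for T_i = m_hat(X_i)


def g(t):
    """Hypothetical stand-in for the centered spline effect."""
    return 0.5 * np.sin(t)


# Noiseless response at the realized treatments: m_hat(X_i) + g(T_i) = T_i + g(T_i).
mu = T + g(T)

# Observed outcomes: additive iid Gaussian noise with standard deviation sigma.
Y = mu + rng.normal(0.0, outcome_noise_scale, size=n)
```

Note that `outcome_noise_scale` is passed as the standard deviation, not the variance, matching the definition of \(\sigma\) above.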