Nonparametric estimation
If time allows, this will be covered on Feb 4. Here are slides
Under the assumption of conditional exchangeability given a sufficient adjustment set \(\vec{X}\), the average causal effect within subgroups defined by \(\vec{X}\) is identified by the difference in means across the treatment \(A\) within this subgroup.
This page walks through how to estimate that difference in means and (optinally) re-aggregate the conditional average causal effects to average causal effect estimates. We illustrate in a simple case represented by the DAG below.
In our illustration, the treatment a
is binary and the outcome y
is numeric. The confounder x
is a sufficient adjustment set and takes the value 0, 1, or 2.
You can simulate a dataset of 1,000 observations by running the code below.
library(tidyverse)
<- tibble(id = 1:1000) |>
simulated mutate(
x = rbinom(n(), 2, .5),
pi = case_when(
== 0 ~ .3,
x == 1 ~ .5,
x == 2 ~ .8
x
),a = rbinom(n(), 1, pi),
y = rnorm(n(), mean = x*a),
# For illustration, we include a sampling weight variable
# because in real settings there may be sampling weights
sampling_weight = 1
)
Estimate CATEs by subgroup means
To estimate the conditional average effect, we first group the data by the confounder x
and treatment a
. By conditional exchangeability, the rows with a == TRUE
and the rows with a == FALSE
are each a simple random sample of all rows within the subgroup defined by x
, so we can estimate the mean outcome \(\E(Y^a\mid X = x)\) by the subgroup sample mean, \(\E(Y\mid A = a, X = x)\).
\[ \hat{\text{E}}\left(Y^a\mid X = x\right) = \underbrace{\frac{\sum_{i:X_i=x,A_i=a}Y_i}{n_{x,a}}}_{\substack{\text{Sample mean}\\\text{in subgroup}\\X = x, A = a}} \]
<- simulated |>
average_potential_outcomes # Group by confounders and treatment
group_by(x,a) |>
# Summarize weighted mean of Y, weighted by sampling weights
summarize(
mean_y = weighted.mean(y, w = sampling_weight),
.groups = "drop"
|>
) print()
# A tibble: 6 × 3
x a mean_y
<int> <int> <dbl>
1 0 0 -0.0384
2 0 1 0.188
3 1 0 0.0994
4 1 1 0.953
5 2 0 -0.0851
6 2 1 1.95
As expected, the code above produces estimates for the mean of y
within subgroups defined by x
and a
. The conditional average causal effect is difference in y
across values of a
, within subgroups defined by x
. To take this difference, we pivot wider.
<- average_potential_outcomes |>
cate # Pivot wider and difference over A to estimate CATE
pivot_wider(
names_from = a,
names_prefix = "mean_y",
values_from = "mean_y"
|>
) mutate(cate = mean_y1 - mean_y0) |>
print()
# A tibble: 3 × 4
x mean_y0 mean_y1 cate
<int> <dbl> <dbl> <dbl>
1 0 -0.0384 0.188 0.226
2 1 0.0994 0.953 0.854
3 2 -0.0851 1.95 2.03
As expected, the code above estimates the Conditional Average Treatment Effect (CATE) within each population subgroup defined by the pre-treatment variable x
.
Re-aggregating to ATE
To determine the overall average treatment effect, we can re-aggregated the CATE estimates weighted by the size of each stratum: how many people have x == 1
, x == 2
, and x == 3
. First, determine the size of each stratum.
\[ \hat{\text{P}}\left(X = x\right) = \frac{1}{n}\sum_i \mathbb{I}(X_i = x) \]
<- simulated |>
stratum_sizes # Count sum of sampling weight in each stratum
count(x, wt = sampling_weight) |>
# Convert count to a proportion of the population
mutate(stratum_size = n / sum(n)) |>
print()
# A tibble: 3 × 3
x n stratum_size
<int> <dbl> <dbl>
1 0 244 0.244
2 1 479 0.479
3 2 277 0.277
Then merge and take the weighted mean of CATE over the strata.
\[ \hat{\text{E}}\left(Y^1-Y^0\right) = \sum_x \hat{\text{P}}(X = x)\left[\hat{\text{E}}\left(Y^1\mid X = x\right) - \hat{\text{E}}\left(Y^0\mid X = x\right)\right] \]
<- cate |>
ate left_join(stratum_sizes, by = join_by(x)) |>
summarize(ate = weighted.mean(cate, w = stratum_size)) |>
print()
# A tibble: 1 × 1
ate
<dbl>
1 1.03
Estimate by treatment weights
One can equivalently take a sampling view of causal inference. When I observe a unit \(i\) with outcome \(Y_i = Y_i^{A_i}\), the probability of observing this outcome is the product of the probability that the unit was sampled multiplied by the probability that the unit received treatment value \(A_i\), conditional on confounders. Just as one generates sampling weights for descriptive population inference by the inverse of the probability of sample inclusion, one can generate inverse probability of treatment weights for causal inference. The full weight for each unit \(i\) is then the product of these two weights:
\[ w_i = \text{SamplingWeight}_i\times \frac{1}{\text{P}(A_i\mid X_i)} \]
The code below constructs these weights.
<- simulated |>
data_weighted # Within confounder subgroups, determine the probability
# of the observed treatment
group_by(x) |>
mutate(probability_of_a = case_when(
# For treated units, proportion treated
== 1 ~ mean(a),
a # For untreated units, proportion untreated
== 0 ~ mean(1 - a)
a |>
)) # Calculate the total weight as the product
# of the sampling weight and the inverse probability of treatment
mutate(
total_weight = sampling_weight * (1 / probability_of_a)
|>
) ungroup() |>
print()
# A tibble: 1,000 × 8
id x pi a y sampling_weight probability_of_a total_weight
<int> <int> <dbl> <int> <dbl> <dbl> <dbl> <dbl>
1 1 1 0.5 0 0.969 1 0.547 1.83
2 2 2 0.8 1 3.48 1 0.819 1.22
3 3 1 0.5 1 1.71 1 0.453 2.21
4 4 0 0.3 0 -0.797 1 0.709 1.41
5 5 0 0.3 0 -0.603 1 0.709 1.41
6 6 1 0.5 0 -1.98 1 0.547 1.83
7 7 1 0.5 0 0.216 1 0.547 1.83
8 8 2 0.8 1 1.65 1 0.819 1.22
9 9 0 0.3 0 -0.416 1 0.709 1.41
10 10 2 0.8 1 1.51 1 0.819 1.22
# ℹ 990 more rows
Once we have weights, we can directly estimate the Average Treatment Effect (ATE) by the mean difference in the weighted average outcomes in each treatment group,
\[ \hat{\text{E}}(Y^a) = \frac{\sum_{i:A_i=a} w_i Y_i}{\sum_{i:A_i=a}w_i} \] and report the difference in the estimated \(\hat{E}(Y^1)\) and \(\hat{E}(Y^0)\).
|>
data_weighted # Summarize weighted mean outcomes within treatment groups
group_by(a) |>
summarize(
estimate = weighted.mean(
y, w = total_weight
)|>
) # Difference across treatment groups and estimate the ATE
pivot_wider(names_from = a, names_prefix = "mean_y", values_from = estimate) |>
mutate(ate = mean_y1 - mean_y0) |>
print()
# A tibble: 1 × 3
mean_y0 mean_y1 ate
<dbl> <dbl> <dbl>
1 0.0147 1.04 1.03
Comparing the two estimates
Above, we illustrated two strategies to estimate the Average Treatment Effect.
- Estimate the CATE by subgroup means of \(Y\) within \(X\) and \(A\), then aggregate across strata weighted by size.
- Estimate the probability of treatment \(A\) given \(X\), then estimate the ATE by a weighted mean.
You may notice that the estimates by the two approaches are mathematically identical. At least under nonparametric estimation, one can show that these two estimators are two different ways of thinking about the exact same estimation process.
Concluding thoughts
Nonparametric estimation is worth knowing because of its simplicity: estimate the conditional average causal effect within subgroups by taking the difference in mean outcomes within subgroups. However, nonparametric estimation only works in practice when the confounding variables take only a few discrete values (in this example, x
was always 0, 1, or 2). In realistic settings, there are often many confounding variables that take many values each. For this reason, causal effects are most often estimated in practice by the model-based methods that we will learn next.