Problem Set 9: Outcome Models for Causal Inference

This problem set uses the same data as Problem Set 8. To learn about the data, see that page.

This problem set is entirely coding. You will submit an .R script to a programming assignment in Gradescope. You should start by loading the data.

library(tidyverse)
motherhood_simulated <- read_csv("https://soc114.github.io/data/motherhood_simulated.csv")
  1. Create factual and counterfactual datasets. Use the filter() function to create two data objects: one named mothers which contains all observations with treated == TRUE and one named nonmothers which contains all observations with treated == FALSE. If you need help with filter(), see R4DS 3.2.1.
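As a reminder of how filter() splits a dataset on a logical column, here is a minimal sketch on made-up data. The object names toy, toy_treated, and toy_untreated are invented for illustration and are not part of the assignment.

```r
library(dplyr)

# Toy data invented for illustration only (not the problem set data)
toy <- tibble(
  id = 1:6,
  treated = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE)
)

# Keep only the rows where treated is TRUE
toy_treated <- filter(toy, treated == TRUE)

# Keep only the rows where treated is FALSE
toy_untreated <- filter(toy, treated == FALSE)
```

Every row of toy lands in exactly one of the two subsets, so their row counts add up to nrow(toy).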

  2. Estimate a model. Use the lm() function to create a model for the probability of employment among non-mothers. Store your model in an object named model_among_nonmothers.

This model should:

  3. Predict counterfactuals. Now take the mothers data. Create a new column y_as_nonmother containing the predicted value of employment if this mother were counterfactually a non-mother. You might use mutate() and predict() in this step. Store your new dataset (with this one additional column) in an object called mothers_predicted.
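The general pattern of fitting a model on one group and predicting for another can be sketched on made-up data as follows. Everything here is an assumption for illustration: the columns x and outcome, the formula outcome ~ x, and all toy_ object names are hypothetical, and the formula is not the model specification this problem set asks for.

```r
library(dplyr)

# Toy data invented for illustration only
toy <- tibble(
  treated = c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE),
  x       = c(1, 3, 0, 1, 2, 3),
  outcome = c(1, 0, 0, 0, 1, 1)
)
toy_treated   <- filter(toy, treated)
toy_untreated <- filter(toy, !treated)

# Fit the model using only the untreated rows.
# The formula is a placeholder, not the assignment's specification.
toy_model <- lm(outcome ~ x, data = toy_untreated)

# Predict for the treated rows: newdata tells predict() to use their
# predictor values with coefficients learned from the untreated rows
toy_treated_predicted <- mutate(
  toy_treated,
  outcome_if_untreated = predict(toy_model, newdata = toy_treated)
)
```

The key design choice is the newdata argument. Without it, predict() returns fitted values for the rows the model was estimated on rather than for the group whose counterfactual you want.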

  4. Summarize by a weighted mean. Estimate the factual employment of mothers and their counterfactual employment that would be realized if they were non-mothers. Store your result in an object called estimates which will be a tibble with 1 row and 2 columns named y and y_as_nonmother. To create this result,
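A minimal sketch of collapsing two columns into a one-row tibble of weighted means, using summarize() with base R's weighted.mean(), might look like the following. The data are made up, and the weight column name w is hypothetical; check the data documentation for the actual name of the weight variable.

```r
library(dplyr)

# Toy data invented for illustration only
toy_predicted <- tibble(
  outcome              = c(1, 0, 1),
  outcome_if_untreated = c(0.5, 0.75, 1),
  w                    = c(1, 2, 1)  # hypothetical weight column
)

# One row, one column per quantity, each a weighted mean over all rows
toy_estimates <- summarize(
  toy_predicted,
  outcome              = weighted.mean(outcome, w = w),
  outcome_if_untreated = weighted.mean(outcome_if_untreated, w = w)
)
```

Because summarize() is called without grouping, the result has a single row, which matches the shape described for estimates.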
