Problem Set 9: Outcome Models for Causal Inference

This problem set uses the same data as Problem Set 8. To learn about the data, see that page.

This problem set is entirely coding. You will submit an .R script to a programming assignment in Gradescope. You should start by loading the data.

```r
# Load the tidyverse, which provides read_csv(), filter(), mutate(), and summarize()
library(tidyverse)
motherhood_simulated <- read_csv("https://soc114.github.io/data/motherhood_simulated.csv")
```
Create factual and counterfactual datasets. Use the `filter()` function to create two data objects: one named `mothers`, which contains all observations with `treated == TRUE`, and one named `nonmothers`, which contains all observations with `treated == FALSE`. If you need help with `filter()`, see R4DS 3.2.1.

Estimate a model. Use the `lm()` function to create a model for the probability of employment among non-mothers. Store your model in an object named `model_among_nonmothers`.

This model should:
- be estimated using the `lm()` function.
- use this model formula: `y ~ race + pre_age + pre_educ + pre_marital + pre_employed + pre_fulltime + pre_tenure + pre_experience`
- take your data containing non-mothers as the `data` argument.
- use the argument `weights = sampling_weight` to weight the model by the `sampling_weight` variable.
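
If the mechanics of these two steps are unfamiliar, the sketch below shows the same pattern on a small made-up dataset. It is an illustration under stated assumptions, not the solution: the data frame `toy`, its columns `x` and `w`, and every object name are hypothetical, and the single covariate `x` stands in for the longer formula above.

```r
library(dplyr)  # part of the tidyverse; provides filter() and tibble()

# Hypothetical toy data (NOT the problem set's data)
set.seed(1)
toy <- tibble(
  treated = rep(c(TRUE, FALSE), times = c(40, 60)),
  x = rnorm(100),                                # stand-in pre-treatment covariate
  w = runif(100, min = 0.5, max = 2),            # stand-in sampling weight
  y = rbinom(100, size = 1, prob = plogis(0.5 + x))  # binary outcome
)

# Split the rows by treatment status
toy_treated <- filter(toy, treated == TRUE)
toy_untreated <- filter(toy, treated == FALSE)

# Weighted linear probability model, fit on the untreated rows only.
# weights = w is looked up as a column of the data argument.
toy_model <- lm(y ~ x, data = toy_untreated, weights = w)
coef(toy_model)
```

Because the outcome is binary, the fitted values of this linear model can be read as predicted probabilities, which is why `lm()` is enough for this task.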
Predict counterfactuals. Now take the `mothers` data. Create a new column `y_as_nonmother` containing the predicted value of employment if this mother were counterfactually a non-mother. You might use `mutate()` and `predict()` in this step. Store your new dataset (with this one additional column) in an object called `mothers_predicted`.

Summarize by a weighted mean. Estimate the factual employment of mothers and the counterfactual employment that would be realized if they were non-mothers. Store your result in an object called `estimates`, which will be a `tibble` with 1 row and 2 columns named `y` and `y_as_nonmother`. To create this result,
- use `summarize()` to take the mean of the factual outcome `y` and the predicted counterfactual `y_as_nonmother` in your `mothers_predicted` data
- make sure to use `sampling_weight` to account for the fact that mothers are sampled from the population with unequal probabilities
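
The prediction and summary steps can also be sketched on made-up data. Again, this is only an illustration under assumptions: `toy`, `x`, `w`, `y_if_untreated`, and the other object names are hypothetical, and base R's `weighted.mean()` is one possible way to apply the weights.

```r
library(dplyr)  # part of the tidyverse; provides filter(), mutate(), summarize()

# Hypothetical toy data (NOT the problem set's data)
set.seed(1)
toy <- tibble(
  treated = rep(c(TRUE, FALSE), times = c(40, 60)),
  x = rnorm(100),                                # stand-in pre-treatment covariate
  w = runif(100, min = 0.5, max = 2),            # stand-in sampling weight
  y = rbinom(100, size = 1, prob = plogis(0.5 + x))  # binary outcome
)

# Outcome model fit on untreated rows only
toy_model <- lm(y ~ x, data = filter(toy, treated == FALSE), weights = w)

# Predict, for each treated row, the outcome expected under no treatment
toy_treated <- filter(toy, treated == TRUE)
toy_predicted <- mutate(
  toy_treated,
  y_if_untreated = predict(toy_model, newdata = toy_treated)
)

# Weighted means of the factual outcome and the predicted counterfactual
toy_estimates <- summarize(
  toy_predicted,
  y = weighted.mean(y, w = w),
  y_if_untreated = weighted.mean(y_if_untreated, w = w)
)
toy_estimates
```

The difference between the two columns of the result is the estimated average effect of treatment among the treated rows, under the assumption that the outcome model is correct.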