Empirical Questions
Topic for 1/5.
Data science should begin with clarity about what questions data can (and cannot) answer. The questions data science can most directly answer are empirical questions, which involve quantities that could (at least hypothetically) be measured in the world. Empirical questions include:
- What proportion of U.S. adults were employed last month?
- What is the median household income in the U.S.?
- To what degree would employment and median household income increase if we adopted a tax policy to incentivize hiring?
Empirical questions: Descriptive and causal
This class will focus on two categories of empirical questions. Descriptive questions describe the world as it is. The first two questions above are descriptive: they summarize the proportion of adults employed and the median household income. If everyone told us their employment and income, we could calculate these quantities directly.
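To illustrate what it means to calculate descriptive quantities directly, the Python sketch below computes a proportion employed and a median household income from a tiny population. The data, variable names, and function names are all hypothetical. Real answers would come from survey or census data covering the population of interest.

```python
from statistics import median

# Hypothetical, invented data for illustration only.
# One entry per adult: True if employed last month.
adults_employed = [True, False, True, True, False]
# One entry per household: annual household income in dollars.
household_incomes = [52_000, 18_000, 75_000, 40_000, 61_000]


def proportion_employed(employed_flags):
    """Share of adults who were employed (True counts as 1, False as 0)."""
    return sum(employed_flags) / len(employed_flags)


def median_household_income(incomes):
    """Middle value of household incomes once they are sorted."""
    return median(incomes)


print(proportion_employed(adults_employed))        # 0.6
print(median_household_income(household_incomes))  # 52000
```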
Causal questions involve counterfactual outcomes that do not exist, but would be realized if some aspect of the world were different in a specific way. The third question above is causal because it involves a counterfactual state of the world: what would happen if a new tax policy were implemented.
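One way to make a counterfactual concrete is to write down, for a tiny population, the outcome each person would have in both worlds. In the sketch below, `y0` (employment without the new tax policy) and `y1` (employment if the policy were adopted) are hypothetical values chosen for illustration. The causal quantity compares the two worlds. Only one world is ever realized, however, so in real data one of the two lists is always missing.

```python
# Hypothetical potential outcomes for four adults (1 = employed, 0 = not).
# These values are invented; no real dataset contains both lists.
y0 = [1, 0, 1, 0]  # employment in the world as it is (no new tax policy)
y1 = [1, 1, 1, 0]  # employment in the counterfactual world with the policy


def employment_rate(outcomes):
    """Proportion of adults employed in one world."""
    return sum(outcomes) / len(outcomes)


# The causal question compares the two worlds for the same people.
effect = employment_rate(y1) - employment_rate(y0)
print(effect)  # 0.25

# What we can actually observe: the policy was not adopted, so y1 is unknown.
observed = {"y0": y0, "y1": None}
```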
Non-empirical questions
Other questions are not empirical. One common category is normative questions, which involve a judgment about what is right or what ought to be. Normative questions include:
- Is the median household income in the U.S. too low?
- Should the U.S. address inequality with the policies common in Sweden?
Data alone cannot answer non-empirical questions. It is impossible to prove with data that the median household income is too low, for example. You might personally believe that it is too low, and your own subjective beliefs might motivate your analysis. If you want to convince people that median incomes are too low, you might answer empirical questions about the median income and the median value of various costs such as rent, educational expenses, and food. Data science provides empirical answers to these questions. But data science stops short of value judgments about the results.
This class uses inequality as a motivating source of examples. Inequality is a topic to which many people bring normative commitments. As a scholar of inequality, you might hope that your objective evidence will spur people to act to promote equality and fairness. Our own normative commitments are often a good motivator for new empirical investigations. But for the analysis to be credible, it is important to keep normative judgments separate from empirical claims. The most credible data science claims are precise and focus on the questions that data can most directly answer.