Discussion Meetings

This page contains a general plan for the topics of the 10 discussion section meetings that will occur this quarter.

Jan 7–9

In this discussion, we will introduce ourselves and ensure our computers are prepared for the class. This will involve checking that we have R, RStudio, tinytex, and tidyverse installed. See R basics.

Jan 14–16

Due to the fires, this week’s discussions became Zoom office hours.

Jan 21–23

In this discussion, you will register for an account to access the Current Population Survey data. Click here for detailed instructions.

Your TA may walk through how to look at the survey documentation for one or more of the variables. If time allows, we will break into small groups to look for other variables that might be interesting in a project, and then come back together to tell the class about them.

Jan 28–30

You should read the following paper before this discussion:

England, Paula, Andrew Levine, and Emma Mishel. 2020. Progress toward gender equality in the United States has slowed or stalled, PNAS 117(13):6990–6997.

The problem set reproduces findings from this paper. The discussion will focus on conceptual questions from the reading. The questions below are only guidelines, and your TA might have other topics to discuss.

1. The authors write that “change in the gender system has been deeply asymmetric.” Explain this in a sentence or two to someone who hasn’t read the article.

2. The authors discuss cultural changes that could lead to greater equality. Propose a question that could (hypothetically) be included in the CPS-ASEC questionnaire to help answer questions about cultural changes.

Tip

If you are not sure how to word a survey question, here are some examples from the American Time Use Survey, Current Population Survey, and General Social Survey.

3. The authors discuss institutional changes that could lead to greater equality. Propose a question that could (hypothetically) be included in the CPS-ASEC questionnaire to help answer questions about institutional changes.

4. What was one fact presented in this paper that most surprised you?

5. What about the graphs in this paper is compelling? What would you improve or change if you were producing these graphs?

Feb 4–6

Your TA will introduce the final project, which will be carried out in groups of about 5 students within your discussion section. Your TA will help you to form groups with common interest areas.

Possible topics include (but are not limited to) inequality by race or gender, trends in poverty or inequality over time, and the causal effect of education or other life experiences on subsequent outcomes. Another possibility is to form groups centered on a particular dataset, with the topic to be decided later.

All subsequent discussions are devoted to group work on the final project.

Feb 11–13

By the end of this discussion, your group should select a dataset for the final project. If you aren’t sure where to start, we suggest the options at ipums.org.

If you finish early, you are always welcome to move on to the tasks below.

Feb 18–20

By the end of this discussion, your group should

select an outcome variable
define the unit of analysis (e.g., a person, a school, a state) for which that outcome variable is defined
determine how you plan to aggregate the outcome variable across units (e.g., by a mean, median, proportion)

If you finish early, you are always welcome to move on to the tasks below.

Feb 25–27

By the end of this discussion, your group should discuss the target population and whether you will need to use sampling weights in your analysis.

If you finish early, you are always welcome to move on to the tasks below.

Mar 4–6

This discussion is open for your group to work on any remaining tasks to finish up your final project. Recall that PDF slides and writeup are due by 5pm on Mar 7, with one submission per group.

Mar 11–13

Your group will present your final project in this discussion. During each presentation by a group that is not your own, you will be asked to note something you appreciated about their presentation.

Other discussion possibilities

This part of the page has other possibilities for discussion topics. We may use these in one of the discussions depending on how the course material is progressing over the quarter.

Explain a ggplot function

Last week’s lecture introduced visualization with the ggplot() function in the tidyverse package. This discussion explores more of what ggplot() can do.

Form groups of about 4 students
Introduce yourself to your group members
Choose a graph that your group likes in Ch 1 of R For Data Science.
- Look at the code for that graph
- Discuss what each element of the code is doing
- Choose someone to present to the class
At the end of discussion, we will regroup. Each group will present the code for their chosen graph to the class

Explain a tidyverse function

This discussion explores more of what the tidyverse can do.

Form groups of about 4 students
Introduce yourself to your group members
Choose a function in Ch 3 of R For Data Science that we did not cover explicitly in class.
- Possibilities include: arrange(), distinct(), rename(), relocate(), the slice_ functions, ungroup()
- Find an example of the function in the chapter.
- Discuss how the code works.
- Prepare to explain to the class.
At the end of discussion, we will regroup. Each group will present the code for their chosen graph to the class

Write a function

Now that we have worked with functions in R, it is time to understand them on a deeper level. In this exercise, we will write our own functions.

A basic function

You can store a function as an object in your environment, just like any other object. The function below accepts a numeric variable x and returns twice the value of x.

double_x <- function(x) {
  doubled <- 2 * x
  return(doubled)
}

There are a few pieces to the code above

We created a function that has a name: double_x
Our function that takes one argument named x
The body of the function is the lines within the {}. These lines take the argument, do some things, and then return() a result. The object within return() is what R sends back after running the function.

Once you create that function, you can run it just like any other function.

double_x(x = 2)

[1] 4

A function with two arguments

A function can also take multiple arguments, such as one that adds x and y.

add_x_y <- function(x, y) {
  added <- x + y
  return(added)
}

which works as follows.

add_x_y(x = 2, y = 3)

[1] 5

Challenge: Write your own function

A function does not have to just take numbers as an argument. It can also take a dataset as an argument. Sometimes, we might want an estimator(data) function that takes data as an argument and applies an estimator() to that data, to return an estimate of something. You will create one such function here.

As an example, suppose three surveys ask people if they prefer chocolate ice cream (prefers_chocolate = TRUE) or vanilla ice cream (prefers_chocolate = FALSE). The survey also records whether the respondent is a child age = "child" or an adult age = "adult".

library(tidyverse)
sample_a <- tibble(
  age = c("child","child","child","adult","adult"),
  prefers_chocolate = c(T,T,F,F,F)
)
sample_b <- tibble(
  age = c("child","child","adult","adult"),
  prefers_chocolate = c(T,F,T,F)
)
sample_c <- tibble(
  age = c("child","child","child","adult","adult","adult"),
  prefers_chocolate = c(T,T,F,F,F,T)
)

We want to know the proportion preferring chocolate among children and adults in each sample. To estimate in sample_a, we would write

estimate <- sample_a |>
  group_by(age) |>
  summarize(prefers_chocolate = mean(prefers_chocolate))

Now write that within a function.

estimator <- function(data) {
  # do some things to data
  # return() your estimate
}

and apply the estimator to sample_a, sample_b, and sample_c.