Using Propensity Scores

Lucy D’Agostino McGowan

Wake Forest University

2021-09-01

Propensity scores

Matching

Weighting

Stratification

Direct Adjustment

Estimands

Greifer, N., & Stuart, E. A. (2021). Choosing the estimand when matching or weighting in observational studies. arXiv preprint arXiv:2106.10577. See also Choosing Estimands in our book.

Propensity scores

Matching

Weighting

Stratification

Direct Adjustment

Target estimands

Average Treatment Effect (ATE)

\[\tau = E[Y(1) - Y(0)]\]

Target estimands

Average Treatment Effect among the Treated (ATT)

\[\tau = E[Y(1) - Y(0) | Z = 1]\]

Matching in R (ATT)

library(MatchIt)
m <- matchit(qsmk ~ sex + 
    race + age + I(age^2) + education + 
    smokeintensity + I(smokeintensity^2) + 
    smokeyrs + I(smokeyrs^2) + exercise + active + 
    wt71 + I(wt71^2), 
  data = nhefs_complete)
m
A matchit object
 - method: 1:1 nearest neighbor matching without replacement
 - distance: Propensity score
             - estimated with logistic regression
 - number of obs.: 1566 (original), 806 (matched)
 - target estimand: ATT
 - covariates: sex, race, age, I(age^2), education, smokeintensity, I(smokeintensity^2), smokeyrs, I(smokeyrs^2), exercise, active, wt71, I(wt71^2)

Matching in R (ATT)

matched_data <- get_matches(m, id = "i")
glimpse(matched_data)
Rows: 806
Columns: 71
$ i                 <chr> "11", "1220", "15", "1082", "18"…
$ subclass          <fct> 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6,…
$ weights           <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ seqn              <dbl> 428, 23045, 446, 22294, 596, 140…
$ qsmk              <dbl> 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,…
$ death             <dbl> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1,…
$ yrdth             <dbl> NA, NA, 88, NA, NA, NA, NA, NA, …
$ modth             <dbl> NA, NA, 1, NA, NA, NA, NA, NA, N…
$ dadth             <dbl> NA, NA, 3, NA, NA, NA, NA, NA, N…
$ sbp               <dbl> 135, 159, 141, 113, 151, NA, 125…
$ dbp               <dbl> 89, 91, 79, 73, 80, NA, 71, 85, …
$ sex               <fct> 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0,…
$ age               <dbl> 43, 49, 71, 36, 48, 51, 56, 40, …
$ race              <fct> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,…
$ income            <dbl> 19, 22, 17, 21, 18, 22, 20, 18, …
$ marital           <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
$ school            <dbl> 12, 12, 0, 12, 12, 9, 12, 10, 17…
$ education         <fct> 3, 3, 1, 3, 3, 2, 3, 2, 5, 5, 3,…
$ ht                <dbl> 176.5938, 160.2812, 147.0938, 17…
$ wt71              <dbl> 63.96, 47.29, 75.64, 68.38, 62.0…
$ wt82              <dbl> 79.83226, 53.07031, 56.69905, 73…
$ wt82_71           <dbl> 15.8722571, 5.7803073, -18.94095…
$ birthplace        <dbl> 42, 30, NA, 19, 36, 47, NA, 42, …
$ smokeintensity    <dbl> 30, 20, 40, 9, 2, 5, 20, 5, 30, …
$ smkintensity82_71 <dbl> -30, 0, -40, 1, -2, 1, -20, 0, -…
$ smokeyrs          <dbl> 24, 29, 41, 30, 30, 29, 11, 20, …
$ asthma            <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ bronch            <dbl> 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0,…
$ tb                <dbl> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,…
$ hf                <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ hbp               <dbl> 0, 2, 0, 2, 0, 0, 0, 0, 0, 2, 1,…
$ pepticulcer       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ colitis           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ hepatitis         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ chroniccough      <dbl> 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0,…
$ hayfever          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ diabetes          <dbl> 0, 2, 0, 2, 0, 0, 0, 0, 0, 2, 0,…
$ polio             <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ tumor             <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ nervousbreak      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ alcoholpy         <dbl> 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1,…
$ alcoholfreq       <dbl> 3, 0, 4, 0, 1, 2, 3, 1, 3, 0, 2,…
$ alcoholtype       <dbl> 3, 3, 4, 1, 2, 3, 4, 1, 2, 3, 3,…
$ alcoholhowmuch    <dbl> 2, 2, NA, 6, 1, 3, NA, 5, 1, 3, …
$ pica              <dbl> 0, 2, 0, 2, 0, 0, 0, 0, 0, 2, 0,…
$ headache          <dbl> 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1,…
$ otherpain         <dbl> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,…
$ weakheart         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ allergies         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ nerves            <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ lackpep           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ hbpmed            <dbl> 0, 2, 0, 2, 0, 0, 0, 0, 0, 2, 1,…
$ boweltrouble      <dbl> 0, 2, 1, 2, 0, 0, 0, 0, 0, 2, 0,…
$ wtloss            <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ infection         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ active            <fct> 1, 2, 1, 0, 1, 0, 0, 0, 0, 1, 2,…
$ exercise          <fct> 1, 1, 1, 0, 1, 0, 2, 0, 0, 2, 2,…
$ birthcontrol      <dbl> 2, 0, 0, 2, 0, 2, 0, 0, 2, 0, 2,…
$ pregnancies       <dbl> NA, 4, 15, NA, 3, NA, 4, 2, NA, …
$ cholesterol       <dbl> 173, 279, 229, 200, 225, 199, 23…
$ hightax82         <dbl> 0, 0, NA, 0, 0, 0, NA, 0, 0, 0, …
$ price71           <dbl> 2.346680, 2.104980, NA, 2.199707…
$ price82           <dbl> 1.797363, 1.698242, NA, 1.847900…
$ tax71             <dbl> 1.3649902, 1.0498047, NA, 1.1022…
$ tax82             <dbl> 0.5718994, 0.4399414, NA, 0.5718…
$ price71_82        <dbl> 0.54931641, 0.40686035, NA, 0.35…
$ tax71_82          <dbl> 0.7929688, 0.6099854, NA, 0.5303…
$ id                <int> 11, 1274, 15, 1135, 18, 564, 23,…
$ censored          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ older             <dbl> 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1,…
$ distance          <dbl> 0.2384597, 0.2381918, 0.2090935,…

Target estimands

Average Treatment Effect among the Controls (ATC)

\[\tau = E[Y(1) - Y(0) | Z = 0]\]

Matching in R (ATC)

library(MatchIt)
m <- matchit(qsmk ~ sex + 
    race + age + I(age^2) + education + 
    smokeintensity + I(smokeintensity^2) + 
    smokeyrs + I(smokeyrs^2) + exercise + active + 
    wt71 + I(wt71^2), 
  data = nhefs_complete,
  estimand = "ATC")
m
A matchit object
 - method: 1:1 nearest neighbor matching without replacement
 - distance: Propensity score
             - estimated with logistic regression
 - number of obs.: 1566 (original), 806 (matched)
 - target estimand: ATC
 - covariates: sex, race, age, I(age^2), education, smokeintensity, I(smokeintensity^2), smokeyrs, I(smokeyrs^2), exercise, active, wt71, I(wt71^2)

Target estimands

Average Treatment Effect among the Matched (ATM)

Matching in R (ATM)

library(MatchIt)
m <- matchit(qsmk ~ sex + 
    race + age + I(age^2) + education + 
    smokeintensity + I(smokeintensity^2) + 
    smokeyrs + I(smokeyrs^2) + exercise + active + 
    wt71 + I(wt71^2), 
  data = nhefs_complete,
  link = "linear.logit", 
  caliper = 0.1) 
m

Observations with propensity scores (on the linear logit scale) within 0.1 standard errors (the caliper) will be discarded

Matching in R (ATM)

A matchit object
 - method: 1:1 nearest neighbor matching without replacement
 - distance: Propensity score [caliper]
             - estimated with logistic regression and linearized
 - caliper: <distance> (0.063)
 - number of obs.: 1566 (original), 780 (matched)
 - target estimand: ATT
 - covariates: sex, race, age, I(age^2), education, smokeintensity, I(smokeintensity^2), smokeyrs, I(smokeyrs^2), exercise, active, wt71, I(wt71^2)

Matching in R (ATM)

matched_data <- get_matches(m, id = "i")
glimpse(matched_data)
Rows: 780
Columns: 71
$ i                 <chr> "11", "1220", "15", "1082", "18"…
$ subclass          <fct> 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6,…
$ weights           <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ seqn              <dbl> 428, 23045, 446, 22294, 596, 140…
$ qsmk              <dbl> 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,…
$ death             <dbl> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1,…
$ yrdth             <dbl> NA, NA, 88, NA, NA, NA, NA, NA, …
$ modth             <dbl> NA, NA, 1, NA, NA, NA, NA, NA, N…
$ dadth             <dbl> NA, NA, 3, NA, NA, NA, NA, NA, N…
$ sbp               <dbl> 135, 159, 141, 113, 151, NA, 125…
$ dbp               <dbl> 89, 91, 79, 73, 80, NA, 71, 85, …
$ sex               <fct> 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1,…
$ age               <dbl> 43, 49, 71, 36, 48, 51, 56, 40, …
$ race              <fct> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,…
$ income            <dbl> 19, 22, 17, 21, 18, 22, 20, 18, …
$ marital           <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3,…
$ school            <dbl> 12, 12, 0, 12, 12, 9, 12, 10, 17…
$ education         <fct> 3, 3, 1, 3, 3, 2, 3, 2, 5, 5, 2,…
$ ht                <dbl> 176.5938, 160.2812, 147.0938, 17…
$ wt71              <dbl> 63.96, 47.29, 75.64, 68.38, 62.0…
$ wt82              <dbl> 79.83226, 53.07031, 56.69905, 73…
$ wt82_71           <dbl> 15.8722571, 5.7803073, -18.94095…
$ birthplace        <dbl> 42, 30, NA, 19, 36, 47, NA, 42, …
$ smokeintensity    <dbl> 30, 20, 40, 9, 2, 5, 20, 5, 30, …
$ smkintensity82_71 <dbl> -30, 0, -40, 1, -2, 1, -20, 0, -…
$ smokeyrs          <dbl> 24, 29, 41, 30, 30, 29, 11, 20, …
$ asthma            <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ bronch            <dbl> 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0,…
$ tb                <dbl> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,…
$ hf                <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ hbp               <dbl> 0, 2, 0, 2, 0, 0, 0, 0, 0, 2, 0,…
$ pepticulcer       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ colitis           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ hepatitis         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ chroniccough      <dbl> 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0,…
$ hayfever          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ diabetes          <dbl> 0, 2, 0, 2, 0, 0, 0, 0, 0, 2, 0,…
$ polio             <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ tumor             <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ nervousbreak      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ alcoholpy         <dbl> 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1,…
$ alcoholfreq       <dbl> 3, 0, 4, 0, 1, 2, 3, 1, 3, 0, 1,…
$ alcoholtype       <dbl> 3, 3, 4, 1, 2, 3, 4, 1, 2, 3, 3,…
$ alcoholhowmuch    <dbl> 2, 2, NA, 6, 1, 3, NA, 5, 1, 3, …
$ pica              <dbl> 0, 2, 0, 2, 0, 0, 0, 0, 0, 2, 0,…
$ headache          <dbl> 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0,…
$ otherpain         <dbl> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,…
$ weakheart         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ allergies         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ nerves            <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ lackpep           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ hbpmed            <dbl> 0, 2, 0, 2, 0, 0, 0, 0, 0, 2, 0,…
$ boweltrouble      <dbl> 0, 2, 1, 2, 0, 0, 0, 0, 0, 2, 0,…
$ wtloss            <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ infection         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ active            <fct> 1, 2, 1, 0, 1, 0, 0, 0, 0, 1, 1,…
$ exercise          <fct> 1, 1, 1, 0, 1, 0, 2, 0, 0, 2, 2,…
$ birthcontrol      <dbl> 2, 0, 0, 2, 0, 2, 0, 0, 2, 0, 0,…
$ pregnancies       <dbl> NA, 4, 15, NA, 3, NA, 4, 2, NA, …
$ cholesterol       <dbl> 173, 279, 229, 200, 225, 199, 23…
$ hightax82         <dbl> 0, 0, NA, 0, 0, 0, NA, 0, 0, 0, …
$ price71           <dbl> 2.346680, 2.104980, NA, 2.199707…
$ price82           <dbl> 1.797363, 1.698242, NA, 1.847900…
$ tax71             <dbl> 1.3649902, 1.0498047, NA, 1.1022…
$ tax82             <dbl> 0.5718994, 0.4399414, NA, 0.5718…
$ price71_82        <dbl> 0.54931641, 0.40686035, NA, 0.35…
$ tax71_82          <dbl> 0.7929688, 0.6099854, NA, 0.5303…
$ id                <int> 11, 1274, 15, 1135, 18, 564, 23,…
$ censored          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ older             <dbl> 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1,…
$ distance          <dbl> -1.1611429, -1.1626186, -1.33039…

Your Turn 1

10:00

Using the propensity scores you created in the previous exercise, create a “matched” data set using the ATM method with a caliper of 0.2.

Propensity scores

Matching

Weighting

Stratification

Direct Adjustment

Target estimands: ATE

Average Treatment Effect (ATE)

\[\Large w_{ATE} = \frac{Z_i}{p_i} + \frac{1-Z_i}{1 - p_i}\]

(Z / p) + ((1 - Z) / (1 - p))

Target estimands: ATT & ATC

Average Treatment Effect Among the Treated (ATT) \[\Large w_{ATT} = \frac{p_i Z_i}{p_i} + \frac{p_i (1-Z_i)}{1-p_i}\]

((p * Z) / p) + ((p * (1 - Z)) / (1 - p))

Target estimands: ATT & ATC

Average Treatment Effect Among the Controls (ATC) \[\Large w_{ATC} = \frac{(1-p_i)Z_i}{p_i} + \frac{(1-p_i)(1-Z_i)}{(1-p_i)}\]

(((1 - p) * Z) / p) + (((1 - p) * (1 - Z)) / (1 - p))

Target estimands: ATM & ATO

Average Treatment Effect Among the Evenly Matchable (ATM) \[\Large w_{ATM} = \frac{\min \{p_i, 1-p_i\}}{Z_ip_i + (1-Z_i)(1-p_i)}\]

pmin(p, 1 - p) / (Z * p + (1 - Z) * (1 - p))

Target estimands: ATM & ATO

Average Treatment Effect Among the Overlap Population \[\Large w_{ATO} = (1-p_i)Z_i + p_i(1-Z_i)\]

(1 - p) * Z + p * (1 - Z)

Histogram of propensity scores

ATE

ATT

ATC

ATM

ATO

ATE in R


Average Treatment Effect (ATE) \(w_{ATE} = \frac{Z_i}{p_i} + \frac{1-Z_i}{1 - p_i}\)

library(propensity)
df <- propensity_model |>
  augment(type.predict = "response", data = nhefs_complete) |>
  mutate(w_ate = wt_ate(.fitted, qsmk)) 

Your Turn

10:00

Using the propensity scores you created in the previous exercise, add the ATE weights to your data frame

Stretch: Using the same propensity scores, create ATM weights