Propensity Score Diagnostics

Checking balance

Love plots (Standardized Mean Difference)
ECDF plots

Standardized Mean Difference (SMD)

SMD in R

Calculate standardized mean differences

library(halfmoon)
library(tidyverse)

smds <- tidy_smd(
  df,
  .vars = c(confounder_1, confounder_2, ...),
  .group = exposure,
  .wts = wts # optional,
  make_dummy_vars = TRUE # optional
)library(halfmoon)
library(tidyverse)

smds <- tidy_smd(
  df,
  .vars = c(confounder_1, confounder_2, ...),
  .group = exposure,
  .wts = wts # optional,
  make_dummy_vars = TRUE # optional
)library(halfmoon)
library(tidyverse)

smds <- tidy_smd(
  df,
  .vars = c(confounder_1, confounder_2, ...),
  .group = exposure,
  .wts = wts # optional,
  make_dummy_vars = TRUE # optional
)library(halfmoon)
library(tidyverse)

smds <- tidy_smd(
  df,
  .vars = c(confounder_1, confounder_2, ...),
  .group = exposure,
  .wts = wts # optional,
  make_dummy_vars = TRUE # optional
)library(halfmoon)
library(tidyverse)

smds <- tidy_smd(
  df,
  .vars = c(confounder_1, confounder_2, ...),
  .group = exposure,
  .wts = wts # optional,
  make_dummy_vars = TRUE # optional
)library(halfmoon)
library(tidyverse)

smds <- tidy_smd(
  df,
  .vars = c(confounder_1, confounder_2, ...),
  .group = exposure,
  .wts = wts # optional,
  make_dummy_vars = TRUE # optional
)

Calculating SMDs

vars <- c(
  "sex", "race", "age", "education",
  "smokeintensity", "smokeyrs", 
  "exercise", "active", "wt71"
)

smds <- tidy_smd(
  nhefs_complete_wts,
  .vars = all_of(vars),
  .group = qsmk,
  .wts = w_ate,
  make_dummy_vars = TRUE
)

smds

Calculating SMDs

# A tibble: 28 × 4
   variable       method   qsmk      smd
   <chr>          <chr>    <chr>   <dbl>
 1 sex1           observed 1      0.160 
 2 race1          observed 1      0.177 
 3 age            observed 1     -0.282 
 4 education2     observed 1      0.112 
 5 education3     observed 1      0.0472
 6 education4     observed 1      0.0270
 7 education5     observed 1     -0.166 
 8 smokeintensity observed 1      0.217 
 9 smokeyrs       observed 1     -0.159 
10 exercise1      observed 1     -0.0398
# ℹ 18 more rows

Plotting SMDs

Plot them! (in a Love plot!)

ggplot(
  data = smds,
  aes(
    x = abs(smd), 
    y = variable, 
    group = method, 
    color = method
  )
) +  
  geom_love()ggplot(
  data = smds,
  aes(
    x = abs(smd), 
    y = variable, 
    group = method, 
    color = method
  )
) +  
  geom_love()ggplot(
  data = smds,
  aes(
    x = abs(smd), 
    y = variable, 
    group = method, 
    color = method
  )
) +  
  geom_love()ggplot(
  data = smds,
  aes(
    x = abs(smd), 
    y = variable, 
    group = method, 
    color = method
  )
) +  
  geom_love()

Love plot

Your turn 1

06:00

Create a Love Plot for the propensity score weighting you created in the previous exercise

ECDF

For continuous variables, it can be helpful to look at the whole distribution pre and post-weighting rather than a single summary measure

ECDF

Unweighted ECDF

ggplot(nhefs_complete_wts, aes(x = wt71, color = factor(qsmk))) + 
  geom_ecdf() +
  scale_color_manual(
    "Quit smoking", 
    values = c("#5154B8", "#5DB854"),
    labels = c("Yes", "No")
  ) + 
  xlab("Weight in Kg in 1971") + 
  ylab("Proportion <= x") ggplot(nhefs_complete_wts, aes(x = wt71, color = factor(qsmk))) + 
  geom_ecdf() +
  scale_color_manual(
    "Quit smoking", 
    values = c("#5154B8", "#5DB854"),
    labels = c("Yes", "No")
  ) + 
  xlab("Weight in Kg in 1971") + 
  ylab("Proportion <= x") ggplot(nhefs_complete_wts, aes(x = wt71, color = factor(qsmk))) + 
  geom_ecdf() +
  scale_color_manual(
    "Quit smoking", 
    values = c("#5154B8", "#5DB854"),
    labels = c("Yes", "No")
  ) + 
  xlab("Weight in Kg in 1971") + 
  ylab("Proportion <= x")

Unweighted ECDF

Weighted ECDF

ggplot(nhefs_complete_wts, aes(x = wt71, color = factor(qsmk))) +
  geom_ecdf(aes(weights = w_ate)) +
  scale_color_manual(
    "Quit smoking", 
    values = c("#5154B8", "#5DB854"),
    labels = c("Yes", "No")
  ) + 
  xlab("Weight in Kg in 1971") + 
  ylab("Proportion <= x (Weighted)") ggplot(nhefs_complete_wts, aes(x = wt71, color = factor(qsmk))) +
  geom_ecdf(aes(weights = w_ate)) +
  scale_color_manual(
    "Quit smoking", 
    values = c("#5154B8", "#5DB854"),
    labels = c("Yes", "No")
  ) + 
  xlab("Weight in Kg in 1971") + 
  ylab("Proportion <= x (Weighted)")

Weighted ECDF

Your turn 2

06:00

Create an unweighted ECDF examining the `park_temperature_high` confounder by whether or not the day had Extra Magic Hours.

Create a weighted ECDF examining the `park_temperature_high` confounder

Bonus! Weighted Tables in R

1. Create a “design object” to incorporate the weights

library(survey)

svy_des <- svydesign(
  ids = ~ 1,
  data = df,
  weights = ~ wts
)

2. Pass to `gtsummary::tbl_svysummary()`

library(gtsummary)
tbl_svysummary(svy_des, by = x) |>
  add_difference(everything() ~ "smd")
# modify_column_hide(ci) to hide CI column

Characteristic	0 N = 1,565 ¹	1 N = 1,561 ¹	Difference ²	95% CI ^2,3
WEIGHT IN KILOGRAMS IN 1971	69 (60, 80)	69 (59, 79)	0.01	-0.06, 0.08
0: WHITE 1: BLACK OR OTHER IN 1971			0.01	-0.06, 0.08
0	1,359 (87%)	1,352 (87%)
1	206 (13%)	209 (13%)
AGE IN 1971	43 (33, 52)	43 (33, 53)	-0.01	-0.08, 0.06
0: MALE 1: FEMALE			0.00	-0.07, 0.07
0	764 (49%)	764 (49%)
1	802 (51%)	797 (51%)
NUMBER OF CIGARETTES SMOKED PER DAY IN 1971	20 (10, 25)	20 (10, 30)	0.02	-0.05, 0.09
YEARS OF SMOKING	24 (15, 33)	24 (14, 33)	0.00	-0.07, 0.07
IN RECREATION, HOW MUCH EXERCISE? IN 1971, 0:much exercise,1:moderate exercise,2:little or no exercise			0.04	-0.03, 0.11
0	302 (19%)	294 (19%)
1	665 (42%)	691 (44%)
2	599 (38%)	576 (37%)
IN YOUR USUAL DAY, HOW ACTIVE ARE YOU? IN 1971, 0:very active, 1:moderately active, 2:inactive			0.03	-0.04, 0.10
0	700 (45%)	684 (44%)
1	718 (46%)	738 (47%)
2	147 (9.4%)	138 (8.9%)
¹ Median (Q1, Q3); n (%)
² Standardized Mean Difference
³ CI = Confidence Interval