Stanford University
(04-dags-exercises.qmd)
Specify a DAG with dagify(). Write your assumption that smoking causes cancer as a formula.
Plot the DAG with ggdag().
05:00
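A minimal sketch of this exercise, assuming the ggdag package is installed. The object name coffee_cancer_dag and the edges through addictive are taken from the DAG used later in these slides; treat this as one possible answer rather than the official solution.

```r
library(ggdag)

# Each formula reads "effect ~ cause": smoking causes cancer, and
# addictive behavior causes both smoking and coffee drinking.
# There is deliberately no formula for coffee -> cancer.
coffee_cancer_dag <- dagify(
  cancer ~ smoking,
  smoking ~ addictive,
  coffee ~ addictive,
  exposure = "coffee",
  outcome = "cancer"
)

# Quick plot of the DAG (the layout is chosen automatically)
dag_plot <- ggdag(coffee_cancer_dag)
dag_plot
```

Declaring the exposure and outcome up front lets later helpers look for paths and adjustment sets without being told which effect is of interest.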
(02-dags-exercises.qmd)
Ok, correlation != causation. But why not?
We want to know if x -> y
…
But other paths also cause associations
ggdag_paths()
Identify “backdoor” paths
Use tidy_dagitty() on coffee_cancer_dag to create a tidy DAG, then pass the results to dag_paths(). What’s different about these data?
Plot the open paths with ggdag_paths(). (Just give it coffee_cancer_dag rather than using dag_paths(); the quick plot function will do that for you.) Remember, since we assume there is no causal path from coffee to lung cancer, any open paths must be confounding pathways.
04:00
# A DAG with 4 nodes and 3 edges
#
# Exposure: coffee
# Outcome: cancer
#
# A tibble: 5 × 11
  set   name           x      y direction to       xend   yend
  <chr> <chr>      <dbl>  <dbl> <fct>     <chr>   <dbl>  <dbl>
1 1     addictive -1.59  -2.26  ->        coffee -2.72  -1.83
2 1     addictive -1.59  -2.26  ->        smoki… -0.334 -2.73
3 1     cancer     0.801 -3.16  <NA>      <NA>   NA     NA
4 1     coffee    -2.72  -1.83  <NA>      <NA>   NA     NA
5 1     smoking   -0.334 -2.73  ->        cancer  0.801 -3.16
# ℹ 3 more variables: circular <lgl>, label <chr>,
# path <chr>
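The tidy output above gains a path column once it has passed through dag_paths(). As a hedged sketch, the same open path can also be listed as text with dagitty::paths(), assuming the dagitty package is installed alongside ggdag. The DAG definition is repeated here so the block runs on its own.

```r
library(ggdag)

coffee_cancer_dag <- dagify(
  cancer ~ smoking,
  smoking ~ addictive,
  coffee ~ addictive,
  exposure = "coffee",
  outcome = "cancer"
)

# Tidy representation with path information added, as in the exercise
tidy_paths <- dag_paths(tidy_dagitty(coffee_cancer_dag))

# Text listing of every path between the declared exposure and outcome,
# with a flag saying whether each path is open (unblocked)
path_list <- dagitty::paths(coffee_cancer_dag)
path_list$paths
path_list$open
```

Because the DAG has no arrow from coffee to cancer, any path reported as open here is a confounding ("backdoor") path.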
We need to account for these open, non-causal paths
Randomization
Stratification, adjustment, weighting, matching, etc.
Use ggdag_adjustment_set() to visualize the adjustment sets. Add the arguments use_labels = "label" and text = FALSE.
Write a formula for each adjustment set, as you would for lm() or glm().
04:00
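One way to approach the formula step is sketched below. It assumes the dagitty package is available and uses dagitty::adjustmentSets() to list the minimal adjustment sets as text. The helper name set_formulas is made up for this illustration.

```r
library(ggdag)

coffee_cancer_dag <- dagify(
  cancer ~ smoking,
  smoking ~ addictive,
  coffee ~ addictive,
  exposure = "coffee",
  outcome = "cancer"
)

# Minimal sets of variables that close every open backdoor path
adj_sets <- dagitty::adjustmentSets(coffee_cancer_dag)
adj_sets

# Turn each set into a model formula: outcome ~ exposure + adjustment variables
# (set_formulas is a hypothetical name used only in this sketch)
set_formulas <- lapply(
  adj_sets,
  function(vars) reformulate(c("coffee", vars), response = "cancer")
)
set_formulas
```

Either formula could then be passed to lm() or glm(). Adjusting for one complete set is enough, so you can pick the set whose variables you actually measured.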
# A tibble: 500 × 4
   addictive cancer coffee smoking
       <dbl>  <dbl>  <dbl>   <dbl>
 1    0.569   3.11  -0.326  -1.29
 2    0.411   1.52   0.330  -1.57
 3    1.20    1.06  -0.557  -2.40
 4   -0.782  -0.504 -0.148   0.376
 5    0.0357 -0.709 -0.342  -1.53
 6    1.96    1.05  -1.90   -0.823
 7    1.13    0.211 -0.581  -0.534
 8    0.697   0.892 -1.36   -0.267
 9   -0.779   0.748  0.455   0.302
10   -1.13    0.930  0.568   0.742
# ℹ 490 more rows
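To see why adjustment matters, here is a small base R simulation with the same four variables. It is a sketch under stated assumptions: the linear effects of 0.8, the sample size, and the seed are all invented for illustration and are not the values behind the table above.

```r
set.seed(1)
n <- 10000

# Data generated to follow the DAG: coffee has NO effect on cancer
addictive <- rnorm(n)
coffee    <- 0.8 * addictive + rnorm(n)
smoking   <- 0.8 * addictive + rnorm(n)
cancer    <- 0.8 * smoking + rnorm(n)
sim <- data.frame(addictive, cancer, coffee, smoking)

# Unadjusted model: the open backdoor path induces an association
unadjusted <- coef(lm(cancer ~ coffee, data = sim))[["coffee"]]

# Adjusting for either adjustment set closes the path
adj_smoking   <- coef(lm(cancer ~ coffee + smoking, data = sim))[["coffee"]]
adj_addictive <- coef(lm(cancer ~ coffee + addictive, data = sim))[["coffee"]]

round(c(
  unadjusted = unadjusted,
  adj_smoking = adj_smoking,
  adj_addictive = adj_addictive
), 3)
```

The unadjusted coefficient for coffee should be clearly away from zero, while both adjusted coefficients should sit close to zero, which matches the assumption that coffee does not cause cancer.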
Recreate the DAG we’ve been working with using time_ordered_coords(), then visualize the DAG. You don’t need to use any arguments for this function, so coords = time_ordered_coords() will do.
02:00
library(ggdag)

# Same DAG as before; time_ordered_coords() lays the nodes out in time order
coffee_cancer_dag_to <- dagify(
  cancer ~ smoking,
  smoking ~ addictive,
  coffee ~ addictive,
  exposure = "coffee",
  outcome = "cancer",
  coords = time_ordered_coords(),
  labels = c(
    "coffee" = "Coffee",
    "cancer" = "Lung Cancer",
    "smoking" = "Smoking",
    "addictive" = "Addictive \nBehavior"
  )
)

# Plot with the readable labels instead of the raw node names
ggdag(coffee_cancer_dag_to, use_labels = "label", text = FALSE)
Adjustment sets and domain knowledge
Conduct a sensitivity analysis if you are missing an important variable
Using prediction metrics
The 10% rule
Predictors of the outcome, predictors of the exposure
Forgetting to consider time-ordering (something has to happen before something else to cause it!)
Selection bias and colliders (more later!)
Incorrect functional form for confounders (e.g. BMI often non-linear)
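The functional-form point can be illustrated with a short simulation. This is a sketch only: the variable names, the quadratic relationship, and the choice of a natural spline with 4 degrees of freedom are assumptions made for the example, not recommendations from the slides. It uses the splines package that ships with R.

```r
library(splines)

set.seed(2)
n <- 5000

# A confounder that affects exposure and outcome non-linearly;
# the exposure itself has NO effect on the outcome
confounder <- rnorm(n)
exposure   <- confounder^2 + rnorm(n)
outcome    <- confounder^2 + rnorm(n)

# Adjusting with a linear term leaves most of the confounding in place
linear_fit  <- lm(outcome ~ exposure + confounder)
linear_coef <- coef(linear_fit)[["exposure"]]

# Adjusting with a flexible term (natural spline) removes most of it
spline_fit  <- lm(outcome ~ exposure + ns(confounder, df = 4))
spline_coef <- coef(spline_fit)[["exposure"]]

round(c(linear = linear_coef, spline = spline_coef), 3)
```

Both models "adjust for the confounder", but only the spline version should bring the exposure coefficient near its true value of zero, because the linear term cannot capture the curved relationship.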