
Stanford University






Exercise (04-dags-exercises.qmd)
Specify a DAG with dagify(). Write your assumption that smoking causes cancer as a formula.
Plot the DAG with ggdag().
(02-dags-exercises.qmd)
Ok, correlation != causation. But why not?
We want to know if x -> y…
But other paths also cause associations
ggdag_paths()
Identify “backdoor” paths
Call tidy_dagitty() on coffee_cancer_dag to create a tidy DAG, then pass the results to dag_paths(). What’s different about these data?
Plot the open paths with ggdag_paths(). (Just give it coffee_cancer_dag rather than using dag_paths(); the quick plot function will do that for you.)
Remember, since we assume there is no causal path from coffee to lung cancer, any open paths must be confounding pathways.
# DAG:
# A `dagitty` DAG with: 4 nodes and 3 edges
# Exposure: coffee
# Outcome: cancer
# Paths: 1 open path: {coffee <- addictive -> smoking -> cancer}
#
# Data:
# A tibble: 5 × 11
  set   name           x      y direction to        xend  yend
  <chr> <chr>      <dbl>  <dbl> <fct>     <chr>    <dbl> <dbl>
1 1     addictive -1.59  -2.26  ->        coffee  -2.72  -1.83
2 1     addictive -1.59  -2.26  ->        smoki…  -0.334 -2.73
3 1     cancer     0.801 -3.16  <NA>      <NA>    NA     NA
4 1     coffee    -2.72  -1.83  <NA>      <NA>    NA     NA
5 1     smoking   -0.334 -2.73  ->        cancer   0.801 -3.16
# ℹ 3 more variables: label <chr>, path <chr>,
# path_type <chr>
#
# ℹ Use `pull_dag()` (`?pull_dag`) to retrieve the DAG object and `pull_dag_data()` (`?pull_dag_data`) for the data frame
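Output like the above can be reproduced with a short sketch. It assumes coffee_cancer_dag is built from the same formulas as the time-ordered version later in these slides (an assumption, since the original definition is not shown here), and it cross-checks the open path with dagitty::paths().

```r
library(ggdag)

# Assumed definition: same arrows as the coffee/cancer DAG in these slides
coffee_cancer_dag <- dagify(
  cancer ~ smoking,
  smoking ~ addictive,
  coffee ~ addictive,
  exposure = "coffee",
  outcome = "cancer"
)

# Tidy the DAG, then add path information to the tidy data
coffee_paths <- coffee_cancer_dag |>
  tidy_dagitty() |>
  dag_paths()
coffee_paths

# Cross-check with dagitty: every path from exposure to outcome,
# plus a flag for whether each path is open
path_info <- dagitty::paths(coffee_cancer_dag, from = "coffee", to = "cancer")
path_info$paths
path_info$open

# Quick plot of the open paths (stored rather than printed here)
paths_plot <- ggdag_paths(coffee_cancer_dag)
```

Because the DAG has no arrow from coffee to cancer, any path flagged as open here is a confounding pathway.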
We need to account for these open, non-causal paths
Randomization
Stratification, adjustment, weighting, matching, etc.

Use ggdag_adjustment_set() to visualize the adjustment sets. Add the arguments use_labels = "label" and text = FALSE.
Write a model formula for each adjustment set, as you would for lm() or glm().
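One way to approach this exercise is sketched below. The DAG definition is assumed to match the one used elsewhere in these slides; dagitty::adjustmentSets() lists the minimal adjustment sets, and dagitty::paths() with Z set shows whether conditioning closes the backdoor path.

```r
library(ggdag)

# Assumed definition of the DAG, with labels for plotting
coffee_cancer_dag <- dagify(
  cancer ~ smoking,
  smoking ~ addictive,
  coffee ~ addictive,
  exposure = "coffee",
  outcome = "cancer",
  labels = c(
    "coffee" = "Coffee",
    "cancer" = "Lung Cancer",
    "smoking" = "Smoking",
    "addictive" = "Addictive \nBehavior"
  )
)

# Minimal sets of variables that block the open non-causal path
adj_sets <- dagitty::adjustmentSets(coffee_cancer_dag)
adj_sets

# Is the backdoor path still open once we condition on smoking?
open_given_smoking <- dagitty::paths(
  coffee_cancer_dag,
  from = "coffee",
  to = "cancer",
  Z = "smoking"
)$open
open_given_smoking

# Visualize the adjustment sets (stored rather than printed here)
adj_plot <- ggdag_adjustment_set(
  coffee_cancer_dag,
  use_labels = "label",
  text = FALSE
)
```

Each set then maps to a model formula, for example cancer ~ coffee + smoking or cancer ~ coffee + addictive.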
# A tibble: 500 × 4
   addictive  cancer coffee smoking
       <dbl>   <dbl>  <dbl>   <dbl>
 1    0.0708  2.87   -0.565 -1.79
 2    0.626   1.63    0.434 -1.35
 3    1.98    1.43   -0.182 -1.62
 4   -0.198  -0.223   0.133  0.960
 5    1.66    0.0696  0.437  0.0954
 6    1.37    0.765  -2.18  -1.41
 7    0.791   0.0460 -0.745 -0.876
 8    0.531   0.813  -1.44  -0.432
 9   -0.861   0.708   0.415  0.220
10   -1.40    0.801   0.439  0.472
# ℹ 490 more rows
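To see why adjustment matters with data shaped like the tibble above, here is a minimal base-R sketch. It simulates its own data from the DAG rather than reusing the values shown; the effect size of 0.8 on every arrow, the sample size, and the seed are all hypothetical choices for illustration.

```r
set.seed(2024)
n <- 5000

# Hypothetical linear effects of 0.8 along each arrow in the DAG.
# Note there is no arrow from coffee to cancer.
addictive <- rnorm(n)
coffee <- 0.8 * addictive + rnorm(n)
smoking <- 0.8 * addictive + rnorm(n)
cancer <- 0.8 * smoking + rnorm(n)
sim <- data.frame(addictive, cancer, coffee, smoking)

# Unadjusted model: the open backdoor path induces an association
unadjusted <- lm(cancer ~ coffee, data = sim)

# One model per adjustment set: either variable blocks the path
adjusted_smoking <- lm(cancer ~ coffee + smoking, data = sim)
adjusted_addictive <- lm(cancer ~ coffee + addictive, data = sim)

# Compare the coffee coefficient across the three models
estimates <- c(
  unadjusted = coef(unadjusted)[["coffee"]],
  adjusted_smoking = coef(adjusted_smoking)[["coffee"]],
  adjusted_addictive = coef(adjusted_addictive)[["coffee"]]
)
round(estimates, 3)
```

Under these assumptions the unadjusted coffee coefficient is clearly positive even though coffee has no effect, while both adjusted coefficients should sit close to zero.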
Recreate the DAG we’ve been working with using time_ordered_coords(), then visualize the DAG. You don’t need to use any arguments for this function, so coords = time_ordered_coords() will do.
coffee_cancer_dag_to <- dagify(
  cancer ~ smoking,
  smoking ~ addictive,
  coffee ~ addictive,
  exposure = "coffee",
  outcome = "cancer",
  coords = time_ordered_coords(),
  labels = c(
    "coffee" = "Coffee",
    "cancer" = "Lung Cancer",
    "smoking" = "Smoking",
    "addictive" = "Addictive \nBehavior"
  )
)

ggdag(coffee_cancer_dag_to, use_labels = "label", text = FALSE)

Adjustment sets and domain knowledge
Conduct a sensitivity analysis if you are missing an important variable
Using prediction metrics
The 10% rule
Predictors of the outcome, predictors of the exposure
Forgetting to consider time-ordering (something has to happen before something else to cause it!)
Selection bias and colliders (more later!)
Incorrect functional form for confounders (e.g. BMI often non-linear)
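The last point can be illustrated with a small simulation. This is a sketch under invented assumptions: BMI has a U-shaped effect on both a hypothetical exposure and outcome, and the exposure has no effect on the outcome. It compares a linear BMI term with a natural spline from the splines package that ships with R.

```r
library(splines)

set.seed(2024)
n <- 5000

# Hypothetical confounder with a U-shaped (non-linear) effect
bmi <- rnorm(n, mean = 27, sd = 4)
bmi_effect <- ((bmi - 27) / 4)^2

# BMI drives both variables; the exposure does not affect the outcome
exposure <- bmi_effect + rnorm(n)
outcome <- bmi_effect + rnorm(n)

# Adjusting for BMI with the wrong (linear) functional form
linear_fit <- lm(outcome ~ exposure + bmi)

# Adjusting for BMI flexibly with a natural cubic spline
spline_fit <- lm(outcome ~ exposure + ns(bmi, df = 4))

# Compare the exposure coefficient under each specification
exposure_estimates <- c(
  linear = coef(linear_fit)[["exposure"]],
  spline = coef(spline_fit)[["exposure"]]
)
round(exposure_estimates, 3)
```

In this setup the linear adjustment leaves most of the confounding in place, while the spline adjustment should bring the exposure coefficient much closer to the true value of zero.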