In the 1970s, a scientific controversy took place that still holds valuable lessons today. It started with a campaign promoting the benefits of estrogen for the symptoms of menopause. Shortly after, women who received estrogen were diagnosed with endometrial cancer at a higher rate. This set the scene for a decades-long debate: did estrogen accelerate the diagnosis of endometrial cancer, or did estrogen cause endometrial cancer? Unfortunately, this dilemma could have been avoided, and many lives saved.

Does estrogen accelerate the diagnosis of endometrial cancer?

Two researchers from Yale, Feinstein and Horwitz, suggested that estrogen causes uterine bleeding, which brings women taking estrogen to medical attention and leads to more screening for cancer. They proposed that this ascertainment bias increased the detection of early stages of endometrial cancer, and they performed a case-control study of women who received treatment for uterine bleeding.

Does estrogen cause endometrial cancer?

On the other hand, two researchers from Harvard, Hutchison and Rothman, argued that estrogen use increases the risk of endometrial cancer. They contended that restricting the analysis to women with uterine bleeding is susceptible to a selection bias that distorts the relationship between estrogen and endometrial cancer.

So, who was right?

To help settle this battle of the biases, I need to introduce the key concepts of directed acyclic graphs (DAGs). DAGs help to illustrate the dilemma, show how to design a study of the association between estrogen use and endometrial cancer, and expose the different underlying assumptions each group of researchers made before undertaking their analysis.

What is a DAG?

A DAG is a pictorial representation of variables and their causal relationships, drawn as arrows that start at causes and point to effects. Put simply, a DAG follows two rules:

  1. A DAG must be acyclic (as the name implies): you cannot follow the arrows from a variable and end up back at that same variable.
  2. For a DAG to be complete, any common cause of two or more variables in the DAG must be included.
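The first rule can be checked mechanically. The minimal Python sketch below (an illustration, not part of the original study) stores a DAG as hypothetical (cause, effect) pairs and applies Kahn's topological-sort algorithm: if every variable can be removed in cause-before-effect order, there is no cycle. The second rule cannot be checked this way, because deciding which common causes exist requires subject-matter knowledge.

```python
from collections import deque


def is_acyclic(edges):
    """Return True if the graph given as (cause, effect) pairs has no directed cycle.

    Uses Kahn's algorithm: repeatedly remove variables with no incoming arrows.
    """
    nodes = {node for edge in edges for node in edge}
    indegree = {node: 0 for node in nodes}
    children = {node: [] for node in nodes}
    for cause, effect in edges:
        children[cause].append(effect)
        indegree[effect] += 1

    queue = deque(node for node in nodes if indegree[node] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    # Variables on a cycle never reach indegree 0, so they are never removed
    return removed == len(nodes)


example = [("estrogen use", "uterine bleeding"), ("estrogen use", "endometrial cancer")]
print(is_acyclic(example))                                             # True
print(is_acyclic(example + [("endometrial cancer", "estrogen use")]))  # False
```

Adding an arrow from endometrial cancer back to estrogen use creates the loop estrogen use → endometrial cancer → estrogen use, so the second call reports that the graph is no longer a DAG.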

Here’s an example DAG for estrogen use and endometrial cancer, in which arrows point from estrogen use to both uterine bleeding and endometrial cancer. In this DAG, estrogen use is a confounder of the relationship between uterine bleeding and endometrial cancer because it affects both.

What should you look for in a DAG?

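First, when reading or drawing a DAG, pay attention to the pathways. The simplest pathway connects three variables: an exposure, an outcome, and a third variable that can play one of three roles in the relationship between them:

  1. Confounder: a common cause of the exposure and the outcome (exposure ← confounder → outcome). The pathway is open unless you condition on the confounder.
  2. Mediator: a variable on the causal chain from the exposure to the outcome (exposure → mediator → outcome). The pathway is open unless you condition on the mediator.
  3. Collider: a common effect of the exposure and the outcome (exposure → collider ← outcome). The pathway is closed unless you condition on the collider.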

Subject-matter expertise is a prerequisite for designing a DAG. Once a DAG is complete, researchers can use it to identify biases and avoid them by changing the status of a pathway from open to closed (or vice versa): by conditioning on a variable (i.e., through study design or statistical methods), by deliberately not conditioning on a variable, or even by eliminating the pathway altogether.
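The role a third variable plays (confounder, mediator or collider), and whether conditioning on it opens or closes the pathway, follows directly from the direction of the arrows. Below is a simplified Python sketch with hypothetical function names and a hypothetical exposure A, outcome B and third variable Z. It looks only at the middle variable itself and ignores the fuller d-separation rule that conditioning on a descendant of a collider also opens a pathway.

```python
def middle_role(edges, a, m, b):
    """Classify m on the three-variable pathway a - m - b.

    Assumes m is joined to both a and b by an arrow in one direction or the other.
    """
    arrows = set(edges)
    a_into_m = (a, m) in arrows
    b_into_m = (b, m) in arrows
    if a_into_m and b_into_m:
        return "collider"    # a -> m <- b
    if not a_into_m and not b_into_m:
        return "confounder"  # a <- m -> b
    return "mediator"        # a -> m -> b (or a <- m <- b)


def pathway_is_open(role, conditioned):
    """Simplified rule: a collider closes the pathway unless conditioned on;
    a confounder or mediator leaves it open unless conditioned on."""
    if role == "collider":
        return conditioned
    return not conditioned


# Hypothetical exposure A, outcome B and third variable Z
pathways = {
    "confounder": [("Z", "A"), ("Z", "B")],
    "mediator": [("A", "Z"), ("Z", "B")],
    "collider": [("A", "Z"), ("B", "Z")],
}
for label, edges in pathways.items():
    role = middle_role(edges, "A", "Z", "B")
    print(label, "open:", pathway_is_open(role, False),
          "open when conditioned on Z:", pathway_is_open(role, True))
```

The collider is the odd one out: its pathway is closed by default, and conditioning on it is what opens it.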

How does learning about DAGs help resolve the researchers’ problem?

The entire DAG shows the relationships between the variables in question along with their causal associations as suggested by subject-matter experts: estrogen → uterine bleeding, estrogen → cancer (unmeasured), cancer (unmeasured) → uterine bleeding, cancer (unmeasured) → diagnosis of cancer, and uterine bleeding → diagnosis of cancer. Notice how considering all four variables jointly changes the relationship between estrogen and diagnosis of cancer. Uterine bleeding is now both a mediator between estrogen and diagnosis of cancer and a collider between estrogen and cancer (unmeasured), while cancer (unmeasured) is a confounder of uterine bleeding and diagnosis of cancer.

The DAG highlights that the pathway in contention runs through cancer (unmeasured), which confounds the relationship between uterine bleeding and diagnosis of cancer. The researchers from Yale selected women who bled, effectively conditioning on uterine bleeding. This closed the pathway estrogen → uterine bleeding → diagnosis of cancer, as intended. However, because uterine bleeding is also a collider between estrogen and cancer (unmeasured), it also opened the previously blocked pathway estrogen → uterine bleeding ← cancer (unmeasured) → diagnosis of cancer, inducing a spurious association. Unfortunately, this led the Yale researchers to conclude that detection bias (estrogen use led to uterine bleeding, so more women using estrogen were screened for endometrial cancer) had overestimated the association between estrogen use and endometrial cancer.
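To check this argument, the Python sketch below encodes the full DAG as (cause, effect) pairs using the arrows described in the text (including the disputed estrogen → cancer arrow, with variable names shortened), enumerates every pathway between estrogen and diagnosis of cancer, and tests whether each one is open with and without conditioning on uterine bleeding. The function names are hypothetical, and the openness check is a simplified form of d-separation that ignores descendants of colliders; that simplification makes no difference for this graph and these conditioning sets.

```python
EDGES = [
    ("estrogen", "bleeding"),
    ("estrogen", "cancer"),     # the hypothesised causal effect under dispute
    ("cancer", "bleeding"),
    ("cancer", "diagnosis"),
    ("bleeding", "diagnosis"),  # bleeding prompts a work-up
]


def all_paths(edges, start, end):
    """Enumerate simple paths from start to end, ignoring arrow direction."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    paths, stack = [], [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == end:
            paths.append(path)
            continue
        for nxt in sorted(neighbours[path[-1]]):
            if nxt not in path:
                stack.append(path + [nxt])
    return paths


def is_open(edges, path, conditioned):
    """Simplified d-separation check for a single path."""
    arrows = set(edges)
    for a, m, b in zip(path, path[1:], path[2:]):
        collider = (a, m) in arrows and (b, m) in arrows
        if collider and m not in conditioned:
            return False  # an unconditioned collider blocks the path
        if not collider and m in conditioned:
            return False  # a conditioned confounder or mediator blocks the path
    return True


for conditioned in (set(), {"bleeding"}):
    print("conditioning on:", sorted(conditioned) or "nothing")
    for path in all_paths(EDGES, "estrogen", "diagnosis"):
        state = "open" if is_open(EDGES, path, conditioned) else "closed"
        print("   ", " - ".join(path), "->", state)
```

Conditioning on bleeding closes estrogen - bleeding - diagnosis and estrogen - cancer - bleeding - diagnosis, but opens estrogen - bleeding - cancer - diagnosis, which is closed when nothing is conditioned on.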

The researchers from Harvard, meanwhile, were correct in assuming that selecting patients based on uterine bleeding does not simply correct for detection bias. Instead, it introduces a larger selection bias that reduces the estimated effect of estrogen use on endometrial cancer.
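A small simulation makes the Harvard argument concrete. The Python sketch below generates a hypothetical cohort in which estrogen truly triples the risk of cancer and both estrogen and cancer make bleeding more likely. All probabilities are invented for illustration, and, for simplicity, cancer is observed directly rather than through a diagnosis. Restricting the analysis to women who bleed is conditioning on the collider.

```python
import random


def simulate(n=200_000, seed=1):
    """Hypothetical cohort: estrogen -> cancer, and both estrogen and cancer -> bleeding."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        estrogen = rng.random() < 0.3
        cancer = rng.random() < (0.06 if estrogen else 0.02)  # true risk ratio = 3
        bleeding = rng.random() < 0.1 + 0.2 * estrogen + 0.4 * cancer
        rows.append((estrogen, cancer, bleeding))
    return rows


def risk_ratio(rows):
    """Risk of cancer in estrogen users divided by the risk in non-users."""
    users = [c for e, c, _ in rows if e]
    non_users = [c for e, c, _ in rows if not e]
    return (sum(users) / len(users)) / (sum(non_users) / len(non_users))


rows = simulate()
rr_all = risk_ratio(rows)
rr_bleeders = risk_ratio([r for r in rows if r[2]])  # select only women who bleed
print(f"risk ratio, all women:            {rr_all:.2f}")
print(f"risk ratio, only women who bleed: {rr_bleeders:.2f}")
```

With these parameters the full-cohort risk ratio should land close to the true value of 3, while the expected risk ratio among women who bleed is about 1.4: among women who bleed, a non-user is more likely to be bleeding because of cancer, which masks part of the effect of estrogen.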

With the entire DAG in view, it becomes clear that a single well-designed study, one in which the causal pathway between uterine bleeding and diagnosis of cancer does not exist, could have avoided the dispute. An example of a study that eliminates this pathway is one that screens women for endometrial cancer at regular intervals whether they bleed or not (i.e., not conditioning on uterine bleeding). The resulting DAG is the same as the entire DAG but without the arrow from uterine bleeding to diagnosis of cancer.

In this study, the only open pathway from estrogen to diagnosis of cancer runs through cancer (unmeasured), because the pathway through uterine bleeding is blocked at the collider. So if we don’t find an association between estrogen and diagnosis of cancer, we have no evidence that estrogen causes cancer. Conversely, if there is an association between estrogen and diagnosis of cancer in this study, then there is evidence that estrogen causes cancer.
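The same kind of hypothetical simulation can contrast the two designs. In the Python sketch below, diagnosis under the first design requires bleeding to prompt a work-up, while under the redesigned study everyone is screened, so diagnosis depends only on cancer. The parameter `effect` is the true risk ratio of cancer for estrogen users; all probabilities are illustrative assumptions, not estimates from the original studies.

```python
import random


def simulate(effect, n=300_000, seed=7):
    """Hypothetical cohort; `effect` is the true risk ratio of cancer for estrogen users."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        estrogen = rng.random() < 0.3
        cancer = rng.random() < 0.02 * (effect if estrogen else 1)
        bleeding = rng.random() < 0.1 + 0.2 * estrogen + 0.4 * cancer
        # Design 1: cancer is only diagnosed when bleeding prompts a work-up
        diagnosed_if_bleeding = cancer and bleeding
        # Design 2: everyone is screened at regular intervals, bleeding or not
        diagnosed_by_screening = cancer
        rows.append((estrogen, diagnosed_if_bleeding, diagnosed_by_screening))
    return rows


def risk_ratio(rows, column):
    """Risk of diagnosis (column 1 or 2) in estrogen users divided by the risk in non-users."""
    users = [r[column] for r in rows if r[0]]
    non_users = [r[column] for r in rows if not r[0]]
    return (sum(users) / len(users)) / (sum(non_users) / len(non_users))


results = {}
for effect in (1, 3):
    rows = simulate(effect)
    results[effect] = (risk_ratio(rows, 1), risk_ratio(rows, 2))
    print(f"true effect {effect}: bleeding-triggered RR = {results[effect][0]:.2f}, "
          f"regular screening RR = {results[effect][1]:.2f}")
```

With no true effect, the bleeding-triggered design still shows an expected risk ratio of about 1.4, which is pure detection bias, while regular screening stays near 1. With a true risk ratio of 3, regular screening recovers roughly 3, whereas the bleeding-triggered design exaggerates it (about 4.2 in expectation).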

This short story illustrates how DAGs can be used to draw assumptions, understand a problem, and explore different solutions using study design and statistical methods. These features make DAGs a valuable part of study protocols, statistical analysis plans and other documents intended to communicate the underlying aims and scientific understanding of a proposed or published study.

Are you interested in conducting a causal analysis?

Veramed is a CRO specialised in statistics and programming. Our Evidence and Value Generation team has extensive experience in designing and conducting Real World Evidence studies and is happy to help support your causal analysis, including DAG design, consultations, and writing of protocols, SAPs and analytics.



Hernán MA. Causal Diagrams: Draw Your Assumptions Before Your Conclusions. edX course.

Horwitz RI, Feinstein AR. Alternative analytic methods for case-control studies of estrogens and endometrial cancer. N Engl J Med 1978; 299: 1089–1094.

Hutchison GB, Rothman KJ. Correcting a bias? N Engl J Med 1978; 299: 1129–1130.

Robins JM. Data, design, and background knowledge in etiologic inference. Epidemiology 2001; 12: 313–320.