Transforming observational data into an actionable causal inference model

Peter Haenni

Nov 24, 2023 • 1 min read

Smoking and lung cancer

Let's consider three typical causal inference scenarios. Each one contrasts what observational data tells us with what would happen if we actively intervened, which is the step from observational data to an actionable causal inference model.

1. Smoking and Lung Cancer

Causal Diagram: Smoking => Lung Cancer

Observation: P(LungCancer∣Smoking)

This notation represents the conditional probability distribution of lung cancer given the observed smoking status, without any external intervention. It describes lung cancer among people who happen to smoke (or not).

Intervention: P(LungCancer∣do(Smoking=value))

This notation represents the interventional probability distribution of lung cancer if we were to actively intervene and set the smoking variable to a specific value, for example by making someone smoke or not smoke.
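
The two quantities only differ when something else drives both smoking and lung cancer. The following is a minimal simulation sketch, assuming a hypothetical confounder (a "gene" that raises both the chance of smoking and the chance of lung cancer) that is not part of the diagram above, with invented probabilities. Observation filters the simulated population for smokers; intervention overrides the smoking equation with a fixed value, cutting the Gene -> Smoking arrow. The names `sample` and `estimate` and all numbers are illustrative only, not real epidemiological data.

```python
import random

# Hypothetical structural causal model with invented numbers (not real data):
#   Gene -> Smoking, Gene -> Cancer   (a confounder, assumed for illustration)
#   Smoking -> Cancer                 (the causal effect of interest)

def sample(rng, do_smoking=None):
    """Draw one individual; if do_smoking is set, Smoking's own equation is overridden."""
    gene = rng.random() < 0.3
    if do_smoking is None:
        smoking = rng.random() < (0.8 if gene else 0.2)  # gene carriers smoke more often
    else:
        smoking = do_smoking  # intervention: cut the Gene -> Smoking arrow
    # Cancer risk depends on both smoking and the gene
    p_cancer = 0.01 + 0.04 * smoking + 0.04 * gene + 0.06 * (smoking and gene)
    return smoking, rng.random() < p_cancer

def estimate(n=100_000, seed=0):
    rng = random.Random(seed)
    # Observation: P(Cancer | Smoking=1), the cancer rate among people who happen to smoke
    observed = [sample(rng) for _ in range(n)]
    smokers = [cancer for smoking, cancer in observed if smoking]
    p_obs = sum(smokers) / len(smokers)
    # Intervention: P(Cancer | do(Smoking=1)), the cancer rate if everyone is made to smoke
    p_do = sum(sample(rng, do_smoking=True)[1] for _ in range(n)) / n
    return p_obs, p_do

p_obs, p_do = estimate()
print(f"P(cancer | smoking=1)     ~ {p_obs:.3f}")  # exact model value: 0.043 / 0.38, about 0.113
print(f"P(cancer | do(smoking=1)) ~ {p_do:.3f}")   # exact model value: 0.3*0.15 + 0.7*0.05 = 0.080
```

With these made-up numbers, conditioning on smoking overstates the risk because observed smokers are disproportionately gene carriers; the intervention removes that selection effect.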

2. Education and Income

Causal Diagram: Education => Income

Observation: P(Income∣Education)

This notation represents the conditional probability distribution of income given the observed level of education, without any external intervention.

Intervention: P(Income∣do(Education=value))

This notation represents the interventional probability distribution of income if we were to actively intervene and set the education variable to a specific value, for example by providing a certain level of education.
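
When the confounders are measured, the interventional distribution can sometimes be computed from observational data alone. A minimal sketch, assuming binary variables, a single hypothetical confounder Z ("family background") that satisfies Pearl's back-door criterion, and made-up probabilities: the observational quantity weights each stratum of Z by P(z | x), while the interventional one weights it by P(z). The constant and function names are illustrative.

```python
# Hypothetical, made-up numbers. Z = family background (assumed confounder),
# X = has a university degree (1) or not (0), Y = high income (1) or not (0).
# Assumed graph: Z -> X, Z -> Y, X -> Y.
P_Z = {0: 0.6, 1: 0.4}                        # P(Z=z)
P_X1_GIVEN_Z = {0: 0.2, 1: 0.7}               # P(X=1 | Z=z)
P_Y1_GIVEN_XZ = {(0, 0): 0.10, (1, 0): 0.30,  # P(Y=1 | X=x, Z=z)
                 (0, 1): 0.30, (1, 1): 0.50}

def p_x_given_z(x, z):
    p1 = P_X1_GIVEN_Z[z]
    return p1 if x == 1 else 1 - p1

def observational(x):
    """P(Y=1 | X=x) = sum_z P(Y=1 | x, z) * P(z | x): strata weighted by who has x."""
    p_x = sum(p_x_given_z(x, z) * P_Z[z] for z in P_Z)
    return sum(P_Y1_GIVEN_XZ[(x, z)] * p_x_given_z(x, z) * P_Z[z] / p_x for z in P_Z)

def interventional(x):
    """P(Y=1 | do(X=x)) = sum_z P(Y=1 | x, z) * P(z): back-door adjustment."""
    return sum(P_Y1_GIVEN_XZ[(x, z)] * P_Z[z] for z in P_Z)

for x in (0, 1):
    print(f"X={x}: P(Y=1|X={x}) = {observational(x):.2f}, "
          f"P(Y=1|do(X={x})) = {interventional(x):.2f}")
# X=0: P(Y=1|X=0) = 0.14, P(Y=1|do(X=0)) = 0.18
# X=1: P(Y=1|X=1) = 0.44, P(Y=1|do(X=1)) = 0.38
```

Here the naive comparison suggests a degree raises the probability of high income by 0.30, while the adjusted (interventional) difference is 0.20, which by construction matches the effect inside each stratum of Z.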

3. Exercise and Weight Loss

Causal Diagram: Exercise => Weight Loss

Observation: P(WeightLoss∣Exercise)

This notation represents the conditional probability distribution of weight loss given the observed amount of exercise, without any external intervention.

Intervention: P(WeightLoss∣do(Exercise=value))

This notation represents the interventional probability distribution of weight loss if we were to actively intervene and set the exercise variable to a specific value, for example by enforcing or preventing exercise.
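
In a linear model the same distinction shows up as regression coefficients. A minimal sketch, assuming a hypothetical confounder (motivation, which drives both exercise and, say, diet) and invented linear-Gaussian equations with a true causal slope of 0.5: the slope of weight loss on exercise alone measures the observational association, while additionally adjusting for the confounder recovers how the mean of P(WeightLoss∣do(Exercise=value)) changes with the value. This only works because, in this toy model, motivation is measured and is the only confounder. The helpers `cov` and `simulate` are illustrative names.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def simulate(n=50_000, seed=1):
    """Hypothetical linear-Gaussian SCM; all coefficients are invented."""
    rng = random.Random(seed)
    motivation = [rng.gauss(0, 1) for _ in range(n)]             # assumed confounder
    exercise = [0.8 * m + rng.gauss(0, 1) for m in motivation]  # Motivation -> Exercise
    weight_loss = [0.5 * e + 1.0 * m + rng.gauss(0, 1)          # true causal slope: 0.5
                   for e, m in zip(exercise, motivation)]
    return motivation, exercise, weight_loss

m, e, w = simulate()

# Observational slope: regress WeightLoss on Exercise alone.
# It mixes the causal path with the back-door path Exercise <- Motivation -> WeightLoss.
naive = cov(e, w) / cov(e, e)

# Adjusted slope: coefficient on Exercise when regressing on Exercise and Motivation,
# solved from the 2x2 OLS normal equations with Cramer's rule.
s_ee, s_mm, s_em = cov(e, e), cov(m, m), cov(e, m)
s_ew, s_mw = cov(e, w), cov(m, w)
adjusted = (s_ew * s_mm - s_em * s_mw) / (s_ee * s_mm - s_em ** 2)

print(f"naive slope    ~ {naive:.2f}")     # population value 1.62 / 1.64, about 0.99
print(f"adjusted slope ~ {adjusted:.2f}")  # population value 0.50, the effect of do(Exercise)
```

The naive slope is roughly double the causal one because motivated people both exercise more and lose more weight for other reasons.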

In each example, the notation for observation, P(Outcome∣Variable), represents the probability distribution of the outcome when the variable is observed without external intervention.

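The notation for intervention, P(Outcome∣do(Variable=value)), represents the interventional probability distribution of the outcome under an active intervention that sets the variable to a specific value. The two coincide when nothing else influences both the variable and the outcome, as in a randomized experiment; when a common cause (a confounder) affects both, they generally differ.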

At the next level up are counterfactual questions, for example:

What if I had acted differently?
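
Answering such a question needs a full structural causal model rather than just the interventional distribution, because the answer depends on the particular individual's unobserved background factors. A minimal sketch of Pearl's three steps (abduction, action, prediction), assuming a hypothetical linear model for the exercise example with an invented effect size of 0.5 and a noise term U_W standing for everything else that affects weight loss; the class `LinearSCM` and its units are illustrative.

```python
from dataclasses import dataclass

@dataclass
class LinearSCM:
    """Hypothetical SCM: Exercise := U_E, WeightLoss := effect * Exercise + U_W."""
    effect: float = 0.5  # invented causal effect of one unit of exercise

    def weight_loss(self, exercise, u_w):
        return self.effect * exercise + u_w

    def counterfactual(self, observed_exercise, observed_weight_loss, new_exercise):
        # 1. Abduction: infer this individual's background factor U_W from the observation
        u_w = observed_weight_loss - self.effect * observed_exercise
        # 2. Action: replace the Exercise equation with do(Exercise = new_exercise)
        # 3. Prediction: recompute WeightLoss, keeping the same individual U_W
        return self.weight_loss(new_exercise, u_w)

scm = LinearSCM()
print(scm.counterfactual(observed_exercise=2.0, observed_weight_loss=3.0, new_exercise=5.0))
# 4.5
```

Under this toy model, someone who exercised 2 units and lost 3 units would have lost 4.5 units had they exercised 5. Their individual U_W = 2 is carried into the alternative scenario, which is what separates this counterfactual from P(WeightLoss∣do(Exercise=5)), an average over the whole population.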