The difference between correlation and causation

Understanding the difference between correlation and causation – of shark attacks and ice cream sales


By Candela Iglesias and Joanne Fielding

The graph above shows that there are more shark attacks when more ice cream is sold, so to stop the attacks, let’s stop eating ice cream.

Sounds preposterous? It is. It is also a very useful example of when correlation is not causation.

With so much research and information being shared about COVID-19, we at Alanda Health thought it would be useful to revisit the difference between correlation and causation. Mistaking a correlation for a causal relationship is a common source of confusion and of misinterpreted data.

As the shark and ice-cream example shows, we humans naturally tend to interpret correlation as causation. That is, when two variables (for example, ice-cream sales and shark attacks) change together (shark attacks rise when ice-cream sales rise), we tend to assume it is because one is causing the other (eating ice cream is somehow causing the shark attacks).

Correlation describes how strongly two variables are related, that is, how they change together (e.g. when one increases, the other also increases, or when one increases, the other decreases). But correlation tells you nothing about the WHY or HOW of the relationship; it only expresses that a relationship exists. It could even be due to pure chance, and in many cases it is. (If you want to see some funny spurious correlations, that is, ones due to chance, check out this website.)
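One common way to put a number on correlation is the Pearson correlation coefficient, which ranges from -1 (the variables move in opposite directions) to +1 (they move together), with 0 meaning no linear relationship. The Python sketch below is our own illustration, not an Alanda Health tool: `pearson` and `random_walk` are hypothetical helper names, and all data are simulated. It also hints at how chance alone can produce correlations: two random walks built from independent coin flips cannot influence each other, yet their coefficient can easily come out far from zero.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (the 1/n factors cancel out in the ratio).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def random_walk(steps, rng):
    """Running total of independent +1/-1 coin-flip steps."""
    position, path = 0, []
    for _ in range(steps):
        position += rng.choice((-1, 1))
        path.append(position)
    return path

# Two walks generated independently: neither can affect the other.
rng = random.Random(1)
walk_a = random_walk(200, rng)
walk_b = random_walk(200, rng)
print(f"correlation of two unrelated walks: {pearson(walk_a, walk_b):+.2f}")
```

Changing the seed passed to `random.Random` makes the coefficient jump around, which is a reminder that a correlation on its own says nothing about a mechanism.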

Causation goes a step further in analysing the relationship: it says that a change in one variable will CAUSE a change in the other (for example, a higher number of bathers will result in more shark attacks). In other words, one variable directly makes the other happen.

To prove a causal relationship, we need well-designed studies (such as randomized controlled trials, or RCTs), and we need to check the evidence against the Bradford Hill criteria (for example: is it plausible that one variable causes the other? Is there a biological gradient, i.e. a dose-response relationship? Are the results reproducible?).
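Why does randomization help? In a randomized trial, chance rather than people's own habits decides who gets the "treatment", so the groups end up similar on any hidden third variable. The Python sketch below is a toy simulation under invented assumptions: the probabilities and the `simulate` and `risk_difference` helpers are hypothetical, and it does not model the Bradford Hill criteria or how real trials are analysed. By construction, shark encounters depend only on being a beach-goer, never on ice cream. When beach-goers choose their own ice cream, eaters look riskier; when a coin flip assigns it, the gap essentially disappears.

```python
import random

def simulate(n, randomized, rng):
    """Return (ate_ice_cream, shark_encounter) pairs for n hypothetical people."""
    people = []
    for _ in range(n):
        beachgoer = rng.random() < 0.5  # hidden third variable
        if randomized:
            ate = rng.random() < 0.5  # coin flip, ignores beachgoer status
        else:
            # Observational setting: beach-goers self-select into ice cream.
            ate = rng.random() < (0.8 if beachgoer else 0.2)
        # Encounter risk depends ONLY on being at the beach, never on ice cream.
        encounter = rng.random() < (0.3 if beachgoer else 0.01)
        people.append((ate, encounter))
    return people

def risk_difference(people):
    """Encounter rate among ice-cream eaters minus the rate among non-eaters."""
    eaters = [enc for ate, enc in people if ate]
    others = [enc for ate, enc in people if not ate]
    return sum(eaters) / len(eaters) - sum(others) / len(others)

rng = random.Random(0)
print(f"observational difference: {risk_difference(simulate(20_000, False, rng)):+.3f}")
print(f"randomized difference:    {risk_difference(simulate(20_000, True, rng)):+.3f}")
```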

In the shark and ice-cream sales example, we are seeing a correlation, not a causal relationship (i.e. an increase in ice-cream sales is associated with, but DOES NOT CAUSE, an increase in shark attacks). It is possible that both rise at the same time because of a third variable: a larger number of bathers on the beaches thanks to summer weather.
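The third-variable explanation can be checked numerically with a partial correlation, which measures how two variables are related after removing the linear effect of a third. In the hypothetical Python model below, the number of bathers drives both ice-cream sales and shark attacks, while ice cream has no effect on sharks at all. All coefficients and noise levels are invented, and shark attacks are treated as a continuous index rather than counts for simplicity. The raw correlation comes out strong, while the partial correlation controlling for bathers lands close to zero.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def partial_corr(xs, ys, zs):
    """First-order partial correlation of xs and ys, controlling for zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

rng = random.Random(42)
days = 365
# Hypothetical daily number of bathers (warm days bring more people).
bathers = [rng.uniform(100, 5000) for _ in range(days)]
# Both outcomes depend on bathers plus independent noise;
# ice-cream sales never enter the shark-attack formula.
ice_cream = [0.5 * b + rng.gauss(0, 200) for b in bathers]
shark_attacks = [0.002 * b + rng.gauss(0, 1) for b in bathers]  # continuous stand-in

print(f"raw correlation:          {pearson(ice_cream, shark_attacks):.2f}")
print(f"controlling for bathers:  {partial_corr(ice_cream, shark_attacks, bathers):.2f}")
```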

So next time you see an article about COVID-19 reporting that some condition or drug seems to be associated with the disease, pause to consider whether there is enough data to prove causation, or whether it is just another case of shark attacks and ice-cream sales.
