You’ve just seen a report that concludes what you ‘already knew’. Maybe it’s that more diversity on a board increases profit, or that eating butter causes heart disease, or that first-born children make better CEOs. You pass on, feeling suitably confirmed in your world view. But how reliable is that conclusion? Perhaps, like me, you are not a statistician, so you feel you can’t challenge the complex mathematics. How does a lay person check whether a report is likely to be reliable?
Here are a few tips on what to look out for:
1. ‘Chinese Whispers’ (‘Telephone game’)
Are you reading the original research, a tweet, a Linkedin post written by someone who may have just read the headline or conclusion, or even a media report based on an article about a research study?
Tip: Check whether the original research is sourced or linked in the article, and at least read the original research summary rather than relying on someone else’s interpretation.
2. Correlation isn’t causation
Even if the research finds a ‘statistically significant’ correlation between A and B, that is not proof that A causes B. In fact, B might cause A, or C might cause both A and B. For example, sleeping with shoes on reportedly correlates with waking up with a headache. Does that mean wearing shoes in bed is the cause of headaches, or are both more plausibly effects of something else, such as going to bed drunk? Make sure that the research has explicitly considered causation (and not just assumed it) and has at least offered a hypothesis to explain it. You could pick two dozen variables at random and would probably find at least one significant correlation among them, but that is meaningless unless you started out with an idea of why and how the correlation could exist.
Tip: If a study doesn’t explain how it thinks the causation works, treat it as an interesting statistical finding, not proof of anything.
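To see how easily chance alone produces ‘significant’ correlations, here is a minimal sketch (illustrative numbers only, nothing from any real study): it generates two dozen completely unrelated random variables and counts how many pairs nonetheless correlate beyond the rough 5% cutoff.

```python
import random
import math

random.seed(1)

# 24 unrelated random variables, 50 observations each -- pure noise.
n_obs = 50
variables = [[random.gauss(0, 1) for _ in range(n_obs)] for _ in range(24)]

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# 24 variables give 276 distinct pairs. |r| > 0.28 is roughly the 5%
# two-tailed significance cutoff for n = 50, so we'd expect around
# 14 "significant" correlations by pure chance.
spurious = sum(
    1
    for i in range(24)
    for j in range(i + 1, 24)
    if abs(pearson_r(variables[i], variables[j])) > 0.28
)
print(f"'Significant' correlations found in pure noise: {spurious} of 276 pairs")
```

None of these correlations means anything, because there was no prior idea of why any pair should be related.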
3. How big is the sample?
The rule of thumb in opinion polls is that you need a minimum of about a thousand people to get robust results. Plenty of studies use much smaller samples, and newspapers are full of sensational research based on fewer than a hundred people. An effect has to be extremely powerful to be clearly visible and significant in a small sample…and there aren’t many of those in new research.
Tip: Check the sample size; if it is less than a thousand, the study is unlikely to be proving very much.
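The thousand-person rule of thumb comes from the standard polling margin-of-error formula, roughly 1/√n at 95% confidence for a proportion near 50%. A quick sketch of the arithmetic:

```python
import math

def margin_of_error(n):
    """Approximate 95% margin of error for a sample proportion near 0.5."""
    return 1.96 * math.sqrt(0.25 / n)

# n = 1000 gives roughly +/-3%, which is why pollsters treat a thousand
# respondents as the practical minimum; n = 100 gives roughly +/-10%.
for n in (100, 400, 1000):
    print(f"n = {n:5d}: margin of error approx +/-{margin_of_error(n):.1%}")
```

With only a hundred people, any effect smaller than about ten percentage points simply drowns in sampling noise.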
4. How big is the effect?
You might get a statistically significant correlation, but the effect (ie the correlation coefficient) may be so small that it is of no real value. Imagine a study concluding that there is a relationship between wearing metal zips in your clothing and being struck by lightning (at 95% statistical significance). Would you discard your zips in a storm? You would need to know how much zips increase the chance of a lightning strike. If they increased your risk by only 5%, would you really worry? Then again, your chance of being hit by lightning in the UK is about 0.0003%, so even a 50% increase in your exposure gives you an incremental risk of only 0.00015%. Even a high probability of a very rare event is still a very rare event.
Tip: Always check how big the effect is. A high certainty of a marginal effect is still just a marginal effect.
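The zip-and-lightning arithmetic can be worked through explicitly (the ~0.0003% base rate is the article’s illustrative UK figure, not a precise statistic):

```python
# Base rate: annual chance of being struck by lightning, approx 0.0003%.
base_rate = 0.0003 / 100

# A headline-grabbing "50% higher risk" is a *relative* increase...
relative_increase = 0.50

# ...but the *absolute* extra risk is tiny: 50% of a very rare event.
absolute_extra_risk = base_rate * relative_increase
print(f"Extra absolute risk from the zips: {absolute_extra_risk:.7%}")
```

The relative figure (50%) sounds alarming; the absolute figure (0.00015%) tells you whether to care.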
5. Is the sample random?
Statistical significance works only on random samples. If the report uses self-selected people (eg internet polls), or discards outlying results, or fails to check that it has not inadvertently picked up a biased sample, then you can’t draw any conclusion on statistical significance. For example, the FTSE-100 is not a random sample of listed companies, as it is only the larger, surviving ones.
Tip: Look at how the report selected its sample and whether it excluded any results.
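A hypothetical simulation (entirely made-up numbers) shows why an FTSE-100-style sample misleads: if you estimate average company growth but sample only the largest survivors, fast growers are over-represented and the estimate is biased upwards.

```python
import random

random.seed(0)

# Simulate 2000 "companies": each has a true annual growth rate drawn
# around 3%, and a size that partly reflects ten years of that growth.
population = []
for _ in range(2000):
    growth = random.gauss(0.03, 0.10)                 # true mean growth: 3%
    size = random.lognormvariate(0, 1) * (1 + growth) ** 10
    population.append((size, growth))

true_mean = sum(g for _, g in population) / len(population)

# An index-style "sample": only the 100 largest companies.
top_100 = sorted(population, reverse=True)[:100]
index_mean = sum(g for _, g in top_100) / 100

print(f"True mean growth of all companies: {true_mean:.1%}")
print(f"Estimate from the top-100 sample:  {index_mean:.1%}")  # biased upwards
```

The biased estimate isn’t a quirk of this simulation; any sample selected on an outcome related to the variable being studied will mislead in the same way.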
6. Does the author really want to prove something?
Of course, every author of a report wants to conclude something positive. A major study that failed to show anything new would not get published in an academic journal nor reported in a newspaper. But the fact is that some authors really want to arrive at a particular conclusion, especially if funded by a lobby group or interested party.
Tip: Check who paid the piper before buying the tune. Exercise extra caution with conclusions from authors who seem very keen to reach a particular result. Be even more cautious if that conclusion fits with your own view!
7. Is the report p-hacking?
Every statistical report should start with a clear hypothesis about what relationship it wants to investigate and how that relationship works (ie causation). It should then try to disprove its own hypothesis, and only if it fails should it conclude that the hypothesis might be true. In practice, however, reports simply try to ‘prove’ their hypothesis by finding a statistical relationship that is unlikely to be down to chance. Some go much further: they set out multiple outcome variables that might indicate the relationship, see which are ‘statistically significant’, highlight the one that is and discard the rest. But if you test 10 outcomes, there is about a 40% chance that at least one will appear significant at the 95% level (ie 1 in 20) purely by chance.
Tip: If a report looks for multiple relationships and highlights only the ones that are ‘significant’ and ignores the rest, throw the report away.
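The multiple-comparisons arithmetic behind this can be checked in a couple of lines:

```python
# Run 10 independent tests at the 5% level on pure noise: the chance
# that at least one comes up "significant" is 1 - 0.95**10, about 40%.
tests = 10
alpha = 0.05
p_at_least_one = 1 - (1 - alpha) ** tests
print(f"Chance of at least one false positive in {tests} tests: {p_at_least_one:.0%}")
```

This is why a report that tests many relationships and reports only the ‘significant’ one has shown you nothing at all.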
8. How many statistics are there?
Don’t be frightened into submission by tables of numbers and long statistical terms. Generally, the more complex the statistics, the ‘dirtier’ the numbers (ie there is more noise and the relationships are less clear). So the more maths, the less likely it is that there is a strong relationship that matters. However, if what you read quotes no actual statistics and doesn’t explain how it reached its conclusions, you can place no reliance on it either.
Tip: At least check that the report only uses 95% or higher as its test for significance, and that it lists every relationship it tested and every coefficient.
Tests of statistical significance and correlation are superb tools for furthering our understanding of relationships. However, they seem to be abused more and more, often in the service of confirmation bias or of satisfying paymasters. If you can’t satisfy yourself on these questions, please don’t believe the report, and please don’t promote it on social media!