Statistics: Are They Significant?

Yo! How are we doing, stats nerds? Good? Great…

[Images: SIGNIFICANCE / POWER]

Wildcard week is a bit of a bitch as I’m actually forced to think of something interesting to blog about, but I persevere! Whilst randomly searching things like ‘statistics’, ‘statistics = stupid’ and ‘p = I hate liiiife’ on Google, I came across an idea I’d never considered before. What if I don’t need statistics? Hear me out; you may go through the same existential, life-evaluating crisis I did, but please try to hold on to your sanity.

Since I started studying psychology, statistics have been everywhere, and like me you may have thought that science = stats. I’m joking, of course you wouldn’t! You’re clever, critical and beautiful, whilst I am ugly and reduce everything to its bare bones to make things easy for myself. Incidentally, does anyone think that stats genuinely do equal science? Let me know. As far as I’m concerned, it’s what the statistics represent that makes science so scientific: we can show through statistics that our results are unlikely to have happened by chance. When we reject a null hypothesis at the 5% level, we are saying that if the null hypothesis were ACTUALLY true, there would be less than a 5% chance of getting results like ours, i.e. of a Type I error (false positive) occurring. Lovely, science has been done! Our conclusions are based on rock-solid numbers and probabilities rather than guesstimates and intuition.

But should statistical significance be significant? A number of concerns have been raised over the limitations of this approach. A significant result allows us to say ‘these results are unlikely if the null hypothesis is true’. However, this leads us into the trap of inferring that there is less than a 5% chance of the null hypothesis being true. That is not the case: our statistics are based on the data, and they do not test the probability of the null hypothesis itself. It’s sort of like saying:

Hypothesis: All psychology students are awesome

Stats: There is less than a 5% chance we would have found the results we did if psychology students weren’t actually awesome

THEREFORE

There is less than a 5% chance that psychology students aren’t awesome

Our tests allow us to make decisions about our data, but all they tell us is how likely results like ours are if the null hypothesis is true; they do not tell us how likely the null hypothesis itself is.
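If you want to see that distinction in numbers rather than words, here’s a quick simulation sketch in Python. It’s entirely my own toy example with made-up numbers (an imaginary world where 80% of tested hypotheses are truly null, 30 participants per group, and a real effect of half a standard deviation when there is one); it counts how often a ‘significant’ result actually comes from a true null.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

n_experiments = 20_000   # hypothetical batch of studies
prior_null = 0.8         # assume 80% of tested hypotheses are actually null
n_per_group = 30         # sample size per group
true_effect = 0.5        # effect size (in SD units) when the alternative is true
alpha = 0.05

false_positives = 0      # significant results where the null was actually true
significant = 0          # all significant results

for _ in range(n_experiments):
    null_is_true = rng.random() < prior_null
    effect = 0.0 if null_is_true else true_effect
    control = rng.normal(0.0, 1.0, n_per_group)
    treatment = rng.normal(effect, 1.0, n_per_group)
    _, p = stats.ttest_ind(treatment, control)
    if p < alpha:
        significant += 1
        if null_is_true:
            false_positives += 1

# Among significant results, how many came from a true null?
# This is NOT pinned to 5% -- it depends on the base rate and on power.
print(f"Proportion of significant results where the null was true: "
      f"{false_positives / significant:.2f}")
```

With these made-up numbers the answer typically comes out somewhere around 30%, nowhere near 5%, because that proportion depends on how often the null is true and how much power you have, not just on alpha.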

Perhaps the biggest problem is that a statistically significant treatment effect doesn’t necessarily indicate a practically useful one. Would you want to take a drug that has been shown to have some effect, but not a very large one? McClean and Ernest (1998) advocate the usefulness of effect size measures and state that they could not find an article arguing against their usefulness, which suggests they are far less controversial than significance tests. However, Robinson and Levin (1997) argue that statistical significance must always be tested first, before effect size. As a psychology student, you may feel this is how you have been taught to operate as well. Thompson (1998) asks whether you should really withhold the findings of a study that shows a moderate effect but is only significant at p < .06, and suggests that this way of thinking is the fault of the publishing body (the APA) and discourages researchers from publishing effective studies if their results are not ‘significant enough’.
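To show what I mean by ‘significant but not practical’, here’s a little Python sketch of my own (not from any of the papers above): with a big enough sample, even a difference of 0.05 standard deviations, which nobody would care about in practice, sails under p < .05.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Huge samples, tiny real difference: 0.05 SD between groups.
n = 20_000
control = rng.normal(0.00, 1.0, n)
treatment = rng.normal(0.05, 1.0, n)

t, p = stats.ttest_ind(treatment, control)

# Cohen's d: mean difference divided by the pooled standard deviation.
pooled_sd = np.sqrt((control.var(ddof=1) + treatment.var(ddof=1)) / 2)
d = (treatment.mean() - control.mean()) / pooled_sd

print(f"p = {p:.4f}")          # comfortably 'significant' with n this large
print(f"Cohen's d = {d:.3f}")  # but the effect is tiny (~0.05 SD)
```

That’s the case for always reporting an effect size like Cohen’s d next to the p-value: the p-value says the difference probably isn’t chance, the effect size says whether it’s worth caring about.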

Nakagawa (2004) contends that overly rigorous significance testing is leading to more Type II errors (false negatives). For example, he points out that there is no consensus on when to use Bonferroni corrections (Perneger, 1998) and that, because they make significant results less significant, they feed a culture in which researchers are reluctant to report nonsignificant findings (Jennions & Moller, 2002). Sadly, we humans are all too ready to beat ourselves up, when in fact a nonsignificant finding can tell us something useful. Most obviously, it can show us that something DOESN’T work, which is useful in itself. Or, as Nakagawa goes on to explain, your findings could contribute to a significant result in the future.
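In case you’ve never had the pleasure, here’s roughly what a Bonferroni correction does, with completely made-up p-values (my own illustration, nothing to do with Nakagawa’s actual analysis): each p-value gets compared against alpha divided by the number of tests, so results that looked significant on their own can suddenly stop being significant, which is exactly where the extra Type II errors creep in.

```python
# A handful of made-up p-values from a study that ran five tests.
p_values = [0.012, 0.034, 0.049, 0.21, 0.68]
alpha = 0.05

# Bonferroni: compare each p-value against alpha / number of tests.
threshold = alpha / len(p_values)   # 0.01 here

for p in p_values:
    uncorrected = "significant" if p < alpha else "not significant"
    corrected = "significant" if p < threshold else "not significant"
    print(f"p = {p:.3f}: uncorrected -> {uncorrected}, Bonferroni -> {corrected}")
```

All three results that were ‘significant’ on their own die once the threshold drops to .01 in this toy example.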

Let me set the scene.

You’ve done an experiment that tests 10 variables (because you’re a clever bastard).

2 out of the 10 variables are found to be statistically significant.

Do you just publish your findings regarding the 2 variables or throw the whole darn kitchen sink at the publisher?

It may be tempting to only report your significant successes for fear of criticism.

But Nakagawa argues that the full paper is of far greater use to the scientific community: your research adds to the pile, and when it is later pooled with other studies in a meta-analysis, it could help uncover a significant effect that is only apparent in the context of other research.
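Here’s a back-of-the-envelope sketch of that idea (the numbers are completely hypothetical, and this is a plain fixed-effect, inverse-variance meta-analysis rather than anything taken from Nakagawa’s paper): four small studies, none of them significant on their own, pool into a clearly significant combined effect.

```python
import math
from scipy import stats

# Hypothetical small studies: (observed effect size d, standard error).
# Each one is underpowered and nonsignificant on its own.
studies = [(0.30, 0.20), (0.25, 0.18), (0.35, 0.22), (0.20, 0.19)]

for d, se in studies:
    z = d / se
    p = 2 * (1 - stats.norm.cdf(abs(z)))
    print(f"single study: d = {d:.2f}, p = {p:.3f}")

# Fixed-effect (inverse-variance) pooling: weight each study by 1 / SE^2.
weights = [1 / se**2 for _, se in studies]
pooled_d = sum(w * d for (d, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))
pooled_z = pooled_d / pooled_se
pooled_p = 2 * (1 - stats.norm.cdf(abs(pooled_z)))

print(f"pooled: d = {pooled_d:.2f}, p = {pooled_p:.4f}")
```

None of the individual p-values gets below .10, but the pooled estimate comes out at roughly d = 0.27 with p < .01, which is the whole argument for publishing your nonsignificant bits and pieces.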

So where does this leave us? Is statistical significance important, or are there just too many problems with the process? Is it even a logical measure of probability? The fact that the debate has not been settled suggests, at the very least, that there are problems with it. Maybe effect size is a more useful indicator of efficacy than significance, and should be used on its own?

The debate won’t be settled any time soon, but on the current evidence we can only conclude that effect sizes and significance tests are at least as important as each other, and it would be interesting to see whether either could survive without the other.

That was far too heavy for my liking, so to make up for it, here’s a cat doing an impression of me when I was reading all this crap.
