Thursday, May 4, 2017

Stop saying your results are insignificant!

A short complaint today. I am reading student papers in which students report on a psychology experiment they designed and ran in groups. This assignment is the final project in a senior capstone course, and it draws on many skills the students have learned over their four years as Psychology majors.

One component of the report is to prepare an APA-style Results section.  Students have to deal with their own data, figure out how best to display summaries of the data in tables and graphs, and run appropriate statistical tests.  Since these are experimental studies, the appropriate statistical tests are usually t-tests or analyses of variance.

It is not uncommon for student projects like this to fail to find differences between groups or conditions in an experiment. Even if the underlying hypothesis has merit, the experiment may be inadequate to test it, given the limitations of student projects: not enough time to research the best dependent variables, not enough resources to run enough participants, and not enough control over nuisance variables that obscure main effects.

From my perspective, that's understandable, and it provides the students plenty to speculate on in their Discussion sections.

What's frustrating, though, is that students don't know how to report the lack of significant effects in their Results section.  All too often I see vague and misleading statements such as "The results were not significant" or "The data were insignificant".

Let's take a specific example. Imagine an experiment in which 8-year-old and 10-year-old children are given a logic puzzle, with the hypothesis that 10-year-olds will have developed sufficient cognitive skills to solve the puzzle faster than 8-year-olds. The independent variable is Age with two levels (8, 10). The dependent variable is time to complete the puzzle, in seconds. Twelve children of each age are tested.

The results indicate that the 8-year-old children solve the puzzle in an average of 67 seconds, while the 10-year-old children solve it in an average of 53 seconds. Despite the difference in means, there is quite a bit of variability within each group, and the t-test yields a p-value of 0.09, above the usual 0.05 threshold: not significant.

But what is it that is not significant? Can we say "the data are not significant"? That phrase implies that the data are meaningless, but the data aren't meaningless. We now know it takes about a minute for kids to complete this particular logic puzzle. The t-test doesn't test the significance of the data; if it did, we would have to throw up our hands and say "I have no idea how long it takes kids to complete this puzzle. Maybe an hour, maybe a day, maybe they take one look at it and know the answer immediately."

Can we say "the results are not significant"? If "results" just means "data", then my explanation above indicates this phrasing is wrong. But if by "results" we mean "8-year-old and 10-year-old children solve this puzzle in about the same amount of time", that result may indeed be significant in the sense that we didn't expect it to turn out that way, so we've learned something new. Maybe we've learned that 8-year-old children have surprising cognitive ability. Maybe we've learned that 10-year-old children haven't developed substantially in two years on this particular puzzle. Maybe we've learned this puzzle isn't a sensitive index of the cognitive skills we're interested in. Any one of these conclusions might be important, interesting, and worthwhile. If you say the result is insignificant, that's like saying the whole stupid experiment was a waste of time and nothing new was learned.

So what do we say then? To describe the t-test result correctly, you need to know what a t-test does. The t-statistic compares the difference between two means to a measure of within-group variability. If the absolute value of the t-statistic is large, the p-value will be small, and the conclusion will be "statistically significant". What is statistically significant? The difference between the two means being compared. Not the data. Not the results. The difference between the means. Likewise, if the absolute value of the t-statistic is small, the p-value will be large, and the conclusion is "not statistically significant" (a better phrase than "insignificant"). What is not significant? Not the data. Not the results. The difference between the means.

And so the correct way to write this is: "We found no significant difference between 8-year-old and 10-year-old children in solving times on this puzzle (t(22) = 1.78, p = 0.09, n.s.)."
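To make the arithmetic concrete, here is a minimal sketch in Python (using scipy) that reproduces those numbers from summary statistics. The within-group standard deviations are hypothetical, chosen only so that the output matches the t and p reported above; with real data you would compute them from the raw solving times.

    from scipy import stats
    import numpy as np

    m1, m2 = 67.0, 53.0   # mean solving times in seconds (8- and 10-year-olds)
    s1, s2 = 19.3, 19.3   # hypothetical within-group standard deviations
    n1, n2 = 12, 12       # children per group

    # Pooled variance: a single estimate of within-group variability
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

    # The t-statistic scales the difference between the means by that variability
    t = (m1 - m2) / np.sqrt(sp2 * (1/n1 + 1/n2))
    df = n1 + n2 - 2
    p = 2 * stats.t.sf(abs(t), df)   # two-tailed p-value

    print(f"t({df}) = {t:.2f}, p = {p:.2f}")   # t(22) = 1.78, p = 0.09

Notice what the code actually tests: the difference m1 - m2, scaled by within-group variability. The data themselves are never declared significant or insignificant.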

If the design had instead included three groups, let's say 8-, 10-, and 12-year-old children, and the means were as follows: 67 seconds, 53 seconds, and 58 seconds (the 12-year-olds fell in the middle, unexpectedly), we would instead have to run an analysis of variance. If our p-value came out to be p = 0.14, we would say this is "not significant". But again, what isn't significant?

An ANOVA relies on an F-ratio, which compares the variability between groups to the variability within groups. If the F-ratio is large, the p-value will be small, and we will conclude that the independent variable "affects" the dependent variable. If the F-ratio is small, the p-value will be large, and we will not be able to conclude that the independent variable affects the dependent variable. Remembering that the dependent variable is puzzle solving time, and the independent variable is Age with three levels (8, 10, 12), we can correctly summarize this as:

"There was no significant main effect of age on solving time (F(2,33) = 2.07, p = 0.14, n.s.)."

A less specific description ("the results were insignificant") is just plain wrong.