Daniel Lakens has just started a MOOC on Coursera to share his view on statistical inferences. I will keep you in suspense by not unveiling all the mysteries of the class… But I wish I could have followed such a class during my bachelor's degree because, in my view, slight improvements in your statistical and methodological skills can drastically change the way you produce inferences (even though, ideally, we should understand not only conceptually but also technically every tool we’re using to defend strong claims…).

The class starts with the so-called “precognition” capacity supposedly revealed in social psychology. The lack of replications of those studies raises the problem of how hypotheses are generated. We won’t go too far into that, but the idea is that, because of statistical noise and the problems linked to post-hoc analysis (see last post), it’s difficult to draw hypotheses from statistical data alone. You should always have a minimum of theory to back you up and, if you don’t, the whole exploratory process should be reported as such and lead to a theoretical analysis, which in turn leads to new studies with a priori hypotheses. You can base your hypotheses on the existing literature in your field but, in my opinion, you should also evaluate interdisciplinary approaches and different levels of organization to see how likely your hypothesis is (no matter how small the effect sizes are in interdisciplinary studies).

To evaluate the quality of data, then, we usually use different forms of validity (internal, external, construct, ecological, etc.), but Abelson (1995) proposed the easy acronym MAGIC, which stands for:

- Magnitude: in quantitative studies, this can be represented by effect sizes, for instance. To evaluate the rhetorical impact of a research result, Abelson proposed the concept of “causal efficacy” = raw effect size / cause size. The idea is that a large effect from a small variation in the cause (small differences between conditions) is the most impressive. In 1979, the Food and Drug Administration conducted a study in the U.S. investigating the effect of saccharin on bladder cancer. In an animal study involving rats, they showed that, among the experimental rats (gulping down 7.5% of saccharin daily for 2 years), 7 of 23 had contracted bladder cancer, against only 1 of 25 in the saccharin-free group. The press and the industry mocked the study, explaining that the dose was equivalent to 800 cans of soda a day for a human and that anyone would die from cardiovascular complications long before cancer… With such a small sample, despite a large effect size, the estimate is imprecise (i.e., the confidence interval is really large) and, once divided by an even larger cause size (due to the huge difference between the conditions), the impact on the community was really limited.
- Articulation: how much comprehensible detail you can give from your data. If you have 3 mean outcomes A, B, C, can you come up with statistical differences between those means (A > B and A > C, for instance)? In the study of the menstrual cycle, for example, given the different hormonal profiles of the different phases (ovulation, menstrual, follicular, luteal), it’s really important to make such precise predictions because the theory tells you there are underlying biological factors to consider.
- Generality: to defend broad conclusions, you need to include a wide range of variation or, ideally, different studies, as done in a meta-analysis. You need a general effect, high-quality evidence, a large sample size and precise variable manipulation (ideally with physiological controls when possible).
- Interestingness: Abelson proposed that a statistical story is theoretically interesting when it changes people’s beliefs about an important issue, importance being a function of the number of propositions that need to be modified in light of the new results.
- Credibility: “methodological soundness and theoretical coherence” are key to the believability of a research claim according to Abelson. Is the experimental manipulation precise enough? Is it coherent from an interdisciplinary point of view? Is there a plurality of measures that can attest to the effects? Are the statistical analyses the correct ones to run? Whether you’re the critic or not, it’s important to ask yourself those questions.
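
To make the Magnitude point concrete, here is a rough Python sketch using the rat counts quoted above. It computes the raw effect size as a risk difference, an approximate 95% confidence interval, and a “causal efficacy” ratio. Two things are my own assumptions rather than Abelson's or the course's: the Wald approximation for the interval (which is crude with counts this small), and expressing the cause size as percentage points of saccharin in the diet (7.5 versus 0).

```python
import math

# Counts quoted in the post for the saccharin rat study.
cancer_treated, n_treated = 7, 23
cancer_control, n_control = 1, 25

p_treated = cancer_treated / n_treated
p_control = cancer_control / n_control

# Raw effect size: difference in the proportion of rats with bladder cancer.
risk_difference = p_treated - p_control

# Wald standard error of a difference between two independent proportions.
# ASSUMPTION: normal approximation, which is rough for such small counts.
se = math.sqrt(
    p_treated * (1 - p_treated) / n_treated
    + p_control * (1 - p_control) / n_control
)
z_95 = 1.96  # two-sided 95% critical value of the standard normal
ci_low = risk_difference - z_95 * se
ci_high = risk_difference + z_95 * se

# ASSUMPTION: cause size measured in percentage points of saccharin in the diet.
cause_size = 7.5 - 0.0
causal_efficacy = risk_difference / cause_size

print(f"risk difference: {risk_difference:.3f}")
print(f"approx. 95% CI:  [{ci_low:.3f}, {ci_high:.3f}]")
print(f"causal efficacy: {causal_efficacy:.4f} per percentage point")
```

The interval comes out several times wider than the standard error, which is the imprecision the small sample buys you, and the ratio shrinks further as soon as the cause is measured on a scale that reflects how extreme the dose was.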

Lakens considers that there are at least 3 ways of drawing statistical inferences. You can use the p-value (along with effect sizes): p < α leads to the rejection of H0, p > α leads us to retain H0. Or you can use likelihood ratios, or even Bayesian statistics if you have precise prior beliefs on the subject. He then reminds us that the p-value is the probability of observing the data (or more extreme data) assuming H0 is true (and not the probability of a hypothesis), and covers type I and type II errors with good examples and tables.
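
As an illustration of the likelihood-ratio approach, here is a minimal Python sketch with a binomial outcome. The numbers are hypothetical and not taken from the course: 8 successes in 10 trials, comparing H0 (success probability 0.5) against one specific alternative H1 (success probability 0.8).

```python
from math import comb

def binomial_likelihood(theta, successes, trials):
    """Likelihood of a success probability `theta` given binomial data."""
    return comb(trials, successes) * theta**successes * (1 - theta) ** (trials - successes)

# HYPOTHETICAL data and hypotheses, chosen only for illustration.
successes, trials = 8, 10
theta_h0 = 0.5
theta_h1 = 0.8

likelihood_h0 = binomial_likelihood(theta_h0, successes, trials)
likelihood_h1 = binomial_likelihood(theta_h1, successes, trials)

# How much better H1 predicts the observed data than H0 does.
likelihood_ratio = likelihood_h1 / likelihood_h0

print(f"L(H0) = {likelihood_h0:.4f}")
print(f"L(H1) = {likelihood_h1:.4f}")
print(f"likelihood ratio H1/H0 = {likelihood_ratio:.2f}")
```

Unlike a p-value, the ratio compares two specific hypotheses directly, so its value depends on which alternative you pick.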

Finally, he proposed a quick simulation about p-values in R, showing, for instance, that when H0 is true, the p-value is uniformly distributed (getting back to the debate in our last article). All in all, it seems like a good introductory class and I look forward to seeing what’s next.
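
For readers who want to try the p-value simulation themselves, here is a rough Python sketch of the idea. It is not the R code from the course: the sample size, the number of simulations, the effect of 0.5 under H1 and the use of a z-test with a known standard deviation of 1 are all simplifying assumptions of mine, picked so that only the standard library is needed.

```python
import random
from statistics import NormalDist, fmean

def z_test_p_value(sample, mu0=0.0, sigma=1.0):
    """Two-sided p-value of a one-sample z-test (sigma assumed known)."""
    z = (fmean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def simulate_p_values(true_mean, n_sims=5000, n=30, seed=1):
    """Run `n_sims` studies of size `n` and collect one p-value per study."""
    rng = random.Random(seed)
    return [
        z_test_p_value([rng.gauss(true_mean, 1.0) for _ in range(n)])
        for _ in range(n_sims)
    ]

alpha = 0.05
p_null = simulate_p_values(true_mean=0.0)  # H0 is true
p_alt = simulate_p_values(true_mean=0.5)   # H0 is false (assumed effect of 0.5)

# Under H0, the share of p < alpha estimates the type I error rate.
type1_rate = sum(p < alpha for p in p_null) / len(p_null)
# Under H1, the same share estimates power (1 - type II error rate).
power = sum(p < alpha for p in p_alt) / len(p_alt)

# Histogram of the H0 p-values in 10 equal-width bins.
bins = [0] * 10
for p in p_null:
    bins[min(int(p * 10), 9)] += 1

print("p-value counts per bin under H0:", bins)
print(f"type I error rate: {type1_rate:.3f}")
print(f"power:             {power:.3f}")
```

Under H0 the ten bins should hold roughly the same number of p-values, which is what a uniform distribution looks like, and the share falling below α should sit close to α itself.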

JB