Models are wrong, .... but some are useful (G. Box): Confidence intervals: am I unconsciously a Bayesian?

Confidence intervals: am I unconsciously a Bayesian?

Last week I made a survey among the students attending my course in 'Experimental Methods in Agriculture'. One of the questions was:

QUESTION: “I sampled 100 seeds from a big population of Holy Clover (Onobrychis viciifolia Scop.) and found that their average weight was $\bar{Y}$ = 15.5 mg. The confidence interval for the population mean was 13-18. What is the meaning of such a statement?”.

The possible answers were:

1. There is a 95% probability that the population mean is in the interval 13 to 18;
2. If we sample repeatedly from our population of Holy Clover, the estimated confidence intervals will contain the true mean in 95% of cases;
3. The true population mean is certainly between 13 and 18;
4. The true population mean can take any value between 13 and 18.

I asked my students to select the correct answer without looking at textbooks or class notes, just using their memory and intuition. This survey came after the first half of the course, approximately one month after the lecture about point and interval estimation.

In the end, 75% of my students chose answer (1), while none of them chose answer (2). This came as quite a shock to me: in a frequentist setting, the correct answer is clearly (2). Indeed, it should be intuitively clear that there is a 'true' (fixed) average weight $\mu$ for my seed population, but the problem is that I will never come to know it exactly, as the population is too big for me to weigh every seed. Therefore, I am forced to take a small sample and measure its average weight. My intuition suggests that further samples will show different average weights, but the true $\mu$ will always be there, unchanged and unknown. Recalling the frequentist definition of probability (from Wikipedia: “the limit of the relative frequency of an event in a large number of trials”), it seems pretty clear that it makes no sense to attach any sort of probability to the true value of $\mu$, as this is not going to change at all during my sampling! This is why answer (1) is nonsensical in a frequentist setting. Furthermore, the confidence interval (13 to 18) that I built from my sample may either contain $\mu$ or not, but I have no way of telling which of the two situations I am in. And the extremes of the interval (13 and 18) are actually meaningless in themselves: when I repeat the sampling I'll very likely get different values and a different interval.
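The repeated-sampling reading of answer (2) can be checked with a small simulation. What follows is only a minimal sketch under assumptions that are not part of the survey question: the seed weights are drawn from a normal population with a hypothetical true mean of 15.5 mg and a hypothetical standard deviation of 12.75 mg (chosen only so that intervals come out roughly as wide as 13 to 18), and each interval is built as mean ± 1.96 standard errors. The function name `ci_coverage` and all its parameters are made up for this illustration.

```python
import random
import statistics


def ci_coverage(true_mean=15.5, true_sd=12.75, n=100, n_repeats=5000, seed=1):
    """Fraction of 95% confidence intervals that contain the true mean.

    All default values are hypothetical: a normal population is assumed
    purely for illustration (it can even produce negative 'weights').
    """
    rng = random.Random(seed)
    z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
    hits = 0
    for _ in range(n_repeats):
        # Draw a fresh sample of n seeds from the same, unchanged population
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        mean = statistics.fmean(sample)
        se = statistics.stdev(sample) / n ** 0.5  # estimated standard error
        # The interval changes at every repeat; true_mean never does
        if mean - z * se <= true_mean <= mean + z * se:
            hits += 1
    return hits / n_repeats


if __name__ == "__main__":
    print(f"Proportion of intervals containing the true mean: {ci_coverage():.3f}")
```

The proportion printed should be close to 0.95. The 95% is a property of the procedure over many samples: each single interval either contains the true mean or it does not.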

If the above reasoning is so clear, why does answer (2) not come out as a natural choice for students? Why are they intuitively embracing the Bayesian perspective of answer (1), even though I am pretty sure that they have never been exposed to Bayesian thinking at all (like all agriculture students, at least in Italy)? It is clear that I have not done a good job of conveying the correct message during my lecture! To use the same words as Dennis (Discussion: Should Ecologists become Bayesians. Ecological Applications, 6, 1095-1103), I was probably suggesting more than a frequentist confidence interval delivers.

I am not the only one in this position: I am sure that most of my fellow biologists see confidence intervals very much as described in answer (1). I do not think we are to blame. Indeed, I went through the books I used to study when I was a PhD student and found, for example, that Sokal and Rohlf (Biometry. W.H. Freeman and Company, 1981), on page 141, report this equation (7.4):

$P\{ \bar{Y} - 1.96 \sigma_{\bar{Y}} \leq \mu \leq \bar{Y} + 1.96 \sigma_{\bar{Y}} \} = 0.95$

and comment: “Thus the probability, P, is 0.95 that the term $\bar{Y} - 1.96 \sigma_{\bar{Y}}$ is less than or equal to the parametric mean $\mu$ and that the term $\bar{Y} + 1.96 \sigma_{\bar{Y}}$ is greater than or equal to $\mu$.” Similar statements can be seen in Snedecor and Cochran (Statistical methods. Iowa State University Press, 1991) and in almost all biometry books I have at hand.

The above equation is derived by using simple math and it is certainly correct. But I am wondering: does it make sense in a frequentist setting? I'll leave the answer to the statisticians. As a biologist, I have to admit that, like my students and my colleagues, I feel somewhat disappointed by the crude meaning of frequentist confidence intervals. Though the correct answer is (2), I find myself wishing it were (1): that would really be satisfactory! This more or less unconscious feeling may have influenced my lecture about point and interval estimation.
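For reference, the simple math behind equation 7.4 can be sketched as follows. This is my own reconstruction, not a quotation from Sokal and Rohlf, and it assumes that the sample mean $\bar{Y}$ is normally distributed around $\mu$ with a known standard error $\sigma_{\bar{Y}}$. Starting from the standardised sample mean:

$$
P\left\{ -1.96 \leq \frac{\bar{Y} - \mu}{\sigma_{\bar{Y}}} \leq 1.96 \right\} = 0.95
$$

$$
P\left\{ -1.96 \, \sigma_{\bar{Y}} \leq \bar{Y} - \mu \leq 1.96 \, \sigma_{\bar{Y}} \right\} = 0.95
$$

$$
P\left\{ \bar{Y} - 1.96 \, \sigma_{\bar{Y}} \leq \mu \leq \bar{Y} + 1.96 \, \sigma_{\bar{Y}} \right\} = 0.95
$$

The last step subtracts $\bar{Y}$ from all three terms and multiplies by $-1$, which reverses the inequalities. At every step the only random quantity is $\bar{Y}$, while $\mu$ stays fixed.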

In the end, considering the ironic claim of I.J. Good (“People who do not know they are Bayesians are called non-Bayesians”; cited in Kery, 2010. Introduction to WinBUGS for ecologists. Academic Press), I am asking myself: am I one of those who are Bayesians, but do not know it yet?


Wednesday, 16 April 2014
