War must not be left to the generals; and policy-making about public health must not be left to experts in a particular discipline, as a recent book about mismanagement of COVID [1] makes abundantly clear.
Statistics, however, must be left to the professionals who understand it properly. Statistical analysis is part and parcel of almost all research in medicine and in social science, because the number of factors or variables is too large for observations or experiments to deliver yes-or-no answers.
Unfortunately, the elementary statistics courses presented to many future doctors and social scientists typically feature rather simple formulas for calculating something loosely called “statistical significance”. Moreover, in the social sciences, and perhaps also in some medical programs, the few required courses on statistical analysis are all too frequently taught by people who are not themselves professional statisticians.
Quite often, then, researchers in social science or in medicine imagine that they can perform the statistical analysis of their results by simple calculations using standard formulas. That is one of the reasons, perhaps even the most important reason, why concern is often expressed about a lack of reproducibility in social science and in medical research.
John Ioannidis, an epidemiologist and meta-researcher, has pointed out that a very great proportion of research conclusions are simply wrong [2], including those reported in some of the most highly cited articles [3], for a variety of reasons that can all, or nearly all, be ascribed to incompetent gathering and analysis of statistical data.
During my time as Dean of a College of Arts & Sciences that included departments of social science as well as a Department of Statistics, I often heard statisticians complain that social-science researchers would consult them for help in interpreting their data instead of consulting them first for help in designing their research protocols. Yet it is not really possible to infer the significance of statistical results with any confidence without knowing many details of the research protocol, including details whose importance might not be obvious to non-statisticians.
Standard courses in elementary statistics typically describe a form of statistical analysis called frequentist or Fisherian, the latter stemming from the groundbreaking original ideas and research carried out by R. A. Fisher; “frequentist” means that the underlying concept of probability relies on something like counting how often a tossed coin lands “heads” or “tails”.
Nowadays, the typical calculations carried out by frequentist or Fisherian methodology compare experimental results or observations to the so-called “null hypothesis”, namely, that the results were obtained purely by chance.
For much of his career, Fisher analyzed data collected over many years at an agricultural research station. In circumstances like research in agriculture, it is possible to be rather certain of the factors involved: they certainly include genetics — the types of seeds used, for example — and amounts of water, sunshine or other illumination, and other nutrients, all of which can be controlled in greenhouses or other special environments. It is highly unlikely that there are other factors of importance, say, phases of the Moon. By changing only one factor at a time, it is then reasonable to presume that differences in results are caused by the single factor that was changed. Therefore, if the calculation makes the null hypothesis sufficiently unlikely, one can reasonably conclude that the changes were caused by the factor that had been deliberately altered.
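That logic can be sketched with a tiny permutation test. The numbers below are made up for illustration (they are not Fisher's data): under the null hypothesis, the “treated”/“control” labels are arbitrary, so we shuffle them repeatedly and count how often chance alone produces a difference at least as large as the one observed.

```python
import random

random.seed(42)

# Hypothetical crop yields from ten control plots and ten treated plots
# (illustrative numbers only):
control = [54, 51, 58, 44, 55, 52, 42, 47, 58, 46]
treated = [61, 58, 56, 63, 59, 68, 57, 62, 60, 65]

observed = sum(treated) / len(treated) - sum(control) / len(control)

# Permutation test: under the null hypothesis the labels are arbitrary,
# so shuffle the pooled yields and see how often a random split yields
# a difference as large as the observed one.
pooled = control + treated
trials = 10_000
count = 0
for _ in range(trials):
    random.shuffle(pooled)
    diff = sum(pooled[10:]) / 10 - sum(pooled[:10]) / 10
    if diff >= observed:
        count += 1
p_value = count / trials
print(observed, p_value)  # p_value is tiny: the null hypothesis is implausible here
```

With only one factor deliberately varied and everything else controlled, a tiny p-value like this one really does point to the altered factor as the cause — which is exactly the setting Fisher worked in.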
But such a conclusion is not warranted when the number of factors or variables is large, or when it is not known whether there are further, as yet unknown, influences; and that is certainly the case when dealing with human beings, whether their physiology, their general state of health, their individual psychology, or their social or political behavior. In those circumstances, the fact that a particular set of statistical data seems unlikely to have come about purely by random chance says nothing about the reason(s) for that. Yet medical or social-science researchers typically conclude that a “non-chance” result supports the hypothesis that interested them and that they thought they were testing.
That is unwarranted, since it is usually possible to think up other possible reasons. This flawed use of frequentist statistics does much to explain why results are so often not reproducible, and it is one of the reasons why many statisticians regard the Bayesian approach [4, p. 147 ff.] as superior to the commonly used frequentist (Fisherian) one.
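The contrast can be illustrated with a minimal Bayesian sketch (made-up data, and a deliberately uninformative uniform prior). Instead of asking “how surprising is this data if pure chance were at work?”, the Bayesian asks directly “given the data, how probable is the hypothesis?” — here, that a coin is biased toward heads.

```python
import random

random.seed(0)

# Suppose a coin lands heads 60 times in 100 tosses (hypothetical data).
heads, n = 60, 100

# With a uniform Beta(1, 1) prior, the posterior distribution of the
# heads-probability theta is Beta(heads + 1, n - heads + 1).  Sampling
# from it estimates the probability that theta exceeds 0.5, i.e. that
# the coin is actually biased toward heads.
samples = [random.betavariate(heads + 1, n - heads + 1) for _ in range(100_000)]
posterior_prob = sum(s > 0.5 for s in samples) / len(samples)
print(round(posterior_prob, 2))  # roughly 0.98
```

The answer is a probability for the hypothesis itself, updated from an explicit prior — not merely a statement about how unlikely the data would be under pure chance.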
But of at least equal importance is the initial design of the experiments and the need for professional statistical insights about such things as randomness, independence of variables, and subtle biases.
Unsophisticated thinking about or application of “statistics” can mislead in a large number of ways, many of them described at book-length by various authors [5].
For example [4, ch. 8]:
· Giving relative rather than absolute numbers. Saying that a drug halves the death rate sounds much more impressive than saying that it reduces the risk from 2 in 10,000 to 1 in 10,000 (say), even though both can describe the very same result.
· Correlation is not causation. For example, the risk of dying correlates with high blood pressure, but mortality and BP both increase with age; to look for possible causes, one must compare people of the same age. High “correlation coefficients” show only that two things increase or decrease together.
· Statistics need dis-aggregation. For example, comparing the overall wealth of two social groups can greatly mislead if such factors as race or education (and many more) are not taken into account. The correlation of mortality with BP (above) must be dis-aggregated by taking age into account.
· Effect size must always be considered, because “statistical significance” (p<0.05, or any other number) can always be reached if the sample is made large enough.
· Sampling must avoid quite a number of potential pitfalls.
If a sample is to yield results that are valid for the whole population, the sample must be properly drawn; ideally by including the same proportion of every possible group, or more usually by drawing a sample randomly (and professional statisticians know that true randomness is not easily achieved).
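The effect-size point in the list above can be made concrete. The sketch below uses a standard two-proportion z-test on hypothetical response rates: the same trivially small difference, 50.5% versus 50.0%, is nowhere near “significant” with a thousand subjects per group, yet becomes overwhelmingly “significant” with a million — without the effect becoming one whit more meaningful.

```python
import math

def two_proportion_z(p1, p2, n1, n2):
    """Two-sided z-test p-value for a difference between two proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability

# The same tiny difference (50.5% vs 50.0%) at two sample sizes:
small_n = two_proportion_z(0.505, 0.500, 1_000, 1_000)
large_n = two_proportion_z(0.505, 0.500, 1_000_000, 1_000_000)
print(small_n)  # far above 0.05: not "significant"
print(large_n)  # far below 0.05: "significant", same negligible effect
```

This is why p < 0.05 on its own says nothing about whether an effect matters; the size of the effect must always be reported alongside it.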
One illustration has remained in my memory because I heard it from someone who had participated in the management of a number of clinical trials.
The initial test of most new drugs is of their safety, and so a control group of healthy individuals is needed when looking for side effects. Participating in such a control group is quite attractive for homeless people, as they get a stay in comfortable surroundings with ample food provided. Companies that carry out clinical trials for pharmaceutical companies learn through experience that a number of such individuals are unusually healthy, so those individuals come to be used again and again in control groups (making the groups obviously not a random selection) — thereby decreasing the likelihood of detecting possible side effects.
Another such wrinkle, when comparing two drugs, is to use maximally high doses of the drug to be superseded but minimal doses of the drug the company would like to appear preferable.
Those are just a few of the reasons why professional statisticians (or biostatisticians) should be involved from the very start, in the design of research protocols, whenever the results will call for the interpretation of statistical data.
************************************************************************************************************
[1] Stephen Macedo & Frances Lee, In Covid's Wake: How Our Politics Failed Us, Princeton University Press, 2025
[2] John P. A. Ioannidis, “Why most published research findings are false”, PLoS Medicine, 2 (2005) e124, 0696-0701
[3] J. P. A. Ioannidis, “Contradicted and initially stronger effects in highly cited clinical research”, JAMA, 294 (2005) 218–28
[4] Henry H. Bauer, Science Is Not What You Think: How It Has Changed, Why We Can’t Trust It, How It Can Be Fixed, McFarland, 2017
[5] For example,
J. Best, Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists, University of California Press, 2001; More Damned Lies and Statistics: How Numbers Confuse Public Issues, 2004
D. Huff, How to Lie with Statistics, W. W. Norton, 1954
It is precisely because statisticians will help design studies that yield significant, accurate data that they are omitted from the process. Corporate scientists want data that will support the use of the products they are developing. Data that shows otherwise are most unwelcome.
"There are three kinds of lies: lies, damned lies, and statistics."