A common folk saying has it that “numbers cannot lie”.
However, accepting that very saying helps those who want to mislead us: by using numbers, they make the lies they tell seem more convincing.
Another common saying, “lies, damned lies, and statistics”, recognizes that numbers can be used in statistical contexts to misinform and mislead. A number of books describe the many ways in which statistical data can be, and often is, presented to mislead deliberately [1].
Even without citing any numbers, a misleading impression can be given just by saying that a certain result is “statistically significant”. That is intended to be convincing, and probably is to many people; but in reality the criterion for “statistical significance” is purely arbitrary, and the most commonly used one, p < 0.05, is wrong at least 5% of the time.
Another common misleader in reports of polls and surveys is the “margin of error”. Quoting one implies that differences larger than that margin are definitely real; but margins of error are just as arbitrary as criteria of significance.
The most important thing to know and remember about statistical data and statistical analysis is that they can never deliver certainty. No matter how “statistically significant”, or “outside the margin of error”, statistics can deliver no more than an estimate, an informed guess, of a probability.
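A minimal sketch (in Python, with an invented poll of 1,000 respondents, 52% of them answering “yes”) shows that the reported “margin of error” is nothing more than a consequence of which confidence level the pollster chooses to quote:

```python
# Hypothetical poll: the sample size and proportion below are invented purely for illustration.
from math import sqrt

n = 1000   # sample size
p = 0.52   # observed proportion answering "yes"

# Standard-normal quantiles for common two-sided confidence levels
z_values = {"90%": 1.645, "95%": 1.960, "99%": 2.576}

for level, z in z_values.items():
    margin = z * sqrt(p * (1 - p) / n)   # usual formula for a proportion's margin of error
    print(f"{level} confidence: {p:.0%} ± {margin:.1%}")

# Typical output:
#   90% confidence: 52% ± 2.6%
#   95% confidence: 52% ± 3.1%
#   99% confidence: 52% ± 4.1%
# The same poll yields a different "margin of error" for each arbitrary choice of confidence level.
```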
Nowadays the warnings about lying statistics ought to be extended to computers, whose outputs are no more trustworthy than the individuals who write the programs, devise the algorithms, and put in the data.
But even perfectly simple numbers can be used very effectively to give entirely misleading impressions.
The pharmaceutical industry does this all the time. For example, it can boast mighty impressively that a new medication or treatment has halved the mortality rate; but perhaps the mortality rate decreased from two in 10,000 to one in 10,000. That would be far less impressive, especially bearing in mind that unwanted “side” effects typically occur at rates of several percent, hundreds in 10,000. It is not lying to say that the mortality rate was halved, but it would be entirely misleading; and as Paul Halmos pointed out, lying is quite permissible [we call them “white lies”] provided one doesn’t mislead [2].
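A small arithmetic sketch (in Python, using the invented rates from the example above) makes the gap between relative and absolute risk reduction explicit:

```python
# Invented rates from the example: mortality drops from 2 in 10,000 to 1 in 10,000.
baseline_rate = 2 / 10_000   # mortality without the new treatment
treated_rate = 1 / 10_000    # mortality with the new treatment

relative_reduction = (baseline_rate - treated_rate) / baseline_rate
absolute_reduction = baseline_rate - treated_rate
number_needed_to_treat = 1 / absolute_reduction

print(f"Relative risk reduction: {relative_reduction:.0%}")        # 50%: "mortality halved!"
print(f"Absolute risk reduction: {absolute_reduction:.4%}")        # 0.0100%, i.e. one person in 10,000
print(f"Treated per life saved:  {number_needed_to_treat:,.0f}")   # 10,000
```

Both statements are true; only the first sounds impressive, and neither mentions side effects occurring at rates hundreds of times larger.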
Spurious accuracy
Everyone ought to have been taught, in the most rudimentary math classes, the meaning of “significant” figures. One should not extend numbers to decimal places beyond the presumable accuracy of the data. But innumerable examples from “the real world” illustrate how that principle is routinely disregarded and disobeyed, presumably because several digits after the decimal point seem to emphasize the “scientific” accuracy and certainty of what is being presented.
A rather recent egregious example is the survey of global happiness disseminated by the Gallup organization [3]. The happiness of people aged more than 60 in different countries is listed like this:
1. Denmark (7.916)
2. Finland (7.912)
3. Norway (7.660)
4. Sweden (7.588)
5. Iceland (7.585)
6. New Zealand (7.390)
7. Netherlands (7.360)
8. Canada (7.343)
9. Australia (7.304)
10. United States (7.258)
11. United Arab Emirates (7.248)
It is entirely possible, indeed very likely, that various mass and social media will cite only the rankings without indicating what numbers were actually supposed to differentiate among the countries; yet it is entirely absurd to use three decimal places. It takes very little consideration of how “happiness” could be defined, let alone measured, to recognize that numbers with even one decimal place should be treated with considerable skepticism and swallowed, if at all, only with large helpings of salt. That people are more contented in Scandinavian countries than in many other places is perfectly believable and accords with many anecdotes; but whether they are really significantly happier than New Zealanders or Australians is harder to believe, no matter that 7.390 and 7.304 are less than Iceland’s 7.585.
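To see how little those third decimals convey, here is a small sketch (in Python) that simply rounds the reported scores to a single decimal place; most of the distinctions that produced the ranking disappear:

```python
# Scores as reported above; rounding them to one decimal place is surely
# generous for any realistic precision in measuring "happiness".
scores = {
    "Denmark": 7.916, "Finland": 7.912, "Norway": 7.660, "Sweden": 7.588,
    "Iceland": 7.585, "New Zealand": 7.390, "Netherlands": 7.360,
    "Canada": 7.343, "Australia": 7.304, "United States": 7.258,
    "United Arab Emirates": 7.248,
}

for country, score in scores.items():
    print(f"{country}: {round(score, 1)}")

# With one decimal place, Denmark and Finland tie at 7.9; New Zealand and the
# Netherlands tie at 7.4; Canada, Australia, and the United States tie at 7.3;
# Norway, Sweden, and Iceland become nearly indistinguishable.
```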
Unfortunately, those who should know better are also guilty of trying to impress by citing quite unrealistic numbers:
“We just experienced the hottest February on record, with the global average temperature rising 1.77°C above the pre-industrial average for the month, according to the European Union’s Copernicus Climate Change Service (C3S)” [4].
That number can only have originated with climate scientists themselves. One can speculate about the extent to which those scientists take the implied precision seriously, but quite ordinary common sense understands that the actual truth is unlikely to be more precise than “February's average global temperature seems to have been possibly as much as a couple of degrees above pre-industrial conditions”.
In reality, there are excellent grounds for disbelieving the whole claim that global warming and climate change are being caused by human activities, in particular emission of carbon dioxide and other “greenhouse” gases. For details of the grounds for disbelief, see my summary of temperatures and carbon-dioxide amounts in [5], but most of all read the authoritative book-length discussion by Steven Koonin [6].
There is a very generally useful line of thought to invoke when presented with something that seems hard to believe, even when the claim is being made by supposedly authoritative technical experts about something technically complicated: just ask, “How could that ever be known?”
Applied to global warming and climate change, one would first want to know how a meaningful global average temperature can even be calculated. After all, temperatures typically change by the hour and minute everywhere; and with the seasons, which are simultaneously different in different areas of the world: How does one average that out? Moreover, the temperature at any given moment at any given spot on Earth is different at different heights above the water or ground: How does one average that out?
Simply asking those questions makes plain that any number cited for an “average global temperature” has been obtained by making a whole host of assumptions and approximations.
It should also be obvious that those assumptions and approximations can hardly be applicable in the same way, if indeed at all, for every time period since pre-industrial times.
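As a purely hypothetical sketch (in Python, with invented station readings), even the first step of combining a handful of temperature measurements into a single number already requires choosing a weighting scheme, and different defensible choices give noticeably different “averages”:

```python
import math

# Invented readings (°C) at a few hypothetical locations; real datasets involve thousands
# of stations, ocean buoys, satellite retrievals, gap-filling, and homogenization.
stations = [
    {"name": "tropical",     "lat":  5, "temp":  27.3},
    {"name": "mid-latitude", "lat": 45, "temp":  12.1},
    {"name": "subpolar",     "lat": 65, "temp":   2.4},
    {"name": "polar",        "lat": 80, "temp": -18.9},
]

# Choice 1: naive unweighted mean of whatever stations happen to exist.
naive_mean = sum(s["temp"] for s in stations) / len(stations)

# Choice 2: weight by cos(latitude), a crude proxy for the area each latitude band covers.
weights = [math.cos(math.radians(s["lat"])) for s in stations]
weighted_mean = sum(w * s["temp"] for w, s in zip(weights, stations)) / sum(weights)

print(f"Unweighted mean:    {naive_mean:.2f} °C")
print(f"Area-weighted mean: {weighted_mean:.2f} °C")
# The two "global averages" differ by several degrees here; every further adjustment
# (altitude, time of day, season, missing data) is yet another choice.
```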
A similar train of thought is applicable whenever the topic involves a large number of variables, which is often the case in medical matters or socio-political issues. Try to imagine, for example, how to control all the other variables in order to find out whether one particular substance or practice concerning human diet or exercise brings good, bad, or insignificant consequences. That little thought experiment makes it easy to understand why we have sometimes been told that coffee is good for us, but at other times that it is bad; that eggs can be dangerous, or that the benefit is greater than the risk; and so on.
More generally, this is one reason why the poohbahs and the chattering classes talk about a “crisis of reproducibility”: supposedly “statistically significant” results very often cannot be replicated. That is partly because statistics never provides a certain answer, and partly because of the many common deficiencies of incompetent statistical applications [7, 8].
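A toy simulation (in Python with numpy; every parameter is invented) illustrates the mechanism: when many effects that are actually zero are tested at p < 0.05, a predictable fraction come out “significant”, and those chance findings tend not to survive an independent second study:

```python
import numpy as np

rng = np.random.default_rng(0)
n_studies = 10_000   # hypothetical studies of effects that are actually zero
n_per_group = 50     # hypothetical sample size per group

def looks_significant(rng):
    """Compare two groups of pure noise; call it 'significant' if |z| > 1.96 (p < 0.05)."""
    a = rng.normal(0, 1, n_per_group)
    b = rng.normal(0, 1, n_per_group)
    z = (a.mean() - b.mean()) / np.sqrt(a.var(ddof=1) / n_per_group + b.var(ddof=1) / n_per_group)
    return abs(z) > 1.96

first_round = np.array([looks_significant(rng) for _ in range(n_studies)])
print(f"'Significant' findings in round 1: {first_round.mean():.1%}")   # roughly 5%

# Attempt to replicate only the "significant" findings in fresh, independent studies.
replications = np.array([looks_significant(rng) for _ in range(first_round.sum())])
print(f"Of those, replicated in round 2:  {replications.mean():.1%}")   # again roughly 5%
```

In this caricature every “discovery” is a false positive, so only about one in twenty of them replicates; real research mixes true and null effects, but the arithmetic of chance findings works the same way.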
************************************************************************************************************
[1] J. Best, Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists, University of California Press, 2001; More Damned Lies and Statistics: How Numbers Confuse Public Issues, 2004;
D. Huff, How to Lie with Statistics, W. W. Norton, 1954.
[2] Henry H. Bauer, To Rise Above Principle: The Memoirs of an Unreconstructed Dean, University of Illinois Press 1988 (under the pen-name ‘Josef Martin’); 2nd ed. with added material, Wipf & Stock, 2012; p. 168, citing Paul R. Halmos, I Want to Be a Mathematician, (Springer, 1985) pp. 113-14.
[3] J. F. Helliwell, R. Layard, J. D. Sachs, J.-E. De Neve, L. B. Aknin, & S. Wang (eds.), World Happiness Report, University of Oxford: Wellbeing Research Centre, 2024.
[4] New Scientist, 16 March 2024, p. 10; on-line 7 March, by James Dinneen; https://www.newscientist.com/article/2421106-the-world-just-experienced-the-hottest-february-on-record
[5] Henry H. Bauer, Dogmatism in Science and Medicine: How dominant theories monopolize research and stifle the search for truth, McFarland, 2012; pp. 18-26.
[6] Steven E. Koonin, Unsettled: What Climate Science Tells Us, What It Doesn’t, and Why It Matters, BenBella Books, 2021; for my review of it see https://www.scientificexploration.org/docs/35/jse_35_4_Bauer_on_Koonin.pdf.
[7] Stuart Ritchie, Science Fictions: How Fraud, Bias, Negligence, and Hype Undermine the Search for Truth, Metropolitan Books, 2020.
[8] John P. A. Ioannidis, “Why Most Published Research Findings Are False”, PLoS Med. 2 (2005) e124; doi: 10.1371/journal.pmed.0020124.