I’ve been immersed again recently in Judea Pearl’s The Book of Why, as electrifying as it was when I last read it, and reading it inspired me to consider writing a Substack article covering it, one that would function slightly more broadly as a history of the science of causation.
In some respects, this must seem like the most obviously Substackian use of a Substack imaginable, a complete festa of pandering — so many people around these parts speak using a vocabulary partially adapted from work by Pearl and his contemporaries, and are generally never happier than when talking of patently contingent concepts in terms of Bayesian probability.
Nevertheless, I should think a short history of causation would be useful for the following reasons:
Because an impartial survey of that history, from the 1834 founding of the Statistical Society of London (which affirmed data’s primacy over interpretation, and which would later become the Royal Statistical Society) to Sewall Wright’s seminal guinea pig trials and his 60-odd years of tribulation afterwards, through the heyday of structural equation modelling to our big data present, reveals two things:
One, the importance of grasping causation to understanding almost anything;
Two, how maniacally focused wider science has been on discrediting or misusing every proven technique for establishing causation, and thereafter on belittling the need for any proper understanding of causation beyond data calculation.
Because science is generally considered to be a field in which progress is inevitable, where practitioners look for their mistakes and concede them, and where new epistemological techniques, however controversial, will inevitably be accepted if they are useful. The widespread misuse and disparagement of Wright’s proven techniques shows that this presumption is false.
Because we live in a time of perhaps even greater indifference to causation than there was in Wright’s heyday. Systematisers working in highly systematised fields believe that truth is “all in the data” and despise causational techniques because they disrupt the pretence of objectivity. In AI, this pathological belief in the power of data to contain the whole truth is epitomised by large parts of the field’s wholly unscientific belief that they can replicate intelligent behaviour without knowing, discovering, or mapping the causes of it. On the other side of the aisle, where the soft sciences and humanities sit, causation is looked on with fretful antipathy as prone to reveal things that defile the purity of the ideologies to which many people in these fields subscribe; the idea of causational methods revealing, for instance, the essential role played by free markets in reducing global poverty rates, or that the cause of event x is the z of group y, where ‘y’ is a protected identity, would be nightmarish to those who naturally dislike these ideas. Systematisers do not want to have to deal with noisy interpretative tools that will oblige them to recognise that their preferred discrete thought processes will not get them to the truth. Identitarians do not want to allow the proliferation of any truth that injures their vanity. So it is that two very powerful interests, which taken together utterly dominate most of our major institutions and knowledge fields, distrust the complications of causal methods, methods which do not have many powerful friends elsewhere either.
Because I hate that phrase “Correlation is not causation”, when, as the great Sewall Wright would have corrected you, “… but some correlation is causation”.
I am writing this short history of causation because, in short, nothing is ever so useful as a reintroduction to that with which we are most familiar. When disaffected long-term spouses get reintroduced to one another, they fall in love all over again. So do I intend to make you fall in love with the science of causation here.
“Lucky is He…”
Given that trying to grasp the cause of things is an immutable human trait that has presumably been driving human thought since the first time Pithecanthropus erectus found himself pondering what made the sun rise in the morning and set at night, it seems odd conceptually to even think there could be a beginning to studies of causation. And yet, causation is actually a surprisingly young science.
Pearl cites mentions of causation going back to Virgil (“Lucky is he who has been able to understand the causes of things”) in 29 BC, but otherwise gives credit for the foundation of causation to the inventors of modern statistics, Francis Galton and Karl Pearson. However, his accreditation of them is necessarily complex because, as Pearl points out, while Galton and Pearson determined that scientific questions can be answered using population data, they also did an enormous amount to discredit the importance of causation and expunge it from statistics altogether, doing huge damage to the wider knowledge estate in the process.
In Pearl’s words, “Francis Galton started out to find causation and ended up discovering correlation.” A gentleman scientist, explorer of Africa, and the inventor of fingerprinting, who in his spare time enjoyed being a cousin of Charles Darwin, in 1877 Galton presented a paper called “Typical Laws of Heredity” at the Royal Institution of Great Britain. With him, he brought the Plinko machine from The Price is Right.
Into it he released a cache of little metal balls. He showed that if you released enough balls into this ‘Galton board’ (or, to give it its unfathomably rude-sounding proper title, the ‘quincunx’), you would find the balls distributing themselves into a bell curve. This was an illustration of Laplace’s early 19th-century central limit theorem: the sum of many independent random events, as a process is repeated, tends towards a bell-curve distribution. Galton, armed with data on the height of French military recruits, suggested that the bell-curve distribution held for the heredity principle as it did for little balls in a quincunx.
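The demonstration is easy to recreate. Here is a minimal sketch of the quincunx in Python (ball and peg counts chosen arbitrarily); the slot counts trace out the bell:

```python
import random
from collections import Counter

# Each ball bounces left (-1) or right (+1) at each of `rows` pegs and lands
# in the slot given by the sum of its bounces.
def galton_board(balls: int = 10_000, rows: int = 12) -> Counter:
    slots = Counter()
    for _ in range(balls):
        slots[sum(random.choice((-1, 1)) for _ in range(rows))] += 1
    return slots

for slot, count in sorted(galton_board().items()):
    print(f"{slot:+3d} {'#' * (count // 50)}")  # a crude histogram: the bell curve
```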
Armed next with a more voluminous sheaf of data charting the respective pedigrees of eminent English families, however, Galton showed something curious. Piling high-end inputs together (like very tall people having children, or very intelligent people having children) didn’t result in an endless series of generational optimisations, but rather prompted a reversion to the mean — people may get a little bit taller over time, but this doesn’t result in the production of 9ft-tall humans, for instance, as the principle suggests it should. Likewise, if IQ had been a thing back then, Galton would probably have endeavoured to demonstrate that very high-IQ parents don’t give rise to ever-higher-IQ kids, but rather that those kids regress to the mean.
Galton thought he had discovered a causal principle, a rule of physics — a regression mechanism used by nature to keep quantities like intelligence and height constant from one generation to the next, a counterpart to Hooke’s law (which is derived from observations of the way springs, when stretched, return to equilibrium length). What he had in fact done was lay the foundations of the scientific study of correlation. By 1889, Galton had observed a number of statistical patterns that seemed intertwined — like the fact that very tall men also had longer-than-average forearms — but clearly were not the cause of one another. He noted that these factors were merely ‘co-related’. So rose the first cry of the infant statistics, and so was consecrated too its divorce from causation.
Observing the way in which regression to the mean manifested in a population, Galton brought to us the spectacle of the regression line, which you can see above (look for the line ‘OM’). In a population study, such as the heights of fathers measured against the heights of sons, the regression line showed that the sons of unusually tall fathers were on average shorter than their fathers, and the sons of unusually short fathers taller, the population conforming to a basically constant distribution of heights — which explains why we do not after successive generations split into a society of 9ft’ers and 2ft’ers. Crucially, the law of regression still holds when the variables x and y measure different quantities — you see it, for instance, if you compare height against IQ. These two factors appear correlated even though the two factors cannot possibly be said to ‘cause’ each other. The constancy of the law of regression is the motor behind that ubiquitous phrase:
Correlation is not causation
This sentiment would become troublesome during the reign of Galton’s disciple, Karl Pearson, who would take the regression line and develop from it a correlation coefficient, still used to this day to determine how strongly two variables in a data set are related. Why troublesome? Because, through Pearson’s dogmatism, the precisely scientific coefficient would be used to disparage the utility of measuring messy, inelegant causation altogether.
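To make both artefacts concrete, Galton’s regression to the mean and Pearson’s coefficient, here is a minimal sketch with invented height numbers (Python 3.10+, for its statistics helpers):

```python
import random
from statistics import correlation, linear_regression, mean

# All numbers invented: each son inherits only a fraction R of his father's
# deviation from the population mean, plus independent noise.
MEAN, SD, R = 175.0, 7.0, 0.5  # assumed height mean/sd in cm

fathers = [random.gauss(MEAN, SD) for _ in range(20_000)]
sons = [MEAN + R * (f - MEAN) + random.gauss(0, SD * (1 - R**2) ** 0.5)
        for f in fathers]

print(f"Pearson's r:      {correlation(fathers, sons):.2f}")              # ~0.50
print(f"regression slope: {linear_regression(fathers, sons).slope:.2f}")  # ~0.50, below 1
tall = [s for f, s in zip(fathers, sons) if f > MEAN + SD]
print(f"sons of tall fathers average {mean(tall):.1f} cm")  # above the mean, below their fathers
```

The slope and the coefficient describe the association perfectly well; neither says a word about why sons turn out as they do.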
Pearson saw causation as nothing more than “a special case of correlation” wherein the correlation coefficient is 1 or -1, wherein two variables x and y have a deterministic relationship — redundant scientifically, because you’d only find it when, for instance, both x and y measure the same value. He regarded causation like a true disciple of David Hume. Just as Hume asserted that no thing seen to cause another thing could really be called a ‘cause’, given the unreliability of our observing senses and the incompleteness of our knowledge, Pearson suggested that only perfect 1-to-1 repetition could qualify as causation, and that no such perfect repetition could ever really be observed scientifically. “Force as a cause of motion,” he wrote, “is exactly on the same footing as a tree-god as a cause of growth.” A positivist, who believed that the world was a mere manifestation of thought (and science a means of observation of that thought), Pearson found man’s powers of observation too limited for causation to be captured by them. He held that patterns of observation are all that can be relied upon, and that these can be completely contained in data and described by correlation. Thus, as Pearl observed, did Pearson give rise to the ethos that “data is all you need.”
Pearson didn’t do this out of low-mindedness. He did it because he believed that by sharpening the means of drawing correlation and banishing the ambiguities of causation he could lay the groundwork for mathematising unfathomably complex sciences like biology and psychology. But he was a terror of a personality, totally without scientific disinterest. As his biographer wrote, “Pearson’s statistical movement had aspects of a schismatic sect. He demanded the loyalty and commitment of his associates and drove dissenters from the church biometric.”
This dogmatism led to Pearson discovering certain principles of causation by accident, which he then dismissed with contempt. For example, a ‘spurious correlation’ observed by Pearson and his assistant Udny Yule led to the discovery of the ‘confounder’, a crucial element in causation — for instance, the factors of ‘wealth’ and ‘location’ are likely to confound the fun (if highly spurious) causal contention that there is a link between chocolate-consumption-per-capita and Nobel-awards-per-capita, because they (wealth and location) explain the end result (winning Nobels) better than the posited factor (chocolate-eating).
Similarly, Pearson almost discovered the vital importance of aggregated data in illuminating causal principles, but poo-poo’d this as well. He computed the correlation between skull length and skull breadth, finding it negligible when male skulls and female skulls were analysed separately, but statistically significant (a coefficient of 0.197) when the populations were mixed: smaller skull length predicts being female, and breadth is accordingly smaller too. Pearson recognised that these results pointed at a cause-and-effect relationship; and, because he despised anything suggestive of cause and effect, he rubbished the results and told his students never to aggregate data. Some listened; others, including Yule, did not. Pearl cites this failure of imagination as a key moment in the developing history of causation.
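A sketch of the effect with invented skull measurements (the shapes, not the particular numbers, are the point):

```python
import random
from statistics import correlation  # Python 3.10+

# Within each sex, length and breadth vary independently here by construction;
# aggregating the two groups nonetheless manufactures a correlation.
def skulls(n, mean_length, mean_breadth):
    return ([random.gauss(mean_length, 4) for _ in range(n)],
            [random.gauss(mean_breadth, 3) for _ in range(n)])

m_len, m_br = skulls(5_000, 192, 145)  # males: longer and broader on average
f_len, f_br = skulls(5_000, 180, 138)

print(f"males alone:   r = {correlation(m_len, m_br):+.2f}")                # ~0.00
print(f"females alone: r = {correlation(f_len, f_br):+.2f}")                # ~0.00
print(f"aggregated:    r = {correlation(m_len + f_len, m_br + f_br):+.2f}") # clearly positive
```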
Hello Vicar
While Galton and Pearson may be the most important early figures in causation’s history, there are more figures of great importance in its pre-history, one of whom is better known these days as an adjective than as a man. Thomas Bayes, the man after whom Judea Pearl named Bayesian networks when the latter invented them in 1985, was not so much Mr. Bayes as he was Reverend Bayes. Are you surprised that the namesake of Bayesian probability was a man of God? I was. And then I reflected on the Scottish Enlightenment and its inherent ties to Presbyterian values, and was ashamed of my surprise.
A dissenter from the Church of England who was the first in a long line of patron rebels of the science of causation (see next section), Bayes was interested in the causal relationship between two notions — an hypothesis, and the evidence associated with it. He learned maths at the University of Edinburgh and started pitting it against theology. Proving again the value of David Hume as the motivating antichrist of all causal science, Bayes wrote a paper disputing Hume’s assertion that eyewitness testimony could not by itself prove the fact or the nature of a holy miracle. Hume maintained that natural law (like ‘dead men stay dead’) could not be overturned by fallible subjective assertions like ‘…but I saw Jesus rise again.’
Bayes absorbed this sentiment and wondered where the boundary lay — how much evidence would it take for us to be convinced that the deeply improbable had actually happened? The paper in which he considered this, An Essay towards solving a Problem in the Doctrine of Chances, established a framework for deducing cause from effect that still undergirds the field today. Where forward probability — estimating the chance of an effect given a cause — is easy, Bayes focused on inverse probability. If we know the length L of a billiards table, we can estimate the probability of a ball stopping within x feet of its end as x/L. However, if we know the final position of the ball is x = 1, but are not given the length of the table, we find it much harder to assess the length of the table. That is because we are at an information deficit with respect to the cause when we know only the effect, rather than when information about the potential cause is available to us.
For dealing with inverse probability, Bayes developed the theorem named after him.
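In modern notation, the rule reads:

$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)}$$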
It tells us that if we know the probability of A (say, that a person buys an apple from the shop) given B (say, that a person buys a banana from the shop), we should be able to figure out the probability of B given A. Maths allows us to derive conditional probability in non-obvious directions, and to update our estimation of an hypothesis’ probable rightness given new information. We may not begin with any conception of the probabilistic relation between apple purchase and banana purchase, but if we start to see that almost every person who buys two apples also buys a banana, even though those who buy but a single apple don’t, we can update our assessment of probable behaviours.
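A toy application of the rule, with shop numbers invented purely for illustration:

```python
# Invert a conditional probability with Bayes' rule: P(B|A) = P(A|B) * P(B) / P(A)
p_banana = 0.30              # P(B): base rate of banana buyers (assumed)
p_apple = 0.25               # P(A): base rate of apple buyers (assumed)
p_apple_given_banana = 0.50  # P(A|B): observed among banana buyers (assumed)

p_banana_given_apple = p_apple_given_banana * p_banana / p_apple
print(f"P(banana | apple) = {p_banana_given_apple:.2f}")  # 0.60
```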
Thus was born the Bayesian prior, a technique an awful lot of very ostensibly bright people nowadays use to provide a chassis for their obviously insupportable ideas.
Bayes’ rule would, in the later history of causation, give rise to Bayesian networks. They were developed in response to the limitations, very clear in the 1980s (if they are not indeed clearer now), suffered by computers when it came to making effective inferences from partial or contingent knowledge. This issue with inferential reasoning was a considerable bugbear for early AI enthusiasts, whose computers could not deal with both diagnostic and predictive reasoning tasks. Probability modelling was distrusted as a potential solution to the inference problem because of the presumption that it could only be done at huge memory cost, by putting everything into massive joint-probability tables. Then, along came Judea Pearl, carrying in his one hand the weighty and profound overarching premise that “any artificial intelligence would have to model itself on what we know about human neural information processing”, and carrying in his other hand the smaller and intensely applicable daughter-concept of creating a hierarchy of conditional probabilities linked by likelihood ratios to help machines reason probabilistically.
This parent-and-child-node structure allowed for the sequential tracking of a variable whereby each node updated its neighbours about its belief in the validity of the variable. This application of Bayes’ rule, whereby a parent passing a message to a child causes belief updating by conditional probabilities and a child passing a message to a parent triggers belief updating by likelihood ratio formation, is known as belief propagation. It became one of the keystones of machine learning, and the anchor of real-world applications like spam filtering and voice recognition.
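A minimal sketch of the diagnostic direction, with all probabilities invented. This is brute-force enumeration rather than Pearl’s message passing, but it exhibits the same effect-to-cause update that propagation performs locally:

```python
from itertools import product

# Two independent causes, one shared effect: rain and a sprinkler can each
# wet the grass. P(wet | rain, sprinkler) below is a made-up table.
P_RAIN, P_SPRINKLER = 0.2, 0.1
P_WET = {(True, True): 0.99, (True, False): 0.90,
         (False, True): 0.80, (False, False): 0.01}

def joint(rain, sprinkler, wet):
    p = (P_RAIN if rain else 1 - P_RAIN) * (P_SPRINKLER if sprinkler else 1 - P_SPRINKLER)
    return p * (P_WET[rain, sprinkler] if wet else 1 - P_WET[rain, sprinkler])

# Observe wet grass; sum out the unobserved sprinkler to update belief in rain.
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
print(f"P(rain) = {P_RAIN:.2f} -> P(rain | wet grass) = {num / den:.2f}")  # 0.20 -> ~0.72
```

In a real Bayesian network, belief propagation performs this update locally, via messages between neighbouring nodes, without ever materialising the full joint table.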
It also proved an excellent conceptual tool for determining causal effects from data, making it easier to model key causal notions like:
Confounders — common causes that statistically correlate two factors without any causal link between them, in the way that ‘wealth’ and ‘location’ confounded ‘chocolate’ and ‘Nobels’ above, and…
Colliders — elements (let’s use ‘athletic success’ as our example) that rely on two or more causes (let’s say ‘genetic aptitude’ and ‘training’) that are otherwise unrelated to each other; condition on the collider and a spurious correlation appears between its causes, as in the sketch below
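The collider effect is easy to conjure in simulation; the variables and the selection threshold here are invented:

```python
import random
from statistics import correlation  # Python 3.10+

# 'Aptitude' and 'training' are independent by construction; 'success' is
# their sum. Conditioning on the collider (selecting only the successful)
# manufactures a spurious negative correlation between the two causes.
aptitude = [random.gauss(0, 1) for _ in range(50_000)]
training = [random.gauss(0, 1) for _ in range(50_000)]
success = [a + t for a, t in zip(aptitude, training)]

print(f"everyone:   r = {correlation(aptitude, training):+.2f}")  # ~0.00

elite = [(a, t) for a, t, s in zip(aptitude, training, success) if s > 1.5]
a_elite, t_elite = zip(*elite)
print(f"successful: r = {correlation(a_elite, t_elite):+.2f}")    # clearly negative
```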
Its real-world applications have continued to multiply and make lots of people lots of money, from forensic experts using Bonaparte to identify criminal suspects and human remains, to major consumer giants using Bayesian predictive analytics to help serve you yet another shit, time-munching TV series that’s sufficiently tailored to your taste in shit TV to keep your subscription active.
Of course, while there is generally what I would consider a deficit in proper honouring of the importance of causality, and a corresponding lack of subtle understanding of methods of establishing causality, one could also allege that the intuitive and model-friendly nature of Bayesianism has resulted in an over-investiture of faith in its abilities. Beyond its trouble with computing very small probabilities and the practical restrictions on distribution types, the computational limit on the number of variables that can be parsed through a Bayesian network encourages one of the common foibles of systematic intelligence, which is to proactively exclude extra-systemic elements where possible to protect the integrity of the system.
This can be seen in the landscape of predictive-analytics-driven subscription TV programming. Bayesian networks allow for refined prediction within very narrow bounds; this makes user preference easy to predict so long as the parameters that shape the prediction are kept limited enough to maintain coherence in the network. But it also means that, given the network’s inability to make bold new associations, you will progressively watch a narrower and narrower diet of ever-more-similar shows so long as you follow the network’s suggestions. And because the infinite bookshelf is extremely hard and unrewarding to browse without a Bayesian search function, user taste profiles and preferences have grown less diverse and less interesting the more predictive data service providers hold.
‘Phoenix’ Wright
The most interesting and probably the most decisive figure in the history of the science of causation is a man named Sewall Wright. A flavourful excerpt from Pearl to get us started:
My admiration for Wright’s precision is second only to my admiration for his courage and determination. Imagine the situation in 1921. A self-taught mathematician faces the hegemony of the statistical establishment alone. They tell him “Your method is based on a complete misapprehension of the nature of causality in the scientific sense.” And he retorts, “Not so! My method is important and goes beyond anything you can generate.”
Indeed, Wright — a graduate of a relatively unfancied American university whose primary calling was genetics — was a kind of epistemological warrior who, but for the absence of the Spanish Inquisition, comes off in the telling as a sort of Galileo of the 20th century: a man who faced truly tremendous opposition to almost all of his main ideas from powerful entrenched interests, and suffered considerable unmerited ignominies and attempts to discredit him. Through his tribulation he gave birth to a new discipline. A gentleman and a fighter.
And what was this fighter’s weapon of choice? Guinea pigs.
It’s hard to imagine that a discipline-defining, decades-spanning academic controversy, one with severe ramifications for the progress trajectories of not one but several sciences, could have begun with guinea pigs, or with a man so vocationally (and, as anyone with a heart could only presume, affectionately) taken with them, but such was Sewall Wright. His contemporary update on the Thalesian theme of the absent-minded philosopher falling into a well while looking up at the stars was answering a student’s question before a blackboard, an eraser in one hand and a guinea pig in the other, before turning, mid-reply, and wiping the blackboard clean with the guinea pig.
Whatever that anecdote might otherwise portend, Wright was anything but absent-minded or imprecise. Inspired into genetics by the rediscovery of Gregor Mendel’s theory of genes, Wright noted that in the coat of the humble guinea pig lay an intractable challenge to Mendel’s theory. However hard you tried to breed an all-one-colour guinea pig, even the most inbred family lines showed variation in coat colour. Wright’s hunch was that developmental factors in the womb prompted the variation, and he set out to determine how. To do so, he developed the first mathematical method for answering causal questions from data — the path diagram.
In a path diagram, you assign a symbol to a quantity, like ‘d’ (which above stands for developmental factors with impact on coat colour). You assign symbols to all the other quantities you can think of. Then, you express what you know about how the quantities relate in simple algebra. Finally, presuming you have enough data and enough equations, you can solve for your quantity of interest.
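For a flavour of the algebra, here is a hedged sketch of Wright’s path-tracing rule on an invented three-variable diagram (X -> Y -> Z plus a direct X -> Z, standardised variables, independent errors). The rule says a correlation decomposes into a sum, over connecting paths, of products of path coefficients:

```python
import random
from statistics import correlation  # Python 3.10+

# Invented path coefficients for the diagram X -> Y -> Z, X -> Z.
p_xy, p_yz, p_xz = 0.8, 0.5, 0.3
explained = p_xz**2 + p_yz**2 + 2 * p_xz * p_yz * p_xy  # keeps Var(Z) = 1

xs, ys, zs = [], [], []
for _ in range(100_000):
    x = random.gauss(0, 1)
    y = p_xy * x + random.gauss(0, (1 - p_xy**2) ** 0.5)
    z = p_xz * x + p_yz * y + random.gauss(0, (1 - explained) ** 0.5)
    xs.append(x); ys.append(y); zs.append(z)

print(f"path-rule prediction: {p_xz + p_xy * p_yz:.2f}")   # 0.70
print(f"simulated corr(X,Z):  {correlation(xs, zs):.2f}")  # ~0.70
```

Run the logic in reverse (measure the correlations, then solve for the coefficients) and you have Wright’s method: the diagram supplies the equations, the data supplies the numbers.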
In the case of the guinea pig’s coat, d was assessed against h (unknown hereditary factors), as well as e (environmental factors after birth), and g (genetic input from parents). The two little buggers on the right are the offspring. By solving the algebra in concert with a comparison between highly inbred and non-inbred populations, Wright discovered that in non-inbred guinea pigs 42% of coat variation was due to heredity while 58% was in-utero-developmental, whereas among the inbred 92% of variation was in-utero-developmental. Thus, even in those inbred populations where genetic variation had been all but eliminated, differences persisted. ‘d’, Wright observed, must be driving the distinction.
Wright’s work had shown that developmental events and coat colour were not merely correlated, but were linked by drivers of causation. Where correlation had been held never to imply causation, Wright had found that some correlations do in fact imply causation.
Wright’s reward for the 1920 paper in which this novel thesis appeared, along with its accordingly revolutionary method for computing the effect of input factors on outcomes using path coefficients, was to draw the ire of the entire statistical establishment. His 1921 follow-up paper, “Correlation and Causation”, was savaged by Pearson’s students. Henry Niles, one such student, asserted that ‘causation’ was a meaningless notion that simply stood for “perfect correlation.” Niles went on to rubbish the path diagram as a methodology for building a picture of causal inputs, and despite his lack of care in performing Wright’s calculations properly, his assertion — that causation could never be understood, because scientists never have a complete command of the number of potential variables in the picture — was typical of the scientific establishment’s and would hold sway for more than 50 years. Not until the work of Herbert Simon, 1978 Nobel laureate in economics, would Wright’s path diagrams become sexy again.
This resulted in a crucial loss to the analytical vocabulary that I feel we are still smarting from today — the rejection of the causal hypothesis. Wright’s method was bold not merely because it challenged an existing orthodoxy but because it was happy to try and make use of that which the Pearsonian point of view most feared — contingency. Don’t have all the data? Wright suggested you could compensate for this with causal hypotheses, what we would crudely call ‘assumptions’. This intrusion of qualitative thinking into statistical parley — even when the ‘prejudices’ it supposedly introduces amount to extremely obvious assertions, like ‘the colour of a guinea pig’s coat does not influence its parents’ coats’ — was unthinkable for an adherent of Pearson’s view. The fact that Wright had taken just such base, crude, prejudicial assumptions and used them to arrive at quantitative, novel, and replicable results was of no interest, because the method of arriving there was simply too vulgar. It was vulgar because it unabashedly risked being wrong, and implicitly credited a willingness to be wrong — a courage in striking out from under the sheltering canopy of data into more unbound inquiry — that Pearson deplored.
Had Wright’s thesis prevailed, I believe we might have seen a much more generous allowance returned to the role of abstract, qualitative, conceptual thinking in the development of the sciences. But a Pearson-type thesis did in fact prevail, and I believe that this, by its dissemination through so many subsequent generations of thinkers, is responsible for our ultra-systematic present mode of science, which believes everything is in the data, fears and reviles anything that isn’t, conceives of causation as a tiresome annoyance, and generally will refuse to believe that the sky is blue without three peer-reviewed studies to hand to back such an assertion up. The debacle of New York City paying McKinsey millions to prove that trash cans reduce littering is a direct manifestation of a curiously Pearsonian scientific psychology.
Let’s not let Wright off the hook too lightly, though — he asked for the decades of reputational purgatory he subsequently enjoyed for having the temerity to mix epistemological modes, using a diagrammatic mode to illuminate data and data to justify the use of a diagrammatic presentation. The marriage of qualitative and quantitative methods of knowledge-building is what made causal science so new, so crucial, and what has probably doomed it to the doldrums it has generally occupied. In epochs gone by, finding cells of thinkers — or even individual thinkers — in which qualitative and quantitative instincts mixed was not so difficult. The separation is so hard and acrimonious now that such a mixed-methods science must find it hard to hold a room together.
Like his cousin, the fictional lawyer Phoenix Wright, Sewall was a fundamentally true, honest, and endearing man, possessed of a great power of apprehending the truth. He found himself much embattled as a result of this gift. Like his cousin’s more generalised avian namesake, we must hope that Sewall Wright’s legacy will prove at home in baths of fire, and that his legacy will rise again. It is hard to imagine us making much straightforward progress against so many of the critical, civilisation-shaping unknowns about us unless it does.
Thank You for Smoking
Thankfully for the general knowledge estate, Wright’s theories did emerge eventually from their untimely eclipse. Path analysis was completely absent from the scientific literature between 1920 and 1960, save Wright’s own work and a few curios in the animal breeding space. This is a shame, because it meant that, like George Canning (who had some interesting things to say about statistics) missing out on the culmination stage of the Napoleonic wars after jostling with Lord Castlereagh, causation missed its one big chance to shine — in the epochal debate during the 1960s about whether smoking was bad for you.
In the middle of the 20th century, the biostatistical establishment had by and large come around to the consensus that smoking was bad for you. Still, the question was live enough to divide families — to divide families of biostatisticians, in fact, as in the case of UC Berkeley’s Jacob Yerushalmy and his profound pro-tobacco stance versus the equally anti-smoking sensibilities of his brother-in-law Abe Lilienfeld (both men were smokers). They, like the wider scientific establishment, sat around debating the health drawbacks of smoking, specifically whether smoking caused cancer. After all, many people smoked (and still smoke) all their lives without getting cancer, while some lung cancers develop in people who’ve never once smoked.
The debate could not lean on one of the most reliable weapons in the causateer’s arsenal and in the history of causation as a science — the randomised controlled trial. It seems unlikely that one could convince non-smokers to smoke for decades for scientific reasons, ruining their health, without incurring ethical difficulties. And the lack of willingness to call on what Pearl terms “a more principled theory of causation”, owing to the scientific establishment’s reluctance to credit causal methods, really did extend the debate far longer than it might have run, occasioning many more deaths and smoking-related health issues than were strictly necessary.
Two British epidemiologists, Richard Doll and Austin Bradford Hill, put their heads together in the late 40s to see if they could figure out why a previously rare cancer — before 1900 a doctor might encounter lung cancer once in a lifetime — was fast becoming a leading killer of men. Their case-control studies were “extremely suggestive” of a positive causal relationship between smoking and cancer but were not conclusive. Then an American called Jerome Cornfield (yet another in a long line of seminal self-taught statisticians) coined his eponymous inequality, which discredited the idea that a “smoker’s gene” was the main factor accounting for cancer risk.
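The logic of the inequality, roughly stated and in my own notation: if smokers show, say, nine times the lung-cancer risk of non-smokers, a confounding gene could wholly explain that association only if it were at least nine times more prevalent among smokers than non-smokers, which no plausible gene was:

$$\frac{P(\text{gene} \mid \text{smoker})}{P(\text{gene} \mid \text{non-smoker})} \geq \mathrm{RR}_{\text{observed}}$$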
Pearl contends that the scientific establishment’s clash with big tobacco represented a key 20th-century theatre in the war on “organised denialism”. A failure to credit causal investigative methods and their conclusions, and a failure too to put sufficient emphasis on the rhetorical value of identifying causality, allowed tobacco companies to freely use seemingly contradictory evidence (which if tested would presumably have had far less causal support) to contest the building consensus. Similarly, the scientific establishment retarded its own journey to grasping the truth through its disdain for observational studies, refusing to credit any data derived from means other than RCTs, determined all the while to heap discredit on the notion of hypothesis-led experimentation.
Pearl is magnificently frank — above all, he says, the war on the cigarette took so long to be won “because scientists had no straightforward definition of the word ‘cause’, and no way to ascertain causal effect without a randomised control trial.” This purism was undoubtedly a vestige of Wright’s defeat in his war with the disciples of Pearson.
Pearls of Wisdom
One of the oddly pleasing things about reading The Book of Why, if one reads it as a work of history, is entertaining the feeling that it, and its author’s work more generally, form such a key part of that history, as though we were reading a book about Rome’s imperial tradition written by Hadrian or Marcus Aurelius.
In lieu of summarising Pearl’s achievements further, in deference to this post’s already excruciating length, I will point the reader to this, to his causal calculus, and to The Book of Why itself.
Further Riches Around the Margins
When we talk about causation, we are talking about a very special area.
When we discuss physics, or chemistry, or particularly biology, we are surely talking about forces which have definitive power in shaping the work of the human mind, but we are also talking about structures that supersede such limits and take as their medium a far more diverse range of matter. What’s so interesting about causation is that it is a science so intimately bound up with human psychology that it threatens to have direct power to clarify certain psychological mechanisms which otherwise tend to seem opaque. It is an unusually interpretative science, at least in its appearance — this is what has made it seem nefarious to many statisticians, and unpalatably threatening to many more generally systematic thinkers.
We can see little soupçons of consideration of the interplay between the hard mechanics of causation and human subjectivity throughout the histories of causation specifically as well as of statistics and psychology. As British statistician Udny Yule, whom you’ll remember as Karl Pearson’s prodigal assistant, put it in a 1926 address to the Royal Statistical Society:
Now I suppose it is possible, given a little ingenuity and good will, to rationalise very nearly anything.
Who, upon being told that this sentiment was last week’s off-hand Tweet, rather than a vastly prophetic utterance of yester-century, would disbelieve it?
And, of course, no stranger to contemporary dialogues on almost any subject could fail to notice the frequency with which causation is employed illegitimately in order to support conclusions we would want to support more or less regardless of whether causation actually vindicated them. This is what Jonathan Haidt memorably referred to as “the emotional dog and its rational tail.” While a thoroughgoing meta-analysis of the causes of such a vast and sweeping phenomenon has yet to come to light, we can with some reasonable confidence suspect that at least a portion of the ongoing replication crisis in behavioural science might be attributable to errors in the ascribing of causation.
Causation and Artificial Intelligence
With our now bolstered understanding of the way in which various intellectual interests reject the science of causation, we can come to understand the current landscape of research and development in AI better, and grasp the limits of some of its present horizons.
We do not have to perform a lot of extrapolation on Pearl’s text in particular to understand the limitations of AI as presently conceived (and even though the book predates some of the more recent scaling-driven advances in the field) — Pearl actively discusses the science of causation relative to AI’s developing capabilities, and does so having won the 2011 Turing Award for making “fundamental contributions to artificial intelligence through the development of a calculus for probabilistic and causal reasoning.”
In spite of these advances to which he contributed so considerably, Pearl is unequivocal about the present limits of AI and the relationship of those limits to the science of causation.
“Deep learning has succeeded primarily by showing that certain questions or tasks we thought were difficult are in fact not…As a result the public believes that “strong AI”, machines that think like humans, is just around the corner…In reality, nothing could be further from the truth. I fully agree with Gary Marcus, a neuroscientist at New York University, who…wrote in the New York Times that the field of artificial intelligence is ‘bursting with microdiscoveries’…but machines are still disappointingly far from humanlike cognition.
Just as they did thirty years ago, machine learning programs (including those with deep neural networks) operate almost entirely in associational mode [i.e. on the first rung of the causal ladder]. They are driven by a stream of observations which they attempt to fit to a function…Raw data still drives the fitting process. If, for example, the programmers of a driverless car want it to react differently to new situations, they have to add those new reactions explicitly.”
Challenged subsequently to review his position in light of the advances shown by ChatGPT, Pearl did not budge, noting that GPTs still lack a “model of causation” that is not purely associative, and observing that while ChatGPT can be prompted towards a proper recapitulation of the causal reasoning behind his ‘firing squad’ scenario (“if a man is put before a firing squad, and one member of the squad does not shoot, will the man survive?”) from The Book of Why, the instrument is unable to generalise those insights if one repeats the task with a slightly different scenario, and must be re-prompted.
While there is a mania for Bayesianism (or, at least, vernacular Bayesianism) in the habits of conversation typical of rationalists (rationalists who tend to have a higher-than-average interest in AI, and who are therefore proportionately more likely to credit popular notions about AI), and even a branch of specifically Bayesian AI, not much that is substantive in the science of causation appears to hold much sway with the AI crowd. Few principles are felt at the under-the-skin level in machine learning as keenly as ‘garbage in, garbage out’. Applied not to AI but to causation more broadly, that principle tells us that results based on a causal model “are no better than its underlying assumptions.” We can see the peril of the underlying assumption not being taken sufficiently seriously in such recent literature as the Situational Awareness essays, which build an impressive edifice on a foundation of wholly unjustified assumptions (e.g. what “pre-school intelligence”, “university-level intelligence” etc. all mean). As David Freedman wrote in Oasis or Mirage?:
Assumptions behind models are rarely articulated, let alone defended. The problem is exacerbated because journals tend to [favour] a mild degree of novelty in statistical procedures. [Modelling], the search for significance, the preference for novelty, and the lack of interest in assumptions—these norms are likely to generate a flood of non-reproducible results.
So there are two primary consequences of the science of causation being held in such low regard by many engineers specialising in AI: a direct consequence, and a meta-consequence.
The direct consequence is that a lower esteem for the analytical and mechanical codification of causation — that is, turning a feel for causation into something machines can understand and implement, which Pearl himself was so decisive in helping do to the level of simple association, the first rung on his three-step Ladder of Causation — means that AI engineers are unlikely to put the requisite focus on expanding a machine’s scope of causal reasoning beyond association. If, as seems reasonable to believe, firmer understanding of causation is so vital to a putative AGI, this seems likely to limit the field’s near-term horizons.
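To make the rungs concrete, here is a minimal sketch, in an invented toy world, of the gap between rung one (seeing: P(Y|X)) and rung two (doing: P(Y|do(X))), the gap today’s associational learners do not natively cross:

```python
import random

# A confounder Z drives both the 'treatment' X and the outcome Y;
# X itself has no effect on Y whatsoever.
def world(intervene_x=None):
    z = random.random() < 0.5
    if intervene_x is None:
        x = random.random() < (0.8 if z else 0.2)  # Z pushes X on
    else:
        x = intervene_x                            # do(X): sever Z's hold on X
    y = random.random() < (0.9 if z else 0.1)      # Y listens only to Z
    return x, y

# Rung one (seeing): P(Y | X=1) from passive observation.
obs = [world() for _ in range(100_000)]
y_given_x = sum(y for x, y in obs if x) / sum(1 for x, _ in obs if x)

# Rung two (doing): P(Y | do(X=1)), forcing X on for everyone.
do = [world(intervene_x=True) for _ in range(100_000)]
y_do_x = sum(y for _, y in do) / len(do)

print(f"P(Y | X=1)     = {y_given_x:.2f}")  # ~0.74: association, courtesy of Z
print(f"P(Y | do(X=1)) = {y_do_x:.2f}")     # ~0.50: no causal effect at all
```

An associational learner sees the 0.74 and concludes that X matters; a learner with the diagram in hand knows better.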
The meta-consequence is that the low esteem in which the engineers in question hold causation is causing them to ignore the extravagantly complex webs of causation that underlie the functions of organic intelligence — that is, they are ignoring the causes of the behaviours they are trying to replicate. An unwillingness to honestly interface with the complex neuroscience of causation is unlikely to prove anything but a critical impediment to their wider aims of building a true artificial intelligence.
And Pearl’s prior, seminal work in causal reasoning in machines also points to a further fundamental need in progressing the capabilities of AI — the ability to express functions of intelligence in mathematical terms. By creating a cohesive theory and language for the mathematical expression of causation, Pearl was able to fundamentally expand AI’s vocabulary of reasoning.
This same creation of mathematised language, theories, and models for different forms of reasoning will presumably be in order for AI to progress beyond its present association-bound state. For instance, if we expect it to reason ethically beyond whatever ethics are merely prescribed by its trainers, we will have to mathematise ethics — an interesting proposition given the centuries of philosophy that have held such an undertaking to be impossible. To the best of my knowledge, no one at all in AI, or elsewhere for that matter, is earnestly working on these kinds of problems. In my experience these are unpopular subjects to raise with AI engineers, intimating as they do that the great advances in true AI may yet have a century of careful and deliberate foundation-stage work ahead of them, and that the engineers in question are therefore unlikely to be the ones to imminently conjure gods from the ethers of CUDA, as many of them are already convinced is their destiny.
Just Because
My developing understanding of the contemporary knowledge estate is that it suffers from several ailments, of which the two conditions most appropriate to discuss here are that:
It is, above all else, afraid of causation, and will go to great lengths to obscure it or calculate it with deliberate incorrectness (this phenomenon, to which I sometimes refer as ‘strategic misunderstanding’, is extremely rife in many diverse fields and contexts today. Because it is so toxic for progress, because we have progressed considerably in prior epochs, and because we are seeing progress slow in recent years, I must presume that it is a phenomenon that has intensified in the last several decades).
It is too slavishly committed to inductivist methods of knowledge discovery, believing that every truth can be found ‘in the data’, and that using frameworks of understanding represents an unacceptable introduction of bias into the knowledge discovery process. Scientists as recent as Einstein were led in their path-breaking endeavours by conceptual thinking, which is innately abstract and diffuse. It’s dirty, crude, qualitative, and prejudicial. And yet, where would science be without it? Prescriptive, purely correlative, ‘canned’ data-based approaches only get you as far as you understand what variables to measure and how to measure them; barring the highly unlikely instance that you measure a variable while ignorant of its utility, and shorn of extremely eccentric analytic methods, you will only ever find in data roughly what you expect. You can work well within the established means of science by this method, but you cannot break into new areas.
A lack of dare-and-dash in the realm of hypothesis, of willingness to move out from where data is at its most protective, will have a desolating effect on scientific enquiry (both in explicitly scientific matters and in all areas that merely rely on the scientific method in a broad sense). It is a good thing, for instance, for the history of humankind at sea (and, for more specific instance, for the naval supremacy and thus the national sovereignty of Great Britain) that the 18th-century naval surgeon James Lind trusted a causal intuition that citrus fruit could prevent scurvy — vitamin C would not be discovered for nearly 200 years, and we are fortunate he did not live in a time with an insane anti-citrus lobby to contest his methods or suppress his subsequent prescription. Lind’s success is a signal example of an instance in which causation was clear without explicitly data-oriented ‘laboratory proof’.
Of course, there is so much in the history of causation that there is not room enough here to touch on — sensitivity analysis, causal calculus, the Aristotelian wedge of its pre-history — but there is time for a final hammered emphasis of those key points. That there is still room within science for the gifted interdisciplinarian. That acceptance of scientific progress is not a given, and that science shelters prejudice with surprising fidelity. And that where the larger prospective expansions of our faculties are concerned, ‘it’ is, by no stretch of the imagination, all in the data.
"Like his cousin, the fictional lawyer Phoenix Wright, Sewall was a fundamentally true, honest, and endearing man, possessed of a great power of apprehending the truth. He found himself much embattled as a result of this gift. Like his cousin’s more generalised avian namesake, we must hope that Sewall Wright’s legacy will prove at home in baths of fire, and that his legacy will rise again. It is hard to imagine us making much straightforward progress against so many of the critical, civilisation-shaping unknowns about us unless it does."
Art.