A Second Industrial Enlightenment
Why accelerating scientific discovery depends on institutions, not just intelligence.
In February 2026, Anthropic CEO Dario Amodei told Dwarkesh Patel he was 90% confident that within ten years we would have what he calls “a country of geniuses in a data center”: AI[1] systems that can perform scientific research at the level of Nobel laureates, simultaneously, across biology, mathematics, engineering, and more. The phrase comes from his October 2024 essay Machines of Loving Grace, where he laid out the vision most fully: AI that can “perform, direct, and improve upon nearly everything biologists do,” compressing 50 to 100 years of biological progress into 5 to 10.
It is a powerful vision, and the technical achievement behind it is real. AI's scientific capabilities are advancing faster than almost anyone predicted. Tyler Cowen recently linked to a tweet from Gauti Eggertsson, a macroeconomist at Brown, who wrote that he now finds himself "replicating papers and experimenting with frontier methods in an evening or a few days using Claude Code. That would have taken weeks before." Cowen commented that Eggertsson's vision is "still far too conservative." He may be right. But notice what the acceleration looks like. Replicating known methods, analyzing existing data, experimenting within established scientific frameworks, all done faster. The deeper question is whether that acceleration compounds, whether it produces self-reinforcing growth in scientific understanding, or whether it harvests the low-hanging fruit and then plateaus.
Dario frames the challenge primarily as building sufficiently capable AI. To his credit, he acknowledges complexity. Machines of Loving Grace explicitly identifies physical and logistical constraints (speed of the real world, data scarcity, regulatory friction) as complementary factors that limit what intelligence alone can achieve. He splits his remaining 10% uncertainty in half: about 5% for exogenous disruption (geopolitical crisis, industry turmoil) and about 5% for what he calls his “one little bit of fundamental uncertainty,” the possibility that generalization from verifiable tasks does not fully extend to tasks that are hard to verify. His description of the current moment as “near the end of the exponential” suggests he expects sheer momentum to close even that gap. Whether 5% is the right number is debatable. But the deeper issue is not whether Dario is right about capability. It is that even if he is, capability alone does not explain why some periods of history have produced extraordinary, self-reinforcing growth in useful knowledge while others have not. On this question, Dario's analysis, however nuanced about physical and logistical constraints, stops at the lab door, so to speak. He does not address how scientific knowledge is organized and shared, who has access to it, or what institutional conditions determine whether acceleration compounds.
There is a body of scholarship that answers that question, and its answer is not primarily about intelligence. It is about the institutional connective tissue that determines whether progress sustains itself or peters out. Joel Mokyr, the economic historian, built his career on exactly this insight. His 2025 Nobel Prize (shared with Philippe Aghion and Peter Howitt, whose endogenous growth models formalize the feedback loops between innovation and growth) recognized the centrality of knowledge institutions to sustained economic progress. What follows is an attempt to apply his framework to AI in science, and to argue that 10x-ing the pace of discovery requires building institutions around AI, not just building AI. The essay focuses primarily on the natural sciences, where the examples are most concrete and the stakes most tangible, though the underlying institutional logic likely applies more broadly.
Making that case requires first examining what AI can and cannot currently do in science with some precision, because the strongest version of Dario’s vision implies AI that can autonomously perform the full range of scientific cognition. The first half of the essay draws on recent results, perspectives from scientists and AI researchers, and work in philosophy of science to argue that while this is not physically impossible, the current trajectory is unlikely to get us there. That gap is precisely why the institutional question matters. If full automation of scientific discovery were imminent, institutions would be an afterthought. It is not, and they are not.
The essay then turns to a second problem. AI is not only unlikely to automate the deepest science anytime soon; it is actively reshaping the incentive landscape of the science we have, tilting effort toward well-explored territory and away from the data-sparse questions most likely to produce genuinely new scientific theories. This exploitation trap, and the simultaneous diffusion of AI tools to practitioners outside the academy, sets up the case for Mokyr. His framework suggests that the answer depends less on how powerful the AI becomes than on whether the right institutional infrastructure exists to channel AI’s capabilities into positive feedback loops. What that infrastructure looks like in practice (open data channels, incentives to share failures and surprises, mechanisms for connecting practitioners to researchers) is the subject of the essay’s second half. The original Industrial Enlightenment, as Mokyr calls it, did not merely produce discoveries. It produced the sustained, compounding growth in useful knowledge that transformed medicine, agriculture, manufacturing, and living standards. A second one could do the same, faster. But it requires deliberate construction, and the existing incentive structures that AI is reinforcing (who shares data, who hoards it, which questions get funded) will only harden with time.
The acceleration so far
Before turning to Mokyr, it helps to see what AI is actually doing in science right now. The following results are chosen not for comprehensiveness but for what they reveal about both current capability and its limits.
October 2025. Kosmos, built by FutureHouse and Edison Scientific, aims to function as a fully autonomous research system, one that can identify questions, search the literature, design analyses, and produce findings. Of the seven findings it reported across metabolomics, materials science, neuroscience, and statistical genetics, three were independently reproduced. One identified a connection between cardiac fibrosis and a class of metabolic pathways that domain experts confirmed. But the paper reports a telling accuracy split: 85% on data analysis, 82% on literature review, and 57.9% on interpretation, where the system must synthesize across results to articulate what they collectively mean.
November 2025. Stanford’s Biomni agent compressed a genome-wide association study (the kind of analysis that identifies genetic variants linked to diseases like diabetes or heart disease, typically requiring months of specialized statistical work) into 20 minutes.
January 2026. New Scientist reported that professional mathematicians were “stunned by the progress amateurs have made in solving long-standing problems with the assistance of AI tools.” An amateur with a good question and an AI collaborator could now access mathematical techniques that previously required years of specialized training.
February 2026. A collaboration between physicists at the Institute for Advanced Study, Vanderbilt, Cambridge, and Harvard reported that OpenAI’s GPT-5.2 had spotted a pattern in gluon scattering amplitudes (interactions among particles that carry the strong nuclear force) that physicists had studied for 40 years. Zvi Bern at UCLA told Science: “The ideas are not revolutionary. But what is revolutionary is that a machine can do this.”
February 2026. Donald Knuth, the legendary Stanford computer scientist, published “Claude’s Cycles.” Claude Opus had solved an open problem in graph theory Knuth had worked on for years, failing 14 times before arriving at a construction. Knuth wrote the formal proof. His summary: “Augmented mathematics, not autonomous mathematics.” But also: “What a joy it is to learn not only that my conjecture has a nice solution but also to celebrate this dramatic advance in automatic deduction and creative problem solving.”
February 2026. Daniel Litt, a mathematician at the University of Toronto and longtime AI skeptic, wrote that he now expects to lose his bet that AI cannot produce a top-tier mathematics paper by 2030. His residual question: “Can it invent the notion of a scheme, or of a perfectoid space? Can it come up with a new technique? Make an interesting new definition? Ask the right question?”[2]
March 2026. Google DeepMind’s Aletheia, a mathematical research agent, achieved 95.1% accuracy on olympiad-level proofs and autonomously solved four open problems from the collection of Paul Erdős. But DeepMind’s analysis of 200 candidate solutions tells a more complex story: 68.5% were fundamentally flawed. Of the technically correct ones, most were what the authors called “mathematically vacuous,” satisfying formal criteria while missing the underlying mathematical intent. Only 6.5% were “meaningfully correct.” Their assessment: success seems to arise from “clever technical manipulations or vast knowledge retrieval, rather than what mathematicians would consider to be genuine creativity.”
A recent Scientific American article quotes several mathematicians predicting 2026 will be the year AI-assisted results first make it through peer review in major journals. Terence Tao, the Fields Medal-winning mathematician at UCLA, told Dwarkesh Patel last week that AI tools have helped resolve roughly fifty Erdős problems. But he drew a distinction. Purely AI-driven solutions have stalled. “There was a month where that happened and that has stopped, not for lack of trying.” The headline results look impressive because people point AI at hundreds of problems and only publicize the successes. The per-problem success rate is closer to 1–2%. The pattern is human-AI collaboration, not autonomous discovery.
The verification gradient
A pattern runs through these results that maps onto Dario’s caveat about verification, and helps explain both where AI is advancing fastest and where it is likely to stall.
The philosopher C.S. Peirce, one of the most original thinkers of the 19th century, drew a distinction among three forms of inference that helps organize what we are seeing.[3] Deduction derives what must follow from a given set of premises. GPT-5.2’s proof of the gluon formula, Claude’s solution for Knuth, and Aletheia’s olympiad performance are deductive achievements. Induction identifies the best-fitting pattern in a body of observations: you observe that drug X reduces symptoms in 90 of 100 patients and conclude it is likely effective. Kosmos identifying metabolic pathways in cardiac fibrosis, Eggertsson’s evening replications, Biomni’s genome-wide analysis in 20 minutes, all inductive achievements at remarkable speed.
The third is what Peirce called abduction: given a surprising observation that does not fit existing frameworks, generate a new explanatory hypothesis. You notice that Mercury’s orbit deviates from Newtonian predictions. Most physicists hypothesize a hidden planet. Einstein instead proposes that spacetime is curved by mass. Both explanations fit the data. What separates them is a judgment about which kind of explanation is more fertile, a judgment the data alone cannot supply.
Here is why Peirce’s distinction maps onto the verification problem in AI training. Deductive tasks are cheap to verify, in that a proof is correct or it is not. This is why progress in mathematics has been so rapid; models can learn from clear reward signals at scale. Inductive tasks are also cheap to verify. A pattern fits the data or it does not, and statistical measures of fit are automatable. But abductive tasks — deciding what a surprising result means, whether it challenges a reigning theory, what new question it opens — are expensive to verify, because they require human evaluation and evaluators often disagree. No one has yet found a cheap, automatable reward signal for the judgment that spacetime is curved rather than that a planet is hiding.
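To make the gradient concrete, here is a minimal sketch, in illustrative Python with invented function names, of what each form of inference looks like as a reward signal. The substance is in the last function: it has no automatable body.

```python
import statistics

# Toy illustration of the verification gradient. These functions and
# their signatures are hypothetical, a sketch of reward signals rather
# than any real training system.

def verify_deduction(proof_checker_passes: bool) -> float:
    # Deduction: a formal checker says pass or fail. The reward is
    # exact, automatic, and available at scale.
    return 1.0 if proof_checker_passes else 0.0

def verify_induction(predictions: list[float], observations: list[float]) -> float:
    # Induction: goodness-of-fit is computable. One minus normalized
    # mean squared error serves as a dense, automatable reward.
    mse = statistics.fmean((p - o) ** 2 for p, o in zip(predictions, observations))
    var = statistics.pvariance(observations)
    return max(0.0, 1.0 - mse / var) if var > 0 else 0.0

def verify_abduction(hypothesis: str) -> float | None:
    # Abduction: is "spacetime is curved" more fertile than "a hidden
    # planet"? No checker exists; the signal is slow, expensive,
    # contested human judgment.
    return None
```

Wherever a function like the first two can be written, reinforcement learning has a dense signal to climb; wherever only the third is available, it does not.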
Dario’s bet is that generalization from verifiable tasks will bridge this gap regardless. His 5% is the residual possibility that it won’t, that the frontier where verification becomes expensive and training signal thins is also where generalization runs short. The Kosmos and Aletheia results sit right on this frontier: 85% accuracy where verification is cheap, under 58% and 6.5% where it is not.
How far does recombination go?
The boundary, however, is not as sharp as the taxonomy suggests. The same results classified above as deductive or inductive often have an abductive dimension when viewed differently. GPT-5.2’s gluon formula is deductively proved, yet the physicists involved describe the model as conjecturing the formula, spotting a hidden regularity across 40 years of results that humans had missed. Kosmos’s identification of new metabolic pathways in cardiac fibrosis is inductive pattern-matching on data, even as the connection between two previously unrelated literatures is a hypothesis no individual researcher had formed. Whether this is genuinely abductive or a very sophisticated form of induction is philosophically contested. The practical result is the same. These systems are producing novel hypotheses by synthesizing across vast bodies of literature and data at a scale no individual could match. Call it recombinative abduction.
Stathis Psillos, a philosopher of science, has argued that abduction's core role may be exactly this, selecting plausible candidates for testing from a space of possibilities, with evaluation then proceeding through standard scientific methods like experiment, replication, and statistical analysis.[4] How far the recombinative capability can go is what matters. This is, at bottom, the question inside Dario's 5%. Does competence on verifiable tasks fully generalize to the open-ended scientific cognition that produces new frameworks, new questions, and the judgment about which assumptions can be safely ignored? Or does that final stretch require something recombination alone does not provide?
Beyond recombination
Several lines of evidence suggest that the deepest forms of scientific creativity involve capacities that current systems lack and that scaling[5] or architectural changes may not automatically supply.
The first is empirical. Kosmos’s 57.9% accuracy on interpretation and Aletheia’s 6.5% meaningful-correctness rate are not zero, which suggests current architectures have some abductive capacity and can improve further. But the gap between those figures and the 85% Kosmos achieves on data analysis is large, and the pattern is consistent across domains. AI performs at or near expert level on tasks with clear verification signals and markedly worse on tasks that require judgment about what results signify. And no AI system has yet produced a paradigm shift, a result that redefines the terms of its field rather than extending established theories within them.
The second concerns how these systems generate outputs. Andrej Karpathy, the former Tesla AI director, described the intuitive version in an October 2025 conversation with Dwarkesh. Models have improved since, but the structural point remains. When you ask an LLM to reflect on a chapter of a book, each response looks reasonable in isolation. “But if I ask it 10 times, you’ll notice that all of them are the same.” The model converges to the same cluster of observations every time. Its responses are sampled from a distribution that is tightly centered on the training data’s center of gravity. Karpathy describes his own reading differently: “The book is a set of prompts for me to do synthetic data generation.” He integrates what he reads with what he already knows, his persistent interests and commitments, in ways that produce new thinking not contained in the book material itself. One could argue this is itself a very sophisticated form of recombination. Perhaps. But it draws on embodied experience, persistent memory, and long-running intellectual commitments that current models do not have, and it is not obvious that scaling alone will produce them.
Li, Oh, and Li (ICLR 2026) added mathematical rigor to Karpathy’s intuition. They proved that under the standard methods used to train LLMs, a constraint designed to keep the model’s outputs close to its training distribution has the side effect of suppressing genuinely novel outputs. Bonuses intended to encourage exploration “disproportionately amplify rewards for regions already well-covered,” reinforcing conservative behavior. Their proposed fix corrects this specific pathology. But the deeper point is that even with a fix, the model still needs a reason to explore in a particular direction, some equivalent of the commitments that orient scientists toward fertile territory rather than merely unfamiliar territory. And this connects back to the verification gradient. There is no cheap reward signal for having explored in the right direction. Where correctness is checkable and reward is dense, reinforcement learning has driven spectacular progress. Where it is not, the guidance thins out.
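The mechanism can be sketched numerically. The toy calculation below uses made-up numbers, not the paper’s, but shows how the standard KL penalty toward a reference model taxes exactly the outputs that model finds unlikely, even when the task reward is higher.

```python
beta = 0.5  # strength of the pull toward the reference model (assumed value)

def penalized_reward(task_reward: float, logp_policy: float, logp_reference: float) -> float:
    # Per-sample form of the KL-regularized objective common in LLM
    # post-training: reward minus beta * (log pi_theta - log pi_ref).
    return task_reward - beta * (logp_policy - logp_reference)

# A familiar output the reference model likes, and a novel output it
# considers roughly a 1-in-8000 event. Both get the same policy probability.
familiar = penalized_reward(task_reward=0.6, logp_policy=-2.0, logp_reference=-2.1)
novel = penalized_reward(task_reward=0.9, logp_policy=-2.0, logp_reference=-9.0)

print(f"familiar: {familiar:+.2f}")  # +0.55, small penalty
print(f"novel:    {novel:+.2f}")     # -2.60, higher task reward, net loss
```

The novel output earns more task reward and still loses; that is the suppression Li, Oh, and Li formalize.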
Philosophy of science helps clarify what the technical discussion points toward. When Einstein confronted Mercury’s anomalous orbit, the data were equally consistent with preserving Newtonian mechanics (hypothesizing a hidden planet whose gravity explained the discrepancy) or abandoning its core assumptions (curved spacetime). What separated the two responses was not a difference in computational power or pattern-matching ability. It was a set of pre-empirical commitments to simplicity, to unification, to what Einstein called a more coherent physical picture, that oriented him toward a specific kind of explanation before the data could confirm it.
The philosopher Ian Hacking argued in Representing and Intervening that these commitments are often grounded in experimental practice: scientists who physically manipulate causes develop intuitions that pure theorists do not. When Hans Christian Ørsted noticed in 1820 that a compass needle deflected near an electric current — during a lecture demonstration, not a planned experiment — the surprise of that observation opened the entire field of electromagnetism. But only because Ørsted had the kind of hands-on familiarity with electrical phenomena that allowed him to notice it against the noise and reflect on its significance rather than dismiss it. Hacking’s point is that certain kinds of scientific knowledge come through intervention in the world, not just representation of it.
What, then, does deep scientific creativity require? Across the evidence and sources reviewed here, three capacities consistently appear. First, sensitivity to anomaly and incoherence, the ability to recognize when a result is not just surprising within a theory but surprising about the theory, when it suggests the reigning framework itself may be wrong. Zahavy et al. argue in ‘LLMs Can’t Jump’ (January 2026) that this kind of recognition depends on an implicit sense of what is coherent, not just what is probable, and that current systems lack a training signal for it. That may overstate the case; the signal is more likely thin than absent. But thin is enough to explain the gap.
The other two capacities are harder to pin down but no less important. Pre-empirical commitments, the values that orient a scientist toward a particular kind of explanation before the data can adjudicate: simplicity, fertility, unification, explanatory depth. These are what Michael Polanyi, the Hungarian-British physical chemist turned philosopher, called “intellectual passions” and what Imre Lakatos, the philosopher of mathematics and science, described as the non-negotiable commitments at the center of a research program, the assumptions a scientist protects rather than abandons when anomalies arise.[6] And finally causal interaction with reality, the hands-on experimental work through which scientists build intuitions and encounter surprises that force deeper reflection and integration, the capacity Hacking argues is epistemically distinct from theoretical knowledge.[7]
These three capacities are related. Sensitivity to anomaly is shaped by experimental practice. Pre-empirical commitments are informed by hands-on experience with what kinds of explanations tend to be fruitful. And all three are expensive to verify in the ML training sense. There is no cheap reward signal for recognizing incoherence, caring about explanatory depth, or having the right experimental intuition. This is the deeper structure of Dario’s 5%. It is not that generalization from verifiable tasks is impossible; AI’s existing recombinative abduction shows it is already happening. It is that the final stretch, from recombination to the deepest forms of scientific creativity, depends on capacities for which current training methods offer thin reward signal, and which scaling or architectural changes may or may not supply.[8]
The exploitation trap
The verification gradient has a social analogue. Professional science rewards publications, citations, and grants, proxies that accumulate fastest where verification is cheapest, which is also the territory where AI capability is strongest. The specifics vary by country, but the metrics by which scientists are evaluated globally (the same citation indices, the same high-impact journals, the same university rankings) mean the basic incentive structure, and the tilt AI introduces, is broadly shared across the developed world. Two independent reward systems (one governing how models learn, one governing how scientists are evaluated) converge on the same terrain.
This is not a claim about the motivations of individual scientists, most of whom are pursuing genuine questions. It is an observation about the systems they operate in. AI did not create this incentive structure. But it has dramatically reshaped the cost-benefit landscape scientists face, tilting rational effort toward exploitation — harvesting returns from known, data-rich territory — and away from exploration of the data-sparse questions that might open genuinely new territory. James March argued in 1991 that this is a general property of adaptive systems: they refine exploitation more rapidly than exploration, becoming “effective in the short run but self-destructive in the long run.” Three decades of empirical work across strategy, innovation, and organizational learning have broadly confirmed the pattern. AI has dramatically accelerated the dynamic March described.[9]
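A toy simulation, my construction rather than March’s own model, makes the dynamic visible: a learner that trusts its own accumulated experience refines the reliable option and rarely discovers that the sparse one was worth more.

```python
import random

random.seed(0)

# Two research "directions": a well-lit one with reliable modest payoff,
# and a data-sparse one with triple the expected value that usually
# shows nothing. All numbers are invented for illustration.
def well_lit() -> float:
    return random.gauss(1.0, 0.1)

def data_sparse() -> float:
    return 30.0 if random.random() < 0.1 else 0.0  # expected value 3.0

arms = {"well_lit": well_lit, "data_sparse": data_sparse}
estimates = {name: 0.0 for name in arms}
counts = {name: 0 for name in arms}

for step in range(200):
    # Try each direction three times, then always exploit whichever
    # currently looks best. Experience, not truth, drives every choice.
    if step < 6:
        name = min(counts, key=counts.get)
    else:
        name = max(estimates, key=estimates.get)
    reward = arms[name]()
    counts[name] += 1
    estimates[name] += (reward - estimates[name]) / counts[name]

# With most seeds the sparse arm pays nothing in its three trials, its
# estimate sticks at zero, and the learner never returns to it.
print(counts)
```

Nothing in the loop is irrational step by step; the trap is a property of the system, which is March’s point.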
Hao, Xu, Li, and Evans show the pattern with unusual clarity. Scientists who adopt AI-augmented methods publish 3.02 times more papers, receive 4.84 times more citations, and become principal investigators 1.37 years earlier, even as the collective volume of distinct research topics contracts by 4.63% and follow-on engagement between scientists falls by 22%. A 4.63% contraction sounds modest; it is not. It represents a net narrowing of the scientific frontier even as output surges. Their assessment: “AI gravitates toward well-lit problems and away from foundational and emergent questions where data is necessarily sparse.” The result is an exploitation trap, everyone scaling the same data-rich peaks, not because any individual scientist lacks curiosity, but because the system’s reward signal points there.
AI has made exploitation dramatically cheaper. What was a months-long genome-wide association study is now twenty minutes. What was a week of replication is now an evening. The predictable result is a rush to harvest the analytical fruit that was always there but previously too time-consuming to pick.
This tilt toward exploitation extends beyond papers into the funding machinery of institutional science. A January 2026 preprint, summarized in Nature, finds that proposals with greater AI involvement at both NIH and NSF are more semantically similar to work those agencies have recently funded; they converge toward the revealed preferences of the funders. At NIH, the more AI-assisted proposals were also more likely to be funded and later produced more papers, though not more breakthrough results. NIH has responded by prohibiting peer reviewers from using AI in evaluations and warning that applications substantially generated by AI will not be treated as original work, but these are administrative patches on a structural problem. The underlying incentive remains. If AI can polish prose, assemble bibliographies, and optimize proposals toward what funders have rewarded before, scientists will use it to do exactly that. The gap between what is submitted and what is genuinely new widens.
The same tilt toward exploitation over exploration shows up in citation patterns. A December 2025 C&EN piece, drawing on a Science study of 2.1 million preprints, reports that AI-assisted papers already cite a wider range of sources and more recent work. That may represent a genuine broadening of intellectual engagement. But given the narrowing that Hao et al. document, the more likely explanation is what I would call citation inflation: as literature search and bibliography assembly become cheaper, citations proliferate faster than genuine intellectual debt accumulates. The average reference reflects less reading, less wrestling with a source’s argument. Bibliographies get fatter; canons get thinner.
AI discovery tools can push against this narrowing. Hao et al. argue that AI could be repurposed to surface anomalies rather than exploit patterns. But if millions of researchers use the same AI discovery tools to identify the same white spaces, they will all converge on the same unexplored territory, turning exploration into a new form of exploitation, as the spaces AI identifies become the next well-lit peaks.
What happens as exploitation saturates? The returns are self-limiting. As AI-enabled analyses flood the literature, the marginal value of yet another competent contribution within an established paradigm falls. AI is making cheap not just the writing of papers but the core deductive and inductive work that constitutes much of scientific practice: statistical analysis, pattern identification, literature synthesis, replication, proof verification. When all of this becomes abundant, distinguishing genuine insight from competent production becomes the central problem for grant reviewers, journal editors, and hiring committees alike. Hao et al. sharpen the point. Concentrating on a narrower set of problems does not even help scientists solve those problems faster; it just produces more papers about them. Without deliberate intervention, they warn, "science risks premature convergence on established paradigms," foreclosing the new fields that historically produce the largest returns.
Consider a grant reviewer facing twenty strong proposals to climb the same hill. AI can help score each on methodological rigor. But AI predictions of likely impact are trained on historical funding patterns, and those patterns reward work on well-explored, data-rich problems. A proposal to explore data-sparse territory will score poorly because there is little historical basis for the prediction. When AI mediates both the writing of proposals and the evaluation of them, applicants optimize for what reviewers reward, reviewers reward what historical patterns predict will succeed, and the variance among proposals (the range of approaches, questions, and methods that reviewers actually see) compresses. What remains as a differentiator when the traditional markers of quality (methodological rigor, novelty of findings, clarity of exposition) become easy for AI to produce is the one thing AI cannot manufacture: credibility. Who is asking this? What institution stands behind them? What data can they access? Do they have the infrastructure to run experiments in the physical world, not just analyses on existing datasets?
Concentration and diffusion
These dynamics push toward two effects worth separating. The first is concentration. The institutions that already command the most resources (funded labor, specialized instruments, exclusive datasets, long-running cohorts, institutional agreements) are also the ones with the most credibility, and credibility is what the system increasingly selects for. AI does not level the playing fields that matter most. In a noisier landscape, brand and track record become stronger filters, and the compounding advantages of prestige grow more decisive.
Funding dynamics reinforce the pattern. The federal share of basic research funding fell from 52% to 41% between 2012 and 2023, even as total federal research funding grew only modestly in real terms. The Trump administration proposed cutting NIH funding by 40% and NSF funding by more than half for FY2026; Congress largely blocked those cuts, but the resulting budgets are still flat or slightly down, and the political environment remains volatile. Federal funding is already heavily concentrated: the top 30 institutions account for 42% of all higher education R&D, and agencies’ evaluation frameworks favor institutions with established track records. When the pie is not growing and the criteria for slicing it reward incumbents, the institutions best positioned to absorb AI’s productivity gains are the ones that were already best positioned. The result is a system that selects for continuity. Resources concentrate at the top. Elite institutions have the most slack to explore genuinely new territory — many do, but the incentives push the other way.
The second effect runs in the opposite direction. The same AI tools that amplify productivity, and concentration, inside the academy also lower barriers elsewhere. A smaller lab can now run analyses that once required dedicated staff. And the diffusion of AI tools extends well beyond the academy. Literature that was previously impenetrable without years of training becomes accessible through AI-mediated synthesis. Statistical techniques that required specialized coursework can now be deployed by someone who understands the scientific question well enough to direct the analysis, even without formal training in the technique itself. In pure mathematics, the amateurs solving long-standing problems noted earlier are one visible sign. The walls of institutional science may rise in one sense, status and established funding relationships growing more determinative, while the tools of scientific reasoning diffuse outward in another.
And not only toward amateur theorists. AI is also lowering the cost for practitioners (engineers, clinicians, materials scientists) to access and apply scientific knowledge that was previously locked behind specialist training. A materials startup can now use AI to screen candidate compounds and interpret results against a literature its engineers could not previously navigate, a capability the next section will examine in detail. These are people with practical problems, using AI to bridge the gap between what science knows and what they need to do, and they are unbound by the publication counts, citation accumulation, and grant conformity that the reward structures of institutional science select for.
That absence is not, by itself, a virtue. It does not guarantee insight. But it creates room for different questions, practically motivated ones that arise from contact with the world rather than from the logic of academic incentives. As Litt argues about mathematics, different results come not because some practitioners are smarter but because they bring “wildly different interests, instincts, approaches to the subject.” What matters is whether those different questions can find their way back to the researchers who could build on them, along with the unexpected observations that inevitably arise when practitioners apply scientific knowledge to real-world problems.
So where does this leave us? AI is unlikely, on the current trajectory, to automate the deepest forms of scientific creativity. And the science it is automating, it is steering toward well-explored territory, the exploitation trap, while the institutional structures that fund and evaluate research reinforce the same bias. At the same time, AI is diffusing the tools of scientific reasoning to practitioners who could, in principle, bring questions and problems that institutional science, left to its own incentives, would not generate on its own. The ingredients for something transformative are present. What remains unclear is whether we can assemble them into a system that generates self-reinforcing growth, the kind of positive feedback that separates a temporary burst from a sustained revolution. Mokyr’s framework, built to explain why the Industrial Revolution sustained itself when earlier waves of innovation did not, offers the clearest lens for thinking about how.
When knowledge compounds
Mokyr’s answer to the historical version of this question, developed across The Gifts of Athena, The Enlightened Economy, and A Culture of Growth, and recognized with the 2025 Nobel Prize in Economics, is that what sustained the Industrial Revolution was not any single invention but a transformation in how useful knowledge was organized and shared. He calls this the “Industrial Enlightenment”: a set of social changes, beginning in the mid-eighteenth century, that reduced the cost of accessing practical knowledge and connected those who understood why techniques worked with those who used them.
What did this look like in practice? The pottery magnate Josiah Wedgwood corresponded with the chemist Antoine Lavoisier about glazing techniques and kiln design, and improved his manufacturing as a result. James Watt’s correspondence with the chemists Joseph Black and John Robison helped him understand why his steam engine worked, while his practical struggle to minimize steam condensation directed Black’s theoretical attention toward latent heat, one of the foundational concepts of thermodynamics. German “economic societies” created forums where a provincial dyer could learn the chemistry behind a better process and a chemist could learn what problems dyers actually faced. Diderot’s Encyclopédie surveyed artisanal techniques in extraordinary detail, making them accessible to anyone who could read. The common thread was reducing the distance between theoretical knowledge and practical application, and ensuring the information flowed in both directions.
Mokyr's key insight is that this institutional connectivity was constitutive of growth, the reason the Revolution sustained itself rather than petering out like earlier bursts of innovation. In The Gifts of Athena, he writes that “negative feedback was thus replaced by positive feedback, which eventually became so powerful that it became self-sustaining.” Mokyr distinguished between those who held “propositional knowledge,” the theoretical scientists, who understood why things worked, and those who carried out “prescriptive knowledge,” the engineers, manufacturers, and artisans, who knew how to make things. When these two groups were connected, the result was a self-reinforcing spiral of knowledge accumulation that had been impossible in earlier eras. The institutions of the Industrial Enlightenment constructed what the historian Liliane Hilaire-Pérez called passerelles (bridges) between them. And crucially, those institutions did not merely transmit knowledge in one direction. They fostered a culture of open correspondence and collaboration. Institutions like the Lunar Society, the economic societies, and the Republic of Letters kept the distance between theorist and practitioner small enough that surprises and insights traveled naturally in both directions.
Those bridges did not disappear after the eighteenth century. But the traffic across any given bridge has thinned as the network has grown. The number of active peer-reviewed journals has grown at roughly 3–4% per year for decades, doubling every twenty years or so, and individual researchers increasingly work within specialized niches whose literatures barely overlap. The flow of knowledge between theoretical scientists and practitioners continues today, for example through industrial R&D laboratories, land-grant universities, and the modern startup ecosystem, but specialization means any given actor can cross only a few of the bridges available, and the ones they cross are increasingly narrow.
AI is transforming this landscape. Mokyr’s framework helps us see how. AI radically lowers the cost and time required to access propositional knowledge for anyone with a practical problem worth solving, and it does so across the disciplinary boundaries that hyper-specialization has made increasingly difficult to cross.
Updating Mokyr for a world of specialists
In the eighteenth century, the relevant divide ran between two fairly distinct social groups: natural philosophers who understood theory and artisans who made things. Today, the people doing applied scientific work are themselves often formally trained scientists, but they are specialists. A biotech researcher working on protein therapeutics may know structural biology but not the organic chemistry needed to design a viable drug molecule. A clinician noticing an unusual pattern in patient responses may lack the biostatistical tools to determine whether the pattern is real or coincidental. An engineer who has spent years working with battery manufacturing constraints knows exactly what cathode properties matter for a viable product (what Mokyr would call prescriptive knowledge) but may lack familiarity with the materials science literature that could point toward better formulations. The distinction between theoretical scientists and practitioners still matters, but it has been joined by an increasingly fragmented landscape of specialization within each group. Mokyr himself anticipated as much, arguing that the growth of knowledge would eventually outstrip any individual's ability to command it, making the institutions of access more important than ever.
How AI bridges the gaps
AI bridges these cross-domain gaps at a speed and scale that no previous institution could match. How it does so varies. Practitioners are primarily interested in efficient solutions, not understanding for its own sake, so AI’s bridging role operates on a spectrum of abstraction, from explaining adjacent knowledge to automating its application. At one end, a researcher asks Claude or ChatGPT to synthesize an unfamiliar literature, faster and cheaper than Wedgwood writing to Lavoisier, if less personal. At the other, propositional knowledge from multiple subdomains is built directly into an automated workflow that the practitioner uses without needing to engage with the underlying science. The realistic expectation, given scarce human attention and the pressure toward efficient results, is that practitioners will gravitate toward the operationalized end.
Consider two examples. Mitra Chem, a startup developing iron-based battery cathodes, uses machine learning models grounded in physical theory (embedding thermodynamic principles, electrochemical theory, and crystallographic constraints) that let a small team of engineers apply propositional knowledge they could not have mastered discipline by discipline. The company reports a 90% reduction in lab-to-production timelines. TetraScience, a scientific data platform, has adopted NVIDIA’s BioNeMo models to help biotech companies translate unstructured lab data (experimental graphs, charts, instrument readouts) into AI-readable formats that connect to the broader biological literature, so that a researcher can interpret experimental results against models from adjacent disciplines without needing to build those models from scratch.
AI-mediated access to scientific knowledge is expanding rapidly, and increasingly it does not merely explain what adjacent fields know but builds that knowledge into automated workflows practitioners can apply directly. This is the Industrial Enlightenment’s logic operating at a qualitatively different scale, with AI playing the role that Mokyr’s economic societies, correspondence networks, and encyclopedias once played. But there is an important difference: those institutions generated bidirectional communication by design, while AI, as a technology, merely reduces the cost of access to propositional knowledge for everyone, practitioners and researchers alike. Reducing the cost of access is not the same thing. Whether AI produces the bidirectional communication that sustained Mokyr’s Industrial Enlightenment depends on what we build around it.
The return channel: the hardest problem
Building true bidirectional information flow between practitioners and theoretical scientists in today’s AI landscape is, I believe, one of the defining institutional challenges for science — and the one most likely to determine whether AI produces a temporary acceleration or a sustained, compounding transformation. The reason is straightforward. Mokyr showed that what sustained the Industrial Enlightenment was not one-directional diffusion but positive feedback loops, where practical experience flowed back and redirected scientific inquiry. Without the return channel, you get faster application of existing knowledge, valuable but self-limiting, because the knowledge base is not being refreshed by the full range of new observations from the field. Academic researchers generate crucial novel observations too, of course, but their observations are already inside the system. The practitioners’ are not. AI can mine the existing published corpus for cross-domain connections, and it will find many. But that corpus is a biased sample, disproportionately positive results from well-explored territory. A sustained yield, the kind that opens entirely new fields, comes from observations not yet in any corpus. Getting them out requires a working return channel, and the return channel is underdeveloped.
Start with the structural obstacles. A biotech startup running AI-guided experiments generates a tight internal cycle in which experimental data feeds model training, which generates new hypotheses, which produce new data. But competitive pressures work against sharing what they learn.
The most scientifically valuable data that industry generates — the failures, the anomalies, the compounds that behaved contrary to prediction — are precisely the data that rarely reach academic researchers, and yet these are the data that could steer research toward fruitful new territory. When they do reach researchers, the results can be striking. In 2020, engineers at Toyota’s North American research lab, working on solid-state batteries for electric vehicles, discovered that their solid electrolytes were failing through an unpredicted mechanism: pores forming at an internal boundary, triggering cracks that existing theoretical models had not anticipated. Academic collaborators at Vanderbilt used the discovery to revise the theory of how these materials break down, producing both new science and practical design guidelines for battery engineering. This is Mokyr’s feedback loop in modern form. A practitioner’s anomaly became a theoretician’s research question, and the revised theory flowed back as better engineering. But it required a deliberate, jointly structured collaboration between an industry lab, a university, and a national laboratory facility. That kind of institutional arrangement remains the exception rather than the norm.
The structural barriers are not the only obstacle. The more AI operationalizes propositional knowledge through automated tools, the more it insulates practitioners from the underlying science, which makes it harder to recognize when a surprising result is scientifically meaningful rather than just a glitch. When Watt’s engine lost steam unexpectedly, he could run his own triage — checking the construction, the fuel supply, the seals — before realizing he needed a theoretician’s help. He was close enough to the physical system to know when something was genuinely surprising. When an automated AI tool predicts a cathode formulation will work and it doesn’t, the practitioner faces a harder version of the same question: is this a genuine anomaly, or just a model hallucination, a faulty prototype, a miscalibrated sensor? The more AI disintermediates the practitioner from the underlying science, the harder this distinction becomes.
This does not eliminate the feedback loop, but it means the channel must be designed with the triage problem in mind. The closest real-world analog is adverse event reporting in medicine. A clinician flags an unexpected patient response; a centralized system aggregates those flags; researchers investigate the patterns. Something similar is needed here, aggregating flagged prediction failures across many users and routing them to researchers who can assess whether a cluster of surprises points to incomplete theory. And because most AI automation tools will be built by private companies, the coordination problem is compounded. There is no standardized channel for “this result was unexpected and might be scientifically interesting.”
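What might such a channel look like? Below is a minimal sketch of a hypothetical report record, modeled loosely on adverse event reporting; every field name is invented for illustration, not an existing standard.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AnomalyReport:
    # Hypothetical schema for a practitioner-side "this was unexpected"
    # report that an automated tool could emit with one click.
    tool_name: str                 # which AI tool made the prediction
    tool_version: str
    domain: str                    # e.g. "cathode formulation"
    prediction: str                # what the tool said would happen
    observation: str               # what actually happened
    triage_notes: str              # checks already run: sensors, prototype, rerun
    reproducible: bool             # did the surprise survive a repeat attempt?
    shareable_data_uri: str | None = None  # optional pointer to redacted data
    reported_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

def cluster_key(report: AnomalyReport) -> tuple[str, str]:
    # An aggregator would group reports so that a cluster of similar
    # surprises, rather than any single one, reaches researchers.
    return (report.domain, report.tool_name)
```

The triage fields matter most: they encode the Watt-style checks that separate a genuine anomaly from a miscalibrated sensor.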
Some infrastructure for sharing data exists, but it is uneven. Open databases like the Protein Data Bank (over 220,000 experimentally determined structures, contributed by both industry and academia), PubChem (over 110 million unique chemical structures), and the Materials Project (over 650,000 users) are modern passerelles. But the design challenge is how to make industrial data discoverable and analyzable without requiring full disclosure from companies whose competitive position depends on proprietary knowledge. The most promising approaches do not ask companies to publish all their raw data. They create secure environments where researchers can bring questions to the data rather than moving the data itself. The MELLODDY consortium demonstrated that ten pharmaceutical companies could improve their own predictive models through federated learning over 2.6 billion confidential data points without pooling raw proprietary data, a proof of concept that competitive firms can contribute to a shared knowledge base while protecting their individual positions.
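MELLODDY’s production system is far more sophisticated, but the core idea of federated averaging can be sketched in a few lines, here on a toy least-squares model: each firm fits an update on data that never leaves its premises, and only model parameters are pooled.

```python
import numpy as np

rng = np.random.default_rng(42)

def local_update(weights, X, y, lr=0.1, steps=20):
    # One firm's training round on its confidential (X, y) data.
    w = weights.copy()
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)  # least-squares gradient
        w -= lr * grad
    return w

# Ten "companies," each holding a private dataset drawn from the same
# underlying chemistry (here, a shared true weight vector).
true_w = np.array([2.0, -1.0])
companies = []
for _ in range(10):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(0.0, 0.1, size=50)
    companies.append((X, y))

global_w = np.zeros(2)
for _ in range(15):
    # Each firm trains locally; the server averages weights and never
    # sees a single raw data point.
    updates = [local_update(global_w, X, y) for X, y in companies]
    global_w = np.mean(updates, axis=0)

print(global_w)  # approaches [2.0, -1.0] with no data pooling
```

This is an illustration of the principle, not MELLODDY’s implementation, which used far richer models and additional privacy machinery.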
So much for the structural obstacles. But even if they were resolved, pooled data, however well-structured, may not be sufficient. The cultural obstacles may run deeper. The first Industrial Enlightenment did not run on compiled tables or published catalogues alone. It ran on correspondence: direct, iterative exchange between people who understood different parts of the problem. Wedgwood did not scribble kiln observations into a ledger that Lavoisier later reviewed. They wrote to each other, repeatedly, each refining the other’s understanding. The modern equivalent would be institutions that foster this kind of ongoing collaboration; not just shared data but shared problems, discussed between practitioners and researchers working on different facets of the same challenge.
There are bright spots. Preprint servers like arXiv have lowered the cost of sharing findings across institutional boundaries. The Howard Hughes Medical Institute’s Open Science Initiative now requires all HHMI-funded research to be freely available as preprints. And Hugging Face, where practitioners and researchers contribute models, datasets, and benchmarks side by side on the basis of quality rather than affiliation, comes closest to the spirit of what Hilaire-Pérez called passerelles. But even Hugging Face is primarily a platform for sharing artifacts rather than a forum for iterative, problem-centered dialogue. The structural incentives run against deeper exchange. Researchers are rewarded for publishing, not for corresponding with practitioners, and practitioners have no economic incentive to share observations that might have scientific value. How much more cross-boundary dialogue is needed is genuinely unclear. But the norms that would make it routine are, at best, nascent.
The antidote to exploitation
The fragmentation of scientific knowledge into ever-narrower silos means the potential for AI-mediated cross-pollination is larger than anything the eighteenth century could have produced. There are more bridges to build and more traffic to carry across each one. And this is also the most plausible structural antidote to March’s exploitation trap. If institutional science is tilted toward exploiting known territory, then connecting it to practitioners whose questions arise from working on useful unsolved problems, rather than from the logic of academic incentives, steers the system toward fruitful areas of unexplored territory. The practitioner’s problem becomes the academic scientist’s question, much as Wedgwood’s glazing challenges directed Lavoisier’s chemistry. If the return channel can be made to work, not just data flowing into repositories but genuine exchange between the people who encounter surprises and the people who can theorize about them, the conditions exist for the kind of positive feedback that Mokyr showed is the difference between a temporary burst and a revolution. The final section sketches what that could look like.
A Second Industrial Enlightenment
In Machines of Loving Grace, Dario writes that “a surprisingly large fraction of the progress in biology has come from a truly tiny number of discoveries, often related to broad measurement tools or techniques that allow precise but generalized or programmable intervention in biological systems: CRISPR, PCR, optogenetics, and expansion microscopy.” He thinks their rate of discovery could be increased by 10x, compressing a century of biological progress into a decade. This is a noble ambition. But the “country of geniuses” framing, even with the caveats Dario attaches to it, directs attention toward building more powerful AI and away from the institutional conditions that determine whether that power compounds. This is especially true when Dario describes the current moment as “near the end of the exponential,” language that suggests the remaining problems will be swept aside by capability gains alone. The argument of this essay is that 10x-ing the pace of scientific discovery requires something more than intelligence concentrated in data centers. It requires building the conditions for a second Industrial Enlightenment.
My read of the evidence is that we are closer to this than most people realize and further from it than the AI optimists assume. The raw capability is extraordinary. But capability without institutional infrastructure produces acceleration within established scientific theory. Faster drug screening, faster protein folding, faster literature synthesis. The natural objection is that AI can also mine the existing literature for cross-domain connections, finding more CRISPRs hiding in plain sight. It can, and it probably will. But consider what CRISPR actually required: Mojica’s empirical surprise (noticing repeated DNA sequences in his data on salt-tolerant archaea) and Doudna and Charpentier’s cross-domain leap (recognizing that a bacterial immune mechanism could be repurposed for gene editing). AI can accelerate the second step dramatically. But a sustained 10x requires a sustained increase in the input to that process: the stream of surprising empirical observations from diverse contact with reality. The published scientific corpus is a biased sample, disproportionately positive results from well-explored territory. This is the exploitation trap in published form. The low-hanging cross-domain connections in published literature will be mined by researchers worldwide, with the same AI tools, simultaneously. The sustained yield comes from new observations not yet in any corpus. Getting them out is an institutional achievement as much as a technical one.
There are many possible paths to a second Industrial Enlightenment. On the path we are on, one where highly capable AI already exists, two additional ingredients are needed. The first (call it ten thousand Wedgwood Workshops) is broad access for practitioners to AI that automates the application of propositional knowledge from adjacent subspecialties, not just chatbots but automated operational tools. The volume and diversity of surprising observations scales with the number of people doing empirical work across different domains, and the automated tools most likely to surface scientifically valuable surprises remain concentrated in well-funded organizations. Broadening access, building standardized anomaly-reporting channels into these tools, and creating the funding conditions and open-source norms that make this the default rather than the exception are the structural prerequisites.
The second (call it ten thousand Lavoisier Laboratories) is reform on the receiving end, institutional structures that make it rational for researchers to attend to practitioner-generated anomalies. Today, such anomalies face significant barriers. No obvious journal venue, no grant mechanism designed to evaluate them, no credentialing system that recognizes their source. But even if these structural barriers were removed, the previous section argued that something deeper is needed: not just shared data but shared problems, discussed between practitioners and researchers working on different facets of the same challenge. Wedgwood and Lavoisier did not exchange data files; they exchanged ideas, iteratively. The culture that made this natural, where the distance between theorist and practitioner was small enough that surprises traveled freely, is what we most lack and what is hardest to engineer. The bright spots that exist today (arXiv, Hugging Face, HHMI’s open science requirements, federated consortia like MELLODDY) are genuine but remain closer to sharing artifacts than to sharing problems.
How far AI advances at the interpretation layer — whether it can, as Litt asks, invent new techniques, ask the right question, make an interesting new definition — matters too. If it advances substantially, the relationship between human and artificial intelligence in science will change in ways difficult to predict, potentially compressing the role of human scientists toward higher-order judgment and oversight. But even in that scenario, the observations that reorient inquiry still arise from distributed contact with reality, and the infrastructure connecting them to the people who can theorize about them still determines how fast the system learns. The more powerful the AI, the more consequential the design of that infrastructure.
The first Industrial Enlightenment was not caused by any single invention. It was sustained by a network of institutions that made the feedback loop between propositional and prescriptive knowledge self-reinforcing. AI is the most powerful technology for connecting scientific knowledge to practical application since the printing press, but a technology, however powerful, does not by itself generate the norms, incentives, and channels of communication that constitute an institution. The printing press needed publishers, vernacular translations, and networks of distribution to reshape European intellectual life. The Republic of Letters, the informal networks of open scholarly correspondence that emerged alongside it, helped lay the groundwork for the Industrial Enlightenment. AI needs open data infrastructure, incentives to share, and mechanisms that route practitioners’ surprises back to the researchers who can act on them.
This essay has tried to frame what we need to build around AI. The gains within existing channels are already extraordinary. But the deeper prize, self-reinforcing knowledge spirals that compound across domains the way Mokyr showed they once compounded across the workshops and laboratories of eighteenth-century Europe, requires institutions we have not yet finished building. Dario’s vision is a Country of Geniuses. It offers extraordinary intelligence, but leaves the question of how knowledge flows between people largely unexamined. What Mokyr’s framework suggests we need is closer to a Republic of Tokens. Distributed, participatory, and designed so that a practitioner’s surprising observation can reach the researcher who needs it as readily as Wedgwood’s letters once reached Lavoisier. That is the work ahead.
[1] Throughout this essay I use “AI” as shorthand for large language models and the systems built on top of them.
[2] Litt’s full reflection is worth reading. He asks whether models will have absolute advantage over humans in all aspects of mathematical research, “not just proving theorems, but the messy intangibles of taste, creativity, theory-building, philosophy,” and separately, whether they will have comparative advantage. His answer: “There exist mathematicians who are much more talented than I am. And yet I am still producing results they have not produced… In part this is because the attention of such mathematicians is a limited resource; in part, I think, it is because different mathematicians have wildly different interests, instincts, approaches to the subject.” He sees no reason this should not persist with powerful AI mathematicians: “Presumably the attention of such will be a much less limited resource. But mathematics is very, very large.”
[3] Peirce’s tripartite distinction has been debated extensively. Several philosophers have argued abduction is compatible with, and perhaps reducible to, Bayesian confirmation theory. The question for AI is not whether generating a novel framework is in principle continuous with selecting among known candidates (it must be, since both are physical processes) but whether that continuity is practically exploitable: whether the same training methods that produce excellent selection also produce genuine generation, or whether the path between them is long enough to constitute a functionally different problem.
[4] Psillos, in “Abduction: Between Conceptual Richness and Computational Complexity” (Abduction and Induction, Kluwer, 2000), proposed that abduction’s role may be limited to selecting plausible candidates for testing, with subsequent evaluation following Bayesian lines, updating beliefs about candidate hypotheses as evidence accumulates.
[5] The “scaling hypothesis” holds that increasing compute, data, and model size will continue to yield qualitative gains in capability. It has been remarkably predictive for deductive and inductive tasks. Whether it extends to the deepest forms of scientific creativity is precisely what is at issue.
[6] The philosopher Michael Polanyi argued in Personal Knowledge (1958) that scientists are guided by “intellectual passions,” a tacit sense of what is worth pursuing that operates before definitive evidence is available. Imre Lakatos refined the point in The Methodology of Scientific Research Programmes (1978): scientific programs have protected core assumptions and revisable outer belts. When Mercury’s orbit didn’t fit, most physicists revised the belt (hypothesizing a hidden planet). Einstein abandoned the core (absolute space and time). Both were logically consistent with the data; what separated them was a judgment about explanatory depth. W.V.O. Quine’s “Two Dogmas of Empiricism” (1951) made a related argument: that no observation can ever definitively confirm or refute a single hypothesis in isolation, because our beliefs face experience as a whole.
[7] A number of companies are building infrastructure that could give AI systems something closer to causal interaction with reality: automated wet labs (Automata, Periodic, Converge Bio), clinical AI environments (Medra), spatial intelligence (World Labs, Advanced Machine Intelligence (AMI) Labs), and full-cycle AI research platforms (Edison Scientific). Whether these will enable genuine framework generation remains an open question.
[8] There is early work on closing this gap. “Curiosity-driven” reinforcement learning methods attempt to give models intrinsic reward signals for exploring novel states, analogous to the intrinsic motivation that drives human inquiry. Reinforcement learning from verifiable rewards (RLVR), which has produced dramatic gains in mathematics and coding, is beginning to expand into chemistry, biology, and other domains. And Dupoux, LeCun, and Malik (“Why AI Systems Don’t Learn and What to Do About It,” March 2026) propose a more fundamental rethinking: an architecture integrating learning from passive observation with learning from active behavior in the world, arguing that current systems cannot learn post-deployment because they lack the capacity to switch between these learning modes. Whether these approaches can eventually provide reward signal for the kind of open-ended exploration that characterizes deep scientific creativity remains to be seen. But the direction is active and worth watching.
[9] For a recent quantitative analysis of individual scientists’ exploration-exploitation strategies, see Huang et al., “Exploration and Exploitation: Which Research Strategy Are You Better At?” (Quantitative Science Studies, 2025), which finds that exploitation strategies tend to stifle research performance while exploration strategies are high-risk but high-yield.

