Scientific understanding

A growing part of my work focuses on AI4Science, and not only in the narrow sense of using machine learning to accelerate analysis. I am interested in a deeper question:

What would it mean for an artificial system to participate in scientific understanding?

This connects my technical work in AI4Science to long-standing questions in philosophy of science: hypothesis formation, explanation, underdetermination, auxiliary assumptions, model revision, and the structure of scientific understanding. My thinking here is influenced by Leibniz’s dream of formal systems of reasoning, Popper’s view of science as conjecture and refutation, and Quine’s picture of knowledge as a web of interdependent commitments rather than isolated propositions.

The goal is not merely to build an AI assistant that summarizes papers or proposes plausible hypotheses. The deeper goal is to build systems that can participate in the cycle of scientific discovery: generating hypotheses, linking them to evidence, identifying auxiliary assumptions, proposing discriminating experiments, revising models when predictions fail, and producing interpretable descriptions of physical systems.

A transversal program across N⁴

AI4Science is not a fifth N. It is a transversal program that draws on all four faces of the framework at once.

N¹

NeuroPhysics

Many target systems are physical or biological, and their dynamics must be inferred from incomplete, noisy observations.

N²

NeuroComputation

It asks what it means for a system to compute with models, explanations, and hypotheses.

N³

NeuroDynamics

Discovery itself is iterative: conjecture, testing, revision, stabilization, and occasionally rupture.

N⁴

NeuroAI

It asks how artificial systems can be designed to assist, extend, or formalize scientific reasoning.

Three synergistic components

Hypothesis generation

Supported by the MIT Generative AI Impact Consortium, I am developing a hypothesis-generation engine for scientific discovery. The aim is to move beyond literature summarization toward structured hypothesis formation: extracting claims, mechanisms, assumptions, evidence, contradictions, and open questions from scientific corpora, and using them to generate testable hypotheses.

The emphasis is on provenance and epistemic structure. A useful scientific AI system should not simply output a hypothesis. It should expose where the hypothesis came from, what assumptions it depends on, what evidence supports or weakens it, and what observations would discriminate it from alternatives.
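To make "epistemic structure" concrete, here is a minimal sketch of a record that carries a hypothesis together with its provenance, assumptions, evidence, and discriminating observations. It is an illustration only, not the engine's actual schema: the class names, field names, and placeholder strings are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Evidence:
    source: str      # hypothetical identifier for a paper or dataset
    statement: str   # what the source reports
    supports: bool   # True if it supports the claim, False if it weakens it


@dataclass
class Hypothesis:
    claim: str
    derived_from: List[str] = field(default_factory=list)   # provenance
    assumptions: List[str] = field(default_factory=list)    # auxiliary assumptions
    evidence: List[Evidence] = field(default_factory=list)
    discriminating_observations: List[str] = field(default_factory=list)

    def supporting(self) -> List[Evidence]:
        return [e for e in self.evidence if e.supports]

    def weakening(self) -> List[Evidence]:
        return [e for e in self.evidence if not e.supports]

    def is_inspectable(self) -> bool:
        # A hypothesis counts as inspectable here only if it exposes its
        # provenance, its assumptions, and at least one discriminating test.
        return bool(
            self.derived_from
            and self.assumptions
            and self.discriminating_observations
        )


# Placeholder content, purely illustrative.
h = Hypothesis(
    claim="Mechanism M explains observation O",
    derived_from=["source-A", "source-B"],
    assumptions=["measurement noise is independent of the stimulus"],
    evidence=[
        Evidence("source-A", "O is observed under condition C1", True),
        Evidence("source-C", "O is absent under condition C2", False),
    ],
    discriminating_observations=["O should vanish if M is blocked"],
)

bare = Hypothesis(claim="Mechanism M explains observation O")
```

In this sketch, `h.is_inspectable()` holds while `bare.is_inspectable()` does not: the same claim with no provenance, assumptions, or discriminating test attached would be flagged as a bare output rather than a structured hypothesis.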

Interpretable symbolic regression for physical systems

A second component focuses on neural networks, approximation theorems, and symbolic regression. Here the goal is to use neural models not as opaque predictors, but as tools for discovering interpretable structure in observations of physical systems.

Can neural networks help recover the governing forms, latent variables, or effective equations underlying observed dynamics?

This connects machine learning to system identification, dynamical systems, and the physics tradition of seeking compact explanatory descriptions. The aim is not prediction alone, but interpretable discovery.
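As a toy illustration of recovering an effective equation from observations, the sketch below uses sparse regression over a fixed library of candidate terms (sequentially thresholded least squares), which is a simpler stand-in for the neural approach described here and not the method used in this program. The data are synthetic and noise-free, generated from an assumed system dx/dt = x - x^3. The function name `stlsq`, the library, and the threshold are all illustrative choices.

```python
import numpy as np


def stlsq(theta, dxdt, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares over a candidate-term library."""
    coef = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(coef) < threshold
        coef[small] = 0.0            # prune terms with negligible weight
        active = ~small
        if not active.any():
            break
        # Refit using only the terms that survived pruning.
        coef[active] = np.linalg.lstsq(theta[:, active], dxdt, rcond=None)[0]
    return coef


# Synthetic observations of an assumed 1-D system: dx/dt = x - x^3.
x = np.linspace(-2.0, 2.0, 200)
dxdt = x - x**3

# Candidate library: constant, linear, quadratic, cubic.
terms = ["1", "x", "x^2", "x^3"]
theta = np.column_stack([np.ones_like(x), x, x**2, x**3])

coef = stlsq(theta, dxdt)
equation = " + ".join(
    f"({c:.2f})*{t}" for c, t in zip(coef, terms) if c != 0.0
)
print("dx/dt =", equation)
```

The result is a short symbolic expression rather than a fitted black box, which is the sense of "interpretable discovery" intended above. With noisy or partially observed data, the choice of library, the threshold, and the derivative estimate all become substantive modeling assumptions.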

Co-scientist engines for data-driven discovery

The third component integrates hypothesis generation and interpretable model discovery into a broader co-scientist engine for data-driven discovery of physical systems.

Such a system should connect literature, data, models, simulations, symbolic structure, and experimental design. It should be able to move between natural-language hypotheses and mathematical representations; between observed data and candidate mechanisms; between exploratory search and confirmatory evaluation.
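One small piece of the experimental-design link can be sketched directly: given two candidate mechanisms, choose the experimental condition at which their predictions disagree most. This is a deliberately simplified illustration that ignores measurement noise and experimental cost. The two models, the candidate grid, and the function name `most_discriminating_input` are hypothetical.

```python
import numpy as np


def most_discriminating_input(model_a, model_b, candidates):
    """Return the candidate input where the two models' predictions differ most."""
    gap = np.abs(model_a(candidates) - model_b(candidates))
    best = int(np.argmax(gap))
    return float(candidates[best]), float(gap[best])


# Two hypothetical candidate mechanisms for the same response curve.
def linear(x):
    return 0.5 * x


def saturating(x):
    return x / (1.0 + x)


# Experimental conditions we could in principle test.
candidates = np.linspace(0.0, 10.0, 101)

x_star, gap = most_discriminating_input(linear, saturating, candidates)
print(f"Test at x = {x_star:.1f}; predicted gap = {gap:.2f}")
```

Here the two models agree at small inputs and diverge at large ones, so the sketch selects the largest candidate input. A fuller treatment would weigh the predicted gap against noise and against each model's uncertainty.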

The long-term aim is an AI system that does not replace scientific judgment, but augments the scientific process by making the structure of reasoning more explicit, inspectable, and generative.

In this sense, AI4Science becomes part of a broader project: understanding understanding itself.