Companion post to:
Compression, Regularity, Randomness and Emergent Structure: Rethinking Physical Complexity in the Data-Driven Era
Nima Dehghani, arXiv:2505.07222 (2025). DOI: https://doi.org/10.48550/arXiv.2505.07222

Rethinking Physical Complexity in the Data-Driven Era

Compression, Regularity, Randomness, and the Role of Machine Learning

Across physics, complex systems, computational neuroscience, and machine learning, we often face the same basic problem: we observe a system through data, and we want to know whether the data contain structure.

But “structure” is not a simple word.

A system may be predictable but trivial, like a pure sine wave. It may be unpredictable but structurally uninteresting, like idealized white noise. Or it may sit somewhere in between: partly regular, partly noisy, partly compressible, partly surprising. Many of the systems that matter scientifically—turbulent flows, biological networks, neural population activity, non-equilibrium physical systems, evolving ecosystems, and trained neural networks—live in precisely this intermediate regime.

This is the regime we usually call complex.

In our paper, Compression, Regularity, Randomness and Emergent Structure: Rethinking Physical Complexity in the Data-Driven Era, we revisit the foundations of complexity measurement from a contemporary data-driven perspective. The goal is not simply to review existing measures. There are already many excellent reviews of entropy, algorithmic complexity, information flow, and dynamical systems metrics. Instead, the goal is to ask a more structural question:

What are these measures actually measuring?

More specifically:

Are they measuring randomness?
Are they measuring regularity?
Are they measuring complexity?
Are they computationally accessible?
And how do modern machine-learning methods approximate the classical ideals that complexity theory often defines but cannot compute?

The central claim of the paper is that many classical and modern approaches can be organized around three interrelated but distinct axes:

regularity,
randomness, and
complexity.

This may sound simple, but it changes how we interpret many familiar measures.

Complexity is not just “between order and disorder”

A common intuition in complexity science is that complexity lies somewhere between perfect order and complete randomness. This intuition is useful, but it can also be misleading if interpreted too literally.

Perfect order is not complex. A crystal lattice, a repeated string, or a sinusoidal oscillation can be highly regular and easily described. Complete randomness is also not complex in the structural sense. A sequence of independent fair coin flips has maximal unpredictability, but it does not contain rich organization. It is incompressible, but not meaningful.

Complexity is different. It involves structure that is not trivial, regularity that is not simply periodic, and randomness that does not fully erase organization.

A useful way to think about this is that regularity and randomness are not strict opposites. They can coexist. A system may be locally unpredictable but globally structured. A chaotic attractor may generate trajectories that are difficult to predict point-by-point, while still occupying a highly organized region of state space. Neural population activity may appear noisy at the level of single units or electrodes, while still expressing low-dimensional modes, metastable states, or structured collective dynamics.

This is why a single entropy value rarely tells the full story. High entropy may indicate genuine randomness, but it may also reflect hidden structure that the chosen representation fails to capture. Low entropy may indicate regularity, but not necessarily complexity. The key is to distinguish between unpredictability, compressible structure, and nontrivial organization.

The three axes: regularity, randomness, complexity

The paper organizes the landscape around three axes.

Randomness

Randomness refers to unpredictability, irreducibility, or the absence of identifiable pattern. In statistical information theory, randomness is often associated with entropy. A probability distribution with many equally likely outcomes has high Shannon entropy. A time series whose future remains uncertain even given its past has a high entropy rate.
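As a rough illustration of the difference between symbol entropy and entropy rate, the sketch below uses plug-in (frequency-count) estimates on two toy sequences. The function names, the block length of 3, and the sequences themselves are illustrative assumptions, not taken from the paper, and plug-in estimators are biased for short data.

```python
import math
import random
from collections import Counter

def block_entropy(seq, length):
    """Plug-in Shannon entropy (bits) of overlapping blocks of the given length."""
    blocks = Counter(tuple(seq[i:i + length]) for i in range(len(seq) - length + 1))
    total = sum(blocks.values())
    return -sum((c / total) * math.log2(c / total) for c in blocks.values())

def entropy_rate_estimate(seq, length=3):
    """Finite-length estimate h ~ H(length) - H(length - 1), in bits per symbol."""
    return block_entropy(seq, length) - block_entropy(seq, length - 1)

rng = random.Random(0)
periodic = list("ABCD" * 250)                      # perfectly regular toy sequence
coin = [rng.getrandbits(1) for _ in range(10000)]  # pseudo-random fair coin

h_symbol_periodic = block_entropy(periodic, 1)     # close to 2 bits: four equally frequent symbols
h_rate_periodic = entropy_rate_estimate(periodic)  # close to 0: the past fixes the future
h_symbol_coin = block_entropy(coin, 1)             # close to 1 bit
h_rate_coin = entropy_rate_estimate(coin)          # also close to 1 bit per symbol

print(h_symbol_periodic, h_rate_periodic, h_symbol_coin, h_rate_coin)
```

The periodic sequence has the larger symbol entropy of the two, yet its entropy rate is near zero. The single-symbol statistic alone would rank it as the more "random" signal.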

At the algorithmic level, randomness has a deeper meaning. A string is algorithmically random if no program substantially shorter than the string itself can generate it. Its shortest description is essentially the string itself. This is the intuition behind Kolmogorov complexity.

But these two notions are not identical. A sequence can appear statistically random under one description while still being generated by a compact deterministic rule. This distinction matters enormously in empirical science, because our measurements are always filtered through a choice of representation, scale, and coarse-graining.
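A small sketch of this gap, under stated assumptions: the generator below is a standard 64-bit linear congruential generator (the multiplier and increment are Knuth's MMIX constants), and `zlib` stands in for "one particular description". The seed and length are arbitrary choices for illustration.

```python
import zlib

def lcg_bytes(seed, n):
    """Top byte of a 64-bit linear congruential generator (Knuth's MMIX constants)."""
    state = seed
    out = bytearray()
    for _ in range(n):
        state = (6364136223846793005 * state + 1442695040888963407) % 2**64
        out.append(state >> 56)  # keep only the 8 highest bits
    return bytes(out)

data = lcg_bytes(seed=12345, n=10000)

# A general-purpose compressor finds essentially nothing to exploit here.
ratio = len(zlib.compress(data, 9)) / len(data)
print(f"zlib compression ratio: {ratio:.3f}")
```

The byte stream looks incompressible to `zlib`, yet it is fully specified by the few lines of `lcg_bytes` plus a seed. Its apparent randomness is a property of the representation and the compressor, not of the generating rule.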

Regularity

Regularity refers to predictable, recurring, or compressible structure. A system has regularity when there are patterns that can be used to summarize or predict it. These patterns may be periodicities, symmetries, conservation laws, low-dimensional manifolds, attractor geometry, causal states, motifs, or governing equations.

Regularity is what makes modeling possible. If a dataset has no regularity, there is nothing to learn except its raw realization. If it has regularity, then a model can compress it.

In this sense, regularity is tightly connected to scientific explanation. A good physical theory is a compression of many observations into a compact generative structure. Newton’s laws, Maxwell’s equations, the Navier-Stokes equations, statistical mechanics, and modern effective theories all operate by compressing observed phenomena into mathematical structures with predictive power.

Complexity

Complexity arises when regularity and randomness interact nontrivially. A complex system is not merely random, but it is also not simply ordered. Its structure may be distributed across scales. It may require memory to predict. It may exhibit emergent macrostates, collective modes, nonlinear interactions, or synergistic dependencies among components.

This is why complexity should not be equated with entropy. Entropy measures uncertainty. Complexity concerns the organization of structure under uncertainty.

A perfectly regular system can have low entropy and low complexity. A purely random system can have high entropy and low structural complexity. A complex system often lives in the middle: it contains enough regularity to be modeled, but enough variability, hierarchy, or nonlinear interaction to make the model nontrivial.

Compression as the organizing principle

The paper uses compression as a unifying lens.

Compression gives an operational meaning to all three axes:

Regularity enables compression.
If data contain recurring patterns, symmetries, or low-dimensional structure, they can be described more compactly than by listing every observation.
Randomness resists compression.
A truly random sequence has no description substantially shorter than itself.
Complexity corresponds to nontrivial compression.
A complex system is compressible, but not trivially so. Its compressed description may require a sophisticated model, a long computation, or a hierarchy of representations.
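The first two points can be sketched with an off-the-shelf compressor. This is a minimal illustration assuming `zlib` as the compressor and three synthetic byte sequences invented for the example. The compressed size is only an upper bound on description length, and an intermediate ratio signals a mix of regularity and noise rather than complexity in the sense used here.

```python
import random
import zlib

def compression_ratio(data: bytes) -> float:
    """Compressed size divided by raw size, using zlib at maximum effort."""
    return len(zlib.compress(data, 9)) / len(data)

rng = random.Random(0)
n = 20000

periodic = bytes([0, 1, 2, 3] * (n // 4))            # pure regularity
noise = bytes(rng.getrandbits(8) for _ in range(n))  # pseudo-random bytes
# Periodic pattern in which roughly 5% of the bytes are overwritten at random.
mixed = bytes(rng.getrandbits(8) if rng.random() < 0.05 else b for b in periodic)

ratio_periodic = compression_ratio(periodic)
ratio_mixed = compression_ratio(mixed)
ratio_noise = compression_ratio(noise)
print(ratio_periodic, ratio_mixed, ratio_noise)
```

The ratios order as periodic, then mixed, then noise. What a ratio cannot show is whether the compressible part is trivial or requires a sophisticated model, which is the distinction the third point draws.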

This compression-based view links classical information theory, algorithmic complexity, statistical mechanics, and modern machine learning.

In algorithmic information theory, the Kolmogorov complexity of an object is the length of the shortest program that generates it. In minimum description length approaches, the best model is the one that balances model complexity against the cost of encoding the data given the model. In physics, a successful theory compresses observations into laws, symmetries, and state variables. In machine learning, latent-variable models compress high-dimensional data into lower-dimensional representations that preserve what is useful for reconstruction, prediction, or generation.
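The minimum description length trade-off can be made concrete with a toy model-selection problem. The sketch below assumes binary Markov models of increasing order and a BIC-style two-part code (half of log2 n bits per free parameter, plus the data cost under the fitted model). This is a common approximation, not the paper's procedure, and all names are hypothetical.

```python
import math
import random
from collections import Counter

def two_part_code_length(seq, order):
    """Approximate code length in bits: model cost + data cost for a binary Markov model."""
    contexts, joint = Counter(), Counter()
    for i in range(order, len(seq)):
        ctx = seq[i - order:i]
        contexts[ctx] += 1
        joint[(ctx, seq[i])] += 1
    n = len(seq) - order
    # Data cost: negative log-likelihood under the fitted conditional frequencies.
    data_bits = -sum(c * math.log2(c / contexts[ctx]) for (ctx, _), c in joint.items())
    # Model cost: one free parameter per context, each costing about 0.5 * log2(n) bits.
    model_bits = 0.5 * (2 ** order) * math.log2(n)
    return model_bits + data_bits

def best_order(seq, max_order=3):
    """Markov order with the shortest two-part code."""
    return min(range(max_order + 1), key=lambda k: two_part_code_length(seq, k))

def sticky_chain(n, p_stay, seed):
    """Binary chain that repeats its previous symbol with probability p_stay."""
    rng = random.Random(seed)
    bits = ["0"]
    for _ in range(n - 1):
        flip = "1" if bits[-1] == "0" else "0"
        bits.append(bits[-1] if rng.random() < p_stay else flip)
    return "".join(bits)

alternating = "01" * 500
sticky = sticky_chain(5000, p_stay=0.9, seed=0)

print(best_order(alternating), best_order(sticky))
```

For the alternating sequence an order-0 model pays about one bit per symbol, while an order-1 model pays almost nothing for the data and only a small model cost. Higher orders add parameters without shortening the data code, so the two-part length penalizes them.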

The language changes across fields, but the underlying question is similar:

What is the shortest, most structured, most useful description of the data?

A taxonomy of measures

The paper groups complexity-related measures into three broad families: statistical, algorithmic, and dynamical.

Each family captures a different aspect of the regularity-randomness-complexity landscape.

1. Statistical measures: uncertainty and apparent disorder

Statistical entropy measures quantify uncertainty in observed outcomes. Shannon entropy is the canonical example. Rényi and Tsallis entropies generalize this idea by changing sensitivity to rare or common events, heavy tails, or non-extensive behavior. Approximate entropy, sample entropy, and permutation entropy adapt entropy-like reasoning to time series, especially in physiological and dynamical settings.
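As one example from this family, here is a minimal sketch of normalized permutation entropy (the Bandt-Pompe ordinal-pattern construction) with unit delay. The embedding order of 3 and the three test signals are assumptions chosen for illustration.

```python
import math
import random
from collections import Counter

def permutation_entropy(x, order=3):
    """Normalized permutation entropy in [0, 1] from ordinal patterns of consecutive samples."""
    patterns = Counter(
        tuple(sorted(range(order), key=lambda k: x[i + k]))  # argsort of each window
        for i in range(len(x) - order + 1)
    )
    total = sum(patterns.values())
    h = -sum((c / total) * math.log2(c / total) for c in patterns.values())
    return h / math.log2(math.factorial(order))

rng = random.Random(0)
ramp = list(range(200))                                           # one ordinal pattern only
sine = [math.sin(2 * math.pi * i / 100) for i in range(2000)]     # mostly rising or falling
noise = [rng.random() for _ in range(5000)]                       # all patterns about equally likely

pe_ramp = permutation_entropy(ramp)
pe_sine = permutation_entropy(sine)
pe_noise = permutation_entropy(noise)
print(pe_ramp, pe_sine, pe_noise)
```

The measure orders the three signals as expected, from ramp to sine to noise. It is cheap to compute, but it says how evenly ordinal patterns are used, not whether the signal has nontrivial organization.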

These measures are powerful because they are usually computable. They can be applied to empirical data with relatively modest assumptions. This is why entropy-based metrics are widely used in neuroscience, physiology, finance, climate science, and complex systems analysis.

But statistical entropy measures have a limitation: they primarily quantify unpredictability. They do not necessarily detect deep structure.

A high-entropy signal may be genuinely random, or it may contain structure that is invisible to the chosen statistic. A low-entropy signal may be predictable, but not complex. Entropy is therefore essential, but incomplete.

This matters in computational neuroscience. Neural signals often look noisy, variable, and high-dimensional. Entropy-based measures can quantify aspects of this variability, but they may miss structured population dynamics, latent trajectories, coordination across areas, or task-dependent low-dimensional manifolds. The same issue appears in statistical physics and complex systems: macroscopic order may be hidden beneath microscopic fluctuations.

Statistical measures are therefore best understood as measures of uncertainty, not as complete measures of complexity.

2. Algorithmic measures: structure, description length, and depth

Algorithmic measures move from probability distributions to descriptions.

The central object here is Kolmogorov complexity: the length of the shortest program that generates a string or dataset. If a sequence can be generated by a short program, it has low Kolmogorov complexity. If no description substantially shorter than the sequence exists, it is algorithmically random.

This gives a profound definition of randomness and regularity. But it also introduces a major problem: Kolmogorov complexity is uncomputable in general. In practice it can only be bounded from above, for example by the output length of a real compressor.

Other algorithmic measures refine this picture.

Effective complexity, introduced by Gell-Mann and Lloyd, aims to isolate the complexity of the regularities in a system rather than counting randomness itself. A random string may have high Kolmogorov complexity, but low effective complexity, because it lacks meaningful structure. Effective complexity asks: how complex is the structured part?

Logical depth, introduced by Bennett, measures the computational effort required to unfold an object from its compressed description. This captures the intuition that complex objects are not merely compressible; they may require a long generative history. A random string is shallow because it is just printed directly. A trivial string is shallow because it is generated easily. A deep object is one whose compact description requires substantial computation to produce the observed structure.

Statistical complexity, from computational mechanics, measures the amount of information stored in the causal states of a process—the minimal predictive memory needed to optimally forecast its future. This is especially important for dynamical systems, because it links complexity to prediction.
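When the causal-state model of a process is already known, statistical complexity reduces to the Shannon entropy of the stationary distribution over causal states. The sketch below assumes the textbook "golden mean" process (binary sequences with no two consecutive 0s), whose two-state machine is written out by hand. Inferring such a machine from data is the hard part and is not attempted here.

```python
import math

# machine[state] = list of (probability, emitted symbol, next state).
# Golden mean process: after emitting a 0 (state B), the next symbol must be 1.
golden_mean = {
    "A": [(0.5, "1", "A"), (0.5, "0", "B")],
    "B": [(1.0, "1", "A")],
}

def stationary_distribution(machine, n_iter=200):
    """Stationary state distribution by repeated application of the transition rule."""
    pi = {s: 1.0 / len(machine) for s in machine}
    for _ in range(n_iter):
        new = {s: 0.0 for s in machine}
        for s, edges in machine.items():
            for p, _, nxt in edges:
                new[nxt] += pi[s] * p
        pi = new
    return pi

def entropy_bits(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

pi = stationary_distribution(golden_mean)

# Statistical complexity: information stored in the causal states.
statistical_complexity = entropy_bits(pi.values())
# Entropy rate: average uncertainty of the next symbol given the causal state.
entropy_rate = sum(
    pi[s] * entropy_bits(p for p, _, _ in edges) for s, edges in golden_mean.items()
)
print(pi, statistical_complexity, entropy_rate)
```

For this process the stored information is about 0.918 bits while the entropy rate is 2/3 bit per symbol. The two numbers answer different questions: how much memory prediction needs, and how much uncertainty remains after using it.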

These measures are conceptually powerful because they distinguish structure from noise. But they are also difficult or impossible to compute exactly. This is the central tension of classical complexity theory:

The measures that most deeply capture structure are often the least accessible in practice.

3. Dynamical information measures: storage, transfer, and modification

Many empirical systems are not static objects. They evolve. Their complexity lies not only in what is present at one time, but in how information is stored, transmitted, transformed, and generated over time.

Dynamical information measures address this.

Entropy rate measures the uncertainty of a process after accounting for its past. A low entropy rate indicates strong temporal regularity; a high entropy rate indicates persistent unpredictability.

Transfer entropy measures directed information flow from one process to another. It asks whether knowing the past of process X improves prediction of the future of process Y, beyond what Y's own past already provides. This has made transfer entropy influential in neuroscience, where researchers often want to infer directed functional interactions among brain regions or neural populations.
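A minimal plug-in estimator makes the definition concrete. The sketch assumes discrete symbols, history length 1 for both processes, and a toy pair in which Y copies X with a one-step delay. Real neural data would need longer histories, bias correction, and surrogate tests.

```python
import math
import random
from collections import Counter

def transfer_entropy(source, target):
    """Plug-in transfer entropy source -> target in bits, with history length 1."""
    triples = Counter()  # counts of (target_next, target_now, source_now)
    for t in range(len(target) - 1):
        triples[(target[t + 1], target[t], source[t])] += 1
    n = sum(triples.values())

    pair_now, pair_self, single = Counter(), Counter(), Counter()
    for (y1, y0, x0), c in triples.items():
        pair_now[(y0, x0)] += c    # (target_now, source_now)
        pair_self[(y1, y0)] += c   # (target_next, target_now)
        single[y0] += c            # target_now

    te = 0.0
    for (y1, y0, x0), c in triples.items():
        p_full = c / pair_now[(y0, x0)]          # p(y1 | y0, x0)
        p_self = pair_self[(y1, y0)] / single[y0]  # p(y1 | y0)
        te += (c / n) * math.log2(p_full / p_self)
    return te

rng = random.Random(1)
x = [rng.getrandbits(1) for _ in range(5000)]
y = [0] + x[:-1]  # y copies x with a one-step delay

te_x_to_y = transfer_entropy(x, y)
te_y_to_x = transfer_entropy(y, x)
print(te_x_to_y, te_y_to_x)
```

The estimate is close to 1 bit from X to Y and close to 0 in the reverse direction, which matches how the pair was constructed. The small positive value in the reverse direction is finite-sample bias of the plug-in estimator.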

Active information storage measures how much of a system’s future is predictable from its own past. This is a measure of memory.

Information modification and related ideas from partial information decomposition attempt to quantify something even subtler: the emergence of new information through synergistic interactions among multiple sources. This is especially relevant for biological and neural systems, where collective computation may not be reducible to pairwise interactions.

These dynamical measures are important because complexity is often temporal. In cortical dynamics, for example, the relevant structure may not be visible in a static snapshot. It may appear in transitions between states, traveling waves, metastable assemblies, feedback loops, cross-frequency interactions, or context-dependent information flow.

Dynamical measures therefore occupy a crucial region of the landscape: they connect randomness, regularity, and complexity to time.

The computability bottleneck

Once these measures are placed side by side, a pattern emerges.

The easiest measures to compute are often the shallowest conceptually. Shannon entropy, permutation entropy, and related statistics can be estimated directly from data, although with the usual finite-sample and estimator issues.

The deepest measures—Kolmogorov complexity, logical depth, sophistication, and effective complexity—are either formally uncomputable or practically inaccessible without strong assumptions.

This gives rise to a depth-accessibility tradeoff:

Shallow measures are easy to compute but may miss structure.
Deep measures capture structure more faithfully but are often uncomputable.
Intermediate measures require modeling assumptions, estimators, or approximations.

This is not just a technical inconvenience. It is a foundational issue. Complexity science has often defined its most powerful concepts in idealized mathematical terms, while empirical science requires finite data, noisy measurements, limited samples, and imperfect models.

The practical question becomes:

If the ideal measure is uncomputable, what kind of approximation is scientifically useful?

This is where modern machine learning enters the story.

Machine learning as an approximation to classical complexity ideals

Modern machine learning does not solve the uncomputability of Kolmogorov complexity. A neural network does not magically compute the shortest possible program for a dataset.

But machine learning does something operationally important. It approximates many of the goals that classical complexity measures formalize.

Deep learning models search for compressed representations. They identify latent variables. They discard irrelevant variability. They learn predictive structure. They can impose physical constraints. They can discover approximate governing equations. They can generate new samples from learned distributions.

In other words, many modern data-driven methods act as pragmatic surrogates for uncomputable or inaccessible complexity measures.

They do not compute the classical quantities exactly. But they approximate their operational roles.

Autoencoders and VAEs: compression in latent space

Autoencoders provide the simplest example.

An autoencoder maps high-dimensional data x into a lower-dimensional latent representation z, and then reconstructs x from z. The bottleneck forces the model to preserve information that is useful for reconstruction while discarding irrelevant variation.
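The bottleneck idea can be sketched without any training loop. The example below swaps in a linear autoencoder with tied weights, whose optimal solution is given in closed form by the top singular vectors (equivalently, PCA). It is not a nonlinear deep autoencoder, and the synthetic data (a 2-dimensional latent signal embedded in 10 dimensions plus small noise) is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
z_true = rng.normal(size=(500, 2))   # hidden 2-D latent variables
mixing = rng.normal(size=(2, 10))    # linear embedding into 10-D
x = z_true @ mixing + 0.01 * rng.normal(size=(500, 10))

def linear_autoencoder(data, k):
    """Closed-form linear encoder/decoder with a k-dimensional bottleneck."""
    mean = data.mean(axis=0)
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    w = vt[:k]  # (k, d): top-k principal directions

    def encode(batch):
        return (batch - mean) @ w.T

    def decode(code):
        return code @ w + mean

    return encode, decode

def reconstruction_mse(data, k):
    """Mean squared error after passing the data through the bottleneck."""
    encode, decode = linear_autoencoder(data, k)
    return float(np.mean((decode(encode(data)) - data) ** 2))

mse_k1 = reconstruction_mse(x, 1)
mse_k2 = reconstruction_mse(x, 2)
mse_k3 = reconstruction_mse(x, 3)
print(mse_k1, mse_k2, mse_k3)
```

The error drops sharply when the bottleneck reaches the true latent dimension and barely changes beyond it. The remaining error is the part of the data the code treats as noise.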

This is a practical form of compression.

Variational autoencoders extend this idea by introducing a probabilistic latent space and a regularized objective. The VAE balances reconstruction accuracy against the complexity or entropy of the latent representation. This is not identical to minimum description length or Kolmogorov complexity, but it is strongly related in spirit: the model seeks a compact latent code that captures the generative structure of the data.
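For reference, the trade-off can be written in the standard form of the VAE objective. The notation below (encoder q_phi, decoder p_theta, prior p(z), weight beta) is the conventional one and is an assumption here, since the post itself does not fix symbols. Setting beta = 1 gives the usual evidence lower bound.

```latex
\mathcal{L}_{\beta}(\theta,\phi;x)
  = \underbrace{\mathbb{E}_{q_\phi(z\mid x)}\big[\log p_\theta(x\mid z)\big]}_{\text{reconstruction accuracy}}
  \;-\; \beta\,
  \underbrace{D_{\mathrm{KL}}\big(q_\phi(z\mid x)\,\|\,p(z)\big)}_{\text{cost of the latent code}},
\qquad
-\mathcal{L}_{1}(\theta,\phi;x) \;\ge\; -\log p_\theta(x).
```

The inequality is what gives the objective its description-length reading: with beta = 1, the negative bound is an upper limit on the number of nats needed to encode x under the model.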

For complex physical or biological systems, this matters because the latent space may reveal macroscopic variables that are not obvious in the raw measurements.

In computational neuroscience, for example, latent-variable models have become central for interpreting population activity. High-dimensional neural recordings often contain lower-dimensional trajectories related to movement, perception, attention, internal state, or task structure. The latent model compresses neural activity into coordinates that may better reflect the underlying dynamical organization.

From the perspective of complexity theory, this is not merely dimensionality reduction. It is an attempt to separate regularity from noise.

Latent ODEs, Koopman models, and dynamical regularity

For time-evolving systems, static compression is not enough. We want latent variables that evolve according to structured dynamics.

Latent ODEs model trajectories in a continuous-time latent space. Rather than treating observations as independent samples, they learn a dynamical system whose latent state evolves over time. This is especially relevant for irregularly sampled biological, physical, or clinical data.

Koopman autoencoders pursue a related but distinct idea: they seek latent coordinates in which nonlinear dynamics become approximately linear. This connects deep learning to classical dynamical systems theory. The hope is that a complex nonlinear system may become simpler when viewed in the right coordinates.
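A toy case shows what "the right coordinates" can mean. The sketch assumes a hand-picked dictionary of observables and a least-squares fit (an extended-DMD-style procedure), whereas a Koopman autoencoder would learn the observables. The two-variable map and its parameters are a standard illustrative example, not taken from the paper.

```python
import numpy as np

a, b, c = 0.9, 0.5, 1.0

def step(state):
    """Nonlinear map: x1 decays linearly, x2 is driven by x1 squared."""
    x1, x2 = state
    return np.array([a * x1, b * x2 + c * x1 ** 2])

def lift(states):
    """Hand-picked observables (x1, x2, x1^2) in which the dynamics close linearly."""
    return np.column_stack([states[:, 0], states[:, 1], states[:, 0] ** 2])

rng = np.random.default_rng(0)
current, following = [], []
for _ in range(20):                      # 20 short trajectories
    state = rng.uniform(-1.0, 1.0, size=2)
    for _ in range(30):
        nxt = step(state)
        current.append(state)
        following.append(nxt)
        state = nxt
current, following = np.array(current), np.array(following)

def fit_linear(inputs, outputs):
    """Least-squares matrix K with outputs ~ inputs @ K.T, and its one-step MSE."""
    k_t, *_ = np.linalg.lstsq(inputs, outputs, rcond=None)
    mse = float(np.mean((inputs @ k_t - outputs) ** 2))
    return k_t.T, mse

k_raw, mse_raw = fit_linear(current, following)                    # linear model in (x1, x2)
k_lifted, mse_lifted = fit_linear(lift(current), lift(following))  # linear model in lifted space
eigenvalues = np.sort(np.linalg.eigvals(k_lifted).real)
print(mse_raw, mse_lifted, eigenvalues)
```

In the raw coordinates no linear model predicts the next state exactly. In the lifted coordinates a 3-by-3 matrix does, and its eigenvalues are a, b, and a squared.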

Both approaches approximate the same broad objective: find a compressed representation in which temporal evolution becomes more predictable.

In the language of the paper, they are operational approximations to dynamical regularity. They aim to lower the entropy rate of the learned representation while preserving predictive information, seeking a latent space in which complexity becomes organized enough to model.

Symbolic regression and the search for compact laws

Another important class of methods is symbolic regression.

Methods such as SINDy, AI Feynman, and related approaches search for explicit equations that describe observed dynamics. Rather than learning a black-box mapping, they attempt to recover compact symbolic rules.

This has a direct connection to compression. A symbolic equation is a short description. If a system’s behavior can be generated by a compact equation, then the equation is a compressed representation of the data.
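The core of a SINDy-style fit is sparse regression over a library of candidate terms. The sketch below implements sequentially thresholded least squares by hand rather than calling any library. It assumes a one-dimensional system (logistic growth, dx/dt = x - x^2), derivatives that are measured directly with small added noise, a four-term polynomial library, and a threshold of 0.1. All of these are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.05, 1.5, 200)
# Assumed "measured" derivatives: the true right-hand side plus small noise.
dxdt = x - x ** 2 + 1e-3 * rng.normal(size=x.size)

names = ["1", "x", "x^2", "x^3"]
library = np.column_stack([np.ones_like(x), x, x ** 2, x ** 3])

def stlsq(theta, target, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares: fit, zero small terms, refit the rest."""
    coef = np.linalg.lstsq(theta, target, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(coef) < threshold
        coef[small] = 0.0
        keep = ~small
        if not keep.any():
            break
        coef[keep] = np.linalg.lstsq(theta[:, keep], target, rcond=None)[0]
    return coef

coef = stlsq(library, dxdt)
equation = " + ".join(f"{c:.3f} {n}" for c, n in zip(coef, names) if c != 0.0)
print("dx/dt =", equation)
```

Two hundred noisy samples are summarized by two nonzero coefficients. The result depends on the library containing the right terms and on the threshold, which is where prior knowledge enters.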

Symbolic regression therefore approximates a central scientific ideal: the discovery of governing laws.

It also resonates with logical depth. A short equation may unfold into rich, complex behavior over time. The complexity is not in the length of the equation alone, but in the computational unfolding of that equation into trajectories, patterns, bifurcations, or emergent structures.

This is one reason symbolic regression is so attractive for physics and complex systems. It does not merely predict; it offers a candidate compressed explanation.

Physics-informed neural networks: constraining the search for structure

Unconstrained neural networks are powerful function approximators, but they can also learn the wrong structure. They may fit noise, exploit shortcuts, or produce representations that are predictive but physically meaningless.

Physics-informed neural networks address this by embedding physical constraints directly into the learning objective. Differential equations, conservation laws, boundary conditions, symmetries, or known operators can be included in the loss function.
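Schematically, such an objective adds a physics residual to the usual data misfit. The form below is the generic one, with assumed notation: u_theta is the network, N is the differential operator of the assumed governing equation written so that N[u] = 0, the first sum runs over N_d measurements, the second over N_r collocation points, and lambda weights the two terms. Boundary and initial conditions would enter as further penalty terms of the same kind.

```latex
\mathcal{L}(\theta)
  = \frac{1}{N_d}\sum_{i=1}^{N_d} \big| u_\theta(x_i, t_i) - u_i \big|^2
  \;+\; \lambda\, \frac{1}{N_r}\sum_{j=1}^{N_r} \big| \mathcal{N}[u_\theta](x_j, t_j) \big|^2
```

The second term is zero for any function that satisfies the assumed equation, so it restricts the search to physically admissible functions only to the extent that the equation itself is right.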

This changes the nature of learning. The model is no longer searching over arbitrary functions. It is searching within a space constrained by physical regularity.

In the language of effective complexity, this is crucial. Effective complexity is about the structured regularities of a system, not the accidental details or noise. PINNs and related physics-informed models attempt to bias learning toward those regularities.

This is also a broader lesson for AI in science. Predictive accuracy is not enough. Scientific models must learn the right compression: a representation that captures the relevant structure of the system, not merely a curve-fitting shortcut.

Latent spaces as operational arenas for complexity

One of the main arguments of the paper is that latent spaces should not be viewed merely as engineering devices.

A latent space is an arena in which a learning system negotiates among regularity, randomness, and complexity.

In a good latent representation:

regularities are compressed,
irrelevant variability is discarded or marginalized,
uncertainty is represented,
dynamics become more predictable,
structure becomes easier to manipulate, interpret, or generate.

This makes latent spaces operational approximations to classical complexity-theoretic ideals.

Classical theory asks: what is the shortest description, the meaningful structure, the predictive state, the computational unfolding, the non-random regularity?

Modern machine learning asks similar questions in a different language: what latent representation minimizes reconstruction loss, maximizes predictive likelihood, respects physical constraints, separates generative factors, or supports downstream generalization?

The connection is not exact, but it is deep.

Latent spaces are where compression, regularity extraction, and complexity management become computationally actionable.

Why this matters for computational neuroscience

The computational neuroscience audience has a special stake in this discussion.

Neural data are high-dimensional, noisy, nonstationary, multiscale, and dynamical. They are also structured. The brain is not a white-noise generator. Nor is it a simple clock. It is a system in which microscopic variability and macroscopic organization coexist.

This makes neuroscience a natural domain for the regularity-randomness-complexity framework.

Entropy-based measures can quantify variability in spike trains, LFPs, EEG, ECoG, calcium imaging, or behavioral signals. Dynamical measures can probe information flow, memory, and directed interactions. Algorithmic and compression-based perspectives can help distinguish meaningful structure from noise. Latent dynamical models can reveal low-dimensional trajectories and state-space organization.

But the framework also warns against overinterpretation.

A neural signal with high entropy is not necessarily “more complex.” A low-dimensional latent trajectory is not automatically the true underlying computation. A good reconstruction does not guarantee a physically or biologically meaningful representation. A learned model may compress the data in a way that is useful for prediction but misaligned with the mechanisms we care about.

This is why complexity measures and machine-learning models should be treated as part of a broader inferential strategy. They require validation, surrogate controls, synthetic benchmarks, perturbation tests, and domain-specific interpretation.

The goal is not to assign a single complexity number to the brain. The goal is to understand which aspects of neural activity are regular, which are random or effectively unpredictable, and which reflect nontrivial dynamical organization.

Why this matters for physics and complex systems

For physicists and complex-systems researchers, the key point is that modern data-driven methods are not separate from the classical questions of complexity science. They are increasingly becoming practical tools for approximating them.

Physics has always relied on compression. A law of nature is a compression of empirical regularity. A state variable is a coarse-grained representation. A phase diagram is a compressed map of possible regimes. A renormalization-group flow organizes behavior across scales.

Machine learning now adds a new layer to this tradition. It can search for latent variables, discover approximate symmetries, model high-dimensional dynamics, and identify compact representations in regimes where hand-derived theory is difficult.

But the success of these methods should be interpreted through the lens of complexity. What kind of structure has the model learned? What has it discarded? What regularities are imposed by the architecture? What randomness is modeled explicitly? What is merely absorbed into latent noise? What is the computational cost of unfolding the representation back into observed behavior?

These are not secondary questions. They determine whether a learned model is scientifically meaningful.

The limits of neural proxies

It is important to be clear: machine learning does not eliminate the foundational problems.

Autoencoders may learn compressed representations, but not necessarily minimal or interpretable ones. VAEs impose priors that may oversmooth rare events or distort multimodal structure. Latent ODEs may impose smooth dynamics where the true system has abrupt transitions. Koopman models may find useful linear embeddings without identifying the mechanistic variables of interest. PINNs may enforce the wrong physics if the assumed equations are incomplete.

Every proxy has an inductive bias.

This is why the paper argues for a pluralistic and operational view. There is no universal complexity measure that works for every system, scale, and scientific question. The right measure depends on the data, the domain, the coarse-graining, the available prior knowledge, and the purpose of the analysis.

For some questions, entropy is enough. For others, one needs dynamical information flow. For others, compression-based proxies, symbolic regression, or physics-informed latent models are more appropriate.

The practical question is not “what is the complexity of this system?” in the abstract.

The better question is:

Which form of regularity, randomness, or complexity is relevant to the scientific problem at hand, and what approximation can capture it with known biases?

Toward operational complexity

The broader motivation of the paper is to move from abstract complexity to operational complexity.

Operational complexity means that we do not merely define ideal quantities. We ask how they can be approximated, estimated, learned, validated, and used in real scientific workflows.

This requires connecting several traditions:

statistical mechanics and coarse-graining,
algorithmic information theory,
computational mechanics,
nonlinear dynamics,
information theory,
representation learning,
physics-informed machine learning,
computational neuroscience,
and AI-guided scientific discovery.

The common thread is compression: the search for structured descriptions that preserve what matters.

In the data-driven era, complexity science is no longer only about defining measures. It is also about building models that can discover the right representations from data.

Outlook

The next generation of scientific machine learning should not only optimize predictive accuracy. It should explicitly incorporate ideas from complexity theory:

compression,
regularity extraction,
uncertainty modeling,
coarse-graining,
dynamical structure,
symbolic representation,
and the separation of meaningful structure from noise.

For AI applied to physical and biological systems, this means building models that do more than fit data. They should reveal structure, respect constraints, expose latent variables, support interpretation, and generalize across regimes.

For complexity science, machine learning offers a new operational toolkit. It cannot compute the uncomputable, but it can approximate some of the goals that motivated those ideal measures in the first place.

This is the central message of the paper: classical complexity theory and modern machine learning are not separate conversations. They are converging on the same problem from different directions.

Classical theory asks what complexity is.

Machine learning asks how structure can be learned from data.

The data-driven era forces us to bring these questions together.
