Cortical Machinery as a Design Principle for RNNs
Companion post to:
Harnessing Cortical Geometry, Wiring, and Function as Inductive Biases for RNNs
Mo Shakiba, Rana Rokni, Mohammad Mohammadi, Nima Dehghani
arXiv 2026.
DOI: ``
A central problem at the interface of neuroscience, machine learning, and complex systems physics is whether the organization of biological circuits can be translated into useful inductive biases for artificial systems.
Recurrent neural networks are a natural setting for this question. RNNs support temporal integration, memory, decision-making, and the maintenance of internal state. They are widely used both as engineering tools and as models of neural computation. Yet most artificial recurrent networks are still initialized generically and trained with relatively unconstrained recurrent connectivity. Cortex is not built this way. Cortical neurons occupy physical space. Their connectivity is shaped by wiring cost. Their recurrent interactions are embedded in anatomical structure. Their functional relationships are expressed through activity.
In this paper, we ask whether those measured cortical regularities can be used directly as inductive structure for artificial recurrent networks.
The question is not simply whether we can make RNNs look more biological. The sharper question is:
Which aspects of cortical organization actually help recurrent networks learn?
To address this, we use data from the MICrONS functional connectomics program, where two-photon calcium imaging is co-registered with high-resolution electron microscopy reconstruction in the same mouse visual cortex. This makes it possible to link neuronal position, anatomical connectivity, and functional activity within a common cortical substrate.
The core idea is simple. Instead of treating cortical data only as something to model or predict, we treat it as something to build with.
From stylized spatial embedding to measured cortical priors
Recent work on spatially embedded RNNs has shown that assigning recurrent units positions in physical space and penalizing long-distance communication can produce sparse, modular, and small-world architectures while preserving strong task performance. This is an important conceptual step because it treats physical embedding and wiring economy as computationally meaningful constraints rather than biological ornamentation.
But many such models still rely on artificial spatial grids or stylized embeddings. In our work, we replace these abstract constraints with measured cortical structure.
Using MICrONS, we derive three classes of cortical priors:
- Function-derived recurrent weight initialization
- Real neuronal spatial embedding
- Communicability-aware regularization derived from anatomical connectivity
We then construct eleven RNN variants that selectively include or remove these priors. This allows us to treat cortical organization not as one monolithic biological constraint, but as a set of separable design factors.
That separation is important. It allows us to ask whether performance is driven mainly by function, geometry, wiring, or their interaction.
The three cortical priors
1. Functional initialization: $W^*$
Instead of relying only on orthogonal or random recurrent initialization, we initialize recurrent weights using functional relationships derived from calcium-imaging activity.
The base recurrent weights are sampled from a log-normal distribution, motivated by the heavy-tailed structure of biological synaptic weights. These weights are then modulated element-wise by functional relationships among neurons, including Pearson correlation and the Spike Time Tiling Coefficient.
In schematic form, the biologically informed recurrent initialization is
\[W_{\mathrm{bio}} = W_{\mathrm{lognormal}} \odot \mathrm{Corr} \odot \mathrm{STTC},\]
where $\odot$ denotes element-wise multiplication.
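The schematic product above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the paper's implementation: the function name `functional_init`, the log-normal parameters `mu` and `sigma`, and the toy data are hypothetical, and the STTC matrix is taken as precomputed rather than derived from spike times here.

```python
import numpy as np

def functional_init(activity, sttc, mu=0.0, sigma=1.0, seed=0):
    """Sketch of W_bio = W_lognormal * Corr * STTC (element-wise).

    activity : (n_neurons, n_timepoints) array of activity traces
    sttc     : (n_neurons, n_neurons) STTC matrix, assumed precomputed
    mu, sigma: log-normal parameters (hypothetical values)
    """
    rng = np.random.default_rng(seed)
    n = activity.shape[0]
    # Heavy-tailed, strictly positive base weights.
    w_lognormal = rng.lognormal(mean=mu, sigma=sigma, size=(n, n))
    # Pearson correlation between neurons (rows are neurons).
    corr = np.corrcoef(activity)
    # Element-wise modulation by the two functional matrices.
    return w_lognormal * corr * sttc

# Toy usage with synthetic traces and a placeholder STTC of all ones.
rng = np.random.default_rng(1)
activity = rng.standard_normal((5, 200))
sttc = np.ones((5, 5))
w_bio = functional_init(activity, sttc)
```

With a placeholder STTC of ones, the sign pattern of `w_bio` is inherited entirely from the correlation matrix, while the magnitudes keep the heavy tail of the log-normal draw.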
We also test an alternative initialization based on the precision matrix. This is useful because the precision matrix emphasizes conditional dependencies by removing variance shared through the rest of the population. Correlation captures broad co-activity structure; precision gives a sparser view of putatively more direct functional dependency.
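The difference between correlation and precision can be made concrete with a three-variable chain, where the first and last variables are correlated only through the middle one. The sketch below is illustrative: the ridge term `shrinkage` is an assumption added so that the covariance can be inverted, and the chain data are synthetic.

```python
import numpy as np

def correlation_and_precision(activity, shrinkage=1e-3):
    """Return Pearson correlation, precision, and partial correlation.

    activity  : (n_neurons, n_timepoints) array
    shrinkage : small ridge term for invertibility (an assumption)
    """
    cov = np.cov(activity)
    n = cov.shape[0]
    cov_reg = cov + shrinkage * np.trace(cov) / n * np.eye(n)
    precision = np.linalg.inv(cov_reg)
    corr = np.corrcoef(activity)
    # Partial correlation: -P_ij / sqrt(P_ii * P_jj), with unit diagonal.
    d = np.sqrt(np.diag(precision))
    partial = -precision / np.outer(d, d)
    np.fill_diagonal(partial, 1.0)
    return corr, precision, partial

# Synthetic chain x -> y -> z: x and z interact only through y.
rng = np.random.default_rng(0)
x = rng.standard_normal(5000)
y = x + rng.standard_normal(5000)
z = y + rng.standard_normal(5000)
corr, precision, partial = correlation_and_precision(np.stack([x, y, z]))
```

In this toy case `corr[0, 2]` is clearly nonzero while `partial[0, 2]` is close to zero, which is the sense in which precision gives a sparser, more direct view of dependency.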
This gives the recurrent network an initial structure that reflects how neurons co-vary during activity, rather than starting from an uninformed random matrix.
2. Real spatial embedding: $D^*$
Spatially embedded RNNs typically place units on an artificial grid. Here, we instead embed recurrent units using measured MICrONS neuronal soma coordinates.
This lets us ask whether the actual spatial arrangement of cortical neurons provides useful structure for recurrent optimization. The distance matrix derived from these coordinates enters the regularization term, penalizing recurrent solutions that are spatially expensive.
The control condition, denoted $D$, uses artificial grid coordinates. This allows us to isolate the contribution of real cortical geometry.
The key question is therefore not whether space matters in the abstract, but whether measured cortical space matters.
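The two embedding conditions differ only in where the coordinates come from; the distance matrix is built the same way in both. A minimal sketch, assuming Euclidean distance and a 3D lattice for the grid control (both are assumptions), with random points standing in for measured soma coordinates:

```python
import numpy as np

def pairwise_distances(coords):
    """Euclidean distance matrix from an (n_units, 3) coordinate array."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def grid_coordinates(side):
    """Coordinates of a side x side x side lattice (the D control)."""
    axes = np.arange(side)
    mesh = np.meshgrid(axes, axes, axes, indexing="ij")
    return np.stack(mesh, axis=-1).reshape(-1, 3).astype(float)

# Stand-in for measured soma positions (the D* condition); real
# coordinates would be loaded from the dataset instead.
rng = np.random.default_rng(0)
soma_xyz = rng.uniform(0.0, 100.0, size=(27, 3))

d_star = pairwise_distances(soma_xyz)             # irregular geometry
d_grid = pairwise_distances(grid_coordinates(3))  # regular lattice
```

Both matrices are symmetric with a zero diagonal, so either can be dropped into the same regularization term; only the geometry they encode changes.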
3. Communicability-aware regularization: $C$ and $C^*$
Anatomical connectivity does not only define direct edges. It also defines possible routes through which activity can propagate across the network.
To capture this, we use communicability, a matrix-exponential graph measure that weights walks of all lengths through the network. In simplified form,
\[C = \exp\left(S^{-1/2} W S^{-1/2}\right),\]
where $W$ is the adjacency matrix and $S$ is the diagonal matrix of node strengths.
This gives a communication-aware view of the anatomical graph. It is richer than simply asking whether two neurons are directly connected, and different from reducing the graph to shortest paths alone.
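A small numerical sketch shows why communicability is richer than direct adjacency. It assumes a symmetric, non-negative adjacency matrix and uses `scipy.linalg.expm` for the matrix exponential; the function name and the three-node path graph are illustrative.

```python
import numpy as np
from scipy.linalg import expm

def communicability(adjacency):
    """Strength-normalized communicability exp(S^-1/2 A S^-1/2).

    Assumes a symmetric adjacency matrix; absolute values are taken
    so that strengths are non-negative.
    """
    a = np.abs(adjacency)
    strength = a.sum(axis=1)
    inv_sqrt = np.zeros_like(strength)
    nonzero = strength > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(strength[nonzero])
    # Scale rows and columns by the inverse square-root strengths.
    normalized = inv_sqrt[:, None] * a * inv_sqrt[None, :]
    return expm(normalized)

# Path graph 0 - 1 - 2: nodes 0 and 2 share no direct edge.
path = np.array([[0.0, 1.0, 0.0],
                 [1.0, 0.0, 1.0],
                 [0.0, 1.0, 0.0]])
comm = communicability(path)
```

Here `path[0, 2]` is zero, yet `comm[0, 2]` is positive because walks through node 1 contribute to the matrix exponential.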
We test two formulations.
In the direct communicability condition, $C$, communicability enters the spatial regularization term directly:
\[\mathcal{L} = \mathcal{L}_{\mathrm{task}} + \lambda \|W \odot D \odot C\|.\]
In the $C^*$ condition, we use an Earth Mover’s Distance penalty to match the distribution of communicability values between the empirical cortical graph and the artificial recurrent network:
\[\mathcal{L} = \mathcal{L}_{\mathrm{task}} + \lambda \|W \odot D\| + \lambda_{\mathrm{EMD}} \, \mathrm{EMD} \left( C_{\mathrm{emp}}, C_{\mathrm{art}} \right).\]
Together, $W^*$, $D^*$, and $C/C^*$ allow us to ask how cortical function, geometry, and communication topology jointly shape recurrent learning.
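The two penalty formulations above can be evaluated numerically as follows. This is a sketch under assumptions: the norm is taken to be the L1 norm (the text does not specify it), the EMD is computed as a one-dimensional Wasserstein distance over flattened communicability values with `scipy.stats.wasserstein_distance`, and all function names and numbers are illustrative. That SciPy call is not differentiable, so an actual training loop would need a differentiable counterpart.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def wiring_penalty(w, d, c=None):
    """L1 norm of W * D, optionally also * C (L1 is an assumption)."""
    term = np.abs(w) * d
    if c is not None:
        term = term * c
    return float(term.sum())

def loss_direct(task_loss, w, d, c, lam):
    """C condition: communicability inside the spatial penalty."""
    return task_loss + lam * wiring_penalty(w, d, c)

def loss_emd(task_loss, w, d, c_emp, c_art, lam, lam_emd):
    """C* condition: spatial penalty plus a distribution-matching term."""
    emd = wasserstein_distance(c_emp.ravel(), c_art.ravel())
    return task_loss + lam * wiring_penalty(w, d) + lam_emd * emd

# Toy values, chosen only to make the arithmetic easy to follow.
w = np.array([[0.0, 2.0], [-1.0, 0.0]])
d = np.array([[0.0, 3.0], [3.0, 0.0]])
c = np.full((2, 2), 0.5)
c_emp = np.array([[0.0, 1.0], [2.0, 3.0]])
c_art = c_emp + 1.0   # same shape of distribution, shifted by 1

direct = loss_direct(1.0, w, d, c, lam=0.1)
matched = loss_emd(1.0, w, d, c_emp, c_art, lam=0.1, lam_emd=0.5)
```

The design difference is visible in the signatures: the direct form reweights each connection by its communicability, whereas the EMD form leaves the per-connection penalty purely spatial and only constrains the overall distribution of communicability values.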
A systematic ablation of cortical priors
The model family was designed as a structured ablation. Each variant selectively includes or removes one or more cortical priors: function-derived initialization, real spatial embedding, and communicability-aware regularization. This design lets us ask not only whether biological constraints help, but which biological constraints help, and under what combinations.
| Model variant | Functional initialization | Spatial embedding | Communicability constraint | Main comparison it enables |
|---|---|---|---|---|
| $W^*D^*C$ | Biological, from MICrONS activity | Real MICrONS coordinates | Direct communicability | Full cortical-prior model with direct communicability |
| $WD^*C$ | Standard/random | Real MICrONS coordinates | Direct communicability | Tests whether geometry and communicability help without biological weight initialization |
| $WDC$ | Standard/random | Artificial grid | Direct communicability | Stylized spatial-control model |
| $WD^*$ | Standard/random | Real MICrONS coordinates | None | Isolates the effect of real cortical geometry |
| $WD$ | Standard/random | Artificial grid | None | Spatial embedding control using non-biological coordinates |
| $W$ | Standard/random | None | None | Baseline simple RNN |
| $W^*D^*C^*$ | Biological, from MICrONS activity | Real MICrONS coordinates | EMD matching of communicability distributions | Full cortical-prior model with distributional communicability matching |
| $WD^*C^*$ | Standard/random | Real MICrONS coordinates | EMD matching | Tests EMD-based communicability without biological initialization |
| $W^*DC^*$ | Biological, from MICrONS activity | Artificial grid | EMD matching | Tests biological initialization without real cortical geometry |
| $W^!D^*C$ | Permuted biological initialization | Real MICrONS coordinates | Direct communicability | Tests whether biological weight statistics help even when neuron-to-neuron assignment is disrupted |
| $W^!D^*C^*$ | Permuted biological initialization | Real MICrONS coordinates | EMD matching | Tests the same permutation control under EMD-based communicability |
Here $W^*$ denotes function-derived biological initialization, $W^!$ denotes a permuted biological initialization, $D^*$ denotes real MICrONS neuronal coordinates, $D$ denotes an artificial spatial grid, $C$ denotes direct communicability regularization, and $C^*$ denotes an Earth Mover’s Distance penalty matching empirical and artificial communicability distributions.
This table is important because it shows that the study is not a single comparison between a “biological” model and a “non-biological” model. It is a factorial perturbation of cortical structure. Function, geometry, and communication topology are separated so that their individual and combined effects can be examined.
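One way to see the factorial structure is to encode the table as data and enumerate the matched comparisons it supports. The encoding below is a hypothetical convenience (the class, field names, and string labels are not from the paper); the eleven rows mirror the table.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Variant:
    name: str
    init: str       # "random", "bio", or "bio_permuted"
    embedding: str  # "none", "grid", or "microns"
    comm: str       # "none", "direct", or "emd"

VARIANTS = [
    Variant("W*D*C", "bio", "microns", "direct"),
    Variant("WD*C", "random", "microns", "direct"),
    Variant("WDC", "random", "grid", "direct"),
    Variant("WD*", "random", "microns", "none"),
    Variant("WD", "random", "grid", "none"),
    Variant("W", "random", "none", "none"),
    Variant("W*D*C*", "bio", "microns", "emd"),
    Variant("WD*C*", "random", "microns", "emd"),
    Variant("W*DC*", "bio", "grid", "emd"),
    Variant("W!D*C", "bio_permuted", "microns", "direct"),
    Variant("W!D*C*", "bio_permuted", "microns", "emd"),
]

def matched_pairs(variants, factor):
    """Pairs of variants that differ in `factor` and nothing else."""
    others = [f for f in ("init", "embedding", "comm") if f != factor]
    return [
        (a.name, b.name)
        for a, b in combinations(variants, 2)
        if getattr(a, factor) != getattr(b, factor)
        and all(getattr(a, f) == getattr(b, f) for f in others)
    ]

init_pairs = matched_pairs(VARIANTS, "init")
embedding_pairs = matched_pairs(VARIANTS, "embedding")
```

For example, `init_pairs` contains `("W*D*C", "WD*C")`, a comparison in which only the initialization changes, and `embedding_pairs` contains `("WD*", "WD")`, which isolates real versus grid coordinates.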
Task suite
We trained all model variants on three cognitive decision-making tasks.
The first was One-Choice Inference, where the network must integrate sequential stimuli across a delay to choose the correct movement direction.
The second was Perceptual Decision-Making, where the network must integrate noisy sensory evidence and identify the dominant alternative.
The third was Go/NoGo, where the network must maintain stimulus information and generate or suppress an action at the correct time.
These tasks probe different combinations of temporal integration, short-term memory, and decision control. They are not intended to exhaust the richness of cognition, but they provide a controlled setting for comparing recurrent architectures under matched conditions.
Main result: cortical priors improve learning
The strongest models were those combining function-derived initialization, real spatial embedding, and communicability-aware regularization. In particular, the $W^*D^*C$ and $W^*D^*C^*$ variants consistently performed among the best across tasks.
The important point, however, is not only that the fully constrained models performed well. The ablation structure revealed a hierarchy among the priors.
Functional initialization provided the largest and most general gain. Models initialized from functional cortical relationships strongly outperformed randomly initialized or partially constrained variants, especially on the more difficult tasks.
Real spatial embedding added a robust secondary benefit. Replacing artificial grid coordinates with measured neuronal coordinates improved performance across several matched comparisons, showing that cortical geometry itself contributes useful structure.
Communicability had a more selective effect. Communication-aware regularization refined learning and topology, especially when combined with real spatial embedding, but it was not the dominant driver of performance.
This hierarchy is one of the central messages of the paper. Cortical priors do not all matter equally. Function-derived initialization has the strongest effect, spatial embedding provides an additional computational advantage, and communicability shapes the learned solution more selectively.
For me, this is the important shift. The result is not simply that “biology helps.” The result is that different aspects of cortical organization have different computational roles.
Positive-only recurrence: biological initialization stabilizes difficult optimization
We also tested a more restrictive setting in which recurrent weights were constrained to be non-negative. This is not a full Dale-compliant excitatory/inhibitory network, since the models are built from excitatory MICrONS neurons, but it is a strong sign constraint on recurrent dynamics.
Under this positive-only constraint, randomly initialized models collapsed toward chance performance. In contrast, models initialized with functional cortical priors retained high accuracy.
This result suggests that biological initialization does more than improve average performance. It changes the optimization landscape. By starting the recurrent network in a structured region of parameter space, functional initialization makes learning more robust even when the recurrent architecture is severely constrained.
The permuted biological initialization, $W^!$, was especially informative. This variant preserves the empirical distribution of biologically derived weights but disrupts their original neuron-to-neuron assignment. Its performance was often comparable to $W^*$, indicating that part of the benefit comes from the statistical structure of the biological initialization rather than a strict one-to-one mapping between specific neurons and specific weights.
But distribution alone is not the full story.
When we resampled the nonzero entries of $W_{\mathrm{bio}}$ from its empirical cumulative distribution function and reassigned them to the same support, performance dropped substantially. Thus, simply matching the marginal weight distribution is not sufficient. The original biologically derived weight values contain task-relevant structure that is lost under distribution-matched randomization.
This is a subtle but important point. The biological prior is not reducible to a generic heavy-tailed distribution. Its values and structure carry computational information.
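The two controls can be contrasted in code. The sketch below is one possible reading of each, not the paper's procedure: the permutation control is implemented as a joint relabeling of neurons, and the distribution-matched control as sampling with replacement from the observed nonzero values (which is equivalent to drawing from their empirical CDF) placed on the original support. Function names and the toy matrix are hypothetical.

```python
import numpy as np

def permute_neurons(w, rng):
    """Relabel neurons jointly in rows and columns.

    Keeps every weight value but breaks the correspondence between
    weights and the original neuron identities (one possible reading).
    """
    perm = rng.permutation(w.shape[0])
    return w[np.ix_(perm, perm)]

def resample_from_ecdf(w, rng):
    """Redraw nonzero weights from their empirical distribution.

    The support (which entries are nonzero) is unchanged; the values
    are drawn with replacement from the observed nonzero values.
    """
    out = np.zeros_like(w)
    support = w != 0
    values = w[support]
    out[support] = rng.choice(values, size=values.size, replace=True)
    return out

# Toy sparse, heavy-tailed matrix standing in for W_bio.
rng = np.random.default_rng(0)
w_bio = rng.lognormal(size=(6, 6)) * (rng.random((6, 6)) < 0.5)
w_perm = permute_neurons(w_bio, rng)
w_ecdf = resample_from_ecdf(w_bio, rng)
```

Under this reading the permuted matrix keeps the exact multiset of weights and their relational structure up to relabeling, whereas the resampled matrix keeps only the support and the marginal distribution.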
Cortical priors reshape the topology of learned recurrent networks
We then asked whether cortical constraints change not only task accuracy, but also the organization of the learned recurrent weights.
They do.
Across model variants, cortical priors shifted recurrent networks away from high-entropy, random-like configurations and toward more structured regimes. We quantified this using graph-theoretic measures including entropy, modularity, small-worldness, and assortativity.
Entropy
Functionally initialized and spatially grounded models tended to develop lower-entropy recurrent weight organization. This indicates that learning under cortical priors concentrates recurrent structure into more specialized, less random configurations.
Entropy here acts as a compact descriptor of how concentrated or diffuse the learned weight distribution becomes:
\[H(W) = - \sum_i p(w_i) \log_2 p(w_i).\]
High-entropy models were generally closer to random-like regimes and performed poorly on the harder tasks. Low- and intermediate-entropy models were more structured and often performed better.
The implication is that cortical priors do not merely bias the initial state of the network. They alter the class of solutions reached by learning.
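The entropy measure can be illustrated with a histogram estimate of $p(w_i)$. The binning scheme (32 equal-width bins) is an assumption made for this sketch, as are the two toy weight vectors.

```python
import numpy as np

def weight_entropy(w, bins=32):
    """Shannon entropy (bits) of a histogram of the weight values.

    The number of bins is a hypothetical choice; the estimate depends on it.
    """
    counts, _ = np.histogram(np.ravel(w), bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]                      # 0 * log(0) is treated as 0
    return float(-(p * np.log2(p)).sum())

# Diffuse weights: values spread evenly over the whole range.
w_diffuse = np.linspace(0.0, 1.0, 3200)
# Concentrated weights: mostly zeros with a few strong connections.
w_sparse = np.concatenate([np.zeros(3100), np.ones(100)])

h_diffuse = weight_entropy(w_diffuse)
h_sparse = weight_entropy(w_sparse)
```

The evenly spread weights come out near the maximum of $\log_2 32 = 5$ bits, while the concentrated weights come out far lower, which is the sense in which lower entropy indicates a more specialized configuration.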
Modularity and small-worldness
Models combining functional priors, real spatial embedding, and direct communicability regularization developed stronger modular and small-world structure.
This matters because modular small-world topology is a canonical feature of brain networks. It supports local specialization while preserving efficient global communication. The learned RNNs therefore did not merely perform better; they converged toward organizational regimes that resemble known principles of biological network architecture.
Small-worldness was quantified as
\[\sigma = \frac{C/C_{\mathrm{rand}}}{L/L_{\mathrm{rand}}},\]
where $C$ here is the clustering coefficient (not the communicability matrix), $L$ is the characteristic path length, and $C_{\mathrm{rand}}$ and $L_{\mathrm{rand}}$ are the corresponding random-network baselines.
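A self-contained sketch of this ratio for a binary, undirected graph follows. Several choices here are assumptions: the random baseline is an ensemble of random graphs with the same number of nodes and edges, path lengths are averaged over connected pairs only, and a learned weight matrix would first need to be thresholded into such a graph. The ring lattice is a toy input.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def clustering_coefficient(a):
    """Mean local clustering coefficient of a binary, undirected graph."""
    k = a.sum(axis=1)
    triangles = np.diag(a @ a @ a) / 2.0   # triangles through each node
    possible = k * (k - 1) / 2.0           # neighbour pairs per node
    local = np.divide(triangles, possible,
                      out=np.zeros_like(triangles), where=possible > 0)
    return float(local.mean())

def characteristic_path_length(a):
    """Mean shortest-path length over connected pairs of distinct nodes."""
    dist = shortest_path(a, unweighted=True, directed=False)
    mask = np.isfinite(dist) & ~np.eye(len(a), dtype=bool)
    return float(dist[mask].mean())

def random_reference(a, rng):
    """Random graph with the same number of nodes and edges."""
    n = len(a)
    n_edges = int(a.sum()) // 2
    rows, cols = np.triu_indices(n, k=1)
    pick = rng.choice(rows.size, size=n_edges, replace=False)
    r = np.zeros_like(a)
    r[rows[pick], cols[pick]] = 1.0
    return r + r.T

def small_worldness(a, n_rand=20, seed=0):
    rng = np.random.default_rng(seed)
    refs = [random_reference(a, rng) for _ in range(n_rand)]
    c_rand = np.mean([clustering_coefficient(r) for r in refs])
    l_rand = np.mean([characteristic_path_length(r) for r in refs])
    c, l = clustering_coefficient(a), characteristic_path_length(a)
    return (c / c_rand) / (l / l_rand)

def ring_lattice(n, k_half):
    """Each node linked to its k_half nearest neighbours on each side."""
    a = np.zeros((n, n))
    for i in range(n):
        for off in range(1, k_half + 1):
            j = (i + off) % n
            a[i, j] = a[j, i] = 1.0
    return a

ring = ring_lattice(60, 3)
sigma = small_worldness(ring)
```

For this lattice the clustering coefficient is 0.6, well above that of an edge-matched random graph, so $\sigma$ exceeds 1 even though the lattice's path length is longer than the random baseline.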
However, topology and performance were not identical. Some $C^*$ variants achieved high task accuracy and low entropy without developing comparably strong modularity or small-worldness. This dissociation shows that sparse structured connectivity and modular small-world organization are related but distinct outcomes of learning.
This point is important. A network can become structured without becoming modular in the canonical small-world sense. The cortical priors constrain the solution space, but they do not force all successful networks into one topological endpoint.
Assortativity
Assortativity revealed another distinction. Some spatially constrained models developed positively assortative, hub-rich organization, while functionally initialized models more often shifted toward disassortative hub-periphery structure.
This suggests that different cortical priors favor different trade-offs between integration, segregation, and distribution of computational load. Cortical constraints do not impose a single canonical topology. They bias the family of solutions that learning discovers.
This is precisely why the ablation design matters. If we had treated biology as a single intervention, we would have missed the fact that function, geometry, and communicability pull the learned networks toward different regions of topological space.
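The distinction between assortative and disassortative organization can be made concrete with degree assortativity, computed as the Pearson correlation between the degrees at the two ends of each edge. The sketch assumes a binary, undirected graph (a learned weight matrix would need to be thresholded first, which is an assumption), and the two toy graphs are illustrative extremes.

```python
import numpy as np

def degree_assortativity(a):
    """Pearson correlation of degrees across edge endpoints.

    Assumes a binary, symmetric adjacency matrix. Each undirected edge
    is counted in both directions. Undefined (NaN) if all degrees are equal.
    """
    deg = a.sum(axis=1)
    i, j = np.nonzero(a)
    return float(np.corrcoef(deg[i], deg[j])[0, 1])

def star_graph(n):
    """One hub connected to n - 1 peripheral nodes."""
    a = np.zeros((n, n))
    a[0, 1:] = 1.0
    a[1:, 0] = 1.0
    return a

# Hub-periphery extreme: the hub connects only to low-degree nodes.
r_star = degree_assortativity(star_graph(6))

# Assortative extreme: a triangle and a separate edge, so every edge
# joins nodes of equal degree.
blocks = np.zeros((5, 5))
blocks[:3, :3] = 1.0 - np.eye(3)
blocks[3, 4] = blocks[4, 3] = 1.0
r_blocks = degree_assortativity(blocks)
```

The star graph yields a coefficient of -1 (fully disassortative hub-periphery structure) and the block graph yields +1; learned networks fall between these extremes.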
MICrONS as an architectural resource for AI
For machine learning, the message is that better recurrent computation does not necessarily require larger models, more gates, or more architectural complexity.
The models here use the same basic RNN architecture. What changes is the inductive structure: how recurrent weights are initialized, how spatial cost is imposed, and how communication structure is regularized.
This points to a broader design principle. Biological data can be used not only as something to predict, but as something to build with. Measured cortical organization can provide priors that improve learning, robustness, and internal organization.
In this sense, MICrONS-like datasets are not only resources for neuroscience. They are architectural resources for AI.
This does not mean that artificial systems should copy cortex literally. The point is more precise. Cortical data can reveal reusable constraints: spatial embedding, wiring economy, structured functional initialization, communication-aware regularization, modularity, sparsity, and small-world organization. These are not superficial biological details. They are design principles that can be tested, ablated, and repurposed.
Why this matters for neuroscience
For neuroscience, the framework provides a way to reverse-engineer cortical computation through controlled model perturbation.
Because we can independently ablate function-derived initialization, real geometry, and communicability, we can ask which biological features are necessary for which computational outcomes. This makes the RNN model family a kind of experimental testbed.
The results suggest that anatomical wiring and geometry matter, but that functional relationships provide the strongest constraint on recurrent learning. This fits with a broader lesson from connectome-constrained modeling: structure alone may not uniquely determine dynamics. To understand computation, we need both wiring and activity.
The cortical graph provides a scaffold. Functional activity tells us how that scaffold is used.
This distinction is central. A connectome alone may define a space of possible dynamics, but not a unique computation. Function-derived structure narrows that space.
Relation to biological computation and physical computing
There is a broader way to read these results.
A recurrent network computes by evolving its state. Its computation is not only represented in its input-output map, but in the trajectory of its internal dynamics. From this perspective, initialization, geometry, and connectivity are not implementation details. They define the physical substrate in which computation unfolds.
This view is natural in neuroscience, but it is also relevant to machine learning and physical computing more generally. Biological systems compute under constraints: spatial embedding, energetic cost, wiring economy, noise, sign structure, local interactions, and recurrent dynamics. These constraints are often treated as inconveniences when building artificial systems. But in biological systems, they may be part of what makes computation efficient, robust, and reusable.
The present work treats cortical organization in exactly this way. It asks whether the constraints of the cortical substrate can be translated into artificial recurrent systems as useful inductive structure.
The answer is yes, but with nuance. The most useful biological constraint was not simply the anatomical graph. It was the function-derived structure of the activity. Geometry helped. Communicability helped more selectively. The computational value emerged from the interplay between measured function, measured space, and measured wiring.
What the study does not claim
It is important to be clear about the scope.
This is not a claim that the RNNs reproduce cortical computation in full biological detail. The models use excitatory neurons from MICrONS and do not yet incorporate the full diversity of inhibitory cell types, laminar structure, neuromodulation, synaptic dynamics, or developmental constraints.
The positive-only experiments are not a full Dale-compliant excitatory/inhibitory model. They are a strong sign-constrained test showing that functional initialization stabilizes learning under a difficult recurrent constraint.
The task suite is also controlled and simplified. One-Choice Inference, Perceptual Decision-Making, and Go/NoGo probe important recurrent capacities, but they do not capture the full richness of naturalistic cortical computation.
The point is therefore not that the model is a complete model of cortex. The point is that measured cortical organization contains computationally useful structure, and that this structure can be decomposed experimentally.
The larger picture
The central claim of the paper is straightforward:
Cortical organization is not just biological detail. It is computational structure.
By grounding RNNs in measured cortical geometry, wiring, and function, we show that biological priors can improve learning, stabilize optimization under sign constraints, and guide networks toward structured topological regimes.
This gives us two complementary opportunities.
For AI, cortical data can inspire more efficient, robust, and interpretable recurrent architectures.
For neuroscience, artificial recurrent networks can serve as controlled systems for testing which aspects of cortical organization are computationally consequential.
The machinery of cortex can therefore be used both as an inductive basis for artificial learning systems and as a scientific object to be interrogated through those systems.
In that sense, this work is not only about building better RNNs, and it is not only about interpreting cortical data. It is about using biological organization as a bridge between neural computation, machine learning, and the physics of structured dynamical systems.