Open Data in Neurophysiology: From Data Sharing to Scientific Infrastructure

Companion post to:
Open Data In Neurophysiology: Advancements, Solutions & Challenges
Colleen J. Gillon, Cody Baker, Ryan Ly, Edoardo Balzani, Bingni W. Brunton, Manuel Schottdorf, Satrajit Ghosh, and Nima Dehghani. eNeuro, 20 November 2025, 12(11), ENEURO.0486-24.2025. DOI: https://doi.org/10.1523/ENEURO.0486-24.2025

In November 2025, our paper “Open Data In Neurophysiology: Advancements, Solutions & Challenges” was published in eNeuro.

The paper grew out of the inaugural Open Data In Neurophysiology (ODIN) symposium, which we organized at MIT in 2023 through the Open Data In Neuroscience initiative at the McGovern Institute, in collaboration with DANDI. The full 2023 program is available here:

ODIN Symposium 2023 — Open Data In Neurophysiology

and the video recordings are available here:

ODIN 2023 videos

ODIN continued in 2025 at the Allen Institute, with the theme “Integrating Scales and Modalities: The Future of Neurophysiology.” The speaker list and video recordings are available here:

ODIN Symposium 2025 speakers

ODIN 2025 videos

As one of the founding organizers of ODIN, my motivation was not simply to create another conference on open science. The goal was to create a forum where experimental neurophysiologists, computational neuroscientists, tool builders, data infrastructure developers, modelers, and funders could discuss the practical future of open neurophysiology as an integrated scientific ecosystem.

This paper is one outcome of that effort.

The central claim: open data is not enough

It is now almost routine to say that neuroscience should become more open. But “open data” can mean very different things.

At its weakest, it means that a file is placed somewhere online after publication. At its strongest, it means that the data, metadata, code, computational environment, provenance, annotations, and scientific context are organized in a way that allows other researchers to inspect, reuse, compare, extend, and reinterpret the work.

The ODIN paper argues for the second meaning.

Modern neurophysiology is entering a regime where simple file sharing is no longer sufficient. The field now produces large, heterogeneous, multimodal datasets from Neuropixels probes, high-density ECoG, calcium imaging, voltage imaging, behavior tracking, whole-brain observatories, organoids, and increasingly complex closed-loop experiments. These datasets are often valuable far beyond the original experiment. But their value depends on whether they can be made scientifically usable outside the laboratory that produced them.

That is the real challenge: not merely to make data available, but to make neurophysiology reusable, interpretable, computable, and interoperable.

Why neurophysiology is at an inflection point

Many areas of biology have already been transformed by open data. Structural biology and genomics are obvious examples. Protein sequence and structure databases, genome repositories, and shared computational tools did not merely make old workflows more efficient. They changed what kinds of questions could be asked.

Neurophysiology has the potential to undergo a similar transformation, but the problem is harder in several ways.

A neurophysiology experiment is not just a matrix of measurements. It is a structured event in time: animals or humans behaving, stimuli being presented, neural activity being recorded across multiple devices, clocks drifting or requiring alignment, environmental conditions changing, interventions occurring, behavioral states fluctuating, and analysis pipelines transforming raw signals into derived objects.

For such data to be reusable, it is not enough to know that a signal was recorded. We need to know what was recorded, from where, under what conditions, with what device, aligned to which behavioral and stimulus streams, processed through which pipeline, and interpreted through which assumptions.

This is why neurophysiology needs more than repositories. It needs infrastructure.

Standards and archives: NWB, DANDI, and the problem of interoperability

A major part of the ODIN discussion centered on the role of common standards and archives. The Neurodata Without Borders (NWB) ecosystem has become one of the central standards for neurophysiology data. Its importance is not just that it gives the field a common file format. More importantly, it gives us a shared conceptual structure for organizing neural recordings, behavioral variables, stimuli, metadata, and derived data products.

The DANDI Archive then provides a home for these standardized datasets. DANDI is especially important because neurophysiology data are large, and because researchers increasingly need cloud-native ways to access and compute on these datasets without downloading everything locally.

But the paper also emphasizes that standards are not magic. They solve some problems while exposing others.

For example, many experiments record multiple time series in parallel: neural signals, behavioral video, pose estimates, stimulus timing, running speed, eye tracking, pupil dynamics, optogenetic stimulation, electrophysiology streams from multiple probes, and so on. These streams may be recorded by different devices with different clocks. Aligning them is often nontrivial and may require laboratory-specific procedures. If that alignment is undocumented, hidden in a script, or only implicitly known by the original lab, then the shared dataset remains only partially reusable.
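
To make the alignment problem concrete, here is a minimal sketch. It is not the procedure of any particular lab or tool. It assumes that two devices both record the same sync pulses and that their clocks differ by a constant offset plus a linear drift. The function names and pulse values are hypothetical, and real recordings may need piecewise or nonlinear corrections.

```python
from statistics import mean


def fit_clock_map(src_times, ref_times):
    """Least-squares fit of ref = slope * src + offset from paired sync pulses."""
    if len(src_times) != len(ref_times) or len(src_times) < 2:
        raise ValueError("need at least two paired sync pulses")
    mean_src, mean_ref = mean(src_times), mean(ref_times)
    sxx = sum((x - mean_src) ** 2 for x in src_times)
    if sxx == 0:
        raise ValueError("sync pulses must occur at distinct times")
    sxy = sum((x - mean_src) * (y - mean_ref)
              for x, y in zip(src_times, ref_times))
    slope = sxy / sxx
    offset = mean_ref - slope * mean_src
    return slope, offset


def to_reference_clock(times, slope, offset):
    """Map timestamps from the source clock onto the reference clock."""
    return [slope * t + offset for t in times]


# Hypothetical sync pulses, in seconds, as seen by each device's own clock.
camera_pulses = [0.0, 10.0, 20.0, 30.0]
ephys_pulses = [1.5, 11.501, 21.502, 31.503]

slope, offset = fit_clock_map(camera_pulses, ephys_pulses)

# Two video frame times expressed on the electrophysiology clock.
frame_times_ephys = to_reference_clock([5.0, 15.0], slope, offset)
print(slope, offset, frame_times_ephys)
```

Storing the fitted slope and offset alongside the data, rather than only the already-aligned timestamps, is one way to keep the alignment step inspectable by later users.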

This is one reason the paper spends significant attention on metadata, provenance, data stream alignment, and computational reproducibility. These are not bureaucratic details. They are part of the scientific object.

Raw data, processed data, and the compression of future possibility

One of the hardest questions in open neurophysiology is deceptively simple:

What should be preserved?

With high-throughput electrophysiology and optical physiology, raw data can easily reach terabyte scales. Keeping everything forever is expensive. But sharing only heavily processed data can remove information that later turns out to be essential.
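
A back-of-the-envelope calculation shows how quickly this adds up. The numbers below are illustrative assumptions rather than figures from the paper: 384 channels sampled at 30 kHz with 2 bytes per sample, which is roughly what a single high-density probe streams in its spike band, with no compression.

```python
def raw_bytes_per_second(n_channels, sample_rate_hz, bytes_per_sample):
    """Uncompressed data rate of a multichannel recording."""
    return n_channels * sample_rate_hz * bytes_per_sample


# Assumed recording parameters (illustrative only).
N_CHANNELS = 384
SAMPLE_RATE_HZ = 30_000
BYTES_PER_SAMPLE = 2

rate = raw_bytes_per_second(N_CHANNELS, SAMPLE_RATE_HZ, BYTES_PER_SAMPLE)
one_hour_gb = rate * 3600 / 1e9      # decimal gigabytes per hour
hours_per_tb = 1e12 / rate / 3600    # hours of recording per decimal terabyte

print(f"{rate / 1e6:.2f} MB/s")           # 23.04 MB/s
print(f"{one_hour_gb:.1f} GB per hour")    # 82.9 GB per hour
print(f"{hours_per_tb:.1f} hours per TB")  # 12.1 hours per TB
```

Under these assumptions, about twelve hours of recording from one probe fills a terabyte, and multi-probe sessions reach that scale proportionally faster.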

Spike sorting is a clear example. Sharing sorted spike trains is convenient, compact, and often sufficient for many analyses. But spike sorting decisions depend on algorithmic choices, parameter settings, noise conditions, probe geometry, manual curation, and assumptions about unit identity. If the raw data are discarded, later researchers may not be able to evaluate false positives, false negatives, drift, waveform changes, or alternative sorting strategies.

The same issue applies more broadly. Calcium imaging pipelines produce ROIs, traces, deconvolved events, neuropil corrections, motion correction outputs, and quality control metrics. Behavioral pipelines produce pose estimates, latent states, syllables, or task labels. Each transformation increases usability for some purposes but can remove information relevant to others.

So the question is not simply “raw or processed?” The better question is:

Which levels of the data hierarchy need to be preserved so that future validation, reanalysis, and discovery remain possible?

This is where provenance becomes central. If we share processed data, we should also preserve enough information about the processing pipeline, software versions, parameters, quality metrics, and input data to make that transformation scientifically inspectable.
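
As a rough illustration of what "scientifically inspectable" could mean in practice, the sketch below records one processing step as a small machine-readable object. The field names, the sorter name, and the parameter values are all hypothetical. This is not the NWB or DANDI provenance model, only a stand-in to show the kind of information involved.

```python
import hashlib
import json
import platform
from dataclasses import asdict, dataclass, field


@dataclass
class ProcessingStep:
    """One transformation in a pipeline, described well enough to re-inspect."""
    name: str
    software: str
    version: str
    parameters: dict
    input_sha256: str
    quality_metrics: dict = field(default_factory=dict)


def sha256_of_bytes(data: bytes) -> str:
    """Content hash that ties a processed output to the exact input it used."""
    return hashlib.sha256(data).hexdigest()


def provenance_json(steps) -> str:
    """Serialize the steps and the interpreter version as sorted, indented JSON."""
    record = {
        "python_version": platform.python_version(),
        "steps": [asdict(step) for step in steps],
    }
    return json.dumps(record, indent=2, sort_keys=True)


# Stand-in bytes for a raw recording; a real pipeline would hash the file.
raw = b"\x00\x01\x02\x03"

step = ProcessingStep(
    name="spike_sorting",
    software="hypothetical-sorter",          # assumed name, not a real package
    version="1.2.3",
    parameters={"detect_threshold": 5.0},    # assumed parameter
    input_sha256=sha256_of_bytes(raw),
    quality_metrics={"n_units": 42},
)

print(provenance_json([step]))
```

The input hash is the important design choice here: it lets a later reader confirm that a given set of spike trains was derived from a specific raw file, even if that file is archived elsewhere.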

Open tools are part of the scientific record

The ODIN paper also discusses a growing ecosystem of open-source neurophysiology tools. These include tools for data conversion, data management, spike sorting, calcium imaging, pose estimation, behavioral segmentation, visualization, benchmarking, cloud computation, and model evaluation.

Examples include NWB GUIDE, NeuroConv, SpikeInterface, DataJoint, Spyglass, Neurosift, Dendro, DeepLabCut, SLEAP, MoSeq, CEBRA, Pynapple, Brain-Score, and others.

The point is not that every lab should use the same tool for every task. Neurophysiology is too diverse for that. The point is that tools are becoming part of the shared infrastructure of the field. They encode assumptions, workflows, analysis conventions, and best practices. When they are well maintained, documented, tested, and connected to standards, they can substantially reduce the friction of doing reproducible science.

But this also creates a social problem. Open-source tools do not maintain themselves.

Many tools begin as graduate-student or postdoc projects. They may become widely used, but their maintenance often depends on fragile labor structures. When the original developer leaves, the tool may stagnate. Bugs accumulate. Documentation falls behind. Dependencies break. The field loses a resource that may have taken years to build.

For this reason, the paper argues that open science requires funding mechanisms for research software engineering, data curation, documentation, benchmarking, and long-term maintenance. These should not be treated as peripheral technical services. They are now part of the epistemic machinery of neuroscience.

AI-ready neuroscience requires more than large datasets

A particularly important theme in the paper is the relationship between open neurophysiology and AI.

It is tempting to assume that once enough neurophysiology data are openly available, AI systems will simply extract the relevant structure. But that assumption is too simplistic. Large models require not only large datasets, but also consistent organization, rich metadata, reliable annotations, common vocabularies, and meaningful benchmarks.

Neurophysiology currently lacks many of these ingredients.

Even basic terms such as “burst,” “ripple,” “oscillation,” “state,” “assembly,” “event,” or “motif” can mean different things across subfields, species, brain areas, recording modalities, and analysis traditions. This is not merely a semantic inconvenience. It directly affects whether models trained on one dataset can generalize to another.

Similarly, annotations are often sparse. Many datasets contain the annotations needed for the original paper, but not necessarily the richer descriptions needed for reuse, meta-analysis, or foundation-model training. Important information may remain in lab notebooks, custom spreadsheets, undocumented preprocessing scripts, or the memory of the people who ran the experiment.

If we want AI systems that can reason across neurophysiology datasets, generate hypotheses, compare models, assist with analysis, or discover cross-dataset regularities, then the field needs datasets that are not only open, but AI-ready.

That means:

standardized metadata;
explicit provenance;
machine-readable experimental structure;
robust quality metrics;
shared vocabularies and ontologies;
community annotation mechanisms;
benchmark datasets and tasks;
clear links between raw data, processed data, code, and interpretation.
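
One way to make the list above operational is to treat it as a checklist that software can evaluate. The sketch below is only an illustration: the field names are hypothetical, one per item in the list, and a present-but-empty field counts as missing. A real check would validate content against a schema rather than only testing for presence.

```python
# Hypothetical field names, one per item in the list above.
AI_READY_FIELDS = (
    "standardized_metadata",
    "provenance",
    "experimental_structure",
    "quality_metrics",
    "vocabulary_terms",
    "annotations",
    "benchmark_tasks",
    "links",
)


def missing_fields(record):
    """Return the checklist fields that are absent or empty in a dataset record."""
    return [name for name in AI_READY_FIELDS if not record.get(name)]


def readiness_score(record):
    """Fraction of checklist fields that are present and non-empty."""
    return 1 - len(missing_fields(record)) / len(AI_READY_FIELDS)


# Illustrative dataset description with two gaps.
example = {
    "standardized_metadata": {"species": "Mus musculus"},
    "provenance": {"pipeline": "hypothetical-pipeline 0.1"},
    "experimental_structure": {"n_trials": 200},
    "quality_metrics": {"channel_dropout": 0.02},
    "vocabulary_terms": ["ripple"],
    "annotations": [],                        # present but empty
    "links": {"raw": "raw.bin", "code": "analysis.py"},
    # "benchmark_tasks" is absent
}

print(missing_fields(example))   # ['annotations', 'benchmark_tasks']
print(readiness_score(example))  # 0.75
```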

AI will not remove the need for careful neuroinformatics. It will make careful neuroinformatics more important.

From observatories to ecosystems

One of the most exciting developments discussed in the paper is the emergence of large-scale collaborative platforms and observatory-style models.

The Allen Institute’s OpenScope platform, for example, is inspired by the logic of astronomical observatories: researchers propose experiments, and accepted proposals are carried out through standardized high-throughput pipelines. This model can democratize access to data collection technologies that would otherwise be available only to a small number of highly resourced laboratories.

The International Brain Laboratory provides a different but complementary model: distributed laboratories using standardized protocols to generate reproducible, large-scale datasets across sites.

These platforms show that open neurophysiology is not only about what happens after a paper is published. It also affects how experiments are designed, how data are collected, how metadata are captured, how pipelines are standardized, and how communities organize around shared scientific questions.

At the same time, the ODIN paper emphasizes that smaller laboratory datasets remain essential. Neuroscience progresses through diversity: different preparations, species, tasks, brain areas, behavioral regimes, interventions, and theoretical motivations. Large platforms cannot and should not replace individual laboratories.

The challenge is to create an ecosystem in which both large observatory-style datasets and smaller lab-generated datasets can be made interoperable, searchable, reusable, and scientifically meaningful.

Data quality needs to become visible

Open data are useful only if users can evaluate whether a dataset is appropriate for their question.

This requires quality metrics. In neurophysiology, such metrics might include signal quality, channel dropout, motion artifacts, spike sorting quality, imaging stability, behavioral tracking reliability, temporal alignment accuracy, metadata completeness, missingness, annotation quality, and reproducibility of derived features.
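
As one concrete example of a spike sorting quality metric, the sketch below computes the fraction of inter-spike intervals shorter than an assumed refractory period. The 1.5 ms threshold and the spike times are illustrative assumptions. Established toolkits define related metrics more carefully, so this should be read as a simplified version of the idea, not a reference implementation.

```python
def isi_violation_fraction(spike_times_s, refractory_s=0.0015):
    """Fraction of inter-spike intervals shorter than the refractory period.

    A high value suggests that a sorted unit mixes spikes from more than
    one neuron. The default threshold is an assumption, not a standard.
    """
    times = sorted(spike_times_s)
    if len(times) < 2:
        return 0.0
    intervals = [later - earlier for earlier, later in zip(times, times[1:])]
    violations = sum(1 for isi in intervals if isi < refractory_s)
    return violations / len(intervals)


# Hypothetical spike times in seconds; two intervals fall under 1.5 ms.
unit_spikes = [0.010, 0.0105, 0.050, 0.120, 0.1208, 0.300]

print(isi_violation_fraction(unit_spikes))  # 0.4
```

Reporting a number like this next to each sorted unit, together with the threshold used, lets a reader decide whether the unit is clean enough for their analysis without re-running the sorter.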

The paper argues that the field needs community-driven ways to evaluate and report such properties. This would help researchers decide which datasets are suitable for specific analyses, and it would also provide feedback to improve future experiments.

Importantly, quality does not mean perfection. Different datasets are useful for different purposes. A dataset with limitations may still be valuable if those limitations are transparent. The problem is not imperfection; the problem is opacity.

The incentive problem

Perhaps the most important argument in the paper is that open science will not succeed through technical solutions alone.

The current academic incentive structure is poorly aligned with the labor required to make neurophysiology data reusable. Preparing a dataset for reuse takes time. Writing documentation takes time. Cleaning metadata takes time. Creating tutorials takes time. Maintaining software takes time. Answering user questions takes time. Building community standards takes time.

Yet these contributions are often undervalued relative to traditional papers.

If we want open neurophysiology to become the norm, then datasets, codebases, benchmarks, standards, and tools must receive real academic credit. Funding agencies, journals, universities, hiring committees, tenure committees, and scientific societies all have a role to play.

The paper therefore calls for:

stable support for open-source software and data infrastructure;
funding for research software engineers and data specialists;
institutional core facilities for data management and computational reproducibility;
clearer expectations from journals around data and code availability;
recognition of datasets and tools as scholarly contributions;
better mechanisms for measuring reuse, impact, and community value.

Without these changes, open science remains dependent on volunteer labor and short-term enthusiasm. That is not a sustainable foundation for a field as technically demanding as modern neurophysiology.

ODIN as community infrastructure

ODIN was created to help build the social layer around these technical problems.

The inaugural ODIN symposium at MIT brought together people working on devices, neuroinformatics, data standards, platforms, open-source tools, modeling, benchmarking, AI, and community governance. The structure of the meeting reflected a central belief: open neurophysiology cannot be solved by any one group alone. It requires interaction between experimentalists, computational scientists, data engineers, tool developers, theorists, and institutions.

The 2025 ODIN symposium at the Allen Institute continued this trajectory with a focus on integrating scales and modalities. That theme is exactly where the field is heading. Neurophysiology is no longer confined to a single modality or scale. We increasingly want to connect molecules, cell types, circuits, behavior, large-scale recordings, structural data, dynamics, and models.

That integration will require open infrastructure.

What I hope readers take from the paper

For experimental neurophysiologists, I hope the paper is useful as a practical map of the issues that now surround data collection, metadata, standards, sharing, and reuse.

For computational neuroscientists, I hope it clarifies why access to data is only the first step. The deeper issue is whether datasets are sufficiently structured and documented to support serious modeling, benchmarking, and cross-dataset inference.

For tool builders and neuroinformatics developers, I hope the paper gives a broader context for why your work matters. Tools are not just convenience layers. They shape what kinds of science become possible.

For institutions and funders, I hope the message is clear: open neurophysiology requires durable support. Infrastructure, standards, curation, software maintenance, and data expertise must be funded as core scientific activities.

And for the ODIN community, I see this paper as a snapshot of a movement still in formation. It documents where we are, what is already working, what remains fragile, and what we need to build next.

Open neurophysiology is not simply about making data public. It is about building the conditions under which neurophysiology can become cumulative, reproducible, interoperable, and computationally powerful.

That is the larger project ODIN is trying to support.