scripts
Local-only tooling. Nothing here runs on GitHub Pages.
build_garden_graph.py
Generates assets/data/garden-graph.json, the knowledge graph powering the
interactive Idea Garden at /garden/.
When to run
Re-run whenever any of the following change:
- a paper in _papers/ (added, deleted, or its title, tags, description, or abstract body edited)
- _data/concepts.yml
The output JSON is checked in. Forgetting to re-run is harmless — the Garden will simply not reflect the new paper until the JSON is regenerated and committed.
How to run
From the repository root:
python scripts/build_garden_graph.py
Output, on success:
wrote assets/data/garden-graph.json: 24 papers, 71 concepts, 312 edges (TF-IDF mode)
The script is deterministic — running it twice with no source changes produces a byte-identical JSON.
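The determinism claim can be checked locally by running the script twice and comparing a hash of the output. The following is a minimal sketch, not part of the repository: the helper names sha256_of and is_deterministic are hypothetical, and only the script path and output path are taken from this README.

```python
import hashlib
import subprocess
import sys
from pathlib import Path

# Paths as documented in this README; run from the repository root.
BUILD_CMD = [sys.executable, "scripts/build_garden_graph.py"]
OUTPUT = "assets/data/garden-graph.json"


def sha256_of(path):
    """Hex SHA-256 digest of the file's raw bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def is_deterministic(cmd, output_path):
    """Run cmd twice; report whether output_path is byte-identical after both runs."""
    subprocess.run(cmd, check=True)
    first = sha256_of(output_path)
    subprocess.run(cmd, check=True)
    return first == sha256_of(output_path)
```

From the repository root, is_deterministic(BUILD_CMD, OUTPUT) should return True as long as no source file changes between the two runs.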
Dependencies
- Python 3.9+
- pyyaml (required)
- scikit-learn (optional; enables TF-IDF mining of paper abstracts). Without it, the script falls back to plain substring matching, which still works but produces fewer mined edges.
pip install pyyaml scikit-learn
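An optional dependency like this is usually handled by probing for the package and switching strategy. The sketch below illustrates that pattern under assumptions: mining_mode and substring_matches are hypothetical names, the "substring" mode label is invented for illustration, and the real script may detect scikit-learn differently.

```python
import importlib.util


def mining_mode():
    """Return "TF-IDF" when scikit-learn is importable, else "substring" (assumed label)."""
    has_sklearn = importlib.util.find_spec("sklearn") is not None
    return "TF-IDF" if has_sklearn else "substring"


def substring_matches(text, concept_labels):
    """Fallback matcher: labels that occur verbatim (case-insensitive) in text."""
    lowered = text.lower()
    return sorted(label for label in concept_labels if label.lower() in lowered)


print(mining_mode())
print(substring_matches("Graph neural networks for idea mining",
                        ["graph neural networks", "TF-IDF"]))
```

The fallback only finds concepts whose label appears literally in the text, which is consistent with it producing fewer mined edges than TF-IDF.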
What it does
- Parses the front matter and body of every _papers/*.md file. Files with malformed YAML are skipped with a warning, not a hard fail.
- Reads _data/concepts.yml (the curated layer).
- Builds a bipartite graph:
  - paper nodes (one per paper) and concept nodes (curated concepts plus every distinct tags: value, canonicalized via a small alias map).
  - edges: explicit paper↔concept from tags: and curated papers: lists (weight 1.0); mined paper↔concept from TF-IDF over title + description + abstract (weight 0.6); concept↔concept Jaccard co-occurrence when two concepts share ≥ 2 papers (weight = Jaccard).
- Writes assets/data/garden-graph.json with sorted, stable output.
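The concept↔concept step can be sketched as follows. This is an illustration under assumptions, not the script's actual code: the function name cooccurrence_edges, the source/target/weight field names, the sample data, and the rounding to four decimals are all hypothetical. Only the "share ≥ 2 papers" threshold and the Jaccard weight come from the description above.

```python
import json
from itertools import combinations


def cooccurrence_edges(concept_papers, min_shared=2):
    """Jaccard-weighted edges between concepts that share at least min_shared papers.

    concept_papers maps a concept id to the set of paper ids linked to it.
    """
    edges = []
    # Iterating over sorted keys keeps the edge order stable across runs.
    for a, b in combinations(sorted(concept_papers), 2):
        shared = concept_papers[a] & concept_papers[b]
        if len(shared) >= min_shared:
            union = concept_papers[a] | concept_papers[b]
            weight = round(len(shared) / len(union), 4)  # rounding is an assumption
            edges.append({"source": a, "target": b, "weight": weight})
    return edges


# Hypothetical sample data.
concept_papers = {
    "graphs": {"p1", "p2", "p3"},
    "tf-idf": {"p2", "p3"},
    "yaml": {"p3"},
}
edges = cooccurrence_edges(concept_papers)
print(json.dumps(edges, sort_keys=True, indent=2))
```

Here only "graphs" and "tf-idf" share two papers, so a single edge is produced with weight 2/3, rounded to 0.6667. Sorting the concept ids before pairing and serializing with sort_keys=True is one common way to get repeatable JSON output.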