You Can Edit the DNA, But You Can't Be an Embryo Again

I was watching Dwarkesh Patel’s latest video on the next AI training paradigm, and something clicked — not about AI, but about biology.

Dwarkesh’s core insight is that current AI training has a structural ceiling. The methods that work — reinforcement learning from verifiable rewards — only work in environments that are grindable. Deterministic, replayable, clonable, parallelizable. You can spin up 10,000 agents and throw them at the same coding problem simultaneously. You can’t do that with Amazon’s checkout flow. That’s why AI coding assistants are way ahead of AI that can navigate a computer — it’s not a talent problem, it’s an environment problem.

He proposes a way around this called On-Policy Self-Distillation (OPSD). The idea: after a long agent session, the model has learned a ton just from being in context. OPSD trains the base model to match the per-token predictions of that context-loaded “veteran” model. Every deployment becomes a training opportunity. You distill the experience back into the weights.
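To make “match the per-token predictions” concrete, here is a minimal sketch of how I picture that loss. It is my own illustration, not the formulation from the video. The function names are hypothetical, the logits are passed in as plain lists rather than coming from a real model, and the KL direction (teacher relative to student) is an assumption. In a truly on-policy setup, the positions being scored would be tokens the student sampled itself.

```python
import math


def softmax(logits):
    """Turn raw logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


def per_token_distillation_loss(teacher_logits, student_logits):
    """Mean per-token KL between teacher and student next-token distributions.

    teacher_logits: one logit vector per position, from the context-loaded
        "veteran" model (assumed to be given).
    student_logits: logit vectors at the same positions from the base model
        without that context (also assumed to be given).
    """
    if not teacher_logits or len(teacher_logits) != len(student_logits):
        raise ValueError("need the same non-zero number of positions for both models")
    total = 0.0
    for t, s in zip(teacher_logits, student_logits):
        total += kl_divergence(softmax(t), softmax(s))
    return total / len(teacher_logits)


# Toy example: 2 positions, vocabulary of 3 tokens.
teacher = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
student = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
loss = per_token_distillation_loss(teacher, student)
```

In the toy example the student already agrees with the teacher at the second position, so only the first position contributes to the loss. Training would push the student’s weights to shrink that gap without the context being present.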

And here’s where my brain went: that’s CRISPR.

CRISPR lets you edit genes. You can change the DNA. But phenotype — the actual expressed organism — isn’t just genes. It’s genes × environment × time. You can insert the right DNA into a cell, but you can’t give that cell a developmental history it didn’t live through. There are critical periods where the environment shapes the organism in ways that can’t be retroactively applied. You can get new DNA, but you will never be an embryo again.

That’s exactly the OPSD problem. Dwarkesh wants to distill the veteran model’s in-context learning back into base weights — essentially editing the genome to skip development. But the in-context learning is the developmental environment. It’s not just information that could live in weights. It’s the accumulated effect of navigating a specific trajectory through problem space. The sequence matters, not just the endpoint.

The embryo point is the crux. It’s the same reason that training at short context and serving at long context doesn’t generalize: the model never “developed” in long-horizon space. It has the DNA for it, but not the developmental environment.

There’s an epigenetics angle too. Environment modifies gene expression without changing DNA. That’s in-context learning: behavior shifts based on context without weight changes, but those shifts don’t persist. Standard CRISPR editing rewrites the DNA sequence, not the epigenetic marks layered on top of it, just as fine-tuning rewrites weights without capturing transient in-context adaptations.

And the analogy even predicts where things might work. Some gene therapies work great for monogenic diseases — single gene, clear mechanism, environment doesn’t complicate it much. Distillation probably works the same way: narrow, well-defined capabilities where the “environment” is mostly deterministic. But complex, multi-factor capabilities — like actual autonomous agent performance over long horizons — those are the polygenic diseases of AI. Editing one pathway doesn’t fix the system-level problem.

I keep coming back to this because it reframes the AI scaling debate. The question isn’t “can we make the models bigger?” It’s “can we give them a developmental environment that actually matters?” And biology’s answer to that question is: not by editing the DNA alone. The environment has to be lived through.

You can change the code. You can’t skip the growing up.
