2025-02-10

What R1 Made Visible

People keep asking if China can innovate. Watching R1 ship made that question feel too coarse. I was also reading Keyu Jin around this time. Her version of the China story made more sense to me than the usual culture-war one: the country spent decades doing the highest-return thing in front of it. R1 made that idea feel less abstract.

I don't think the answer lives at the level of a country. Original work shows up inside particular environments: places where reality pushes back clearly, mistakes become visible quickly, and being wrong doesn't end the game. R1 had that kind of environment around it: published benchmarks, weights you could check, and a lab small enough to change direction when a training run taught them something.

For most of the last few decades China optimized for a different goal. It was trying to move fast, build at scale, and close the gap with global market leaders. Adopting and adapting was the rational path. The receipts are recognizable: 40,000+ km of high-speed rail built in two decades, mobile payments deployed without an incumbent credit card system to negotiate around, EV and battery supply chains that arrived before the Western auto industry knew the timeline. None of this required originality. It required the political-economic environment to sustain execution at scale, which it did.

That mode had real costs. A system that gets very good at catching up doesn't automatically become good at originating. The incentives, habits, and tolerances are different.

R1 was the first thing that made me notice the environment was shifting. The technical work was specific, and it built on the lab's own earlier papers. GRPO, introduced in DeepSeekMath, replaced PPO's separate critic model with a group-relative reward signal: each sampled answer is scored against the other answers to the same prompt, so there is no value network to train. MLA, from DeepSeek-V2, compressed the KV cache through a small latent layer, easing an inference-memory bottleneck Western labs had been working around. R1-Zero showed reasoning could emerge from reinforcement learning on rule-based rewards alone, without supervised demonstrations to imitate. The recipe was published, the weights were open, and within weeks other labs were reproducing it and reworking their training pipelines around it.
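The group-relative idea is small enough to sketch. What follows is a minimal illustration, not DeepSeek's code: the function name, the epsilon term, the choice of population standard deviation, and the reward values are all my assumptions. It shows only the step that stands in for the critic: sample several answers to one prompt, then score each against the group's mean and spread.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the other samples for the same prompt.

    Hypothetical helper for illustration. The group mean plays the role of
    the baseline that PPO would otherwise get from a learned critic.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ on this
    # eps avoids division by zero when every sample gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled answers to one prompt: 1.0 if correct, 0.0 if not (made-up values)
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Correct answers come out with positive advantages and incorrect ones with negative advantages, and a group where every answer earns the same reward contributes no signal at all. The full method also involves a clipped policy-ratio objective and a KL penalty, which this sketch leaves out.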

The same lens applies to places that are not shifting. Politically sensitive sectors do not get clean feedback. Markets with sudden regulatory changes make it hard to plan long enough to learn. Founders who cannot afford to lose money will rationally copy what already works. In those places, the execution machine does not disappear. It just optimizes for the measurable thing in front of it.

So the question I find myself asking now is where China is building environments that keep reality close to the work. R1 is one answer. The harder question is which other environments can do the same for long enough that this kind of originality repeats.