[Summary] AutoResearch: Hard Constraints Enable Reliable Agentic ML
TL;DR In the usual paradigm, an AI coding agent produces code, and a human reviewer (or a test suite) verifies it. AutoResearch is a different paradigm: an agent iterates on an ML model to improve a continuous metric. Its success comes from two hard constraints: the agent runs under a fixed time budget, and the evaluation is invariant to the code changes the agent is allowed to make. The implications go beyond ML models: the same approach could apply to latency, capacity, or any system with a measurable objective....
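The two constraints above (a hard time budget and an evaluation the agent cannot alter) can be sketched as a minimal loop. This is an illustrative toy, not AutoResearch's actual implementation: `evaluate`, `propose_change`, and `run_agent` are hypothetical names, and the "agent" here is just a random perturbation of parameters.

```python
import random
import time


def evaluate(params, eval_set):
    # Fixed evaluation: the metric and eval_set never change across
    # iterations, so scores from different candidates are comparable.
    return -sum((p - x) ** 2 for p, x in zip(params, eval_set))


def propose_change(params):
    # Stand-in for the agent's edit: perturb one parameter at random.
    candidate = list(params)
    i = random.randrange(len(candidate))
    candidate[i] += random.uniform(-0.5, 0.5)
    return candidate


def run_agent(eval_set, time_budget_s=1.0):
    params = [0.0] * len(eval_set)
    best_score = evaluate(params, eval_set)
    deadline = time.monotonic() + time_budget_s  # hard time constraint
    while time.monotonic() < deadline:
        candidate = propose_change(params)
        score = evaluate(candidate, eval_set)
        if score > best_score:  # keep only changes that improve the metric
            params, best_score = candidate, score
    return params, best_score
```

The key property is that `evaluate` is outside the agent's reach: the agent can only change `params` (its "code"), never the metric or the evaluation data, so improvement on the metric is meaningful rather than gameable.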