BuilderBench — A benchmark for generalist agents
arXiv:2510.06288v1 Announce Type: new Abstract: Today’s AI models learn primarily through mimicry and sharpening, so it is not surprising that they struggle to solve problems beyond the limits set by existing data. To solve novel problems, agents should acquire skills…
