FATE: A Formal Benchmark Series for Frontier Algebra of Multiple Difficulty Levels
arXiv:2511.02872v2
Abstract: Recent advances in large language models (LLMs) have demonstrated impressive capabilities in formal theorem proving, particularly on contest-based mathematical benchmarks like the IMO. However, these contests do not reflect the depth, breadth, and abstraction of…
