GAUSS: Benchmarking Structured Mathematical Skills for Large Language Models

2025-09-24 19:00 GMT · aimagpro.com

arXiv:2509.18122v1 Announce Type: cross
Abstract: We introduce GAUSS (General Assessment of Underlying Structured Skills in Mathematics), a benchmark that evaluates LLMs’ mathematical abilities across twelve core skill dimensions, grouped into three domains: knowledge and understanding, problem solving and communication, and meta-skills and creativity. By categorizing problems according to cognitive skills and designing tasks that isolate specific abilities, GAUSS constructs comprehensive, fine-grained, and interpretable profiles of models’ mathematical abilities. These profiles faithfully represent their underlying mathematical intelligence. To exemplify how to use the GAUSS benchmark, we have derived the skill profile of GPT-5-thinking, revealing its strengths and weaknesses as well as its differences relative to o4-mini-high, thereby underscoring the value of multidimensional, skill-based evaluation.
