MathTutorBench: A Benchmark for Measuring Open-ended Pedagogical Capabilities of LLM Tutors
arXiv:2502.18940v2. Abstract: Evaluating the pedagogical capabilities of AI-based tutoring models is critical for making guided progress in the field. Yet, we lack a reliable, easy-to-use, and simple-to-run evaluation that reflects the pedagogical abilities of models. To fill…
