Archives AI News

RECODE-H: A Benchmark for Research Code Development with Interactive Human Feedback

arXiv:2510.06186v1 Announce Type: cross Abstract: Large language models (LLMs) show promise in supporting scientific research implementation, yet their ability to generate correct and executable code remains limited. Existing works largely adopt one-shot settings, ignoring the iterative and feedback-driven nature…

VisioMath: Benchmarking Figure-based Mathematical Reasoning in LMMs

arXiv:2506.06727v3 Announce Type: replace Abstract: Large Multimodal Models have achieved remarkable progress in integrating vision and language, enabling strong performance across perception, reasoning, and domain-specific tasks. However, their capacity to reason over multiple, visually similar inputs remains insufficiently explored. Such…

Open Agent Specification (Agent Spec) Technical Report

arXiv:2510.04173v2 Announce Type: replace Abstract: Open Agent Specification (Agent Spec) is a declarative language that allows AI agents and their workflows to be defined in a way that is compatible across different AI frameworks, promoting portability and interoperability within AI…

PACER: Physics Informed and Uncertainty Aware Climate Emulator

arXiv:2410.21657v4 Announce Type: replace-cross Abstract: Physics-based numerical climate models serve as critical tools for evaluating the effects of climate change and projecting future climate scenarios. However, the reliance on numerical simulations of physical equations renders them computationally intensive and…

Teacher-Student Guided Inverse Modeling for Steel Final Hardness Estimation

arXiv:2510.05402v1 Announce Type: new Abstract: Predicting the final hardness of steel after heat treatment is a challenging regression task due to the many-to-one nature of the process — different combinations of input parameters (such as temperature, duration, and chemical composition)…