Archives AI News

RECODE-H: A Benchmark for Research Code Development with Interactive Human Feedback

arXiv:2510.06186v1 Announce Type: cross Abstract: Large language models (LLMs) show promise in supporting scientific research implementation, yet their ability to generate correct and executable code remains limited. Existing works largely adopt one-shot settings, ignoring the iterative and feedback-driven nature…

October 8, 2025

Biomedical reasoning in action: Multi-agent System for Auditable Biomedical Evidence Synthesis

arXiv:2510.05335v1 Announce Type: new Abstract: We present M-Reason, a demonstration system for transparent, agent-based reasoning and evidence integration in the biomedical domain, with a focus on cancer research. M-Reason leverages recent advances in large language models (LLMs) and modular agent…

October 8, 2025

VisioMath: Benchmarking Figure-based Mathematical Reasoning in LMMs

arXiv:2506.06727v3 Announce Type: replace Abstract: Large Multimodal Models have achieved remarkable progress in integrating vision and language, enabling strong performance across perception, reasoning, and domain-specific tasks. However, their capacity to reason over multiple, visually similar inputs remains insufficiently explored. Such…

October 8, 2025

Integrating Bayesian methods with neural network–based model predictive control: a review

arXiv:2510.05338v1 Announce Type: new Abstract: In this review, we assess the use of Bayesian methods in model predictive control (MPC), focusing on neural-network-based modeling, control design, and uncertainty quantification. We systematically analyze individual studies and how they are implemented in…

October 8, 2025

Open Agent Specification (Agent Spec) Technical Report

arXiv:2510.04173v2 Announce Type: replace Abstract: Open Agent Specification (Agent Spec) is a declarative language that allows AI agents and their workflows to be defined in a way that is compatible across different AI frameworks, promoting portability and interoperability within AI…

October 8, 2025

MHA-RAG: Improving Efficiency, Accuracy, and Consistency by Encoding Exemplars as Soft Prompts

arXiv:2510.05363v1 Announce Type: new Abstract: Adapting Foundation Models to new domains with limited training data is challenging and computationally expensive. While prior work has demonstrated the effectiveness of using domain-specific exemplars as in-context demonstrations, we investigate whether representing exemplars purely…

October 8, 2025

PACER: Physics Informed and Uncertainty Aware Climate Emulator

arXiv:2410.21657v4 Announce Type: replace-cross Abstract: Physics-based numerical climate models serve as critical tools for evaluating the effects of climate change and projecting future climate scenarios. However, the reliance on numerical simulations of physical equations renders them computationally intensive and…

October 8, 2025

What Do You Mean? Exploring How Humans and AI Interact with Symbols and Meanings in Their Interactions

arXiv:2510.05378v1 Announce Type: new Abstract: Meaningful human-AI collaboration requires more than processing language; it demands a better understanding of symbols and their constructed meanings. While humans naturally interpret symbols through social interaction, AI systems treat them as patterns with compressed…

October 8, 2025

Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions

arXiv:2503.23278v3 Announce Type: replace-cross Abstract: The Model Context Protocol (MCP) is an emerging open standard that defines a unified, bi-directional communication and dynamic discovery protocol between AI models and external tools or resources, aiming to enhance interoperability and reduce fragmentation…

October 8, 2025

Teacher-Student Guided Inverse Modeling for Steel Final Hardness Estimation

arXiv:2510.05402v1 Announce Type: new Abstract: Predicting the final hardness of steel after heat treatment is a challenging regression task due to the many-to-one nature of the process: different combinations of input parameters (such as temperature, duration, and chemical composition)…

October 8, 2025