Archives AI News

Not only a helper, but also a teacher: Interactive LLM Cascade

arXiv:2509.22984v1. Announce Type: new. Abstract: Large Language Models (LLMs) vary widely in their capabilities, with larger models often having better performance but higher cost: choosing an LLM model often involves trading off performance and cost. The LLM Cascade is a…

Towards Strategic Persuasion with Language Models

arXiv:2509.22989v1. Announce Type: new. Abstract: Large language models (LLMs) have demonstrated strong persuasive capabilities comparable to those of humans, offering promising benefits while raising societal concerns about their deployment. However, systematically evaluating the persuasive capabilities of LLMs is inherently challenging,…

Toward a Theory of Generalizability in LLM Mechanistic Interpretability Research

arXiv:2509.22831v1. Announce Type: new. Abstract: Research on Large Language Models (LLMs) increasingly focuses on identifying mechanistic explanations for their behaviors, yet the field lacks clear principles for determining when (and how) findings from one model instance generalize to another. This…

Hilbert: Recursively Building Formal Proofs with Informal Reasoning

arXiv:2509.22819v1. Announce Type: new. Abstract: Large Language Models (LLMs) demonstrate impressive mathematical reasoning abilities, but their solutions frequently contain errors that cannot be automatically verified. Formal theorem proving systems such as Lean 4 offer automated verification with complete accuracy, motivating…

Deceive, Detect, and Disclose: Large Language Models Play Mini-Mafia

arXiv:2509.23023v1. Announce Type: new. Abstract: Mafia is a social deduction game where informed mafia compete against uninformed townsfolk. Its asymmetry of information and reliance on theory-of-mind reasoning mirror real-world multi-agent scenarios, making it a useful testbed for evaluating the social…

Pretraining Large Language Models with NVFP4

arXiv:2509.25149v1. Announce Type: cross. Abstract: Large Language Models (LLMs) today are powerful problem solvers across many domains, and they continue to get stronger as they scale in model size, training set size, and training set quality, as shown by extensive…

Kimi-Dev: Agentless Training as Skill Prior for SWE-Agents

arXiv:2509.23045v1. Announce Type: new. Abstract: Large Language Models (LLMs) are increasingly applied to software engineering (SWE), with SWE-bench as a key benchmark. Solutions are split into SWE-Agent frameworks with multi-turn interactions and workflow-based Agentless methods with single-turn verifiable steps. We…

Risk Profiling and Modulation for LLMs

arXiv:2509.23058v1. Announce Type: new. Abstract: Large language models (LLMs) are increasingly used for decision-making tasks under uncertainty; however, their risk profiles and how they are influenced by prompting and alignment methods remain underexplored. Existing studies have primarily examined personality prompting…