Archives: AI News

Not only a helper, but also a teacher: Interactive LLM Cascade

arXiv:2509.22984v1 Announce Type: new Abstract: Large Language Models (LLMs) vary widely in their capabilities, with larger models often having better performance but higher cost: choosing an LLM model often involves trading off performance and cost. The LLM Cascade is a…

September 30, 2025

Towards Strategic Persuasion with Language Models

arXiv:2509.22989v1 Announce Type: new Abstract: Large language models (LLMs) have demonstrated strong persuasive capabilities comparable to those of humans, offering promising benefits while raising societal concerns about their deployment. However, systematically evaluating the persuasive capabilities of LLMs is inherently challenging,…

September 30, 2025

JE-IRT: A Geometric Lens on LLM Abilities through Joint Embedding Item Response Theory

arXiv:2509.22888v1 Announce Type: new Abstract: Standard LLM evaluation practices compress diverse abilities into single scores, obscuring their inherently multidimensional nature. We present JE-IRT, a geometric item-response framework that embeds both LLMs and questions in a shared space. For question embeddings,…

September 30, 2025

Toward a Theory of Generalizability in LLM Mechanistic Interpretability Research

arXiv:2509.22831v1 Announce Type: new Abstract: Research on Large Language Models (LLMs) increasingly focuses on identifying mechanistic explanations for their behaviors, yet the field lacks clear principles for determining when (and how) findings from one model instance generalize to another. This…

September 30, 2025

Hilbert: Recursively Building Formal Proofs with Informal Reasoning

arXiv:2509.22819v1 Announce Type: new Abstract: Large Language Models (LLMs) demonstrate impressive mathematical reasoning abilities, but their solutions frequently contain errors that cannot be automatically verified. Formal theorem proving systems such as Lean 4 offer automated verification with complete accuracy, motivating…

September 30, 2025

Vehicle Classification under Extreme Imbalance: A Comparative Study of Ensemble Learning and CNNs

arXiv:2509.24880v1 Announce Type: cross Abstract: Accurate vehicle type recognition underpins intelligent transportation and logistics, but severe class imbalance in public datasets suppresses performance on rare categories. We curate a 16-class corpus (~47k images) by merging Kaggle, ImageNet, and web-crawled data,…

September 30, 2025

Deceive, Detect, and Disclose: Large Language Models Play Mini-Mafia

arXiv:2509.23023v1 Announce Type: new Abstract: Mafia is a social deduction game where informed mafia compete against uninformed townsfolk. Its asymmetry of information and reliance on theory-of-mind reasoning mirror real-world multi-agent scenarios, making it a useful testbed for evaluating the social…

September 30, 2025

Pretraining Large Language Models with NVFP4

arXiv:2509.25149v1 Announce Type: cross Abstract: Large Language Models (LLMs) today are powerful problem solvers across many domains, and they continue to get stronger as they scale in model size, training set size, and training set quality, as shown by extensive…

September 30, 2025

Kimi-Dev: Agentless Training as Skill Prior for SWE-Agents

arXiv:2509.23045v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly applied to software engineering (SWE), with SWE-bench as a key benchmark. Solutions are split into SWE-Agent frameworks with multi-turn interactions and workflow-based Agentless methods with single-turn verifiable steps. We…

September 30, 2025

Risk Profiling and Modulation for LLMs

arXiv:2509.23058v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used for decision-making tasks under uncertainty; however, their risk profiles and how they are influenced by prompting and alignment methods remain underexplored. Existing studies have primarily examined personality prompting…

September 30, 2025