Archives AI News

Evaluating LLM Safety Under Repeated Inference via Accelerated Prompt Stress Testing

arXiv:2602.11786v2 Announce Type: replace Abstract: Traditional benchmarks for large language models (LLMs), such as HELM and AIR-BENCH, primarily assess safety through breadth-oriented evaluation across diverse tasks and risk categories. However, real-world deployment often exposes a different class of risk: operational…

April 29, 2026

Nemotron 3 Nano Omni: Efficient and Open Multimodal Intelligence

arXiv:2604.24954v1 Announce Type: new Abstract: We introduce Nemotron 3 Nano Omni, the latest model in the Nemotron multimodal series and the first to natively support audio inputs alongside text, images, and video. Nemotron 3 Nano Omni delivers consistent accuracy improvements…

April 29, 2026

Sharp Capacity Scaling of Spectral Optimizers in Learning Associative Memory

arXiv:2603.26554v2 Announce Type: replace Abstract: Spectral optimizers such as Muon have recently shown strong empirical performance in large-scale language model training, but the source and extent of their advantage remain poorly understood. We study this question through the linear associative…

April 29, 2026

Compute Aligned Training: Optimizing for Test Time Inference

arXiv:2604.24957v1 Announce Type: new Abstract: Scaling test-time compute has emerged as a powerful mechanism for enhancing Large Language Model (LLM) performance. However, standard post-training paradigms, Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), optimize the likelihood of individual samples under a…

April 29, 2026

AutoPPA: Automated Circuit PPA Optimization via Contrastive Code-based Rule Library Learning

arXiv:2604.18445v2 Announce Type: replace Abstract: Performance, power, and area (PPA) optimization is a fundamental task in RTL design, requiring a precise understanding of circuit functionality and the relationship between circuit structures and PPA metrics. Recent studies attempt to automate this…

April 29, 2026

CoreFlow: Low-Rank Matrix Generative Models

arXiv:2604.24959v1 Announce Type: new Abstract: Learning matrix-valued distributions from high-dimensional and possibly incomplete training data is challenging: ambient-space generative modeling is computationally expensive and statistically fragile when the matrix dimension is large but the sample size is limited. We propose…

April 29, 2026

OptProver: Bridging Olympiad and Optimization through Continual Training in Formal Theorem Proving

arXiv:2604.23712v2 Announce Type: replace Abstract: Recent advances in formal theorem proving have focused on Olympiad-level mathematics, leaving undergraduate domains largely unexplored. Optimization, fundamental to machine learning, operations research, and scientific computing, remains underserved by existing provers. Its reliance on domain-specific…

April 29, 2026

Odysseys: Benchmarking Web Agents on Realistic Long Horizon Tasks

arXiv:2604.24964v1 Announce Type: new Abstract: Existing web agent benchmarks have largely converged on short, single-site tasks that frontier models are approaching saturation on. However, real-world web use consists of long-horizon, multi-site workflows. Common web navigation tasks, such as comparing…

April 29, 2026

A Deep Reinforcement Learning Approach to Automated Stock Trading, using xLSTM Networks

arXiv:2503.09655v2 Announce Type: replace-cross Abstract: Traditional Long Short-Term Memory (LSTM) networks are effective for handling sequential data but have limitations such as gradient vanishing and difficulty in capturing long-term dependencies, which can impact their performance in dynamic and risky environments…

April 29, 2026

PolyKV: A Shared Asymmetrically-Compressed KV Cache Pool for Multi-Agent LLM Inference

arXiv:2604.24971v1 Announce Type: new Abstract: We present PolyKV, a system in which multiple concurrent inference agents share a single, asymmetrically compressed KV cache pool. Rather than allocating a separate KV cache per agent — the standard paradigm — PolyKV writes…

April 29, 2026