Archives AI News

AutoNumerics: An Autonomous, PDE-Agnostic Multi-Agent Pipeline for Scientific Computing

arXiv:2602.17607v1 Announce Type: cross Abstract: PDEs are central to scientific and engineering modeling, yet designing accurate numerical solvers typically requires substantial mathematical expertise and manual tuning. Recent neural network-based approaches improve flexibility but often demand high computational cost and suffer…

On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning

arXiv:2505.17508v4 Announce Type: replace Abstract: Policy gradient algorithms have been successfully applied to enhance the reasoning capabilities of large language models (LLMs). KL regularization is ubiquitous, yet the design surface, choice of KL direction (forward vs. reverse), normalization (normalized vs.…

LiveClin: A Live Clinical Benchmark without Leakage

arXiv:2602.16747v1 Announce Type: new Abstract: The reliability of medical LLM evaluation is critically undermined by data contamination and knowledge obsolescence, leading to inflated scores on static benchmarks. To address these challenges, we introduce LiveClin, a live benchmark designed for approximating…

Low-Dimensional and Transversely Curved Optimization Dynamics in Grokking

arXiv:2602.16746v1 Announce Type: new Abstract: Grokking — the delayed transition from memorization to generalization in small algorithmic tasks — remains poorly understood. We present a geometric analysis of optimization dynamics in transformers trained on modular arithmetic. PCA of attention weight…

Quantifying LLM Attention-Head Stability: Implications for Circuit Universality

arXiv:2602.16740v1 Announce Type: new Abstract: In mechanistic interpretability, recent work scrutinizes transformer “circuits” – sparse, mono or multi layer sub computations, that may reflect human understandable functions. Yet, these network circuits are rarely acid-tested for their stability across different instances…

LLM4Cov: Execution-Aware Agentic Learning for High-coverage Testbench Generation

arXiv:2602.16953v1 Announce Type: cross Abstract: Execution-aware LLM agents offer a promising paradigm for learning from tool feedback, but such feedback is often expensive and slow to obtain, making online reinforcement learning (RL) impractical. High-coverage hardware verification exemplifies this challenge due…