Archives AI News

On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning

arXiv:2505.17508v4. Announce Type: replace. Abstract: Policy gradient algorithms have been successfully applied to enhance the reasoning capabilities of large language models (LLMs). KL regularization is ubiquitous, yet the design surface, choice of KL direction (forward vs. reverse), normalization (normalized vs.…

February 20, 2026

LiveClin: A Live Clinical Benchmark without Leakage

arXiv:2602.16747v1. Announce Type: new. Abstract: The reliability of medical LLM evaluation is critically undermined by data contamination and knowledge obsolescence, leading to inflated scores on static benchmarks. To address these challenges, we introduce LiveClin, a live benchmark designed for approximating…

February 20, 2026

Representation Collapse in Machine Translation Through the Lens of Angular Dispersion

arXiv:2602.17287v1. Announce Type: cross. Abstract: Modern neural translation models based on the Transformer architecture are known for their high performance, particularly when trained on high-resource datasets. A standard next-token prediction training strategy, while widely adopted in practice, may lead to…

February 20, 2026

PETS: A Principled Framework Towards Optimal Trajectory Allocation for Efficient Test-Time Self-Consistency

arXiv:2602.16745v1. Announce Type: new. Abstract: Test-time scaling can improve model performance by aggregating stochastic reasoning trajectories. However, achieving sample-efficient test-time self-consistency under a limited budget remains an open challenge. We introduce PETS (Principled and Efficient Test-Time Self-Consistency), which initiates a principled…

February 20, 2026

Low-Dimensional and Transversely Curved Optimization Dynamics in Grokking

arXiv:2602.16746v1. Announce Type: new. Abstract: Grokking, the delayed transition from memorization to generalization in small algorithmic tasks, remains poorly understood. We present a geometric analysis of optimization dynamics in transformers trained on modular arithmetic. PCA of attention weight…

February 20, 2026

Chip-processing method could assist cryptography schemes to keep data secure

By enabling two chips to authenticate each other using a shared fingerprint, this technique can improve privacy and energy efficiency.

February 20, 2026

A Unifying Framework for Robust and Efficient Inference with Unstructured Data

arXiv:2505.00282v3. Announce Type: replace-cross. Abstract: To analyze unstructured data (text, images, audio, video), economists typically first extract low-dimensional structured features with a neural network. Neural networks do not make generically unbiased predictions, and biases will propagate to estimators that use…

February 20, 2026

Block-Recurrent Dynamics in Vision Transformers

arXiv:2512.19941v5. Announce Type: replace-cross. Abstract: As Vision Transformers (ViTs) become standard vision backbones, a mechanistic account of their computational phenomenology is essential. Despite architectural cues that hint at dynamical structure, there is no settled framework that interprets Transformer depth as…

February 20, 2026

pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation

arXiv:2510.14974v3. Announce Type: replace. Abstract: Few-step diffusion or flow-based generative models typically distill a velocity-predicting teacher into a student that predicts a shortcut towards denoised data. This format mismatch has led to complex distillation procedures that often suffer from a…

February 20, 2026

Biases in the Blind Spot: Detecting What LLMs Fail to Mention

arXiv:2602.10117v3. Announce Type: replace. Abstract: Large Language Models (LLMs) often provide chain-of-thought (CoT) reasoning traces that appear plausible, but may hide internal biases. We call these *unverbalized biases*. Monitoring models via their stated reasoning is therefore unreliable, and existing bias…

February 20, 2026