Archives AI News

Representation Collapse in Machine Translation Through the Lens of Angular Dispersion

arXiv:2602.17287v1 Announce Type: cross Abstract: Modern neural translation models based on the Transformer architecture are known for their high performance, particularly when trained on high-resource datasets. A standard next-token prediction training strategy, while widely adopted in practice, may lead to…

February 20, 2026

PETS: A Principled Framework Towards Optimal Trajectory Allocation for Efficient Test-Time Self-Consistency

arXiv:2602.16745v1 Announce Type: new Abstract: Test-time scaling can improve model performance by aggregating stochastic reasoning trajectories. However, achieving sample-efficient test-time self-consistency under a limited budget remains an open challenge. We introduce PETS (Principled and Efficient Test-TimeSelf-Consistency), which initiates a principled…

February 20, 2026

Low-Dimensional and Transversely Curved Optimization Dynamics in Grokking

arXiv:2602.16746v1 Announce Type: new Abstract: Grokking — the delayed transition from memorization to generalization in small algorithmic tasks — remains poorly understood. We present a geometric analysis of optimization dynamics in transformers trained on modular arithmetic. PCA of attention weight…

February 20, 2026

DeepVision-103K: A Visually Diverse, Broad-Coverage, and Verifiable Mathematical Dataset for Multimodal Reasoning

arXiv:2602.16742v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has been shown effective in enhancing the visual reflection and reasoning capabilities of Large Multimodal Models (LMMs). However, existing datasets are predominantly derived from either small-scale manual construction or…

February 20, 2026

Chip-processing method could assist cryptography schemes to keep data secure

By enabling two chips to authenticate each other using a shared fingerprint, this technique can improve privacy and energy efficiency.

February 20, 2026

A Unifying Framework for Robust and Efficient Inference with Unstructured Data

arXiv:2505.00282v3 Announce Type: replace-cross Abstract: To analyze unstructured data (text, images, audio, video), economists typically first extract low-dimensional structured features with a neural network. Neural networks do not make generically unbiased predictions, and biases will propagate to estimators that use…

February 20, 2026

Block-Recurrent Dynamics in Vision Transformers

arXiv:2512.19941v5 Announce Type: replace-cross Abstract: As Vision Transformers (ViTs) become standard vision backbones, a mechanistic account of their computational phenomenology is essential. Despite architectural cues that hint at dynamical structure, there is no settled framework that interprets Transformer depth as…

February 20, 2026

pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation

arXiv:2510.14974v3 Announce Type: replace Abstract: Few-step diffusion or flow-based generative models typically distill a velocity-predicting teacher into a student that predicts a shortcut towards denoised data. This format mismatch has led to complex distillation procedures that often suffer from a…

February 20, 2026

Biases in the Blind Spot: Detecting What LLMs Fail to Mention

arXiv:2602.10117v3 Announce Type: replace Abstract: Large Language Models (LLMs) often provide chain-of-thought (CoT) reasoning traces that appear plausible, but may hide internal biases. We call these *unverbalized biases*. Monitoring models via their stated reasoning is therefore unreliable, and existing bias…

February 20, 2026

AutoNumerics: An Autonomous, PDE-Agnostic Multi-Agent Pipeline for Scientific Computing

arXiv:2602.17607v1 Announce Type: cross Abstract: PDEs are central to scientific and engineering modeling, yet designing accurate numerical solvers typically requires substantial mathematical expertise and manual tuning. Recent neural network-based approaches improve flexibility but often demand high computational cost and suffer…

February 20, 2026