Archives AI News

HEAL: Hindsight Entropy-Assisted Learning for Reasoning Distillation

arXiv:2603.10359v1 Announce Type: cross Abstract: Distilling reasoning capabilities from Large Reasoning Models (LRMs) into smaller models is typically constrained by the limitation of rejection sampling. Standard methods treat the teacher as a static filter, discarding complex “corner-case” problems where the…

EvoSchema: Towards Text-to-SQL Robustness Against Schema Evolution

arXiv:2603.10697v1 Announce Type: cross Abstract: Neural text-to-SQL models, which translate natural language questions (NLQs) into SQL queries given a database schema, have achieved remarkable performance. However, database schemas frequently evolve to meet new requirements. Such schema evolution often leads to…

Marginals Before Conditionals

arXiv:2603.10074v1 Announce Type: new Abstract: We construct a minimal task that isolates conditional learning in neural networks: a surjective map with K-fold ambiguity, resolved by a selector token z, so H(A | B) = log K while H(A | B,…

Disjunctive Branch-and-Bound for Certifiably Optimal Low-Rank Matrix Completion

arXiv:2305.12292v4 Announce Type: replace Abstract: Low-rank matrix completion consists of computing a matrix of minimal complexity that recovers a given set of observations as accurately as possible. Unfortunately, existing methods for matrix completion are heuristics that, while highly scalable and…

Large Spikes in Stochastic Gradient Descent: A Large-Deviations View

arXiv:2603.10079v1 Announce Type: new Abstract: We analyse SGD training of a shallow, fully connected network in the NTK scaling and provide a quantitative theory of the catapult phase. We identify an explicit criterion separating two behaviours: When an explicit function…

Digging Deeper: Learning Multi-Level Concept Hierarchies

arXiv:2603.10084v1 Announce Type: new Abstract: Although concept-based models promise interpretability by explaining predictions with human-understandable concepts, they typically rely on exhaustive annotations and treat concepts as flat and independent. To circumvent this, recent work has introduced Hierarchical Concept Embedding Models…