Archives AI News

DINORANKCLIP: DINOv3 Distillation and Injection for Vision-Language Pretraining with High-Order Ranking Consistency

arXiv:2605.06592v1 Announce Type: cross Abstract: Contrastive language-image pretraining (CLIP) suffers from two structural weaknesses: the symmetric InfoNCE loss discards the relative ordering among unmatched in-batch pairs, and global pooling collapses the visual representation into a semantic bottleneck that is poorly…

May 8, 2026

Internalizing Outcome Supervision into Process Supervision: A New Paradigm for Reinforcement Learning for Reasoning

arXiv:2605.05226v1 Announce Type: new Abstract: The central challenge of reinforcement learning for reasoning lies not only in the sparsity of outcome-level supervision, but more fundamentally in how to transform feedback provided only at the end of a sequence into fine-grained…

May 8, 2026

On the notion of missingness for path attribution explainability methods in medical settings: Guiding the selection of medically meaningful baselines

arXiv:2508.14482v3 Announce Type: replace Abstract: The explainability of deep learning models remains a significant challenge, particularly in the medical domain where interpretable outputs are essential for clinical trust and transparency. Path attribution methods such as Integrated Gradients rely on a…

May 8, 2026

Rethinking Data Curation in LLM Training: Online Reweighting Offers Better Generalization than Offline Methods

arXiv:2605.05227v1 Announce Type: new Abstract: Data curation is a critical yet under-explored area in large language model (LLM) training. Existing methods, such as data selection and mixing, operate in an offline paradigm, detaching themselves from training. This separation introduces engineering…

May 8, 2026

Theoretically Optimal Attention/FFN Ratios in Disaggregated LLM Serving

arXiv:2601.21351v2 Announce Type: replace Abstract: Attentio-FFN disaggregation (AFD) is an emerging architecture for LLM decoding that separates state-heavy, KV-cache-dominated Attention computation from stateless, compute-intensive FFN computation, connected by per-step communication. While AFD enables independent scaling of memory and compute resources,…

May 8, 2026

Evolutionary fine tuning of quantized convolution-based deep learning models

arXiv:2605.05228v1 Announce Type: new Abstract: Deep learning models are the most efficient models in many machine learning tasks. The main disadvantage when using them in IoT, mobile devices, independent autonomous or real-time systems is their complexity and memory size. Therefore,…

May 8, 2026

Structural Sensitivity in Compressed Transformers: Relative Error Propagation and Layer Removal

arXiv:2603.20991v2 Announce Type: replace Abstract: Compressing transformer weights makes large language models cheaper to deploy. But each layer’s compression introduces an error. These errors accumulate as the signal passes through later layers, and how they accumulate is not well understood.…

May 8, 2026

Expert Routing for Communication-Efficient MoE via Finite Expert Banks

arXiv:2605.05278v1 Announce Type: new Abstract: Resource-efficient machine learning increasingly uses sparse Mixture-of-Experts (MoE) architectures, where the gate acts as both a learning component and a routing interface controlling computation, communication, and accuracy. Motivated by finite-rate interpretations of MoE gating, we…

May 8, 2026

Importance-Guided Basis Selection for Low-Rank Decomposition of Large Language Models

arXiv:2605.01627v2 Announce Type: replace Abstract: Low-rank decomposition is a compelling approach for compressing large language models, but its effectiveness hinges on selecting which singular-vector bases to retain for a target task. Existing methods such as Basel adapt singular-value coefficients on…

May 8, 2026

Pretrained Event Classification Model for High Energy Physics Analysis

arXiv:2412.10665v2 Announce Type: replace-cross Abstract: We introduce a foundation model for event classification in high-energy physics, built on a Graph Neural Network architecture and trained on 120 million simulated proton-proton collision events spanning 12 distinct physics processes. The model is…

May 8, 2026