Archives AI News

BARD: Bridging AutoRegressive and Diffusion Vision-Language Models Via Highly Efficient Progressive Block Merging and Stage-Wise Distillation

arXiv:2604.16514v2 Announce Type: replace-cross Abstract: Autoregressive vision-language models (VLMs) deliver strong multimodal capability, but their token-by-token decoding imposes a fundamental inference bottleneck. Diffusion VLMs offer a more parallel decoding paradigm, yet directly converting a pretrained autoregressive VLM into a large-block…

SAGE-32B: Agentic Reasoning via Iterative Distillation

arXiv:2601.04237v2 Announce Type: replace-cross Abstract: We demonstrate SAGE-32B, a 32 billion parameter language model that focuses on agentic reasoning and long range planning tasks. Unlike chat models that aim for general conversation fluency, SAGE-32B is designed to operate in an…

How to Teach Large Multimodal Models New Skills

arXiv:2510.08564v2 Announce Type: replace-cross Abstract: How can we teach large multimodal models (LMMs) new skills without erasing prior abilities? We study sequential fine-tuning on five target skills while monitoring general ability on eight held-out benchmarks across three model families. Surprisingly,…

LEPO: Latent Reasoning Policy Optimization for Large Language Models

arXiv:2604.17892v2 Announce Type: replace Abstract: Recently, latent reasoning has been introduced into large language models (LLMs) to leverage rich information within a continuous space. However, without stochastic sampling, these methods inevitably collapse to deterministic inference, failing to discover diverse reasoning…

Towards Understanding the Robustness of Sparse Autoencoders

arXiv:2604.18756v1 Announce Type: new Abstract: Large Language Models (LLMs) remain vulnerable to optimization-based jailbreak attacks that exploit internal gradient structure. While Sparse Autoencoders (SAEs) are widely used for interpretability, their robustness implications remain underexplored. We present a study of integrating…