Archives AI News

Improving Diversity in Black-box Few-shot Knowledge Distillation

arXiv:2604.25795v1 Announce Type: cross Abstract: Knowledge distillation (KD) is a well-known technique to effectively compress a large network (teacher) to a smaller network (student) with little sacrifice in performance. However, most KD methods require a large training set and internal…

Is the Modality Gap a Bug or a Feature? A Robustness Perspective

arXiv:2603.29080v2 Announce Type: replace-cross Abstract: Many modern multi-modal models (e.g. CLIP) seek an embedding space in which the two modalities are aligned. Somewhat surprisingly, almost all existing models show a strong modality gap: the distribution of images is well-separated from…

A Hybridizable Neural Time Integrator for Stable Autoregressive Forecasting

arXiv:2604.21101v2 Announce Type: replace Abstract: For autoregressive modeling of chaotic dynamical systems over long time horizons, the stability of both training and inference is a major challenge in building scientific foundation models. We present a hybrid technique in which an…

JaGuard: Position Error Correction of GNSS Jamming with Deep Temporal Graphs

arXiv:2509.14000v4 Announce Type: replace Abstract: Global Navigation Satellite Systems (GNSS) face growing disruption from intentional jamming, undermining critical infrastructure where precise positioning and timing are essential. Current position error correction (PEC) methods mainly focus on multi-path propagation errors and fail…

Cornserve: A Distributed Serving System for Any-to-Any Multimodal Models

arXiv:2603.12118v2 Announce Type: replace Abstract: Any-to-Any models are an emerging class of multimodal models that accept combinations of multimodal data (e.g., text, image, video, audio) as input and generate them as output. Serving these models is challenging; different requests with…

Nautile-370M: Spectral Memory Meets Attention in a Small Reasoning Model

arXiv:2604.24809v1 Announce Type: new Abstract: We present Nautile-370M, a 371-million-parameter small language model designed for efficient reasoning under strict parameter and inference budgets. Nautile-370M uses a hybrid backbone in which two SeqCond Attention (SCA) layers, a linear-time spectral sequence operator…