Archives AI News

mSFT: Addressing Dataset Mixtures Overfitting Heterogeneously in Multi-task SFT

arXiv:2603.21606v5 Announce Type: replace Abstract: Current language model training commonly applies multi-task Supervised Fine-Tuning (SFT) using a homogeneous compute budget across all sub-datasets. This approach is fundamentally sub-optimal: heterogeneous learning dynamics cause faster-learning tasks to overfit early while slower ones…

Missing-Aware Multimodal Fusion for Unified Microservice Incident Management

arXiv:2603.25538v2 Announce Type: replace Abstract: Automated incident management is critical for microservice reliability. While recent unified frameworks leverage multimodal data for joint optimization, they unrealistically assume perfect data completeness. In practice, network fluctuations and agent failures frequently cause missing modalities.…

H-Node Attack and Defense in Large Language Models

arXiv:2603.26045v1 Announce Type: new Abstract: We present H-Node Adversarial Noise Cancellation (H-Node ANC), a mechanistic framework that identifies, exploits, and defends hallucination representations in transformer-based large language models (LLMs) at the level of individual hidden-state dimensions. A logistic regression probe…

Curved representational Bregman divergences and their applications

arXiv:2504.05654v5 Announce Type: replace-cross Abstract: By analogy to the terminology of curved exponential families in statistics, we define curved Bregman divergences as Bregman divergences restricted to non-affine parameter subspaces and sub-dimensional Bregman divergences when the restrictions are affine. A common…

WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning

arXiv:2512.02425v2 Announce Type: replace-cross Abstract: Recent advances in video large language models have demonstrated strong capabilities in understanding short clips. However, scaling them to hours- or days-long videos remains highly challenging due to limited context capacity and the loss of…