Archives AI News

Don’t be lazy: CompleteP enables compute-efficient deep transformers

arXiv:2505.01618v3 Announce Type: replace Abstract: We study compute efficiency of LLM training when using different parameterizations, i.e., rules for adjusting model and optimizer hyperparameters (HPs) as model size changes. Some parameterizations fail to transfer optimal base HPs (such as learning…

Your Pre-trained LLM is Secretly an Unsupervised Confidence Calibrator

arXiv:2505.16690v4 Announce Type: replace Abstract: Post-training of large language models is essential for adapting pre-trained language models (PLMs) to align with human preferences and downstream tasks. While PLMs typically exhibit well-calibrated confidence, post-trained language models (PoLMs) often suffer from over-confidence,…
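The over-confidence problem this abstract refers to is often illustrated with temperature scaling, a standard post-hoc calibration baseline (shown here only as background, not as the paper's method): dividing the logits by a temperature T > 1 flattens the predicted distribution without changing the argmax. A minimal NumPy sketch:

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; T > 1 softens the distribution."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Over-confident logits: raising the temperature reduces the top-class
# probability while preserving the ranking of classes.
logits = np.array([4.0, 1.0, 0.5])
p_sharp = softmax(logits, T=1.0)
p_calibrated = softmax(logits, T=2.0)
```

In practice the temperature is fit on held-out data; the point here is only that calibration can be adjusted after training without touching model weights.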

A Principle of Targeted Intervention for Multi-Agent Reinforcement Learning

arXiv:2510.17697v2 Announce Type: replace-cross Abstract: Steering cooperative multi-agent reinforcement learning (MARL) towards desired outcomes is challenging, particularly when global human guidance over the whole multi-agent system is impractical in large-scale MARL. On the other hand, designing…

Pre-training Epidemic Time Series Forecasters with Compartmental Prototypes

arXiv:2502.03393v5 Announce Type: replace Abstract: Accurate epidemic forecasting is crucial for outbreak preparedness, but existing data-driven models are often brittle. Typically trained on a single pathogen, they struggle with data scarcity during new outbreaks and fail under distribution shifts caused…

On the Robustness of Kernel Goodness-of-Fit Tests

arXiv:2408.05854v5 Announce Type: replace-cross Abstract: Goodness-of-fit testing is often criticized for its lack of practical relevance: since “all models are wrong”, the null hypothesis that the data conform to our model is ultimately always rejected as the sample size grows.…

Beyond the Ideal: Analyzing the Inexact Muon Update

arXiv:2510.19933v1 Announce Type: new Abstract: The Muon optimizer has rapidly emerged as a powerful, geometry-aware alternative to AdamW, demonstrating strong performance in large-scale training of neural networks. However, a critical theory-practice disconnect exists: Muon’s efficiency relies on fast, approximate orthogonalization,…
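The fast, approximate orthogonalization the abstract mentions can be illustrated with the classical Newton-Schulz iteration, which maps a matrix towards its nearest orthogonal factor; this is a generic sketch of the technique, not Muon's exact polynomial coefficients:

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=30):
    """Approximate the orthogonal polar factor of G.

    Classical iteration X <- 1.5*X - 0.5*X X^T X, which converges when
    all singular values of the starting matrix lie in (0, sqrt(3)).
    Normalizing by the Frobenius norm (an upper bound on the spectral
    norm) guarantees that precondition for full-rank G.
    """
    X = G / (np.linalg.norm(G) + 1e-7)
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X

rng = np.random.default_rng(0)
G = rng.standard_normal((4, 4))
Q = newton_schulz_orthogonalize(G)
# Q @ Q.T is now close to the identity matrix.
```

Optimizers in this family run only a handful of such iterations per update, which is exactly the inexactness the paper analyzes: the result is only approximately orthogonal.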

FINDER: Feature Inference on Noisy Datasets using Eigenspace Residuals

arXiv:2510.19917v1 Announce Type: new Abstract: “Noisy” datasets (regimes with low signal-to-noise ratios, small sample sizes, faulty data collection, etc.) remain a key research frontier for classification methods, with both theoretical and practical implications. We introduce FINDER, a rigorous…