Archives AI News

Judge Model for Large-scale Multimodality Benchmarks

arXiv:2601.06106v1 Announce Type: new Abstract: We propose a dedicated multimodal Judge Model designed to provide reliable, explainable evaluation across a diverse suite of tasks. Our benchmark spans text, audio, image, and video modalities, drawing from carefully sampled public datasets with…

January 13, 2026

GroupSegment-SHAP: Shapley Value Explanations with Group-Segment Players for Multivariate Time Series

arXiv:2601.06114v1 Announce Type: new Abstract: Multivariate time-series models achieve strong predictive performance in healthcare, industry, energy, and finance, but how they combine cross-variable interactions with temporal dynamics remains unclear. SHapley Additive exPlanations (SHAP) are widely used for interpretation. However, existing…

January 13, 2026
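The snippet below is a rough illustration of Shapley attribution when the "players" are blocks of a multivariate series rather than single features. It is not the GroupSegment-SHAP algorithm. The toy data, the `groups` layout of (variable, start, end) blocks, the zero baseline used for masking, and the toy `model` are all hypothetical choices made for the example.

```python
from itertools import combinations
from math import factorial

def group_shapley(value, n_players):
    """Exact Shapley values by enumerating every coalition of the other players.

    value: callable mapping a frozenset of player indices to a float.
    Cost grows as 2**n_players, so this is only practical for a few groups.
    """
    phi = [0.0] * n_players
    for i in range(n_players):
        others = [j for j in range(n_players) if j != i]
        for k in range(len(others) + 1):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of size k
            weight = factorial(k) * factorial(n_players - k - 1) / factorial(n_players)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Hypothetical toy series: 2 variables x 4 time steps
x = [[1.0, 2.0, 3.0, 4.0],
     [0.5, 0.5, 1.0, 1.0]]

# Hypothetical group-segment players: (variable index, start, end) blocks
groups = [(0, 0, 2), (0, 2, 4), (1, 0, 4)]

def model(series):
    # Toy model: sum of variable 0 plus a cross-variable interaction term
    return sum(series[0]) + sum(a * b for a, b in zip(series[0], series[1]))

def masked(coalition):
    # Replace every block that is NOT in the coalition with a zero baseline
    out = [row[:] for row in x]
    for g, (v, t0, t1) in enumerate(groups):
        if g not in coalition:
            for t in range(t0, t1):
                out[v][t] = 0.0
    return out

phi = group_shapley(lambda s: model(masked(s)), len(groups))
```

Because the interaction term couples variable 0 with variable 1, part of each time block's credit is shared with the variable-1 player. The attributions still sum to `model(x)` minus the all-baseline output, which is the Shapley efficiency property.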

The Impact of Post-training on Data Contamination

arXiv:2601.06103v1 Announce Type: new Abstract: We present a controlled study of how dataset contamination interacts with the post-training stages now standard in large language model training pipelines. Starting from clean checkpoints of Qwen2.5 (0.5B) and Gemma3 (1B/4B), we inject five…

January 13, 2026

Australian Bushfire Intelligence with AI-Driven Environmental Analytics

arXiv:2601.06105v1 Announce Type: new Abstract: Bushfires are among the most destructive natural hazards in Australia, causing significant ecological, economic, and social damage. Accurate prediction of bushfire intensity is therefore essential for effective disaster preparedness and response. This study examines the…

January 13, 2026

Filtering Beats Fine Tuning: A Bayesian Kalman View of In Context Learning in LLMs

arXiv:2601.06100v1 Announce Type: new Abstract: We present a theory-first framework that interprets inference-time adaptation in large language models (LLMs) as online Bayesian state estimation. Rather than modeling rapid adaptation as implicit optimization or meta-learning, we formulate task- and context-specific learning…

January 13, 2026
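The Kalman filter is a standard example of online Bayesian state estimation. This minimal scalar sketch is not the paper's formulation. It assumes, purely for illustration, a latent task parameter `w` that drifts as a random walk with variance `q`, and in-context examples `(x, y)` observed as `y = x * w + noise` with noise variance `r`. The function name and all parameters are hypothetical.

```python
def kalman_task_estimate(pairs, q=0.0, r=0.1, m0=0.0, p0=1.0):
    """Scalar Kalman filter over a stream of (x, y) context examples.

    Returns the posterior mean m and variance p of the latent parameter w.
    """
    m, p = m0, p0
    for x, y in pairs:
        # Predict: random-walk drift inflates the uncertainty about w
        p = p + q
        # Update: fold in observation y = x * w + noise (variance r)
        s = x * x * p + r          # innovation variance
        k = p * x / s              # Kalman gain
        m = m + k * (y - x * m)    # correct the mean by the prediction error
        p = (1.0 - k * x) * p      # shrink the variance
    return m, p

# Fifty noiseless examples of y = 2x; the estimate approaches w = 2
m, p = kalman_task_estimate([(1.0, 2.0)] * 50, q=0.0, r=0.1)
```

With `q = 0` the recursion reproduces batch Bayesian linear regression. Here that gives posterior precision `1 + 50 / 0.1 = 501` and mean `1000 / 501`. No weights are updated by gradient steps; adaptation happens only through the filtered state.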

The Hessian of tall-skinny networks is easy to invert

arXiv:2601.06096v1 Announce Type: new Abstract: We describe an exact algorithm for solving linear systems $Hx=b$ where $H$ is the Hessian of a deep net. The method computes Hessian-inverse-vector products without storing the Hessian or its inverse in time and storage…

January 13, 2026

Enabling Long FFT Convolutions on Memory-Constrained FPGAs via Chunking

arXiv:2601.06065v1 Announce Type: new Abstract: The need for long-context reasoning has led to alternative neural network architectures besides Transformers and self-attention, a popular model being Hyena, which employs causal 1D-convolutions implemented with FFTs. Long convolutions enable efficient global context mixing,…

January 13, 2026
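For context on the FFT formulation, here is a minimal NumPy sketch of a causal 1D convolution computed with a single zero-padded FFT. It is followed by a generic block-partitioned (overlap-add style) variant in which no FFT is longer than `2 * block` points, whatever the sequence length. This is an assumption-laden illustration of chunking in general, not the paper's FPGA design. The function names, the requirement that `len(h) == len(u)`, and the divisibility condition are simplifications chosen for the example.

```python
import numpy as np

def fft_causal_conv(u, h):
    """Causal convolution y[t] = sum_{k<=t} h[k] * u[t-k] via one 2L-point FFT."""
    L = len(u)
    n = 2 * L  # zero-pad so circular convolution equals linear convolution
    y = np.fft.irfft(np.fft.rfft(u, n) * np.fft.rfft(h, n), n)
    return y[:L]

def chunked_fft_causal_conv(u, h, block):
    """Same result as fft_causal_conv, but every FFT has only 2*block points.

    Assumes len(h) == len(u) and len(u) is a multiple of block.
    """
    L = len(u)
    if len(h) != L or L % block:
        raise ValueError("need len(h) == len(u) and len(u) % block == 0")
    nb = L // block
    y = np.zeros(L + block)  # room for the tail of the last partial product
    for i in range(nb):
        ui = u[i * block:(i + 1) * block]
        # Input block i meets filter block j at offset (i + j) * block;
        # pairs with i + j >= nb only affect outputs past index L - 1.
        for j in range(nb - i):
            hj = h[j * block:(j + 1) * block]
            part = np.fft.irfft(np.fft.rfft(ui, 2 * block) * np.fft.rfft(hj, 2 * block), 2 * block)
            start = (i + j) * block
            y[start:start + 2 * block] += part  # overlap-add the partial result
    return y[:L]

rng = np.random.default_rng(0)
u = rng.standard_normal(64)
h = rng.standard_normal(64)
```

The design trade-off is that the chunked version performs about `nb * (nb + 1) / 2` small FFT products instead of one large one. It trades extra arithmetic for a working set that fits a fixed on-chip buffer.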

A Complete Decomposition of Stochastic Differential Equations

arXiv:2601.07834v1 Announce Type: cross Abstract: We show that any stochastic differential equation with prescribed time-dependent marginal distributions admits a decomposition into three components: a unique scalar field governing marginal evolution, a symmetric positive-semidefinite diffusion matrix field and a skew-symmetric matrix…

January 13, 2026

Stress Testing Machine Learning at $10^{10}$ Scale: A Comprehensive Study of Adversarial Robustness on Algebraically Structured Integer Streams

arXiv:2601.06117v1 Announce Type: new Abstract: This paper presents a large-scale stress test of machine learning systems using structured mathematical data as a benchmark. We evaluate the robustness of tree-based classifiers at an unprecedented scale, utilizing ten billion deterministic samples and…

January 13, 2026

DAST: Difficulty-Adaptive Slow-Thinking for Large Reasoning Models

arXiv:2503.04472v3 Announce Type: replace Abstract: Recent advancements in slow-thinking reasoning models have shown exceptional performance in complex reasoning tasks. However, these models often exhibit overthinking (generating redundant reasoning steps for simple problems), leading to excessive computational resource usage. While…

January 13, 2026