AI News Archives

FATE: A Formal Benchmark Series for Frontier Algebra of Multiple Difficulty Levels

arXiv:2511.02872v2 (Announce Type: replace). Abstract: Recent advances in large language models (LLMs) have demonstrated impressive capabilities in formal theorem proving, particularly on contest-based mathematical benchmarks like the IMO. However, these contests do not reflect the depth, breadth, and abstraction of…

November 7, 2025

Beyond the Kolmogorov Barrier: A Learnable Weighted Hybrid Autoencoder for Model Order Reduction

arXiv:2410.18148v5 (Announce Type: replace). Abstract: Representation learning for high-dimensional, complex physical systems aims to identify a low-dimensional intrinsic latent space, which is crucial for reduced-order modeling and modal analysis. To overcome the well-known Kolmogorov barrier, deep autoencoders (AEs) have been…

November 7, 2025

Exact Expressive Power of Transformers with Padding

arXiv:2505.18948v2 (Announce Type: replace). Abstract: Chain of thought is a natural inference-time method for increasing the computational power of transformer-based large language models (LLMs), but comes at the cost of sequential decoding. Are there more efficient alternatives to expand a…

November 7, 2025

Optimizing Reasoning Efficiency through Prompt Difficulty Prediction

arXiv:2511.03808v1 (Announce Type: new). Abstract: Reasoning language models perform well on complex tasks but are costly to deploy due to their size and long reasoning traces. We propose a routing approach that assigns each problem to the smallest model likely…

November 7, 2025

One Size Does Not Fit All: Architecture-Aware Adaptive Batch Scheduling with DEBA

arXiv:2511.03809v1 (Announce Type: new). Abstract: Adaptive batch size methods aim to accelerate neural network training, but existing approaches apply identical adaptation strategies across all architectures, assuming a one-size-fits-all solution. We introduce DEBA (Dynamic Efficient Batch Adaptation), an adaptive batch scheduler…

November 7, 2025

FusionDP: Foundation Model-Assisted Differentially Private Learning for Partially Sensitive Features

arXiv:2511.03806v1 (Announce Type: new). Abstract: Ensuring the privacy of sensitive training data is crucial in privacy-preserving machine learning. However, in practical scenarios, privacy protection may be required for only a subset of features. For instance, in ICU data, demographic attributes…

November 7, 2025

Fair and Explainable Credit-Scoring under Concept Drift: Adaptive Explanation Frameworks for Evolving Populations

arXiv:2511.03807v1 (Announce Type: new). Abstract: Evolving borrower behaviors, shifting economic conditions, and changing regulatory landscapes continuously reshape the data distributions underlying modern credit-scoring systems. Conventional explainability techniques, such as SHAP, assume static data and fixed background distributions, making their explanations…

November 7, 2025

Contamination Detection for VLMs using Multi-Modal Semantic Perturbation

arXiv:2511.03774v1 (Announce Type: new). Abstract: Recent advances in Vision-Language Models (VLMs) have achieved state-of-the-art performance on numerous benchmark tasks. However, the use of internet-scale, often proprietary, pretraining corpora raises a critical concern for both practitioners and users: inflated performance due…

November 7, 2025

What’s in Common? Multimodal Models Hallucinate When Reasoning Across Scenes

arXiv:2511.03768v1 (Announce Type: new). Abstract: Multimodal language models possess a remarkable ability to handle an open-vocabulary’s worth of objects. Yet the best models still suffer from hallucinations when reasoning about scenes in the real world, revealing a gap between their…

November 7, 2025

Laugh, Relate, Engage: Stylized Comment Generation for Short Videos

arXiv:2511.03757v1 (Announce Type: new). Abstract: Short-video platforms have become a central medium in the modern Internet landscape, where efficient information delivery and strong interactivity are reshaping user engagement and cultural dissemination. Among the various forms of user interaction, comments play…

November 7, 2025