Archives AI News

On Evaluating LLM Alignment by Evaluating LLMs as Judges

arXiv:2511.20604v1 Announce Type: cross Abstract: Alignment with human preferences is an important evaluation aspect of LLMs, requiring them to be helpful, honest, safe, and to precisely follow human instructions. Evaluating large language models’ (LLMs) alignment typically involves directly assessing their…

November 26, 2025

When Should Neural Data Inform Welfare? A Critical Framework for Policy Uses of Neuroeconomics

arXiv:2511.19548v1 Announce Type: new Abstract: Neuroeconomics promises to ground welfare analysis in neural and computational evidence about how people value outcomes, learn from experience and exercise self-control. At the same time, policy and commercial actors increasingly invoke neural data to…

November 26, 2025

MGAS: Multi-Granularity Architecture Search for Trade-Off Between Model Effectiveness and Efficiency

arXiv:2310.15074v4 Announce Type: replace Abstract: Neural architecture search (NAS) has gained significant traction in automating the design of neural networks. To reduce search time, differentiable architecture search (DAS) reframes the traditional paradigm of discrete candidate sampling and evaluation into a…

November 26, 2025

Researchers discover a shortcoming that makes LLMs less reliable

Large language models can learn to mistakenly link certain sentence patterns with specific topics — and may then repeat these patterns instead of reasoning.

November 26, 2025

Elucidated Rolling Diffusion Models for Probabilistic Weather Forecasting

arXiv:2506.20024v2 Announce Type: replace Abstract: Diffusion models are a powerful tool for probabilistic forecasting, yet most applications in high-dimensional complex systems predict future states individually. This approach struggles to model complex temporal dependencies and fails to explicitly account for the…

November 26, 2025

Unleashing the Power of Vision-Language Models for Long-Tailed Multi-Label Visual Recognition

arXiv:2511.20641v1 Announce Type: cross Abstract: Long-tailed multi-label visual recognition poses a significant challenge, as images typically contain multiple labels with highly imbalanced class distributions, leading to biased models that favor head classes while underperforming on tail classes. Recent efforts have…

November 26, 2025

Domain Fusion Controllable Generalization for Cross-Domain Time Series Forecasting from Multi-Domain Integrated Distribution

arXiv:2412.03068v2 Announce Type: replace Abstract: Conventional deep models have achieved unprecedented success in time series forecasting. However, facing the challenge of cross-domain generalization, existing studies utilize statistical prior as prompt engineering fails under the huge distribution shift among various domains.…

November 26, 2025

Softmax Transformers are Turing-Complete

arXiv:2511.20038v1 Announce Type: cross Abstract: Hard attention Chain-of-Thought (CoT) transformers are known to be Turing-complete. However, it is an open problem whether softmax attention Chain-of-Thought (CoT) transformers are Turing-complete. In this paper, we prove a stronger result that length-generalizable softmax…

November 26, 2025

STARFlow-V: End-to-End Video Generative Modeling with Normalizing Flow

arXiv:2511.20462v1 Announce Type: cross Abstract: Normalizing flows (NFs) are end-to-end likelihood-based generative models for continuous data, and have recently regained attention with encouraging progress on image generation. Yet in the video generation domain, where spatiotemporal complexity and computational cost are…

November 26, 2025

Efficient Inference Using Large Language Models with Limited Human Data: Fine-Tuning then Rectification

arXiv:2511.19486v1 Announce Type: new Abstract: Driven by recent advances in artificial intelligence (AI), a growing body of work demonstrates the potential of using large language models (LLMs) to generate human-like responses in market research and social science applications. Two primary…

November 26, 2025