Archives AI News

On Evaluating LLM Alignment by Evaluating LLMs as Judges

arXiv:2511.20604v1. Announce Type: cross. Abstract: Alignment with human preferences is an important evaluation aspect of large language models (LLMs), requiring them to be helpful, honest, safe, and to precisely follow human instructions. Evaluating LLMs' alignment typically involves directly assessing their…

Elucidated Rolling Diffusion Models for Probabilistic Weather Forecasting

arXiv:2506.20024v2. Announce Type: replace. Abstract: Diffusion models are a powerful tool for probabilistic forecasting, yet most applications in high-dimensional complex systems predict future states individually. This approach struggles to model complex temporal dependencies and fails to explicitly account for the…

Softmax Transformers are Turing-Complete

arXiv:2511.20038v1. Announce Type: cross. Abstract: Hard-attention Chain-of-Thought (CoT) transformers are known to be Turing-complete. However, it is an open problem whether softmax-attention CoT transformers are Turing-complete. In this paper, we prove a stronger result that length-generalizable softmax…

STARFlow-V: End-to-End Video Generative Modeling with Normalizing Flow

arXiv:2511.20462v1. Announce Type: cross. Abstract: Normalizing flows (NFs) are end-to-end likelihood-based generative models for continuous data, and have recently regained attention with encouraging progress on image generation. Yet in the video generation domain, where spatiotemporal complexity and computational cost are…