Exact Causal Attention with 10% Fewer Operations

arXiv:2510.05175v1. Announce Type: new. Abstract: We present Fast Causal Attention (FCA), an algorithm that computes exact Causal Attention using 10% fewer operations. FCA accelerates a special class of matrix multiplications where either one operand or the output matrix is upper-…

Expected Free Energy-based Planning as Variational Inference

arXiv:2504.14898v4. Announce Type: replace-cross. Abstract: We address the problem of planning under uncertainty, where an agent must choose actions that not only achieve desired outcomes but also reduce uncertainty. Traditional methods often treat exploration and exploitation as separate objectives, lacking…

PatternKV: Flattening KV Representation Expands Quantization Headroom

arXiv:2510.05176v1. Announce Type: new. Abstract: KV cache in autoregressive LLMs eliminates redundant recomputation but has emerged as the dominant memory and bandwidth bottleneck during inference, notably with long contexts and test-time scaling. KV quantization is a key lever for reducing…

FoleyGRAM: Video-to-Audio Generation with GRAM-Aligned Multimodal Encoders

arXiv:2510.05829v1. Announce Type: cross. Abstract: In this work, we present FoleyGRAM, a novel approach to video-to-audio generation that emphasizes semantic conditioning through the use of aligned multimodal encoders. Building on prior advancements in video-to-audio generation, FoleyGRAM leverages the Gramian Representation…

Moloch’s Bargain: Emergent Misalignment When LLMs Compete for Audiences

arXiv:2510.06105v1. Announce Type: cross. Abstract: Large language models (LLMs) are increasingly shaping how information is created and disseminated, from companies using them to craft persuasive advertisements, to election campaigns optimizing messaging to gain votes, to social media influencers boosting engagement.…

A Data-Driven Prism: Multi-View Source Separation with Diffusion Model Priors

arXiv:2510.05205v1. Announce Type: new. Abstract: A common challenge in the natural sciences is to disentangle distinct, unknown sources from observations. Examples of this source separation task include deblending galaxies in a crowded field, distinguishing the activity of individual neurons from…

Cross-Domain Graph Data Scaling: A Showcase with Diffusion Models

arXiv:2406.01899v3. Announce Type: replace. Abstract: Models for natural language and images benefit from data scaling behavior: the more data fed into the model, the better they perform. This ‘better with more’ phenomenon enables the effectiveness of large-scale pre-training on vast…

Approximate Gaussianity Beyond Initialisation in Neural Networks

arXiv:2510.05218v1. Announce Type: new. Abstract: Ensembles of neural network weight matrices are studied through the training process for the MNIST classification problem, testing the efficacy of matrix models for representing their distributions, under assumptions of Gaussianity and permutation-symmetry. The general…