Archives AI News

Kolmogorov-Arnold Attention: Is Learnable Attention Better For Vision Transformers?

arXiv:2503.10632v3 Announce Type: replace Abstract: Kolmogorov-Arnold networks (KANs) are a remarkable innovation consisting of learnable activation functions, with the potential to capture more complex relationships from data. Presently, KANs are deployed by replacing multilayer perceptrons (MLPs) in deep networks,…
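The core idea the abstract names is that a KAN places a learnable univariate function on every edge, instead of an MLP's fixed nonlinearity. A minimal sketch of that idea, assuming a Gaussian RBF basis in place of the B-splines typically used (basis choice, sizes, and initialization here are illustrative, not from the paper):

```python
import numpy as np

class LearnableActivation:
    """One learnable univariate function phi(x) = sum_k c_k * exp(-((x - t_k)/h)^2)."""
    def __init__(self, n_basis=8, x_min=-2.0, x_max=2.0, seed=0):
        rng = np.random.default_rng(seed)
        self.centers = np.linspace(x_min, x_max, n_basis)
        self.width = (x_max - x_min) / n_basis
        self.coeffs = rng.normal(scale=0.1, size=n_basis)  # trainable parameters

    def __call__(self, x):
        # Evaluate the RBF basis at x and combine with learned coefficients.
        x = np.asarray(x, dtype=float)[..., None]
        basis = np.exp(-((x - self.centers) / self.width) ** 2)
        return basis @ self.coeffs

class KANLayer:
    """One KAN layer: a separate learnable activation per (input, output) edge,
    summed per output unit, rather than a linear map plus fixed nonlinearity."""
    def __init__(self, n_in, n_out):
        self.acts = [[LearnableActivation(seed=i * n_in + j)
                      for j in range(n_in)] for i in range(n_out)]

    def __call__(self, x):
        return np.array([sum(self.acts[i][j](x[j]) for j in range(len(x)))
                         for i in range(len(self.acts))])

layer = KANLayer(n_in=3, n_out=2)
y = layer(np.array([0.5, -0.3, 1.2]))  # 2-dimensional layer output
```

In a real KAN the basis coefficients are trained by backpropagation; this sketch only shows the forward structure that distinguishes it from an MLP.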

Natural Gradient VI: Guarantees for Non-Conjugate Models

arXiv:2510.19163v1 Announce Type: new Abstract: Stochastic Natural Gradient Variational Inference (NGVI) is a widely used method for approximating posterior distributions in probabilistic models. Despite its empirical success and foundational role in variational inference, its theoretical underpinnings remain limited, particularly in…
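For a Gaussian variational family, stochastic NGVI has a simple concrete form: natural-gradient steps update the precision with a Monte Carlo estimate of the expected negative Hessian of the log-target, and the mean with a preconditioned gradient step. A 1-D sketch under an illustrative non-conjugate target log p(θ) = -θ⁴/4 (the target, step size, and iteration count are assumptions for demonstration, not the paper's setting):

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_logp(t):   # d/dt log p(t) for log p(t) = -t**4 / 4
    return -t ** 3

def hess_logp(t):   # d2/dt2 log p(t)
    return -3 * t ** 2

m, s = 1.0, 1.0     # variational mean and precision of q(theta) = N(m, 1/s)
rho = 0.05          # natural-gradient step size
for _ in range(2000):
    theta = m + rng.normal() / np.sqrt(s)          # one MC sample from q
    s = (1 - rho) * s + rho * (-hess_logp(theta))  # precision: EMA of -Hessian
    m = m + (rho / s) * grad_logp(theta)           # mean: preconditioned gradient
```

Because the target is symmetric about zero, the mean drifts toward 0 and the precision hovers near its fixed point; the precision stays positive here because the target is concave everywhere, which is exactly the kind of structural assumption whose absence makes the general non-conjugate analysis hard.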

Improved Exploration in GFlowNets via Enhanced Epistemic Neural Networks

arXiv:2506.16313v2 Announce Type: replace Abstract: Efficiently identifying the right trajectories for training remains an open problem in GFlowNets. To address this, it is essential to prioritize exploration in regions of the state space where the reward distribution has not been…
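The exploration principle the abstract points at, prioritizing states where the reward model is epistemically uncertain, can be sketched with a simple ensemble standing in for the paper's epistemic neural network (the ensemble, linear reward heads, and bonus definition are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# 8 linear reward heads acting as a stand-in epistemic model.
ensemble = [rng.normal(size=5) for _ in range(8)]

def epistemic_bonus(state):
    """Disagreement across ensemble predictions as a proxy for
    epistemic uncertainty about the reward at this state."""
    preds = np.array([w @ state for w in ensemble])
    return preds.std()

states = rng.normal(size=(100, 5))
bonuses = np.array([epistemic_bonus(s) for s in states])
next_state = states[bonuses.argmax()]  # explore where uncertainty is highest
```

In a GFlowNet training loop, such a bonus would bias trajectory sampling toward under-explored regions of the state space rather than pick a single argmax state as here.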

MaNGO – Adaptable Graph Network Simulators via Meta-Learning

arXiv:2510.05874v2 Announce Type: replace Abstract: Accurately simulating physics is crucial across scientific domains, with applications spanning from robotics to materials science. While traditional mesh-based simulations are precise, they are often computationally expensive and require knowledge of physical parameters, such as…

Improving planning and MBRL with temporally-extended actions

arXiv:2505.15754v2 Announce Type: replace Abstract: Continuous-time systems are often modeled using discrete-time dynamics, but this requires a small simulation step to maintain accuracy. In turn, this requires a large planning horizon, which leads to computationally demanding planning problems…
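The horizon-reduction idea can be made concrete with the simplest temporally-extended action, holding one action for k fine-grained steps, so a planner reasons over k-times fewer decisions (the paper's construction may be richer; the toy integrator dynamics and reward below are assumptions):

```python
def make_macro_step(step, k):
    """Wrap a fine-grained dynamics step into a macro step that
    holds the same action for k sub-steps, accumulating reward."""
    def macro_step(state, action):
        total_reward = 0.0
        for _ in range(k):
            state, r = step(state, action)
            total_reward += r
        return state, total_reward
    return macro_step

# Toy 1-D integrator with dt = 0.01; reward is negative distance to 0.
def fine_step(x, u, dt=0.01):
    x = x + dt * u
    return x, -abs(x)

macro = make_macro_step(fine_step, k=10)
x, r = macro(1.0, -1.0)  # one macro step covers 10 fine steps
```

With k = 10, a planning problem that needed a 1000-step horizon at the fine timescale needs only 100 macro decisions, at the cost of committing to each action for longer.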

Hubble: a Model Suite to Advance the Study of LLM Memorization

arXiv:2510.19811v1 Announce Type: cross Abstract: We present Hubble, a suite of fully open-source large language models (LLMs) for the scientific study of LLM memorization. Hubble models come in standard and perturbed variants: standard models are pretrained on a large English…

Deep Linear Probe Generators for Weight Space Learning

arXiv:2410.10811v2 Announce Type: replace Abstract: Weight space learning aims to extract information about a neural network, such as its training dataset or generalization error. Recent approaches learn directly from model weights, but this presents many challenges as weights are high-dimensional…
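One way around the high dimensionality of raw weights, in the spirit of probing approaches to weight-space learning, is to characterize a network by its responses to a small set of probe inputs rather than by the weights themselves. A hedged sketch (probe inputs are random here, whereas such methods typically learn them; the one-layer network and sizes are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
probes = rng.normal(size=(4, 8))  # 4 probe inputs in the network's 8-dim input space

def embed(weights, bias):
    """Embed a one-layer network f(x) = tanh(x @ W + b) by stacking
    its outputs on the shared probe inputs into a fixed-size vector."""
    return np.tanh(probes @ weights + bias).ravel()

W, b = rng.normal(size=(8, 3)), rng.normal(size=3)
z = embed(W, b)  # 4 probes x 3 outputs -> 12-dim embedding of this network
```

The embedding has the same size regardless of how the weights are parameterized, which sidesteps the permutation and scaling symmetries that make learning directly from raw weight vectors difficult.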