Archives AI News

CLIPO: Contrastive Learning in Policy Optimization Generalizes RLVR

arXiv:2603.10101v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has significantly advanced the reasoning capacity of Large Language Models (LLMs). However, RLVR solely relies on final answers as outcome rewards, neglecting the correctness of intermediate reasoning steps. Training…

V2M-Zero: Zero-Pair Time-Aligned Video-to-Music Generation

arXiv:2603.11042v1 Announce Type: cross Abstract: Generating music that temporally aligns with video events is challenging for existing text-to-music models, which lack fine-grained temporal control. We introduce V2M-Zero, a zero-pair video-to-music generation approach that outputs time-aligned music for video. Our method…

Mamba Neural Operator: Who Wins? Transformers vs. State-Space Models for PDEs

arXiv:2410.02113v3 Announce Type: replace Abstract: Partial differential equations (PDEs) are widely used to model complex physical systems, but solving them efficiently remains a significant challenge. Recently, Transformers have emerged as the preferred architecture for PDEs due to their ability to…

Losing dimensions: Geometric memorization in generative diffusion

arXiv:2410.08727v2 Announce Type: replace-cross Abstract: Diffusion models power leading generative AI, but when and how they memorize training data, especially on low-dimensional manifolds, remains unclear. We find memorization emerges gradually, not abruptly: as data become scarce, diffusion models experience a…

Revisiting Value Iteration: Unified Analysis of Discounted and Average-Reward Cases

arXiv:2510.23914v2 Announce Type: replace Abstract: While Value Iteration (VI) is one of the most fundamental algorithms in Reinforcement Learning, its theoretical convergence guarantees still exhibit a persistent mismatch with empirical behavior. In the discounted-reward case, classical theory guarantees geometric convergence…

Latent Poincaré Shaping for Agentic Reinforcement Learning

arXiv:2602.09375v3 Announce Type: replace Abstract: We propose LaPha, a method for training AlphaZero-like LLM agents in a Poincaré latent space. Under LaPha, the search process can be visualized as a tree rooted at the prompt and growing outward from the…

Kernel Tests of Equivalence

arXiv:2603.10886v1 Announce Type: cross Abstract: We propose novel kernel-based tests for assessing the equivalence between distributions. Traditional goodness-of-fit testing is inappropriate for concluding the absence of distributional differences, because failure to reject the null hypothesis may simply be a result…