Archives AI News

Highly Efficient and Effective LLMs with Multi-Boolean Architectures

arXiv:2505.22811v2 Announce Type: replace-cross Abstract: Weight binarization has emerged as a promising strategy to reduce the complexity of large language models (LLMs). Existing approaches fall into post-training binarization, which is simple but causes severe performance loss, and training-aware methods, which…
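The weight binarization the abstract refers to can be illustrated with the classic closed-form scheme (as in BinaryConnect/XNOR-Net, not necessarily the paper's multi-boolean method): approximate each weight row by a single scale times a sign pattern. A minimal sketch, assuming NumPy and a row-wise scale:

```python
import numpy as np

def binarize_weights(w):
    """Binarize a weight matrix row-wise: W ~= alpha * sign(W).

    alpha is the per-row mean absolute value, which is the
    closed-form scale minimizing ||W - alpha * B||_F over
    binary codes B in {-1, +1}.
    """
    alpha = np.mean(np.abs(w), axis=1, keepdims=True)  # per-row scale
    b = np.where(w >= 0, 1.0, -1.0)                    # binary codes
    return alpha * b

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8))
w_bin = binarize_weights(w)
# Each binarized row carries only one magnitude, so a row costs
# one float plus one bit per weight instead of a float per weight.
assert all(len(np.unique(np.abs(row))) == 1 for row in w_bin)
```

Post-training binarization applies this directly to a trained model; training-aware methods instead keep binarization inside the training loop so the loss can compensate for the quantization error.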

First Hallucination Tokens Are Different from Conditional Ones

arXiv:2507.20836v3 Announce Type: replace Abstract: Large Language Models (LLMs) hallucinate, and detecting these cases is key to ensuring trust. While many approaches address hallucination detection at the response or span level, recent work explores token-level detection, enabling more fine-grained intervention.…
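The paper's distinction is between the token that opens a hallucinated span and the later tokens that are merely conditioned on it. A minimal sketch of deriving such token-level labels from span annotations (the function name and label strings are illustrative, not from the paper):

```python
def token_labels(num_tokens, halluc_spans):
    """Label each token: 'ok', 'first' (opens a hallucinated span),
    or 'cond' (inside a span, conditioned on earlier hallucinated tokens).

    halluc_spans: list of (start, end) token indices, end exclusive.
    """
    labels = ["ok"] * num_tokens
    for start, end in halluc_spans:
        for i in range(start, end):
            labels[i] = "first" if i == start else "cond"
    return labels

print(token_labels(6, [(2, 5)]))
# → ['ok', 'ok', 'first', 'cond', 'cond', 'ok']
```

Separating 'first' from 'cond' labels lets a detector be evaluated on the tokens where intervention is still cheap, before the model has committed to the hallucinated content.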

Wasserstein Bounds for generative diffusion models with Gaussian tail targets

arXiv:2412.11251v2 Announce Type: replace Abstract: We present an estimate of the Wasserstein distance between the data distribution and the generation of score-based generative models. The sampling complexity with respect to dimension is $\mathcal{O}(\sqrt{d})$, with a logarithmic constant. In the analysis,…

On the $O(\frac{\sqrt{d}}{K^{1/4}})$ Convergence Rate of AdamW Measured by $\ell_1$ Norm

arXiv:2505.11840v3 Announce Type: replace Abstract: As the default optimizer for training large language models, AdamW has achieved remarkable success in deep learning. However, its convergence behavior is not theoretically well-understood. This paper establishes the convergence rate $\frac{1}{K}\sum_{k=1}^K \mathbb{E}\left[\|\nabla f(x^k)\|_1\right] \leq O(\frac{\sqrt{d}C}{K^{1/4}})$ for…
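For reference, the AdamW update whose convergence the paper analyzes combines Adam's bias-corrected moment estimates with decoupled weight decay (Loshchilov & Hutter). A minimal NumPy sketch of one step, with illustrative hyperparameter defaults:

```python
import numpy as np

def adamw_step(x, grad, m, v, k, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2):
    """One AdamW update: Adam moments plus weight decay applied
    directly to the parameters rather than folded into the gradient."""
    m = beta1 * m + (1 - beta1) * grad        # first moment (EMA of grads)
    v = beta2 * v + (1 - beta2) * grad**2     # second moment (EMA of squares)
    m_hat = m / (1 - beta1**k)                # bias corrections, k = step index
    v_hat = v / (1 - beta2**k)
    x = x - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * x)
    return x, m, v

# Sanity check on f(x) = ||x||^2 / 2, where grad f(x) = x.
x = np.array([1.0, -2.0]); m = np.zeros(2); v = np.zeros(2)
for k in range(1, 2001):
    x, m, v = adamw_step(x, x, m, v, k, lr=0.05)
```

The $\ell_1$-norm measurement in the bound is natural here because the coordinate-wise normalization by $\sqrt{v}$ makes AdamW behave like a sign-based method, whose progress is governed by the $\ell_1$ rather than the $\ell_2$ norm of the gradient.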

Oracle-based Uniform Sampling from Convex Bodies

arXiv:2510.02983v1 Announce Type: cross Abstract: We propose new Markov chain Monte Carlo algorithms to sample a uniform distribution on a convex body $K$. Our algorithms are based on the Alternating Sampling Framework/proximal sampler, which uses Gibbs sampling on an augmented…
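The Alternating Sampling Framework / proximal sampler runs Gibbs sampling on the augmented density $\pi(x, y) \propto \mathbf{1}_K(x)\, \mathcal{N}(y; x, \eta I)$. A minimal sketch for $K$ the unit ball, with the restricted-Gaussian step implemented by simple rejection (the paper's oracle-based implementation will differ):

```python
import numpy as np

def proximal_sampler_ball(n_steps=1000, d=2, eta=0.05, seed=0):
    """Alternate the two Gibbs conditionals:
      y | x ~ N(x, eta I)                  (unconstrained Gaussian)
      x | y ~ N(y, eta I) restricted to K  (here: rejection sampling)
    with K the unit Euclidean ball.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(d)  # start inside K
    samples = []
    for _ in range(n_steps):
        y = x + np.sqrt(eta) * rng.standard_normal(d)   # forward Gaussian step
        while True:                                      # restricted Gaussian oracle
            x = y + np.sqrt(eta) * rng.standard_normal(d)
            if np.linalg.norm(x) <= 1.0:
                break
        samples.append(x.copy())
    return np.array(samples)

s = proximal_sampler_ball()
assert np.all(np.linalg.norm(s, axis=1) <= 1.0)  # every sample stays in K
```

Rejection works here only because the ball is easy to test membership for and $\eta$ is small; for general convex bodies one needs a more careful restricted-Gaussian sampler, which is where the membership/evaluation oracles in the title come in.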