Archives AI News

SDPO: Importance-Sampled Direct Preference Optimization for Stable Diffusion Training

arXiv:2505.21893v2 Announce Type: replace-cross Abstract: Preference learning has become a central technique for aligning generative models with human expectations. Recently, it has been extended to diffusion models through methods like Direct Preference Optimization (DPO). However, existing approaches such as Diffusion-DPO…
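The paper builds on Direct Preference Optimization. As context, here is a minimal sketch of the generic (non-diffusion) DPO loss for a single preference pair; this is the standard objective, not the paper's importance-sampled diffusion variant, and all log-probability values below are illustrative:

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Generic DPO loss for one preference pair: -log sigmoid(beta * margin),
    where the margin compares the policy-vs-reference log-ratio of the
    preferred response (w) against that of the dispreferred response (l).
    This is the textbook objective, not the diffusion-specific variant
    proposed in the paper."""
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# When the policy already prefers the chosen response more than the
# reference does, the margin is positive and the loss is small.
loss = dpo_loss(logp_w=-1.0, logp_l=-3.0, ref_logp_w=-2.0, ref_logp_l=-2.0)
```

With a zero margin the loss is exactly log 2, the usual sanity check for sigmoid-based pairwise losses.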

Exploiting Block Coordinate Descent for Cost-Effective LLM Model Training

arXiv:2506.12037v2 Announce Type: replace-cross Abstract: Training large language models typically demands extensive GPU memory and substantial financial investment, which poses a barrier for many small- to medium-sized teams. In this paper, we propose a full-parameter pre-training and fine-tuning framework based…
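The memory saving in block coordinate descent comes from updating one block of parameters at a time while the rest stay frozen, so optimizer state is only needed for the active block. A toy sketch of the generic cyclic scheme (this illustrates the idea only; the paper's actual partitioning and update rule are not shown here):

```python
def block_coordinate_descent(grad_fns, blocks, steps=200, lr=0.1):
    """Cyclic block coordinate descent: at each step, take a gradient
    step on exactly one parameter block while all other blocks are
    frozen. grad_fns[i](blocks) returns the gradient w.r.t. block i.
    Illustrative sketch only, not the paper's training framework."""
    for step in range(steps):
        i = step % len(blocks)  # pick the next block cyclically
        g = grad_fns[i](blocks)
        blocks[i] = [p - lr * gp for p, gp in zip(blocks[i], g)]
    return blocks

# Toy objective with two blocks: f(a, b) = sum((a - 1)^2) + sum((b + 2)^2).
grads = [
    lambda x: [2 * (p - 1) for p in x[0]],  # gradient w.r.t. block 0
    lambda x: [2 * (p + 2) for p in x[1]],  # gradient w.r.t. block 1
]
params = block_coordinate_descent(grads, [[0.0, 0.0], [0.0, 0.0]])
# Each block converges to its own minimizer: block 0 -> 1, block 1 -> -2.
```

For this separable toy objective, alternating block updates reach the same minimum as full-parameter gradient descent, while only one block's gradient is ever materialized per step.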

GSM-Agent: Understanding Agentic Reasoning Using Controllable Environments

arXiv:2509.21998v1 Announce Type: new Abstract: As LLMs are increasingly deployed as agents, agentic reasoning, the ability to combine tool use (especially search) with reasoning, becomes a critical skill. However, it is hard to disentangle agentic reasoning when evaluated…


A Notion of Uniqueness for the Adversarial Bayes Classifier

arXiv:2404.16956v3 Announce Type: replace-cross Abstract: We propose a new notion of uniqueness for the adversarial Bayes classifier in the setting of binary classification. Analyzing this concept produces a simple procedure for computing all adversarial Bayes classifiers for a well-motivated family…

UltraEdit: Training-, Subject-, and Memory-Free Lifelong Editing in Language Models

arXiv:2505.14679v2 Announce Type: replace-cross Abstract: Lifelong learning enables large language models (LLMs) to adapt to evolving information by continually updating their internal knowledge. An ideal system should support efficient, wide-ranging updates while preserving existing capabilities and ensuring reliable deployment. Model…

Lightweight MSA Design Advances Protein Folding From Evolutionary Embeddings

arXiv:2507.07032v3 Announce Type: replace Abstract: Protein structure prediction often hinges on multiple sequence alignments (MSAs), which underperform on low-homology and orphan proteins. We introduce PLAME, a lightweight MSA design framework that leverages evolutionary embeddings from pretrained protein language models to…

Pre-Training Representations of Binary Code Using Contrastive Learning

arXiv:2210.05102v5 Announce Type: replace-cross Abstract: Binary code analysis and comprehension are critical to applications in reverse engineering and computer security tasks where source code is not available. Unfortunately, unlike source code, binary code lacks semantics and is more difficult for…
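Contrastive pre-training of this kind typically optimizes an InfoNCE-style objective: pull the representation of a matching pair (e.g. a binary function and a semantically equivalent counterpart) together while pushing mismatched pairs apart. A minimal sketch of the generic loss, with made-up similarity scores; the paper's actual pairing and encoder are not shown:

```python
import math

def info_nce(sim_pos, sim_negs, temperature=0.07):
    """InfoNCE loss for one anchor: softmax classification of the
    positive pair's similarity against a set of negative similarities
    (computed via a numerically stable log-sum-exp). Generic contrastive
    objective, not the paper's specific formulation."""
    logits = [sim_pos / temperature] + [s / temperature for s in sim_negs]
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[0] - log_denom)

# A well-separated positive (high similarity vs. the negatives) gives a
# small loss; a positive buried among similar negatives gives a large one.
easy = info_nce(sim_pos=0.9, sim_negs=[0.1, 0.0, -0.2])
hard = info_nce(sim_pos=0.1, sim_negs=[0.9, 0.8, 0.7])
```

The low temperature (0.07 is a common default in the contrastive-learning literature) sharpens the softmax so that hard negatives dominate the gradient.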

Process Reinforcement through Implicit Rewards

arXiv:2502.01456v2 Announce Type: replace Abstract: Dense process rewards have proven a more effective alternative to sparse outcome-level rewards in the inference-time scaling of large language models (LLMs), particularly in tasks requiring complex multi-step reasoning. While dense rewards also offer…
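The dense-vs-sparse distinction can be made concrete with a toy reasoning trace: an outcome-level reward scores only the final answer, while process rewards score each intermediate step, localizing where the reasoning went wrong. The checker below is a hand-written stand-in purely for illustration; the paper derives process rewards implicitly rather than from such a checker:

```python
def outcome_reward(steps, verifier):
    """Sparse outcome-level reward: one scalar for the whole multi-step
    solution, credited only at the end."""
    return 1.0 if verifier(steps) else 0.0

def process_rewards(steps, step_checker):
    """Dense process rewards: one scalar per intermediate step, so credit
    and blame attach to the specific step where reasoning fails.
    step_checker is an illustrative stand-in for a process reward model."""
    return [1.0 if step_checker(s) else 0.0 for s in steps]

# Toy trace where step 2 is the arithmetic error: the outcome reward is
# just 0 for the whole trace, while process rewards pinpoint the bad step.
trace = ["2 + 3 = 5", "5 * 4 = 21", "21 - 1 = 20"]
step_ok = lambda s: s != "5 * 4 = 21"
dense = process_rewards(trace, step_ok)
sparse = outcome_reward(trace, lambda t: all(step_ok(s) for s in t))
```

With multi-step traces, this per-step signal is what makes dense rewards a stronger learning target than a single end-of-trace scalar.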