Archives AI News

CodeTaste: Can LLMs Generate Human-Level Code Refactorings?

arXiv:2603.04177v1. Announce Type: cross. Abstract: Large language model (LLM) coding agents can generate working code, but their solutions often accumulate complexity, duplication, and architectural debt. Human developers address such issues through refactoring: behavior-preserving program transformations that improve structure and maintainability.…

SafeDPO: A Simple Approach to Direct Preference Optimization with Enhanced Safety

arXiv:2505.20065v2. Announce Type: replace. Abstract: As Large Language Models (LLMs) are increasingly deployed in real-world applications, balancing helpfulness and safety has become a central challenge. A natural approach is to incorporate safety constraints into Reinforcement Learning from Human Feedback (RLHF),…

Semi-Supervised Generative Learning via Latent Space Distribution Matching

arXiv:2603.04223v1. Announce Type: cross. Abstract: We introduce Latent Space Distribution Matching (LSDM), a novel framework for semi-supervised generative modeling of conditional distributions. LSDM operates in two stages: (i) learning a low-dimensional latent space from both paired and unpaired data, and…

List Sample Compression and Uniform Convergence

arXiv:2403.10889v2. Announce Type: replace. Abstract: List learning is a variant of supervised classification where the learner outputs multiple plausible labels for each instance rather than just one. We investigate classical principles related to generalization within the context of list learning.…

A Short Note on a Variant of the Squint Algorithm

arXiv:2603.03409v1. Announce Type: new. Abstract: This short note describes a simple variant of the Squint algorithm of Koolen and Van Erven [2015] for the classic expert problem. Via an equally simple modification of their proof, we prove that this variant…

[Re] FairDICE: A Gap Between Theory And Practice

arXiv:2603.03454v1. Announce Type: new. Abstract: Offline Reinforcement Learning (RL) is an emerging field of RL in which policies are learned solely from demonstrations. Within offline RL, some environments involve balancing multiple objectives, but existing multi-objective offline RL algorithms do not…