Archives AI News

GRPO-λ: Credit Assignment improves LLM Reasoning

arXiv:2510.00194v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly deployed for tasks requiring complex reasoning, prompting significant interest in improving their reasoning abilities through post-training. In particular, RL-based methods using verifiable rewards, such as the state-of-the-art GRPO, have shown…
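The "verifiable reward" setup the abstract refers to can be illustrated with GRPO's core step: each completion's binary correctness reward is normalized against the group of completions sampled for the same prompt. A minimal sketch (values illustrative; not the paper's GRPO-λ credit-assignment scheme, which the truncated abstract does not describe):

```python
import numpy as np

def grpo_advantages(rewards):
    """Group-relative advantages as in GRPO: normalize each sampled
    completion's reward against the group drawn for the same prompt."""
    r = np.asarray(rewards, dtype=float)
    std = r.std()
    if std == 0:
        return np.zeros_like(r)  # all completions tied: no learning signal
    return (r - r.mean()) / std

# Verifiable rewards for 4 completions of one prompt (1 = verified correct)
adv = grpo_advantages([1.0, 0.0, 0.0, 1.0])
print(adv)  # correct completions get positive advantage, incorrect negative
```

Because the baseline is the group mean rather than a learned value function, no critic network is needed.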

LoRAFusion: Efficient LoRA Fine-Tuning for LLMs

arXiv:2510.00206v1 Announce Type: new Abstract: Low-Rank Adaptation (LoRA) has become the leading Parameter-Efficient Fine-Tuning (PEFT) method for Large Language Models (LLMs), as it significantly reduces GPU memory usage while maintaining competitive fine-tuned model quality on downstream tasks. Despite these benefits,…
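The memory saving the abstract mentions comes from LoRA's structure: the pretrained weight stays frozen and only a low-rank correction is trained. A minimal sketch of that idea (dimensions and rank are illustrative, not LoRAFusion's specifics):

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, r = 64, 64, 4          # layer dims and LoRA rank (illustrative)
W = rng.normal(size=(d, k))  # frozen pretrained weight

# LoRA trains only a low-rank update delta_W = B @ A
A = rng.normal(scale=0.01, size=(r, k))  # trainable down-projection
B = np.zeros((d, r))                     # trainable up-projection, zero-init
                                         # so delta_W starts at exactly 0

x = rng.normal(size=(k,))
y = W @ x + B @ (A @ x)  # forward: frozen path plus low-rank correction

# Trainable parameter count: r*(d+k) for LoRA vs d*k for full fine-tuning
print(r * (d + k), "vs", d * k)
```

With the zero-initialized `B`, the adapted layer is numerically identical to the pretrained one at step 0, so fine-tuning starts from the base model's behavior.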

Training-free LLM Verification via Recycling Few-shot Examples

arXiv:2506.17251v2 Announce Type: replace Abstract: Although LLMs have achieved remarkable performance, the inherent stochasticity of their reasoning process and varying conclusions present significant challenges. Majority voting or Best-of-N with external verification models has been explored to find the most promising…
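Majority voting, the baseline the abstract contrasts against, amounts to sampling N reasoning traces and returning the most frequent final answer. A minimal sketch (this is the baseline, not the paper's recycling-based verification method):

```python
from collections import Counter

def majority_vote(answers):
    """Self-consistency / majority voting: pick the most frequent final
    answer across N independently sampled reasoning traces."""
    counts = Counter(answers)
    answer, _ = counts.most_common(1)[0]
    return answer

# Final answers extracted from five sampled chains of thought
best = majority_vote(["42", "41", "42", "42", "17"])
print(best)  # -> "42"
```

Best-of-N differs only in the selection rule: instead of counting, an external verifier scores each candidate and the highest-scoring one is returned.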

Directed-MAML: Meta Reinforcement Learning Algorithm with Task-directed Approximation

arXiv:2510.00212v1 Announce Type: new Abstract: Model-Agnostic Meta-Learning (MAML) is a versatile meta-learning framework applicable to both supervised learning and reinforcement learning (RL). However, applying MAML to meta-reinforcement learning (meta-RL) presents notable challenges. First, MAML relies on second-order gradient computations, leading…
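The second-order gradient cost the abstract points to can be seen in a tiny analytic example: with a quadratic per-task loss, MAML's outer gradient must differentiate through the inner update, which introduces an extra factor that first-order MAML (FOMAML) drops. A minimal sketch under that assumption (scalar parameter, hand-derived gradients; not Directed-MAML's approximation):

```python
import numpy as np

alpha, beta = 0.1, 0.05             # inner / outer learning rates (illustrative)
tasks = np.array([-1.0, 0.0, 2.0])  # each task: minimize (theta - c)^2

theta = 5.0
for _ in range(200):
    meta_grad = 0.0
    for c in tasks:
        g_inner = 2 * (theta - c)              # inner-loop gradient
        theta_prime = theta - alpha * g_inner  # one adaptation step
        # Chain rule through the inner step: d(theta_prime)/d(theta)
        # = 1 - 2*alpha. This is the second-order term MAML needs;
        # FOMAML would replace it with 1.
        meta_grad += 2 * (theta_prime - c) * (1 - 2 * alpha)
    theta -= beta * meta_grad / len(tasks)

print(round(theta, 3))  # converges toward the mean of the task optima (1/3)
```

In deep networks this chain-rule factor becomes a Hessian-vector product per task, which is the computational burden that motivates first-order and task-directed approximations.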

Combating Noisy Labels via Dynamic Connection Masking

arXiv:2508.09697v2 Announce Type: replace Abstract: Noisy labels are inevitable in real-world scenarios. Due to the strong capacity of deep neural networks to memorize corrupted labels, these noisy labels can cause significant performance degradation. Existing research on mitigating the negative effects…
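The memorization effect the abstract describes, where networks fit clean labels early and corrupted ones later, underlies a common small-loss baseline for noisy-label training. A minimal sketch of that baseline (explicitly not the paper's Dynamic Connection Masking, whose mechanism the truncated abstract does not describe):

```python
import numpy as np

def small_loss_mask(losses, noise_rate):
    """Small-loss selection, a standard noisy-label baseline: keep the
    (1 - noise_rate) fraction of samples with the smallest loss, on the
    assumption that high-loss samples are likely mislabeled."""
    losses = np.asarray(losses, dtype=float)
    n_keep = int(np.ceil((1.0 - noise_rate) * len(losses)))
    keep = np.argsort(losses)[:n_keep]
    mask = np.zeros(len(losses), dtype=bool)
    mask[keep] = True
    return mask

# Per-sample losses; the two large ones are likely mislabeled
mask = small_loss_mask([0.1, 2.5, 0.2, 3.0, 0.15], noise_rate=0.4)
print(mask)  # high-loss samples are excluded from the gradient update
```

The mask would typically be applied per mini-batch, zeroing out the loss contribution of the suspected-noisy samples before backpropagation.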