Archives AI News

DualTune: Decoupled Fine-Tuning for On-Device Agentic Systems

arXiv:2510.00229v1 Announce Type: new Abstract: The deployment of Large Language Models (LLMs) as agentic orchestrators has revolutionized task automation, but the need for privacy-preserving, cost-effective solutions demands on-device inference capabilities. However, local LLMs consistently underperform compared to frontier models in…

October 2, 2025

Mitigating Domain Shift in Federated Learning via Intra- and Inter-Domain Prototypes

arXiv:2501.08521v3 Announce Type: replace-cross Abstract: Federated Learning (FL) has emerged as a decentralized machine learning technique, allowing clients to train a global model collaboratively without sharing private data. However, most FL studies ignore the crucial challenge of heterogeneous domains where…

October 2, 2025

MAGIC-MASK: Multi-Agent Guided Inter-Agent Collaboration with Mask-Based Explainability for Reinforcement Learning

arXiv:2510.00274v1 Announce Type: new Abstract: Understanding the decision-making process of Deep Reinforcement Learning agents remains a key challenge for deploying these systems in safety-critical and multi-agent environments. While prior explainability methods like StateMask have advanced the identification of critical states,…

October 2, 2025

EFRame: Deeper Reasoning via Exploration-Filter-Replay Reinforcement Learning Framework

arXiv:2506.22200v4 Announce Type: replace-cross Abstract: Recent advances in reinforcement learning (RL) have significantly enhanced the reasoning capabilities of large language models (LLMs). Group Relative Policy Optimization (GRPO), a lightweight variant of Proximal Policy Optimization (PPO), improves efficiency but suffers from…

October 2, 2025

ICL Optimized Fragility

arXiv:2510.00300v1 Announce Type: new Abstract: ICL guides are known to improve task-specific performance, but their impact on cross-domain cognitive abilities remains unexplored. This study examines how ICL guides affect reasoning across different knowledge domains using six variants of the GPT-OSS:20b…

October 2, 2025

MobileLLM-R1: Exploring the Limits of Sub-Billion Language Model Reasoners with Open Training Recipes

arXiv:2509.24945v2 Announce Type: replace-cross Abstract: The paradigm shift in large language models (LLMs) from instinctive responses to chain-of-thought (CoT) reasoning has fueled two prevailing assumptions: (1) reasoning capabilities only emerge in sufficiently large models, and (2) such capabilities require training…

October 2, 2025

BiasBusters: Uncovering and Mitigating Tool Selection Bias in Large Language Models

arXiv:2510.00307v1 Announce Type: new Abstract: Agents backed by large language models (LLMs) often rely on external tools drawn from marketplaces where multiple providers offer functionally equivalent options. This raises a critical point concerning fairness: if selection is systematically biased, it…

October 2, 2025

MetaLogic: Robustness Evaluation of Text-to-Image Models via Logically Equivalent Prompts

arXiv:2510.00796v1 Announce Type: cross Abstract: Recent advances in text-to-image (T2I) models, especially diffusion-based architectures, have significantly improved the visual quality of generated images. However, these models continue to struggle with a critical limitation: maintaining semantic consistency when input prompts undergo…

October 2, 2025

When Hallucination Costs Millions: Benchmarking AI Agents in High-Stakes Adversarial Financial Markets

arXiv:2510.00332v1 Announce Type: new Abstract: We present CAIA, a benchmark exposing a critical blind spot in AI evaluation: the inability of state-of-the-art models to operate in adversarial, high-stakes environments where misinformation is weaponized and errors are irreversible. While existing benchmarks…

October 2, 2025

TubeDAgger: Reducing the Number of Expert Interventions with Stochastic Reach-Tubes

arXiv:2510.00906v1 Announce Type: cross Abstract: Interactive Imitation Learning deals with training a novice policy from expert demonstrations in an online fashion. The established DAgger algorithm trains a robust novice policy by alternating between interacting with the environment and retraining of…

October 2, 2025