FRIT: Using Causal Importance to Improve Chain-of-Thought Faithfulness

arXiv:2509.13334v1 Announce Type: new Abstract: Chain-of-thought (CoT) reasoning has emerged as a powerful tool for improving large language model performance on complex tasks, but recent work shows that reasoning steps often fail to causally influence the final answer, creating brittle…

Asterisk Operator

arXiv:2509.13364v1 Announce Type: new Abstract: We propose the Asterisk Operator ($\ast$-operator), a novel unified framework for abstract reasoning based on Adjacency-Structured Parallel Propagation (ASPP). The operator formalizes structured reasoning tasks as local, parallel state evolution processes guided by implicit relational…

AppAgent v2: Advanced Agent for Flexible Mobile Interactions

arXiv:2408.11824v4 Announce Type: replace-cross Abstract: With the advancement of Multimodal Large Language Models (MLLM), LLM-driven visual agents are increasingly impacting software interfaces, particularly those with graphical user interfaces. This work introduces a novel LLM-based multimodal agent framework for mobile devices.…

$Agent^2$: An Agent-Generates-Agent Framework for Reinforcement Learning Automation

arXiv:2509.13368v1 Announce Type: new Abstract: Reinforcement learning agent development traditionally requires extensive expertise and lengthy iterations, often resulting in high failure rates and limited accessibility. This paper introduces $Agent^2$, a novel agent-generates-agent framework that achieves fully automated RL agent design…

CoPL: Collaborative Preference Learning for Personalizing LLMs

arXiv:2503.01658v2 Announce Type: replace-cross Abstract: Personalizing large language models (LLMs) is important for aligning outputs with diverse user preferences, yet existing methods struggle with flexibility and generalization. We propose CoPL (Collaborative Preference Learning), a graph-based collaborative filtering framework that models…

The Art of Saying “Maybe”: A Conformal Lens for Uncertainty Benchmarking in VLMs

arXiv:2509.13379v1 Announce Type: new Abstract: Vision-Language Models (VLMs) have achieved remarkable progress in complex visual understanding across scientific and reasoning tasks. While performance benchmarking has advanced our understanding of these capabilities, the critical dimension of uncertainty quantification has received insufficient…

MythTriage: Scalable Detection of Opioid Use Disorder Myths on a Video-Sharing Platform

arXiv:2506.00308v2 Announce Type: replace-cross Abstract: Understanding the prevalence of misinformation in health topics online can inform public health policies and interventions. However, measuring such misinformation at scale remains a challenge, particularly for high-stakes but understudied topics like opioid-use disorder (OUD)–a…

From Next Token Prediction to (STRIPS) World Models — Preliminary Results

arXiv:2509.13389v1 Announce Type: new Abstract: We consider the problem of learning propositional STRIPS world models from action traces alone, using a deep learning architecture (transformers) and gradient descent. The task is cast as a supervised next token prediction problem where…

Zero-Knowledge Proofs in Sublinear Space

arXiv:2509.05326v2 Announce Type: replace-cross Abstract: Zero-knowledge proofs allow verification of computations without revealing private information. However, existing systems require memory proportional to the computation size, which has historically limited use in large-scale applications and on mobile and edge devices. We…

SteeringControl: Holistic Evaluation of Alignment Steering in LLMs

arXiv:2509.13450v1 Announce Type: new Abstract: We introduce SteeringControl, a benchmark for evaluating representation steering methods across core alignment objectives–bias, harmful generation, and hallucination–and their effects on secondary behaviors such as sycophancy and commonsense morality. While prior alignment work often highlights…