Archives AI News

BuilderBench — A benchmark for generalist agents

arXiv:2510.06288v1 Announce Type: new Abstract: Today’s AI models learn primarily through mimicry and sharpening, so it is not surprising that they struggle to solve problems beyond the limits set by existing data. To solve novel problems, agents should acquire skills…

October 9, 2025

Structure-Aware Compound-Protein Affinity Prediction via Graph Neural Network with Group Lasso Regularization

arXiv:2507.03318v2 Announce Type: replace-cross Abstract: Explainable artificial intelligence (XAI) approaches have been increasingly applied in drug discovery to learn molecular representations and identify substructures driving property predictions. However, building end-to-end explainable models for structure-activity relationship (SAR) modeling for compound property…

October 9, 2025

Beneficial Reasoning Behaviors in Agentic Search and Effective Post-training to Obtain Them

arXiv:2510.06534v1 Announce Type: new Abstract: Agentic search leverages large language models (LLMs) to interpret complex user information needs and execute a multi-step process of planning, searching, and synthesizing information to provide answers. This paradigm introduces unique challenges for LLMs’ reasoning…

October 9, 2025

Community-Centered Spatial Intelligence for Climate Adaptation at Nova Scotia’s Eastern Shore

arXiv:2509.01845v2 Announce Type: replace-cross Abstract: This paper presents an overview of a human-centered initiative aimed at strengthening climate resilience along Nova Scotia’s Eastern Shore. This region, a collection of rural villages with deep ties to the sea, faces existential threats…

October 9, 2025

Auto-Prompt Ensemble for LLM Judge

arXiv:2510.06538v1 Announce Type: new Abstract: We present a novel framework that improves the reliability of LLM judges by selectively augmenting the LLM with auxiliary evaluation dimensions. Existing LLM judges often miss crucial evaluation dimensions because they fail to recognize the implicit…

October 9, 2025

Bridging Reasoning to Learning: Unmasking Illusions using Complexity Out of Distribution Generalization

arXiv:2510.06274v1 Announce Type: new Abstract: Recent progress has pushed AI frontiers from pattern recognition tasks toward problems that require step-by-step, System 2-style reasoning, especially with large language models. Yet, unlike learning, where generalization and out-of-distribution (OoD)…

October 9, 2025

WebDART: Dynamic Decomposition and Re-planning for Complex Web Tasks

arXiv:2510.06587v1 Announce Type: new Abstract: Large language model (LLM) agents are becoming competent at straightforward web tasks, such as opening an item page or submitting a form, but still struggle with objectives that require long-horizon navigation, large-scale information…

October 9, 2025

Resolution scaling governs DINOv3 transfer performance in chest radiograph classification

arXiv:2510.07191v1 Announce Type: cross Abstract: Self-supervised learning (SSL) has advanced visual representation learning, but its value in chest radiography, a high-volume imaging modality with fine-grained findings, remains unclear. Meta’s DINOv3 extends earlier SSL models through Gram-anchored self-distillation. Whether these design…

October 9, 2025

Fine-Grained Emotion Recognition via In-Context Learning

arXiv:2510.06600v1 Announce Type: new Abstract: Fine-grained emotion recognition aims to identify the emotional type in queries through reasoning and decision-making processes, playing a crucial role in various systems. Recent methods use In-Context Learning (ICL), enhancing the representation of queries in…

October 9, 2025

Vibe Checker: Aligning Code Evaluation with Human Preference

arXiv:2510.07315v1 Announce Type: cross Abstract: Large Language Models (LLMs) have catalyzed vibe coding, where users leverage LLMs to generate and iteratively refine code through natural language interactions until it passes their vibe check. Vibe check is tied to real-world human…

October 9, 2025