Archives AI News

Mirror-Consistency: Harnessing Inconsistency in Majority Voting

arXiv:2410.10857v2 Announce Type: replace-cross Abstract: Self-Consistency, a widely-used decoding strategy, significantly boosts the reasoning capabilities of Large Language Models (LLMs). However, it depends on the plurality voting rule, which focuses on the most frequent answer while overlooking all other minority…

September 18, 2025

THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning

arXiv:2509.13761v1 Announce Type: new Abstract: Large Language Models (LLMs) have made remarkable progress in mathematical reasoning, but still continue to struggle with high-precision tasks like numerical computation and formal symbolic manipulation. Integrating external tools has emerged as a promising approach…

September 18, 2025

Forget What You Know about LLMs Evaluations — LLMs are Like a Chameleon

arXiv:2502.07445v2 Announce Type: replace-cross Abstract: Large language models (LLMs) often appear to excel on public benchmarks, but these high scores may mask an overreliance on dataset-specific surface cues rather than true language understanding. We introduce the Chameleon Benchmark Overfit Detector…

September 18, 2025

MIRA: Empowering One-Touch AI Services on Smartphones with MLLM-based Instruction Recommendation

arXiv:2509.13773v1 Announce Type: new Abstract: The rapid advancement of generative AI technologies is driving the integration of diverse AI-powered services into smartphones, transforming how users interact with their devices. To simplify access to predefined AI services, this paper introduces MIRA,…

September 18, 2025

Catch Me if You Search: When Contextual Web Search Results Affect the Detection of Hallucinations

arXiv:2504.01153v4 Announce Type: replace-cross Abstract: While we increasingly rely on large language models (LLMs) for various tasks, these models are known to produce inaccurate content or ‘hallucinations’ with potentially disastrous consequences. The recent integration of web search results into LLMs…

September 18, 2025

An Exhaustive DPLL Approach to Model Counting over Integer Linear Constraints with Simplification Techniques

arXiv:2509.13880v1 Announce Type: new Abstract: Linear constraints are one of the most fundamental constraints in fields such as computer science, operations research and optimization. Many applications reduce to the task of model counting over integer linear constraints (MCILC). In this…

September 18, 2025

From n-gram to Attention: How Model Architectures Learn and Propagate Bias in Language Modeling

arXiv:2505.12381v2 Announce Type: replace-cross Abstract: Current research on bias in language models (LMs) predominantly focuses on data quality, with significantly less attention paid to model architecture and temporal influences of data. Even more critically, few studies systematically investigate the origins…

September 18, 2025

Exploring Major Transitions in the Evolution of Biological Cognition With Artificial Neural Networks

arXiv:2509.13968v1 Announce Type: new Abstract: Transitional accounts of evolution emphasise a few changes that shape what is evolvable, with dramatic consequences for derived lineages. More recently it has been proposed that cognition might also have evolved via a series of…

September 18, 2025

Pareto-Grid-Guided Large Language Models for Fast and High-Quality Heuristics Design in Multi-Objective Combinatorial Optimization

arXiv:2507.20923v2 Announce Type: replace-cross Abstract: Multi-objective combinatorial optimization problems (MOCOP) frequently arise in practical applications that require the simultaneous optimization of conflicting objectives. Although traditional evolutionary algorithms can be effective, they typically depend on domain knowledge and repeated parameter tuning,…

September 18, 2025

CrowdAgent: Multi-Agent Managed Multi-Source Annotation System

arXiv:2509.14030v1 Announce Type: new Abstract: High-quality annotated data is a cornerstone of modern Natural Language Processing (NLP). While recent methods begin to leverage diverse annotation sources, including Large Language Models (LLMs), Small Language Models (SLMs), and human experts, they often focus narrowly…

September 18, 2025