Archives AI News

Mirror-Consistency: Harnessing Inconsistency in Majority Voting

arXiv:2410.10857v2 (Announce Type: replace-cross). Abstract: Self-Consistency, a widely-used decoding strategy, significantly boosts the reasoning capabilities of Large Language Models (LLMs). However, it depends on the plurality voting rule, which focuses on the most frequent answer while overlooking all other minority…
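To make the baseline concrete, here is a minimal sketch of plain plurality voting over final answers taken from several sampled reasoning paths. It assumes that sampling and answer extraction happen elsewhere. `plurality_vote` is a hypothetical helper, not code from the paper. It also returns the minority answers that the abstract says plurality voting overlooks. How Mirror-Consistency actually uses that information is not shown here.

```python
from collections import Counter


def plurality_vote(answers):
    """Plurality voting over final answers (hypothetical helper).

    Returns (winner, winner's vote share, minority answer counts).
    Ties go to the answer seen first, because Counter.most_common
    orders equal counts by first occurrence.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(answers)
    winner, votes = counts.most_common(1)[0]
    # Everything except the winner: the minority answers that plain
    # plurality voting discards when it picks the final output.
    minority = {a: c for a, c in counts.items() if a != winner}
    return winner, votes / len(answers), minority


if __name__ == "__main__":
    samples = ["42", "41", "42", "40", "42", "41"]
    print(plurality_vote(samples))
```

With these six illustrative samples, "42" wins with half the votes, and the 41/40 answers are returned separately as the minority that would otherwise be ignored.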

THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning

arXiv:2509.13761v1 (Announce Type: new). Abstract: Large Language Models (LLMs) have made remarkable progress in mathematical reasoning but still struggle with high-precision tasks such as numerical computation and formal symbolic manipulation. Integrating external tools has emerged as a promising approach…

Forget What You Know about LLMs Evaluations — LLMs are Like a Chameleon

arXiv:2502.07445v2 (Announce Type: replace-cross). Abstract: Large language models (LLMs) often appear to excel on public benchmarks, but these high scores may mask an overreliance on dataset-specific surface cues rather than true language understanding. We introduce the Chameleon Benchmark Overfit Detector…

Pareto-Grid-Guided Large Language Models for Fast and High-Quality Heuristics Design in Multi-Objective Combinatorial Optimization

arXiv:2507.20923v2 (Announce Type: replace-cross). Abstract: Multi-objective combinatorial optimization problems (MOCOP) frequently arise in practical applications that require the simultaneous optimization of conflicting objectives. Although traditional evolutionary algorithms can be effective, they typically depend on domain knowledge and repeated parameter tuning,…

CrowdAgent: Multi-Agent Managed Multi-Source Annotation System

arXiv:2509.14030v1 (Announce Type: new). Abstract: High-quality annotated data is a cornerstone of modern Natural Language Processing (NLP). While recent methods have begun to leverage diverse annotation sources, including Large Language Models (LLMs), Small Language Models (SLMs), and human experts, they often focus narrowly…