Archives AI News

Ai2 Researchers are Changing the Benchmarking Game by Introducing Fluid Benchmarking that Enhances Evaluation along Several Dimensions

A team of researchers from the Allen Institute for Artificial Intelligence (Ai2), the University of Washington, and CMU introduces Fluid Benchmarking, an adaptive LLM evaluation method that replaces static accuracy with 2-parameter IRT ability estimation and Fisher-information–driven item selection. By asking only the…

September 17, 2025

Learning to Generate Pointing Gestures in Situated Embodied Conversational Agents

arXiv:2509.12507v1 (announce type: cross). Abstract: One of the main goals of robotics and intelligent agent research is to enable natural communication with humans in physically situated settings. While recent work has focused on verbal modes such as language and speech,…

September 17, 2025

Comprehend, Divide, and Conquer: Feature Subspace Exploration via Multi-Agent Hierarchical Reinforcement Learning

arXiv:2504.17356v2 (announce type: replace-cross). Abstract: Feature selection aims to preprocess the target dataset, find an optimal and most streamlined feature subset, and enhance the downstream machine learning task. Among filter, wrapper, and embedded-based approaches, the reinforcement learning (RL)-based subspace exploration…

September 17, 2025

Minimax optimal transfer learning for high-dimensional additive regression

arXiv:2509.06308v2 (announce type: replace-cross). Abstract: This paper studies high-dimensional additive regression under the transfer learning framework, where one observes samples from a target population together with auxiliary samples from different but potentially related regression models. We first introduce a target-only…

September 17, 2025

LoRA-PAR: A Flexible Dual-System LoRA Partitioning Approach to Efficient LLM Fine-Tuning

arXiv:2507.20999v3 (announce type: replace). Abstract: Large-scale generative models like DeepSeek-R1 and OpenAI-O1 benefit substantially from chain-of-thought (CoT) reasoning, yet pushing their performance typically requires vast data, large model sizes, and full-parameter fine-tuning. While parameter-efficient fine-tuning (PEFT) helps reduce cost, most…

September 17, 2025

Concurrent Linguistic Error Detection (CLED): a New Methodology for Error Detection in Large Language Models

arXiv:2403.16393v2 (announce type: replace-cross). Abstract: The wide adoption of large language models (LLMs) makes their dependability a pressing concern. Detecting errors is the first step toward mitigating their impact on a system, and thus efficient error detection for LLMs…

September 17, 2025

Dual-Stage Reweighted MoE for Long-Tailed Egocentric Mistake Detection

arXiv:2509.12990v1 (announce type: cross). Abstract: In this report, we address the problem of determining whether a user performs an action incorrectly from egocentric video data. To handle the challenges posed by subtle and infrequent mistakes, we propose a Dual-Stage Reweighted…

September 17, 2025

Empowering Time Series Analysis with Foundation Models: A Comprehensive Survey

arXiv:2405.02358v3 (announce type: replace). Abstract: Time series data are ubiquitous across diverse real-world applications, making time series analysis critically important. Traditional approaches are largely task-specific, offering limited functionality and poor transferability. In recent years, foundation models have revolutionized NLP and…

September 17, 2025