Archives AI News

The NazoNazo Benchmark: A Cost-Effective and Extensible Test of Insight-Based Reasoning in LLMs

arXiv:2509.14704v1 (Announce Type: new)
Abstract: Benchmark saturation and contamination undermine confidence in LLM evaluation. We present Nazonazo, a cost-effective and extensible benchmark built from Japanese children’s riddles to test insight-based reasoning. Items are short (mostly one sentence), require no specialized…

September 19, 2025

From Capabilities to Performance: Evaluating Key Functional Properties of LLM Architectures in Penetration Testing

arXiv:2509.14289v1 (Announce Type: new)
Abstract: Large language models (LLMs) are increasingly used to automate or augment penetration testing, but their effectiveness and reliability across attack phases remain unclear. We present a comprehensive evaluation of multiple LLM-based agents, from single-agent to…

September 19, 2025

Enhancing Retrieval Augmentation via Adversarial Collaboration

arXiv:2509.14750v1 (Announce Type: new)
Abstract: Retrieval-augmented Generation (RAG) is a prevalent approach for domain-specific LLMs, yet it is often plagued by “Retrieval Hallucinations”, a phenomenon where fine-tuned models fail to recognize and act upon poor-quality retrieved documents, thus undermining performance. To…

September 19, 2025

Communication Efficient Split Learning of ViTs with Attention-based Double Compression

arXiv:2509.15058v1 (Announce Type: cross)
Abstract: This paper proposes a novel communication-efficient Split Learning (SL) framework, named Attention-based Double Compression (ADC), which reduces the communication overhead required for transmitting intermediate Vision Transformer activations during the SL training process. ADC incorporates two…

September 19, 2025

OpenLens AI: Fully Autonomous Research Agent for Health Informatics

arXiv:2509.14778v1 (Announce Type: new)
Abstract: Health informatics research is characterized by diverse data modalities, rapid knowledge expansion, and the need to integrate insights across biomedical science, data analytics, and clinical practice. These characteristics make it particularly well-suited for agent-based approaches…

September 19, 2025

Automatic Mapping of AutomationML Files to Ontologies for Graph Queries and Validation

arXiv:2504.21694v2 (Announce Type: replace)
Abstract: AutomationML has seen widespread adoption as an open data exchange format in the automation domain. It is an open, vendor-neutral standard based on the Extensible Markup Language (XML). However, AutomationML extends XML with…

September 19, 2025

Sentinel Agents for Secure and Trustworthy Agentic AI in Multi-Agent Systems

arXiv:2509.14956v1 (Announce Type: new)
Abstract: This paper proposes a novel architectural framework aimed at enhancing security and reliability in multi-agent systems (MAS). A central component of this framework is a network of Sentinel Agents, functioning as a distributed security layer…

September 19, 2025

Set Contribution Functions for Quantitative Bipolar Argumentation and their Principles

arXiv:2509.14963v1 (Announce Type: new)
Abstract: We present functions that quantify the contribution of a set of arguments in quantitative bipolar argumentation graphs to (the final strength of) an argument of interest, a so-called topic. Our set contribution functions are generalizations…

September 19, 2025

3DS: Medical Domain Adaptation of LLMs via Decomposed Difficulty-based Data Selection

arXiv:2410.10901v2 (Announce Type: replace-cross)
Abstract: Large Language Models (LLMs) excel in general tasks but struggle in specialized domains like healthcare due to limited domain-specific knowledge. Supervised Fine-Tuning (SFT) data construction for domain adaptation often relies on heuristic methods, such as GPT-4 annotation or…

September 19, 2025

A Knowledge-driven Adaptive Collaboration of LLMs for Enhancing Medical Decision-making

arXiv:2509.14998v1 (Announce Type: new)
Abstract: Medical decision-making often involves integrating knowledge from multiple clinical specialties, typically achieved through multidisciplinary teams. Inspired by this collaborative process, recent work has leveraged large language models (LLMs) in multi-agent collaboration frameworks to emulate expert…

September 19, 2025