Archives AI News

STAF: Leveraging LLMs for Automated Attack Tree-Based Security Test Generation

arXiv:2509.20190v1 Announce Type: cross Abstract: In modern automotive development, security testing is critical for safeguarding systems against increasingly advanced threats. Attack trees are widely used to systematically represent potential attack vectors, but generating comprehensive test cases from these trees remains…

September 25, 2025

Analysis of approximate linear programming solution to Markov decision problem with log barrier function

arXiv:2509.19800v1 Announce Type: new Abstract: There are two primary approaches to solving Markov decision problems (MDPs): dynamic programming based on the Bellman equation and linear programming (LP). Dynamic programming methods are the most widely used and form the foundation of…

September 25, 2025

DRES: Benchmarking LLMs for Disfluency Removal

arXiv:2509.20321v1 Announce Type: cross Abstract: Disfluencies — such as “um,” “uh,” interjections, parentheticals, and edited statements — remain a persistent challenge for speech-driven systems, degrading accuracy in command interpretation, summarization, and conversational agents. We introduce DRES (Disfluency Removal Evaluation Suite),…

September 25, 2025

LatentGuard: Controllable Latent Steering for Robust Refusal of Attacks and Reliable Response Generation

arXiv:2509.19839v1 Announce Type: new Abstract: Achieving robust safety alignment in large language models (LLMs) while preserving their utility remains a fundamental challenge. Existing approaches often struggle to balance comprehensive safety with fine-grained controllability at the representation level. We introduce LATENTGUARD,…

September 25, 2025

Exploring Explainable Multi-agent MCTS-minimax Hybrids in Board Game Using Process Mining

arXiv:2503.23326v3 Announce Type: replace Abstract: Monte-Carlo Tree Search (MCTS) is a family of sampling-based search algorithms widely used for online planning in sequential decision-making domains and at the heart of many recent advances in artificial intelligence. Understanding the behavior of…

September 25, 2025

CON-QA: Privacy-Preserving QA using cloud LLMs in Contract Domain

arXiv:2509.19925v1 Announce Type: new Abstract: As enterprises increasingly integrate cloud-based large language models (LLMs) such as ChatGPT and Gemini into their legal document workflows, protecting sensitive contractual information – including Personally Identifiable Information (PII) and commercially sensitive clauses – has…

September 25, 2025

TALEC: Teach Your LLM to Evaluate in Specific Domain with In-house Criteria by Criteria Division and Zero-shot Plus Few-shot

arXiv:2407.10999v2 Announce Type: replace-cross Abstract: With the rapid development of large language models (LLMs), evaluating LLMs has become increasingly important. Evaluating text generation tasks such as summarization and article creation is very difficult. Especially in specific application domains (e.g.,…

September 25, 2025

Embodied AI: From LLMs to World Models

arXiv:2509.20021v1 Announce Type: new Abstract: Embodied Artificial Intelligence (AI) is an intelligent system paradigm for achieving Artificial General Intelligence (AGI), serving as the cornerstone for various applications and driving the evolution from cyberspace to physical systems. Recent breakthroughs in Large…

September 25, 2025

Bridging Information Gaps with Comprehensive Answers: Improving the Diversity and Informativeness of Follow-Up Questions

arXiv:2502.17715v2 Announce Type: replace-cross Abstract: Generating diverse follow-up questions that uncover missing information remains challenging for conversational agents, particularly when they run on small, locally hosted models. To address this, we develop an information-gap-driven knowledge distillation pipeline in which a…

September 25, 2025

MACD: Multi-Agent Clinical Diagnosis with Self-Learned Knowledge for LLM

arXiv:2509.20067v1 Announce Type: new Abstract: Large language models (LLMs) have demonstrated notable potential in medical applications, yet they face substantial challenges in handling complex real-world clinical diagnoses using conventional prompting methods. Current prompt engineering and multi-agent approaches typically optimize isolated…

September 25, 2025