Archives: AI News

OWL: Probing Cross-Lingual Recall of Memorized Texts via World Literature

arXiv:2505.22945v2 Announce Type: replace-cross Abstract: Large language models (LLMs) are known to memorize and recall English text from their pretraining data. However, the extent to which this ability generalizes to non-English languages or transfers across languages remains unclear. This paper…

October 8, 2025

AInstein: Assessing the Feasibility of AI-Generated Approaches to Research Problems

arXiv:2510.05432v1 Announce Type: new Abstract: Large language models (LLMs) demonstrate impressive capabilities across a wide range of tasks, yet it remains unclear whether such success reflects genuine reasoning or sophisticated recall. We introduce AInstein, a framework for testing whether LLMs…

October 8, 2025

RooseBERT: A New Deal For Political Language Modelling

arXiv:2508.03250v2 Announce Type: replace-cross Abstract: The increasing amount of political debates and politics-related discussions calls for the definition of novel computational methods to automatically analyse such content with the final goal of lightening up political deliberation to citizens. However, the…

October 8, 2025

NASP-T: A Fuzzy Neuro-Symbolic Transformer for Logic-Constrained Aviation Safety Report Classification

arXiv:2510.05451v1 Announce Type: new Abstract: Deep transformer models excel at multi-label text classification but often violate domain logic that experts consider essential, an issue of particular concern in safety-critical applications. We propose a hybrid neuro-symbolic framework that integrates Answer Set…

October 8, 2025

Do Code Models Suffer from the Dunning-Kruger Effect?

arXiv:2510.05457v1 Announce Type: new Abstract: As artificial intelligence systems increasingly collaborate with humans in creative and technical domains, questions arise about the cognitive boundaries and biases that shape our shared agency. This paper investigates the Dunning-Kruger Effect (DKE), the tendency…

October 8, 2025

AWARE, Beyond Sentence Boundaries: A Contextual Transformer Framework for Identifying Cultural Capital in STEM Narratives

arXiv:2510.04983v2 Announce Type: replace-cross Abstract: Identifying cultural capital (CC) themes in student reflections can offer valuable insights that help foster equitable learning environments in classrooms. However, themes such as aspirational goals or family support are often woven into narratives, rather…

October 8, 2025

VAL-Bench: Measuring Value Alignment in Language Models

arXiv:2510.05465v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used for tasks where outputs shape human decisions, so it is critical to test whether their responses reflect consistent human values. Existing benchmarks mostly track refusals or predefined safety…

October 8, 2025

EvalMORAAL: Interpretable Chain-of-Thought and LLM-as-Judge Evaluation for Moral Alignment in Large Language Models

arXiv:2510.05942v1 Announce Type: cross Abstract: We present EvalMORAAL, a transparent chain-of-thought (CoT) framework that uses two scoring methods (log-probabilities and direct ratings) plus a model-as-judge peer review to evaluate moral alignment in 20 large language models. We assess models on…

October 8, 2025

Vul-R2: A Reasoning LLM for Automated Vulnerability Repair

arXiv:2510.05480v1 Announce Type: new Abstract: The exponential increase in software vulnerabilities has created an urgent need for automatic vulnerability repair (AVR) solutions. Recent research has formulated AVR as a sequence generation problem and has leveraged large language models (LLMs) to…

October 8, 2025

Emergent AI Surveillance: Overlearned Person Re-Identification and Its Mitigation in Law Enforcement Context

arXiv:2510.06026v1 Announce Type: cross Abstract: Generic instance search models can dramatically reduce the manual effort required to analyze vast surveillance footage during criminal investigations by retrieving specific objects of interest to law enforcement. However, our research reveals an unintended emergent…

October 8, 2025