Archives AI News

Beyond Introspection: Reinforcing Thinking via Externalist Behavioral Feedback

arXiv:2501.01457v3 Announce Type: replace Abstract: While inference-time thinking allows Large Language Models (LLMs) to address complex problems, the extended thinking process can be unreliable or inconsistent because of the model’s probabilistic nature, especially near its knowledge boundaries. Existing approaches attempt…

Spatio-Temporal Hierarchical Causal Models

arXiv:2511.20558v2 Announce Type: replace-cross Abstract: The abundance of fine-grained spatio-temporal data, such as traffic sensor networks, offers vast opportunities for scientific discovery. However, inferring causal relationships from such observational data remains challenging, particularly due to unobserved confounders that are specific…

Physics Steering: Causal Control of Cross-Domain Concepts in a Physics Foundation Model

arXiv:2511.20798v2 Announce Type: replace Abstract: Recent advances in mechanistic interpretability have revealed that large language models (LLMs) develop internal representations corresponding not only to concrete entities but also to distinct, human-understandable abstract concepts and behaviour. Moreover, these hidden features can be…

Gradient-Based Program Repair: Fixing Bugs in Continuous Program Spaces

arXiv:2505.17703v2 Announce Type: replace-cross Abstract: Automatic program repair seeks to generate correct code from buggy programs, with most approaches searching for the correct program in a discrete, symbolic space of source code tokens. This symbolic search is fundamentally limited by its…

Lightweight ML-Based Air Quality Prediction for IoT and Embedded Applications

arXiv:2511.21857v1 Announce Type: new Abstract: This study investigates the effectiveness and efficiency of two variants of the XGBoost regression model, the full-capacity and lightweight (tiny) versions, for predicting the concentrations of carbon monoxide (CO) and nitrogen dioxide (NO2). Using the…

The Double-Edged Nature of the Rashomon Set for Trustworthy Machine Learning

arXiv:2511.21799v1 Announce Type: new Abstract: Real-world machine learning (ML) pipelines rarely produce a single model; instead, they produce a Rashomon set of many near-optimal ones. We show that this multiplicity reshapes key aspects of trustworthiness. At the individual-model level, sparse…

Multiclass threshold-based classification and model evaluation

arXiv:2511.21794v1 Announce Type: new Abstract: In this paper, we introduce a threshold-based framework for multiclass classification that generalizes the standard argmax rule. This is done by replacing the probabilistic interpretation of softmax outputs with a geometric one on the multidimensional…