Exploring the Performance of ML/DL Architectures on the MNIST-1D Dataset

arXiv:2602.13348v1 Announce Type: new Abstract: Small datasets like MNIST have historically been instrumental in advancing machine learning research by providing a controlled environment for rapid experimentation and model evaluation. However, their simplicity often limits their utility for distinguishing between advanced…

ShapBPT: Image Feature Attributions Using Data-Aware Binary Partition Trees

arXiv:2602.07047v2 Announce Type: replace-cross Abstract: Pixel-level feature attributions are an important tool in eXplainable AI for Computer Vision (XCV), providing visual insights into how image features influence model predictions. The Owen formula for hierarchical Shapley values has been widely used…

Finding Highly Interpretable Prompt-Specific Circuits in Language Models

arXiv:2602.13483v1 Announce Type: new Abstract: Understanding the internal circuits that language models use to solve tasks remains a central challenge in mechanistic interpretability. Most prior work identifies circuits at the task level by averaging across many prompts, implicitly assuming a…

Solving Inverse Parametrized Problems via Finite Elements and Extreme Learning Networks

arXiv:2602.14757v1 Announce Type: cross Abstract: We develop an interpolation-based reduced-order modeling framework for parameter-dependent partial differential equations arising in control, inverse problems, and uncertainty quantification. The solution is discretized in the physical domain using finite element methods, while the dependence…

Online Posterior Sampling with a Diffusion Prior

arXiv:2410.03919v2 Announce Type: replace Abstract: Posterior sampling in contextual bandits with a Gaussian prior can be implemented exactly or approximately using the Laplace approximation. The Gaussian prior is computationally efficient but it cannot describe complex distributions. In this work, we…

Preventing Rank Collapse in Federated Low-Rank Adaptation with Client Heterogeneity

arXiv:2602.13486v1 Announce Type: new Abstract: Federated low-rank adaptation (FedLoRA) has facilitated communication-efficient and privacy-preserving fine-tuning of foundation models for downstream tasks. In practical federated learning scenarios, client heterogeneity in system resources and data distributions motivates heterogeneous LoRA ranks across clients.…

Calibrated Predictive Lower Bounds on Time-to-Unsafe-Sampling in LLMs

arXiv:2506.13593v5 Announce Type: replace Abstract: We introduce time-to-unsafe-sampling, a novel safety measure for generative models, defined as the number of generations required by a large language model (LLM) to trigger an unsafe (e.g., toxic) response. While providing a new dimension…

TrasMuon: Trust-Region Adaptive Scaling for Orthogonalized Momentum Optimizers

arXiv:2602.13498v1 Announce Type: new Abstract: Muon-style optimizers leverage Newton-Schulz (NS) iterations to orthogonalize updates, yielding update geometries that often outperform Adam-series methods. However, this orthogonalization discards magnitude information, rendering training sensitive to step-size hyperparameters and vulnerable to high-energy bursts. To…