Archives AI News

MISA: Memory-Efficient LLMs Optimization with Module-wise Importance Sampling

arXiv:2511.00056v1 Announce Type: new Abstract: The substantial memory demands of pre-training and fine-tuning large language models (LLMs) require memory-efficient optimization algorithms. One promising approach is layer-wise optimization, which treats each transformer block as a single layer and optimizes it sequentially,…

November 4, 2025

Localist LLMs — A Mathematical Framework for Dynamic Locality Control

arXiv:2510.09338v2 Announce Type: replace-cross Abstract: We present a novel framework for training large language models with continuously adjustable internal representations that span the full spectrum from localist (interpretable, rule-based) to distributed (generalizable, efficient) encodings. The key innovation is a locality…

November 4, 2025

Automatically Finding Rule-Based Neurons in OthelloGPT

arXiv:2511.00059v1 Announce Type: new Abstract: OthelloGPT, a transformer trained to predict valid moves in Othello, provides an ideal testbed for interpretability research. The model is complex enough to exhibit rich computational patterns, yet grounded in rule-based game logic that enables…

November 4, 2025

Extremal Contours: Gradient-driven contours for compact visual attribution

arXiv:2511.01411v1 Announce Type: cross Abstract: Faithful yet compact explanations for vision models remain a challenge, as commonly used dense perturbation masks are often fragmented and overfitted, needing careful post-processing. Here, we present a training-free explanation method that replaces dense masks…

November 4, 2025

EVINGCA: Adaptive Graph Clustering with Evolving Neighborhood Statistics

arXiv:2511.00064v1 Announce Type: new Abstract: Clustering algorithms often rely on restrictive assumptions: K-Means and Gaussian Mixtures presuppose convex, Gaussian-like clusters, while DBSCAN and HDBSCAN capture non-convexity but can be highly sensitive. I introduce EVINGCA (Evolving Variance-Informed Nonparametric Graph Construction Algorithm),…

November 4, 2025

Probabilistic Robustness for Free? Revisiting Training via a Benchmark

arXiv:2511.01724v1 Announce Type: cross Abstract: Deep learning models are notoriously vulnerable to imperceptible perturbations. Most existing research centers on adversarial robustness (AR), which evaluates models under worst-case scenarios by examining the existence of deterministic adversarial examples (AEs). In contrast, probabilistic…

November 4, 2025

Lightning-prediction tool could help protect the planes of the future

The new approach maps aircraft sections most vulnerable to lightning, including on planes with experimental designs.

November 4, 2025

Variational Inference in Location-Scale Families: Exact Recovery of the Mean and Correlation Matrix

arXiv:2410.11067v3 Announce Type: replace-cross Abstract: Given an intractable target density $p$, variational inference (VI) attempts to find the best approximation $q$ from a tractable family $Q$. This is typically done by minimizing the exclusive Kullback-Leibler divergence, $text{KL}(q||p)$. In practice, $Q$…

November 4, 2025

New Encoders for German Trained from Scratch: Comparing ModernGBERT with Converted LLM2Vec Models

arXiv:2505.13136v2 Announce Type: replace-cross Abstract: Encoders remain essential for efficient German NLP and NLU scenarios despite the rise of decoder-only LLMs. This work studies two routes to high-quality German encoders under identical data and training constraints: 1) training from scratch…

November 4, 2025

Scientific Machine Learning with Kolmogorov-Arnold Networks

arXiv:2507.22959v2 Announce Type: replace Abstract: The field of scientific machine learning, which originally utilized multilayer perceptrons (MLPs), is increasingly adopting Kolmogorov-Arnold Networks (KANs) for data encoding. This shift is driven by the limitations of MLPs, including poor interpretability, fixed activation…

November 4, 2025