Archives AI News

LSHBloom: Memory-efficient, Extreme-scale Document Deduplication

arXiv:2411.04257v3 Announce Type: replace Abstract: Contemporary large language model (LLM) training pipelines require the assembly of internet-scale databases full of text data from a variety of sources (e.g., web, academic, and publishers). Preprocessing these datasets via deduplication — detecting and…
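The LSHBloom title pairs locality-sensitive hashing (LSH) with Bloom filters. The sketch below shows one way those two pieces can fit together for near-duplicate detection. It assumes MinHash signatures over character shingles, split into bands, with one Bloom filter per band standing in for an LSH bucket table. This is a generic illustration, not the paper's implementation. The parameter values (128 hash functions, 32 bands, 5-character shingles, filter size) and all names (`BandedBloomDeduper`, `check_and_add`, etc.) are hypothetical.

```python
import hashlib
import re

NUM_PERM = 128            # hypothetical: number of MinHash hash functions
BANDS = 32                # hypothetical: number of LSH bands
ROWS = NUM_PERM // BANDS  # 4 signature rows per band


def shingles(text: str, k: int = 5) -> set:
    """Character k-shingles of lowercased, whitespace-normalized text."""
    t = re.sub(r"\s+", " ", text.lower()).strip()
    return {t[i:i + k] for i in range(max(1, len(t) - k + 1))}


def minhash(shingle_set: set) -> list:
    """For each seeded hash function, keep the minimum hash over all shingles."""
    sig = []
    for seed in range(NUM_PERM):
        sig.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{s}".encode("utf-8"), digest_size=8).digest(),
                "little",
            )
            for s in shingle_set
        ))
    return sig


class BloomFilter:
    """Fixed-size bit array; k bit positions per key via double hashing over SHA-256."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 5):
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes) -> list:
        d = hashlib.sha256(key).digest()
        h1 = int.from_bytes(d[:8], "little")
        h2 = int.from_bytes(d[8:16], "little") | 1  # odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, key: bytes) -> bool:
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(key))


class BandedBloomDeduper:
    """One Bloom filter per LSH band, used instead of a bucket-table LSH index."""

    def __init__(self):
        self.filters = [BloomFilter() for _ in range(BANDS)]

    def check_and_add(self, text: str) -> bool:
        """Return True if any band of `text` was probably seen before, then record its bands."""
        sig = minhash(shingles(text))
        keys = [
            (f"{b}:" + ",".join(map(str, sig[b * ROWS:(b + 1) * ROWS]))).encode()
            for b in range(BANDS)
        ]
        seen = any(key in f for key, f in zip(keys, self.filters))
        for key, f in zip(keys, self.filters):
            f.add(key)
        return seen
```

Because a Bloom filter only answers membership, a hit says that a similar document was probably seen, not which one, and filter false positives can occasionally flag a unique document. That trade of exactness for a fixed, small memory footprint per band is the general appeal of this kind of design.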

CLEF: Clinically-Guided Contrastive Learning for Electrocardiogram Foundation Models

arXiv:2512.02180v1 Announce Type: new Abstract: The electrocardiogram (ECG) is a key diagnostic tool in cardiovascular health. Single-lead ECG recording is integrated into both clinical-grade and consumer wearables. While self-supervised pretraining of foundation models on unlabeled ECGs improves diagnostic performance, existing…

AuroRA: Breaking Low-Rank Bottleneck of LoRA with Nonlinear Mapping

arXiv:2505.18738v2 Announce Type: replace Abstract: Low-Rank Adaptation (LoRA) is a widely adopted parameter-efficient fine-tuning (PEFT) method validated across NLP and CV domains. However, LoRA faces an inherent low-rank bottleneck: narrowing its performance gap with full fine-tuning requires increasing the rank…
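The low-rank bottleneck is easiest to see in the standard LoRA parameterization. This is the original LoRA formulation, not AuroRA's nonlinear variant, which the excerpt cuts off before describing. The frozen weight $W_0 \in \mathbb{R}^{d \times k}$ receives a trainable update factored through rank $r$:

```latex
W = W_0 + \Delta W = W_0 + BA,
\qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)

\operatorname{rank}(\Delta W) \le r,
\qquad \text{trainable parameters: } r(d + k) \;\text{ vs. }\; dk \text{ for a full update}
```

For example, with $d = k = 4096$ and $r = 8$, LoRA trains $8 \cdot 8192 = 65{,}536$ parameters per matrix versus $4096^2 = 16{,}777{,}216$ for a full update (1/256 as many), but $\Delta W$ can never exceed rank 8. Raising $r$ to close the gap with full fine-tuning grows the trainable parameter count linearly in $r$.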

Enforcing Orderedness to Improve Feature Consistency

arXiv:2512.02194v1 Announce Type: new Abstract: Sparse autoencoders (SAEs) have been widely used for interpretability of neural networks, but their learned features often vary across seeds and hyperparameter settings. We introduce Ordered Sparse Autoencoders (OSAE), which extend Matryoshka SAEs by (1)…
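OSAE is described as extending Matryoshka SAEs, which train nested prefixes of the latent dictionary so that the first $m$ latents alone must reconstruct the input. Below is a minimal sketch of that nested reconstruction objective. It assumes a ReLU encoder, a plain squared-error loss, and no sparsity penalty. It shows the Matryoshka baseline only, not OSAE's additions, which the excerpt truncates. The function names and `prefix_sizes` values are hypothetical.

```python
def relu_encode(x, W_enc, b_enc):
    """Latent activations f = ReLU(W_enc x + b_enc); W_enc has one row per latent."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W_enc, b_enc)]


def matryoshka_loss(x, W_enc, b_enc, W_dec, b_dec, prefix_sizes):
    """Sum, over each prefix size m, of the squared error when only latents 0..m-1 decode."""
    f = relu_encode(x, W_enc, b_enc)
    total = 0.0
    for m in prefix_sizes:
        x_hat = list(b_dec)
        for i in range(m):                       # only the first m latents contribute
            for j in range(len(x)):
                x_hat[j] += f[i] * W_dec[i][j]   # W_dec[i] is latent i's decoder direction
        total += sum((xj - hj) ** 2 for xj, hj in zip(x, x_hat))
    return total
```

Because the earliest latents appear in every prefix term, this objective tends to push the most broadly useful features toward low indices, giving the dictionary a rough ordering.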

Implicit Hypergraph Neural Network

arXiv:2508.14101v2 Announce Type: replace Abstract: Hypergraphs offer a generalized framework for capturing high-order relationships between entities and have been widely applied in various domains, including healthcare, social networks, and bioinformatics. Hypergraph neural networks, which rely on message-passing between nodes over…
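Hypergraph message passing is often built in two stages: nodes send features to the hyperedges that contain them, and hyperedges send aggregated messages back to their nodes. An "implicit" model, in the sense used for implicit graph networks, defines the representation as the fixed point of such an update rather than the output of a fixed stack of layers. The sketch below illustrates both ideas under loudly stated assumptions. It uses parameter-free mean aggregation (no learned weights) and the equilibrium update $z = \gamma \cdot \text{propagate}(z) + x$. The functions `propagate` and `implicit_embedding`, the constant `gamma`, and the aggregation rule are hypothetical, not the paper's formulation, which this excerpt cuts off.

```python
def propagate(z, hyperedges):
    """Two-stage mean aggregation: nodes -> hyperedges -> nodes."""
    n, dim = len(z), len(z[0])
    # Stage 1: each hyperedge's message is the mean of its member nodes' features.
    edge_msgs = [[sum(z[v][j] for v in e) / len(e) for j in range(dim)] for e in hyperedges]
    # Stage 2: each node averages the messages of the hyperedges containing it.
    out = [[0.0] * dim for _ in range(n)]
    counts = [0] * n
    for e, msg in zip(hyperedges, edge_msgs):
        for v in e:
            counts[v] += 1
            for j in range(dim):
                out[v][j] += msg[j]
    for v in range(n):
        # Nodes that belong to no hyperedge keep their current features.
        out[v] = [s / counts[v] for s in out[v]] if counts[v] else list(z[v])
    return out


def implicit_embedding(x, hyperedges, gamma=0.5, tol=1e-10, max_iter=1000):
    """Solve z = gamma * propagate(z) + x by fixed-point iteration.

    propagate() only averages, so it never increases the max-abs difference
    between two inputs; with 0 <= gamma < 1 the update is a contraction.
    """
    z = [list(row) for row in x]
    for _ in range(max_iter):
        p = propagate(z, hyperedges)
        z_next = [[gamma * pv + xv for pv, xv in zip(prow, xrow)] for prow, xrow in zip(p, x)]
        diff = max(abs(a - b) for ra, rb in zip(z_next, z) for a, b in zip(ra, rb))
        z = z_next
        if diff < tol:
            break
    return z
```

For two nodes joined by one hyperedge with inputs 1 and 3 and `gamma=0.5`, the fixed point is 3 and 5: each node equals half the shared mean (4) plus its own input.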

WhAM: Towards A Translative Model of Sperm Whale Vocalization

arXiv:2512.02206v1 Announce Type: new Abstract: Sperm whales communicate in short sequences of clicks known as codas. We present WhAM (Whale Acoustics Model), the first transformer-based model capable of generating synthetic sperm whale codas from any audio prompt. WhAM is built…