Archives AI News

Approximately Unimodal Likelihood Models for Ordinal Regression

arXiv:2510.00122v1 Announce Type: new Abstract: Ordinal regression (OR, also called ordinal classification) is the classification of ordinal data, in which the underlying target variable is categorical and considered to have a natural ordinal relation with respect to the underlying explanatory variable. A key…
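A common way to obtain a unimodal likelihood over ordered classes, which this line of work builds on, is to parameterize the class probabilities with a binomial distribution: a single success probability then induces a single-peaked distribution over the K labels. The sketch below is illustrative only; the paper's "approximately unimodal" models differ in detail.

```python
import math

def binomial_unimodal_pmf(p: float, K: int) -> list[float]:
    """Unimodal probability mass over K ordered classes via Binomial(K-1, p).

    A single parameter p in (0, 1) controls where the peak sits, so the
    predicted distribution over labels is guaranteed single-peaked.
    """
    n = K - 1
    return [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(K)]

# Peak moves toward the high end as p grows: with p = 0.7 and 5 classes,
# the mode lands on class index 3.
probs = binomial_unimodal_pmf(0.7, 5)
assert abs(sum(probs) - 1.0) < 1e-9
```

In a neural ordinal regressor, p would be the output of a sigmoid head, so the network predicts a scalar and the likelihood layer spreads it unimodally over the labels.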

GRID: Scalable Task-Agnostic Prompt-Based Continual Learning for Language Models

arXiv:2507.14725v3 Announce Type: replace Abstract: Prompt-based continual learning (CL) provides a parameter-efficient approach for adapting large language models (LLMs) across task sequences. However, most existing methods rely on task-aware inference and maintain a growing set of task-specific prompts, which introduces…
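The mechanism behind prompt-based continual learning is to freeze the LLM and train only a small matrix of "soft prompt" embeddings prepended to the input. A task-agnostic variant, as GRID advocates, shares one prompt rather than growing a per-task set. A minimal NumPy sketch, with illustrative shapes and names:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, prompt_len, seq_len = 16, 4, 10

# Frozen token embeddings for one input sequence (stand-in for the LLM's
# embedding layer output; shapes are illustrative).
token_embeds = rng.normal(size=(seq_len, d_model))

# A single shared, trainable soft prompt. In task-aware methods there is
# one such matrix per task; a task-agnostic scheme keeps this one matrix.
soft_prompt = rng.normal(size=(prompt_len, d_model)) * 0.02

# Prepend the prompt to the sequence; only `soft_prompt` would receive
# gradients during training, the backbone stays frozen.
model_input = np.concatenate([soft_prompt, token_embeds], axis=0)
assert model_input.shape == (prompt_len + seq_len, d_model)
```

The parameter-efficiency claim follows directly: the trainable state is `prompt_len * d_model` numbers, independent of the backbone size.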

Federated Learning Meets LLMs: Feature Extraction From Heterogeneous Clients

arXiv:2510.00065v1 Announce Type: new Abstract: Federated learning (FL) enables collaborative model training without sharing raw data, making it attractive for privacy-sensitive domains such as healthcare, finance, and IoT. A major obstacle, however, is the heterogeneity of tabular data across clients,…

Linear Regression in p-adic metric spaces

arXiv:2510.00043v1 Announce Type: new Abstract: Many real-world machine learning problems involve inherently hierarchical data, yet traditional approaches rely on Euclidean metrics that fail to capture the discrete, branching nature of hierarchical relationships. We present a theoretical foundation for machine learning…
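The p-adic metric the title refers to measures closeness by divisibility: two integers are near when their difference is divisible by a high power of p, which mirrors tree-like hierarchy (shared prefixes of base-p digits) rather than Euclidean distance. A minimal sketch of the metric itself:

```python
def p_adic_valuation(n: int, p: int) -> float:
    """Largest k such that p**k divides n (valuation of 0 is infinity)."""
    if n == 0:
        return float("inf")
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k

def p_adic_distance(a: int, b: int, p: int) -> float:
    """d_p(a, b) = p**(-v_p(a - b)): deeper shared divisibility = closer."""
    return 0.0 if a == b else p ** (-p_adic_valuation(a - b, p))

# 2 and 34 differ by 32 = 2**5, so 2-adically they are very close,
# while 2 and 3 (difference 1) are at the maximum distance 1.
assert p_adic_distance(2, 34, 2) == 2 ** -5
assert p_adic_distance(2, 3, 2) == 1.0
```

The distance is an ultrametric (d(a, c) ≤ max(d(a, b), d(b, c))), which is exactly the property that makes it match branching hierarchies; how the paper builds regression on top of this is beyond the truncated abstract.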

Krony-PT: GPT2 compressed with Kronecker Products

arXiv:2412.12351v2 Announce Type: replace Abstract: We introduce Krony-PT, a compression technique for GPT-2 based on Kronecker products. We specifically target the feed-forward weights of each transformer block, and systematically compress the feed-forward layer matrices to various degrees. We introduce a…
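The core idea of Kronecker-product compression is to replace a large feed-forward weight matrix W with the Kronecker product of two much smaller factors whose product of shapes reconstructs W's shape. The sketch below uses GPT-2 small's FFN up-projection shape (768 x 3072); the factor shapes are illustrative, not Krony-PT's exact factorization scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dense FFN up-projection weight in GPT-2 small: 768 x 3072.
W = rng.normal(size=(768, 3072))

# Replace W by kron(A, B); shapes multiply, so (24 x 96) ⊗ (32 x 32)
# reproduces a 768 x 3072 matrix. These factor shapes are an assumption
# for illustration only.
A = rng.normal(size=(768 // 32, 3072 // 32))
B = rng.normal(size=(32, 32))
W_hat = np.kron(A, B)
assert W_hat.shape == W.shape

# Parameter count drops from the full matrix to the two factors combined.
orig_params = W.size
compressed_params = A.size + B.size
assert compressed_params < orig_params
```

In practice the factors are either trained from scratch or fit to the pretrained W (e.g. via a nearest-Kronecker-product decomposition); varying the factor shapes trades compression ratio against expressiveness, which matches the abstract's "various degrees" of compression.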

BigBang-Proton Technical Report: Next-Word-Prediction is Scientific Multitask Learner

arXiv:2510.00129v1 Announce Type: new Abstract: We introduce BigBang-Proton, a unified sequence-based architecture for auto-regressive language modeling pretrained on cross-scale, cross-structure, cross-discipline real-world scientific tasks to construct a scientific multi-task learner. BigBang-Proton incorporates three fundamental innovations compared to mainstream general-purpose LLMs:…

ExPLAIND: Unifying Model, Data, and Training Attribution to Study Model Behavior

arXiv:2505.20076v3 Announce Type: replace Abstract: Post-hoc interpretability methods typically attribute a model’s behavior to its components, data, or training trajectory in isolation. This leads to explanations that lack a unified view and may miss key interactions. While combining existing methods…