MTraining: Distributed Dynamic Sparse Attention for Efficient Ultra-Long Context Training
arXiv:2510.18830v1 Announce Type: cross Abstract: Long context windows have become a standard feature of Large Language Models (LLMs), as extended contexts significantly enhance their capacity for complex reasoning and broaden their applicability across diverse scenarios. Dynamic sparse…
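The abstract is truncated before it describes MTraining's method, so as a generic illustration only, here is a minimal NumPy sketch of the *dynamic sparse attention* idea the title refers to: each query block dynamically selects its `top_k` most relevant key blocks (estimated via mean-pooled block similarity) and computes exact attention only over those, skipping the rest. All names and the block-selection heuristic are assumptions for illustration, not the paper's actual algorithm.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dense_attention(q, k, v):
    # Standard full attention, used here as a reference baseline.
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v

def dynamic_sparse_attention(q, k, v, block=4, top_k=2):
    """Toy dynamic sparse attention (illustrative, not MTraining's method):
    for each query block, attend only to the top_k key blocks ranked by a
    cheap block-level relevance estimate."""
    T, d = q.shape
    nb = T // block
    # Mean-pool queries and keys per block to estimate block relevance.
    qb = q.reshape(nb, block, d).mean(axis=1)   # (nb, d)
    kb = k.reshape(nb, block, d).mean(axis=1)   # (nb, d)
    block_scores = qb @ kb.T                    # (nb, nb)
    out = np.zeros_like(q)
    for i in range(nb):
        # Dynamically pick the top_k key blocks for this query block.
        keep = np.argsort(block_scores[i])[-top_k:]
        idx = np.concatenate(
            [np.arange(j * block, (j + 1) * block) for j in keep]
        )
        # Exact attention restricted to the selected keys/values.
        scores = q[i * block:(i + 1) * block] @ k[idx].T / np.sqrt(d)
        out[i * block:(i + 1) * block] = softmax(scores) @ v[idx]
    return out
```

Because the selection is per-query-block and data-dependent, the sparsity pattern changes with the input, which is what makes such attention "dynamic"; when `top_k` equals the number of key blocks, the result reduces to dense attention.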
