Archives AI News

MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems

arXiv:2412.07067v5 Announce Type: replace Abstract: The sparse Mixture-of-Experts (MoE) architecture is increasingly favored for scaling Large Language Models (LLMs) efficiently, but it depends on heterogeneous compute and memory resources. These factors jointly affect system Cost, Accuracy, and Performance (CAP), making…

November 5, 2025

RobustFSM: Submodular Maximization in Federated Setting with Malicious Clients

arXiv:2511.02029v1 Announce Type: new Abstract: Submodular maximization is an optimization problem benefiting many machine learning applications, where we seek a small subset best representing an extremely large dataset. We focus on the federated setting where the data are locally owned…

November 5, 2025

Training Language Models to Reason Efficiently

arXiv:2502.04463v4 Announce Type: replace Abstract: Scaling model size and training data has led to great advances in the performance of Large Language Models (LLMs). However, the diminishing returns of this approach necessitate alternative methods to improve model capabilities, particularly in…

November 5, 2025

Predicting Microbial Interactions Using Graph Neural Networks

arXiv:2511.02038v1 Announce Type: new Abstract: Predicting interspecies interactions is a key challenge in microbial ecology, as these interactions are critical to determining the structure and activity of microbial communities. In this work, we used data on monoculture growth capabilities, interactions…

November 5, 2025

Noise-based reward-modulated learning

arXiv:2503.23972v3 Announce Type: replace Abstract: The pursuit of energy-efficient and adaptive artificial intelligence (AI) has positioned neuromorphic computing as a promising alternative to conventional computing. However, achieving learning on these platforms requires techniques that prioritize local information while enabling effective…

November 5, 2025

Quantum-Enhanced Generative Models for Rare Event Prediction

arXiv:2511.02042v1 Announce Type: new Abstract: Rare events such as financial crashes, climate extremes, and biological anomalies are notoriously difficult to model due to their scarcity and heavy-tailed distributions. Classical deep generative models often struggle to capture these rare occurrences, either…

November 5, 2025

Flashlight: PyTorch Compiler Extensions to Accelerate Attention Variants

arXiv:2511.02043v1 Announce Type: new Abstract: Attention is a fundamental building block of large language models (LLMs), so there have been many efforts to implement it efficiently. For example, FlashAttention leverages tiling and kernel fusion…

November 5, 2025

Neural Collapse in Cumulative Link Models for Ordinal Regression: An Analysis with Unconstrained Feature Model

arXiv:2506.05801v2 Announce Type: replace Abstract: A phenomenon known as “Neural Collapse (NC)” in deep classification tasks, in which the penultimate-layer features and the final classifiers exhibit an extremely simple geometric structure, has recently attracted considerable attention, with the expectation that…

November 5, 2025

Learning to Steer: Input-dependent Steering for Multimodal LLMs

arXiv:2508.12815v2 Announce Type: replace-cross Abstract: Steering has emerged as a practical approach to enable post-hoc guidance of LLMs towards enforcing a specific behavior. However, it remains largely underexplored for multimodal LLMs (MLLMs); furthermore, existing steering techniques, such as mean steering,…

November 5, 2025

Investigating the Robustness of Knowledge Tracing Models in the Presence of Student Concept Drift

arXiv:2511.00704v2 Announce Type: replace Abstract: Knowledge Tracing (KT) has been an established problem in the educational data mining field for decades, and it is commonly assumed that the underlying learning process being modeled remains static. Given the ever-changing landscape of…

November 5, 2025