Benchmarking Hindi LLMs: A New Suite of Datasets and a Comparative Analysis

arXiv:2508.19831v2 Announce Type: replace-cross Abstract: Evaluating instruction-tuned Large Language Models (LLMs) in Hindi is challenging due to a lack of high-quality benchmarks, as direct translation of English datasets fails to capture crucial linguistic and cultural nuances. To address this, we…

October 16, 2025

Reference-Specific Unlearning Metrics Can Hide the Truth: A Reality Check

arXiv:2510.12981v1 Announce Type: new Abstract: Current unlearning metrics for generative models evaluate success based on reference responses or classifier outputs rather than assessing the core objective: whether the unlearned model behaves indistinguishably from a model that never saw the unwanted…

October 16, 2025

Hard2Verify: A Step-Level Verification Benchmark for Open-Ended Frontier Math

arXiv:2510.13744v1 Announce Type: cross Abstract: Large language model (LLM)-based reasoning systems have recently achieved gold medal-level performance in the IMO 2025 competition, writing mathematical proofs where, to receive full credit, each step must be not only correct but also sufficiently…

October 16, 2025

CSI-4CAST: A Hybrid Deep Learning Model for CSI Prediction with Comprehensive Robustness and Generalization Testing

arXiv:2510.12996v1 Announce Type: new Abstract: Channel state information (CSI) prediction is a promising strategy for ensuring reliable and efficient operation of massive multiple-input multiple-output (mMIMO) systems by providing timely downlink (DL) CSI. While deep learning-based methods have advanced beyond conventional…

October 16, 2025

Can DPO Learn Diverse Human Values? A Theoretical Scaling Law

arXiv:2408.03459v5 Announce Type: replace Abstract: Large language models (LLMs) have demonstrated remarkable capabilities but often struggle to align with human preferences, leading to harmful or undesirable outputs. Preference learning, which trains models to distinguish between preferred and non-preferred responses based…

October 16, 2025

Max It or Miss It: Benchmarking LLM On Solving Extremal Problems

arXiv:2510.12997v1 Announce Type: new Abstract: Test-time scaling has enabled Large Language Models (LLMs) with remarkable reasoning capabilities, particularly in mathematical domains, through intermediate chain-of-thought (CoT) reasoning before generating final answers. However, the specific sources and mechanisms underlying these reasoning capabilities…

October 16, 2025

Probabilistic QoS Metric Forecasting in Delay-Tolerant Networks Using Conditional Diffusion Models on Latent Dynamics

arXiv:2504.08821v3 Announce Type: replace Abstract: Active QoS metric prediction, commonly employed in the maintenance and operation of DTN, could enhance network performance regarding latency, throughput, energy consumption, and dependability. Naturally formulated as a multivariate time series forecasting problem, it attracts…

October 16, 2025

AMORE: Adaptive Multi-Output Operator Network for Stiff Chemical Kinetics

arXiv:2510.12999v1 Announce Type: new Abstract: Time integration of stiff systems is a primary source of computational cost in combustion, hypersonics, and other reactive transport systems. This stiffness can introduce time scales significantly smaller than those associated with other physical processes,…

October 16, 2025

A Brain-to-Population Graph Learning Framework for Diagnosing Brain Disorders

arXiv:2506.16096v2 Announce Type: replace Abstract: Recently developed graph-based methods for diagnosing brain disorders using functional connectivity rely heavily on predefined brain atlases, but overlook the rich information embedded within atlases and the confounding effects of site and phenotype variability. To…

October 16, 2025

Escaping Local Optima in the Waddington Landscape: A Multi-Stage TRPO-PPO Approach for Single-Cell Perturbation Analysis

arXiv:2510.13018v1 Announce Type: new Abstract: Modeling cellular responses to genetic and chemical perturbations remains a central challenge in single-cell biology. Existing data-driven frameworks have advanced perturbation prediction through variational autoencoders, chemically conditioned autoencoders, and large-scale transformer pretraining. However, these models…

October 16, 2025