Archives AI News

Evaluating multiple models using labeled and unlabeled data

arXiv:2501.11866v3 (Announce Type: replace). Abstract: It remains difficult to evaluate machine learning classifiers in the absence of a large, labeled dataset. While labeled data can be prohibitively expensive or impossible to obtain, unlabeled data is plentiful. Here, we introduce Semi-Supervised…

Your Pre-trained LLM is Secretly an Unsupervised Confidence Calibrator

arXiv:2505.16690v3 (Announce Type: replace). Abstract: Post-training of large language models is essential for adapting pre-trained language models (PLMs) to align with human preferences and downstream tasks. While PLMs typically exhibit well-calibrated confidence, post-trained language models (PoLMs) often suffer from over-confidence,…

Wavefront Coding for Accommodation-Invariant Near-Eye Displays

arXiv:2510.12778v1 (Announce Type: cross). Abstract: We present a new computational near-eye display method that addresses the vergence-accommodation conflict problem in stereoscopic displays through accommodation-invariance. Our system integrates a refractive lens eyepiece with a novel wavefront coding diffractive optical element, operating…

Balancing Synthetic Data and Replay for Enhancing Task-Specific Capabilities

arXiv:2510.11842v1 (Announce Type: new). Abstract: Adapting language models to new tasks through continued pretraining faces a fundamental trade-off: models must learn new capabilities while avoiding catastrophic forgetting of existing knowledge. While prior work has studied synthetic data generation techniques, the…

Evaluating Open-Source Vision-Language Models for Multimodal Sarcasm Detection

arXiv:2510.11852v1 (Announce Type: new). Abstract: Recent advances in open-source vision-language models (VLMs) offer new opportunities for understanding complex and subjective multimodal phenomena such as sarcasm. In this work, we evaluate seven state-of-the-art VLMs – BLIP2, InstructBLIP, OpenFlamingo, LLaVA, PaliGemma, Gemma3,…

Don’t Walk the Line: Boundary Guidance for Filtered Generation

arXiv:2510.11834v1 (Announce Type: new). Abstract: Generative models are increasingly paired with safety classifiers that filter harmful or undesirable outputs. A common strategy is to fine-tune the generator to reduce the probability of being filtered, but this can be suboptimal: it…

WaveletDiff: Multilevel Wavelet Diffusion For Time Series Generation

arXiv:2510.11839v1 (Announce Type: new). Abstract: Time series are ubiquitous in many applications that involve forecasting, classification and causal inference tasks, such as healthcare, finance, audio signal processing and climate sciences. Still, large, high-quality time series datasets remain scarce. Synthetic generation…

Z0-Inf: Zeroth Order Approximation for Data Influence

arXiv:2510.11832v1 (Announce Type: new). Abstract: A critical aspect of analyzing and improving modern machine learning systems lies in understanding how individual training examples influence a model’s predictive behavior. Estimating this influence enables critical applications, including data selection and model debugging;…