Archives AI News

Rex: Reversible Solvers for Diffusion Models

arXiv:2502.08834v2 Announce Type: replace-cross Abstract: Diffusion models have quickly become the state-of-the-art for numerous generation tasks across many different applications. Encoding samples from the data distribution back into the model's underlying prior distribution is an important task that arises in…

Test-Time Matching: Unlocking Compositional Reasoning in Multimodal Models

arXiv:2510.07632v1 Announce Type: new Abstract: Frontier AI models have achieved remarkable progress, yet recent studies suggest they struggle with compositional reasoning, often performing at or below random chance on established benchmarks. We revisit this problem and show that widely used…

Evaluating Evaluation Metrics — The Mirage of Hallucination Detection

arXiv:2504.18114v2 Announce Type: replace-cross Abstract: Hallucinations pose a significant obstacle to the reliability and widespread adoption of language models, yet their accurate measurement remains a persistent challenge. While many task- and domain-specific metrics have been proposed to assess faithfulness and…

Tug-of-war between idioms’ figurative and literal interpretations in LLMs

arXiv:2506.01723v4 Announce Type: replace-cross Abstract: Idioms present a unique challenge for language models due to their non-compositional figurative interpretations, which often strongly diverge from the idiom’s literal interpretation. In this paper, we employ causal tracing to systematically analyze how pretrained…

Multimodal Safety Evaluation in Generative Agent Social Simulations

arXiv:2510.07709v1 Announce Type: new Abstract: Can generative agents be trusted in multimodal environments? Despite advances in large language and vision-language models that enable agents to act autonomously and pursue goals in rich settings, their ability to reason about safety, coherence,…

Towards Methane Detection Onboard Satellites

arXiv:2509.00626v3 Announce Type: replace-cross Abstract: Methane is a potent greenhouse gas and a major driver of climate change, making its timely detection critical for effective mitigation. Machine learning (ML) deployed onboard satellites can enable rapid detection while reducing downlink costs,…