Archives AI News

WavReward: Spoken Dialogue Models With Generalist Reward Evaluators

arXiv:2505.09558v2 Announce Type: replace-cross Abstract: End-to-end spoken dialogue models such as GPT-4o-audio have recently garnered significant attention in the speech domain. However, the evaluation of spoken dialogue models’ conversational performance has largely been overlooked. This is primarily due to the…

Gödel Test: Can Large Language Models Solve Easy Conjectures?

arXiv:2509.18383v1 Announce Type: new Abstract: Recent announcements from frontier AI model labs have highlighted strong results on high-school and undergraduate math competitions. Yet it remains unclear whether large language models can solve new, simple conjectures in more advanced areas of…

Generative Medical Event Models Improve with Scale

arXiv:2508.12104v2 Announce Type: replace-cross Abstract: Realizing personalized medicine at scale calls for methods that distill insights from longitudinal patient journeys, which can be viewed as a sequence of medical events. Foundation models pretrained on large-scale medical event data represent a…

Instruction-Following Evaluation in Function Calling for Large Language Models

arXiv:2509.18420v1 Announce Type: new Abstract: Function calling is a core capability of large language models, essential for AI agents. Existing benchmarks such as the Berkeley Function Calling Leaderboard (BFCL), τ²-Bench (arXiv:2506.07982), and ACEBench (arXiv:2501.12851) evaluate argument correctness but do not…

Fine-Grained Detection of AI-Generated Text Using Sentence-Level Segmentation

arXiv:2509.17830v2 Announce Type: replace-cross Abstract: Generation of Artificial Intelligence (AI) texts in important works has become a common practice that can be used to misuse and abuse AI at various levels. Traditional AI detectors often rely on document-level classification, which…

Memory-QA: Answering Recall Questions Based on Multimodal Memories

arXiv:2509.18436v1 Announce Type: new Abstract: We introduce Memory-QA, a novel real-world task that involves answering recall questions about visual content from previously stored multimodal memories. This task poses unique challenges, including the creation of task-oriented memories, the effective utilization of…