Archives AI News

Multiple Instance Verification

arXiv:2407.06544v2 Announce Type: replace Abstract: We explore multiple instance verification, a problem setting in which a query instance is verified against a bag of target instances with heterogeneous, unknown relevancy. We show that naive adaptations of attention-based multiple instance learning…

ModalSurv: A Multimodal Deep Survival Framework for Prostate and Bladder Cancer

arXiv:2509.05037v3 Announce Type: replace Abstract: Accurate prediction of time-to-event outcomes is a central challenge in oncology, with significant implications for treatment planning and patient management. In this work, we present ModalSurv, a multimodal deep survival model utilising DeepHit with a…

LLM-I: LLMs are Naturally Interleaved Multimodal Creators

arXiv:2509.13642v1 Announce Type: new Abstract: We propose LLM-Interleaved (LLM-I), a flexible and dynamic framework that reframes interleaved image-text generation as a tool-use problem. LLM-I is designed to overcome the “one-tool” bottleneck of current unified models, which are limited to synthetic…

Text-to-Speech for Unseen Speakers via Low-Complexity Discrete Unit-Based Frame Selection

arXiv:2408.17432v3 Announce Type: replace-cross Abstract: Synthesizing the voices of unseen speakers remains a persisting challenge in multi-speaker text-to-speech (TTS). Existing methods model speaker characteristics through speaker conditioning during training, leading to increased model complexity and limiting reproducibility and accessibility. A…

Sequential Data Augmentation for Generative Recommendation

arXiv:2509.13648v1 Announce Type: new Abstract: Generative recommendation plays a crucial role in personalized systems, predicting users’ future interactions from their historical behavior sequences. A critical yet underexplored factor in training these models is data augmentation, the process of constructing training…

MAVL: A Multilingual Audio-Video Lyrics Dataset for Animated Song Translation

arXiv:2505.18614v3 Announce Type: replace-cross Abstract: Lyrics translation requires both accurate semantic transfer and preservation of musical rhythm, syllabic structure, and poetic style. In animated musicals, the challenge intensifies due to alignment with visual and auditory cues. We introduce Multilingual Audio-Video…