Bi-Level Contextual Bandits for Individualized Resource Allocation under Delayed Feedback
arXiv:2511.10572v2 Announce Type: replace-cross Abstract: Equitably allocating limited resources in high-stakes domains (such as education, employment, and healthcare) requires balancing short-term utility with long-term impact, while accounting for delayed outcomes, hidden heterogeneity, and ethical constraints. However, most learning-based allocation frameworks either assume…
