RECODE-H: A Benchmark for Research Code Development with Interactive Human Feedback
arXiv:2510.06186v1 Announce Type: cross Abstract: Large language models (LLMs) show promise in supporting scientific research implementation, yet their ability to generate correct and executable code remains limited. Existing work largely adopts one-shot settings, ignoring the iterative and feedback-driven nature…
