DreamPRM-1.5: Unlocking the Potential of Each Instance for Multimodal Process Reward Model Training
arXiv:2509.05542v2
Abstract: Training multimodal process reward models (PRMs) is hard due to (i) distribution shift between the training and test sets and (ii) quality imbalance across training samples. While domain-level reweighting (e.g., DreamPRM) aligns training with…
