4. RLVR

An introduction to the basic principles of Reinforcement Learning with Verifiable Rewards (RLVR) and how it is applied in training large language models.
