4. RLVR

An introduction to the basic principles of Reinforcement Learning with Verifiable Rewards (RLVR) and how it is applied to training large language models.

