4. RLVR
Introduces the basic principles of Reinforcement Learning with Verifiable Rewards (RLVR) and how it is applied to training large language models.