An introduction to the basic principles of data synthesis and how it is applied in training large language models.
References
Google DeepMind's lessons on synthetic data: Best Practices and Lessons Learned on Synthetic Data for Language Models
Persona-driven data synthesis: Scaling Synthetic Data Creation with 1,000,000,000 Personas
Instruction dataset generation: Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing