Ruida Docs

7. vLLM

An introduction to the vLLM library and its use in LLM inference.

  • References

    • vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention | vLLM Blog

    • Efficient Memory Management for Large Language Model Serving with PagedAttention

What is vLLM
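This section is still a stub. vLLM is an open-source library for high-throughput LLM serving; one of its core techniques is continuous batching, where finished sequences leave the running batch immediately and waiting requests are admitted at every decode step, instead of the whole batch waiting for its slowest member. The sketch below is a toy illustration of that scheduling idea in pure Python (names and structure are ours, not vLLM's internals):

```python
from collections import deque

def continuous_batching(requests, max_batch):
    """Toy scheduler: each request is (id, num_decode_steps).

    Returns the list of per-step batches, showing requests joining
    as soon as a slot frees up (continuous batching).
    """
    waiting = deque(requests)
    running = {}  # request id -> remaining decode steps
    trace = []
    while waiting or running:
        # Admit waiting requests the moment slots free up.
        while waiting and len(running) < max_batch:
            rid, steps = waiting.popleft()
            running[rid] = steps
        trace.append(sorted(running))  # the batch for this decode step
        for rid in list(running):      # one decode step per sequence
            running[rid] -= 1
            if running[rid] == 0:
                del running[rid]       # finished: slot is freed immediately
    return trace

# Short request B finishes after 1 step, so C joins mid-flight instead of
# waiting for A to complete (as static batching would require).
steps = continuous_batching([("A", 3), ("B", 1), ("C", 2)], max_batch=2)
print(steps)  # [['A', 'B'], ['A', 'C'], ['A', 'C']]
```

Static batching would have run `[A, B]` to completion (3 steps) before starting C; continuous batching keeps the GPU batch full the whole time.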

PagedAttention
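This section has no body yet. Following the PagedAttention paper linked in the references, the core idea is to store the KV cache in fixed-size blocks and give each sequence a block table mapping logical token positions to physical blocks, so memory is allocated on demand rather than reserved up front for the maximum sequence length. A minimal pure-Python sketch of the bookkeeping (names are illustrative, not vLLM's actual data structures):

```python
BLOCK_SIZE = 4  # tokens per KV-cache block (vLLM's default is 16)

class BlockAllocator:
    """Hands out physical KV-cache block ids from a free pool."""
    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))

    def alloc(self):
        return self.free.pop()

class Sequence:
    """Tracks one sequence's block table: logical block -> physical block."""
    def __init__(self, allocator):
        self.allocator = allocator
        self.block_table = []
        self.num_tokens = 0

    def append_token(self):
        # Allocate a new physical block only when the last one is full,
        # so waste is bounded by one partially filled block per sequence.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(self.allocator.alloc())
        self.num_tokens += 1

alloc = BlockAllocator(num_blocks=8)
seq = Sequence(alloc)
for _ in range(6):  # 6 tokens fill one block and start a second
    seq.append_token()
print(len(seq.block_table))  # 2 physical blocks for 6 tokens
```

Because blocks need not be contiguous, fragmentation stays low, and sharing a prefix across sequences reduces to pointing two block tables at the same physical blocks (with reference counting, omitted here for brevity).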


Created By Ruida