Ruida Docs

7. vLLM

An introduction to the vLLM library and its use in LLM inference.

  • References

    • vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention | vLLM Blog

    • Efficient Memory Management for Large Language Model Serving with PagedAttention

What is vLLM
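This section is still a stub. vLLM is an open-source library for high-throughput LLM serving; one of its core techniques is continuous batching, where finished sequences leave the running batch immediately and waiting requests are admitted at every decode step, instead of the whole batch waiting for its slowest member. The sketch below is a toy illustration of that scheduling idea in pure Python (names and structure are ours, not vLLM's internals):

```python
from collections import deque

def continuous_batching(requests, max_batch):
    """Toy scheduler: each request is (id, num_decode_steps).

    Returns the list of per-step batches, showing requests joining
    as soon as a slot frees up (continuous batching).
    """
    waiting = deque(requests)
    running = {}  # request id -> remaining decode steps
    trace = []
    while waiting or running:
        # Admit waiting requests the moment slots free up.
        while waiting and len(running) < max_batch:
            rid, steps = waiting.popleft()
            running[rid] = steps
        trace.append(sorted(running))  # the batch for this decode step
        for rid in list(running):      # one decode step per sequence
            running[rid] -= 1
            if running[rid] == 0:
                del running[rid]       # finished: slot is freed immediately
    return trace

# Short request B finishes after 1 step, so C joins mid-flight instead of
# waiting for A to complete (as static batching would require).
steps = continuous_batching([("A", 3), ("B", 1), ("C", 2)], max_batch=2)
print(steps)  # [['A', 'B'], ['A', 'C'], ['A', 'C']]
```

Static batching would have run `[A, B]` to completion (3 steps) before starting C; continuous batching keeps the GPU batch full the whole time.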

PagedAttention
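This section has no body yet. Following the PagedAttention paper linked in the references, the core idea is to store the KV cache in fixed-size blocks and give each sequence a block table mapping logical token positions to physical blocks, so memory is allocated on demand rather than reserved up front for the maximum sequence length. A minimal pure-Python sketch of the bookkeeping (names are illustrative, not vLLM's actual data structures):

```python
BLOCK_SIZE = 4  # tokens per KV-cache block (vLLM's default is 16)

class BlockAllocator:
    """Hands out physical KV-cache block ids from a free pool."""
    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))

    def alloc(self):
        return self.free.pop()

class Sequence:
    """Tracks one sequence's block table: logical block -> physical block."""
    def __init__(self, allocator):
        self.allocator = allocator
        self.block_table = []
        self.num_tokens = 0

    def append_token(self):
        # Allocate a new physical block only when the last one is full,
        # so waste is bounded by one partially filled block per sequence.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(self.allocator.alloc())
        self.num_tokens += 1

alloc = BlockAllocator(num_blocks=8)
seq = Sequence(alloc)
for _ in range(6):  # 6 tokens fill one block and start a second
    seq.append_token()
print(len(seq.block_table))  # 2 physical blocks for 6 tokens
```

Because blocks need not be contiguous, fragmentation stays low, and sharing a prefix across sequences reduces to pointing two block tables at the same physical blocks (with reference counting, omitted here for brevity).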


Created By Ruida