vLLM

A high-throughput LLM inference and serving engine built on PagedAttention, which stores the attention key-value cache in fixed-size blocks to reduce memory waste and significantly improve GPU utilization.
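To illustrate the core idea behind PagedAttention, here is a minimal, hypothetical sketch (not vLLM's actual implementation): the KV cache is split into fixed-size blocks, and each sequence keeps a block table mapping logical token positions to physical blocks, so memory is allocated on demand rather than reserved up front for the maximum sequence length.

```python
BLOCK_SIZE = 16  # tokens per cache block (illustrative value)

class BlockAllocator:
    """Hands out physical block ids from a fixed pool."""
    def __init__(self, num_blocks: int):
        self.free = list(range(num_blocks))

    def allocate(self) -> int:
        return self.free.pop()

class SequenceCache:
    """Tracks which physical blocks hold one sequence's KV entries."""
    def __init__(self, allocator: BlockAllocator):
        self.allocator = allocator
        self.block_table: list[int] = []  # logical block -> physical block
        self.num_tokens = 0

    def append_token(self) -> tuple[int, int]:
        """Reserve a cache slot for one new token; return (block, offset)."""
        if self.num_tokens % BLOCK_SIZE == 0:  # current block full, or first token
            self.block_table.append(self.allocator.allocate())
        offset = self.num_tokens % BLOCK_SIZE
        self.num_tokens += 1
        return self.block_table[-1], offset

allocator = BlockAllocator(num_blocks=64)
seq = SequenceCache(allocator)
slots = [seq.append_token() for _ in range(20)]  # 20 tokens span 2 blocks
```

Because blocks need not be contiguous, freed blocks from finished sequences can be reused immediately, which is what lets the engine batch many more concurrent requests onto the same GPU memory.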



Category: Open Source
Official URL: https://github.com/vllm-project/vllm
Last updated: Wed Apr 08
Tags: Inference · GPU