vLLM

A high-throughput LLM inference and serving engine built on PagedAttention, which stores the attention key-value cache in fixed-size blocks to reduce memory waste and significantly improve GPU utilization.
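To illustrate the core idea behind PagedAttention, here is a minimal, hypothetical sketch (not vLLM's actual implementation): the KV cache is split into fixed-size blocks, and each sequence keeps a block table mapping logical token positions to physical blocks, so memory is allocated on demand rather than reserved up front for the maximum sequence length.

```python
BLOCK_SIZE = 16  # tokens per cache block (illustrative value)

class BlockAllocator:
    """Hands out physical block ids from a fixed pool."""
    def __init__(self, num_blocks: int):
        self.free = list(range(num_blocks))

    def allocate(self) -> int:
        return self.free.pop()

class SequenceCache:
    """Tracks which physical blocks hold one sequence's KV entries."""
    def __init__(self, allocator: BlockAllocator):
        self.allocator = allocator
        self.block_table: list[int] = []  # logical block -> physical block
        self.num_tokens = 0

    def append_token(self) -> tuple[int, int]:
        """Reserve a cache slot for one new token; return (block, offset)."""
        if self.num_tokens % BLOCK_SIZE == 0:  # current block full, or first token
            self.block_table.append(self.allocator.allocate())
        offset = self.num_tokens % BLOCK_SIZE
        self.num_tokens += 1
        return self.block_table[-1], offset

allocator = BlockAllocator(num_blocks=64)
seq = SequenceCache(allocator)
slots = [seq.append_token() for _ in range(20)]  # 20 tokens span 2 blocks
```

Because blocks need not be contiguous, freed blocks from finished sequences can be reused immediately, which is what lets the engine batch many more concurrent requests onto the same GPU memory.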



Category: Open Source
Official URL: https://github.com/vllm-project/vllm
Last updated: Wed Apr 08
Tags: Inference · GPU