SGLang: High-Performance LLM Inference with RadixAttention and 27.7k+ GitHub Stars
SGLang is a high-performance serving framework for large language models and multimodal models, designed for low-latency, high-throughput inference at every scale, from a single GPU to large distributed clusters. With 27.7k+ GitHub stars and active development, SGLang powers over 400,000 GPUs worldwide, generating trillions of tokens daily in production environments.