Learn how to set up a low-latency, streaming text-to-speech (TTS) system using an Orpheus-style model, vLLM for efficient inference, and SNAC for audio decoding. The whole setup can run on a single 24 GB GPU, or it can be distributed across multiple GPUs, with the Orpheus model (vLLM/SGLang) and the decoder (FastAPI/SNAC) on separate devices.