TurboDiffusion: Bringing Video Diffusion into the Seconds Era
Built by Tsinghua University's ML group, TurboDiffusion combines attention acceleration, step distillation, and low-bit quantization to achieve 100-200x end-to-end speedups on a single RTX 5090 while preserving video quality.
See the Difference
| Pipeline | Generation Time |
|---|---|
| Wan2.2 (original) | ~27 minutes |
| Wan2.2 + TurboDiffusion | ~9 seconds |
Project Overview
TurboDiffusion is an acceleration framework for video generation models. It targets practical deployment, supporting both text-to-video and image-to-video pipelines built on the Wan2.1 and Wan2.2 families, and has already been adopted by multiple production platforms.
Research Team
The author team, led by Jun Zhu, spans Tsinghua University, UC Berkeley, and industry partners.
License
Apache License 2.0, permitting both commercial and non-commercial use.
Community Impact
Often called the "DeepSeek Moment" for video diffusion in developer communities.
Core Technology Stack
TurboDiffusion delivers full-pipeline acceleration through attention speedups, step distillation, and quantization.
Attention Acceleration
SageAttention2++ provides low-bit attention acceleration, with SLA sparse attention layered on top.
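For intuition, SageAttention is designed as a drop-in replacement for PyTorch's scaled-dot-product attention. The sketch below shows that substitution pattern using the `sageattn` entry point from the SageAttention repository; exact kernel selection for the 2++ variant may vary by release, and the tensor shapes are illustrative.

```python
import torch
import torch.nn.functional as F
from sageattention import sageattn  # pip install sageattention

# Illustrative shapes: (batch, heads, seq_len, head_dim), fp16 on CUDA.
q = torch.randn(1, 24, 4096, 128, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# Baseline full-precision attention.
out_ref = F.scaled_dot_product_attention(q, k, v)

# Low-bit drop-in: quantized QK^T plus low-precision PV under the hood.
out = sageattn(q, k, v, tensor_layout="HND", is_causal=False)
```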
Step Distillation
rCM distillation enables high-quality video generation in just 3-4 sampling steps.
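The rCM procedure itself is specified in the TurboDiffusion paper; the sketch below only illustrates the general shape of few-step sampling with a distilled model. `distilled_model` and the timestep schedule are hypothetical placeholders, not the project's API.

```python
import torch

@torch.no_grad()
def few_step_sample(distilled_model, shape, steps=4, device="cuda"):
    """Generic few-step sampling loop: the distilled network maps a noisy
    latent directly toward a clean sample at each of a handful of
    timesteps, instead of the ~50 denoising steps of the base model."""
    # Hypothetical coarse schedule from pure noise (t=1) down to data (t=0).
    ts = torch.linspace(1.0, 0.0, steps + 1, device=device)
    x = torch.randn(shape, device=device)  # start from Gaussian noise
    for i in range(steps):
        t, t_next = ts[i], ts[i + 1]
        x0_pred = distilled_model(x, t)     # one-shot clean prediction
        if t_next > 0:
            x = x0_pred + t_next * torch.randn_like(x)  # re-noise to next level
        else:
            x = x0_pred
    return x
```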
W8A8 Quantization
8-bit weights and activations boost linear layer throughput and reduce memory.
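As a minimal illustration of the W8A8 idea (emulated in plain PyTorch, not TurboDiffusion's fused kernels), per-tensor symmetric quantization of a linear layer looks like:

```python
import torch

def quantize_sym_int8(x: torch.Tensor):
    """Symmetric per-tensor INT8 quantization: returns int8 values + scale."""
    scale = x.abs().max().clamp(min=1e-8) / 127.0
    q = torch.clamp((x / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def w8a8_linear(x: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    """Emulated W8A8 matmul: int8 weights and activations, int32 accumulate."""
    qx, sx = quantize_sym_int8(x)
    qw, sw = quantize_sym_int8(weight)
    acc = qx.to(torch.int32) @ qw.t().to(torch.int32)  # integer GEMM
    return acc.to(torch.float32) * (sx * sw)           # rescale to float

x = torch.randn(16, 64)
w = torch.randn(128, 64)
print((w8a8_linear(x, w) - x @ w.t()).abs().max())  # small quantization error
```

Real deployments fuse the INT8 GEMM and rescaling into a single kernel, which is where the throughput and memory gains come from.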
SLA Sparse Attention
Provides an additional 17-20x sparse speedup, orthogonal to low-bit acceleration.
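SLA's kernels ship with the project; the sketch below only illustrates the block-sparse principle such methods build on: score coarse blocks of the attention map cheaply and keep just a top fraction. The mean-pooling scheme and top-k criterion here are illustrative assumptions, not SLA's actual selection rule.

```python
import torch

def topk_block_mask(q, k, block=64, keep=0.1):
    """Build a block-level attention mask: pool Q and K into blocks,
    score block pairs cheaply, and keep only the top `keep` fraction.
    A sparse kernel then computes attention for the kept tiles only."""
    B, H, S, D = q.shape
    nb = S // block
    q_blk = q.view(B, H, nb, block, D).mean(dim=3)  # (B, H, nb, D)
    k_blk = k.view(B, H, nb, block, D).mean(dim=3)
    scores = q_blk @ k_blk.transpose(-1, -2)        # coarse block affinities
    k_keep = max(1, int(keep * nb))
    top = scores.topk(k_keep, dim=-1).indices       # best key-blocks per query-block
    mask = torch.zeros(B, H, nb, nb, dtype=torch.bool, device=q.device)
    mask.scatter_(-1, top, True)
    return mask  # True = compute this (query-block, key-block) tile

q = torch.randn(1, 8, 1024, 64)
k = torch.randn_like(q)
print(topk_block_mask(q, k).float().mean())  # fraction of tiles kept
```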
Performance Benchmarks
Tests on a single RTX 5090 show roughly 100-200x end-to-end acceleration when generating 5-8 second high-quality videos.
| Model | Original Time | TurboDiffusion Time | Speedup |
|---|---|---|---|
| Wan2.1-T2V-1.3B-480P | ~166s | 1.8s | ~92x |
| Wan2.1-T2V-14B | ~1635s | 9.4s | ~174x |
| Vidu 1080p 8s | ~900s | ~8s | ~112x |
Supported Model Matrix
TurboWan2.1-T2V-1.3B-480P
Optimized for lightweight, real-time generation workflows.
TurboWan2.1-T2V-14B-720P
High-fidelity outputs for commercial-grade video generation.
TurboWan2.2-I2V-A14B-720P
Image-to-video support for storyboard-driven pipelines.
Installation & Quick Start
Recommended: Python ≥ 3.9, Torch ≥ 2.7.0. Use unquantized checkpoints for GPUs with 40GB+ VRAM.
Quick Install
```bash
conda create -n turbodiffusion python=3.12
conda activate turbodiffusion
pip install turbodiffusion --no-build-isolation
```
Build from Source
```bash
git clone https://github.com/thu-ml/TurboDiffusion.git
cd TurboDiffusion
git submodule update --init --recursive
pip install -e . --no-build-isolation
```
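The repository documents the actual inference entry points; purely as an illustration, a text-to-video call could look like the hypothetical sketch below. `TurboDiffusionPipeline`, the model id, and all arguments are placeholders, not the confirmed API.

```python
# Hypothetical usage sketch; see the repo README for the real entry points.
import torch
from turbodiffusion import TurboDiffusionPipeline  # placeholder import

pipe = TurboDiffusionPipeline.from_pretrained(
    "TurboWan2.1-T2V-1.3B-480P",    # placeholder model id
    torch_dtype=torch.float16,
).to("cuda")

video = pipe(
    prompt="a red fox running through fresh snow, cinematic",
    num_inference_steps=4,          # rCM-distilled models need only 3-4 steps
)
```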
Ecosystem & Integrations
ComfyUI Plugin
The community-driven Comfyui_turbodiffusion plugin brings the 100-200x accelerated video generation to ComfyUI workflows.
Industry Adoption
Adopted by leading teams including Tencent, ByteDance, Alibaba, and Baidu.
Inference Engines
SageAttention is integrated into TensorRT and multiple accelerator platforms.
Roadmap
The team is expanding support for additional video generation paradigms, with ongoing work on parallelism optimization, vLLM-Omni integration, and broader model coverage.
Citation
```bibtex
@article{zhang2025turbodiffusion,
  title={TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times},
  author={Zhang, Jintao and Zheng, Kaiwen and Jiang, Kai and Wang, Haoxu and
          Stoica, Ion and Gonzalez, Joseph E and Chen, Jianfei and Zhu, Jun},
  journal={arXiv preprint arXiv:2512.16093},
  year={2025}
}
```