Run 100B+ language models at home, BitTorrent‑style
- Run large language models like BLOOM-176B collaboratively — you load a small part of the model, then team up with people serving the other parts to run inference or fine-tuning.
- Inference runs at ≈ 1 sec per step (token) — up to 10× faster than offloading-based approaches, fast enough for chatbots and other interactive apps. Parallel inference reaches hundreds of tokens/sec.
- Beyond classic language model APIs — you can employ any fine-tuning and sampling methods by executing custom paths through the model or accessing its hidden states. You get the comforts of an API with the flexibility of PyTorch.
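The collaboration idea above can be sketched in plain Python. This is a toy illustration, not the real Petals API: the `Peer` class and `client_forward` function are hypothetical, and simple arithmetic functions stand in for transformer blocks. Each peer serves only a small slice of the model's layers, and the client chains activations through the peers to complete a full forward pass.

```python
from typing import Callable, List

Layer = Callable[[float], float]

class Peer:
    """A hypothetical participant serving a small slice of the model's layers."""
    def __init__(self, layers: List[Layer]):
        self.layers = layers

    def forward(self, hidden: float) -> float:
        # Run the hidden state through this peer's slice of the model.
        for layer in self.layers:
            hidden = layer(hidden)
        return hidden

# Toy "model": 6 layers of simple arithmetic standing in for transformer blocks.
layers: List[Layer] = [lambda h, i=i: h * 2 + i for i in range(6)]

# Split the model across three peers, two layers each — no single peer
# holds the full model, just as no BitTorrent peer holds every chunk.
peers = [Peer(layers[i:i + 2]) for i in range(0, 6, 2)]

def client_forward(x: float) -> float:
    """The client routes its activations through the chain of peers."""
    hidden = x
    for peer in peers:
        hidden = peer.forward(hidden)
    return hidden
```

Running the chained peers on an input reproduces what a single machine holding all six layers would compute, which is the essence of pipeline-parallel inference over a swarm.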
Join our Discord or subscribe via email to follow Petals development.
This project is part of the BigScience research workshop.