Petals

Run 100B+ language models at home, BitTorrent‑style

  • Run large language models like BLOOM-176B collaboratively — you load a small part of the model, then team up with people serving the other parts to run inference or fine-tuning.
  • Inference runs at ≈ 1 sec per step (token) — 10x faster than possible with offloading, enough for chatbots and other interactive apps. Parallel inference reaches hundreds of tokens/sec.
  • Beyond classic language model APIs — you can employ any fine-tuning and sampling methods by executing custom paths through the model or accessing its hidden states. You get the comforts of an API with the flexibility of PyTorch.
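The collaborative setup described above works like this: each peer serves a contiguous slice of the model's blocks, and a client chains those peers to run a full forward pass. The toy simulation below sketches that idea, and how a client can read the hidden state after every hop. It is NOT the Petals API. `Server`, `build_route`, `run_chain` and the "add i" blocks are hypothetical stand-ins, and the greedy routing rule (always pick the peer that reaches furthest) is a simplification assumed here for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical stand-in for a transformer block: maps a hidden state
# (here a plain list of floats) to a new hidden state.
Block = Callable[[List[float]], List[float]]


def make_block(i: int) -> Block:
    # Toy block i simply adds i to every element, so results are easy to check.
    return lambda h: [x + i for x in h]


@dataclass
class Server:
    """A hypothetical peer holding a contiguous slice of blocks [start, end)."""
    name: str
    start: int
    end: int
    blocks: List[Block]

    def forward(self, h: List[float], first: int) -> List[float]:
        # Run only the blocks from `first` up to this server's `end`.
        for i in range(first, self.end):
            h = self.blocks[i - self.start](h)
        return h


def build_route(servers: List[Server], num_blocks: int) -> List[Tuple[Server, int]]:
    """Greedily chain servers so every block 0..num_blocks-1 runs exactly once."""
    route, pos = [], 0
    while pos < num_blocks:
        holders = [s for s in servers if s.start <= pos < s.end]
        if not holders:
            raise RuntimeError(f"no server holds block {pos}")
        best = max(holders, key=lambda s: s.end)  # go as far as possible per hop
        route.append((best, pos))
        pos = best.end
    return route


def run_chain(route: List[Tuple[Server, int]], h: List[float]):
    """Pass h through the route, keeping the hidden state after each hop."""
    hidden_states = []
    for server, first in route:
        h = server.forward(h, first)
        hidden_states.append(h)
    return h, hidden_states


NUM_BLOCKS = 6
all_blocks = [make_block(i) for i in range(NUM_BLOCKS)]
# Three peers with overlapping slices; none holds the whole model.
servers = [
    Server("A", 0, 3, all_blocks[0:3]),
    Server("B", 2, 5, all_blocks[2:5]),
    Server("C", 4, 6, all_blocks[4:6]),
]
route = build_route(servers, NUM_BLOCKS)
output, hidden = run_chain(route, [0.0, 1.0])
print([s.name for s, _ in route], output)  # prints ['A', 'B', 'C'] [15.0, 16.0]
```

Because the client holds every intermediate state (`hidden`), it can inspect or modify activations between peers. This mirrors, in miniature, the "custom paths through the model" idea, without any single peer needing the whole model.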

Join our Discord or subscribe via email to follow Petals development.



This project is a part of the BigScience research workshop.