5 Comments
Paul's avatar

very well written, and super interesting

Piotr Mazurek's avatar

Thank you 🙏!

Kal's avatar

God’s work!! I hope you continue writing.

Piotr Mazurek's avatar

that's the plan :)

Neural Foundry's avatar

The async RL explanation with the token lag distribution is eye-opening. I never thought about how conventional RL concentrates the lag in later samples, while async RL spreads it evenly across rollouts. The multi-tenancy piece is clever too: you can batch requests from different custom models all using the same base. Makes sense why CoreWeave went after OpenPipe and W&B.