The async RL explanation with the token lag distribution is eye-opening. I never thought about how conventional RL concentrates the lag in later samples while async RL spreads it evenly across rollouts. The multi-tenancy piece is clever too: you can batch requests from different custom models that all use the same base. Makes sense why CoreWeave went after OpenPipe and W&B.
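To make the multi-tenancy idea concrete, here's a minimal sketch (my own toy example, not from the article): if each customer's fine-tune is a small adapter on shared base weights, a server can group incoming requests by base model so one batched forward pass serves many tenants. The request fields and names here are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical request shape: each custom model is an adapter
# (e.g. a LoRA) layered on a shared base model.
requests = [
    {"base": "llama-3-8b", "adapter": "customer-a", "prompt": "..."},
    {"base": "llama-3-8b", "adapter": "customer-b", "prompt": "..."},
    {"base": "mistral-7b", "adapter": "customer-c", "prompt": "..."},
]

def batch_by_base(reqs):
    """Group requests across tenants that share the same base weights,
    so one batched forward pass of the base can serve many adapters."""
    batches = defaultdict(list)
    for r in reqs:
        batches[r["base"]].append(r)
    return dict(batches)

batches = batch_by_base(requests)
# Two requests from different customers share the llama-3-8b batch.
print({base: len(rs) for base, rs in batches.items()})
```

The per-adapter weights get applied inside the batched pass, so tenants never need dedicated GPUs for their own copies of the base.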
very well written, and super interesting
Thank you 🙏!
God’s work!! I hope you continue writing.
that's the plan :)