8 Comments
Sven Meyer

Wow, I can't remember the last time I read such a detailed write-up. Are you doing this for work, or was that your Master's thesis?

If I wanted to build an inference-provider business, it would all be in here! Too competitive and expensive a market to launch into, but still great insight.

That said, I got some real value out of it, as I now understand why I won't get far with the 8GB GPU in my laptop: the KV cache for a 131,072-token context window would itself need 40GB (nice diagram!).
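The 40GB figure checks out with a bit of arithmetic. A minimal sketch, assuming a Llama-3-70B-style configuration (80 layers, 8 KV heads via grouped-query attention, head dimension 128, fp16 weights) — these parameter values are my assumptions, not quoted from the article:

```python
def kv_cache_gib(context_len: int,
                 n_layers: int = 80,      # assumed: Llama-3-70B layer count
                 n_kv_heads: int = 8,     # assumed: GQA key/value heads
                 head_dim: int = 128,     # assumed: per-head dimension
                 bytes_per_elem: int = 2  # fp16
                 ) -> float:
    """Estimate KV-cache size in GiB for a given context length."""
    # Factor of 2 accounts for storing both keys and values per token.
    per_token_bytes = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return context_len * per_token_bytes / 2**30

print(kv_cache_gib(131_072))  # → 40.0 GiB, matching the comment's figure
```

Under these assumptions each token costs about 320 KiB of cache, so the full 131,072-token window needs exactly 40 GiB — far beyond an 8GB laptop GPU even before counting the model weights.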

Piotr Mazurek

Thanks for the kind words; this is mostly some thoughts gathered over a few weekends.

You're probably still missing a few key facts needed to run a competitive inference operation, but brace yourself for future releases :)

Maxx Yung

hidden gold mine of information!

gilfoyle

Seriously, this post and "The MoE Inference Economics" should be a chapter in a textbook.

Romker

Impressive article, thank you!

Could you please publish all the parameters you used for the Llama 8B numbers in Figure 22, similarly to Figure 1 for Llama 70B? Thank you!

Piotr Mazurek

https://github.com/tugot17/tokenomics-from-first-principles

You can just model this yourself; we open-sourced the code.

225786

Hello, I want to help everyone. My name is NAING SOE.

Max Hager

impressive!
