7 Comments
Sven Meyer

Wow, I can't remember when I last read such a detailed write-up. Are you doing this for work, or was that your Master's thesis?

If I wanted to build an inference-provider business, it would all be in here! The market is too competitive and expensive to launch into, but it's still great insight.

I also got some real value out of it, as I now understand why I won't get far with the 8GB GPU in my laptop: the KV cache for a 131,072-token context window would itself need 40GB (nice diagram!).
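The 40GB figure the comment cites can be sanity-checked with back-of-the-envelope arithmetic. This sketch assumes a Llama 3 70B-style configuration (80 layers, 8 KV heads via grouped-query attention, head dimension 128, fp16 weights); those parameter values are assumptions here, not quoted from the article.

```python
def kv_cache_bytes(seq_len: int,
                   n_layers: int = 80,      # assumed: Llama 3 70B layer count
                   n_kv_heads: int = 8,     # assumed: GQA key/value heads
                   head_dim: int = 128,     # assumed: per-head dimension
                   dtype_bytes: int = 2):   # fp16 = 2 bytes per value
    """KV-cache size in bytes for a single sequence.

    Each token stores a K and a V vector (factor of 2) of
    n_kv_heads * head_dim values in every layer.
    """
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes * seq_len


# Full 131,072-token context window:
print(f"{kv_cache_bytes(131_072) / 2**30:.1f} GiB")  # 40.0 GiB
```

Under these assumptions the cache works out to exactly 40 GiB, which matches the comment and explains why an 8GB laptop GPU cannot hold it.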

Piotr Mazurek

Thanks for the kind words; this is mostly some thoughts gathered over a few weekends.

You probably still lack a few key facts to run a competitive inference operation, but brace yourself for future releases :)

Maxx Yung

hidden gold mine of information!

Romker

Impressive article, thank you!

Could you please publish all parameters you used for Llama 8b numbers in Figure 22? Similarly to Fig 1 for Llama 70b. Thank you!

Piotr Mazurek

https://github.com/tugot17/tokenomics-from-first-principles

You can model this yourself; we open-sourced the code.

225786

Hello, I want to help everyone. My name is NAING SOE.

Max Hager

impressive!
