The main product LLM companies offer these days is API access to their models, and the key factor that will determine their profitability is the inference cost structure.
Wow, can't remember when I last read such a detailed write-up. Are you doing this for work, or was that your Master's thesis?
Should I ever want to build an inference provider business, it's all in here! Too competitive and expensive to launch, though, but still great insight.
I got some real value out of it regardless, as I now understand why I won't get far with the 8GB GPU in my laptop: the KV cache for a 131,072-token context window would itself need 40GB (nice diagram!)
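For anyone who wants to sanity-check that 40GB figure, here's a minimal back-of-the-envelope sketch. It assumes a Llama-70B-like configuration (80 layers, 8 grouped KV heads, head dim 128, fp16 cache) — these parameters are my assumptions, not taken from the article:

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, dtype_bytes=2):
    # Per token, each layer stores one K and one V vector of size
    # kv_heads * head_dim, at dtype_bytes per element (2 for fp16).
    return 2 * layers * kv_heads * head_dim * seq_len * dtype_bytes

# Assumed Llama-70B-like config: 80 layers, 8 KV heads (GQA), head dim 128.
size = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128, seq_len=131_072)
print(f"{size / 2**30:.1f} GiB")  # → 40.0 GiB
```

With these assumed numbers the cache alone comes out to about 40 GiB at full context, which is why an 8GB laptop GPU doesn't get far.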
Thanks for the kind words; this is mostly some thoughts gathered over a few weekends.
You'd probably still be missing a few key facts needed to run a competitive inference operation, but brace yourself for future releases :)
hidden gold mine of information!
Impressive article, thank you!
Could you please publish all the parameters you used for the Llama 8B numbers in Figure 22, similar to Fig. 1 for Llama 70B? Thank you!
https://github.com/tugot17/tokenomics-from-first-principles
You can just model this yourself; we open-sourced the code.
impressive!