The main product LLM companies offer these days is API access to their models, and the key factor that will determine how profitable they can be is their inference cost structure.
Wow, I can't remember when I last read such a detailed write-up. Are you doing this for work, or was that your Master's thesis?
Should I ever want to build an inference provider business, it would all be in here! That market is too competitive and expensive to launch into, but it's still great insight.
In any case, I got some real value out of it, as I now understand why I won't get far with the 8 GB GPU in my laptop: the KV store for a 131,072-token context window would itself need 40 GB (nice diagram!)
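For anyone wondering where a figure like 40 GB comes from, here is a minimal back-of-the-envelope sketch in Python. The configuration is an assumption on my part (a Llama-3.1-70B-style model: 80 layers, 8 KV heads, head dim 128, cached in fp16), chosen because it happens to reproduce the number above; it is not stated in the article:

```python
# Back-of-the-envelope KV-cache sizing.
# ASSUMED config: Llama-3.1-70B-style (80 layers, 8 KV heads,
# head dim 128), fp16 cache (2 bytes per value).
def kv_cache_bytes(context_len: int,
                   n_layers: int = 80,
                   n_kv_heads: int = 8,
                   head_dim: int = 128,
                   bytes_per_value: int = 2) -> int:
    # Per token, each layer stores one K and one V vector per KV head.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return context_len * per_token

if __name__ == "__main__":
    size = kv_cache_bytes(131_072)
    print(f"{size / 2**30:.0f} GiB")  # -> 40 GiB, far beyond an 8 GB GPU
```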
Thanks for the kind words; these are mostly thoughts gathered over a few weekends.
You probably still lack a few key facts to run a competitive inference operation, but brace yourself for future releases :)
hidden gold mine of information!
impressive!