Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I've never understood the obsession with token/s. I'm fine with asking a question and then going on to another task (which might be making coffee).

Even with a cloud-based LLM where the response is pretty snappy, I still find that I wander off and return when I am ready to digest the entire response.



Your workflow is unusual, oftentimes there is a vigorous back and forth, or a desired output like code generation, etc where a low tk/s drastically effects ux and user productivity.

But the real kicker here is the 90s ttft, that means you ask a question and don't see anything for a full minute and a half.


You are fine with it. But may be rest of the world is not. Anyway, to compare performance/benchmark, we need metrics and this is one of the basic metric to measure.




Consider applying for YC's Summer 2026 batch! Applications are open till May 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: