

cubefox 1 day ago | on: Nvidia's $20B antitrust loophole

> Most production AI applications aren't running 405B models. They're running 7B-70B models that need low latency and high throughput.

Really? At least for LLMs, most actual usage is concentrated on huge SOTA models. 1 trillion parameters or more. And LLMs seem to be the lion's share of AI compute demand.

wmf 1 day ago

OpenAI is trying to move as many requests as they can to a "smaller" model (still suspected to be ~200B).

cubefox 1 day ago

I suspect it to be >1T, just without reasoning.
