I personally find it "sort of" funny how nvidia is caught between what they need for gamers and what they need for AI, and how they're pulling every trick to keep their gamer cards from eating into their "AI enthusiast" card sales.
Absurd pricing, ridiculous vram offerings... I'm sure they're trying very hard to find a way to stop AI workloads like SD or LLMs from running on their gamer cards at this point.
It's reached a point where not only has ATI/AMD essentially caught up to them in rasterization, but they're frankly a better offer at every price point against the 4XXX generation for pure gaming, with only DLSS and brand recognition keeping nvidia ahead.
It seems super weird to say given who they are and how many times they've tried, but Intel is my hope right now. Their Arc cards were very good for a first salvo, they're making great strides in drivers and modern tech (XeSS beating FSR on its first try, for example), and they're so scared of being left behind by ARM + CUDA that they just might take it seriously this time instead of killing it when it starts working.
I want to be hopeful, but these corporations will never do anything for the consumer. At their core they think in terms of squeezing the customer as much as they can. As Steve Jobs said, when the sales people run the company don't expect a good product.
A wonderful thing would be a GPU with open source drivers and a standardized API. AMD could do that and it would kill nVidia's marketshare in a few years, but they don't because they want the server GPU money at a very inflated margin. Maybe Intel will do it, but that depends on who runs the company. With open source, corporations often publish a feature-stripped solution and keep the full drivers closed source. Sad reality we live in.
I've got my fingers crossed on them cramming a lot of vRAM in there. They were putting 16gb in the Xeon Phi 7110P when 8GB was acceptable for system RAM in gaming machines. If they dropped a 64GB+ prosumer card, I'd be so happy. Even if it was something like 32GB of super-fast memory and another 64-128 of slightly slower-but-unified memory, I'd be ecstatic. Being able to fine-tune a 175B LLM locally would be incredible.
AMD is still lagging on software though. I wish they would get their act together, but their drivers are just not anywhere near as stable or compatible as Nvidia's. And then there's Intel, and it makes you appreciate the state of AMD's...
For pure gaming on windows I would disagree, as of the 7xxx cards. And I'm one who was on team ATI for a few generations before moving to nvidia because of terrible drivers.
The reason is that they want to sell entry/medium business workload cards at 1-2k€, and gamer cards at 400-700€. Right now it's burning them on both sides, and the only "fix" they found was to give their cards just enough vram to not be useless.
That's why you have an 8 GB 4060 Ti then a 16 GB one then the 4070 Ti with 12 GB ... It's a mess.
Because people who do AI are willing to pay several k€ per card, while gamers hit a wall before 1k€. It's the same cards, only the vram changes, and the vram is "cheap" to add relative to the total price.
So now you have gamers who are pissed off at their 4060 Ti or 4070 Ti barely having enough vram for 1440p at modern quality settings, and AI enthusiasts wondering why they'd buy the "pro" card when a 4090 is not even 2k€.
Essentially, they're trying to artificially segment a market that is not segmented, to take advantage of the much higher buying power in one of the segments.
The 4090 wins in every benchmark for 1/3rd the price. Why would anybody buy this card? Is 8 GB more VRAM and lower power consumption really worth that much when the performance is so lackluster?
You know, at first I was thinking the same thing, but after having 2x 3090s raise the temperature of my bedroom to 86 degrees in the middle of the night last week while fine-tuning an LLM, I can see the draw of 64GB (across two cards) putting out 400 watts less total heat than the 4090s in my work space.
They claim PCIe4.0 makes it irrelevant, but that doesn't really make sense and it's most likely the case they want to charge a fortune for their high-memory options.
It is if you can pool the memory. It's easier than having to split the model in software (though that's a somewhat solved problem), and from what I know it allows higher GPU utilization on both cards since they don't have to wait for information to pass back and forth.
It is barely relevant to big players, but extremely valuable for small players: manually distributing your workload across multiple GPUs is not that simple a thing to do, and there are a lot of much more interesting/important problems to solve than shoving your model onto a GPU.
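For a sense of what that manual split involves, here's a minimal PyTorch sketch (the two-way split, layer count and sizes are made up for illustration, and it assumes two CUDA devices are present):

    import torch
    import torch.nn as nn

    class TwoGPUSplit(nn.Module):
        """First half of the layers on cuda:0, second half on cuda:1,
        with activations copied between the devices by hand."""
        def __init__(self, hidden=4096, layers=8):
            super().__init__()
            half = layers // 2
            self.front = nn.Sequential(*[nn.Linear(hidden, hidden) for _ in range(half)]).to("cuda:0")
            self.back = nn.Sequential(*[nn.Linear(hidden, hidden) for _ in range(layers - half)]).to("cuda:1")

        def forward(self, x):
            x = self.front(x.to("cuda:0"))
            # Device-to-device copy over PCIe (or NVLink if bridged); while one
            # half runs, the other GPU mostly sits idle.
            return self.back(x.to("cuda:1"))

    model = TwoGPUSplit()
    out = model(torch.randn(16, 4096))   # output ends up on cuda:1

And that's before you deal with balancing the split, gradients, or optimizer state; pooled memory makes the whole question go away.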
There are a lot of ways to correct people.
While I agree it's clearly not adding a lot to the conversation, the forum is async so it's not really disrupting much, as the comment goes down the comment list.
I'm a non-native speaker and surely make a lot of mistakes, and I would be OK with such a correction.
I think the right way is, don't. This kind of "mistake" can not really result in any misunderstanding so there is not really any benefit gained from a correction. Correcting serves only to disrupt the flow of the conversation.
This is a tech forum for discussing technology. This isn't the place for that, and honestly I'd prefer not to have to sift through a bunch of pedantic grammar nitpicking on here.
I'm a native English speaker. I don't personally like corrections like this, and I would never criticise anyone for using the 'wrong' term. There is too much convention in English which isn't useful because it bleeds into pedantry and I think this is such a case. Using either word gets the job done cleanly, and at the bottom that's all that counts.
Well, I was about to say "correction", but I think I got caught by a false cognate, so it was indeed a critique; but you weren't criticizing the previous post, if my understanding of those words is correct.
Yes, but I think some soft resistance towards language errors is still necessary in a society, otherwise the language becomes too fluid to be usable and understandable. Where to draw the line between information and rudeness is the difficult part.
You don't need to jail someone or slap them in the face for using less instead of fewer, but you don't want gynecologist to mean oncologist and cancer to mean gonorrhea from one week to the next, otherwise nobody knows what anyone is talking about.
An odd thing, for a native English speaker to be given advice on the language by someone who is learning it, no?
In English there is both essential plumbing and pointless ornament, and I wish for that distinction to be recognised because confusing the two is damaging.
I am not the one who corrected the comment. I am only someone who said that this correction had value, and I am not telling you how you have to speak English.
My point was that, in my opinion, the correction had no value. I was categorising it as a pointless ornament rather than something actually useful. But I emphasise, that's just my view.
All the "incorrect" uses of those words are fine and are a part of the language. Outside of prescriptivism, that's just the use of the language, not misuse.
Even "literally"? While my original comment was just a Stannis joke, I really think you need to rethink how useless it is to have a word that means both "not figuratively" and "figuratively" (by common usage) at the same time. If it means both, we literally have no further use for it in the English language because it modifies nothing.
Why not have the word "no" mean "yes" and "no" while we're at it?
The problem with defending the purity of the English language is that English is about as pure as a cribhouse whore. We don't just borrow words; on occasion, English has pursued other languages down alleyways to beat them unconscious and rifle their pockets for new vocabulary.
- James Nicoll
"Literally" has meant figuratively for centuries[0]. It's a language construct known as a contronym[1,2].
Yes, English is not strictly typed, and doesn't conform to a formal spec or mathematical proof. A word can have multiple definitions. A word can have contradictory definitions. The definition of a word can change over time without needing to submit to an approval process. A word can mean something different the next town over. Dialects and creoles and slang exist in wild, flagrant abandon and disregard for your rules. And all of it is valid.
Despite all of this, people manage to be able to comprehend one another, even when using literally to not literally mean literally.
Even the user who used "less" instead of "fewer" upthread that started this. Everyone understood exactly what they meant. The two words mean literally the same thing. But some people insist on maintaining a meaningless formality and insisting upon rules that don't matter, or rather insisting that even in casual conversations, their rules must take precedence over everyone else's.
You're literally doing what you still think I was doing and are literally not realizing the hypocrisy. The "fewer" line was a Stannis Baratheon reference.
Yes it is correct. "Standard" English took one dialect and attempted to enforce it on hundreds of others. It's authoritarian and not how language works.
Prescriptivist? Nah. Mandarin blow rid nails for happy deranged elf nestle corruptive hand wither. If it seems I stopped making sense, it's because I was briefly testing where your binary labels lead. Apparently, there are things between descriptivist and prescriptivist and both of us fall within those categories.
You're making my point for me, as your gibberish is not in use so is unintelligible.
The person you replied to used perfectly common and comprehensible English, so much so that you knew exactly how to "correct" them.
So the comment was grand and your prescriptivist response adds nothing of value. Hopefully you can see why this is true and reflect on why you felt the need to "correct" it.
>So the comment was grand and your prescriptivist response
There we go. The "response" was prescriptivist. The person posting it (me) and yourself are both somewhere in between the two. Now I think you're starting to get it.
Regardless, the original response was just a joke referring to Stannis Baratheon (from Game of Thrones) saying it. I think some people read it that way. You didn't.
Why do they have this naming… it's just insane. RTX 5000 Ada… do they just put letters in front or names at the back these days? So confusing, and the consumer cards will also be RTX 50xx.
Was about to post something questioning the "forever" part, because my memory only starts to link Nvidia generations with scientists somewhere around Kepler. And that's despite having followed GPU tech a lot more in the years before. But according to Wikipedia it goes back to the days of the Riva TNT: the wiki seems undecided about the Fahrenheit-ness of earlier generations, but I'd consider that close enough for "forever".
I believe the issue with Lovelace is that you may find less than PG results typing that on a search engine. Hence using Ada primarily on the marketing.
I think the complaint is more that the consumer cards are 4xxx while this is 5000, both on the same architecture.
Yeah they have different naming conventions on the workstation cards;
Quadro RTX 4000
RTX A4000
RTX 4000 Ada
Unfortunately they’ve had 3 separate naming conventions in 3 successive generations. Those 4000 series cards are in the same position in the lineup for each generation.
I just went through this with our Dell rep. The generations aren't totally successive, if you count the non-Quadro RTX 4000 series, which is Ada generation but not part of the RTX 4000 Ada series.
Add to it the card variants, and there's a chance that you might still end up with the wrong part if your purchaser isn't careful.
Yes but they historically do not put the architecture name in the product name like that. They also never use the naming scheme of their consumer graphics cards on their workstation line. They've always had a different system. This card breaks both of those conventions. Workstation cards are supposed to be Quadro. But it looks like they've rebranded the line as "Nvidia RTX". I can only assume that was an intentional move to make their lineup more confusing.
NVLink is essential for training large neural networks, which NVidia now earns the majority of its revenue from. Sales of their more expensive GPUs would be affected if they put NVLink in cheaper GPUs.
The issue is that this has caused severe shortages. The only new card with NVLink is at the very highest end and when trying to get a quote recently, I was told there was a 13 month delay in shipping. But if I don't need NVLink, just a few months.
At this pricing level, with this amount of RAM, I suspect a lot of use cases will be with ML and GenAI. Benchmarks for these use cases would have been interesting.
It is 20-30% slower than a 4090 or H100 in compute; the only improvement is slightly more RAM. This card is not for ML (on purpose) - it is for more enterprise-ish tasks: advanced video streaming/rendering, virtualization, etc.
It's for some ML tasks. Just not large language models.
If you're making, say, an ML-based on-premise CCTV system and you need to run several large ResNets at the same time? And you don't want to go rack-mounted, as some sites don't have a data centre? And you want the longer lifecycle and guaranteed spare parts availability of an enterprise product line? This could be the card for you.
Admittedly it's a rip-off, but the Workstation/Quadro line always has been.
Honestly I'm not sure how healthy the workstation market is right now - with the rise of work-from-home and hybrid working, I don't see many people using huge desktops any more. And when Adobe puts a powerful generative AI feature into Photoshop, they don't expect users to upgrade to powerful GPUs - they run it in the cloud, so it works for users with puny GPUs and Adobe can get that sweet sweet recurring revenue.
Sorry what does that mean? There are many people creating content and software for commercial use cases with these cards and I've never seen a EULA anywhere preventing this.
YetAnotherNick is probably referring to the geforce eula [1] which says:
"Customer may not [...] provide commercial hosting services with the SOFTWARE. [...] The SOFTWARE is not licensed for datacenter deployment, except that blockchain processing in a datacenter is permitted."
So if you were e.g. an ad agency artist using a 4090 at your desk to generate images for commercial use - that's fine, because it's not in a datacenter, and the commercial services you're providing with it aren't hosting.
I've never heard of nvidia taking any enforcement action, or defining precisely when an office becomes a datacenter - I suspect this is mostly to ensure the big cloud providers don't offer 4090s.
I can't edit my comment now, but by commercial I meant "using a 4090 for commercial purposes", that is renting it out, not that you aren't allowed to create commercial content using the card.
Reading my comment again, it's obviously wrong and I should have been more careful with the wording.
I wonder if places like RunPod have a special agreement with NVidia, there's some way they're working around this (not installing the drivers?), or if they're just flagrantly offering it as they offer several consumer card options in what's presumably a datacenter environment.
This limitation doesn’t really apply anymore with the open kernel driver. They can’t stop you from using a GPL-licensed driver in a datacenter or any other way you want.
IMHO it's against the EULA to rent out a 4090 - and this is why the big cloud platforms don't offer any. But using a 4090 in a company's own datacenter should be fine.
Lol, what? Of course it can. Nvidia can say they don't allow it in their user agreement, but last time I checked user agreements don't form actual legal law. Nvidia can sue you for breaking the agreement (I wonder for what, exactly?), but that's about it.
The worst thing that can happen is Nvidia declining any warranty repairs on these cards as they've been used outside of the intended use. But it's definitely not a legal issue.
And EULAs in general are just not enforceable outside of the US.
To be honest, the best benchmark you can run is your own training code. Everything else is a guess.
When I tested the A6000 against the H100, there wasn’t that big of a boost from the newer card. Perhaps GPU operations weren’t the bottleneck in that case.
> To be honest, the best benchmark you can run is your own training code. Everything else is a guess.
Yes, but the point of a review with benchmarks is that it's expensive and time-consuming for a customer to acquire the hardware just to run their own benchmarks on it.
Stable Diffusion and various LLMs are available pretty easily.
A simple benchmark, along the lines of "this version of Stable Diffusion / this LLM was run with these settings, and it took this long to produce an image / we got this many tokens/sec", would be a nice comparison, and with access to all the hardware you are in a good position to do it.
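Something along these lines would be enough for the LLM side (a rough sketch; the model, prompt and token count are placeholders, and a real run would want a warm-up pass and several repeats):

    import time
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Llama-2-7b-hf"   # placeholder model
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

    inputs = tok("The quick brown fox", return_tensors="pt").to("cuda")

    torch.cuda.synchronize()
    start = time.perf_counter()
    out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start

    generated = out.shape[-1] - inputs["input_ids"].shape[-1]
    print(f"{generated / elapsed:.1f} tokens/sec on {torch.cuda.get_device_name(0)}")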
I get what you’re saying, but I think too much complexity hides in those numbers. Some of the optimizations for LLMs (flash attention) are hardware specific.
And anyways, it’s not that expensive. You can rent the same gpu from cloud providers for a few dollars per hour. If you are serious about buying a GPU it is an extremely small cost in terms of time and money compared to the price of the gpu itself.
The alternative is trusting an online review running some training or inference code which is likely not comparable to what most people are doing.
> You can rent the same gpu from cloud providers for a few dollars per hour.
The big three cloud providers don't offer things like the RTX 5000 32GB or the 4090 do they?
AWS will happily rent me a H100, A100, V100, K80, A10G, T4, or M60. Not to mention a Trainium, Inferentia, Inferentia2, Gaudi or Qualcomm AI 100.
And don't forget to benchmark each one of those in a 1, 2, 4, and 8 GPU configuration; and with a variety of batch sizes; and with and without distributed training. Remember to work out the performance-per-dollar for on-demand, reserved, and spot instances, times three different cloud providers. Now, to compare to on-premise pricing we start by calling our air conditioning and backup generator vendors...
I don’t want to be promoting any particular cloud GPU service, but you literally can rent those GPUs in single and multi GPU configurations if you need to. Even for the H100 it’s only around $2/GPU/hour, and consumer grade ones like the 4090 are even cheaper of course.
I don't see why the parent comment would have meant the big 3 cloud companies. The GPU rental space is pretty commoditised, unlike for instance CDN/object storage where staying locked into a single vendor makes things simpler.
Not explicitly, but RcouF1uZ4gsC talked about "an extremely small cost in terms of time and money"
Starting a VM with a cloud provider you already know and use may be extremely fast, but adopting a new cloud provider isn't.
Even if your organisation is so unbureaucratic you can get a new provider set up without any due diligence work, every cloud provider comes with their own oddities. Will they have a wacky set of default firewall rules? Will they only offer Debian not Ubuntu, if you want an instance with the CUDA drivers already installed? Will they insist you learn what a 'provisioned iops' is before letting you start an instance? Will some GPUs only be available in certain regions? Will accounts have quota limits that vary by region, availability zone and GPU type, some of which default to zero?
This isn’t migrating your entire app to a new ecosystem, this is starting up an empty machine, cloning a repo, opening Jupyter, and running some dummy training tasks with dummy training data. It takes 5 minutes from setup to teardown. It’s not difficult.
This would end up memory-bound for LLMs (and you'd be better off getting 3x 4090s for this) or compute-bound for diffusion models (and again, a 4090 is like 10% slower than an H100 for 1% of the price).
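To put rough numbers on the memory-bound part (back-of-envelope only; it counts weights alone and ignores KV cache, activations and framework overhead):

    def weights_gb(params_billion: float, bytes_per_param: float) -> float:
        """VRAM needed just to hold the model weights."""
        return params_billion * 1e9 * bytes_per_param / 1024**3

    for params in (7, 13, 70):
        for bytes_per_param, label in ((2, "fp16"), (1, "int8"), (0.5, "4-bit")):
            print(f"{params}B {label}: {weights_gb(params, bytes_per_param):6.1f} GB")

A 70B model at fp16 is already ~130 GB of weights before any KV cache, which is why VRAM per euro matters more than raw compute for local LLM work.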