Perhaps they're doubling down, but even that isn't enough to show they're serious, given how neglectful they've been for so many years. Right now, for example, they explicitly state that many AMD GPUs are not supported by ROCm. If they're not willing to put their money where their mouth is and do the legwork to support powerful cards they sold just a few years ago, how can they claim you should rely on their platform?
Unless a random gamer with a random AMD GPU can go to amd.com, download pre-packaged, officially supported tools that work out of the box on their machine, and after a few clicks have working GPU-accelerated PyTorch (which IMHO isn't the case, though admittedly I haven't tried this year), their "doubling down" isn't even meeting table stakes.
People argue for ROCm to support older cards because that is all they have access to. AMD has lagged on getting its expensive cards into the hands of end users because they've focused only on building supercomputers.
I predict that access to the newer cards is the more likely scenario. Right now you can't rent an MI250 or even an MI300X, but that is going to change quickly. Azure is going to have them, as will others (I know this because that's what I'm building now).
The way I see it, the whole point of ROCm support is being able to service the many users who have pre-existing AMD cards and nothing else available. If someone is going to rent a GPU, I don't need to bother with adding extra features for them, because they can just rent a CUDA-capable GPU instead.
Say I'm considering adding ROCm support to some ML-enabled tool, whether a commercial product or an open-source library. What I need from AMD is assurance that the ROCm support I build will work without hassle for end users with random old AMD gaming cards, because those are the only users who need the tool to have ROCm support at all. And if ROCm upstream explicitly drops support for some cards because AMD no longer tests them regularly, ML tool developers aren't going to do that testing for them either. That's simply AMD intentionally refusing to do even the bare minimum (extensive testing across a wide variety of hardware to fix compatibility issues) that I'd consider table stakes for saying that "AMD is doubling down on ROCm".
> The way I see it, the whole point of ROCm support is being able to service the many users who have pre-existing AMD cards and nothing else available.
ROCm is a stack of a whole lot of stuff. I don't see a stack of software being "the whole point".
> the thing I need from AMD is to ensure that the ROCm support I make will work without hassle for these end-users with random old AMD gaming cards (because these are the only users who need the tool to have ROCm support)
From wikipedia:
"ROCm is primarily targeted at discrete professional GPUs"
They are supporting Vega onwards and are clear about the "whole point" of ROCm.
I'm talking about what the point would be for someone to add ROCm support to software that currently requires CUDA. IMHO that is the core context not only of this thread but of the whole discussion around this article: ROCm becoming a widely used replacement for, or alternative to, CUDA.
> "ROCm is primarily targeted at discrete professional GPUs"
That's kind of true, and it's a big part of the problem. As long as AMD holds this stance, ROCm won't threaten to replace or even match CUDA, which has a much broader target. If you and/or AMD want to go in that direction, that's completely fine; it's a valuable niche. But limiting the platform to that niche is clearly not "doubling down on ROCm" as a competitor to CUDA. It also undercuts TFA's claim by Intel that "the entire industry is motivated to eliminate CUDA": unless ROCm goes well beyond targeting discrete professional GPUs, it isn't even trying to compete with CUDA in the core niches that give CUDA its staying power.
> what would be the point for someone to add ROCm support to various pieces of software which currently require CUDA
It isn't just old cards, though. CUDA centralizes everything on a single vendor at a time when access to that vendor's higher-end cards isn't even available, and that is pushing people to look elsewhere.
ROCm supports CUDA code through the included HIP projects (the HIP runtime plus the hipify tools)...
The latter will regex-replace your CUDA calls with HIP equivalents. If it's as easy as running hipify on your codebase (or just coding against the HIP APIs directly), it certainly makes sense to do so.
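As a toy illustration of that regex-replacement idea (this is not the real hipify tool, just a simplified sketch with a handful of hand-picked mappings; the actual tools cover the whole runtime API):

```python
import re

# A few real CUDA-to-HIP name mappings; the real hipify tools
# handle the full API surface, not just this subset.
RULES = [
    (r"\bcudaMalloc\b", "hipMalloc"),
    (r"\bcudaMemcpy\b", "hipMemcpy"),
    (r"\bcudaFree\b", "hipFree"),
    (r"\bcudaDeviceSynchronize\b", "hipDeviceSynchronize"),
]

def toy_hipify(src: str) -> str:
    """Regex-replace CUDA runtime calls with their HIP equivalents."""
    for pattern, replacement in RULES:
        src = re.sub(pattern, replacement, src)
    return src

print(toy_hipify("cudaMalloc(&buf, n); cudaDeviceSynchronize();"))
# → hipMalloc(&buf, n); hipDeviceSynchronize();
```

The point being: for code that sticks to the runtime API, the translation really is mostly mechanical, which is why porting can be cheap.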
> People argue for ROCm to support older cards because that is all they have accessible to them.
What they really need is to support the less expensive cards, of which the older cards are a large subset. There are a lot of people who will contribute and fix bugs if they can actually use the thing. A CS student who has to pay tuition may only have an old RX 570, and that isn't going to change in the next couple of years, but that kind of student could fix some of the software bugs currently preventing the company from selling more expensive GPUs to large institutions, if only the stack supported their hardware.
ROCm works on the RX 570 ("gfx803", which also covers the RX 470 and RX 580).
Support was dropped upstream, but only because AMD no longer regularly tests it. The code is still there, and downstream distributors (e.g. if you just apt-get install libamdhip64-5 && pip3 install torch) usually flip it back on.
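A rough sketch of what that downstream path looks like on a Debian-ish system (package and wheel index names are examples that vary by distro and ROCm release, and the gfx override is a commonly used community workaround, not an officially supported path):

```shell
# Install the distro's HIP runtime and a ROCm build of PyTorch
# (exact package and index names depend on distro and ROCm version).
sudo apt-get install libamdhip64-5
pip3 install torch --index-url https://download.pytorch.org/whl/rocm5.7

# Common workaround for cards dropped from the official support list:
# tell the runtime to treat the GPU as a known gfx803 target.
export HSA_OVERRIDE_GFX_VERSION=8.0.3

python3 -c "import torch; print(torch.cuda.is_available())"
```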
I ran 130,000 RX 470-580 cards, so I know them quite well. Those cards aren't going to do anything useful with AI/ML. The technology is just too old, and things are moving too quickly. It isn't just the card, but the mobo, disks, RAM, networking...
I believe strongly in "where there is a will, there is a way."
Those kids and hobbyists can't even rent time on high end AMD hardware today. I see that as one piece of the puzzle that I'm personally dedicating my time/resources to resolving.
An RX 570 will do ML faster than a typical desktop CPU. That's all it takes for the person who has one to want to use it for Llama or Stable Diffusion, and then want to improve the software for the thing they're now using.
What is nonsense is thinking that AMD should dedicate its limited resources to supporting a 6-year-old card with only 4-8 GB of RAM (the ones I ran had 8).
I didn't say they are bad cards... they are just outdated at this point.
If you really want to put your words into action... let me know. I'll put you in touch with someone selling 130,000 of these cards, and you can sell them to every college kid out there. Until then, I wouldn't rake AMD over the coals for not wanting to put effort into something like that when they're already lagging behind on their AI efforts as it is. I'd personally rather see them catch up a bit first.
8 GB is enough for Stable Diffusion or Llama 13B at 4-bit quantization. The cards are outdated, but non-outdated GPUs are still expensive, so they're all many people can afford.
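A back-of-envelope check on that claim, counting raw weight storage only (activations, KV cache, and framework overhead add to this, but there's headroom):

```python
# 13B parameters at 4-bit quantization: raw weight footprint only.
params = 13_000_000_000
bytes_per_param = 0.5  # 4 bits = half a byte
weights_gib = params * bytes_per_param / 1024**3
print(f"{weights_gib:.1f} GiB")  # roughly 6 GiB, so it fits in 8 GB VRAM
```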
> I'll put you in touch with someone to buy 130,000 of these cards, and you can sell them to every college kid out there...
Just sell them on eBay? They still go for $50-$100 each, so you're sitting on several million dollars worth of GPUs.
> I'd personally rather see them catch up a bit first.
Growing the community is how you catch up. That doesn't happen if people can't afford the only GPUs you support.
> That doesn't happen if people can't afford the only GPUs you support.
On this part, we are going to have to agree to disagree. I feel like being able to at least affordably rent time on the high end GPUs is another alternative to buying them. As I mentioned above, that is something I'm actively working on.
> I feel like being able to at least affordably rent time on the high end GPUs is another alternative to buying them.
There are two problems with this.
The first is high demand. GPU time on a lot of cloud providers is sold out.
The second is that it costs money at all, versus using the GPU you already have. "Needs a credit card" is a barrier to hobbyists, and you want hobbyists, because they become contributors, or get introduced to the technology and then go on to buy one of your more expensive GPUs.
You want the barrier to adoption to be level with the ground.
> The first is high demand. GPU time on a lot of cloud providers is sold out.
Something I'm trying to help with. =) Of course, I'm sure I'll be sold out too, or at least I hope so, cause that means buying more GPUs! But at least I'm actively putting my own time/energy toward this goal.
> The second is that this costs money at all, vs. using the GPU you already have.
As much as I'd love to believe in some utopia where every single GPU can be used for science, I don't think we're ever going to get there. AMD, while large, isn't a company with infinite resources. We're also talking about a specialty level of engineering here.
> You want the barrier to adoption to be level with the ground.
100% agreed, it is a good goal, but that's a much larger problem than just AMD supporting their 6-7 year old cards.
Thing is, for there to be a realistic alternative to CUDA, something needs to become "the only game in town", because people definitely won't add support for ROCm and Mesa and RustiCL and something else besides. Getting support for even one non-CUDA backend is already proving too difficult, so fragmented alternatives make the situation even worse.
RustiCL is just an OpenCL implementation. You can't have one-size-fits-all because hardware-specific details vary a lot across generations (which is also why newer versions of e.g. ROCm drop support for older hardware). The best you can do is baseline support plus extensions, which is the Vulkan approach.
Well, yeah. Before I go renting a super GPU in the cloud, I'd like to get my feet wet with the five-year-old but reasonably well-specced AMD GPU (Vega 48) in my iMac... but I can't. It's more rational for me to get a fancy 2021 GPU or a Jetson and stick it in an enclosure, or build a Linux box around it. At least I know CUDA is a mature ecosystem that's going to be around for a while, so whatever time I invest in it is likely to pay for itself.
I get your point about AMD not wanting to spend money on supporting old hardware, but how do they expect to build a market without a fan base?
> I get your point about AMD not wanting to spend money on supporting old hardware, but how do they expect to build a market without a fan base?
Look, I get it. You're right. They do need to work on building their market, and they really missed the boat on AI. The developer flywheel is hugely important, and they missed out on it. That said, we can't expect them to go back in time, but we can keep moving forward. Having enough people making noise about wanting to play with their hardware is certainly a step in the right direction.