Hacker News

On Ubuntu 24.04 (and Debian Unstable¹), the OS-provided packages should be able to get llama.cpp running on ROCm on just about any discrete AMD GPU from Vega onwards²³⁴. No docker or HSA_OVERRIDE_GFX_VERSION required. The performance might not be ideal in every case⁵, but I've tested a wide variety of cards:

    # install dependencies
    sudo apt -y update
    sudo apt -y upgrade
    sudo apt -y install git wget hipcc libhipblas-dev librocblas-dev cmake build-essential

    # ensure you have permissions by adding yourself to the video and render groups
    sudo usermod -aG video,render $USER
    # log out and then log back in to apply the group changes
    # you can run `rocminfo` and look for your GPU in the output to check everything is working thus far

    # download a model, build llama.cpp, and run it
    wget "https://huggingface.co/TheBloke/dolphin-2.2.1-mistral-7B-GGUF/resolve/main/dolphin-2.2.1-mistral-7b.Q5_K_M.gguf?download=true" -O dolphin-2.2.1-mistral-7b.Q5_K_M.gguf
    git clone https://github.com/ggerganov/llama.cpp.git
    cd llama.cpp
    git checkout b3267
    # configure a HIP (ROCm) build for a range of GPU architectures
    HIPCXX=clang-17 cmake -S . -B build -DGGML_HIPBLAS=ON -DCMAKE_HIP_ARCHITECTURES="gfx803;gfx900;gfx906;gfx908;gfx90a;gfx1010;gfx1030;gfx1100;gfx1101;gfx1102" -DCMAKE_BUILD_TYPE=Release
    make -j16 -C build
    build/bin/llama-cli -ngl 32 --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -m ../dolphin-2.2.1-mistral-7b.Q5_K_M.gguf --prompt "Once upon a time"
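The configure step above builds for every listed architecture. If you only need the GPUs in your own machine, a shorter `CMAKE_HIP_ARCHITECTURES` list should also cut build time. Here is a minimal sketch, assuming `rocminfo` is installed and prints a `Name: gfx…` line for each GPU agent; `gfx_targets` is a hypothetical helper, not part of llama.cpp or ROCm:

```shell
# Hypothetical helper: read rocminfo-style text on stdin and print the unique
# gfx targets found, joined with ';' (the list separator CMake expects).
# ISA names such as "amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack-" reduce to "gfx90a".
gfx_targets() {
  grep -oE 'gfx[0-9a-f]+' | LC_ALL=C sort -u | paste -s -d ';' -
}

# Example use (commented out so the snippet runs without a GPU present):
# ARCHS=$(rocminfo | gfx_targets)
# HIPCXX=clang-17 cmake -S . -B build -DGGML_HIPBLAS=ON \
#   -DCMAKE_HIP_ARCHITECTURES="$ARCHS" -DCMAKE_BUILD_TYPE=Release
```

This only narrows the list; the detected target still has to be one that the OS-packaged libraries support.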

I'd suggest that RDNA 3, MI200 and MI300 users use the AMD-provided ROCm packages for improved performance. Users who need PyTorch should also use the AMD-provided ROCm packages, as PyTorch has some dependencies that are not available from the system packages. Still, you can't beat the OS packages for ease of installation or for compatibility with older hardware.
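The rule of thumb above can be written down as a small lookup. This is a hypothetical helper (`pick_rocm_source` is not an existing tool), and it assumes the usual target names: gfx1100/gfx1101/gfx1102 for discrete RDNA 3, gfx90a for MI200 and gfx940/gfx941/gfx942 for MI300. It prints `amd` where the AMD-provided packages are suggested and `os` otherwise:

```shell
# Hypothetical helper encoding the rule of thumb above.
# Usage: pick_rocm_source <gfx-target> [pytorch]
# Prints "amd" where the AMD-provided ROCm packages are suggested, else "os".
pick_rocm_source() {
  target=$1
  if [ "${2:-}" = "pytorch" ]; then
    echo amd   # PyTorch needs dependencies the OS packages lack
    return 0
  fi
  case "$target" in
    gfx1100|gfx1101|gfx1102) echo amd ;;  # RDNA 3 (discrete)
    gfx90a) echo amd ;;                   # MI200 series
    gfx940|gfx941|gfx942) echo amd ;;     # MI300 series
    *) echo os ;;                         # use the Ubuntu/Debian packages
  esac
}
```

For example, `pick_rocm_source gfx1030` prints `os`, while `pick_rocm_source gfx1030 pytorch` prints `amd`.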

¹ https://lists.debian.org/debian-ai/2024/07/msg00002.html

² Not including MI300, because it was released too close to the Ubuntu 24.04 launch.

³ Pre-Vega architectures might work, but have known bugs for some applications.

⁴ Vega and RDNA 2 APUs might work with Linux 6.10+ installed. I'm in the process of testing that.

⁵ The version of rocBLAS that comes with Ubuntu 24.04 is a bit old and therefore lacks some optimizations for RDNA 3. It's also missing some MI200 optimizations.
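For footnote 4, a quick way to check whether the running kernel is new enough is a version comparison like the sketch below. `kernel_at_least` is a hypothetical helper; it assumes GNU `sort -V` and compares only the leading numeric part of `uname -r`. APU support on 6.10+ is still being tested, so a passing check is not a guarantee:

```shell
# Hypothetical helper: succeed if kernel release $2 is at least version $1.
# Only the leading numeric part (e.g. "6.8.0" from "6.8.0-40-generic") is compared.
# Requires GNU sort for -V (version sort).
kernel_at_least() {
  want=$1
  have=$(printf '%s\n' "$2" | grep -oE '^[0-9]+(\.[0-9]+)*')
  # The smaller of the two versions sorts first; if that is $want, $have >= $want.
  [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

# Example use:
# kernel_at_least 6.10 "$(uname -r)" && echo "kernel is 6.10 or newer"
```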



I was able to install the AMD-provided ROCm and Ollama on Ubuntu 22.04.5 with an RX 7900 XTX with no real problems to speak of, and I can run LLMs using Ollama on ROCm just fine. Take that FWIW.


Are there AMD cards with more than 24GB of VRAM on the market right now at consumer-friendly prices?


The Radeon Pro W6800, W7800 or W7900 would be the standard answer. A hacker-spirited alternative would be to purchase a used MI50, MI60 or MI100 and 3D-print a fan adapter. There are versions of all of those cards with 32GB of VRAM, and they can be found on eBay for between 350 USD and 1200 USD, plus twenty bucks for a fan adapter and a fan.

Those old gfx906 or gfx908 cards are more competitive for fp64 than for low-precision AI workloads, but they have the memory and the price is right. I'm not sure I would recommend the hacker approach to the average user, but it is what I've done for some of the continuous integration servers I host for the Debian project.
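To put "the price is right" in numbers, here is a rough cost per GB of VRAM for the used 32GB cards, including the roughly 20 USD fan kit. `usd_per_gb` is a hypothetical helper, and the inputs are just the two ends of the quoted 350 to 1200 USD range:

```shell
# Hypothetical helper: cost per GB of VRAM, including extra parts.
# $1 = card price (USD), $2 = VRAM (GB), $3 = extra cost such as a fan kit (USD, default 0)
usd_per_gb() {
  awk -v p="$1" -v g="$2" -v x="${3:-0}" 'BEGIN { printf "%.1f\n", (p + x) / g }'
}

usd_per_gb 350 32 20    # low end of the quoted range
usd_per_gb 1200 32 20   # high end of the quoted range
```

That works out to roughly 11.6 to 38.1 USD per GB of VRAM.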


Amazon prices (TFLOPS are peak FP32):

$3,600 - 61 TFLOPS - AMD Radeon Pro W7900 48GB

$4,200 - 38.7 TFLOPS - NVIDIA RTX A6000 48GB (Ampere)

$7,200 - 91.1 TFLOPS - NVIDIA RTX 6000 Ada Generation 48GB
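Dividing price by peak throughput makes the three listings easier to compare. The sketch below does only that arithmetic on the quoted prices, assuming the TFLOPS figures are peak FP32; `price_per_tflops` is a hypothetical helper name:

```shell
# Hypothetical helper: read "price_usd|tflops|name" lines on stdin and
# print "name: <USD per TFLOPS> USD/TFLOPS", rounded to one decimal.
price_per_tflops() {
  awk -F'|' '{ printf "%s: %.1f USD/TFLOPS\n", $3, $1 / $2 }'
}

# The three Amazon listings quoted above (peak FP32 TFLOPS).
price_per_tflops <<'EOF'
3600|61|AMD Radeon Pro W7900
4200|38.7|NVIDIA RTX A6000 (Ampere)
7200|91.1|NVIDIA RTX 6000 Ada Generation
EOF
```

This prints about 59.0, 108.5 and 79.0 USD per TFLOPS respectively. Prices change, so treat it as a snapshot.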


It sort of depends on how you define "consumer-friendly prices". AFAIK, at around $1000 (give or take a little), 24GB is all you can get. But there are Radeon Pro boards with 32GB or 48GB of VRAM at prices from around $2000 to about $3500. So not "cheap", but possibly within reach for a serious hobbyist who doesn't mind spending a bit more.
