It's even worse on Linux, and worse still on more exotic distros like Bazzite, where I still can't get koboldcpp to run; it was already kind of a hassle on my previous distro.
Yeah, I hear ya. How do you happen to be running it? I use NixOS and it's a challenge there for me too, but I found at least some success using Docker, since the dependencies are so out of control for AI at the moment.
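For what it's worth, the Docker pattern is mostly just passing the GPU devices into a ROCm container; something along these lines is the general shape of it (the image here is just AMD's stock dev image as an example, not necessarily what you'd want to run):

# Hand the AMD GPU to the container: /dev/kfd is the ROCm compute
# interface, /dev/dri holds the render nodes.
docker run -it \
  --device=/dev/kfd --device=/dev/dri \
  --group-add video --security-opt seccomp=unconfined \
  rocm/dev-ubuntu-22.04 bash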
Also, give ollama-rocm + Open WebUI a shot as an alternative to koboldcpp if you can't get at least some text generation working, because that is the only thing I have gotten to work with ROCm.
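Rough sketch of that combo, if it helps (the Open WebUI run command is from memory, so double-check it against their README):

# ollama-rocm serves models locally; pull and run one to test:
ollama serve &
ollama run llama3

# Open WebUI as a frontend pointing at the local Ollama:
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:main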
At the moment I just don't. I got koboldcpp to run through Distrobox / BoxBuddy, but I can't get it to compile with ROCm, so I can only use CPU generation, which is abysmally slow. Might go back to NovelAI when they release their new model if I can't find a solution.
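For reference, BoxBuddy is basically just a GUI over these commands, which is roughly what I did:

# create and enter the container (BoxBuddy does the same under the hood)
distrobox create --name kobold --image fedora:40
distrobox enter kobold
# then inside it, grab koboldcpp and build:
git clone https://github.com/LostRuins/koboldcpp
cd koboldcpp && make   # the plain CPU build is the only one that has worked so far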
What card do you use? I have a 6700 XT, and getting anything with ROCm running for me requires that I pass the HSA_OVERRIDE_GFX_VERSION=10.3.0 environment variable to the related process, otherwise it just refuses to run properly. I wonder if it might be something similar for you too?

6650 XT. Honestly, no idea. When I run

make LLAMA_HIPBLAS=1 GPU_TARGETS=gfx1032 -j$(nproc)

in the Fedora distrobox on koboldcpp, it throws a bunch of

fatal error: 'hip/hip_fp16.h' file not found

errors, and koboldcpp does not give an option to use Vulkan.
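That fatal error: 'hip/hip_fp16.h' file not found generally just means the HIP development headers aren't present inside the container, so it may be worth installing the ROCm dev packages in the box before rebuilding. On Fedora I'd expect something along these lines (package names from memory, so double-check them):

# inside the Fedora distrobox: HIP headers plus the BLAS libs the build links against
sudo dnf install rocm-hip-devel hipblas-devel rocblas-devel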
Strange that it's missing, though. I don't suppose you specifically need a Fedora container? If not, I've been using this Ubuntu-based distrobox container recipe for anything that requires ROCm, and it has worked flawlessly for me.
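The gist of the recipe is a distrobox assemble file along these lines; this is a paraphrase from memory rather than the exact file, and the real one also sets up AMD's ROCm apt repository first, which I'm omitting here:

# rocm.ini — recreate the box with: distrobox assemble create --file rocm.ini
[rocm]
image=docker.io/library/ubuntu:22.04
additional_packages="rocm-hip-libraries rocm-dev"
pull=true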
If that still doesn't work (I haven't actually tried out koboldcpp yet), and you're willing to try something other than koboldcpp, then I'd recommend the text-generation-webui project, which supports a wide array of model types, including the GGUF type that koboldcpp uses. Then, if you really want to get deep into it, you can even pair it with SillyTavern (it's purely a frontend for a bunch of different LLM backends, and text-generation-webui is one of the supported ones)!
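If you do go that route, the only part that matters for SillyTavern is launching the server with its API enabled, roughly like this (flags from memory, so check the project's README):

# start text-generation-webui with its API exposed for SillyTavern
python server.py --api --listen
# SillyTavern then connects via its Text Generation WebUI backend option
# (the API listens on port 5000 by default, if I remember right)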
I don't think so, it's just what I'm more familiar with, and I usually try to avoid apt's PPA hell as much as I can. But maybe I should try some others, as I haven't been able to get Mullvad to run yet either. :/
I tried text-generation-webui for quite a while before I moved to koboldcpp on my previous distro, and I could not get it to run without crashing whenever I tried to generate anything. SillyTavern is my standard frontend, so any text-gen software should inherently be compatible with that anyway.
Hmm, gotcha. I just tried out a fresh copy of text-gen-webui, and it seems like the latest version is borked with ROCm (I get the CUDA error: invalid device function error).

My next recommendation then would be LM Studio, which to my knowledge can still expose an OpenAI-compatible API endpoint to be used in SillyTavern. I've used it in the past and didn't even need to run it within Distrobox (I have all of the ROCm stuff installed locally, but I generally run most of the AI stuff in distrobox since it tends to require an older version of Python than Arch is currently using). It also seems they've recently started supporting running GGUF models via Vulkan, which I assume probably doesn't require the ROCm stuff to be installed at all?
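Since the local server it exposes speaks the OpenAI API, a quick sanity check from a terminal looks something like this (1234 is LM Studio's default port, and the model field is just whatever you have loaded):

curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "local-model", "messages": [{"role": "user", "content": "Hello"}]}'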
Might be worth a shot. I just downloaded the latest version (the UI has definitely changed a bit since I last used it), grabbed a copy of the Gemma model, and ran it, and it seemed to work without issue for me directly on the host.
The advanced configuration settings no longer seem to directly mention GPU acceleration like they used to; however, I can see it utilizing GPU resources in nvtop, and the speed it was generating at (83 tokens a second in my screenshot) couldn't possibly have been done on the CPU, so it seems to be fine on my side.

I tried LM Studio (directly in Bazzite) and it has the same issue of not running through the GPU. It also always seems to stop generating after a few moments when I use it with SillyTavern.
Tried the koboldcpp AUR package through Distrobox, and when I select the ROCm option it crashes with a CUDA error, lol. Using the Vulkan option, it still seems to run through the CPU for some reason.
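One thing worth checking on the Vulkan side: koboldcpp still needs to be told how many layers to offload, and with none offloaded it will happily run everything on the CPU. From the command line that's roughly this (flag names as I remember them from its --help, and the right layer count depends on the model):

# offload 33 layers to the GPU via the Vulkan backend
python koboldcpp.py --usevulkan --gpulayers 33 model.gguf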