Evolving Architecture | Ollama on FreeBSD Using GPU Passthrough


[Title image: GPU activity in btop during queries to Ollama]

LLMs and “AI” are a hot topic. Despite the hype, there are practical applications that convinced me to invest time and money in setting up a local LLM. This article isn’t about my use cases but about the process of getting Ollama up and running with GPU acceleration on FreeBSD using bhyve. It is a relatively advanced setup that requires familiarity with FreeBSD, virtualization, and NVIDIA drivers.

Initially, I tried running Ollama in a FreeBSD jail, but it only used the CPU and quickly consumed all available cores. To make use of my GeForce RTX 3060’s 12 GB of VRAM and its GPU acceleration, I switched to a bhyve VM. I prefer running my VMs with UEFI boot, because this makes migrating to different hardware or hypervisors much easier. Let’s get started:

pkg install bhyve-firmware vm-bhyve

I use ZFS for storage. After creating a ZFS dataset (zfs create zroot/bhyve), I added the following configuration to /etc/rc.conf:

vm_enable="YES"
vm_dir="zfs:zroot/bhyve"
vm_list="ollama"
vm_delay="5"
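The same setup can also be applied from the shell. This sketch also covers vm-bhyve steps the walkthrough does not show. `vm init` initialises the VM directory. Copying the sample templates lets `vm create` find a `default` template. The `srvbr` switch, used later in the VM config, has to exist. The sketch makes several assumptions: the dataset keeps its default mountpoint `/zroot/bhyve`, the switch is created here (skip those lines if yours already exists), and `em0` is a placeholder for your uplink NIC. `setup_vm_bhyve` and `run` are hypothetical helper names. With `DRY_RUN=1` the commands are only printed.

```shell
# Hypothetical helper: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical setup sketch. Assumptions: pool "zroot", default mountpoint
# /zroot/bhyve, and a placeholder uplink NIC (em0 unless given as $1).
setup_vm_bhyve() {
    uplink=${1:-em0}
    run zfs create zroot/bhyve
    run sysrc vm_enable=YES vm_dir=zfs:zroot/bhyve vm_list=ollama vm_delay=5
    run vm init                            # initialise vm_dir for vm-bhyve
    run cp /usr/local/share/examples/vm-bhyve/* /zroot/bhyve/.templates/
    run vm switch create srvbr             # switch referenced by network0_switch
    run vm switch add srvbr "$uplink"      # attach the switch to the host NIC
}

# Usage: DRY_RUN=1 setup_vm_bhyve em0   # review first, then run as root without DRY_RUN
```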

As you can see, the VM will be called ollama, and vm_list starts it at boot. vm-bhyve is an awesome tool, and vm passthru quickly identifies the PCI device IDs of the GPU to pass through:

DEVICE     BHYVE ID     READY        DESCRIPTION
...
ppt0       1/0/0        No           GA104 [GeForce RTX 3060]
ppt1       1/0/1        No           GA104 High Definition Audio Controller
...
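The BHYVE ID column is the device’s PCI bus/slot/function. FreeBSD’s own `pciconf -l` prints selectors of the form `pci<domain>:<bus>:<slot>:<function>`, and dropping the domain gives the same ID. Below is a minimal sketch of that conversion. `pci_to_bhyve_id` is a hypothetical helper name, and it assumes the usual `name@pciD:B:S:F:` selector layout.

```shell
# Hypothetical helper: convert a pciconf(8) selector such as
# "vgapci0@pci0:1:0:0:" into the bus/slot/function form "1/0/0"
# used by pptdevs and vm-bhyve.
pci_to_bhyve_id() {
    printf '%s\n' "$1" |
        sed -E 's/.*pci[0-9]+:([0-9]+):([0-9]+):([0-9]+).*/\1\/\2\/\3/'
}

# Usage: pciconf -l | grep '^vgapci'   then   pci_to_bhyve_id 'vgapci0@pci0:1:0:0:'
```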

The READY column shows "No" because the ppt driver is not attached to these devices yet, so they cannot be passed through. The driver assignment can be changed with a command (see the description in vmm(4)) or in the boot loader configuration.

As I’m running an AMD system with the IOMMU enabled, I also have to enable hw.vmm.amdvi. These settings must be in place before the vmm module is loaded. Note that the VM simply does not boot (it dies immediately) if this is not set up properly.

So before rebooting, /boot/loader.conf needs the following lines:

hw.vmm.amdvi.enable="1"
pptdevs="1/0/0 1/0/1"
vmm_load="YES"
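Because /boot/loader.conf may already contain other settings, it is safer to edit it idempotently. On FreeBSD, `sysrc -f /boot/loader.conf name=value` does this. The hypothetical `set_conf_var` below is a portable sketch of the same idea: it drops any existing assignment of a key and appends the new one, so running it twice leaves a single line.

```shell
# Hypothetical helper: set_conf_var FILE KEY VALUE ensures FILE contains
# exactly one KEY="VALUE" line.
set_conf_var() {
    file=$1 key=$2 value=$3
    tmp=$(mktemp) || return 1
    # Keep every line except earlier assignments of KEY. This uses an exact
    # prefix match, so dots in names like hw.vmm.amdvi.enable are not regex.
    if [ -f "$file" ]; then
        awk -v k="$key" 'index($0, k "=") != 1' "$file" > "$tmp"
    fi
    printf '%s="%s"\n' "$key" "$value" >> "$tmp"
    cat "$tmp" > "$file" && rm -f "$tmp"   # rewrite in place, keeping file mode
}

# Usage (as root, before rebooting):
#   set_conf_var /boot/loader.conf hw.vmm.amdvi.enable 1
#   set_conf_var /boot/loader.conf pptdevs "1/0/0 1/0/1"
#   set_conf_var /boot/loader.conf vmm_load YES
```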

After a reboot, vm passthru reports that the devices are ready for GPU passthrough:

DEVICE     BHYVE ID     READY        DESCRIPTION
...
ppt0       1/0/0        Yes          GA104 [GeForce RTX 3060]
ppt1       1/0/1        Yes          GA104 High Definition Audio Controller
...
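To check readiness in a script, for example right after a reboot, the READY column can be parsed. This sketch assumes the column layout shown above: device, bhyve ID, ready flag, then description. `ppt_ready` is a hypothetical helper name.

```shell
# Hypothetical helper: ppt_ready ID... reads `vm passthru` output on stdin
# and succeeds only if every given bhyve ID (e.g. 1/0/0) is listed with
# READY = Yes. Missing or unready IDs are printed.
ppt_ready() {
    awk -v ids="$*" '
        BEGIN { bad = 0; n = split(ids, want, " "); for (i = 1; i <= n; i++) ok[want[i]] = 0 }
        ($2 in ok) && $3 == "Yes" { ok[$2] = 1 }
        END {
            for (id in ok) if (!ok[id]) { print "not ready: " id; bad = 1 }
            exit bad
        }'
}

# Usage: vm passthru | ppt_ready 1/0/0 1/0/1 && echo "GPU can be passed through"
```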

By the way, I chose to forward all of the card’s PCI functions, not just the GPU, because the community has reported instabilities otherwise; I did not verify this myself. As I don’t plan to use the audio, it doesn’t really matter to me anyway.

Next, vm create -s 200G ollama creates the machine with enough disk space for models.

I install Debian using vm install ollama debian-13.0.0-amd64-netinst.iso, because I have worked with it for years and know how to fix things. A small image isn’t realistic anyway once LLM software is installed: expect to download gigabytes of unused and proprietary software.

After installation, I run vm stop ollama and add the PCI devices using vm configure ollama:

cryptodev_load="YES"
loader="uefi"
graphics="yes"
cpu=2
memory=4096M
network0_type="virtio-net"
network0_switch="srvbr"
disk0_type="virtio-blk"
disk0_name="disk0.img"
disk0_dev="sparse-zvol"
uuid="fe26f494-7827-11f0-aad0-c4623705b122"
network0_mac="58:9c:fc:02:a0:5b"
# add the gpu:
passthru0="1/0/0=8:0"
passthru1="1/0/1=8:1"
pptdevs="msi=on"
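Each passthruN entry above maps a host device (bus/slot/function) to a guest PCI slot and function. Here both functions of the card land in guest slot 8, which matches the Bus-Id 00:08.0 that nvidia-smi reports inside the VM. The hypothetical helper below generates such lines for a multifunction card. It is a sketch that assumes you want to keep each host function number in the guest, so the GPU and its audio function stay together in one slot.

```shell
# Hypothetical helper: passthru_lines GUEST_SLOT HOST_ID... prints vm-bhyve
# passthru entries that place every host function (bus/slot/function) into
# one guest slot, keeping the host function number (1/0/1 -> 8:1).
passthru_lines() {
    slot=$1
    shift
    n=0
    for id in "$@"; do
        func=${id##*/}                     # last component = PCI function
        printf 'passthru%d="%s=%s:%s"\n' "$n" "$id" "$slot" "$func"
        n=$((n + 1))
    done
}

# Usage: passthru_lines 8 1/0/0 1/0/1
```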

Now we can run vm start ollama and install the actual software. I began by installing the NVIDIA CUDA drivers following the provided installation guide. The problem was that the driver didn’t load after installation.

Searching the web for the error led me to the FreeBSD Forums, where this issue had already been reported a while ago:

[ 77.208984] NVRM: Can't find an IRQ for your NVIDIA card!
[ 77.212697] NVRM: Please check your BIOS settings.
[ 77.212699] NVRM: [Plug & Play OS] should be set to NO
[ 77.212700] NVRM: [Assign IRQ to VGA] should be set to YES
[ 77.212702] nvidia: probe of 0000:00:07.0 failed with error -1
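Inside the Debian guest, this failure can be spotted in the kernel log. The sketch below prints the PCI address from every failed nvidia probe and returns non-zero if there is none. `nvidia_probe_failure` is a hypothetical name, and the pattern simply matches the message format quoted above.

```shell
# Hypothetical helper: read kernel log lines on stdin and print the PCI
# address from every "nvidia: probe of <addr> failed" message. The exit
# status (from grep) is 0 if at least one failure was found, 1 otherwise.
nvidia_probe_failure() {
    sed -n 's/.*nvidia: probe of \([0-9a-fA-F:.]*\) failed.*/\1/p' | grep .
}

# Usage in the guest: sudo dmesg | nvidia_probe_failure && echo "driver did not bind"
```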

Thankfully, @Corvink had already looked into the issue and provided patches for bhyve. I tried them, and they worked right away:

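# git is not part of the FreeBSD base system
pkg install git
# Replace the source tree with the branch carrying @Corvink's patches
# (the branch is based on FreeBSD 14.2); this deletes any existing /usr/src
rm -rf /usr/src
git clone https://github.com/beckhoff/freebsd-src /usr/src
cd /usr/src
git checkout -f origin/phab/corvink/14.2/nvidia-wip
# Rebuild and install only bhyve(8); running VMs keep the old binary until restarted
cd /usr/src/usr.sbin/bhyve
make && make install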

After another vm start ollama, there were no more errors and the driver loaded successfully. Amazing:

$ nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.65.06              Driver Version: 580.65.06      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060        On  |   00000000:00:08.0 Off |                  N/A |
|  0%   35C    P8              9W /  170W |      10MiB /  12288MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Great, on to the next step: installing Ollama with curl -fsSL https://ollama.com/install.sh | sh. I know, piping a script into a shell is scary, but after downloading about 4 GB of software, trust is no longer the biggest issue. I reviewed the script beforehand, and everything looked fine to me.

Checking the output of the ollama process gave me confidence that the test would go well, as Ollama appeared to recognize the driver and GPU:

...library=cuda variant=v12 compute=8.6 driver=13.0 name="NVIDIA GeForce RTX 3060"...
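This line comes from the Ollama server log. If Ollama runs as the systemd service set up by the install script (assumed here to be named `ollama`), the log can be read with `journalctl -u ollama`. The hypothetical helper below pulls single unquoted key=value fields out of such lines, for example the compute capability. It is a sketch and does not handle quoted values that contain spaces.

```shell
# Hypothetical helper: log_field KEY prints, for each log line on stdin, the
# value of the first whitespace-separated field that starts with KEY=.
# Quoted values containing spaces (like name="...") are not handled.
log_field() {
    awk -v k="$1" '{
        for (i = 1; i <= NF; i++)
            if (index($i, k "=") == 1) { print substr($i, length(k) + 2); next }
    }'
}

# Usage: journalctl -u ollama --no-pager | grep 'library=cuda' | log_field compute
```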

Then I ran ollama run gemma3 to see how it performs.

Performance is great, but how can I confirm that it is actually using the GPU? btop to the rescue: it monitors not only the CPU but also the GPU. A quick check confirmed that everything worked as I wanted: the GPU is utilized and doing most of the work. The title picture of this article shows the GPU activity in btop during queries to Ollama.
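Besides btop, the GPU can also be checked from the command line. `ollama ps` lists loaded models and whether they run on the GPU. `nvidia-smi` can print selected fields as CSV. The sketch below reads the output of `nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total --format=csv,noheader,nounits` and succeeds if at least a given amount of VRAM is in use, which suggests a model is loaded. `gpu_model_loaded` is a hypothetical name. The 1024 MiB default threshold is an arbitrary assumption, not a measured value.

```shell
# Hypothetical helper: gpu_model_loaded [MIN_MIB] reads "util, used, total"
# CSV lines from nvidia-smi on stdin, prints a summary per GPU, and succeeds
# if any GPU has at least MIN_MIB (default 1024, arbitrary) of memory in use.
gpu_model_loaded() {
    awk -F', *' -v min="${1:-1024}" '
        BEGIN { found = 0 }
        {
            printf "GPU util %s%%, memory %s/%s MiB\n", $1, $2, $3
            if ($2 + 0 >= min + 0) found = 1
        }
        END { exit !found }'
}

# Usage: nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total \
#            --format=csv,noheader,nounits | gpu_model_loaded
```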

Thanks to the awesome FreeBSD community, this took only about three hours.

P.S.: There is now a new issue and a review that will hopefully make this setup even easier in upcoming FreeBSD versions.