Ollama on FreeBSD Using GPU Passthrough
LLMs and “AI” are a hot topic, and despite the hype, there are practical applications that convinced me to invest time and money in setting up a local LLM. This article isn’t about my use cases, but about the process of getting Ollama up and running with GPU acceleration on FreeBSD using bhyve. This is a relatively advanced setup that requires familiarity with FreeBSD, virtualization, and NVIDIA drivers.
Initially, I explored running Ollama within a FreeBSD jail, but it only used the CPU and quickly consumed all available cores. To leverage the 12 GB of VRAM and GPU acceleration of my GeForce RTX 3060, I switched to a bhyve VM. I prefer running my VMs with UEFI boot, as this makes migration to different hardware or hypervisors much easier. Let’s get started:
pkg install bhyve-firmware vm-bhyve
I use ZFS for storage. After creating a dataset (zfs create zroot/bhyve), I added the following configuration to /etc/rc.conf:
vm_enable="YES"
vm_dir="zfs:zroot/bhyve"
vm_list="ollama"
vm_delay="5"
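The dataset and rc.conf steps can also be scripted. The following is a minimal sketch, assuming the pool is called zroot and the VM is named ollama as in the text; the function name setup_vm_bhyve is hypothetical. It uses sysrc(8), which adds or updates name=value pairs in /etc/rc.conf, and finishes with vm init, the one-time initialisation step from vm-bhyve’s quick-start.

```sh
# Sketch: create the ZFS dataset and rc.conf entries for vm-bhyve.
# Run as root on the FreeBSD host after installing bhyve-firmware and vm-bhyve.
# Assumption: pool "zroot" and VM name "ollama", as in the text.

setup_vm_bhyve() {
    dataset=${1:-zroot/bhyve}

    # Dataset that will hold the VM directories and disk images
    zfs create "$dataset" || return 1

    # sysrc(8) adds or updates name=value pairs in /etc/rc.conf
    sysrc vm_enable="YES" \
          vm_dir="zfs:${dataset}" \
          vm_list="ollama" \
          vm_delay="5" || return 1

    # One-time initialisation of vm_dir, as in vm-bhyve's quick-start
    vm init
}
```

Call it as setup_vm_bhyve zroot/bhyve. Because zfs create fails when the dataset already exists, a second run stops at that point instead of rewriting rc.conf.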
The VM will be called ollama, and vm_list makes sure it is started at boot. vm-bhyve is an awesome tool: running vm passthru quickly identifies the PCI device IDs of the GPU to pass through:
DEVICE BHYVE ID READY DESCRIPTION
...
ppt0 1/0/0 No GA104 [GeForce RTX 3060]
ppt1 1/0/1 No GA104 High Definition Audio Controller
...
A "No" in the READY column means that the devices are not attached to the ppt driver yet, so they cannot be passed through. The driver assignment can be changed with a command (see the description in vmm(4)) or in the boot loader configuration. Because I’m running an AMD system with the IOMMU enabled, I also have to set hw.vmm.amdvi.enable. Both steps must happen before the vmm module is loaded. Note that the VM simply does not boot (it dies immediately) if these settings are not in place. So I added the following lines to /boot/loader.conf before rebooting:
hw.vmm.amdvi.enable="1"
pptdevs="1/0/0 1/0/1"
vmm_load="YES"
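Instead of editing /boot/loader.conf by hand, the three lines can be appended idempotently, so that re-running the step does not duplicate entries. A minimal POSIX sh sketch: the helper names append_once and configure_loader are hypothetical, the device IDs 1/0/0 and 1/0/1 are taken from the vm passthru output above, and hw.vmm.amdvi.enable only applies to AMD hosts like mine.

```sh
# Sketch: add the passthrough settings to a loader.conf-style file.
# Device IDs are the bus/slot/function values reported by `vm passthru`.

# append_once FILE LINE: append LINE unless FILE already contains it verbatim
append_once() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

configure_loader() {
    conf=${1:-/boot/loader.conf}
    append_once "$conf" 'hw.vmm.amdvi.enable="1"'   # AMD IOMMU only
    append_once "$conf" 'pptdevs="1/0/0 1/0/1"'     # reserve GPU + audio for ppt
    append_once "$conf" 'vmm_load="YES"'            # load vmm at boot
}
```

Run configure_loader as root and reboot afterwards; passing another path (for example a scratch copy) lets you inspect the result first.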
After the reboot, vm passthru reports that the devices are ready for passthrough:
DEVICE BHYVE ID READY DESCRIPTION
...
ppt0 1/0/0 Yes GA104 [GeForce RTX 3060]
ppt1 1/0/1 Yes GA104 High Definition Audio Controller
...
By the way, I chose to pass through all of the card’s PCI functions, not just the GPU, because community members have reported instabilities otherwise. I did not verify this myself. As I don’t plan to use the audio device, it doesn’t really matter to me anyway.
Next, I run vm create -s 200G ollama to create the machine with enough disk space for models. I install Debian using vm install ollama debian-13.0.0-amd64-netinst.iso, because I have worked with it for years and know how to fix things. A small image is not realistic anyway once LLM software is installed: expect to download gigabytes of unused and proprietary software.
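The two vm-bhyve commands can be wrapped in a small function. This is a sketch assuming the ISO has already been placed in the .iso directory under vm_dir, where vm-bhyve looks for images (for example by running vm iso with the image URL); the function name create_ollama_vm is hypothetical. Note that vm install takes the VM name first and the ISO second.

```sh
# Sketch: create the VM disk and boot the Debian installer.
# Assumes vm-bhyve is initialised and the ISO sits in $vm_dir/.iso.

create_ollama_vm() {
    iso=${1:?usage: create_ollama_vm ISO-FILE [NAME]}
    name=${2:-ollama}

    # 200G leaves room for several multi-gigabyte models
    vm create -s 200G "$name" || return 1

    # Boot the guest from the installer ISO
    vm install "$name" "$iso"
}
```

For example: create_ollama_vm debian-13.0.0-amd64-netinst.iso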
After the installation, I run vm stop ollama and add the PCI devices using vm configure ollama:
cryptodev_load="YES"
loader="uefi"
graphics="yes"
cpu=2
memory=4096M
network0_type="virtio-net"
network0_switch="srvbr"
disk0_type="virtio-blk"
disk0_name="disk0.img"
disk0_dev="sparse-zvol"
uuid="fe26f494-7827-11f0-aad0-c4623705b122"
network0_mac="58:9c:fc:02:a0:5b"
# add the gpu:
passthru0="1/0/0=8:0"
passthru1="1/0/1=8:1"
pptdevs="msi=on"
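The passthru entries follow a pattern: each host device from vm passthru is followed by =slot:function, which appears to pin it to a fixed guest PCI address (this matches the Bus-Id 00:08.0 that nvidia-smi reports inside the guest). Keeping the host function number presumably lets the guest see the GPU and its audio controller as one multi-function device. The helper gen_passthru is a hypothetical sketch that generates such lines.

```sh
# Sketch: generate vm-bhyve passthru lines for all functions of one card.
# gen_passthru GUEST_SLOT HOST_ID...   (HOST_ID as shown by `vm passthru`)
# Each device keeps its host function number inside the chosen guest slot.

gen_passthru() {
    slot=${1:?usage: gen_passthru GUEST_SLOT HOST_ID...}
    shift
    i=0
    for dev in "$@"; do
        func=${dev##*/}          # "1/0/1" -> "1"
        printf 'passthru%d="%s=%s:%s"\n' "$i" "$dev" "$slot" "$func"
        i=$((i + 1))
    done
}
```

Running gen_passthru 8 1/0/0 1/0/1 prints the two passthru lines used in the configuration above.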
Now we can vm start ollama and install the actual software. I began by installing the NVIDIA CUDA drivers following the official installation guide. The problem was that the driver didn’t load after the installation.
Searching the web for the error led me to the FreeBSD Forums, where this issue had already been reported a while ago:
[ 77.208984] NVRM: Can't find an IRQ for your NVIDIA card!
[ 77.212697] NVRM: Please check your BIOS settings.
[ 77.212699] NVRM: [Plug & Play OS] should be set to NO
[ 77.212700] NVRM: [Assign IRQ to VGA] should be set to YES
[ 77.212702] nvidia: probe of 0000:00:07.0 failed with error -1
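To tell whether a guest is hitting this particular problem, and later whether the driver is bound, the guest’s logs can be checked from Debian. A minimal sketch; both function names are hypothetical and simply read text on standard input, so they work with dmesg output, lspci output, or a saved log.

```sh
# Sketch: detect the NVIDIA "no IRQ" failure and check driver binding.
# Usage inside the guest:
#   sudo dmesg | nvidia_irq_failed && echo "bhyve IRQ issue"
#   lspci -k -s 00:08.0 | nvidia_driver_bound && echo "driver bound"

nvidia_irq_failed() {
    # First NVRM line of the failure, as in the log excerpt above
    grep -q "NVRM: Can't find an IRQ for your NVIDIA card"
}

nvidia_driver_bound() {
    # lspci -k prints the driver attached to each device
    grep -q 'Kernel driver in use: nvidia'
}
```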
Thankfully, @Corvink had already looked into the issue and provided patches for bhyve. I tried them right away, and they worked:
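# Replace the system sources with the Beckhoff fork carrying the patches
cd /usr/
rm -rf /usr/src
git clone https://github.com/beckhoff/freebsd-src /usr/src
cd /usr/src
# Check out the work-in-progress NVIDIA branch (named after 14.2)
git checkout -f origin/phab/corvink/14.2/nvidia-wip
# Rebuild and install only bhyve, not the whole world
cd /usr/src/usr.sbin/bhyve
make && make install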
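Since the patched branch is named after 14.2, it is probably wise to confirm that the host runs that release before replacing /usr/src. A hedged sketch using freebsd-version(1); the function name is hypothetical, and it only compares the release number embedded in the branch name. The release can also be passed explicitly as a second argument.

```sh
# Sketch: check that a branch name like .../14.2/... matches the host kernel.
# branch_matches_host BRANCH [RELEASE]

branch_matches_host() {
    branch=${1:?usage: branch_matches_host BRANCH [RELEASE]}
    rel=${2:-$(freebsd-version -k)}   # e.g. "14.2-RELEASE-p1"
    ver=${rel%%-*}                    # -> "14.2"
    case "$branch" in
        */"$ver"/*) return 0 ;;
        *) echo "host runs $rel, branch $branch targets another release" >&2
           return 1 ;;
    esac
}
```

For example: branch_matches_host origin/phab/corvink/14.2/nvidia-wip && git checkout -f origin/phab/corvink/14.2/nvidia-wip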
After another vm start ollama, the errors were gone and the driver loaded successfully. Amazing:
$ nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.65.06 Driver Version: 580.65.06 CUDA Version: 13.0 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3060 On | 00000000:00:08.0 Off | N/A |
| 0% 35C P8 9W / 170W | 10MiB / 12288MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
Great, now on to the next step: installing Ollama with curl -fsSL https://ollama.com/install.sh | sh.
I know, scary, but after downloading about 4 GB of software, trust is no longer the biggest issue here. I reviewed the script beforehand, and everything looked fine to me.
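Reviewing the script before running it can be made explicit by downloading it to a file first instead of piping it straight into sh. A minimal sketch; the function name is hypothetical, the pager defaults to less, and the installer only runs after answering y.

```sh
# Sketch: fetch the Ollama install script, show it, run it only on request.

install_ollama_reviewed() {
    script=$(mktemp) || return 1
    if ! curl -fsSL https://ollama.com/install.sh -o "$script"; then
        rm -f "$script"
        return 1
    fi
    ${PAGER:-less} "$script"                 # read it before running it
    printf 'Run the installer? [y/N] ' >&2
    read -r answer
    status=1
    if [ "$answer" = y ]; then
        sh "$script"
        status=$?
    fi
    rm -f "$script"
    return "$status"
}
```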
Checking the Ollama process output gave me confidence that the test would go well, as Ollama seems to recognize the driver:
...library=cuda variant=v12 compute=8.6 driver=13.0 name="NVIDIA GeForce RTX 3060"...
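That line comes from Ollama’s server log. On a systemd-based Debian guest, the installer sets Ollama up as a service, so the log can presumably be read with journalctl -u ollama. The hypothetical helper below extracts the detected GPU and its compute capability from such a log on standard input; the field names are taken from the excerpt above, and other Ollama versions may format the line differently.

```sh
# Sketch: summarise the CUDA device Ollama reports in its log.
# Usage in the guest: journalctl -u ollama --no-pager | ollama_gpu_summary

ollama_gpu_summary() {
    # Pull name="..." and compute=... from lines mentioning library=cuda,
    # printing only the first match
    grep 'library=cuda' |
        sed -n 's/.*compute=\([0-9.]*\).*name="\([^"]*\)".*/\2 (compute \1)/p' |
        head -n 1
}
```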
So we start ollama run gemma3 to see how it performs.
Performance is great, but how can we confirm that it is in fact using the GPU? btop to the rescue: it monitors not only the CPU but also the GPU. After a quick check, I was sure that everything was working as intended: the GPU is utilized and does most of the work. The title picture of this article shows the GPU activity in btop during queries to Ollama.
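Besides btop, the same check can be scripted inside the guest; ollama ps also lists loaded models together with whether they run on the CPU or GPU. The hypothetical helper below reads one sample of nvidia-smi’s CSV query output and succeeds if utilization reaches a threshold; the 10% default is an arbitrary assumption.

```sh
# Sketch: report GPU utilisation/VRAM and succeed if the GPU looks busy.
# Usage (while a prompt is being processed):
#   nvidia-smi --query-gpu=utilization.gpu,memory.used \
#              --format=csv,noheader,nounits | gpu_in_use 10

gpu_in_use() {
    min=${1:-10}    # assumed threshold in percent
    awk -F', *' -v min="$min" '
        { printf "GPU util %s%%, VRAM used %s MiB\n", $1, $2
          if ($1 + 0 >= min + 0) busy = 1 }
        END { exit !busy }'
}
```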
Thanks to the awesome FreeBSD community, the whole setup took only about three hours.
P.S.: There is now a new issue and review that will hopefully make this setup even easier in upcoming FreeBSD versions.