As part of a personal project, I equipped myself with an NVIDIA GPU (an RTX 3060) to run LLMs properly on my local machine.

To easily use different models, I rely on OpenWebUI (with Ollama). Since the installation can be a bit of an adventure, I’m summarizing the steps here.


Configuration Used

On my PC, the key component is an NVIDIA GeForce RTX 3060 with 12 GB of VRAM, running under Ubuntu.

This setup lets me run 14B models comfortably (around thirty tokens per second).
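For a rough order of magnitude: a 14B model quantized to 4 bits weighs about 14 × 0.5 ≈ 7–9 GB in VRAM, so it fits in the 3060's 12 GB with some headroom left over for the context.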

Installing the NVIDIA Drivers

There are several methods; I used the one from the Ubuntu site: NVIDIA drivers installation

In summary, run:

sudo ubuntu-drivers list 
sudo ubuntu-drivers install

Then verify with nvidia-smi. You should get something like this:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.120                Driver Version: 550.120        CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060        Off |   00000000:01:00.0  On |                  N/A |
|  0%   39C    P8             10W /  170W |     664MiB /  12288MiB |      1%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      2909      G   /usr/lib/xorg/Xorg                            281MiB |
|    0   N/A  N/A      3245      G   /usr/bin/gnome-shell                          138MiB |
|    0   N/A  N/A     20152      G   ...onEnabled --variations-seed-version        140MiB |
+-----------------------------------------------------------------------------------------+

Installing Docker with NVIDIA GPU Support

No particular difficulties here; just follow the documentation (and don’t forget to reboot your PC after installing the NVIDIA toolkit).

Docker

Follow the official documentation. I chose the method with apt repositories: install-using-the-repository. Don’t forget the post-installation steps: linux-postinstall (to avoid using sudo for each Docker command).
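In short, the post-installation part boils down to adding your user to the docker group (commands taken from the Docker docs; double-check against the current page):

sudo groupadd docker          # the group may already exist
sudo usermod -aG docker $USER
newgrp docker                 # or simply log out and back in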

NVIDIA Container Toolkit

As before, I used the official documentation to install the Toolkit with apt: install-guide.html#installing-with-apt
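At the time of writing, the apt route looks roughly like this (the repository setup changes occasionally, so copy the commands from the guide itself):

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit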

Then configure Docker: install-guide.html#configuring-docker
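In practice, this is a single nvidia-ctk command followed by a Docker restart:

sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker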

Proceed to the verification step to ensure Docker can indeed use the GPU: running-a-sample-workload.

You should get something like this:

sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
                                                                                                     
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.120                Driver Version: 550.120        CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060        Off |   00000000:01:00.0  On |                  N/A |
|  0%   43C    P8             12W /  170W |     625MiB /  12288MiB |     31%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+

Ollama and OpenWebUI with Docker Compose

We’re almost done!

To easily manage Docker (on a local network), I like to use Portainer. But you can also do it all with Vim, of course; I'm not picky.

To write my docker-compose.yaml, I used the OpenWebUI example: docker-compose.yaml.

To allow Docker to use the GPU, I relied on this example: docker-compose.gpu.yaml.

Which gives:

services:
  ollama:
    volumes:
      - ollama:/root/.ollama
    container_name: ollama
    pull_policy: always
    tty: true
    restart: unless-stopped
    image: ollama/ollama
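    # GPU reservation taken from the docker-compose.gpu.yaml example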
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities:
                - gpu

  open-webui:
    build:
      context: .
      args:
        OLLAMA_BASE_URL: '/ollama'
      dockerfile: Dockerfile
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    volumes:
      - open-webui:/app/backend/data
    depends_on:
      - ollama
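    # OpenWebUI will be reachable on the host at http://localhost:3000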
    ports:
      - 3000:8080
    environment:
      - 'OLLAMA_BASE_URL=http://ollama:11434'
      - 'WEBUI_SECRET_KEY='
    extra_hosts:
      - host.docker.internal:host-gateway
    restart: unless-stopped

volumes:
  ollama: {}
  open-webui: {}
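With this file saved as docker-compose.yaml, starting the stack and pulling a first model looks like this (replace <model-name> with whatever model you want to try):

docker compose up -d
docker exec -it ollama ollama pull <model-name>

OpenWebUI is then available at http://localhost:3000, where you can create the first admin account and start chatting.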

That’s a basic installation for simple use (don’t you dare put this into production as is!).

Feel free to share your feedback and enjoy chatting with your LLMs :-)