Local LLM: Llama 3 and OpenWebUI

Running Llama 3 with OpenWebUI Locally

Introduction

This guide will help you set up and run Llama 3 with OpenWebUI using Docker. Running large language models (LLMs) locally keeps your prompts and data on your own machine, which improves privacy and security, and with a capable GPU it can deliver excellent performance.

Why Use a GPU?

Using a GPU significantly speeds up LLM inference, which is dominated by large matrix multiplications that GPUs execute far faster than CPUs. A model like Llama 3 70B also needs tens of gigabytes of memory for its weights, so a powerful GPU (or several) is effectively a requirement for usable response speeds.
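
Before you start, you may want to confirm which GPU and how much VRAM you have available; nvidia-smi ships with the NVIDIA driver:

nvidia-smi --query-gpu=name,memory.total --format=csv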

Prerequisites

  • Powerful NVIDIA GPU: Required for running Llama 3 70B efficiently.
  • Linux OS or Windows with WSL: This guide covers both platforms.
  • Docker: Necessary for containerized application deployment.
  • NVIDIA Drivers and CUDA Toolkit: Needed for GPU support in Docker containers.

Installation Guide

1. Install Docker

For Linux:

  1. Update Packages:
     sudo apt-get update
    
  2. Install Docker:
     sudo apt-get install -y docker.io
     sudo systemctl start docker
     sudo systemctl enable docker
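  3. Verify the Installation:
     To confirm Docker is working, you can run the standard hello-world test image:
     sudo docker run hello-world  # Prints "Hello from Docker!" on success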
    

For Windows (WSL):

  1. Set Up WSL 2:
    • Open PowerShell as Administrator:
      wsl --install
      wsl --set-default-version 2
      
    • Install a Linux distribution from the Microsoft Store (e.g., Ubuntu).
  2. Install Docker in WSL:
    • Open your Linux distribution and install Docker:
      sudo apt-get update
      sudo apt-get install -y docker.io
      sudo service docker start
      sudo usermod -aG docker $USER
      newgrp docker
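    • Verify the installation; after the usermod and newgrp steps above, sudo should no longer be required:
      docker run hello-world  # Prints "Hello from Docker!" on success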
      

2. Install NVIDIA Drivers and CUDA Toolkit

For Linux:

  1. Install NVIDIA Drivers:
     sudo apt-get install -y nvidia-driver-530  # Replace 530 with the correct version for your GPU
     sudo reboot  # Restart to apply changes
    
  2. Install CUDA Toolkit:
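     On Ubuntu, one option is the distribution-packaged toolkit (it may lag behind NVIDIA's latest release; for a newer version, use the installer from NVIDIA's CUDA downloads page):
     sudo apt-get install -y nvidia-cuda-toolkit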

For Windows:

  1. Install NVIDIA Drivers: Download and install the latest driver for your GPU from NVIDIA's website. If you are using WSL 2, this Windows driver is all you need; do not install a separate Linux display driver inside WSL.
  2. Install CUDA Toolkit: Only needed if you want the CUDA toolchain itself; download the Windows installer from NVIDIA's CUDA downloads page. For running GPU containers through Docker, the driver alone is sufficient, because the CUDA runtime ships inside the container image.
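  3. Verify the Driver:
     nvidia-smi is installed with the Windows driver and works from both PowerShell and inside WSL:
     nvidia-smi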

3. Install NVIDIA Docker Toolkit

For Linux:

  1. Add NVIDIA Docker Repository (this is the legacy nvidia-docker2 method; NVIDIA's current documentation uses the nvidia-container-toolkit repository instead):
     distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
     curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
     curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
     sudo apt-get update
    
  2. Install NVIDIA Docker Toolkit:
     sudo apt-get install -y nvidia-docker2
     sudo systemctl restart docker
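  3. Verify GPU Support:
     To confirm containers can see the GPU, run a CUDA base image (the tag below is an example; use any tag currently published on Docker Hub):
     docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi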
    

For Windows (WSL):

  1. Set Up NVIDIA Container Toolkit:
    • Open your Linux distribution and run (this assumes NVIDIA's apt repository is already configured, as in the Linux steps above):
      sudo apt-get update
      sudo apt-get install -y nvidia-container-toolkit
      sudo service docker restart
      
  2. Verify GPU Support:
     docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi  # Example tag; the old 11.0-base tag has been removed from Docker Hub
    
    • This command should display your GPU details if configured correctly.

4. Pull and Run OpenWebUI Docker Image

  1. Create a Directory for OpenWebUI:
     mkdir -p ~/open-webui
     cd ~/open-webui
    
  2. Pull the Docker Image:
     docker pull ghcr.io/open-webui/open-webui:main
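  3. Verify the Pull:
     You can confirm the image is now available locally:
     docker images ghcr.io/open-webui/open-webui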
    

Running OpenWebUI with GPU Support

For enhanced performance using an NVIDIA GPU, run:

docker run -d -p 3000:8080 --gpus all --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:cuda
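
Once the container starts, OpenWebUI is available at http://localhost:3000 (host port 3000 maps to the container's port 8080). You can follow startup progress with:

docker logs -f open-webui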

Running Alongside Ollama

If you already have Ollama running on your system, run:

docker run -d --network=host -v open-webui:/app/backend/data -e OLLAMA_BASE_URL=http://127.0.0.1:11434 --name open-webui --restart always ghcr.io/open-webui/open-webui:main
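
With host networking, OpenWebUI listens on the container's own port, so the interface is typically at http://localhost:8080. If you have not yet downloaded a model, you can pull one with the Ollama CLI first (the tag below is an example; check the Ollama model library for available tags):

ollama pull llama3:70b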

Installation for OpenAI API Usage Only

If you only need OpenWebUI for the OpenAI API, use the command below, providing your API key through the OPENAI_API_KEY environment variable:

Note: Set your OpenAI key in an environment variable rather than typing it inline on the command line, where it could be saved to your shell history.
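
For example, you might export the key in your shell first (the value below is a placeholder):

export OPENAI_API_KEY="sk-..."  # Placeholder; consider your shell profile or a secrets manager instead of typing it each time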

docker run -d -p 3000:8080 -e OPENAI_API_KEY=$OPENAI_API_KEY -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main

Installing OpenWebUI with Bundled Ollama Support

For a streamlined setup that includes both OpenWebUI and Ollama, choose the appropriate command based on your hardware setup:

With GPU Support:

Utilize GPU resources by running the following command:

docker run -d -p 3000:8080 --gpus=all -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama

For CPU Only:

If you don’t have a GPU, use this command instead:

docker run -d -p 3000:8080 -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama
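
With the bundled image, you can download a model either from OpenWebUI's model management settings or, assuming the ollama binary is on the container's PATH, by exec-ing into the container (the model tag is an example):

docker exec -it open-webui ollama pull llama3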

Conclusion

By following this guide, you can efficiently set up and run Llama 3 with OpenWebUI locally, whether on Linux or on Windows via WSL. Leveraging an NVIDIA GPU gives you the best performance while keeping your data, and therefore your privacy and security, under your own control.