llama-gpt
A self-hosted, offline, ChatGPT-like chatbot. Powered by Llama 2. 100% private, with no data leaving your device. New: Code Llama support!
⭐ 10,973 stars on GitHub · 🍴 714 forks · 📜 License: mit · 💻 Language: TypeScript
What is llama-gpt?
A private ChatGPT-style assistant you can run on your own hardware, with no prompts or responses sent to a cloud provider. The differentiator is straightforward: local Llama 2 and Code Llama models wrapped in a familiar web UI, plus an OpenAI-compatible API for integrations.
Main components
- Browser-based chat interface for running a local AI assistant at
localhost:3000. - Llama 2 chat model support, including 7B, 13B, and 70B variants.
- Code Llama support for developer-focused coding help and technical Q&A.
- Docker-based deployment for x86 and arm64 systems, including M1/M2 Macs.
- Optional Nvidia GPU acceleration via CUDA for faster inference.
- OpenAI-compatible API on
localhost:3001for apps that expect OpenAI-style endpoints. - Kubernetes manifests for deploying into a homelab or internal cluster.
Clear use cases
- Run a private AI chatbot for sensitive notes, internal docs, or client data without sending content to OpenAI, Anthropic, or other SaaS APIs.
- Give developers a local coding assistant using Code Llama, useful for offline work or restricted environments.
- Add an OpenAI-compatible local backend to prototypes, internal tools, or automation scripts.
- Deploy an AI assistant on an Umbrel home server for household, lab, or small-office use.
- Test Llama 2 model sizes against your own hardware before committing to a larger inference setup.
The biggest strength is privacy-first local inference with a low-friction setup — you get a ChatGPT-like experience without handing your data to a third-party API. Compared with commercial AI assistants, llama-gpt is not trying to win on model quality or polished enterprise features; it wins when data control, offline access, and predictable self-hosting matter more than having the latest frontier model.
There are trade-offs. You need enough RAM for the model you choose, ranging from roughly 6GB for smaller 7B models to more than 40GB for the 70B option, and response speed depends heavily on CPU/GPU hardware. Custom model support is also listed as roadmap work, so if you want a flexible model playground, you may hit limits.
Best for privacy-conscious developers, homelab users, and small teams that want a local ChatGPT-style assistant with Docker, GPU support, and an OpenAI-compatible API.
Topics: the project is tagged with popular topics:
- 🏷️
ai - 🏷️
chatgpt - 🏷️
code-llama - 🏷️
codellama - 🏷️
gpt - 🏷️
gpt-4 - 🏷️
gpt4all - 🏷️
llama - 🏷️
llama-2 - 🏷️
llama-cpp
📸 Screenshots

Quick install
The project supports Docker Compose:
git clone https://github.com/getumbrel/llama-gpt.git
cd llama-gpt
docker compose up -d
Check the README in the repo for required env variables.
Minimum system requirements
| Component | Recommended |
|---|---|
| RAM | 4096 MB |
| CPU | 2 vCPU |
| Disk | 50 GB SSD |
| OS | Ubuntu 22.04 LTS / Debian 12 |
| Docker | 24.0+ |
⚡ Deploy fast on VSIS
Use the VSIS VPS Standard 4GB RAM / 2 vCPU / 50GB SSD (~150k/tháng) plan from VSIS.NET — high-speed VN-based VPS, 24/7 support, ideal for running llama-gpt smoothly.
🎯 Benefits:
- One-command
docker compose up -ddeploy in 2 minutes - Dedicated IPv4, root access, unmetered domestic bandwidth
- Daily snapshot backup
- Free install assistance from the VSIS team
👉 See matching VPS plans at vsis.net
Resources
- 🔗 GitHub: getumbrel/llama-gpt
- 🌐 Homepage: https://apps.umbrel.com/app/llama-gpt
- 📚 Official docs: see README in the repo
- 💬 Community: GitHub Issues + Discussions
Article compiled from GitHub data on 05/05/2026. Star/fork counts may have changed — see live numbers via the GitHub link.
