Table of Contents
When starting with a local LLM, the go-to tool to install first is Ollama. It handles almost all the messy setup for you, so you can download a model and start chatting with a single command. This article walks through installation, core commands, choosing models, GUIs, API usage, customization, and troubleshooting—end to end, for beginners.
The bottom line first: Ollama is like "Docker for LLMs." Just type ollama run and it fetches, launches, and lets you chat with a quantized model. Run it first, then—once you're comfortable—embed it into your own apps via the API. We'll cover it in that order.
One command, a local LLM
— It handles almost all the setup hassle for you
✅ Free / OSS
🖥️ Win/Mac/Linux
🔌 Local API
⏱️ Minutes to set up
1. What is Ollama? The go-to local-LLM runtime
Ollama is a free, open-source tool for running local LLMs easily on your own PC. It handles the hassle—downloading models, dealing with quantization formats, configuring GPU use—behind the scenes, so all you do is "name a model and run it."
💡 In a nutshell: Ollama is "Docker for LLMs." Fetch a model with ollama pull, chat with ollama run. It also spins up a local API server, so your own apps and chat UIs can call it too.
A similar tool is LM Studio. Roughly: Ollama = CLI-first, for developers, APIs, and automation; LM Studio = GUI-first, for non-engineers getting started. Both are free and install in minutes. This article centers on Ollama (which also covers APIs and embedding); if you want a GUI, jump to Section 5.
2. Installation (Win / Mac / Linux)
Just grab the installer from the official site, ollama.com. Here's the flow per OS.
🪟 Windows / 🍎 Mac
Just download the app from the official site and run it. Launching the app also starts the API server in the background. Then the ollama command is available in your terminal (PowerShell / Terminal).
🐧 Linux
Install with the official one-line script. Also well suited to server use and Docker deployments (an official Docker image is available).
🔌 Check it works: after installing, ollama --version should print a version. Your first model is just one line: ollama run qwen3 (the first run triggers a download).
3. Essential commands at a glance
There are very few commands to learn. Here they are, most-used first.
ollama run <model>
Launch a model and chat. Downloads it first if not present. Exit with /bye.
ollama pull <model>
Download a model only (no chat). Handy for fetching in advance.
ollama list
Show downloaded models and their sizes (ollama ls works too).
ollama ps
Show models currently running (loaded in memory).
ollama rm <model>
Delete a model to free up disk space.
ollama serve
Start the API server (default localhost:11434). Automatic on Win/Mac when the app launches.
4. Getting and choosing models
Specify a model by name + size tag. For example, llama3.2 is the standard size, and llama3.2:3b is the 3B version. The rule of thumb: pick a size that fits in your VRAM.
💡 Which model? Decide by use case (general / coding / your language) and size. For picks by lineage and use case, see our best local LLM models comparison; for the VRAM each size needs, see the hardware requirements article. When unsure, start small (7B class).
5. Using a GUI (Open WebUI and more)
Not a fan of the terminal? No problem—you can put a chat screen (GUI) on top of Ollama.
A popular ChatGPT-style screen you connect to your local Ollama. Supports chat history, model switching, and multiple users.
Want a GUI from the start? LM Studio
A single app that handles model search, download, and chat. Ideal for non-engineers getting started. On Apple Silicon it can be fast via the MLX format.
6. Using the API (embed it in apps)
Ollama's real strength is its local API. The server runs at localhost:11434, and by sending requests to it, your own apps, scripts, and tools can use a local LLM.
Native API
POST localhost:11434
/api/chat
/api/generate
Ollama's own simple format.
OpenAI-compatible API
POST localhost:11434
/v1/chat/completions
Reuse existing OpenAI code by just changing the endpoint.
🔌 OpenAI compatibility is powerful: many libraries and tools support the OpenAI API. Point them at Ollama's /v1 endpoint and you can use local instead of cloud—a handy fallback when the cloud goes down.
7. Customizing (Modelfile, env vars)
It's plenty useful out of the box, but two things are worth knowing if you want to go further.
📝 Modelfile
A config file like a Dockerfile. Add a system prompt and parameters to a base model to make "your own model" (e.g., one that always answers in polite English).
⚙️ Environment variables
Tune operations with OLLAMA_HOST (change the bind address to use it from other devices on your LAN), OLLAMA_MODELS (model storage path, e.g., move to another drive), and more.
8. Troubleshooting
Here are the common snags and fixes, up front.
Slow or stalling
Likely the model doesn't fully fit in VRAM. Go one size smaller, or use a more heavily quantized version.
Crashes from low memory
Budget at least 8 GB RAM for 7B, 16 GB for 13B+. Long inputs use even more, so shorten the context length.
API won't connect
Check that ollama serve is running and port 11434 is free. If the app isn't running, the API is down too.
Model not found
Usually a typo in the name or size tag. Check the correct name in the official model list.
Summary
Ollama is the fastest way to get into local LLMs. Three takeaways:
- Set up in minutes: install from the official site, then just
ollama run <model>. Very few commands to learn. - Choose models by size: stay within your VRAM. When unsure, start at the 7B class and pick a lineage by use case.
- The API is the real value: the OpenAI-compatible API at
localhost:11434lets you embed it in your own apps and chat UIs—and serve as a cloud fallback.
Start by typing ollama run qwen3. The best way to learn is to run it while checking the differences from the cloud and how to choose a model.
FAQ
Q. Is Ollama free? Can I use it commercially?
A. Ollama itself is free and open-source. However, each model you run has its own license, and commercial use depends on the model. Check each model's terms before product use (see the licensing section of our model comparison).
Q. Ollama or LM Studio—which is better?
A. For commands, APIs, automation, and embedding into your own apps, Ollama; if you want to start easily with a GUI, LM Studio. Both are free, so when unsure, install both and compare.
Q. Is my data sent externally?
A. Inference in Ollama stays entirely on your PC; your input is not sent out (apart from the initial model download). That's a big advantage of local LLMs.
Q. Can I use it with existing OpenAI code?
A. Yes. Ollama exposes an OpenAI-compatible API at localhost:11434/v1, so in most cases you only change the endpoint URL and the model name. Handy for switching from cloud to local, or as a fallback.
Q. What kind of PC do I need?
A. As a guide, at least 8 GB RAM for 7B models and 16 GB+ for 13B and up. For comfort, a supported GPU (8 GB+ VRAM) or a Mac with ample unified memory helps. See the hardware requirements article for details.