Ollama Complete Guide [2026]: Setup, Commands, API

Q: Is Ollama free? Can I use it commercially?

Ollama itself is free and open-source. However, each model you run has its own license, and commercial use depends on the model. Check each model&#039;s terms before product use (see the licensing section of our model comparison).

Q: Is my data sent externally?

Inference in Ollama stays entirely on your PC; your input is not sent out (apart from the initial model download). That&#039;s a big advantage of local LLMs.

Q: Can I use it with existing OpenAI code?

Yes. Ollama exposes an OpenAI-compatible API at localhost:11434/v1, so in most cases you only change the endpoint URL and the model name. Handy for switching from cloud to local, or as a fallback.

Ollama Complete Guide [2026]: Install, Commands & API Usage

Table of Contents

1. What is Ollama? The go-to local-LLM runtime
2. Installation (Win / Mac / Linux)
3. Essential commands at a glance
4. Getting and choosing models
5. Using a GUI (Open WebUI and more)
6. Using the API (embed it in apps)
7. Customizing (Modelfile, env vars)
8. Troubleshooting
Summary
FAQ

When starting with a local LLM, the go-to tool to install first is Ollama. It handles almost all the messy setup for you, so you can download a model and start chatting with a single command. This article walks through installation, core commands, choosing models, GUIs, API usage, customization, and troubleshooting—end to end, for beginners.

The bottom line first: Ollama is like "Docker for LLMs." Just type ollama run and it fetches, launches, and lets you chat with a quantized model. Run it first, then—once you're comfortable—embed it into your own apps via the API. We'll cover it in that order.

LOCAL LLM RUNTIME

One command, a local LLM

— It handles almost all the setup hassle for you

$ ollama pull qwen3
$ ollama run qwen3
>>> Hi! What can you do?

✅ Free / OSS

🖥️ Win/Mac/Linux

🔌 Local API

⏱️ Minutes to set up

1. What is Ollama? The go-to local-LLM runtime

Ollama is a free, open-source tool for running local LLMs easily on your own PC. It handles the hassle—downloading models, dealing with quantization formats, configuring GPU use—behind the scenes, so all you do is "name a model and run it."

💡 In a nutshell: Ollama is "Docker for LLMs." Fetch a model with ollama pull, chat with ollama run. It also spins up a local API server, so your own apps and chat UIs can call it too.

A similar tool is LM Studio. Roughly: Ollama = CLI-first, for developers, APIs, and automation; LM Studio = GUI-first, for non-engineers getting started. Both are free and install in minutes. This article centers on Ollama (which also covers APIs and embedding); if you want a GUI, jump to Section 5.

2. Installation (Win / Mac / Linux)

Just grab the installer from the official site, ollama.com. Here's the flow per OS.

🪟 Windows / 🍎 Mac

Just download the app from the official site and run it. Launching the app also starts the API server in the background. Then the ollama command is available in your terminal (PowerShell / Terminal).

🐧 Linux

Install with the official one-line script. Also well suited to server use and Docker deployments (an official Docker image is available).

🔌 Check it works: after installing, ollama --version should print a version. Your first model is just one line: ollama run qwen3 (the first run triggers a download).

3. Essential commands at a glance

There are very few commands to learn. Here they are, most-used first.

ollama run <model>

Launch a model and chat. Downloads it first if not present. Exit with /bye.

ollama pull <model>

Download a model only (no chat). Handy for fetching in advance.

ollama list

Show downloaded models and their sizes (ollama ls works too).

ollama ps

Show models currently running (loaded in memory).

ollama rm <model>

Delete a model to free up disk space.

ollama serve

Start the API server (default localhost:11434). Automatic on Win/Mac when the app launches.

4. Getting and choosing models

Specify a model by name + size tag. For example, llama3.2 is the standard size, and llama3.2:3b is the 3B version. The rule of thumb: pick a size that fits in your VRAM.

# Try a lightweight model (entry)
ollama run gemma3:4b
# A solid all-rounder, strong multilingual
ollama run qwen3
# For coding
ollama run qwen3-coder

💡 Which model? Decide by use case (general / coding / your language) and size. For picks by lineage and use case, see our best local LLM models comparison; for the VRAM each size needs, see the hardware requirements article. When unsure, start small (7B class).

5. Using a GUI (Open WebUI and more)

Not a fan of the terminal? No problem—you can put a chat screen (GUI) on top of Ollama.

Open WebUI

A popular ChatGPT-style screen you connect to your local Ollama. Supports chat history, model switching, and multiple users.

Want a GUI from the start? LM Studio

A single app that handles model search, download, and chat. Ideal for non-engineers getting started. On Apple Silicon it can be fast via the MLX format.

6. Using the API (embed it in apps)

Ollama's real strength is its local API. The server runs at localhost:11434, and by sending requests to it, your own apps, scripts, and tools can use a local LLM.

Native API

POST localhost:11434
　/api/chat
　/api/generate

Ollama's own simple format.

OpenAI-compatible API

POST localhost:11434
　/v1/chat/completions

Reuse existing OpenAI code by just changing the endpoint.

🔌 OpenAI compatibility is powerful: many libraries and tools support the OpenAI API. Point them at Ollama's /v1 endpoint and you can use local instead of cloud—a handy fallback when the cloud goes down.

7. Customizing (Modelfile, env vars)

It's plenty useful out of the box, but two things are worth knowing if you want to go further.

📝 Modelfile

A config file like a Dockerfile. Add a system prompt and parameters to a base model to make "your own model" (e.g., one that always answers in polite English).

⚙️ Environment variables

Tune operations with OLLAMA_HOST (change the bind address to use it from other devices on your LAN), OLLAMA_MODELS (model storage path, e.g., move to another drive), and more.

8. Troubleshooting

Here are the common snags and fixes, up front.

Slow or stalling

Likely the model doesn't fully fit in VRAM. Go one size smaller, or use a more heavily quantized version.

Crashes from low memory

Budget at least 8 GB RAM for 7B, 16 GB for 13B+. Long inputs use even more, so shorten the context length.

API won't connect

Check that ollama serve is running and port 11434 is free. If the app isn't running, the API is down too.

Model not found

Usually a typo in the name or size tag. Check the correct name in the official model list.

Summary

Ollama is the fastest way to get into local LLMs. Three takeaways:

Set up in minutes: install from the official site, then just ollama run <model>. Very few commands to learn.
Choose models by size: stay within your VRAM. When unsure, start at the 7B class and pick a lineage by use case.
The API is the real value: the OpenAI-compatible API at localhost:11434 lets you embed it in your own apps and chat UIs—and serve as a cloud fallback.

Start by typing ollama run qwen3. The best way to learn is to run it while checking the differences from the cloud and how to choose a model.

FAQ

Q. Is Ollama free? Can I use it commercially?

A. Ollama itself is free and open-source. However, each model you run has its own license, and commercial use depends on the model. Check each model's terms before product use (see the licensing section of our model comparison).

Q. Ollama or LM Studio—which is better?

A. For commands, APIs, automation, and embedding into your own apps, Ollama; if you want to start easily with a GUI, LM Studio. Both are free, so when unsure, install both and compare.

Q. Is my data sent externally?

A. Inference in Ollama stays entirely on your PC; your input is not sent out (apart from the initial model download). That's a big advantage of local LLMs.

Q. Can I use it with existing OpenAI code?

A. Yes. Ollama exposes an OpenAI-compatible API at localhost:11434/v1, so in most cases you only change the endpoint URL and the model name. Handy for switching from cloud to local, or as a fallback.

Q. What kind of PC do I need?

A. As a guide, at least 8 GB RAM for 7B models and 16 GB+ for 13B and up. For comfort, a supported GPU (8 GB+ VRAM) or a Mac with ample unified memory helps. See the hardware requirements article for details.

Ollama Complete Guide [2026]: Install, Commands & API Usage

One command, a local LLM

1. What is Ollama? The go-to local-LLM runtime

2. Installation (Win / Mac / Linux)

3. Essential commands at a glance

4. Getting and choosing models

5. Using a GUI (Open WebUI and more)

6. Using the API (embed it in apps)

7. Customizing (Modelfile, env vars)

8. Troubleshooting

Summary

FAQ

Related Articles

Generative AI Knowledge Cutoff Dates Compared: ChatGPT, Claude, Gemini & More

What Is Generative AI? How It Differs from Traditional AI

Generative AI Strengths and Weaknesses — What It Can and Cannot Do with Real Examples

What Is an LLM? How Large Language Models Work, Top Models & Use Cases

Comments

Leave a Comment