Claude Code + Ollama: Run AI Coding Locally in 2026
What Is This and Why Does It Matter?
Ollama recently added support for the Anthropic Messages API, which means Claude Code can now interact with any Ollama-hosted model. This opens up faster, more flexible coding workflows without being tied to the cloud — you can run models locally on your machine, connect to cloud models hosted by Ollama, and use Claude Code features like multi-turn conversations, tool calling, and vision inputs.
Privacy first: Your prompts and code never leave localhost — useful for working with sensitive data like legal records, medical information, or private customer details.
Prerequisites
Before you begin, make sure you have:
| Requirement | Details |
|---|---|
| RAM | 16GB minimum, 32GB+ recommended |
| Storage | Several GB free (models are large) |
| Node.js | Latest LTS version |
| GPU (optional) | NVIDIA or Apple Silicon for faster inference |
Apple Silicon note: At 16GB you can run smaller models, but expect rough edges — more wrong edits, more retries, slower throughput. 32GB is the sweet spot.
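To gauge whether a model will fit in your RAM before pulling it, here is a back-of-the-envelope sketch. The 4-bit default and the sizing rule are rough assumptions, not Ollama specifics: weights alone take roughly params × bits ÷ 8 bytes, and the KV cache plus runtime overhead add more on top, especially with large context windows.

```python
def estimated_model_gb(params_billions: float, bits_per_weight: int = 4) -> float:
    """Rough memory footprint of a quantized model's weights alone.

    Rule of thumb (an assumption, not a guarantee): weights take
    params * bits / 8 bytes. Real usage is higher once the KV cache
    and runtime overhead are counted.
    """
    return params_billions * 1e9 * bits_per_weight / 8 / (1024 ** 3)

# A 7B model at 4-bit quantization: roughly 3.3 GB of weights
print(f"{estimated_model_gb(7):.1f} GB")
```

This is why a 7B model is comfortable at 16GB while larger models want 32GB or more.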
Step 1: Install Ollama
Linux:
```bash
curl -fsSL https://ollama.com/install.sh | sh
```

macOS / Windows: Download the installer from ollama.com and follow the on-screen instructions.
Verify Ollama is running:
```bash
ollama -v
```

If you get an error, the service may not be running yet. Start it manually:

```bash
ollama serve
```

Version requirement: You'll want Ollama v0.14+ for full tool-use support. Streaming tool calls are required for Claude Code's agentic loop.
Step 2: Pull a Coding Model
Choose a model based on your hardware:
```bash
# Lightweight / mid-range (16–32GB RAM)
ollama pull qwen2.5-coder:7b

# High-performance local model (recommended for agentic use)
ollama pull glm-4.7-flash

# Large model for powerful hardware (64GB+ / GPU)
ollama pull gpt-oss:20b

# Cloud model — no GPU needed, free tier available
ollama pull kimi-k2.5:cloud
```

Context window tip: Claude Code requires a large context window — at least 64k tokens is recommended. Verify your chosen model supports this.
Test your model:
```bash
ollama run glm-4.7-flash
```

Step 3: Install Claude Code
```bash
# macOS / Linux / WSL
curl -fsSL https://claude.ai/install.sh | bash
```

```bash
# Windows CMD
curl -fsSL https://claude.ai/install.cmd -o install.cmd && install.cmd && del install.cmd
```

Step 4: Connect Claude Code to Ollama
You have two options — pick whichever fits your workflow.
Option A: Using ollama launch (Simplest)
```bash
# Interactive model picker
ollama launch claude

# Or specify the model directly
ollama launch claude --model glm-4.7-flash
ollama launch claude --model qwen2.5
ollama launch claude --model kimi-k2.5:cloud
```

Option B: Manual Environment Variables
Add these to your ~/.bashrc or ~/.zshrc for a persistent setup:
```bash
export ANTHROPIC_AUTH_TOKEN="ollama"
export ANTHROPIC_API_KEY=""
export ANTHROPIC_BASE_URL="http://localhost:11434"
```

Then launch Claude Code pointing to your model:

```bash
claude --model glm-4.7-flash
```

On Windows (PowerShell)
```powershell
$env:ANTHROPIC_AUTH_TOKEN = "ollama"
$env:ANTHROPIC_API_KEY = ""
$env:ANTHROPIC_BASE_URL = "http://localhost:11434"
claude --model glm-4.7-flash --dangerously-skip-permissions
```

Step 5: Migrate Existing Anthropic SDK Code
If you already have applications using Anthropic's SDK, switching to Ollama only requires updating the base URL.
Python:
```python
import anthropic

client = anthropic.Anthropic(
    base_url='http://localhost:11434',
    api_key='ollama',  # required field but ignored by Ollama
)

message = client.messages.create(
    model='qwen2.5-coder:7b',
    max_tokens=1024,
    messages=[
        {'role': 'user', 'content': 'Write a function to check if a number is prime'}
    ]
)
print(message.content)
```

JavaScript / TypeScript:
```javascript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({
  baseURL: 'http://localhost:11434',
  apiKey: 'ollama',
});

const message = await client.messages.create({
  model: 'qwen2.5-coder:7b',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Write a function to check if a number is prime' }],
});

console.log(message.content);
```

Using Ollama Cloud Models (Optional)
Ollama has :cloud variants that run on Ollama's infrastructure but use the exact same commands as local models — no API keys to manage:
```bash
ollama pull kimi-k2.5:cloud
ollama pull minimax-m2.1:cloud
```

All users get a free allocation of cloud model usage. A paid subscription is needed for heavy usage.
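Tool calling is what powers Claude Code's agentic loop, whether the model is local or cloud-hosted. As a sketch of what a tool definition looks like in an Anthropic Messages API request (the `get_weather` tool and the helper function are illustrative, not part of Ollama):

```python
def build_tool_request(model: str, prompt: str) -> dict:
    """Assemble a Messages API request body that offers the model one tool."""
    return {
        "model": model,
        "max_tokens": 1024,
        "tools": [
            {
                "name": "get_weather",
                "description": "Get the current weather for a city",
                "input_schema": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            }
        ],
        "messages": [{"role": "user", "content": prompt}],
    }

request = build_tool_request("glm-4.7-flash", "What's the weather in Pune?")
```

You would send this through the Anthropic SDK pointed at your Ollama base URL, as in Step 5; a capable model responds with a `tool_use` block instead of plain text.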
Recommended Models by Use Case
| Hardware | Recommended Model | Notes |
|---|---|---|
| 16GB RAM | qwen2.5-coder:7b | Decent quality, runs on CPU |
| 32GB RAM | glm-4.7-flash | Great tool-use for agentic tasks |
| 64GB+ / GPU | gpt-oss:20b | High-quality output |
| No local GPU | kimi-k2.5:cloud | Cloud model, free tier available |
Troubleshooting
unknown command "launch" for "ollama"
Your Ollama version predates the `launch` command. Update Ollama to the latest release and the error will go away.
Claude can't read / write files
By default, almost all permissions are denied. Use /permissions inside Claude Code and add explicit allow rules for Bash and file access.
Out of memory errors
If you see "model requires more system memory than is available," your machine doesn't have enough free RAM. You need at least 12GB free to run most coding models. Try a smaller model like qwen2.5-coder:7b.
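To check that 12GB figure before pulling a large model, a quick sketch (Linux-specific; `SC_AVPHYS_PAGES` is not available on macOS or Windows, and "available" memory is only an approximation):

```python
import os

def free_ram_gb() -> float:
    """Approximate available physical memory in GiB (Linux only)."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    avail_pages = os.sysconf("SC_AVPHYS_PAGES")
    return page_size * avail_pages / (1024 ** 3)

if free_ram_gb() < 12:
    print("Not enough free RAM for most coding models; try qwen2.5-coder:7b")
```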
Models give wrong edits or loop repeatedly
This is a sign the model is too small for your use case or lacks sufficient context. Upgrade to a larger model or use a cloud variant.
Summary
Claude Code is flexible — you're not locked into Anthropic's API. If you need complete privacy, a local model on a capable machine works well for day-to-day development. If you prefer fast, high-quality output without the privacy constraint, Ollama's cloud models are a cost-effective alternative. The setup takes under 10 minutes, and the workflow is identical regardless of which backend you choose.
Written by
Kirtesh Admute
Full-stack engineer and digital architect — building scalable, production-grade systems with real-world impact.
