Developer Tools

Claude Code + Ollama: Run AI Coding Locally in 2026

Kirtesh Admute
March 24, 2026
8 min read

What Is This and Why Does It Matter?

Ollama recently added support for the Anthropic Messages API, which means Claude Code can now interact with any Ollama-hosted model. This opens up faster, more flexible coding workflows without being tied to the cloud — you can run models locally on your machine, connect to cloud models hosted by Ollama, and use Claude Code features like multi-turn conversations, tool calling, and vision inputs.

Privacy first: with a local model, your prompts and code never leave your machine, which is useful when working with sensitive data like legal records, medical information, or private customer details.
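Under the hood, this works because Ollama exposes an Anthropic-compatible Messages endpoint on its local port. As a quick smoke test once Ollama is installed and a model is pulled (a sketch: the endpoint path `/v1/messages` and the default port 11434 are assumed from Ollama's Anthropic compatibility layer), you can hit it directly with curl:

```shell
# Build an Anthropic-style Messages API request.
payload='{
  "model": "qwen2.5-coder:7b",
  "max_tokens": 128,
  "messages": [
    {"role": "user", "content": "Say hello in one word."}
  ]
}'

# Send it to the local Ollama server. The API key header is required by
# the Messages API shape but its value is ignored by Ollama.
curl -s --max-time 10 http://localhost:11434/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: ollama" \
  -d "$payload" || echo "Ollama is not running on localhost:11434"
```

If the server is up, you get back a standard Messages API JSON response; if not, the fallback message prints instead.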


Prerequisites

Before you begin, make sure you have:

| Requirement | Details |
| --- | --- |
| RAM | 16GB minimum, 32GB+ recommended |
| Storage | Several GB free (models are large) |
| Node.js | Latest LTS version |
| GPU (optional) | NVIDIA or Apple Silicon for faster inference |

Apple Silicon note: At 16GB you can run smaller models, but expect rough edges — more wrong edits, more retries, slower throughput. 32GB is the sweet spot.


Step 1: Install Ollama

Linux:

bash
curl -fsSL https://ollama.com/install.sh | sh

macOS / Windows: Download the installer from ollama.com and follow the on-screen instructions.

Verify Ollama is running:

bash
ollama -v

If you get an error, the service may not be running yet. Start it manually:

bash
ollama serve

Version requirement: You'll want Ollama v0.14+ for full tool-use support. Streaming tool calls are required for Claude Code's agentic loop.
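That version check is easy to script. A minimal sketch, assuming `ollama -v` prints a semver-style version string (e.g. "ollama version is 0.14.0"):

```shell
# Minimum version needed for streaming tool calls.
required="0.14.0"

# Pull the first semver-looking token out of `ollama -v`.
installed="$(ollama -v 2>/dev/null | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n1)"

# sort -V orders version strings numerically; if the smallest of the two
# is the required version, the installed one is new enough.
if [ "$(printf '%s\n%s\n' "$required" "$installed" | sort -V | head -n1)" = "$required" ]; then
  echo "Ollama $installed is new enough"
else
  echo "Ollama ${installed:-<not found>} is too old; need >= $required"
fi
```

`sort -V` is GNU coreutils' version sort, so this works on Linux out of the box; on macOS it is available via coreutils from Homebrew.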


Step 2: Pull a Coding Model

Choose a model based on your hardware:

bash
# Lightweight / mid-range (16–32GB RAM)
ollama pull qwen2.5-coder:7b

# High performance local model (recommended for agentic use)
ollama pull glm-4.7-flash

# Large model for powerful hardware (64GB+ / GPU)
ollama pull gpt-oss:20b

# Cloud model — no GPU needed, free tier available
ollama pull kimi-k2.5:cloud

Context window tip: Claude Code requires a large context window — at least 64k tokens is recommended. Verify your chosen model supports this.
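One way to check and raise the context window, shown as a config sketch: `ollama show` prints model metadata including context length, and in recent Ollama versions the `OLLAMA_CONTEXT_LENGTH` environment variable sets the server's default context size (both are assumptions about your Ollama version; check `ollama serve --help` if they differ):

```shell
# Inspect the model's native context length and other metadata.
ollama show glm-4.7-flash

# Start the server with a 64k-token default context window
# (64 * 1024 = 65536 tokens).
OLLAMA_CONTEXT_LENGTH=65536 ollama serve
```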

Test your model:

bash
ollama run glm-4.7-flash

Step 3: Install Claude Code

bash
# macOS / Linux / WSL
curl -fsSL https://claude.ai/install.sh | bash

# Windows CMD
curl -fsSL https://claude.ai/install.cmd -o install.cmd && install.cmd && del install.cmd

Step 4: Connect Claude Code to Ollama

You have two options — pick whichever fits your workflow.

Option A: Using ollama launch (Simplest)

bash
# Interactive model picker
ollama launch claude

# Or specify the model directly
ollama launch claude --model glm-4.7-flash
ollama launch claude --model qwen2.5-coder:7b
ollama launch claude --model kimi-k2.5:cloud

Option B: Manual Environment Variables

Add these to your ~/.bashrc or ~/.zshrc for a persistent setup:

bash
export ANTHROPIC_AUTH_TOKEN="ollama"
export ANTHROPIC_API_KEY=""
export ANTHROPIC_BASE_URL="http://localhost:11434"

Then launch Claude Code pointing to your model:

bash
claude --model glm-4.7-flash

On Windows (PowerShell)

powershell
$env:ANTHROPIC_AUTH_TOKEN = "ollama"
$env:ANTHROPIC_API_KEY = ""
$env:ANTHROPIC_BASE_URL = "http://localhost:11434"
claude --model glm-4.7-flash --dangerously-skip-permissions

Note: the last flag bypasses permission prompts entirely; omit it if you want Claude Code to ask before editing files.
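Before launching, it is worth confirming that the base URL actually reaches Ollama. A small sanity-check sketch (the `/api/tags` endpoint is Ollama's standard model-listing route):

```shell
# Show which base URL Claude Code will use (falls back to the default
# local Ollama address when the variable is unset).
base="${ANTHROPIC_BASE_URL:-http://localhost:11434}"
echo "base URL: $base"

# /api/tags lists the models Ollama is serving; the --model value you
# pass to Claude Code must match one of the names returned here.
curl -s --max-time 10 "$base/api/tags" || echo "Ollama is not reachable at $base"
```

If the model name you pass to `claude --model` does not appear in that list, pull it first with `ollama pull`.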

Step 5: Migrate Existing Anthropic SDK Code

If you already have applications using Anthropic's SDK, switching to Ollama only requires updating the base URL.

Python:

python
import anthropic

client = anthropic.Anthropic(
    base_url='http://localhost:11434',
    api_key='ollama',  # required field but ignored by Ollama
)

message = client.messages.create(
    model='qwen2.5-coder:7b',
    max_tokens=1024,
    messages=[
        {'role': 'user', 'content': 'Write a function to check if a number is prime'}
    ]
)
print(message.content)

JavaScript / TypeScript:

javascript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({
  baseURL: 'http://localhost:11434',
  apiKey: 'ollama',
});

const message = await client.messages.create({
  model: 'qwen2.5-coder:7b',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Write a function to check if a number is prime' }],
});

console.log(message.content);

Using Ollama Cloud Models (Optional)

Ollama has :cloud variants that run on Ollama's infrastructure but use the exact same commands as local models — no API keys to manage:

bash
ollama pull kimi-k2.5:cloud
ollama pull minimax-m2.1:cloud

All users get a free allocation of cloud model usage; heavier use requires a paid subscription.


Recommended Models by Use Case

| Hardware | Recommended Model | Notes |
| --- | --- | --- |
| 16GB RAM | qwen2.5-coder:7b | Decent quality, runs on CPU |
| 32GB RAM | glm-4.7-flash | Great tool use for agentic tasks |
| 64GB+ / GPU | gpt-oss:20b | High-quality output |
| No local GPU | kimi-k2.5:cloud | Cloud model, free tier available |

Troubleshooting

unknown command "launch" for "ollama"
Your Ollama version predates the launch subcommand. Update Ollama and the error will go away.

Claude can't read / write files
By default, almost all permissions are denied. Run /permissions inside Claude Code and add explicit allow rules for Bash and file access.

Out of memory errors
If you see "model requires more system memory than is available," your machine doesn't have enough free RAM. Most coding models need at least 12GB free; try a smaller model like qwen2.5-coder:7b.

Models give wrong edits or loop repeatedly
This usually means the model is too small for the task or lacks sufficient context. Upgrade to a larger model or use a cloud variant.
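On Linux you can check how much memory is actually free before pulling a large model. A small sketch (Linux-specific: it reads `MemAvailable` from /proc/meminfo; on macOS use `vm_stat` or Activity Monitor instead):

```shell
# MemAvailable in /proc/meminfo is reported in kB; convert to GB.
avail_kb=$(grep -i '^MemAvailable' /proc/meminfo 2>/dev/null | grep -oE '[0-9]+' | head -n1)
avail_gb=$(( ${avail_kb:-0} / 1024 / 1024 ))
echo "Available memory: ${avail_gb} GB"

# Warn if we're under the rough 12GB floor most coding models need.
if [ "$avail_gb" -lt 12 ]; then
  echo "Less than 12GB free: try a smaller model like qwen2.5-coder:7b"
fi
```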


Summary

Claude Code is flexible — you're not locked into Anthropic's API. If you need complete privacy, a local model on a capable machine works well for day-to-day development. If you prefer fast, high-quality output without the privacy constraint, Ollama's cloud models are a cost-effective alternative. The setup takes under 10 minutes, and the workflow is identical regardless of which backend you choose.

Written by

Kirtesh Admute

Full-stack engineer and digital architect — building scalable, production-grade systems with real-world impact.
