Claude Code + Ollama: Run AI Coding Locally in 2026
What Is This and Why Does It Matter?
Ollama recently added support for the Anthropic Messages API, which means Claude Code can now interact with any Ollama-hosted model. This opens up faster, more flexible coding workflows without being tied to the cloud — you can run models locally on your machine, connect to cloud models hosted by Ollama, and use Claude Code features like multi-turn conversations, tool calling, and vision inputs.
Privacy first: Your prompts and code never leave localhost — useful for working with sensitive data like legal records, medical information, or private customer details.
Prerequisites
Before you begin, make sure you have:
| Requirement | Details |
|---|---|
| RAM | 16GB minimum, 32GB+ recommended |
| Storage | Several GB free (models are large) |
| Node.js | Latest LTS version |
| GPU (optional) | NVIDIA or Apple Silicon for faster inference |
Apple Silicon note: At 16GB you can run smaller models, but expect rough edges — more wrong edits, more retries, slower throughput. 32GB is the sweet spot.
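To gauge whether a model will fit in your RAM before pulling it, here is a back-of-the-envelope sketch. The 4-bit default and the sizing rule are rough assumptions, not Ollama specifics: weights alone take roughly params × bits ÷ 8 bytes, and the KV cache plus runtime overhead add more on top, especially with large context windows.

```python
def estimated_model_gb(params_billions: float, bits_per_weight: int = 4) -> float:
    """Rough memory footprint of a quantized model's weights alone.

    Rule of thumb (an assumption, not a guarantee): weights take
    params * bits / 8 bytes. Real usage is higher once the KV cache
    and runtime overhead are counted.
    """
    return params_billions * 1e9 * bits_per_weight / 8 / (1024 ** 3)

# A 7B model at 4-bit quantization: roughly 3.3 GB of weights
print(f"{estimated_model_gb(7):.1f} GB")
```

This is why a 7B model is comfortable at 16GB while larger models want 32GB or more.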
Step 1: Install Ollama
Linux:
```bash
curl -fsSL https://ollama.com/install.sh | sh
```

macOS / Windows: Download the installer from ollama.com and follow the on-screen instructions.
Verify Ollama is running:
```bash
ollama -v
```

If you get an error, the service may not be running yet. Start it manually:

```bash
ollama serve
```

Version requirement: You'll want Ollama v0.14+ for full tool-use support. Streaming tool calls are required for Claude Code's agentic loop.
Step 2: Pull a Coding Model
Choose a model based on your hardware:
```bash
# Lightweight / mid-range (16–32GB RAM)
ollama pull qwen2.5-coder:7b

# High-performance local model (recommended for agentic use)
ollama pull glm-4.7-flash

# Large model for powerful hardware (64GB+ / GPU)
ollama pull gpt-oss:20b

# Cloud model — no GPU needed, free tier available
ollama pull kimi-k2.5:cloud
```

Context window tip: Claude Code requires a large context window — at least 64k tokens is recommended. Verify your chosen model supports this.
Test your model:
```bash
ollama run glm-4.7-flash
```

Step 3: Install Claude Code
```bash
# macOS / Linux / WSL
curl -fsSL https://claude.ai/install.sh | bash
```

```bash
# Windows CMD
curl -fsSL https://claude.ai/install.cmd -o install.cmd && install.cmd && del install.cmd
```

Step 4: Connect Claude Code to Ollama
You have two options — pick whichever fits your workflow.
Option A: Using ollama launch (Simplest)
```bash
# Interactive model picker
ollama launch claude

# Or specify the model directly
ollama launch claude --model glm-4.7-flash
ollama launch claude --model qwen2.5
ollama launch claude --model kimi-k2.5:cloud
```

Option B: Manual Environment Variables
Add these to your ~/.bashrc or ~/.zshrc for a persistent setup:
```bash
export ANTHROPIC_AUTH_TOKEN="ollama"
export ANTHROPIC_API_KEY=""
export ANTHROPIC_BASE_URL="http://localhost:11434"
```

Then launch Claude Code pointing to your model:

```bash
claude --model glm-4.7-flash
```

On Windows (PowerShell)
```powershell
$env:ANTHROPIC_AUTH_TOKEN = "ollama"
$env:ANTHROPIC_API_KEY = ""
$env:ANTHROPIC_BASE_URL = "http://localhost:11434"
claude --model glm-4.7-flash --dangerously-skip-permissions
```

Step 5: Migrate Existing Anthropic SDK Code
If you already have applications using Anthropic's SDK, switching to Ollama only requires updating the base URL.
Python:
```python
import anthropic

client = anthropic.Anthropic(
    base_url='http://localhost:11434',
    api_key='ollama',  # required field but ignored by Ollama
)

message = client.messages.create(
    model='qwen2.5-coder:7b',
    max_tokens=1024,
    messages=[
        {'role': 'user', 'content': 'Write a function to check if a number is prime'}
    ]
)
print(message.content)
```

JavaScript / TypeScript:
```javascript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({
  baseURL: 'http://localhost:11434',
  apiKey: 'ollama',
});

const message = await client.messages.create({
  model: 'qwen2.5-coder:7b',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Write a function to check if a number is prime' }],
});

console.log(message.content);
```

Using Ollama Cloud Models (Optional)
Ollama has :cloud variants that run on Ollama's infrastructure but use the exact same commands as local models — no API keys to manage:
```bash
ollama pull kimi-k2.5:cloud
ollama pull minimax-m2.1:cloud
```

All users get a free allocation of cloud model usage. A paid subscription is needed for heavy usage.
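Tool calling is what powers Claude Code's agentic loop, whether the model is local or cloud-hosted. As a sketch of what a tool definition looks like in an Anthropic Messages API request (the `get_weather` tool and the helper function are illustrative, not part of Ollama):

```python
def build_tool_request(model: str, prompt: str) -> dict:
    """Assemble a Messages API request body that offers the model one tool."""
    return {
        "model": model,
        "max_tokens": 1024,
        "tools": [
            {
                "name": "get_weather",
                "description": "Get the current weather for a city",
                "input_schema": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            }
        ],
        "messages": [{"role": "user", "content": prompt}],
    }

request = build_tool_request("glm-4.7-flash", "What's the weather in Pune?")
```

You would send this through the Anthropic SDK pointed at your Ollama base URL, as in Step 5; a capable model responds with a `tool_use` block instead of plain text.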
Recommended Models by Use Case
| Hardware | Recommended Model | Notes |
|---|---|---|
| 16GB RAM | qwen2.5-coder:7b | Decent quality, runs on CPU |
| 32GB RAM | glm-4.7-flash | Great tool-use for agentic tasks |
| 64GB+ / GPU | gpt-oss:20b | High-quality output |
| No local GPU | kimi-k2.5:cloud | Cloud model, free tier available |
Troubleshooting
unknown command "launch" for "ollama"
Your Ollama version predates the `launch` command. Update Ollama to the latest release and the error will go away.
Claude can't read / write files
By default, almost all permissions are denied. Use /permissions inside Claude Code and add explicit allow rules for Bash and file access.
Out of memory errors
If you see "model requires more system memory than is available," your machine doesn't have enough free RAM. You need at least 12GB free to run most coding models. Try a smaller model like qwen2.5-coder:7b.
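To check that 12GB figure before pulling a large model, a quick sketch (Linux-specific; `SC_AVPHYS_PAGES` is not available on macOS or Windows, and "available" memory is only an approximation):

```python
import os

def free_ram_gb() -> float:
    """Approximate available physical memory in GiB (Linux only)."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    avail_pages = os.sysconf("SC_AVPHYS_PAGES")
    return page_size * avail_pages / (1024 ** 3)

if free_ram_gb() < 12:
    print("Not enough free RAM for most coding models; try qwen2.5-coder:7b")
```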
Models give wrong edits or loop repeatedly
This is a sign the model is too small for your use case or lacks sufficient context. Upgrade to a larger model or use a cloud variant.
Summary
Claude Code is flexible — you're not locked into Anthropic's API. If you need complete privacy, a local model on a capable machine works well for day-to-day development. If you prefer fast, high-quality output without the privacy constraint, Ollama's cloud models are a cost-effective alternative. The setup takes under 10 minutes, and the workflow is identical regardless of which backend you choose.
Written by
Kirtesh Admute
Full-stack engineer and digital architect — building scalable, production-grade systems with real-world impact.
