Step 19 · Local AI Intermediate · 14 min

⚡Going hybrid: cloud + local

The best of both worlds. Claude Code keeps the lead: it plans, reasons, reviews, and delegates the repetitive heavy lifting to a local model. Private, frugal, and devastatingly effective.

You now have two worlds at hand: a fearsomely smart cloud agent (Claude Code), and models that run for free at home (Ollama). The wrong question is “which one to choose?” The right answer is: both, at the same time, each in its place.

That’s the hybrid setup. And the central role goes to Claude Code, as the conductor. It doesn’t do everything itself; it decides who does what and delegates the bulk work to the local model.

Fig. The hybrid: Claude Code keeps the hard reasoning on the cloud side, and sends the repetitive heavy lifting to the local model via Ollama. Everything converges on your project.

Why mix rather than choose

The two worlds have opposite strengths, and that’s exactly what makes them complementary:

The cloud (Claude) is unbeatable on hard reasoning, long tasks, reliable tool use, and architecture. But every call costs money, and your data goes to a third party.
Local (Ollama) is free to use, private, available offline, and largely good enough for bounded and repetitive tasks. But it struggles with long agentic tasks and cutting-edge reasoning (we covered this bluntly in Choosing your model).

The hybrid setup takes the best of each: you keep cutting-edge intelligence where it really matters, and you cut cost and data exposure on everything else, which as a rule of thumb is about 80% of the volume.

Claude Code as orchestrator: how it works

What makes this possible: Claude Code can run commands, and Ollama exposes a dead-simple local API at http://localhost:11434. So Claude Code can call your local model through a curl command, a script, or a small tool, exactly as it would run any other command.

Concretely, you tell Claude: “for this bulk task, don’t do it yourself, delegate it to the local model via the Ollama API.” It writes the script that loops over your files, hits the local model for each one, and brings back the consolidated result. It keeps the big picture; local does the grunt work.

# The basic move Claude Code orchestrates: call the local model.
# jq builds the JSON body so quotes and newlines in report.md are escaped properly.
jq -n --arg prompt "Summarize this file in 3 bullets: $(cat report.md)" \
  '{model: "qwen3-coder:30b", prompt: $prompt, stream: false}' \
  | curl -s http://localhost:11434/api/generate --data-binary @- \
  | jq -r .response
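
For a whole batch, the script Claude Code writes could look roughly like the sketch below. It is only an illustration: the function names, the `*.md` filter, and the prompt wording are assumptions, and it needs Ollama running locally with the model already pulled. The `ask` parameter is there so you can swap in a stub and try the loop without a model.

```python
import json
import urllib.request
from pathlib import Path

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's local API
MODEL = "qwen3-coder:30b"  # same model as in the curl call


def ask_local(prompt, url=OLLAMA_URL, model=MODEL):
    """Send one non-streaming prompt to the local model and return its text."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.loads(response.read())["response"]


def summarize_folder(folder, ask=ask_local):
    """Loop over the Markdown files in a folder and collect one summary per file."""
    results = {}
    for path in sorted(Path(folder).glob("*.md")):
        text = path.read_text(encoding="utf-8")
        results[path.name] = ask("Summarize this file in 3 bullets:\n\n" + text)
    return results
```

Calling `summarize_folder("reports")` (a hypothetical folder name) returns a dictionary mapping each file name to its summary, which is the consolidated result Claude then reads and reviews.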

Four concrete ways to route the work

Route by cost: volume to local, the cutting edge to cloud

Need to rewrite 300 product descriptions, classify 2,000 comments, or generate boilerplate by the truckload? It’s repetitive and bounded: local model. Need to design the module’s architecture or debug a nasty race condition? It’s rare and hard: Claude. Claude writes the pipeline; the local model runs the 300 calls for free.
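
If you want that rule written down rather than left to judgment, it fits in a few lines. This is a sketch under assumptions: the task labels and the batch threshold of 50 are invented for illustration, not values from Claude Code or Ollama.

```python
# Hypothetical task labels; rename them to match your own pipeline.
BULK_TASKS = {"rewrite", "classify", "summarize", "boilerplate"}
HARD_TASKS = {"architecture", "debugging", "review"}

# Arbitrary cut-off for "this is a batch"; tune it to taste.
BATCH_THRESHOLD = 50


def route_by_cost(task_kind, item_count=1):
    """Return "local" for bounded, repetitive work and "cloud" for rare, hard work."""
    if task_kind in HARD_TASKS:
        return "cloud"
    if task_kind in BULK_TASKS:
        return "local"
    # Unknown kind: treat a large batch as bulk work, a one-off as worth the stronger model.
    return "local" if item_count >= BATCH_THRESHOLD else "cloud"
```

With these labels, `route_by_cost("rewrite", 300)` goes to the local model and `route_by_cost("debugging")` goes to Claude.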

Route by confidentiality: the sensitive stuff stays home

Proprietary code, customer data, things that must not leave? Have the local model process them: nothing leaves the machine. Claude keeps a coordination role on the non-sensitive part. It’s a strong argument in a professional context (see Securing access).
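
One way to make "sensitive" concrete is a list of path patterns that the delegation script checks before anything is sent anywhere. The patterns and function names below are hypothetical examples; what counts as sensitive is your call.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns; list whatever must never leave the machine in your project.
SENSITIVE_PATTERNS = ["customers/*", "secrets/*", "internal/*", "*.env"]


def must_stay_local(path, patterns=SENSITIVE_PATTERNS):
    """True if the path matches a sensitive pattern and should only go to the local model."""
    normalized = path.replace("\\", "/")  # tolerate Windows-style separators
    return any(fnmatchcase(normalized, pattern) for pattern in patterns)


def split_batch(paths, patterns=SENSITIVE_PATTERNS):
    """Split a batch into (local_only, shareable) lists, preserving order."""
    local_only = [p for p in paths if must_stay_local(p, patterns)]
    shareable = [p for p in paths if not must_stay_local(p, patterns)]
    return local_only, shareable
```

Everything in `local_only` is processed through Ollama; only the `shareable` part is ever shown to the cloud side.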

Homemade RAG: local embeddings, cloud reasoning

Want the agent to know your corpus (your docs, your articles, your code)? Generate the embeddings locally with Ollama (nomic-embed-text or equivalent), store them, and let Claude reason over the most relevant passages you serve it. Indexing and retrieval stay free, private, and local; the final reasoning happens on Claude’s side.
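
The local half of that pipeline can be sketched as follows, assuming Ollama's `/api/embed` endpoint and a plain in-memory list as the "store" (a real setup would persist the vectors in a file or a database). Function names are illustrative, and `embed_fn` is a parameter so the ranking logic can be tried with a stub.

```python
import json
import math
import urllib.request

EMBED_URL = "http://localhost:11434/api/embed"
EMBED_MODEL = "nomic-embed-text"


def embed(texts, url=EMBED_URL, model=EMBED_MODEL):
    """Ask Ollama for one embedding vector per input text."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.loads(response.read())["embeddings"]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either one is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_index(passages, embed_fn=embed):
    """Embed every passage once and pair it with its vector."""
    return list(zip(passages, embed_fn(passages)))


def search(question, index, k=3, embed_fn=embed):
    """Return the k passages whose vectors are closest to the question's vector."""
    query = embed_fn([question])[0]
    ranked = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
    return [passage for passage, _ in ranked[:k]]
```

The passages returned by `search` are what you paste or pipe into Claude's context; the corpus itself never has to leave the machine in full.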

The offline net: OpenCode + local takes over

No connection? Train, plane, outage? You switch to OpenCode wired to your local model and keep coding. The cloud is no longer a single point of failure: your machine stays a self-sufficient workshop.
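
The fallback only works if the model is already on disk before you lose the connection. A small pre-flight check, sketched here on the assumption that Ollama's `/api/tags` endpoint lists installed models, can confirm it. The function names are hypothetical.

```python
import json
import urllib.request

OLLAMA_BASE = "http://localhost:11434"


def model_names(tags):
    """Extract model names from the JSON that /api/tags returns."""
    return [entry["name"] for entry in tags.get("models", [])]


def installed_models(base_url=OLLAMA_BASE, timeout=2.0):
    """Return the locally installed model names, or [] if Ollama is not reachable."""
    try:
        with urllib.request.urlopen(base_url + "/api/tags", timeout=timeout) as response:
            return model_names(json.loads(response.read()))
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts; ValueError covers bad JSON.
        return []


def ready_for_offline(wanted, available):
    """True if the model you plan to use offline is already installed (name includes the tag)."""
    return wanted in available
```

Running `ready_for_offline("qwen3-coder:30b", installed_models())` before you board tells you whether the local workshop is actually stocked.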

Wiring both up, in practice

Claude Code is the default orchestrator. You have nothing special to install: it already knows how to run commands, so it already knows how to call Ollama. Just give it the instruction in your CLAUDE.md:

# Hybrid strategy
- For repetitive, bulk tasks (rewriting, classification,
  summaries, boilerplate generation), delegate to the local model via the
  Ollama API (http://localhost:11434), don't do them yourself.
- Keep complex reasoning, architecture, and review for yourself.
- Code and data marked "sensitive": local model only.

From now on, when you hand it a big batch, it writes the script that hits local and brings back the result. You steer, it distributes.