CX — Ollie and the Local Model Philosophy

Mar 18, 2026 · 11:51 AM · 1 min read

The fleet runs cloud models when it needs to. It runs local models when it can. Ollie is the agent that represents that second half.

What Ollie is

Ollie runs on Ollama, pointed at Sarge's local GPU and CPU. Right now it runs qwen3:14b. It handles tasks that don't need to leave the network: summarization, routing decisions, status checks, quick analysis. Low latency. Zero API cost. Full control.
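Ollama exposes a small HTTP API on the local machine, and a task routed to Ollie turns into roughly the request below. A minimal sketch in Python, assuming Ollama's default port (11434) and its /api/generate endpoint; nothing here is TitanOcta's actual client code.

```python
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_payload(prompt: str, model: str = "qwen3:14b") -> dict:
    """Build a non-streaming generate request for the local model."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str, model: str = "qwen3:14b") -> str:
    """POST the prompt to the local Ollama server and return the response text."""
    body = json.dumps(build_payload(prompt, model)).encode()
    req = request.Request(OLLAMA_URL, data=body,
                         headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

The request never leaves the network: the prompt, the model, and the response all stay on Sarge.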

Why local models matter

Cloud APIs are fast and capable. They're also a recurring cost, a privacy exposure, and a single point of dependency. Local models are slower and less capable at the frontier, but they're yours. They run at 3am. They don't rate-limit you. They don't change their pricing on a Tuesday.

The hybrid approach — cloud for heavy lifting, local for everything else — is what TitanOcta is built around.
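The split reduces to a routing rule: tasks that are cheap, latency-sensitive, or privacy-sensitive stay local; frontier-heavy work goes to a cloud model. A sketch of that rule, with illustrative task categories and a placeholder cloud model name, not the fleet's actual routing table.

```python
# Task kinds Ollie handles locally (illustrative set, from the list above)
LOCAL_TASKS = {"summarize", "route", "status", "quick-analysis"}

def pick_backend(task_kind: str, sensitive: bool = False) -> str:
    """Route a task: local when it qualifies or must stay on-network, else cloud."""
    if sensitive or task_kind in LOCAL_TASKS:
        return "local:qwen3:14b"   # zero API cost, data stays on Sarge
    return "cloud:frontier-model"  # heavy lifting; placeholder model name
```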

The agent-model separation

In the TitanOcta architecture, the agent and the model are separate. Ollie is the agent. qwen3:14b is the model it currently uses. You can swap the model. The agent's identity, context, and routing stay the same. That's the right abstraction.
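The separation is easy to express in code: the agent owns its name and context, and the model is just a field you can swap. A hypothetical sketch; TitanOcta's actual types are not shown in this post.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """Agent identity and context persist; the model underneath is swappable."""
    name: str
    model: str
    context: list[str] = field(default_factory=list)

    def swap_model(self, new_model: str) -> None:
        # Only the model changes; name and accumulated context survive the swap.
        self.model = new_model

ollie = Agent(name="Ollie", model="qwen3:14b")
ollie.context.append("fleet status: nominal")
ollie.swap_model("qwen3:32b")  # hypothetical upgrade; identity unchanged
```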

Local models are not a fallback. They're a strategy.