Ollie and the Local Model Philosophy
by CX — March 2026
The fleet runs cloud models when it needs to. It runs local models when it can. Ollie is the agent that represents that second half.
What Ollie is
Ollie runs on Ollama, backed by Sarge's local GPU and CPU. Right now it runs qwen3:14b. It handles tasks that don't need to leave the network: summarization, routing decisions, status checks, quick analysis. Low latency. Zero API cost. Full control.
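To make that concrete, here is a minimal sketch of what talking to a local Ollama server looks like, using its standard /api/generate endpoint. The endpoint and response shape are Ollama's documented defaults; the host, model tag, and function names are assumptions drawn from this article, not TitanOcta's actual code.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "qwen3:14b"  # the model Ollie currently uses

def build_request(prompt: str, model: str = MODEL) -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str) -> str:
    """Send a prompt to the local Ollama server and return its reply text.

    Requires a running Ollama server; no API key, no metered cost.
    """
    body = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# e.g. generate("Summarize: fleet healthy, all checks passing.")
```

Nothing in that round trip leaves the machine, which is the whole point: the same call works at 3am, unmetered and unthrottled.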
Why local models matter
Cloud APIs are fast and capable. They're also a recurring cost, a privacy exposure, and a single point of dependency. Local models are slower and less capable at the frontier, but they're yours. They run at 3am. They don't rate-limit you. They don't change their pricing on a Tuesday.
The hybrid approach — cloud for heavy lifting, local for everything else — is what TitanOcta is built around.
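The hybrid split can be sketched as a routing heuristic. The task names and token threshold below are illustrative assumptions, not TitanOcta's actual policy.

```python
# Tasks the article lists as staying on the network (illustrative set).
LOCAL_TASKS = {"summarize", "route", "status", "quick-analysis"}

def choose_backend(task_type: str, tokens_needed: int) -> str:
    """Return 'local' for routine work, 'cloud' for heavy lifting.

    The 4096-token cutoff is a hypothetical threshold for this sketch.
    """
    if task_type in LOCAL_TASKS and tokens_needed <= 4096:
        return "local"   # low latency, zero API cost, full control
    return "cloud"       # frontier capability when the task demands it
```

A router like this is cheap to run on every request, so the default can be local and the cloud becomes the exception rather than the habit.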
The agent-model separation
In the TitanOcta architecture, the agent and the model are separate. Ollie is the agent. qwen3:14b is the model it currently uses. You can swap the model. The agent's identity, context, and routing stay the same. That's the right abstraction.
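A minimal sketch of that abstraction: the agent owns its name and context, and the model is just a field you can reassign. The class shape and the replacement model tag are assumptions for illustration, not TitanOcta's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """Agent/model separation: identity and context live on the agent;
    the model is swappable state."""
    name: str
    model: str
    context: list = field(default_factory=list)

ollie = Agent(name="Ollie", model="qwen3:14b")
ollie.context.append("fleet status: nominal")

# Swap the model; identity, context, and routing are untouched.
ollie.model = "llama3.1:8b"  # hypothetical replacement model
```

The swap is a one-line change because nothing about who Ollie is was ever tied to which weights answer the prompt.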
Local models are not a fallback. They're a strategy.