The assistant behind the /agent route on this site is a small experiment with a
simple constraint: everything runs on the edge. There is no origin server,
no separate inference box, and no warm pool to keep alive. A request hits a
Cloudflare Worker, and the Worker owns the whole interaction.
One Durable Object per conversation
Each session is a single Durable Object. It holds the WebSocket, the rolling transcript, and the agent state machine. Because the object is single-threaded and addressable by id, every message for a conversation lands on the same instance and I never reach for a lock: the runtime gives me serialized access for free.
// The DO is the only writer of conversation state.
export class JarvisSession extends Agent<Env> {
  async onMessage(message: ClientMessage) {
    // STT → LLM → TTS, each step an adapter behind a port.
  }
}
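Addressing by id is what makes the one-object-per-conversation model work, so a sketch of the routing step may help. The version below is an illustration under assumptions, not the code behind this site: the `/agent/<conversationId>` path scheme, `conversationIdFromPath`, and `routeToSession` are hypothetical names, and the two interfaces are structural stand-ins that declare only the `idFromName`, `get`, and `fetch` members a Durable Object namespace and stub expose.

```typescript
// Structural stand-ins for the runtime's Durable Object stub and namespace.
// Only the members used below are declared.
interface SessionStub<Req, Res> {
  fetch(request: Req): Promise<Res>;
}

interface SessionNamespace<Id, Req, Res> {
  idFromName(name: string): Id;
  get(id: Id): SessionStub<Req, Res>;
}

// Hypothetical URL scheme: /agent/<conversationId>.
// Returns null when the path does not name a conversation.
function conversationIdFromPath(pathname: string): string | null {
  const match = /^\/agent\/([A-Za-z0-9_-]+)$/.exec(pathname);
  return match ? match[1] : null;
}

// Derive the object id from the conversation id and forward the request.
// The Worker itself keeps no conversation state.
async function routeToSession<Id, Req, Res>(
  pathname: string,
  request: Req,
  sessions: SessionNamespace<Id, Req, Res>,
  notFound: () => Res,
): Promise<Res> {
  const conversationId = conversationIdFromPath(pathname);
  if (conversationId === null) {
    return notFound();
  }
  const id = sessions.idFromName(conversationId);
  return sessions.get(id).fetch(request);
}
```

In a Worker, `sessions` would be the namespace binding from `env`, and `Req` and `Res` would be the runtime's `Request` and `Response`. The same name always maps to the same id, so two requests for one conversation reach the same object.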
Ports and adapters, even here
STT, the LLM call, and TTS each live behind a port. The core never imports Workers AI directly — it asks an adapter. That keeps the pipeline testable and means swapping a model is a one-line change in a binding, not a refactor.
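To make the port boundary concrete, here is a minimal sketch of what the three ports and the core turn might look like. The interface names (`SpeechToText`, `LanguageModel`, `TextToSpeech`), the `Turn` shape, and `runTurn` are assumptions for illustration, not the names used in this codebase; the fakes stand in for whatever adapters wrap the real models.

```typescript
interface Turn {
  role: "user" | "assistant";
  text: string;
}

// Ports: the only shapes the core pipeline knows about.
interface SpeechToText {
  transcribe(audio: Uint8Array): Promise<string>;
}
interface LanguageModel {
  complete(transcript: Turn[]): Promise<string>;
}
interface TextToSpeech {
  synthesize(text: string): Promise<Uint8Array>;
}

interface Ports {
  stt: SpeechToText;
  llm: LanguageModel;
  tts: TextToSpeech;
}

interface TurnResult {
  history: Turn[];
  audio: Uint8Array;
}

// Core: one conversational turn, STT → LLM → TTS.
// It depends on the ports only, never on a concrete model or binding.
async function runTurn(
  ports: Ports,
  history: Turn[],
  audio: Uint8Array,
): Promise<TurnResult> {
  const userText = await ports.stt.transcribe(audio);
  const withUser: Turn[] = [...history, { role: "user", text: userText }];
  const reply = await ports.llm.complete(withUser);
  const speech = await ports.tts.synthesize(reply);
  return {
    history: [...withUser, { role: "assistant", text: reply }],
    audio: speech,
  };
}

// In-memory fakes, the kind a unit test would inject in place of real adapters.
const fakePorts: Ports = {
  stt: { transcribe: async (audio) => `${audio.length} bytes` },
  llm: { complete: async (turns) => `echo: ${turns[turns.length - 1].text}` },
  tts: { synthesize: async (text) => new Uint8Array(text.length) },
};
```

Because `runTurn` only sees the `Ports` object, a test can pass `fakePorts` and a production build can pass adapters that wrap real model bindings, with no change to the core.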
What the edge buys you
- Latency that tracks the user, because the Worker runs near them.
- No infrastructure to babysit — the platform is the deployment target.
- Statefulness without a database, thanks to Durable Objects.
It is not the right shape for every workload, but for a personal, conversation-shaped assistant it is hard to beat.