Last week, I wanted a simple thing: to talk to my AI assistant AUGUST (built on OpenClaw) hands-free, so I could get things done by voice while walking or doing something else. Not a voice note. Not push-to-talk. A real call. Full duplex, real-time, private. Turns out, building that is more interesting than I expected.

This post is the honest version of how it went: what we built, what broke, what it can and can’t do today, and a few things I learned about how others are approaching the same problem.


The Voice Landscape Right Now

Before getting into the build, it’s worth knowing what already exists for OpenClaw voice, because most people don’t start from scratch.

The most common setup is Talk Mode, a desktop app that uses your computer’s built-in speech-to-text and ElevenLabs for text-to-speech. It connects to your OpenClaw gateway and works well enough. Simple to set up, but fairly basic.

Beyond that, there’s an official telephony plugin for Twilio, Plivo, and Telnyx, so you can literally call your assistant from your phone. Some people go further and run wake-word activated agents on hardware devices like Raspberry Pis. Others use ElevenLabs or Deepgram as the voice layer and route everything through their OpenClaw gateway as a custom LLM backend.

Most of these setups work the same way: voice goes in, the agent responds and can act on things in real time. That works well. But what I wanted was a layer on top of that. Something that wraps up the conversation at the end and turns it into structured work without me having to do anything extra. The agent doesn’t just respond. It files a summary, creates tasks, and pings me a link.


Why I Didn’t Just Use Twilio

My first instinct was to go the phone route. Buy a Twilio number. Handle inbound calls. Done.

But the more I looked at it, the more hidden costs appeared. You're paying for telephony on top of the AI voice pipeline. That's two meters running at once. Add country-specific number rules, compliance edge cases, and the fact that a phone call interface is not the same thing as a tool-using agent, and suddenly it felt like the wrong foundation for something I'd be using every day as a personal productivity tool.

WebRTC made more sense for my situation. I’m on Wi-Fi or mobile data when I’m working. I don’t need a carrier involved. And crucially, WebRTC is easier to lock down. No public number to spam, no incoming call surface to secure.

So Twilio moved to the “maybe later, as a fallback” pile.


What We Actually Built

The goal was a private calling page at call.augustwheel.com that I could open, press Start, talk through whatever is on my mind, and end the call knowing that something useful happened with the conversation.

The access gate

The page is not public. To get in, you request a magic link. Only one email address is on the allowlist. This sounds like overkill until you remember the two failure modes you’re trying to avoid: someone finds the URL and prompt-injects your agent, or someone finds the URL and runs up your usage costs. The gate doesn’t make you immune to everything, but it stops the obvious, expensive attacks.

Stack for this: Resend for sending the login links, a one-email allowlist, and a short-lived session cookie.
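The gate's decision logic is small enough to sketch. This is an illustrative version, not the actual implementation: the allowlist address, the `MagicLink` shape, and the function names are all made up, and Resend only handles delivery of the link itself.

```typescript
// Sketch of the access-gate logic: a one-email allowlist plus
// short-lived magic-link tokens. All names and values are illustrative.
const ALLOWLIST = new Set(["me@example.com"]); // hypothetical address

interface MagicLink {
  email: string;
  token: string;
  expiresAt: number; // Unix ms
}

// Only allowlisted addresses can even request a link, so random
// visitors never trigger an email send.
function canRequestLink(email: string): boolean {
  return ALLOWLIST.has(email.trim().toLowerCase());
}

// A link is valid only if the token matches and it hasn't expired.
// A production version would use a constant-time comparison.
function isLinkValid(link: MagicLink, presentedToken: string, now: number): boolean {
  return link.token === presentedToken && now < link.expiresAt;
}
```

On success you'd set the short-lived session cookie; everything after this point assumes the cookie is present and valid.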

The voice session

Voice agent in session

Once you’re in, the browser starts a WebRTC session. The server mints an ephemeral realtime token so the browser never sees the long-lived API key. Audio goes out, audio comes back. The UX goal is simple: press Start, talk naturally, interrupt when you need to, press End.
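Because the token is ephemeral, the server needs a tiny bit of bookkeeping: know when the current token is about to expire and mint a new one before handing it to the browser. Here's a minimal sketch of that decision, assuming a token shape with an expiry timestamp (the types and the safety margin are assumptions, not any provider's actual API):

```typescript
// Sketch: decide whether a cached ephemeral realtime token can be
// reused or must be re-minted. Shape and margin are illustrative.
interface EphemeralToken {
  value: string;
  expiresAt: number; // Unix ms
}

function needsRefresh(
  tok: EphemeralToken | null,
  now: number,
  marginMs = 10_000, // re-mint slightly early so the call never starts with a dying token
): boolean {
  return tok === null || tok.expiresAt - now < marginMs;
}
```

The long-lived API key stays server-side; the browser only ever receives whatever `tok.value` the mint step returned.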

The end-of-call pipeline

This is the part that makes it useful rather than just cool.

A voice conversation that disappears after you hang up is not a system. It’s a vibe. So on End Call, we run a recap and write it somewhere work actually lives: a Call Notes entry in Notion with a summary, key takeaways, and action items, and a Telegram message with the Notion link so I can keep moving without opening a browser.

One important detail: we don’t store a full transcript. Instead, we collect compact “crumbs” during the call, short snippets and tool results, then run one bounded recap step at the end. Predictable tokens, predictable cost. If you do this naively with a full transcript plus a live second model, you’ll pay twice for the same conversation.
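The crumb idea is easy to get wrong without hard limits, so here's a sketch of the bounded version. The budget numbers and the `Crumb` shape are made up for illustration; the point is that both each crumb and the final recap input have a cap, which is what makes the recap cost predictable:

```typescript
// Sketch of bounded "crumbs": short snippets collected during the call,
// assembled into a size-capped recap input at the end. Limits are illustrative.
interface Crumb {
  t: number; // Unix ms timestamp
  kind: "user" | "agent" | "tool";
  text: string;
}

// Truncate each crumb so one long utterance can't blow the budget.
function addCrumb(crumbs: Crumb[], c: Crumb, maxLen = 200): void {
  crumbs.push({ ...c, text: c.text.slice(0, maxLen) });
}

// Keep the most recent crumbs that fit inside the overall budget,
// in chronological order.
function recapInput(crumbs: Crumb[], budget = 4000): string {
  const lines: string[] = [];
  let used = 0;
  for (let i = crumbs.length - 1; i >= 0; i--) {
    const line = `[${crumbs[i].kind}] ${crumbs[i].text}`;
    if (used + line.length > budget) break;
    lines.unshift(line);
    used += line.length;
  }
  return lines.join("\n");
}
```

Whatever model runs the recap then sees at most `budget` characters, no matter how long the call ran.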


What It Can Do Right Now

Current reality:

  • Full-duplex WebRTC call, real-time
  • Notion tasks created automatically when you end the call
  • Call Notes entry generated without storing a full transcript
  • Telegram message with the recap link
  • Two read-only “voice tools” during the call: “What are my top tasks?” and “What’s overdue?” Backed by real Notion data, not guesswork
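Both voice tools reduce to simple read-only queries over task rows. A sketch of the logic, assuming a simplified `Task` shape (not Notion's actual schema) and ISO date strings for due dates:

```typescript
// Sketch of the two read-only voice tools over Notion-like task rows.
// The Task shape is an assumption, not Notion's real property schema.
interface Task {
  title: string;
  due: string | null; // ISO date, e.g. "2024-06-01"
  priority: number;   // higher = more important
  done: boolean;
}

// "What are my top tasks?" — highest-priority open tasks.
function topTasks(tasks: Task[], n = 3): Task[] {
  return tasks
    .filter(t => !t.done)
    .sort((a, b) => b.priority - a.priority)
    .slice(0, n);
}

// "What's overdue?" — open tasks with a due date before today.
// ISO date strings compare correctly with plain string comparison.
function overdue(tasks: Task[], today: string): Task[] {
  return tasks.filter(t => !t.done && t.due !== null && t.due < today);
}
```

Because both are read-only, a misheard phrase can at worst fetch the wrong list; it can't change anything.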

What It Can’t Do Yet (And Why That’s Fine)

The obvious next question is: can I say “vibe code me an app” mid-call and have it open a PR?

Not safely. Not by default.

A voice interface is the worst possible place to give unbounded system access. You’re one misunderstood phrase away from a very expensive mistake. The right design puts an approval gate between the voice conversation and any execution. You talk. The agent creates a task with a plan. You approve in Telegram. Only then does anything run.
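That approval gate is naturally a tiny state machine: voice can only ever produce a proposal, and nothing runs without an explicit approval event from a separate channel. A sketch under those assumptions (states and event names are illustrative):

```typescript
// Sketch of the approval gate as a state machine. A voice request can
// only create a "proposed" plan; execution requires an explicit approval
// event (e.g. a Telegram button press). Names are illustrative.
type State = "proposed" | "approved" | "running" | "rejected";
type Event = "approve" | "reject" | "start";

function next(state: State, event: Event): State {
  if (state === "proposed" && event === "approve") return "approved";
  if (state === "proposed" && event === "reject") return "rejected";
  if (state === "approved" && event === "start") return "running";
  return state; // every other transition is a no-op: no path from voice straight to execution
}
```

The invariant worth testing is that `"running"` is unreachable without passing through `"approved"` first.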

That’s slower than “fully autonomous.” It’s also the difference between a tool and a liability.


The Messy Middle

None of this was a straight line.

We hit reverse proxy issues because ports 80 and 443 were already owned by a Docker nginx container. We hit click-tracking problems where email links got rewritten and broke in certain browsers. We hit silent frontend failures where a single escaping bug broke inline JavaScript with no visible error. We hit classic scope bugs that only became obvious once we forced a debug channel into the UI.

The takeaway isn’t “we’re bad at code.” It’s that agent systems fail in boring ways, and you need fast diagnostics. If you’re building anything like this: add a build stamp and an always-on debug panel from day one. You’ll thank yourself.
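A build stamp is nearly free to add. One way to do it, sketched with made-up values: bake the commit and build time in at deploy, then render them in the always-on debug panel so you can tell at a glance whether the code you're staring at is the code that's running:

```typescript
// Sketch of a build stamp for the debug panel. Values would come from
// the deploy step (e.g. injected env vars); these names are illustrative.
interface BuildStamp {
  commit: string;  // full git SHA
  builtAt: string; // ISO timestamp from the build
}

function formatStamp(s: BuildStamp): string {
  return `build ${s.commit.slice(0, 7)} @ ${s.builtAt}`;
}
```

When an "obvious" fix doesn't seem to take effect, the stamp answers the first question in seconds: did the new build actually deploy?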


What’s Next

The immediate next step is more read-only voice tools during calls. Right now August can tell me my top tasks and what’s overdue. I want to add project status, task search, and “what did I work on yesterday” — the kind of context that makes a call feel like an actual briefing rather than just talking out loud.

After that, approval-gated coding execution. Say “build me a landing page” mid-call, the agent drafts a plan, I approve it via Telegram, it builds and deploys to a staging URL, and nothing goes live until I’ve reviewed it. Human in the loop at every step.

But the thing I’m most excited about is less technical. It’s the shift from typing to just talking. Walking to make a coffee, a quick “add a task to follow up with Jonah.” Commuting, thinking out loud about a content idea and having it captured. The OpenClaw community is already living this — people running voice-to-journal workflows from their commute, getting calendar briefs and task creation done hands-free, rebuilding entire websites from their phones without opening a laptop. The interface stops being something you sit down to use and starts being something that’s just always there.

That’s what I’m building toward.


The Bottom Line

A voice agent is not useful because it talks. It’s useful because it closes loops.

Most voice setups stop at the input layer. They let you speak instead of type. What I wanted was a system where ending a call meant something was already in motion: tasks created, a summary written, a message sent.

If you’re building something similar, start with private access, realtime voice, and an end-of-call pipeline that turns speech into work. Then add autonomy slowly, behind explicit approvals.

If you want more build logs like this, including the templates and guardrails behind what I’m shipping, subscribe to the August Wheel newsletter.

