Self-Hosted WhatsApp AI Agent with Python
AI agents have been everywhere lately. OpenClaw, n8n AI, every SaaS tool adding an "agent" mode. I wanted to actually understand how this stuff works under the hood, so I built one myself instead of just signing up for something.
The result is Ops Agent: a self-hosted Python service that fires at 07:00, pulls data from six connectors, asks GPT-4o-mini to compose a concise Dutch brief, and sends the result to my WhatsApp. Between briefs it stays live as a conversational assistant: send agenda to get today's meetings, inbox for unread mail, taak: bel accountant to add a task, or just ask a free-form question in Dutch and get an answer backed by live data.
This post covers what I learned building it: the connector abstraction, the conversational command router, the policy engine, the LLM prompt design, and why rolling your own beats the managed alternatives.
Why Not OpenClaw?
The direct alternative I considered was OpenClaw, which sits in the AI-native personal agent space and covers a similar use case. I tried it and the main issue was speed: every interaction felt sluggish, with noticeable latency between sending a message and getting a response. For a morning brief that fires once a day that is tolerable, but for an assistant I want to query throughout the day it becomes friction fast enough that I stop using it.
The deeper issue is that managed platforms, OpenClaw included, add layers you cannot control or remove. Your request goes to their infrastructure, through their orchestration layer, out to the third-party API, back through their infrastructure, and finally to you. Each hop adds latency and a potential failure point. Building directly means the round trip is: WhatsApp webhook, local connector call, GPT-4o-mini, WhatsApp send. That is it.
There is also the data question. OpenClaw processes your Gmail tokens and calendar events on their servers. For a personal ops tool that sees everything about your day, that is not a trade-off I wanted to make.
n8n and Make have the same data concern, and their visual workflow model is a poor fit for a conversational agent anyway. You end up building a fragile decision tree in a GUI instead of a clean command router in code, and the LLM is just one node in a graph rather than the intelligent fallback it should be.
The goal was lean and fast: a single Python package, a local SQLite file, two systemd services, and response times measured in milliseconds, not seconds.
Requirements: What a Personal AI Agent Actually Needs
The requirements were deliberately narrow:
- One WhatsApp message per day, under 300 words
- On-demand queries and tasks via inbound WhatsApp messages
- Free-form AI conversation for anything not matched by a command
- No cloud service that sees my calendar or inbox
- Any destructive action must go through an approval step
- If the OpenAI call fails, I still get a message
That last point shaped the whole architecture. The LLM is an enhancement, not a dependency.
Connector Abstraction
Every data source implements a BaseConnector with two methods: healthcheck() and a domain-specific summarize_*() method. The health endpoint at GET /health calls all connectors and returns their status, which makes debugging a misconfigured token much easier than grepping logs.
```python
class BaseConnector:
    name: str

    def healthcheck(self) -> ConnectorHealth:
        raise NotImplementedError
```

The concrete connectors are: GmailConnector, GoogleCalendarConnector, GitHubConnector, WhatsAppMetaConnector, ShellConnector, and WeatherConnector (Open-Meteo, no API key required). Each one is responsible for its own error handling and returns a plain string summary that the briefing module can use directly.
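To make the abstraction concrete, here is a minimal sketch of one connector plus the aggregation the GET /health endpoint performs. The ConnectorHealth fields and the stubbed weather summary are assumptions for illustration, not the project's exact code:

```python
from dataclasses import dataclass

@dataclass
class ConnectorHealth:
    name: str
    ok: bool
    detail: str = ""

class BaseConnector:
    name: str = "base"

    def healthcheck(self) -> ConnectorHealth:
        raise NotImplementedError

class WeatherConnector(BaseConnector):
    name = "weather"

    def summarize_day(self) -> str:
        # The real connector calls Open-Meteo; stubbed here to stay self-contained.
        return "12°C, light rain in the afternoon"

    def healthcheck(self) -> ConnectorHealth:
        try:
            self.summarize_day()
            return ConnectorHealth(self.name, ok=True)
        except Exception as exc:
            return ConnectorHealth(self.name, ok=False, detail=str(exc))

def health_report(connectors: list[BaseConnector]) -> dict[str, bool]:
    # What GET /health does: poll every connector, map name -> status.
    return {c.name: c.healthcheck().ok for c in connectors}
```

Because healthcheck exercises the same code path as the summary, a bad token or expired OAuth grant shows up in the health report immediately.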
The Conversational Command Router
The most interesting part of the system is conversation.py, which handles every inbound WhatsApp message. It works as a priority-ordered command router: structured commands are matched first against a lookup table, and anything that does not match falls through to a free-form GPT-4o-mini call.
```python
def handle_message(text: str) -> str:
    cmd = text.strip().lower()
    if cmd in ("brief", "ochtend", "goedemorgen"):
        return build_morning_brief()
    if cmd in ("weer", "weather"):
        return WeatherConnector().summarize_day()
    if cmd in ("agenda", "calendar"):
        return GoogleCalendarConnector().summarize_today()
    if cmd in ("inbox", "mail", "email"):
        return GmailConnector().summarize_inbox()
    if cmd.startswith("taak:"):
        return add_task(text[5:].strip())
    # ...more commands...
    return _chat(text)  # free-form fallback
```

The full command set covers the daily workflow: brief for the full morning overview, weer for weather, agenda for today's meetings, inbox for unread email, followups for sent threads with no reply in three or more days, taken to list open tasks, taak: <title> to add one, and klaar <nr> to mark one done.
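The same routing can be expressed as the lookup table the text describes; a refactor sketch, with stub handlers standing in for the real connector calls:

```python
from typing import Callable

# Stub handlers; the real ones call the connectors shown above.
def build_morning_brief() -> str:
    return "brief"

def weather_today() -> str:
    return "weer"

# Exact-match commands map to zero-argument handlers; prefix commands
# (like "taak:") and the _chat fallback are handled separately.
COMMANDS: dict[str, Callable[[], str]] = {
    "brief": build_morning_brief,
    "ochtend": build_morning_brief,
    "goedemorgen": build_morning_brief,
    "weer": weather_today,
    "weather": weather_today,
}

def route(text: str) -> str:
    cmd = text.strip().lower()
    handler = COMMANDS.get(cmd)
    if handler is not None:
        return handler()
    if cmd.startswith("taak:"):
        return f"task added: {text.strip()[5:].strip()}"
    return "chat-fallback"  # would call _chat(text)
```

The dict form makes adding a new alias a one-line change and keeps the dispatch O(1) regardless of how many commands accumulate.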
Anything that does not match a command hits _chat(), which sends the message to GPT-4o-mini with a system prompt that instructs it to stay concise, use plain text, answer in Dutch, and keep responses under 200 words. In practice this handles questions like "what should I focus on today?" or "summarize what followups I have" naturally, without needing to define every possible query as a structured command.
The separation between commands and free-form chat is important. Commands are deterministic and fast: no LLM call, no latency, no token cost. The AI path is reserved for queries that genuinely need it.
The Policy Engine
Before Ops Agent can send anything or run a shell command, the action passes through the policy engine. There are four operating modes:
| Mode | Behaviour |
|---|---|
| observe | Read connectors, build context, no sends |
| draft | Build brief, print to CLI, no sends |
| act_with_approval | Sends and writes create an approval record first |
| trusted_auto | 07:00 brief and WhatsApp replies to owner are auto-approved |
The shell connector has its own two-tier policy on top of this: an explicit allowlist of safe read-only commands (git status, pytest, ruff check, etc.) and a blocklist of destructive ones (rm -rf, sudo, dd, shutdown). Everything else requires an approval record.
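A sketch of that two-tier check, assuming the exact allowlist entries and result labels (the source only names examples):

```python
# Allowlist of safe read-only commands and blocklist of destructive ones;
# the exact contents here are assumptions based on the examples in the text.
ALLOWLIST = {"git status", "pytest", "ruff check"}
BLOCKED_WORDS = {"sudo", "dd", "shutdown"}

def classify_shell_command(cmd: str) -> str:
    """Return 'deny', 'allow', or 'needs_approval'."""
    normalized = " ".join(cmd.split())
    words = normalized.split()
    if "rm -rf" in normalized or any(w in BLOCKED_WORDS for w in words):
        return "deny"
    if normalized in ALLOWLIST or any(
        normalized.startswith(entry + " ") for entry in ALLOWLIST
    ):
        return "allow"
    return "needs_approval"  # becomes an approval record
```

Checking the blocklist first matters: a command that is both allowlisted-prefixed and destructive (say, a crafted `git status; sudo ...`) should always be denied.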
The FastAPI layer exposes the approval queue so I can review and act on pending items from my phone:
```
GET  /approvals               list pending approvals
GET  /approvals/{id}          get one approval
POST /approvals/{id}/approve
POST /approvals/{id}/deny
```
This pattern (read freely, write only with approval) is the right default for any agent that touches real data. It means I can expand what the agent can do without worrying about an LLM hallucination triggering a destructive action.
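The lifecycle behind those endpoints fits in a few lines. This is an in-memory sketch; the real service persists records in SQLite via SQLModel, and the field names here are assumptions:

```python
import itertools
from dataclasses import dataclass

@dataclass
class Approval:
    id: int
    action: str
    payload: str
    status: str = "pending"  # pending | approved | denied

_ids = itertools.count(1)
_queue: dict[int, Approval] = {}

def create_approval(action: str, payload: str) -> Approval:
    rec = Approval(next(_ids), action, payload)
    _queue[rec.id] = rec
    return rec

def list_pending() -> list[Approval]:
    return [a for a in _queue.values() if a.status == "pending"]

def resolve(approval_id: int, approved: bool) -> Approval:
    rec = _queue[approval_id]
    rec.status = "approved" if approved else "denied"
    return rec
```

The FastAPI routes are thin wrappers over exactly these three operations, which keeps the policy logic testable without spinning up the server.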
LLM Composition with a Fallback
The compose_morning_brief function takes a MorningBriefContext dataclass and returns a string. If OPENAI_API_KEY is not set, or if the API call fails for any reason, it falls back to render_morning_brief, a plain-text template renderer that always works:
```python
def compose_morning_brief(context: MorningBriefContext) -> str:
    if not settings.openai_api_key:
        return render_morning_brief(context)
    try:
        resp = httpx.post(
            "https://api.openai.com/v1/chat/completions",
            headers={"Authorization": f"Bearer {settings.openai_api_key}"},
            json={
                "model": "gpt-4o-mini",
                "messages": [
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": user_prompt},
                ],
                "max_tokens": 500,
                "temperature": 0.3,
            },
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"].strip()
    except Exception:
        return render_morning_brief(context)
```

The system prompt is written in Dutch and has three hard constraints: plain text with bullet points (no markdown, no bold), total message under 300 words, and end with exactly three priority actions for the day. GPT-4o-mini at temperature 0.3 is reliable enough that the output format rarely drifts.
Using gpt-4o-mini instead of gpt-4o is deliberate throughout the system. Morning brief composition is formatting and prioritization, not reasoning. Conversational replies are short and context-light. Mini is faster, cheaper, and more than capable for both tasks.
WhatsApp Delivery via Meta Cloud API
WhatsApp delivery goes through the Meta Cloud API directly, not through a third-party service. The setup requires a Meta for Developers account, a WhatsApp Business phone number, and a system user token. Once configured, sending a message is a single POST:
```python
httpx.post(
    f"https://graph.facebook.com/v20.0/{phone_number_id}/messages",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "messaging_product": "whatsapp",
        "to": owner_number,
        "type": "text",
        "text": {"body": brief_text},
    },
)
```

Inbound messages arrive at POST /webhooks/meta/whatsapp. The webhook handler extracts the message text, verifies the sender is the owner number, and dispatches the reply in a background task so the 200 response goes back to Meta immediately. The handle_message function runs in a thread pool to keep the async event loop unblocked.
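The fiddly part of that handler is digging the message out of Meta's deeply nested webhook payload. A simplified extractor, following the Cloud API's entry/changes/value/messages nesting:

```python
def extract_inbound(payload: dict):
    """Return (sender, text) for an inbound text message, else None."""
    try:
        msg = payload["entry"][0]["changes"][0]["value"]["messages"][0]
        return msg["from"], msg["text"]["body"]
    except (KeyError, IndexError):
        return None  # delivery/status callbacks carry no "messages" key
```

Returning None for status callbacks matters: Meta posts delivery receipts to the same webhook URL, and those must be acknowledged with a 200 without triggering a reply.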
Scheduling and Deployment
The scheduler uses APScheduler with a cron trigger for 07:00 in the configured timezone. On the VPS, two systemd services handle this: one for the FastAPI server (uvicorn) and one for the scheduler process. Both restart on failure with a short RestartSec.
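What the cron trigger encodes is simply "the next 07:00 in the configured timezone". A hypothetical stdlib helper (not project code) makes that concrete:

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

def next_brief_time(now: datetime, tz: str = "Europe/Amsterdam") -> datetime:
    """Next 07:00 wall-clock time in the given timezone."""
    local = now.astimezone(ZoneInfo(tz))
    candidate = local.replace(hour=7, minute=0, second=0, microsecond=0)
    if candidate <= local:
        candidate += timedelta(days=1)  # already past 07:00 today
    return candidate
```

APScheduler handles this computation (plus DST edge cases and misfire grace) internally, which is exactly why it is worth using over a hand-rolled sleep loop.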
```ini
[Service]
ExecStart=/home/agent/ops-agent/.venv/bin/uvicorn app.main:app --host 127.0.0.1 --port 8000
Restart=always
RestartSec=5
```

nginx sits in front as a reverse proxy, and certbot handles HTTPS with auto-renewal. The Google OAuth flow (GET /oauth/google/start?connector=both) stores tokens in the local SQLite database via SQLModel, so they survive restarts.
The entire application ships as a standard Python package. pip install -e . makes agent available as a CLI command via the [project.scripts] entry in pyproject.toml.
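That entry point is a two-line addition to pyproject.toml; the module path here is an assumption, not the project's actual layout:

```toml
[project.scripts]
agent = "app.cli:main"  # hypothetical module path
```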
What's Coming Next
The core loop works well, but there is a clear list of things I want to add.
Async connectors. Connector calls are currently synchronous and sequential. Switching to httpx.AsyncClient and gathering them concurrently would meaningfully cut brief build time, especially on slow Gmail or Calendar responses.
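The planned fan-out is a straightforward asyncio.gather with per-connector error isolation; a sketch, with the async summarize() interface assumed:

```python
import asyncio

async def gather_summaries(connectors) -> list[str]:
    """Run all connector summaries concurrently; one failure never
    blocks or poisons the others."""
    async def safe(conn):
        try:
            return await conn.summarize()
        except Exception as exc:
            return f"{conn.name}: unavailable ({exc})"

    return list(await asyncio.gather(*(safe(c) for c in connectors)))
```

With six connectors, total brief build time drops from the sum of the slowest calls to the maximum of them.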
Conversation memory. Each message is stateless right now. A short rolling window of recent messages passed into the GPT-4o-mini context would let follow-up questions work naturally without repeating yourself.
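The rolling window itself is a deque with a cap; a sketch of the planned (hypothetical, not yet implemented) memory:

```python
from collections import deque

class ConversationMemory:
    """Keep the last N turns to prepend to the GPT-4o-mini context."""

    def __init__(self, max_turns: int = 6):
        # Each turn is one user message plus one assistant reply.
        self._messages = deque(maxlen=max_turns * 2)

    def add(self, role: str, content: str) -> None:
        self._messages.append({"role": role, "content": content})

    def as_messages(self, system_prompt: str) -> list[dict]:
        # deque drops the oldest entries automatically once full.
        return [{"role": "system", "content": system_prompt}, *self._messages]
```

The maxlen cap doubles as a token budget: old turns fall off the back instead of growing the prompt (and the per-message cost) without bound.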
Email drafting. The most obvious write-back feature: send draft: reply to Jan about the invoice and the agent pulls the relevant thread from Gmail, drafts a reply, and sends it back for review before anything is sent. The approval queue is already there for exactly this kind of action.
Structured LLM output. The brief is free-form text today. Moving to a structured JSON response with named fields would make WhatsApp formatting more reliable and eliminate the occasional rogue markdown.
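A sketch of what that could look like with OpenAI's Structured Outputs: a JSON Schema for the brief plus the response_format fragment that requests it. The field names are assumptions, not the project's current format:

```python
# Hypothetical schema for a structured morning brief.
BRIEF_SCHEMA = {
    "type": "object",
    "properties": {
        "weather": {"type": "string"},
        "meetings": {"type": "array", "items": {"type": "string"}},
        "priorities": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["weather", "meetings", "priorities"],
    "additionalProperties": False,
}

# Goes into the chat completions request body alongside model/messages.
RESPONSE_FORMAT = {
    "type": "json_schema",
    "json_schema": {"name": "morning_brief", "strict": True, "schema": BRIEF_SCHEMA},
}
```

The WhatsApp formatter then renders each named field deterministically, so rogue markdown in one section can no longer break the whole message.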
Calendar event creation. Along the same lines as email drafting: plan: meeting with Sara vrijdag 14:00 creates a draft event and asks for confirmation before writing to Calendar.
The Result: A Lean AI Agent That Stays Out of the Way
The agent runs on a €5/month VPS, costs roughly €0.001 per day in OpenAI tokens, and has replaced five morning browser tabs with one WhatsApp message. Between briefs, it handles on-demand queries and free-form questions without switching to a different interface. The approval queue gives me confidence to keep expanding it without anything going wrong behind the scenes.
Honestly, it was just a lot of fun to build. Wiring up connectors, designing the command router, figuring out where the LLM actually adds value versus where a simple string match is better: that hands-on process taught me more about how AI agents actually work than any amount of clicking through someone else's UI would have. If you want to understand this space, building something small yourself beats signing up for a managed platform every time.
If you are interested in building AI-powered tools like this, check out my work for more projects built with this kind of architecture, or see what I offer if you want something similar built for your workflow.