The Real Attack Surface of a Local AI Agent

Local AI agents are becoming part of the home lab, the home office, and increasingly the enterprise. Tools like OpenClaw give you a personal AI with shell access, file system control, web browsing, and 50+ integrations — all running on your own hardware. The pitch is compelling: privacy, control, your data stays on your machine. Partially true — your files and conversation history stay local. But every prompt still travels to an LLM provider. And that is not the only place your data goes.

But control comes with an attack surface. And most people running local AI agents have no idea what that surface actually looks like.

This post walks through the real attack surface of a default OpenClaw install. No theoretical risks. No vendor marketing. Just what we found when we actually looked.


What We Tested

OpenClaw v2026.2.24 running on a Debian 13 machine. Standard install, standard configuration, with the Telegram channel enabled. The same setup thousands of people are running right now.

The goal was simple: understand what an attacker can access, and what defense actually looks like in practice.


Finding 1: Your Credentials Are Already in a Plaintext File

Before you do anything wrong, OpenClaw has already made a choice for you.

Open ~/.openclaw/openclaw.json after a standard install and you will find your Telegram bot token and gateway auth token stored in plaintext:

[SCREENSHOT: openclaw.json showing the Telegram bot token and gateway auth token in plaintext]

That bot token gives anyone who reads it full control of your Telegram bot. They can read every message, impersonate you, and interact with every user you have. The gateway token gives them full control of your OpenClaw instance.

It gets worse. The installer also writes the gateway token into your systemd service file:

[SCREENSHOT: systemd service file showing OPENCLAW_GATEWAY_TOKEN]

That file is readable by any process running as your user. While OpenClaw is running, the token is also visible via /proc/<pid>/environ. A rogue skill, a compromised dependency, generated code that OpenClaw executed on your behalf — any of them can read it.
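One way to check this exposure on your own machine is to read the process environment the same way any same-user process could. The sketch below is a minimal illustration that assumes a Linux /proc filesystem. The function names are hypothetical, and the default variable name is the OPENCLAW_GATEWAY_TOKEN mentioned above. It reports only whether the variable is present and never prints its value.

```python
from pathlib import Path


def parse_environ(raw: bytes) -> dict:
    """Parse the NUL-separated KEY=VALUE block found in /proc/<pid>/environ."""
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" not in entry:
            continue  # skip the empty entries left by trailing NUL bytes
        key, _, value = entry.partition(b"=")
        env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def token_exposed(pid: int, name: str = "OPENCLAW_GATEWAY_TOKEN") -> bool:
    """Return True if the named variable is set in the process environment.

    Reading /proc/<pid>/environ needs no special privilege when the target
    process runs as the same user, which is exactly the concern here.
    """
    raw = Path(f"/proc/{pid}/environ").read_bytes()
    return name in parse_environ(raw)
```

Calling token_exposed with the PID of the running gateway process would confirm whether the token is readable this way on a given install.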

Here is the part that makes this particularly interesting: the API keys for OpenRouter and other LLM providers are stored in the system keyring. Not in the config file. The tool already knows how to store secrets properly. It just chose not to apply that standard to the tokens that matter most.

The risk in practice: If your home directory is backed up to cloud storage, your tokens leave the machine the moment the backup runs. If your dotfiles are committed to a public repository, they are exposed to everyone. And the .bak files in ~/.openclaw/ accumulate:

[SCREENSHOT: ls -la ~/.openclaw/ showing multiple .bak files]

Each one may contain a rotated token that was never cleaned up.
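A quick inventory makes this concrete. The sketch below lists the JSON and .bak files in a directory that contain a key whose name looks like a secret, together with each file's permission bits. The key-name hints (token, secret, password, apikey) are an assumption of this example rather than OpenClaw's actual schema, the function names are hypothetical, and the files are assumed to parse as strict JSON.

```python
import json
import stat
from pathlib import Path

# Assumed naming hints; adjust to whatever your config actually uses.
SECRET_HINTS = ("token", "secret", "password", "apikey", "api_key")


def secret_key_paths(node, prefix=""):
    """Yield dotted paths of keys whose names suggest a stored secret."""
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            looks_secret = any(hint in key.lower() for hint in SECRET_HINTS)
            if isinstance(value, str) and value and looks_secret:
                yield path
            else:
                yield from secret_key_paths(value, path)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from secret_key_paths(item, f"{prefix}[{index}]")


def inventory(directory):
    """Return (file name, octal mode, secret key paths) for config-like files."""
    findings = []
    for path in sorted(Path(directory).iterdir()):
        name = path.name
        if not path.is_file() or not (name.endswith(".json") or ".bak" in name):
            continue
        try:
            data = json.loads(path.read_text())
        except (OSError, ValueError):
            continue  # unreadable or not strict JSON; out of scope for this sketch
        keys = list(secret_key_paths(data))
        if keys:
            mode = stat.S_IMODE(path.stat().st_mode)
            findings.append((name, oct(mode), keys))
    return findings
```

Running inventory on the ~/.openclaw directory would show which files, stale backups included, still carry token-like values. Only key paths are reported, so the output is safe to paste into a ticket.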

Finding 2: It Was Bound to Your Network By Default

Before version 2026.1.x, OpenClaw defaulted to "bind": "lan". That means the control panel and WebSocket bound to 0.0.0.0:18789 — visible to every device on your local network.

The consequence was direct. Anyone on your Wi-Fi could reach the OpenClaw interface. In a coffee shop, a co-working space, or a shared office, that is not a hypothetical. The token in the URL? Visible in browser history, network logs, and any proxy tool running on the network.

CVE-2026-25253 (CVSS 8.8) made it worse: a malicious website could connect to the WebSocket through the browser, bypass the token entirely via a CORS exploit, and execute commands. No user interaction beyond visiting the page.

The real-world impact: between 40,000 and 135,000 OpenClaw instances were publicly exposed in January 2026.

The current version requires an explicit configuration change to enable LAN binding, which is the right call. On an older install, the fix is one line:

[SCREENSHOT: grep showing "bind": "loopback" in openclaw.json]
"bind": "loopback"

Access remotely via SSH tunnel instead. Done. That one change closes the network exposure entirely.
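The setting is easy to verify programmatically. The sketch below searches a parsed config for every "bind" key and reports any value other than "loopback". Where the key sits inside openclaw.json is not assumed, which is why the search is recursive. The function names are hypothetical, and the config is assumed to parse as strict JSON.

```python
import json


def find_bind_values(node, path=""):
    """Yield (dotted path, value) for every "bind" key in a parsed config."""
    if isinstance(node, dict):
        for key, value in node.items():
            here = f"{path}.{key}" if path else key
            if key == "bind":
                yield here, value
            yield from find_bind_values(value, here)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_bind_values(item, f"{path}[{index}]")


def non_loopback_binds(config_text):
    """Return the bind settings that are anything other than "loopback"."""
    config = json.loads(config_text)
    return [(p, v) for p, v in find_bind_values(config) if v != "loopback"]
```

An empty list means every bind setting found is "loopback". Anything else is worth a closer look before the machine joins a shared network.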

The lesson is not about this specific version. It is about how default configurations shape security outcomes at scale. The operators of those 40,000-plus instances did not misconfigure their setup. They accepted the defaults.

Finding 3: The Model Can Resist Injection. The Infrastructure Cannot

This is the finding that most people get wrong, in both directions.

We tested prompt injection directly against the Telegram channel. The payload was embedded in a document and sent with a summarization request:

[SCREENSHOT: Bot response detecting and refusing the injection attempt]

The model called it out by name and refused. We tested this twice, 24 hours apart. The first time, openrouter/auto routed to Claude Opus 4.6. The second time, it routed to Gemini 2.5 Flash Lite. Both detected the injection. Different providers, different architectures, same result. Modern frontier models have converged on this defense.

This is genuinely good news. But it answers the wrong question.

The question is not “can the model resist a prompt hidden in a document?” The question is “what happens when the attack does not go through the user message layer at all?”

The ClawHavoc Campaign

On January 27, 2026, a researcher at Koi Security began auditing ClawHub — the official skill marketplace for OpenClaw. The publishing requirement at the time: a GitHub account at least one week old. No code review. No signing. No verification.

Of 2,857 skills audited, 341 were malicious. 335 traced back to a single coordinated campaign: ClawHavoc. By the time the campaign was fully mapped, 1,184 malicious skills had been published — 12% of the entire marketplace.

The payloads included the AMOS infostealer targeting macOS and Windows, reverse shells for remote access, and credential harvesting targeting browser passwords, SSH keys, cryptocurrency wallets, and OpenClaw tokens. The delivery vector was typosquatting: clawhub, clawhub1, clawhubb. The disguise: crypto wallet managers, trading bots, YouTube utilities.
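Typosquatting of this kind is cheap to screen for before installing anything. The sketch below flags names that sit within a small edit distance of a trusted name without being identical to it. Treating clawhub as the legitimate name, and two edits as the threshold, are assumptions made for illustration. This is a generic Levenshtein check, not a description of how ClawHub or Koi Security detect such packages.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]


def likely_typosquats(candidates, trusted, max_distance=2):
    """Return (candidate, trusted name, distance) for near-miss names."""
    flagged = []
    for name in candidates:
        for good in trusted:
            distance = edit_distance(name.lower(), good.lower())
            if 0 < distance <= max_distance:
                flagged.append((name, good, distance))
                break  # one match per candidate is enough to flag it
    return flagged
```

A check like this only catches look-alike names. It says nothing about what a skill does once installed, so it complements code review rather than replacing it.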

The technical mechanism is what matters for this discussion. Skills inject at the agent’s system prompt layer. That is the layer the model treats as trusted instructions — not user data to be scrutinized, not external content to be skeptical of. Trusted instructions from the operator.

The model that blocked our document injection would follow a skill’s instructions without question. Research shows that skill injection attacks succeed 26-48% of the time even against safety-aligned frontier models. The security boundary simply does not exist at the model level for this vector.

Our own install does not have the ClawHub CLI:

[SCREENSHOT: clawhub skill showing "blocked — Missing: bin:clawhub"]

That missing binary is, ironically, the thing protecting this install from the attack vector ClawHavoc used.


What Defense Actually Looks Like

Understanding the attack surface is useful. Knowing what to watch for is what makes you secure.

We set up two monitoring layers on the test machine.

inotifywait on the OpenClaw workspace directory catches every file the agent creates, modifies, or deletes:

[SCREENSHOT: inotifywait output showing CREATE and MODIFY events for demo.txt]
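Where inotifywait is not available, the same idea can be approximated by polling. The sketch below is a stand-in technique, not the inotify API: it takes two snapshots of a directory tree and reports the difference using the same CREATE, MODIFY, and DELETE vocabulary. The function names are hypothetical.

```python
import os


def snapshot(root):
    """Map each file under root to its (mtime_ns, size) signature."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                info = os.stat(full)
            except FileNotFoundError:
                continue  # file vanished between listing and stat
            state[full] = (info.st_mtime_ns, info.st_size)
    return state


def diff_snapshots(before, after):
    """Return sorted (event, path) pairs describing what changed."""
    events = []
    for path in after.keys() - before.keys():
        events.append(("CREATE", path))
    for path in before.keys() - after.keys():
        events.append(("DELETE", path))
    for path in before.keys() & after.keys():
        if before[path] != after[path]:
            events.append(("MODIFY", path))
    return sorted(events)
```

Polling misses files that are created and deleted between two snapshots, which is the main reason to prefer a kernel-level watcher like inotifywait when it is available.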

auditd captures every shell command the agent executes. What we found when we ran it was interesting:

Even when idle, OpenClaw runs shell commands every 15 seconds. Network state checks, running silently in the background. Without monitoring, you would never see this. This is what “the agent has shell access to your machine” looks like in practice.
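To measure a cadence like this rather than eyeball it, the EXECVE records that auditd writes can be parsed and the gaps between them computed. The sketch below assumes the usual auditd record layout (a msg=audit(timestamp:id) header followed by a0, a1, ... arguments, with hex encoding for arguments that contain spaces). The function names are hypothetical, and the code does not assume which commands OpenClaw actually runs.

```python
import re

HEADER = re.compile(r"type=EXECVE msg=audit\((\d+\.\d+):(\d+)\)")
ARGUMENT = re.compile(r'\ba(\d+)=(?:"([^"]*)"|([0-9A-Fa-f]+))')


def parse_execve(line):
    """Return (timestamp, event id, argv) for an EXECVE record, else None."""
    header = HEADER.search(line)
    if header is None:
        return None
    args = {}
    for index, quoted, hexed in ARGUMENT.findall(line):
        value = quoted
        if hexed:
            # auditd hex-encodes arguments containing spaces or special bytes
            try:
                value = bytes.fromhex(hexed).decode(errors="replace")
            except ValueError:
                value = hexed  # not valid hex after all; keep the raw text
        args[int(index)] = value
    argv = [args[i] for i in sorted(args)]
    return float(header.group(1)), int(header.group(2)), argv


def intervals(timestamps):
    """Seconds between consecutive events, in chronological order."""
    ordered = sorted(timestamps)
    return [round(b - a, 3) for a, b in zip(ordered, ordered[1:])]
```

Feeding parse_execve the lines of the audit log (by default /var/log/audit/audit.log, readable by root) and passing the resulting timestamps to intervals would show whether a regular background cadence appears on your own machine.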

The built-in security audit adds another layer:

[SCREENSHOT: openclaw security audit output showing CRITICAL, WARN, INFO findings]

The audit flagged the credentials directory as writable by others and identified elevated tool access as part of the attack surface summary. One command, and the configuration gaps are visible.

The fix for the credentials directory:

chmod 700 ~/.openclaw/credentials
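The same check can be applied to the whole directory tree instead of one path. The sketch below finds entries that grant any permission to group or others and strips those bits while leaving the owner's bits unchanged. For a directory the owner can already read, write, and enter, the result matches chmod 700. The function names are hypothetical.

```python
import os
import stat


def loose_paths(root):
    """Return (path, octal mode) for entries that group or others can access."""
    candidates = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        candidates += [os.path.join(dirpath, n) for n in dirnames + filenames]
    loose = []
    for path in candidates:
        if os.path.islink(path):
            continue  # symlink modes are not meaningful on Linux
        mode = stat.S_IMODE(os.stat(path).st_mode)
        if mode & (stat.S_IRWXG | stat.S_IRWXO):
            loose.append((path, oct(mode)))
    return loose


def tighten(path):
    """Remove all group and other bits, keeping the owner's bits unchanged."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & stat.S_IRWXU)
```

Listing with loose_paths first and calling tighten only on the paths you have reviewed keeps the change deliberate rather than blanket.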

The broader hardening checklist:

  • Bind to loopback, access via SSH tunnel
  • Lock down ~/.openclaw permissions
  • Enable sandbox mode and an explicit tool allow-list
  • Run openclaw security audit and address every finding

Three Things to Take Away

Default configurations have real consequences. At least forty thousand instances were exposed because the default was LAN binding. Tokens are in plaintext because that is what the installer writes. Nobody misconfigured anything. The defaults made the choice for them.

Model safety and agent safety are different problems. A perfectly safety-trained model can still be exploited at the infrastructure layer. The frontier model blocked our document injection and called it out by name. ClawHavoc did not need the model’s cooperation. It poisoned the skill layer, which the model trusts completely.

Visibility is the foundation. You cannot defend what you cannot see. Before hardening, add monitoring. Know what commands your agent is running. Know what files it is touching. An auditd rule and inotifywait running in a terminal pane give you more visibility than most people have into what their AI agent is doing on their machine.

Every AI agent you run has shell access to your machine.

The question worth asking is: what is watching what it does?


Tested on OpenClaw v2026.2.24 running on Debian 13 (Trixie). CVE references: CVE-2026-25253, CVE-2026-28485, CVE-2026-28484, CVE-2026-32922. ClawHavoc research credit: Oren Yomtov, Koi Security.
