$12,229 in API calls. I paid $50.
That's not a typo. That's 30 days of running a 24/7 autonomous AI company on OpenClaw, and this is the exact breakdown of every dollar that moved through it.
But here's what that table doesn't show you.
One agent burned $3,469 in a single night. I woke up to hundreds of Chrome tabs open, a dead workflow, and a token bill that made me question every decision that led to that moment.
This isn't a setup guide. This isn't a highlight reel. This is the full postmortem: the mistakes, the exact costs, the fixes that actually worked, and the ones that didn't.
If you're running OpenClaw or thinking about it, this is the article I wish existed before I started.
*New to OpenClaw? See how to set it up for $50 per month here first, then come back to continue:*
Browser Relay Is Trash. Use Browser Automation.
I published an entire article telling you Browser Relay was the move. The "skeleton key." The way to give your agent eyes and hands on the web without fighting API walls. I believed it when I wrote it.
Then I spent weeks trying to make it work reliably. I built scripts to automate the relay setup. I debugged crash after crash. I watched workflows die silently at 3AM with no error and no output. I even wrote a script to have the agent turn the relay on itself automatically.
It still failed.
And here's the part nobody warns you about. I woke up one morning to hundreds of Chrome tabs open. The agent had been spinning up new tabs all night, unable to tell whether a tab was already open. It's genuinely buggy: OpenClaw can't reliably detect existing tabs, so the agent got stuck assuming Browser Relay wasn't running, as if it had no memory of what it had already launched.
So I thought fine, I'll have the agent close the browser when it's done. That broke everything too. Once Chrome is closed, the agent can't reopen it automatically. It just sits there doing nothing, waiting for a browser that never comes back. You wake up, nothing ran, nothing was logged, and your workflow is completely dead.
Okay, I thought. New approach. Before the agent runs anything, have it read all open tabs first so it knows exactly what's already there. That actually worked for a couple of days. The tab detection got cleaner and the duplicate spawning stopped.
But then a new problem showed up. The Browser Relay itself would just randomly not open. No warning. No error message that made sense. The script would fire, the agent would try to connect, and nothing. Dead. I logged the errors, tried to debug it, assumed it was a one-time thing.
It wasn't. Within three days it happened multiple times. Different sessions. Different times of day. No consistent reason. Just relay down, workflow dead, nothing you can do about it.
At that point I had my answer. Something that fails randomly multiple times in three days is not a production tool. It's a prototype. I stopped using it the same day.
And the bill on top of all of this? Brutal. Every time the agent "sees" a page through Browser Relay it feeds the model a massive amount of data — DOM structures, accessibility trees, semantic snapshots. Hundreds of tabs running all night on a paid model isn't just annoying. It's expensive. I'm talking the kind of token burn that makes you question every life decision that led to that moment. *(See the screenshot below.)*
What Nobody Tells You
- Browser Relay depends on your active Chrome window being open and attached; the moment something interrupts it, it's gone
- The agent cannot reliably detect which tabs are already open. It will keep spawning hundreds of new ones overnight
- Closing the browser to "clean up" kills the entire workflow and the agent cannot reopen it automatically
- Even with a "read all tabs first" workaround, the relay itself randomly fails to open with no clear reason
- Multiple random failures within days means it simply cannot be trusted in production
- Every page the agent "sees" through the relay costs serious tokens; hundreds of tabs overnight will wreck your budget if you're not on a coding plan
- It looks incredible in a demo and breaks constantly in production
The Fix
Use Browser Automation: managed browser profiles running through Playwright, or profile="openclaw". It's more stable, runs reliably on cron schedules, and actually finishes what it starts while you're asleep. No runaway tabs. No dead workflows. No waking up to a Chrome graveyard. No surprise bills. Less exciting, way more dependable. I've had this running for weeks with no problems.
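To make the difference concrete, here's a minimal sketch of what a cron-driven Playwright job looks like. The profile directory, URL, and task are placeholders I made up, not OpenClaw's actual config; the point is that a persistent context reuses logins between runs and always closes when the job ends.

```python
# Minimal sketch: a cron job drives a managed, persistent browser profile.
# PROFILE_DIR and the target URL are hypothetical, not OpenClaw defaults.
from pathlib import Path

PROFILE_DIR = Path.home() / ".openclaw" / "browser-profile"  # hypothetical location

def run_job(url: str) -> str:
    """Open one page in a managed profile, grab its title, and exit cleanly."""
    from playwright.sync_api import sync_playwright  # pip install playwright

    with sync_playwright() as p:
        # launch_persistent_context keeps cookies/logins between runs, and the
        # context always closes when this block exits -- no orphaned tabs.
        ctx = p.chromium.launch_persistent_context(str(PROFILE_DIR), headless=True)
        page = ctx.new_page()
        page.goto(url, wait_until="domcontentloaded")
        title = page.title()
        ctx.close()
        return title

if __name__ == "__main__":
    print(run_job("https://example.com"))
```

Because the whole session lives inside one `with` block, a crash tears the browser down instead of leaving a graveyard of tabs feeding tokens to the model.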
*Here is the prompt for browser automation:*
Give Your OpenClaw a Shower
Your agents are slobs. Left alone, they accumulate dead sessions, orphaned memory, contradictory context, and leftover junk from tasks that finished weeks ago. And they never clean up after themselves.
That bloat is invisible but expensive. Every session your agent loads is context it has to process. Dead sessions burn tokens. Disorganized memory causes the agent to pull the wrong context at the wrong time. Over time it doesn't break — it just gets slower, sloppier, and dumber.
The fix is simple. Give your OpenClaw a shower.
Structure your memory like a file system so the right agent always finds the right context instantly:
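Something along these lines works; the folder names here are just my own example, not a required OpenClaw layout:

```
memory/
  coding-agent/
    active/      # context for tasks currently in flight
    archive/     # finished tasks, kept for reference only
  content-agent/
    active/
    archive/
  shared/        # cross-agent facts: brand voice, project glossary
```

The rule is simple: one directory per agent, active context separated from archive, and anything shared in one place so no agent has to guess where it lives.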
Then set up a nightly cron that runs two scripts automatically. Just paste this prompt directly into your main agent:
That's it. One prompt. Your agent writes both scripts, tests them, dry-runs the cleanup first so nothing breaks, then schedules itself to run every night at midnight automatically.
What Nobody Tells You
- Dead sessions don't disappear on their own, they pile up and silently drain performance
- Unstructured memory is almost as bad as no memory; agents pull the wrong context constantly
- A bloated agent isn't broken, it's just gradually getting worse and you won't notice until quality drops
- The dry-run flag is critical: always test session cleanup before running it live or you will delete something important
The Fix
One prompt. Two scripts. Midnight cron. Your agents wake up every morning lean, clean, and sharp. This single maintenance routine will do more for long-term performance than any model upgrade.
You Will Get Lost Without a Mission Control
Two weeks in I had multiple agents running simultaneously. On paper that's the dream. In reality I had no idea what was happening. Who was working on what? Which agent finished? Which one was stuck in a loop burning tokens?
I was jumping between terminal windows, scrolling through logs, messaging agents one by one just to get a status update. By the time I figured out what went wrong I'd already lost an hour and burned tokens I didn't need to.
The problem wasn't the agents. The problem was visibility.
This is what running 17 agents actually looks like when you can see everything:
351 tasks. 17 agents. Kanban board. Live feed. Every agent status updating in real time. I can see who's working, who finished, who's stuck, and who needs a human decision — without opening a single terminal window.
What Nobody Tells You
- OpenClaw gives you power but zero visibility beyond one or two agents
- You won't notice a stuck agent until it's already burned through your budget
- Most people quit scaling because the chaos becomes unmanageable, not because the agents don't work
- Debugging blind costs more in tokens than almost any other mistake
The Fix
At minimum you need three things: what every agent is doing right now, what's done vs stuck, and a live feed to catch problems before they compound.
Start simple: ask your agent to build a basic task board and update it in real time. Keep enhancing it as you grow.
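A minimal version of that board can be as simple as one shared JSON file every agent writes its status to, which you then render however you like. This is a sketch with made-up statuses and file paths, not Hyperclaw's actual implementation:

```python
# Sketch of a shared task board: one JSON file that every agent updates.
# The file location and status names are arbitrary choices for illustration.
import json
import time
from pathlib import Path

BOARD = Path("board.json")  # hypothetical shared location
STATUSES = ("todo", "doing", "done", "stuck", "needs-human")

def update(agent: str, task: str, status: str) -> None:
    """Record what an agent is doing right now, with a timestamp."""
    assert status in STATUSES, f"unknown status: {status}"
    board = json.loads(BOARD.read_text()) if BOARD.exists() else {}
    board[agent] = {"task": task, "status": status, "updated": time.time()}
    BOARD.write_text(json.dumps(board, indent=2))

def stuck_agents(max_idle_secs: float = 1800) -> list[str]:
    """Agents marked 'doing' that haven't checked in for a while."""
    if not BOARD.exists():
        return []
    now = time.time()
    board = json.loads(BOARD.read_text())
    return [a for a, s in board.items()
            if s["status"] == "doing" and now - s["updated"] > max_idle_secs]
```

Have every agent call `update()` at the start and end of each task, and a watchdog cron call `stuck_agents()` to flag anything that went quiet mid-task before it burns your budget.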
I built Hyperclaw to solve this for myself. If you'd rather build your own, the task board approach above is where to start.
Put Everything Repetitive Into a Skill. Then Tell Your Cron Job to Use It.
The Story
I kept asking my agent to do the same tasks over and over. Same instructions. Same context. Same explanation every single time. And every single time it would either do it slightly differently, miss a step, or just not understand what I wanted.
I blamed the model. I blamed the memory. The real problem was that I was relying on the agent to remember how to do things it was never designed to remember. OpenClaw's memory is not reliable enough to carry complex workflows across sessions. If you're explaining the same thing twice a week, that's your sign. Stop explaining and build a Skill.
A Skill is just a packaged instruction set. You write it once. The agent loads it every time. No re-explaining. No re-discovering. No token burn on setup before any real work gets done.
But here's the part nobody tells you. Just having the Skill is not enough.
If you don't explicitly tell your cron job to run the Skill, the agent will not use it. It won't go looking for it. It won't remember it exists. Every time the cron job fires it will try to figure out how to do the task from scratch, burning time and tokens on re-discovery, even if the perfect Skill is sitting right there waiting.
*Here's the prompt:*
What Nobody Tells You
- OpenClaw memory is not reliable enough to carry complex workflows across sessions. Don't depend on it
- If you're explaining the same task more than twice, it belongs in a Skill
- Every cron job that runs without a Skill is re-inventing the wheel every single time
- The agent will not automatically reach for a Skill. You have to tell it explicitly in the cron job instructions
- The longer and more complex the task, the more tokens get burned on setup before any real work happens
The Fix
- Write a Skill for any task you repeat more than twice. Document exactly how it should be done, the edge cases, the format, everything
- Always tell your cron job to use the Skill explicitly. Don't assume the agent will find it on its own
- Write it directly into the cron job instruction: *"Run this task using the [skill name] skill"*
- That one line is the difference between an agent that executes perfectly every time and one that wastes your tokens figuring out what it's supposed to do
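To make those last two points concrete, here's what a minimal Skill file might look like. The file path, headings, and wording are my own sketch; check OpenClaw's own Skill format for the real conventions:

```markdown
<!-- skills/weekly-report/SKILL.md (hypothetical layout) -->
# Skill: weekly-report

## When to use
Every Monday, or whenever asked for "the weekly report".

## Steps
1. Pull last week's metrics from the shared memory folder.
2. Summarize wins, losses, and open risks in under 300 words.
3. Save the report to the reports folder and post the summary to the task board.

## Edge cases
- Missing metrics: report the gap. Never invent numbers.
```

Then the cron instruction itself carries the one critical line: *"Run this task using the weekly-report skill."* Without it, the agent rediscovers the process from scratch every night.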
Keep Each Agent Lean. One Job. One Identity.
I tried to push everything under one agent. And honestly? It worked. For a while.
Then slowly the quality started dropping. The content got worse. The output felt generic. The reasoning got sloppy. I kept thinking it was the model, the skills, or OpenClaw itself.
It wasn't any of those things.
It was identity.
When you overload one agent with too many jobs, the SOUL.md gets pulled in every direction. The memory gets cluttered. The agent doesn't know who it is anymore. Is it a coder? A marketer? A researcher? A social media manager? It tries to be all of them and ends up being none of them well.
Think about it this way. Would you hire one person to be your software engineer, your content writer, your accountant, and your customer support rep all at once? Of course not. A specialist beats a generalist every single time. The same is true for your agents.
Your coding agent should think and breathe like a senior engineer. Your content agent should think and breathe like a creative director. When their identity is focused and their memory is clean, the output reflects that. When everything is mixed together under one roof, you get average at everything instead of excellent at one thing.
This is why context matters more than model quality. A focused agent with clean memory will outperform a smarter model drowning in irrelevant context every single time.
What Nobody Tells You
- One agent doing too many jobs doesn't break immediately. It degrades slowly and you won't notice until the quality is already bad
- The SOUL.md and identity of your agent matters more than most people realize. It shapes every single output
- A cluttered memory bleeds into everything. The agent starts referencing the wrong context for the wrong task
- Skills conflict when they share the same agent. One skill's instructions will interrupt another
- You don't need more agents. You need more focused ones
The Fix
- Give each agent one domain and one identity. Coder. Researcher. Content creator. Operations. Each one separate
- Write the SOUL.md like you're hiring a specialist. Not a generalist. Define exactly who this agent is, what it cares about, and what it never touches
- Keep the memory lean and relevant to that agent's job only. Clean out anything that doesn't belong
- The moment an agent starts performing worse, don't blame the model. Check the identity first
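For reference, a focused SOUL.md might look something like this. The sections and wording are my own sketch of the "hire a specialist" framing, not an official template:

```markdown
<!-- agents/coder/SOUL.md (hypothetical) -->
# Identity
You are a senior backend engineer. You write, review, and ship code. Nothing else.

# You care about
Correctness, tests, small diffs, clear commit messages.

# You never touch
Marketing copy, social posts, billing, or other agents' memory directories.
```

The "never touch" section matters as much as the identity: it's what stops one agent's skills and context from bleeding into another's work.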
Final Thoughts
That $12,229 table in the intro isn't there to impress you. It's there because it's real. Every dollar in it represents something that broke, something I misunderstood, or something I had to rebuild from scratch at 3AM while an agent was silently burning tokens I didn't know about.
30 days of OpenClaw taught me that the tool isn't the hard part. The discipline is.
Knowing when to use the expensive model and when not to. Keeping agents focused instead of bloated. Building maintenance in from day one instead of waking up to a Chrome graveyard and a $3,469 bill.
If you take nothing else from this:
- Skip Browser Relay. Use Browser Automation
- Give your agents a nightly shower. Dead sessions are silent killers
- Build a mission control. Visibility isn't optional when you scale
- Put repetitive tasks into Skills. Tell your cron jobs to use them explicitly
- Keep agents lean. One identity. One domain.
OpenClaw is infrastructure. Build it right and it runs while you sleep. Build it sloppy and it burns $3,469 while you sleep.
The choice is yours.
I'm building Hyperclaw in public! Every win, every bug, every expensive mistake. Follow [@ziwenxu_] if you want to watch it happen in real time.
