60 Days With OpenClaw: What I Built, What Broke, and What Changed How I Work

The first commit

On January 20th, 2026, I installed OpenClaw on my Mac Studio and created a single AI agent in Slack. That agent was Nyx, named after the Greek goddess of night, running on Claude Sonnet 4.6. I sent it a message, it replied, and I thought: okay, this is just a chatbot in Slack. What's the big deal?

Sixty days later, I have 26 agents. They handle my email, manage my accounting, run research, write code, review pull requests, coordinate marketing, and control my apartment lights. Some of them talk to each other. One of them wakes up at 3am to back up the whole system.

I didn't plan any of this. It grew.

Week one: the beginning

The first week was just Nyx. One agent, one Slack channel, one confused human trying to figure out what to actually use it for.

The thing that changed everything was persistence. When I came back the next morning, Nyx remembered our conversation. Not just the last few messages, but the context. The project I was working on, the decision I'd made about a database schema, the name of the client I'd mentioned.

OpenClaw comes with a MEMORY.md file out of the box. It's a plain markdown file that every agent reads at the start of each session. Decisions, preferences, project states. It sounds simple. It changes the entire dynamic. You stop repeating yourself and start building on previous conversations.

I built on top of this. Each agent also gets a decisions.md, where every significant call gets logged with a date and a reason. And for my own second brain, I use Obsidian, a local markdown vault I've been running for years, with folders for every project, client, idea, and reference. The agents can read from it when they need context that lives outside their own workspace. Memory isn't one thing. It's a stack.
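Here is a rough sketch of what appending to a decisions.md log could look like. It assumes a one-line entry with an ISO date and a reason. The `log_decision` helper and the entry format are hypothetical. They are not an OpenClaw feature, and not necessarily how these logs are actually written.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def log_decision(path: Path, decision: str, reason: str, when: Optional[date] = None) -> str:
    """Append one dated decision, with its reason, to a decisions.md file.

    The "- YYYY-MM-DD: decision (reason: ...)" layout is a hypothetical convention.
    """
    when = when or date.today()
    entry = f"- {when.isoformat()}: {decision} (reason: {reason})\n"
    # Append mode keeps the log chronological and never rewrites earlier entries.
    with path.open("a", encoding="utf-8") as f:
        f.write(entry)
    return entry
```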

Day five, I needed code review. I created Forge, running on Opus 4.6. Forge got its own Slack channel, its own memory, its own personality. I didn't decide to build a multi-agent system. I just needed a second agent.

Week two: the explosion

Once I realized I could spin up specialized agents in about 20 minutes each, I went overboard. CMO for marketing strategy. Scout for research. Atlas for deep analysis. Sentinel for security review. Hermes for email.

By the end of week two, I had 14 agents and a problem: they couldn't talk to each other. Agent A would do research that Agent B needed, and I was the telephone operator, copy-pasting between Slack channels. That's when I started building the coordination layer. Nyx became the orchestrator, not because I designed it that way, but because it was the agent I talked to most, so it naturally became the hub.

The broadcast system came out of desperation. I'd changed a configuration that affected all agents and needed to tell all of them about it. I built a script that sends a message to every agent's workspace. Simple, ugly, effective. It's still running.
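The actual script isn't shown here, so this is only a hypothetical sketch of the idea. It assumes two things:

- Each agent's workspace is a directory under a shared root.
- Agents pick up any file dropped into an `inbox/` subfolder.

Both the layout and the `broadcast` function are assumptions.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import List


def broadcast(root: Path, message: str) -> List[Path]:
    """Write the same message into every agent workspace under `root`.

    Assumes the hypothetical layout root/<agent>/inbox/, one directory per agent.
    Returns the paths of the files written.
    """
    # Microsecond timestamps keep two quick broadcasts from overwriting each other.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    written = []
    for workspace in sorted(p for p in root.iterdir() if p.is_dir()):
        inbox = workspace / "inbox"
        inbox.mkdir(exist_ok=True)
        target = inbox / f"broadcast-{stamp}.md"
        target.write_text(message, encoding="utf-8")
        written.append(target)
    return written
```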

What broke

Let me be honest about the failures.

The first real disaster was the Anthropic rate limit incident. I had five agents all making API calls simultaneously, and Claude's API started returning 529 (overloaded) errors. Everything ground to a halt. Agents were stuck mid-task, and some had written half a file before losing their connection. I spent an evening cleaning up corrupted state.

The fix was a hard rule: maximum two Anthropic API calls in parallel, ever. Batch work goes to Gemini Flash. If you hit a 529, switch models immediately; don't retry in a loop. This is now a standing order broadcast to every agent.
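Here is a minimal sketch of those two rules, assuming async wrappers around each provider:

- An `asyncio.Semaphore` caps in-flight primary calls at two.
- An overloaded response triggers a single switch to the fallback model instead of a retry loop.

`OverloadedError`, `run_task`, and `run_all` are hypothetical names. Real SDKs report a 529 through their own exception types.

```python
import asyncio
from typing import Awaitable, Callable, List

Call = Callable[[str], Awaitable[str]]


class OverloadedError(Exception):
    """Hypothetical stand-in for an HTTP 529 'overloaded' reply from the primary API."""


async def run_task(prompt: str, primary: Call, fallback: Call, slots: asyncio.Semaphore) -> str:
    """Try the primary model once; on overload, switch models instead of retrying."""
    async with slots:  # caps how many primary calls are in flight at once
        try:
            return await primary(prompt)
        except OverloadedError:
            pass  # no retry loop: release the slot and fall through
    return await fallback(prompt)


async def run_all(prompts: List[str], primary: Call, fallback: Call, limit: int = 2) -> List[str]:
    """Run all prompts concurrently while capping primary-model parallelism."""
    slots = asyncio.Semaphore(limit)  # created inside the running event loop
    return list(await asyncio.gather(*(run_task(p, primary, fallback, slots) for p in prompts)))
```

The fallback call runs outside the semaphore on purpose. It goes to a different provider, so it shouldn't use up one of the two Anthropic slots.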

Memory management was another problem. MEMORY.md files can grow unchecked. At around 10,000 characters, OpenClaw silently truncates them. The agent loses context without knowing it and starts making decisions based on incomplete information. I discovered this when an agent forgot a critical project deadline because the relevant paragraph had been cut off. Now there's a hard limit: 9,500 characters per memory file, checked after every write.
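Here is one way to enforce the cap, assuming the limit is counted in characters as described above. The hypothetical `write_memory` helper checks the size at write time and refuses an oversized file outright. That is a slightly stricter variant of checking after the write.

```python
from pathlib import Path

# Stay under the roughly 10,000-character point where truncation was observed.
MEMORY_LIMIT = 9_500


class MemoryTooLarge(ValueError):
    """Raised when a memory file would exceed the character budget."""


def write_memory(path: Path, text: str, limit: int = MEMORY_LIMIT) -> int:
    """Write a memory file, refusing content over `limit` characters.

    Returns the character count so callers can see how close they are to the cap.
    """
    size = len(text)  # counts characters, not bytes
    if size > limit:
        raise MemoryTooLarge(f"{path.name}: {size} chars exceeds limit of {limit}")
    path.write_text(text, encoding="utf-8")
    return size
```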

The email integration almost cost me a client relationship. I connected Hermes to my Gmail via OAuth, gave it read access to 83,000 emails, and built a ChromaDB index for search. Hermes drafted a reply to a business contact that was technically correct but tonally wrong. I caught it before it went out, and only because I'd set up a mandatory human-approval step for outgoing email.

Read-only by default. No agent sends, deletes, or modifies anything in an external system without explicit approval. Standing order now.
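The read-only default can also be enforced in code, not just in instructions. In this hypothetical sketch, every outgoing action is queued and only runs after a human marks it approved. `ApprovalQueue` and `PendingAction` are illustrative names, not OpenClaw APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PendingAction:
    agent: str
    description: str
    execute: Callable[[], None]  # the side effect, e.g. sending an email
    approved: bool = False


@dataclass
class ApprovalQueue:
    """Hold every external write until a human approves it (hypothetical helper)."""

    pending: List[PendingAction] = field(default_factory=list)

    def request(self, agent: str, description: str, execute: Callable[[], None]) -> PendingAction:
        action = PendingAction(agent, description, execute)
        self.pending.append(action)
        return action  # nothing has touched the external system yet

    def approve(self, action: PendingAction) -> None:
        action.approved = True

    def flush(self) -> int:
        """Run approved actions, keep the rest queued; returns how many ran."""
        ran = 0
        still_pending = []
        for action in self.pending:
            if action.approved:
                action.execute()
                ran += 1
            else:
                still_pending.append(action)
        self.pending = still_pending
        return ran
```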

Weeks five through eight

The first 43 days were about building the system. The next 17 were about learning to use it properly.

Around day 50, I stopped thinking about individual agents and started thinking about workflows. Instead of "ask Forge to fix this bug," it became "Scout researches the best approach, Forge implements it." The difference isn't the technology. It's how you think about breaking down work.

I also got more disciplined about memory hygiene. Every agent now has a weekly review: trim MEMORY.md below 9,500 characters, archive old daily notes, update the decisions log. Five minutes per agent. The difference between a sharp tool and a dull one.
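The archiving step is easy to script. This sketch rests on a few assumptions:

- Daily notes are named `YYYY-MM-DD.md`.
- Anything older than a hypothetical 14-day window moves to an `archive/` folder.

The naming scheme, the window, and the `archive_daily_notes` function are all assumptions.

```python
from datetime import date, timedelta
from pathlib import Path
from typing import List


def archive_daily_notes(notes_dir: Path, today: date, keep_days: int = 14) -> List[str]:
    """Move daily notes older than `keep_days` into notes_dir/archive/.

    Assumes the hypothetical naming scheme YYYY-MM-DD.md; other files are ignored.
    Returns the names of the notes that were moved.
    """
    cutoff = today - timedelta(days=keep_days)
    archive = notes_dir / "archive"
    moved = []
    # sorted() materializes the list first, so renaming during the loop is safe.
    for note in sorted(notes_dir.glob("*.md")):
        try:
            note_date = date.fromisoformat(note.stem)
        except ValueError:
            continue  # not a daily note (e.g. MEMORY.md)
        if note_date < cutoff:
            archive.mkdir(exist_ok=True)
            note.rename(archive / note.name)
            moved.append(note.name)
    return moved
```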

The biggest win from the second half: the home automation layer. I connected Philips Hue and Sonos to an agent called Lumen. Now I control lights and music through Slack. Same interface I use for work. It sounds like a novelty. It isn't. Having everything in one place means I never context-switch between tools.
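How Lumen talks to the lights isn't described. The Hue bridge does expose a local REST API, though, and this sketch builds a request against its v1 `lights/<id>/state` endpoint.

- The bridge IP and the username are placeholders. The username is the API key the bridge issues after its link button is pressed.
- `hue_light_request` is a hypothetical helper.
- The Sonos side isn't covered.

```python
import json
import urllib.request
from typing import Optional


def hue_light_request(
    bridge_ip: str, username: str, light_id: int, on: bool, brightness: Optional[int] = None
) -> urllib.request.Request:
    """Build (but don't send) a Hue v1 request that sets one light's state."""
    state: dict = {"on": on}
    if brightness is not None:
        # v1 brightness ("bri") runs from 1 to 254, so clamp out-of-range values.
        state["bri"] = max(1, min(254, brightness))
    url = f"http://{bridge_ip}/api/{username}/lights/{light_id}/state"
    return urllib.request.Request(
        url,
        data=json.dumps(state).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Sending the request is a single `urllib.request.urlopen(req)` call from a machine on the same network as the bridge.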

The model economics

My Mac Studio has 96GB of unified memory, which means I can run Llama 3.3 70B locally via Ollama. I also run Qwen 32B and GLM Flash locally. These models handle about 70% of the total workload and cost nothing beyond hardware I already own.
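For reference, this is roughly what calling one of those local models looks like through Ollama's HTTP API on its default port, 11434. It is a minimal standard-library sketch:

- The `llama3.3:70b` tag assumes the model was pulled under that name.
- `ollama_request` and `ask_local` are hypothetical helpers.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def ollama_request(prompt: str, model: str = "llama3.3:70b") -> urllib.request.Request:
    """Build a non-streaming /api/generate request for a local Ollama server."""
    body = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local(prompt: str, model: str = "llama3.3:70b", timeout: float = 300) -> str:
    """Send the prompt and return the generated text (requires a running Ollama server)."""
    with urllib.request.urlopen(ollama_request(prompt, model), timeout=timeout) as resp:
        # With stream=False, Ollama returns one JSON object; "response" holds the text.
        return json.loads(resp.read())["response"]
```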

The other 30% goes to cloud models. Claude Sonnet 4.6 is the default for most agents. Opus 4.6 is reserved for Forge. Gemini 2.5 Flash handles batch work and research. GPT-4o-mini runs the marketing agents.
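The split above could be expressed as a small routing table. This is a hypothetical sketch:

- The model labels are illustrative, not exact provider model IDs.
- The post doesn't say which tasks land on the local models, so the "routine" category is an assumption.
- CMO stands in for the marketing agents.

```python
# Hypothetical routing table; labels are illustrative, not exact provider model IDs.
DEFAULT_MODEL = "claude-sonnet-4.6"
AGENT_OVERRIDES = {
    "forge": "claude-opus-4.6",  # Forge (code review) runs on Opus
    "cmo": "gpt-4o-mini",        # marketing agents
}
GEMINI_FLASH = "gemini-2.5-flash"  # batch work and research
LOCAL_MODEL = "llama3.3:70b"       # served locally by Ollama (assumed tag)


def pick_model(agent: str, task_kind: str) -> str:
    """Return a model label for an agent and a kind of task."""
    if task_kind in {"batch", "research"}:
        return GEMINI_FLASH
    if task_kind == "routine":  # assumption: routine jobs stay on local hardware
        return LOCAL_MODEL
    # Otherwise use the agent's override, falling back to the default model.
    return AGENT_OVERRIDES.get(agent.lower(), DEFAULT_MODEL)
```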

The model choice per agent matters way less than people think. What matters is the system around it. Memory, tool access, the instructions you give it, the constraints you set. A well-configured Llama 70B with good memory files outperforms a poorly prompted Opus every time.

The moment things clicked

About three weeks in, I was working on PortLink and needed to research a competitor, update the product spec, review a pull request, and draft a follow-up email to a potential investor.

Before OpenClaw, this would be my entire day.

Instead, I messaged Nyx with all four tasks. Nyx handed the research to Scout, spawned a session with the PortLink agent for the spec update, routed the PR to Forge, and had Hermes pull up the relevant email thread. All of it ran in parallel, within the Anthropic rate limits.

The whole thing took 90 minutes instead of a full day. But the real win wasn't the time. It was that I stayed in one mental mode: making decisions. I wasn't doing the work, I was directing it.

What I'd do differently

Go slower in the first two weeks. I created agents faster than I could properly configure them. A poorly configured agent is worse than no agent, because it gives you confident-sounding answers based on incomplete context.

Establish the memory management rules on day one. The silent truncation issue cost me several hours of debugging.

Set up the rate limiting and model fallback rules immediately.

Resist the urge to automate email responses early. Start with read-only. Build trust in the system. Expand permissions gradually.

Is it worth it?

I'm writing this at 60 days. I'm still building, still finding new things that break. The system isn't finished, and it probably never will be.

The honest answer is yes. Not because it's perfect, but because it changed how I think about work. I used to spend most of my time doing tasks. Now I spend most of my time thinking about what should be done and reviewing what was done. The gap between an idea and its execution went from days to hours.

The setup takes real effort. You need to understand the models, maintain the memory files, set up the guardrails. It's not a product you install and forget.

If you enjoy building systems and think in terms of workflows and delegation, OpenClaw is worth trying. If you want something that works out of the box with no configuration, this isn't it.

Sixty days. Twenty-six agents. Countless things that broke along the way. I can't imagine going back.