How to Set Up a Multi-Agent Organization With OpenClaw

Don't start with 26 agents

I need to say this upfront because I know how the last post sounded. I have 26 agents. That doesn't mean you should start with 26 agents. I didn't start with 26 agents. I started with one, and each new agent was created to solve a specific problem that the existing setup couldn't handle well.

If you create 15 agents on day one, you'll spend all your time configuring them instead of using them. Start with one general-purpose agent. Use it for a week. Notice where it struggles. That's where your second agent comes from.

The org chart principle

Here's the mental model that helped me the most: treat your agents like employees at a small company. You wouldn't hire 26 people for a two-person job. But as the work grows and specializes, you need specialists.

Think about your own work for a moment. What are the categories? For me, it broke down into roughly seven areas: orchestration (making decisions about what to do), engineering (writing code), research (finding information), marketing (creating content and strategy), business operations (finance, legal), communication (email, outreach), and product work (specific projects I'm building).

Each of those became a tier in my agent organization. Not on day one. Over weeks, as needs emerged.

The tier model

I organize agents into tiers based on their function. Here's what mine looks like and why.

The core tier is small. Two agents. Nyx is my primary interface, the orchestrator, the agent I talk to most. It runs on Claude Sonnet 4.6 because orchestration requires strong reasoning and the ability to decide which other agent should handle a task. Synapse runs on a tiny local model (Llama 3.1 8B) and handles background maintenance: cron jobs, knowledge syncing, cleanup tasks. This split matters. You don't want your orchestrator bogged down with routine maintenance, and you don't want to pay cloud API rates for tasks that a local model can handle.

Engineering has two agents: Forge and Sentinel. Forge is the builder, running on Opus 4.6 (the most capable model I use) because architecture decisions and complex code generation need the strongest reasoning. Sentinel handles code review and security analysis on Sonnet 4.6. The split between building and reviewing is intentional. You don't want the same agent reviewing its own code. Different perspectives catch different problems.

Research is Scout (rapid, surface-level research) and Atlas (deep analysis and feasibility studies). Both run on Gemini 2.5 Flash because research is search-heavy, often involves processing large amounts of text, and doesn't need the absolute best reasoning model. Gemini Flash is fast, cheap, and handles this workload well.

Marketing is the biggest tier, with seven agents. CMO for strategy, Content for writing, SEO for optimization, Analytics for data, Email Marketing for campaigns, Ads for paid channels, and Partnerships for outreach. Most of these run on GPT-4o-mini because the work is structured enough that a lighter model handles it. The CMO sits at the top and the others execute within their specialties.

Business has two agents for finance and legal. Communication is Hermes, which manages email through Gmail OAuth. Products each get their own agent, one per project I'm actively building. Personal has fitness coaching and home automation.
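As a rough sketch, the part of this roster where the model is stated can be written down as plain data. This is not an OpenClaw config format; the `Agent` class, the field names, and the model strings are informal labels invented for illustration, and only the agents whose models are named above are included.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Agent:
    name: str
    tier: str
    model: str  # informal label, not an official API model ID
    role: str


# Only agents whose model is stated in the post; the rest are left out.
ROSTER: List[Agent] = [
    Agent("nyx", "core", "claude-sonnet-4.6", "orchestrator"),
    Agent("synapse", "core", "llama-3.1-8b-local", "background maintenance"),
    Agent("forge", "engineering", "claude-opus-4.6", "builder"),
    Agent("sentinel", "engineering", "claude-sonnet-4.6", "code review and security"),
    Agent("scout", "research", "gemini-2.5-flash", "rapid research"),
    Agent("atlas", "research", "gemini-2.5-flash", "deep analysis"),
]


def tier_sizes(roster: List[Agent]) -> Dict[str, int]:
    """Count how many agents sit in each tier."""
    return dict(Counter(agent.tier for agent in roster))


def models_by_tier(roster: List[Agent]) -> Dict[str, Set[str]]:
    """Collect the distinct models used within each tier."""
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for agent in roster:
        grouped[agent.tier].add(agent.model)
    return dict(grouped)
```

Keeping the roster as data makes it easy to see at a glance which tiers lean on expensive models and which ones run on cheap or local ones.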

When to create a new agent

The trigger for creating a new agent is always the same: I notice that an existing agent is being asked to do two fundamentally different types of work, and the context needed for each type is cluttering the other.

My first split was separating Forge from Nyx. Nyx was handling both orchestration and coding. The problem was that Nyx's memory file was filling up with code conventions and project-specific technical details, which made it worse at its primary job of deciding what work to do and routing it. As soon as I created Forge with its own memory focused on engineering, both agents got better at their jobs.

The second split was Hermes from Nyx. I was asking Nyx to handle email tasks, which meant Nyx needed to know about Gmail labels, email threading, contact management. None of that was relevant to orchestration. Hermes got its own workspace, its own ChromaDB index, its own label system. Nyx just says "check my email" and Hermes handles the rest.

The pattern is consistent: split when the memory file is getting overloaded with context from different domains. If you're writing project-specific database schemas next to marketing strategy notes next to home automation commands, that's three agents waiting to be born.

But don't split prematurely. An agent doing two related things is fine. My Budget agent handles both the budget app development and receipt processing. Those are different tasks but the same domain. The context overlaps. No need to split.

One channel per agent

Every agent gets its own Slack channel. This isn't optional in my setup. It's the core architectural decision.

One channel means one conversation thread per agent. When I'm in #ai-forge, I'm talking to an engineer. When I'm in #ai-hermes-email, I'm talking to my email manager. The channel is the interface boundary.

This also means I can scroll back through the history of any agent's work. If I need to remember what Forge said about a database migration last week, I go to #ai-forge and search. The conversation log is the work log.

I name channels with a consistent prefix: #ai-[name] or #ai-[name]-[specialty]. So #ai-nyx, #ai-forge, #ai-hermes-email, #ai-scout, #ai-portlink. The prefix makes them sort together in Slack's sidebar and makes them instantly identifiable as agent channels rather than human channels.
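The naming rule is simple enough to capture in a few lines. The helper names below (`channel_name`, `is_agent_channel`) are hypothetical, and nothing here talks to Slack; it only builds and checks the strings.

```python
import re
from typing import Optional

PREFIX = "ai"


def channel_name(agent: str, specialty: Optional[str] = None) -> str:
    """Build '#ai-[name]' or '#ai-[name]-[specialty]' in lowercase."""
    parts = [PREFIX, agent] + ([specialty] if specialty else [])
    slug = "-".join(part.strip().lower() for part in parts)
    # Collapse anything that is not a lowercase letter, digit, or hyphen.
    slug = re.sub(r"[^a-z0-9-]+", "-", slug).strip("-")
    return f"#{slug}"


def is_agent_channel(channel: str) -> bool:
    """True when the channel carries the shared agent prefix."""
    return channel.startswith(f"#{PREFIX}-")
```

For example, `channel_name("Hermes", "email")` gives `#ai-hermes-email`, and because every agent channel shares the prefix, a plain alphabetical sort keeps them grouped together.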

Agent isolation and coordination

Agents are isolated by default. Forge can't read Hermes's memory. Scout can't modify Forge's workspace. Each agent has its own directory with its own files, and it can only access what's in its scope.

This is a feature, not a limitation. Isolation prevents agents from stepping on each other. If the marketing agent could modify the engineering agent's code, you'd have a mess. Clear boundaries prevent category confusion.
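To make the scope rule concrete, here is a minimal sketch of a path check, assuming each agent's workspace is a single directory on disk. The function name and the example paths are hypothetical, and this is not how OpenClaw itself enforces isolation; it only illustrates the idea that an agent may touch what lives under its own directory and nothing else.

```python
from pathlib import Path


def in_scope(workspace: Path, target: Path) -> bool:
    """Return True when target resolves to a path inside workspace."""
    root = workspace.resolve()
    # Joining with an absolute target discards root, so absolute paths
    # outside the workspace are rejected as well as '../' escapes.
    resolved = (root / target).resolve()
    return resolved == root or root in resolved.parents
```

With a workspace of `/agents/forge`, a relative path such as `src/app.py` is in scope, while `../hermes/MEMORY.md` resolves outside the workspace and is rejected.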

Coordination happens through Nyx. When I need cross-agent work, I tell Nyx. Nyx spawns sessions to the relevant agents, collects their outputs, and synthesizes the result. This is the hub-and-spoke model. Nyx is the hub. Everything else is a spoke.

The technical mechanism is sessions_spawn for one-off delegation and sessions_send for broadcasts. When I change a standing order that affects all agents (like the Anthropic rate limit rule), Nyx broadcasts it to every workspace. Each agent reads the broadcast and updates its own memory. This takes about 30 seconds for all 26 agents.

Is this the only way to do it? No. You could build peer-to-peer communication between agents. But I tried that briefly and it got chaotic fast. When messages fly between agents without a central coordinator, you lose visibility into what's happening. The hub model keeps me in the loop.
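The hub-and-spoke shape can be modeled in plain Python to show why it keeps everything visible. This is a toy model, not OpenClaw's sessions_spawn or sessions_send API: the `Hub` class, its method names, and the stub spokes are all invented for illustration, and each spoke is just a function that takes a task and returns text.

```python
from typing import Callable, Dict, List

Spoke = Callable[[str], str]  # takes a task, returns the agent's output


class Hub:
    """Central coordinator: every message passes through here and is logged."""

    def __init__(self) -> None:
        self._spokes: Dict[str, Spoke] = {}
        self.log: List[str] = []

    def register(self, name: str, spoke: Spoke) -> None:
        self._spokes[name] = spoke

    def delegate(self, name: str, task: str) -> str:
        """One-off delegation to a single agent."""
        self.log.append(f"hub -> {name}: {task}")
        result = self._spokes[name](task)
        self.log.append(f"{name} -> hub: {result}")
        return result

    def broadcast(self, order: str) -> Dict[str, str]:
        """Send the same standing order to every registered agent."""
        return {name: self.delegate(name, order) for name in self._spokes}

    def fan_out(self, tasks: Dict[str, str]) -> str:
        """Delegate different tasks to different agents and merge the outputs."""
        outputs = [f"[{name}] {self.delegate(name, task)}" for name, task in tasks.items()]
        return "\n".join(outputs)
```

Because spokes never call each other, `hub.log` is a complete record of who was asked what and what came back, which is the visibility that peer-to-peer messaging gives up.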

When not to split

I've made mistakes in both directions. I split too early a few times, creating agents that didn't have enough work to justify their own memory and configuration overhead. An agent that gets used once a week doesn't accumulate enough context to be meaningfully better than your general-purpose agent.

Here's my rule of thumb: if you'd interact with the agent at least three times a week, it probably deserves its own channel. If it's once a week or less, keep the responsibility in an existing agent.

I also tried splitting agents by project: one agent per client, one per product. This works well for products where the codebase, database, and deployment are all distinct. My PortLink agent knows the PortLink Supabase schema, the PortLink repo structure, the PortLink deployment pipeline. That's a dense ball of context that would bloat any general-purpose agent.

But for smaller projects or early-stage work where the domain isn't well defined yet, a general-purpose agent works better. You can always split later when the context gets heavy.

The naming convention

This sounds trivial. It isn't.

Most of my agents have names from Greek mythology. Nyx is the goddess of night (she works while I sleep). Hermes is the messenger god (email). Atlas carries the world (deep research). Forge is... okay, Forge isn't Greek mythology, it's just a good name for a builder. The point is that the names create mental associations.

When I'm about to message an agent, the name tells me what it does before I even open the channel. This matters more than you'd think when you have 20+ channels in your sidebar. If they were named "agent-1" through "agent-26," I'd have to remember which number does what. With named agents, the name carries meaning.

The names also help with agent identity in a more subtle way. When I write a SOUL.md file for Hermes, the name "Hermes" naturally shapes how I write the personality. It's a communication specialist, fast, message-oriented. When I write Forge's SOUL.md, the name suggests a builder, methodical, focused on craft. The names aren't just labels. They're design prompts for the agent's behavior.

Pick a naming convention you'll stick with. It doesn't have to be Greek mythology. Some people use job titles, some use abstract nouns. The convention matters less than consistency.

Starting your own org

If you have one agent running and you're ready to expand, here's the order I'd recommend.

First new agent: an engineer. If you write code, having a dedicated coding agent with its own memory of your conventions, stack, and project structure is the single biggest productivity gain. Give it the best model you can afford. Configure its memory with your coding standards, your repo locations, your deployment setup. This agent will pay for itself in the first week.

Second new agent: a researcher. Separate the "find information" tasks from the "make decisions" tasks. Your orchestrator shouldn't be doing web searches and reading documentation. That's a different cognitive mode. A research agent on a fast, cheap model (Gemini Flash is excellent for this) can gather context while your orchestrator focuses on planning.

Third new agent: whatever your biggest non-coding task is. For me, it was email management. For you, it might be content writing, data analysis, or project management. Look at where you spend the most time doing routine work that requires specific domain knowledge. That's your third agent.

After that, follow the pattern: notice where context is getting muddled, split the responsibilities, configure the new agent's memory with the relevant context, and route work through your orchestrator.

The maintenance overhead

I won't pretend this is zero-effort. A 26-agent system needs management.

Memory files need pruning. I have a hard limit of 9,500 characters per memory file because OpenClaw silently truncates at 10,000. If you don't prune, you lose context without knowing it. I check memory file sizes weekly. Old entries that are no longer relevant get moved to dated archive files in a memory/ subdirectory.
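The weekly size check is easy to script. The sketch below assumes one directory per agent under a common root, each containing a memory file named `MEMORY.md`; that file name, the directory layout, and the archive naming are assumptions for illustration, so adjust the glob pattern to match your setup. It counts characters rather than bytes, since the limit above is stated in characters.

```python
from datetime import date
from pathlib import Path
from typing import List, Tuple

SOFT_LIMIT = 9_500   # pruning threshold from the post
HARD_LIMIT = 10_000  # point at which truncation reportedly happens


def oversized_memory_files(
    root: Path, pattern: str = "*/MEMORY.md", limit: int = SOFT_LIMIT
) -> List[Tuple[Path, int]]:
    """Return (path, character count) for every memory file over the limit."""
    report = []
    for path in sorted(root.glob(pattern)):
        size = len(path.read_text(encoding="utf-8"))
        if size > limit:
            report.append((path, size))
    return report


def archive_path(workspace: Path, day: date) -> Path:
    """Dated archive file inside the workspace's memory/ subdirectory."""
    return workspace / "memory" / f"{day.isoformat()}.md"
```

Running `oversized_memory_files` against the agents root once a week gives a short list of files to prune, and `archive_path` gives a consistent place to move the entries that no longer belong in the live file.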

Standing orders need broadcasting. When I change a rule that affects all agents (new rate limits, new formatting guidelines, new security policies), I broadcast it to every workspace. Each agent needs to read and internalize the update. This is a few minutes of work, but it's recurring.

Agent configurations drift over time. Models get updated, tools get added or removed, workflow patterns change. I do a periodic review of each agent's configuration to make sure it's still accurate.

But here's the thing: this maintenance is the same kind of work you'd do managing a team of humans. Updating documentation, communicating policy changes, reviewing team structure. The difference is that AI agents don't get offended when you rewrite their job description, and the "hiring" process takes 20 minutes instead of two months.

The principle behind all of this

The core idea is simple: specialized, persistent agents with clear boundaries and centralized coordination. Every design decision I've made flows from that principle.

Specialization means each agent is good at its specific job. Persistence means context accumulates over time rather than evaporating between sessions. Clear boundaries mean agents don't interfere with each other. Centralized coordination means you maintain visibility and control.

Start with one agent. Split when the context gets heavy. Name them well. Keep the orchestrator clean. Prune the memory files. Broadcast standing orders.

That's the whole playbook. Everything else is iteration.