How PubNub is scaling from one-off prompts to a reliable, multi-agent workflow
If you’ve tried Claude Code for one-off prompting and thought “Great…but now make it a pipeline,” subagents are your next step. At PubNub, we are migrating from ad-hoc prompts to a subagent pipeline that designs features, reviews architecture, implements code, runs tests, and hands back clean PRs, repeatably and safely. Why? This modular approach is more effective than relying on ad-hoc, all-in-one prompting, where context can become crowded and task performance inconsistent.
This guide shares some of our experiences: concepts, best practices, and a hands-on setup (subagents, hooks, and MCP servers). It assumes you already use Claude Code, and we’ll focus on building a reliable agent architecture for any codebase.
Why subagents (and why now)
Claude Code subagents are specialized, autonomous assistants designed to execute specific, well-defined tasks within a larger workflow. Unlike a general-purpose agent that handles a wide range of requests, a subagent operates with its own distinct system prompt, a curated set of tool permissions, and an isolated context window. This modular design allows you to create a team of AI experts. Give them roles: Product Spec, Architect, Implementer/Tester, and chain them with Claude Code hooks to create a dependable software pipeline:
Reproducibility: Stop re-prompting. Subagents and hooks codify repeatable steps.
Separation of concerns: PM asks, Architect validates, Implementer builds & tests, QA verifies.
Governance & safety: Each agent has scoped tools & permissions, while hooks gate and log transitions.
Throughput: Serialize high-risk steps, parallelize safe ones.
One important advantage of subagents is that they have their own context window: they can do extensive research and hand back only a concise summary to the main agent. That way you save precious context in the main agent before it has to 'compact'.
Subagents are defined as Markdown with YAML frontmatter (name, description, optional tool list). They can be discovered and invoked by Claude Code from your project’s .claude/agents/ directory or from your user scope.
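For example, a minimal pm-spec definition might look like this; it is a sketch, with the prompt wording and tool names (standard Claude Code built-ins) chosen for illustration:

```markdown
---
name: pm-spec
description: Use when a new enhancement arrives. Reads the enhancement, writes a working spec with acceptance criteria, asks clarifying questions, and sets status READY_FOR_ARCH.
tools: Read, Grep, Glob, Write
---
You are the product-spec agent. For the given slug:
1. Read the enhancement entry and any linked notes.
2. Draft a working spec in docs/claude/working-notes/<slug>.md with acceptance criteria.
3. If anything is ambiguous, ask numbered clarifying questions and wait.
4. Set the slug's status to READY_FOR_ARCH in enhancements/_queue.json.
```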
A good starting pattern (3 subagents)
We started with a three-stage pipeline that’s generic to any stack:
pm-spec → reads an enhancement, writes a working spec, asks clarifying questions, sets status READY_FOR_ARCH.
architect-review → validates the design against platform constraints (for PubNub real-time apps, use PubNub’s MCP server to keep the architect subagent up to date on the best design patterns and latest SDKs). The architect considers performance/cost limits, produces an ADR, and sets status READY_FOR_BUILD.
implementer-tester → implements code & tests (unit + optional UI via Playwright), updates docs, flips status to DONE.
A hook watches our queue file and prints the next explicit command (e.g., “Use the architect-review subagent on ‘use-case-presets’.”). We keep a human in the loop (HITL) to approve that handoff (details below).
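The queue can be as simple as a slug-to-status map. A minimal sketch of the enhancements/_queue.json shape (the second slug is a made-up example; any structure your hook can parse works):

```json
{
  "use-case-presets": "READY_FOR_ARCH",
  "presence-webhooks": "DONE"
}
```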
Concepts developers should know about subagents
Location: .claude/agents/ (project) or user scope; project wins on name collision. Note you can use the /agents command to create agents, and Anthropic recommends this.
Definition: Markdown + YAML frontmatter (name, description, optional tools) plus the system prompt.
Usage: Claude can auto-delegate by description, or you can call a subagent explicitly, e.g.: Use the implementer-tester subagent on "use-case-presets".
Tool access: If you omit tools, the subagent inherits the thread’s tools (including MCP). Whitelist when you need tight control.
Hooks: Attach shell commands to lifecycle events (e.g., SubagentStop, Stop) and print to STDOUT. This lets you surface next steps in the Claude transcript. It's best to register both events to reliably catch the end of any run. Configure hooks in project/user settings.
Best practices
Follow these practices to go from conceptual understanding to practical application.
1) Single-responsibility agents
Give each subagent one clear goal, input, output, and handoff rule. Keep descriptions action-oriented (“Use after a spec exists; produce an ADR and guardrails”). (Anthropic)
2) Permission hygiene
Scope tools per agent. PM & Architect are read-heavy (search, docs via MCP); Implementer gets Edit/Write/Bash plus UI testing; Release gets only what it needs. If you omit tools, you’re implicitly granting access to all available tools. Be intentional. (Anthropic)
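As a rough sketch (the tool names are typical Claude Code built-ins; adjust to whatever your environment exposes), the scoping lives in each agent’s frontmatter tools line:

```yaml
# pm-spec.md / architect-review.md: read-heavy, write only their notes/ADRs
tools: Read, Grep, Glob, Write

# implementer-tester.md: also edits code and runs tests
tools: Read, Grep, Edit, Write, Bash
```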
3) Chain with hooks, not prompt glue
Register a SubagentStop hook that reads your queue and prints the next suggested command. Also register Stop as a safety net. After editing settings, use Claude Code’s controls to review/apply changes so hooks go live. (Anthropic)
4) Treat hooks like prod code
Version them under .claude/hooks/; validate JSON with jq; keep them idempotent; make them executable; and, if your policy requires it, allow the command in settings. The hooks reference covers event names and behavior. (Anthropic)
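Two one-liners cover most of that hygiene (assuming jq is installed):

```bash
# Fail fast on invalid JSON before committing settings
jq empty .claude/settings.json

# Hook scripts must be executable to run
chmod +x .claude/hooks/*.sh
```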
5) Settings hygiene
Keep settings valid JSON with a single root object and avoid conflicting local overrides. Manage project-level settings in .claude/settings.json and developer overrides in .claude/settings.local.json; apply after edits so Claude Code loads them for the current session (verify via the hooks UI/commands). (Anthropic)
6) Bring the right MCP servers
Use MCP to ground agents in your current docs and to run real tests (e.g., PubNub’s MCP server for platform docs and patterns, and the Playwright MCP server for UI testing).
Keeping humans in the loop (without losing velocity)
Subagents shine when humans stay in control of direction and quality. Our HITL pattern is lightweight but explicit:
1) Clear, visible handoffs
Hooks suggest, humans approve: the hook prints “Use the architect-review subagent on ‘use-case-presets’.” A human pastes it to proceed, preventing runaway chains and forcing a quick glance.
Definition of Done per agent: prompts end with checklists (PM: acceptance criteria + questions; Architect: ADR + guardrails; Implementer: code + tests green + summary). Missing DoD? Stop and fix.
2) Review the slug, not just prose
Every enhancement carries a “slug” (e.g., use-case-presets) and leaves a quick audit trail:
Queue: enhancements/_queue.json shows slug → status.
PM note: docs/claude/working-notes/<slug>.md.
ADR: docs/claude/decisions/ADR-<slug>.md.
Proof: passing tests + commit messages with the slug; optional UI test artifacts.
Ask Claude Code to “summarize what changed for <slug> since last run” for a diff-based review.
3) Pause, resume, or branch intentionally
Add queue statuses like ON_HOLD or BLOCKED. If a feature splits, create -a/-b slugs and cross-link them in the PM note.
4) Minimal approvals that matter
Pre-implementation: a human signs off on the ADR.
Pre-PR: a human scans the Implementer’s summary (what, why, tests) before raising a PR.
5) Parallelize safely
Run subagents in parallel only for disjoint slugs (different modules/files). Architect flags potential conflicts; Implementer lists touched paths; hooks can warn when two READY_FOR_BUILD slugs touch the same directories.
6) Tiny logging, big payoff
Log hook runs to a simple hooks.log (timestamp, queue snapshot) and include slugs in commit messages, e.g., feat(presets): implement 'use-case-presets'.
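Inside the hook, that can be a single append (paths are illustrative):

```bash
# One audit line per hook run: UTC timestamp plus the flattened queue
echo "$(date -u +%FT%TZ) $(tr -d '\n' < enhancements/_queue.json)" >> .claude/hooks/hooks.log
```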
7) Bake “ask first” rules into prompts
PM: If acceptance criteria are ambiguous, ask numbered questions and wait.
Architect: If the design implies a public API change, stop and ask before finalizing.
Implementer: If green tests require refactors beyond the ADR, ask before proceeding.
Involving other LLMs (e.g., GPT-5) as reviewers between subagents
Sometimes you want a second opinion before handing off, e.g., “static analysis of the spec,” “lint the ADR for migration risks,” or “review tests for flakiness.” With Claude Code you can slot in other LLMs in two practical ways:
Pattern A — MCP bridge to another LLM
Stand up a lightweight MCP server that calls your preferred model (e.g., GPT-5) behind an API. A subagent or a hook invokes a tool like external_llm.reviewSlug({ slug }) and writes the verdict into the working note or queue.
Where to place it
After PM: external LLM does spec QA; only then allow Architect to proceed.
After Architect: external LLM sanity-checks the ADR (breaking changes, migration).
After Implementer: external LLM reviews tests and change summary before PR.
Gatekeeping
The hook flips status to PENDING_REVIEW and prints “Run external-review on ‘<slug>’.”
If the external LLM returns “pass,” the hook updates to READY_FOR_*. If “fail,” it writes comments into the PM note / ADR and leaves status unchanged.
Pattern B — Hook calls a local script that hits another LLM
If you don’t want to build an MCP server, a hook can call a local Node/Python script that hits the other LLM’s API, writes results to docs/claude/working-notes/<slug>.md, and sets status accordingly. The hooks reference covers command hooks and lifecycle events. (Anthropic)
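Here’s a minimal sketch of such a script in Python, assuming a hypothetical reviewer endpoint configured via REVIEW_API_URL and REVIEW_API_KEY environment variables (the real request and response shapes depend on the provider you call):

```python
#!/usr/bin/env python3
"""Hypothetical external-review script: posts the working note to another LLM and records the verdict."""
import json, os, sys, urllib.request

slug = sys.argv[1]
note_path = f"docs/claude/working-notes/{slug}.md"
note = open(note_path).read()

# Placeholder endpoint and headers: substitute your provider's actual API here.
req = urllib.request.Request(
    os.environ["REVIEW_API_URL"],
    data=json.dumps({"task": "review-spec", "content": note}).encode(),
    headers={"Authorization": f"Bearer {os.environ['REVIEW_API_KEY']}",
             "Content-Type": "application/json"},
)
verdict = json.load(urllib.request.urlopen(req))  # e.g. {"result": "pass", "comments": "..."}

# Append the verdict to the working note so the audit trail stays in one place.
with open(note_path, "a") as f:
    f.write(f"\n## External review\n{verdict.get('result')}: {verdict.get('comments', '')}\n")
print(verdict.get("result", "unknown"))
```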
Either pattern keeps Claude Code as the orchestrator while you plug in best-of-breed reviewers. MCP is the standard glue for this kind of multi-tool, multi-model workflow. (Anthropic)
Implementation guide: make any repo “subagent ready”
Project layout (example)
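Something along these lines works; the paths simply mirror the artifacts described above:

```
.claude/
  agents/
    pm-spec.md
    architect-review.md
    implementer-tester.md
  hooks/
    on-subagent-stop.sh
  settings.json
  settings.local.json
docs/claude/
  working-notes/<slug>.md
  decisions/ADR-<slug>.md
enhancements/
  _queue.json
```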
Example subagents
pm-spec.md
architect-review.md
implementer-tester.md
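As a sketch, implementer-tester.md might look like the following (prompt wording and tool names are illustrative; tune the checklist to your repo):

```markdown
---
name: implementer-tester
description: Use when a slug is READY_FOR_BUILD. Implements the ADR, writes and runs tests, updates docs, and flips the slug's status to DONE.
tools: Read, Grep, Edit, Write, Bash
---
You are the implementer/tester agent. For the given slug:
1. Read the ADR in docs/claude/decisions/ADR-<slug>.md and the working note.
2. Implement the change; write or update unit tests (and UI tests via Playwright where relevant).
3. Run the test suite; do not finish until it is green.
4. Update docs, summarize what/why/tests in the working note, and set the slug to DONE in enhancements/_queue.json.
If going green requires refactors beyond the ADR, stop and ask before proceeding.
```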
Hooks & settings (project-level)
.claude/settings.json
Hook events, security, and behavior are covered in Anthropic’s Hooks reference. After edits, review/apply in Claude Code so the new hooks are live.
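A sketch of the registration, pointing both SubagentStop and Stop at the same script (double-check the exact schema against the hooks reference for your Claude Code version; the path is relative to the project root):

```json
{
  "hooks": {
    "SubagentStop": [
      { "hooks": [ { "type": "command", "command": ".claude/hooks/on-subagent-stop.sh" } ] }
    ],
    "Stop": [
      { "hooks": [ { "type": "command", "command": ".claude/hooks/on-subagent-stop.sh" } ] }
    ]
  }
}
```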
.claude/hooks/on-subagent-stop.sh
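And a sketch of the hook itself, assuming the simple slug → status queue shape shown earlier and jq on the PATH:

```bash
#!/usr/bin/env bash
# Suggest the next pipeline step based on the queue. Idempotent: it only reads and prints.
set -euo pipefail
QUEUE="enhancements/_queue.json"
LOG=".claude/hooks/hooks.log"

[ -f "$QUEUE" ] || exit 0
echo "$(date -u +%FT%TZ) $(tr -d '\n' < "$QUEUE")" >> "$LOG"

# Map each status to the explicit command a human can paste to approve the handoff.
jq -r 'to_entries[] |
  if   .value == "READY_FOR_ARCH"  then "Use the architect-review subagent on \"\(.key)\"."
  elif .value == "READY_FOR_BUILD" then "Use the implementer-tester subagent on \"\(.key)\"."
  else empty end' "$QUEUE"
```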
Add MCP servers (optional)
Use Anthropic’s Claude Code ↔ MCP guide to connect external servers (HTTP/SSE/local). That’s how agents get access to your tools, APIs, and internal docs. For cross-model reviews, expose a simple MCP tool that forwards a request to your preferred model and returns a summarized verdict (pass/warn/block).
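For project-scoped servers, one option is a .mcp.json checked into the repo (shape per Anthropic’s MCP guide). The PubNub entry below uses a placeholder package name, so check PubNub’s MCP server docs for the real command; the Playwright entry reflects its published npx invocation:

```json
{
  "mcpServers": {
    "pubnub-docs": {
      "command": "npx",
      "args": ["-y", "<pubnub-mcp-server-package>"]
    },
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}
```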
Operating the pipeline
PM:
Use the pm-spec subagent on "use-case-presets".
Hook prints the next suggestion:
Use the architect-review subagent on "use-case-presets".
Architect runs and writes the ADR → hook suggests Implementer.
Implementer codes/tests; when green, sets DONE and summarizes changes.
Keep a short Definition of Done in each agent prompt and use slugs to tie artifacts together.
Troubleshooting (what actually happens in real life)
Hook output not visible → Print to STDOUT (not /dev/tty) and register both SubagentStop and Stop.
Hooks not loading → Settings JSON wasn’t valid or wasn’t applied. Keep one top-level object and review/apply changes so the runtime reloads hooks.
No next-step suggestion → Queue wasn’t updated to READY_FOR_*; fix the agent or manually flip the status and re-run the hook.
Tool sprawl → If you omit tools in a subagent, it inherits all available tools (including MCP). Whitelist intentionally.
Improving the work of subagents
When a subagent performs poorly—such as neglecting a necessary tool or mismanaging a task—developers are encouraged to use iterative prompting to refine agent behavior. This process typically involves:
Supplying context on the failed action (what the subagent did versus what it should have done).
Explaining the expected result, making it clear how the agent should act differently next time.
Passing in the relevant .md configuration file to allow Claude to analyze and suggest precise modifications.
Context, Customization, and Continuous Improvement
With each refinement, updated .md files maintain an auditable history of agent behavior and can be version-controlled. This approach allows custom subagents to become more specialized, efficient, and reliable over time, especially when developers proactively identify workflow failures and provide Claude Code with structured context for improvement.
Plan and Thinking Mode Considerations
Unlike the main Claude agent, subagents in Claude Code currently do not support generating or executing stepwise plans; they begin executing assigned tasks right away. Additionally, there is no interactive “thinking” mode or transparent intermediate output, making it challenging to monitor progress or debug issues until after subagent execution finishes. This tradeoff yields parallel productivity but reduces real-time visibility.
Handling Subagent Permissions (with MCP Servers)
When integrating subagents with MCP servers, pay explicit attention to tool and resource permissions. Subagents must be granted access to the specific MCP tools or resources required for their tasks; if these aren't included in the subagent’s .md configuration (or set via the /agents interface), the subagent will be unable to use those features. Failure to configure permissions correctly can lead to denied operations or persistent permission prompts that block execution, especially for file access or server-bound requests.
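For example, an architect agent that needs PubNub docs tools might whitelist them by name in its frontmatter. Claude Code typically exposes MCP tools as mcp__<server>__<tool>; the server and tool names below are placeholders, so confirm the real ones via the /agents interface:

```yaml
# architect-review.md frontmatter (MCP tool names are placeholders)
tools: Read, Grep, mcp__pubnub__search_docs
```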
Practical Tips
Carefully specify which tools and server resources are accessible to each subagent via its tools field or the /agents command.
Review permissions regularly, especially after changes to server setups or security settings, to avoid broken workflows or security gaps.
For workflows requiring observable, incremental steps, assign those tasks to the main Claude agent rather than subagents.
By understanding these operational details and configuration nuances, it’s possible to leverage subagents for specialized, parallelized tasks while maintaining secure, controlled access to sensitive tools and environments.
Closing thoughts
Subagents + hooks turn Claude Code from a helpful AI into a repeatable engineering system for us here at PubNub. We get faster onboarding, safer automation, and clean handoffs while keeping humans firmly in the loop. If you already use Claude Code, you can turn any repo into a subagent workflow with a day’s effort.
Remember to keep experimenting: try different workflows, explore the various ways you and your team can work with agents, refine how they work with Claude Code, and find the setup that works best for everyone.