Best Practices with Claude Code Subagents Part II: Moving from Prompts to Pipelines
If you haven't read Part I yet, start here: Best Practices in Claude Code Subagents (Aug 2025).
A quick recap (Part I in one minute)
In August 2025, I was spending too much time doing prompt glue: re-explaining context, copy-pasting similar prompts, and hand-stitching artifacts (specs, ADRs, tests) into something a teammate could repeat.
So we switched to a simple pipeline:
pm-spec: reads the enhancement, asks questions, writes acceptance criteria, moves the slug to READY_FOR_ARCH
architect-review: checks platform constraints, writes an ADR, moves the slug to READY_FOR_BUILD
implementer-tester: implements + tests, summarizes changes, moves the slug to DONE
We kept humans in control: hooks suggested the next step, but a person explicitly ran it. That was the whole point. Repeatable progress, less chaos.
What changed between Aug 2025 and now
Since Part I, Claude Code subagents stopped being just "structured prompts" and started feeling like a real runtime.
The biggest practical shifts (the ones we actually felt in day-to-day work) are:
Forked contexts: subagents can run in an isolated child session so the main thread stays clean. No more "context rot" from giant logs and failed test spew.
Skills are now the unit of capability: instead of "slash commands vs skills" as two different things, one definition can be user-invoked and model-invoked.
Native backgrounding: you can push long work (tests, big refactors, research) into the background so your terminal - and your brain - can keep moving.
Plan Mode + a real plans directory: planning artifacts stop being ad-hoc notes and become a versioned contract for the work.
Hooks matured: from a single on-subagent-stop.sh glue script to a more complete event system (pre-tool gating, post-tool logging, subagent stop routing).
If Part I was "stop prompting, start pipelining," then Part II is: keep the pipeline, but make it sturdier, safer, and a lot less token-wasteful.
The single biggest upgrade: Forked context subagents
The first time we really felt this was during a gnarly test failure loop.
In 2025, an implementer agent would run tests, print 500 lines of output, grep around, rerun tests, print another 500 lines... and the main conversation would slowly turn into a landfill. Eventually you could tell the model was thinking through mud.
With context forking, you can define certain agents (usually implementer and qa) to run in a forked context. They can do all the messy work - npm test, giant diffs, iterative debugging - and then return a clean summary back to the orchestrator.
Here's what that looks like in practice:
.claude/agents/implementer.md
# Role
You are a senior engineer. Implement the plan and keep scope inside the ADR guardrails.
# Handoff
Write an implementation summary to .claude/plans/implementation-summary.md.
Include: files changed, rationale, how to test, and any follow-ups.
My rule of thumb: Fork anything that will generate noisy output or do a lot of trial-and-error.
Our pipeline became "Plan → Execute → Verify"
In Part I, we used a queue file and a couple docs folders. It worked.
Once we started treating planning as a first-class artifact, the workflow got smoother:
Plan: capture intent and constraints as a written contract
Execute: implement in an isolated context
Verify: run tests and sanity checks (also isolated)
It's the same philosophy as before: Dialogue / Document / Digest (which, coincidentally, happens to be a PubNub core value!). The difference is that the "Document" part is now the backbone.
A directory layout that matches how the work actually flows
This is what we've converged on as a clean baseline:
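(Names are illustrative - the exact tree varies by repo - but this is the skeleton the rest of this post assumes.)

.claude/
├── agents/
│   ├── architect.md
│   ├── implementer.md
│   └── qa.md
├── skills/
│   └── git-utils/
│       └── SKILL.md
├── plans/
│   ├── active-plan.md
│   ├── implementation-summary.md
│   └── qa-summary.md
├── memory/
│   ├── CLAUDE.md
│   └── decisions.md
└── settings.json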
A few notes:
.claude/plans/active-plan.md becomes the source of truth for "what we're building right now." If nothing else exists, make this exist.
memory/CLAUDE.md is no longer a novel; it's a router (more on that below).
We still like "slugs" for traceability (feature IDs that show up in docs + commits), but the plan artifacts are the contract.
Skills: One capability, two ways to run it
In 2025, we had a little friction around tooling: some actions were "commands" that humans typed, others were "skills" the agent could call. You ended up duplicating logic.
The best practice now is to define a Skill once, and make it available to:
Humans (as a slash command)
Agents (as an autonomous capability)
A tiny example we use a lot: generate conventional commit messages.
.claude/skills/git-utils/SKILL.md
# Smart Commit
Run git diff --staged.
Summarize changes.
Generate a Conventional Commits message (feat/fix/chore).
Commit using the generated message.
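To make the same Skill discoverable by the model (not just by a human typing the slash command), the top of SKILL.md carries frontmatter it can match on. Roughly - the field values here are illustrative:

---
name: smart-commit
description: Summarize staged changes, write a Conventional Commits message, and commit.
---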
The win isn't just convenience. It's that the pipeline becomes composable: the implementer can finish coding and then call smart-commit without you remembering the exact incantation.
Background work: Don't block on long tasks
The most "real life" improvement has been backgrounding.
If you're like us, a lot of the day is:
Run tests
Wait
Context switch
Forget what you were doing
Modern Claude Code flows are much better when long tasks are backgrounded (tests, big grep searches, multi-repo audits). In practice we use it like this:
Implementation agent kicks off a long test run in the background
Meanwhile, the orchestrator is writing the summary, updating docs, or spawning a QA agent
When the background task completes, you get a clean status and can decide what to do next
Tip: Backgrounding pairs beautifully with forked contexts. The agent can do the slow, noisy stuff somewhere else and only report back what matters.
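Most of this is steerable from the agent prompt itself. A sketch of the kind of wording we mean for the implementer (illustrative, not a verbatim copy of ours):

# Long-running work
Kick off the full test suite in the background instead of blocking on it.
While it runs, draft .claude/plans/implementation-summary.md.
Before handing off, collect the background results and fold any failures back into the plan.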
CLAUDE.md evolved from "context dump" to "router"
I used to write CLAUDE.md like it was onboarding documentation for a new hire.
That works until it doesn't - because the file grows, becomes stale, and (ironically) makes the model less reliable.
Now we treat CLAUDE.md like a Project Constitution: short directives + pointers to where the truth lives.
.claude/memory/CLAUDE.md
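(Abridged and illustrative - your directives and doc paths will differ - but the shape is the point: a few hard rules up top, pointers below.)

# Project Constitution
Keep changes scoped to the active plan; flag anything outside it.
Never touch production config without an ADR.

# Where the truth lives
Current work: .claude/plans/active-plan.md
Decisions and guardrails: .claude/memory/decisions.md
Deeper architecture and standards docs: see docs/ in the repo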
This keeps "what we want" close at hand, and moves "what we know" into the docs where it belongs.
Hooks: From "next-step suggestion" to guardrails and governance
In Part I, we used a SubagentStop hook to print the next recommended command, based on the queue status. That was already a big upgrade.
The newer hook model is broader:
PreToolUse: block or validate dangerous operations
PostToolUse: log, summarize, or trigger follow-up actions
SubagentStop: route the next step when an agent finishes
A settings.json that does more than "run a script"
.claude/settings.json
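(Trimmed to the hooks section; the script paths are placeholders for whatever checks your team actually runs.)

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": ".claude/hooks/validate-command.sh" }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": ".claude/hooks/log-change.sh" }
        ]
      }
    ],
    "SubagentStop": [
      {
        "hooks": [
          { "type": "command", "command": ".claude/hooks/route-next-step.sh" }
        ]
      }
    ]
  }
}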
We still keep "humans in the loop" as a principle, but we also like removing footguns. Pre-tool validation is a nice middle ground: it doesn't remove agency, it just adds a seatbelt.
MCP at scale: tool search and wildcard permissions
As soon as you connect a few real MCP servers, you hit a practical constraint: tool definitions are big.
Two tactics that helped:
Wildcard tool access for a server (when appropriate), e.g. mcp__postgres__* in a subagent's tool list (sketch below).
Tool search / lazy loading so the model discovers the tool it needs on-demand instead of carrying the whole catalog in-context all the time.
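The wildcard version looks something like this in an agent definition (the agent itself is hypothetical; the tools line is the part that matters):

.claude/agents/db-analyst.md

---
name: db-analyst
description: Read-only reporting queries against the analytics database
tools: Read, Grep, mcp__postgres__*
---

# Role
You answer analytics questions against the reporting schema. Read-only; never write.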
One small "nice to have" here (especially if you're building on a platform with a lot of surface area) is an MCP server that acts like a documentation + ops concierge. We've been experimenting with that internally, and - light plug - we also published a PubNub MCP server.
Important nuance on setup (this changed relative to what some folks initially assumed): the "primary" credential you'll typically configure is a PubNub Service Integration API key via PUBNUB_API_KEY (i.e., an Admin Portal/API credential), not your publish/subscribe keys. You can still run in a "fixed keyset" mode by providing publish/subscribe keys, but that's optional - and often unnecessary if you want the agent to operate across multiple apps/keysets in one session.
So, adding it to Claude Code looks like:
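(Double-check the package name and flags against the server's README - this is the general shape, not gospel.)

claude mcp add pubnub -e PUBNUB_API_KEY=your-service-integration-key -- npx -y @pubnub/mcp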
If you’re already living in PubNub land, it’s a clean way to let subagents answer “how do I…?” questions with high confidence (by pulling canonical SDK docs), and optionally help with some setup workflows without leaving the editor.
The big idea is the same as everything else in this post: keep the root context clean, and load only what you need.
A note on enterprise guardrails (especially for teams)
If you're rolling this out across a team, you quickly want a few governance knobs:
Managed settings that override local project settings
Restrictions on what hooks can run
Allow/deny lists for MCP servers
Even if you never enable the strictest options, it's worth designing your pipeline assuming the day will come when you need them.
What we're doing at PubNub right now
Our current "default" flow is:
Orchestrator: Triage the request, write/update .claude/plans/active-plan.md
Architect (fork optional): Write guardrails and decisions into .claude/memory/decisions.md
Implementer (forked): Implement + unit tests, write .claude/plans/implementation-summary.md
QA Engineer (forked): Run regression checks, write .claude/plans/qa-summary.md
And then we do the human thing:
Skim the plan
Skim the summaries
Ship... well, no, not really. Actually: CI/CD, smoke tests, and the usual blue-green deploys.
The surprising part is how much calmer the work feels. Less "what did we decide again?" and more "cool, run the next step." That lines up pretty directly with how we like to operate: point positive, document the work, recognize excellence by making the excellent path the default.
If you're migrating from a Part I setup
Here's the simplest upgrade path:
Keep your existing agents and queue.
Add .claude/plans/ and start writing an active-plan.md for each slug.
Make implementer + QA forked contexts.
Move one or two repeatable actions into Skills.
Add a PreToolUse hook that blocks the most dangerous commands your team worries about.
Don't boil the ocean. Pick one repo, pick one slug, and run it end-to-end.
Closing thoughts
Part I was about turning "AI help" into repeatable progress.
Part II is about making that progress resilient: less context rot, more composable tooling, and fewer accidental sharp edges.
If you're setting this up in a PubNub repo and want a second set of eyes on your agents, hooks, or plan layout, reach out. We’ll be happy to pair on the first run.