Anthropic researchers found that AI agents fail like first-time managers: vague instructions, missing context, zero feedback loops. Here is how to fix that.
Delegating work to AI agents sounds simple in theory. You describe the task, press go, and wait for results. But anyone who has actually deployed an AI agent squad in a real business environment knows the truth: most failures have nothing to do with the AI itself — and everything to do with how the manager delegates.
Definition: AI agent delegation is the practice of assigning tasks, context, and decision-making boundaries to autonomous AI agents. Like managing human teams, effective delegation requires clear instructions, sufficient context, and structured feedback loops — not just a prompt and a prayer.
This insight comes directly from recent research at Anthropic, where engineer Erik Schluntz — who leads multi-agent research — observed that AI agents make "a lot of the same mistakes that first-time managers make." The parallel is striking and carries profound implications for anyone building or managing an AI agent squad.
According to Schluntz, the most common failure mode is giving agents incomplete or unclear instructions. A manager who tells a human employee "handle the client report" without specifying which client, what format, or the deadline will get disappointing results. AI agents are no different.
When a parent agent delegates to sub-agents in a multi-agent system, it tends to assume the sub-agent already has context it does not. The result: the sub-agent produces work that technically fulfills the prompt but completely misses the point.
The fix: Write delegation prompts as if the recipient knows absolutely nothing about your project. Include the goal, the constraints, the format expected, and what success looks like. Anthropic found that through training, Claude learned to become "much more verbose and much more detailed" when communicating with sub-agents — giving them the overall context of what is going on so they can produce work that contributes to the whole.
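The "knows absolutely nothing" framing can be made concrete as a checklist the parent agent fills out before delegating. Below is a minimal sketch; `DelegationBrief` and its fields are illustrative names, not part of any Anthropic API, but they encode the four elements the advice calls for: goal, context, constraints, and success criteria.

```python
from dataclasses import dataclass

@dataclass
class DelegationBrief:
    """Everything a sub-agent needs, assuming it knows nothing about the project."""
    goal: str              # what the sub-agent should accomplish
    project_context: str   # the overall picture its work must fit into
    constraints: list[str]
    output_format: str
    success_criteria: list[str]

    def to_prompt(self) -> str:
        constraints = "\n".join(f"- {c}" for c in self.constraints)
        criteria = "\n".join(f"- {c}" for c in self.success_criteria)
        return (
            f"Goal: {self.goal}\n\n"
            f"Project context: {self.project_context}\n\n"
            f"Constraints:\n{constraints}\n\n"
            f"Expected output format: {self.output_format}\n\n"
            f"Your work is successful if:\n{criteria}"
        )
```

A brief like this forces the parent agent (or the human writing the prompt) to notice missing context before the sub-agent does.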
There is a strong temptation to build elaborate multi-agent architectures with dozens of specialized agents talking to each other. Schluntz warns explicitly against this: "I have seen overbuilt multi-agent systems spend too much time just talking back and forth with each other and not actually making progress on the main task."
He draws a direct parallel to human organizations: "As companies get bigger, you have more communication overhead and less and less work is actually the people on the ground making progress on things."
A McKinsey report on generative AI productivity supports this — organizations that see the highest ROI from AI start with focused, simple deployments and scale incrementally, not the other way around.
The fix: Start with the simplest possible agent configuration. Use a single agent loop first. Only add sub-agents when you have proven that the single-agent approach cannot handle the workload or latency requirements. Every layer of complexity should justify its existence with measurable improvement.
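A "simplest possible configuration" often means one loop: call the model, execute the tool it requests, feed the result back, repeat until it answers or runs out of budget. The sketch below assumes a `call_model` function and a tool registry you supply; both are stand-ins, not a real SDK.

```python
# Minimal single-agent loop: one model, one tool registry, no sub-agents.
# `call_model` and the entries in `tools` are hypothetical stand-ins.

def run_agent(task: str, call_model, tools: dict, max_steps: int = 10):
    """Run one agent until it produces an answer or exhausts its step budget."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        # The model returns either {"tool": ..., "args": ...} or {"answer": ...}.
        action = call_model(history)
        if "answer" in action:
            return action["answer"]
        result = tools[action["tool"]](**action["args"])
        history.append({"role": "tool", "content": str(result)})
    return None  # budget exhausted: escalate to a human, or only now consider sub-agents
```

Only when a loop like this measurably fails on your workload is there a case for adding a second agent.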
One of the most counterintuitive insights from the Anthropic interview: the tools you give your agents should mirror your UI, not your API. Schluntz gives the example of Slack integration — if your API has three separate endpoints for loading a conversation, resolving a user ID, and resolving a channel ID, giving those three tools to an agent forces it to make three separate calls just to understand one message.
"You want to create a tool or an MCP for the model that presents everything all at once with as little interaction as possible," Schluntz explains. "Just like for a user, it would be terrible if every time you used Slack you had to click on a user ID to see what the name was."
The fix: Design your agent tools from the perspective of the agent as a user. Bundle related information together. Minimize the number of tool calls needed to accomplish a single logical task. Forrester research on AI agent architectures shows that reducing tool-call overhead by consolidating endpoints can improve agent task completion rates by 30-40%.
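The Slack example translates directly into code: wrap the three low-level lookups in one agent-facing tool that returns a fully resolved message. Everything below is a stub with in-memory data for illustration; the endpoint names are assumptions, not Slack's actual API.

```python
# Hypothetical low-level endpoints, stubbed with in-memory data for illustration.
USERS = {"U1": {"name": "Dana"}}
CHANNELS = {"C1": {"name": "support"}}
MESSAGES = {"M1": {"text": "Ticket escalated", "user_id": "U1", "channel_id": "C1"}}

def fetch_message(message_id):
    return MESSAGES[message_id]

def resolve_user(user_id):
    return USERS[user_id]

def resolve_channel(channel_id):
    return CHANNELS[channel_id]

def read_message(message_id: str) -> dict:
    """Agent-facing tool: one call returns the message with all IDs resolved."""
    msg = fetch_message(message_id)
    return {
        "text": msg["text"],
        "author": resolve_user(msg["user_id"])["name"],
        "channel": resolve_channel(msg["channel_id"])["name"],
    }
```

Exposing only `read_message` to the agent turns three round trips into one, which is exactly the consolidation the advice describes.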
Schluntz emphasizes a practice that most managers skip entirely: "Put yourself in Claude's shoes and read what it actually gets, what it sees as the model, and make sure there is actually enough information there for you to solve the problem."
Most people write prompts and design agent systems from their own perspective — they know the full context, the business logic, the unstated assumptions. But the agent only sees what you explicitly provide. The gap between what you know and what the agent sees is where most failures originate.
The fix: Regularly review the raw transcripts of your agent interactions. Look at the actual tool calls, the data returned, the prompts received. If you cannot solve the problem with only the information the agent has, neither can the agent. This practice — reviewing from the agent's perspective — is what separates effective agent squad operators from those who blame the AI for poor results.
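Reviewing "what the agent actually sees" is easier when the transcript is flattened into a single readable dump, in the exact order the agent received it. The event schema below is a hypothetical example of what an agent framework might log, not a standard format.

```python
def render_agent_view(transcript: list[dict]) -> str:
    """Flatten a transcript into the text the agent received, in order."""
    lines = []
    for event in transcript:
        if event["type"] == "prompt":
            lines.append(f"[PROMPT] {event['content']}")
        elif event["type"] == "tool_call":
            lines.append(f"[TOOL CALL] {event['name']}({event['args']})")
        elif event["type"] == "tool_result":
            lines.append(f"[TOOL RESULT] {event['content']}")
    return "\n".join(lines)
```

The test of a good transcript is the one Schluntz describes: read this output and ask whether you, with only this information, could solve the problem.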
Perhaps the most critical insight from the interview: the future of effective agents lies in self-verification. Schluntz describes the current state as one where "I have to be Claude's QA engineer" — the human manager is responsible for checking every output.
The evolution, already underway, is agents that can verify their own work. A coding agent that writes a web application and then opens it, tests it, and finds its own bugs before presenting the result. This "closing the loop of testing" is what transforms agents from unreliable assistants into autonomous team members.
The fix: Build verification steps into your agent workflows. After an agent produces output, have it (or a separate verification agent) check the work against defined criteria. Gartner's analysis of agentic AI shows that organizations implementing automated verification loops see a 60% reduction in agent error rates compared to those relying solely on human review.
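The produce-verify-retry pattern can be expressed as a small wrapper: generate output, check it against your criteria, and feed failure reasons back as context for the next attempt. `produce` and `verify` here are placeholders for an agent call and a checker (human-defined rules or a separate verification agent).

```python
def run_with_verification(produce, verify, max_attempts: int = 3):
    """Produce output, verify it against defined criteria, retry with feedback."""
    feedback = None
    for _ in range(max_attempts):
        output = produce(feedback)        # feedback from the last failed check, if any
        ok, feedback = verify(output)     # (passed?, reason for failure)
        if ok:
            return output
    raise RuntimeError(f"Verification failed after {max_attempts} attempts: {feedback}")
```

This is the "closing the loop of testing" idea in miniature: the human reviews only work that has already passed its own checks.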
The throughline of all these mistakes is a single insight: managing AI agents is management. The skills that make someone an effective people manager — clear communication, appropriate context-sharing, smart delegation, verification, and iterative improvement — are exactly the skills that make someone effective at running an AI agent squad.
This is not a technical problem. It is a leadership problem. And organizations that treat AI agent deployment as purely an engineering challenge will consistently underperform those that approach it as a management discipline.
At Agent Squad, this principle is foundational. The platform is designed around the reality that the quality of AI agent output is directly proportional to the quality of human delegation. The tools, the workflows, the monitoring — everything exists to make managers better at the one thing that matters most: giving their agent squad what it needs to succeed.
Most AI agent failures stem from delegation problems, not model limitations. According to Anthropic research, agents fail when they receive incomplete instructions, lack sufficient context, or operate without verification loops — the same reasons human employees underperform when poorly managed.
Design tools from the agent's perspective as a user, not from your API structure. Bundle related information into single tool calls. The goal is to minimize the number of interactions needed for the agent to understand and complete a logical task.
Start with a single agent and only add complexity when proven necessary. Multi-agent systems excel at parallelizable tasks, context-heavy workloads that benefit from delegation, and scenarios requiring specialized tool sets. Avoid multi-agent architectures for problems a single agent can solve — the communication overhead rarely justifies itself.
Build verification loops directly into your agent workflows. Have agents check their own outputs against defined success criteria, or use a separate verification agent. Review raw interaction transcripts regularly to ensure agents are receiving sufficient context to produce quality work.