Deep Expertise Track · Lesson 11
Production Considerations
Cost, latency, state, security — what changes when agents go live
- The 4 production concerns: cost, latency, state management, security
- How multi-agent patterns multiply cost (and how to optimize)
- Context window management — the silent killer of production agents
- Security: least privilege, audit trails, guardrails at every step
The 4 Production Concerns
1. Cost
Microsoft's guidance: "Multi-agent orchestrations multiply model invocations. Each agent consumes tokens for instructions, context, reasoning, and tool interactions."
| Pattern | Cost profile | Optimization |
|---|---|---|
| Sequential | Linear (N agents × tokens each) | Use smaller models for simple steps |
| Concurrent | Spike (all agents at once) | Monitor API rate limits |
| Handoff | Variable (depends on routing) | Good triage agent prevents wrong routing |
| Magentic | Highest (manager iterates many times) | Hard to predict — monitor per-run cost |
Key optimization: "Not every agent requires the most capable model." Use a cheaper model (for example, `deepseek-chat` instead of `deepseek-reasoner`) for classification, formatting, and other simple tasks.
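To make the cost multiplication concrete, here is a minimal sketch that totals the cost of a sequential pipeline and compares an all-large-model setup with a mixed one. The model names, per-token prices, and token counts are hypothetical placeholders, not real provider rates; substitute your own numbers.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per 1M tokens. These are placeholders,
# not real provider rates.
PRICE_PER_MILLION = {"small-model": 0.5, "large-model": 2.0}


@dataclass
class Step:
    name: str
    model: str
    tokens: int  # assumed total tokens (prompt + completion) for this agent


def pipeline_cost(steps):
    """Sequential pattern: total cost is the sum of each agent's cost."""
    return sum(s.tokens * PRICE_PER_MILLION[s.model] / 1_000_000 for s in steps)


# Same three-agent pipeline, two model assignments (token counts are made up).
all_large = [
    Step("classify", "large-model", 2_000),
    Step("analyze", "large-model", 8_000),
    Step("format", "large-model", 2_000),
]
mixed = [
    Step("classify", "small-model", 2_000),
    Step("analyze", "large-model", 8_000),
    Step("format", "small-model", 2_000),
]

print(f"all large: ${pipeline_cost(all_large):.4f}")
print(f"mixed:     ${pipeline_cost(mixed):.4f}")
```

Only the reasoning-heavy step keeps the larger model here; the classification and formatting steps move to the cheaper one, which is the optimization the quote describes.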
2. Latency
- Sequential: latency = sum of all agent times.
- Concurrent: latency = max(all agent times) + aggregation time.
- Handoff: latency = sum of the agents in the chain.
- Magentic: unpredictable, and can be very slow.
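The latency formulas above can be sketched as two small helpers. The function names and the example timings are illustrative assumptions; a handoff chain uses the same sum as the sequential case, applied only to the agents actually visited.

```python
def sequential_latency(agent_seconds):
    """Sequential (and handoff chain): agents run one after another."""
    return sum(agent_seconds)


def concurrent_latency(agent_seconds, aggregation_seconds=0.0):
    """Concurrent: wait for the slowest agent, then aggregate the results."""
    return max(agent_seconds) + aggregation_seconds


# Illustrative per-agent timings in seconds (made-up values).
times = [1.5, 3.0, 2.0]

print(sequential_latency(times))                            # 6.5
print(concurrent_latency(times, aggregation_seconds=0.5))   # 3.5
```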
3. Context Window Management
"In multi-agent orchestrations, context windows can grow rapidly because each agent adds its own reasoning, tool results, and intermediate outputs. Monitor accumulated context size and use compaction techniques (summarization or selective pruning) between agents."
4. Security
Microsoft's security recommendations:
- Least privilege: Each agent should have minimum access needed. Don't give every agent database write access.
- Authentication: Secure communication between agents
- Audit trails: Log every tool call and handoff for compliance
- Guardrails at multiple points: User input, tool calls, tool responses, and final output — not just at the edges
- Security trimming: Agents with broad data access must not return data the user isn't authorized to see
Source: Microsoft Azure — AI Agent Orchestration Patterns
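Least privilege and audit trails can be combined in a single tool-dispatch wrapper. The sketch below assumes a per-agent allowlist and an in-memory log; the agent names, tool names, and `call_tool` helper are hypothetical, and a production system would write the log to durable storage instead of a list.

```python
import json
import time

# Hypothetical allowlist: each agent gets only the tools it needs.
ALLOWED_TOOLS = {
    "triage-agent": {"search_docs"},
    "billing-agent": {"search_docs", "read_invoice"},
}

# In-memory stand-in for a durable audit log.
AUDIT_LOG = []


class ToolNotPermitted(Exception):
    """Raised when an agent requests a tool outside its allowlist."""


def call_tool(agent, tool, args, registry):
    """Check the allowlist, record the attempt, then run the tool."""
    allowed = tool in ALLOWED_TOOLS.get(agent, set())
    # Log every attempt, including denied ones, before doing anything else.
    AUDIT_LOG.append(json.dumps({
        "ts": time.time(),
        "agent": agent,
        "tool": tool,
        "args": args,
        "allowed": allowed,
    }))
    if not allowed:
        raise ToolNotPermitted(f"{agent} may not call {tool}")
    return registry[tool](**args)


# Hypothetical tool implementations for illustration only.
registry = {
    "search_docs": lambda query: f"results for {query!r}",
    "read_invoice": lambda invoice_id: {"invoice_id": invoice_id, "total": 0},
}
```

With this setup, `call_tool("triage-agent", "search_docs", {"query": "refund"}, registry)` succeeds, while the same agent requesting `read_invoice` raises `ToolNotPermitted`; both attempts land in `AUDIT_LOG`.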
The one-sentence summary
Production agents need cost monitoring (use cheaper models for simple tasks), context compaction (summarize between agents), state persistence (external storage, not in-memory), and security at every layer (least privilege, audit trails, guardrails at each step).
Practice Drill
- Run your `ba-work-agent` and count the total tokens used (check the DeepSeek dashboard)
- Run it 5 times. What's the cost variance? Is it predictable?
- Add context summarization: after each tool call, summarize the observation to 100 tokens before feeding back
- Think about security: what if your agent had access to a production database? What guardrails would you add?
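For the cost-variance drill, a small helper like the following may be useful once you have recorded the token count of each run. The function name, the price argument, and the numbers in the usage lines are placeholders, not measurements; replace them with the figures from your own runs.

```python
import statistics


def summarize_runs(token_counts, usd_per_million_tokens):
    """Return mean cost, sample standard deviation, and coefficient of variation."""
    costs = [t * usd_per_million_tokens / 1_000_000 for t in token_counts]
    mean = statistics.mean(costs)
    stdev = statistics.stdev(costs)  # needs at least two runs
    return {"mean_usd": mean, "stdev_usd": stdev, "cv": stdev / mean}


# Placeholder token counts for 5 runs and a placeholder price.
placeholder_runs = [12_000, 15_500, 11_800, 19_200, 13_400]
print(summarize_runs(placeholder_runs, usd_per_million_tokens=1.0))
```

A coefficient of variation (`cv`) near zero suggests per-run cost is predictable; a large value suggests the number of model calls or the context size differs a lot between runs.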
Q1: Why does context grow in multi-agent systems, and what's the fix?
Each agent adds reasoning, tool outputs, and intermediate results. By agent 5-10, the accumulated context can exceed the model's window. Fix: summarize or selectively prune between agents — pass a compressed version instead of the full raw context.
Q2: Which orchestration pattern is the most expensive and why?
Magentic — the manager agent iterates many times, revising the plan and calling specialists repeatedly. Cost is hard to predict because the number of iterations depends on the problem. Microsoft notes it's "the most variable" in cost.