Deep Expertise Track · Lesson 8

Group Chat Orchestration

Multi-agent Pattern 4: agents collaborate through shared conversation

Group Chat Orchestration: Debate and Maker-Checker
What you'll learn

What group chat orchestration is (roundtable, multi-agent debate, council)

The maker-checker sub-pattern (generator-verifier loop, reflection)

How it maps to Anthropic's "Evaluator-Optimizer" workflow

Why Microsoft recommends limiting to 3 agents max

The Pattern

┌──────────────────────────────────────────────────────────────┐
│                   GROUP CHAT ORCHESTRATION                   │
│               (roundtable / debate / council)                │
│                                                              │
│   ┌────────────────────────────────────────────┐             │
│   │                CHAT MANAGER                │             │
│   │    (decides who speaks next, when done)    │             │
│   └────┬──────────┬──────────┬─────────────────┘             │
│        │          │          │                               │
│        ▼          ▼          ▼                               │
│    ┌────────┐ ┌────────┐ ┌────────┐                          │
│    │ Agent A│ │ Agent B│ │ Agent C│  (shared conversation)   │
│    │"env"   │ │"budget"│ │"comm"  │                          │
│    └───┬────┘ └───┬────┘ └───┬────┘                          │
│        │          │          │                               │
│        └──────────┴──────────┘                               │
│                   │                                          │
│                   ▼                                          │
│            ┌──────────────┐                                  │
│            │ Accumulating │ ──▶ Result (consensus)           │
│            │ chat thread  │                                  │
│            └──────────────┘                                  │
└──────────────────────────────────────────────────────────────┘

Source: Microsoft Azure — AI Agent Orchestration Patterns
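
The flow in the diagram can be sketched in plain Python without any LLM calls. This is a minimal illustration, not Microsoft's implementation: the `GroupChat` class, the round-robin speaker policy, and the "every agent's latest message contains AGREE" stop rule are all assumptions made for the example, and the agents are stubs standing in for real model calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Message:
    speaker: str
    text: str


# An agent is any callable that reads the shared thread and returns a reply.
Agent = Callable[[List[Message]], str]


@dataclass
class GroupChat:
    """Hypothetical chat manager: picks the next speaker and decides when to stop."""
    agents: Dict[str, Agent]
    max_turns: int = 6  # hard cap so the debate cannot run forever
    thread: List[Message] = field(default_factory=list)

    def next_speaker(self, turn: int) -> str:
        # Assumed policy: simple round-robin over the registered agents.
        names = list(self.agents)
        return names[turn % len(names)]

    def is_done(self) -> bool:
        # Assumed stop rule: every agent has spoken and its latest message says AGREE.
        latest = {m.speaker: m.text for m in self.thread if m.speaker in self.agents}
        return len(latest) == len(self.agents) and all(
            "AGREE" in text for text in latest.values()
        )

    def run(self, task: str) -> List[Message]:
        self.thread.append(Message("user", task))
        for turn in range(self.max_turns):
            name = self.next_speaker(turn)
            reply = self.agents[name](self.thread)  # agent sees the whole thread
            self.thread.append(Message(name, reply))
            if self.is_done():
                break
        return self.thread


def make_stub(concern: str, agree_after: int) -> Agent:
    """Stub agent: objects until the thread holds `agree_after` messages, then agrees."""
    def agent(thread: List[Message]) -> str:
        if len(thread) >= agree_after:
            return f"AGREE ({concern} concerns addressed)"
        return f"Concern about {concern}"
    return agent


chat = GroupChat(agents={
    "env": make_stub("environment", 1),
    "budget": make_stub("budget", 4),
    "comm": make_stub("community", 1),
})
thread = chat.run("Should the city build a new park?")
for m in thread:
    print(f"{m.speaker}: {m.text}")
```

Every agent receives the full accumulated thread on each turn, which is what distinguishes this pattern from plain task delegation. In a real system the round-robin policy would likely be replaced by a manager that chooses the next speaker based on the conversation so far.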

The Maker-Checker Sub-Pattern

Microsoft defines a specific type of group chat called maker-checker (also known as evaluator-optimizer, generator-verifier, or reflection loop):

MAKER-CHECKER LOOP (evaluator-optimizer):

┌──────────────┐                    ┌──────────────┐
│    MAKER     │──── draft ────────▶│   CHECKER    │
│ (generator)  │                    │ (evaluator)  │
└──────┬───────┘                    └──────┬───────┘
       ▲                                   │
       │                                   │
       └──── "revise these issues" ────────┘
       │
       ▼
┌──────────────┐
│    MAKER     │──── revised draft ──▶ CHECKER
└──────────────┘
       │
       ▼
      ... (repeat until checker approves or max iterations)

Source: Anthropic — Building Effective Agents (Evaluator-Optimizer section)
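
Stripped of any model calls, the loop reduces to a few lines. The sketch below is one possible shape, under stated assumptions: the `Review` dataclass, the `maker_checker` function, and both stub functions are hypothetical names invented for illustration, and the "checker" is a keyword test rather than an LLM.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Review:
    approved: bool
    issues: List[str]


def maker_checker(
    make: Callable[[str, List[str]], str],
    check: Callable[[str], Review],
    max_rounds: int = 3,
) -> Tuple[str, int, bool]:
    """Run generate -> evaluate -> revise until approval or the round cap.

    Returns (final draft, rounds used, whether the checker approved).
    """
    draft: str = ""
    issues: List[str] = []
    for round_num in range(1, max_rounds + 1):
        draft = make(draft, issues)   # maker writes or revises
        review = check(draft)         # checker evaluates the new draft
        if review.approved:
            return draft, round_num, True
        issues = review.issues        # feedback goes into the next round
    return draft, max_rounds, False


# Stubs standing in for LLM calls (assumptions for the example).
REQUIRED = ["scope", "edge cases"]


def stub_maker(draft: str, issues: List[str]) -> str:
    if not draft:
        return "Login page draft."
    return draft + " " + " ".join(f"Covers {issue}." for issue in issues)


def stub_checker(draft: str) -> Review:
    missing = [term for term in REQUIRED if term not in draft]
    return Review(approved=not missing, issues=missing)


final, rounds, approved = maker_checker(stub_maker, stub_checker)
print(rounds, approved)

# A checker that never approves shows why the round cap matters.
_, capped_rounds, capped_ok = maker_checker(
    stub_maker, lambda d: Review(False, ["too vague"]), max_rounds=3
)
print(capped_rounds, capped_ok)
```

Returning the approval flag alongside the draft lets the caller tell an approved draft apart from one that was simply the last attempt before the cap.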

When to Use vs Avoid

Use when	Avoid when
Consensus-building needed	Basic task delegation is sufficient
Quality control via debate	Real-time processing (chat is slow)
Multidisciplinary discussion	Deterministic workflow without discussion
Iterative refinement (maker-checker)	No clear way to determine completion

Microsoft's Recommendation

"To maintain effective control, consider limiting group chat orchestration to three or fewer agents." Each additional agent makes the conversation flow harder to manage and infinite loops harder to prevent.

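One way to act on this recommendation is a small guard object that the chat manager consults after every message. The sketch below is an assumption-laden illustration: `ChatGuard`, its parameters, and the "same speaker repeats the same text" loop heuristic are hypothetical and not part of any Microsoft or LangChain API.

```python
import warnings
from collections import Counter
from typing import List, Optional

MAX_RECOMMENDED_AGENTS = 3  # the "three or fewer" guidance quoted above


class ChatGuard:
    """Hypothetical helper: caps total turns and flags repeated messages."""

    def __init__(self, agent_names: List[str], max_turns: int = 9, max_repeats: int = 2):
        if len(agent_names) > MAX_RECOMMENDED_AGENTS:
            warnings.warn(
                f"{len(agent_names)} agents configured; "
                f"{MAX_RECOMMENDED_AGENTS} or fewer is recommended for group chat"
            )
        self.max_turns = max_turns
        self.max_repeats = max_repeats
        self.turns = 0
        self.seen: Counter = Counter()

    def record(self, speaker: str, text: str) -> Optional[str]:
        """Register one message; return a stop reason, or None to continue."""
        self.turns += 1
        key = (speaker, text.strip().lower())
        self.seen[key] += 1
        if self.seen[key] > self.max_repeats:
            return f"loop detected: {speaker} repeated the same message"
        if self.turns >= self.max_turns:
            return "turn limit reached"
        return None


guard = ChatGuard(["env", "budget", "comm"], max_turns=9, max_repeats=2)
reason = None
messages = [
    ("env", "Protect the wetland"),
    ("budget", "Too expensive"),
    ("env", "Protect the wetland"),
    ("budget", "Too expensive"),
    ("env", "Protect the wetland"),
]
for speaker, text in messages:
    reason = guard.record(speaker, text)
    if reason:
        break
print(reason)
```

Exact-match repetition is a crude signal; real agents tend to rephrase, so the turn cap remains the dependable backstop.
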
Build It: Maker-Checker for BRD Quality

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
import os

llm = ChatOpenAI(model="deepseek-chat", api_key=os.getenv("DEEPSEEK_API_KEY"),
                 base_url="https://api.deepseek.com", temperature=0)

maker_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a BA document writer. Write or revise a BRD section based on "
               "the requirements and any feedback from the reviewer."),
    ("user", "Requirements: {requirements}\n\nPrevious draft: {draft}\n\n"
             "Reviewer feedback: {feedback}"),
])

checker_prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a strict BRD quality reviewer. Evaluate the draft against:
1. Are user stories in proper format (As a... I want... So that...)?
2. Are acceptance criteria testable (Given/When/Then)?
3. Is scope clearly defined?
4. Are edge cases covered?

If ALL criteria are met, respond: "APPROVED"
If issues remain, respond: "REVISION NEEDED: [list issues]"""),
    ("user", "{draft}"),
])

def run_maker_checker(requirements: str, max_rounds: int = 3):
    draft = ""
    feedback = "No previous draft. Write the first version."
    
    for round_num in range(max_rounds):
        print(f"\n--- Round {round_num + 1} ---")
        
        # Maker generates/revises
        print("Maker: Writing draft...")
        draft = (maker_prompt | llm).invoke({
            "requirements": requirements,
            "draft": draft or "(none yet)",
            "feedback": feedback,
        }).content
        
        # Checker evaluates
        print("Checker: Reviewing draft...")
        review = (checker_prompt | llm).invoke({"draft": draft}).content
        
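        # Match the verdict at the start of the reply, so feedback that merely
        # mentions the word (e.g. "not yet APPROVED") is not taken as approval.
        if review.strip().upper().startswith("APPROVED"):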
            print("Checker: APPROVED!")
            return draft
        
        print(f"Checker: {review[:150]}...")
        feedback = review
    
    print("Max rounds reached. Returning last draft.")
    return draft

result = run_maker_checker("Login page with OTP, forgot password, social login, and session timeout")
print(f"\n=== FINAL BRD ===\n{result}")
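
The loop above decides approval from the checker's free-text reply, which can be brittle when the model adds quotes or extra wording. A possible refinement is to parse the reply into a structured verdict first. The `Verdict` dataclass and `parse_review` helper below are hypothetical names, and treating any unrecognized reply as "not approved" is a design assumption rather than something the lesson prescribes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Verdict:
    approved: bool
    issues: List[str]


def parse_review(review: str) -> Verdict:
    """Turn the checker's reply into a Verdict based on its leading keyword."""
    # Drop surrounding whitespace and any quotes the model may echo back.
    text = review.strip().strip("\"'").strip()
    upper = text.upper()
    if upper.startswith("APPROVED"):
        return Verdict(True, [])
    if upper.startswith("REVISION NEEDED"):
        _, _, rest = text.partition(":")
        # One issue per non-empty line, with leading bullet characters removed.
        issues = [line.strip(" -*\t") for line in rest.splitlines()]
        return Verdict(False, [issue for issue in issues if issue])
    # Unrecognized format: fail closed and pass the raw text on as feedback.
    return Verdict(False, [text] if text else [])


print(parse_review("APPROVED"))
print(parse_review("REVISION NEEDED:\n- Scope unclear\n- No edge cases"))
print(parse_review("REVISION NEEDED: scope is not APPROVED by stakeholders"))
```

With a helper like this, the loop could branch on `verdict.approved` and feed `verdict.issues` back to the maker instead of the raw review text.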

The one-sentence summary

Group chat orchestration lets agents collaborate through a shared conversation — best for consensus-building and quality control, but keep it to 3 agents max and always set iteration caps to prevent infinite debate loops.

Practice Drill

Create ba-work-agent/maker_checker.py with the code above
Run it. How many rounds does it take to get APPROVED?
Make the checker stricter — add a 5th criterion. Does it take more rounds?
Try making the checker too strict (never approves). What happens at max_rounds?

⚡ Quick Check

Q1: What's the difference between maker-checker and the evaluator-optimizer pattern from Anthropic?

They're the same pattern under different names. Anthropic calls it "evaluator-optimizer" — one LLM generates, another evaluates and gives feedback, loop until good enough. Microsoft calls it "maker-checker." Same concept: generate → evaluate → revise → repeat.

Q2: Why does Microsoft recommend max 3 agents for group chat?

Conversation flow becomes unmanageable with more agents. The chat manager has to decide who speaks next, and with 5+ agents, turn-taking gets chaotic, loops become likely, and the conversation thread grows too long. 3 agents with distinct roles is the sweet spot.
