Deep Expertise Track · Lesson 4

Tool Design for Agents

Agent-Computer Interface design, poka-yoke, and why tool design matters

What you'll learn
  1. What ACI (Agent-Computer Interface) is and why Anthropic says it's as important as HCI
  2. The 5 principles of good tool design for agents
  3. How to poka-yoke (mistake-proof) your tools so the LLM is far less likely to misuse them
  4. Why Anthropic spent more time optimizing tools than the overall prompt for SWE-bench

ACI: The New HCI

Anthropic's team, while building their coding agent for SWE-bench, discovered something counterintuitive:

Anthropic's Discovery

"While building our agent for SWE-bench, we actually spent more time optimizing our tools than the overall prompt."

"One rule of thumb: think about how much effort goes into human-computer interfaces (HCI), and plan to invest just as much effort in creating good agent-computer interfaces (ACI)."

Source: Anthropic — Building Effective Agents, Appendix 2

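In other words: your tool definitions are part of your prompt. The model sees each tool's name, description, and parameters in its context, right alongside your instructions. A well-designed tool is self-documenting. A poorly designed tool causes mistakes that prompt engineering elsewhere is unlikely to fix.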

The 5 Principles of Good Tool Design

┌────────────────────────────────────────────────────────────┐
│                   GOOD TOOL vs BAD TOOL                    │
│                                                            │
│  BAD:                          GOOD:                       │
│  ┌──────────────────────┐      ┌──────────────────────┐    │
│  │ @tool                │      │ @tool                │    │
│  │ def get_data(t):     │      │ def get_stock_price( │    │
│  │   """Returns data""" │      │     ticker: str      │    │
│  │   ...                │      │ ) -> str:            │    │
│  └──────────────────────┘      │   """Get current     │    │
│                                │   stock price. Use   │    │
│  Problems:                     │   this FIRST before  │    │
│  - What data? Stock? News?     │   any analysis.      │    │
│  - What format for t?          │   Pass NSE symbol    │    │
│  - When to use this?           │   like 'SBIN'."""    │    │
│  - LLM will guess wrong        │   ...                │    │
│                                └──────────────────────┘    │
│                                                            │
│  LLM sees: "get_data"           LLM sees: "get_stock_price"│
│  LLM guesses what it does       LLM knows exactly what +   │
│  Probably calls it wrong        when + how to call it      │
└────────────────────────────────────────────────────────────┘

Principle 1: Descriptive Names and Docstrings

The LLM reads the tool name + docstring to decide whether to call it. Write for the LLM, not for yourself.

Bad:  def get_data(t)
      "Returns data"
Good: def get_stock_price(ticker: str)
      "Get current stock price. Use this FIRST before analysis. Pass NSE symbol like 'SBIN'."

Bad:  def process(d)
      "Process input"
Good: def count_by_priority(file_path: str)
      "Count JIRA tickets by priority from CSV. Use after read_jira_export."

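To make concrete what the model actually reads, here is a minimal sketch that turns a Python function into a name-plus-description record of the kind a tool-calling framework might pass to the LLM. The describe_tool helper and the dictionary layout are assumptions made for illustration, not any specific framework's API, and the function body is a placeholder.

```python
import inspect


def get_stock_price(ticker: str) -> str:
    """Get current stock price. Use this FIRST before analysis. Pass NSE symbol like 'SBIN'."""
    # Placeholder body: a real tool would query a market data source here.
    return f"{ticker}: price lookup not implemented in this sketch"


def describe_tool(func) -> dict:
    """Collect the parts of a function an LLM typically gets to see (hypothetical helper)."""
    signature = inspect.signature(func)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            name: getattr(param.annotation, "__name__", str(param.annotation))
            for name, param in signature.parameters.items()
        },
    }


spec = describe_tool(get_stock_price)
print(spec["name"])
print(spec["description"])
print(spec["parameters"])
```

Try the same helper on def get_data(t) with the docstring "Returns data": almost nothing useful comes out, which is roughly all the LLM would have to go on.
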
Principle 2: Poka-yoke (Mistake-Proofing)

Anthropic found that the model made mistakes with tools that took relative file paths once the agent had moved out of the root directory. Their fix was to make the tool always require absolute paths, after which the model used it flawlessly.

# BAD: the LLM might pass "tickets.csv", "./data/tickets.csv", or "data/tickets.csv"
def read_file(path: str) -> str:
    """Read a file"""
    ...

# GOOD: the parameter name and the docstring both steer the LLM toward a full path
def read_file(absolute_path: str) -> str:
    """Read a file. MUST provide absolute path like '/home/user/data/tickets.csv'."""
    ...

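A docstring can only ask for an absolute path; the tool itself can also check for one. The sketch below is one possible way to do that, assuming your framework passes the tool's return string back to the model, so an error message doubles as a correction hint. The error wording and the temporary sample file are made up for the demo.

```python
import os
import tempfile


def read_file(absolute_path: str) -> str:
    """Read a text file. MUST provide absolute path like '/home/user/data/tickets.csv'."""
    # Reject relative paths up front and tell the model how to fix the call.
    if not os.path.isabs(absolute_path):
        return (
            f"ERROR: '{absolute_path}' is a relative path. "
            "Call read_file again with an absolute path."
        )
    if not os.path.isfile(absolute_path):
        return f"ERROR: no file found at '{absolute_path}'."
    with open(absolute_path, encoding="utf-8") as handle:
        return handle.read()


# Usage: a relative path is rejected, an absolute path is read.
print(read_file("data/tickets.csv"))

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.abspath(os.path.join(tmp_dir, "tickets.csv"))
    with open(sample_path, "w", encoding="utf-8") as handle:
        handle.write("id,priority\n1,High\n")
    print(read_file(sample_path))
```

Returning a readable error instead of raising is a design choice, not a rule: it gives the LLM a chance to retry with a corrected argument on the next step.
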
Principle 3: Give the LLM Room to Think

"Give the model enough tokens to think before it writes itself into a corner."

Don't constrain the output format so tightly that the LLM can't reason. For example, JSON output that needs heavy escaping (newlines and quotes inside strings) is harder for an LLM to get right than plain text or simple JSON.
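
To see the escaping cost for yourself, the sketch below writes the same two-line code snippet in both forms: as a JSON string value and inside a Markdown code fence. The snippet and the "code" key are arbitrary examples chosen for illustration.

```python
import json

snippet = 'def greet(name):\n    print(f"Hello, {name}!")\n'

# Option A: the model must emit the code as a JSON string value.
as_json = json.dumps({"code": snippet})

# Option B: the model writes the code as-is inside a Markdown fence.
fence = "`" * 3
as_markdown = f"{fence}python\n{snippet}{fence}\n"

print(as_json)
print(as_markdown)

# Count the backslash escapes each format needs for this snippet.
print("backslashes in JSON form:", as_json.count("\\"))
print("backslashes in Markdown form:", as_markdown.count("\\"))
```

Every newline and every double quote in the JSON form has to be escaped correctly by the model, token by token, while the Markdown form is the code exactly as it would appear in a file.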

Principle 4: Keep Formats Natural

"Keep the format close to what the model has seen naturally occurring in text on the internet."

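Markdown is usually easier for an LLM to produce correctly than a custom JSON schema, and a plain text description is usually easier than deeply structured XML. Also avoid formats with bookkeeping overhead, such as keeping an accurate count of lines or string-escaping code.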

Principle 5: Test Your Tools With Real Inputs

"Run many example inputs to see what mistakes the model makes, and iterate."

This is the same as testing a UI with real users. You can't predict how the LLM will (mis)use your tools until you watch it try.
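
One lightweight way to start iterating, sketched under assumptions: collect the arguments the model actually passed to a tool and tally which rules they break. The logged_calls below are invented samples, not real model output, and check_read_file_call is a hypothetical checker for the read_file tool from Principle 2.

```python
import posixpath
from collections import Counter

# Invented examples of arguments a model might pass to read_file; replace with real logs.
logged_calls = [
    {"absolute_path": "/home/user/data/tickets.csv"},
    {"absolute_path": "data/tickets.csv"},
    {"path": "/home/user/data/tickets.csv"},
    {"absolute_path": "./tickets.csv"},
]


def check_read_file_call(arguments: dict) -> str:
    """Classify one logged call as 'ok' or name the mistake (hypothetical checker)."""
    if set(arguments) != {"absolute_path"}:
        return "wrong parameter name"
    # posixpath keeps the check platform-independent for the POSIX-style paths used here.
    if not posixpath.isabs(arguments["absolute_path"]):
        return "relative path"
    return "ok"


mistake_counts = Counter(check_read_file_call(call) for call in logged_calls)
for outcome, count in mistake_counts.most_common():
    print(f"{outcome}: {count}")
```

The most frequent mistake in the tally is usually the best candidate for the next docstring or parameter-name change.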

The one-sentence summary

Your tool descriptions are among the most important prompt engineering you'll do. Invest as much effort in ACI as you would in HCI, because the LLM decides which tool to call, and how, from the names, descriptions, and parameters you give it.

Practice Drill

  1. Open ba-work-agent/tools/jira_tools.py and stock-research-agent/tools/market_tools.py
  2. Rate each tool's docstring against the 5 principles. Which ones are good? Which need improvement?
  3. Rewrite one tool's docstring using the principles above. Add "Use this FIRST" or "Use after X" guidance.
  4. Run the agent with the improved docstring. Does it call the tools in a better order?
⚡ Quick Check
Q1: Anthropic says they spent MORE time on tool design than on the main prompt for SWE-bench. Why?

Because tools are how the agent interacts with the world. If a tool's description is ambiguous, the LLM calls it wrong, passes bad arguments, or doesn't call it at all. No amount of system prompt engineering fixes a poorly described tool. The tool description IS part of the prompt — the most important part.

Q2: What is poka-yoke in the context of agent tools?

Poka-yoke (Japanese for "mistake-proofing") means designing the tool so it is hard to use incorrectly. One example is requiring absolute file paths instead of relative ones, so the LLM can't get confused about which directory it's in. Another is naming a parameter ticker_symbol instead of just t, so the LLM knows to pass "SBIN" and not "State Bank of India".
