Deep Expertise Track · Lesson 2

The ReAct Loop

Building an agent from scratch with zero frameworks — the Think-Act-Observe loop

What you'll learn

What ReAct (Reason + Act) actually is — from the original 2022 paper
How to build a working agent in 50 lines of Python with NO framework
Why the loop is the defining feature of an agent (not the tools, not the LLM)
The 3 failure modes of the ReAct loop and how to guard against them

What is ReAct?

ReAct (pronounced "ree-act") was introduced in a 2022 paper by Yao et al., a collaboration between Princeton University and Google Research. The name is a portmanteau of Reasoning + Acting. The core insight is simple but powerful:

The ReAct Insight

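Reasoning alone (chain-of-thought) lets an LLM plan, but it can't take actions or check its claims against the outside world, so mistakes and hallucinations compound. Acting alone (calling tools step by step) gets real feedback, but without explicit reasoning the model struggles to plan, track progress, or recover from surprises. Combine both in a loop: reason about what to do, take an action (call a tool), observe the result, reason again. The result is an agent that is more capable than either approach alone.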

Source: Yao et al., "ReAct: Synergizing Reasoning and Acting in Language Models" (2022)

The ReAct Loop Visualized

┌──────────────────────────────────────────────────────────┐
│                      THE REACT LOOP                      │
│                                                          │
│  ┌─────────────┐                                         │
│  │   THOUGHT   │  "I need to find the stock price"       │
│  └──────┬──────┘                                         │
│         ▼                                                │
│  ┌─────────────┐                                         │
│  │   ACTION    │  get_stock_price("SBIN")                │
│  └──────┬──────┘                                         │
│         ▼                                                │
│  ┌─────────────┐                                         │
│  │ OBSERVATION │  "SBIN current price: ₹1,054"           │
│  └──────┬──────┘                                         │
│         ▼                                                │
│  ┌─────────────┐                                         │
│  │   THOUGHT   │  "Got the price. Now I need financials" │
│  └──────┬──────┘                                         │
│         ▼                                                │
│  ┌─────────────┐                                         │
│  │   ACTION    │  get_financials("SBIN", "quarterly")    │
│  └──────┬──────┘                                         │
│         ▼                                                │
│  ┌─────────────┐                                         │
│  │ OBSERVATION │  "Revenue up 18%, NIM compressed..."    │
│  └──────┬──────┘                                         │
│         ▼                                                │
│  ┌─────────────┐                                         │
│  │   THOUGHT   │  "I have enough data to answer"         │
│  └──────┬──────┘                                         │
│         ▼                                                │
│  ┌─────────────┐                                         │
│  │    FINAL    │  "HOLD SBIN. Revenue strong but..."     │
│  │   ANSWER    │                                         │
│  └─────────────┘                                         │
│                                                          │
│  The loop runs N times. The LLM decides when to stop.    │
└──────────────────────────────────────────────────────────┘

Build It From Scratch (No Framework)

Anthropic's guidance in "Building Effective Agents" is to start by using LLM APIs directly, since many patterns can be implemented in a few lines of code. Let's prove it. Here's a working ReAct agent in plain Python; the loop itself is about 50 lines. It uses only the OpenAI SDK, pointed at DeepSeek's OpenAI-compatible endpoint:

Source: Anthropic — Building Effective Agents

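import os
from openai import OpenAI

# DeepSeek exposes an OpenAI-compatible API, so the OpenAI SDK works unchanged.
# Read the key from the environment instead of hardcoding it.
client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"], base_url="https://api.deepseek.com")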

# --- Define tools as plain Python functions ---
def get_stock_price(ticker: str) -> str:
    """Get current stock price"""
    return f"{ticker} is at ₹1,054"

def get_financials(ticker: str) -> str:
    """Get quarterly financials"""
    return f"{ticker}: Revenue ₹85,000Cr, Net Profit ₹17,000Cr, NIM 3.2%"

def search_news(query: str) -> str:
    """Search recent news (stub: ignores the query and returns canned text)"""
    return "RBI may cut rates next quarter. Positive for banks."

# Registry: tool name → function
TOOLS = {
    "get_stock_price": get_stock_price,
    "get_financials": get_financials,
    "search_news": search_news,
}

# --- The ReAct Loop ---
def run_agent(goal: str, max_iterations: int = 10):
    """The simplest possible agent. No framework. Just a loop."""
    
    # The system prompt tells the LLM HOW to reason
    system_prompt = f"""You are a stock research agent.
Available tools: {list(TOOLS.keys())}

To use a tool, output EXACTLY this format:
Thought: your reasoning about what to do next
Action: tool_name
Action Input: the argument to pass to the tool

When you have enough information, output:
Thought: I now have enough information
Final Answer: your complete answer"""

    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": goal},
    ]

    for i in range(max_iterations):
        print(f"\n--- Iteration {i+1} ---")
        
        # 1. Call the LLM
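        response = client.chat.completions.create(
            model="deepseek-chat",
            messages=messages,
            temperature=0,
            stop=["Observation:"],  # keep the model from inventing its own observations
        )
        output = response.choices[0].message.content or ""  # content can be None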
        print(output)
        
        # 2. Check if the LLM gave a final answer
        if "Final Answer:" in output:
            # Split once so a repeated "Final Answer:" inside the answer is kept
            return output.split("Final Answer:", 1)[1].strip()
        
        # 3. Parse the action from the LLM output
        if "Action:" not in output:
            messages.append({"role": "assistant", "content": output})
            messages.append({"role": "user", "content": "Please use a tool or give a Final Answer."})
            continue
        
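        # Extract tool name and input; tolerate indentation and missing lines
        lines = [l.strip() for l in output.split("\n")]
        action_lines = [l for l in lines if l.startswith("Action:")]
        input_lines = [l for l in lines if l.startswith("Action Input:")]
        if not action_lines or not input_lines:
            # Malformed step: ask for a retry instead of crashing with IndexError
            messages.append({"role": "assistant", "content": output})
            messages.append({"role": "user", "content": "Format error: give both Action and Action Input."})
            continue
        action = action_lines[0].split("Action:", 1)[1].strip()
        action_input = input_lines[0].split("Action Input:", 1)[1].strip()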
        
        # 4. Call the tool and get observation
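        if action in TOOLS:
            try:
                observation = TOOLS[action](action_input)
            except Exception as exc:
                # Report tool failures to the LLM instead of crashing the loop
                observation = f"Error: {action} failed: {exc}"
        else:
            observation = f"Error: unknown tool '{action}'"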
        
        print(f"Observation: {observation}")
        
        # 5. Feed observation back to the LLM (THE LOOP)
        messages.append({"role": "assistant", "content": output})
        messages.append({"role": "user", "content": f"Observation: {observation}"})
    
    return "Max iterations reached without a final answer."

# --- Run it ---
if __name__ == "__main__":
    result = run_agent("Should I hold or sell SBIN?")
    print(f"\n=== RESULT ===\n{result}")

What's Happening in Each Iteration

ITERATION 1:
  Messages:   [system, user(goal)]
  LLM output: "Thought: I need the stock price. Action: get_stock_price Action Input: SBIN"
  Parser:     extracts action=get_stock_price, input=SBIN
  Tool call:  get_stock_price("SBIN") → "SBIN is at ₹1,054"
  Messages:   [system, user(goal), assistant(thought+action), user(observation)]

ITERATION 2:
  Messages:   [system, user(goal), assistant(thought+action), user(observation)]
  LLM output: "Thought: Got price. Need financials. Action: get_financials Action Input: SBIN"
  Parser:     extracts action=get_financials, input=SBIN
  Tool call:  get_financials("SBIN") → "SBIN: Revenue ₹85,000Cr..."
  Messages:   [..., assistant(thought+action), user(observation)]

ITERATION 3:
  LLM output: "Thought: Need recent news. Action: search_news Action Input: SBI bank news"
  Tool call:  search_news(...) → "RBI may cut rates..."

ITERATION 4:
  LLM output: "Thought: I have enough info. Final Answer: HOLD SBIN. Price ₹1,054, revenue ₹85,000Cr, positive RBI news..."
  Parser:     sees "Final Answer:" → returns answer, loop ends
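The trace above can be replayed offline by swapping the API call for a scripted stand-in. This is a minimal sketch under stated assumptions, not the lesson's agent: `scripted_llm`, `run_loop`, and the canned `SCRIPT` replies are hypothetical names introduced here, the parsing is deliberately naive, and the stub tools return the same fixed strings as the lesson's stubs.

```python
from typing import Callable, Dict, List

# Stub tools with canned outputs (same values as the lesson's stubs).
TOOLS: Dict[str, Callable[[str], str]] = {
    "get_stock_price": lambda t: f"{t} is at ₹1,054",
    "get_financials": lambda t: f"{t}: Revenue ₹85,000Cr, Net Profit ₹17,000Cr, NIM 3.2%",
}

# Hypothetical canned replies standing in for the model, one per iteration.
SCRIPT = [
    "Thought: I need the stock price.\nAction: get_stock_price\nAction Input: SBIN",
    "Thought: Got price. Need financials.\nAction: get_financials\nAction Input: SBIN",
    "Thought: I have enough info.\nFinal Answer: HOLD SBIN.",
]

def scripted_llm(messages: List[dict]) -> str:
    """Stand-in for the API call: picks the reply by counting assistant turns."""
    turns_so_far = sum(1 for m in messages if m["role"] == "assistant")
    return SCRIPT[turns_so_far]

def run_loop(goal: str, llm: Callable[[List[dict]], str], max_iterations: int = 10):
    """Same Think-Act-Observe loop, with the LLM passed in as a function."""
    messages = [
        {"role": "system", "content": "(prompt omitted)"},
        {"role": "user", "content": goal},
    ]
    for _ in range(max_iterations):
        output = llm(messages)
        if "Final Answer:" in output:
            return output.split("Final Answer:", 1)[1].strip(), messages
        # Naive parse: every scripted line has the form "Key: value".
        fields = dict(line.split(": ", 1) for line in output.splitlines())
        observation = TOOLS[fields["Action"]](fields["Action Input"])
        messages.append({"role": "assistant", "content": output})
        messages.append({"role": "user", "content": f"Observation: {observation}"})
    return None, messages

answer, history = run_loop("Should I hold or sell SBIN?", scripted_llm)
print(answer)
print([m["role"] for m in history])
```

Passing the LLM in as a function is a design choice for this sketch only: it lets you watch the message list grow by one assistant/user pair per iteration without spending API calls.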

The 3 Failure Modes

Failure	What happens	Guard
Infinite loop	LLM keeps calling tools without ever giving a Final Answer	`max_iterations` cap. Always set this.
Parse failure	LLM doesn't follow the Thought/Action format, so the output can't be parsed.	Send a correction message ("please use the correct format") back to the LLM and continue the loop. LangChain's equivalent is `handle_parsing_errors`.
Hallucinated tool	LLM calls a tool that doesn't exist	Tool registry check. Return error message, LLM retries with correct tool.

These are the exact same failure modes that LangChain's AgentExecutor handles for you. That's why frameworks exist — they solve these problems once so you don't have to. But now you know what's happening under the hood.
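The parse-failure and hallucinated-tool guards from the table can be pulled out into two small functions; the `max_iterations` guard is simply the bound on the `for` loop. The sketch below is one possible arrangement, not how any framework implements it: `Step`, `parse_step`, and `observe` are hypothetical names, and the regexes assume the Thought/Action/Action Input format from this lesson's prompt.

```python
import re
from typing import Callable, Dict, NamedTuple, Optional

class Step(NamedTuple):
    kind: str                   # "final", "action", or "parse_error"
    tool: Optional[str] = None
    arg: Optional[str] = None
    text: Optional[str] = None

# [ \t]* rather than \s* so a match never spills onto the next line.
ACTION_RE = re.compile(r"^[ \t]*Action:[ \t]*(.+?)[ \t]*$", re.MULTILINE)
INPUT_RE = re.compile(r"^[ \t]*Action Input:[ \t]*(.*?)[ \t]*$", re.MULTILINE)

def parse_step(output: str) -> Step:
    """Classify one model reply; never raises on malformed text."""
    if "Final Answer:" in output:
        return Step("final", text=output.split("Final Answer:", 1)[1].strip())
    action, arg = ACTION_RE.search(output), INPUT_RE.search(output)
    if action is None or arg is None:
        return Step("parse_error")
    return Step("action", tool=action.group(1), arg=arg.group(1))

def observe(step: Step, tools: Dict[str, Callable[[str], str]]) -> str:
    """Turn a non-final step into the Observation text fed back to the model."""
    if step.kind == "parse_error":
        return "Format error: reply with Action and Action Input, or a Final Answer."
    if step.tool not in tools:
        return f"Error: unknown tool '{step.tool}'. Available: {sorted(tools)}"
    return tools[step.tool](step.arg)

tools = {"get_stock_price": lambda t: f"{t} is at ₹1,054"}
replies = [
    "Thought: need price\n  Action: get_stock_price\n  Action Input: SBIN",
    "I think the price is probably high.",                  # parse failure
    "Thought: weather?\nAction: get_weather\nAction Input: Mumbai",  # hallucinated tool
    "Thought: done\nFinal Answer: HOLD",
]
for reply in replies:
    step = parse_step(reply)
    print(step.kind, "->", step.text if step.kind == "final" else observe(step, tools))
```

In every failure case the loop gets back a string it can send to the model as an observation, so a bad reply costs one iteration instead of crashing the agent.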

The one-sentence summary

ReAct is a loop where the LLM reasons (Thought), picks a tool (Action), sees the result (Observation), and repeats until it has enough to answer — and you can build it in 50 lines with no framework.

Practice Drill

Create a new file react_from_scratch.py in your ba-work-agent project
Copy the code above and add your DEEPSEEK_API_KEY
Run it: python react_from_scratch.py
Watch the loop execute. Count how many iterations it takes.
Now change the goal to something vague like "Tell me about SBIN" — does the agent handle it differently?
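Try removing a tool from the TOOLS dict. Because the prompt's tool list is built from TOOLS, the model usually won't ask for a tool it was never told about. To test recovery, keep the removed tool's name in the system prompt and see whether the agent recovers from the "unknown tool" error.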

⚡ Quick Check

Q1: In the ReAct loop, what triggers the loop to end?

The LLM outputs "Final Answer:" — the parser detects this and returns the answer, breaking the loop. If the LLM never does this, max_iterations is the safety net.

Q2: Why does the observation get sent back as a "user" message, not an "assistant" message?

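Because the observation is the TOOL's output, not the LLM's output. The LLM is the assistant; tools are external, so in a plain chat transcript their results arrive from outside, in the user slot. The alternation (assistant → user → assistant) also keeps the conversation well-formed. Some chat APIs reject or mishandle consecutive messages from the same role, although not every chat completions API strictly enforces alternation. APIs with native tool calling provide a dedicated "tool" role for this purpose.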
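For comparison, here is how the same exchange looks as message data in the two styles. This is an illustrative sketch only: the second list follows the OpenAI-style tool-calling message shape, `"call_1"` is a made-up id, and neither list is sent to any API here.

```python
import json

observation = "SBIN is at ₹1,054"

# Text-parsed ReAct (this lesson): the observation goes back as a user turn.
react_turns = [
    {"role": "assistant",
     "content": "Thought: I need the price.\nAction: get_stock_price\nAction Input: SBIN"},
    {"role": "user", "content": f"Observation: {observation}"},
]

# Native tool calling (OpenAI-style chat format): the tool result has its own
# role and is linked to the request through tool_call_id.
native_turns = [
    {"role": "assistant", "content": None,
     "tool_calls": [{"id": "call_1", "type": "function",
                     "function": {"name": "get_stock_price",
                                  "arguments": json.dumps({"ticker": "SBIN"})}}]},
    {"role": "tool", "tool_call_id": "call_1", "content": observation},
]

print([m["role"] for m in react_turns])
print([m["role"] for m in native_turns])
```

The from-scratch agent uses the first shape because it needs nothing beyond plain chat messages; the model's "tool call" is just text that your parser interprets.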

Q3: What's the difference between this from-scratch agent and your LangChain ba-work-agent?

Functionally equivalent at the core. LangChain's create_react_agent + AgentExecutor runs the same loop: parse LLM output, extract action, call tool, feed observation back. The difference is that LangChain also handles (1) parsing errors, (2) tool registration with schemas, (3) verbose logging, (4) async support, and (5) streaming. With the framework, you pay an abstraction tax in exchange for that convenience.

Where to Go Deeper

ReAct Paper (Yao et al., 2022) — Read sections 1-3 for the theoretical foundation.
IBM: What is a ReAct Agent? — Accessible overview for quick reference.
Anthropic: Building Effective Agents — The "Agents" section at the bottom covers the autonomous agent loop.
