Multi-Agent Orchestration Using LangGraph

The first generation of LLM applications was a single model call wrapped in a UI. The second generation was a chain — retrieval, reasoning, output formatting — still mostly linear. The third generation, which is where most non-trivial production systems live now, is a graph: multiple specialized agents collaborating, branching on intermediate state, looping until a condition holds, and occasionally pausing for a human to intervene. LangGraph is one of a small number of frameworks designed for exactly this shape of computation, and at the time of writing it is the most widely adopted in the Python LLM ecosystem.

This post is an engineering view of LangGraph: what it actually is, how its execution model works, where it fits in the orchestration landscape, and what the production concerns look like.

Why Graphs, Not Chains

Chains (the LangChain Runnable model) are directed acyclic pipelines: input flows in, output flows out, each step runs once. They are correct and sufficient for a large class of LLM applications — summarization, classification, single-shot RAG.

They break down as soon as you need any of:

Conditional branching. “If the retrieved docs are insufficient, run a follow-up search.”
Cycles and retries. “Plan, execute, check the result, replan if it failed.”
Multiple agents. “Route the request to the right specialist agent, then merge results.”
Human-in-the-loop. “Pause here, wait for human approval, then continue from this state.”
Stateful long-running workflows. “This conversation needs to remember 50 turns of decisions.”

LangGraph models these as a directed graph (cycles allowed) over a typed state object. Nodes mutate state; edges (static or conditional) decide what runs next. The graph is the program; the state is the memory.

The State Model

LangGraph’s central abstraction is the state: a TypedDict (or Pydantic model) that flows through every node. Nodes return partial updates; the framework merges them according to per-key reducers.

from typing import Annotated, TypedDict
from operator import add
from langgraph.graph.message import add_messages

class AgentState(TypedDict):
    messages: Annotated[list, add_messages]
    plan: list[str]
    completed_steps: Annotated[list[str], add]
    final_answer: str | None

add_messages is the canonical reducer for conversation history — it appends, deduplicates by ID, and supports message updates. operator.add works for plain list accumulation. The default reducer (when none is specified) replaces the value.

Two design rules earn their keep:

Make every node’s contract explicit in the state schema. If a node produces tool_results, it should be a typed field. Magic strings stuffed into a dict are the orchestration equivalent of **kwargs everywhere.
Keep state serializable. Checkpointing requires it. Avoid putting active DB connections, file handles, or open streams into state — keep those in module-level singletons or context.

Nodes, Edges, and Conditional Routing

A node is any callable (state) -> partial_state. It can be a single LLM call, a tool invocation, a sub-agent, or arbitrary Python. Edges connect nodes; conditional edges evaluate a function on state and return the name of the next node.

from langgraph.graph import StateGraph, START, END

def router(state: AgentState) -> AgentState:
    intent = classify(state["messages"][-1].content)
    return {"plan": [intent]}

def code_agent(state: AgentState) -> AgentState: ...
def analyst_agent(state: AgentState) -> AgentState: ...
def critic(state: AgentState) -> AgentState: ...

def route_after_router(state: AgentState) -> str:
    return state["plan"][-1]

def route_after_critic(state: AgentState) -> str:
    return "router" if state.get("needs_revision") else "synthesizer"

graph = StateGraph(AgentState)
graph.add_node("router", router)
graph.add_node("code", code_agent)
graph.add_node("analyst", analyst_agent)
graph.add_node("critic", critic)
graph.add_node("synthesizer", synthesizer)

graph.add_edge(START, "router")
graph.add_conditional_edges(
    "router", route_after_router, {"code": "code", "data": "analyst"}
)
graph.add_edge("code", "critic")
graph.add_edge("analyst", "critic")
graph.add_conditional_edges(
    "critic", route_after_critic,
    {"router": "router", "synthesizer": "synthesizer"},
)
graph.add_edge("synthesizer", END)

app = graph.compile(checkpointer=checkpointer)

The compiled graph is a Runnable — it has invoke, stream, and astream methods, integrates with LangChain’s streaming, and supports OpenTelemetry tracing out of the box.

The ReAct Pattern as a Graph

The single most common agent topology is ReAct: the model alternates between reasoning, calling tools, observing the result, and reasoning again until it produces a final answer. LangGraph ships a create_react_agent helper, but understanding the underlying graph is worthwhile:

Two nodes, one cycle. The cycle terminates when the LLM produces a message with no tool calls. The graph is trivially small but expresses the core agentic loop precisely.

Putting a recursion limit on the compiled graph (app.invoke(state, config={"recursion_limit": 25})) is mandatory — without it, a misbehaving model can loop indefinitely.

Checkpointing and Durability

A graph’s checkpointer persists state at every node boundary. This is the feature that takes LangGraph from “agent framework” to “workflow engine.”

Resume from failure. A node throws? Re-invoke with the same thread_id and execution resumes from the last successful checkpoint.
Time travel. Roll back to an earlier state and try a different branch.
Human-in-the-loop. Interrupt before a node, present the proposed action to a human, then resume on approval.
Long-running workflows. A graph can persist for days. Resume on demand.

Available checkpointers: in-memory (testing), SQLite (single-process production), Postgres (multi-process production), Redis (when you already run it). Choose Postgres for anything you care about — it’s the only option with serious operational properties (replication, point-in-time recovery, decent observability).

from langgraph.checkpoint.postgres import PostgresSaver

with PostgresSaver.from_conn_string(DB_URL) as ckpt:
    ckpt.setup()  # creates tables
    app = graph.compile(checkpointer=ckpt)
    app.invoke({"messages": [...]}, config={"configurable": {"thread_id": "u123"}})

The thread_id is the durable identity of a workflow run. Use a stable identifier per user session or per logical workflow instance.

Human-in-the-Loop

Real production agentic systems almost always need approvals — for tool calls that spend money, modify production data, send emails, or commit code. LangGraph supports this natively via interrupts:

from langgraph.types import interrupt

def maybe_send_email(state):
    if not state["draft"]:
        return {}
    approval = interrupt({"draft": state["draft"], "to": state["recipient"]})
    if approval == "approve":
        send(state["draft"])
        return {"sent": True}
    return {"sent": False, "reason": approval}

The first time execution hits interrupt, the graph pauses and the checkpointer persists state. The caller’s UI surfaces the approval request. When the human responds, the runtime resumes with the response substituted for the interrupt’s return value. This pattern is far more robust than a side-channel “approval queue” wired into the agent prompt.

Subgraphs and Multi-Agent Composition

A node can itself be a compiled graph. This composition pattern is how multi-agent systems stay manageable:

Specialist subgraphs — a “code reviewer” subgraph with its own retrieval, analysis, and synthesis nodes — are encapsulated and reusable.
Supervisor patterns — a top-level graph routes to specialist subgraphs based on intent.
Swarm patterns — agents hand off control to each other directly via a shared state field naming the next agent.

The supervisor pattern is the most production-friendly. It centralizes routing, makes the system’s flow explicit, and avoids the failure mode where agents argue with each other indefinitely. Swarms look clever in demos and produce difficult-to-debug emergent behavior in production.

Streaming and Observability

LangGraph streams in three modes:

updates — emits per-node state diffs. Use this to drive UIs that show what each agent is doing.
messages — emits LLM tokens as they arrive. Use this for chat-style responses.
values — emits the full state after each step. Most useful for debugging.

OpenTelemetry integration (via LangSmith or generic OTel exporters) gives you per-node spans, LLM call attributes (model, tokens, latency), and end-to-end traces of a workflow execution. This is non-negotiable in production. Without traces, diagnosing why an agent took a wrong turn at step 7 of a 12-step run is hopeless.

Metrics to track per node:

Latency distribution (p50/p95/p99).
Tool error rate.
Loop count (how many times each cyclic node executed per run).
Recursion limit hits.
Checkpoint write latency.

Failure Modes Worth Knowing

A few patterns that bite production LangGraph deployments:

Unbounded loops. Even with recursion_limit, a graph that loops 25 times before giving up wastes tokens and time. Add explicit termination conditions to cyclic edges based on state, not just iteration count.
State bloat. messages lists grow unboundedly. Implement summarization or windowing nodes that compress old turns into a single summary message before they exhaust the context window.
Non-determinism across resumes. If a node calls datetime.now() or generates a random ID, replaying from a checkpoint gives different results. Inject these via state or context, not directly.
Concurrent execution on the same thread_id. Two processes invoking the same thread simultaneously corrupts checkpoints. Use advisory locks (Postgres) or a job queue with at-most-one semantics.
Schema migrations. Changing the AgentState shape invalidates existing checkpoints. Version the schema and write migration paths or accept that in-flight workflows will be discarded on deploy.

Trade-offs vs. Alternatives

LangGraph is not the only option. A few honest comparisons:

vs. raw LangChain Expression Language (LCEL). LCEL is enough for linear pipelines. Reach for LangGraph the moment you need state, branching, or cycles.
vs. AutoGen. AutoGen leans into conversational multi-agent patterns (agents talking to each other in messages). LangGraph leans into explicit graph structure. Graphs are easier to reason about, debug, and operate.
vs. Temporal / Airflow with LLM steps. Workflow engines beat LangGraph on durability, scheduling, and operational maturity. LangGraph beats them on LLM-native features (streaming, message reducers, native tool integration). A real production stack often uses both: Temporal for the outer workflow, LangGraph for the agentic sub-steps.
vs. building your own. Don’t, unless you have a specific reason. The state-and-checkpointing layer is the part that takes the longest to get right.

Closing

LangGraph’s contribution is not novel — directed-graph workflow engines are decades old, and conditional reactive systems older than that. Its contribution is the right abstraction at the right level for LLM agents: a typed state object that flows through nodes, durable checkpoints for resuming and human-in-the-loop, native streaming for UI integration, and a graph compiler that integrates with the rest of the LangChain ecosystem. For multi-agent systems specifically, it forces you to make coordination explicit — every routing decision, every handoff, every loop is a visible edge — and that explicitness is what makes the difference between a clever demo and a system you can operate, debug, and extend. Build the graph deliberately, keep the state schema clean, checkpoint to Postgres, and trace every node. The rest is application logic.