Building MCP-Ready AI Agents: Architecture That Scales
If you've been following the AI tooling space in 2024-2025, you've heard the term MCP thrown around a lot. Model Context Protocol — Anthropic's open standard for connecting AI models to tools, data sources, and external systems — has quietly become the backbone of serious agentic architectures. Not because of hype, but because it solves a genuinely ugly problem: how do you give an AI agent reliable, composable access to the outside world without writing bespoke integration code for every tool it needs?
This is a practical guide. We build these systems. Here's what actually works.
What MCP Is (and What It Isn't)
MCP is a protocol, not a framework. Think of it like HTTP for AI-tool communication — it defines how a host (an LLM client like Claude Desktop, or your own application) connects to servers that expose capabilities: tools, resources, and prompts.
An MCP server is a lightweight process that declares what it can do. An MCP client (your agent) discovers those capabilities and invokes them. The protocol handles serialization, transport, and capability negotiation. You write the business logic.
What MCP is not: it's not a cloud service, not a vendor lock-in, and not magic. An MCP server is code you (or someone in the community) writes and runs. The protocol is open. The implementations are yours to control.
The reason this matters is standardization. Before MCP, every AI integration was ad-hoc: custom function definitions, custom JSON schemas, custom error handling. With MCP, you write one server, and any MCP-compatible client can use it. That's a meaningful shift.
The Orchestrator Pattern
The architecture pattern that unlocks real scale is the orchestrator: a main agent that coordinates specialized sub-agents, each responsible for a bounded domain.
         User Request
              │
              ▼
     ┌─────────────────┐
     │  Orchestrator   │ ← Main agent (Claude, GPT-4o, etc.)
     │      Agent      │   Decides which sub-agents to invoke
     └────────┬────────┘   Handles conversation state
              │
         ┌────┴────┐
         │         │
         ▼         ▼
     ┌───────┐ ┌───────┐
     │Search │ │ Data  │ ← Sub-agents (specialized MCP servers)
     │ Agent │ │ Agent │   Each owns a specific domain
     └───────┘ └───────┘
         │         │
         ▼         ▼
     Web/APIs  Database
The orchestrator doesn't do the work — it delegates. This is crucial. When you put everything in one agent, you get context bloat, inconsistent behavior, and debugging nightmares. When you split responsibilities across specialized sub-agents connected via MCP, each component stays focused and testable.
The orchestrator pattern also maps naturally to A2A (agent-to-agent) communication, where one agent's output becomes another's input. MCP gives you the transport layer for this without having to define custom APIs between agents.
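The delegation itself can be sketched in a few lines. This is a stdlib-only illustration, not the MCP SDK: the registry, decorator, and agent names (`sub_agent`, `AGENTS`, `orchestrate`) are hypothetical, and real sub-agents would sit behind MCP client connections rather than local coroutines.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical registry: domain name → sub-agent handler. In a real
# deployment each handler would be an MCP call to a separate server
# process; plain coroutines stand in for them here.
AGENTS: dict[str, Callable[[str], Awaitable[str]]] = {}

def sub_agent(domain: str):
    def register(fn):
        AGENTS[domain] = fn
        return fn
    return register

@sub_agent("search")
async def search_agent(task: str) -> str:
    return f"search results for: {task}"

@sub_agent("data")
async def data_agent(task: str) -> str:
    return f"rows matching: {task}"

async def orchestrate(domain: str, task: str) -> str:
    # The orchestrator only routes; domain work stays in the sub-agent.
    handler = AGENTS.get(domain)
    if handler is None:
        raise ValueError(f"no sub-agent owns domain {domain!r}")
    return await handler(task)
```

The payoff of the registry shape is that adding a domain means registering one new handler; the routing code never changes.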
Tool Use: The Right Abstraction
MCP tools are the atomic unit of agent capability. Each tool has a name, a description (this is what the LLM reads to decide when to use it), and an input schema. The server implements the handler; the client calls it.
# MCP server tool definition (simplified)
@server.tool()
async def search_documents(
    query: str,
    top_k: int = 5,
    filters: dict | None = None,
) -> list[dict]:
    """
    Search the knowledge base for documents relevant to a query.
    Use this when you need to find information from internal sources.
    Returns ranked results with content and metadata.
    """
    results = await vector_store.search(
        query=query,
        limit=top_k,
        filters=filters,
    )
    return [
        {"content": r.content, "score": r.score, "source": r.source}
        for r in results
    ]
A few things matter here that are easy to get wrong.
Description quality determines tool-selection accuracy. The LLM picks which tool to call based on the description alone. If your description is vague, or overlaps with another tool's, you get unpredictable tool selection. Write descriptions as if you were documenting the tool for a junior developer who doesn't know your system.
Input schemas should be strict. Use required fields, enums where applicable, and clear field descriptions. The more constrained the input, the more reliably the agent will call the tool correctly.
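As an illustration, here is the kind of JSON Schema an MCP server advertises for the `search_documents` tool above. The `source_type` field, its enum values, and the bounds on `top_k` are assumptions added for the example:

```python
# Illustrative JSON Schema for the search_documents tool. Required
# fields, bounds, and an enum keep the model's tool calls on rails.
SEARCH_DOCUMENTS_SCHEMA = {
    "type": "object",
    "properties": {
        "query": {
            "type": "string",
            "description": "Natural-language search query.",
        },
        "top_k": {
            "type": "integer",
            "minimum": 1,
            "maximum": 50,
            "default": 5,
            "description": "Number of results to return.",
        },
        "source_type": {
            "type": "string",
            "enum": ["wiki", "ticket", "email"],
            "description": "Restrict results to one document source.",
        },
    },
    "required": ["query"],
}
```

Every constraint you encode here is a class of malformed tool call the model can no longer make.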
Return only what the agent needs. If your tool returns 2MB of JSON and the agent only needs three fields, you're burning context and slowing down inference. Filter on the server side.
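A sketch of that server-side filtering, assuming raw records carry fields (embeddings, internal IDs) the agent never reads:

```python
def trim_result(record: dict) -> dict:
    # Keep only what the agent will reason about; embeddings, internal
    # IDs, and audit fields just burn context tokens.
    wanted = ("content", "score", "source")
    return {k: record[k] for k in wanted if k in record}
```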
Resources and Prompts
Tools are the most visible part of MCP, but resources and prompts are equally important.
Resources are data sources the agent can read: files, database rows, API responses. They're identified by URI and can be listed, read, and subscribed to. In an orchestrator architecture, resources let sub-agents expose their state to the orchestrator without building custom read APIs.
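The URI-addressed read model can be sketched without the SDK. This is not the MCP wire protocol (the SDK handles listing, reading, and subscriptions for you); it only illustrates the shape of the idea, with a hypothetical URI and payload:

```python
from typing import Callable

# Minimal sketch: map URIs to zero-argument readers, mirroring how an
# MCP server exposes readable resources by URI.
RESOURCES: dict[str, Callable[[], str]] = {}

def resource(uri: str):
    def register(fn):
        RESOURCES[uri] = fn
        return fn
    return register

@resource("agent://data/last-report")
def last_report() -> str:
    # Hypothetical sub-agent state exposed for the orchestrator to read.
    return "sales summary, week 12"

def read_resource(uri: str) -> str:
    return RESOURCES[uri]()
```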
Prompts are reusable prompt templates that MCP servers can expose. This sounds niche, but it's powerful: you can centralize your system prompt logic in an MCP server, version it, and serve it consistently across all agents that use your server.
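A minimal sketch of a server-side prompt template, using a hypothetical review prompt; the point is that the template lives (and is versioned) with the server, not copied into each client:

```python
import string

# Hypothetical prompt template, owned and versioned by the server.
REVIEW_PROMPT = string.Template(
    "You are a meticulous reviewer.\n"
    "Summarize the following $doc_type and flag open risks:\n\n$body"
)

def render_review_prompt(doc_type: str, body: str) -> str:
    # Clients ask the server for the rendered prompt instead of
    # hard-coding their own copy of it.
    return REVIEW_PROMPT.substitute(doc_type=doc_type, body=body)
```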
How Jeeves Implements This
Our internal project Jeeves is a business automation agent we've built for production use. It follows the orchestrator pattern with four specialized sub-agents, each running as an independent MCP server:
- Research Agent: web search, content extraction, summarization
- Data Agent: database queries, report generation, data transformation
- Calendar Agent: scheduling, meeting coordination, reminders
- Communication Agent: email drafting, Slack messages, notification routing
The orchestrator is a Claude-powered main agent that maintains conversation context and decides which sub-agents to invoke based on the user's intent.
# Orchestrator decision flow (pseudo-code)
user_message = "Summarize last week's sales and schedule a review meeting"
# Orchestrator identifies required sub-agents
tasks = orchestrator.plan(user_message)
# → [
# Task(agent="data", action="query_sales", params={"period": "last_week"}),
# Task(agent="calendar", action="find_slot", params={"duration": 60}),
# Task(agent="calendar", action="create_meeting", params={"..."}),
# ]
# Execute with dependency resolution
results = await orchestrator.execute(tasks)
# Synthesize and respond
response = orchestrator.synthesize(results)
The key architectural decision: each sub-agent is independently deployable. When the Calendar Agent needs an update, we redeploy it without touching the orchestrator or other agents. This is the practical payoff of MCP's separation between host and server.
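The 'execute with dependency resolution' step above can be sketched as wave-based scheduling: run every task whose dependencies are already satisfied, in parallel, until none remain. The `Task` fields and the agent outputs here are illustrative stand-ins for real MCP calls:

```python
import asyncio
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    agent: str
    depends_on: list[str] = field(default_factory=list)

async def run_task(task: Task, results: dict) -> str:
    # Stand-in for an MCP call to the named sub-agent.
    return f"{task.agent}:{task.name}"

async def execute(tasks: list[Task]) -> dict:
    results: dict[str, str] = {}
    pending = {t.name: t for t in tasks}
    while pending:
        # A wave is every task whose dependencies are already done.
        ready = [t for t in pending.values()
                 if all(d in results for d in t.depends_on)]
        if not ready:
            raise RuntimeError("dependency cycle among tasks")
        outs = await asyncio.gather(*(run_task(t, results) for t in ready))
        for t, out in zip(ready, outs):
            results[t.name] = out
            del pending[t.name]
    return results
```

Independent tasks (the sales query and the slot search) land in the same wave and run concurrently; the meeting creation waits for both.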
Production Considerations
State management is your problem, not MCP's. The protocol is stateless per invocation. If your agent needs to maintain state across tool calls (and it will), you need a state store. We use Redis for short-lived session state and PostgreSQL for persistent context.
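A sketch of the session-state interface, with an in-memory dict standing in for Redis; the TTL semantics (set with expiry, lazy expiry on read) are what matter, not the backing store:

```python
import time

class SessionStore:
    """In-memory stand-in for a Redis-backed session store: same
    interface shape (get/set with TTL), no persistence or sharing."""

    def __init__(self) -> None:
        self._data: dict[str, tuple[float, object]] = {}

    def set(self, key: str, value, ttl_seconds: float = 900) -> None:
        self._data[key] = (time.monotonic() + ttl_seconds, value)

    def get(self, key: str, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        expires, value = entry
        if time.monotonic() > expires:
            # Expired: evict lazily on read, as Redis TTLs would.
            del self._data[key]
            return default
        return value
```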
Error handling must be explicit. MCP servers can return errors, but the LLM needs to know what to do with them. Design your error responses as structured objects with a code, message, and optional retry guidance. Don't rely on exception messages reaching the model meaningfully.
Timeout everything. Tool calls in production fail. Networks partition. External APIs go down. Every tool handler should have an explicit timeout and a graceful degradation path. An orchestrator that hangs because one sub-agent is unresponsive is worse than one that fails fast and tells the user what happened.
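A minimal wrapper for that fail-fast behavior using `asyncio.wait_for`; the fallback value is whatever structured degradation your orchestrator can act on:

```python
import asyncio

async def call_with_timeout(coro, timeout_s: float, fallback):
    # Bound every sub-agent call; return a structured fallback instead
    # of hanging the whole orchestrator on one unresponsive server.
    try:
        return await asyncio.wait_for(coro, timeout=timeout_s)
    except asyncio.TimeoutError:
        return fallback
```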
Observability matters more than you think. Log every tool invocation with input, output, duration, and the LLM's stated reasoning (available from the model's response). When something goes wrong — and it will — this is what you debug with.
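A sketch of that logging as a decorator around async tool handlers. Name, input, output, and duration come from the invocation itself; the model's stated reasoning has to be captured separately on the client side, from the LLM response:

```python
import asyncio
import functools
import json
import logging
import time

log = logging.getLogger("tool_calls")

def observed(fn):
    # Wrap an async tool handler to emit one structured JSON log line
    # per invocation: tool name, input, truncated output, duration.
    @functools.wraps(fn)
    async def wrapper(**kwargs):
        start = time.monotonic()
        result = await fn(**kwargs)
        log.info(json.dumps({
            "tool": fn.__name__,
            "input": kwargs,
            "output": repr(result)[:500],  # truncate large payloads
            "duration_ms": round((time.monotonic() - start) * 1000, 1),
        }))
        return result
    return wrapper
```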
Why MCP Wins Long-Term
The honest answer to "why MCP over custom function calling" is ecosystem. As MCP adoption grows, you get:
- Community-built servers you can drop into your architecture (there are already hundreds on mcpservers.org)
- Tooling that works across MCP-compatible hosts — build once, run in Claude Desktop, your custom app, your CI pipeline
- Standardized security and capability negotiation that you'd otherwise build yourself
We're still early. The protocol will evolve. But the architectural principles — separation of concerns, declarative capability discovery, composable tool integration — are sound regardless of what the spec looks like in version 2.0.
If you're building AI agents for production use, MCP isn't optional anymore. It's the sensible default.
We build MCP-based agent systems for B2B clients across Europe. If you're evaluating this architecture for your business, get in touch.