Article · April 22, 2026 · 9 min read

Why MCP servers fail in production (and how to prevent it)

The Anthropic SDK is just the starting point. Six things missing from a typical Model Context Protocol deployment — and the patterns I use to keep MCP from blowing up at the first thousand users.

Anthropic’s MCP spec is an excellent starting point. It tells you how an LLM talks to external systems. It doesn’t tell you how to stay up under 10,000 concurrent users. After six months developing the MCP server at inFakt, I’ve collected a list of things first-time implementers usually fail to anticipate.

1. No backpressure or queues

The first “PoC” MCP makes synchronous calls to its backend. Works for 10 users. Falls over for 1,000.

In production, every tool call should pass through a queue with explicit backpressure configuration. SQS, Redis Streams, Cloud Tasks: the choice depends on your stack. The MCP client receives a pending status with a poll URL instead of waiting on a blocking call.
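
A minimal sketch of that shape, assuming an in-process bounded queue as a stand-in for SQS, Redis Streams, or Cloud Tasks. The names submit_tool_call and MAX_PENDING, the /jobs/ poll URL, and the queue_full error are all hypothetical, chosen only for illustration.

```python
import queue
import uuid

# Hypothetical in-process stand-in for SQS / Redis Streams / Cloud Tasks.
MAX_PENDING = 2  # tiny bound so the behavior is easy to see
jobs = queue.Queue(maxsize=MAX_PENDING)


def submit_tool_call(tool, params):
    """Enqueue a tool call and return immediately instead of blocking."""
    job_id = str(uuid.uuid4())
    try:
        # put_nowait raises queue.Full once the bound is reached;
        # that exception is the backpressure signal.
        jobs.put_nowait({"id": job_id, "tool": tool, "params": params})
    except queue.Full:
        return {"status": "rejected", "error": "queue_full", "retry_after_seconds": 5}
    # The poll URL shape is an assumption, not part of any spec.
    return {"status": "pending", "poll_url": f"/jobs/{job_id}"}
```

The point of the bound is that overload becomes an explicit, cheap answer to the client rather than a pile of blocked connections. A worker pool would drain jobs on the other side.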

2. Per-call audit log

In a regulated industry, no audit log = no compliance. Every tool call (with parameters, sanitized payload, response, latency, error code) goes to dedicated storage with retention matching local regulations (GDPR: no longer than necessary; EU AI Act: at least six months for high-risk systems).

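In practice: write the request record through to a dedicated logging service before the tool runs, then append the response, latency, and error code once it returns. If logging fails, fail closed: the tool does not run.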
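
One way to sketch the fail-closed rule is a decorator that writes the record first and refuses to run the tool if the write raises. Here write_audit is a hypothetical callable standing in for the logging service client, and AuditLogUnavailable is a name I made up for the example.

```python
import functools
import time


class AuditLogUnavailable(Exception):
    """Raised when the audit record cannot be written."""


def audited(write_audit):
    """Write an audit record before the tool runs; fail closed on error.

    `write_audit` is a hypothetical callable that ships one record to the
    logging service and raises if it cannot.
    """
    def decorator(tool):
        @functools.wraps(tool)
        def wrapper(**params):
            # Parameters are assumed to be sanitized before this point.
            record = {"tool": tool.__name__, "params": params, "ts": time.time()}
            try:
                write_audit(record)
            except Exception as exc:
                # Fail closed: no audit record means the tool never runs.
                raise AuditLogUnavailable(tool.__name__) from exc
            return tool(**params)
        return wrapper
    return decorator
```

This only covers the pre-call record; response, latency, and error code would be written in a second step after the tool returns.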

3. Per-tool scopes, not blanket auth

The auth token should carry exactly the permissions the tool needs, no more. Not read:all_data but read:invoices.where(user_id=X).

In practice: each tool has a declarative scopes manifest. The OAuth2 layer enforces minimal scope pre-call; the audit log records the delta.
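
A rough sketch of a scopes manifest and the pre-call check. The tool names, scope strings, and the authorize helper are hypothetical, and I am reading "delta" as the difference between what the token carries and what the tool requires.

```python
# Hypothetical manifest: tool name -> scopes that tool requires.
TOOL_SCOPES = {
    "list_invoices": {"read:invoices"},
    "pay_invoice": {"read:invoices", "write:payments"},
}


def authorize(tool, token_scopes):
    """Compare the token's scopes with the tool's manifest entry.

    An unknown tool raises KeyError, so a tool without a manifest
    entry cannot be called at all.
    """
    required = TOOL_SCOPES[tool]
    missing = required - token_scopes  # blocks the call
    excess = token_scopes - required   # allowed here, but worth auditing
    return {
        "allowed": not missing,
        "missing": sorted(missing),
        "excess": sorted(excess),
    }
```

In this sketch an over-scoped token is let through and the excess is reported for the audit log; a stricter deployment could reject any token with a non-empty excess.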

4. Idempotency for write tools

LLMs retry. MCP clients retry. A tool that pays an invoice must not be called twice because the agent lost the response.

Solution: every write-side call carries a client-generated idempotency key. The server caches the key → response mapping in Redis for 24 hours and returns the cached response when the same key arrives again.
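
The replay logic can be sketched with a plain dict standing in for Redis. The helper name call_once and the injectable now clock are assumptions for the example; only the 24h window comes from the text above.

```python
import time

TTL_SECONDS = 24 * 60 * 60  # the 24h window
_cache = {}  # idempotency key -> (expires_at, response); stands in for Redis


def call_once(idempotency_key, handler, now=time.time):
    """Run `handler` at most once per key within the TTL; replay otherwise."""
    entry = _cache.get(idempotency_key)
    if entry is not None and entry[0] > now():
        # Same key seen recently: return the stored response, do not re-run.
        return entry[1]
    response = handler()
    _cache[idempotency_key] = (now() + TTL_SECONDS, response)
    return response
```

This version is not safe under concurrent retries: two requests with the same key could both miss the cache. With real Redis, one option is to claim the key atomically first (SET with the NX and EX options) and only then run the handler.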

5. Graceful degradation

The backend goes down. What does the MCP server do?

Bad: 500 to the client, the agent hallucinates “I updated your invoice.”

Good: return a structured error such as {"error": "backend_unavailable", "retry_after_seconds": 30}. The agent then decides whether to retry, escalate to a human, or continue without this call.
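
A small sketch of the wrapper that turns an outage into that structured error. BackendUnavailable and safe_tool_call are hypothetical names; which exceptions count as "backend down" depends on the client library actually in use.

```python
class BackendUnavailable(Exception):
    """Hypothetical exception a backend client might raise on outage."""


def safe_tool_call(backend_call, retry_after_seconds=30):
    """Run the backend call; convert outages into a structured error."""
    try:
        return {"result": backend_call()}
    except (BackendUnavailable, ConnectionError, TimeoutError):
        # The agent gets something it can reason about instead of a bare 500.
        return {
            "error": "backend_unavailable",
            "retry_after_seconds": retry_after_seconds,
        }
```

Only outage-type exceptions are caught on purpose. A bug in the tool itself should still surface as a failure rather than being reported as a temporary backend problem.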

6. Observability is not optional

Every call has a trace ID propagated from MCP client through tool, backend, response. OpenTelemetry, Honeycomb/Datadog/Grafana Tempo — choice doesn’t matter; absence does.

Without it, debugging the first production issue = 4 hours grepping logs.
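
To show the propagation idea without committing to a vendor, here is a standard-library sketch using contextvars. It is not OpenTelemetry; handle_call, call_backend, and the x-trace-id header are hypothetical. In a real setup the OpenTelemetry SDK would carry the context, typically via the W3C traceparent header.

```python
import contextvars
import uuid

# Holds the trace ID for whichever call is currently being handled.
trace_id_var = contextvars.ContextVar("trace_id", default=None)


def handle_call(tool, incoming_trace_id=None):
    """Adopt the client's trace ID, or mint one, then run the tool."""
    token = trace_id_var.set(incoming_trace_id or uuid.uuid4().hex)
    try:
        result = tool()
        # Echo the trace ID in the response so the client can correlate.
        return {"trace_id": trace_id_var.get(), "result": result}
    finally:
        trace_id_var.reset(token)


def call_backend():
    """Hypothetical backend client: forwards the current trace ID."""
    return {"headers": {"x-trace-id": trace_id_var.get()}}
```

Calling handle_call(call_backend, incoming_trace_id="abc123") yields the same ID in the outgoing backend headers and in the response, which is the property that lets one query follow a single call end to end.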

TL;DR

Production MCP is not “expose the Anthropic SDK and go to bed.” It’s queues, audit, scoped auth, idempotency, graceful errors, and observability: distributed systems engineering with an LLM client.

If you’re planning an MCP deployment, reach out — it’s faster with me, and without unnecessary traps.
