The 5-Point MCP Checklist: How to Vet Real AI Agencies (2026)

Matt Payne · Updated · 7 min read
Key Takeaway

78% of MCP setups lack proper authorization. Most AI agencies are prompt shops that can't ship a production MCP tool server with auth, audit logs, and eval gates in 5 days. Use this 5-point checklist to test any vendor before spending $30K/month.

Most AI Agencies Can't Ship an MCP Server. That Tells You Everything.

TL;DR

The Model Context Protocol (MCP) is the standard that lets AI agents connect to real tools and data. If your AI vendor can't ship an MCP tool server with authentication, audit logs, and evaluation gates in a week, they're a prompt shop — not an agency that can run production agents. StoryPros uses a 5-point demo checklist to vet any agency (including ourselves) before a dollar gets spent.

I Asked a $30K/Month AI Agency to Show Me Their MCP Server

Last month, a VP of Sales hired us to rescue a failed AI project. The previous vendor — a well-funded AI agency charging $30K/month — had spent four months building "agentic workflows."

I asked one question on the handoff call: "Can you show me the MCP server these agents use to access the CRM?"

Silence. Then: "We use API calls directly in the prompt chain."

That's not an agent. That's a script with a language model stapled on top.

The Model Context Protocol, built by Anthropic and now governed by the Linux Foundation's Agentic AI Foundation, is the standard way AI agents connect to external tools. Think of it as USB-C for AI. Build a connector once, and it works with Claude, ChatGPT, Cursor, and every other MCP-compatible app.

If your AI vendor doesn't know what MCP is, they're building throwaway integrations. If they know what it is but can't ship one, they're selling demos, not production systems.

78% of MCP setups lack proper authorization, according to a February 2026 vulnerability analysis. That's not a tech problem. That's a vendor competence problem.

The 5-Point MCP Demo Checklist (Hand This to Your Vendor)

I use this checklist before we sign any new client. It takes a real AI team about five days to pass. A prompt shop can't pass it at all.

1. Working MCP tool server on a staging URL. Not localhost. Not a screen share of VS Code. A server your team can hit from a browser. We use n8n to expose workflows as MCP tools because it has 400+ built-in integrations and it's self-hostable. [CrewAI has native MCP support with `pip install crewai 'crewai-tools[mcp]'`](https://docs.crewai.com/en/mcp/overview). LangChain uses `langchain-mcp-adapters`. Any of these work. "We'll have it next sprint" doesn't.

2. Authentication on every endpoint. In February 2026, security researchers found over 8,000 MCP servers exposed on the public internet with no auth. The Clawdbot incident saw 200+ API keys extracted by automated scanners in 72 hours — costing users $50,000+ in unauthorized charges. Your MCP server needs OAuth 2.0 or API key auth on every single endpoint. Default configs that bind to `0.0.0.0:8080` with no password are an instant fail.

3. Audit logs with retention policy. Every tool call the agent makes gets logged. Who called it. What it returned. When. How long it took. This isn't optional. If your agent books a meeting with the wrong prospect or sends a wrong email, you need the receipt. We log to a structured store with 90-day retention as a baseline.

4. Eval gates before any tool fires. Before an agent executes a tool — say, sending an email or updating a CRM record — there's a check. Does the input match the expected schema? Is the confidence score above threshold? Is this action within the agent's allowed scope? A CVE rated 10.0 (CVE-2025-6514) exists for MCP implementations that skip input validation. Eval gates catch prompt injection, tool poisoning, and plain old hallucination before they hit your production systems.

5. A recorded demo under adversarial conditions. Have someone feed the agent bad data. Misspelled names. Empty fields. Conflicting instructions. If it breaks, show me it fails gracefully, logs the error, and doesn't execute the tool. That's the difference between a demo and a production system.
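The eval-gate idea in point 4 fits in a few lines. Here's a minimal sketch — not the real MCP SDK, and the tool names, schema fields, and confidence threshold are all illustrative assumptions:

```python
# Minimal eval gate: every tool call passes three checks before it fires.
# Tool names, required fields, and the threshold are illustrative assumptions.

ALLOWED_TOOLS = {"send_email", "update_crm_record"}  # agent's scoped permissions
CONFIDENCE_THRESHOLD = 0.8

REQUIRED_FIELDS = {
    "send_email": {"to", "subject", "body"},
    "update_crm_record": {"record_id", "fields"},
}

def eval_gate(tool: str, payload: dict, confidence: float) -> tuple:
    """Return (allowed, reason). The tool executes only if allowed is True."""
    if tool not in ALLOWED_TOOLS:
        return False, f"tool '{tool}' outside agent scope"
    missing = REQUIRED_FIELDS[tool] - payload.keys()
    if missing:
        return False, f"schema mismatch: missing {sorted(missing)}"
    if confidence < CONFIDENCE_THRESHOLD:
        return False, f"confidence {confidence:.2f} below threshold"
    return True, "ok"

# A call with a missing 'body' field gets blocked before anything fires.
ok, reason = eval_gate("send_email", {"to": "a@b.com", "subject": "Hi"}, 0.95)
```

Three checks, maybe fifteen lines. If a vendor tells you this is "phase two," they haven't built it anywhere.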

StoryPros builds and ships MCP tool servers with all five of these criteria as part of every AI agent engagement. If an agency tells you eval gates are "phase two," find a different agency.
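For point 2, the core of an API-key check is small enough to demand on the spot. A sketch, assuming a key header name and an in-memory key set (in production the keys come from a secrets store and rotate):

```python
import hmac

# Constant-time API-key check for every endpoint. The header name and the
# in-memory key set are illustrative assumptions; real keys live in a
# secrets store and rotate on a schedule.

VALID_KEYS = {"k_live_example"}

def authenticate(headers: dict) -> bool:
    """Reject any request without a valid key. compare_digest avoids timing leaks."""
    presented = headers.get("x-api-key", "")
    return any(hmac.compare_digest(presented, key) for key in VALID_KEYS)
```

`hmac.compare_digest` instead of `==` matters: string comparison that short-circuits on the first wrong byte leaks key prefixes to an attacker timing responses.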

Why This Matters More Than the Model You Pick

Everyone argues about GPT-4o vs. Claude 3.5 vs. Gemini. That argument is a distraction.

The model is maybe 20% of what makes an AI agent work in production. The other 80% is plumbing: how the agent connects to your tools, how you control what it can do, how you know what it did, and how you stop it when it goes wrong.

MCP is that plumbing. Most vendors skip it because it's hard and unglamorous.

Here's what actually breaks in real-world MCP setups:

Tool schema mismatch. Your CRM expects a field called `company_name`. The agent sends `organization`. The call fails silently. No log. No alert. The agent just moves on and the lead never gets updated. We've seen this on three different CrewAI + HubSpot builds.
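The fix for silent schema mismatch is a strict check that fails loudly. A sketch, with illustrative field names — the point is that an unknown field like `organization` raises instead of vanishing:

```python
# Strict payload validation before a CRM update: unknown fields raise
# loudly instead of failing silently. Field names are illustrative.

CRM_CONTACT_SCHEMA = {"company_name", "email", "owner_id"}

def validate_crm_payload(payload: dict) -> None:
    """Raise ValueError on any field the CRM doesn't expect."""
    unknown = payload.keys() - CRM_CONTACT_SCHEMA
    if unknown:
        raise ValueError(f"schema mismatch, unknown fields: {sorted(unknown)}")

validate_crm_payload({"company_name": "Acme", "email": "x@acme.com"})  # passes
# validate_crm_payload({"organization": "Acme"}) raises ValueError
```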

Permission creep. The agent has access to `shell_execute` and `file_write` because someone copy-pasted a config from GitHub. Those exposed MCP servers in February? Many had tool configs that allowed arbitrary command execution. Your agent should have the smallest possible set of permissions.
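The antidote to permission creep is deny-by-default: each agent gets an explicit tool allowlist, and anything else is refused. A sketch with assumed agent and tool names:

```python
# Deny-by-default tool permissions: an agent gets only the tools its job
# needs, and an unknown agent gets nothing. Names are illustrative.

AGENT_TOOLS = {
    "sales_outreach_agent": {"send_email", "update_crm_record"},
}

def authorize(agent: str, tool: str) -> bool:
    """True only if this agent is explicitly granted this tool."""
    return tool in AGENT_TOOLS.get(agent, set())
```

Notice what's absent: `shell_execute` and `file_write` aren't in any allowlist, so no copy-pasted config can hand them out by accident.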

No retry logic. An API call to Slack fails because of a rate limit. The agent doesn't retry. It doesn't log the failure. It just skips the step. Now your sales team thinks a message was sent that was never sent.
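Retry logic with logging is also small. A sketch: exponential backoff on rate limits, a log line on every failure, and the final failure surfaced instead of swallowed (the exception class and attempt counts are assumptions):

```python
import logging
import time

log = logging.getLogger("mcp.tools")

class RateLimited(Exception):
    """Illustrative stand-in for a tool API's rate-limit error."""

def call_with_retry(fn, attempts=3, base_delay=1.0):
    """Retry a tool call on rate limits with exponential backoff.

    Logs every failed attempt. If all attempts fail, the exception
    propagates -- the step is never silently skipped.
    """
    for i in range(attempts):
        try:
            return fn()
        except RateLimited:
            log.warning("tool call rate-limited (attempt %d/%d)", i + 1, attempts)
            if i == attempts - 1:
                raise  # surface the failure instead of skipping the step
            time.sleep(base_delay * 2 ** i)
```

The last line of the except block is the whole point: a failure your sales team never hears about is worse than one that pages someone.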

These are boring problems. They're also the problems that cost you $50K in unauthorized charges or a quarter of pipeline you thought was being worked.

How to Evaluate Any AI Agency in 30 Minutes

I've vetted about 20 AI vendors for clients in the last year. Here's my playbook.

Ask for the MCP server URL. Not a deck. Not a Loom video. A URL you can test. If they don't have one, the conversation is over.

Ask what happens when a tool call fails. If the answer is "the agent retries" — ask to see the retry logic and the error logs. If there are no logs, there's no production system.

Ask who owns the audit trail. You need to own your logs. If the vendor stores them on their infra and you can't export them, you're locked in and blind.

Ask about auth. Specifically: is every MCP endpoint behind authentication, and how are API keys rotated? If they pause, they haven't thought about it.

Ask for a test with bad data. Give them a CSV with 50 rows of messy prospect data — missing emails, duplicate names, wrong phone formats. See what the agent does. A real agent handles it. A demo breaks.
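What "handles it" looks like in code: bad rows get routed to review, never to a tool call. A minimal triage sketch — the row shape and email check are assumptions, the pattern (flag, don't execute) is the point:

```python
import re

# Adversarial-data triage: messy prospect rows are flagged for review
# instead of triggering a tool call. Row fields are illustrative.

def triage_row(row: dict) -> str:
    """Return 'execute' only for clean rows; everything else goes to review."""
    email = (row.get("email") or "").strip()
    if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email):
        return "review: bad or missing email"
    if not (row.get("name") or "").strip():
        return "review: missing name"
    return "execute"

rows = [
    {"name": "Ada", "email": "ada@example.com"},   # clean
    {"name": "",    "email": "bob@example.com"},   # missing name
    {"name": "Cyn", "email": "not-an-email"},      # broken email
]
decisions = [triage_row(r) for r in rows]
```

Run your 50-row CSV through the vendor's agent and count how many rows ended in "execute" that shouldn't have. That number should be zero.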

I know this sounds aggressive. It's not. It's the bare minimum. You wouldn't hire a dev shop that can't show you a staging environment. Same standard applies here.

FAQ

How do you evaluate AI agents for production readiness?

StoryPros evaluates AI agents against five criteria: a working MCP tool server on a staging URL, authentication on every endpoint, structured audit logs with a 90-day retention minimum, eval gates that validate inputs before tool execution, and a demo under adversarial conditions with bad data. Any production-ready agent should pass all five within a one-week proof of concept.

What are the core MCP evaluation metrics that matter?

The metrics that matter are tool call success rate (should be above 95%), average latency per tool call (under 2 seconds for most CRM and email operations), auth failure rate (should be near zero — high rates mean someone's probing your endpoints), schema validation pass rate, and error recovery rate. Error recovery rate is how often the agent correctly retries or gracefully fails when a tool call breaks.
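All of these metrics fall out of the audit log for free. A sketch of the computation, with record fields mirroring the audit-log checklist item (exact field names are assumptions):

```python
# Deriving success rate and average latency from structured audit-log
# records. Field names mirror the audit-log checklist item; they are
# illustrative assumptions, not a fixed schema.

records = [
    {"tool": "send_email",        "ok": True,  "latency_s": 0.8},
    {"tool": "send_email",        "ok": True,  "latency_s": 1.4},
    {"tool": "update_crm_record", "ok": False, "latency_s": 2.9},
    {"tool": "update_crm_record", "ok": True,  "latency_s": 1.1},
]

success_rate = sum(r["ok"] for r in records) / len(records)
avg_latency = sum(r["latency_s"] for r in records) / len(records)

print(f"success rate: {success_rate:.0%}, avg latency: {avg_latency:.2f}s")
```

A vendor who logs every tool call can show you these numbers in one query. A vendor who can't has no logs.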

How should authentication and authorization be built for MCP servers?

Every MCP server endpoint needs OAuth 2.0 or API key authentication. In February 2026, over 8,000 MCP servers were found publicly exposed without auth, leading to $50,000+ in unauthorized API charges in one incident alone. Auth should be enforced at the gateway level. API keys should rotate on a 90-day cycle at minimum. Each agent should have scoped permissions — never full admin access. Default configs that expose admin panels are the number one cause of MCP security breaches.

How do you set up audit logging and retention for MCP tool calls?

Every MCP tool call should log the requesting agent ID, the tool name, input parameters, output response, timestamp, and execution duration. StoryPros stores these logs in a structured format with a 90-day default retention period. This audit trail is critical for debugging agent errors, proving ROI to leadership, and meeting compliance requirements. Without audit logs, you have no way to know if your agent sent the right email to the right person — or any email at all.

What's the biggest risk with MCP servers right now?

The biggest risk is unsecured default configurations. The Clawdbot incident in January 2026 saw 10,000+ MCP instances launched in 72 hours, with 1,000+ admin panels publicly accessible and 200+ API keys stolen by automated scanners. A separate scan in February 2026 found 8,000+ MCP servers exposed on the public internet. MCP servers can be exploited for remote code execution and data exfiltration. The fix is straightforward: enforce auth on every endpoint, scope agent permissions, and validate every tool input before execution.

AI Answer

How can I tell if an AI agency is actually production-ready or just a prompt shop?

Ask them to demo an MCP tool server with authentication, audit logs, and evaluation gates on a staging URL. A real AI agency can ship this in about 5 days. If they can't show you a working server (not localhost, not a screen share), they're selling demos, not production systems.

AI Answer

What percentage of MCP servers lack proper security authentication?

78% of MCP setups lack proper authorization according to a February 2026 vulnerability analysis. In one incident alone (Clawdbot), 200+ API keys were extracted by automated scanners in 72 hours, costing users $50,000+ in unauthorized charges. This is why every MCP endpoint must have OAuth 2.0 or API key authentication.

AI Answer

What are the 5 things I should ask an AI vendor to demonstrate before signing a contract?

Require a working MCP server on a staging URL, authentication on every endpoint, structured audit logs with 90-day retention, evaluation gates that validate inputs before tool execution, and a live demo using bad/messy data to prove graceful error handling. Any vendor charging $30K/month should pass all five within one week.