How to Audit Any AI Agency Before You Sign (2026 Guide)

Matt Payne · 8 min read
Key Takeaway

91% of marketers use AI but only 41% prove ROI. Four services deliver real returns: lead pipeline automation, AI BDR routing, content factory ops, and AI-native reporting. Demand runbooks, logs, and eval gates before signing.

The Yellow Pages Problem, 2026 Edition

In 1985, every plumber, electrician, and locksmith in America bought the biggest Yellow Pages ad they could afford. Bigger ad = more trust. Nobody asked if the plumber could actually fix a pipe. The ad was the proof.

We're in the same era with AI agencies right now.

Prophet just launched MAIA, their "Multi-Agent Intelligent Assistant." 14.ai came out of stealth calling themselves the world's first "AI-native customer service agency." Cien.ai announced "Cien Agentic" promising 6x faster revenue growth. Better Together Agency shipped TogetherAI for marketing teams.

Every single one of these announcements happened in the last 30 days. Every one leads with a platform name and a bold claim. And the services pages are getting longer.

My take: the longer the services page, the less likely anything actually works. A menu with 12 items means the kitchen is a microwave.

At StoryPros, we've built 100+ AI automations. The ones that produce ROI fit into four categories. Everything else is garnish.

Step 1: Check for Lead Pipeline Automation — and Ask for the Runbook

Lead pipeline automation is the first thing worth paying for. It means an AI agent that finds prospects, enriches their data, scores them, and routes them into a sequence without a human touching a spreadsheet.

Here's what to ask: "Show me the runbook."

A runbook is the step-by-step playbook the agent follows. It's not a pitch deck. It's not a flowchart on a whiteboard. It's a document that says: "When a new lead matches [criteria], the agent does [action], checks [validation gate], and routes to [destination]."
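
To make that sentence concrete, here is a minimal Python sketch of one runbook entry written as data plus logic. The `RunbookStep` class, the field names (`employee_count`, `email`), and the queue names are illustrative assumptions, not any agency's or tool's actual schema.

```python
from dataclasses import dataclass
from typing import Callable

Lead = dict  # in this sketch a lead is just a dict of fields


@dataclass
class RunbookStep:
    """One 'when X, do Y, check Z, route to W' entry (hypothetical schema)."""
    name: str
    matches: Callable[[Lead], bool]   # [criteria]
    action: Callable[[Lead], Lead]    # [action]
    gate: Callable[[Lead], bool]      # [validation gate]
    destination: str                  # [destination] when the gate passes
    fallback: str = "human_review"    # where the lead goes when the gate fails


def run_step(step: RunbookStep, lead: Lead) -> str:
    """Return the queue a lead ends up in after one runbook step."""
    if not step.matches(lead):
        return "skipped"
    result = step.action(lead)
    return step.destination if step.gate(result) else step.fallback


# Example entry: mid-size companies get enriched, then must have an email.
enrich_mid_market = RunbookStep(
    name="enrich_mid_market",
    matches=lambda lead: 50 <= lead.get("employee_count", 0) <= 500,
    action=lambda lead: {**lead, "enriched": True},  # stand-in for a real enrichment call
    gate=lambda lead: "@" in lead.get("email", ""),
    destination="crm_sequence_a",
)

print(run_step(enrich_mid_market, {"employee_count": 120, "email": "a@example.com"}))
print(run_step(enrich_mid_market, {"employee_count": 120, "email": ""}))
```

The point of the sketch: every branch, including the failure branch, has a named destination. A real runbook should answer the same question for every step.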

If the agency can't show you a runbook, they don't have one.

Red-flag questions for the call:

  • "What's the average time from lead identification to CRM entry?"
  • "What happens when the enrichment API returns incomplete data?"
  • "How many leads per day does this handle before it breaks?"

Back-of-envelope ROI model: If your team spends 15 hours/week on manual prospecting at $35/hour, that's $2,275/month. An AI pipeline agent running on n8n costs roughly $200-400/month in API calls and hosting. That's a 5-10x return before it books a single meeting.
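
If you want to rerun that math with your own numbers, a short Python sketch does it. It assumes 52 working weeks spread over 12 months; the function names are made up for this example. With the figures above, the exact multiples land around 5.7x and 11.4x, which the "5-10x" range rounds conservatively.

```python
def monthly_labor_cost(hours_per_week: float, hourly_rate: float) -> float:
    """Monthly cost of manual work, assuming 52 weeks spread over 12 months."""
    return hours_per_week * hourly_rate * 52 / 12


def return_multiple(labor_cost: float, agent_cost: float) -> float:
    """How many times over the replaced labor cost covers the agent's cost."""
    return labor_cost / agent_cost


labor = monthly_labor_cost(15, 35)
print(f"Manual prospecting: ${labor:,.0f}/month")
for agent_cost in (400, 200):
    multiple = return_multiple(labor, agent_cost)
    print(f"Agent at ${agent_cost}/month: {multiple:.1f}x")
```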

Automatic.co's benchmark report found that companies using agentic AI cut operational costs by up to 38% within 90 days. Pipeline automation is where most of those savings come from.

Step 2: Look for AI BDR Research and Routing — Not Just "AI BDR"

An AI BDR agent reads a prospect's LinkedIn, pulls recent company news, checks tech stack data, and writes a personalized reason to reach out. Then it routes that lead to the right human rep.

That last part matters. Routing is the difference between an AI BDR and a spam cannon.

Most vendors selling "AI BDR" are doing mail merge with GPT. They blast 10,000 generic emails and call it automation. This destroys trust at scale. A real BDR does the opposite.

What to audit on the services page:

  • Do they mention research as a distinct step? Or just "outreach"?
  • Do they mention routing logic? (e.g., "leads scoring above X go to senior AE, below X go to nurture sequence")
  • Do they mention eval gates? An eval gate is a checkpoint where the agent's output gets validated before it moves forward.
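
Routing logic and an eval gate can each be a few lines of code, which is why "we can't share that" is a weak answer. Here is a rough Python sketch. The threshold of 80, the 40-character minimum, and the queue names are assumptions for illustration, not a standard.

```python
from dataclasses import dataclass


@dataclass
class ResearchedLead:
    company: str
    score: int                 # 0-100, produced by an upstream scoring step
    reason_to_reach_out: str   # the personalized line the agent wrote


def passes_eval_gate(lead: ResearchedLead) -> bool:
    """Reject output whose personalization is too thin or never names the company."""
    reason = lead.reason_to_reach_out.strip()
    return len(reason) >= 40 and lead.company.lower() in reason.lower()


def route(lead: ResearchedLead, senior_threshold: int = 80) -> str:
    """Gate first, then route by score."""
    if not passes_eval_gate(lead):
        return "back_to_research"
    return "senior_ae" if lead.score >= senior_threshold else "nurture_sequence"


lead = ResearchedLead(
    company="Acme",
    score=91,
    reason_to_reach_out="Acme just announced a new warehouse system that changes its data needs.",
)
print(route(lead))
```

Notice the order: the gate runs before routing, so a weak research output never reaches a rep at all.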

Red-flag questions:

  • "Show me a sample research output for a real prospect."
  • "What's your bounce rate and reply rate on outbound?"
  • "How does the agent decide which rep gets which lead?"

If they can't answer the routing question with specifics, they built a bulk email tool with a nice UI.

StoryPros builds AI BDR agents that book 30+ meetings a week. The reason they work: research happens before outreach, and routing happens before a human ever sees the lead.

Step 3: Demand Evidence of Content Factory Ops — Not "Content Creation"

Every AI agency says they do content. That's like saying a restaurant serves food. The question is what kind.

Content factory ops means a repeatable system that produces content at volume with quality controls built in. It's a pipeline: brief → draft → eval gate → revision → approval → publish. Each step has a defined owner (human or AI) and a defined handoff.
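
That pipeline is simple enough to write down as a small state machine. The Python sketch below is one way to do it. The owner assignments are assumptions, and so is the rule that a failed eval gate sends the piece back to draft; a real system might handle failures differently.

```python
from enum import Enum


class Stage(Enum):
    BRIEF = "brief"
    DRAFT = "draft"
    EVAL_GATE = "eval gate"
    REVISION = "revision"
    APPROVAL = "approval"
    PUBLISH = "publish"


# Who owns each stage (illustrative assignments only).
OWNERS = {
    Stage.BRIEF: "human",
    Stage.DRAFT: "ai",
    Stage.EVAL_GATE: "ai",
    Stage.REVISION: "ai",
    Stage.APPROVAL: "human",
    Stage.PUBLISH: "ai",
}

ORDER = list(Stage)  # Enum members iterate in definition order


def next_stage(current: Stage, gate_passed: bool = True) -> Stage:
    """Advance one step; a failed eval gate sends the piece back to DRAFT."""
    if current is Stage.PUBLISH:
        raise ValueError("piece is already published")
    if current is Stage.EVAL_GATE and not gate_passed:
        return Stage.DRAFT
    return ORDER[ORDER.index(current) + 1]


stage = Stage.BRIEF
while stage is not Stage.PUBLISH:
    print(f"{stage.value} (owner: {OWNERS[stage]})")
    stage = next_stage(stage)
```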

Jasper's 2026 State of AI in Marketing Report surveyed 1,400 marketers. 91% now use AI for marketing, up from 63% in 2025. But only 41% can prove ROI. That gap exists because most teams use AI for one-off drafts, not production systems.

Scaling high-quality content was the top AI priority in that report, growing 2.4x year-over-year. The demand is there. The infrastructure usually isn't.

What to look for on the services page:

  • Do they mention workflow stages, or just "AI-generated content"?
  • Is there a human review step built into the process?
  • Do they reference output logs — records of what the AI produced, what was edited, and why?

Red-flag questions:

  • "How many pieces per week does your system produce for a typical client?"
  • "What's the human edit rate on first drafts?"
  • "Can I see a sample output log from last month?"

Output logs are the smoking gun. If an agency keeps logs of every AI output, every human edit, and every approval decision, they're running a real operation. If they don't, they're prompting ChatGPT and sending you a Google Doc.
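
For a sense of what such a log makes possible, here is a minimal Python sketch that computes a human edit rate from log entries. The `LogEntry` fields and the 5% change threshold are assumptions, and the character-level diff from the standard library's `difflib` is only one rough way to measure how much a draft changed.

```python
import difflib
from dataclasses import dataclass


@dataclass
class LogEntry:
    piece_id: str
    ai_draft: str          # what the AI produced
    published_text: str    # what went out after human review
    approved_by: str
    edit_reason: str = ""  # why the editor changed it, if they did


def edit_fraction(entry: LogEntry) -> float:
    """Rough share of the draft that changed: 0.0 means untouched."""
    matcher = difflib.SequenceMatcher(None, entry.ai_draft, entry.published_text)
    return 1.0 - matcher.ratio()


def human_edit_rate(log: list[LogEntry], threshold: float = 0.05) -> float:
    """Fraction of pieces whose edits exceeded the threshold."""
    if not log:
        return 0.0
    edited = sum(1 for entry in log if edit_fraction(entry) > threshold)
    return edited / len(log)


log = [
    LogEntry("p1", "Our product is great.", "Our product is great.", "editor_a"),
    LogEntry("p2", "Our product is great.",
             "Teams cut onboarding time in half within a month.",
             "editor_b", edit_reason="too generic"),
]
print(f"Human edit rate: {human_edit_rate(log):.0%}")
```

An agency that logs at this level can answer the edit-rate question in seconds. One that doesn't will have to guess.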

Step 4: Verify AI-Native Reporting and CRM Integration — Not Dashboards

This one's the sleeper. AI-native reporting means your CRM doesn't just store data. It uses AI to clean it, interpret it, and flag what matters.

Cien.ai claims their platform identified $2.1 billion in revenue opportunities across customer deployments. In one case, a global SaaS company found $180 million in overlooked expansion opportunities in 30 days. Your numbers will vary. But the underlying problem is real: most CRMs are full of garbage data, and AI can clean it faster than your ops team.

What AI-native CRM/reporting actually looks like:

  • Auto-deduplication and enrichment of contact records
  • AI-generated deal summaries pulled from email and call data
  • Anomaly detection on pipeline velocity (e.g., "deals in Stage 3 are stalling 40% longer than last quarter")
  • Reports that answer questions in plain English, not just display charts
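
Two of those items, deduplication and the stalled-deal check, are easy to sketch in Python. The version below assumes contacts are plain dicts keyed by an `email` field and that you can export days-in-stage per deal. Real CRM APIs and matching rules will differ, and production dedup usually needs fuzzier matching than an exact email comparison.

```python
from statistics import median


def normalize_email(email: str) -> str:
    return email.strip().lower()


def dedupe_contacts(contacts: list[dict]) -> list[dict]:
    """Keep one record per normalized email; later records fill blank fields."""
    merged: dict[str, dict] = {}
    for contact in contacts:
        key = normalize_email(contact["email"])
        if key not in merged:
            merged[key] = {**contact, "email": key}
            continue
        for field, value in contact.items():
            if field != "email" and not merged[key].get(field):
                merged[key][field] = value
    return list(merged.values())


def stall_increase(days_this_quarter: list[float], days_last_quarter: list[float]) -> float:
    """Relative change in median days-in-stage versus last quarter."""
    previous = median(days_last_quarter)
    return (median(days_this_quarter) - previous) / previous


contacts = [
    {"email": "Ann@Example.com", "title": ""},
    {"email": "ann@example.com ", "title": "VP Sales"},
]
print(dedupe_contacts(contacts))

change = stall_increase([14, 21, 28], [10, 15, 20])
print(f"Stage 3 deals are stalling {change:.0%} longer than last quarter")
```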

Red-flag questions:

  • "Does your system write back to our CRM, or just read from it?"
  • "How do you handle data conflicts between what the AI finds and what's already in the CRM?"
  • "Can you show me a sample weekly report your system generates?"

Back-of-envelope ROI model: If your sales team has 30% duplicate or outdated contacts (common — Cien.ai literally built their product around this problem), every dollar you spend on outreach to those contacts is wasted. Clean data alone can improve pipeline conversion by 10-20%.

Step 5: Run the Production Evidence Checklist Before You Sign Anything

Here's the checklist. Print it out. Use it on every agency call.

For each of the four services, ask for:

1. Runbook: The documented, step-by-step process the AI follows. Not a slide. A real document with conditional logic.
2. Logs: Historical output records. What did the agent do last Tuesday at 2 PM? If they can't answer, it's not running.
3. Eval gates: Where does a human (or a second AI) check the first AI's work? No eval gates = no quality control.
4. Handoff documentation: How does the AI's output get to a human? Is it a Slack message? An email? A CRM task? If the handoff isn't defined, the output dies in a queue.

Instant disqualifiers:

  • They show you a demo but can't show you production data.
  • Their services page lists more than six offerings. Focus is a feature.
  • They talk about "AI strategy" without mentioning a single tool by name.
  • They promise ROI in vague terms. ("You'll see results." Results like what? When?)
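
If you'd rather track this in a script than on paper, the checklist and the disqualifiers fit in a short Python sketch. The function names and the way answers are recorded are assumptions for illustration. The four services, four evidence items, and four disqualifiers come straight from this guide.

```python
SERVICES = (
    "lead pipeline automation",
    "AI BDR research and routing",
    "content factory ops",
    "AI-native reporting/CRM",
)
EVIDENCE = ("runbook", "logs", "eval gates", "handoff documentation")


def missing_evidence(shown: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each service to the evidence items the agency did not produce."""
    gaps = {}
    for service in SERVICES:
        have = shown.get(service, set())
        missing = [item for item in EVIDENCE if item not in have]
        if missing:
            gaps[service] = missing
    return gaps


def disqualifiers(offerings_listed: int, showed_production_data: bool,
                  named_a_tool: bool, roi_is_specific: bool) -> list[str]:
    """Return every instant disqualifier the agency triggered."""
    flags = []
    if not showed_production_data:
        flags.append("demo only, no production data")
    if offerings_listed > 6:
        flags.append("more than six offerings")
    if not named_a_tool:
        flags.append("no tool named")
    if not roi_is_specific:
        flags.append("vague ROI promise")
    return flags


# Notes from one hypothetical call:
shown = {"lead pipeline automation": {"runbook", "logs"}}
print(missing_evidence(shown)["lead pipeline automation"])
print(disqualifiers(offerings_listed=12, showed_production_data=False,
                    named_a_tool=True, roi_is_specific=True))
```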

The Jasper report found that governance friction — legal, compliance, brand review — is now the #1 barrier to AI scaling, up 3.4x year-over-year. Any agency worth hiring should already have answers for how they handle this. If they look confused when you ask about compliance workflows, walk away.

FAQ

What is an AI BDR agent?

An AI BDR (Business Development Representative) agent automates the research, qualification, and routing work of a human sales rep. It pulls prospect data from sources like LinkedIn, company news, and tech stack databases, then scores and routes leads to the right human rep. StoryPros builds AI BDR agents that book 30+ meetings per week by combining research, personalization, and routing logic into one automated pipeline.

How do you calculate ROI for AI agency services?

Start with the cost of the human work being replaced. If manual prospecting costs your team $2,275/month (15 hours/week at $35/hour), and an AI pipeline agent costs $200-400/month, that's a 5-10x return. Automatic.co's 2026 benchmark found companies using agentic AI cut operational costs by up to 38% within 90 days. Measure before-and-after over a 90-day window using the same KPIs.

What is the role of AI and automation in CRM?

AI-native CRM goes beyond storing records. It auto-deduplicates contacts, enriches missing fields, flags stale deals, and generates plain-English reports. Cien.ai's platform identified $180 million in expansion opportunities for a single SaaS client in 30 days by cleaning and analyzing existing CRM data. The goal is to turn messy data into decisions, not just display it in a dashboard.

What should I look for on an AI agency's services page?

Look for production evidence: runbooks (documented agent workflows), output logs (records of what the AI actually produced), eval gates (quality checkpoints), and handoff protocols (how AI output reaches a human). If the page only lists capabilities without naming tools, showing sample outputs, or referencing real metrics, the service probably doesn't exist yet.

How many AI agency services actually produce ROI?

Four: lead pipeline automation, AI BDR research and routing, content factory ops, and AI-native reporting/CRM. The Jasper 2026 report found that 91% of marketers now use AI, but only 41% can prove ROI. The gap exists because most agencies sell broad menus of services without production-grade systems behind them. Focus on these four and demand evidence before signing.
