The AI Agency RFP Template That Kills Bad Vendors Fast (2026)

Matt Payne · 9 min read
Key Takeaway

Most AI agency RFPs are too vague to filter anyone out. Use 12 AgentOps requirements covering audit logs, eval gates, run-cost models, least-privilege tokens, and build-to-transfer clauses. Score vendors 0-3 on each and drop anyone below 24/36.

Step 1: Stop Writing RFPs That Sound Like LinkedIn Bios

The query "AI agency" racks up hundreds of Google search impressions every month, but click-through rates on most vendor pages are brutal. Why? Because every AI agency says the same thing. "We build AI solutions." "We help you with AI strategy." Zero specifics.

Your RFP has the same problem if it asks for "demonstrated AI experience" and "case studies." That's the procurement equivalent of a vague vendor page. Every prompt-shop with a ChatGPT API key will respond.

Write requirements so specific that only vendors who've actually built and operated AI agents can answer them. Henrico County's $100M cooperative procurement (identifier 26-2948-4JEC) requires compliance with FedRAMP, CJIS, HIPAA, and NIST CSF. That's a filter. Ohio's SRC0000035156 requires integration with state systems and adherence to IT governance policies. Also a filter.

Your RFP needs 12 filters like these. Not compliance frameworks. AgentOps requirements.

Step 2: Require a Documented Audit-Log Schema

Every AI agent your vendor builds will take actions on your behalf. Sending emails. Updating CRM records. Booking meetings. Querying databases.

If you can't see exactly what the agent did, when it did it, and why it chose to do it, you're flying blind. A 2026 ICML paper by Halil Burak Noyan describes agents that are "typically granted static credential sets at configuration time, holding every tool the role might need for every task they perform." That's the default. No logging, no scoping, full access.

What to put in the RFP:

> "Vendor must provide a documented audit-log schema that captures: timestamp, agent ID, action type, target system, input context, output result, and token/credential used. Logs must be queryable via API and retained for a minimum of 90 days. Vendor must demonstrate this schema in a working environment within 7 days of contract signing."

A prompt-shop can't fake this. They'd need to have actually built agent logging before. Most haven't.
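
For a sense of what "documented schema" means in practice, here is a minimal sketch assuming a Python stack and in-memory storage (a real deployment would write to a database or log service behind an API). The fields mirror the clause above; `AuditLogEntry`, `query`, and `purge_expired` are hypothetical names, not any vendor's or library's API.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timedelta, timezone
import json

# Minimum retention from the sample RFP clause.
RETENTION = timedelta(days=90)


@dataclass(frozen=True)
class AuditLogEntry:
    timestamp: datetime   # when the action happened, in UTC
    agent_id: str         # which agent acted
    action_type: str      # e.g. "send_email", "update_record"
    target_system: str    # e.g. "hubspot", "gmail"
    input_context: str    # what the agent was given when it decided to act
    output_result: str    # what the agent produced or changed
    credential_id: str    # ID of the token used; never log the secret itself

    def to_json(self) -> str:
        """Serialize one entry as a JSON line, the shape a log API might return."""
        record = asdict(self)
        record["timestamp"] = self.timestamp.isoformat()
        return json.dumps(record, sort_keys=True)


def query(entries, agent_id=None, since=None, until=None):
    """Filter entries by agent and by a [since, until) time window."""
    return [
        e for e in entries
        if (agent_id is None or e.agent_id == agent_id)
        and (since is None or e.timestamp >= since)
        and (until is None or e.timestamp < until)
    ]


def purge_expired(entries, now):
    """Drop only entries older than the retention window; newer ones are kept."""
    return [e for e in entries if now - e.timestamp <= RETENTION]
```

Answering "what did this agent do during that hour?" becomes one `query(entries, agent_id=..., since=..., until=...)` call, which is exactly the capability the clause asks the vendor to demonstrate.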

Step 3: Demand Eval Gates With Pass/Fail Criteria

An eval gate is a checkpoint that runs before an agent's output reaches a customer, a CRM, or an inbox. Think of it as quality control on a factory line. No product ships without passing inspection.

Most AI agencies skip this entirely. They connect an LLM to an API, watch it work for 20 minutes, and call it done. Then the agent sends a garbled email to your best prospect at 2 AM.

What to put in the RFP:

> "Vendor must define eval gates for each agent workflow. Each gate must include: (a) the specific output being evaluated, (b) the pass/fail criteria, (c) the fallback action on failure (human review, retry, or halt), and (d) a regression test suite with a minimum of 50 test cases per workflow. Vendor must run regression tests before any model update or prompt change goes live."
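
Here is one way a single gate and its regression check could look, as a minimal sketch. The email criteria (non-empty, at most 2,000 characters, no unfilled `{first_name}`-style merge fields) are illustrative assumptions, and `EvalGate`, `Fallback`, and `run_regression` are hypothetical names. A production gate would likely also check tone, factual claims, and deliverability signals.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Fallback(Enum):
    HUMAN_REVIEW = "human_review"
    RETRY = "retry"
    HALT = "halt"


@dataclass
class EvalGate:
    output_name: str               # (a) the specific output being evaluated
    check: Callable[[str], bool]   # (b) pass/fail criterion
    on_fail: Fallback              # (c) fallback action on failure

    def run(self, output: str) -> str:
        """Return "pass", or the fallback action the pipeline should take."""
        return "pass" if self.check(output) else self.on_fail.value


# Hypothetical criteria for an outbound email body.
UNFILLED_MERGE_FIELD = re.compile(r"\{[a-z_]+\}")


def email_body_ok(body: str) -> bool:
    return (
        bool(body.strip())                          # not empty
        and len(body) <= 2000                       # not a runaway generation
        and not UNFILLED_MERGE_FIELD.search(body)   # no "{first_name}" left behind
    )


email_gate = EvalGate("outbound_email_body", email_body_ok, Fallback.HUMAN_REVIEW)

MIN_CASES = 50  # (d) minimum regression cases per workflow


def run_regression(gate: EvalGate, cases: list[tuple[str, bool]]) -> list[int]:
    """Run every (output, expected_pass) case; return indexes of cases that disagree."""
    if len(cases) < MIN_CASES:
        raise ValueError(f"need at least {MIN_CASES} cases, got {len(cases)}")
    return [i for i, (output, expected) in enumerate(cases) if gate.check(output) != expected]
```

In this sketch, `email_gate.run()` sits in front of the send step: "pass" lets the email out, and anything else routes to the named fallback. `run_regression()` is the part the clause requires before any model update or prompt change; an empty list means no behavior changed.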

This is where recent deliverability data matters. In 2024-2025, Gmail blocked 23% more promotional messages than the prior year, according to SendForensics data. Klaviyo flagged that 22% of mid-market senders had spam placement rates above 15% by May 2026. If your AI agent is sending outbound email without eval gates checking deliverability signals, you're burning your domain reputation at machine speed.

Eval gates aren't optional. They're how you stop your agent from destroying trust at scale.

Step 4: Make Them Show You the Run-Cost Model

Here's a question I'd ask any AI agency on a first call: "What does it cost to run this agent per 1,000 actions?"

Most can't answer. They don't track it. They bill you a monthly retainer and hope the API costs stay under their margin.

What to put in the RFP:

> "Vendor must provide a run-cost model that breaks down: (a) LLM API cost per action, (b) infrastructure/hosting cost per month, (c) third-party tool costs (enrichment, verification, sending), and (d) projected total cost at 1x, 5x, and 10x current volume. Vendor must update this model monthly and flag any cost increase above 15%."

This requirement does two things. First, it tells you whether the vendor actually understands their own unit economics. Second, it protects you from the surprise where your $2,000/month agent suddenly costs $8,000 because someone switched from GPT-4o-mini to Claude Opus without telling you.
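
The run-cost model is simple arithmetic, which is why "we don't track it" is a red flag. A minimal sketch, assuming hosting is a flat monthly cost and LLM and third-party tool costs scale linearly per action; `RunCostModel` and `cost_increase_flag` are hypothetical names, and every number in the usage note is illustrative.

```python
from dataclasses import dataclass


@dataclass
class RunCostModel:
    llm_cost_per_action: float    # (a) LLM API cost per action, USD
    infra_cost_per_month: float   # (b) hosting; assumed fixed regardless of volume
    tool_cost_per_action: float   # (c) enrichment, verification, sending, per action
    actions_per_month: int        # current volume (the 1x baseline)

    def monthly_total(self, multiplier: float = 1.0) -> float:
        volume = self.actions_per_month * multiplier
        per_action = self.llm_cost_per_action + self.tool_cost_per_action
        return self.infra_cost_per_month + volume * per_action

    def cost_per_1000_actions(self) -> float:
        return 1000 * self.monthly_total() / self.actions_per_month

    def projections(self) -> dict[str, float]:
        # (d) projected monthly total at 1x, 5x, and 10x current volume
        return {f"{m}x": round(self.monthly_total(m), 2) for m in (1, 5, 10)}


def cost_increase_flag(previous: float, current: float, threshold: float = 0.15) -> bool:
    """True when month-over-month cost rose by more than the threshold (15% by default)."""
    return previous > 0 and (current - previous) / previous > threshold
```

With hypothetical inputs of $0.004 LLM cost and $0.006 tool cost per action, $500/month hosting, and 100,000 actions, this gives about $15 per 1,000 actions and $1,500 / $5,500 / $10,500 per month at 1x / 5x / 10x. The $2,000-to-$8,000 surprise above trips the flag: `cost_increase_flag(2000, 8000)` returns `True`.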

Step 5: Require Least-Privilege Tokens for Every Integration

This is the one that separates real builders from weekend hobbyists.

When an AI agent connects to your HubSpot, Salesforce, Gmail, or Microsoft 365, it needs credentials. The lazy approach: give it an admin token with full access. The right approach: give it the minimum permissions it needs for the specific task it's performing.

The Noyan ICML paper calls this the "dynamic least-privilege principle" and proposes a three-layer architecture: role-based ceilings, a task-context classifier, and policy-derived prohibitions. The paper reports a 93% reduction in permission violations when this approach was applied.
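
The three layers can be pictured as set operations: take what the task needs, cap it at the role's ceiling, then subtract anything policy prohibits. This is a simplified illustration, not the paper's implementation. The task-context classifier is replaced by a static lookup table, and all role, task, and scope names are hypothetical.

```python
# Layer 1: role-based ceilings (hypothetical roles and scope names).
ROLE_CEILINGS = {
    "sdr_agent": {"contacts.read", "contacts.write", "deals.write", "email.send"},
}

# Layer 2: task context -> scopes the task needs. The paper describes a
# classifier; this sketch substitutes a static lookup keyed by task type.
TASK_SCOPES = {
    "log_call_note": {"contacts.read", "contacts.write"},
    "send_followup": {"contacts.read", "email.send"},
    "bulk_delete_contacts": {"contacts.delete"},
}

# Layer 3: policy-derived prohibitions that apply regardless of role or task.
PROHIBITED = {"contacts.delete", "admin"}


def effective_scopes(role: str, task: str) -> set[str]:
    """Grant only what the task needs, capped by the role, minus prohibited scopes."""
    ceiling = ROLE_CEILINGS.get(role, set())
    needed = TASK_SCOPES.get(task, set())
    return (needed & ceiling) - PROHIBITED
```

Contrast this with the static default the paper describes: a statically configured `sdr_agent` would hold all four of its scopes on every task, while here a follow-up email gets only `contacts.read` and `email.send`, and an unknown role or task gets nothing.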

What to put in the RFP:

> "All agent integrations must use least-privilege OAuth scopes. Vendor must document: (a) every OAuth scope requested per integration, (b) the business justification for each scope, (c) the process for revoking and rotating tokens, and (d) confirmation that no integration uses admin-level credentials. Vendor must provide a token audit within 30 days of launch and quarterly thereafter."

Google's Q1 2026 sender policy updates now penalize senders whose unsubscribe links fail to process within 48 hours. Yahoo introduced an engagement floor — open rates below 8% over 30 days trigger automatic spam routing. If your agent has write access to your email-sending infrastructure with no scope restrictions, one bad config change can put you in a deliverability spiral that takes 60-90 days to recover from.
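
The token audit in the clause can be partly automated. The sketch below assumes the buyer keeps a simple inventory of each integration's scopes, justifications, and last rotation date; `Integration` and `audit` are hypothetical names. `https://mail.google.com/` is Google's full-mailbox Gmail scope, but the admin-scope list is something the buyer should build from each provider's own scope documentation.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Scopes this sketch treats as admin-level. Illustrative only: build the real
# list from each provider's scope documentation.
ADMIN_SCOPES = {"https://mail.google.com/"}  # full Gmail mailbox access

ROTATION_WINDOW = timedelta(days=90)  # quarterly


@dataclass
class Integration:
    name: str
    scopes: dict[str, str]   # scope -> business justification ("" if missing)
    last_rotated: date


def audit(integrations: list[Integration], today: date) -> list[str]:
    """Return human-readable findings; an empty list means the audit passed."""
    findings = []
    for integ in integrations:
        for scope, why in integ.scopes.items():
            if scope in ADMIN_SCOPES:
                findings.append(f"{integ.name}: admin-level scope {scope}")
            if not why.strip():
                findings.append(f"{integ.name}: no justification for {scope}")
        if today - integ.last_rotated > ROTATION_WINDOW:
            findings.append(f"{integ.name}: token not rotated in over 90 days")
    return findings
```

A send-only Gmail integration with a written justification and a recent rotation produces no findings; a legacy integration holding full-mailbox access with no justification fails on all three checks.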

Step 6: Insist on Build-to-Transfer Handoff

This is my strongest opinion on this list. If your AI agency builds something for you and you can't take it with you when the contract ends, you don't have a vendor. You have a landlord.

What to put in the RFP:

> "All agent workflows, prompt libraries, automation configurations, and custom code must be documented and transferable to Buyer's team or a successor vendor within 30 days of contract termination. Vendor must provide: (a) a complete technical documentation package, (b) environment setup instructions reproducible by a mid-level developer, (c) all API keys and credentials transferred to Buyer-owned accounts, and (d) a 2-hour knowledge transfer session recorded and delivered to Buyer. No proprietary platform lock-in permitted — all automations must run on open or Buyer-licensed infrastructure."

The Defense Logistics Agency's (DLA) June 2026 RFI for agentic AI specifically asks vendors about "sustainment" of AI capabilities. Ohio's procurement requires "maintenance and ongoing support." These government buyers already know that vendor lock-in is the real risk. You should too.

We build on n8n, not proprietary platforms, for exactly this reason. When a project ends, the client owns everything. The workflows. The prompts. The logic. That should be the standard.
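
A transfer clause is easier to enforce when acceptance is a checklist rather than a judgment call. A minimal sketch, assuming the buyer records what the vendor actually delivered; `HandoffPackage`, `handoff_gaps`, and the platform and account labels are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

TRANSFER_WINDOW = timedelta(days=30)  # from the sample clause


@dataclass
class HandoffPackage:
    technical_docs: bool               # (a) complete technical documentation
    setup_instructions: bool           # (b) reproducible environment setup
    credential_owners: dict[str, str]  # (c) credential name -> owning account
    kt_session_recorded: bool          # (d) recorded knowledge-transfer session
    platforms: set[str]                # where the automations actually run
    delivered_on: date


def handoff_gaps(pkg: HandoffPackage, terminated_on: date,
                 buyer_account: str, allowed_platforms: set[str]) -> list[str]:
    """List every way the package falls short of the clause; empty means accepted."""
    gaps = []
    if not pkg.technical_docs:
        gaps.append("missing technical documentation")
    if not pkg.setup_instructions:
        gaps.append("missing environment setup instructions")
    for cred, owner in sorted(pkg.credential_owners.items()):
        if owner != buyer_account:
            gaps.append(f"credential {cred} still owned by {owner}")
    if not pkg.kt_session_recorded:
        gaps.append("knowledge-transfer session not recorded")
    for platform in sorted(pkg.platforms - allowed_platforms):
        gaps.append(f"runs on non-approved platform {platform}")
    if pkg.delivered_on - terminated_on > TRANSFER_WINDOW:
        gaps.append("delivered outside the 30-day transfer window")
    return gaps
```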

Step 7: Score Vendors on a 12-Point Matrix

Here are all 12 requirements in one place. Score each 0-3 (0 = not addressed, 1 = vaguely addressed, 2 = addressed with specifics, 3 = addressed with working demo). Any vendor scoring below 24/36 is out.

| # | Requirement | What to Look For |
|---|-------------|-----------------|
| 1 | Audit-log schema | Documented fields, queryable API, 90-day retention |
| 2 | Eval gates | Pass/fail criteria per workflow, fallback actions |
| 3 | Regression test suite | 50+ test cases per workflow, run before every change |
| 4 | Run-cost model | Per-action breakdown, volume projections, monthly updates |
| 5 | Least-privilege tokens | Documented scopes per integration, no admin credentials |
| 6 | Token rotation schedule | Quarterly rotation, revocation process documented |
| 7 | Build-to-transfer handoff | Full docs, open infrastructure, 30-day transfer window |
| 8 | Model-change notification | 48-hour advance notice before any LLM or prompt change |
| 9 | Error-rate SLA | Defined acceptable error rate (e.g., <2%), measured weekly |
| 10 | Human escalation path | Clear trigger conditions for human review, response time SLA |
| 11 | Data residency declaration | Where data is processed, stored, and logged, stated in writing |
| 12 | Week-1 working demo | Functional prototype in 7 days, not a slide deck |

That last one is the kill shot. If your AI vendor can't show you a working demo in week 1, find a new vendor. StoryPros holds this as a core standard for a reason — it's the single fastest way to separate builders from talkers.
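
Scoring the matrix is a sum and a threshold. A minimal sketch, assuming scores are entered per requirement name from the table; `score_vendor` is a hypothetical helper, and any requirement left out counts as 0 (not addressed).

```python
REQUIREMENTS = [
    "Audit-log schema", "Eval gates", "Regression test suite", "Run-cost model",
    "Least-privilege tokens", "Token rotation schedule", "Build-to-transfer handoff",
    "Model-change notification", "Error-rate SLA", "Human escalation path",
    "Data residency declaration", "Week-1 working demo",
]
PASS_MARK = 24  # out of a possible 36 (12 requirements x 3 points)


def score_vendor(scores: dict[str, int]) -> tuple[int, bool]:
    """Sum 0-3 scores across all 12 requirements; missing ones count as 0."""
    for req, s in scores.items():
        if req not in REQUIREMENTS:
            raise KeyError(f"unknown requirement: {req}")
        if s not in (0, 1, 2, 3):
            raise ValueError(f"{req}: score must be 0-3, got {s}")
    total = sum(scores.get(req, 0) for req in REQUIREMENTS)
    return total, total >= PASS_MARK
```

A vendor that scores 2 ("addressed with specifics") on every line lands exactly on 24 and passes. A single "vaguely addressed" answer on top of that drops them to 23 and out.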

Why This RFP Makes Prompt-Shops Fail Fast

A prompt-shop connects an LLM to an API, wraps it in a nice UI, and charges $3,000/month. No logging. No eval gates. No cost tracking. No transfer plan. They can't score above 12 on this matrix because they've never built the operational infrastructure around the AI.

That's the point. This RFP is designed to make them fail fast — before you've spent 6 weeks in evaluation, 3 months in a pilot, and $40,000 learning that your "AI agent" is actually a prompt with no guardrails.

The search results for "AI agency" are full of vendors who can't pass these requirements. Google shows the impressions. Users don't click because the pages are vague. Your RFP shouldn't be vague either.

FAQ

How do you write requirements for an AI agency RFP?

Write requirements that force vendors to prove operational maturity, not just AI knowledge. Each requirement should be specific enough to score on a 0-3 scale. StoryPros recommends 12 non-negotiable AgentOps clauses covering audit logs, eval gates, run-cost models, least-privilege tokens, and build-to-transfer handoff — each with sample contract language and a pass/fail acceptance test.

What are the most common mistakes in AI agency RFPs?

The biggest mistake is asking for "AI experience" without specifying operational requirements. Asking for case studies and references gets you polished decks from vendors who can't build. The second mistake is skipping a Week-1 demo requirement — any AI agency worth hiring can show a working prototype within 7 days, not 7 weeks.

How do you evaluate AI agents in production?

Evaluate AI agents using eval gates with defined pass/fail criteria, regression test suites of 50+ test cases per workflow, and weekly error-rate measurement. Track run costs per action to catch model-switching surprises. Require audit logs that capture every agent action with timestamp, target system, input context, and credential used. If you can't answer "what did the agent do last Tuesday at 3 PM," your evaluation system is broken.

What are least-privilege tokens and why do they matter for AI agents?

Least-privilege tokens are OAuth credentials scoped to the minimum permissions an AI agent needs for a specific task. Instead of giving an agent full admin access to your CRM or email, you grant, for example, read-only access to contacts and write access only to a specific pipeline. Research presented at ICML 2026 showed a 93% reduction in permission violations when dynamic least-privilege scoping was applied. With Gmail blocking 23% more promotional messages year over year and Yahoo auto-routing low-engagement senders to spam, an over-permissioned email agent can destroy your sender reputation before anyone notices.

Does a build-to-transfer clause actually work?

Yes, if it's specific. Require all workflows to run on open or buyer-licensed infrastructure (not the vendor's proprietary platform), all prompt libraries and custom code to be documented and transferable within 30 days, and a recorded knowledge-transfer session. The Defense Logistics Agency's June 2026 RFI for agentic AI specifically asks about "sustainment" — even the federal government knows vendor lock-in is the real risk in AI procurement.

AI Answer

What requirements should I put in an AI agency RFP to filter out bad vendors?

Require 12 AgentOps criteria, including audit logs with 90-day retention, eval gates with 50+ regression test cases per workflow, and a run-cost model broken down per action at 1x, 5x, and 10x volume. Score vendors 0-3 on each requirement. Any vendor scoring below 24 out of 36 is eliminated before a pilot begins.

What is a build-to-transfer clause in an AI agency contract?

A build-to-transfer clause requires the vendor to hand over all workflows, prompt libraries, custom code, and credentials to buyer-owned accounts within 30 days of contract termination. It must ban proprietary platform lock-in and include a recorded 2-hour knowledge transfer session. The Defense Logistics Agency's June 2026 RFI for agentic AI specifically asks vendors about sustainment of AI capabilities for this reason.

Why do least-privilege tokens matter for AI agents connecting to tools like HubSpot or Gmail?

Over-permissioned agents with admin credentials can destroy your sender reputation or corrupt CRM data before anyone notices. Research at ICML 2026 showed dynamic least-privilege scoping produced a 93% reduction in permission violations. With Gmail blocking 23% more promotional messages year over year, one misconfigured email agent can trigger a deliverability spiral that takes 60-90 days to reverse.