Your AI Marketing Contract Is Missing 5 Pages (2026)

Matt Payne · Updated · 9 min read
Key Takeaway

Deloitte, McKinsey, and Publicis all launched AI marketing practices in April 2026 with zero run-cost models, audit logs, or kill-switch clauses. Demand 5 contract addendum items before signing: per-run cost ceilings, audit log schemas, eval gates, kill-switch SLAs, and acceptance tests.

TL;DR

Deloitte, McKinsey, Publicis, and WPP all announced agentic AI marketing practices in April 2026. None of them published a run-cost model, audit log schema, or kill-switch clause. If your AI marketing consultant's contract doesn't include AgentOps — the operational controls that keep agents accountable after launch — you're buying a PowerPoint. Here's the exact addendum to demand before you sign.

Deloitte launched an "agentic orchestration engine for the end-to-end marketing lifecycle" on April 22, 2026. McKinsey announced the McKinsey Google Transformation Group the same day. Publicis and Microsoft expanded their partnership on April 8 to build "a full-stack marketing solution" with AI agents.

Read their press releases carefully. Count the number of times they mention governance structure, per-run cost ceilings, or what happens when an agent goes sideways at 2 AM on a Saturday.

I counted zero.

AgentOps for AI marketing is the set of operational controls — run-cost models, audit logs, evaluation gates, kill-switches, and acceptance tests — that ensure your AI agents perform reliably after the consultants leave. StoryPros includes AgentOps provisions in every engagement because the agent isn't the hard part. Keeping it honest is.

Here's what "AI strategy" looks like without AgentOps: a $300K engagement, a shiny demo, and a content agent that starts hallucinating your competitor's pricing into blog posts three weeks after go-live.

Here's a parallel. In the early days of cloud computing — around 2008 — companies bought SaaS contracts without SLAs for uptime, data retention, or breach notification. They learned the hard way. Salesforce had a major outage in 2016 that took down thousands of businesses, and the companies without SLA teeth had no recourse. We're at the same inflection point with AI agents. The contracts haven't caught up to the technology.

The Validity/Litmus State of Email 2026 report found that only 12% of marketing teams have deeply integrated AI. But 75% of those who have are seeing ROI above 45:1. The gap between "AI that works" and "AI that doesn't" isn't the model. It's the operations around it.

Here are five contract addendum items to demand from any AI marketing consultant. If they can't agree to these, they're selling strategy, not results.

Step 1: Demand a Run-Cost Model With Per-Agent Ceilings

Every AI agent has a cost per execution. LLM API calls, data enrichment lookups, email sends, CRM writes. Your consultant should tell you exactly what each agent costs to run — per lead processed, per email generated, per meeting booked.

Apollo.io's Tolly Group study from April 8, 2026 showed a 2.37% cold-to-meeting conversion rate from an AI-powered campaign. That's real. But what did each meeting cost in LLM tokens, enrichment fees, and platform charges?
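
Run the math yourself: at a 2.37% cold-to-meeting rate, one booked meeting takes roughly 1 / 0.0237, about 42 contacts. If each drafted-and-sent email costs $0.35 in run costs (the example ceiling below), that's roughly $14.77 per meeting before enrichment and platform fees. Those are illustrative figures, not Apollo's; the point is your consultant should be able to produce this arithmetic for your pipeline before you sign.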

Your contract addendum should include:

  • Cost-per-run ceiling: A dollar amount cap per agent execution (e.g., $0.12 per lead scored, $0.35 per email drafted and sent).
  • Monthly spend cap: A hard ceiling on total agent operational costs, separate from the consulting fee.
  • Cost attribution by agent: If you're running a lead pipeline agent and a content factory agent, you need separate line items. Not one blended number.
  • Overage notification threshold: The consultant must alert you when spend hits 80% of the monthly cap.
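
To make that concrete, here's a minimal sketch of ceiling enforcement in code. The agent names, dollar figures, and alert hook are illustrative assumptions, not any vendor's API; the point is that the check runs on every execution, not in a monthly review.

```python
# Minimal per-agent cost ceiling check. The ceilings, agent names, and
# alert hook are illustrative assumptions, not any vendor's API.

MONTHLY_CAP_USD = 1500.00
ALERT_THRESHOLD = 0.80  # notify at 80% of the monthly cap, per the addendum

PER_RUN_CEILING_USD = {
    "lead_scorer": 0.12,    # $0.12 per lead scored
    "email_drafter": 0.35,  # $0.35 per email drafted and sent
}

monthly_spend_usd = {"lead_scorer": 0.0, "email_drafter": 0.0}

def notify_owner(message: str) -> None:
    # Stand-in for your real alert channel (email, Slack webhook, pager).
    print(f"[SPEND ALERT] {message}")

def record_run(agent_id: str, run_cost_usd: float) -> None:
    """Log a run's cost, enforce the per-run ceiling, fire the 80% alert."""
    ceiling = PER_RUN_CEILING_USD[agent_id]
    if run_cost_usd > ceiling:
        # Per-run ceiling breached: halt and escalate to a human.
        raise RuntimeError(
            f"{agent_id} run cost ${run_cost_usd:.2f} exceeds ceiling ${ceiling:.2f}"
        )
    monthly_spend_usd[agent_id] += run_cost_usd  # separate line item per agent
    total = sum(monthly_spend_usd.values())
    if total >= ALERT_THRESHOLD * MONTHLY_CAP_USD:
        notify_owner(f"agent spend ${total:.2f} vs ${MONTHLY_CAP_USD:.2f} cap")

record_run("lead_scorer", 0.09)  # within ceiling: counted, no alert yet
```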

If your consultant can't give you a per-run cost estimate before signing, they haven't built the thing yet. They're planning to figure it out on your dime.

We build agents at StoryPros where every run logs its cost. Not because we're generous. Because the math is the proof.

Step 2: Require Audit Logs With a Defined Schema and Retention Policy

Salesforce's State of Sales 2026 report found that 51% of sales leaders say disconnected systems are slowing their AI work. Audit logs are how you avoid becoming that stat.

Every action your AI agent takes — every email it writes, every lead it scores, every CRM field it updates — needs a logged record. Not buried in a vendor dashboard. In a format you own.

Your contract addendum should specify:

  • Log schema: At minimum, each log entry needs a timestamp, agent ID, action type, input data hash, output data, model version used, cost incurred, and pass/fail status from any eval gate.
  • Retention period: 12 months minimum. 24 if you're in financial services or healthcare, or if you sell into the EU (the AI Act's record-keeping requirements under Article 12 mandate automatic event logging and documented audit trails for high-risk systems).
  • Export format: JSON or CSV, exportable on demand. Not locked inside the vendor's proprietary dashboard.
  • Access rights: Your team gets read access to raw logs. Not a summary report. The actual logs.
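
Here's one way a single log entry can look under that minimum schema, sketched as a Python dict you could export as JSON on demand. The field names and values are illustrative; your contract should pin the exact schema in writing.

```python
import json

# One audit log entry matching the minimum schema above. Field names and
# values are illustrative; the contract should pin down the exact schema.
log_entry = {
    "timestamp": "2026-04-22T02:14:07Z",    # UTC, ISO 8601
    "agent_id": "content_factory_v3",
    "action_type": "email_draft",
    "input_hash": "sha256:9f2c1e...",       # hash of inputs, not raw PII
    "output": "Subject: Q2 pricing update for your team...",
    "model_version": "llm-2026-03",         # hypothetical model tag
    "cost_usd": 0.11,
    "eval_status": "pass",                  # result of the Step 3 gate
}

# Exportable on demand as JSON, per the addendum; CSV works the same way.
print(json.dumps(log_entry, indent=2))
```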

The EU AI Act compliance research published on Zenodo in April 2026 describes a model where "every validation event produces a tamper-evident record automatically." That's the standard. If a 10-person European SME can meet that bar, your AI marketing consultant can too.

Without audit logs, you can't debug a failing agent. You can't prove compliance. You can't even tell if the agent is doing what the consultant said it would do.

Step 3: Build Evaluation Gates Into Every Workflow

An eval gate is a checkpoint where agent output gets tested before it goes live. Think of it as QA for AI — except it runs automatically, on every single execution.

Here's why this matters: Salesforce's Adam Alfano said agents contacted 130,000 leads and created 3,200 opportunities in four months. That's impressive. It's also 130,000 chances for an agent to say something wrong, score a lead incorrectly, or send a message that violates your brand guidelines.

Your contract addendum should require:

  • Content eval gates: Every piece of AI-generated content (email, blog post, ad copy) gets scored against brand voice guidelines, factual accuracy checks, and compliance rules before it ships. Specify the tool — we use n8n workflows that pipe output through a second model for validation.
  • Lead scoring eval gates: Every lead score gets compared against a validation set of known-good scores. If accuracy drops below a threshold (e.g., 85% agreement with human-scored samples), the agent pauses.
  • Drift detection: Weekly automated checks comparing current agent output quality to baseline metrics from the acceptance test (Step 5). If quality drops more than 10%, the consultant is contractually obligated to investigate within 48 hours.
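
As a sketch, here's the shape of a content eval gate, assuming a second-model validation call like the n8n pattern above. `validate_with_second_model` is a hypothetical stand-in for your real scoring step.

```python
# Minimal content eval gate: every draft passes a second-model check
# before it ships. validate_with_second_model is a hypothetical stand-in
# for your real validation call (e.g., an n8n workflow node).

def validate_with_second_model(draft: str) -> dict:
    # Stand-in scorer: in production, a second LLM call checks brand
    # voice, factual claims, and compliance rules.
    return {"brand_voice_ok": True, "facts_verified": True, "score": 0.92}

def eval_gate(draft: str) -> bool:
    """Return True only if the draft clears every check."""
    result = validate_with_second_model(draft)
    return result["brand_voice_ok"] and result["facts_verified"]

draft = "Subject: Q2 pricing update for your team..."
if eval_gate(draft):
    print("SHIP: draft cleared the gate")  # also log per the Step 2 schema
else:
    print("HOLD: route to human review")   # pause agent if rolling accuracy < 85%
```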

The Validity report showed that 25% of marketing teams cite poor data quality as their top scaling challenge with AI. Eval gates catch bad data before it becomes bad output. Skip them and you're shipping garbage at machine speed.

Step 4: Define Kill-Switch Responsibilities and Response Times

McKinsey's new Transformation Group promises "end-to-end" engagement with "governance." Publicis says their agents will "continuously optimize spend in real time — within guardrails set by marketing leaders."

Great. What happens when the guardrails fail?

Your contract addendum needs a kill-switch clause:

  • Who can pull it: Name the specific roles on your team authorized to stop an agent. Not just "the client." A named person and a backup.
  • How to pull it: A documented process. A URL, a Slack command, an API call. Tested monthly. If your kill-switch requires a support ticket, it's not a kill-switch. It's a suggestion box.
  • Response time SLA: If you pull the kill-switch, the consultant has a defined window (4 hours max during business hours, 12 hours off-hours) to diagnose and report.
  • Blast radius documentation: Before launch, the consultant must document exactly what systems the agent touches. CRM records, email platforms, ad accounts, content management systems. So when you kill the agent, you know what else might break.
  • On-call runbook: A written playbook for the three most likely failure modes. What does the agent do if the LLM returns garbage? What if the enrichment API goes down? What if send volume spikes 10x?
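
One minimal pattern worth asking for, sketched below: the agent checks a flag you control before every action, so pulling the switch is a single write rather than a ticket. The file path here is an assumption; a shared key-value store works the same way.

```python
# Minimal kill-switch pattern: the agent checks a flag the client controls
# before every action. Flipping the flag is one write, not a support ticket.
# The file path is an assumption; a shared key-value store works the same way.

from pathlib import Path

KILL_FLAG = Path("/etc/agents/lead_pipeline.killed")  # hypothetical path

def agent_step(action) -> None:
    if KILL_FLAG.exists():
        # Halt immediately: no sends, no CRM writes, log the stop.
        print("kill-switch engaged; halting before action")
        return
    action()

agent_step(lambda: print("send email to lead #42"))

# Pulling the switch is one line (wrap it in a Slack command or URL):
#   KILL_FLAG.touch()
# Monthly drill: touch the flag, confirm the agent halts, then remove it.
```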

Deloitte has 1,000+ pre-built AI agents. WPP has an "agentic marketing platform." Neither press release mentions what happens when one of those agents misbehaves. That should make you nervous.

Step 5: Set Acceptance Tests Before the Contract Starts

You wouldn't hire a BDR and skip the 90-day review. Don't skip it for an AI agent either.

Acceptance tests are the specific, measurable criteria an agent must hit before you accept delivery. Your contract addendum should define two sets — one for a lead pipeline agent, one for a content factory agent.

Lead pipeline acceptance test example:

  • Process 500 leads from your CRM with zero data corruption
  • Score leads with 85%+ agreement vs. human-scored validation set
  • Book 10+ qualified meetings in the first 30 days (Apollo's benchmark: 2.37% conversion rate — use your own baseline)
  • Generate audit logs for every action per the schema in Step 2
  • Stay under the per-run cost ceiling defined in Step 1

Content factory acceptance test example:

  • Generate 20 pieces of content (emails, blog drafts, social posts) matching your brand voice guide
  • Pass human review at 80%+ approval rate on first draft (meaning 16 of 20 need no major edits)
  • Each piece includes source attribution for any factual claims
  • Full eval gate pipeline running and logging per Step 3
  • Content generated within defined cost ceiling (e.g., under $2 per blog draft, under $0.15 per email)
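
These criteria translate directly into executable assertions. Here's a sketch of what that can look like, using the thresholds above; the measure_* functions are hypothetical stand-ins for queries against your CRM, audit logs, and cost ledger.

```python
# Acceptance criteria as executable assertions, using the thresholds above.
# The measure_* functions are hypothetical stand-ins for queries against
# your CRM, audit logs, and Step 1 cost ledger.

def measure_scoring_agreement() -> float:
    return 0.87  # stand-in: agreement vs. human-scored validation set

def measure_first_draft_approval() -> float:
    return 0.80  # stand-in: share of 20 drafts needing no major edits

def measure_avg_cost_per_email_usd() -> float:
    return 0.12  # stand-in: pulled from the cost ledger

def test_lead_pipeline_acceptance() -> None:
    assert measure_scoring_agreement() >= 0.85, "scoring agreement below 85%"

def test_content_factory_acceptance() -> None:
    assert measure_first_draft_approval() >= 0.80, "first-draft approval below 80%"
    assert measure_avg_cost_per_email_usd() <= 0.15, "email cost over $0.15 ceiling"

test_lead_pipeline_acceptance()
test_content_factory_acceptance()
print("acceptance tests passed: agent accepted for delivery")
```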

The Salesforce State of Sales report says sellers expect AI agents to cut email drafting time by 36%. Put that in the acceptance test. If the agent doesn't save your team 36% of drafting time within 30 days, the consultant isn't done.

Most AI consulting fails for the same reason most diets fail. People buy the plan. They skip the accountability.

AgentOps is the accountability. Demand it.

FAQ

What is AgentOps?

AgentOps is the operational layer that governs AI agents after they're built and launched. It includes run-cost tracking, audit logging, evaluation gates, kill-switches, and acceptance testing. Without AgentOps, an AI agent is an unsupervised employee with access to your CRM, your email platform, and your brand voice — running 24/7 with no performance review.

What does it cost to run an AI marketing agent?

Costs vary by complexity, but a lead scoring agent typically runs $0.05–$0.15 per lead processed (LLM API costs plus enrichment). A content generation agent runs $0.10–$2.00 per piece depending on length and model. StoryPros builds AI BDR agents that book 30+ meetings a week for roughly $200/month in run costs. The key is getting your consultant to commit to a per-run ceiling in writing before the engagement starts.

How do you set up evaluation gates for AI agents?

Build a secondary validation workflow that checks every agent output before it goes live. For content agents, pipe the draft through a second LLM call that scores it against your brand guidelines and flags factual claims. For lead pipeline agents, compare scores against a human-validated test set on a rolling basis. Tools like n8n let you build these checkpoints as workflow nodes — no custom code required. If accuracy drops below your threshold (85% is a reasonable starting point), the agent pauses automatically.

What should a kill-switch for an AI agent look like?

A kill-switch should be a single action — a button, a Slack command, or an API call — that immediately stops the agent from taking any further actions. It should be testable (run a drill monthly), documented in a runbook, and backed by a response-time SLA from your vendor. The Publicis-Microsoft partnership promises agents that "continuously optimize spend within guardrails." Your kill-switch is what happens when those guardrails break.

Why do most AI marketing consulting engagements fail?

They fail because the consultant delivers a strategy or a demo and leaves. Deloitte, McKinsey, and Publicis all announced AI marketing practices in April 2026. Their press releases emphasize "strategy," "transformation," and "governance" in the abstract. None published a run-cost model, audit log schema, or acceptance test framework. The distance between "we built it" and "it works reliably at 3 AM on a Sunday" is where most AI marketing investments die.

AI Answer

What should an AI marketing contract include for agent governance?

An AI marketing contract needs 5 operational controls: a per-run cost ceiling, audit logs with a defined schema, evaluation gates on every workflow, a kill-switch clause with response-time SLAs, and acceptance tests before delivery. Without these, you have no way to verify the agent performs after the consultant leaves. Consulting firms like Deloitte and McKinsey announced AI marketing practices in April 2026 without publishing any of these provisions.

AI Answer

How much does it cost to run an AI marketing agent per month?

A lead scoring agent typically costs $0.05 to $0.15 per lead processed, covering LLM API calls and data enrichment. A content generation agent runs $0.10 to $2.00 per piece depending on length and model. A full AI BDR agent booking 30 or more meetings per week can run for roughly $200 per month in total run costs.

AI Answer

What is an eval gate in an AI marketing workflow?

An eval gate is an automated checkpoint that tests agent output before it goes live. For content agents, a second model scores each draft against brand guidelines and flags factual claims. For lead pipeline agents, scores are compared against a human-validated test set, and the agent pauses automatically if accuracy drops below 85%.