Stop Building Agent Spaghetti. Automate First.
TL;DR
Ninety percent of the ops wins your VP cares about come from plain deterministic automation. Not AI agents. Save the LLM for the 10% of messy, unstructured edge cases where if-then logic breaks down. Build it the other way around and you get an expensive, un-debuggable mess.
---
We just finished ripping out an "agentic workflow" for a client and replacing 80% of it with simple n8n automations. Their monthly costs dropped from $14,000 to $1,800. Processing time went from 45 seconds per ticket to 3 seconds. Error rate went from 9% to nearly zero.
The agent wasn't bad. It was just doing work that didn't need an agent.
This is the pattern we see over and over. Teams get excited about agentic AI. They build a system where an LLM decides what to do at every step. Then they spend months debugging why it sometimes skips steps, doubles charges, or sends the wrong email.
Camunda surveyed 1,150 IT leaders for their 2026 State of Agentic Orchestration report. Seventy-three percent admitted there's a gap between what they expected from AI agents and what actually works in production. Eighty-one percent said without proper orchestration, a fully autonomous enterprise is "unrealistic."
That's not a small sample of pessimists. That's the majority of people building this stuff saying it doesn't work the way they hoped.
Deterministic Automation Does the Heavy Lifting
A deterministic workflow does the same thing every time for the same input.
- If invoice amount > $10,000, route to senior approver.
- If lead source = "webinar," add to nurture sequence.
- If support ticket contains "billing," assign to finance team.
No tokens. No inference. No surprises.
This is boring. It's also what runs 90% of the operational processes a VP actually monitors—lead routing, invoice matching, ticket triage, contract renewals, status updates. These are if-then decisions with clear business rules.
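Rules like these translate directly into code. A minimal sketch (the field names are illustrative, not from any particular system):

```python
def route(ticket: dict) -> str:
    """Deterministic routing: same input, same output, every time."""
    if ticket.get("invoice_amount", 0) > 10_000:
        return "senior_approver"
    if ticket.get("lead_source") == "webinar":
        return "nurture_sequence"
    if "billing" in ticket.get("subject", "").lower():
        return "finance_team"
    return "default_queue"
```

Every branch is readable, testable, and costs nothing per call.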
We've built over 100 AI automations at StoryPros. The ones that deliver ROI fastest are almost always deterministic workflows with an LLM bolted on for one specific task. Classifying a free-text field. Summarizing a long email. Extracting data from a weird PDF format.
The math is brutal on this. Klaus Hofenbitzer documented what he calls the "token cost trap." One team's proof of concept cost $50 in OpenAI API fees. At full production volume, that same system would cost $2.5 million per month.
Real enterprise LLM deployments are already hitting $500K to $1M monthly in API bills alone.
Compare that to a deterministic automation platform. n8n's Pro plan runs about $130/month. Power Automate starts at $15/user/month. Even UiPath's mid-tier pricing is a fraction of what you'd spend routing every decision through GPT-4.
The "LLM as Edge Case Layer" Architecture
Here's the pattern that actually works in production.
Build your main workflow as a deterministic state machine. Use Temporal, Camunda, n8n, or even a well-structured Python script. Every step has a defined input, a defined output, and a defined error path. You can replay it. You can audit it. You can debug it at 2 AM when something breaks.
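Even without an orchestration platform, the backbone pattern fits in a few lines of plain Python. A sketch with two hypothetical steps, showing the defined-input, defined-output, defined-error-path shape:

```python
from dataclasses import dataclass, field

@dataclass
class WorkflowRun:
    """State plus a step-by-step log, so every run is auditable and replayable."""
    state: dict
    log: list = field(default_factory=list)

def run_workflow(run: WorkflowRun, steps) -> WorkflowRun:
    for name, step in steps:
        try:
            run.state = step(run.state)
            run.log.append((name, "ok", dict(run.state)))
        except Exception as exc:
            # Defined error path: record and stop, never guess.
            run.log.append((name, "error", str(exc)))
            break
    return run

# Hypothetical steps for illustration.
def validate(state):
    if "ticket_id" not in state:
        raise ValueError("missing ticket_id")
    return state

def triage(state):
    state["queue"] = "finance" if "billing" in state.get("subject", "") else "general"
    return state

run = run_workflow(
    WorkflowRun({"ticket_id": 1, "subject": "billing issue"}),
    [("validate", validate), ("triage", triage)],
)
```

Temporal, Camunda, and n8n give you durable versions of exactly this loop, with retries and persistence handled for you.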
Then identify the 10-15% of cases where the input is genuinely unstructured. A customer email that could mean five different things. A document with no standard format. A support ticket written in a mix of English and Spanish.
That's where you drop in an LLM call—one specific call with a structured output schema. Not "figure out what to do next." More like "classify this text into one of these seven categories" or "extract the invoice number, amount, and due date from this PDF."
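Here's what that single step can look like, sketched with a hypothetical `llm` callable standing in for whichever provider you use. The point is the validation around the call, not the call itself:

```python
import json

CATEGORIES = {"billing", "refund", "bug", "feature_request",
              "account", "shipping", "other"}

def classify_ticket(text: str, llm) -> str:
    """One isolated LLM step: prompt in, schema-shaped JSON out, validated before use."""
    prompt = (
        "Classify this support ticket into exactly one of: "
        + ", ".join(sorted(CATEGORIES))
        + '. Respond as JSON: {"category": "<one of the above>"}\n\n'
        + text
    )
    try:
        category = json.loads(llm(prompt)).get("category")
    except (json.JSONDecodeError, TypeError):
        category = None
    # Never trust model output blindly; fall back to a defined bucket.
    return category if category in CATEGORIES else "other"
```

The deterministic workflow around this call decides what happens with each category. The model never picks the next step.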
Anup Jadhav wrote about this exact approach in January 2026. He uses Temporal as the orchestration layer and LangGraph for the agent state machines. Temporal handles retries, state persistence, and timeouts. LangGraph manages the messy LLM reasoning.
Two layers. Clear boundaries.
The key insight: the LLM call is a single step inside a deterministic workflow. It's not the brain of the whole system. It's one tool the system uses when the structured rules can't handle the input.
Why Agent Spaghetti Happens (and What It Costs You)
Teams build agent spaghetti when they let the LLM make routing decisions that should be hardcoded.
Here's what it looks like. An LLM reads an incoming request. It decides which tool to call. It calls the tool. It reads the response. It decides the next step. Repeat until done.
This is the ReAct pattern—Reasoning and Acting in a loop. It makes incredible demos. It also creates systems where things fall apart.
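Stripped to its skeleton (with hypothetical `llm_decide` and `tools` helpers), the loop looks like this. Every iteration is another non-deterministic decision and another growing context window:

```python
def react_loop(request, llm_decide, tools, max_steps=10):
    """The ReAct pattern: the LLM picks the next tool on every turn."""
    context = [request]
    for _ in range(max_steps):
        decision = llm_decide(context)   # non-deterministic: may differ on replay
        if decision["action"] == "done":
            return decision.get("result")
        observation = tools[decision["action"]](decision.get("input"))
        context.append(observation)      # context grows, so per-step cost grows too
    raise RuntimeError("agent never converged")
```

Nothing in that loop is wrong for genuinely open-ended tasks. It's wrong as the backbone for routine ops work.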
You can't predict what happens. Same input, different result. Try explaining that to your compliance team. Try debugging it when a customer gets double-billed.
You can't replay failures. Temporal lets you replay a deterministic workflow step by step. You can't replay an LLM decision because it might decide differently the second time. The ShShell engineering team wrote about this in detail. They call non-deterministic routing "the same magic that makes a demo impressive" and "the same magic that makes a production system fragile."
Your costs scale linearly with volume. Every ticket, every invoice, every lead that touches an LLM costs tokens. At 10,000 tickets per month with an average of $0.14 per conversation, you're at $1,400/month just in API costs. That doesn't include retries, context windows that grow as conversations get longer, or the human review you'll need when the agent does something weird.
A deterministic workflow processing the same 10,000 tickets costs the same whether it's month one or month 60.
Your SLAs suffer. An LLM call takes 1-5 seconds. A deterministic rule evaluation takes milliseconds. Stack four or five LLM decisions in sequence and your ticket processing time is 10-20 seconds. Multiply by 10,000 tickets and you've added hours of processing time to every day.
The Decision Framework: When to Use What
I'll make it simple.
Use deterministic automation when:
- The business rule can be written as an if-then statement
- The input is structured (form field, database value, API response)
- You need an audit trail for compliance
- Speed matters (sub-second processing)
- Volume is high (1,000+ events per month)
Use an LLM call when:
- The input is unstructured text, images, or documents
- The classification has more than 20 categories that shift over time
- You need to generate natural language output (emails, summaries)
- The rules would require hundreds of if-then branches to approximate
Never use an LLM for:
- Routing between two or three known paths
- Looking up data in a database
- Sending a webhook
- Updating a CRM field
- Anything where "wrong 5% of the time" means real money or compliance risk
Most ops processes we audit are 85-95% structured decisions. The remaining 5-15% is where an LLM earns its keep.
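The framework above can be encoded as a triage function you run against each decision point in your process map. A sketch with illustrative flag names:

```python
def choose_approach(decision: dict) -> str:
    """Triage one decision point per the framework above. Flags are illustrative."""
    if decision.get("money_or_compliance_risk"):
        return "deterministic"   # never let "wrong 5% of the time" touch money
    if decision.get("unstructured_input") or decision.get("generates_language"):
        return "llm_step"        # one isolated, schema-constrained call
    return "deterministic"       # structured input + expressible rule: keep it boring
```

Note the ordering: compliance risk vetoes an LLM even when the input is messy. Those cases get a deterministic path plus human review.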
The Playbook: How to Build This
Week 1: Map your current process. Write down every decision point. For each one, ask: can this be written as an if-then rule? If yes, it's deterministic. If no, flag it as a potential LLM step.
Week 2: Build the deterministic backbone. Use n8n, Temporal, or Camunda. We use n8n for most mid-market deployments because it's self-hostable and the workflow-as-code model makes version control easy. Wire up every structured decision. Test with real data.
Week 3: Add LLM calls for the flagged steps. Use structured outputs. Claude and GPT-4 both support JSON schema enforcement now. Define exactly what fields you need back. Set temperature to 0 for classification tasks. Wrap each LLM call in a retry with exponential backoff.
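The retry wrapper is a few lines you'll reuse for every LLM step. A minimal sketch with jitter added to avoid synchronized retries:

```python
import random
import time

def with_backoff(call, attempts=4, base_delay=1.0):
    """Retry a flaky call with exponential backoff plus jitter.

    Delays grow as base, 2x base, 4x base...; re-raises after the last attempt."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay / 2))
```

If you're on Temporal or n8n, use their built-in retry policies instead; this is the behavior they give you for free.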
Week 4: Measure everything. Track cost per processed item. Track processing time. Track error rate. Compare to your baseline. We expect most teams to see 60-80% cost reduction and 90%+ improvement in processing consistency versus a fully agentic approach.
StoryPros deploys this pattern for [sales AI](/sales-ai) and [marketing automation](/marketing-automations) clients. The boring part (lead routing, data enrichment, CRM updates) runs on deterministic workflows. The interesting part (qualifying a messy inbound email, personalizing outreach, summarizing a sales call) uses an LLM.
That's how you get an [AI BDR](/ai-bdr) that books 30+ meetings a week for $200/month instead of an agent that burns $3,000/month and occasionally emails the wrong person.
Frequently Asked Questions
What's the difference between agentic AI and traditional automation?
Traditional automation follows fixed rules. If X happens, do Y. Every time. Agentic AI uses a large language model to decide what to do next based on context. The agent can reason, adapt, and handle ambiguity. The tradeoff is that agents are slower, more expensive, and harder to debug than deterministic workflows.
What is a deterministic workflow?
A deterministic workflow produces the same output for the same input every time. There's no randomness, no inference, no interpretation. It follows predefined rules and branching logic. Tools like Temporal, Camunda, n8n, and Power Automate build deterministic workflows. They're auditable, replayable, and cheap to run at scale.
When should you use agentic workflows vs deterministic automation?
Use deterministic automation for any decision that can be expressed as a business rule with structured inputs. Use an agentic workflow only when the input is genuinely unstructured and the decision requires natural language understanding.
Camunda's 2026 report found 81% of IT leaders believe agents need orchestration to work reliably. The practical answer: build deterministic first and add LLM calls only where the rules break down.
How do you avoid "agent spaghetti" in production?
Keep the LLM confined to specific, isolated steps inside a deterministic workflow. Don't let it make routing decisions that could be hardcoded. Use structured output schemas so every LLM response fits a predictable format. Wrap LLM calls in retries and timeouts using an orchestration layer like Temporal. Log every input and output for debugging and compliance.