The first version of Whispor Auto that we showed to pilot customers did what everyone in this space does by default: it followed a script. For each supplier category, we had a branching tree of phrases the agent could use, with decision points at each branch. It worked. It was also wrong.

It was wrong because scripts are the opposite of how a good negotiator operates. A good negotiator has a mandate (a specific thing they're trying to accomplish, with clear limits on what they can and can't give up) and then improvises inside that envelope. For a good negotiator, the script is implicit and the mandate is explicit. We had it backwards, and the result was an agent that was fluent but brittle: it could handle exactly the counterparty behaviors we had anticipated, and nothing else.

We rebuilt the system around the mandate. What we ended up with, after a couple of iterations, is something we now think of as the minimum viable spec for an autonomous procurement agent.

What a guardrail actually is

A guardrail is a constraint on agent behavior that is independent of the conversational context. "Never commit to a term longer than 12 months without human approval" is a guardrail. "If the supplier mentions pricing, pivot to value" is a script. The difference is that the guardrail survives any supplier input — it's invariant — while the script is contingent on a specific conversational branch.
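The distinction also shows up in the shape of the code. The Python sketch below uses a hypothetical `ProposedCommitment` record that is not Whispor Auto's internal data model. The guardrail function receives only that record, so no supplier input can change its answer. The script function branches on the last supplier message and says nothing about what the agent may commit to.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProposedCommitment:
    """Hypothetical record of what the agent is about to agree to."""
    term_months: int
    unit_price: float
    human_approved: bool = False


def max_term_guardrail(commitment: ProposedCommitment) -> bool:
    """Guardrail: never commit to a term longer than 12 months without human approval.

    It sees only the proposed commitment, never the transcript, so its
    answer cannot depend on anything the supplier has said.
    """
    return commitment.term_months <= 12 or commitment.human_approved


def pricing_script(last_supplier_message: str) -> Optional[str]:
    """Script: if the supplier mentions pricing, pivot to value.

    It fires on one conversational branch only and places no limit on
    what the agent is allowed to commit to.
    """
    if "pric" in last_supplier_message.lower():
        return "pivot_to_value"
    return None


print(max_term_guardrail(ProposedCommitment(term_months=18, unit_price=9.5)))  # False
print(pricing_script("Our pricing has changed this year"))  # pivot_to_value
```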

The reason this matters is deployment risk. A guardrail can be audited, reasoned about, and signed off by legal and finance once per category. A script has to be reviewed in full every time you encounter a new counterparty behavior, which in practice means it never gets reviewed enough.

If a legal team can read your agent's guardrails in five minutes and approve them, the agent is ready for production. If they need to read a 40-page script, it isn't.

Four guardrail types

The guardrails that matter in autonomous procurement fall into four categories. Each corresponds to a dimension where the agent has authority to act, and a line past which it has to escalate.

1. Commercial guardrails

These are the price and term boundaries. "Never accept a unit price above $X." "Never commit to more than 24 months." "Net-45 is the floor on payment terms." They're the most obvious category, and the one most teams already have — though often as informal knowledge rather than explicit constraints.
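Turning informal knowledge into explicit constraints largely means writing the limits down as data rather than prose. Here is a minimal Python sketch. `CommercialGuardrails`, `Offer`, and `commercial_violations` are hypothetical names, and the $12.00 figure is an illustrative stand-in for the unspecified $X.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CommercialGuardrails:
    max_unit_price: float   # "Never accept a unit price above $X"
    max_term_months: int    # "Never commit to more than 24 months"
    min_payment_days: int   # "Net-45 is the floor on payment terms"


@dataclass(frozen=True)
class Offer:
    unit_price: float
    term_months: int
    payment_days: int       # 30 means Net-30


def commercial_violations(offer: Offer, limits: CommercialGuardrails) -> List[str]:
    """Return the name of every commercial limit the offer breaks; empty means in mandate."""
    violations: List[str] = []
    if offer.unit_price > limits.max_unit_price:
        violations.append("unit_price")
    if offer.term_months > limits.max_term_months:
        violations.append("term_months")
    if offer.payment_days < limits.min_payment_days:
        violations.append("payment_days")
    return violations


# Hypothetical values: $12.00 stands in for the unspecified $X.
limits = CommercialGuardrails(max_unit_price=12.00, max_term_months=24, min_payment_days=45)
print(commercial_violations(Offer(unit_price=11.50, term_months=36, payment_days=30), limits))
# ['term_months', 'payment_days']
```

The check returns every violation rather than stopping at the first one, so a human reviewing an escalation sees the full picture at once.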

2. Structural guardrails

These cover what deal shapes the agent is authorized to agree to at all. Can it agree to a volume commitment? A most-favored-nation clause? A price-index escalation? A framework agreement with downstream call-offs? Each of these is a structural commitment that changes the shape of the relationship, and most should sit outside the agent's mandate in version one.
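One way to encode "most should sit outside the mandate" is an allowlist of deal shapes. In this sketch, `DealStructure` is a hypothetical enum and `FIXED_PRICE_PURCHASE` is an assumed baseline shape. The other four members are the structures named above, and under the assumed version-one mandate only the baseline is authorized.

```python
from enum import Enum, auto
from typing import FrozenSet, Iterable, Set


class DealStructure(Enum):
    """Deal shapes named in the post, plus a hypothetical plain baseline."""
    FIXED_PRICE_PURCHASE = auto()
    VOLUME_COMMITMENT = auto()
    MOST_FAVORED_NATION = auto()
    PRICE_INDEX_ESCALATION = auto()
    FRAMEWORK_WITH_CALL_OFFS = auto()


# Assumed version-one mandate: only the plain shape is authorized.
AUTHORIZED: FrozenSet[DealStructure] = frozenset({DealStructure.FIXED_PRICE_PURCHASE})


def structures_to_escalate(
    proposed: Iterable[DealStructure],
    authorized: FrozenSet[DealStructure] = AUTHORIZED,
) -> Set[DealStructure]:
    """Anything not explicitly authorized goes to a human (allowlist, not blocklist)."""
    return {s for s in proposed if s not in authorized}


print(structures_to_escalate([DealStructure.FIXED_PRICE_PURCHASE, DealStructure.MOST_FAVORED_NATION]))
```

Using an allowlist rather than a blocklist means a deal shape nobody anticipated falls outside the mandate by default.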

3. Procedural guardrails

These govern how the agent interacts. "Never send more than two messages in a 24-hour window." "Always CC a named buyer on the first and last message of a thread." "If the supplier hasn't responded in 72 hours, escalate, don't follow up." Procedural guardrails are what make the agent feel professional on the receiving end, and what prevent the failure modes that most erode supplier trust.
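Each of the three procedural rules above reduces to a small, testable check. This Python sketch uses hypothetical function names. It makes two assumptions: the 24-hour window is rolling rather than a calendar day, and the caller knows whether a message is the first or last in its thread.

```python
from datetime import datetime, timedelta
from typing import List, Optional

MAX_MESSAGES_PER_WINDOW = 2
SEND_WINDOW = timedelta(hours=24)
SILENCE_LIMIT = timedelta(hours=72)


def may_send(sent_times: List[datetime], now: datetime) -> bool:
    """At most two agent messages in any rolling 24-hour window (assumed rolling)."""
    recent = [t for t in sent_times if now - t < SEND_WINDOW]
    return len(recent) < MAX_MESSAGES_PER_WINDOW


def cc_list(is_first: bool, is_last: bool, named_buyer: str) -> List[str]:
    """CC the named buyer on the first and last message of a thread."""
    return [named_buyer] if is_first or is_last else []


def escalate_for_silence(
    last_agent_message: datetime, last_supplier_reply: Optional[datetime], now: datetime
) -> bool:
    """After 72 hours without a supplier reply, escalate instead of following up."""
    replied = last_supplier_reply is not None and last_supplier_reply >= last_agent_message
    return not replied and now - last_agent_message >= SILENCE_LIMIT


now = datetime(2024, 1, 2, 12, 0)
print(may_send([datetime(2024, 1, 2, 1, 0), datetime(2024, 1, 2, 9, 0)], now))  # False
```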

4. Escalation guardrails

These specify exactly what triggers a handoff to a human. "Any mention of a legal or compliance term." "Any counter that reframes the category or scope." "Any supplier request for a meeting." Escalation guardrails are the most important category, because they determine the agent's humility — what it does when it encounters something it shouldn't resolve alone.
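Escalation triggers can be declared as data, alongside the other guardrails. In the sketch below, keyword matching is a deliberately crude stand-in for whatever classifier a production system would use. The cue lists and the `triggered_escalations` name are hypothetical. Any non-empty result hands the thread to a human.

```python
from typing import Dict, List, Tuple

# Hypothetical cue lists; a real deployment would likely use a classifier.
ESCALATION_TRIGGERS: Dict[str, Tuple[str, ...]] = {
    "legal_or_compliance": ("indemnif", "liability", "gdpr", "compliance", "governing law"),
    "scope_reframe": ("change the scope", "different category", "bundle", "instead of"),
    "meeting_request": ("meeting", "a call", "meet with"),
}


def triggered_escalations(supplier_message: str) -> List[str]:
    """Names of every escalation trigger the message hits; empty means the agent may proceed."""
    text = supplier_message.lower()
    return [
        name
        for name, cues in ESCALATION_TRIGGERS.items()
        if any(cue in text for cue in cues)
    ]


print(triggered_escalations("Can we set up a meeting to discuss liability caps?"))
# ['legal_or_compliance', 'meeting_request']
```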

What we got wrong the first time

Two specific mistakes cost us roughly a calendar quarter.

First, we tried to make commercial guardrails adaptive — "if the supplier is at Q-end, the agent can move 2% beyond the floor." This was a mistake. Not because the economics were wrong (they weren't), but because it made the guardrails non-auditable. A legal team couldn't reason about the agent's behavior without understanding the adaptation logic. We pulled the adaptations back into an explicit escalation: now, when the Q-end signal fires, the agent pauses and asks a human whether to extend the floor. The negotiation is slightly slower. The deployment risk is dramatically lower.
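The two designs can be contrasted directly. This sketch models the commercial limit the post calls the floor as a maximum unit price, and the Q-end signal as a boolean computed elsewhere. Both are assumptions, and the function names are hypothetical. In the adaptive version, an auditor has to understand the 2% rule and the signal to predict an acceptance. In the current version, every acceptance falls within the written limit.

```python
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"
    COUNTER = "counter"
    ESCALATE = "escalate"


def decide_adaptive(offer_price: float, price_limit: float, q_end: bool) -> Decision:
    """First version (abandoned): the Q-end signal silently moves the limit by 2%."""
    effective_limit = price_limit * 1.02 if q_end else price_limit
    return Decision.ACCEPT if offer_price <= effective_limit else Decision.COUNTER


def decide(offer_price: float, price_limit: float, q_end: bool) -> Decision:
    """Current approach: the limit never moves; the Q-end signal only routes to a human."""
    if offer_price <= price_limit:
        return Decision.ACCEPT
    if q_end:
        # Pause and ask a human whether to extend the limit for this deal.
        return Decision.ESCALATE
    return Decision.COUNTER


print(decide_adaptive(101.5, 100.0, q_end=True))  # Decision.ACCEPT
print(decide(101.5, 100.0, q_end=True))           # Decision.ESCALATE
```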

Second, we didn't have enough escalation guardrails in the first version. Specifically, we hadn't covered "supplier introduces a term that isn't in our spec at all." The agent, not knowing what to do, would either ignore the term (bad) or acknowledge it vaguely (worse). The fix was a catch-all: any supplier message containing language the agent can't map onto an approved dimension goes to a human, immediately. This single rule eliminated roughly 90% of the failures in the category we were worried about.
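The catch-all amounts to a set-membership test. This sketch assumes that an upstream step (not shown) labels each term in a supplier message, and that it passes through any term it cannot map under its own name. The dimension set and function names are hypothetical.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical set of dimensions the spec authorizes the agent to negotiate.
APPROVED_DIMENSIONS: FrozenSet[str] = frozenset(
    {"unit_price", "term_months", "payment_days", "delivery_date"}
)


def unmapped_terms(
    extracted: Iterable[str], approved: FrozenSet[str] = APPROVED_DIMENSIONS
) -> List[str]:
    """Labels from the supplier message that match no approved dimension."""
    return [term for term in extracted if term not in approved]


def route(extracted: Iterable[str]) -> str:
    """Catch-all: a single unmapped term sends the whole message to a human."""
    return "human" if unmapped_terms(extracted) else "agent"


print(route(["unit_price", "exclusivity"]))  # human
```

The rule is a membership check, not a judgment call. That means it needs no knowledge of the particular term the supplier introduced.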

What this means in practice

A well-specified Whispor Auto deployment for a tail-spend category typically fits on two pages. Page one: commercial and structural guardrails. Page two: procedural and escalation guardrails. That's it. The rest of the agent's behavior emerges from the underlying language model, constrained by the guardrails and the conversational context.
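Because the spec is plain data, it can be rendered directly into the document a legal team reads. The sketch below assumes a hypothetical `SPEC` dictionary filled with the example rules from this post and a hypothetical `render_for_review` helper. Neither is a Whispor Auto format.

```python
from typing import Dict, List

Spec = Dict[str, Dict[str, List[str]]]

# Hypothetical two-page spec for one tail-spend category, using this post's examples.
SPEC: Spec = {
    "Page one": {
        "Commercial": ["Unit price at or below $X", "Term of 24 months or less",
                       "Payment terms of Net-45 or longer"],
        "Structural": ["No volume commitments", "No most-favored-nation clauses",
                       "No price-index escalation", "No framework agreements"],
    },
    "Page two": {
        "Procedural": ["At most two messages per 24 hours",
                       "CC the named buyer on first and last message",
                       "No reply in 72 hours: escalate, don't follow up"],
        "Escalation": ["Any legal or compliance term", "Any counter that reframes category or scope",
                       "Any request for a meeting", "Any term not mapped to an approved dimension"],
    },
}


def render_for_review(spec: Spec) -> str:
    """Flatten the spec into indented plain text for legal and finance review."""
    lines: List[str] = []
    for page, sections in spec.items():
        lines.append(page)
        for section, rules in sections.items():
            lines.append(f"  {section}")
            lines.extend(f"    - {rule}" for rule in rules)
    return "\n".join(lines)


print(render_for_review(SPEC))
```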

When customers ask us to see "the script" the agent is running, we show them these two pages and explain that there is no other script. For some customers this is reassuring: they understand immediately why the system is auditable. For others it takes a demo or two before they're comfortable with the idea that the agent's behavior is generative rather than pre-written. That's fair. It's a real shift in mental model from what procurement software has traditionally looked like.

The takeaway for anyone deploying agents

If you're evaluating any autonomous system in a regulated or commercially sensitive setting, the question to ask is not "what can the agent do?" but "what are the guardrails, and can they be understood without a product manager in the room?" If the answer to that second question is no, the system isn't ready for production, regardless of how impressive the demo looks.

This is the design principle behind Whispor Auto: the agent is generative, the guardrails are declarative, and the line between them is where accountability sits. Keep that line visible and the system becomes deployable. Blur it, and you're back to scripts — just with more expensive infrastructure.

The Whispor team