
The PII firewall pattern: architecture notes

Splitting logic context from customer data is the single most important decision in regulated AI. Here's how it works in practice.

The PII firewall is a single architectural decision with large downstream consequences: customer data never enters the AI model's context.

This sounds like a policy statement. It's actually a data flow.

Two streams, one render

Every document generation has two inputs:

  1. Logic context — jurisdiction, intent, regulation IDs, product flags, relational flags. Non-identifying. This is what the model sees.
  2. PII payload — names, emails, balances, account numbers, household member names, referrer identity. Identifying. This stays local.
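One way to keep the two streams apart is to give each its own type, so the function that builds the model input cannot accept identifying data. The sketch below is illustrative only: the class names, field names, and `build_model_input` are assumptions, not part of any described implementation.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class LogicContext:
    """Non-identifying inputs. The only object ever serialized for the model."""
    jurisdiction: str
    intent: str
    regulation_ids: tuple[str, ...]
    product_flags: tuple[str, ...]
    relational_flags: dict[str, object]


@dataclass(frozen=True)
class PiiPayload:
    """Identifying values. Stays local; read only by the injector."""
    client_name: str
    email: str
    current_balance: str
    account_number: str


def build_model_input(ctx: LogicContext) -> str:
    # Hypothetical helper: it takes a LogicContext and nothing else, so a
    # type checker flags any attempt to pass a PiiPayload across the boundary.
    return json.dumps(asdict(ctx), sort_keys=True)
```

The point of the design is that the boundary lives in a function signature rather than in a review checklist.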

The model produces a draft containing placeholder tokens, for example: Dear {{CLIENT_NAME}}, your balance as of {{STATEMENT_DATE}} is {{CURRENT_BALANCE}}. The structure of the draft is deterministic: the logic engine decides the clauses and their order, and the model decides only the phrasing.

A local injector runs after the model, inside your own runtime. It substitutes tokens from the PII payload. The injector is pure code — no AI involved — and is itself audited.
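A minimal sketch of such an injector, assuming tokens always take the form {{UPPER_SNAKE_CASE}} and that the PII payload is a plain dictionary keyed by token name. The function name `inject` and the sample values are hypothetical.

```python
import re

# Assumption: every placeholder looks like {{UPPER_SNAKE_CASE}}.
TOKEN = re.compile(r"\{\{([A-Z_]+)\}\}")


def inject(draft: str, pii: dict[str, str]) -> str:
    """Replace every {{TOKEN}} in the draft with its value from the local payload."""
    missing = sorted({name for name in TOKEN.findall(draft) if name not in pii})
    if missing:
        # Fail closed: a letter with an unresolved placeholder should not ship.
        raise KeyError(f"no PII value for tokens: {missing}")
    # A function replacement avoids backslash/group escapes in the PII values.
    return TOKEN.sub(lambda m: pii[m.group(1)], draft)


draft = "Dear {{CLIENT_NAME}}, your balance as of {{STATEMENT_DATE}} is {{CURRENT_BALANCE}}."
pii = {
    "CLIENT_NAME": "Jane Example",
    "STATEMENT_DATE": "2024-01-31",
    "CURRENT_BALANCE": "1,250.00 GBP",
}
letter = inject(draft, pii)
```

Because the substitution is a few lines of deterministic code, it can be unit-tested and reviewed like any other function.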

Why this is better than context-filtering

A common first attempt is to strip PII before sending to the model: regex out names, mask account numbers, tokenize emails. This fails in three ways:

  • Detection is incomplete. PII has no fixed shape, so a regex that catches emails and account numbers still misses names, nicknames, and free-text references.
  • Masking degrades output quality. The model's draft becomes generic because it loses the relational context.
  • You still send the data. Even masked, it crosses the boundary.
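The first failure mode is easy to reproduce. The sketch below uses a deliberately naive scrubber (the two patterns and the sample prompt are made up for illustration) that masks emails and account numbers but leaves both names in the text.

```python
import re

# Illustrative patterns only; real scrubbers are longer but share the same gap.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
ACCOUNT = re.compile(r"\b\d{8,12}\b")


def naive_scrub(text: str) -> str:
    """Mask the PII shapes we thought of. Anything else passes through."""
    text = EMAIL.sub("[EMAIL]", text)
    return ACCOUNT.sub("[ACCOUNT]", text)


prompt = (
    "Write to Jane Example (jane@example.com) about account 12345678, "
    "held jointly with her spouse Sam."
)
scrubbed = naive_scrub(prompt)

# The email and account number are masked, but the names survive.
leaked = [name for name in ("Jane Example", "Sam") if name in scrubbed]
```

Here `leaked` ends up containing both names, and the relationship between the two people is still readable in the scrubbed prompt.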

The firewall pattern inverts the question: instead of "how do we scrub PII from the prompt?", ask "why is PII in the prompt at all?"

Relational identity is PII too

A subtle trap: relational information leaks identity. "Your spouse was also added to this account" tells the model something about a real person. ARC treats household members, referrers, and joint account holders as relational PII — these fields stay local and are injected via dedicated tokens ({{HOUSEHOLD_MEMBER_NAME}}, {{REFERRER_NAME}}).

The model reasons over flags: "is_vip_referral: true", "household_size: 3". It never sees the names.
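A hedged sketch of how identifying relationships might be reduced to flags before anything leaves the local runtime. The record layout, the `has_joint_holder` flag, and the function name are assumptions; only `is_vip_referral` and `household_size` come from the text above.

```python
def derive_relational_flags(record: dict) -> dict:
    """Reduce identifying relationships to non-identifying flags, locally."""
    household = record.get("household_members", [])
    referrer = record.get("referrer")
    return {
        # Assumption: household_size counts the client plus the listed members.
        "household_size": 1 + len(household),
        # Hypothetical flag, added for illustration.
        "has_joint_holder": bool(record.get("joint_holders")),
        "is_vip_referral": bool(referrer and referrer.get("vip")),
    }


record = {
    "household_members": ["Sam Example", "Alex Example"],
    "referrer": {"name": "Pat Example", "vip": True},
    "joint_holders": [],
}
flags = derive_relational_flags(record)
```

The names stay in `record`, which feeds the injector; only `flags` is merged into the logic context.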

What you give up

You give up the model's ability to make PII-aware phrasing choices. If you want the model to decide whether "Dear Mr. Smith" or "Dear John" is more appropriate, you can't — because it doesn't know the name.

In regulated document generation, that tradeoff is easy. Formality is a policy decision, not a creative one. You pick a register per intent per jurisdiction, and the rules engine enforces it.
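As a sketch of what "a register per intent per jurisdiction" could look like, the rules engine might hold a lookup table of salutation templates. The intents, jurisdiction codes, and the tokens other than {{CLIENT_NAME}} are hypothetical.

```python
# (intent, jurisdiction) -> salutation template. All entries are illustrative.
SALUTATIONS = {
    ("balance_notice", "UK"): "Dear {{CLIENT_TITLE}} {{CLIENT_LAST_NAME}},",
    ("welcome", "UK"): "Dear {{CLIENT_FIRST_NAME}},",
}
# Assumed fallback when no policy entry exists for the pair.
DEFAULT_SALUTATION = "Dear {{CLIENT_NAME}},"


def salutation_for(intent: str, jurisdiction: str) -> str:
    """Look up the register fixed by policy; the model never chooses it."""
    return SALUTATIONS.get((intent, jurisdiction), DEFAULT_SALUTATION)
```

The chosen template still contains only tokens, so the formal-versus-informal decision is made without the name ever reaching the model.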

The firewall is not a compromise. It's the foundation the rest of the compliance architecture sits on.