Hash-chained audit trails for AI-generated documents

SHA-256 chaining turns your audit log from 'we hope this is intact' into a cryptographically verifiable record.

Most audit logs are append-only tables. "Append-only" is a convention, not a guarantee. A database admin with write access can silently alter rows. A backup restore can reorder events. A bug in the ORM can drop entries.

For AI-generated documents in regulated industries, that isn't good enough. Regulators ask a specific question: "Prove nothing between generation and delivery was altered." A plain audit table can't answer that.

The pattern

Every audit entry stores a SHA-256 hash computed over the previous entry's hash concatenated with the entry's own fields:

entry_hash = SHA-256(prev_hash ∥ job_id ∥ status ∥ details_json ∥ timestamp)

The first entry uses an empty prev_hash. Each subsequent entry chains to the one before. Changing any single entry changes its hash, so the prev_hash stored in the next entry no longer matches, and the mismatch carries through every entry after it.
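As a minimal sketch of the formula above, assuming entries are held as plain Python dicts: the names entry_hash and append_entry are illustrative, not part of any library. The per-field length prefix is an added assumption that goes beyond plain concatenation, so that field boundaries cannot be shifted without changing the hash.

```python
import hashlib
import json


def entry_hash(prev_hash: str, job_id: str, status: str,
               details_json: str, timestamp: str) -> str:
    """SHA-256 over prev_hash followed by the entry's own fields."""
    h = hashlib.sha256()
    for field in (prev_hash, job_id, status, details_json, timestamp):
        data = field.encode("utf-8")
        # Assumption: an 8-byte length prefix per field, so that
        # ("ab", "c") and ("a", "bc") cannot hash to the same value.
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


def append_entry(chain: list, job_id: str, status: str,
                 details: dict, timestamp: str) -> dict:
    """Append a new entry that chains to the current last entry."""
    # The first entry uses an empty prev_hash.
    prev_hash = chain[-1]["entry_hash"] if chain else ""
    # Serialize details deterministically so re-hashing is reproducible.
    details_json = json.dumps(details, sort_keys=True, separators=(",", ":"))
    entry = {
        "job_id": job_id,
        "status": status,
        "details_json": details_json,
        "timestamp": timestamp,
        "prev_hash": prev_hash,
        "entry_hash": entry_hash(prev_hash, job_id, status,
                                 details_json, timestamp),
    }
    chain.append(entry)
    return entry


# Hypothetical example data.
chain = []
append_entry(chain, "job-1", "generated", {"model": "example"},
             "2024-01-01T00:00:00Z")
append_entry(chain, "job-1", "approved", {"by": "reviewer-1"},
             "2024-01-01T00:05:00Z")
```

Storing details_json as the exact string that was hashed matters: re-serializing a parsed object later can reorder keys and produce a different hash for unchanged data.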

Verification is a single walk: recompute each hash in order, compare it to the stored hash, and stop at the first mismatch.
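The walk can be sketched as follows. This assumes each entry is a dict with the fields from the formula plus prev_hash and entry_hash; verify_chain and build_chain are hypothetical names, and the length-prefixed hashing helper is one possible encoding, restated here so the block runs on its own.

```python
import hashlib


def entry_hash(prev_hash, job_id, status, details_json, timestamp):
    """SHA-256 over the length-prefixed fields (encoding is an assumption)."""
    h = hashlib.sha256()
    for field in (prev_hash, job_id, status, details_json, timestamp):
        data = field.encode("utf-8")
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


def verify_chain(chain):
    """Return the index of the first bad entry, or None if the chain is intact."""
    expected_prev = ""  # the first entry uses an empty prev_hash
    for i, e in enumerate(chain):
        recomputed = entry_hash(e["prev_hash"], e["job_id"], e["status"],
                                e["details_json"], e["timestamp"])
        linked = e["prev_hash"] == expected_prev   # points at the real predecessor
        intact = recomputed == e["entry_hash"]     # fields match the stored hash
        if not (linked and intact):
            return i
        expected_prev = e["entry_hash"]
    return None


def build_chain(rows):
    """Demo helper: chain (job_id, status, details_json, timestamp) tuples."""
    chain, prev = [], ""
    for job_id, status, details_json, timestamp in rows:
        h = entry_hash(prev, job_id, status, details_json, timestamp)
        chain.append({"job_id": job_id, "status": status,
                      "details_json": details_json, "timestamp": timestamp,
                      "prev_hash": prev, "entry_hash": h})
        prev = h
    return chain


# Hypothetical example data.
rows = [
    ("job-1", "generated", "{}", "2024-01-01T00:00:00Z"),
    ("job-1", "approved", "{}", "2024-01-01T00:05:00Z"),
    ("job-1", "delivered", "{}", "2024-01-01T00:06:00Z"),
]
chain = build_chain(rows)
print(verify_chain(chain))      # None: chain is intact
chain[1]["status"] = "rejected"
print(verify_chain(chain))      # 1: first entry whose hash no longer matches
```

The walk checks two things per entry: that the stored hash matches the fields, and that prev_hash equals the hash of the entry actually preceding it. The second check is what exposes a row deleted from the middle.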

What this gets you

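  • Tamper evidence. A changed row is detected on the next verification, provided the attacker cannot also recompute and overwrite every later hash. Keeping a copy of the chain, or at least its latest hash, somewhere the attacker cannot write closes that gap.
  • Gap detection. A row removed from the middle breaks the chain at exactly that point. Rows removed from the end are only detectable if the latest hash is recorded somewhere else.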
  • Cheap proof. Hand a regulator the chain, let them verify independently. No cryptographic signatures required at the row level.

What it doesn't get you

  • Tamper prevention. The chain detects changes; it doesn't stop them. Pair with WORM storage for prevention.
  • Identity binding. The chain tells you a row exists. It doesn't tell you who wrote it. Pair with signed identity tokens for that.
  • Confidentiality. Hashes aren't encryption. If audit entries contain sensitive data, encrypt them separately.

Implementation notes

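Use a cryptographic hash, not a checksum. CRC-32 is an error-detection code with no collision resistance at all, and MD5 has practical collision attacks. Neither is suitable here, because an adversary can construct a different entry with the same digest.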

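Store the previous hash alongside each entry, not just the current one. This lets you check any single entry on its own by recomputing its hash from the stored prev_hash and its fields, without replaying the full chain. Confirming that the entry links to its true predecessor still means comparing its prev_hash against the previous row's stored hash.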
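A sketch of that layout using Python's built-in sqlite3 module with an in-memory database: the audit_log table, its column names, and the append and verify_row helpers are assumptions made for illustration, as is the length-prefixed field encoding.

```python
import hashlib
import sqlite3


def entry_hash(prev_hash, job_id, status, details_json, timestamp):
    """SHA-256 over the length-prefixed fields (encoding is an assumption)."""
    h = hashlib.sha256()
    for field in (prev_hash, job_id, status, details_json, timestamp):
        data = field.encode("utf-8")
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE audit_log (
        id           INTEGER PRIMARY KEY,
        job_id       TEXT NOT NULL,
        status       TEXT NOT NULL,
        details_json TEXT NOT NULL,
        timestamp    TEXT NOT NULL,
        prev_hash    TEXT NOT NULL,  -- stored next to the entry's own hash
        entry_hash   TEXT NOT NULL
    )
""")


def append(conn, job_id, status, details_json, timestamp):
    """Insert a row chained to the most recent row."""
    row = conn.execute(
        "SELECT entry_hash FROM audit_log ORDER BY id DESC LIMIT 1"
    ).fetchone()
    prev_hash = row[0] if row else ""
    h = entry_hash(prev_hash, job_id, status, details_json, timestamp)
    conn.execute(
        "INSERT INTO audit_log (job_id, status, details_json, timestamp,"
        " prev_hash, entry_hash) VALUES (?, ?, ?, ?, ?, ?)",
        (job_id, status, details_json, timestamp, prev_hash, h),
    )
    return h


def verify_row(conn, row_id):
    """Check one row using only its own columns, with no chain replay."""
    row = conn.execute(
        "SELECT prev_hash, job_id, status, details_json, timestamp, entry_hash"
        " FROM audit_log WHERE id = ?",
        (row_id,),
    ).fetchone()
    if row is None:
        return False
    *fields, stored = row
    return entry_hash(*fields) == stored


# Hypothetical example data.
append(conn, "job-1", "generated", "{}", "2024-01-01T00:00:00Z")
append(conn, "job-1", "approved", "{}", "2024-01-01T00:05:00Z")
append(conn, "job-1", "delivered", "{}", "2024-01-01T00:06:00Z")
```

A per-row check like verify_row is a quick spot check. It does not replace the full walk, which is the only way to confirm the links between rows.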

Archive the chain to WORM storage per logical unit — typically per batch or per day. Even if the live database is compromised, the archive remains the canonical record.
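One way to prepare per-day archive units is to summarize each day's slice of the chain in a small manifest before it is written out. The sketch below assumes ISO 8601 UTC timestamps (so the first 10 characters are the date) and entries ordered as they were appended. The daily_manifests name and the manifest fields are hypothetical, and the actual write to WORM storage is left out because it depends on the storage product.

```python
from itertools import groupby


def daily_manifests(chain):
    """Summarize each day's run of entries for archiving."""
    manifests = []
    # groupby needs consecutive runs, which holds for an append-ordered chain.
    for day, group in groupby(chain, key=lambda e: e["timestamp"][:10]):
        entries = list(group)
        manifests.append({
            "day": day,
            "entry_count": len(entries),
            # Hash this day's first entry chains from (empty for day one).
            "starts_from": entries[0]["prev_hash"],
            # Last hash of the day; the next day's starts_from must equal it.
            "head_hash": entries[-1]["entry_hash"],
        })
    return manifests
```

Because each manifest records where the day starts and ends, consecutive archives can be checked against each other: a day whose starts_from does not match the previous day's head_hash indicates a missing or altered unit.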

Make verification a one-click operation. If verifying the chain is a 4-hour engineering project, it won't happen until the regulator asks — and by then, the window to detect tampering has passed.

Why this matters for AI specifically

AI-generated documents introduce a specific regulatory concern: how do we prove the content that was approved is the content that was delivered? A model could, in principle, be re-invoked between approval and send, producing slightly different output. If the approval entry records a digest of the approved artifact, the hash-chained trail locks that artifact to the chain, and any post-approval regeneration shows up as a digest mismatch.
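A minimal sketch of that binding, assuming the approval entry's details_json carries a SHA-256 digest of the approved document bytes. The function names and the artifact_sha256 and approved_by keys are hypothetical.

```python
import hashlib
import hmac
import json


def artifact_digest(content: bytes) -> str:
    """SHA-256 of the exact bytes that were reviewed."""
    return hashlib.sha256(content).hexdigest()


def approval_details(content: bytes, approver_id: str) -> str:
    """Build the details_json for an approval entry."""
    return json.dumps(
        {"artifact_sha256": artifact_digest(content),
         "approved_by": approver_id},
        sort_keys=True,
    )


def matches_approval(details_json: str, delivered: bytes) -> bool:
    """True only if the delivered bytes are the bytes that were approved."""
    approved = json.loads(details_json)["artifact_sha256"]
    return hmac.compare_digest(approved, artifact_digest(delivered))


# Hypothetical example data.
approved_doc = b"Quarterly disclosure, version as reviewed."
details = approval_details(approved_doc, "reviewer-1")

print(matches_approval(details, approved_doc))                    # True
print(matches_approval(details, approved_doc + b" (regenerated)"))  # False
```

Because details_json is itself covered by the entry hash, the recorded digest cannot be swapped after the fact without breaking the chain.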

Combined with the PII firewall and identity-bound approval, you have a chain of custody: the model generated this, this person approved it, no one altered it, and we can prove all three.

That's the bar regulators are setting. The architecture to meet it is not complicated — it just has to be deliberate.