Building an AI Audit Trail That Actually Works
Most teams build audit trails because a compliance officer asked for one. They end up with an append-only log table that nobody queries, nobody understands, and nobody trusts. An audit trail that actually works serves three audiences: compliance teams proving you followed process, engineering teams debugging production issues, and product teams understanding how AI decisions affect users.
What to Log
The first question is scope. Log too little and you can't investigate incidents. Log too much and you drown in storage costs. Here's what matters:
- Input data — what was sent to the AI model (prompt, context, configuration)
- Model output — the raw response, including confidence scores and token-level probabilities if available
- Reviewer actions — who reviewed, what decision they made, what changes they applied, and how long they spent
- Decision rationale — structured fields for why a reviewer approved, rejected, or modified the output
- System metadata — timestamps, model version, prompt version, environment, request IDs
- Escalation events — when tasks move between reviewers, why, and the final resolution
What you should not log: full PII. Store references or hashes instead, and keep the actual data in your primary data store with its own access controls.
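As a rough illustration of the list above, here is a minimal sketch of a single event record. The field names, the `AuditEvent` class, and the `pii_reference` helper are assumptions made for this example, not a prescribed schema. It uses a keyed hash (HMAC) rather than a bare hash for PII references, because unkeyed hashes of low-entropy values such as email addresses can be reversed by guessing.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


def pii_reference(value: str, key: bytes) -> str:
    """Keyed hash of a PII value: stable for correlation, but not the raw data."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def utc_now() -> str:
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class AuditEvent:
    # Hypothetical field names covering the categories listed above.
    event_type: str                    # "creation", "review", "escalation", "resolution"
    task_id: str
    request_id: str
    model_version: str
    prompt_version: str
    environment: str
    actor_id: Optional[str] = None     # reviewer or service that caused the event
    subject_ref: Optional[str] = None  # keyed hash of the affected user, never raw PII
    payload: dict = field(default_factory=dict)  # inputs, outputs, structured rationale
    occurred_at: str = field(default_factory=utc_now)

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


key = b"demo-only-key"  # assumption: in practice, load this from a secret manager
event = AuditEvent(
    event_type="review",
    task_id="task-123",
    request_id="req-456",
    model_version="model-v3",
    prompt_version="prompt-v12",
    environment="production",
    actor_id="rev-7",
    subject_ref=pii_reference("jane@example.com", key),
    payload={"decision": "approved", "rationale_code": "meets_policy", "review_seconds": 42},
)
print(event.to_json())
```

Keeping the rationale as a structured code (here the made-up `rationale_code`) rather than free text is what makes it filterable later.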
Storage Strategies
Audit logs have different access patterns than operational data. They're write-heavy, rarely updated, and queried by time range, entity ID, or event type. This makes them ideal candidates for append-only storage.
Option 1: Dedicated Log Database
Use a purpose-built system like Amazon CloudWatch Logs, Grafana Loki, or a partitioned PostgreSQL table with automatic archiving. Partition by date for efficient range queries and fast deletion of expired data.
Option 2: Event Sourcing
If your architecture already uses event sourcing, audit events are just another event stream. This gives you a complete, ordered history of every state change — but requires tooling to query effectively.
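To make the "complete, ordered history" point concrete, the sketch below rebuilds a task's state by replaying its events in sequence order. The `TaskEvent` type, the event names, and the shape of each `data` dictionary are all assumptions for illustration; a real event-sourced system would define its own.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class TaskEvent:
    seq: int          # position in the ordered stream
    task_id: str
    event_type: str   # hypothetical: "creation", "review", "escalation", "resolution"
    data: dict


def replay(events: Iterable[TaskEvent], task_id: str, up_to_seq: Optional[int] = None) -> dict:
    """Rebuild a task's state, optionally as it stood at an earlier sequence number."""
    state = {"status": None, "reviewer": None, "output": None}
    for event in sorted(events, key=lambda e: e.seq):
        if event.task_id != task_id:
            continue
        if up_to_seq is not None and event.seq > up_to_seq:
            break
        if event.event_type == "creation":
            state.update(status="pending", output=event.data["output"])
        elif event.event_type == "review":
            state.update(status=event.data["decision"], reviewer=event.data["reviewer"])
            if "output" in event.data:  # the reviewer modified the model output
                state["output"] = event.data["output"]
        elif event.event_type == "escalation":
            state.update(status="escalated", reviewer=event.data["to"])
        elif event.event_type == "resolution":
            state["status"] = "resolved"
    return state


events = [
    TaskEvent(1, "task-1", "creation", {"output": "draft A"}),
    TaskEvent(2, "task-1", "escalation", {"to": "rev-2", "reason": "low confidence"}),
    TaskEvent(3, "task-1", "review", {"reviewer": "rev-2", "decision": "modified", "output": "draft B"}),
    TaskEvent(4, "task-1", "resolution", {}),
]
print(replay(events, "task-1"))               # final state
print(replay(events, "task-1", up_to_seq=2))  # state as of the escalation
```

The `up_to_seq` argument is the part auditors tend to care about: it answers "what did the system know at that point?" without any extra bookkeeping.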
Option 3: Hybrid
Hot data (last 90 days) stays in your operational database for fast queries. Cold data archives to S3 or similar object storage with lifecycle policies. This balances query speed against storage cost.
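The routing decision behind a hybrid setup can be very small. The sketch below only decides which tier an event belongs to and what object key it would be archived under; the 90-day window comes from the text above, while the function names and the date-based key layout are assumptions. The actual upload to object storage is left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

HOT_WINDOW = timedelta(days=90)


def storage_tier(occurred_at: datetime, now: Optional[datetime] = None) -> str:
    """Return "hot" for events inside the hot window, "cold" otherwise."""
    now = now or datetime.now(timezone.utc)
    return "hot" if now - occurred_at <= HOT_WINDOW else "cold"


def archive_key(occurred_at: datetime, event_id: str) -> str:
    """Hypothetical object key layout: date prefixes keep range scans cheap."""
    return f"audit/{occurred_at:%Y/%m/%d}/{event_id}.json"


now = datetime(2024, 6, 30, tzinfo=timezone.utc)
recent = datetime(2024, 6, 1, tzinfo=timezone.utc)
old = datetime(2024, 1, 15, tzinfo=timezone.utc)

print(storage_tier(recent, now))   # hot
print(storage_tier(old, now))      # cold
print(archive_key(old, "evt-42"))  # audit/2024/01/15/evt-42.json
```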
Building a Query Interface
An audit trail nobody can query is just expensive text. Build a simple interface — even a SQL view or a basic admin panel — that lets stakeholders:
- Search by task ID, reviewer ID, or time range
- Filter by event type (creation, review, escalation, resolution)
- View the full event timeline for a single task
- Export results as CSV for compliance reporting
The query interface doesn't need to be fancy. It needs to be usable by someone who isn't an engineer — your compliance team will be the primary users.
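As a starting point, the capabilities listed above fit in a few functions. This sketch uses an in-memory SQLite database so it runs anywhere; the `audit_events` table, its columns, and the sample rows are invented for the example, and a production version would point at whatever store you chose earlier.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE audit_events (
           id INTEGER PRIMARY KEY,
           task_id TEXT NOT NULL,
           reviewer_id TEXT,
           event_type TEXT NOT NULL,
           occurred_at TEXT NOT NULL,  -- ISO 8601 UTC, so string order is time order
           detail TEXT
       )"""
)
conn.executemany(
    "INSERT INTO audit_events (task_id, reviewer_id, event_type, occurred_at, detail) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("task-1", None, "creation", "2024-05-01T09:00:00Z", "model output stored"),
        ("task-1", "rev-7", "review", "2024-05-01T09:12:00Z", "approved"),
        ("task-2", None, "creation", "2024-05-02T10:00:00Z", "model output stored"),
        ("task-2", "rev-7", "escalation", "2024-05-02T10:05:00Z", "escalated to rev-9"),
        ("task-2", "rev-9", "resolution", "2024-05-02T11:00:00Z", "rejected"),
    ],
)

COLUMNS = ("task_id", "reviewer_id", "event_type", "occurred_at", "detail")


def search(conn, *, task_id=None, reviewer_id=None, event_type=None, start=None, end=None):
    """Filter by any combination of task, reviewer, event type, and time range."""
    clauses, params = [], []
    for column, value in (("task_id", task_id), ("reviewer_id", reviewer_id), ("event_type", event_type)):
        if value is not None:
            clauses.append(f"{column} = ?")  # column names are fixed above, values are bound
            params.append(value)
    if start is not None:
        clauses.append("occurred_at >= ?")
        params.append(start)
    if end is not None:
        clauses.append("occurred_at < ?")
        params.append(end)
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    sql = f"SELECT {', '.join(COLUMNS)} FROM audit_events{where} ORDER BY occurred_at, id"
    return conn.execute(sql, params).fetchall()


def task_timeline(conn, task_id):
    """Full ordered event history for one task."""
    return search(conn, task_id=task_id)


def to_csv(rows) -> str:
    """Serialize rows with a header line for compliance exports."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(task_timeline(conn, "task-2")))
```

An admin panel or a saved SQL view can sit on top of the same queries; the point is that the filters match the questions compliance staff actually ask.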
Compliance Requirements
Different regulations have different expectations for audit logs:
- SOC 2 requires you to log access to customer data and demonstrate that you review logs regularly
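- HIPAA requires audit controls that record access to PHI; its six-year documentation retention rule is commonly applied to audit logs, and its integrity requirements make tamper-evidence a practical necessity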
- GDPR requires documenting processing activities, including review decisions
- AI-specific regulations (EU AI Act) increasingly require logging of AI system decisions, including human overrides
Design your audit schema to accommodate all of these. A single well-structured event format is easier to maintain than separate logs per regulation.
Tamper-Proofing
If your audit trail is just a database table with INSERT permissions, it's not an audit trail — it's a suggestion. For regulated environments, implement cryptographic integrity:
- Hash chaining — each log entry includes a hash of the previous entry, making retroactive modification detectable
- Write-once storage — use append-only storage with no UPDATE or DELETE permissions
- Periodic checksums — write a daily hash of all entries to an independent store (like an S3 bucket with Object Lock enabled)
- External notarization — for high-assurance environments, timestamp hashes to a public blockchain or trusted timestamping authority
The purpose of tamper-proofing isn't to make modification impossible — it's to make modification detectable. If someone alters a log entry, you should be able to prove it happened.
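Hash chaining is the easiest of these to show in code. The following is a minimal in-memory sketch, assuming SHA-256 and canonical JSON serialization; the function names and the all-zero genesis value are choices made for this example. Each entry's hash covers the previous entry's hash, so editing any entry breaks verification from that point on.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed starting value for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with a canonical form of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log: list, payload: dict) -> dict:
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    entry = {"payload": payload, "prev_hash": prev_hash, "hash": entry_hash(prev_hash, payload)}
    log.append(entry)
    return entry


def first_broken_index(log: list):
    """Return the index of the first entry that fails verification, or None if intact."""
    prev_hash = GENESIS_HASH
    for index, entry in enumerate(log):
        if entry["prev_hash"] != prev_hash or entry["hash"] != entry_hash(prev_hash, entry["payload"]):
            return index
        prev_hash = entry["hash"]
    return None


log = []
append_entry(log, {"task_id": "task-1", "event_type": "creation"})
append_entry(log, {"task_id": "task-1", "event_type": "review", "decision": "approved"})
append_entry(log, {"task_id": "task-1", "event_type": "resolution"})
print(first_broken_index(log))  # None: the chain is intact

log[1]["payload"]["decision"] = "rejected"  # simulate a retroactive edit
print(first_broken_index(log))  # 1: the altered entry is detected
```

Note that someone with write access could recompute every hash after the edit, which is why the chain is paired with checksums held in a store the database administrator cannot modify.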
Start simple: append-only PostgreSQL with row-level hashing and daily checksums to S3. That is enough for most compliance requirements. Add more sophistication only when your regulatory requirements or threat model demand it.
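The daily checksum half of that setup might look like the sketch below. It assumes each row already carries its own hash and an ISO 8601 UTC timestamp, and that rows arrive in insertion order; `daily_checksums` is a hypothetical helper, and writing the resulting digests to the independent store is deliberately left out.

```python
import hashlib
from collections import defaultdict


def daily_checksums(rows):
    """Fold row hashes into one SHA-256 digest per calendar day.

    rows: iterable of (occurred_at_iso, row_hash) tuples in insertion order.
    """
    digests = defaultdict(hashlib.sha256)
    for occurred_at, row_hash in rows:
        day = occurred_at[:10]  # "YYYY-MM-DD" prefix of the ISO timestamp
        digests[day].update(row_hash.encode("ascii"))
    return {day: digest.hexdigest() for day, digest in digests.items()}


rows = [
    ("2024-05-01T09:00:00Z", "aa"),
    ("2024-05-01T09:12:00Z", "bb"),
    ("2024-05-02T10:00:00Z", "cc"),
]
published = daily_checksums(rows)

# Later, recompute from the database and compare against the published values.
tampered_rows = [rows[0], ("2024-05-01T09:12:00Z", "bx"), rows[2]]
recomputed = daily_checksums(tampered_rows)
changed_days = [day for day in published if published[day] != recomputed.get(day)]
print(changed_days)  # ['2024-05-01']
```

Comparing per-day digests narrows an investigation to a single day's entries before you walk the rows themselves.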