How We Built a Real-Time AI Review Pipeline

Engineering February 20, 2025 · 7 min read

When we set out to build Verified Workflows, the core challenge was clear: human review is slow, but AI pipelines are fast. Bridging that gap without compromising quality required rethinking how review tasks flow through a system.

This is the technical architecture behind our real-time review pipeline.

The Problem With Synchronous Review

The naive approach — submit an output, wait for a human to review it, return the result — works for low-volume use cases. But at scale, it breaks down. Reviewers need time to read, evaluate, and respond. A thorough review takes 2–5 minutes. If your pipeline blocks on that, you've built a very expensive rate limiter.

Our Architecture: Async-First With Webhooks

We designed the pipeline around three principles:

Never block the caller. Task submission returns immediately with a task ID. The caller continues processing.
Route intelligently. Not all tasks need the same reviewer. Skill-based routing matches task requirements to reviewer qualifications.
Deliver asynchronously. Results are pushed to the caller via webhooks, not pulled via polling.
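
The "never block the caller" principle can be sketched as follows. This is a minimal illustration and not our production code: the Task class, the in-process routing_queue, and the submit_task function are hypothetical names, and a real deployment would use a durable queue instead of an in-memory one.

```python
import queue
import uuid
from dataclasses import dataclass, field


@dataclass
class Task:
    payload: dict
    webhook_url: str
    # The ID is generated at submission time so it can be returned right away.
    task_id: str = field(default_factory=lambda: uuid.uuid4().hex)


# Hypothetical in-process stand-in for the routing queue.
routing_queue = queue.Queue()


def submit_task(payload: dict, webhook_url: str) -> str:
    """Validate the payload, enqueue the task, and return its ID immediately."""
    if not payload:
        raise ValueError("payload must not be empty")
    task = Task(payload=payload, webhook_url=webhook_url)
    routing_queue.put(task)  # hand off to the router; no waiting on a reviewer
    return task.task_id


if __name__ == "__main__":
    task_id = submit_task({"output": "draft summary"}, "https://example.com/hook")
    print("submitted", task_id)
```

The caller gets an ID back in the time it takes to enqueue, and the review result arrives later at webhook_url.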

Task Lifecycle

Every task follows a predictable path through the system:

Submitted — The API validates the payload, assigns an ID, escrows payment, and places the task in the routing queue.
Routed — The router evaluates required skills, priority, and reviewer availability. High-priority tasks go to on-call reviewers. Standard tasks enter the general pool.
Claimed — A qualified reviewer picks up the task. The system starts a session timer and monitors activity.
Reviewed — The reviewer submits their assessment. For consensus tasks, the system waits for the required number of votes.
Delivered — The final result is posted to the client's webhook endpoint with an HMAC signature for verification.
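
One way to model this lifecycle is a small state machine plus a signing step at delivery. The sketch below rests on several assumptions: the state names mirror the stages above, the Claimed-to-Routed edge is added to cover rerouting, SHA-256 is assumed as the HMAC digest (the text above only says HMAC), and advance, sign_payload, and verify_signature are illustrative names.

```python
import hashlib
import hmac
import json
from enum import Enum


class TaskState(Enum):
    SUBMITTED = "submitted"
    ROUTED = "routed"
    CLAIMED = "claimed"
    REVIEWED = "reviewed"
    DELIVERED = "delivered"


# Allowed transitions. CLAIMED -> ROUTED is an assumption that covers rerouting.
ALLOWED = {
    TaskState.SUBMITTED: {TaskState.ROUTED},
    TaskState.ROUTED: {TaskState.CLAIMED},
    TaskState.CLAIMED: {TaskState.REVIEWED, TaskState.ROUTED},
    TaskState.REVIEWED: {TaskState.DELIVERED},
    TaskState.DELIVERED: set(),
}


def advance(current: TaskState, nxt: TaskState) -> TaskState:
    """Return the next state, rejecting transitions the lifecycle does not allow."""
    if nxt not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {nxt.value}")
    return nxt


def sign_payload(secret: bytes, payload: dict) -> tuple[bytes, str]:
    """Serialize the result and compute a hex HMAC over the exact bytes sent."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    signature = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return body, signature


def verify_signature(secret: bytes, body: bytes, signature: str) -> bool:
    """Client-side check, using a constant-time comparison."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```

On the receiving side, the signature should be verified against the raw request body before the JSON is parsed, since re-serializing the payload can change the bytes.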

The Routing Algorithm

Our router considers four factors when matching tasks to reviewers:

Score = skill_match × 0.4 + availability × 0.3 + reliability × 0.2 + speed × 0.1

Skill match is binary: a reviewer either holds the required certification or doesn't. Availability reflects real-time load, based on how many tasks the reviewer is currently handling. Reliability is the reviewer's historical accuracy rate. Speed is their average review time relative to task complexity.
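
The weighted score can be sketched in a few lines. The weights come from the formula above; everything else is an assumption made for illustration: each factor is taken to be normalized to the range 0 to 1, availability is derived from a hypothetical max_tasks capacity, and unqualified reviewers are filtered out before ranking.

```python
from dataclasses import dataclass

# Weights taken from the scoring formula.
WEIGHTS = {"skill_match": 0.4, "availability": 0.3, "reliability": 0.2, "speed": 0.1}


@dataclass
class Reviewer:
    name: str
    certifications: frozenset
    active_tasks: int
    max_tasks: int   # hypothetical capacity used to normalize load
    accuracy: float  # historical accuracy, 0.0 to 1.0
    speed: float     # assumed to be pre-normalized to 0.0 to 1.0


def score(reviewer: Reviewer, required_cert: str) -> float:
    """Combine the four factors into a single routing score."""
    skill = 1.0 if required_cert in reviewer.certifications else 0.0
    # Assumption: fewer active tasks means higher availability.
    availability = max(0.0, 1.0 - reviewer.active_tasks / reviewer.max_tasks)
    return (
        WEIGHTS["skill_match"] * skill
        + WEIGHTS["availability"] * availability
        + WEIGHTS["reliability"] * reviewer.accuracy
        + WEIGHTS["speed"] * reviewer.speed
    )


def pick_reviewer(reviewers, required_cert: str):
    """Return the highest-scoring qualified reviewer, or None if nobody qualifies."""
    qualified = [r for r in reviewers if required_cert in r.certifications]
    return max(qualified, key=lambda r: score(r, required_cert), default=None)
```

The filter in pick_reviewer is a design choice in this sketch. With skill match weighted at 0.4, an uncertified reviewer could still score up to 0.6 on the other three factors, so the sketch treats certification as a hard requirement.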

Handling Failures

Reviewers miss deadlines, give inconsistent ratings, or abandon tasks mid-review. We handle each failure mode differently:

Timeouts — If a reviewer doesn't submit within the SLA window, the task is rerouted to a backup reviewer.
Consensus divergence — If two reviewers disagree by more than a threshold, a third reviewer breaks the tie.
Abandonment — If a reviewer starts but doesn't finish, the task returns to the queue with no penalty to the client.
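
The timeout and divergence checks above could look something like the sketch below. The numeric rating scale, the threshold value, and the use of the median once a third vote arrives are all assumptions; the text above does not specify how the third reviewer's vote is combined with the first two.

```python
from statistics import median

# Hypothetical threshold, in the same units as the ratings.
DIVERGENCE_THRESHOLD = 2.0


def is_timed_out(claimed_at: float, now: float, sla_seconds: float) -> bool:
    """True once the reviewer has held the task longer than the SLA window."""
    return now - claimed_at > sla_seconds


def needs_tiebreak(first: float, second: float,
                   threshold: float = DIVERGENCE_THRESHOLD) -> bool:
    """True when two ratings differ by more than the threshold."""
    return abs(first - second) > threshold


def resolve(ratings, threshold: float = DIVERGENCE_THRESHOLD):
    """Return a final rating, or None if another review is still required."""
    if len(ratings) < 2:
        return None
    if len(ratings) == 2:
        first, second = ratings
        if needs_tiebreak(first, second, threshold):
            return None  # caller should route the task to a third reviewer
        return (first + second) / 2
    # Assumption: with three or more votes, the median settles the disagreement.
    return median(ratings)
```

A scheduler would call is_timed_out on claimed tasks and put any that exceed the SLA back into the routing queue, and would call resolve each time a new rating arrives.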

Performance Numbers

After six months in production:

Median time from submission to first review: 47 seconds
Median time to consensus (3-vote tasks): 2.1 minutes
Task completion rate: 99.2%
Webhook delivery success rate: 99.97%

What We'd Do Differently

If we were rebuilding today, we'd invest earlier in reviewer quality signals. Early on, we treated all reviewers equally. Now we weight reviewer reliability heavily, because a review from someone with a 98% accuracy rate is fundamentally different from a review from someone at 75%.

We'd also build better anomaly detection from day one. Spotting patterns like one reviewer consistently approving everything, or a sudden spike in task abandonment, requires purpose-built monitoring that we bolted on later.
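
As an illustration of the first pattern, a check for reviewers who approve nearly everything might start out like this. It is a simplified sketch and not our monitoring stack: the function name, the (reviewer_id, approved) input shape, and both thresholds are hypothetical.

```python
from collections import Counter


def flag_rubber_stampers(reviews, min_reviews: int = 20,
                         max_approval_rate: float = 0.98):
    """Return IDs of reviewers whose approval rate looks suspiciously high.

    `reviews` is an iterable of (reviewer_id, approved) pairs.
    """
    totals = Counter()
    approvals = Counter()
    for reviewer_id, approved in reviews:
        totals[reviewer_id] += 1
        if approved:
            approvals[reviewer_id] += 1
    # Skip reviewers with too few reviews to judge fairly.
    return sorted(
        reviewer_id
        for reviewer_id, count in totals.items()
        if count >= min_reviews and approvals[reviewer_id] / count >= max_approval_rate
    )
```

A high approval rate is not proof of careless review, so a flag like this is better treated as a prompt for a spot check than as grounds for automatic action.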

Try It Yourself

The sandbox uses the same pipeline as production. Submit a task, watch it route, and see the review results come back in real time. It's the fastest way to understand how the architecture works in practice.
