Agentic Systems · 9 min read · 15 November 2025

Designing Human-in-the-Loop Systems: A Practical Architecture Guide

HITL is one of the most frequently cited and least frequently implemented requirements in agentic AI. Teams describe it as a safety feature. Regulators treat it as a legal requirement. Architects know it as a structural challenge that must be resolved before the system is built. Here is how to design it correctly.


Agraj Agranayak

Founder & CEO, Imagine Works · About · LinkedIn

Key Takeaways

  • HITL operates on a spectrum from monitoring (non-blocking) through exception handling, approval gates (blocking), to supervised execution.
  • Most failed HITL implementations treat it as a notification system: the agent has already acted, and the human review is retrospective. True HITL requires the system to pause, surface, wait, and resume.
  • EU AI Act Article 14 mandates that high-risk AI systems be designed with human oversight that enables monitoring, intervention, override, and the ability to decide not to use the system.
  • HITL cannot be retrofitted. Pause, surface, wait, and resume are each separate architectural requirements — each requires design decisions that affect the rest of the system.
  • A written HITL specification — mapping each decision type to its required level of human oversight — is a design deliverable, not a compliance document.


Getting HITL design right requires clarity on three things: when human review is required, how the system surfaces outputs for review, and how the system resumes correctly after that review. Most teams address the first. Few address the second and third adequately.

The HITL Spectrum

Architecture Reference

The HITL Spectrum — Four Levels of Human Oversight

EU AI Act Article 14 · Each level has different architectural requirements

  • Monitoring — ✗ Insufficient for Art. 14. A human receives information about system activity; the system continues autonomously. Non-blocking · asynchronous · retrospective.
  • Exception Handling — ✗ Insufficient for Art. 14. The system flags defined conditions for human review; review is asynchronous while the system continues on other tasks. Condition-triggered · asynchronous review · pre-defined exceptions.
  • Approval Gate — ✓ Meets Art. 14. The system pauses at defined decision points; a human must approve before it continues. Fully blocking · synchronous review · state held during pause.
  • Supervised Execution — ✓ Meets Art. 14. A human monitors and can intervene in real time; the system executes in partnership with a human observer. Real-time oversight · immediate intervention · human–AI co-execution.

The Four Architectural Requirements for Approval Gates

  1. Pause — hold state securely
  2. Surface — present the decision usably
  3. Wait — timeout + escalation logic
  4. Resume — correct state after approval

None of the four Approval Gate requirements can be retrofitted. They must be designed before implementation begins.

Not every human interaction point has the same design implications. HITL requirements sit on a spectrum:

Monitoring — A human receives information about system activity but takes no action. The system continues autonomously. This is a logging and reporting function, not a HITL control. Monitoring does not satisfy Article 14's oversight requirements for high-risk systems.

Exception Handling — The system continues autonomously but flags defined conditions for human review. Review is asynchronous and non-blocking. Appropriate where most decisions are low-risk and exceptions are identifiable in advance.

Approval Gates — The system pauses at defined decision points. A human must approve before the system continues. This is a blocking interaction, and it is the form of HITL most directly addressed by EU AI Act Article 14 for high-risk AI systems.

Supervised Execution — A human monitors and can intervene in real time. The system executes in partnership with a human observer. Appropriate for high-stakes or novel scenarios where the full decision logic cannot be pre-specified.
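One way to make the spectrum concrete in code is to model the four levels as an enum and route each decision to a required level. The sketch below is illustrative: the `required_level` function and its risk thresholds are invented for this example, not a prescribed policy — in practice the mapping should come from the written HITL specification, not a hard-coded score.

```python
from enum import Enum, auto

class OversightLevel(Enum):
    """The four HITL levels, ordered from least to most human involvement."""
    MONITORING = auto()            # non-blocking, retrospective
    EXCEPTION_HANDLING = auto()    # asynchronous review of flagged cases
    APPROVAL_GATE = auto()         # blocking; human approval required
    SUPERVISED_EXECUTION = auto()  # real-time human co-execution

def required_level(risk_score: float) -> OversightLevel:
    """Map a decision's risk score to its required oversight level.

    The thresholds here are hypothetical placeholders; a real system
    would derive them from the HITL specification per decision type.
    """
    if risk_score < 0.2:
        return OversightLevel.MONITORING
    if risk_score < 0.5:
        return OversightLevel.EXCEPTION_HANDLING
    if risk_score < 0.8:
        return OversightLevel.APPROVAL_GATE
    return OversightLevel.SUPERVISED_EXECUTION
```

The enum ordering matters: a dispatcher can refuse to run any decision whose assigned level is below the minimum the specification demands.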

Why HITL Is an Architecture Problem

The most common HITL implementation failure is treating it as a notification system. Sending an email when an agent makes a decision is monitoring, not human oversight. The agent has already acted. The human review is retrospective.

True HITL at the approval gate level requires the system to:

  1. Pause — stop processing and hold state securely
  2. Surface — present the pending decision to a human in a usable, contextualised form
  3. Wait — remain paused until a human responds, with defined timeout and escalation handling
  4. Resume — continue from the correct state after the human action

Each of these is a distinct architectural requirement. Pause requires state management. Surface requires a review interface. Wait requires timeout logic. Resume requires state consistency after an arbitrary delay. None of these can be bolted on after the system is built without rearchitecting core parts of it.
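A minimal sketch of the four requirements, assuming an in-memory queue stands in for the review channel. Everything here is illustrative: a production approval gate would persist the state snapshot to durable storage (so the pause can outlive a process restart) and drive a real review interface from it, and the class and method names are invented for this example.

```python
import json
import queue
import time
from dataclasses import dataclass, field

@dataclass
class PendingDecision:
    """State held while the agent is paused at an approval gate."""
    decision_id: str
    summary: str          # surfaced to the reviewer, with context
    state_snapshot: dict  # serialised agent state, restored on resume
    created_at: float = field(default_factory=time.monotonic)

class ApprovalGate:
    """Sketch of the four requirements: pause, surface, wait, resume."""

    def __init__(self, timeout_s: float = 3600.0):
        self.timeout_s = timeout_s
        self.responses: "queue.Queue[bool]" = queue.Queue()

    def pause(self, decision_id: str, summary: str, state: dict) -> PendingDecision:
        # Pause: snapshot state so the run can resume after an arbitrary delay.
        # The JSON round-trip forces the state to be serialisable.
        return PendingDecision(decision_id, summary, json.loads(json.dumps(state)))

    def surface(self, pending: PendingDecision) -> str:
        # Surface: render the pending decision for a human reviewer.
        return f"[REVIEW {pending.decision_id}] {pending.summary}"

    def wait(self, pending: PendingDecision) -> bool:
        # Wait: block until a response arrives or the timeout elapses,
        # then escalate (here, by raising) rather than acting silently.
        try:
            return self.responses.get(timeout=self.timeout_s)
        except queue.Empty:
            raise TimeoutError(f"No review for {pending.decision_id}; escalate")

    def resume(self, pending: PendingDecision, approved: bool) -> dict:
        # Resume: continue from the snapshot only if the human approved.
        if not approved:
            raise PermissionError(f"Decision {pending.decision_id} rejected")
        return pending.state_snapshot
```

Note the design choice in `wait`: a timeout escalates rather than defaulting to approval or rejection, because a silent default in either direction removes the human from the loop exactly when the loop matters most.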

EU AI Act Article 14

Article 14 of the EU AI Act mandates that high-risk AI systems be designed with human oversight measures enabling natural persons to: understand the capabilities and limitations of the system; monitor its operation and detect anomalies; intervene in and override the system's decisions; and decide not to use the system in a particular situation.

This is not a soft requirement. It is a design specification. High-risk AI systems that cannot be monitored, intervened in, or overridden do not meet the Act's requirements, regardless of how well they perform under normal conditions. The conformity assessment process for high-risk systems will examine whether these capabilities exist in practice — not just in documentation.

The HITL Specification

A system without a written HITL specification does not have HITL — it has good intentions. The HITL specification is a document that defines, for each agent decision point:

  • Decision type — what is the agent deciding?
  • Risk level — what is the consequence of a wrong decision?
  • HITL level required — monitoring, exception handling, approval gate, or supervised execution
  • Review trigger — what specific conditions require human review?
  • Interface design — how is the pending decision presented to the reviewer?
  • Timeout and escalation — what happens if no review occurs within the defined window?

Producing this specification is a design exercise that forces the conversations teams most need to have before building: what decisions are being made, what could go wrong, and who is accountable for each category of outcome.
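The specification can live as structured data rather than prose, which lets the system enforce it at runtime. The sketch below encodes the six fields listed above as a dataclass; the two example entries (decision types, thresholds, and escalation policies) are invented for illustration and not drawn from any real system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HITLSpecEntry:
    """One row of the HITL specification: a decision type and its oversight."""
    decision_type: str       # what is the agent deciding?
    risk_level: str          # consequence of a wrong decision, e.g. "low"/"high"
    hitl_level: str          # monitoring | exception_handling | approval_gate | supervised
    review_trigger: str      # specific conditions that require human review
    interface: str           # how the pending decision is presented to the reviewer
    timeout_escalation: str  # what happens if no review occurs within the window

# Hypothetical example entries for a customer-service agent.
HITL_SPEC = [
    HITLSpecEntry(
        decision_type="issue_refund",
        risk_level="high",
        hitl_level="approval_gate",
        review_trigger="amount > 500 EUR or customer account flagged",
        interface="review queue card with order history and amount",
        timeout_escalation="after 4h, escalate to finance lead; never auto-approve",
    ),
    HITLSpecEntry(
        decision_type="draft_reply",
        risk_level="low",
        hitl_level="monitoring",
        review_trigger="5% of outputs sampled for retrospective QA",
        interface="daily digest",
        timeout_escalation="n/a (non-blocking)",
    ),
]
```

Kept in version control alongside the system, a specification in this form can be reviewed in the same pull requests that change agent behaviour, which keeps the oversight design and the implementation from drifting apart.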

Imagine Works designs agentic system architectures with HITL specifications built in from the start. Talk to us before you build.

Related Service

Agentic Systems Architecture

Designing the architecture for autonomous AI agent systems — where agents coordinate, act, and hand off to humans at exactly the right moment.
