Designing Human-in-the-Loop Systems: A Practical Architecture Guide
Key Takeaways
- HITL operates on a spectrum from monitoring (non-blocking) through exception handling, approval gates (blocking), to supervised execution.
- Most HITL implementations fail by treating it as a notification system: the agent has already acted, and the human review is retrospective. True HITL requires the system to pause, surface, wait, and resume.
- EU AI Act Article 14 mandates that high-risk AI systems be designed with human oversight that enables monitoring, intervention, override, and the ability to decide not to use the system.
- HITL cannot be retrofitted. Pause, surface, wait, and resume are each separate architectural requirements — each requires design decisions that affect the rest of the system.
- A written HITL specification — mapping each decision type to its required level of human oversight — is a design deliverable, not a compliance document.
Human-in-the-loop design is one of the most frequently cited and least frequently implemented requirements in agentic AI development. Teams describe it as a safety feature. Regulators treat it as a legal requirement. Architects know it as a structural challenge that must be resolved before any other part of the system is built.
Getting HITL design right requires clarity on three things: when human review is required, how the system surfaces outputs for review, and how the system resumes correctly after that review. Most teams address the first. Few address the second and third adequately.
The HITL Spectrum
[Architecture reference graphic: The HITL Spectrum, four levels of human oversight mapped against EU AI Act Article 14. Monitoring (non-blocking) and exception handling (partially blocking) are insufficient for Article 14; approval gates (fully blocking) and supervised execution (maximum oversight) meet it. A second panel notes that none of the four approval-gate requirements can be retrofitted; they must be designed before implementation begins.]
Not every human interaction point has the same design implications. HITL requirements sit on a spectrum:
Monitoring — A human receives information about system activity but takes no action. The system continues autonomously. This is a logging and reporting function, not a HITL control. Monitoring does not satisfy Article 14's oversight requirements for high-risk systems.
Exception Handling — The system continues autonomously but flags defined conditions for human review. Review is asynchronous and non-blocking. Appropriate where most decisions are low-risk and exceptions are identifiable in advance.
Approval Gates — The system pauses at defined decision points. A human must approve before the system continues. This is a blocking interaction. This is the form of HITL most directly addressed by EU AI Act Article 14 for high-risk AI systems.
Supervised Execution — A human monitors and can intervene in real time. The system executes in partnership with a human observer. Appropriate for high-stakes or novel scenarios where the full decision logic cannot be pre-specified.
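The property that matters architecturally is whether a given level blocks execution. The spectrum can be sketched as a small enum; the names here are illustrative, not taken from any standard or library:

```python
from enum import Enum

class OversightLevel(Enum):
    """The four HITL levels, ordered from least to most oversight."""
    MONITORING = "monitoring"            # non-blocking: human is informed only
    EXCEPTION_HANDLING = "exception"     # flagged items reviewed asynchronously
    APPROVAL_GATE = "approval_gate"      # fully blocking: human must approve
    SUPERVISED_EXECUTION = "supervised"  # human can intervene in real time

    @property
    def blocks_execution(self) -> bool:
        # Only the top two levels stop the agent pending human action.
        return self in (OversightLevel.APPROVAL_GATE,
                        OversightLevel.SUPERVISED_EXECUTION)
```

Making the blocking property explicit in code forces the design question the spectrum raises: for each decision point, does the agent stop, or merely report?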
Why HITL Is an Architecture Problem
The most common HITL implementation failure is treating it as a notification system. Sending an email when an agent makes a decision is monitoring, not human oversight. The agent has already acted. The human review is retrospective.
True HITL at the approval gate level requires the system to:
1. Pause — stop processing and hold state securely
2. Surface — present the pending decision to a human in a usable, contextualised form
3. Wait — remain paused until a human responds, with defined timeout and escalation handling
4. Resume — continue from the correct state after the human action
Each of these is a distinct architectural requirement. Pause requires state management. Surface requires a review interface. Wait requires timeout logic. Resume requires state consistency after an arbitrary delay. None of these can be bolted on after the system is built without rearchitecting core parts of it.
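The four steps can be sketched as a minimal in-memory approval gate. Everything here (`PendingDecision`, `ReviewQueue`, the status strings) is illustrative; a production system would persist held state durably and surface decisions through a real review interface rather than a queue object:

```python
import time
import uuid
from dataclasses import dataclass

@dataclass
class PendingDecision:
    decision_id: str
    context: dict            # what the reviewer needs to see
    status: str = "pending"  # pending | approved | rejected | escalated

class ReviewQueue:
    def __init__(self):
        self._pending: dict[str, PendingDecision] = {}

    def pause_and_surface(self, context: dict) -> str:
        """Pause: hold state. Surface: make the decision reviewable."""
        decision = PendingDecision(decision_id=str(uuid.uuid4()), context=context)
        self._pending[decision.decision_id] = decision
        return decision.decision_id

    def record_review(self, decision_id: str, approved: bool) -> None:
        """Called from the review interface when a human responds."""
        self._pending[decision_id].status = "approved" if approved else "rejected"

    def wait(self, decision_id: str, timeout_s: float, poll_s: float = 0.01) -> str:
        """Wait: stay paused until a review arrives; escalate on timeout."""
        deadline = time.monotonic() + timeout_s
        while time.monotonic() < deadline:
            status = self._pending[decision_id].status
            if status != "pending":
                return status
            time.sleep(poll_s)
        self._pending[decision_id].status = "escalated"
        return "escalated"

def run_with_approval_gate(queue: ReviewQueue, action_context: dict,
                           timeout_s: float) -> str:
    """One pass through pause, surface, wait, and resume."""
    decision_id = queue.pause_and_surface(action_context)
    outcome = queue.wait(decision_id, timeout_s)
    if outcome == "approved":
        return "resumed"           # Resume: continue from the held state
    return f"halted:{outcome}"     # rejected or escalated: do not act
```

Even this toy version makes the architectural point: the agent's control flow stops at `wait`, and the timeout path ends in escalation, not in silent continuation.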
EU AI Act Article 14
Article 14 of the EU AI Act mandates that high-risk AI systems be designed with human oversight measures enabling natural persons to: understand the capabilities and limitations of the system; monitor its operation and detect anomalies; intervene in and override the system's decisions; and decide not to use the system in a particular situation.
This is not a soft requirement. It is a design specification. High-risk AI systems that cannot be monitored, intervened in, or overridden do not meet the Act's requirements, regardless of how well they perform under normal conditions. The conformity assessment process for high-risk systems will examine whether these capabilities exist in practice — not just in documentation.
The HITL Specification
A system without a written HITL specification does not have HITL — it has good intentions. The HITL specification is a document that defines, for each agent decision point:
- Decision type — what is the agent deciding?
- Risk level — what is the consequence of a wrong decision?
- HITL level required — monitoring, exception handling, approval gate, or supervised execution
- Review trigger — what specific conditions require human review?
- Interface design — how is the pending decision presented to the reviewer?
- Timeout and escalation — what happens if no review occurs within the defined window?
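A specification in this shape is straightforward to make machine-readable, which lets the runtime enforce it rather than merely document it. A hedged sketch, with illustrative field values:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HITLSpecEntry:
    decision_type: str    # what the agent is deciding
    risk_level: str       # consequence of a wrong decision
    hitl_level: str       # monitoring | exception | approval_gate | supervised
    review_trigger: str   # condition that requires human review
    interface: str        # how the pending decision is presented
    timeout_action: str   # escalation path if no review occurs in time

# Example entry: the decision type, thresholds, and roles are hypothetical.
HITL_SPEC = [
    HITLSpecEntry(
        decision_type="refund_issuance",
        risk_level="high",
        hitl_level="approval_gate",
        review_trigger="refund amount above approval threshold",
        interface="review dashboard with order history and agent rationale",
        timeout_action="escalate to finance lead after 4 business hours",
    ),
]

def required_level(spec: list[HITLSpecEntry], decision_type: str) -> str:
    """Look up the oversight level mandated for a decision type."""
    for entry in spec:
        if entry.decision_type == decision_type:
            return entry.hitl_level
    raise KeyError(f"No HITL spec entry for {decision_type!r}: unspecified "
                   "decisions should fail closed, not run unsupervised")
```

The lookup deliberately raises on an unknown decision type: a decision the specification does not cover is a gap in the specification, not a licence to act autonomously.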
Producing this specification is a design exercise that forces the conversations teams most need to have before building: what decisions are being made, what could go wrong, and who is accountable for each category of outcome.
Imagine Works designs agentic system architectures with HITL specifications built in from the start. Talk to us before you build.