Agent Safety Controls

These concepts describe how systems separate instructions, enforce policies, and keep tools in contained environments.

How to recognize this theme

Terms for restricting what an AI agent may do.

In a daily board, this category groups terms by their shared role. Look for four cards that describe the same mechanism, risk area, or workflow rather than four words that merely sound similar.

Educational context

These entries are vocabulary notes for learning. They are not project endorsements, token recommendations, exchange rankings, or trading signals.

Tool Sandbox

A tool sandbox is a restricted environment where an AI agent can run tools without full system access.

Instruction Hierarchy

Instruction hierarchy describes how system, developer, and user instructions are prioritized.

Policy Boundary

A policy boundary is the line that separates allowed actions from blocked actions in an AI workflow.

Sandbox Isolation

Sandbox isolation keeps a tool or process separated from the rest of a system to reduce blast radius.

Prompt Firewall

A prompt firewall checks user or tool inputs for unsafe, disallowed, or suspicious instructions before they reach an AI model or agent.

Policy Engine

A policy engine evaluates requests or planned actions against rules so a system can allow, block, modify, or escalate them.

Human Approval

Human approval is a control that requires a person to review and authorize a sensitive action before an automated system proceeds.

Audit Log

An audit log records important events, decisions, and changes so operators can review what happened and when.