Agent Evaluation and Safety

These practices help teams measure how reliably an agent behaves and limit the impact of its failures when it calls tools or executes multi-step workflows.

Eval Suite

An eval suite is a collection of tests and metrics used to measure an AI system's accuracy, robustness, and failure modes on representative tasks.
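A minimal sketch in Python, assuming the agent can be called as a plain function that takes a prompt string and returns a reply string. `EvalCase`, `EvalReport`, `run_eval_suite`, and the exact-match grading rule are illustrative assumptions. Real suites often use fuzzier graders and track more metrics. Exceptions are recorded instead of propagated, so a crash appears in the report as a failure mode rather than aborting the run.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class EvalCase:
    name: str
    prompt: str
    expected: str

@dataclass
class EvalReport:
    total: int
    passed: int
    failures: list[str] = field(default_factory=list)

    @property
    def accuracy(self) -> float:
        # Fraction of cases whose output matched the expected answer.
        return self.passed / self.total if self.total else 0.0

def run_eval_suite(agent: Callable[[str], str], cases: list[EvalCase]) -> EvalReport:
    report = EvalReport(total=len(cases), passed=0)
    for case in cases:
        try:
            output = agent(case.prompt)
        except Exception as exc:
            # A crash is itself a failure mode worth counting, not a reason to stop.
            report.failures.append(f"{case.name}: raised {type(exc).__name__}")
            continue
        if output.strip() == case.expected:  # hypothetical exact-match grader
            report.passed += 1
        else:
            report.failures.append(f"{case.name}: got {output!r}, expected {case.expected!r}")
    return report

# Usage with a hypothetical toy "agent" that adds two integers written as "a+b".
def toy_agent(prompt: str) -> str:
    a, b = prompt.split("+")
    return str(int(a) + int(b))

cases = [
    EvalCase("add", "2+3", "5"),
    EvalCase("bad-input", "x+1", "1"),    # int("x") raises ValueError
    EvalCase("wrong-label", "1+1", "3"),  # deliberately mismatched expectation
]
report = run_eval_suite(toy_agent, cases)
print(f"accuracy={report.accuracy:.2f}")  # accuracy=0.33
for failure in report.failures:
    print(failure)
```

Keeping each case's name in the failure message makes regressions easy to trace when the suite is rerun after a change to the agent.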

Sandboxing

Sandboxing is running an agent or tool in a restricted environment with limited permissions to reduce the impact of mistakes or malicious outputs.
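The sketch below shows process-level restriction in Python on a POSIX system. The command runs in a throwaway working directory with a stripped-down environment, so inherited secrets such as API keys are not visible. It also gets a wall-clock timeout and CPU-time and file-size limits set through the standard `resource` module. The function name `run_sandboxed` and the specific limits are illustrative assumptions. This is not a complete sandbox: it does not isolate the network or the rest of the filesystem. Production setups usually handle those with containers, VMs, or kernel mechanisms such as seccomp.

```python
import os
import resource  # POSIX only
import subprocess
import sys
import tempfile

def _limit_resources() -> None:
    # Runs in the child process just before the command starts (POSIX only).
    # Note: preexec_fn is not safe to use in multi-threaded parent programs.
    resource.setrlimit(resource.RLIMIT_CPU, (2, 2))  # at most 2 s of CPU time
    resource.setrlimit(resource.RLIMIT_FSIZE, (10 * 1024**2, 10 * 1024**2))  # 10 MiB max file size

def run_sandboxed(cmd: list[str], timeout: float = 5.0) -> subprocess.CompletedProcess:
    """Run cmd with a scratch cwd, minimal env, timeout, and resource limits (illustrative)."""
    with tempfile.TemporaryDirectory() as workdir:
        return subprocess.run(
            cmd,
            cwd=workdir,                    # throwaway directory, deleted afterwards
            env={"PATH": "/usr/bin:/bin"},  # do not inherit secrets such as API keys
            capture_output=True,
            text=True,
            timeout=timeout,                # wall-clock cap; raises TimeoutExpired
            preexec_fn=_limit_resources,
        )

# Usage: a secret in the parent's environment is not visible to the child.
os.environ["SECRET_TOKEN"] = "example"  # hypothetical secret for the demo
result = run_sandboxed([sys.executable, "-c", "import os; print('SECRET_TOKEN' in os.environ)"])
print(result.stdout.strip())  # False
```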

Rate Limiting

Rate limiting caps how often an agent or API can act over time, helping prevent runaway loops, abuse, or accidental overload.
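One common way to implement this is a token bucket. Each action spends a token, and tokens refill at a fixed rate up to a capacity, so short bursts are allowed while the sustained rate stays capped. The text does not prescribe an algorithm. The `TokenBucket` class below is an illustrative Python sketch, with an injectable clock so its behavior can be checked deterministically.

```python
import time
from typing import Callable

class TokenBucket:
    """Allow bursts of up to `capacity` actions, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start with a full bucket
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill for the time elapsed since the last check, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller should wait, back off, or stop the agent loop

# Usage with a fake clock so the result is deterministic.
fake_now = [0.0]
bucket = TokenBucket(rate=1.0, capacity=3, clock=lambda: fake_now[0])
burst = [bucket.allow() for _ in range(5)]
print(burst)       # [True, True, True, False, False]
fake_now[0] = 2.0  # two seconds later: two tokens have refilled
after_wait = [bucket.allow() for _ in range(3)]
print(after_wait)  # [True, True, False]
```

In an agent loop, a `False` from `allow()` is the point to pause or halt. Without that check, a tool call that keeps retrying could fire indefinitely.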

Structured Output

Structured output means constraining a model's responses to a fixed schema (such as JSON), which makes downstream parsing, validation, and safety checks more reliable.
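The sketch below shows the validation side in Python using only the standard library. It parses the model's reply as JSON, rejects anything that does not match the expected shape, and applies a safety check (an allowlist of tools) before anything is executed. The `{"tool": ..., "arguments": ...}` schema, `ALLOWED_TOOLS`, `ToolCall`, and `parse_tool_call` are hypothetical. How the model is constrained to emit JSON in the first place depends on the provider, and in practice a JSON Schema validator could replace the hand-written checks.

```python
import json
from dataclasses import dataclass
from typing import Any

ALLOWED_TOOLS = {"search", "get_weather"}  # hypothetical allowlist

@dataclass
class ToolCall:
    tool: str
    arguments: dict[str, Any]

def parse_tool_call(raw: str) -> ToolCall:
    """Parse and validate a model reply expected to look like {"tool": str, "arguments": {...}}."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    extra = set(data) - {"tool", "arguments"}
    if extra:
        raise ValueError(f"unexpected fields: {sorted(extra)}")
    tool, args = data.get("tool"), data.get("arguments")
    if not isinstance(tool, str):
        raise ValueError("'tool' must be a string")
    if not isinstance(args, dict):
        raise ValueError("'arguments' must be an object")
    # Safety check: only dispatch tools that are explicitly permitted.
    if tool not in ALLOWED_TOOLS:
        raise ValueError(f"tool {tool!r} is not allowed")
    return ToolCall(tool=tool, arguments=args)

# Usage
call = parse_tool_call('{"tool": "search", "arguments": {"query": "rate limiting"}}')
print(call.tool)  # search
try:
    parse_tool_call('{"tool": "delete_files", "arguments": {}}')
except ValueError as err:
    print(err)  # tool 'delete_files' is not allowed
```

Rejecting the reply with a clear error message, rather than guessing at the intended structure, also gives the agent something concrete to retry against.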