Category

AI Evaluation Operations

These evaluation terms cover benchmark samples, human grading, regression checks after model changes, and monitoring after release.

How to recognize this theme

Operational terms used to test and monitor model quality.

In a daily board, this category groups terms by their shared role. Look for four cards that describe the same mechanism, risk area, or workflow rather than four words that merely sound similar.

Educational context

These entries are vocabulary notes for learning. They are not project endorsements, token recommendations, exchange rankings, or trading signals.

Eval Sample Set

An eval sample set is a collection of prompts, tasks, or cases used to measure an AI system's behavior.
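
As a rough illustration, a sample set can be held as a list of records and scored against a model. The sketch below is only one possible shape: the names EvalSample, accuracy, SAMPLES, and toy_model are hypothetical, the exact-match scoring is an assumption, and the "model" is a stand-in function rather than a real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalSample:
    """One case in the sample set (hypothetical structure)."""
    sample_id: str
    prompt: str
    expected: str


def accuracy(samples, model_fn):
    """Fraction of samples whose output exactly matches the expected answer."""
    if not samples:
        raise ValueError("sample set is empty")
    correct = sum(1 for s in samples if model_fn(s.prompt).strip() == s.expected)
    return correct / len(samples)


# A tiny illustrative sample set; real sets are usually much larger.
SAMPLES = [
    EvalSample("add-1", "2 + 2", "4"),
    EvalSample("add-2", "3 + 5", "8"),
    EvalSample("mul-1", "4 * 3", "12"),
    EvalSample("sub-1", "9 - 4", "5"),
]


def toy_model(prompt):
    """Stand-in for a real model: it only knows how to add."""
    left, op, right = prompt.split()
    return str(int(left) + int(right)) if op == "+" else "?"


score = accuracy(SAMPLES, toy_model)
```

Keeping a stable sample_id on each case makes it possible to compare results for the same case across model versions.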

Human Rating Rubric

A human rating rubric defines the criteria reviewers use to judge model outputs consistently.
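
One way to picture a rubric is as a fixed list of criteria plus a score scale that every reviewer must follow. The sketch below assumes three made-up criteria and a 1 to 5 scale; RUBRIC, SCALE, validate_rating, and aggregate are hypothetical names chosen for illustration.

```python
from statistics import mean

# Hypothetical criteria; a real rubric would define these for its own task.
RUBRIC = {
    "accuracy": "Claims in the output are factually correct.",
    "relevance": "The output addresses what the prompt asked.",
    "clarity": "The output is easy to read and well organized.",
}
SCALE = (1, 5)  # assumed inclusive score range


def validate_rating(rating):
    """Reject ratings that skip a criterion, add one, or leave the scale."""
    if set(rating) != set(RUBRIC):
        raise ValueError("rating must score exactly the rubric criteria")
    for criterion, score in rating.items():
        if not SCALE[0] <= score <= SCALE[1]:
            raise ValueError(f"{criterion} score {score} is outside {SCALE}")


def aggregate(ratings):
    """Average each criterion across reviewers after validating every rating."""
    for rating in ratings:
        validate_rating(rating)
    return {criterion: mean(r[criterion] for r in ratings) for criterion in RUBRIC}


reviewer_ratings = [
    {"accuracy": 4, "relevance": 5, "clarity": 3},
    {"accuracy": 2, "relevance": 5, "clarity": 4},
]
summary = aggregate(reviewer_ratings)
```

Validating each rating before averaging is what enforces consistency: a reviewer cannot silently skip a criterion or use a different scale.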

Regression Check Suite

A regression check suite tests whether a model or agent update accidentally worsens behavior that previously worked.
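
The core comparison can be sketched in a few lines: take pass/fail results for the same cases before and after an update, and flag any case that used to pass and no longer does. The function name find_regressions and the case ids are hypothetical, and treating a case that is missing from the new results as a failure is an assumption of this sketch.

```python
def find_regressions(baseline, candidate):
    """Return ids of cases that passed before the update but fail after it.

    Both arguments map case id -> bool (True means the case passed).
    A case absent from `candidate` is counted as failing (assumption).
    """
    return sorted(
        case_id
        for case_id, passed in baseline.items()
        if passed and not candidate.get(case_id, False)
    )


baseline_results = {"greeting": True, "refund-policy": True, "date-math": False}
candidate_results = {"greeting": True, "refund-policy": False, "date-math": True}

regressions = find_regressions(baseline_results, candidate_results)
```

Here "date-math" improved and "refund-policy" regressed; only the regression is reported, since cases that were already failing are not something the update broke.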

Post-Deployment Monitor

A post-deployment monitor tracks an AI system after release for quality drift, errors, latency, or policy issues.
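
A minimal sketch of one such signal is an error rate over a rolling window of recent requests, with an alert when it crosses a threshold. The class name RollingErrorMonitor, the window size, and the threshold are all assumptions for illustration; production monitors typically track several signals, such as latency and policy flags, alongside errors.

```python
from collections import deque


class RollingErrorMonitor:
    """Tracks the error rate over the most recent `window` requests."""

    def __init__(self, window=100, max_error_rate=0.05):
        self.outcomes = deque(maxlen=window)  # oldest entries drop off automatically
        self.max_error_rate = max_error_rate

    def record(self, is_error):
        """Store the outcome of one request (True means it was an error)."""
        self.outcomes.append(bool(is_error))

    def error_rate(self):
        """Share of errors in the current window, or 0.0 if nothing is recorded."""
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 0.0

    def should_alert(self):
        """Alert only once the window is full, to avoid noise from tiny samples."""
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.error_rate() > self.max_error_rate


monitor = RollingErrorMonitor(window=4, max_error_rate=0.25)
for outcome in (False, False, True):
    monitor.record(outcome)
alert_before_full = monitor.should_alert()

monitor.record(True)
alert_when_full = monitor.should_alert()
```

Because the window has a fixed size, old outcomes age out, so the alert clears on its own once recent requests are healthy again.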