Enterprise LLM Evals · Lesson 10

Governance, Risk & Compliance

How evals become enterprise evidence, not just engineering tests

What you'll learn

What governance artifacts enterprises need for AI systems
The NIST AI RMF lifecycle: Govern, Map, Measure, Manage
OWASP LLMSVS security verification dimensions
Why "we tested it" is not enough for high-risk AI

Core idea

In an enterprise, evals are also evidence: they record who approved a release, what was tested, what changed, and what residual risk was accepted.

1. Governance Artifacts

What enterprises need to retain

AI system inventory — model, provider, prompts, data sources, tools, owners
Risk classification — use case, user impact, regulated domain, data sensitivity
Evaluation plan — datasets, metrics, thresholds, review cadence
Model/prompt/retriever/tool version history — what changed and when
Human review records and sign-offs — who approved and why
Incident response and rollback plan — what happens when things go wrong
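
One way to keep these six artifacts from drifting is to track them in a single per-system record and check it for gaps before a release. The sketch below is a minimal illustration, not a prescribed schema: the class name GovernanceFile, the artifact keys, the file paths, and the example system and owner names are all hypothetical.

```python
from dataclasses import dataclass, field

# Keys mirror the six artifacts listed above; the exact names are illustrative.
REQUIRED_ARTIFACTS = (
    "system_inventory",
    "risk_classification",
    "evaluation_plan",
    "version_history",
    "review_signoffs",
    "incident_rollback_plan",
)


@dataclass
class GovernanceFile:
    """Hypothetical per-system record pointing at each retained artifact."""

    system_name: str
    owner: str
    # Maps artifact name -> where the retained document lives (path, URL, ticket id).
    artifacts: dict[str, str] = field(default_factory=dict)

    def missing_artifacts(self) -> list[str]:
        """Return required artifacts with no recorded location, in list order."""
        return [name for name in REQUIRED_ARTIFACTS if not self.artifacts.get(name)]


# Example with made-up values: three of the six artifacts are on file.
record = GovernanceFile(
    system_name="support-assistant",
    owner="ml-platform-team",
    artifacts={
        "system_inventory": "docs/inventory.yaml",
        "risk_classification": "docs/risk.md",
        "evaluation_plan": "docs/eval_plan.md",
    },
)
print(record.missing_artifacts())
# ['version_history', 'review_signoffs', 'incident_rollback_plan']
```

A check like this only confirms that an artifact is recorded somewhere; whether its contents are adequate still needs human review.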

2. NIST AI RMF Lifecycle

GOVERN: roles, policies, accountability
MAP: use case, context, impacted users, risks
MEASURE: evals, tests, red-team, monitoring
MANAGE: mitigation, release decision, incident response, continuous improvement

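The NIST AI Risk Management Framework (AI RMF) is a widely referenced, voluntary framework for managing AI risk in enterprises. It does not prescribe specific tools; it defines four functions (Govern, Map, Measure, Manage) that together form a lifecycle evals plug into. Evals sit mainly under Measure, and their results feed the release decisions made under Manage.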
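
To make "evals plug into the lifecycle" concrete, the sketch below tags a handful of activities with the function they support and reports which functions have no completed activity yet. The activity labels and the mapping are assumptions made for illustration, based on the summary above; NIST does not define them.

```python
from enum import Enum


class RmfFunction(Enum):
    GOVERN = "govern"
    MAP = "map"
    MEASURE = "measure"
    MANAGE = "manage"


# Hypothetical activity labels, assigned to functions per the summary above.
ACTIVITY_TO_FUNCTION = {
    "assign_owners": RmfFunction.GOVERN,
    "classify_use_case": RmfFunction.MAP,
    "run_eval_suite": RmfFunction.MEASURE,
    "red_team": RmfFunction.MEASURE,
    "production_monitoring": RmfFunction.MEASURE,
    "release_decision": RmfFunction.MANAGE,
    "incident_response": RmfFunction.MANAGE,
}


def uncovered_functions(completed: set[str]) -> list[RmfFunction]:
    """Return the functions with no completed activity, in lifecycle order."""
    covered = {
        ACTIVITY_TO_FUNCTION[name]
        for name in completed
        if name in ACTIVITY_TO_FUNCTION  # unknown labels are ignored
    }
    return [function for function in RmfFunction if function not in covered]


# A team that has only run evals and a red-team exercise.
gaps = uncovered_functions({"run_eval_suite", "red_team"})
print([function.name for function in gaps])
# ['GOVERN', 'MAP', 'MANAGE']
```

The example shows why strong eval results alone do not close the loop: Measure is covered, but ownership, context mapping, and the release decision are still undocumented.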

3. OWASP LLMSVS

The OWASP Large Language Model Security Verification Standard (LLMSVS) covers security verification dimensions including:

Secure configuration
Model lifecycle management
Memory and RAG storage security
Secure LLM integration
Agent and plugin security
Dependency management
Monitoring and anomaly detection

4. Enterprise Rule

For high-risk AI, "we tested it" is not enough. You need versioned evidence that the right tests ran against the right version with agreed thresholds and named owners.

Evidence = dataset version + metric definitions + model/prompt version + eval run results + threshold checks + human sign-off + residual risk notes
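
The evidence formula above can be sketched as an immutable record plus a release gate. This is a minimal illustration under stated assumptions: the class names, field names, metric names, version strings, and numbers are all hypothetical, and a real system would also persist the record and link it to the raw eval run.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ThresholdCheck:
    """One metric result compared against its agreed threshold."""

    metric: str
    value: float
    threshold: float
    higher_is_better: bool = True

    @property
    def passed(self) -> bool:
        if self.higher_is_better:
            return self.value >= self.threshold
        return self.value <= self.threshold


@dataclass(frozen=True)
class EvidenceBundle:
    """Frozen so a recorded bundle cannot be edited after the fact."""

    dataset_version: str
    metric_definitions_version: str
    model_version: str
    prompt_version: str
    checks: tuple[ThresholdCheck, ...]
    signed_off_by: Optional[str] = None
    residual_risk_notes: str = ""

    def release_blockers(self) -> list[str]:
        """List every reason this bundle is not yet sufficient evidence."""
        blockers = [
            f"threshold failed: {check.metric}"
            for check in self.checks
            if not check.passed
        ]
        if not self.checks:
            blockers.append("no eval results recorded")
        if not self.signed_off_by:
            blockers.append("missing human sign-off")
        if not self.residual_risk_notes.strip():
            blockers.append("missing residual risk notes")
        return blockers


# Made-up example: one metric misses its threshold and nobody has signed off.
bundle = EvidenceBundle(
    dataset_version="golden-set-v3",
    metric_definitions_version="metrics-v2",
    model_version="model-2024-06",
    prompt_version="prompt-v14",
    checks=(
        ThresholdCheck("groundedness", value=0.93, threshold=0.90),
        ThresholdCheck("toxicity_rate", value=0.02, threshold=0.01, higher_is_better=False),
    ),
    residual_risk_notes="Rare off-topic answers on billing questions.",
)
print(bundle.release_blockers())
# ['threshold failed: toxicity_rate', 'missing human sign-off']
```

Returning the full list of blockers, rather than a single pass/fail flag, keeps the reason for a blocked release in the record alongside the results.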

⚡ Practice Drill

Q1: What six artifacts does an enterprise need for AI governance?

AI system inventory, risk classification, evaluation plan, version history, human review/sign-off records, and incident response/rollback plan.

Q2: Why is "we tested it" not sufficient for high-risk AI?

Because without versioned evidence, you cannot prove what was tested, against which version, with what thresholds, or who approved. Governance requires reproducible, auditable evidence — not verbal assurance.
