Enterprise LLM Evals · Lesson 10
Governance, Risk & Compliance
How evals become enterprise evidence, not just engineering tests
- What governance artifacts enterprises need for AI systems
- The NIST AI RMF lifecycle: Govern, Map, Measure, Manage
- OWASP LLMSVS security verification dimensions
- Why "we tested it" is not enough for high-risk AI
Core idea
In an enterprise, evals are also evidence: they record who approved a release, what was tested, what changed, and what residual risk was accepted.
1. Governance Artifacts
- AI system inventory — model, provider, prompts, data sources, tools, owners
- Risk classification — use case, user impact, regulated domain, data sensitivity
- Evaluation plan — datasets, metrics, thresholds, review cadence
- Model/prompt/retriever/tool version history — what changed and when
- Human review records and sign-offs — who approved and why
- Incident response and rollback plan — what happens when things go wrong
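The six artifacts above can be tracked as one record per AI system, which makes gaps easy to spot before a review. The following is a minimal sketch, not a prescribed schema: the `GovernanceRecord` class, its field names, and the file paths are all hypothetical, and each field is assumed to hold a link or ID pointing at the real artifact.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class GovernanceRecord:
    """Hypothetical per-system record; each field points at one artifact."""
    system_inventory: Optional[str] = None        # model, provider, prompts, data sources, tools, owners
    risk_classification: Optional[str] = None     # use case, user impact, regulated domain, data sensitivity
    evaluation_plan: Optional[str] = None         # datasets, metrics, thresholds, review cadence
    version_history: Optional[str] = None         # model/prompt/retriever/tool changes
    review_signoffs: Optional[str] = None         # who approved and why
    incident_rollback_plan: Optional[str] = None  # what happens when things go wrong


def missing_artifacts(record: GovernanceRecord) -> list[str]:
    """Return the names of artifacts that have no link or ID recorded."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# Example with made-up paths: three artifacts exist, three do not.
record = GovernanceRecord(
    system_inventory="inventory/support-bot.yaml",
    risk_classification="risk/support-bot.md",
    evaluation_plan="evals/plan-v3.md",
)
print(missing_artifacts(record))
```

A check like this only confirms that each artifact is recorded somewhere; whether its content is adequate still needs human review.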
2. NIST AI RMF Lifecycle
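NIST AI RMF (the NIST AI Risk Management Framework) is one of the most widely referenced governance frameworks for AI risk in enterprise settings. It does not prescribe specific tools; it defines four functions (Govern, Map, Measure, Manage) that form a lifecycle evals plug into.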
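One way to picture how evals plug into the four functions is a simple coverage check. The sketch below is illustrative only: the activity names and their assignment to Govern, Map, Measure, and Manage are one possible reading drawn from the artifacts in this lesson, not a mapping defined by NIST.

```python
from enum import Enum


class RmfFunction(Enum):
    GOVERN = "Govern"
    MAP = "Map"
    MEASURE = "Measure"
    MANAGE = "Manage"


# Assumed, illustrative mapping of eval-related activities to RMF functions.
EVAL_ACTIVITIES = {
    RmfFunction.GOVERN: ["assign named owners", "define sign-off policy"],
    RmfFunction.MAP: ["inventory the AI system", "classify risk"],
    RmfFunction.MEASURE: ["run eval datasets", "compare metrics to thresholds"],
    RmfFunction.MANAGE: ["record accepted residual risk", "maintain rollback plan"],
}


def uncovered_functions(completed: set[str]) -> list[str]:
    """Return the functions for which no mapped activity has been completed."""
    return [
        fn.value
        for fn, activities in EVAL_ACTIVITIES.items()
        if not completed.intersection(activities)
    ]


# A team that has only classified risk and run evals leaves two functions uncovered.
done = {"classify risk", "run eval datasets"}
print(uncovered_functions(done))
```

The point of the sketch is that running evals alone touches only Measure; the other three functions need their own artifacts.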
3. OWASP LLMSVS
OWASP LLMSVS (the Large Language Model Security Verification Standard) covers security verification dimensions including:
- Secure configuration
- Model lifecycle management
- Memory and RAG storage security
- Secure LLM integration
- Agent and plugin security
- Dependency management
- Monitoring and anomaly detection
4. Enterprise Rule
For high-risk AI, "we tested it" is not enough. You need versioned evidence that the right tests ran against the right version with agreed thresholds and named owners.
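The rule above can be sketched as a small evidence check that runs before release. This is a minimal illustration under stated assumptions: the `EvalEvidence` structure, the `fingerprint` and `evidence_gaps` helpers, and the example version and metric names are all hypothetical, and thresholds are assumed to be minimum acceptable scores.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalEvidence:
    """Hypothetical evidence record for one eval run."""
    system_version: dict  # e.g. model, prompt, retriever versions that were tested
    results: dict         # metric name -> observed score
    thresholds: dict      # metric name -> agreed minimum score
    owner: str            # named owner of the system
    approver: str         # person who signed off


def fingerprint(version: dict) -> str:
    """Stable short hash of a version description (key order does not matter)."""
    canonical = json.dumps(version, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def evidence_gaps(evidence: EvalEvidence, deployed_version: dict) -> list[str]:
    """List the reasons this evidence does not support the deployed version."""
    gaps = []
    if fingerprint(evidence.system_version) != fingerprint(deployed_version):
        gaps.append("tested version differs from deployed version")
    for metric, minimum in evidence.thresholds.items():
        score = evidence.results.get(metric)
        if score is None:
            gaps.append(f"no result for {metric}")
        elif score < minimum:
            gaps.append(f"{metric} below threshold")
    if not evidence.owner or not evidence.approver:
        gaps.append("missing named owner or approver")
    return gaps


# Example with made-up values: the prompt changed after the evals ran.
tested = {"model": "model-a", "prompt": "v7", "retriever": "v2"}
deployed = {"model": "model-a", "prompt": "v8", "retriever": "v2"}
evidence = EvalEvidence(
    system_version=tested,
    results={"groundedness": 0.93},
    thresholds={"groundedness": 0.90},
    owner="support-platform-team",
    approver="risk-reviewer",
)
print(evidence_gaps(evidence, deployed))
```

Here the scores pass and the owners are named, yet the evidence still fails because it describes a version that is no longer the one deployed.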
Q1: What 6 artifacts does an enterprise need for AI governance?
AI system inventory, risk classification, evaluation plan, version history, human review/sign-off records, and incident response/rollback plan.
Q2: Why is "we tested it" not sufficient for high-risk AI?
Because without versioned evidence, you cannot prove what was tested, against which version, with what thresholds, or who approved. Governance requires reproducible, auditable evidence — not verbal assurance.