Methodology
How Severity Is Estimated
Severity ratings in the LLM Risk Atlas are educational estimates, not universal or mathematically exact risk scores. They are designed to help users understand how serious an AI or LLM failure mode can become depending on context.
Why severity is context-sensitive
The same pitfall can have very different consequences depending on how the AI system is used. A hallucinated fact in casual conversation may be low impact, while a hallucinated security recommendation, medical explanation, legal summary, or production deployment instruction can become high or critical risk.
For that reason, the Atlas uses baseline severity labels while also describing the conditions that can increase or decrease risk.
Severity factors
Impact
How serious the harm could be if the failure occurs.
Likelihood
How often the pitfall is likely to occur in normal use.
Detectability
How easy it is for a user to notice the failure before harm occurs.
Triggerability
How easily the pitfall can be triggered by normal use, vague prompts, untrusted input, or adversarial behavior.
Data sensitivity
Whether the pitfall involves public information, confidential data, personal data, regulated data, credentials, or secrets.
System access
Whether the AI only produces text or can access tools, files, code, APIs, email, cloud resources, or external systems.
Scope
Whether the harm affects one person, a team, customers, production systems, or a broader public audience.
Reversibility
Whether the harm can be easily corrected or causes lasting public, financial, legal, operational, or safety consequences.
Human overtrust
How likely users are to rely on the AI output without enough verification.
Mitigation difficulty
How hard it is to reduce the risk through review, testing, policy, monitoring, access controls, or human approval.
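As an illustration only, the ten factors above can be kept as a qualitative checklist rather than a score. The sketch below assumes one free-text note per factor and reports which factors have not yet been considered. The class name `FactorNotes`, its field names, and the `unassessed` helper are hypothetical and are not part of the Atlas.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class FactorNotes:
    """One free-text note per severity factor (field names are hypothetical)."""
    impact: str = ""
    likelihood: str = ""
    detectability: str = ""
    triggerability: str = ""
    data_sensitivity: str = ""
    system_access: str = ""
    scope: str = ""
    reversibility: str = ""
    human_overtrust: str = ""
    mitigation_difficulty: str = ""


def unassessed(notes: FactorNotes) -> list[str]:
    """Return the factors that still have no note, in the order listed above."""
    return [f.name for f in fields(notes) if not getattr(notes, f.name).strip()]


# Hypothetical assessment of a code-generation assistant.
notes = FactorNotes(
    impact="Generated code may ship to production",
    detectability="Subtle bugs can pass a quick read",
    system_access="Text only; no tool or file access",
)
print(unassessed(notes))
```

Keeping the notes as text instead of numbers is a deliberate choice in this sketch: it records the reasoning behind a rating without implying a precise measurement.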
Severity levels
Low
Limited harm, narrow scope, easy detection, and simple mitigation.
Medium
Meaningful risk that can cause confusion, wasted effort, poor decisions, or moderate business or technical impact.
High
Serious risk that may affect security, privacy, compliance, production systems, professional decisions, or user trust.
Critical
Severe risk involving sensitive data exposure, credentials, safety-critical use, regulated decisions, security control bypass, autonomous external action, irreversible consequences, or production system modification or outage.
Baseline severity vs. contextual severity
Each pitfall has a baseline severity. This is the general educational rating shown on pitfall cards. It represents how serious the pitfall usually is in ordinary use.
Actual severity can increase or decrease depending on context. The same pitfall may be low risk in casual use, but high or critical risk when it involves sensitive data, professional decisions, production systems, connected tools, or irreversible consequences.
Baseline severity
The default severity rating used for the Atlas pitfall card.
Contextual severity
The real-world severity after considering the task, data, domain, tools, human review, controls, and consequences.
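One way to keep the two ratings distinct is to record them side by side with the reason for any difference. The following is a minimal sketch, assuming the four Atlas labels are treated as an ordered scale. The names `Severity` and `ContextualRating`, and the example baseline value, are hypothetical and do not reflect actual Atlas ratings.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Atlas labels; the integers only encode ordering, not a score."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass(frozen=True)
class ContextualRating:
    pitfall: str
    baseline: Severity    # rating shown on the pitfall card
    contextual: Severity  # reviewer's rating for one specific deployment
    rationale: str        # which context factors drove the change

    @property
    def direction(self) -> str:
        """Describe how context moved the rating relative to the baseline."""
        if self.contextual > self.baseline:
            return "raised"
        if self.contextual < self.baseline:
            return "lowered"
        return "unchanged"


# Hypothetical example: the baseline here is illustrative, not an Atlas rating.
rating = ContextualRating(
    pitfall="Hallucinated facts",
    baseline=Severity.MEDIUM,
    contextual=Severity.CRITICAL,
    rationale="Output is used as a production deployment instruction",
)
print(rating.direction)  # raised
```

The contextual value is entered by a reviewer rather than computed, which matches the idea that contextual severity is a judgment about a specific task, data set, and set of controls.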
What can lower severity
Severity is not only about what can go wrong. It also depends on whether meaningful controls are in place. Strong controls can reduce practical risk, even when a pitfall's baseline severity is high.
- Human review before important decisions or external actions
- Source verification for factual, legal, medical, security, or compliance claims
- Automated tests and code review before using AI-generated code
- Sandboxing before running scripts, commands, or generated code
- Read-only access instead of write access for AI-connected tools
- Least-privilege permissions for agents, plugins, APIs, and integrations
- Approval gates before sending emails, modifying files, deploying code, or changing systems
- Logging, monitoring, and audit trails for AI-assisted workflows
- Clear user warnings when outputs require verification
- Policies that prohibit entering credentials, secrets, or regulated data into unapproved tools
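The controls listed above can be tracked as a simple gap check for a given workflow. The sketch below assumes a team decides which controls it expects for a workflow and compares that set with what is actually in place. The `Control` identifiers and the `missing_controls` helper are hypothetical, and the sketch does not compute how far a control lowers severity.

```python
from enum import Enum


class Control(Enum):
    """Controls named in the list above; identifiers are hypothetical."""
    HUMAN_REVIEW = "human review"
    SOURCE_VERIFICATION = "source verification"
    TESTS_AND_CODE_REVIEW = "automated tests and code review"
    SANDBOXING = "sandboxing"
    READ_ONLY_ACCESS = "read-only access"
    LEAST_PRIVILEGE = "least-privilege permissions"
    APPROVAL_GATES = "approval gates"
    LOGGING_AND_AUDIT = "logging, monitoring, and audit trails"
    USER_WARNINGS = "user warnings"
    DATA_ENTRY_POLICY = "data-entry policy"


def missing_controls(expected: set[Control], in_place: set[Control]) -> list[str]:
    """Return the expected controls that are not in place, sorted by name."""
    return sorted(c.value for c in expected - in_place)


# Hypothetical agent that can modify files.
expected = {Control.SANDBOXING, Control.LEAST_PRIVILEGE, Control.APPROVAL_GATES}
in_place = {Control.SANDBOXING}
print(missing_controls(expected, in_place))
# ['approval gates', 'least-privilege permissions']
```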
Methodology influences
The Atlas Severity Rubric is inspired by established risk-management, cybersecurity, AI governance, and human-factors concepts. It adapts those ideas for LLM and generative AI failure modes.
NIST AI Risk Management Framework
Supports context-sensitive AI risk management, including governing, mapping, measuring, and managing harms to people, organizations, and society.
NIST Generative AI Profile
Provides generative-AI-specific risk framing for issues such as information integrity, privacy, misuse, transparency, testing, and governance.
OWASP Risk Rating Methodology
Informs the use of likelihood, impact, exploitability, detectability, and environment-specific customization.
OWASP Top 10 for LLM Applications
Informs LLM-specific risks such as prompt injection, sensitive information disclosure, insecure output handling, excessive agency, and system misuse.
CVSS and vulnerability severity concepts
Inspires the distinction between baseline severity and contextual or environmental severity, while recognizing that LLM risks require a different model than software vulnerabilities.
FMEA-style failure analysis
Supports considering severity, occurrence, and detectability when analyzing failure modes.
Human-AI interaction research
Supports including human overtrust, automation bias, and decision reliance as important AI risk factors.
Critical severity triggers
Some conditions can raise a pitfall to Critical even if the baseline severity is lower.
- Credential, secret, token, or private key exposure
- Exposure of sensitive, confidential, or regulated data
- Production system modification or outage risk
- Security control bypass or unsafe authorization behavior
- Autonomous or semi-autonomous external actions
- Safety-critical instructions or physical-world consequences
- Legal, medical, financial, employment, housing, or other regulated decisions
- Irreversible or hard-to-reverse public, legal, financial, or operational harm
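The trigger list above behaves like an override on the baseline rating. The sketch below is a conservative interpretation: it assumes that any single trigger raises the rating to Critical, which is stricter than the wording "can raise". The `Severity` and `Trigger` enums and the `escalate` function are hypothetical names used for illustration.

```python
from enum import Enum, IntEnum, auto


class Severity(IntEnum):
    """Atlas labels; the integers only encode ordering, not a score."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


class Trigger(Enum):
    """Critical severity triggers from the list above (identifiers are hypothetical)."""
    CREDENTIAL_EXPOSURE = auto()
    SENSITIVE_DATA_EXPOSURE = auto()
    PRODUCTION_IMPACT = auto()
    SECURITY_CONTROL_BYPASS = auto()
    AUTONOMOUS_EXTERNAL_ACTION = auto()
    SAFETY_CRITICAL = auto()
    REGULATED_DECISION = auto()
    IRREVERSIBLE_HARM = auto()


def escalate(baseline: Severity, triggers: set[Trigger]) -> Severity:
    """Return CRITICAL when any trigger is present, otherwise the baseline.

    Assumption: every trigger escalates automatically. A reviewer may
    reasonably decide otherwise in a specific context.
    """
    return Severity.CRITICAL if triggers else baseline


# A pitfall with a Low baseline that exposes an API token.
print(escalate(Severity.LOW, {Trigger.CREDENTIAL_EXPOSURE}).name)  # CRITICAL
# No triggers present: the baseline is returned unchanged.
print(escalate(Severity.MEDIUM, set()).name)  # MEDIUM
```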
Important limitations
The Atlas Severity Rubric is meant for education and practical risk communication. It does not replace a formal organizational risk assessment, legal review, security review, privacy assessment, model evaluation, or compliance process. Actual risk depends on the system, data, users, controls, deployment environment, monitoring, and consequences of failure. The rubric intentionally avoids false precision: severity labels are structured judgments, not exact measurements.