Methodology

How Severity Is Estimated

Severity ratings in the LLM Risk Atlas are educational estimates, not universal or mathematically exact risk scores. They are designed to help users understand how serious an AI or LLM failure mode can become depending on context.

Why severity is context-sensitive

The same pitfall can have very different consequences depending on how the AI system is used. A hallucinated fact in casual conversation may be low impact, while a hallucinated security recommendation, medical explanation, legal summary, or production deployment instruction can become high or critical risk.

For that reason, the Atlas uses baseline severity labels while also describing the conditions that can increase or decrease risk.

Severity factors

Impact

How serious the harm could be if the failure occurs.

Likelihood

How often the pitfall is likely to occur in normal use.

Detectability

How easy it is for a user to notice the failure before harm occurs.

Triggerability

How little effort it takes to set off the pitfall, whether through normal use, vague prompts, untrusted input, or deliberate adversarial behavior.

Data sensitivity

Whether the pitfall involves public information, confidential data, personal data, regulated data, credentials, or secrets.

System access

Whether the AI only produces text or can access tools, files, code, APIs, email, cloud resources, or external systems.

Scope

Whether the harm affects one person, a team, customers, production systems, or a broader public audience.

Reversibility

Whether the harm can be easily corrected or instead causes lasting public, financial, legal, operational, or safety consequences.

Human overtrust

How likely users are to rely on the AI output without enough verification.

Mitigation difficulty

How hard it is to reduce the risk through review, testing, policy, monitoring, access controls, or human approval.
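
One way to keep these ten judgments together is a small record with one coarse rating per factor. The sketch below is illustrative only: the Rating scale, the FactorProfile name, and the field names are assumptions, not part of the Atlas. Two fields are phrased so that a higher rating always means more concern (detection_difficulty for Detectability, irreversibility for Reversibility). It reports which factors are elevated rather than computing a numeric score.

```python
from dataclasses import dataclass, fields, replace
from enum import IntEnum


class Rating(IntEnum):
    """Coarse ordinal rating for a single factor (hypothetical scale)."""
    LOW = 1
    MODERATE = 2
    HIGH = 3


@dataclass(frozen=True)
class FactorProfile:
    """One judgment per severity factor; a higher rating means more concern."""
    impact: Rating
    likelihood: Rating
    detection_difficulty: Rating   # HIGH = the failure is hard to notice
    triggerability: Rating
    data_sensitivity: Rating
    system_access: Rating
    scope: Rating
    irreversibility: Rating        # HIGH = the harm is hard to undo
    human_overtrust: Rating
    mitigation_difficulty: Rating

    def elevated(self) -> list[str]:
        """Return the names of factors rated HIGH, in declaration order."""
        return [f.name for f in fields(self) if getattr(self, f.name) is Rating.HIGH]


# Hypothetical profile: a hallucinated fact in casual conversation.
chat = FactorProfile(
    impact=Rating.LOW,
    likelihood=Rating.MODERATE,
    detection_difficulty=Rating.MODERATE,
    triggerability=Rating.MODERATE,
    data_sensitivity=Rating.LOW,
    system_access=Rating.LOW,
    scope=Rating.LOW,
    irreversibility=Rating.LOW,
    human_overtrust=Rating.MODERATE,
    mitigation_difficulty=Rating.LOW,
)

# Same pitfall, but the model now acts through tools with write access.
agent = replace(
    chat,
    impact=Rating.HIGH,
    system_access=Rating.HIGH,
    irreversibility=Rating.HIGH,
)

print(chat.elevated())   # []
print(agent.elevated())  # ['impact', 'system_access', 'irreversibility']
```

Listing the elevated factors keeps the result a structured judgment: a reviewer sees why a context is more serious instead of a single number.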

Severity levels

Low

Limited harm, narrow scope, easy detection, and simple mitigation.

Medium

Meaningful risk that can cause confusion, wasted effort, poor decisions, or moderate business or technical impact.

High

Serious risk that may affect security, privacy, compliance, production systems, professional decisions, or user trust.

Critical

Severe risk involving exposure of credentials or sensitive data, safety-critical use, regulated decisions, security control bypass, autonomous external action, irreversible consequences, or production system modification or outage.

Baseline severity vs. contextual severity

Each pitfall has a baseline severity. This is the general educational rating shown on pitfall cards. It represents how serious the pitfall usually is in ordinary use.

Actual severity can increase or decrease depending on context. The same pitfall may be low risk in casual use, but high or critical risk when it involves sensitive data, professional decisions, production systems, connected tools, or irreversible consequences.

Baseline severity

The default severity rating used for the Atlas pitfall card.

Contextual severity

The real-world severity after considering the task, data, domain, tools, human review, controls, and consequences.
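
The relationship between the two ratings can be sketched as a baseline label shifted up or down by a reviewer's judgment. This is a minimal illustration, not the Atlas's own procedure: the Severity enum, the contextual_severity function, and the idea of expressing context as a whole number of levels are all assumptions made for the example.

```python
from enum import IntEnum


class Severity(IntEnum):
    """The four Atlas severity labels as an ordered scale."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def contextual_severity(baseline: Severity, adjustment: int) -> Severity:
    """Shift the baseline by a reviewer-chosen number of levels.

    `adjustment` is a judgment call (hypothetical): negative when strong
    controls or low stakes apply, positive when sensitive data, connected
    tools, or irreversible consequences are involved. The result is clamped
    so it never leaves the Low..Critical scale.
    """
    level = max(Severity.LOW, min(Severity.CRITICAL, baseline + adjustment))
    return Severity(level)


# Assumed baseline for a hallucination-style pitfall, for illustration only.
baseline = Severity.MEDIUM

print(contextual_severity(baseline, -1).name)  # LOW      (casual conversation)
print(contextual_severity(baseline, +1).name)  # HIGH     (security recommendation)
print(contextual_severity(baseline, +5).name)  # CRITICAL (clamped at the top)
```

The adjustment stays an explicit human input here so the code records a decision rather than pretending to derive one.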

What can lower severity

Severity is not only about what can go wrong. It also depends on whether meaningful controls are in place. Strong controls can reduce practical risk, even when the baseline pitfall is serious.

  • Human review before important decisions or external actions
  • Source verification for factual, legal, medical, security, or compliance claims
  • Automated tests and code review before using AI-generated code
  • Sandboxing before running scripts, commands, or generated code
  • Read-only access instead of write access for AI-connected tools
  • Least-privilege permissions for agents, plugins, APIs, and integrations
  • Approval gates before sending emails, modifying files, deploying code, or changing systems
  • Logging, monitoring, and audit trails for AI-assisted workflows
  • Clear user warnings when outputs require verification
  • Policies that prohibit entering credentials, secrets, or regulated data into unapproved tools
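
As one concrete example of these controls, an approval gate can be sketched as a wrapper that refuses to run certain actions until a named human has signed off. The action names, the GATED_ACTIONS set, and the run_action function are hypothetical; a real deployment would tie this to its own tool layer and identity system.

```python
from typing import Callable, Optional

# Hypothetical action names mirroring the gated categories listed above.
GATED_ACTIONS = frozenset({"send_email", "modify_file", "deploy_code", "change_system"})


def run_action(
    action: str,
    execute: Callable[[], str],
    approved_by: Optional[str] = None,
) -> str:
    """Run `execute` only if the action is ungated or a human approved it."""
    if action in GATED_ACTIONS and not approved_by:
        # Nothing is executed; the caller gets a message it can surface to a reviewer.
        return f"BLOCKED: {action} requires human approval"
    return execute()


# An AI-proposed email is held until someone approves it.
print(run_action("send_email", lambda: "email sent"))
# BLOCKED: send_email requires human approval

print(run_action("send_email", lambda: "email sent", approved_by="alice"))
# email sent

# Read-only work is not gated in this sketch.
print(run_action("summarize", lambda: "summary ready"))
# summary ready
```

The gate fails closed: an action in the gated set without an approver is never executed, which is the property that reduces practical risk.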

Methodology influences

The Atlas Severity Rubric is inspired by established risk-management, cybersecurity, AI governance, and human-factors concepts. It adapts those ideas for LLM and generative AI failure modes.

NIST AI Risk Management Framework

Supports context-sensitive governance, measurement, and management of AI risks and of harms to people, organizations, and society.

NIST Generative AI Profile

Provides generative-AI-specific risk framing for issues such as information integrity, privacy, misuse, transparency, testing, and governance.

OWASP Risk Rating Methodology

Informs the use of likelihood, impact, exploitability, detectability, and environment-specific customization.

OWASP Top 10 for LLM Applications

Informs LLM-specific risks such as prompt injection, sensitive information disclosure, insecure output handling, excessive agency, and system misuse.

CVSS and vulnerability severity concepts

Inspires the distinction between baseline severity and contextual severity, analogous to the CVSS Base and Environmental metric groups, while recognizing that LLM risks require a different model than software vulnerabilities.

FMEA-style failure analysis

Supports considering severity, occurrence, and detectability when analyzing failure modes.

Human-AI interaction research

Supports including human overtrust, automation bias, and decision reliance as important AI risk factors.

Critical severity triggers

Some conditions can raise a pitfall to Critical even if the baseline severity is lower.

  • Credential, secret, token, or private key exposure
  • Exposure of sensitive, confidential, or regulated data
  • Production system modification or outage risk
  • Security control bypass or unsafe authorization behavior
  • Autonomous or semi-autonomous external actions
  • Safety-critical instructions or physical-world consequences
  • Legal, medical, financial, employment, housing, or other regulated decisions
  • Irreversible or hard-to-reverse public, legal, financial, or operational harm
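
The override behavior of these triggers can be sketched as a check that runs before any other adjustment. The trigger identifiers and the escalate function are hypothetical names chosen for the example. Note one assumption in particular: the text says these conditions can raise a pitfall to Critical, while this sketch takes the stricter position that any observed trigger always does.

```python
LEVELS = ("Low", "Medium", "High", "Critical")

# Hypothetical identifiers, one per trigger in the list above.
CRITICAL_TRIGGERS = frozenset({
    "credential_exposure",
    "sensitive_data_exposure",
    "production_modification",
    "security_control_bypass",
    "autonomous_external_action",
    "safety_critical",
    "regulated_decision",
    "irreversible_harm",
})


def escalate(baseline: str, observed: set[str]) -> str:
    """Return "Critical" if any trigger is observed, otherwise the baseline."""
    if baseline not in LEVELS:
        raise ValueError(f"unknown severity level: {baseline!r}")
    unknown = observed - CRITICAL_TRIGGERS
    if unknown:
        # Reject typos rather than silently ignoring a trigger.
        raise ValueError(f"unknown triggers: {sorted(unknown)}")
    return "Critical" if observed else baseline


print(escalate("Medium", set()))                    # Medium
print(escalate("Medium", {"credential_exposure"}))  # Critical
print(escalate("Low", {"regulated_decision", "irreversible_harm"}))  # Critical
```

A team that prefers the softer reading could return a flag for human review instead of the label itself.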

Important limitations

The Atlas Severity Rubric is meant for education and practical risk communication. It does not replace a formal organizational risk assessment, legal review, security review, privacy assessment, model evaluation, or compliance process. Actual risk depends on the system, data, users, controls, deployment environment, monitoring, and consequences of failure. The rubric intentionally avoids false precision: severity labels are structured judgments, not exact measurements.