The taxonomy
Ten LLM failure behaviors.
Twenty rules each.
The behavior library is HR Rebooted's patent-pending taxonomy of how LLMs go wrong in production: ten categories, 200 underlying guardrails. Governance 1st uses the same taxonomy for output evaluation, so your browser and your platform describe failures in the same language.
01
Hallucination
The model invents facts. It cites papers that don't exist, names people who weren't there, fabricates statistics, and attributes quotes that were never said. 20 rules covering grounding, citation enforcement, math audits, temporal validity, and entity coherence.
02
Drift
The same model gives meaningfully different answers to the same prompt over time. Silent regression after a vendor update. 20 rules covering semantic baseline tracking, format conformity, perplexity audits, and shadow-deployment regression.
03
Bias
Systematic asymmetry in how the model treats people based on demographic attributes. 20 rules covering demographic parity, occupational stereotyping, geographic decentering, name-based scoring, and counterfactual substitution.
04
Dangerous Advice
Confident answers in domains where confident wrong answers hurt people: unlicensed medical diagnoses, financial picks, legal strategy, CBRN (chemical, biological, radiological, and nuclear) weapons, malware, and lethal-force optimization. 20 rules with hard blocks across these categories.
05
Injection / Jailbreaking
Prompts engineered to break the model out of its guardrails: direct, indirect, roleplay-laundered, Base64-encoded, and multilingual pivots. 20 rules covering system prompt encapsulation, suffix filtering, DAN ("Do Anything Now") pattern matching, and dual-LLM separation.
06
Refusal Calibration
The model over-refuses safe requests and under-refuses unsafe ones. It lectures users who asked legitimate questions and complies with requests it should decline. 20 rules covering false-positive override, intent classification, tone audit, and granular reason labeling.
07
Bigotry
Slurs, dehumanizing analogies, genocide rationalization, eugenics, transphobic rhetoric, and dogwhistles. The hard-floor content that must never reach a customer's screen. 20 rules with near-zero-tolerance toxicity classifiers.
08
Logical Fallacy
Confidently delivered but structurally broken reasoning: ad hominem, straw man, slippery slope, false dichotomy, correlation-as-causation. 20 rules to catch the arguments that sound fluent but fall apart on inspection.
09
Partisanship
The model picks a political side: candidate endorsements, loaded adjectives, asymmetric treatment of left- and right-leaning policies. 20 rules covering multi-perspective balance, source disclosure, and partisan anchor detection.
10
Misinformation
Confident assertions that contradict verified consensus: anti-vax medical claims, election fraud narratives, climate denial, fabricated quotes, predatory MLM scripts. 20 rules covering consensus alignment and integrity shields.