The evaluators
Ten things watching every prompt.
Monitoring isn't a dashboard. It's a set of evaluators running on every model call, flagging the failure modes that matter to your committee, your auditor, and your user. The evaluators share the 10-behavior taxonomy that drives the Governance 1st Browser Extension, with 200 underlying guardrails (20 per category) backing every score.
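As a rough illustration of "evaluators running on every model call", here is a minimal Python sketch. The names `Flag`, `run_evaluators`, and `endorsement_check` are hypothetical and are not the product's API; the single toy rule stands in for the real guardrails.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Flag:
    """One guardrail hit. All field names here are illustrative assumptions."""
    category: str
    rule: str
    detail: str


# An evaluator takes (prompt, response) and returns zero or more flags.
Evaluator = Callable[[str, str], List[Flag]]


def run_evaluators(prompt: str, response: str,
                   evaluators: Dict[str, Evaluator]) -> List[Flag]:
    """Run every registered evaluator on a single model call."""
    flags: List[Flag] = []
    for evaluate in evaluators.values():
        flags.extend(evaluate(prompt, response))
    return flags


def endorsement_check(prompt: str, response: str) -> List[Flag]:
    """Toy rule: flag responses that tell the user whom to vote for."""
    if "vote for" in response.lower():
        return [Flag("partisanship", "candidate_endorsement",
                     "response contains 'vote for'")]
    return []


if __name__ == "__main__":
    registry = {"partisanship": endorsement_check}
    for flag in run_evaluators("Who should I pick?",
                               "You should vote for Candidate X.", registry):
        print(flag.category, flag.rule)
```

In a fuller version of this sketch, each of the ten categories would register its own evaluator in the same dictionary.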
01
Hallucination
Detect when the model invents facts unsupported by retrieved context. Grounding, citation enforcement, math audits, temporal validity, entity coherence. 20 underlying rules.
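One way to picture a grounding check is the sketch below, which flags response sentences that share few words with the retrieved context. Token overlap is a crude stand-in chosen for brevity, not the method the product uses; the function name `ungrounded_sentences` and the `min_overlap` threshold are assumptions.

```python
import re
from typing import List, Set


def _tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def ungrounded_sentences(response: str, context: str,
                         min_overlap: float = 0.5) -> List[str]:
    """Return response sentences whose token overlap with the context
    falls below min_overlap (a hypothetical threshold)."""
    context_tokens = _tokens(context)
    flagged = []
    # Split on whitespace that follows sentence-ending punctuation.
    for sentence in re.split(r"(?<=[.!?])\s+", response.strip()):
        words = _tokens(sentence)
        if not words:
            continue
        overlap = len(words & context_tokens) / len(words)
        if overlap < min_overlap:
            flagged.append(sentence)
    return flagged


if __name__ == "__main__":
    context = "The Eiffel Tower is in Paris. It was completed in 1889."
    response = "The Eiffel Tower is in Paris. It was designed by aliens from Mars."
    print(ungrounded_sentences(response, context))
```

A production check would more plausibly use entailment or citation matching; the overlap ratio only shows where such a rule would sit.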
02
Drift
Behavioral shift detection over time. Catch silent regressions after vendor updates before users do. Semantic baseline tracking, perplexity audits, shadow-deployment comparison.
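Semantic baseline tracking can be sketched as comparing today's answers to a stored baseline for the same prompts. The example below uses bag-of-words cosine similarity as a simple substitute for real embeddings; `drift_score` and its inputs are hypothetical names, and any alerting threshold would be a deployment choice.

```python
import math
from collections import Counter
from typing import Sequence


def _bow(text: str) -> Counter:
    """Bag-of-words vector; a stand-in for a sentence embedding."""
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def drift_score(baseline: Sequence[str], current: Sequence[str]) -> float:
    """Mean dissimilarity between paired baseline and current responses.
    0.0 means no measured change; 1.0 means no shared vocabulary."""
    if not baseline or len(baseline) != len(current):
        raise ValueError("need equal-length, non-empty response lists")
    sims = [_cosine(_bow(b), _bow(c)) for b, c in zip(baseline, current)]
    # Clamp to guard against floating-point similarity slightly above 1.
    return max(0.0, 1.0 - sum(sims) / len(sims))


if __name__ == "__main__":
    before = ["Refunds are processed within five business days."]
    after = ["Refunds are not available for this product."]
    print(round(drift_score(before, after), 3))
```

Running the same fixed prompt set before and after a vendor update, then comparing the score to a threshold, is one plausible way to surface a silent regression.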
03
Bias
Statistical and language-pattern bias scoring across protected classes. Demographic parity, occupational stereotyping, name-based scoring, counterfactual substitution.
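Counterfactual substitution can be illustrated by filling one prompt template with different names and measuring how far a score moves. In this sketch `counterfactual_gap` and the toy scorers are hypothetical; in practice `score` would wrap a model call plus whatever rating is being audited.

```python
from typing import Callable, Dict, Sequence, Tuple


def counterfactual_gap(template: str, names: Sequence[str],
                       score: Callable[[str], float]
                       ) -> Tuple[float, Dict[str, float]]:
    """Fill `template` (which must contain '{name}') with each name,
    score every variant, and return the max-min gap plus the raw scores."""
    scores = {name: score(template.format(name=name)) for name in names}
    gap = max(scores.values()) - min(scores.values())
    return gap, scores


# Toy scorers for demonstration only (assumptions, not real models).
def biased_scorer(text: str) -> float:
    return 0.9 if "Alice" in text else 0.6


def fair_scorer(text: str) -> float:
    return 0.75


if __name__ == "__main__":
    template = "Rate the resume of {name}, a software engineer with 5 years of experience."
    gap, scores = counterfactual_gap(template, ["Alice", "Bob"], biased_scorer)
    print(scores, round(gap, 2))
```

A gap near zero suggests the swapped attribute did not move the score on that template; a large gap is a signal to investigate, not proof of bias on its own.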
04
Dangerous Advice
Hard-blocks on unlicensed medical diagnosis, financial picks, legal strategy, CBRN content, malware generation, lethal-force optimization, and 14 more high-harm categories.
05
Injection / Jailbreaking
Prompt-injection and jailbreak attempts logged, blocked, and routed to safeguards. System prompt encapsulation, suffix filtering, DAN pattern matching, dual-LLM separation.
06
Refusal Calibration
Track when the model refuses, why, and whether the refusal aligns with policy. Catch over- and under-refusal. Intent classification, tone auditing, granular reason labeling.
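Over- and under-refusal can be expressed as two rates over labeled calls. The sketch below assumes each record already carries a policy label (`should_refuse`) and an observed outcome (`did_refuse`); the `Record` type and `refusal_rates` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class Record:
    should_refuse: bool  # what policy says (assumed to be labeled upstream)
    did_refuse: bool     # what the model actually did


def refusal_rates(records: Sequence[Record]) -> Dict[str, float]:
    """Over-refusal: share of allowed requests that were refused.
    Under-refusal: share of restricted requests that were answered."""
    allowed = [r for r in records if not r.should_refuse]
    restricted = [r for r in records if r.should_refuse]
    over = sum(r.did_refuse for r in allowed) / len(allowed) if allowed else 0.0
    under = (sum(not r.did_refuse for r in restricted) / len(restricted)
             if restricted else 0.0)
    return {"over_refusal": over, "under_refusal": under}


if __name__ == "__main__":
    log = [
        Record(should_refuse=False, did_refuse=True),   # over-refusal
        Record(should_refuse=False, did_refuse=False),
        Record(should_refuse=True, did_refuse=True),
        Record(should_refuse=True, did_refuse=False),   # under-refusal
    ]
    print(refusal_rates(log))
```

Tracking the two rates separately matters because a model can look well calibrated on average while failing in both directions.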
07
Bigotry
Near-zero-tolerance hard floor. Slurs, dehumanizing analogies, genocide rationalization, eugenics, transphobic rhetoric, dogwhistles, hate-group ideology. Post-generation toxicity scan.
08
Logical Fallacy
Structurally broken reasoning delivered fluently. Ad hominem, straw man, slippery slope, false dichotomy, correlation-as-causation, circular reasoning, hasty generalization.
09
Partisanship
The model picking a political side. Candidate endorsement, loaded adjectives, asymmetric treatment of left- and right-leaning policy. Multi-perspective balance, source disclosure, partisan anchor detection.
10
Misinformation
Confident assertions contradicting verified consensus. Anti-vax claims, election fraud narratives, climate denial, fabricated quotes, predatory MLM scripts. Consensus alignment, integrity shields.