Non agens mensurat, sed structura. — Not the agent measures, but the structure.
A standalone module that externalises every agent-dependent decision point into frozen, verifiable structure.
Same input + same contract → same verdict, regardless of which AI runs it.
NON_EVALUABLE — declared, never guessed away. Tertia via semper patet — the third way is always open.
[Interactive panel — 8 Evaluation Channels with Stance (Verdict), Regime and Critical flags, a Channel Radar, and a Full Spine Report: Fidelity F, Drift ω, Entropy S, Curvature C, log-integrity κ, IC, and Gap Δ.]
Eight Evaluation Channels
Each channel is scored ∈ [0.0, 1.0] with equal weights wᵢ = 1/8. The kernel computes F (arithmetic mean) and IC (geometric mean) from these 8 scores.
Geometric Slaughter Warning
If any single channel is driven toward zero, the geometric mean (IC) is dragged down sharply no matter how strong the other 7 channels are. Example: 7 channels at 0.95 + 1 channel at 0.001 → F ≈ 0.83 (looks fine) but IC ≈ 0.40 (less than half of F). The heterogeneity gap Δ = F − IC ≈ 0.43 exposes the structural weakness that the average masks. This is the trucidatio geometrica.
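Under the frozen kernel definitions (F = Σ wᵢcᵢ, IC = exp(Σ wᵢ ln max(cᵢ, ε)), wᵢ = 1/8), the example above can be checked in a few lines — a minimal, dependency-free sketch, not the official UMCP code:

```python
import math

def fidelity(c):
    """F: equal-weight arithmetic mean of the channel scores."""
    return sum(c) / len(c)

def integrity(c, eps=1e-8):
    """IC = exp(κ): equal-weight geometric mean with an ε floor."""
    kappa = sum(math.log(max(ci, eps)) for ci in c) / len(c)
    return math.exp(kappa)

channels = [0.95] * 7 + [0.001]   # seven strong channels, one nearly dead
F, IC = fidelity(channels), integrity(channels)
print(f"F  = {F:.3f}")            # 0.831 — the average looks fine
print(f"IC = {IC:.3f}")           # 0.403 — dragged down by one weak channel
print(f"Δ  = {F - IC:.3f}")       # 0.428 — the gap exposes the weakness
```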
CE System Prompt
Copy this prompt into any AI system (ChatGPT, Claude, Gemini, etc.) to enable Cognitive Equalizer mode.
The AI will then self-audit using the 8-channel Spine protocol with frozen thresholds and three-valued verdicts.
Implement the Cognitive Equalizer
The CE is a standalone, zero-dependency module that can be embedded in any system.
Three implementations are available — each produces identical results because the contract is frozen: trans suturam congelatum (frozen across the seam).
🐍 Python
Full implementation with CLI, system prompt, Latin aliases. Part of the UMCP package.
# Score 8 channels directly
umcp-ce --channels 0.90,0.85,0.80,0.95,0.70,0.88,0.92,0.75
# Run built-in demo
umcp-ce --demo
# Print system prompt for embedding
umcp-ce --prompt
# Latin alias works too:
aequator-cognitivus --demo
Deep Explorations
Geometric Slaughter Lab
Trucidatio Geometrica — One dead channel kills multiplicative coherence
Drag the kill slider to zero. Watch how F (arithmetic mean) barely moves while IC (geometric mean) collapses toward its ε floor. The heterogeneity gap Δ = F − IC explodes. This is why averages lie and IC tells the truth.
[Interactive lab — kill slider for one channel (0.00 dead → 1.00 alive, other 7 channels fixed at 0.90), live readouts for F (Fidelity), IC (Integrity), and Δ (Gap), and a sweep chart of F (arithmetic), IC (geometric), and Δ = F − IC as the killed channel runs 0 → 1.]
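The lab's sweep is easy to reproduce offline — a minimal sketch assuming the setup above (seven channels fixed at 0.90, frozen ε = 10⁻⁸):

```python
import math

EPS = 1e-8

def spine(c):
    """Return (F, IC) for a list of channel scores, per the frozen kernel."""
    F = sum(c) / len(c)
    kappa = sum(math.log(max(ci, EPS)) for ci in c) / len(c)
    return F, math.exp(kappa)

# Sweep the killed channel from dead to alive; the other 7 stay at 0.90.
for kill in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9):
    F, IC = spine([0.90] * 7 + [kill])
    print(f"kill={kill:.1f}  F={F:.3f}  IC={IC:.3f}  Δ={F - IC:.3f}")
```

At kill = 0.0 the arithmetic mean only slips to 0.7875, while IC falls to ≈ 0.09 (its ε floor for this setup); at kill = 0.9 the two coincide at 0.900 and the gap vanishes.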
Comparison Mode
Side-by-side evaluation — see how two responses differ structurally
Channel-by-Channel Comparison
[Interactive panel — the two responses' channel scores side by side, with their Δκ.]
Integrity Ledger Deep Dive
Debit Drift + Roughness, credit Return — the account must reconcile
The ledger is the proof that the evaluation is well-formed. Every engagement debits Dω = Γ(ω) = ωᵖ/(1−ω+ε) for drift and DC = α·C for roughness. The balance Δκ = κ − Dω − DC must satisfy |Δκ| ≤ tol_seam for BALANCED status.
[Live readouts — Drift Debit Dω = Γ(ω) = ω³/(1−ω+ε); Roughness Debit DC = α·C (α = 1.0 frozen); Balance Δκ, with |Δκ| ≤ 0.005 required.]
Γ(ω) Drift Cost Curve — the cost of collapse proximity
Γ(ω) = ω³/(1−ω+ε). Near ω=0 the cost is negligible; near ω=1 it diverges to ∞ (the pole).
The frozen guard band ε = 10⁻⁸ prevents numerical singularity without affecting any measurement to machine precision.
The cyan dot marks the current evaluation's ω.
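Both the drift-cost pole and the ledger check fit in a few lines — a minimal sketch, assuming the frozen constants stated on this page (p = 3, ε = 10⁻⁸, α = 1.0, tol_seam = 0.005):

```python
EPS = 1e-8        # frozen guard band at the omega = 1 pole
ALPHA = 1.0       # frozen roughness coefficient
TOL_SEAM = 0.005  # frozen seam tolerance

def gamma(omega, p=3):
    """Drift cost: negligible near omega = 0, diverges near the omega = 1 pole."""
    return omega**p / (1.0 - omega + EPS)

def seam_balance(kappa, omega, curvature):
    """Return (delta_kappa, status) for the integrity ledger."""
    d_omega = gamma(omega)        # debit for drift
    d_c = ALPHA * curvature       # debit for roughness
    delta_kappa = kappa - d_omega - d_c
    status = "BALANCED" if abs(delta_kappa) <= TOL_SEAM else "UNBALANCED"
    return delta_kappa, status

# The pole: the cost of collapse proximity explodes as omega -> 1
for omega in (0.01, 0.30, 0.90, 0.999):
    print(f"Γ({omega}) = {gamma(omega):.4g}")
```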
Session History
Continuitas Diachronica — Track your evaluations across this session
Each evaluation from the interactive panel is logged here. Watch the trajectory — is quality improving, stable, or degrading?
Historia numquam rescribitur; sutura tantum additur — history is never rewritten; only the seam is added.
No evaluations yet — run the CE Spine above to start logging.
[Log table — columns: #, F, IC, Δ, ω, Regime, Verdict, Weak Channel; plus a trajectory chart of F and IC over evaluations.]
Rosetta Cross-Domain Lens
Significatio stabilis manet dum dialectus mutatur — meaning stays stable while dialect changes
The five CE words map identically across disciplines. The same evaluation produces different prose
but identical structure in every lens. Select a lens to see how:
[Lens table — columns: Word, CE Meaning, and the reading in the selected lens (e.g. Epistemological).]
Mathematical Foundations
The exact formulas computed client-side — identical to Python and C implementations
Tier-1 Kernel (4 primitives + 2 derived)
F  = Σ wᵢ cᵢ                               // fidelity — arithmetic mean
κ  = Σ wᵢ ln(max(cᵢ, ε))                   // log-integrity
S  = −Σ wᵢ [cᵢ ln cᵢ + (1−cᵢ) ln(1−cᵢ)]    // Bernoulli entropy
C  = σ(c) / 0.5                            // normalised curvature
ω  = 1 − F                                 // drift (derived)
IC = exp(κ)                                // integrity composite (derived)
Structural Identities (always true)
F + ω = 1 // duality identity
IC ≤ F         // integrity bound
IC = exp(κ) // log-integrity relation
Seam Budget
Dω = Γ(ω) = ω³ / (1 − ω + ε)   // drift cost
DC = α · C                     // curvature cost
Δκ = κ − Dω − DC               // |Δκ| must be ≤ tol_seam
Regime Gates (frozen)
Stable: ω < 0.038 ∧ F > 0.90 ∧ S < 0.15 ∧ C < 0.14
Watch: 0.038 ≤ ω < 0.30 (or gates not all met)
Collapse: ω ≥ 0.30
Critical: IC < 0.30 (overlay, any regime)
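The whole Tier-1 kernel plus the regime gates fits in a screenful of dependency-free Python — a sketch for illustration, not the official UMCP implementation; it assumes σ(c) is the population standard deviation of the channel scores:

```python
import math

EPS = 1e-8

def kernel(c):
    """Tier-1 kernel: 4 primitives + 2 derived, from channel scores in [0, 1]."""
    w = 1.0 / len(c)                                        # equal weights
    F = w * sum(c)                                          # fidelity
    kappa = w * sum(math.log(max(ci, EPS)) for ci in c)     # log-integrity
    cc = [min(max(ci, EPS), 1 - EPS) for ci in c]           # clamp for entropy logs
    S = -w * sum(ci * math.log(ci) + (1 - ci) * math.log(1 - ci) for ci in cc)
    C = math.sqrt(w * sum((ci - F) ** 2 for ci in c)) / 0.5  # normalised curvature
    return dict(F=F, kappa=kappa, S=S, C=C, omega=1.0 - F, IC=math.exp(kappa))

def regime(k):
    """Frozen regime gates, with the CRITICAL overlay returned separately."""
    if k["omega"] >= 0.30:
        r = "COLLAPSE"
    elif k["omega"] < 0.038 and k["F"] > 0.90 and k["S"] < 0.15 and k["C"] < 0.14:
        r = "STABLE"
    else:
        r = "WATCH"
    return r, k["IC"] < 0.30

k = kernel([0.98, 0.97, 0.99, 0.98, 0.97, 0.99, 0.98, 0.98])
assert abs(k["F"] + k["omega"] - 1.0) < 1e-12   # duality identity
assert k["IC"] <= k["F"] + 1e-12                # integrity bound (AM ≥ GM)
print(regime(k))                                # → ('STABLE', False)
```

Two different agents running this code on the same channel scores necessarily reach the same regime and overlay — the determinism claim is a property of the function, not of the runner.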
How It Works — FAQ
Plain-language answers to common questions
What is the Cognitive Equalizer?
A mathematical protocol that evaluates AI responses using the same GCD kernel that analyses particle physics, atomic structure, and 21 other domains.
It scores 8 dimensions of quality, computes structural invariants (fidelity, integrity, entropy, curvature), classifies the regime (Stable/Watch/Collapse),
and derives a three-valued verdict. The key innovation: the verdict is derived from frozen structure, never asserted by the agent.
Two different AIs, given the same channel scores, MUST produce the same verdict.
Why IC (geometric mean) instead of F (arithmetic mean)?
Averages lie. A response can score F ≈ 0.83 (looks good!) while its IC ≈ 0.40 (structurally compromised) — because one channel is nearly dead. IC uses the geometric mean, which is dragged down sharply by any single dead channel. This is geometric slaughter (trucidatio geometrica). The gap Δ = F − IC reveals the hidden structural weakness. Try the Slaughter Lab above to see this in action.
What are the three verdicts?
CONFORMANT — the engagement passes all gates and the ledger balances; quality is structurally verified.
NONCONFORMANT — the regime is COLLAPSE (ω ≥ 0.30) or the seam budget is unbalanced; structural failure detected.
NON_EVALUABLE — input is invalid or insufficient; the system refuses to guess — tertia via semper patet (the third way is always open).
This is never boolean. There is always a third state.
Why "frozen" parameters?
In traditional systems, thresholds are chosen by convention (p < 0.05, 3σ, etc.). In GCD, the parameters
are discovered by the mathematics: p = 3 is the unique integer where the drift trap is a Cardano root of x³+x−1=0.
ε = 10⁻⁸ is where the pole at ω=1 does not affect measurements to machine precision. tol_seam = 0.005 is where IC ≤ F holds at 100% across all 23 domains.
These are trans suturam congelatum — frozen across the seam, consistent on both sides of every collapse-return boundary.
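The claim about p = 3 is easy to spot-check: the real root of x³ + x − 1 = 0 is ≈ 0.6823, and a dozen lines of bisection recover it — an illustrative sketch, not part of the CE contract:

```python
def real_root_bisection(f, lo, hi, iters=100):
    """Bisection search; assumes f(lo) and f(hi) bracket a sign change."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

def drift_trap(x):
    return x**3 + x - 1          # the Cardano cubic cited above

root = real_root_bisection(drift_trap, 0.0, 1.0)
print(f"root ≈ {root:.6f}")      # 0.682328
print(f"residual = {drift_trap(root):.2e}")
```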
Can I use this to evaluate ChatGPT / Claude / Gemini?
Yes — that's exactly what it's for. Three approaches:
1. Manual scoring — read the AI's response, score the 8 channels yourself using the sliders above, and run the Spine.
2. System prompt — paste the CE System Prompt into the AI, and it will self-audit using the protocol.
3. Programmatic — use the Python SDK (pip install umcp) or the TypeScript module to build automated evaluation pipelines.
The same formulas run in all three approaches — trans suturam congelatum (frozen across the seam).
How is this different from other AI evaluation frameworks?
Most frameworks use scoring rubrics that depend on human judgment or ML classifiers — the result varies with the evaluator.
The CE is a cognitive equalizer: same channels + same contract → same verdict, regardless of agent.
It uses the same kernel that measures particle physics confinement, atomic coherence, and consciousness —
because quality has structure, and that structure is domain-independent.
The five-stop Spine protocol, three-valued verdicts, and frozen thresholds eliminate evaluator variance by construction.
Aequator numquam dormit — The equalizer never sleeps.