CoherenceScore Is Now Running Code
The Chapter 3 math is now a tested Python implementation, including protocol thresholds, geometric mean behavior, and explicit invariants.

The implementation now lives in poc/convergence with three core modules: embeddings.py for thread-safe normalized embeddings, coherence.py for the whitepaper formulas, and __init__.py exporting compute_coherence_score() plus threshold constants.
coherence.py implements the whitepaper definitions of CS_semantic, CC, and the final CoherenceScore, so the protocol math is no longer just prose: it is executable, testable code.
- embeddings.py: sentence-transformers + L2 normalization
- coherence.py: CS_semantic, CC, and CoherenceScore
- __init__.py: public API and threshold exports
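To make the normalization and thread-safety points concrete, here is a minimal sketch in plain Python. It is not the code in embeddings.py: `l2_normalize`, `cosine_similarity`, `LazyModel`, and the `loader` callable are hypothetical names, and the real module works on sentence-transformers output rather than hand-written vectors.

```python
import math
import threading
from typing import Callable, List, Optional, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector comes back as zeros."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [0.0 for _ in vec]
    return [x / norm for x in vec]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """For unit vectors, cosine similarity reduces to a dot product."""
    ua, ub = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(ua, ub))


class LazyModel:
    """Hypothetical lock-guarded loader so threads share one model instance."""

    def __init__(self, loader: Callable[[], object]) -> None:
        self._loader = loader
        self._model: Optional[object] = None
        self._lock = threading.Lock()

    def get(self) -> object:
        # Hold the lock during the check so the loader runs at most once.
        with self._lock:
            if self._model is None:
                self._model = self._loader()
            return self._model
```

Normalizing once at embedding time means every later similarity is a plain dot product, which keeps the formula code short and easier to test.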
CS_semantic uses the geometric mean of pairwise similarities because it punishes outlier disagreement much harder than an arithmetic average.
If two architectures agree strongly but one diverges, arithmetic mean can hide the disagreement. Geometric mean preserves that disagreement as signal, which is exactly what a convergence protocol should do.
- Arithmetic mean example: 0.633 can still look acceptable
- Geometric mean example: 0.448 correctly falls into low confidence
- Outlier disagreement should not be averaged away
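The gap between the two means is easy to reproduce. The sketch below uses hypothetical pairwise similarities, chosen only so the arithmetic mean lands near 0.633. The inputs behind the figures above are not given here, so the geometric result will not match 0.448 exactly. The `EPSILON` floor for non-positive similarities is also an assumption, not something taken from the implementation.

```python
import math
from typing import Sequence

EPSILON = 1e-9  # assumed floor so log() stays defined for similarities <= 0


def arithmetic_mean(sims: Sequence[float]) -> float:
    return sum(sims) / len(sims)


def geometric_mean(sims: Sequence[float]) -> float:
    # Average in log space, then exponentiate: exp(mean(log(s))).
    clamped = [max(s, EPSILON) for s in sims]
    return math.exp(sum(math.log(s) for s in clamped) / len(clamped))


pairwise = [0.9, 0.9, 0.1]  # hypothetical values: two strong, one outlier
print(round(arithmetic_mean(pairwise), 3))  # 0.633
print(round(geometric_mean(pairwise), 3))   # 0.433
```

With these inputs the arithmetic mean clears 0.60 while the geometric mean stays well below it, which is the behavior the list above describes.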
The current weighting is 40% CS_semantic and 60% CC because causal coherence is harder to game than surface-level semantic similarity.
Two outputs can sound similar while reaching different conclusions. CC is meant to capture premise consistency and conclusion alignment, which makes it the more important signal.
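A small sketch shows how the weighting plays out, assuming the final score is a linear combination of the two components. The whitepaper's exact formula is not restated here, so treat the form as an assumption. `weighted_score` is a hypothetical helper, not the exported compute_coherence_score().

```python
ALPHA = 0.4  # weight on CS_semantic
BETA = 0.6   # weight on CC


def weighted_score(cs_semantic: float, cc: float) -> float:
    """Assumed linear form: ALPHA * CS_semantic + BETA * CC."""
    for name, value in (("cs_semantic", cs_semantic), ("cc", cc)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return ALPHA * cs_semantic + BETA * cc


# Similar wording (high CS_semantic) but divergent conclusions (low CC).
score = weighted_score(0.90, 0.30)
print(round(score, 2))  # 0.54
```

Under this assumed form, outputs that merely sound alike cannot reach the 0.60 standard threshold without a reasonably high CC.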
Phase 0 still uses an approximation for CC by embedding the first three sentences of each output. That limitation is explicit in the implementation and will be replaced in Devnet.
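For illustration, the text-selection step of that approximation might look like the sketch below. `leading_sentences` is a hypothetical helper, and the regex splitter is deliberately naive (it will mis-split abbreviations such as "e.g."); the actual implementation may segment sentences differently.

```python
import re

# Split on whitespace that follows sentence-ending punctuation.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def leading_sentences(text: str, n: int = 3) -> str:
    """Return the first n sentences of text, joined by single spaces."""
    sentences = [s for s in _SENTENCE_END.split(text.strip()) if s]
    return " ".join(sentences[:n])


output = "A is true. B follows. So C holds. Extra detail. More detail."
print(leading_sentences(output))  # A is true. B follows. So C holds.
```

The returned string would then be embedded and compared in the same way as the full outputs.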
The implementation currently enforces four convergence categories: rejected below 0.30, low confidence from 0.30 to 0.60, standard from 0.60 to 0.85, and high coherence above 0.85.
These values are treated as protocol invariants. The tests are intentionally written so changing a threshold requires a whitepaper amendment, not a silent code edit.
- THRESHOLD_REJECT = 0.30
- THRESHOLD_STANDARD = 0.60
- THRESHOLD_HIGH = 0.85
- ALPHA = 0.4 and BETA = 0.6 are also enforced
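The constants above map to categories roughly as in the following sketch. The category labels and the `classify` function are hypothetical. The handling of exact boundary values (0.30 and 0.60 count toward the higher band, 0.85 stays in standard) is one reading of the ranges described above and should be treated as an assumption. The final function shows the kind of invariant test that would fail on a silent threshold edit.

```python
import math

THRESHOLD_REJECT = 0.30
THRESHOLD_STANDARD = 0.60
THRESHOLD_HIGH = 0.85
ALPHA = 0.4
BETA = 0.6


def classify(score: float) -> str:
    """Map a CoherenceScore to a category (boundary handling is assumed)."""
    if score < THRESHOLD_REJECT:
        return "rejected"
    if score < THRESHOLD_STANDARD:
        return "low_confidence"
    if score <= THRESHOLD_HIGH:
        return "standard"
    return "high_coherence"


def test_protocol_invariants() -> None:
    # Pin the published values so any change breaks the test suite.
    assert THRESHOLD_REJECT == 0.30
    assert THRESHOLD_STANDARD == 0.60
    assert THRESHOLD_HIGH == 0.85
    assert ALPHA == 0.4 and BETA == 0.6
    assert math.isclose(ALPHA + BETA, 1.0)
    assert THRESHOLD_REJECT < THRESHOLD_STANDARD < THRESHOLD_HIGH
```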
The current PoC validates the formula and the invariant boundaries, but it does not yet prove how often real-world queries exceed the standard threshold at scale.
It also does not prove that the current CC approximation is close enough to the full Devnet implementation, or that all-MiniLM-L6-v2 is the ideal similarity model for every domain RAXION will target.
Those questions remain open until the broader benchmark runs.