Published: February 25, 2026

Q1 Complete. Here's What We Proved, and What We Didn't.

Phase 0 closed with the whitepaper, CoherenceScore implementation, first proof-of-execution, and early contributor traction, but several core hypotheses remain unproven at scale.

What was completed

Phase 0 closed with the whitepaper published, the CoherenceScore formulas running in code, the first RISC Zero proof-of-execution generated and verified, and three external contributors already participating.

The CoherenceScore work now includes explicit invariant tests around thresholds and weights, which means the implementation is pinned to the whitepaper instead of drifting silently.
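For illustration, here is a minimal Rust sketch of what one of those invariant tests could lookk like. Only the 0.60 finality threshold comes from this post; the weight vector, its length, and every name below are assumptions, not the actual implementation.

```rust
// Hypothetical constants: only the 0.60 threshold appears in this post.
const FINALITY_THRESHOLD: f64 = 0.60;
const WEIGHTS: [f64; 3] = [0.5, 0.3, 0.2]; // placeholder component weights

/// Weighted CoherenceScore over per-component scores in [0, 1].
fn coherence_score(components: &[f64; 3]) -> f64 {
    components.iter().zip(WEIGHTS.iter()).map(|(c, w)| c * w).sum()
}

#[cfg(test)]
mod invariants {
    use super::*;

    #[test]
    fn weights_sum_to_one() {
        // Pin the weights: a silent change now fails CI instead of drifting.
        assert!((WEIGHTS.iter().sum::<f64>() - 1.0).abs() < 1e-9);
    }

    #[test]
    fn score_stays_in_unit_interval() {
        // With components in [0, 1] and weights summing to 1,
        // the score cannot leave [0, 1].
        assert!(coherence_score(&[1.0; 3]) <= 1.0);
        assert!(coherence_score(&[0.0; 3]) >= 0.0);
    }

    #[test]
    fn finality_threshold_is_pinned() {
        // Matches the 0.60 threshold stated in the whitepaper.
        assert!((FINALITY_THRESHOLD - 0.60).abs() < f64::EPSILON);
    }
}
```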

  • Whitepaper v0.4 published in English and Portuguese
  • CoherenceScore implementation completed
  • RISC Zero commitment proof generated and verified
  • Three external contributors confirmed
What was partially validated

Hypothesis H1 was only partially validated. The PoC ran 10 sample queries through a Transformer plus an SSM proxy, and 82% reached Cognitive Finality at a CoherenceScore of 0.60 or higher.

That beats the 70% target for the sample, but it is still a narrow, controlled dataset. It is encouraging, not conclusive.

  • Convergence rate: 82% across 10 sample queries
  • Average CoherenceScore: 0.74
  • Only 2 of 10 queries landed in LOW_CONFIDENCE
  • Zero rejections in the sample run
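To make those categories concrete, here is a hedged sketch of how a score could map to a verdict. Only the 0.60 finality threshold is from this post; the 0.30 rejection floor, the Verdict enum, and the function name are illustrative assumptions.

```rust
/// Hypothetical verdict tiers; the names Cognitive Finality and
/// LOW_CONFIDENCE appear in this post, the rest is assumed.
#[derive(Debug, PartialEq)]
enum Verdict {
    CognitiveFinality, // score >= 0.60: the query converged
    LowConfidence,     // below finality but above the rejection floor
    Rejected,          // assumed hard floor; zero occurred in the sample run
}

fn classify(score: f64) -> Verdict {
    const FINALITY_THRESHOLD: f64 = 0.60; // from this post
    const REJECTION_FLOOR: f64 = 0.30; // assumption, not from this post

    if score >= FINALITY_THRESHOLD {
        Verdict::CognitiveFinality
    } else if score >= REJECTION_FLOOR {
        Verdict::LowConfidence
    } else {
        Verdict::Rejected
    }
}

fn main() {
    // The sample run's average score of 0.74 lands in finality.
    assert_eq!(classify(0.74), Verdict::CognitiveFinality);
    assert_eq!(classify(0.45), Verdict::LowConfidence);
}
```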
What was not done

Three important items were intentionally left unfinished in Phase 0: the Agave fork, the 100-query MMLU benchmark, and the real Neuro-Symbolic architecture.

That was the right call. Forking runtime infrastructure before validating the convergence hypothesis would have been premature.

  • No Agave fork in Q1
  • No 100-query MMLU run yet
  • No real Neuro-Symbolic architecture yet
What matters in Q2

Q2 is where the Agave fork begins. The immediate work is the Neural SVM Runtime v0.1, including CognitiveAccount types, the Cognitive Scheduler, and PoIQ Layer 1 on-chain through Anchor.
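As a rough sketch of the shape that work could take, here is a minimal Anchor program with a hypothetical CognitiveAccount. Only the CognitiveAccount name comes from the roadmap above; the field layout, the basis-point score encoding, the register instruction, and the placeholder program id are all assumptions.

```rust
use anchor_lang::prelude::*;

declare_id!("11111111111111111111111111111111"); // placeholder id, not real

#[program]
pub mod neural_svm_runtime {
    use super::*;

    /// Illustrative instruction: creates a CognitiveAccount for an authority.
    pub fn register(ctx: Context<Register>, initial_score_bps: u64) -> Result<()> {
        let acc = &mut ctx.accounts.cognitive_account;
        acc.authority = ctx.accounts.authority.key();
        acc.coherence_score_bps = initial_score_bps; // assumed encoding
        acc.queries_processed = 0;
        Ok(())
    }
}

#[derive(Accounts)]
pub struct Register<'info> {
    // 8 discriminator + 32 pubkey + 8 + 8 = 56 bytes
    #[account(init, payer = authority, space = 8 + 32 + 8 + 8)]
    pub cognitive_account: Account<'info, CognitiveAccount>,
    #[account(mut)]
    pub authority: Signer<'info>,
    pub system_program: Program<'info, System>,
}

/// Assumed shape; the real field layout is not specified in this post.
#[account]
pub struct CognitiveAccount {
    pub authority: Pubkey,
    pub coherence_score_bps: u64, // CoherenceScore scaled to basis points
    pub queries_processed: u64,
}
```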

The real operational test is whether the system can process 1,000 queries without protocol failure, with deterministic challenge generation and correct slashing behavior.
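Deterministic challenge generation is worth pinning down precisely, since every validator must derive the same challenge independently. Here is a minimal sketch of one standard approach, hashing public inputs with SHA-256 via the sha2 crate; the inputs, the pool indexing, and the function are assumptions, not the protocol's actual scheme.

```rust
use sha2::{Digest, Sha256}; // requires the `sha2` crate

/// Derive a challenge index in [0, pool_size) from public on-chain inputs.
/// The same (slot, query_id) always yields the same index on every node.
fn challenge_index(slot: u64, query_id: &[u8; 32], pool_size: u64) -> u64 {
    let mut hasher = Sha256::new();
    hasher.update(slot.to_le_bytes());
    hasher.update(query_id);
    let digest = hasher.finalize();

    // Interpret the first 8 digest bytes as a little-endian integer.
    let mut raw = [0u8; 8];
    raw.copy_from_slice(&digest[..8]);
    u64::from_le_bytes(raw) % pool_size // modulo bias is negligible here
}

fn main() {
    let query_id = [7u8; 32];
    // Determinism check: identical inputs, identical challenge.
    assert_eq!(
        challenge_index(42, &query_id, 1_000),
        challenge_index(42, &query_id, 1_000)
    );
}
```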

If the protocol breaks under that workload, the team wants to discover it in Devnet rather than after economic value exists.

The honest caveat

The 82% convergence rate on 10 internal sample queries is not proof of the protocol thesis. The sample is small, the models still share training overlap, and the CoherenceScore itself is still an approximation.

The real benchmark starts when 100+ MMLU queries run across genuinely different architecture families with a stronger coherence implementation.
