Q1 Complete. Here's What We Proved, and What We Didn't.
Phase 0 closed with the whitepaper, CoherenceScore implementation, first proof-of-execution, and early contributor traction, but several core hypotheses remain unproven at scale.

Phase 0 closed with the whitepaper published, the CoherenceScore formulas running in code, the first RISC Zero proof-of-execution generated and verified, and three external contributors already participating.
The CoherenceScore work now includes explicit invariant tests around thresholds and weights, which pins the implementation to the whitepaper instead of letting it drift silently (a minimal sketch of what such tests look like follows the checklist below).
- Whitepaper v0.4 published in English and Portuguese
- CoherenceScore implementation completed
- RISC Zero commitment proof generated and verified
- Three external contributors confirmed
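
To make the invariant-test idea concrete, here is a minimal Rust sketch. Everything in it is an illustrative assumption: the names (`FINALITY_THRESHOLD`, `COHERENCE_WEIGHTS`, `coherence_score`), the weighted-sum aggregation, and the weight values are placeholders, not the project's actual implementation; only the 0.60 finality threshold comes from the PoC description.

```rust
// A minimal sketch, assuming the whitepaper fixes a 0.60 finality
// threshold and a weighted-sum score. COHERENCE_WEIGHTS, the weight
// values, and `coherence_score` are illustrative placeholders.

const FINALITY_THRESHOLD: f64 = 0.60; // from the PoC: CC needed for Cognitive Finality
const COHERENCE_WEIGHTS: [f64; 3] = [0.40, 0.35, 0.25]; // assumed component weights

/// Assumed aggregation: weighted sum of per-component coherence signals in [0, 1].
fn coherence_score(components: &[f64; 3]) -> f64 {
    components
        .iter()
        .zip(COHERENCE_WEIGHTS.iter())
        .map(|(c, w)| c * w)
        .sum()
}

#[cfg(test)]
mod invariants {
    use super::*;

    #[test]
    fn weights_sum_to_one() {
        // If a weight changes without the others compensating, this fails.
        let sum: f64 = COHERENCE_WEIGHTS.iter().sum();
        assert!((sum - 1.0).abs() < 1e-9);
    }

    #[test]
    fn score_stays_in_unit_interval() {
        // With unit-interval components and weights summing to 1,
        // the score cannot escape [0, 1].
        assert!(coherence_score(&[0.0; 3]) >= 0.0);
        assert!(coherence_score(&[1.0; 3]) <= 1.0 + 1e-9);
    }

    #[test]
    fn finality_threshold_is_pinned() {
        // Drifting away from the whitepaper value breaks the build.
        assert!((FINALITY_THRESHOLD - 0.60).abs() < f64::EPSILON);
    }
}
```

The point of tests like these is not coverage; it is that any silent change to a whitepaper constant becomes a loud CI failure.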
Hypothesis H1 was only partially validated. The PoC ran 10 sample queries through a Transformer plus an SSM proxy, and 82% reached Cognitive Finality at a CoherenceScore (CC) of 0.60 or higher.
That beats the 70% target, but the sample is narrow and controlled: encouraging, not conclusive. The decision bands behind these numbers are sketched after the list below.
- Convergence rate: 82% across 10 sample queries
- Average CoherenceScore: 0.74
- Only 2 of 10 queries landed in LOW_CONFIDENCE
- Zero rejections in the sample run
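
Those results imply a three-way verdict per query: Cognitive Finality at CC >= 0.60, a LOW_CONFIDENCE band below it, and outright rejection at the bottom. Here is a hedged sketch of that classification; the 0.45 rejection cutoff and all identifiers are assumptions, and only the 0.60 threshold and the band names come from the results above.

```rust
// Illustrative only: the 0.60 finality threshold and the band names
// come from the PoC description; the 0.45 rejection cutoff and all
// identifiers are assumptions.

#[derive(Debug, PartialEq)]
enum Verdict {
    CognitiveFinality, // CC >= 0.60: answer accepted as final
    LowConfidence,     // converged weakly; flagged, not rejected
    Rejected,          // coherence too low to accept at all
}

fn classify(cc: f64) -> Verdict {
    const FINALITY: f64 = 0.60;  // from the sample run
    const REJECTION: f64 = 0.45; // hypothetical lower bound
    if cc >= FINALITY {
        Verdict::CognitiveFinality
    } else if cc >= REJECTION {
        Verdict::LowConfidence
    } else {
        Verdict::Rejected
    }
}

fn main() {
    // The sample's average CC of 0.74 lands comfortably in finality.
    assert_eq!(classify(0.74), Verdict::CognitiveFinality);
    assert_eq!(classify(0.50), Verdict::LowConfidence);
    assert_eq!(classify(0.30), Verdict::Rejected);
}
```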
Three important items were intentionally left unfinished in Phase 0: the Agave fork, the 100-query MMLU benchmark, and the real Neuro-Symbolic architecture.
That was the right call. Forking runtime infrastructure before validating the convergence hypothesis would have been premature.
- No Agave fork in Q1
- No 100-query MMLU run yet
- No real Neuro-Symbolic architecture yet
Q2 is where the Agave fork begins. The immediate work is the Neural SVM Runtime v0.1, including CognitiveAccount types, the Cognitive Scheduler, and PoIQ Layer 1 on-chain through Anchor.
The real operational test is whether the system can process 1,000 queries without protocol failure, with deterministic challenge generation and correct slashing behavior.
If the protocol breaks under that workload, the team wants to discover it in Devnet rather than after economic value exists.
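
"Deterministic challenge generation" has a concrete operational meaning: every node must derive the same challenge from the same public inputs. Below is a minimal sketch, assuming challenges are seeded from an on-chain value plus a query id; the FNV-1a hash is a stand-in for whatever hash the protocol actually specifies, and every identifier here is a placeholder, not the real design.

```rust
// A minimal sketch of deterministic challenge generation, assuming
// challenges derive from a public seed (e.g. a recent slot hash) plus
// the query id. FNV-1a is used only so the sketch is self-contained
// and reproducible; it is not the protocol's specified hash.

fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Derive which of `n_validators` must answer the challenge for `query_id`.
/// Same seed + same query id => same target on every node, which is the
/// property the 1,000-query Devnet run has to hold under load.
fn challenge_target(seed: &[u8; 32], query_id: u64, n_validators: u64) -> u64 {
    let mut input = Vec::with_capacity(40);
    input.extend_from_slice(seed);
    input.extend_from_slice(&query_id.to_le_bytes());
    fnv1a(&input) % n_validators
}

fn main() {
    let seed = [7u8; 32]; // stand-in for an on-chain slot hash
    let a = challenge_target(&seed, 42, 100);
    let b = challenge_target(&seed, 42, 100);
    assert_eq!(a, b, "challenge generation must be deterministic");
    println!("query 42 challenges validator {a}");
}
```

Slashing correctness then reduces to a checkable claim: given identical public inputs, a validator that answers a challenge it was not assigned, or skips one it was, is provably at fault.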
The 82% convergence rate on 10 internal sample queries is not proof of the protocol thesis. The sample is small, the models still share training-data overlap, and CC is still an approximation.
The real benchmark starts when 100+ MMLU queries run across genuinely different architecture families with a stronger coherence implementation.
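
For reference, the headline metric that run would report reduces to a simple ratio: the fraction of queries whose CC clears the finality threshold. A toy sketch, assuming one CC value per query and the 0.60 threshold; the scores below are invented, not measurements.

```rust
// Toy sketch of the headline benchmark metric: the fraction of
// queries whose CC clears the finality threshold. Placeholder data.

fn convergence_rate(scores: &[f64], threshold: f64) -> f64 {
    let converged = scores.iter().filter(|&&cc| cc >= threshold).count();
    converged as f64 / scores.len() as f64
}

fn main() {
    let scores = [0.74, 0.81, 0.55, 0.63, 0.92]; // invented CC values
    println!("convergence = {:.0}%", 100.0 * convergence_rate(&scores, 0.60));
}
```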