Methodology
How problems are built and verified. Full contract: data/SCHEMA.md.
sources probmods · dippl · forest · posteriordb
problem statement given / model / query
+ typed answer spec
realizations WebPPL · Pyro · Stan
two gates solver re-derivation
cross-language
verified dataset 160 problems
Measured tolerance
ground truth k seeded runs
noise floor max pairwise distance
tolerance margin × floor
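The three quantities above can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: it assumes scalar answers and absolute difference as the distance, and the names `noise_floor`, `tolerance`, and the `margin` value of 3.0 are hypothetical. The actual distance and margin are defined by the contract in data/SCHEMA.md.

```python
from itertools import combinations


def noise_floor(runs):
    """Largest pairwise distance between the k seeded runs.

    Assumption: answers are scalars and distance is absolute difference.
    """
    return max((abs(a - b) for a, b in combinations(runs, 2)), default=0.0)


def tolerance(runs, margin):
    """Tolerance is the noise floor scaled by a safety margin."""
    return margin * noise_floor(runs)


# Hypothetical values from k = 3 seeded runs of one reference program.
runs = [0.50, 0.52, 0.49]
floor = noise_floor(runs)          # spread between the two most distant runs
tol = tolerance(runs, margin=3.0)  # margin of 3.0 is an illustrative choice
```

Measuring the floor from the runs themselves means a noisy sampler automatically gets a wider tolerance than an exact enumerator, so the tolerance does not have to be set by hand for each problem.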
Gate 1 — solver re-derivation
rendered statement prose only — no code shown
independent solvers 2 × LLM
judged vs ground truth within measured tolerance
accept
Gate 2 — cross-language consistency
WebPPL / gold-reference GT k seeded runs
agree within margin × max(floors)
idiomatic library use, audited by judgment
Pyro / Stan GT k seeded runs
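The agreement rule for Gate 2 can be illustrated with a short Python sketch. It makes several assumptions that the source does not state: answers are scalars, each language's ground truth is summarized by the mean of its seeded runs, and distance is absolute difference. The function names and the `margin` value are hypothetical.

```python
from itertools import combinations


def noise_floor(runs):
    """Largest pairwise absolute difference between seeded runs (assumed metric)."""
    return max((abs(a - b) for a, b in combinations(runs, 2)), default=0.0)


def cross_language_agree(runs_a, runs_b, margin):
    """Check that two realizations agree within margin x max(floors).

    Assumption: each realization's ground truth is the mean of its runs.
    """
    mean_a = sum(runs_a) / len(runs_a)
    mean_b = sum(runs_b) / len(runs_b)
    bound = margin * max(noise_floor(runs_a), noise_floor(runs_b))
    return abs(mean_a - mean_b) <= bound


# Hypothetical runs: a WebPPL reference against two other realizations.
webppl = [0.50, 0.52, 0.49]
pyro = [0.51, 0.50, 0.53]
stan = [0.70, 0.71, 0.72]

pyro_ok = cross_language_agree(webppl, pyro, margin=2.0)  # means are close
stan_ok = cross_language_agree(webppl, stan, margin=2.0)  # means are far apart
```

Using the larger of the two floors keeps the check from failing merely because one backend samples more noisily than the other.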
Provenance
Ground truth that matches its textbook source is authoritative; problem statements are the rewritable layer. Overriding a source requires documented evidence that the source is internally inconsistent. Retired problems are parked with their reasons recorded.