A single fair coin (heads probability 0.5 a priori). A soft-conditioning factor is applied: when the coin lands heads, a log-weight of 3 is added; when it lands tails, no weight is added.
A Bernoulli trial with prior probability 0.5 for heads. The outcome is soft-conditioned by adding log-weight 3 for heads and 0 for tails, then the result is enumerated exactly.
The posterior distribution over whether the coin lands heads.
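The soft-conditioning step above can be checked by hand: each outcome's unnormalized weight is its prior probability times the exponential of the added log-weight. The following is a minimal sketch in plain Python, not the benchmark's reference solution; the function name `soft_conditioned_coin` is an illustrative assumption.

```python
import math


def soft_conditioned_coin(prior_heads: float, log_weight_heads: float) -> dict:
    """Posterior over a coin flip after adding a log-weight to the heads branch."""
    # Unnormalized weight = prior probability * exp(added log-weight).
    w_heads = prior_heads * math.exp(log_weight_heads)
    w_tails = (1.0 - prior_heads) * math.exp(0.0)
    z = w_heads + w_tails
    return {True: w_heads / z, False: w_tails / z}


posterior = soft_conditioned_coin(prior_heads=0.5, log_weight_heads=3.0)
print(posterior)  # heads probability equals exp(3) / (1 + exp(3))
```

With a fair prior the prior terms cancel, so the heads probability is simply the logistic function evaluated at 3.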
answer spec
{
"kind": "dist",
"domain": "bool"
}
var ANSWER = (Infer({method: 'enumerate'}, function () {
  var A = flip();
  factor(A*3);
  return A;
}));
import pyro
import pyro.distributions as dist

# A ~ flip(); factor(A*3); posterior over A. Exact discrete enumeration via Pyro.

@pyro.infer.config_enumerate
def model():
    A = pyro.sample("A", dist.Bernoulli(0.5))
    pyro.factor("f", A * 3.0)
    return A

marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
m = marg["A"]
sup = m.enumerate_support()
probs = m.log_prob(sup).exp()

ANSWER = {}
for s, p in zip(sup.tolist(), probs.tolist()):
    ANSWER[bool(int(s))] = p
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0000 (tv) |
| solver re-derivation | accept | 2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000 |
Three independent fair coins, each with heads probability 0.5. A soft-conditioning factor adds log-weight 1 to outcomes where exactly 2 of the 3 coins land heads, and 0 otherwise.
Three independent Bernoulli trials, each with prior probability 0.5 for heads. The joint outcome is soft-weighted by adding log-weight 1 when the total number of heads equals exactly 2. The marginal distribution over the first coin is computed by exact enumeration.
The posterior marginal distribution over whether the first coin lands heads.
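The marginal described above can be reproduced by brute-force enumeration of the eight joint outcomes. This is a hedged sketch in plain Python rather than the reference implementation; `first_coin_posterior` is a hypothetical helper name.

```python
import math
from itertools import product


def first_coin_posterior(log_weight: float = 1.0) -> dict:
    """Marginal over the first coin after soft-weighting 'exactly two heads'."""
    totals = {True: 0.0, False: 0.0}
    for a, b, c in product([False, True], repeat=3):
        prior = 0.5 ** 3  # three independent fair coins
        bonus = log_weight if (a + b + c) == 2 else 0.0
        totals[a] += prior * math.exp(bonus)
    z = sum(totals.values())
    return {outcome: w / z for outcome, w in totals.items()}


first_coin = first_coin_posterior()
print(first_coin)
```

Counting outcomes by hand gives the same result: the first coin is heads in two of the three boosted outcomes, so P(heads) = (2 + 2e) / (5 + 3e).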
answer spec
{
"kind": "dist",
"domain": "bool"
}
var ANSWER = (Infer({}, function() {
  var a = flip(0.5);
  var b = flip(0.5);
  var c = flip(0.5);
  factor(1*((a+b+c)==2));
  return a;
}));
import torch
import pyro
import pyro.distributions as dist

@pyro.infer.config_enumerate
def model():
    a = pyro.sample("a", dist.Bernoulli(0.5))
    b = pyro.sample("b", dist.Bernoulli(0.5))
    c = pyro.sample("c", dist.Bernoulli(0.5))
    total = a + b + c
    # log-weight 1 when exactly two coins are heads, 0 otherwise
    lw = torch.where(total == 2, torch.tensor(1.0), torch.tensor(0.0))
    pyro.factor("two_heads", lw)
    return a

marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
_m = marg["a"]
_p_true = _m.log_prob(torch.tensor(1.0)).exp().item()
ANSWER = {False: 1.0 - _p_true, True: _p_true}
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0000 (tv) |
| solver re-derivation | accept | 2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000 |
A proposer splits $10 with a responder in $1 increments; possible offers are integers 0 through 10. The responder accepts any offer of $1 or more and rejects an offer of $0. If the offer is accepted, the proposer receives $10 minus the offer; if rejected, the proposer receives $0.
The proposer's offer is drawn uniformly over {0, 1, ..., 10}. The responder's accept/reject decision is deterministic: accept iff offer > 0. The proposer's reward is soft-maximized by using the reward as the factor weight. Exact enumeration over all offers.
The soft-maximizing distribution over the proposer's offer.
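Because the responder is deterministic and the prior over offers is uniform, the soft-maximizing distribution described above reduces to a softmax over the proposer's rewards. A minimal sketch under that reading, in plain Python (the names `proposer_reward` and `soft_max_offer_distribution` are illustrative assumptions):

```python
import math

OFFERS = range(11)  # integer offers 0..10


def proposer_reward(offer: int) -> int:
    """Reward is 10 - offer when the responder accepts (offer > 0), else 0."""
    return 10 - offer if offer > 0 else 0


def soft_max_offer_distribution() -> dict:
    # The uniform prior contributes the same constant to every offer and cancels.
    weights = {o: math.exp(proposer_reward(o)) for o in OFFERS}
    z = sum(weights.values())
    return {o: w / z for o, w in weights.items()}


offer_dist = soft_max_offer_distribution()
best_offer = max(offer_dist, key=offer_dist.get)
print(best_offer, offer_dist[best_offer])
```

Offers of $0 (rejected) and $10 (accepted, but nothing left) both yield reward 0, so they receive equal probability.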
answer spec
{
"kind": "dist",
"domain": "int"
}
var responder = function(offer) {
  return (offer>0 ? true : false);
};
var ANSWER = (Infer({method: "enumerate"}, function(){
  var offer = uniformDraw([0,1,2,3,4,5,6,7,8,9,10]);
  var reward = responder(offer) ? (10 - offer) : 0;
  factor(reward);
  return offer;
}));
import torch
import pyro
import pyro.distributions as dist

# probmods2-agents-as-programs/ex2.a
# Proposer offer ~ Uniform{0..10}; responder accepts iff offer > 0.
# Reward (10-offer if accepted else 0) is used as the factor weight, so the
# proposer's offer is soft-maximized. Exact enumeration over the 11 offers.

offers = torch.arange(0, 11)  # 0..10, index == offer value

@pyro.infer.config_enumerate
def model():
    offer = pyro.sample("offer", dist.Categorical(torch.ones(11) / 11.0))
    accepted = offer > 0
    reward = torch.where(accepted, (10 - offer).double(), torch.tensor(0.0).double())
    pyro.factor("reward", reward)

marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
offer_marg = marg["offer"]
probs = offer_marg.probs.detach()

ANSWER = {int(offers[i].item()): float(probs[i].item()) for i in range(11)}
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0000 (w1) |
| solver re-derivation | accept | 2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000 |
A proposer splits $10 with a responder in $1 increments; possible offers are integers 0 through 10. The responder accepts the offer with probability (offer/10)^2 (i.e., the fraction of $10 given to the responder, squared). If accepted, the proposer receives $10 minus the offer; if rejected, the proposer receives $0.
The proposer's offer is drawn uniformly over {0, 1, ..., 10}. The proposer's reward is soft-maximized by placing a factor equal to the realized reward inside a joint enumeration over all (offer, accept/reject) outcome pairs, where the realized reward is (10 − offer) if the responder accepts and 0 if rejected. Exact enumeration over all offers and both accept/reject outcomes.
The soft-maximizing distribution over the proposer's offer.
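Since the factor is applied to the realized reward inside the joint enumeration, each offer's unnormalized weight is the acceptance-weighted average of exp(reward), not exp of the expected reward. A hedged sketch of that sum in plain Python (`offer_weight` and `offer_distribution` are hypothetical names):

```python
import math

ALPHA = 2
OFFERS = range(11)


def offer_weight(offer: int, alpha: float = ALPHA) -> float:
    """Sum over accept/reject of P(outcome) * exp(realized reward)."""
    p_accept = (offer / 10) ** alpha
    accepted = p_accept * math.exp(10 - offer)  # reward 10 - offer
    rejected = (1 - p_accept) * math.exp(0)     # reward 0
    return accepted + rejected


def offer_distribution(alpha: float = ALPHA) -> dict:
    weights = {o: offer_weight(o, alpha) for o in OFFERS}
    z = sum(weights.values())
    return {o: w / z for o, w in weights.items()}


noisy_offer_dist = offer_distribution()
print(max(noisy_offer_dist, key=noisy_offer_dist.get))
```

Compared with the deterministic responder, the most probable offer shifts upward because very low offers are rarely accepted.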
answer spec
{
"kind": "dist",
"domain": "int"
}
var alpha = 2;

var responder = function(offer, alpha) {
  var p = Math.pow(offer/10,alpha);
  return flip(p);
};
var ANSWER = (Infer({method: "enumerate"}, function(){
  var offer = uniformDraw([0,1,2,3,4,5,6,7,8,9,10]);
  var reward = responder(offer,alpha) ? (10 - offer) : 0;
  factor(reward);
  return offer;
}));
import torch
import pyro
import pyro.distributions as dist

# Soft-maximizing proposer: offer ~ uniformDraw(0..10), responder accepts with
# p = (offer/10)^alpha, reward = (10-offer) if accept else 0, factor(reward).
# Marginal posterior over offer. The accept latent is discrete and enumerable;
# run exact Pyro enumeration over both offer and accept.

alpha = 2.0
offers = list(range(11))
n_off = len(offers)
accept_p = torch.tensor([(o / 10.0) ** alpha for o in offers])  # per-offer accept prob

@pyro.infer.config_enumerate
def model():
    offer = pyro.sample("offer", dist.Categorical(torch.ones(n_off) / n_off))
    p = accept_p[offer]
    accept = pyro.sample("accept", dist.Bernoulli(p))
    reward = torch.where(accept.bool(),
                         (10.0 - offer.double()),
                         torch.tensor(0.0))
    pyro.factor("f", reward)
    return offer

marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
m = marg["offer"]
sup = m.enumerate_support()
probs = m.log_prob(sup).exp()

ANSWER = {}
for s, p in zip(sup.tolist(), probs.tolist()):
    ANSWER[int(s)] = p
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0000 (w1) |
| solver re-derivation | accept | 2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000 |
A responder accepts an offer o with probability (o/10)^alpha, where alpha controls sensitivity to the offer. The proposer's prior over alpha is uniform on the interval [0.5, 5]. In a single round, the proposer offered $2 and the responder rejected (i.e., the accepted payoff was 0).
The proposer holds a continuous prior over the responder's sensitivity parameter alpha. Given the observed rejection of a $2 offer, the posterior over alpha is updated via Bayesian conditioning.
The posterior distribution over alpha given the rejection, obtained via MCMC with 50000 samples.
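With a uniform prior, the posterior density over alpha is proportional to the rejection likelihood 1 − (2/10)^alpha on [0.5, 5]. As a deterministic cross-check on the sampling-based answer, the sketch below evaluates that density on a midpoint grid in plain Python; the grid size and the names `reject_likelihood` and `posterior_on_grid` are assumptions made for illustration.

```python
import math

LOW, HIGH = 0.5, 5.0  # support of the uniform prior on alpha
OFFER = 2


def reject_likelihood(alpha: float) -> float:
    """P(reject | alpha) = 1 - (offer / 10) ** alpha."""
    return 1.0 - (OFFER / 10) ** alpha


def posterior_on_grid(n_cells: int = 20000):
    width = (HIGH - LOW) / n_cells
    mids = [LOW + (i + 0.5) * width for i in range(n_cells)]
    unnorm = [reject_likelihood(a) for a in mids]  # uniform prior is a constant
    normalizer = sum(unnorm) * width               # integral of the likelihood
    density = [u / normalizer for u in unnorm]
    return mids, density, width, normalizer


alphas, density, width, normalizer = posterior_on_grid()
posterior_mean = sum(a * d for a, d in zip(alphas, density)) * width
print(round(posterior_mean, 3))
```

The rejection shifts mass toward larger alpha, so the posterior mean sits above the prior mean of 2.75.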
answer spec
{
"kind": "dist",
"domain": "real"
}
var responder = function(offer, alpha) {
  var p = Math.pow(offer/10,alpha);
  return flip(p);
};
var ANSWER = (Infer({method: "MCMC", samples:50000}, function(){
  var alpha = uniform(0.5,5);
  var offer = 2;
  var reward = responder(offer, alpha) ? (10 - offer) : 0;
  condition(reward==0);
  return alpha;
}));
import torch
import pyro
import pyro.distributions as dist

def model():
    alpha = pyro.sample("alpha", dist.Uniform(0.5, 5.0))
    # responder accepts offer 2 with prob (2/10)^alpha; rejection observed (reward==0)
    p_accept = torch.pow(torch.tensor(2.0 / 10.0), alpha)
    log_p_reject = torch.log1p(-p_accept)
    pyro.factor("reject", log_p_reject)
    return alpha

nuts = pyro.infer.NUTS(model)
mcmc = pyro.infer.MCMC(nuts, num_samples=1000, warmup_steps=400)
mcmc.run()
ANSWER = mcmc.get_samples()["alpha"]
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0220 (w1) |
| solver re-derivation | accept | 2/2 solvers · d=[0.011, 0.011] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0851 ≤ tol 0.3747 · floors 0.1873/0.0220 |
Ultimatum game: a responder accepts offer o with probability (o/10)^alpha, where alpha controls sensitivity. The proposer's prior over alpha is uniform on [0.5, 5]. Offers are integer dollar amounts from 0 to 10 inclusive. The proposer's payoff is (10 - offer) if the offer is accepted, and 0 if rejected. Round 1: the proposer offered $2 and the responder rejected.
After observing the round-1 rejection, the proposer updates beliefs about alpha. In round 2 the proposer is a rational agent: given a draw of alpha from the updated posterior, the proposer samples an offer, simulates the responder once (accepting with probability (offer/10)^alpha), and adds the realized payoff — (10 − offer) if accepted, 0 if rejected — to the log-weight (a softmax agent over realized outcomes, not expected-payoff weighting).
The marginal distribution over the proposer's round-2 offer (an integer dollar amount from 0 to 10), under the two-stage model described above.
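The two-stage marginal above is a mixture: for each alpha, a normalized round-2 offer distribution, averaged under the round-1 posterior over alpha. The sketch below approximates that mixture deterministically on an alpha grid in plain Python instead of nested sampling; the grid resolution and all function names are illustrative assumptions, not the reference method.

```python
import math

OFFERS = range(11)
LOW, HIGH = 0.5, 5.0


def alpha_posterior_grid(n_cells: int = 2000):
    """Posterior mass per grid cell: uniform prior times P(reject offer 2 | alpha)."""
    width = (HIGH - LOW) / n_cells
    mids = [LOW + (i + 0.5) * width for i in range(n_cells)]
    unnorm = [1.0 - 0.2 ** a for a in mids]
    z = sum(unnorm)
    return mids, [u / z for u in unnorm]


def round2_offer_dist(alpha: float) -> list:
    """Softmax agent over realized outcomes for a fixed alpha."""
    weights = []
    for o in OFFERS:
        p = (o / 10) ** alpha
        weights.append(p * math.exp(10 - o) + (1.0 - p))
    z = sum(weights)
    return [w / z for w in weights]


def marginal_round2_offer() -> list:
    alphas, masses = alpha_posterior_grid()
    marginal = [0.0] * len(OFFERS)
    for a, mass in zip(alphas, masses):
        for o, p in enumerate(round2_offer_dist(a)):
            marginal[o] += mass * p
    return marginal


round2_marginal = marginal_round2_offer()
print([round(p, 3) for p in round2_marginal])
```

The inner distribution is normalized separately for each alpha before mixing, which matches drawing alpha first and then sampling an offer from that alpha's proposer.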
answer spec
{
"kind": "dist",
"domain": "int"
}
var responder = function(offer, alpha) {
  var p = Math.pow(offer/10,alpha);
  return flip(p);
};

var proposer1 = Infer({method: "MCMC", samples:50000}, function(){
  var alpha = uniform(0.5,5);
  var offer1 = 2;
  var reward1 = responder(offer1, alpha) ? (10 - offer1) : 0;
  condition(reward1==0);
  return alpha;
});
var ANSWER = (Infer({method: "forward", samples:1000}, function(){
  var alpha2 = sample(proposer1);
  var proposer2 = Infer({method: "MCMC", samples:5000}, function(){
    var offer2 = uniformDraw([0,1,2,3,4,5,6,7,8,9,10]);
    var reward2 = responder(offer2, alpha2) ? (10 - offer2) : 0;
    factor(reward2);
    return offer2;
  });
  return sample(proposer2);
}));
from collections import Counter

import torch
import pyro
import pyro.distributions as dist

# Two-stage agent model.
# Stage 1: alpha posterior given round-1 rejection of offer=2, via MCMC (NUTS).
#   The discrete responder accept/reject for offer1 is enumerated away with
#   config_enumerate so NUTS samples only the continuous alpha.
# Stage 2: for each of 1000 outer alpha draws, build the round-2 proposer
#   distribution by EXACT Pyro enumeration over the finite offer support (the
#   accept latent enumerated), draw one offer from it, aggregate.

responder_p = lambda offer, alpha: (offer / 10.0) ** alpha

# ----- Stage 1: alpha | round-1 rejection, via MCMC -----
@pyro.infer.config_enumerate
def proposer1_model():
    alpha = pyro.sample("alpha", dist.Uniform(0.5, 5.0))
    offer1 = 2.0
    p1 = (offer1 / 10.0) ** alpha
    # reward1 = (10-offer1) if accept else 0; condition reward1 == 0 i.e. reject.
    accept1 = pyro.sample("accept1", dist.Bernoulli(p1))
    reward1 = torch.where(accept1.bool(), torch.tensor(8.0), torch.tensor(0.0))
    # condition(reward1 == 0)  <=>  reject  <=>  accept1 == 0
    pyro.factor("rej", torch.where(reward1 == 0.0, torch.tensor(0.0),
                                   torch.tensor(float("-inf"))))
    return alpha

mcmc = pyro.infer.MCMC(pyro.infer.NUTS(proposer1_model),
                       num_samples=2000, warmup_steps=500)
mcmc.run()
alpha_post = mcmc.get_samples()["alpha"]  # 1-D tensor of alpha draws

offers = list(range(11))
n_off = len(offers)
offers_t = torch.tensor([float(o) for o in offers])


def proposer2_probs(alpha2):
    # exact enumeration of the round-2 proposer for a fixed alpha2.
    ap = torch.tensor([(o / 10.0) ** alpha2 for o in offers])

    @pyro.infer.config_enumerate
    def model():
        offer2 = pyro.sample("offer2", dist.Categorical(torch.ones(n_off) / n_off))
        p = ap[offer2]
        accept = pyro.sample("accept2", dist.Bernoulli(p))
        reward2 = torch.where(accept.bool(), (10.0 - offer2.double()),
                              torch.tensor(0.0))
        pyro.factor("f", reward2)
        return offer2
    marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
    m = marg["offer2"]
    sup = m.enumerate_support()
    probs = m.log_prob(sup).exp()
    full = torch.zeros(n_off)
    for s, p in zip(sup.tolist(), probs.tolist()):
        full[int(s)] = p
    return full


# ----- Stage 2: outer forward sampling over alpha posterior -----
counts = Counter()
n_outer = 1000
idx = torch.randint(0, alpha_post.shape[0], (n_outer,))
for i in range(n_outer):
    alpha2 = float(alpha_post[int(idx[i])].item())
    probs = proposer2_probs(alpha2)
    drawn = int(pyro.sample(f"draw_{i}", dist.Categorical(probs)).item())
    counts[drawn] += 1

ANSWER = {o: counts[o] / n_outer for o in offers if counts[o] > 0}
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.2360 (w1) |
| solver re-derivation | accept | 2/2 solvers · d=[0.116, 0.116] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.1200 ≤ tol 0.4720 · floors 0.1480/0.2360 |
Prisoner's dilemma: if the focal thief confesses (regardless of what the other does), she receives a lenient sentence of 6 years. If she does not confess but the other does, she receives 10 years. If neither confesses, she goes free (0 years). The other thief independently decides to confess with probability 0.5. The soft-conditioning weight for each joint outcome is (10 - years_in_jail) / 10.
Both thieves independently and uniformly decide whether to confess. The focal thief's years in jail follow the payoff matrix above. Each joint outcome's unnormalized log-weight is increased by (10 − years) / 10, so its unnormalized weight is proportional to exp((10 − years) / 10); the quantity (10 − years) / 10 is an additive log-weight, not a multiplicative weight.
The posterior distribution over whether the focal thief confesses, under the soft-conditioning scheme described.
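The soft-conditioning scheme above involves only four joint outcomes, so the posterior can be tallied directly. A minimal sketch in plain Python, assuming the payoff matrix as stated (`years_in_jail` and `confession_posterior` are illustrative names):

```python
import math
from itertools import product

LENIENT = 6  # years served when the focal thief confesses


def years_in_jail(i_confess: bool, other_confesses: bool) -> int:
    if i_confess:
        return LENIENT
    return 10 if other_confesses else 0


def confession_posterior() -> dict:
    totals = {True: 0.0, False: 0.0}
    for other, me in product([True, False], repeat=2):
        prior = 0.25  # two independent fair decisions
        log_weight = (10 - years_in_jail(me, other)) / 10
        totals[me] += prior * math.exp(log_weight)
    z = sum(totals.values())
    return {choice: w / z for choice, w in totals.items()}


confess_dist = confession_posterior()
print(confess_dist)
```

In closed form, P(confess) = 2·exp(0.4) / (2·exp(0.4) + 1 + e), since confessing always yields log-weight 0.4 while staying silent yields 0 or 1.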
answer spec
{
"kind": "dist",
"domain": "bool"
}
var thiefRats = function(){
  return flip();
};

var lenient = 6;
var ANSWER = (Infer({}, function(){
  var otherThiefRats = thiefRats();
  var IRat = thiefRats();
  var years = (otherThiefRats?
               (IRat? lenient : 10) :
               (IRat? lenient : 0));
  var percentYearsFreedom = (10-years)/10;
  factor(percentYearsFreedom);
  return IRat;
}));
import torch
import pyro
import pyro.distributions as dist

lenient = 6

@pyro.infer.config_enumerate
def model():
    other_rats = pyro.sample("other", dist.Bernoulli(0.5))
    i_rats = pyro.sample("i_rats", dist.Bernoulli(0.5))
    # years: if other confesses -> (I confess? 6 : 10); else -> (I confess? 6 : 0)
    years = torch.where(
        other_rats.bool(),
        torch.where(i_rats.bool(), torch.tensor(float(lenient)), torch.tensor(10.0)),
        torch.where(i_rats.bool(), torch.tensor(float(lenient)), torch.tensor(0.0)),
    )
    percent_freedom = (10.0 - years) / 10.0
    pyro.factor("soft", percent_freedom)
    return i_rats

marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
m = marg["i_rats"]
sup = m.enumerate_support()
probs = m.log_prob(sup).exp()
ANSWER = {bool(int(sup[i].item())): float(probs[i]) for i in range(len(sup))}
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0000 (tv) |
| solver re-derivation | accept | 2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000 |
Three objects in the world, each described by a {shape, color} pair: {square, blue}, {circle, blue}, {square, green}, drawn with equal probability. Four possible utterances: 'blue', 'green', 'square', 'circle'. Truth function: an utterance about color is true iff it matches the object's color; an utterance about shape is true iff it matches the object's shape; all other utterances are vacuously true.
RSA (Rational Speech Acts) model with three levels. The literal listener infers the object by combining the uniform prior with the truth function. The speaker chooses utterances with probability proportional to exp(alpha * log P(object | utterance)) under the literal listener, where alpha is a rationality parameter. The pragmatic listener infers the object from the speaker's distribution, combining the uniform prior with the speaker's probability of the utterance.
The pragmatic listener's posterior distribution over objects given the utterance 'blue', computed for rationality parameters alpha = 0.01, 1, 4, and 10. Return a record with fields alpha_001, alpha_1, alpha_4, and alpha_10.
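The three RSA levels described above are small enough to compute with dictionaries. The following is a hedged plain-Python sketch, not the reference code; it uses the identity exp(alpha · log p) = p^alpha (with 0^alpha = 0 for alpha > 0), and the tuple encoding of objects is an assumption made for brevity.

```python
OBJECTS = [("square", "blue"), ("circle", "blue"), ("square", "green")]
UTTERANCES = ["blue", "green", "square", "circle"]


def meaning(utterance: str, obj: tuple) -> bool:
    shape, color = obj
    if utterance in ("blue", "green"):
        return utterance == color
    if utterance in ("circle", "square"):
        return utterance == shape
    return True  # any other utterance is vacuously true


def normalize(weights: dict) -> dict:
    z = sum(weights.values())
    return {k: v / z for k, v in weights.items()}


def literal_listener(utterance: str) -> dict:
    # Uniform prior over objects, restricted to those the utterance is true of.
    return normalize({o: float(meaning(utterance, o)) for o in OBJECTS})


def speaker(obj: tuple, alpha: float) -> dict:
    # Weight each utterance by P_literal(obj | utterance) ** alpha.
    return normalize({u: literal_listener(u)[obj] ** alpha for u in UTTERANCES})


def pragmatic_listener(utterance: str, alpha: float) -> dict:
    # Uniform prior over objects times the speaker's probability of the utterance.
    return normalize({o: speaker(o, alpha)[utterance] for o in OBJECTS})


results = {alpha: pragmatic_listener("blue", alpha) for alpha in (0.01, 1, 4, 10)}
for alpha, dist_over_objects in results.items():
    print(alpha, dist_over_objects)
```

As alpha grows, the listener becomes more confident that 'blue' refers to the blue square, because a speaker describing the blue circle would increasingly prefer the unambiguous 'circle'.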
answer spec
{
"kind": "record",
"fields": {
"alpha_001": {
"kind": "dist",
"domain": "finite",
"support": [
{
"shape": "square",
"color": "blue"
},
{
"shape": "circle",
"color": "blue"
},
{
"shape": "square",
"color": "green"
}
]
},
"alpha_1": {
"kind": "dist",
"domain": "finite",
"support": [
{
"shape": "square",
"color": "blue"
},
{
"shape": "circle",
"color": "blue"
},
{
"shape": "square",
"color": "green"
}
]
},
"alpha_4": {
"kind": "dist",
"domain": "finite",
"support": [
{
"shape": "square",
"color": "blue"
},
{
"shape": "circle",
"color": "blue"
},
{
"shape": "square",
"color": "green"
}
]
},
"alpha_10": {
"kind": "dist",
"domain": "finite",
"support": [
{
"shape": "square",
"color": "blue"
},
{
"shape": "circle",
"color": "blue"
},
{
"shape": "square",
"color": "green"
}
]
}
}
}
var meaningPrior = function() {
  uniformDraw([
    {shape: "square", color: "blue"},
    {shape: "circle", color: "blue"},
    {shape: "square", color: "green"}
  ])
};

var utterances = ["blue","green","square","circle"];

var meaning = function(utterance, obj){
  (utterance === "blue" || utterance === "green") ? utterance === obj.color :
  (utterance === "circle" || utterance === "square") ? utterance === obj.shape :
  true
};

var literalListener = function(utterance){
  return Infer({model: function(){
    var obj = meaningPrior();
    condition(meaning(utterance, obj));
    return obj;
  }});
};

var speaker = function(obj,alpha){
  return Infer({model: function(){
    var utterance = uniformDraw(utterances);
    factor(alpha * literalListener(utterance).score(obj));
    return utterance;
  }});
};

var pragmaticListener = function(utterance,alpha){
  return Infer({model: function(){
    var obj = meaningPrior();
    observe(speaker(obj,alpha),utterance);
    return obj;
  }});
};
var ANSWER = (({
  alpha_001: pragmaticListener("blue", 0.01),
  alpha_1: pragmaticListener("blue", 1),
  alpha_4: pragmaticListener("blue", 4),
  alpha_10: pragmaticListener("blue", 10)
}));
import json
import math

import torch
import pyro
import pyro.distributions as dist

# RSA (Rational Speech Acts), three levels, faithful to the WebPPL reference.
# Each level is genuine Pyro enumeration inference over a single-sample model;
# inner-level distributions feed the outer level's pyro.factor via their log-prob.

objects = [
    {"shape": "square", "color": "blue"},
    {"shape": "circle", "color": "blue"},
    {"shape": "square", "color": "green"},
]
utterances = ["blue", "green", "square", "circle"]


def meaning(utterance, obj):
    if utterance == "blue" or utterance == "green":
        return utterance == obj["color"]
    if utterance == "circle" or utterance == "square":
        return utterance == obj["shape"]
    return True


def enum_dist(model, values):
    # Run exact enumeration over the single latent site 'x' (an index into
    # `values`) and return {value_index: probability} from the marginal.
    marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(
        model, lambda: None
    )["x"]
    sup = marg.enumerate_support()
    probs = marg.log_prob(sup).exp()
    out = {}
    for s, p in zip(sup.tolist(), probs.tolist()):
        out[int(s)] = p
    return out


def literal_listener(utterance):
    # uniform prior over objects, condition on the utterance being true
    @pyro.infer.config_enumerate
    def model():
        x = pyro.sample(
            "x", dist.Categorical(torch.ones(len(objects)))
        )
        truths = torch.tensor(
            [1.0 if meaning(utterance, o) else 0.0 for o in objects]
        )
        ev = torch.log(truths)[x]
        pyro.factor("ev", ev)
        return x

    return enum_dist(model, objects)


def speaker(obj_idx, alpha):
    # cache literal-listener log-scores of obj_idx under each utterance
    ll_scores = []
    for u in utterances:
        d = literal_listener(u)
        p = d.get(obj_idx, 0.0)
        ll_scores.append(math.log(p) if p > 0 else float("-inf"))
    ll_scores = torch.tensor(ll_scores)

    @pyro.infer.config_enumerate
    def model():
        x = pyro.sample("x", dist.Categorical(torch.ones(len(utterances))))
        pyro.factor("ev", alpha * ll_scores[x])
        return x

    return enum_dist(model, utterances)


def pragmatic_listener(utterance, alpha):
    # uniform prior over objects, observe the speaker uttering `utterance`
    u_idx = utterances.index(utterance)
    # speaker(obj, alpha) log-prob of the heard utterance, per object
    sp_scores = []
    for i in range(len(objects)):
        d = speaker(i, alpha)
        p = d.get(u_idx, 0.0)
        sp_scores.append(math.log(p) if p > 0 else float("-inf"))
    sp_scores = torch.tensor(sp_scores)

    @pyro.infer.config_enumerate
    def model():
        x = pyro.sample("x", dist.Categorical(torch.ones(len(objects))))
        pyro.factor("ev", sp_scores[x])
        return x

    d = enum_dist(model, objects)
    return {i: d.get(i, 0.0) for i in range(len(objects))}


def as_record_dist(idx_dist):
    out = {}
    for i, o in enumerate(objects):
        out[json.dumps(o, sort_keys=True)] = idx_dist.get(i, 0.0)
    return out


ANSWER = {
    "alpha_001": as_record_dist(pragmatic_listener("blue", 0.01)),
    "alpha_1": as_record_dist(pragmatic_listener("blue", 1)),
    "alpha_4": as_record_dist(pragmatic_listener("blue", 4)),
    "alpha_10": as_record_dist(pragmatic_listener("blue", 10)),
}
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0000 (record) |
| solver re-derivation | accept | 2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000 |
Same three-object world as the standard RSA setup: objects {square, blue}, {circle, blue}, {square, green} drawn uniformly; utterances {blue, green, square, circle}; same truth function (color/shape match, vacuously true otherwise). Rationality parameter alpha = 1.
Two-level RSA stack built on top of the literal listener. The level-1 listener infers the object by combining the prior with a level-1 speaker; the level-1 speaker weights utterances by exp(alpha * log P(object | utterance)) under the literal listener. The level-2 listener infers the object from a level-2 speaker; the level-2 speaker weights utterances by exp(alpha * log P(object | utterance)) under the level-1 listener.
The posterior distributions over objects given the utterance 'blue' for the level-1 and level-2 listeners. Return a record with fields L1 and L2.
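The two-level stack described above is the same listener/speaker pair applied twice, so it can be written as two generic builders. This is a minimal sketch in plain Python under that reading, with alpha = 1; `speaker_from` and `listener_from` are hypothetical helper names and the tuple encoding of objects is an assumption.

```python
OBJECTS = [("square", "blue"), ("circle", "blue"), ("square", "green")]
UTTERANCES = ["blue", "green", "square", "circle"]
ALPHA = 1.0


def meaning(utterance, obj):
    shape, color = obj
    if utterance in ("blue", "green"):
        return utterance == color
    if utterance in ("circle", "square"):
        return utterance == shape
    return True


def normalize(weights):
    z = sum(weights.values())
    return {k: v / z for k, v in weights.items()}


def literal_listener(utterance):
    return normalize({o: float(meaning(utterance, o)) for o in OBJECTS})


def speaker_from(listener, obj):
    # exp(ALPHA * log P(obj | utterance)) written as P ** ALPHA.
    return normalize({u: listener(u)[obj] ** ALPHA for u in UTTERANCES})


def listener_from(speaker, utterance):
    # Uniform object prior times the speaker's probability of the utterance.
    return normalize({o: speaker(o)[utterance] for o in OBJECTS})


def speaker1(obj):
    return speaker_from(literal_listener, obj)


def listener1(utterance):
    return listener_from(speaker1, utterance)


def speaker2(obj):
    return speaker_from(listener1, obj)


def listener2(utterance):
    return listener_from(speaker2, utterance)


rsa_levels = {"L1": listener1("blue"), "L2": listener2("blue")}
print(rsa_levels)
```

Working the numbers by hand gives 3/5 for the blue square at level 1 and 7/11 at level 2, so the extra level of recursion sharpens the inference only slightly.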
answer spec
{
"kind": "record",
"fields": {
"L1": {
"kind": "dist",
"domain": "finite",
"labels": {
"record": {
"shape": "string",
"color": "string"
}
}
},
"L2": {
"kind": "dist",
"domain": "finite",
"labels": {
"record": {
"shape": "string",
"color": "string"
}
}
}
}
}
var meaningPrior = function() {
  uniformDraw([
    {shape: "square", color: "blue"},
    {shape: "circle", color: "blue"},
    {shape: "square", color: "green"}
  ])
};

var utterances = ["blue","green","square","circle"];

var meaning = function(utterance, obj){
  (utterance === "blue" || utterance === "green") ? utterance === obj.color :
  (utterance === "circle" || utterance === "square") ? utterance === obj.shape :
  true
};

var alpha = 1;

var literalListener = function(utterance){
  return Infer({model: function(){
    var obj = meaningPrior();
    condition(meaning(utterance, obj));
    return obj;
  }});
};

var speaker = function(obj){
  return Infer({model: function(){
    var utterance = uniformDraw(utterances);
    factor(alpha * literalListener(utterance).score(obj));
    return utterance;
  }});
};

var pragmaticListener = function(utterance){
  return Infer({model: function(){
    var obj = meaningPrior();
    observe(speaker(obj),utterance);
    return obj;
  }});
};

var speaker2 = function(obj){
  return Infer({model: function(){
    var utterance = uniformDraw(utterances);
    factor(alpha * pragmaticListener(utterance).score(obj));
    return utterance;
  }});
};

var listener3 = function(utterance){
  return Infer({model: function(){
    var obj = meaningPrior();
    observe(speaker2(obj),utterance);
    return obj;
  }});
};
var ANSWER = (({
  L1: pragmaticListener("blue"),
  L2: listener3("blue")
}));
import torch
import pyro
import pyro.distributions as dist

# RSA scalar-implicature model. Every level (literal listener, speaker, pragmatic
# listener, speaker2, listener3) is produced by exact Pyro enumeration
# (config_enumerate + TraceEnum_ELBO.compute_marginals). The answer is the
# pragmatic-listener (L1) and the level-3 listener (L2) posteriors over objects
# given utterance 'blue'.

objects = [
    {"shape": "square", "color": "blue"},
    {"shape": "circle", "color": "blue"},
    {"shape": "square", "color": "green"},
]
n_obj = len(objects)
utterances = ["blue", "green", "square", "circle"]
n_utt = len(utterances)
alpha = 1.0


def meaning(utterance, obj):
    if utterance in ("blue", "green"):
        return utterance == obj["color"]
    if utterance in ("circle", "square"):
        return utterance == obj["shape"]
    return True


def literal_listener_logprobs(utterance):
    # uniform prior over objects, condition on meaning(utterance, obj). Exact
    # enumeration over the object latent.
    holds = torch.tensor([1.0 if meaning(utterance, o) else 0.0 for o in objects])

    @pyro.infer.config_enumerate
    def model():
        obj = pyro.sample("obj", dist.Categorical(torch.ones(n_obj) / n_obj))
        logp = torch.where(holds[obj] > 0.0, torch.tensor(0.0),
                           torch.tensor(float("-inf")))
        pyro.factor("meaning", logp)
        return obj
    marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
    m = marg["obj"]
    sup = m.enumerate_support()
    logps = m.log_prob(sup)
    out = torch.full((n_obj,), float("-inf"))
    for s, lp in zip(sup.tolist(), logps.tolist()):
        out[int(s)] = lp
    return out


# precompute literal listener scores: ll_score[utterance][obj]
_ll = [literal_listener_logprobs(u) for u in utterances]


def speaker_logprobs(obj_idx):
    # enumerate utterance ~ uniform, factor(alpha * literalListener(utt).score(obj))
    scores = torch.tensor([alpha * _ll[u][obj_idx].item() for u in range(n_utt)])

    @pyro.infer.config_enumerate
    def model():
        utt = pyro.sample("utt", dist.Categorical(torch.ones(n_utt) / n_utt))
        pyro.factor("f", scores[utt])
        return utt
    marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
    m = marg["utt"]
    sup = m.enumerate_support()
    logps = m.log_prob(sup)
    out = torch.full((n_utt,), float("-inf"))
    for s, lp in zip(sup.tolist(), logps.tolist()):
        out[int(s)] = lp
    return out


_speaker = [speaker_logprobs(o) for o in range(n_obj)]


def pragmatic_listener_probs(utterance):
    # obj ~ uniform; observe(speaker(obj), utterance). Enumerate obj.
    u_idx = utterances.index(utterance)
    obs_scores = torch.tensor([_speaker[o][u_idx].item() for o in range(n_obj)])

    @pyro.infer.config_enumerate
    def model():
        obj = pyro.sample("obj", dist.Categorical(torch.ones(n_obj) / n_obj))
        pyro.factor("obs", obs_scores[obj])
        return obj
    marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
    m = marg["obj"]
    sup = m.enumerate_support()
    probs = m.log_prob(sup).exp()
    out = torch.zeros(n_obj)
    for s, p in zip(sup.tolist(), probs.tolist()):
        out[int(s)] = p
    return out


def pragmatic_listener_logprobs(utterance):
    p = pragmatic_listener_probs(utterance)
    return torch.log(p.clamp_min(1e-300))


# precompute pragmatic listener scores for speaker2: pl_score[utterance][obj]
_pl = [pragmatic_listener_logprobs(u) for u in utterances]


def speaker2_logprobs(obj_idx):
    scores = torch.tensor([alpha * _pl[u][obj_idx].item() for u in range(n_utt)])

    @pyro.infer.config_enumerate
    def model():
        utt = pyro.sample("utt", dist.Categorical(torch.ones(n_utt) / n_utt))
        pyro.factor("f", scores[utt])
        return utt
    marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
    m = marg["utt"]
    sup = m.enumerate_support()
    logps = m.log_prob(sup)
    out = torch.full((n_utt,), float("-inf"))
    for s, lp in zip(sup.tolist(), logps.tolist()):
        out[int(s)] = lp
    return out


_speaker2 = [speaker2_logprobs(o) for o in range(n_obj)]


def listener3_probs(utterance):
    u_idx = utterances.index(utterance)
    obs_scores = torch.tensor([_speaker2[o][u_idx].item() for o in range(n_obj)])

    @pyro.infer.config_enumerate
    def model():
        obj = pyro.sample("obj", dist.Categorical(torch.ones(n_obj) / n_obj))
        pyro.factor("obs", obs_scores[obj])
        return obj
    marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
    m = marg["obj"]
    sup = m.enumerate_support()
    probs = m.log_prob(sup).exp()
    out = torch.zeros(n_obj)
    for s, p in zip(sup.tolist(), probs.tolist()):
        out[int(s)] = p
    return out


def to_dist(probs):
    # key each outcome by its named-field record (sorted keys: color, shape),
    # serialized as compact JSON to match the harness's label space.
    d = {}
    for i, o in enumerate(objects):
        key = '{"color": "%s", "shape": "%s"}' % (o["color"], o["shape"])
        d[key] = float(probs[i].item())
    return d


L1 = to_dist(pragmatic_listener_probs("blue"))
L2 = to_dist(listener3_probs("blue"))
ANSWER = {"L1": L1, "L2": L2}
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0000 (record) |
| solver re-derivation | accept | 2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000 |
Observed data: k=1 success in n=20 Bernoulli trials. Prior on the success probability p: Beta(a=1, b=1). A new experiment has new_n=5 trials.
The success probability p is drawn from the prior. The observed count k is generated from Binomial(p, n). A posterior-predictive count is the number of successes in a fresh Binomial(p, new_n) draw using the same p.
The marginal posterior distribution over the posterior-predictive count (an integer 0 through 5).
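Because the Beta prior is conjugate to the Binomial likelihood, the posterior over p is Beta(a + k, b + n − k), and the posterior-predictive count follows a beta-binomial distribution. The sketch below evaluates that closed form in plain Python as a check on sampling-based answers; the function names are illustrative assumptions.

```python
import math


def log_beta(a: float, b: float) -> float:
    """Log of the Beta function via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def posterior_predictive(k: int, n: int, new_n: int, a: float = 1.0, b: float = 1.0) -> dict:
    """Beta-binomial pmf over successes in new_n fresh trials."""
    post_a, post_b = a + k, b + n - k  # conjugate posterior Beta(post_a, post_b)
    pmf = {}
    for j in range(new_n + 1):
        log_p = (math.log(math.comb(new_n, j))
                 + log_beta(post_a + j, post_b + new_n - j)
                 - log_beta(post_a, post_b))
        pmf[j] = math.exp(log_p)
    return pmf


predictive = posterior_predictive(k=1, n=20, new_n=5)
predictive_mean = sum(j * p for j, p in predictive.items())
print(predictive)
```

For k = 1, n = 20 the posterior is Beta(2, 20), so the predictive mean is 5 · 2/22 and most of the mass sits on zero successes.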
answer spec
{
"kind": "dist",
"domain": "int"
}
var k = 1;
var n = 20;
var new_n = 5;
var priorDist = Beta({a: 1, b: 1});

var model = function() {
  var p = sample(priorDist);
  observe(Binomial({p : p, n: n}), k);
  var posteriorPredictive = binomial(p, new_n);
  var prior_p = sample(priorDist);
  var priorPredictive = binomial(prior_p, n);
  return {
    prior: prior_p, priorPredictive : priorPredictive,
    posterior : p, posteriorPredictive : posteriorPredictive
  };
};
var joint = Infer({method: "MCMC", samples: 2500, lag: 50}, model);
var ANSWER = marginalize(joint, function(x) { return x.posteriorPredictive; });
import torch
import pyro
import pyro.distributions as dist

k = 1
n = 20
new_n = 5
prior_dist = dist.Beta(torch.tensor(1.0), torch.tensor(1.0))

def model():
    p = pyro.sample("p", prior_dist)
    pyro.sample("obs", dist.Binomial(total_count=n, probs=p), obs=torch.tensor(float(k)))

nuts = pyro.infer.NUTS(model)
mcmc = pyro.infer.MCMC(nuts, num_samples=2500, warmup_steps=1000)
mcmc.run()
_p_samples = mcmc.get_samples()["p"]

# Posterior predictive: a fresh Binomial(p, new_n) draw per posterior p sample.
_pp = dist.Binomial(total_count=new_n, probs=_p_samples).sample()
ANSWER = [int(x) for x in _pp.tolist()]
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0364 (w1) |
| solver re-derivation | accept | 2/2 solvers · d=[0.026, 0.026] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0220 ≤ tol 0.0896 · floors 0.0448/0.0364 |
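Because the Beta(1, 1) prior is conjugate to the binomial likelihood, the posterior-predictive count in the problem above also has a closed form, which can serve as an independent check on the MCMC estimates. The sketch below is not part of the reference programs. It assumes the standard conjugate update (posterior Beta(a + k, b + n - k), predictive Beta-Binomial), and the helper names `beta_fn` and `posterior_predictive` are illustrative.

```python
from fractions import Fraction
from math import comb, factorial


def beta_fn(a: int, b: int) -> Fraction:
    """Beta function B(a, b) for positive integers, as an exact fraction."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))


def posterior_predictive(k: int, n: int, new_n: int, a: int = 1, b: int = 1) -> dict:
    """Beta-Binomial predictive pmf over counts 0..new_n (integer a, b only)."""
    post_a, post_b = a + k, b + n - k  # conjugate Beta update
    return {
        y: comb(new_n, y) * beta_fn(post_a + y, post_b + new_n - y) / beta_fn(post_a, post_b)
        for y in range(new_n + 1)
    }


pmf = posterior_predictive(k=1, n=20, new_n=5)
for y, prob in pmf.items():
    print(y, float(prob))
```

With k = 1 and n = 20 the posterior is Beta(2, 20), so most of the predictive mass sits on a count of 0 (exactly 42/65). Sampling-based answers should agree with this pmf only up to Monte Carlo error.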
Cancer occurs with probability 0.00001. Given cancer, death from cancer occurs with probability 0.9. The common cold occurs with probability 0.2; given a cold, death from the cold occurs with probability 0.00006. Death from other causes (independent of cancer and cold) occurs with probability 0.000000001. A person dies if they die from cancer, from the cold, or from other causes.
Cancer, cold, and other-cause death are drawn independently from their priors. Death from cancer requires having cancer; death from the cold requires having a cold. The person dies if any cause of death occurs.
A record of four posterior distributions over whether the person has cancer: prior (unconditional); death (given the person died); deathAndCold (given the person died and had a cold); deathAndNoCold (given the person died and did not have a cold).
answer spec
{
"kind": "record",
"fields": {
"prior": {
"kind": "dist",
"domain": "bool"
},
"death": {
"kind": "dist",
"domain": "bool"
},
"deathAndCold": {
"kind": "dist",
"domain": "bool"
},
"deathAndNoCold": {
"kind": "dist",
"domain": "bool"
}
}
}
var ANSWER = (({
  prior: Infer({method: 'enumerate'}, function() {
    var cancer = flip(0.00001);
    var cold = flip(0.2);
    var death_by_cancer = cancer && flip(0.9);
    var death_by_cold = cold && flip(0.00006);
    var other_death = flip(0.000000001);
    var death = death_by_cancer || death_by_cold || other_death;
    return cancer;
  }),
  death: Infer({method: 'enumerate'}, function() {
    var cancer = flip(0.00001);
    var cold = flip(0.2);
    var death_by_cancer = cancer && flip(0.9);
    var death_by_cold = cold && flip(0.00006);
    var other_death = flip(0.000000001);
    var death = death_by_cancer || death_by_cold || other_death;
    condition(death);
    return cancer;
  }),
  deathAndCold: Infer({method: 'enumerate'}, function() {
    var cancer = flip(0.00001);
    var cold = flip(0.2);
    var death_by_cancer = cancer && flip(0.9);
    var death_by_cold = cold && flip(0.00006);
    var other_death = flip(0.000000001);
    var death = death_by_cancer || death_by_cold || other_death;
    condition(death && cold);
    return cancer;
  }),
  deathAndNoCold: Infer({method: 'enumerate'}, function() {
    var cancer = flip(0.00001);
    var cold = flip(0.2);
    var death_by_cancer = cancer && flip(0.9);
    var death_by_cold = cold && flip(0.00006);
    var other_death = flip(0.000000001);
    var death = death_by_cancer || death_by_cold || other_death;
    condition(death && !cold);
    return cancer;
  })
}));
def make_model(cond):
    @pyro.infer.config_enumerate
    def model():
        cancer = pyro.sample('cancer', dist.Bernoulli(0.00001)).bool()
        cold = pyro.sample('cold', dist.Bernoulli(0.2)).bool()
        dbc_coin = pyro.sample('dbc', dist.Bernoulli(0.9)).bool()
        dbcold_coin = pyro.sample('dbcold', dist.Bernoulli(0.00006)).bool()
        other = pyro.sample('other', dist.Bernoulli(0.000000001)).bool()
        death_by_cancer = cancer & dbc_coin
        death_by_cold = cold & dbcold_coin
        death = death_by_cancer | death_by_cold | other
        ev = cond(death, cold)
        pyro.factor('ev', torch.where(ev, torch.tensor(0.0), torch.tensor(float('-inf'))))
        return cancer
    return model


def marginal_bool(cond):
    model = make_model(cond)
    marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
    m = marg['cancer']
    sup = m.enumerate_support()
    probs = m.log_prob(sup).exp()
    out = {}
    for v, p in zip(sup.tolist(), probs.tolist()):
        out[bool(v)] = out.get(bool(v), 0.0) + p
    return out


ANSWER = {
    'prior': marginal_bool(lambda death, cold: torch.tensor(True)),
    'death': marginal_bool(lambda death, cold: death),
    'deathAndCold': marginal_bool(lambda death, cold: death & cold),
    'deathAndNoCold': marginal_bool(lambda death, cold: death & (~cold)),
}
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0000 (record) |
| solver re-derivation | accept | 2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000 |
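The four cancer posteriors above can also be reproduced without any probabilistic-programming library, by summing the weights of all 2^5 joint outcomes. The following is a minimal sketch of that brute-force enumeration using exact rational arithmetic. The names `PRIORS`, `cancer_posterior`, `cancer_kills`, and `cold_kills` are illustrative and do not come from the reference programs.

```python
from fractions import Fraction
from itertools import product

# Probability that each independent Bernoulli choice comes up True.
PRIORS = {
    "cancer": Fraction(1, 100_000),
    "cold": Fraction(2, 10),
    "cancer_kills": Fraction(9, 10),
    "cold_kills": Fraction(6, 100_000),
    "other_death": Fraction(1, 1_000_000_000),
}


def cancer_posterior(evidence):
    """P(cancer | evidence(death, cold)) by exhaustive enumeration."""
    weight = {True: Fraction(0), False: Fraction(0)}
    names = list(PRIORS)
    for values in product([True, False], repeat=len(names)):
        world = dict(zip(names, values))
        p = Fraction(1)
        for name, value in world.items():
            p *= PRIORS[name] if value else 1 - PRIORS[name]
        death = (
            (world["cancer"] and world["cancer_kills"])
            or (world["cold"] and world["cold_kills"])
            or world["other_death"]
        )
        if evidence(death, world["cold"]):  # hard conditioning
            weight[world["cancer"]] += p
    total = weight[True] + weight[False]
    return {outcome: w / total for outcome, w in weight.items()}


answer = {
    "prior": cancer_posterior(lambda death, cold: True),
    "death": cancer_posterior(lambda death, cold: death),
    "deathAndCold": cancer_posterior(lambda death, cold: death and cold),
    "deathAndNoCold": cancer_posterior(lambda death, cold: death and not cold),
}
for name, posterior in answer.items():
    print(name, float(posterior[True]))
```

The output illustrates explaining away: learning that the person had a cold lowers the probability of cancer relative to conditioning on death alone, while ruling out a cold makes cancer almost certain.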
Cancer occurs with probability 0.00001. Given cancer, death from cancer occurs with probability 0.9. The common cold occurs with probability 0.2; given a cold, death from the cold occurs with probability 0.00006. Death from other causes (independent of cancer and cold) occurs with probability 0.000000001. A person dies if they die from cancer, from the cold, or from other causes.
Cancer, cold, and other-cause death are drawn independently from their priors. Death from cancer requires having cancer; death from the cold requires having a cold. The person dies if any cause of death occurs.
A record of four posterior distributions over whether the person has a cold: prior (unconditional); death (given the person died); deathAndCancer (given the person died and had cancer); deathAndNoCancer (given the person died and did not have cancer).
answer spec
{
"kind": "record",
"fields": {
"prior": {
"kind": "dist",
"domain": "bool"
},
"death": {
"kind": "dist",
"domain": "bool"
},
"deathAndCancer": {
"kind": "dist",
"domain": "bool"
},
"deathAndNoCancer": {
"kind": "dist",
"domain": "bool"
}
}
}
var ANSWER = (({
  prior: Infer({method: 'enumerate'}, function() {
    var cancer = flip(0.00001);
    var cold = flip(0.2);
    var death_by_cancer = cancer && flip(0.9);
    var death_by_cold = cold && flip(0.00006);
    var other_death = flip(0.000000001);
    var death = death_by_cancer || death_by_cold || other_death;
    return cold;
  }),
  death: Infer({method: 'enumerate'}, function() {
    var cancer = flip(0.00001);
    var cold = flip(0.2);
    var death_by_cancer = cancer && flip(0.9);
    var death_by_cold = cold && flip(0.00006);
    var other_death = flip(0.000000001);
    var death = death_by_cancer || death_by_cold || other_death;
    condition(death);
    return cold;
  }),
  deathAndCancer: Infer({method: 'enumerate'}, function() {
    var cancer = flip(0.00001);
    var cold = flip(0.2);
    var death_by_cancer = cancer && flip(0.9);
    var death_by_cold = cold && flip(0.00006);
    var other_death = flip(0.000000001);
    var death = death_by_cancer || death_by_cold || other_death;
    condition(death && cancer);
    return cold;
  }),
  deathAndNoCancer: Infer({method: 'enumerate'}, function() {
    var cancer = flip(0.00001);
    var cold = flip(0.2);
    var death_by_cancer = cancer && flip(0.9);
    var death_by_cold = cold && flip(0.00006);
    var other_death = flip(0.000000001);
    var death = death_by_cancer || death_by_cold || other_death;
    condition(death && !cancer);
    return cold;
  })
}));
# Cold / cancer / death model. Four posteriors over `cold` under different
# conditioning, each produced by exact Pyro discrete enumeration
# (config_enumerate + TraceEnum_ELBO.compute_marginals).

def cold_posterior(condition_fn):
    @pyro.infer.config_enumerate
    def model():
        cancer = pyro.sample("cancer", dist.Bernoulli(0.00001))
        cold = pyro.sample("cold", dist.Bernoulli(0.2))
        dbc_flip = pyro.sample("dbc", dist.Bernoulli(0.9))
        dbcold_flip = pyro.sample("dbcold", dist.Bernoulli(0.00006))
        other = pyro.sample("other", dist.Bernoulli(0.000000001))
        cancer_b = cancer.bool()
        cold_b = cold.bool()
        death_by_cancer = cancer_b & dbc_flip.bool()
        death_by_cold = cold_b & dbcold_flip.bool()
        other_death = other.bool()
        death = death_by_cancer | death_by_cold | other_death
        ev = condition_fn(death, cancer_b)
        if ev is not None:
            pyro.factor("cond", torch.where(ev, torch.tensor(0.0),
                                            torch.tensor(float("-inf"))))
        return cold
    marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
    m = marg["cold"]
    sup = m.enumerate_support()
    probs = m.log_prob(sup).exp()
    out = {}
    for s, p in zip(sup.tolist(), probs.tolist()):
        out[bool(int(s))] = p
    return out


prior = cold_posterior(lambda death, cancer: None)
death = cold_posterior(lambda death, cancer: death)
deathAndCancer = cold_posterior(lambda death, cancer: death & cancer)
deathAndNoCancer = cold_posterior(lambda death, cancer: death & ~cancer)

ANSWER = {
    "prior": prior,
    "death": death,
    "deathAndCancer": deathAndCancer,
    "deathAndNoCancer": deathAndNoCancer,
}
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0000 (record) |
| solver re-derivation | accept | 2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000 |
A fair coin has probability 0.5 of landing heads.
A single fair coin is flipped once.
The probability that the coin lands heads.
answer spec
{
"kind": "value",
"domain": "real"
}
var model = function() { return flip() ? "H" : "T" };
var ANSWER = (Math.exp(Infer({method:'enumerate'}, model).score('H')));
@pyro.infer.config_enumerate
def model():
    h = pyro.sample('h', dist.Bernoulli(0.5))
    return h


marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
m = marg['h']
sup = m.enumerate_support()
probs = m.log_prob(sup).exp()
p_heads = 0.0
for v, p in zip(sup.tolist(), probs.tolist()):
    if bool(v):
        p_heads += p
ANSWER = p_heads
0.5000
0.5000
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0000 (absdiff) |
| solver re-derivation | accept | 2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000 |
There are two coins: one fair (P(heads) = 0.5) and one biased (P(heads) = 0.9). One coin is selected uniformly at random. The selected coin is flipped three times independently. The first two flips both landed heads.
A coin type (fair or biased) is drawn with equal probability. Each flip of the selected coin is independent with heads probability determined by the coin type. The first two flips are observed to be heads.
The posterior distribution over whether the third flip lands heads (true) or tails (false).
answer spec
{
"kind": "dist",
"domain": "bool"
}
var flipCoin = function(coinType) {
  return coinType == "fair" ? flip() : flip(0.9);
}
var model = function() {
  var coinType = flip() ? "fair" : "biased";
  var flip1 = flipCoin(coinType);
  var flip2 = flipCoin(coinType);
  var flip3 = flipCoin(coinType);
  condition(flip1 && flip2);
  return flip3;
};
var ANSWER = (Infer({method:'enumerate'}, model));
@pyro.infer.config_enumerate
def model():
    fair = pyro.sample("fair", dist.Bernoulli(0.5)).bool()
    p = torch.where(fair, torch.tensor(0.5), torch.tensor(0.9))
    flip1 = pyro.sample("flip1", dist.Bernoulli(p)).bool()
    flip2 = pyro.sample("flip2", dist.Bernoulli(p)).bool()
    pyro.sample("flip3", dist.Bernoulli(p))
    ev = flip1 & flip2
    pyro.factor("cond", torch.where(ev, torch.tensor(0.0), torch.tensor(float("-inf"))))

marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
m = marg["flip3"]
sup = m.enumerate_support()
probs = m.log_prob(sup).exp()
ANSWER = {bool(s.item()): float(p.item()) for s, p in zip(sup, probs)}
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0000 (tv) |
| solver re-derivation | accept | 2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000 |
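For the two-coin problem above, the posterior over the third flip can be checked by hand-rolled enumeration over the coin type and the three flips. This is a minimal sketch in plain Python with exact fractions. `HEADS_P` and `third_flip_posterior` are illustrative names, not part of the reference programs.

```python
from fractions import Fraction
from itertools import product

HEADS_P = {"fair": Fraction(1, 2), "biased": Fraction(9, 10)}


def third_flip_posterior(evidence):
    """P(flip3 | evidence(flip1, flip2)) with the coin type summed out."""
    weight = {True: Fraction(0), False: Fraction(0)}
    for coin, p_heads in HEADS_P.items():
        for flips in product([True, False], repeat=3):
            p = Fraction(1, 2)  # uniform prior over the coin type
            for heads in flips:
                p *= p_heads if heads else 1 - p_heads
            if evidence(flips[0], flips[1]):  # keep worlds consistent with the observation
                weight[flips[2]] += p
    total = sum(weight.values())
    return {outcome: w / total for outcome, w in weight.items()}


posterior = third_flip_posterior(lambda f1, f2: f1 and f2)
print(posterior[True], float(posterior[True]))
```

Two observed heads shift belief toward the biased coin, so the probability of a third head evaluates to 427/530 (about 0.806), well above the prior predictive value of 0.7.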
There are two coins: one fair (P(heads) = 0.5) and one biased (P(heads) = 0.9). One coin is selected uniformly at random. The selected coin is flipped three times independently. All three flips landed heads.
A coin type (fair or biased) is drawn with equal probability. Each flip of the selected coin is independent with heads probability determined by the coin type. All three flips are observed to be heads.
The posterior distribution over the coin type — the string 'fair' or the string 'biased'.
answer spec
{
"kind": "dist",
"domain": "finite",
"support": [
"fair",
"biased"
]
}
var flipCoin = function(coinType) {
  return coinType == "fair" ? flip() : flip(0.9);
}
var model = function() {
  var coinType = flip() ? "fair" : "biased";
  var flip1 = flipCoin(coinType);
  var flip2 = flipCoin(coinType);
  var flip3 = flipCoin(coinType);
  condition(flip1 && flip2 && flip3);
  return coinType;
};
var ANSWER = (Infer({method:'enumerate'}, model));
# Two coins (fair p=0.5, biased p=0.9), one chosen uniformly, three flips all
# heads. Posterior over coin type by exact enumeration.

coin_types = ["fair", "biased"]
heads_p = {"fair": 0.5, "biased": 0.9}


@pyro.infer.config_enumerate
def model():
    t = pyro.sample("coinType", dist.Categorical(torch.tensor([0.5, 0.5])))
    p = torch.tensor([heads_p["fair"], heads_p["biased"]])[t]
    pyro.sample("flip1", dist.Bernoulli(p), obs=torch.tensor(1.0))
    pyro.sample("flip2", dist.Bernoulli(p), obs=torch.tensor(1.0))
    pyro.sample("flip3", dist.Bernoulli(p), obs=torch.tensor(1.0))
    return t


marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(
    model, lambda: None
)
type_marg = marg["coinType"]
ANSWER = {
    coin_types[i]: torch.exp(type_marg.log_prob(torch.tensor(i))).item()
    for i in range(len(coin_types))
}
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0000 (tv) |
| solver re-derivation | accept | 2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000 |
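The coin-type posterior for three observed heads follows directly from Bayes' rule. As a worked check, writing HHH for the observation (notation introduced here for illustration):

```latex
\begin{align*}
P(\text{fair} \mid HHH)
  &= \frac{\tfrac12 \cdot (0.5)^3}{\tfrac12 \cdot (0.5)^3 + \tfrac12 \cdot (0.9)^3}
   = \frac{0.125}{0.125 + 0.729}
   = \frac{125}{854} \approx 0.146, \\
P(\text{biased} \mid HHH)
  &= 1 - \frac{125}{854} = \frac{729}{854} \approx 0.854.
\end{align*}
```

The uniform prior over coin types cancels, so the posterior odds equal the likelihood ratio 0.729 : 0.125.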
There are two coins: one fair (P(heads) = 0.5) and one biased (P(heads) = 0.9). One coin is selected uniformly at random. The selected coin is flipped three times independently. The first two flips landed on different sides (one heads and one tails).
A coin type (fair or biased) is drawn with equal probability. Each flip of the selected coin is independent with heads probability determined by the coin type. The first two flips are observed to have different outcomes.
The posterior distribution over whether the third flip lands heads (true) or tails (false).
answer spec
{
"kind": "dist",
"domain": "bool"
}
var flipCoin = function(coinType) {
  return coinType == "fair" ? flip() : flip(0.9);
}
var model = function() {
  var coinType = flip() ? "fair" : "biased";
  var flip1 = flipCoin(coinType);
  var flip2 = flipCoin(coinType);
  var flip3 = flipCoin(coinType);
  condition(flip1 != flip2);
  return flip3;
};
var ANSWER = (Infer({method:'enumerate'}, model));
# Two coins (fair p=0.5, biased p=0.9), pick one uniformly, flip 3 times,
# condition flip1 != flip2, query distribution over flip3. Exact enumeration.


@pyro.infer.config_enumerate
def model():
    # coinType: 0 = fair (p=0.5), 1 = biased (p=0.9)
    coin = pyro.sample("coin", dist.Categorical(torch.tensor([0.5, 0.5])))
    p = torch.where(coin == 0, torch.tensor(0.5), torch.tensor(0.9))
    f1 = pyro.sample("f1", dist.Bernoulli(p))
    f2 = pyro.sample("f2", dist.Bernoulli(p))
    f3 = pyro.sample("f3", dist.Bernoulli(p))
    diff = f1 != f2
    pyro.factor(
        "ev",
        torch.where(diff, torch.tensor(0.0), torch.tensor(float("-inf"))),
    )
    return f3


marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(
    model, lambda: None
)["f3"]
sup = marg.enumerate_support()
probs = marg.log_prob(sup).exp()
ANSWER = {}
for s, p in zip(sup.tolist(), probs.tolist()):
    ANSWER[bool(s)] = p
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0000 (tv) |
| solver re-derivation | accept | 2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000 |
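When the first two flips disagree, the evidence favors the fair coin, and the third-flip probability follows in two steps. A short worked derivation, writing D for the event that flips 1 and 2 differ (a symbol introduced here for illustration):

```latex
\begin{align*}
P(D \mid \text{fair}) &= 2 \cdot 0.5 \cdot 0.5 = 0.5,
\qquad
P(D \mid \text{biased}) = 2 \cdot 0.9 \cdot 0.1 = 0.18, \\
P(\text{fair} \mid D) &= \frac{0.5}{0.5 + 0.18} = \frac{25}{34}, \\
P(\text{flip}_3 = H \mid D)
  &= \frac{25}{34} \cdot 0.5 + \frac{9}{34} \cdot 0.9
   = \frac{103}{170} \approx 0.606.
\end{align*}
```

The third flip is independent of the first two given the coin type, which is why the last line only needs the updated coin-type weights.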
Lung cancer is present with prior probability 0.01. A cold is present with prior probability 0.2. A cough occurs if a cold is present and a cough-given-cold flip comes up true (probability 0.5), or if lung cancer is present and a cough-given-cancer flip comes up true (probability 0.3). These two pathways are combined with a logical OR.
Lung cancer and cold are independent binary causes of coughing. Each cause contributes independently to producing a cough via its own noisy channel, and a cough results if either channel fires.
Return a record with three fields, each a distribution over whether a cough occurs: (1) `original` — the unconditional marginal of cough; (2) `intervention` — the marginal of cough after setting lung cancer to true (an intervention, not conditioning); (3) `conditioning` — the marginal of cough after observing that lung cancer is true (an observation, updating beliefs).
answer spec
{
"kind": "record",
"fields": {
"original": {
"kind": "dist",
"domain": "bool"
},
"intervention": {
"kind": "dist",
"domain": "bool"
},
"conditioning": {
"kind": "dist",
"domain": "bool"
}
}
}
var ANSWER = (({
  original: Infer({method: "enumerate"}, function() {
    var lungCancer = flip(0.01);
    var cold = flip(0.2);
    var cough = (cold && flip(0.5)) || (lungCancer && flip(0.3));
    return cough;
  }),
  intervention: Infer({method: "enumerate"}, function() {
    var lungCancer = true;
    var cold = flip(0.2);
    var cough = (cold && flip(0.5)) || (lungCancer && flip(0.3));
    return cough;
  }),
  conditioning: Infer({method: "enumerate"}, function() {
    var lungCancer = flip(0.01);
    condition(lungCancer);
    var cold = flip(0.2);
    var cough = (cold && flip(0.5)) || (lungCancer && flip(0.3));
    return cough;
  })
}));
# probmods2-conditioning/ex2.a
# Three marginals over `cough`:
#   original     : unconditional
#   intervention : lungCancer set to true (do-operation; lungCancer not a random choice)
#   conditioning : lungCancer observed true (updates beliefs)
# Exact discrete enumeration via config_enumerate + compute_marginals. `cough` is
# made a genuine discrete sample site (a degenerate Bernoulli on its deterministic
# value) so compute_marginals returns a marginal for it.

ZERO = torch.tensor(0.0)
NEG_INF = torch.tensor(float("-inf"))

def original_model():
    lungCancer = pyro.sample("lungCancer", dist.Bernoulli(0.01)).bool()
    cold = pyro.sample("cold", dist.Bernoulli(0.2)).bool()
    c1 = pyro.sample("c1", dist.Bernoulli(0.5)).bool()
    c2 = pyro.sample("c2", dist.Bernoulli(0.3)).bool()
    cough = (cold & c1) | (lungCancer & c2)
    pyro.sample("cough", dist.Bernoulli(cough.double()))
    return cough

def intervention_model():
    # intervention: lungCancer is fixed to true, not a random choice
    lungCancer = torch.tensor(True)
    cold = pyro.sample("cold", dist.Bernoulli(0.2)).bool()
    c1 = pyro.sample("c1", dist.Bernoulli(0.5)).bool()
    c2 = pyro.sample("c2", dist.Bernoulli(0.3)).bool()
    cough = (cold & c1) | (lungCancer & c2)
    pyro.sample("cough", dist.Bernoulli(cough.double()))
    return cough

def conditioning_model():
    lungCancer = pyro.sample("lungCancer", dist.Bernoulli(0.01)).bool()
    pyro.factor("obs_lc", torch.where(lungCancer, ZERO, NEG_INF))
    cold = pyro.sample("cold", dist.Bernoulli(0.2)).bool()
    c1 = pyro.sample("c1", dist.Bernoulli(0.5)).bool()
    c2 = pyro.sample("c2", dist.Bernoulli(0.3)).bool()
    cough = (cold & c1) | (lungCancer & c2)
    pyro.sample("cough", dist.Bernoulli(cough.double()))
    return cough

def marginal_of(model_fn):
    enum = pyro.infer.config_enumerate(model_fn)
    marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(enum, lambda: None)
    m = marg["cough"]
    sup = m.enumerate_support()
    probs = m.log_prob(sup).exp()
    return {bool(s.item()): float(p.item()) for s, p in zip(sup, probs)}

ANSWER = {
    "original": marginal_of(original_model),
    "intervention": marginal_of(intervention_model),
    "conditioning": marginal_of(conditioning_model),
}
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0000 (record) |
| solver re-derivation | accept | 2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000 |
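The difference between setting lung cancer and observing it can be made concrete with a small library-free enumeration. The sketch below is an illustration under the problem's stated probabilities. The function `cough_marginal` and its `mode` argument are hypothetical names, not part of the reference programs.

```python
from fractions import Fraction
from itertools import product

P_CANCER, P_COLD = Fraction(1, 100), Fraction(1, 5)
P_COLD_FIRES, P_CANCER_FIRES = Fraction(1, 2), Fraction(3, 10)


def bern(p, value):
    """Probability that a Bernoulli(p) draw equals `value`."""
    return p if value else 1 - p


def cough_marginal(mode):
    """P(cough) under mode 'original', 'intervention', or 'conditioning'."""
    # Under do(lungCancer = true) the variable is a constant, not a random choice.
    cancer_values = [True] if mode == "intervention" else [True, False]
    weight = {True: Fraction(0), False: Fraction(0)}
    for cancer in cancer_values:
        if mode == "conditioning" and not cancer:
            continue  # observation: discard worlds without lung cancer
        p_cancer = Fraction(1) if mode == "intervention" else bern(P_CANCER, cancer)
        for cold, cold_fires, cancer_fires in product([True, False], repeat=3):
            p = (p_cancer * bern(P_COLD, cold)
                 * bern(P_COLD_FIRES, cold_fires) * bern(P_CANCER_FIRES, cancer_fires))
            cough = (cold and cold_fires) or (cancer and cancer_fires)
            weight[cough] += p
    total = sum(weight.values())  # renormalize after discarding worlds
    return {outcome: w / total for outcome, w in weight.items()}


answer = {m: cough_marginal(m) for m in ("original", "intervention", "conditioning")}
for mode, marginal in answer.items():
    print(mode, float(marginal[True]))
```

Here the intervention and the conditioning marginals coincide. Lung cancer has no parents in this model, so observing it carries no information about any other cause of coughing, and fixing it by hand has the same downstream effect.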
A person has lung cancer with probability 0.01 and, independently, has a cold with probability 0.2. Given lung cancer, the person coughs with probability 0.3; given a cold, the person coughs with probability 0.5. Both causes contribute to coughing independently: the person coughs if either causal pathway fires. Coughing is observed to be true.
Lung cancer and cold are independent latent causes. Coughing occurs if at least one of the following independent events occurs: the lung-cancer pathway fires (probability 0.3 given lung cancer) or the cold pathway fires (probability 0.5 given a cold). We compare three scenarios for the same underlying system: (1) no observations, (2) coughing is directly forced to be true regardless of its causes (do-operator intervention — the causal parents are unaffected), (3) coughing is observed to be true (conditioning, which propagates information back to the causes).
A record with three fields: 'original' — the prior marginal distribution over lung cancer; 'intervention' — the marginal distribution over lung cancer when coughing is forced to true without updating the causal parents; 'conditioning' — the posterior marginal distribution over lung cancer given coughing is observed to be true.
answer spec
{
"kind": "record",
"fields": {
"original": {
"kind": "dist",
"domain": "bool"
},
"intervention": {
"kind": "dist",
"domain": "bool"
},
"conditioning": {
"kind": "dist",
"domain": "bool"
}
}
}
var ANSWER = (({
  original: Infer({method: "enumerate"}, function() {
    var lungCancer = flip(0.01);
    var cold = flip(0.2);
    var cough = (cold && flip(0.5)) || (lungCancer && flip(0.3));
    return lungCancer;
  }),
  intervention: Infer({method: "enumerate"}, function() {
    var lungCancer = flip(0.01);
    var cold = flip(0.2);
    var cough = true;
    return lungCancer;
  }),
  conditioning: Infer({method: "enumerate"}, function() {
    var lungCancer = flip(0.01);
    var cold = flip(0.2);
    var cough = (cold && flip(0.5)) || (lungCancer && flip(0.3));
    condition(cough);
    return lungCancer;
  })
}));
@pyro.infer.config_enumerate
def original_model():
    lungCancer = pyro.sample("lungCancer", dist.Bernoulli(0.01))
    cold = pyro.sample("cold", dist.Bernoulli(0.2))
    return lungCancer

@pyro.infer.config_enumerate
def intervention_model():
    lungCancer = pyro.sample("lungCancer", dist.Bernoulli(0.01))
    cold = pyro.sample("cold", dist.Bernoulli(0.2))
    # cough forced to true without informing the causal parents (do-operator)
    return lungCancer

@pyro.infer.config_enumerate
def conditioning_model():
    lungCancer = pyro.sample("lungCancer", dist.Bernoulli(0.01))
    cold = pyro.sample("cold", dist.Bernoulli(0.2))
    cold_fires = pyro.sample("cold_fires", dist.Bernoulli(0.5))
    lung_fires = pyro.sample("lung_fires", dist.Bernoulli(0.3))
    cough = ((cold > 0) & (cold_fires > 0)) | ((lungCancer > 0) & (lung_fires > 0))
    pyro.factor("cough_obs", torch.where(cough, torch.tensor(0.0), torch.tensor(float("-inf"))))
    return lungCancer

def _marg_bool(m, site):
    p_true = m[site].log_prob(torch.tensor(1.0)).exp().item()
    return {False: 1.0 - p_true, True: p_true}

_orig = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(original_model, lambda: None)
_intv = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(intervention_model, lambda: None)
_cond = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(conditioning_model, lambda: None)

ANSWER = {
    "original": _marg_bool(_orig, "lungCancer"),
    "intervention": _marg_bool(_intv, "lungCancer"),
    "conditioning": _marg_bool(_cond, "lungCancer"),
}
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0000 (record) |
| solver re-derivation | accept | 2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000 |
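Only the conditioning case in the problem above changes the belief about lung cancer. Forcing the cough leaves its causes at their priors, so the intervention marginal stays at 0.01. The conditioned value can be worked out with Bayes' rule. The symbols LC (lung cancer) and C (cough) are shorthand introduced here:

```latex
\begin{align*}
P(C \mid LC) &= 1 - (1 - 0.3)\,(1 - 0.2 \cdot 0.5) = 1 - 0.7 \cdot 0.9 = 0.37, \\
P(C) &= 1 - (1 - 0.01 \cdot 0.3)\,(1 - 0.2 \cdot 0.5) = 1 - 0.997 \cdot 0.9 = 0.1027, \\
P(LC \mid C) &= \frac{P(C \mid LC)\,P(LC)}{P(C)}
  = \frac{0.37 \cdot 0.01}{0.1027} = \frac{37}{1027} \approx 0.036.
\end{align*}
```

Observing a cough therefore raises the probability of lung cancer from 1% to roughly 3.6%, whereas forcing the cough leaves it at 1%.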
A person is nice with probability 0.7 (a stable, person-specific trait). A nice person wants something from you with probability 0.2; a non-nice person wants something from you with probability 0.5 (varies per occasion). A person smiles if either of two independent channels fires: (a) if they want something, they smile with probability 0.8; otherwise with probability 0.5; (b) if they are nice, they smile with probability 0.8; otherwise with probability 0.5. A smile occurs if at least one of these two independent channels produces a smile (logical OR).
Niceness is a latent stable trait. On each occasion, a person independently may or may not want something, depending on their niceness. Whether they smile is determined by the OR of two independent smile-generating channels, one driven by wanting and one by niceness.
The marginal distribution over whether Alice smiles on a given occasion, with no observations.
answer spec
{
"kind": "dist",
"domain": "bool"
}
var extendedSmilesModel = function() {
  var nice = mem(function(person) { flip(.7) });
  var wantsSomething = function(person) {
    return flip(nice(person) ? .2 : .5);
  }
  var smiles = function(person, wants) {
    return (wants ? flip(.8) : flip(.5))
      || (nice(person) ? flip(.8) : flip(.5));
  }
  var wants = wantsSomething('alice');
  return smiles('alice', wants);
};
var ANSWER = (Infer({method: "enumerate"}, extendedSmilesModel));
# probmods2-conditioning/ex4.b
# Niceness is a stable trait; wanting depends on niceness; smiling is the OR of
# two independent channels (wanting-driven and niceness-driven).
# Marginal over whether Alice smiles, no observations. Exact enumeration.

@pyro.infer.config_enumerate
def model():
    nice = pyro.sample("nice", dist.Bernoulli(0.7))
    p_wants = torch.where(nice == 1.0, torch.tensor(0.2), torch.tensor(0.5))
    wants = pyro.sample("wants", dist.Bernoulli(p_wants))
    p_chan_want = torch.where(wants == 1.0, torch.tensor(0.8), torch.tensor(0.5))
    p_chan_nice = torch.where(nice == 1.0, torch.tensor(0.8), torch.tensor(0.5))
    chan_want = pyro.sample("chan_want", dist.Bernoulli(p_chan_want))
    chan_nice = pyro.sample("chan_nice", dist.Bernoulli(p_chan_nice))
    smiles = (chan_want == 1.0) | (chan_nice == 1.0)
    probs = torch.stack([(~smiles).double(), smiles.double()], dim=-1)
    pyro.sample("smiles", dist.Categorical(probs))

marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
p = marg["smiles"].probs.detach()
ANSWER = {False: float(p[0].item()), True: float(p[1].item())}
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0000 (tv) |
| solver re-derivation | accept | 2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000 |
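Since a smile is an OR of two channels, its marginal is easiest to compute through the complement: the person does not smile only when both channels stay silent. The sketch below works this out with exact fractions. The names `P_WANTS`, `P_CHANNEL`, and `p_no_smile_given` are illustrative, not part of the reference programs.

```python
from fractions import Fraction

P_NICE = Fraction(7, 10)
P_WANTS = {True: Fraction(1, 5), False: Fraction(1, 2)}    # P(wants | nice)
P_CHANNEL = {True: Fraction(4, 5), False: Fraction(1, 2)}  # P(channel fires | its driver)


def p_no_smile_given(nice: bool) -> Fraction:
    """P(neither channel fires | nice), with wanting summed out."""
    p_want_channel_silent = sum(
        (P_WANTS[nice] if wants else 1 - P_WANTS[nice]) * (1 - P_CHANNEL[wants])
        for wants in (True, False)
    )
    return p_want_channel_silent * (1 - P_CHANNEL[nice])


p_no_smile = P_NICE * p_no_smile_given(True) + (1 - P_NICE) * p_no_smile_given(False)
smile_marginal = {True: 1 - p_no_smile, False: p_no_smile}
print({outcome: float(p) for outcome, p in smile_marginal.items()})
```

Working through the complement gives P(no smile) = 0.7 × 0.088 + 0.3 × 0.175 = 0.1141, so the smile probability is 0.8859.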
A person's niceness is a stable trait: P(nice) = 0.7. Whether the person wants something from you on a given day is independent across days: P(wants | nice) = 0.2, P(wants | not nice) = 0.5. Given whether the person wants something and whether they are nice, they smile with probability determined as follows: both the 'wanting' channel and the 'niceness' channel independently produce a smile (the person smiles if either channel fires). The wanting channel fires with probability 0.8 if they want something, 0.5 otherwise. The niceness channel fires with probability 0.8 if they are nice, 0.5 otherwise. You have observed the person on five previous days, and on each of those days the person was not smiling; each day's wanting was independently drawn from the prior. Today you observe the person smiling; today's wanting is independently drawn from the prior.
Niceness is a fixed latent trait drawn once from the prior. Each day's wanting is drawn independently from the conditional prior given niceness. Smiling on a day is the logical OR of the two independent channels (wanting-based and niceness-based). The five past non-smiling observations and today's smiling observation are all conditioned on.
The posterior distribution over whether the person wants something from you today (true or false).
answer spec
{
"kind": "dist",
"domain": "bool"
}
var extendedSmilesModel = function() {
  var nice = mem(function(person) { flip(.7) });
  var wantsSomething = function(person) {
    return flip(nice(person) ? .2 : .5);
  }
  var smiles = function(person, wants) {
    return (wants ? flip(.8) : flip(.5))
      || (nice(person) ? flip(.8) : flip(.5));
  }
  var wantsToday = wantsSomething('bob');
  condition(!smiles('bob', wantsSomething('bob')));
  condition(!smiles('bob', wantsSomething('bob')));
  condition(!smiles('bob', wantsSomething('bob')));
  condition(!smiles('bob', wantsSomething('bob')));
  condition(!smiles('bob', wantsSomething('bob')));
  condition(smiles('bob', wantsToday));
  return wantsToday;
};
var ANSWER = (Infer({method: "enumerate"}, extendedSmilesModel));
# probmods2-conditioning/ex4.c
# `nice('bob')` is memoized in WebPPL (one draw, reused everywhere), so it is a
# single sample site. `wantsSomething('bob')` is NOT memoized: each call is a
# fresh draw. Five days of not-smiling (each a fresh wantsSomething draw), then
# today's smile evaluated at wantsToday. Exact enumeration; `wantsToday` is made
# a genuine discrete sample site so compute_marginals returns its marginal.

ZERO = torch.tensor(0.0)
NEG_INF = torch.tensor(float("-inf"))

@pyro.infer.config_enumerate
def model():
    nice = pyro.sample("nice", dist.Bernoulli(0.7)).bool()

    def wants_something(name):
        p = torch.where(nice, torch.tensor(0.2), torch.tensor(0.5))
        return pyro.sample(name, dist.Bernoulli(p)).bool()

    def smiles(tag, wants):
        pw = torch.where(wants, torch.tensor(0.8), torch.tensor(0.5))
        a = pyro.sample(tag + "_a", dist.Bernoulli(pw)).bool()
        pn = torch.where(nice, torch.tensor(0.8), torch.tensor(0.5))
        b = pyro.sample(tag + "_b", dist.Bernoulli(pn)).bool()
        return a | b

    wantsToday = wants_something("wantsToday")

    # five days of NOT smiling, each with a fresh wantsSomething draw
    for i in range(5):
        w = wants_something(f"w{i}")
        s = smiles(f"day{i}", w)
        pyro.factor(f"obs{i}", torch.where(~s, ZERO, NEG_INF))

    # today: smiles, evaluated at wantsToday
    s_today = smiles("today", wantsToday)
    pyro.factor("obs_today", torch.where(s_today, ZERO, NEG_INF))

    pyro.sample("wantsToday_out", dist.Bernoulli(wantsToday.double()))

marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
m = marg["wantsToday_out"]
sup = m.enumerate_support()
probs = m.log_prob(sup).exp()
ANSWER = {bool(s.item()): float(p.item()) for s, p in zip(sup, probs)}
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0000 (tv) |
| solver re-derivation | accept | 2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000 |
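Given niceness, the six days in the problem above are conditionally independent, so the posterior can be computed by summing over niceness alone instead of enumerating every channel. This is a minimal sketch of that factorization with exact fractions. All function and constant names are illustrative, and the approach is an alternative to the full enumeration used in the reference programs.

```python
from fractions import Fraction

P_NICE = Fraction(7, 10)
P_WANTS = {True: Fraction(1, 5), False: Fraction(1, 2)}    # P(wants | nice)
P_CHANNEL = {True: Fraction(4, 5), False: Fraction(1, 2)}  # P(channel fires | its driver)


def bern(p, value):
    """Probability that a Bernoulli(p) draw equals `value`."""
    return p if value else 1 - p


def p_smile(wants: bool, nice: bool) -> Fraction:
    """P(smile | wants, nice): at least one of the two channels fires."""
    return 1 - (1 - P_CHANNEL[wants]) * (1 - P_CHANNEL[nice])


def wants_today_posterior(n_past_no_smile: int = 5) -> dict:
    """P(wants today | past days without a smile, smile today)."""
    weight = {True: Fraction(0), False: Fraction(0)}
    for nice in (True, False):
        # One past day without a smile, with that day's wanting summed out.
        p_day_no_smile = sum(
            bern(P_WANTS[nice], w) * (1 - p_smile(w, nice)) for w in (True, False)
        )
        p_past = bern(P_NICE, nice) * p_day_no_smile ** n_past_no_smile
        for wants_today in (True, False):
            weight[wants_today] += (
                p_past * bern(P_WANTS[nice], wants_today) * p_smile(wants_today, nice)
            )
    total = sum(weight.values())
    return {outcome: w / total for outcome, w in weight.items()}


posterior = wants_today_posterior()
print(float(posterior[True]))
```

Five days without a smile make it much more likely that the person is not nice. That in turn raises the probability that they want something today, compared with `wants_today_posterior(0)`, which conditions on today's smile only.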
A sprinkler runs on any given morning with probability 0.5, independently. It rains on any given morning with probability 0.3, independently. The lawn is wet if the sprinkler ran, if it rained, or if both occurred. One morning the lawn is observed to be wet.
Rain and sprinkler are independent Bernoulli events. The lawn is wet if and only if at least one of them occurred. The lawn being wet is observed.
A record with two fields: 'rain' — the posterior distribution over whether it rained (true/false); 'sprinkler' — the posterior distribution over whether the sprinkler ran (true/false).
answer spec
{
"kind": "record",
"fields": {
"rain": {
"kind": "dist",
"domain": "bool"
},
"sprinkler": {
"kind": "dist",
"domain": "bool"
}
}
}
var ANSWER = (({
  rain: Infer({method: "enumerate"}, function() {
    var sprinkler = flip();
    var rain = flip(0.3);
    var wetLawn = sprinkler || rain;
    condition(wetLawn);
    return rain;
  }),
  sprinkler: Infer({method: "enumerate"}, function() {
    var sprinkler = flip();
    var rain = flip(0.3);
    var wetLawn = sprinkler || rain;
    condition(wetLawn);
    return sprinkler;
  })
}));

def make_model(ret):
    @pyro.infer.config_enumerate
    def model():
        sprinkler = pyro.sample('sprinkler', dist.Bernoulli(0.5)).bool()
        rain = pyro.sample('rain', dist.Bernoulli(0.3)).bool()
        wet = sprinkler | rain
        pyro.factor('wet', torch.where(wet, torch.tensor(0.0), torch.tensor(float('-inf'))))
        return ret(rain, sprinkler)
    return model


def marginal_bool(site, ret):
    model = make_model(ret)
    marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
    m = marg[site]
    sup = m.enumerate_support()
    probs = m.log_prob(sup).exp()
    out = {}
    for v, p in zip(sup.tolist(), probs.tolist()):
        out[bool(v)] = out.get(bool(v), 0.0) + p
    return out


ANSWER = {
    'rain': marginal_bool('rain', lambda rain, sprinkler: rain),
    'sprinkler': marginal_bool('sprinkler', lambda rain, sprinkler: sprinkler),
}
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0000 (record) |
| solver re-derivation | accept | 2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000 |
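
As a hand check on the sprinkler problem above, both posteriors can be recomputed by brute-force enumeration in plain Python. This is a minimal sketch, not the benchmark's reference solution; the helper `bernoulli` and the variable names are illustrative assumptions.

```python
from itertools import product

# Priors from the problem statement.
P_SPRINKLER = 0.5
P_RAIN = 0.3


def bernoulli(p, value):
    """Probability that a Bernoulli(p) variable takes `value` (hypothetical helper)."""
    return p if value else 1.0 - p


evidence = 0.0                           # P(lawn is wet)
mass = {"rain": 0.0, "sprinkler": 0.0}   # P(variable is true AND lawn is wet)

for sprinkler, rain in product([True, False], repeat=2):
    weight = bernoulli(P_SPRINKLER, sprinkler) * bernoulli(P_RAIN, rain)
    if not (sprinkler or rain):
        continue  # this world is ruled out by the observation
    evidence += weight
    if rain:
        mass["rain"] += weight
    if sprinkler:
        mass["sprinkler"] += weight

# Normalize by the evidence to obtain each marginal posterior.
posterior = {
    name: {True: m / evidence, False: 1.0 - m / evidence}
    for name, m in mass.items()
}
print(posterior)
```

Under these priors P(wet) = 1 - 0.5 × 0.7 = 0.65, so the posterior is 0.3/0.65 ≈ 0.462 for rain and 0.5/0.65 ≈ 0.769 for the sprinkler.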
Rain falls on a given morning with probability 0.3. Two people (me and Kelsey) each have their own sprinkler; each sprinkler turns on independently with probability 0.5. One morning both lawns are wet.
A lawn is wet if rain falls that morning or if that lawn's sprinkler runs; rain affects both lawns simultaneously, while each sprinkler affects only its own lawn.
The posterior distribution over whether it rained that morning.
answer spec
{
"kind": "dist",
"domain": "bool"
}
var ANSWER = (Infer({method: "enumerate"}, function() {
  var rain = flip(0.3);
  var mySprinkler = flip();
  var herSprinkler = flip();
  var myLawnIsWet = mySprinkler || rain;
  var herLawnIsWet = herSprinkler || rain;
  condition(myLawnIsWet && herLawnIsWet);
  return rain;
}));

@pyro.infer.config_enumerate
def model():
    rain = pyro.sample("rain", dist.Bernoulli(0.3)).bool()
    mySprinkler = pyro.sample("mySprinkler", dist.Bernoulli(0.5)).bool()
    herSprinkler = pyro.sample("herSprinkler", dist.Bernoulli(0.5)).bool()
    myLawnIsWet = mySprinkler | rain
    herLawnIsWet = herSprinkler | rain
    ev = myLawnIsWet & herLawnIsWet
    pyro.factor("cond", torch.where(ev, torch.tensor(0.0), torch.tensor(float("-inf"))))

marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
m = marg["rain"]
sup = m.enumerate_support()
probs = m.log_prob(sup).exp()
ANSWER = {bool(s.item()): float(p.item()) for s, p in zip(sup, probs)}
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0000 (tv) |
| solver re-derivation | accept | 2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000 |
Rain falls on a given morning with probability 0.3. Five people — me, Kelsey, Kevin, Manu, and Josh — each have an independent sprinkler that runs with probability 0.5. One morning all five lawns are wet.
A lawn is wet if rain falls or if that lawn's sprinkler runs; rain affects all lawns simultaneously, while each sprinkler affects only its own lawn.
The posterior distribution over whether it rained that morning.
answer spec
{
"kind": "dist",
"domain": "bool"
}
var ANSWER = (Infer({method: "enumerate"}, function() {
  var rain = flip(0.3);
  var sprinkler = mem(function(person) { return flip() });
  var wetLawn = function(person) { return rain || sprinkler(person) };
  condition(wetLawn("me"));
  condition(wetLawn("Kelsey"));
  condition(wetLawn("Kevin"));
  condition(wetLawn("Manu"));
  condition(wetLawn("Josh"));
  return rain;
}));

# Rain (p=0.3) plus five independent sprinklers (p=0.5 each); all five lawns
# wet. A lawn is wet if rain OR its own sprinkler runs. Posterior over rain by
# exact enumeration.

people = ["me", "Kelsey", "Kevin", "Manu", "Josh"]


@pyro.infer.config_enumerate
def model():
    rain = pyro.sample("rain", dist.Bernoulli(0.3))
    sprinklers = [
        pyro.sample(f"sprinkler_{p}", dist.Bernoulli(0.5)) for p in people
    ]
    for p, s in zip(people, sprinklers):
        wet = ((rain > 0) | (s > 0)).float()
        logw = torch.where(
            wet > 0, torch.tensor(0.0), torch.tensor(float("-inf"))
        )
        pyro.factor(f"wet_{p}", logw)
    return rain


marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(
    model, lambda: None
)
rain_marg = marg["rain"]
ANSWER = {
    True: torch.exp(rain_marg.log_prob(torch.tensor(1.0))).item(),
    False: torch.exp(rain_marg.log_prob(torch.tensor(0.0))).item(),
}
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0000 (tv) |
| solver re-derivation | accept | 2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000 |
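
The multi-lawn problems above share one closed form: if it rained, every lawn is wet; if it did not, every sprinkler must have run. The sketch below evaluates that formula for any number of lawns. The function name `posterior_rain` and its parameters are illustrative, not part of the benchmark.

```python
def posterior_rain(n_lawns, p_rain=0.3, p_sprinkler=0.5):
    """P(rain | all n_lawns lawns are wet), one independent sprinkler per lawn."""
    # If it rained, every lawn is wet regardless of the sprinklers.
    wet_and_rain = p_rain
    # If it did not rain, all n_lawns sprinklers must have run.
    wet_and_dry = (1.0 - p_rain) * p_sprinkler ** n_lawns
    return wet_and_rain / (wet_and_rain + wet_and_dry)


for n in (1, 2, 5):
    print(n, posterior_rain(n))
```

With the stated priors this gives roughly 0.632 for two wet lawns and 0.932 for five: each additional wet lawn makes the shared cause (rain) more plausible than a coincidence of independent sprinklers.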
A machine draws one letter at random from the word "game": vowels (a, e) are drawn with probability 0.45 each, and consonants (g, m) with probability 0.05 each. Bob's probability of winning given the drawn letter is 1/k^2, where k is that letter's 1-based position in the string "game" (g=1, a=2, m=3, e=4).
One letter is sampled from the distribution above. Bob then wins with probability 1/k^2, where k is the drawn letter's position, and loses otherwise. We observe that Bob won.
The posterior distribution over which letter was drawn, given that Bob won.
answer spec
{
"kind": "dist",
"domain": "finite",
"support": [
"g",
"a",
"m",
"e"
]
}
var checkVowel = function(letter) { _.includes(['a', 'e', 'i', 'o', 'u'], letter) };
var letterVals = ['g', 'a', 'm', 'e'];
var letterProbs = map(function(letter) { checkVowel(letter) ? 0.45 : 0.05 }, letterVals);
var letters = Categorical({vs: letterVals, ps: letterProbs});
var ANSWER = (Infer({method: 'enumerate'}, function() {
  var letter = sample(letters);
  var position = letterVals.indexOf(letter) + 1;
  var winProb = 1 / Math.pow(position, 2);
  condition(flip(winProb));
  return letter;
}));

# Letter drawn from 'game' (vowels a,e at 0.45; consonants g,m at 0.05),
# Bob wins with prob 1/k^2 (k = 1-based position). Condition on Bob won.
# Query: posterior over the drawn letter. Exact enumeration.

NEG_INF = float("-inf")

letter_vals = ["g", "a", "m", "e"]
letter_probs = torch.tensor([0.05, 0.45, 0.05, 0.45], dtype=torch.float64)
letter_logits = torch.log(letter_probs)
win_probs = torch.tensor([1.0 / ((i + 1) ** 2) for i in range(len(letter_vals))],
                         dtype=torch.float64)

@pyro.infer.config_enumerate
def model():
    letter = pyro.sample("letter", dist.Categorical(logits=letter_logits))
    p_win = win_probs[letter]
    # observe Bob won: flip(winProb) == True
    pyro.sample("won", dist.Bernoulli(p_win), obs=torch.tensor(1.0, dtype=torch.float64))
    return letter

marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
probs = marg["letter"].probs
ANSWER = {letter_vals[i]: float(probs[i]) for i in range(len(letter_vals))}
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0000 (tv) |
| solver re-derivation | accept | 2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000 |
A casino game draws one letter from the ordered set [g, a, m, e] (positions 1 through 4). Consonants g and m each have prior probability 0.05; vowels a and e each have prior probability 0.45. If the drawn letter is at position k, the player wins with probability 1/k². Bob played and won.
A letter is drawn according to its prior probability. Given the drawn letter's position k, the player wins with probability 1/k²; whether the player won is observed.
The posterior distribution over whether the drawn letter is a vowel or a consonant, given that Bob won.
answer spec
{
"kind": "dist",
"domain": "finite",
"support": [
"vowel",
"consonant"
]
}
var checkVowel = function(letter) { _.includes(['a', 'e', 'i', 'o', 'u'], letter) };
var letterVals = ['g', 'a', 'm', 'e'];
var letterProbs = map(function(letter) { checkVowel(letter) ? 0.45 : 0.05 }, letterVals);
var letters = Categorical({vs: letterVals, ps: letterProbs});
var ANSWER = (Infer({method: 'enumerate'}, function() {
  var letter = sample(letters);
  var position = letterVals.indexOf(letter) + 1;
  var winProb = 1 / Math.pow(position, 2);
  condition(flip(winProb));
  return checkVowel(letter) ? 'vowel' : 'consonant';
}));

letter_vals = ["g", "a", "m", "e"]
vowels = ["a", "e", "i", "o", "u"]


def check_vowel(letter):
    return letter in vowels


letter_probs = torch.tensor([0.45 if check_vowel(l) else 0.05 for l in letter_vals])


@pyro.infer.config_enumerate
def model():
    letter = pyro.sample("letter", dist.Categorical(letter_probs))
    # position k = index + 1; win probability 1/k^2; condition on a win via log-weight.
    positions = torch.arange(1, len(letter_vals) + 1).double()
    win_probs = 1.0 / positions.pow(2)
    log_win = torch.log(win_probs)
    pyro.factor("won", log_win[letter])
    return letter


marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
d = marg["letter"]

p_vowel = 0.0
p_consonant = 0.0
for i, l in enumerate(letter_vals):
    p = float(torch.exp(d.log_prob(torch.tensor(i))))
    if check_vowel(l):
        p_vowel += p
    else:
        p_consonant += p

ANSWER = {"vowel": p_vowel, "consonant": p_consonant}
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0000 (tv) |
| solver re-derivation | accept | 2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000 |
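
Both letter-game queries above reduce to one application of Bayes' rule: multiply each letter's prior by its win likelihood 1/k², normalize, and then (for the second query) sum within each group. The sketch below does this with exact rational arithmetic; the variable names are illustrative and not taken from the reference programs.

```python
from fractions import Fraction

letters = ["g", "a", "m", "e"]
vowels = {"a", "e"}

# Prior: 0.45 for each vowel, 0.05 for each consonant.
prior = {l: Fraction(9, 20) if l in vowels else Fraction(1, 20) for l in letters}
# Likelihood of a win: 1 / k^2, with k the 1-based position in "game".
likelihood = {l: Fraction(1, (k + 1) ** 2) for k, l in enumerate(letters)}

# Bayes' rule: posterior is proportional to prior times likelihood.
unnormalized = {l: prior[l] * likelihood[l] for l in letters}
evidence = sum(unnormalized.values())  # P(Bob won)
letter_posterior = {l: w / evidence for l, w in unnormalized.items()}

# Second query: aggregate the per-letter posterior by letter class.
group_posterior = {
    "vowel": sum(p for l, p in letter_posterior.items() if l in vowels),
    "consonant": sum(p for l, p in letter_posterior.items() if l not in vowels),
}

print(letter_posterior)
print(group_posterior)
```

The exact posterior is g = 144/565 ≈ 0.255, a = 324/565 ≈ 0.573, m = 16/565 ≈ 0.028, e = 81/565 ≈ 0.143, so a vowel has posterior probability 81/113 ≈ 0.717. The letter g gains a lot of mass relative to its 0.05 prior because position 1 guarantees a win.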
Three programs each produce a random boolean. Program 1: flip a fair coin; if it lands heads, return the result of a flip with probability 0.7, and if it lands tails, return the result of a flip with probability 0.1. Program 2: flip a fair coin to choose a weight (0.7 if heads, 0.1 if tails), then return a single flip with that weight. Program 3: return a single flip with probability 0.4.
Each program independently generates a boolean by composing fair and biased coin flips in the ways described.
Return a record with three fields — one per program — where each field holds a list of 1000 independent draws from that program's marginal distribution.
answer spec
{
"kind": "record",
"fields": {
"p1": {
"kind": "dist",
"domain": "bool"
},
"p2": {
"kind": "dist",
"domain": "bool"
},
"p3": {
"kind": "dist",
"domain": "bool"
}
}
}
var ANSWER = (({
  p1: repeat(1000, function() { return flip() ? flip(.7) : flip(.1); }),
  p2: repeat(1000, function() { return flip(flip() ? .7 : .1); }),
  p3: repeat(1000, function() { return flip(.4); })
}));

# Program 1: flip a fair coin; if heads flip again at 0.7, else flip at 0.1.
def program1(i):
    first = pyro.sample(f"p1_first_{i}", dist.Bernoulli(0.5)) > 0
    second = pyro.sample(f"p1_second_{i}", dist.Bernoulli(0.7 if first else 0.1)) > 0
    return bool(second)

# Program 2: flip a fair coin, use it to pick the second flip's bias (0.7 / 0.1).
def program2(i):
    first = pyro.sample(f"p2_first_{i}", dist.Bernoulli(0.5)) > 0
    second = pyro.sample(f"p2_second_{i}", dist.Bernoulli(0.7 if first else 0.1)) > 0
    return bool(second)

# Program 3: a single flip at 0.4.
def program3(i):
    return bool(pyro.sample(f"p3_{i}", dist.Bernoulli(0.4)) > 0)

p1_draws = [program1(i) for i in range(1000)]
p2_draws = [program2(i) for i in range(1000)]
p3_draws = [program3(i) for i in range(1000)]

ANSWER = {"p1": p1_draws, "p2": p2_draws, "p3": p3_draws}
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0590 (record) |
| solver re-derivation | accept | 2/2 solvers · d=[0.019, 0.019] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0210 ≤ tol 0.1180 · floors 0.0450/0.0590 |
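
The three programs above differ in structure but share one marginal, which follows from the law of total probability: 0.5 × 0.7 + 0.5 × 0.1 = 0.4. The sketch below computes that value and compares it with 1000 seeded draws per program using Python's standard `random` module rather than Pyro or WebPPL; the helper names `flip`, `exact_marginals`, and `simulate` are assumptions made for illustration.

```python
import random


def exact_marginals():
    """P(true) for each program, by the law of total probability."""
    p1 = 0.5 * 0.7 + 0.5 * 0.1  # branch on a fair coin, then flip
    p2 = 0.5 * 0.7 + 0.5 * 0.1  # fair coin picks the weight of one flip
    p3 = 0.4                    # single weighted flip
    return {"p1": p1, "p2": p2, "p3": p3}


def flip(rng, p=0.5):
    """One Bernoulli(p) draw from the given random generator."""
    return rng.random() < p


def simulate(n=1000, seed=0):
    """Fraction of true outcomes among n draws from each program."""
    rng = random.Random(seed)
    programs = {
        "p1": lambda: flip(rng, 0.7) if flip(rng) else flip(rng, 0.1),
        "p2": lambda: flip(rng, 0.7 if flip(rng) else 0.1),
        "p3": lambda: flip(rng, 0.4),
    }
    return {name: sum(prog() for _ in range(n)) / n for name, prog in programs.items()}


exact = exact_marginals()
simulated = simulate()
print(exact)
print(simulated)
```

With 1000 draws the standard error of each empirical frequency is about 0.015, which is consistent with the nonzero distances reported in the check table for sampled answers.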
A fair coin has probability 0.5 of coming up heads; a biased coin has probability 0.8 of heads. The following three expressions all produce true with probability 0.4: (1) with probability 0.5 return the result of a 0.7-probability flip, otherwise a 0.1-probability flip; (2) flip a fair coin and use probability 0.7 if heads, 0.1 if tails; (3) a single flip with probability 0.4.
A boolean is generated by composing one or more coin flips.
Write a new expression — structurally different from the three listed — whose marginal probability of returning true is also 0.4. The expression will be evaluated independently for each seed; the marginal is estimated from the collection of results.
answer spec
{
"kind": "dist",
"domain": "bool",
"protocol": "draws"
}
var ANSWER = (flip() ? false : flip(.8));

# probmods2-generative-models/ex1.c
# A new expression whose marginal P(true) = 0.4, structurally different from the
# three listed: with prob 0.5 return False, otherwise return a flip(0.8).
# P(true) = 0.5 * 0.8 = 0.4. protocol: draws -> one draw, no inference.

gate = pyro.sample("gate", dist.Bernoulli(0.5))
if gate == 1.0:
    ANSWER = False
else:
    inner = pyro.sample("inner", dist.Bernoulli(0.8))
    ANSWER = bool(inner.item() == 1.0)
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0050 (tv) |
| solver re-derivation | accept | 1/2 solvers · d=[0.220, 0.030] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0050 ≤ tol 0.0300 · floors 0.0150/0.0050 |
A single fair coin is flipped once per run.
One coin flip is performed, and that flip's boolean result is placed in all three positions of a length-3 list.
One draw from the process: the list of three booleans produced by a single run.
answer spec
{
"kind": "dist",
"domain": "finite",
"protocol": "draws",
"support": [
[
true,
true,
true
],
[
false,
false,
false
]
]
}
var foo = mem(function() { return flip(); });
var ANSWER = ([foo(), foo(), foo()]);

# One fair coin flip per run; the single flip's value fills all three positions
# of a length-3 list. protocol 'draws': bind ANSWER to one draw, run no
# inference -- the harness reseeds and aggregates draws across many runs.

flip = bool(pyro.sample("coin", dist.Bernoulli(torch.tensor(0.5))).item())
ANSWER = [flip, flip, flip]
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0550 (tv) |
| solver re-derivation | accept | 2/2 solvers · d=[0.020, 0.020] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0200 ≤ tol 0.1100 · floors 0.0300/0.0550 |
A function maps each integer argument to an independent fair-coin toss with probability 0.5 of heads, but the outcome for any given argument is fixed once determined — calling the function twice with the same argument always returns the same boolean.
Three boolean values are generated: the first and second by calling the function with the same argument, the third by calling it with a different argument.
The marginal distribution over the resulting list of three booleans. The program is evaluated once per seed; results are pooled across seeds to estimate the distribution.
answer spec
{
"kind": "dist",
"domain": "finite",
"protocol": "draws",
"support": [
[
true,
true,
true
],
[
true,
true,
false
],
[
false,
false,
true
],
[
false,
false,
false
]
]
}
var foo = mem(function(x) { return flip(); });
var ANSWER = ([foo(0), foo(0), foo(1)]);

# probmods2-generative-models/ex2.c
# foo is a memoized coin: foo(x) ~ Bernoulli(0.5), fixed per argument x.
# Return [foo(0), foo(0), foo(1)] for one execution (draws protocol).

_memo = {}
def foo(x):
    if x not in _memo:
        _memo[x] = bool(pyro.sample(f"foo_{x}", dist.Bernoulli(0.5)).item())
    return _memo[x]

ANSWER = [foo(0), foo(0), foo(1)]
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0550 (tv) |
| solver re-derivation | accept | 2/2 solvers · d=[0.050, 0.050] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0450 ≤ tol 0.1700 · floors 0.0850/0.0550 |
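
Memoization is the reason the answer spec above lists four outcomes rather than eight: the first two list entries always agree, so only the two underlying coins vary. A small enumeration in plain Python, offered as an illustrative sketch with made-up variable names, makes the exact distribution explicit.

```python
from collections import Counter
from itertools import product

# One fair coin per distinct argument: foo(0) is reused, foo(1) is separate.
distribution = Counter()
for coin_0, coin_1 in product([True, False], repeat=2):
    outcome = (coin_0, coin_0, coin_1)  # [foo(0), foo(0), foo(1)]
    distribution[outcome] += 0.5 * 0.5  # prior weight of this pair of coins

for outcome, prob in sorted(distribution.items(), reverse=True):
    print(list(outcome), prob)
```

Each of the four reachable lists has probability 0.25, and lists such as [true, false, true] never occur.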
A person has allergies with probability 0.3 and a cold with probability 0.2. Allergies and cold are independent of each other.
A person sneezes if they have a cold or have allergies (logical OR). A person has a fever if and only if they have a cold.
The joint distribution over whether the person sneezes and whether they have a fever.
answer spec
{
"kind": "dist",
"domain": "finite",
"labels": {
"record": {
"sneeze": "bool",
"fever": "bool"
}
}
}
var ANSWER = (Infer({method: "enumerate"}, function() {
  var allergies = flip(0.3);
  var cold = flip(0.2);
  var sneeze = cold || allergies;
  var fever = cold;
  return {sneeze: sneeze, fever: fever};
}));

@pyro.infer.config_enumerate
def model():
    allergies = pyro.sample("allergies", dist.Bernoulli(0.3)).bool()
    cold = pyro.sample("cold", dist.Bernoulli(0.2)).bool()
    sneeze = cold | allergies
    fever = cold
    return sneeze, fever

serving = pyro.infer.infer_discrete(
    pyro.infer.config_enumerate(model), first_available_dim=-1
)

counts = Counter()
N = 20000
for _ in range(N):
    sneeze, fever = serving()
    key = json.dumps({"sneeze": bool(sneeze.item()), "fever": bool(fever.item())}, sort_keys=True)
    counts[key] += 1

ANSWER = {k: v / N for k, v in counts.items()}
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0000 (tv) |
| solver re-derivation | accept | 2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0033 ≤ tol 0.0181 · floors 0.0091/0.0000 |
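
Because nothing is observed in the sneeze/fever model above, the queried joint is just the prior pushed through the two deterministic rules. The following sketch enumerates the four (allergies, cold) worlds exactly; it is a plain-Python illustration with assumed variable names, not the reference implementation.

```python
from fractions import Fraction
from itertools import product

P_ALLERGIES = Fraction(3, 10)
P_COLD = Fraction(1, 5)

joint = {}
for allergies, cold in product([True, False], repeat=2):
    # Prior weight of this world (the two causes are independent).
    weight = (P_ALLERGIES if allergies else 1 - P_ALLERGIES) * (
        P_COLD if cold else 1 - P_COLD
    )
    outcome = (cold or allergies, cold)  # (sneeze, fever)
    joint[outcome] = joint.get(outcome, 0) + weight

for (sneeze, fever), prob in joint.items():
    print({"sneeze": sneeze, "fever": fever}, prob)
```

The result is P(sneeze, fever) = 1/5, P(sneeze, no fever) = 6/25, and P(no sneeze, no fever) = 14/25. A fever without sneezing is impossible here, since a cold causes both symptoms.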
Each person independently has allergies with probability 0.3 and a cold with probability 0.2; these are independent, and each person's disease state is consistent throughout a single scenario (allergies and cold are person-level traits, not re-sampled per query).
A person sneezes if they have a cold or have allergies (logical OR). A person has a fever if and only if they have a cold. Bob's symptoms are evaluated using his consistent disease state.
The joint distribution over whether Bob sneezes and whether Bob has a fever.
answer spec
{
"kind": "dist",
"domain": "finite",
"labels": {
"record": {
"sneeze": "bool",
"fever": "bool"
}
}
}
var ANSWER = (Infer({method: "enumerate"}, function() {
  var allergies = mem(function(person) { return flip(.3); });
  var cold = mem(function(person) { return flip(.2); });
  var sneeze = function(person) { return cold(person) || allergies(person); };
  var fever = function(person) { return cold(person); };
  return {sneeze: sneeze('bob'), fever: fever('bob')};
}));

# Each person has allergies (p=0.3) and a cold (p=0.2), independent. Bob sneezes
# if cold OR allergies; has a fever iff cold. No conditioning -- the queried
# joint distribution is the prior over (sneeze, fever). The two binary latents are
# enumerated and the joint over the derived (sneeze, fever) outcome is read off a
# single enumerated outcome site by exact marginalization.

# Encode the joint outcome (sneeze, fever) as one categorical latent so that
# exact enumeration over (allergies, cold) yields its marginal directly. Outcome
# index = sneeze*2 + fever, with outcomes ordered below.
outcomes = [(False, False), (False, True), (True, False), (True, True)]


@pyro.infer.config_enumerate
def model():
    allergies = pyro.sample("allergies", dist.Bernoulli(0.3))
    cold = pyro.sample("cold", dist.Bernoulli(0.2))
    # Tensor-valued under enumeration: keep everything in torch so the derived
    # (sneeze, fever) pair is computed per enumeration cell.
    sneeze = ((cold > 0) | (allergies > 0)).long()  # 0/1, broadcasts
    fever = (cold > 0).long()
    out_idx = sneeze * 2 + fever  # in {0,1,2,3}
    # One-hot logits over the 4 outcomes selecting the derived outcome per cell.
    labels = torch.arange(4)
    onehot = torch.where(
        out_idx.unsqueeze(-1) == labels,
        torch.tensor(0.0),
        torch.tensor(float("-inf")),
    )
    return pyro.sample("outcome", dist.Categorical(logits=onehot))


marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(
    model, lambda: None
)
out_marg = marg["outcome"]
# Record-labeled finite outcomes: each key is the JSON object {sneeze, fever};
# the harness parses these keys back into the labeled record.
ANSWER = {
    json.dumps({"sneeze": outcomes[i][0], "fever": outcomes[i][1]}):
        torch.exp(out_marg.log_prob(torch.tensor(i))).item()
    for i in range(len(outcomes))
}
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0000 (tv) |
| solver re-derivation | accept | 2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000 |
A fair coin has weight 0.5 (equal probability of heads or tails). A bent coin is derived from a fair coin as follows: if the fair coin shows heads, the bent coin flips a new coin with weight 0.7; if the fair coin shows tails, the bent coin flips a new coin with weight 0.1. Inference uses forward sampling with 10000 samples.
Each toss of the bent coin draws a result from the fair coin; based on that result it draws a second coin with a higher or lower bias and returns that second coin's result.
The marginal distribution over outcomes of a single toss of the bent coin, estimated by forward sampling with 10000 samples. Represent the outcome as the string 'h' for heads and 't' for tails.
answer spec
{
"kind": "dist",
"domain": "finite",
"support": [
"h",
"t"
]
}
var makeCoin = function(weight) {
  return function() {
    return flip(weight) ? 'h' : 't';
  };
};
var bend = function(coin) {
  return function() {
    return coin() == 'h' ? makeCoin(.7)() : makeCoin(.1)();
  };
};

var fairCoin = makeCoin(.5);
var bentCoin = bend(fairCoin);
var ANSWER = (Infer({method: 'forward', samples: 10000}, bentCoin));

# Bent coin: a fair coin (0.5) selects a second coin (0.7 if heads, 0.1 if tails);
# return the second coin's result. Forward sampling, 10000 samples, over {h,t}.

def bent_coin():
    fair = pyro.sample("fair", dist.Bernoulli(0.5))
    weight = 0.7 if bool(fair.item()) else 0.1
    second = pyro.sample("second", dist.Bernoulli(weight))
    return "h" if bool(second.item()) else "t"

num_samples = 10000
outcomes = [bent_coin() for _ in range(num_samples)]
counts = Counter(outcomes)
ANSWER = {
    "h": counts["h"] / num_samples,
    "t": counts["t"] / num_samples,
}
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0085 (tv) |
| solver re-derivation | accept | 2/2 solvers · d=[0.003, 0.003] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0040 ≤ tol 0.0254 · floors 0.0127/0.0085 |
A fair coin is flipped at each step (probability 0.5 heads).
A non-negative integer is generated recursively: with probability 0.5 the value is 0; otherwise the value is 1 plus an independent draw from the same process. This defines a geometric distribution on the non-negative integers with success probability 0.5.
The empirical distribution over the recursively generated integer, estimated by 10000 independent forward samples.
answer spec
{
"kind": "dist",
"domain": "int",
"protocol": "object"
}
var geometric = function() {
  return flip() ? 0 : 1 + geometric();
};
var ANSWER = (Infer({method: "forward", samples:10000}, geometric));

# Recursive geometric process: each step a fair coin decides 0 vs 1 + recurse.
def geometric(trial):
    if bool(pyro.sample(f"f{trial}_0", dist.Bernoulli(0.5))):
        return 0
    n = 1
    i = 1
    while True:
        if bool(pyro.sample(f"f{trial}_{i}", dist.Bernoulli(0.5))):
            return n
        n += 1
        i += 1


# Empirical distribution from 10000 independent forward samples.
counts = Counter()
for s in range(10000):
    counts[geometric(s)] += 1

total = sum(counts.values())
ANSWER = {k: counts[k] / total for k in counts}
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0441 (w1) |
| solver re-derivation | accept | 2/2 solvers · d=[0.026, 0.026] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0277 ≤ tol 0.0882 · floors 0.0371/0.0441 |
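
The recursive process above has the closed-form distribution P(n) = 0.5^(n+1), with mean 1. The sketch below states that formula and compares it with a seeded recursive sampler built on Python's `random` module; the function names and the seed are arbitrary choices for illustration.

```python
import random


def geometric_pmf(n, p=0.5):
    """P(value = n): n failures followed by one success."""
    return (1.0 - p) ** n * p


def geometric(rng):
    """Recursive sampler mirroring the problem statement."""
    return 0 if rng.random() < 0.5 else 1 + geometric(rng)


rng = random.Random(0)
samples = [geometric(rng) for _ in range(10000)]
empirical_mean = sum(samples) / len(samples)
empirical_p0 = samples.count(0) / len(samples)

print("exact P(0..3):", [geometric_pmf(n) for n in range(4)])
print("empirical mean:", empirical_mean, "empirical P(0):", empirical_p0)
```

The exact probabilities halve at each step (0.5, 0.25, 0.125, ...), so the empirical frequencies from 10000 samples should track them closely while still differing slightly from run to run.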
The joint distribution over two Boolean random variables A and B is given by the following table:

| A | B | P(A,B) |
|---|---|--------|
| F | F | 0.14 |
| F | T | 0.06 |
| T | F | 0.40 |
| T | T | 0.40 |

A and B are jointly distributed according to the table above. One natural factorization fixes the marginal of A first, then draws B from the conditional distribution of B given A.
A single draw from the joint distribution of (A, B). The program returns one pair per run; collect multiple seeded runs to form the empirical joint distribution. Represent the draw as a two-element list: the outcome of A first, then the outcome of B (booleans).
answer spec
{
"kind": "dist",
"domain": "finite",
"protocol": "draws",
"support": [
[
true,
true
],
[
true,
false
],
[
false,
true
],
[
false,
false
]
]
}
var a = flip(0.8);
var b = flip(a ? 0.5 : 0.3);
var ANSWER = ([a, b]);

# One draw from the joint of (A, B). Factorization: P(A) then P(B | A).
# Table: P(A=T)=0.40+0.40=0.80; P(B=T|A=T)=0.40/0.80=0.5; P(B=T|A=F)=0.06/0.20=0.3.
a = pyro.sample("a", dist.Bernoulli(0.8)).item() > 0
p_b = 0.5 if a else 0.3
b = pyro.sample("b", dist.Bernoulli(p_b)).item() > 0
ANSWER = [bool(a), bool(b)]
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0600 (tv) |
| solver re-derivation | accept | 2/2 solvers · d=[0.035, 0.035] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0350 ≤ tol 0.1800 · floors 0.0450/0.0600 |
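
The flip weights used in the programs above come from factorizing the joint table as P(A) · P(B | A). The sketch below derives them with exact fractions and confirms that the factorization reproduces the table; the variable names are illustrative.

```python
from fractions import Fraction

# Joint table from the problem statement, keyed by (A, B).
joint = {
    (False, False): Fraction(14, 100),
    (False, True): Fraction(6, 100),
    (True, False): Fraction(40, 100),
    (True, True): Fraction(40, 100),
}

# Marginal of A: sum the rows where A is true.
p_a = sum(p for (a, _), p in joint.items() if a)

# Conditional of B given each value of A: joint divided by the marginal of A.
p_b_given_a = {
    a: joint[(a, True)] / (joint[(a, True)] + joint[(a, False)])
    for a in (True, False)
}

# Rebuild the joint from the factorization as a consistency check.
rebuilt = {
    (a, b): (p_a if a else 1 - p_a)
    * (p_b_given_a[a] if b else 1 - p_b_given_a[a])
    for a in (True, False)
    for b in (True, False)
}

print(p_a, p_b_given_a)
print(rebuilt == joint)
```

This yields P(A) = 4/5, P(B | A) = 1/2, and P(B | not A) = 3/10, matching the weights 0.8, 0.5, and 0.3 in the reference programs.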
Two Boolean random variables A and B have the following distribution: P(A=true) = 0.8; P(B=true | A=true) = 0.5; P(B=true | A=false) = 0.3.
A is drawn from its marginal; B is then drawn conditionally on A.
The full joint distribution over (A, B), estimated by forward sampling with 10000 samples. Represent each outcome pair as a two-element list: the outcome of A first, then the outcome of B (booleans).
answer spec
{
"kind": "dist",
"domain": "finite",
"support": [
[
true,
true
],
[
true,
false
],
[
false,
true
],
[
false,
false
]
]
}
var ANSWER = (Infer({method: "forward", samples: 10000}, function() {
  var a = flip(0.8);
  var b = flip(a ? 0.5 : 0.3);
  return [a, b];
}));

# probmods2-generative-models/ex7.b
# A ~ Bernoulli(0.8); B | A ~ Bernoulli(0.5 if A else 0.3).
# Full joint over (A, B) estimated by forward sampling (prior) with 10000 draws.

N = 10000

def model():
    with pyro.plate("draws", N):
        a = pyro.sample("a", dist.Bernoulli(0.8))
        p_b = torch.where(a == 1.0, torch.tensor(0.5), torch.tensor(0.3))
        b = pyro.sample("b", dist.Bernoulli(p_b))
    return a, b

a, b = model()
a_bool = a.bool()
b_bool = b.bool()

support = [(True, True), (True, False), (False, True), (False, False)]
counts = {pair: 0 for pair in support}
for i in range(N):
    pair = (bool(a_bool[i].item()), bool(b_bool[i].item()))
    counts[pair] += 1

ANSWER = {pair: counts[pair] / N for pair in support}
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0179 (tv) |
| solver re-derivation | accept | 2/2 solvers · d=[0.007, 0.007] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0082 ≤ tol 0.0358 · floors 0.0116/0.0179 |
There are five colors: black, blue, green, orange, red. In the `observed` model, the Dirichlet concentration vector is all-ones (length 5), and the observed data are three draws from bag1: blue, blue, black (in that order). In the `usealpha` model, the Dirichlet concentration vector for each bag is [2, 3, 1, 1, 1] in the order (black, blue, green, orange, red), with no observed data.
Each bag's color distribution is drawn independently from a Dirichlet prior parameterized by a concentration vector. Draws from a bag are conditionally independent given that bag's color distribution. Each model is run with MCMC using 20000 samples.
Return a record with two fields: `observed` — the posterior predictive distribution over a single color draw from bag1 under the all-ones-prior model conditioned on the three observed draws; `usealpha` — the prior predictive distribution over a single color draw from bag1 under the [2,3,1,1,1]-concentration model with no observations.
answer spec
{
"kind": "record",
"fields": {
"observed": {
"kind": "dist",
"domain": "finite",
"support": [
"black",
"blue",
"green",
"orange",
"red"
]
},
"usealpha": {
"kind": "dist",
"domain": "finite",
"support": [
"black",
"blue",
"green",
"orange",
"red"
]
}
}
}
var colors = ['black', 'blue', 'green', 'orange', 'red'];
var observedData = [{bag: 'bag1', draw: 'blue'},
                    {bag: 'bag1', draw: 'blue'},
                    {bag: 'bag1', draw: 'black'}];

var observed = Infer({method: 'MCMC', samples: 20000}, function() {
  var makeBag = mem(function(bag) {
    var colorProbs = dirichlet(ones([colors.length, 1]));
    return Categorical({vs: colors, ps: colorProbs});
  });
  var obsFn = function(datum) { observe(makeBag(datum.bag), datum.draw); };
  mapData({data: observedData}, obsFn);
  return sample(makeBag('bag1'));
});

var usealpha = Infer({method: 'MCMC', samples: 20000}, function () {
  var makeBag = mem(function(bag) {
    var colorProbs = dirichlet(Vector([2, 3, 1, 1, 1]));
    return Categorical({vs: colors, ps: colorProbs});
  });
  return sample(makeBag('bag1'));
});
var ANSWER = (({observed: observed, usealpha: usealpha}));

colors = ['black', 'blue', 'green', 'orange', 'red']

# observed: all-ones Dirichlet prior on bag1's color distribution, conditioned on
# three draws (blue, blue, black). The model draws colorProbs ~ Dirichlet, observes
# the three categorical draws, and returns a fresh draw -> posterior predictive.
# This is a Dirichlet-Categorical posterior, so we sample the latent colorProbs via
# MCMC over the model and average the predictive categorical.

def observed_model():
    alpha = torch.ones(5)
    colorProbs = pyro.sample('colorProbs', dist.Dirichlet(alpha))
    counts = {'blue': 2, 'black': 1}
    for c, n in counts.items():
        idx = colors.index(c)
        with pyro.plate('obs_' + c, n):
            pyro.sample('d_' + c, dist.Categorical(colorProbs),
                        obs=torch.full((n,), idx, dtype=torch.long))
    return colorProbs


mcmc_obs = pyro.infer.MCMC(pyro.infer.NUTS(observed_model), num_samples=800, warmup_steps=400)
mcmc_obs.run()
probs_obs = mcmc_obs.get_samples()['colorProbs'].mean(0)
observed = {c: float(probs_obs[i]) for i, c in enumerate(colors)}

# usealpha: prior predictive under Dirichlet([2,3,1,1,1]) (black,blue,green,orange,red),
# no observations. Express the model through pyro.sample: draw colorProbs from the
# Dirichlet prior and a predictive color draw from Categorical(colorProbs), then run
# it under Importance (no conditioning -> forward sampling) and read the predictive
# site's EmpiricalMarginal.
alpha2 = torch.tensor([2.0, 3.0, 1.0, 1.0, 1.0])

def usealpha_model():
    colorProbs = pyro.sample('colorProbs_ua', dist.Dirichlet(alpha2))
    draw = pyro.sample('draw_ua', dist.Categorical(colorProbs))
    return draw

posterior_ua = pyro.infer.Importance(usealpha_model, num_samples=20000).run()
marg_ua = pyro.infer.EmpiricalMarginal(posterior_ua, sites='draw_ua')
ua_samps = torch.stack([marg_ua.sample() for _ in range(20000)])
ua_counts = torch.zeros(5)
for i in range(5):
    ua_counts[i] = (ua_samps == i).sum()
probs_ua = ua_counts / ua_counts.sum()
usealpha = {c: float(probs_ua[i]) for i, c in enumerate(colors)}

ANSWER = {'observed': observed, 'usealpha': usealpha}
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0274 (record) |
| solver re-derivation | accept | 2/2 solvers · d=[0.021, 0.033] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0112 ≤ tol 0.0548 · floors 0.0211/0.0274 |
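Both fields of this item have a closed form, because a Dirichlet prior is conjugate to categorical draws: the predictive probability of a color is its prior pseudo-count plus its observed count, divided by the total. The sketch below is an illustrative cross-check, not part of the benchmark programs; the helper name `predictive` is hypothetical.

```python
from fractions import Fraction

COLORS = ['black', 'blue', 'green', 'orange', 'red']

def predictive(alpha, counts):
    """Predictive color probabilities of a Dirichlet-Categorical model.

    alpha:  prior pseudo-counts, one per color
    counts: observed draws per color (all zeros gives the prior predictive)
    """
    post = [Fraction(a) + n for a, n in zip(alpha, counts)]
    total = sum(post)
    return {c: p / total for c, p in zip(COLORS, post)}

# 'observed': all-ones prior, then one black draw and two blue draws.
observed = predictive([1, 1, 1, 1, 1], [1, 2, 0, 0, 0])
# 'usealpha': Dirichlet([2, 3, 1, 1, 1]) prior with no observations.
usealpha = predictive([2, 3, 1, 1, 1], [0, 0, 0, 0, 0])

for c in COLORS:
    print(c, float(observed[c]), float(usealpha[c]))
```

The two fields coincide exactly, since updating the all-ones prior with the three draws produces the pseudo-counts [2, 3, 1, 1, 1].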
Each apple in a barrel is independently rotten with probability p, where p is drawn from Beta(a=0.1, b=0.2).
Each barrel has its own rottenness probability drawn once from a Beta(0.1, 0.2) prior. Given that probability, each apple in the barrel is independently rotten or fresh.
The marginal distribution over the total number of rotten apples in a barrel of 10 apples, integrating over the barrel's rottenness probability. Use forward sampling.
answer spec
{
"kind": "dist",
"domain": "int"
}
var makeBarrel = mem(function(barrelName) {
  var pRotten = beta({a: .1, b: .2});
  var barrel = function(n) {
    return repeat(n, function() { flip(pRotten) });
  };
  return barrel;
});
var ANSWER = (Infer({method: 'forward'}, function() {
  var barrel = makeBarrel('barrel');
  return Math.sum(barrel(10));
}));
import pyro
import pyro.infer
import pyro.distributions as dist

def model():
    p_rotten = pyro.sample("pRotten", dist.Beta(0.1, 0.2))
    apples = pyro.sample("apples", dist.Bernoulli(p_rotten.expand([10])).to_event(1))
    total = apples.sum()
    return pyro.deterministic("total", total)

# Forward sampling (no conditioning): draw from the prior and aggregate.
post = pyro.infer.Importance(model, num_samples=4000).run()
ANSWER = pyro.infer.EmpiricalMarginal(post, "total")
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.2100 (w1) |
| solver re-derivation | accept | 2/2 solvers · d=[0.194, 0.194] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.1792 ≤ tol 0.4200 · floors 0.1517/0.2100 |
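Integrating a Beta(a, b) rottenness probability out of 10 independent apples gives a Beta-Binomial count, so the forward-sampled answer for the single-barrel item can be compared against an exact pmf. The following is a minimal sketch under that observation; `beta_binomial_pmf` is a hypothetical helper written for illustration.

```python
from math import lgamma, exp

def beta_binomial_pmf(k, n, a, b):
    """P(k successes in n trials) when the success probability is Beta(a, b)."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    log_num = lgamma(k + a) + lgamma(n - k + b) - lgamma(n + a + b)
    log_den = lgamma(a) + lgamma(b) - lgamma(a + b)
    return exp(log_choose + log_num - log_den)

# Exact distribution of the rotten count in a 10-apple barrel, p ~ Beta(0.1, 0.2).
barrel_pmf = [beta_binomial_pmf(k, 10, 0.1, 0.2) for k in range(11)]
barrel_mean = sum(k * p for k, p in enumerate(barrel_pmf))

for k, p in enumerate(barrel_pmf):
    print(k, round(p, 4))
```

Because both Beta parameters are below 1, most of the mass sits at the extremes (0 or 10 rotten apples), which is why a sampled estimate of this distribution is noisy in the middle counts.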
Each store independently draws its type from a 50/50 mixture: fresh stores use Beta(a=0.1, b=0.3) for apple rottenness probability; rotten stores use Beta(a=0.3, b=0.1). All barrels from the same store share that store's type. Each barrel within a store draws its own rottenness probability from the store's Beta. Each apple in a barrel is independently rotten or fresh with the barrel's rottenness probability.
A two-level hierarchy: a store's type is drawn once from a 50/50 prior, determining the Beta distribution from which each of the store's barrels draws its rottenness probability. Given the barrel's probability, each apple is independently rotten or fresh.
Two distributions over the absolute difference in rotten count between two barrels of 10 apples each, estimated with 10000 forward samples each: one where both barrels come from the same store, and one where the barrels come from two different stores.
answer spec
{
"kind": "record",
"fields": {
"sameStore": {
"kind": "dist",
"domain": "int"
},
"differentStore": {
"kind": "dist",
"domain": "int"
}
}
}
var makeStore = mem(function(storeName) {
  var storePrior = flip() ? {a: .1, b: .3} : {a: .3, b: .1};
  var makeBarrel = mem(function(barrelName) {
    var pRotten = beta(storePrior);
    var barrel = function(n) {
      return repeat(n, function() { flip(pRotten) });
    };
    return barrel;
  });
  return makeBarrel;
});
var ANSWER = (({
  sameStore: Infer({method: 'forward', samples: 10000}, function() {
    var S = makeStore('S');
    var B1 = S('B1');
    var B2 = S('B2');
    return Math.abs(Math.sum(B1(10)) - Math.sum(B2(10)));
  }),
  differentStore: Infer({method: 'forward', samples: 10000}, function() {
    var S1 = makeStore('S1');
    var S2 = makeStore('S2');
    var B1 = S1('B1');
    var B2 = S2('B2');
    return Math.abs(Math.sum(B1(10)) - Math.sum(B2(10)));
  })
}));
from collections import Counter

import torch
import pyro
import pyro.distributions as dist

# Two store types, 50/50: fresh -> Beta(0.1,0.3); rotten -> Beta(0.3,0.1).
# All barrels in a store share the store type; each barrel draws its own
# pRotten from the store's Beta; each of 10 apples is Bernoulli(pRotten).
# Query: forward-sampled distribution of |sum(B1) - sum(B2)|, 10000 samples,
# (a) both barrels from the SAME store, (b) barrels from DIFFERENT stores.
# No conditioning -> pure forward simulation of the generative model.

n_samples = 10000

def barrel_count(name, a, b):
    # one barrel: draw pRotten ~ Beta(a,b), then 10 apples ~ Bernoulli(pRotten)
    p_rotten = pyro.sample(f"{name}_p", dist.Beta(torch.tensor(a), torch.tensor(b)))
    apples = pyro.sample(f"{name}_apples",
                         dist.Bernoulli(p_rotten).expand([10]).to_event(1))
    return int(apples.sum().item())

def same_store_model():
    # both barrels share one store's type
    fresh = pyro.sample("store_fresh", dist.Bernoulli(0.5))
    a, b = (0.1, 0.3) if fresh.item() > 0.5 else (0.3, 0.1)
    c1 = barrel_count("B1", a, b)
    c2 = barrel_count("B2", a, b)
    return abs(c1 - c2)

def different_store_model():
    # each barrel comes from its own store, with an independently drawn type
    fresh1 = pyro.sample("store1_fresh", dist.Bernoulli(0.5))
    a1, b1 = (0.1, 0.3) if fresh1.item() > 0.5 else (0.3, 0.1)
    fresh2 = pyro.sample("store2_fresh", dist.Bernoulli(0.5))
    a2, b2 = (0.1, 0.3) if fresh2.item() > 0.5 else (0.3, 0.1)
    c1 = barrel_count("B1", a1, b1)
    c2 = barrel_count("B2", a2, b2)
    return abs(c1 - c2)

def forward_dist(model_fn):
    # run the model n_samples times and normalize the counts of its return value
    counts = Counter()
    for _ in range(n_samples):
        val = model_fn()
        counts[int(val)] += 1
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

ANSWER = {
    "sameStore": forward_dist(same_store_model),
    "differentStore": forward_dist(different_store_model),
}
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0705 (record) |
| solver re-derivation | accept | 2/2 solvers · d=[0.033, 0.084] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0402 ≤ tol 0.2920 · floors 0.1460/0.0705 |
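The store item can also be worked out exactly: given a store type, each barrel's count is Beta-Binomial, and the two barrels are independent once the types are fixed. The sketch below enumerates the type combinations instead of sampling; it is an illustrative cross-check, and the helper names (`beta_binomial_pmf`, `abs_diff_pmf`, `mix`) are hypothetical.

```python
from math import lgamma, exp

def beta_binomial_pmf(k, n, a, b):
    """P(k successes in n trials) when the success probability is Beta(a, b)."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    log_num = lgamma(k + a) + lgamma(n - k + b) - lgamma(n + a + b)
    log_den = lgamma(a) + lgamma(b) - lgamma(a + b)
    return exp(log_choose + log_num - log_den)

N = 10
FRESH = [beta_binomial_pmf(k, N, 0.1, 0.3) for k in range(N + 1)]
ROTTEN = [beta_binomial_pmf(k, N, 0.3, 0.1) for k in range(N + 1)]

def abs_diff_pmf(p1, p2):
    """Distribution of |X1 - X2| for independent counts X1 ~ p1, X2 ~ p2."""
    out = [0.0] * (N + 1)
    for i, pi in enumerate(p1):
        for j, pj in enumerate(p2):
            out[abs(i - j)] += pi * pj
    return out

def mix(weighted):
    """Weighted average of pmfs given as (weight, pmf) pairs."""
    return [sum(w * pmf[d] for w, pmf in weighted) for d in range(N + 1)]

ff = abs_diff_pmf(FRESH, FRESH)
rr = abs_diff_pmf(ROTTEN, ROTTEN)
fr = abs_diff_pmf(FRESH, ROTTEN)

# Same store: one shared type. Different stores: two independent 50/50 types.
same_store = mix([(0.5, ff), (0.5, rr)])
different_store = mix([(0.25, ff), (0.25, rr), (0.5, fr)])

def mean(pmf):
    return sum(d * p for d, p in enumerate(pmf))

print(mean(same_store), mean(different_store))
```

Barrels from different stores differ more on average, because half of the time the two stores have opposite types.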
A three-level hierarchy of cities, stores, and barrels. Each city has a probability p_city drawn from Beta(a=0.25, b=0.25) that a store is the fresh type. A fresh-type store uses Beta(a=0.1, b=0.3) for apple rottenness probability; a rotten-type store uses Beta(a=0.3, b=0.1). Each barrel within a store draws its own rottenness probability from the store's Beta. Each apple in a barrel is independently rotten or fresh with the barrel's rottenness probability.
A three-level Bayesian hierarchy: city-level type probability determines the store's type, which determines the distribution from which each barrel's rottenness probability is drawn, which determines whether each apple is rotten.
The marginal distribution over the total number of rotten apples in a 20-apple barrel drawn from one store in one city, integrating over all three levels of the hierarchy. Use forward sampling.
answer spec
{
"kind": "dist",
"domain": "int"
}
var makeCity = mem(function(cityName){
  var cityPrior = beta({a: .25, b: .25});
  var makeStore = mem(function(storeName) {
    var storePrior = flip(cityPrior) ? {a: .1, b: .3} : {a: .3, b: .1};
    var makeBarrel = mem(function(barrelName) {
      var pRotten = beta(storePrior);
      var barrel = function(n) {
        return repeat(n, function() { flip(pRotten) });
      };
      return barrel;
    });
    return makeBarrel;
  });
  return makeStore;
});

var ANSWER = (Infer({method: 'forward'}, function(){
  var C1 = makeCity("C1");
  var S1 = C1("S1");
  var B1 = S1("B1");
  return Math.sum(B1(20));
}));
import torch
import pyro
import pyro.infer
import pyro.distributions as dist

def model():
    city_prior = pyro.sample("cityPrior", dist.Beta(0.25, 0.25))
    is_fresh = pyro.sample("storeType", dist.Bernoulli(city_prior))
    a = torch.where(is_fresh.bool(), torch.tensor(0.1), torch.tensor(0.3))
    b = torch.where(is_fresh.bool(), torch.tensor(0.3), torch.tensor(0.1))
    p_rotten = pyro.sample("pRotten", dist.Beta(a, b))
    apples = pyro.sample("apples", dist.Bernoulli(p_rotten.expand([20])).to_event(1))
    total = apples.sum()
    return pyro.deterministic("total", total)

# Forward sampling over all three hierarchy levels (no conditioning).
post = pyro.infer.Importance(model, num_samples=4000).run()
ANSWER = pyro.infer.EmpiricalMarginal(post, "total")
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 1.4000 (w1) |
| solver re-derivation | accept | 2/2 solvers · d=[0.401, 0.431] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.4522 ≤ tol 2.8000 · floors 0.4063/1.4000 |
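For a single store in a single city, the city level integrates out in one step: the store is fresh with probability E[p_city], which is 0.5 for the symmetric Beta(0.25, 0.25). The count in a 20-apple barrel is then an equal mixture of two Beta-Binomials. The sketch below computes that mixture exactly as an illustrative cross-check on the forward-sampled answer; `beta_binomial_pmf` is a hypothetical helper.

```python
from math import lgamma, exp

def beta_binomial_pmf(k, n, a, b):
    """P(k successes in n trials) when the success probability is Beta(a, b)."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    log_num = lgamma(k + a) + lgamma(n - k + b) - lgamma(n + a + b)
    log_den = lgamma(a) + lgamma(b) - lgamma(a + b)
    return exp(log_choose + log_num - log_den)

N = 20
# P(store is fresh) = E[p_city] = a / (a + b) for p_city ~ Beta(0.25, 0.25).
p_fresh = 0.25 / (0.25 + 0.25)

total_pmf = [
    p_fresh * beta_binomial_pmf(k, N, 0.1, 0.3)
    + (1.0 - p_fresh) * beta_binomial_pmf(k, N, 0.3, 0.1)
    for k in range(N + 1)
]
total_mean = sum(k * p for k, p in enumerate(total_pmf))

for k, p in enumerate(total_pmf):
    print(k, round(p, 4))
```

The exact marginal is symmetric around 10 rotten apples and concentrated near 0 and 20, which is consistent with the comparatively large noise floor reported for this item.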
A three-level hierarchy as in the previous exercise: each city has p_city drawn from Beta(a=0.25, b=0.25); stores within a city are fresh type (Beta(a=0.1, b=0.3)) with probability p_city, else rotten type (Beta(a=0.3, b=0.1)); each barrel in a store draws its own rottenness probability from the store's Beta; apples are independently rotten given the barrel's probability. You observe a first barrel of 10 apples from one store in a city, and 7 of those apples are rotten.
Condition the three-level hierarchy on the observation. Infer the posterior over the number of rotten apples in a 10-apple barrel from a different store in the same city, given the observation from the first store's barrel.
The posterior distribution over the number of rotten apples in a 10-apple barrel from the second store, conditioned on 7 of 10 apples being rotten in the first store's barrel. Use MCMC with 5000 samples and a lag of 100.
answer spec
{
"kind": "dist",
"domain": "int"
}
var makeCity = mem(function(cityName){
  var cityPrior = beta({a: .25, b: .25});

  var makeStore = mem(function(storeName) {
    var storePrior = flip(cityPrior) ? {a: .1, b: .3} : {a: .3, b: .1};

    var makeBarrel = mem(function(barrelName) {
      var pRotten = beta(storePrior);
      var barrel = function(n) {
        return repeat(n, function() { flip(pRotten) });
      };
      return barrel;
    });

    return makeBarrel;
  });

  return makeStore;
});
var ANSWER = (Infer({method: 'MCMC', samples:5000, lag: 100}, function(){
  var C = makeCity("C");
  var S1 = C("S1");
  var B1 = S1("B1");
  var S2 = C("S2");
  var B2 = S2("B2");

  condition(Math.sum(B1(10)) == 7);

  return Math.sum(B2(10));
}));
import torch
import pyro
import pyro.infer
import pyro.distributions as dist

# Mixed discrete (store-type flips) + continuous (cityPrior, pRotten). The discrete
# flips are concrete samples under Importance (no enumeration needed); the obs1=7
# Binomial likelihood is not extreme, so Importance over the prior recovers the
# posterior. Query = posterior-predictive over the 2nd barrel's rotten count.
def store_params(flip):
    return (0.1, 0.3) if bool(flip.item()) else (0.3, 0.1)  # fresh : rotten
def model():
    cityPrior = pyro.sample("cityPrior", dist.Beta(0.25, 0.25))
    f1 = pyro.sample("f1", dist.Bernoulli(cityPrior))
    a1, b1 = store_params(f1)
    pRotten1 = pyro.sample("pRotten1", dist.Beta(a1, b1))
    pyro.sample("obs1", dist.Binomial(10, pRotten1), obs=torch.tensor(7.0))
    f2 = pyro.sample("f2", dist.Bernoulli(cityPrior))
    a2, b2 = store_params(f2)
    pRotten2 = pyro.sample("pRotten2", dist.Beta(a2, b2))
    return pyro.sample("B2", dist.Binomial(10, pRotten2))
post = pyro.infer.Importance(model, num_samples=40000).run()
ANSWER = pyro.infer.EmpiricalMarginal(post)
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 1.1702 (w1) |
| solver re-derivation | accept | 2/2 solvers · d=[0.306, 0.364] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.3331 ≤ tol 2.3404 · floors 0.1421/1.1702 |
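The conditioned item also admits an exact answer. The two store types are coupled only through p_city, so their joint prior follows from the first two moments of Beta(0.25, 0.25); the observation reweights the first store's type by the Beta-Binomial likelihood of 7 rotten apples out of 10. The sketch below carries out that enumeration as an illustrative cross-check on the MCMC answer; all helper names are hypothetical.

```python
from math import lgamma, exp

def beta_binomial_pmf(k, n, a, b):
    """P(k successes in n trials) when the success probability is Beta(a, b)."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    log_num = lgamma(k + a) + lgamma(n - k + b) - lgamma(n + a + b)
    log_den = lgamma(a) + lgamma(b) - lgamma(a + b)
    return exp(log_choose + log_num - log_den)

N = 10
PARAMS = {True: (0.1, 0.3), False: (0.3, 0.1)}  # True = fresh store type

# Moments of p_city ~ Beta(a0, b0): E[p] and E[p^2].
a0 = b0 = 0.25
mean_p = a0 / (a0 + b0)
mean_p2 = a0 * (a0 + 1) / ((a0 + b0) * (a0 + b0 + 1))

# Joint prior over (type of store 1, type of store 2), with p_city integrated out.
type_prior = {
    (True, True): mean_p2,
    (True, False): mean_p - mean_p2,
    (False, True): mean_p - mean_p2,
    (False, False): 1.0 - 2.0 * mean_p + mean_p2,
}

# Likelihood of observing 7 rotten apples in store 1's barrel, per store-1 type.
like = {f1: beta_binomial_pmf(7, N, *PARAMS[f1]) for f1 in (True, False)}

# Posterior over store 2's type, then the predictive count for its barrel.
unnorm = {f2: sum(type_prior[(f1, f2)] * like[f1] for f1 in (True, False))
          for f2 in (True, False)}
z = sum(unnorm.values())
post_f2 = {f2: w / z for f2, w in unnorm.items()}

predictive = [sum(post_f2[f2] * beta_binomial_pmf(k, N, *PARAMS[f2])
                  for f2 in (True, False)) for k in range(N + 1)]
predictive_mean = sum(k * p for k, p in enumerate(predictive))
print(post_f2, predictive_mean)
```

Seeing 7 of 10 rotten apples shifts belief only mildly toward the rotten store type, so the second barrel's predictive distribution stays close to the symmetric prior with a slight tilt toward higher counts.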
Reading-time data: 24 observations across 6 words in two groups. Vowel-initial words: abacus (ids 1,2,3 with rts 210,212,209), aardvark (ids 1,2,3 with rts 200,201,198), ellipse (ids 1,2,3 with rts 220,222,219). Consonant-initial words: proton (ids 1,2,3 with rts 190,191,189), folder (ids 1,2,3 with rts 180,182,178), fedora (three replicates: ids 1,2,3 with rts 230,231,228; then 231,233,230; then 230,232,228). Each group's mean reading time has a Gaussian(200, 100) prior. Each word's mean reading time is drawn from a Gaussian centered at its group's mean with standard deviation 20. Each observed reading time is drawn from a Gaussian centered at the word's mean with standard deviation 10.
A two-level hierarchical Gaussian model: group-level mean reading times are latent, word-level means are drawn from the group mean, and observed reading times are drawn from the word mean.
The posterior distribution over the difference in group mean reading time (vowel minus consonant), given all observations. Use MCMC with 5000 samples, a burn-in of 10000, and a lag of 5.
answer spec
{
"kind": "dist",
"domain": "real"
}
var data = [{group: "vowel", word: "abacus", id: 1, rt: 210},
            {group: "vowel", word: "abacus", id: 2, rt: 212},
            {group: "vowel", word: "abacus", id: 3, rt: 209},
            {group: "vowel", word: "aardvark", id: 1, rt: 200},
            {group: "vowel", word: "aardvark", id: 2, rt: 201},
            {group: "vowel", word: "aardvark", id: 3, rt: 198},
            {group: "vowel", word: "ellipse", id: 1, rt: 220},
            {group: "vowel", word: "ellipse", id: 2, rt: 222},
            {group: "vowel", word: "ellipse", id: 3, rt: 219},
            {group: "consonant", word: "proton", id: 1, rt: 190},
            {group: "consonant", word: "proton", id: 2, rt: 191},
            {group: "consonant", word: "proton", id: 3, rt: 189},
            {group: "consonant", word: "folder", id: 1, rt: 180},
            {group: "consonant", word: "folder", id: 2, rt: 182},
            {group: "consonant", word: "folder", id: 3, rt: 178},
            {group: "consonant", word: "fedora", id: 1, rt: 230},
            {group: "consonant", word: "fedora", id: 2, rt: 231},
            {group: "consonant", word: "fedora", id: 3, rt: 228},
            {group: "consonant", word: "fedora", id: 1, rt: 231},
            {group: "consonant", word: "fedora", id: 2, rt: 233},
            {group: "consonant", word: "fedora", id: 3, rt: 230},
            {group: "consonant", word: "fedora", id: 1, rt: 230},
            {group: "consonant", word: "fedora", id: 2, rt: 232},
            {group: "consonant", word: "fedora", id: 3, rt: 228}];

var opts = {method: "MCMC", burn: 10000, lag: 5, samples: 5000};
var ANSWER = (Infer(opts, function() {
  var groupMeans = {vowel: gaussian(200, 100),
                    consonant: gaussian(200, 100)};

  var wordMean = mem(function(word, group) {
    return gaussian(groupMeans[group], 20);
  });

  var obsFn = function(d) {
    observe(Gaussian({mu: wordMean(d.word, d.group),
                      sigma: 10}), d.rt);
  };

  mapData({data: data}, obsFn);

  return groupMeans['vowel'] - groupMeans['consonant'];
}));
import torch
import pyro
import pyro.infer
import pyro.distributions as dist

# Two-level hierarchical Gaussian. Group means ~ Normal(200,100); word means ~
# Normal(groupMean, 20); observed rt ~ Normal(wordMean, 10). All latents continuous,
# so NUTS. Query: posterior over (vowel groupMean - consonant groupMean).

data = [
    ("vowel", "abacus", 210.0), ("vowel", "abacus", 212.0), ("vowel", "abacus", 209.0),
    ("vowel", "aardvark", 200.0), ("vowel", "aardvark", 201.0), ("vowel", "aardvark", 198.0),
    ("vowel", "ellipse", 220.0), ("vowel", "ellipse", 222.0), ("vowel", "ellipse", 219.0),
    ("consonant", "proton", 190.0), ("consonant", "proton", 191.0), ("consonant", "proton", 189.0),
    ("consonant", "folder", 180.0), ("consonant", "folder", 182.0), ("consonant", "folder", 178.0),
    ("consonant", "fedora", 230.0), ("consonant", "fedora", 231.0), ("consonant", "fedora", 228.0),
    ("consonant", "fedora", 231.0), ("consonant", "fedora", 233.0), ("consonant", "fedora", 230.0),
    ("consonant", "fedora", 230.0), ("consonant", "fedora", 232.0), ("consonant", "fedora", 228.0),
]

groups = ["vowel", "consonant"]
words = sorted({(g, w) for (g, w, _) in data})

def model():
    groupMeans = {g: pyro.sample(f"group_{g}", dist.Normal(200.0, 100.0)) for g in groups}
    wordMeans = {}
    for (g, w) in words:
        wordMeans[(g, w)] = pyro.sample(f"word_{g}_{w}", dist.Normal(groupMeans[g], 20.0))
    for i, (g, w, rt) in enumerate(data):
        pyro.sample(f"obs_{i}", dist.Normal(wordMeans[(g, w)], 10.0), obs=torch.tensor(rt))

kernel = pyro.infer.NUTS(model)
mcmc = pyro.infer.MCMC(kernel, num_samples=2000, warmup_steps=1000)
mcmc.run()
s = mcmc.get_samples()
ANSWER = (s["group_vowel"] - s["group_consonant"]).tolist()
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 2.4346 (w1) |
| solver re-derivation | accept | 2/2 solvers · d=[1.500, 1.500] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=1.1237 ≤ tol 4.8692 · floors 0.9237/2.4346 |
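With all standard deviations fixed, the two-level reading-time model is linear-Gaussian, so the posterior over each group mean is itself Gaussian. Marginalizing a word mean leaves that word's sample average distributed around the group mean with variance 20² + 10²/n, where n is the number of observations of the word. The sketch below applies the standard precision-weighted update; it is an illustrative cross-check, and `group_posterior` is a hypothetical helper.

```python
from math import sqrt

RTS = {
    "vowel": {
        "abacus": [210, 212, 209],
        "aardvark": [200, 201, 198],
        "ellipse": [220, 222, 219],
    },
    "consonant": {
        "proton": [190, 191, 189],
        "folder": [180, 182, 178],
        "fedora": [230, 231, 228, 231, 233, 230, 230, 232, 228],
    },
}
PRIOR_MU, PRIOR_SD, WORD_SD, OBS_SD = 200.0, 100.0, 20.0, 10.0

def group_posterior(words):
    """Posterior mean and variance of one group mean, word means integrated out."""
    precision = 1.0 / PRIOR_SD ** 2
    weighted_sum = PRIOR_MU / PRIOR_SD ** 2
    for rts in words.values():
        # Variance of the word's sample average around the group mean.
        var = WORD_SD ** 2 + OBS_SD ** 2 / len(rts)
        precision += 1.0 / var
        weighted_sum += (sum(rts) / len(rts)) / var
    return weighted_sum / precision, 1.0 / precision

mean_v, var_v = group_posterior(RTS["vowel"])
mean_c, var_c = group_posterior(RTS["consonant"])

# The groups share no parameters, so their posteriors are independent.
diff_mean = mean_v - mean_c
diff_sd = sqrt(var_v + var_c)
print(diff_mean, diff_sd)
```

Under these assumptions the difference is Gaussian with mean near 9.3 ms and standard deviation near 16.8 ms, so the data do not clearly separate the two groups.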
Reading-time experiment with 24 observations across two word-onset groups. Each observation has fields: group ("vowel" or "consonant"), word, id (participant 1, 2, or 3), and rt (reading time in ms). The 24 rows are: {group: "vowel", word: "abacus", id: 1, rt: 210}, {group: "vowel", word: "abacus", id: 2, rt: 212}, {group: "vowel", word: "abacus", id: 3, rt: 209}, {group: "vowel", word: "aardvark", id: 1, rt: 200}, {group: "vowel", word: "aardvark", id: 2, rt: 201}, {group: "vowel", word: "aardvark", id: 3, rt: 198}, {group: "vowel", word: "ellipse", id: 1, rt: 220}, {group: "vowel", word: "ellipse", id: 2, rt: 222}, {group: "vowel", word: "ellipse", id: 3, rt: 219}, {group: "consonant", word: "proton", id: 1, rt: 190}, {group: "consonant", word: "proton", id: 2, rt: 191}, {group: "consonant", word: "proton", id: 3, rt: 189}, {group: "consonant", word: "folder", id: 1, rt: 180}, {group: "consonant", word: "folder", id: 2, rt: 182}, {group: "consonant", word: "folder", id: 3, rt: 178}, {group: "consonant", word: "fedora", id: 1, rt: 230}, {group: "consonant", word: "fedora", id: 2, rt: 231}, {group: "consonant", word: "fedora", id: 3, rt: 228}, {group: "consonant", word: "fedora", id: 1, rt: 231}, {group: "consonant", word: "fedora", id: 2, rt: 233}, {group: "consonant", word: "fedora", id: 3, rt: 230}, {group: "consonant", word: "fedora", id: 1, rt: 230}, {group: "consonant", word: "fedora", id: 2, rt: 232}, {group: "consonant", word: "fedora", id: 3, rt: 228}. Priors: each group's mean reading time has a Gaussian(200, 100) prior. Each word's mean is drawn from a Gaussian centered at the group mean with sd=20. Each participant has an additive offset drawn from Gaussian(0, 2). Observed reading times are drawn from a Gaussian centered at the word mean plus participant offset with sd=10.
A three-level hierarchical Gaussian model: group-level means, word-level means centered on the group mean, and participant-level additive offsets. Each observed reading time is drawn from a Gaussian at the sum of the word mean and participant offset.
The posterior marginal distribution over the group mean difference (vowel minus consonant), integrating over word means and participant offsets. Use MCMC with 5000 samples, a burn-in of 10000, and a lag of 5.
answer spec
{
"kind": "dist",
"domain": "real"
}
var data = [{group: "vowel", word: "abacus", id: 1, rt: 210},
            {group: "vowel", word: "abacus", id: 2, rt: 212},
            {group: "vowel", word: "abacus", id: 3, rt: 209},
            {group: "vowel", word: "aardvark", id: 1, rt: 200},
            {group: "vowel", word: "aardvark", id: 2, rt: 201},
            {group: "vowel", word: "aardvark", id: 3, rt: 198},
            {group: "vowel", word: "ellipse", id: 1, rt: 220},
            {group: "vowel", word: "ellipse", id: 2, rt: 222},
            {group: "vowel", word: "ellipse", id: 3, rt: 219},
            {group: "consonant", word: "proton", id: 1, rt: 190},
            {group: "consonant", word: "proton", id: 2, rt: 191},
            {group: "consonant", word: "proton", id: 3, rt: 189},
            {group: "consonant", word: "folder", id: 1, rt: 180},
            {group: "consonant", word: "folder", id: 2, rt: 182},
            {group: "consonant", word: "folder", id: 3, rt: 178},
            {group: "consonant", word: "fedora", id: 1, rt: 230},
            {group: "consonant", word: "fedora", id: 2, rt: 231},
            {group: "consonant", word: "fedora", id: 3, rt: 228},
            {group: "consonant", word: "fedora", id: 1, rt: 231},
            {group: "consonant", word: "fedora", id: 2, rt: 233},
            {group: "consonant", word: "fedora", id: 3, rt: 230},
            {group: "consonant", word: "fedora", id: 1, rt: 230},
            {group: "consonant", word: "fedora", id: 2, rt: 232},
            {group: "consonant", word: "fedora", id: 3, rt: 228}];

var opts = {method: "MCMC", burn: 10000, lag: 5, samples: 5000};
var joint = Infer(opts, function() {
  var groupMeans = {vowel: gaussian(200, 100),
                    consonant: gaussian(200, 100)};

  var participantMean = mem(function(pid) {
    return gaussian(0, 2);
  });

  var wordMean = mem(function(word, group) {
    return gaussian(groupMeans[group], 20);
  });

  var obsFn = function(d) {
    observe(Gaussian({mu: wordMean(d.word, d.group) + participantMean(d.id),
                      sigma: 10}), d.rt);
  };

  mapData({data: data}, obsFn);

  return {diff: groupMeans['vowel'] - groupMeans['consonant'],
          p1: participantMean(1),
          p2: participantMean(2),
          p3: participantMean(3)};
});
var ANSWER = marginalize(joint, function(x) { return x.diff; });
import torch
import pyro
import pyro.infer
import pyro.distributions as dist

# Three-level hierarchical Gaussian: group means ~ Normal(200,100); word means ~
# Normal(groupMean, 20); participant offsets ~ Normal(0, 2); observed rt ~
# Normal(wordMean + participantOffset, 10). All latents continuous -> NUTS.
# Query: posterior marginal over (vowel groupMean - consonant groupMean).

data = [
    ("vowel", "abacus", 1, 210.0), ("vowel", "abacus", 2, 212.0), ("vowel", "abacus", 3, 209.0),
    ("vowel", "aardvark", 1, 200.0), ("vowel", "aardvark", 2, 201.0), ("vowel", "aardvark", 3, 198.0),
    ("vowel", "ellipse", 1, 220.0), ("vowel", "ellipse", 2, 222.0), ("vowel", "ellipse", 3, 219.0),
    ("consonant", "proton", 1, 190.0), ("consonant", "proton", 2, 191.0), ("consonant", "proton", 3, 189.0),
    ("consonant", "folder", 1, 180.0), ("consonant", "folder", 2, 182.0), ("consonant", "folder", 3, 178.0),
    ("consonant", "fedora", 1, 230.0), ("consonant", "fedora", 2, 231.0), ("consonant", "fedora", 3, 228.0),
    ("consonant", "fedora", 1, 231.0), ("consonant", "fedora", 2, 233.0), ("consonant", "fedora", 3, 230.0),
    ("consonant", "fedora", 1, 230.0), ("consonant", "fedora", 2, 232.0), ("consonant", "fedora", 3, 228.0),
]

groups = ["vowel", "consonant"]
words = sorted({(g, w) for (g, w, _, _) in data})
pids = sorted({p for (_, _, p, _) in data})

def model():
    groupMeans = {g: pyro.sample(f"group_{g}", dist.Normal(200.0, 100.0)) for g in groups}
    participant = {p: pyro.sample(f"part_{p}", dist.Normal(0.0, 2.0)) for p in pids}
    wordMeans = {(g, w): pyro.sample(f"word_{g}_{w}", dist.Normal(groupMeans[g], 20.0))
                 for (g, w) in words}
    for i, (g, w, p, rt) in enumerate(data):
        mu = wordMeans[(g, w)] + participant[p]
        pyro.sample(f"obs_{i}", dist.Normal(mu, 10.0), obs=torch.tensor(rt))

kernel = pyro.infer.NUTS(model)
mcmc = pyro.infer.MCMC(kernel, num_samples=2000, warmup_steps=1000)
mcmc.run()
s = mcmc.get_samples()
ANSWER = (s["group_vowel"] - s["group_consonant"]).tolist()
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 3.7376 (w1) |
| solver re-derivation | accept | 2/2 solvers · d=[2.061, 2.061] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=1.4079 ≤ tol 7.4751 · floors 0.7845/3.7376 |
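The three-level model is still linear-Gaussian, but the participant offsets couple the two groups, so a per-group formula no longer applies directly. One way to get an exact reference is to write every prior and likelihood term as a Gaussian factor on a linear combination of the 11 latents, accumulate the joint precision matrix and information vector, and solve one linear system for the queried difference. The sketch below does this in plain Python; it is an illustrative cross-check rather than the benchmark's method, and `add_factor` and `solve` are hypothetical helpers.

```python
from math import sqrt

ROWS = {  # participant ids cycle 1, 2, 3 within each word's list
    ("vowel", "abacus"): [210, 212, 209],
    ("vowel", "aardvark"): [200, 201, 198],
    ("vowel", "ellipse"): [220, 222, 219],
    ("consonant", "proton"): [190, 191, 189],
    ("consonant", "folder"): [180, 182, 178],
    ("consonant", "fedora"): [230, 231, 228, 231, 233, 230, 230, 232, 228],
}
groups, pids = ["vowel", "consonant"], [1, 2, 3]
names = ([("group", g) for g in groups] + [("word", w) for _, w in ROWS]
         + [("part", p) for p in pids])
index = {name: i for i, name in enumerate(names)}
n = len(names)
Lam = [[0.0] * n for _ in range(n)]  # joint posterior precision matrix
h = [0.0] * n                        # information vector (precision times mean)

def add_factor(coefs, center, sd):
    """Add the factor Normal(sum_i coefs[i] * z_i; center, sd) in information form."""
    for i, ci in coefs.items():
        h[i] += ci * center / sd ** 2
        for j, cj in coefs.items():
            Lam[i][j] += ci * cj / sd ** 2

for g in groups:
    add_factor({index[("group", g)]: 1.0}, 200.0, 100.0)
for p in pids:
    add_factor({index[("part", p)]: 1.0}, 0.0, 2.0)
for (g, w), rts in ROWS.items():
    add_factor({index[("word", w)]: 1.0, index[("group", g)]: -1.0}, 0.0, 20.0)
    for pos, rt in enumerate(rts):
        add_factor({index[("word", w)]: 1.0, index[("part", pos % 3 + 1)]: 1.0}, rt, 10.0)

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    m = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(m):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, m + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][m] / M[i][i] for i in range(m)]

# Contrast vector d picks out (vowel group mean) - (consonant group mean).
d = [0.0] * n
d[index[("group", "vowel")]] = 1.0
d[index[("group", "consonant")]] = -1.0
x = solve(Lam, d)  # x = Lam^{-1} d
diff_mean = sum(xi * hi for xi, hi in zip(x, h))      # d^T Lam^{-1} h
diff_sd = sqrt(sum(xi * di for xi, di in zip(x, d)))  # sqrt(d^T Lam^{-1} d)
print(diff_mean, diff_sd)
```

Because every word is read by all three participants equally often, the offsets largely cancel in the group contrast, and the result should stay close to the two-level model's posterior for the same data.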
Heart-shaped implicit curve: a point (x, y) lies on the curve if |x^2 + (y - x^(2/3))^2 - 1| < 0.01. Priors: x ~ Gaussian(0, 1) and y ~ Gaussian(0.3, 1.3), where the means and standard deviations are the midpoints and half-widths of the bounding boxes [-1, 1] for x and [-1, 1.6] for y.
Draw x and y independently from their respective Gaussian priors and condition on the point lying on the heart curve.
The marginal posterior distribution over x, obtained by running MCMC with 10000 samples and lag 10 on the joint model and then marginalizing to x.
answer spec
{
"kind": "dist",
"domain": "real"
}
var onCurve = function(x, y) {
  var x2 = x*x;
  var term1 = y - Math.pow(x2, 1/3);
  var crossSection = x2 + term1*term1 - 1;
  return Math.abs(crossSection) < 0.01;
};
var xbounds = [-1, 1];
var ybounds = [-1, 1.6];

var xmu = 0.5 * (xbounds[0] + xbounds[1]);
var ymu = 0.5 * (ybounds[0] + ybounds[1]);
var xsigma = 0.5 * (xbounds[1] - xbounds[0]);
var ysigma = 0.5 * (ybounds[1] - ybounds[0]);

var model = function() {
  var x = gaussian(xmu, xsigma);
  var y = gaussian(ymu, ysigma);
  condition(onCurve(x, y));
  return {x: x, y: y};
};
var posterior = Infer({method: 'MCMC',
                       samples: 10000,
                       lag: 10}, model);
var ANSWER = marginalize(posterior, "x");
The query pins MCMC/MH (10000 samples, lag 10). Because it fixes the inference method and settings, the target is sampler-specific rather than a determinate posterior, which places it outside the determination criterion. The model is also gradient-hostile in Pyro: x^(2/3) has a singular gradient at x = 0, and the hard band |crossSection| < 0.01 offers no valid initialization for gradient-based samplers, so NUTS/HMC cannot run. A faithful gradient-free RandomWalkKernel (the counterpart of WebPPL's MH) mixes pathologically slowly across the cusp between the two symmetric lobes and yields no stable ground truth (GT) at practical cost.
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.3059 (w1) |
| solver re-derivation | accept | 2/2 solvers · d=[0.125, 0.256] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | unavailable | Query pins MCMC/MH (10000 samples, lag 10), so the target is sampler-specific; the model is also gradient-hostile in Pyro. Full rationale in the note above this table. |
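The hard condition in the heart-curve model can be illustrated without MCMC. The sketch below ports `onCurve` to Python and draws from the same Gaussian priors, keeping only points inside the band. It deliberately swaps the pinned MH sampler for plain rejection sampling, so it shows what the posterior support looks like rather than reproducing the sampler-specific answer; the function names and the seed are assumptions made for illustration.

```python
import random

def on_curve(x, y):
    """Python port of the WebPPL onCurve predicate for the heart-shaped curve."""
    x2 = x * x
    term1 = y - x2 ** (1.0 / 3.0)
    return abs(x2 + term1 * term1 - 1.0) < 0.01

def rejection_sample(n_accept, seed=0):
    """Draw (x, y) from the Gaussian priors until n_accept points lie on the curve."""
    rng = random.Random(seed)
    accepted, tries = [], 0
    while len(accepted) < n_accept:
        tries += 1
        x = rng.gauss(0.0, 1.0)   # x ~ Gaussian(0, 1)
        y = rng.gauss(0.3, 1.3)   # y ~ Gaussian(0.3, 1.3)
        if on_curve(x, y):
            accepted.append((x, y))
    return accepted, tries

samples, tries = rejection_sample(200)
xs = [x for x, _ in samples]
print("acceptance rate:", len(samples) / tries)
print("x range:", min(xs), max(xs))
```

Only a small fraction of prior draws fall inside the band, which is the same thinness that makes a random-walk sampler slow to move around the curve.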
A point (x, y) lies on a heart-shaped curve if x² + (y − x^(2/3))² − 1 is within 0.01 of zero. The x-coordinate ranges over [−1, 1] and the y-coordinate over [−1, 1.6]. The proposal distribution draws x and y jointly from a two-dimensional Gaussian with mean at the center of the bounding box (xmu = 0, ymu = 0.3) and standard deviations equal to half the bounding-box width (xsigma = 1, ysigma = 1.3).
Draw x and y jointly from the two-dimensional Gaussian described in 'given', then condition on the point being on the curve.
The marginal posterior distributions of x and y separately, each as a real-valued distribution, obtained by running MH-MCMC for 1000 samples with a lag of 100.
answer spec
{
"kind": "record",
"fields": {
"x": {
"kind": "dist",
"domain": "real"
},
"y": {
"kind": "dist",
"domain": "real"
}
}
}
var onCurve = function(x, y) {
  var x2 = x*x;
  var term1 = y - Math.pow(x2, 1/3);
  var crossSection = x2 + term1*term1 - 1;
  return Math.abs(crossSection) < 0.01;
};
var xbounds = [-1, 1];
var ybounds = [-1, 1.6];

var xmu = 0.5 * (xbounds[0] + xbounds[1]);
var ymu = 0.5 * (ybounds[0] + ybounds[1]);
var xsigma = 0.5 * (xbounds[1] - xbounds[0]);
var ysigma = 0.5 * (ybounds[1] - ybounds[0]);

var model = function() {
  var xy = diagCovGaussian({mu: Vector([xmu, ymu]),
                            sigma: Vector([xsigma, ysigma])});
  var x = T.get(xy, 0);
  var y = T.get(xy, 1);
  condition(onCurve(x, y));
  return {x: x, y: y};
};
var posterior = Infer({method: 'MCMC',
                       samples: 1000,
                       lag: 100}, model);
var ANSWER = {
  x: marginalize(posterior, function(p) { return p.x; }),
  y: marginalize(posterior, function(p) { return p.y; })
};
The query pins MH-MCMC (1000 samples, lag 100, joint proposal). Because it fixes the inference method and settings, the target is sampler-specific rather than a determinate posterior, which places it outside the determination criterion. The model is also gradient-hostile in Pyro: x^(2/3) has a singular gradient at x = 0, and the hard band |crossSection| < 0.01 offers no valid initialization for gradient-based samplers, so NUTS/HMC cannot run. A faithful gradient-free RandomWalkKernel (the counterpart of WebPPL's MH) mixes pathologically slowly across the cusp between the two symmetric lobes and yields no stable ground truth (GT) at practical cost.
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.1693 (record) |
| solver re-derivation | accept | 2/2 solvers · d=[0.260, 0.260] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | unavailable | Query pins MH-MCMC (1000 samples, lag 100, joint proposal), so the target is sampler-specific; the model is also gradient-hostile in Pyro. Full rationale in the note above this table. |
A point (x, y) lies on a heart-shaped curve if x² + (y − x^(2/3))² − 1 is within 0.01 of zero. The x-coordinate ranges over [−1, 1] and the y-coordinate over [−1, 1.6]. Each of x and y is drawn independently from a Gaussian with mean at the center of its bounding-box range and standard deviation equal to half the range (xmu = 0, ymu = 0.3, xsigma = 1, ysigma = 1.3).
Draw x and y independently from their respective Gaussians described in 'given', then condition on the point being on the curve.
The marginal posterior distribution over y as a real-valued distribution, obtained by running HMC-MCMC for 10000 samples using a leapfrog kernel with 10 steps and step size 0.5.
answer spec
{
"kind": "dist",
"domain": "real"
}
var onCurve = function(x, y) {
  var x2 = x*x;
  var term1 = y - Math.pow(x2, 1/3);
  var crossSection = x2 + term1*term1 - 1;
  return Math.abs(crossSection) < 0.01;
};
var xbounds = [-1, 1];
var ybounds = [-1, 1.6];

var xmu = 0.5 * (xbounds[0] + xbounds[1]);
var ymu = 0.5 * (ybounds[0] + ybounds[1]);
var xsigma = 0.5 * (xbounds[1] - xbounds[0]);
var ysigma = 0.5 * (ybounds[1] - ybounds[0]);

var model = function() {
  var x = gaussian(xmu, xsigma);
  var y = gaussian(ymu, ysigma);
  condition(onCurve(x, y));
  return {x: x, y: y};
};
var posterior = Infer({method: 'MCMC',
                       samples: 10000,
                       kernel: {HMC : { steps: 10, stepSize: .5 }} }, model);
var ANSWER = marginalize(posterior, function(p) { return p.y; });
The query pins HMC (10 leapfrog steps, step size 0.5), which cannot work on this non-differentiable target. Because the query fixes the inference method and settings, the target is sampler-specific rather than a determinate posterior, which places it outside the determination criterion. The model is also gradient-hostile in Pyro: x^(2/3) has a singular gradient at x = 0, and the hard band |crossSection| < 0.01 offers no valid initialization for gradient-based samplers, so NUTS/HMC cannot run. A faithful gradient-free RandomWalkKernel (the counterpart of WebPPL's MH) mixes pathologically slowly across the cusp between the two symmetric lobes and yields no stable ground truth (GT) at practical cost.
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.2406 (w1) |
| solver re-derivation | accept | 2/2 solvers · d=[0.223, 0.223] · claude-opus-4-8 |
| cross-language (pyro vs webppl) | unavailable | Query pins HMC (10 leapfrog steps, step size 0.5), which cannot work on this non-differentiable target; the model is also gradient-hostile in Pyro. Full rationale in the note above this table. |
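The "singular gradient at x = 0" remark can be checked numerically. The slope of x^(2/3) behaves like (2/3) x^(-1/3), which grows without bound as x approaches 0. The sketch below measures one-sided finite-difference slopes at shrinking step sizes; it is a small illustration with hypothetical helper names, not a statement about how any particular sampler computes gradients.

```python
def cube_root_of_square(x):
    """x^(2/3), computed as (x*x)^(1/3) so that it is defined for negative x."""
    return (x * x) ** (1.0 / 3.0)

def forward_slope(h):
    """One-sided finite-difference slope of x^(2/3) at x = 0 with step h."""
    return (cube_root_of_square(h) - cube_root_of_square(0.0)) / h

# Shrinking the step makes the measured slope larger instead of converging.
steps = [10.0 ** -k for k in (2, 4, 6, 8)]
slopes = [forward_slope(h) for h in steps]
for h, s in zip(steps, slopes):
    print(h, s)
```

The slopes scale like h^(-1/3), so no finite gradient exists at the cusp, and a leapfrog integrator that passes near x = 0 sees arbitrarily large forces from this term.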
point1 is fixed at −10. point2 is drawn uniformly from [−100, 100]. interpolationWeight is drawn uniformly from [0, 1]. The interpolated value pointInMiddle = point1 × interpolationWeight + point2 × (1 − interpolationWeight) must satisfy |pointInMiddle| < 0.01.
Draw point2 and interpolationWeight from their priors; compute pointInMiddle as the weighted interpolation of point1 and point2; condition hard on |pointInMiddle| < 0.01.
The marginal posterior distribution over interpolationWeight as a real-valued distribution, using rejection sampling with 1000 samples.
answer spec
{
"kind": "dist",
"domain": "real"
}
var interpolate = function(point1, point2, interpolationWeight) {
  return (point1 * interpolationWeight +
          point2 * (1 - interpolationWeight));
};

var model = function(){
  var point1 = -10;
  var point2 = uniform(-100, 100);
  var interpolationWeight = uniform(0, 1);
  var pointInMiddle = interpolate(point1, point2, interpolationWeight);
  condition(Math.abs(pointInMiddle) < 0.01);
  return {point2: point2, interpolationWeight: interpolationWeight, pointInMiddle: pointInMiddle};
};
var posterior = Infer({method: 'rejection', samples: 1000}, model);
var ANSWER = marginalize(posterior, function(x) { return x.interpolationWeight; });
Method-pinned (query: rejection sampling, 1000 samples) on a thin acceptance band |pointInMiddle| < 0.01 with prior acceptance ~2e-4. The posterior over interpolationWeight is determinate, but it cannot be certified in Pyro at practical cost. Plain Importance targets the right posterior, yet at feasible sample counts it collects too few accepted draws to be well-posed, and it times out at the counts needed to bring the noise floor under the discriminability cap. A guided proposal would need fragile hand-tuning against the bounded point2 prior. This item belongs to the same inference-algorithms method-demo family as ex1.1/1.2/1.3.
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0221 (w1) |
| solver re-derivation | accept | 2/2 solvers · d=[0.010, 0.010] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | unavailable | Method-pinned (query: rejection sampling, 1000 samples) on a thin acceptance band |pointInMiddle|<0.01 with prior acceptance ~2e-4. The posterior over interpolationWeight is determinate, but cannot be certified in Pyro at practical cost: plain Importance is accurate yet ill-posed (too few accepts) at feasible sample counts and times out at the counts needed to bring the noise floor under the discriminability cap; a guided proposal needs fragile hand-tuning against the bounded point2 prior. Same inference-algorithms method-demo family as ex1.1/1.2/1.3. |
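
The quoted prior acceptance of about 2e-4 can be reproduced from the model definition alone. The sketch below is plain Python and not part of the item; the helper names are illustrative. For a fixed interpolationWeight w, the condition |pointInMiddle| < 0.01 confines point2 to an interval of width 0.02 / (1 - w). Integrating the resulting acceptance probability over w gives roughly 1e-4 * ln(11), up to small edge effects where that interval leaves [-100, 100].

```python
import math

POINT1 = -10.0
P2_LO, P2_HI = -100.0, 100.0
EPS = 0.01


def accept_prob_given_w(w):
    """P(|pointInMiddle| < EPS | interpolationWeight = w), point2 ~ U(P2_LO, P2_HI)."""
    # Solve |POINT1 * w + point2 * (1 - w)| < EPS for point2 (valid for w < 1).
    lo = (-EPS - POINT1 * w) / (1.0 - w)
    hi = (EPS - POINT1 * w) / (1.0 - w)
    overlap = max(0.0, min(hi, P2_HI) - max(lo, P2_LO))
    return overlap / (P2_HI - P2_LO)


def prior_acceptance(n_grid=200_000):
    """Midpoint-rule integral of accept_prob_given_w over w ~ U(0, 1)."""
    h = 1.0 / n_grid
    return h * sum(accept_prob_given_w((i + 0.5) * h) for i in range(n_grid))


acceptance = prior_acceptance()
closed_form = 1e-4 * math.log(11.0)

# Expected number of prior draws needed to collect 1000 accepted samples.
expected_draws = 1000 / acceptance
```

Under the same approximation, the marginal posterior density of interpolationWeight is proportional to 1 / (1 - w) on roughly [0, 10/11) and zero beyond, which is why the mass piles up toward the upper end of that range.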
A coin is fair (weight 0.5) with prior probability 0.9, and biased with prior probability 0.1. Among biased coins, the weight is 1 (two-faced) with probability 0.7 and drawn uniformly from (0, 1) with probability 0.3. Each toss follows a Bernoulli distribution with the coin's weight. The full dataset consists of 50 heads. The observed data sizes to evaluate at are [0, 1, 2, 4, 6, 8, 10, 12, 15, 20, 25, 30, 40, 50].
A coin is drawn from a two-component prior: with probability 0.9 it is fair; otherwise it is biased, and within the biased class it is two-faced (weight 1) with probability 0.7 or has a uniformly-drawn weight with probability 0.3. Each observed toss is independently generated from the Bernoulli distribution with the coin's weight.
For each prefix of the full 50-heads dataset of length N in [0, 1, 2, 4, 6, 8, 10, 12, 15, 20, 25, 30, 40, 50], compute the posterior expected coin weight given the first N observations, using MCMC with 1000 burn-in steps and 10000 samples. Return the array of 14 expected weights.
answer spec
{
"kind": "value",
"domain": "realvec",
"estimated": true
}
var weightPosterior = function(observedData) {
  return Infer({method: 'MCMC', burn:1000, samples: 10000}, function() {
    var isFair = flip(0.9);
    var isTwoFaced = flip(0.7);
    var realWeight = isFair ? 0.5 : (isTwoFaced ? 1 : uniform({a:0, b:1}));
    var coin = Bernoulli({p: realWeight});
    var obsFn = function(datum) { observe(coin, datum=='h') };
    mapData({data: observedData}, obsFn);
    return realWeight;
  })
};

var fullDataSet = repeat(50, function() { 'h' });
var observedDataSizes = [0,1,2,4,6,8,10,12,15,20,25,30,40,50];
var ANSWER = (map(function(N) { expectation(weightPosterior(fullDataSet.slice(0, N))) }, observedDataSizes));
# Two-component coin prior: fair (0.5) w.p. 0.9; else biased -> two-faced (weight 1)
# w.p. 0.7 or uniform(0,1) w.p. 0.3. Each toss Bernoulli(weight). Observe N heads for
# each N in observedDataSizes; return posterior expected weight for each prefix.
# Discrete latents (isFair, isTwoFaced) sampled; continuous uniform weight handled by
# importance over the model. A single Importance path covers all N including 0: with
# no observations Importance reduces to prior sampling, so the same normalized-weight
# expectation gives the prior mean -- no separate hand-rolled loop.
import torch
import pyro
import pyro.distributions as dist

observedDataSizes = [0, 1, 2, 4, 6, 8, 10, 12, 15, 20, 25, 30, 40, 50]

def make_model(num_heads, n_obs):
    def model():
        isFair = pyro.sample("isFair", dist.Bernoulli(0.9))
        isTwoFaced = pyro.sample("isTwoFaced", dist.Bernoulli(0.7))
        if bool(isFair.item()):
            weight = torch.tensor(0.5)
        elif bool(isTwoFaced.item()):
            weight = torch.tensor(1.0)
        else:
            weight = pyro.sample("weight", dist.Uniform(0.0, 1.0))
        if n_obs > 0:
            # all observed tosses are heads
            pyro.sample("obs", dist.Binomial(n_obs, weight), obs=torch.tensor(float(num_heads)))
        return weight
    return model

expected = []
for N in observedDataSizes:
    mdl = make_model(N, N)  # full dataset is 50 heads, so a prefix of length N is N heads
    posterior = pyro.infer.Importance(mdl, num_samples=5000).run()
    lw = torch.tensor([posterior.log_weights[i] for i in range(len(posterior.log_weights))])
    w = torch.softmax(lw, dim=0)
    vals = torch.tensor([float(tr.nodes["_RETURN"]["value"].item())
                         for tr in posterior.exec_traces])
    expected.append(float((w * vals).sum().item()))

ANSWER = expected
[0.5355, 0.5659, 0.6160, 0.7918, 0.9311, 0.9643, 0.9920, 0.9960, 0.9982, 0.9990, 0.9992, 0.9996, 0.9998, 0.9998]
[0.5360, 0.5696, 0.6280, 0.7891, 0.9190, 0.9712, 0.9896, 0.9956, 0.9982, 0.9989, 0.9994, 0.9996, 0.9999, 0.9997]
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0470 (absdiff) |
| solver re-derivation | accept | 2/2 solvers · d=[0.024, 0.019] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0167 ≤ tol 0.0939 · floors 0.0137/0.0470 |
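
As a cross-check on the two estimated vectors above, the three-component prior admits a closed-form posterior mean after N heads. The sketch below is plain Python and not part of the item; `expected_weight` is an illustrative name. It uses the marginal likelihood of N heads under each component (0.5^N for the fair coin, 1 for the two-faced coin, 1/(N+1) for the uniform weight) and the within-component posterior mean, which is (N+1)/(N+2) for the uniform case.

```python
def expected_weight(n_heads):
    """Exact posterior mean coin weight after observing n_heads heads in a row."""
    prior = {"fair": 0.9, "two_faced": 0.1 * 0.7, "uniform": 0.1 * 0.3}
    # Marginal likelihood of the data under each component.
    likelihood = {
        "fair": 0.5 ** n_heads,
        "two_faced": 1.0,
        "uniform": 1.0 / (n_heads + 1),  # integral of w**n over (0, 1)
    }
    # Posterior mean of the weight within each component.
    component_mean = {
        "fair": 0.5,
        "two_faced": 1.0,
        "uniform": (n_heads + 1) / (n_heads + 2),  # mean of Beta(n + 1, 1)
    }
    evidence = sum(prior[k] * likelihood[k] for k in prior)
    weighted = sum(prior[k] * likelihood[k] * component_mean[k] for k in prior)
    return weighted / evidence


observed_sizes = [0, 1, 2, 4, 6, 8, 10, 12, 15, 20, 25, 30, 40, 50]
exact = [expected_weight(n) for n in observed_sizes]
```

With no data the formula reduces to the prior mean, 0.9 * 0.5 + 0.07 * 1 + 0.03 * 0.5 = 0.535, and as N grows the fair-coin term vanishes so the value approaches 1.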
A coin's weight is given a Beta(10, 10) prior. The full dataset alternates heads and tails 50 times each, giving a sequence of 100 observations (h, t, h, t, …).
The coin weight is drawn from the prior. Each toss is independently Bernoulli-distributed with the coin's weight as its success probability. The posterior is inferred via MCMC with 1000 burn-in steps and 1000 samples.
Return a record with two distributions: the prior distribution over coin weight, and the posterior distribution over coin weight after conditioning on all 100 observations. The prior field must be the parametric Beta distribution object itself (the prior as a distribution, not samples drawn from it).
answer spec
{
"kind": "record",
"fields": {
"prior": {
"kind": "dist",
"domain": "real"
},
"post": {
"kind": "dist",
"domain": "real"
}
}
}
var pseudoCounts = {a: 10, b: 10};

var weightPosterior = function(observedData){
  return Infer({method: 'MCMC', burn:1000, samples: 1000}, function() {
    var coinWeight = sample(Beta(pseudoCounts));
    var coinDist = Bernoulli({p: coinWeight});
    var obsFn = function(datum){ observe(coinDist, datum=='h') };
    mapData({data: observedData}, obsFn);
    return coinWeight;
  })
};

var fullDataSet = repeat(50, function() { ['h', 't'] }).flat();
var ANSWER = (({
  prior: Beta(pseudoCounts),
  post: weightPosterior(fullDataSet)
}));
# Beta-Bernoulli: prior Beta(10,10), 100 observations alternating h,t.
# Prior field is the parametric Beta distribution object itself.
# Posterior over the coin weight inferred via MCMC (NUTS).
import torch
import pyro
import pyro.distributions as dist

observations = torch.tensor([1.0, 0.0] * 50)

def model():
    weight = pyro.sample("weight", dist.Beta(10.0, 10.0))
    with pyro.plate("data", observations.shape[0]):
        pyro.sample("obs", dist.Bernoulli(weight), obs=observations)
    return weight

kernel = pyro.infer.NUTS(model)
mcmc = pyro.infer.MCMC(kernel, num_samples=1000, warmup_steps=1000,
                       disable_progbar=True)
mcmc.run()

ANSWER = {
    "prior": dist.Beta(10.0, 10.0),
    "post": mcmc.get_samples()["weight"],
}
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0069 (record) |
| solver re-derivation | accept | 2/2 solvers · d=[0.005, 0.005] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0034 ≤ tol 0.0138 · floors 0.0066/0.0069 |
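
Because the Beta prior is conjugate to the Bernoulli likelihood, the sampled posterior in this item has a known closed form that can serve as a sanity reference. The sketch below is plain Python with illustrative helper names (`beta_update`, `beta_pdf`); it is not how the item itself computes the answer, which is pinned to MCMC.

```python
import math


def beta_update(a, b, observations):
    """Conjugate update: add head counts to a and tail counts to b."""
    heads = sum(1 for o in observations if o == 'h')
    tails = len(observations) - heads
    return a + heads, b + tails


def beta_pdf(x, a, b):
    """Beta(a, b) density at x in (0, 1), computed in log space for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


full_data = ['h', 't'] * 50
post_a, post_b = beta_update(10, 10, full_data)  # 50 heads and 50 tails added

total = post_a + post_b
post_mean = post_a / total
post_sd = math.sqrt(post_a * post_b / (total ** 2 * (total + 1)))
```

The prior and posterior share the same mean of 0.5; the data only sharpen the distribution, which shows up as a higher density at 0.5 under the posterior than under the prior.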
A coin's weight is given a Beta(10, 10) prior. The full dataset alternates heads and tails 256 times each, giving a sequence of 512 observations (h, t, h, t, …). The data-size checkpoints to evaluate at are [0, 2, 4, 8, 16, 32, 64, 128, 256, 512].
The coin weight is drawn from the prior. Each toss is independently Bernoulli-distributed with the coin's weight as its success probability. At each checkpoint N, the posterior over coin weight is inferred from the first N observations via MCMC with 1000 burn-in steps and 1000 samples.
For each checkpoint N in [0, 2, 4, 8, 16, 32, 64, 128, 256, 512], compute the variance of the posterior over coin weight given the first N observations. The variance is the posterior expected squared deviation from the posterior mean. Return the array of 10 variances.
answer spec
{
"kind": "value",
"domain": "realvec",
"estimated": true
}
var pseudoCounts = {a: 10, b: 10};

var weightPosterior = function(observedData){
  return Infer({method: 'MCMC', burn:1000, samples: 1000}, function() {
    var coinWeight = sample(Beta(pseudoCounts));
    var coinDist = Bernoulli({p: coinWeight});
    var obsFn = function(datum){ observe(coinDist, datum=='h') };
    mapData({data: observedData}, obsFn);
    return coinWeight;
  })
};

var fullDataSet = repeat(256, function(){['h', 't']}).flat();
var observedDataSizes = [0,2,4,8,16,32,64,128,256,512];
var ANSWER = (map(function(N) {
  var posterior = weightPosterior(fullDataSet.slice(0,N));
  var mean = expectation(posterior);
  return expectation(posterior, function(x) { Math.pow(x - mean, 2) });
}, observedDataSizes));
# probmods2-learning-as-conditional-inference/ex2.2
# Beta-Bernoulli coin: prior Beta(a=10, b=10); data is the alternating sequence
# ['h','t'] repeated 256 times (512 observations). For each checkpoint N, condition
# on the first N observations and report the posterior VARIANCE of the coin weight.
# The posterior comes from running Pyro NUTS over the Beta-Bernoulli model (the
# values are estimated, as the query and spec require) -- no conjugate formula.
import torch
import pyro
import pyro.distributions as dist
import pyro.infer

pseudo_a = 10.0
pseudo_b = 10.0

# repeat(256, ['h','t']).flat()
full_data = ['h', 't'] * 256
observed_sizes = [0, 2, 4, 8, 16, 32, 64, 128, 256, 512]

def make_model(obs_list):
    if len(obs_list) > 0:
        obs_tensor = torch.tensor([1.0 if d == 'h' else 0.0 for d in obs_list])
    else:
        obs_tensor = None

    def model():
        coin_weight = pyro.sample("coin_weight",
                                  dist.Beta(torch.tensor(pseudo_a), torch.tensor(pseudo_b)))
        if obs_tensor is not None:
            with pyro.plate("data", obs_tensor.shape[0]):
                pyro.sample("obs", dist.Bernoulli(coin_weight), obs=obs_tensor)
        return coin_weight

    return model

variances = []
for N in observed_sizes:
    model = make_model(full_data[:N])
    kernel = pyro.infer.NUTS(model)
    mcmc = pyro.infer.MCMC(kernel, num_samples=1000, warmup_steps=500,
                           disable_progbar=True)
    mcmc.run()
    samples = mcmc.get_samples()["coin_weight"]
    mean = samples.mean()
    var = ((samples - mean) ** 2).mean()
    variances.append(var.item())

ANSWER = variances
[0.0119, 0.0107, 0.0101, 0.0088, 0.0066, 0.0045, 0.0029, 0.0018, 0.0009, 0.0005]
[0.0122, 0.0115, 0.0097, 0.0087, 0.0070, 0.0047, 0.0028, 0.0015, 0.0009, 0.0005]
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0019 (absdiff) |
| solver re-derivation | accept | 2/2 solvers · d=[0.001, 0.001] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0011 ≤ tol 0.0039 · floors 0.0018/0.0019 |
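
The estimated variances above can be compared against the variance of the exact conjugate posterior at each checkpoint. The sketch below is plain Python with illustrative names and is only a reference calculation, since the item itself requires MCMC estimates. It uses the standard Beta variance ab / ((a + b)^2 (a + b + 1)) and assumes, as in the dataset, that the sequence starts with a head, so a prefix of length N contains ceil(N/2) heads and floor(N/2) tails.

```python
def beta_variance(a, b):
    """Variance of a Beta(a, b) distribution."""
    total = a + b
    return a * b / (total ** 2 * (total + 1))


def posterior_variance(n_obs, a=10.0, b=10.0):
    """Exact posterior variance after the first n_obs tosses of h, t, h, t, ..."""
    heads = (n_obs + 1) // 2  # the sequence starts with 'h'
    tails = n_obs // 2
    return beta_variance(a + heads, b + tails)


observed_sizes = [0, 2, 4, 8, 16, 32, 64, 128, 256, 512]
exact_variances = [posterior_variance(n) for n in observed_sizes]
```

For even N the posterior is symmetric, Beta(10 + N/2, 10 + N/2), and the variance simplifies to 1 / (4 (21 + N)), which falls from 1/84 with no data toward zero as N grows.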
Ten aliens are observed, each with three binary properties: antennae, green, blarghNoise. The observations are: alien 1: antennae=false, green=false, blarghNoise=false; alien 2: antennae=true, green=true, blarghNoise=true; alien 3: antennae=true, green=true, blarghNoise=true; alien 4: antennae=true, green=true, blarghNoise=true; alien 5: antennae=false, green=false, blarghNoise=false; alien 6: antennae=true, green=true, blarghNoise=true; alien 7: antennae=false, green=false, blarghNoise=false; alien 8: antennae=true, green=true, blarghNoise=true; alien 9: antennae=false, green=false, blarghNoise=false; alien 10: antennae=false, green=false, blarghNoise=false. There are two latent alien kinds. For each kind, the probability of each of the three binary properties is drawn independently from a Beta(0.5, 0.5) prior. Each alien independently belongs to either kind with equal prior probability (0.5 each), and its three properties are each independently drawn from the Bernoulli distribution with the kind's corresponding property probability. The group prototypes are shared across aliens of the same kind within one inference run. Inference uses MCMC with an HMC kernel (10 leapfrog steps, step size 0.01) for 3000 samples.
A two-component mixture model over alien kinds. Each kind has a prototype: three independent property probabilities drawn from Beta(0.5, 0.5). Each alien's kind is drawn 50/50, and its three binary properties are drawn independently from Bernoulli distributions parameterized by the kind's prototype. The prototype is shared (memoized) within one inference run.
Compute the posterior mean of each group's property probabilities, sorted so that the group with the lower posterior mean antennae probability is the 'low' group and the other is the 'high' group. Return a record with six fields: low_antennae, low_green, low_blargh, high_antennae, high_green, high_blargh — each the posterior expected probability for that group and property.
answer spec
{
"kind": "record",
"fields": {
"low_antennae": {
"kind": "value",
"domain": "real",
"estimated": true
},
"low_green": {
"kind": "value",
"domain": "real",
"estimated": true
},
"low_blargh": {
"kind": "value",
"domain": "real",
"estimated": true
},
"high_antennae": {
"kind": "value",
"domain": "real",
"estimated": true
},
"high_green": {
"kind": "value",
"domain": "real",
"estimated": true
},
"high_blargh": {
"kind": "value",
"domain": "real",
"estimated": true
}
}
}
var properties = ['antennae', 'green', 'blarghNoise'];
var data = [
  {antennae : false, green: false, blarghNoise: false},
  {antennae : true, green: true, blarghNoise: true},
  {antennae : true, green: true, blarghNoise: true},
  {antennae : true, green: true, blarghNoise: true},
  {antennae : false, green: false, blarghNoise: false},
  {antennae : true, green: true, blarghNoise: true},
  {antennae : false, green: false, blarghNoise: false},
  {antennae : true, green: true, blarghNoise: true},
  {antennae : false, green: false, blarghNoise: false},
  {antennae : false, green: false, blarghNoise: false}
];

var sampleGroupPrototype = mem(function(groupName) {
  var probs = repeat(3, function(){ beta(.5, .5)});
  return _.zipObject(properties, probs);
});
var posterior = Infer({method: 'MCMC', kernel: {HMC: {steps: 10, stepSize: .01}}, samples: 3000},
  function(){
    mapData({data: data}, function(datum) {
      var group = flip() ? 'group1' : 'group2';
      var prototype = sampleGroupPrototype(group);
      mapData({data: properties}, function(property) {
        observe(Bernoulli({p: prototype[property]}), datum[property]);
      });
    });
    return {group1: sampleGroupPrototype('group1'),
            group2: sampleGroupPrototype('group2')};
});
var g1Mean = expectation(posterior, function(s) { return s.group1.antennae });
var g2Mean = expectation(posterior, function(s) { return s.group2.antennae });
var lowGroup = g1Mean < g2Mean ? 'group1' : 'group2';
var highGroup = g1Mean < g2Mean ? 'group2' : 'group1';
var ANSWER = ({
  low_antennae: expectation(posterior, function(s) { return s[lowGroup].antennae }),
  low_green: expectation(posterior, function(s) { return s[lowGroup].green }),
  low_blargh: expectation(posterior, function(s) { return s[lowGroup].blarghNoise }),
  high_antennae: expectation(posterior, function(s) { return s[highGroup].antennae }),
  high_green: expectation(posterior, function(s) { return s[highGroup].green }),
  high_blargh: expectation(posterior, function(s) { return s[highGroup].blarghNoise })
});
# Two-component mixture over 10 aliens, 3 binary properties each.
# Each group prototype = 3 independent Beta(.5,.5) property probs (continuous).
# Each alien's group ~ Bernoulli(.5) (discrete); properties ~ Bernoulli(prototype).
# Continuous prototypes sampled by NUTS; per-alien group assignments marginalized
# by enumeration. Group labels are non-identifiable, so each posterior sample's two
# prototypes are sorted by antennae prob before averaging (this matches the
# reference's sort-by-antennae when the chain does not switch labels, and is robust
# if it does).
import torch
import pyro
import pyro.distributions as dist

data = torch.tensor([
    [0., 0., 0.],
    [1., 1., 1.],
    [1., 1., 1.],
    [1., 1., 1.],
    [0., 0., 0.],
    [1., 1., 1.],
    [0., 0., 0.],
    [1., 1., 1.],
    [0., 0., 0.],
    [0., 0., 0.],
], dtype=torch.float64)

@pyro.infer.config_enumerate
def model():
    p1 = pyro.sample('p1', dist.Beta(0.5, 0.5).expand([3]).to_event(1))
    p2 = pyro.sample('p2', dist.Beta(0.5, 0.5).expand([3]).to_event(1))
    protos = torch.stack([p1, p2], dim=0)  # [2, 3]
    with pyro.plate('aliens', 10):
        g = pyro.sample('g', dist.Bernoulli(0.5))  # enumerated: 0->group1, 1->group2
        idx = g.long()
        proto = protos[idx]  # broadcasts over enum dim -> [..., 3]
        pyro.sample('obs', dist.Bernoulli(proto).to_event(1), obs=data)

kernel = pyro.infer.NUTS(model)
mcmc = pyro.infer.MCMC(kernel, num_samples=900, warmup_steps=500)
mcmc.run()
samples = mcmc.get_samples()
p1_s = samples['p1'].to(torch.float64)  # [N, 3]
p2_s = samples['p2'].to(torch.float64)  # [N, 3]

# Per-sample sort: low group = the one with smaller antennae (index 0) probability.
p1_is_low = (p1_s[:, 0] <= p2_s[:, 0]).unsqueeze(-1)  # [N,1]
low = torch.where(p1_is_low, p1_s, p2_s)  # [N,3]
high = torch.where(p1_is_low, p2_s, p1_s)  # [N,3]
low_mean = low.mean(dim=0)
high_mean = high.mean(dim=0)

ANSWER = {
    'low_antennae': low_mean[0].item(),
    'low_green': low_mean[1].item(),
    'low_blargh': low_mean[2].item(),
    'high_antennae': high_mean[0].item(),
    'high_green': high_mean[1].item(),
    'high_blargh': high_mean[2].item(),
}
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0676 (record) |
| solver re-derivation | accept | 2/2 solvers · d=[0.029, 0.042] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0374 ≤ tol 0.1352 · floors 0.0088/0.0676 |
Ten aliens are observed, each with three binary properties: antennae, green, blarghNoise. The observations are: alien 1: antennae=false, green=false, blarghNoise=false; alien 2: antennae=true, green=true, blarghNoise=true; alien 3: antennae=true, green=true, blarghNoise=true; alien 4: antennae=true, green=true, blarghNoise=true; alien 5: antennae=false, green=false, blarghNoise=false; alien 6: antennae=true, green=true, blarghNoise=true; alien 7: antennae=false, green=false, blarghNoise=false; alien 8: antennae=true, green=true, blarghNoise=true; alien 9: antennae=false, green=false, blarghNoise=false; alien 10: antennae=false, green=false, blarghNoise=false. There are two latent alien kinds. For each kind, the probability of each of the three binary properties is drawn independently from a Beta(0.5, 0.5) prior. Each alien independently belongs to either kind with equal prior probability (0.5 each), and its three properties are each independently drawn from the Bernoulli distribution with the kind's corresponding property probability. The group prototypes are shared (memoized) across aliens of the same kind within one inference run. Inference uses MCMC with an HMC kernel (10 leapfrog steps, step size 0.01) for 6000 samples. Additionally, a blargh sound is heard from a crater but the alien cannot be seen. This mystery alien belongs to either kind with equal prior probability.
Extend the ex1.a mixture model with one additional latent alien: the mystery alien's kind is drawn with equal probability from the two kinds, and its blarghNoise property is observed to be true. The prototypes are shared across all aliens, including the mystery alien.
Compute the posterior mean of each group's property probabilities, sorted so that the group with the lower posterior mean antennae probability is the 'low' group and the other is the 'high' group. Also compute the posterior probability that the mystery alien belongs to the high group. Return a record with seven fields: low_antennae, low_green, low_blargh, high_antennae, high_green, high_blargh, p_mystery_from_high. Estimate with MCMC using an HMC kernel (10 leapfrog steps, step size 0.01) and 6000 posterior samples.
answer spec
{
"kind": "record",
"fields": {
"low_antennae": {
"kind": "value",
"domain": "real",
"estimated": true
},
"low_green": {
"kind": "value",
"domain": "real",
"estimated": true
},
"low_blargh": {
"kind": "value",
"domain": "real",
"estimated": true
},
"high_antennae": {
"kind": "value",
"domain": "real",
"estimated": true
},
"high_green": {
"kind": "value",
"domain": "real",
"estimated": true
},
"high_blargh": {
"kind": "value",
"domain": "real",
"estimated": true
},
"p_mystery_from_high": {
"kind": "value",
"domain": "real",
"estimated": true
}
}
}
var properties = ['antennae', 'green', 'blarghNoise'];
var data = [
  {antennae : false, green: false, blarghNoise: false},
  {antennae : true, green: true, blarghNoise: true},
  {antennae : true, green: true, blarghNoise: true},
  {antennae : true, green: true, blarghNoise: true},
  {antennae : false, green: false, blarghNoise: false},
  {antennae : true, green: true, blarghNoise: true},
  {antennae : false, green: false, blarghNoise: false},
  {antennae : true, green: true, blarghNoise: true},
  {antennae : false, green: false, blarghNoise: false},
  {antennae : false, green: false, blarghNoise: false}
];
var sampleGroupPrototype = mem(function(groupName) {
  var probs = repeat(3, function(){ beta(.5, .5)});
  return _.zipObject(properties, probs);
});
var posterior = Infer({method: 'MCMC', kernel: {HMC: {steps: 10, stepSize: .01}}, samples: 6000},
  function(){
    mapData({data: data}, function(datum) {
      var group = flip() ? 'group1' : 'group2';
      var prototype = sampleGroupPrototype(group);
      mapData({data: properties}, function(property) {
        observe(Bernoulli({p: prototype[property]}), datum[property]);
      });
    });
    var mysteryGroup = flip() ? 'group1' : 'group2';
    var mysteryPrototype = sampleGroupPrototype(mysteryGroup);
    observe(Bernoulli({p: mysteryPrototype['blarghNoise']}), true);
    return {group1: sampleGroupPrototype('group1'),
            group2: sampleGroupPrototype('group2'),
            mysteryGroup: mysteryGroup};
});
var g1Mean = expectation(posterior, function(s) { return s.group1.antennae });
var g2Mean = expectation(posterior, function(s) { return s.group2.antennae });
var highGroup = g1Mean >= g2Mean ? 'group1' : 'group2';
var lowGroup = g1Mean >= g2Mean ? 'group2' : 'group1';
var ANSWER = ({
  low_antennae: expectation(posterior, function(s) { return s[lowGroup].antennae }),
  low_green: expectation(posterior, function(s) { return s[lowGroup].green }),
  low_blargh: expectation(posterior, function(s) { return s[lowGroup].blarghNoise }),
  high_antennae: expectation(posterior, function(s) { return s[highGroup].antennae }),
  high_green: expectation(posterior, function(s) { return s[highGroup].green }),
  high_blargh: expectation(posterior, function(s) { return s[highGroup].blarghNoise }),
  p_mystery_from_high: expectation(posterior, function(s) { return s.mysteryGroup === highGroup ? 1 : 0 })
});
# Two-kind alien mixture. Continuous prototype probabilities (Beta(.5,.5) per
# property per kind) are the only true continuous latents; the per-alien kind
# assignments and the mystery alien's kind are discrete and are marginalized out
# with config_enumerate so NUTS samples only the 6 continuous prototype params.
import torch
import pyro
import pyro.distributions as dist

properties = ["antennae", "green", "blarghNoise"]
data = [
    {"antennae": False, "green": False, "blarghNoise": False},
    {"antennae": True, "green": True, "blarghNoise": True},
    {"antennae": True, "green": True, "blarghNoise": True},
    {"antennae": True, "green": True, "blarghNoise": True},
    {"antennae": False, "green": False, "blarghNoise": False},
    {"antennae": True, "green": True, "blarghNoise": True},
    {"antennae": False, "green": False, "blarghNoise": False},
    {"antennae": True, "green": True, "blarghNoise": True},
    {"antennae": False, "green": False, "blarghNoise": False},
    {"antennae": False, "green": False, "blarghNoise": False},
]

# data tensor: 10 aliens x 3 properties (1.0/0.0)
data_t = torch.tensor([[1.0 if d[p] else 0.0 for p in properties] for d in data])
n_aliens = len(data)


@pyro.infer.config_enumerate
def model():
    # group prototypes: for each group (2) and property (3), a Beta(.5,.5) prob.
    # shape: (2 groups, 3 properties)
    proto = pyro.sample(
        "proto",
        dist.Beta(0.5 * torch.ones(2, 3), 0.5 * torch.ones(2, 3)).to_event(2),
    )
    # each observed alien: pick a group (uniform), observe its 3 properties.
    with pyro.plate("aliens", n_aliens):
        group = pyro.sample("group", dist.Categorical(torch.ones(2) / 2))
        # proto[group]: gather per-alien property probs -> shape (..., n_aliens, 3)
        p = proto[group]  # advanced indexing over enumerated group dim
        pyro.sample("obs", dist.Bernoulli(p).to_event(1), obs=data_t)
    # mystery alien: pick a group (uniform), observe blarghNoise == True.
    mystery = pyro.sample("mystery", dist.Categorical(torch.ones(2) / 2))
    pm = proto[mystery][..., 2]  # blarghNoise is index 2
    pyro.sample("mystery_obs", dist.Bernoulli(pm), obs=torch.tensor(1.0))


nuts = pyro.infer.NUTS(model)
mcmc = pyro.infer.MCMC(nuts, num_samples=1500, warmup_steps=600)
mcmc.run()
proto_samples = mcmc.get_samples()["proto"]  # (S, 2, 3)
S = proto_samples.shape[0]

# Posterior mean of each group's antennae prob; lower-mean group is 'low'.
g0_ant = proto_samples[:, 0, 0].mean()
g1_ant = proto_samples[:, 1, 0].mean()
if g0_ant <= g1_ant:
    low, high = 0, 1
else:
    low, high = 1, 0

low_means = proto_samples[:, low, :].mean(dim=0)
high_means = proto_samples[:, high, :].mean(dim=0)

# Posterior probability the mystery alien is from the high group. The discrete
# `mystery` site was enumerated out during NUTS, so recover its posterior by
# running Pyro's exact discrete inference on a model that fixes the prototype
# probabilities to each NUTS draw (plated over the S draws) and observes
# blarghNoise == True for the mystery alien. compute_marginals returns the exact
# per-draw marginal P(mystery | proto, obs); averaging P(mystery = high) over the
# proto posterior gives the queried probability.
blargh = proto_samples[:, :, 2].clamp(1e-9, 1 - 1e-9)  # (S, 2): per-draw blargh prob per group


@pyro.infer.config_enumerate
def mystery_model(blargh):
    with pyro.plate("samples", S):
        mystery = pyro.sample("mystery", dist.Categorical(torch.ones(2) / 2))
        # select the chosen group's blargh prob per draw (tensor op, enum-safe)
        pm = torch.where(mystery == 0, blargh[:, 0], blargh[:, 1])
        pyro.sample("mystery_obs", dist.Bernoulli(pm), obs=torch.ones(S))


elbo = pyro.infer.TraceEnum_ELBO(max_plate_nesting=1)
marg = elbo.compute_marginals(mystery_model, lambda blargh: None, blargh)
mystery_marg = marg["mystery"]
sup = mystery_marg.enumerate_support()  # (2, S)
probs = mystery_marg.log_prob(sup).exp()  # (2, S): probs[k, s] = P(mystery=k | proto_s)
p_mystery_from_high = probs[high].mean()

ANSWER = {
    "low_antennae": float(low_means[0]),
    "low_green": float(low_means[1]),
    "low_blargh": float(low_means[2]),
    "high_antennae": float(high_means[0]),
    "high_green": float(high_means[1]),
    "high_blargh": float(high_means[2]),
    "p_mystery_from_high": float(p_mystery_from_high),
}
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0482 (record) |
| solver re-derivation | accept | 1/2 solvers · d=[0.022, 0.057] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0272 ≤ tol 0.0963 · floors 0.0094/0.0482 |
Twenty-two participants each complete a memory test scored 0 to 45. Their scores are: [45, 45, 44, 45, 44, 45, 45, 45, 45, 45, 30, 20, 6, 44, 44, 27, 25, 17, 14, 27, 35, 30]. There are two latent groups: bona fide participants and malingerers. The bona-fide success probability is drawn uniformly from (0.5, 1). The malingerer success probability is drawn uniformly from (0, p_bona_fide), ensuring it is strictly lower. Each participant independently belongs to either group with equal prior probability (0.5 each). Each participant's score is drawn from a Binomial distribution with 45 trials and the group's success probability.
A two-group mixture model. Bona-fide and malingerer groups each have a latent success probability drawn from the priors above. Each participant's group is drawn with equal probability, and their score is drawn from a Binomial(45, p_group). Group success probabilities are shared across all participants of the same group within one inference run. Inference uses MCMC with 10000 samples.
Compute the marginal posterior distributions of the two group success probabilities. Return a record with two fields: group_1_p (the bona-fide group success probability posterior) and group_2_p (the malingerer group success probability posterior), each as a distribution over real values.
answer spec
{
"kind": "record",
"fields": {
"group_1_p": {
"kind": "dist",
"domain": "real"
},
"group_2_p": {
"kind": "dist",
"domain": "real"
}
}
}
var scores = [45, 45, 44, 45, 44, 45, 45, 45, 45, 45, 30, 20, 6, 44, 44, 27, 25, 17, 14, 27, 35, 30];
var subjIDs = _.range(scores.length);
var data = map(function(datum) {return _.zipObject(['subjID', 'score'], datum)}, _.zip(subjIDs, scores));
// NOTE: 10k unburned samples bias g1_p low (0.985 vs converged 0.991);
// found by the Pyro cross-language gate. Keep the burn-in.
var posterior = Infer({method: 'MCMC', samples: 50000, burn: 10000}, function() {
  var group_1_p = uniform(0.5, 1);
  var group_2_p = uniform(0, group_1_p);
  var participant2Group = mem(function(participantID) {
    return flip() ? 'group1' : 'group2';
  });
  var group2Prob = mem(function(group) {
    return group == 'group1' ? group_1_p : group_2_p;
  });

  var obsFn = function(datum){
    var p = group2Prob(participant2Group(datum.subjID));
    observe(Binomial({p: p, n: 45}), datum.score);
  };
  mapData({data: data}, obsFn);

  var participantResults_ = map(function(datum) {return participant2Group(datum.subjID)}, data);
  var participantResults = _.zipObject(_.range(participantResults_.length), participantResults_);
  return _.merge(participantResults, {group_1_p: group_1_p, group_2_p: group_2_p});
});
var ANSWER = ({
  group_1_p: marginalize(posterior, function(s) { return s.group_1_p }),
  group_2_p: marginalize(posterior, function(s) { return s.group_2_p })
});
import torch
import pyro
import pyro.distributions as dist

scores = [45, 45, 44, 45, 44, 45, 45, 45, 45, 45, 30, 20, 6, 44, 44, 27, 25, 17, 14, 27, 35, 30]
scores_t = torch.tensor(scores, dtype=torch.float64)


@pyro.infer.config_enumerate
def model():
    group_1_p = pyro.sample('group_1_p', dist.Uniform(0.5, 1.0))
    # group_2_p ~ Uniform(0, group_1_p): reparametrize as group_1_p * u, u~U(0,1).
    # Scaling U(0,1) by group_1_p yields exactly U(0, group_1_p); NUTS-friendly
    # (no latent-dependent support).
    u = pyro.sample('u', dist.Uniform(0.0, 1.0))
    group_2_p = pyro.deterministic('group_2_p', group_1_p * u)
    ps = torch.stack([group_1_p, group_2_p])
    with pyro.plate('participants', len(scores)):
        g = pyro.sample('g', dist.Bernoulli(0.5)).long()  # 0 -> group1, 1 -> group2
        p = ps[g]
        pyro.sample('score', dist.Binomial(total_count=45, probs=p), obs=scores_t)


mcmc = pyro.infer.MCMC(pyro.infer.NUTS(model), num_samples=1000, warmup_steps=500)
mcmc.run()
samples = mcmc.get_samples()
g1 = samples['group_1_p'].reshape(-1)
ANSWER = {
    'group_1_p': g1,
    'group_2_p': (g1 * samples['u'].reshape(-1)),
}
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0044 (record) |
| solver re-derivation | accept | 2/2 solvers · d=[0.006, 0.006] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0023 ≤ tol 0.0088 · floors 0.0023/0.0044 |
Vocabulary: {'dogs', 'cats', 'chase', 'sleep', 'stop'}. Each word (including a special start token) has its own transition distribution. Each transition distribution is drawn from a symmetric Dirichlet with parameter alpha = 1 (a uniform prior over distributions on the 5 vocabulary words). An observed sentence is ['dogs', 'chase', 'cats'] (without a trailing 'stop'; the sentence generator terminates upon drawing 'stop' and does not include 'stop' in the output).
Words are generated sequentially. A transition distribution is sampled independently for each source word, shared across all sentences (the same source word always draws from the same memoized distribution). Starting from a special 'start' token, each successive word is drawn from the transition distribution of the current word; the sentence ends when 'stop' is drawn (and 'stop' is not included in the output).
Using MCMC with burn-in 10000 and 50000 posterior samples (onlyMAP: false), with a Dirichlet drift kernel (concentration 10) on each transition distribution, condition on the observed sentence and return the posterior distribution over the word that follows 'chase'.
answer spec
{
"kind": "dist",
"domain": "finite",
"support": [
"dogs",
"cats",
"chase",
"sleep",
"stop"
]
}
var comparray = function(arr1,arr2){
  return (JSON.stringify(arr1) === JSON.stringify(arr2));
};
var ANSWER = (Infer({method:'MCMC', burn:10000, samples: 50000, onlyMAP:false}, function() {
  let vocab = ['dogs', 'cats', 'chase', 'sleep', 'stop'];
  var wordToDistribution = mem(function(word) {
    return dirichletDrift({alpha:ones([vocab.length,1]), concentration:10});
  });
  var transition = function(word) {
    return categorical({ps: wordToDistribution(word), vs: vocab});
  };
  let obs = ['dogs', 'chase', 'cats'];
  let generateSentence = function(lastState, sentence) {
    let word = transition(lastState);
    if (word == 'stop') return [];
    return [word].concat(generateSentence(word, sentence));
  };
  condition(comparray(obs, generateSentence('start')));
  return transition('chase');
}));
# Word-level bigram model. Each source word has a memoized transition distribution
# ~ Dirichlet(ones(5)) over the vocabulary (continuous latents). Sentence draws are
# categorical. Conditioning on the exact sentence ['dogs','chase','cats'] forces the
# transition chain start->dogs->chase->cats->stop; we observe those categorical
# draws. The query transition('chase') is a fresh draw from chase's (reweighted)
# distribution. Discrete conditioning -> Importance sampling.
import torch
import pyro
import pyro.distributions as dist
import pyro.infer

vocab = ['dogs', 'cats', 'chase', 'sleep', 'stop']
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

def model():
    # Per-source-word transition distributions (only the ones we touch).
    d_start = pyro.sample('d_start', dist.Dirichlet(torch.ones(V, dtype=torch.float64)))
    d_dogs = pyro.sample('d_dogs', dist.Dirichlet(torch.ones(V, dtype=torch.float64)))
    d_chase = pyro.sample('d_chase', dist.Dirichlet(torch.ones(V, dtype=torch.float64)))
    d_cats = pyro.sample('d_cats', dist.Dirichlet(torch.ones(V, dtype=torch.float64)))
    # Observe the forced transition chain for the conditioned sentence.
    pyro.sample('t0', dist.Categorical(d_start), obs=torch.tensor(idx['dogs']))
    pyro.sample('t1', dist.Categorical(d_dogs), obs=torch.tensor(idx['chase']))
    pyro.sample('t2', dist.Categorical(d_chase), obs=torch.tensor(idx['cats']))
    pyro.sample('t3', dist.Categorical(d_cats), obs=torch.tensor(idx['stop']))
    # Query: a fresh draw from chase's transition distribution.
    q = pyro.sample('q', dist.Categorical(d_chase))
    return vocab[int(q.item())]

post = pyro.infer.Importance(model, num_samples=8000).run()
lw = torch.tensor(post.log_weights, dtype=torch.float64)
w = (lw - lw.max()).exp()
w = w / w.sum()
probs = {word: 0.0 for word in vocab}
for tr, wi in zip(post.exec_traces, w):
    probs[tr.nodes['_RETURN']['value']] += wi.item()

ANSWER = probs
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0461 (tv) |
| solver re-derivation | accept | 2/2 solvers · d=[0.026, 0.026] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0433 ≤ tol 0.1205 · floors 0.0602/0.0461 |
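
As a sanity check on the sampled answers above, note that the Dirichlet prior is conjugate to the categorical transitions, so this query has a closed form. The sketch below assumes the prior on each transition row is Dirichlet(1, 1, 1, 1, 1), as the problem statement says, and that conditioning on the sentence amounts to observing the transitions start->dogs, dogs->chase, chase->cats, cats->stop. The helper name `posterior_predictive` is hypothetical and is not part of either reference program.

```python
from fractions import Fraction

VOCAB = ['dogs', 'cats', 'chase', 'sleep', 'stop']

def posterior_predictive(counts, alpha=1):
    """Dirichlet-categorical posterior predictive: (alpha + n_w) / (K * alpha + n)."""
    total = sum(counts.get(w, 0) for w in VOCAB)
    denom = len(VOCAB) * alpha + total
    return {w: Fraction(alpha + counts.get(w, 0), denom) for w in VOCAB}

# Only one transition out of 'chase' is observed: chase -> cats.
next_after_chase = posterior_predictive({'cats': 1})
for w in VOCAB:
    print(w, next_after_chase[w])
```

Under these assumptions 'cats' receives 1/3 and every other word 1/6, which gives a fixed point of comparison for the MCMC and importance-sampling estimates.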
Vocabulary: {'dogs', 'cats', 'chase', 'sleep', 'stop'}. Each word (including a special start token) has its own transition distribution drawn from a symmetric Dirichlet with concentration 10 over the 5 vocabulary words (alpha = ones([5,1]), concentration = 10). These per-word transition distributions are shared across all sentences (memoized). The first observed sentence is ['dogs', 'chase', 'cats', 'stop']. The sentence generator terminates upon drawing 'stop' and includes 'stop' in the output.
Words are generated sequentially. Starting from a special 'start' token, each successive word is drawn from the memoized transition distribution of the current word; the sentence ends when 'stop' is drawn (included in output). A second independent sentence is generated from the same shared transition distributions.
Using MCMC with burn-in 10000 and 50000 posterior samples (onlyMAP: false), condition on the first observed sentence, and also condition on the first word of the second sentence being 'dogs'. Return the posterior distribution over the second word of the second sentence.
answer spec
{
"kind": "dist",
"domain": "finite",
"support": [
"dogs",
"cats",
"chase",
"sleep",
"stop"
]
}
var comparray = function(arr1,arr2){
  return (JSON.stringify(arr1) === JSON.stringify(arr2));
};
var ANSWER = (Infer({method:'MCMC', burn:10000, samples: 50000, onlyMAP: false}, function() {
  let vocab = ['dogs', 'cats', 'chase', 'sleep', 'stop'];
  var wordToDistribution = mem(function(word) {
    return dirichletDrift({alpha:ones([vocab.length,1]), concentration:10});
  });
  var transition = function(word) {
    return categorical({ps: wordToDistribution(word), vs: vocab});
  };
  let generateSentence = function(lastState, sentence) {
    let word = transition(lastState);
    if (word == 'stop') return ['stop'];
    return [word].concat(generateSentence(word, sentence));
  };
  let obs = ['dogs', 'chase', 'cats', 'stop'];
  condition(comparray(obs, generateSentence('start')));
  let newSentence = generateSentence('start');
  condition(newSentence[0] == 'dogs');
  return newSentence[1];
}));
# probmods2-observing-sequences/ex1.b
# Markov sentence model: each word's transition distribution is a memoized
# Dirichlet(ones(5)) latent; transitions are Categorical over the vocab. Condition
# on the first sentence being ['dogs','chase','cats','stop'] (observed transitions),
# then on the first word of a second sentence being 'dogs'; return the posterior
# over the second word of that second sentence. Inference is run with Pyro's
# Importance sampler (continuous Dirichlet latents + observed/conditioned
# Categorical transitions) -- no closed-form conjugacy.
import torch
import pyro
import pyro.distributions as dist
import pyro.infer
from collections import defaultdict

vocab = ['dogs', 'cats', 'chase', 'sleep', 'stop']
idx = {w: i for i, w in enumerate(vocab)}
states = ['start'] + vocab
alpha = torch.ones(len(vocab))  # Dirichlet prior ones(5); concentration:10 is the
                                # drift-kernel width in WebPPL, NOT the prior alpha.

obs_sentence = ['dogs', 'chase', 'cats', 'stop']

def model():
    # memoized transition distribution per state (sampled once, reused)
    theta = {s: pyro.sample(f"theta_{s}", dist.Dirichlet(alpha)) for s in states}

    # condition on first sentence: start->dogs->chase->cats->stop (observed)
    prev = 'start'
    for t, word in enumerate(obs_sentence):
        pyro.sample(f"obs_{t}", dist.Categorical(theta[prev]),
                    obs=torch.tensor(idx[word]))
        prev = word

    # second sentence: condition first word == 'dogs' (observed), query second word
    pyro.sample("w1", dist.Categorical(theta['start']), obs=torch.tensor(idx['dogs']))
    w2 = pyro.sample("w2", dist.Categorical(theta['dogs']))
    return w2

posterior = pyro.infer.Importance(model, num_samples=50000)
posterior.run()

weight_by_label = defaultdict(float)
total = 0.0
for tr, lw in zip(posterior.exec_traces, posterior.log_weights):
    w = float(torch.as_tensor(lw).exp())
    label = vocab[int(tr.nodes["_RETURN"]["value"])]
    weight_by_label[label] += w
    total += w

ANSWER = {lab: weight_by_label.get(lab, 0.0) / total for lab in vocab}
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0997 (tv) |
| solver re-derivation | accept | 2/2 solvers · d=[0.153, 0.133] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0451 ≤ tol 0.1994 · floors 0.0282/0.0997 |
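
The Pyro program above relies on importance sampling from the prior. A dependency-free sketch of the same idea follows: draw every transition row from the prior, weight the draw by the probability of the observed transitions, and average the query row under those weights. It follows the Pyro block's reading of the prior (Dirichlet(ones(5))). The helper names `sample_dirichlet` and `estimate_second_word` are hypothetical, and the query word is marginalized analytically rather than sampled, which is a simplification of what the Pyro program does.

```python
import random

VOCAB = ['dogs', 'cats', 'chase', 'sleep', 'stop']
IDX = {w: i for i, w in enumerate(VOCAB)}
STATES = ['start'] + VOCAB

def sample_dirichlet(rng, alpha):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma(alpha_i, 1) draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]

def estimate_second_word(num_samples=20000, seed=0):
    rng = random.Random(seed)
    # Observed transitions: sentence 1, plus start -> dogs for sentence 2.
    observed = [('start', 'dogs'), ('dogs', 'chase'), ('chase', 'cats'),
                ('cats', 'stop'), ('start', 'dogs')]
    acc = [0.0] * len(VOCAB)
    total_weight = 0.0
    for _ in range(num_samples):
        # Prior draw of one transition row per source state.
        theta = {s: sample_dirichlet(rng, [1.0] * len(VOCAB)) for s in STATES}
        # Importance weight = likelihood of the observed transitions.
        weight = 1.0
        for src, dst in observed:
            weight *= theta[src][IDX[dst]]
        # The second word of sentence 2 is a draw from theta['dogs'].
        for i, p in enumerate(theta['dogs']):
            acc[i] += weight * p
        total_weight += weight
    return {w: acc[IDX[w]] / total_weight for w in VOCAB}

estimate = estimate_second_word()
```

Under this prior the Dirichlet-categorical posterior mean for 'chase' works out to (1 + 1) / (5 + 1) = 1/3, so the weighted estimate should land close to that value.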
Vocabulary: {'dogs', 'cats', 'chase', 'sleep', 'stop'}. Each word (including a special start token) has its own transition distribution drawn from a symmetric Dirichlet with parameter alpha = 1 (a uniform prior over distributions on the 5 vocabulary words). These per-word transition distributions are shared across all sentences (memoized). The first observed sentence is ['dogs', 'chase', 'cats', 'stop']. The sentence generator terminates upon drawing 'stop' and includes 'stop' in the output.
Words are generated sequentially. Starting from a special 'start' token, each successive word is drawn from the memoized transition distribution of the current word; the sentence ends when 'stop' is drawn (included in output). A second independent sentence is generated from the same shared transition distributions.
Using MCMC with burn-in 10000 and 50000 posterior samples (onlyMAP: false), with a Dirichlet drift kernel (concentration 10) on each transition distribution, condition on the first observed sentence and also on the second word of the second sentence being 'chase'. Return the posterior distribution over the first word of the second sentence.
answer spec
{
"kind": "dist",
"domain": "finite",
"support": [
"dogs",
"cats",
"chase",
"sleep",
"stop"
]
}
var comparray = function(arr1,arr2){
  return (JSON.stringify(arr1) === JSON.stringify(arr2));
};
var ANSWER = (Infer({method:'MCMC', burn:10000, samples: 50000, onlyMAP: false}, function() {
  let vocab = ['dogs', 'cats', 'chase', 'sleep', 'stop'];
  var wordToDistribution = mem(function(word) {
    return dirichletDrift({alpha:ones([vocab.length,1]), concentration:10});
  });
  var transition = function(word) {
    return categorical({ps: wordToDistribution(word), vs: vocab});
  };
  let generateSentence = function(lastState, sentence) {
    let word = transition(lastState);
    if (word == 'stop') return ['stop'];
    return [word].concat(generateSentence(word, sentence));
  };
  let obs = ['dogs', 'chase', 'cats', 'stop'];
  condition(comparray(obs, generateSentence('start')));
  let newSentence = generateSentence('start');
  condition(newSentence[1] == 'chase');
  return newSentence[0];
}));
# probmods2-observing-sequences/ex1.c
# Same Markov sentence model with memoized Dirichlet(ones(5)) transition latents.
# Condition on the first sentence ['dogs','chase','cats','stop'] and on the second
# word of a second sentence being 'chase'; return the posterior over the FIRST word
# of that second sentence. Inference via Pyro's Importance sampler over the
# continuous Dirichlet latents and observed/conditioned Categorical transitions.
import torch
import pyro
import pyro.distributions as dist
import pyro.infer
from collections import defaultdict

vocab = ['dogs', 'cats', 'chase', 'sleep', 'stop']
idx = {w: i for i, w in enumerate(vocab)}
states = ['start'] + vocab
alpha = torch.ones(len(vocab))  # prior ones(5); concentration:10 is the drift width.

obs_sentence = ['dogs', 'chase', 'cats', 'stop']
NEG_INF = torch.tensor(float("-inf"))
ZERO = torch.tensor(0.0)

def model():
    theta = {s: pyro.sample(f"theta_{s}", dist.Dirichlet(alpha)) for s in states}

    # condition on first sentence (observed transitions)
    prev = 'start'
    for t, word in enumerate(obs_sentence):
        pyro.sample(f"obs_{t}", dist.Categorical(theta[prev]),
                    obs=torch.tensor(idx[word]))
        prev = word

    # second sentence: query first word, condition second word == 'chase'
    w1 = pyro.sample("w1", dist.Categorical(theta['start']))
    w1_word = vocab[int(w1)]
    if w1_word == 'stop':
        # sentence is ['stop']: there is no second word, so 'chase' is impossible
        pyro.factor("no_w2", NEG_INF)
    else:
        # condition the second word to be 'chase' (observed transition from w1)
        pyro.sample("w2", dist.Categorical(theta[w1_word]), obs=torch.tensor(idx['chase']))
    return w1

posterior = pyro.infer.Importance(model, num_samples=50000)
posterior.run()

weight_by_label = defaultdict(float)
total = 0.0
for tr, lw in zip(posterior.exec_traces, posterior.log_weights):
    w = float(torch.as_tensor(lw).exp())
    if w == 0.0:
        continue
    label = vocab[int(tr.nodes["_RETURN"]["value"])]
    weight_by_label[label] += w
    total += w

ANSWER = {lab: weight_by_label.get(lab, 0.0) / total for lab in vocab}
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.1445 (tv) |
| solver re-derivation | accept | 2/2 solvers · d=[0.081, 0.049] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0641 ≤ tol 0.2890 · floors 0.0196/0.1445 |
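
Because the transition rows for different source words are independent, the query above can also be cross-checked in closed form. The sketch assumes the Dirichlet(1, ..., 1) prior from the problem statement and treats the observed sentence as four observed transitions; `row_mean` is a hypothetical helper, not part of the reference programs.

```python
from fractions import Fraction

VOCAB = ['dogs', 'cats', 'chase', 'sleep', 'stop']

# Transition counts implied by ['dogs', 'chase', 'cats', 'stop'].
COUNTS = {
    'start': {'dogs': 1},
    'dogs': {'chase': 1},
    'chase': {'cats': 1},
    'cats': {'stop': 1},
}

def row_mean(src, dst, alpha=1):
    """Posterior mean of P(dst | src) under a symmetric Dirichlet(alpha) prior."""
    row = COUNTS.get(src, {})
    return Fraction(alpha + row.get(dst, 0), len(VOCAB) * alpha + sum(row.values()))

# P(w1 = w, w2 = 'chase' | sentence 1) = E[theta_start[w]] * E[theta_w['chase']].
# A first word of 'stop' ends the sentence, so it can never be followed by 'chase'.
joint = {w: (Fraction(0) if w == 'stop'
             else row_mean('start', w) * row_mean(w, 'chase'))
         for w in VOCAB}
norm = sum(joint.values())
first_word = {w: joint[w] / norm for w in VOCAB}
for w in VOCAB:
    print(w, first_word[w])
```

Under these assumptions the first word is 'dogs' with probability 5/9, 'cats' or 'chase' with 5/36 each, 'sleep' with 1/6, and 'stop' with 0.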
Vocabulary: {'dogs', 'cats', 'chase', 'sleep', 'stop'}. Each word (including a special start token) has its own transition distribution drawn from a symmetric Dirichlet with concentration 10 over the 5 vocabulary words (alpha = ones([5,1]), concentration = 10). These per-word transition distributions are shared across all sentences (memoized). The first observed sentence is ['dogs', 'chase', 'cats', 'stop']. The sentence generator terminates upon drawing 'stop' and includes 'stop' in the output.
Words are generated sequentially. Starting from a special 'start' token, each successive word is drawn from the memoized transition distribution of the current word; the sentence ends when 'stop' is drawn (included in output). A second independent sentence is generated from the same shared transition distributions.
Using MCMC with burn-in 10000 and 50000 posterior samples (onlyMAP: false), condition on the first observed sentence, and also condition on the first word of the second sentence being 'cats'. Return the posterior distribution over the second word of the second sentence.
answer spec
{
"kind": "dist",
"domain": "finite",
"support": [
"dogs",
"cats",
"chase",
"sleep",
"stop"
]
}
var comparray = function(arr1,arr2){
  return (JSON.stringify(arr1) === JSON.stringify(arr2));
};
var ANSWER = (Infer({method:'MCMC', burn:10000, samples: 50000, onlyMAP: false}, function() {
  let vocab = ['dogs', 'cats', 'chase', 'sleep', 'stop'];
  var wordToDistribution = mem(function(word) {
    return dirichletDrift({alpha:ones([vocab.length,1]), concentration:10});
  });
  var transition = function(word) {
    return categorical({ps: wordToDistribution(word), vs: vocab});
  };
  let generateSentence = function(lastState, sentence) {
    let word = transition(lastState);
    if (word == 'stop') return ['stop'];
    return [word].concat(generateSentence(word, sentence));
  };
  let obs = ['dogs', 'chase', 'cats', 'stop'];
  condition(comparray(obs, generateSentence('start')));
  let newSentence = generateSentence('start');
  condition(newSentence[0] == 'cats');
  return newSentence[1];
}));
import torch
import pyro
import pyro.distributions as dist
import pyro.infer
from collections import defaultdict

# Dirichlet-Categorical word-bigram model.
# vocab = transition target set; each word has its own Dirichlet transition dist
# with alpha = ones(5) * concentration(10), matching the WebPPL reference.
# Conditioning: sentence 1 = ['dogs','chase','cats','stop'] generated from 'start',
# and sentence 2's first word = 'cats'. Query: sentence 2's second word
# = transition('cats'). Inference is run with Importance; the Dirichlet
# transition latents are sampled and every transition is conditioned via obs=,
# so the posterior over the transition distributions (and thus the next word)
# is produced by inference, not by a conjugate formula.

VOCAB = ['dogs', 'cats', 'chase', 'sleep', 'stop']
IDX = {w: i for i, w in enumerate(VOCAB)}
CONC = 10.0


def model():
    # One Dirichlet transition distribution per conditioning source state.
    # States that appear as a 'from' word in this problem: start, dogs, chase, cats.
    cache = {}

    def trans_dist(state):
        if state not in cache:
            cache[state] = pyro.sample(
                f"trans_{state}", dist.Dirichlet(torch.ones(len(VOCAB)) * CONC)
            )
        return cache[state]

    def observe_transition(name, frm, to):
        # Condition: transitioning from `frm` produced word `to`.
        pyro.sample(
            name,
            dist.Categorical(probs=trans_dist(frm)),
            obs=torch.tensor(IDX[to]),
        )

    # Sentence 1: start -> dogs -> chase -> cats -> stop
    observe_transition("s1_0", "start", "dogs")
    observe_transition("s1_1", "dogs", "chase")
    observe_transition("s1_2", "chase", "cats")
    observe_transition("s1_3", "cats", "stop")

    # Sentence 2: first word forced to 'cats' (start -> cats).
    observe_transition("s2_0", "start", "cats")

    # Second word of sentence 2 = transition('cats'); this is the query.
    second = pyro.sample("s2_1", dist.Categorical(probs=trans_dist("cats")))
    return second


posterior = pyro.infer.Importance(model, num_samples=8000).run()
log_weights = torch.tensor(posterior.log_weights)
weights = torch.softmax(log_weights, dim=0)

agg = defaultdict(float)
for trace, w in zip(posterior.exec_traces, weights.tolist()):
    val = trace.nodes["s2_1"]["value"].item()
    agg[VOCAB[int(val)]] += w

total = sum(agg.values())
ANSWER = {w: agg.get(w, 0.0) / total for w in VOCAB}
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.1343 (tv) |
| solver re-derivation | accept | 2/2 solvers · d=[0.137, 0.121] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.1290 ≤ tol 0.2685 · floors 0.0199/0.1343 |
Parts of speech (POS): N (nouns: 'dogs', 'cats'), V (verbs: 'chase', 'sleep'), and a terminal tag 'stop'. Each POS (including a special 'start' tag) has its own transition distribution over the three tags {N, V, stop}, drawn from a symmetric Dirichlet with concentration 10 over the 3 tags (alpha = ones([3,1]), concentration = 10). Given a POS tag, a word is drawn uniformly from that tag's word set: N draws uniformly from {'dogs', 'cats'}; V draws uniformly from {'chase', 'sleep'}; 'stop' maps to the terminal word 'stop'. POS transition distributions are memoized (shared globally).
Sentences are generated by a hidden Markov model. Starting at a special 'start' POS tag, the next POS is drawn from the current tag's transition distribution. A word is then emitted by drawing uniformly from the POS's word set. This continues until 'stop' is drawn as a POS, at which point 'stop' is emitted as the final word and generation ends. There is no observed evidence; sentences are sampled unconditionally.
Return one unconditional forward sample of the first word generated from the 'start' state (i.e., draw the next POS from the start-state transition distribution and return the corresponding word).
answer spec
{
"kind": "dist",
"domain": "finite",
"protocol": "draws",
"support": [
"dogs",
"cats",
"chase",
"sleep",
"stop"
]
}
var drawWord = function(pos){
  return (pos=="N") ? uniformDraw(['dogs','cats']) :
         (pos=="V") ? uniformDraw(['chase','sleep']) :
         'stop';
};
var POS = ["N", "V", "stop"];

var posToDistribution = mem(function(pos) {
  return dirichletDrift({alpha:ones([POS.length,1]), concentration:10});
});

var transition = function(pos) {
  return categorical({ps: posToDistribution(pos), vs: POS});
};

var generateSentence = function(lastPOS) {
  var nextPOS = transition(lastPOS);
  var word = drawWord(nextPOS);
  return (word == 'stop') ? [word] : [word].concat(generateSentence(nextPOS));
};
var ANSWER = (drawWord(transition("start")));
import random

import torch
import pyro
import pyro.distributions as dist

POS = ["N", "V", "stop"]
concentration = 10.0

# Memoized per-tag transition distributions, each a symmetric Dirichlet(ones*concentration).
_pos_dists = {}


def pos_to_distribution(pos):
    if pos not in _pos_dists:
        alpha = torch.ones(len(POS)) * concentration
        _pos_dists[pos] = pyro.sample(f"dir_{pos}", dist.Dirichlet(alpha))
    return _pos_dists[pos]


def transition(pos):
    probs = pos_to_distribution(pos)
    # Random suffix keeps sample-site names unique across repeated calls.
    idx = pyro.sample(f"trans_{pos}_{random.randrange(2 ** 31)}", dist.Categorical(probs))
    return POS[int(idx)]


def draw_word(pos):
    if pos == "N":
        i = pyro.sample(f"wN_{random.randrange(2 ** 31)}", dist.Categorical(torch.ones(2)))
        return ["dogs", "cats"][int(i)]
    if pos == "V":
        i = pyro.sample(f"wV_{random.randrange(2 ** 31)}", dist.Categorical(torch.ones(2)))
        return ["chase", "sleep"][int(i)]
    return "stop"


# One unconditional forward sample of the first word from the 'start' state.
ANSWER = draw_word(transition("start"))
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.1050 (tv) |
| solver re-derivation | accept | 1/2 solvers · d=[0.090, —] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0950 ≤ tol 0.2200 · floors 0.0650/0.1050 |
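
For reference, the marginal distribution behind a single forward draw can be written down directly: under a symmetric Dirichlet prior each of the three tags has prior mean 1/3 regardless of the concentration, and the emission step splits N and V uniformly over their two words. A short sketch of that calculation (the variable names are illustrative only):

```python
from fractions import Fraction

POS = ['N', 'V', 'stop']
WORDS = {'N': ['dogs', 'cats'], 'V': ['chase', 'sleep'], 'stop': ['stop']}

# Prior mean of a symmetric Dirichlet over three tags.
tag_prob = {tag: Fraction(1, len(POS)) for tag in POS}

first_word = {}
for tag in POS:
    for word in WORDS[tag]:
        # Each tag emits uniformly over its word set.
        first_word[word] = tag_prob[tag] / len(WORDS[tag])

for word, p in first_word.items():
    print(word, p)
```

Each of the four content words comes out at 1/6 and 'stop' at 1/3, which is the distribution that repeated forward draws should approximate.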
Parts of speech (POS): N (nouns: 'dogs', 'cats'), V (verbs: 'chase', 'sleep'), and a terminal tag 'stop'. Each POS (including a special 'start' tag) has its own transition distribution over the three tags {N, V, stop}, drawn from a symmetric Dirichlet with concentration 10 (alpha = ones([3,1]), concentration = 10). POS transition distributions are memoized (shared globally). Given a POS tag, a word is drawn uniformly from that tag's word set: N draws uniformly from {'dogs', 'cats'}; V draws uniformly from {'chase', 'sleep'}; 'stop' maps to the terminal word 'stop'. The first observed sentence is ['dogs', 'chase', 'cats', 'stop']. The sentence generator terminates upon drawing 'stop' as a POS and includes 'stop' in the output.
Sentences are generated by a hidden Markov model. Starting at a special 'start' POS tag, successive POS tags are drawn from the current tag's memoized transition distribution, and words are emitted from the corresponding word set. Generation ends when 'stop' is the next POS (included in the output). The same memoized POS transition distributions are shared between the observed sentence and the new sentence.
Using MCMC with burn-in 10000, 1000 samples, and lag 10 (onlyMAP: false), condition on the observed sentence, and also condition on the first word of a new second sentence being 'cats'. Return the posterior distribution over the second word of the new sentence.
answer spec
{
"kind": "dist",
"domain": "finite",
"support": [
"dogs",
"cats",
"chase",
"sleep",
"stop"
]
}
var comparray = function(arr1,arr2){
  return (JSON.stringify(arr1) === JSON.stringify(arr2));
};

var drawWord = function(pos){
  return (pos=="N") ? uniformDraw(['dogs','cats']) :
         (pos=="V") ? uniformDraw(['chase','sleep']) :
         'stop';
};
var POS = ["N", "V", "stop"];
var ANSWER = (Infer({method:'MCMC', burn:10000, samples: 1000, lag:10, onlyMAP: false}, function() {
  var posToDistribution = mem(function(pos) {
    return dirichletDrift({alpha:ones([POS.length,1]), concentration:10});
  });

  var transition = function(pos) {
    return categorical({ps: posToDistribution(pos), vs: POS});
  };

  let generateSentence = function(lastPOS) {
    let nextPOS = transition(lastPOS);
    let word = drawWord(nextPOS);
    return (word == 'stop') ? [word] : [word].concat(generateSentence(nextPOS));
  };
  let obs = ['dogs', 'chase', 'cats', 'stop'];
  condition(comparray(obs, generateSentence('start')));

  let newSentence = generateSentence('start');
  condition(newSentence[0] == 'cats');
  return newSentence[1];
}));
# POS-level HMM. Tags {N,V,stop}; each tag (and 'start') has a memoized transition
# distribution ~ Dirichlet(ones(3)) (continuous). Words: N->unif{dogs,cats},
# V->unif{chase,sleep}, stop->'stop'. Observed words ['dogs','chase','cats','stop']
# force the POS chain [N,V,N,stop], i.e. transitions start->N, N->V, V->N, N->stop
# (each word maps to a unique POS). A new sentence is generated from the same shared
# distributions; conditioning its first word == 'cats' forces its first POS = N
# (only N emits 'cats'; the uniform emission is a constant factor), so start->N is
# observed again. The query is the new sentence's second word: draw the second POS
# from N's distribution, then emit a word from that POS. Discrete conditioning ->
# Importance sampling.
import torch
import pyro
import pyro.distributions as dist
import pyro.infer

POS = ['N', 'V', 'stop']
pidx = {p: i for i, p in enumerate(POS)}
T = len(POS)
out_words = ['dogs', 'cats', 'chase', 'sleep', 'stop']
N_WORDS = ['dogs', 'cats']
V_WORDS = ['chase', 'sleep']

def draw_word(pos_i, name):
    pos = POS[pos_i]
    if pos == 'N':
        w = pyro.sample(name, dist.Categorical(torch.ones(2, dtype=torch.float64)))
        return N_WORDS[int(w.item())]
    elif pos == 'V':
        w = pyro.sample(name, dist.Categorical(torch.ones(2, dtype=torch.float64)))
        return V_WORDS[int(w.item())]
    else:
        return 'stop'

def model():
    d_start = pyro.sample('d_start', dist.Dirichlet(torch.ones(T, dtype=torch.float64)))
    d_N = pyro.sample('d_N', dist.Dirichlet(torch.ones(T, dtype=torch.float64)))
    d_V = pyro.sample('d_V', dist.Dirichlet(torch.ones(T, dtype=torch.float64)))
    # Observed sentence forces POS chain start->N->V->N->stop.
    pyro.sample('o0', dist.Categorical(d_start), obs=torch.tensor(pidx['N']))
    pyro.sample('o1', dist.Categorical(d_N), obs=torch.tensor(pidx['V']))
    pyro.sample('o2', dist.Categorical(d_V), obs=torch.tensor(pidx['N']))
    pyro.sample('o3', dist.Categorical(d_N), obs=torch.tensor(pidx['stop']))
    # New sentence: first word == 'cats' forces first POS = N (shared start dist).
    pyro.sample('n0', dist.Categorical(d_start), obs=torch.tensor(pidx['N']))
    # Query: second POS drawn from N's distribution, then emit its word.
    second_pos = pyro.sample('n1', dist.Categorical(d_N))
    word = draw_word(int(second_pos.item()), 'w1')
    return word

post = pyro.infer.Importance(model, num_samples=8000).run()
lw = torch.tensor(post.log_weights, dtype=torch.float64)
w = (lw - lw.max()).exp()
w = w / w.sum()
probs = {word: 0.0 for word in out_words}
for tr, wi in zip(post.exec_traces, w):
    probs[tr.nodes['_RETURN']['value']] += wi.item()

ANSWER = probs
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.1690 (tv) |
| solver re-derivation | accept | 2/2 solvers · d=[0.143, 0.188] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0794 ≤ tol 0.3380 · floors 0.0461/0.1690 |
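
The problem statement describes the prior as having concentration 10, while the Pyro block samples from Dirichlet(ones(3)). A minimal sketch of how much that choice matters for this query, using the conjugate posterior mean for transitions out of N (observed once to V and once to stop) followed by the uniform emission. The function name `second_word_dist` is hypothetical, and both priors are evaluated only for comparison; the sketch does not say which reading is intended.

```python
from fractions import Fraction

POS = ['N', 'V', 'stop']
WORDS = {'N': ['dogs', 'cats'], 'V': ['chase', 'sleep'], 'stop': ['stop']}

def second_word_dist(alpha):
    """Closed-form second-word distribution given that the first tag is N."""
    # Transitions out of N in the observed sentence: N -> V once, N -> stop once.
    counts = {'N': 0, 'V': 1, 'stop': 1}
    denom = len(POS) * alpha + sum(counts.values())
    out = {}
    for tag in POS:
        tag_prob = Fraction(alpha + counts[tag], denom)
        for word in WORDS[tag]:
            # Uniform emission within the tag's word set.
            out[word] = tag_prob / len(WORDS[tag])
    return out

flat_prior = second_word_dist(alpha=1)     # Dirichlet(1, 1, 1)
strong_prior = second_word_dist(alpha=10)  # Dirichlet(10, 10, 10)
tv = sum(abs(flat_prior[w] - strong_prior[w]) for w in flat_prior) / 2
print(flat_prior, strong_prior, tv)
```

The two readings differ by a total-variation distance of 9/80 (0.1125) on this query, so the choice of prior is visible at the scale of the tolerances reported in the table.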
Part-of-speech tags: N (nouns: 'dog', 'cat'), V (verbs: 'chases', 'sleeps'), D (determiners: 'the', 'a'), A (adverbs: 'dilligently'), and 'stop'. The tag set is {N, V, D, A, stop}. Each tag has an associated transition distribution over this same tag set, drawn from a Dirichlet distribution with concentration 10 and a uniform pseudo-count vector of length 5 (all entries 1). These per-tag transition distributions are fixed across positions in a sentence (shared, memoized). The observed sentence is ['the', 'dog', 'chases', 'a', 'cat', 'stop'], soft-conditioned with a factor weight of exp(5) for matching.
A hidden Markov model over POS tags generates sentences by sequentially sampling the next tag from the current tag's transition distribution, then emitting the corresponding word (deterministically for A and stop, uniformly otherwise). Sentence generation begins from a special 'start' state. The per-tag transition distributions are latent random variables shared across all positions.
The posterior distribution over the first POS tag in a newly generated sentence (i.e., the tag drawn by transitioning from 'start'), given the soft conditioning on the observed sentence. Use MCMC with 1000 samples, burn-in 10000, and lag 10.
answer spec
{
"kind": "dist",
"domain": "finite",
"support": [
"N",
"V",
"D",
"A",
"stop"
]
}
var comparray = function(arr1,arr2){
  return (JSON.stringify(arr1) === JSON.stringify(arr2));
};

var drawWord = function(pos){
  return (pos=="N") ? uniformDraw(['dog','cat']) :
         (pos=="V") ? uniformDraw(['chases','sleeps']) :
         (pos=="D") ? uniformDraw(['the','a']) :
         (pos=="A") ? 'dilligently' :
         'stop';
};
var POS = ["N", "V", "D", "A", "stop"];
var ANSWER = (Infer({method:'MCMC', burn:10000, samples: 1000, lag:10}, function() {
  var posToDistribution = mem(function(pos) {
    return dirichletDrift({alpha:ones([POS.length,1]), concentration:10});
  });

  var transition = function(pos) {
    return categorical({ps: posToDistribution(pos), vs: POS});
  };

  let generateSentence = function(lastPOS) {
    let nextPOS = transition(lastPOS);
    let word = drawWord(nextPOS);
    return (word == 'stop') ? [word] : [word].concat(generateSentence(nextPOS));
  };
  let obs = ['the', 'dog', 'chases', 'a', 'cat', 'stop'];

  factor(comparray(obs, generateSentence('start'))*5);

  return transition('start');
}));
import torch
import pyro
import pyro.distributions as dist
import pyro.infer
from collections import defaultdict

# POS-tag HMM with Dirichlet transition distributions (alpha = ones(5)*10).
# Emissions are deterministic in the reverse direction here: every observed word
# maps to a unique POS, so the observed POS chain is forced:
# the->D, dog->N, chases->V, a->D, cat->N, stop->stop.
# Rather than conditioning on an exact forward-generated string (whose prior
# probability is ~1e-6, which collapses prior Importance to the prior), we
# condition on the forced POS transitions directly via obs=, exactly as the
# sibling ex2.d does. The query is the first POS of a NEW sentence,
# transition('start'), which is a fresh draw from the (posterior) Dirichlet
# trans_start; the posterior is produced by running Importance inference.

POS = ["N", "V", "D", "A", "stop"]
IDX = {p: i for i, p in enumerate(POS)}
CONC = 10.0

# Observed sentence words and their forced POS tags.
# words: the dog chases a cat stop
OBS_POS = ["D", "N", "V", "D", "N", "stop"]


def model():
    cache = {}

    def trans_dist(state):
        if state not in cache:
            cache[state] = pyro.sample(
                f"trans_{state}", dist.Dirichlet(torch.ones(len(POS)) * CONC)
            )
        return cache[state]

    # Condition on the forced POS chain of the observed sentence:
    # start -> D -> N -> V -> D -> N -> stop
    prev = "start"
    for i, tag in enumerate(OBS_POS):
        pyro.sample(
            f"obs_{i}",
            dist.Categorical(probs=trans_dist(prev)),
            obs=torch.tensor(IDX[tag]),
        )
        prev = tag

    # Query: first POS of a new sentence = transition('start').
    first = pyro.sample("new_first", dist.Categorical(probs=trans_dist("start")))
    return first


posterior = pyro.infer.Importance(model, num_samples=8000).run()
log_weights = torch.tensor(posterior.log_weights)
weights = torch.softmax(log_weights, dim=0)

agg = defaultdict(float)
for trace, w in zip(posterior.exec_traces, weights.tolist()):
    val = trace.nodes["new_first"]["value"].item()
    agg[POS[int(val)]] += w

total = sum(agg.values())
ANSWER = {p: agg.get(p, 0.0) / total for p in POS}
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.2410 (tv) |
| solver re-derivation | accept | 2/2 solvers · d=[0.201, 0.131] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0746 ≤ tol 0.4820 · floors 0.0285/0.2410 |
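
The soft condition `factor(match * 5)` multiplies the weight of every matching execution by exp(5) and leaves non-matching executions at weight 1, so the posterior mass on a match depends on how likely a match is a priori. A minimal sketch of that arithmetic follows; `soft_match_mass` is a hypothetical helper, and the value 1e-6 is only the order of magnitude quoted in the Pyro comment above, not a computed quantity.

```python
import math

def soft_match_mass(prior_match_prob, log_weight=5.0):
    """Posterior mass on 'sentence matches' after factor(match * log_weight)."""
    matched = prior_match_prob * math.exp(log_weight)  # matching runs get weight e^w
    unmatched = 1.0 - prior_match_prob                 # other runs keep weight 1
    return matched / (matched + unmatched)

# Illustrative prior match probability (assumed order of magnitude only).
mass = soft_match_mass(1e-6)
print(mass)
```

With a prior match probability that small, a factor of exp(5) (about 148) still leaves most of the posterior mass on non-matching sentences, which is why the soft reading and a hard-conditioned reading of this evidence can give noticeably different answers.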
Vocabulary: determiners {the, a} (uniform); nouns {cat, dog} (uniform); verbs {chases, sleeps} (uniform); adverbs {diligently} (only option). Production probabilities are all uniform where a choice exists. The observed sentence has the structure [['the', 'dog'], ['chases', ['a', 'cat']]]: a noun phrase followed by a verb phrase consisting of a verb and a noun phrase. Conditioning is hard (exact match).
A phrase-structure grammar generates sentences recursively. A sentence (S) is a noun phrase (NP) followed by a verb phrase (VP). An NP is a determiner followed by a noun. A VP is either a verb followed by an adverb phrase (AP), or a verb followed by an NP; each option is equally likely. An AP consists of a single adverb. All terminal categories draw uniformly from their word lists.
Within one model: a first sentence is generated and conditioned to exactly match the observed sentence; then a SECOND sentence is generated by the same grammar, as a fresh, independent draw (the grammar has fixed production probabilities — no parameters are shared between the two sentences). Report the distribution over the second sentence's verb. Use MCMC with 1000 samples and burn-in 10000.
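Since the grammar has fixed production probabilities, conditioning the first sentence cannot shift the second one. A minimal sketch of that argument by exact enumeration follows; it is plain Python rather than WebPPL or Pyro, and the names `derivations` and `second_verb_posterior` are hypothetical helpers introduced only for illustration.

```python
from fractions import Fraction
from itertools import product

DETS = ["the", "a"]
NOUNS = ["cat", "dog"]
VERBS = ["chases", "sleeps"]
ADVERBS = ["diligently"]
OBS = [["the", "dog"], ["chases", ["a", "cat"]]]


def derivations():
    """Return every sentence the grammar can produce with its probability."""
    nps = [([d, n], Fraction(1, len(DETS) * len(NOUNS)))
           for d, n in product(DETS, NOUNS)]
    vps = []
    for v in VERBS:
        p_v = Fraction(1, len(VERBS))
        for adv in ADVERBS:  # VP -> V AP, chosen with probability 1/2
            vps.append(([v, [adv]], p_v * Fraction(1, 2) * Fraction(1, len(ADVERBS))))
        for np, p_np in nps:  # VP -> V NP, chosen with probability 1/2
            vps.append(([v, np], p_v * Fraction(1, 2) * p_np))
    return [([np, vp], p_np * p_vp) for np, p_np in nps for vp, p_vp in vps]


def second_verb_posterior():
    sents = derivations()
    weights = {v: Fraction(0) for v in VERBS}
    for s1, p1 in sents:
        if s1 != OBS:  # hard conditioning on the first sentence
            continue
        for s2, p2 in sents:
            weights[s2[1][0]] += p1 * p2
    z = sum(weights.values())
    return {v: w / z for v, w in weights.items()}


posterior = second_verb_posterior()
print(posterior)
```

The enumeration gives probability 1/2 to each verb, which is the value the MCMC and importance-sampling estimates should fluctuate around.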
answer spec
{
"kind": "dist",
"domain": "finite",
"support": [
"chases",
"sleeps"
]
}
var comparray = function(arr1,arr2){
  return (JSON.stringify(arr1) === JSON.stringify(arr2));
};

var uniformDraw = function (xs) {return xs[randomInteger(xs.length)]};

var D = function() {return uniformDraw(['the', 'a'])};
var N = function() {return uniformDraw(['cat', 'dog'])};
var V = function() {return uniformDraw(['chases', 'sleeps'])};
var A = function() {return uniformDraw(['diligently'])};
var AP = function() {return uniformDraw([A()])};
var NP = function() {return [D(), N()]};
var VP = function() {return uniformDraw([[V(), AP()],
                                         [V(), NP()]])};
var S = function() {return [NP(), VP()]};
var ANSWER = (Infer({method:'MCMC', burn:10000, samples: 1000}, function() {
  let obs = [['the', 'dog'], ['chases', ['a', 'cat']]];
  condition(comparray(obs, S()));

  return S()[1][0];
}));
# Phrase-structure grammar. A first sentence S is generated and conditioned to
# match the observed sentence; a SECOND independent sentence is generated by the
# same grammar; report the distribution over the second sentence's verb (S2[1][0]).
# The model genuinely draws every choice as a pyro.sample site and conditions the
# first sentence with a hard pyro.factor; inference (Importance) produces the answer.

dets = ['the', 'a']
nouns = ['cat', 'dog']
verbs = ['chases', 'sleeps']
adverbs = ['diligently']
obs = [['the', 'dog'], ['chases', ['a', 'cat']]]


def udraw(name, xs):
    i = pyro.sample(name, dist.Categorical(torch.ones(len(xs))))
    return xs[int(i)]


def gen_NP(tag):
    d = udraw(tag + '_d', dets)
    n = udraw(tag + '_n', nouns)
    return [d, n]


def gen_AP(tag):
    a = udraw(tag + '_a', adverbs)
    return [a]


def gen_VP(tag):
    v = udraw(tag + '_v', verbs)
    branch = int(pyro.sample(tag + '_branch', dist.Categorical(torch.tensor([0.5, 0.5]))))
    if branch == 0:
        return [v, gen_AP(tag + '_ap')]
    else:
        return [v, gen_NP(tag + '_vnp')]


def gen_S(tag):
    return [gen_NP(tag + '_np'), gen_VP(tag + '_vp')]


def model():
    s1 = gen_S('s1')
    match = 0.0 if s1 == obs else float('-inf')
    pyro.factor('cond', torch.tensor(match))
    s2 = gen_S('s2')
    # encode the second verb as a categorical index so EmpiricalMarginal can sample it
    second_verb = s2[1][0]
    return torch.tensor(float(verbs.index(second_verb)))


posterior = pyro.infer.Importance(model, num_samples=4000).run()
marg = pyro.infer.EmpiricalMarginal(posterior)
counts = Counter()
for _ in range(8000):
    counts[int(marg.sample().item())] += 1
total = sum(counts.values())
ANSWER = {verbs[i]: counts.get(i, 0) / total for i in range(len(verbs))}
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.1820 (tv) |
| solver re-derivation | accept | 1/2 solvers · d=[—, 0.056] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0622 ≤ tol 0.4318 · floors 0.2159/0.1820 |
Integers 1 through 20 are in scope (maxNumber = 20). The hypothesis space is a 50/50 mixture of two kinds of concepts: (1) rule-based concepts — multiples of N for N in 1..11, powers of N for N in 1..11 (exponents start at 0, so every powers concept includes 1), all evens, all odds (24 rules total); (2) interval concepts — all integers from a through b inclusive, for every pair with 1 ≤ a < b ≤ 20. Each rule-based hypothesis is equally likely within its class; each interval hypothesis is equally likely within its class. The likelihood of a hypothesis is the size principle: each observed example is independently drawn uniformly from the concept's extension. Observed examples: [3, 10]. Test query: 12.
A hypothesis is drawn from the mixed prior. Each observed example is generated by drawing uniformly from the set of integers the hypothesis covers. The test query's membership in the hypothesis's set is recorded along with the hypothesis name.
The posterior distribution over (hypothesis name, whether the test query 12 belongs to that hypothesis's set) pairs, given the two observed examples. Hypothesis labels are strings of the form 'interval_a_b' (e.g. 'interval_1_10') for the interval [a, b]; 'multiples_of_N' (e.g. 'multiples_of_3') and 'powers_of_N' (e.g. 'powers_of_2') for the rule concepts; 'evens' and 'odds' for the parity concepts.
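To make the size principle concrete, the sketch below scores a handful of hand-picked hypotheses for the examples [3, 10]. It is an illustration only, not the full enumeration: the class sizes (24 rules, 190 intervals with a < b) follow from the task statement, and `log_score` is a hypothetical helper name.

```python
import math

N_RULES = 24       # rule-based concepts, per the task statement
N_INTERVALS = 190  # intervals with 1 <= a < b <= 20, i.e. C(20, 2)


def log_score(extension, examples, n_in_class):
    """Unnormalized log posterior: class-uniform prior plus size-principle likelihood."""
    if any(x not in extension for x in examples):
        return float("-inf")  # an inconsistent hypothesis gets zero mass
    log_prior = math.log(0.5 / n_in_class)
    return log_prior - len(examples) * math.log(len(extension))


examples = [3, 10]
scores = {
    "interval_3_10": log_score(set(range(3, 11)), examples, N_INTERVALS),
    "interval_1_20": log_score(set(range(1, 21)), examples, N_INTERVALS),
    "multiples_of_1": log_score(set(range(1, 21)), examples, N_RULES),
    "evens": log_score(set(range(2, 21, 2)), examples, N_RULES),
}
for label, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(label, s)
```

The tightest consistent interval beats the widest one because each example costs 1/|extension|, while `multiples_of_1` still scores slightly higher than `interval_3_10` because each rule carries more prior mass than each interval.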
answer spec
{
"kind": "dist",
"domain": "finite",
"labels": {
"record": {
"hypothesis": "string",
"testQueryResponse": "bool"
}
}
}
var maxNumber = 20;
var filterByInRange = function(set) {
  // NOTE: deviates from the textbook starter code, whose ranges put 0 into every
  // multiples concept and dropped maxNumber from evens/odds — contradicting the
  // stated domain 1..maxNumber. Do not 'restore' to source. See _gate_triage.md.
  var inRange = function(v) {v <= maxNumber && v >= 1};
  return _.uniq(filter(inRange, set));
};
var genEvens = function() {
  return filter(function(v) {return v % 2 == 0}, _.range(1, maxNumber + 1));
};
var genOdds = function() {
  return filter(function(v) {return (v + 1) % 2 == 0}, _.range(1, maxNumber + 1));
};
var genMultiples = function(base) {
  var multiples = map(function(v) {return base * v}, _.range(1, maxNumber + 1));
  return filterByInRange(multiples);
};
var genPowers = function(base) {
  var powers = map(function(v) {return Math.pow(base, v)}, _.range(maxNumber));
  return filterByInRange(powers);
};
var inSet = function(val, set) { return _.includes(set, val); };
var makeRuleHypothesisSpace = function() {
  var multipleRules = map(function(base) {return 'multiples_of_' + base}, _.range(1, 12));
  var powerRules = map(function(base) {return 'powers_of_' + base}, _.range(1, 12));
  return multipleRules.concat(powerRules).concat(['evens', 'odds']);
};
var genSetFromInterval = function(a, b) { return _.range(a, b+1); };

var makeIntervalHypothesisSpace = function(start, end) {
  var allIntervals = _.flatten(map(function(s) {
    return map(function(e) { [s, e] }, genSetFromInterval(s+1, end));
  }, genSetFromInterval(start, end)));
  return map(function(x) { 'interval_' + x[0] + '_' + x[1] }, allIntervals);
};

var getSetFromHypothesis = function(rule) {
  var parts = rule.split('_');
  return (parts[0] == 'multiples' ? genMultiples(_.parseInt(parts[2])) :
          parts[0] == 'powers' ? genPowers(_.parseInt(parts[2])) :
          parts[0] == 'evens' ? genEvens() :
          parts[0] == 'odds' ? genOdds() :
          parts[0] == 'interval' ? genSetFromInterval(_.parseInt(parts[1]), _.parseInt(parts[2])) :
          console.error('unknown rule' + rule));
};

var learnConcept = function(examples, testQuery) {
  return Infer({method: 'enumerate'}, function() {
    var rules = makeRuleHypothesisSpace();
    var intervals = makeIntervalHypothesisSpace(1, maxNumber);
    var hypothesis = flip(0.5) ? uniformDraw(rules) : uniformDraw(intervals);
    var set = getSetFromHypothesis(hypothesis);
    mapData({data: examples}, function(example) {
      observe(Categorical({vs: set}), example);
    });
    return {hypothesis: hypothesis,
            testQueryResponse: inSet(testQuery, set)};
  });
};
var ANSWER = (learnConcept([3, 10], 12));
# probmods2-occams-razor/ex1.2
# Number-game concept learning. Hypothesis space = 50/50 mixture of rule
# concepts (multiples_of_N, powers_of_N for N in 1..11, evens, odds) and
# interval concepts (a..b, 1<=a<b<=20). Size-principle likelihood. Examples
# [3, 10]; test query 12. Posterior over (hypothesis, in-set(12)) via exact
# enumeration.

maxNumber = 20

def gen_evens():
    return sorted(set(v for v in range(1, maxNumber + 1) if v % 2 == 0))

def gen_odds():
    return sorted(set(v for v in range(1, maxNumber + 1) if (v + 1) % 2 == 0))

def gen_multiples(base):
    return sorted(set(v for v in (base * k for k in range(1, maxNumber + 1))
                      if 1 <= v <= maxNumber))

def gen_powers(base):
    return sorted(set(v for v in (base ** e for e in range(maxNumber))
                      if 1 <= v <= maxNumber))

def get_set(rule):
    parts = rule.split("_")
    if parts[0] == "multiples":
        return gen_multiples(int(parts[2]))
    if parts[0] == "powers":
        return gen_powers(int(parts[2]))
    if rule == "evens":
        return gen_evens()
    if rule == "odds":
        return gen_odds()
    if parts[0] == "interval":
        return list(range(int(parts[1]), int(parts[2]) + 1))
    raise ValueError(rule)

rule_hyps = ([f"multiples_of_{b}" for b in range(1, 12)] +
             [f"powers_of_{b}" for b in range(1, 12)] + ["evens", "odds"])
interval_hyps = [f"interval_{a}_{b}"
                 for a in range(1, maxNumber + 1) for b in range(a + 1, maxNumber + 1)]
n_rules = len(rule_hyps)
n_intervals = len(interval_hyps)

# Flatten the 50/50 mixture into one categorical prior over all hypotheses:
# 0.5 mass split uniformly within the rule class, 0.5 within the interval class.
all_hyps = rule_hyps + interval_hyps
prior = torch.cat([
    torch.full((n_rules,), 0.5 / n_rules),
    torch.full((n_intervals,), 0.5 / n_intervals),
])

examples = [3, 10]
test_query = 12

# Size-principle log-likelihood of the examples per hypothesis:
# each example drawn uniformly from the concept extension.
def loglik(hyp):
    s = set(get_set(hyp))
    if any(ex not in s for ex in examples):
        return float("-inf")
    return len(examples) * (-math.log(len(s)))

logliks = torch.tensor([loglik(h) for h in all_hyps])

@pyro.infer.config_enumerate
def model():
    h = pyro.sample("hyp", dist.Categorical(prior))
    pyro.factor("lik", logliks[h])
    return h

marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(
    model, lambda: None
)
post = marg["hyp"].probs

agg = {}
for i, hyp in enumerate(all_hyps):
    pr = post[i].item()
    if pr <= 0.0:
        continue
    tqr = test_query in set(get_set(hyp))
    key = json.dumps({"hypothesis": hyp, "testQueryResponse": tqr}, sort_keys=True)
    agg[key] = agg.get(key, 0.0) + pr

ANSWER = agg
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0000 (tv) |
| solver re-derivation | accept | 2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000 |
Integers 1 through 20 are in scope (maxNumber = 20). The hypothesis space is a 50/50 mixture of rule-based concepts (multiples of N for N in 1..11, powers of N for N in 1..11 (exponents start at 0, so every powers concept includes 1), all evens, all odds) and interval concepts (all integers from a through b inclusive for every a < b in [1, 20]). Each concept hypothesis is equally likely within its class. The likelihood of a hypothesis is the size principle: each observed example is drawn uniformly from the concept's extension. Observed examples: [3, 6, 9].
A hypothesis is drawn from the mixed prior. Each observed example is generated by drawing uniformly from the concept the hypothesis covers. For a given test integer, the probability that integer belongs to the inferred concept is computed as the posterior expectation of membership.
The 20-element array of expected membership probabilities, one per integer from 1 to 20 in order, where each entry is the expected posterior probability that the integer belongs to the inferred concept given the examples [3, 6, 9].
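The expected-membership array can be reproduced without a probabilistic programming language, since the hypothesis space is small enough to enumerate directly. The following is a sketch under the ranges stated in the task (integers restricted to 1..20, exponents starting at 0); `in_range` and `expected_membership` are hypothetical helper names.

```python
MAX_N = 20


def in_range(values):
    """Keep only the integers that fall inside 1..MAX_N."""
    return {v for v in values if 1 <= v <= MAX_N}


rules = {}
for n in range(1, 12):
    rules[f"multiples_of_{n}"] = in_range(n * k for k in range(1, MAX_N + 1))
    rules[f"powers_of_{n}"] = in_range(n ** e for e in range(MAX_N))
rules["evens"] = in_range(range(2, MAX_N + 1, 2))
rules["odds"] = in_range(range(1, MAX_N + 1, 2))

intervals = {f"interval_{a}_{b}": set(range(a, b + 1))
             for a in range(1, MAX_N + 1) for b in range(a + 1, MAX_N + 1)}


def expected_membership(examples):
    """Posterior probability that each integer 1..MAX_N belongs to the concept."""
    weights = {}
    for space in (rules, intervals):
        for label, ext in space.items():
            if all(x in ext for x in examples):
                # 0.5 per class, uniform within class, size-principle likelihood
                weights[label] = (0.5 / len(space)) * len(ext) ** (-len(examples))
    z = sum(weights.values())
    hyps = {**rules, **intervals}
    return [sum(w for label, w in weights.items() if q in hyps[label]) / z
            for q in range(1, MAX_N + 1)]


membership = expected_membership([3, 6, 9])
print([round(p, 4) for p in membership])
```

Only `multiples_of_1`, `multiples_of_3`, and the intervals covering 3 through 9 survive the conditioning, which is why 12, 15, and 18 keep noticeably more mass than their neighbours.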
answer spec
{
"kind": "value",
"domain": "realvec"
}
var maxNumber = 20;
var filterByInRange = function(set) {
  // NOTE: deviates from the textbook starter code, whose ranges put 0 into every
  // multiples concept and dropped maxNumber from evens/odds — contradicting the
  // stated domain 1..maxNumber. Do not 'restore' to source. See _gate_triage.md.
  var inRange = function(v) {v <= maxNumber && v >= 1};
  return _.uniq(filter(inRange, set));
};
var genEvens = function() {
  return filter(function(v) {return v % 2 == 0}, _.range(1, maxNumber + 1));
};
var genOdds = function() {
  return filter(function(v) {return (v + 1) % 2 == 0}, _.range(1, maxNumber + 1));
};
var genMultiples = function(base) {
  var multiples = map(function(v) {return base * v}, _.range(1, maxNumber + 1));
  return filterByInRange(multiples);
};
var genPowers = function(base) {
  var powers = map(function(v) {return Math.pow(base, v)}, _.range(maxNumber));
  return filterByInRange(powers);
};
var inSet = function(val, set) { return _.includes(set, val); };
var makeRuleHypothesisSpace = function() {
  var multipleRules = map(function(base) {return 'multiples_of_' + base}, _.range(1, 12));
  var powerRules = map(function(base) {return 'powers_of_' + base}, _.range(1, 12));
  return multipleRules.concat(powerRules).concat(['evens', 'odds']);
};
var genSetFromInterval = function(a, b) { return _.range(a, b+1); };
var makeIntervalHypothesisSpace = function(start, end) {
  var allIntervals = _.flatten(map(function(s) {
    return map(function(e) { [s, e] }, genSetFromInterval(s+1, end));
  }, genSetFromInterval(start, end)));
  return map(function(x) { 'interval_' + x[0] + '_' + x[1] }, allIntervals);
};
var getSetFromHypothesis = function(rule) {
  var parts = rule.split('_');
  return (parts[0] == 'multiples' ? genMultiples(_.parseInt(parts[2])) :
          parts[0] == 'powers' ? genPowers(_.parseInt(parts[2])) :
          parts[0] == 'evens' ? genEvens() :
          parts[0] == 'odds' ? genOdds() :
          parts[0] == 'interval' ? genSetFromInterval(_.parseInt(parts[1]), _.parseInt(parts[2])) :
          console.error('unknown rule' + rule));
};
var learnConcept = function(examples, testQuery) {
  return Infer({method: 'enumerate'}, function() {
    var rules = makeRuleHypothesisSpace();
    var intervals = makeIntervalHypothesisSpace(1, maxNumber);
    var hypothesis = flip(0.5) ? uniformDraw(rules) : uniformDraw(intervals);
    var set = getSetFromHypothesis(hypothesis);
    mapData({data: examples}, function(example) {
      observe(Categorical({vs: set}), example);
    });
    return {hypothesis: hypothesis,
            testQueryResponse: inSet(testQuery, set)};
  });
};

var examples = [3, 6, 9];
var queries = genSetFromInterval(1, maxNumber);
var ANSWER = (map(function(query) {
  var post = learnConcept(examples, query);
  return expectation(marginalize(post, function(x) { x.testQueryResponse }));
}, queries));
# Number-game concept learning over an enumerable hypothesis space.
# Prior: 50/50 mixture of rule concepts (24) and interval concepts (190).
# Likelihood: size principle (each example uniform over the concept extension).
# We enumerate the hypothesis with Pyro's exact discrete inference
# (config_enumerate + TraceEnum_ELBO.compute_marginals) and read the expected
# membership probability for each integer 1..20 off the posterior over concepts.
#
# NOTE: the WebPPL GT deliberately deviates from the textbook ranges (it keeps
# integers in [1, maxNumber]); we reproduce the GT ranges exactly here and do
# NOT restore textbook behavior.

maxNumber = 20

def filterByInRange(values):
    seen = []
    for v in values:
        if 1 <= v <= maxNumber and v not in seen:
            seen.append(v)
    return seen

def genEvens():
    return [v for v in range(1, maxNumber + 1) if v % 2 == 0]

def genOdds():
    return [v for v in range(1, maxNumber + 1) if (v + 1) % 2 == 0]

def genMultiples(base):
    return filterByInRange([base * v for v in range(1, maxNumber + 1)])

def genPowers(base):
    # exponents start at 0 (range(maxNumber) = 0..19), so 1 is always included
    return filterByInRange([base ** v for v in range(maxNumber)])

def genSetFromInterval(a, b):
    return list(range(a, b + 1))

# Build the hypothesis space: each entry is (label, frozenset of its extension).
rule_specs = []
for base in range(1, 12):
    rule_specs.append(("multiples_of_" + str(base), genMultiples(base)))
for base in range(1, 12):
    rule_specs.append(("powers_of_" + str(base), genPowers(base)))
rule_specs.append(("evens", genEvens()))
rule_specs.append(("odds", genOdds()))

interval_specs = []
for s in range(1, maxNumber + 1):
    for e in range(s + 1, maxNumber + 1):
        interval_specs.append(("interval_" + str(s) + "_" + str(e), genSetFromInterval(s, e)))

n_rules = len(rule_specs)
n_intervals = len(interval_specs)

hypotheses = [(lab, set(ext)) for lab, ext in rule_specs] + \
             [(lab, set(ext)) for lab, ext in interval_specs]
n_hyp = len(hypotheses)

# Marginal prior over hypotheses: 0.5 split over rules, 0.5 split over intervals.
prior = torch.zeros(n_hyp)
for i in range(n_rules):
    prior[i] = 0.5 / n_rules
for j in range(n_intervals):
    prior[n_rules + j] = 0.5 / n_intervals

examples = [3, 6, 9]

# Size-principle log-likelihood of the examples for each hypothesis.
loglik = torch.full((n_hyp,), float("-inf"))
for i, (lab, ext) in enumerate(hypotheses):
    size = len(ext)
    if size == 0:
        continue
    if all(x in ext for x in examples):
        loglik[i] = -len(examples) * math.log(size)

@pyro.infer.config_enumerate
def model():
    h = pyro.sample("h", dist.Categorical(prior))
    pyro.factor("size_principle", loglik[h])
    return h

marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)["h"]
sup = marg.enumerate_support()
post_probs = marg.log_prob(sup).exp()
posterior = torch.zeros(n_hyp)
for s, p in zip(sup, post_probs):
    posterior[int(s.item())] = p

# Membership matrix: member[i][q] = 1 if integer (q+1) is in hypothesis i.
queries = genSetFromInterval(1, maxNumber)
ANSWER = []
for q in queries:
    expected = 0.0
    for i, (lab, ext) in enumerate(hypotheses):
        if q in ext:
            expected += float(posterior[i].item())
    ANSWER.append(expected)
[0.1094, 0.2326, 1.0000, 0.4010, 0.4010, 1.0000, 0.4010, 0.4010, 1.0000, 0.2990, 0.2284, 0.7763, 0.1392, 0.1101, 0.6862, 0.0690, 0.0542, 0.6410, 0.0319, 0.0234]
[0.1094, 0.2326, 1.0000, 0.4010, 0.4010, 1.0000, 0.4010, 0.4010, 1.0000, 0.2990, 0.2284, 0.7763, 0.1392, 0.1101, 0.6862, 0.0690, 0.0542, 0.6410, 0.0319, 0.0234]
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0000 (absdiff) |
| solver re-derivation | accept | 1/2 solvers · d=[0.000, —] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000 |
Observed data: one trial where C = true and E = false. Priors: causal relation present with probability 0.5; causal power cp drawn uniformly from [0, 1]; background rate b drawn uniformly from [0, 1]. MCMC: 10000 samples, lag 2.
Whether a causal relation exists, the causal power of C on E, and the background rate of E are all latent. When the relation is present, E is caused by C with probability cp or occurs due to background with probability b (noisy-OR). When the relation is absent, E occurs only due to background with probability b. Each trial's outcome is observed under this mechanism.
From the posterior over (relation present, cp, b) given the observed data: the marginal distribution over whether the causal relation is present, the posterior mean of cp, and the posterior mean of b.
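For a single trial the three requested quantities have closed forms, which gives a reference point for the MCMC estimates. With C = true and E = false, the likelihood is (1 − cp)(1 − b) when the relation is present and (1 − b) when it is absent, so P(relation) = 0.125 / (0.125 + 0.25) = 1/3, E[b] = 1/3, and E[cp] = (1/3)(1/3) + (2/3)(1/2) = 4/9. The sketch below checks this with a midpoint grid; `posterior_summary` and the grid size are illustrative assumptions, not part of the original programs.

```python
def posterior_summary(n=200):
    """Midpoint-rule approximation of the posterior for one (C=true, E=false) trial."""
    grid = [(i + 0.5) / n for i in range(n)]
    z = rel_mass = cp_sum = b_sum = 0.0
    for relation in (True, False):
        for cp in grid:
            for b in grid:
                # noisy-OR: P(E = true | C = true)
                p_e = 1.0 - (1.0 - b) * (1.0 - (cp if relation else 0.0))
                w = 0.5 * (1.0 - p_e)  # prior on relation times P(E = false)
                z += w
                rel_mass += w if relation else 0.0
                cp_sum += w * cp
                b_sum += w * b
    return {"relation": rel_mass / z, "meanCp": cp_sum / z, "meanB": b_sum / z}


summary = posterior_summary()
print(summary)
```

The uniform prior densities and the grid cell area are constant factors, so they cancel in every ratio and are left out of the weights.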
answer spec
{
"kind": "record",
"fields": {
"relation": {
"kind": "dist",
"domain": "bool"
},
"meanCp": {
"kind": "value",
"domain": "real",
"estimated": true
},
"meanB": {
"kind": "value",
"domain": "real",
"estimated": true
}
}
}
var observedData = [{C:true, E:false}];
var posterior = Infer({method: 'MCMC', samples: 10000, lag:2}, function() {
  var relation = flip();
  var cp = uniform(0, 1);
  var b = uniform(0, 1);

  mapData({data: observedData}, function(datum) {
    var E = (relation && datum.C && flip(cp)) || flip(b);
    condition(E == datum.E);
  });

  return {relation, cp, b};
});
var ANSWER = ({
  relation: marginalize(posterior, function(x) { return x.relation }),
  meanCp: expectation(marginalize(posterior, function(x) { return x.cp })),
  meanB: expectation(marginalize(posterior, function(x) { return x.b }))
});
# One trial: C=true, E=false. Latents: relation (bool), cp~U(0,1), b~U(0,1).
# Noisy-OR: E = (relation & flip(cp)) | flip(b); condition E == false (given C=true).
# Continuous latents cp,b are sampled by NUTS; the discrete relation and the inner
# noisy-OR flips are marginalized by enumeration. The relation marginal is then
# recovered with Pyro's infer_discrete over the NUTS posterior of (cp,b).
NEG_INF = torch.tensor(float('-inf'), dtype=torch.float64)
ZERO = torch.tensor(0.0, dtype=torch.float64)

@pyro.infer.config_enumerate
def model():
    cp = pyro.sample('cp', dist.Uniform(0.0, 1.0))
    b = pyro.sample('b', dist.Uniform(0.0, 1.0))
    relation = pyro.sample('relation', dist.Bernoulli(0.5))
    x = pyro.sample('x', dist.Bernoulli(cp))  # flip(cp)
    y = pyro.sample('y', dist.Bernoulli(b))  # flip(b)
    # C = true is fixed; E = (relation & x) | y
    E = (relation.bool() & x.bool()) | y.bool()
    # condition E == false
    pyro.factor('obs', torch.where(~E, ZERO, NEG_INF))

kernel = pyro.infer.NUTS(model)
mcmc = pyro.infer.MCMC(kernel, num_samples=1000, warmup_steps=600)
mcmc.run()
samples = mcmc.get_samples()
cp_s = samples['cp'].to(torch.float64)
b_s = samples['b'].to(torch.float64)
num = cp_s.shape[0]

meanCp = cp_s.mean().item()
meanB = b_s.mean().item()

# Relation marginal: condition the enumerated model on the posterior (cp,b)
# samples (placed in a plate) and let Pyro's infer_discrete draw relation.
def vec_model():
    with pyro.plate('particles', num, dim=-1):
        cp = pyro.sample('cp', dist.Uniform(0.0, 1.0))
        b = pyro.sample('b', dist.Uniform(0.0, 1.0))
        relation = pyro.sample('relation', dist.Bernoulli(0.5))
        x = pyro.sample('x', dist.Bernoulli(cp))
        y = pyro.sample('y', dist.Bernoulli(b))
        E = (relation.bool() & x.bool()) | y.bool()
        pyro.factor('obs', torch.where(~E, ZERO, NEG_INF))

cond = pyro.poutine.condition(vec_model, data={'cp': cp_s, 'b': b_s})
serving = pyro.infer.infer_discrete(pyro.infer.config_enumerate(cond), first_available_dim=-2)
tr = pyro.poutine.trace(serving).get_trace()
relation_draws = tr.nodes['relation']['value'].reshape(-1).to(torch.float64)
p_true = relation_draws.mean().item()

ANSWER = {
    'relation': {True: p_true, False: 1.0 - p_true},
    'meanCp': meanCp,
    'meanB': meanB,
}
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0331 (record) |
| solver re-derivation | accept | 2/2 solvers · d=[0.034, 0.034] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0212 ≤ tol 0.0662 · floors 0.0330/0.0331 |
Fifteen data configurations are defined by pairs (numEWithC, numEWithoutC) for a 16-trial dataset (8 trials with C=true, 8 with C=false): [[8,8],[6,6],[4,4],[2,2],[0,0],[8,6],[6,4],[4,2],[2,0],[8,4],[6,2],[4,0],[8,2],[6,0],[8,0]]. For each configuration, the dataset contains numEWithC trials of (C=true, E=true), (8 − numEWithC) trials of (C=true, E=false), numEWithoutC trials of (C=false, E=true), and (8 − numEWithoutC) trials of (C=false, E=false). Causal Power (CP) model: latents cp ~ Uniform(0,1) and b ~ Uniform(0,1); effect E follows a noisy-OR mechanism — E is true if (C=true and a Bernoulli(cp) event occurs) or a Bernoulli(b) event occurs — with the analytic marginal of E used for likelihood (inner enumeration). MCMC: burn-in 2000, 1000 samples, lag 2. Causal Support (CS) model: same structure, but additionally a latent relation ~ Bernoulli(0.5); when relation is false, C has no effect on E. The CS posterior quantity of interest is the product relation × cp.
The CP model infers causal power and background rate from the observed data under the noisy-OR mechanism. The CS model additionally infers whether any causal relationship exists. Both models use the same analytic marginalization of E for efficiency.
For each of the 15 data configurations in order: the posterior expected value of cp under the CP model (cpValues) and the posterior expected value of relation × cp under the CS model (csValues). Return these as two parallel arrays.
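Both quantities can be approximated deterministically from the noisy-OR marginal P(E = true | C) = 1 − (1 − b)(1 − C·cp), which is a useful cross-check on the MCMC estimates. The sketch below uses a midpoint grid over (cp, b); it is not the sampler used by either program, and `grid_posteriors` and its grid size are illustrative assumptions.

```python
def grid_posteriors(num_e_with_c, num_e_without_c, n=100):
    """Return (E[cp] under CP, E[relation * cp] under CS) on a midpoint grid."""
    grid = [(i + 0.5) / n for i in range(n)]
    k1, k0 = num_e_with_c, num_e_without_c
    z1 = 0.0   # evidence with the causal link active (also the CP model evidence)
    cp1 = 0.0  # cp-weighted evidence with the link active
    z0 = 0.0   # evidence with the link absent: E depends on b only
    for cp in grid:
        for b in grid:
            p_c = 1.0 - (1.0 - b) * (1.0 - cp)  # P(E | C = true), link active
            lik_without_c = b ** k0 * (1.0 - b) ** (8 - k0)
            lik1 = p_c ** k1 * (1.0 - p_c) ** (8 - k1) * lik_without_c
            lik0 = b ** k1 * (1.0 - b) ** (8 - k1) * lik_without_c
            z1 += lik1
            cp1 += cp * lik1
            z0 += lik0
    cp_mean = cp1 / z1
    # relation = 0 contributes nothing to the numerator; the 0.5 prior cancels
    cs_mean = cp1 / (z1 + z0)
    return cp_mean, cs_mean


data_params = [[8, 8], [6, 6], [4, 4], [2, 2], [0, 0], [8, 6],
               [6, 4], [4, 2], [2, 0], [8, 4], [6, 2], [4, 0],
               [8, 2], [6, 0], [8, 0]]
results = [grid_posteriors(a, b) for a, b in data_params]
for params, (cp_mean, cs_mean) in zip(data_params, results):
    print(params, round(cp_mean, 3), round(cs_mean, 3))
```

Because E[relation × cp] equals P(relation | data) times the CP-model mean of cp, every CS value is bounded above by the matching CP value.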
answer spec
{
"kind": "record",
"fields": {
"cpValues": {
"kind": "value",
"domain": "realvec",
"estimated": true
},
"csValues": {
"kind": "value",
"domain": "realvec",
"estimated": true
}
}
}
var generateData = function(numEWithC, numEWithoutC) {
  var eWithC = repeat(numEWithC, function() {return {C: true, E: true}});
  var noEWithC = repeat(8 - numEWithC, function() {return {C: true, E: false}});
  var eWithoutC = repeat(numEWithoutC, function() {return {C: false, E: true}});
  var noEWithoutC = repeat(8 - numEWithoutC, function() {return {C: false, E: false}});
  return _.flatten([eWithC, noEWithC, eWithoutC, noEWithoutC]);
};

var dataParams = [[8, 8], [6, 6], [4, 4], [2, 2], [0, 0], [8, 6],
                  [6, 4], [4, 2], [2, 0], [8, 4], [6, 2], [4, 0],
                  [8, 2], [6, 0], [8, 0]];

var data = map(function(x) { generateData(x[0], x[1]) }, dataParams);

var cpPost = function(observedData) {
  return Infer({method: 'MCMC', burn: 2000, samples: 1000, lag:2}, function() {
    var cp = uniform(0, 1);
    var b = uniform(0, 1);
    var noisyOrMarginal = function(C) {
      return Infer({method: 'enumerate'}, function() {
        return (C && flip(cp)) || flip(b);
      });
    };
    mapData({data: observedData}, function(datum) {
      observe(noisyOrMarginal(datum.C), datum.E);
    });
    return cp;
  });
};

var csPost = function(observedData) {
  return Infer({method: 'MCMC', burn: 2000, samples: 1000, lag:2}, function() {
    var relation = flip();
    var cp = uniform(0, 1);
    var b = uniform(0, 1);
    var noisyOrMarginal = function(C) {
      return Infer({method: 'enumerate'}, function() {
        return (relation && C && flip(cp)) || flip(b);
      });
    };
    mapData({data: observedData}, function(datum) {
      observe(noisyOrMarginal(datum.C), datum.E);
    });
    return relation * cp;
  });
};
var ANSWER = (({
  cpValues: map(function(d) { expectation(cpPost(d)) }, data),
  csValues: map(function(d) { expectation(csPost(d)) }, data)
}));
# Causal-power (CP) vs causal-support (CS) models over 15 data configurations.
# Each dataset's continuous posterior is drawn with NUTS (the family WebPPL draws
# with MCMC). The noisy-OR marginal P(E=1) = 1 - (1-b)*(1 - [relation*]C*cp) is the
# enumerate-marginalized inner Infer of the WebPPL model; it is observed as a
# Bernoulli. In the CS model the discrete `relation` is marginalized with a
# logsumexp mixture in pyro.factor so NUTS samples only the continuous cp, b; the
# queried E[relation*cp] is then recovered by drawing `relation` from its posterior
# with pyro.infer.infer_discrete, conditioning a config_enumerate model on each
# NUTS (cp, b) draw (plated over the draws) and observing the same data.
# NUTS is kept lean (400 samples / 200 warmup) so all 15 datasets x 2 models fit
# the seed budget.

NUM_SAMPLES = 400
WARMUP = 200

data_params = [[8, 8], [6, 6], [4, 4], [2, 2], [0, 0], [8, 6],
               [6, 4], [4, 2], [2, 0], [8, 4], [6, 2], [4, 0],
               [8, 2], [6, 0], [8, 0]]


def make_data(num_e_with_c, num_e_without_c):
    # 8 trials with C=1, 8 trials with C=0.
    c = torch.cat([torch.ones(8), torch.zeros(8)])
    e = torch.cat([
        torch.ones(num_e_with_c), torch.zeros(8 - num_e_with_c),
        torch.ones(num_e_without_c), torch.zeros(8 - num_e_without_c),
    ])
    return c, e


def cp_model(c, e):
    cp = pyro.sample("cp", dist.Uniform(0.0, 1.0))
    b = pyro.sample("b", dist.Uniform(0.0, 1.0))
    # noisy-OR marginal: P(E) = 1 - (1-b)*(1 - C*cp)
    p_e = (1.0 - (1.0 - b) * (1.0 - c * cp)).clamp(1e-9, 1 - 1e-9)
    with pyro.plate("data", c.shape[0]):
        pyro.sample("obs", dist.Bernoulli(p_e), obs=e)


def cs_cont_model(c, e):
    # relation marginalized out of the likelihood (mixture of its two settings),
    # so the continuous latents cp, b are what NUTS explores.
    cp = pyro.sample("cp", dist.Uniform(0.0, 1.0))
    b = pyro.sample("b", dist.Uniform(0.0, 1.0))
    p_e1 = (1.0 - (1.0 - b) * (1.0 - c * cp)).clamp(1e-9, 1 - 1e-9)  # relation = 1
    p_e0 = (1.0 - (1.0 - b)).clamp(1e-9, 1 - 1e-9)  # relation = 0 -> p_e = b
    ll1 = dist.Bernoulli(p_e1).log_prob(e).sum()
    ll0 = dist.Bernoulli(p_e0).log_prob(e).sum()
    # log p(data) marginalizing relation ~ Bernoulli(0.5)
    log_mix = torch.logsumexp(torch.stack([ll1 + math.log(0.5), ll0 + math.log(0.5)]), dim=0)
    pyro.factor("obs", log_mix)


def cp_expectation(c, e):
    kernel = pyro.infer.NUTS(cp_model, jit_compile=False)
    mcmc = pyro.infer.MCMC(kernel, num_samples=NUM_SAMPLES, warmup_steps=WARMUP, disable_progbar=True)
    mcmc.run(c, e)
    return mcmc.get_samples()["cp"].mean().item()


@pyro.infer.config_enumerate
def cs_discrete_model(c, e, n_draws):
    # cp, b are conditioned to the NUTS draws (poutine.condition below); relation
    # is the only free latent and is enumerated/sampled by infer_discrete.
    with pyro.plate("draws", n_draws, dim=-2):
        relation = pyro.sample("relation", dist.Bernoulli(0.5))  # binary causal link
        cp = pyro.sample("cp", dist.Uniform(0.0, 1.0))  # conditioned -> (n_draws,1)
        b = pyro.sample("b", dist.Uniform(0.0, 1.0))  # conditioned -> (n_draws,1)
        with pyro.plate("trials", c.shape[0], dim=-1):
            p_e = 1.0 - (1.0 - b) * (1.0 - relation * c * cp)
            p_e = p_e.clamp(1e-9, 1 - 1e-9)
            pyro.sample("obs", dist.Bernoulli(p_e), obs=e)


def cs_expectation(c, e):
    kernel = pyro.infer.NUTS(cs_cont_model, jit_compile=False)
    mcmc = pyro.infer.MCMC(kernel, num_samples=NUM_SAMPLES, warmup_steps=WARMUP, disable_progbar=True)
    mcmc.run(c, e)
    s = mcmc.get_samples()
    cp = s["cp"]
    b = s["b"]
    n_draws = cp.shape[0]
    # Recover relation's posterior with Pyro's discrete inference: condition the
    # enumeration model on each NUTS (cp, b) draw (plated over the draws) and let
    # infer_discrete sample relation from P(relation | cp, b, data).
    cond = pyro.poutine.condition(
        cs_discrete_model,
        data={"cp": cp.reshape(n_draws, 1), "b": b.reshape(n_draws, 1)},
    )
    inferred = pyro.infer.infer_discrete(cond, first_available_dim=-3)
    trace = pyro.poutine.trace(inferred).get_trace(c, e, n_draws)
    relation = trace.nodes["relation"]["value"].reshape(n_draws).to(cp.dtype)
    return (relation * cp).mean().item()


cp_values = []
cs_values = []
for a, b_ in data_params:
    c, e = make_data(a, b_)
    cp_values.append(cp_expectation(c, e))
    cs_values.append(cs_expectation(c, e))

ANSWER = {"cpValues": cp_values, "csValues": cs_values}
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0718 (record) |
| solver re-derivation | accept | 2/2 solvers · d=[0.047, 0.044] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0515 ≤ tol 0.1592 · floors 0.0796/0.0718 |
Three actions {a, b, c} and three food outcomes {bagel, cookie, doughnut}, each with prior probability 1/3. The vending machine transition: action a gives bagel with probability 0.8 and each of the others with probability 0.1; action b gives cookie with probability 0.8 and each of the others with probability 0.1; action c gives doughnut with probability 0.8 and each of the others with probability 0.1. Sally is deceptive with probability 0.5.
Sally has a goal food drawn from the prior. When not deceptive, she chooses an action with probability proportional to the probability that the action produces her goal food. When deceptive, she chooses an action with probability proportional to the probability that the action does NOT produce her goal food. The observer infers Sally's goal food from observing that she is deceptive and that she chose action b.
The posterior distribution over Sally's goal food, given that she is deceptive and chose action b.
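The inference described above is small enough to check by hand. The following is a minimal sketch in plain Python rather than WebPPL or Pyro; the names `VEND`, `action_policy`, and `goal_posterior` are illustrative assumptions, and exact fractions stand in for the enumeration.

```python
from fractions import Fraction as F

FOODS = ["bagel", "cookie", "doughnut"]
ACTIONS = ["a", "b", "c"]

# P(outcome | action): each action yields "its" food with 0.8, the others with 0.1.
VEND = {
    "a": [F(8, 10), F(1, 10), F(1, 10)],
    "b": [F(1, 10), F(8, 10), F(1, 10)],
    "c": [F(1, 10), F(1, 10), F(8, 10)],
}


def action_policy(goal, deceive):
    """P(action | goal, deceive) under a uniform action prior."""
    g = FOODS.index(goal)
    # Deceptive: weight by P(outcome != goal); otherwise by P(outcome == goal).
    w = {a: (1 - VEND[a][g]) if deceive else VEND[a][g] for a in ACTIONS}
    z = sum(w.values())
    return {a: w[a] / z for a in ACTIONS}


def goal_posterior(observed_action, deceive):
    """Posterior over the goal food under a uniform goal prior."""
    w = {g: action_policy(g, deceive)[observed_action] for g in FOODS}
    z = sum(w.values())
    return {g: w[g] / z for g in FOODS}


posterior = goal_posterior("b", deceive=True)
print({g: float(p) for g, p in posterior.items()})
```

Under these assumptions a deceptive Sally who picks b makes cookie the least likely goal (1/10), with bagel and doughnut sharing the rest equally (9/20 each).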
answer spec
{
"kind": "dist",
"domain": "finite",
"support": [
"bagel",
"cookie",
"doughnut"
]
}
var actionPrior = Categorical({vs: ['a', 'b', 'c'], ps: [1/3, 1/3, 1/3]});
var foodPrior = Categorical({vs: ['bagel', 'cookie', 'doughnut'], ps: [1/3, 1/3, 1/3]});

var vendingMachine = function(state, action) {
  return action == 'a' ? categorical({vs: ['bagel', 'cookie', 'doughnut'], ps: [.8, .1, .1]}) :
    action == 'b' ? categorical({vs: ['bagel', 'cookie', 'doughnut'], ps: [.1, .8, .1]}) :
    action == 'c' ? categorical({vs: ['bagel', 'cookie', 'doughnut'], ps: [.1, .1, .8]}) :
    'nothing';
};

var chooseAction = function(goal, transition, state, deceive) {
  return Infer({method: 'enumerate'}, function() {
    var action = sample(actionPrior);
    var outcome = transition(state, action);
    condition(deceive ? !goal(outcome) : goal(outcome));
    return action;
  });
};
var ANSWER = (Infer({method: 'enumerate'}, function() {
  var deceive = flip();
  var goalFood = sample(foodPrior);
  var goal = function(outcome) {return outcome == goalFood};
  var sallyActionDist = chooseAction(goal, vendingMachine, 'state', deceive);
  condition(deceive);
  condition(sample(sallyActionDist) == 'b');
  return goalFood;
}));
NEG_INF = torch.tensor(float("-inf"))
ZERO = torch.tensor(0.0)

actions = ["a", "b", "c"]
foods = ["bagel", "cookie", "doughnut"]

# Vending machine: P(outcome | action).
vm_table = torch.tensor([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]])


# Inner chooseAction inference (WebPPL Infer 'enumerate'): action ~ uniform,
# outcome ~ vendingMachine(action), condition on deceive?!goal:goal. Returns the
# log-prob marginal over actions, computed by Pyro enumeration.
def choose_action_logprobs(goal_idx, deceive):
    @pyro.infer.config_enumerate
    def m():
        action = pyro.sample("action", dist.Categorical(torch.ones(len(actions))))
        outcome = pyro.sample("outcome", dist.Categorical(vm_table[action]))
        is_goal = outcome == goal_idx
        cond = (~is_goal) if deceive else is_goal
        pyro.factor("ev", torch.where(cond, ZERO, NEG_INF))
        return action

    marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(m, lambda: None)
    return marg["action"].log_prob(torch.arange(len(actions)))


# condition(deceive) forces deceive=True; observe Sally chose action 'b'.
b_idx = actions.index("b")
lp_b = torch.stack(
    [choose_action_logprobs(g, True)[b_idx] for g in range(len(foods))]
)


# Observer infers goal food given deceive=True and action b.
@pyro.infer.config_enumerate
def model():
    deceive = pyro.sample("deceive", dist.Bernoulli(0.5)).long()
    goal = pyro.sample("goal", dist.Categorical(torch.ones(len(foods))))
    pyro.factor("deceive_ev", torch.where(deceive == 1, ZERO, NEG_INF))
    pyro.factor("action_ev", lp_b[goal])
    return goal


marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
d = marg["goal"]
ANSWER = {foods[i]: float(torch.exp(d.log_prob(torch.tensor(i)))) for i in range(len(foods))}
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0000 (tv) |
| solver re-derivation | accept | 2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000 |
There are three actions {a, b, c} and three foods {bagel, cookie, doughnut}, each with a uniform prior (probability 1/3 each). Each agent has a goal food (drawn uniformly) and a deceptive disposition (fair coin, probability 0.5). The vending machine maps actions to food outcomes as follows: action a yields bagel with probability 0.8, cookie with 0.1, doughnut with 0.1; action b yields bagel with probability 0.1, cookie with 0.8, doughnut with 0.1; action c yields bagel with probability 0.1, cookie with 0.1, doughnut with 0.8. A non-deceptive agent chooses each action with probability proportional to the probability that the action produces her goal food; a deceptive agent chooses each action with probability proportional to the probability that the action does NOT produce her goal food. Sally is observed choosing action b on two independent occasions.
Sally has a latent goal food and a latent deceptive/non-deceptive disposition. She selects actions according to a policy that conditions on whether the stochastic vending machine outcome matches (non-deceptive) or mismatches (deceptive) her goal, computed by enumerating all three actions. Both observations are independent draws from this same action distribution.
The posterior distribution over Sally's goal food, given the two observations.
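Because the deceptive flag is unobserved here, it has to be summed out, and the two observations multiply. A minimal sketch of that arithmetic in plain Python follows; `VEND`, `prob_b`, and `weights` are illustrative names, not part of the WebPPL or Pyro programs.

```python
from fractions import Fraction as F

FOODS = ["bagel", "cookie", "doughnut"]
# P(food | action); rows are actions a, b, c, columns follow FOODS.
VEND = [
    [F(8, 10), F(1, 10), F(1, 10)],
    [F(1, 10), F(8, 10), F(1, 10)],
    [F(1, 10), F(1, 10), F(8, 10)],
]
B = 1  # row index of action b


def prob_b(goal, deceive):
    """P(Sally picks b | goal index, deceive) under a uniform action prior."""
    w = [(1 - row[goal]) if deceive else row[goal] for row in VEND]
    return w[B] / sum(w)


# Unnormalized weight of each goal: sum over the hidden deceive flag (prior 1/2),
# with two independent observations of b contributing prob_b squared.
# The uniform goal prior is a common factor and cancels on normalization.
weights = [
    sum(F(1, 2) * prob_b(g, d) ** 2 for d in (False, True))
    for g in range(len(FOODS))
]
total = sum(weights)
posterior = {FOODS[g]: weights[g] / total for g in range(len(FOODS))}
print({k: float(v) for k, v in posterior.items()})
```

With these numbers the posterior comes out to 26/43 for cookie and 17/86 each for bagel and doughnut, so two b choices favor an honest Sally who wants a cookie.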
answer spec
{
"kind": "dist",
"domain": "finite",
"support": [
"bagel",
"cookie",
"doughnut"
]
}
var actionPrior = Categorical({vs: ['a', 'b', 'c'], ps: [1/3, 1/3, 1/3]});
var foodPrior = Categorical({vs: ['bagel', 'cookie', 'doughnut'], ps: [1/3, 1/3, 1/3]});

var vendingMachine = function(state, action) {
  return action == 'a' ? categorical({vs: ['bagel', 'cookie', 'doughnut'], ps: [.8, .1, .1]}) :
    action == 'b' ? categorical({vs: ['bagel', 'cookie', 'doughnut'], ps: [.1, .8, .1]}) :
    action == 'c' ? categorical({vs: ['bagel', 'cookie', 'doughnut'], ps: [.1, .1, .8]}) :
    'nothing';
};

var chooseAction = function(goal, transition, state, deceive) {
  return Infer({method: 'enumerate'}, function() {
    var action = sample(actionPrior);
    var outcome = transition(state, action);
    condition(deceive ? !goal(outcome) : goal(outcome));
    return action;
  });
};
var ANSWER = (Infer({method: 'enumerate'}, function() {
  var deceive = flip();
  var goalFood = sample(foodPrior);
  var goal = function(outcome) {return outcome == goalFood};
  var sallyActionDist = chooseAction(goal, vendingMachine, 'state', deceive);
  condition(sample(sallyActionDist) == 'b');
  condition(sample(sallyActionDist) == 'b');
  return goalFood;
}));
# Sally's-goal inference with a NESTED enumeration over the vending-machine
# outcome. The inner chooseAction marginal is a separate, completely-finished
# enumeration (action ~ uniform, outcome ~ vendingMachine(action), condition on
# deceive ? !goal(outcome) : goal(outcome)) read via compute_marginals over the
# `action` site. All inner marginals are fully computed and memoized BEFORE the
# outer enumeration runs, so no inference runs inside another's active
# enumeration. Inner/outer site names are disjoint (action_in/outcome_in vs
# deceive/goal_food).

FOODS = ["bagel", "cookie", "doughnut"]
ACTIONS = ["a", "b", "c"]

# vendingMachine(action) -> categorical over foods, indexed [action, food]
VEND_PROBS = torch.tensor([
    [0.8, 0.1, 0.1],  # action a
    [0.1, 0.8, 0.1],  # action b
    [0.1, 0.1, 0.8],  # action c
])
NEG_INF = torch.tensor(float("-inf"))
ZERO = torch.tensor(0.0)


def marginal_dict(model, site, support):
    marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(
        model, lambda: None
    )[site]
    sup = marg.enumerate_support()
    probs = marg.log_prob(sup).exp()
    out = {}
    for i in range(sup.shape[0]):
        out[int(sup[i].item())] = float(probs[i].item())
    return out


# ---- Inner: chooseAction marginal over actions, per (goal_idx, deceive) ----
CA_cache = {}


def choose_action_probs(goal_idx, deceive):
    key = (goal_idx, deceive)
    if key in CA_cache:
        return CA_cache[key]

    @pyro.infer.config_enumerate
    def inner():
        action = pyro.sample("action_in", dist.Categorical(probs=torch.ones(3) / 3))
        outcome = pyro.sample("outcome_in", dist.Categorical(probs=VEND_PROBS[action]))
        achieves = outcome == goal_idx
        ok = (~achieves) if deceive else achieves
        pyro.factor("goal_cond", torch.where(ok, ZERO, NEG_INF))

    d = marginal_dict(inner, "action_in", list(range(3)))
    out = torch.zeros(3)
    for a, p in d.items():
        out[a] = p
    CA_cache[key] = out
    return out


# Pre-warm every inner marginal BEFORE the outer enumeration runs.
# action_dists[deceive, goal, action]
action_dists = torch.stack([
    torch.stack([choose_action_probs(g, bool(dv)) for g in range(3)])
    for dv in (0, 1)
])


@pyro.infer.config_enumerate
def model():
    deceive = pyro.sample("deceive", dist.Bernoulli(0.5)).long()
    goal_food = pyro.sample("goal_food", dist.Categorical(probs=torch.ones(3) / 3))
    # Probability Sally takes action 'b' (index 1) under her policy.
    p_b = action_dists[deceive, goal_food, 1].clamp(min=1e-12)
    logp = torch.log(p_b)
    # condition on sampling 'b' from her action distribution, twice.
    pyro.factor("obs_b_1", logp)
    pyro.factor("obs_b_2", logp)


d = marginal_dict(model, "goal_food", FOODS)
ANSWER = {FOODS[k]: v for k, v in d.items()}
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0000 (tv) |
| solver re-derivation | accept | 2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000 |
There are three doors {1, 2, 3}. Exactly one door hides a prize; the others are empty. Alice picks a door uniformly at random. Monty then picks a door uniformly at random from all three doors, independently of Alice's choice and the prize location. We observe that Monty's door turns out to be neither Alice's door nor the prize door. If Alice switches, she moves to the one remaining door that is neither her original door nor Monty's door.
Alice and the prize are each placed uniformly and independently among the three doors. Monty selects uniformly from all three doors. The joint world is conditioned on Monty's door being different from both Alice's choice and the prize door. Alice's final door is determined by her strategy (stay or switch).
Return a record with two fields: `stay` — the posterior distribution over whether Alice wins if she keeps her original door; `switch` — the posterior distribution over whether Alice wins if she switches to the remaining door.
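With a Monty who picks blindly, all 27 (Alice, prize, Monty) triples are equally likely before conditioning, so the posterior reduces to counting the surviving worlds. The sketch below does that count in plain Python; `win_probability` is an illustrative helper, not taken from the reference programs.

```python
from fractions import Fraction as F
from itertools import product

DOORS = (1, 2, 3)


def win_probability(switches):
    """P(Alice wins | Monty's door is neither Alice's door nor the prize door)."""
    win = total = F(0)
    # Alice, prize, and Monty are independent and uniform: each triple has weight 1/27.
    for alice, prize, monty in product(DOORS, repeat=3):
        if monty == alice or monty == prize:
            continue  # world rejected by the conditioning
        weight = F(1, 27)
        # Switching moves Alice to the one door that is neither hers nor Monty's.
        final = next(d for d in DOORS if d not in (alice, monty)) if switches else alice
        total += weight
        if final == prize:
            win += weight
    return win / total


answer = {"stay": win_probability(False), "switch": win_probability(True)}
print(answer)
```

Twelve worlds survive: six where Alice already holds the prize and six where she does not, so staying and switching each win with probability 1/2.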
answer spec
{
"kind": "record",
"fields": {
"stay": {
"kind": "dist",
"domain": "bool"
},
"switch": {
"kind": "dist",
"domain": "bool"
}
}
}
var removeBadItems = function(l, badItems) {
  return reduce(function(badItem, remainingL) {
    return remove(badItem, remainingL)
  }, l, badItems);
};

var doors = [1, 2, 3];

var montyRandom = function(aliceDoor, prizeDoor) {
  return Infer({method: 'enumerate'}, function() {
    return categorical({vs: doors});
  });
};

var model = function(switches) {
  var aliceDoor = categorical({vs: doors});
  var prizeDoor = categorical({vs: doors});
  var montyDoorDist = montyRandom(aliceDoor, prizeDoor);
  var montyDoor = sample(montyDoorDist);
  condition(montyDoor != prizeDoor);
  condition(montyDoor != aliceDoor);
  var aliceDoor = switches ? removeBadItems(doors, [aliceDoor, montyDoor])[0] : aliceDoor;
  return aliceDoor == prizeDoor;
};
var ANSWER = (({
  stay: Infer({method: 'enumerate'}, function() { return model(false); }),
  switch: Infer({method: 'enumerate'}, function() { return model(true); })
}));
# probmods2-social-cognition/ex2.1
# Three doors {0,1,2}. Alice and prize placed uniformly & independently; Monty
# picks uniformly from all three doors. Condition: Monty != prize and Monty !=
# Alice. stay -> win iff Alice == prize; switch -> Alice moves to the remaining
# door (3 - Alice - Monty), win iff that == prize. Exact enumeration.

ZERO = torch.tensor(0.0).double()
NEG_INF = torch.tensor(float("-inf")).double()

def make_model(switches):
    @pyro.infer.config_enumerate
    def model():
        alice = pyro.sample("alice", dist.Categorical(torch.ones(3) / 3.0))
        prize = pyro.sample("prize", dist.Categorical(torch.ones(3) / 3.0))
        monty = pyro.sample("monty", dist.Categorical(torch.ones(3) / 3.0))
        valid = (monty != prize) & (monty != alice)
        pyro.factor("cond", torch.where(valid, ZERO, NEG_INF))
        if switches:
            final = 3 - alice - monty  # remaining door (valid worlds: monty != alice)
        else:
            final = alice
        won = final == prize
        probs = torch.stack([(~won).double(), won.double()], dim=-1)
        pyro.sample("won", dist.Categorical(probs))
    return model

def win_dist(switches):
    model = make_model(switches)
    marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
    p = marg["won"].probs.detach()
    return {False: float(p[0].item()), True: float(p[1].item())}

ANSWER = {
    "stay": win_dist(False),
    "switch": win_dist(True),
}
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0000 (record) |
| solver re-derivation | accept | 2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000 |
There are three doors {1, 2, 3}. Exactly one door hides a prize; the others are empty. Alice picks a door uniformly at random. Monty deliberately picks a door uniformly at random from the doors that are neither Alice's door nor the prize door (so Monty always reveals an empty, non-Alice door). If Alice switches, she moves to the one remaining door that is neither her original door nor Monty's door.
Alice and the prize are each placed uniformly and independently among the three doors. Monty selects uniformly among doors that avoid both Alice's choice and the prize. Alice's final door is determined by her strategy (stay or switch).
Return a record with two fields: `stay` — the posterior distribution over whether Alice wins if she keeps her original door; `switch` — the posterior distribution over whether Alice wins if she switches to the remaining door.
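The key difference from a blind Monty is that his choice is a normalized distribution given (Alice, prize): two equally likely doors when Alice holds the prize, a single forced door otherwise. A minimal plain-Python sketch of that weighting follows; `monty_dist` and `win_probability` are illustrative names.

```python
from fractions import Fraction as F
from itertools import product

DOORS = (1, 2, 3)


def monty_dist(alice, prize):
    """Monty's normalized choice: uniform over doors that are neither Alice's nor the prize."""
    allowed = [d for d in DOORS if d != alice and d != prize]
    return {d: F(1, len(allowed)) for d in allowed}


def win_probability(switches):
    win = F(0)
    for alice, prize in product(DOORS, repeat=2):
        for monty, p_monty in monty_dist(alice, prize).items():
            # Monty always lands on a valid door, so the weights already sum to 1
            # and no further conditioning or renormalization is needed.
            weight = F(1, 9) * p_monty
            final = next(d for d in DOORS if d not in (alice, monty)) if switches else alice
            if final == prize:
                win += weight
    return win


answer = {"stay": win_probability(False), "switch": win_probability(True)}
print(answer)
```

This reproduces the classic result: staying wins with probability 1/3 and switching with probability 2/3.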
answer spec
{
"kind": "record",
"fields": {
"stay": {
"kind": "dist",
"domain": "bool"
},
"switch": {
"kind": "dist",
"domain": "bool"
}
}
}
var removeBadItems = function(l, badItems) {
  return reduce(function(badItem, remainingL) {
    return remove(badItem, remainingL)
  }, l, badItems);
};

var doors = [1, 2, 3];

var montyAvoidBoth = function(aliceDoor, prizeDoor) {
  return Infer({method: 'enumerate'}, function() {
    var montyDoor = categorical({vs: doors});
    condition(montyDoor != aliceDoor);
    condition(montyDoor != prizeDoor);
    return montyDoor;
  });
};

var model = function(switches) {
  var aliceDoor = categorical({vs: doors});
  var prizeDoor = categorical({vs: doors});
  var montyDoorDist = montyAvoidBoth(aliceDoor, prizeDoor);
  var montyDoor = sample(montyDoorDist);
  condition(montyDoor != prizeDoor);
  condition(montyDoor != aliceDoor);
  var aliceDoor = switches ? removeBadItems(doors, [aliceDoor, montyDoor])[0] : aliceDoor;
  return aliceDoor == prizeDoor;
};
var ANSWER = (({
  stay: Infer({method: 'enumerate'}, function() { return model(false); }),
  switch: Infer({method: 'enumerate'}, function() { return model(true); })
}));
# Monty Hall where Monty avoids BOTH Alice's door and the prize door.
# Exact discrete enumeration through Pyro (config_enumerate + compute_marginals).
#
# The crux faithfully translated from the webppl_gt: Monty's door is sampled from
# the NESTED, NORMALIZED distribution montyAvoidBoth(alice, prize) -- an inner Infer
# that renormalizes over the valid doors GIVEN (alice, prize). When alice == prize
# two doors are valid (each prob 1/2); when alice != prize a single door is valid
# (prob 1). Sampling Monty from a flat Categorical + a factor would give the wrong
# weighting (the bug in the prior attempt: it yielded 1/2 instead of 1/3 for the
# stay case). We build Monty as a Categorical whose per-door probabilities are the
# normalized validity mask for the enumerated (alice, prize) -- i.e. the finished
# inner distribution fed in as fixed scores -- so the outer conditions are already
# satisfied and no further factor is needed. The win indicator (stay / switch) is
# pinned as a sample site so compute_marginals returns its exact bool marginal.

from pyro.infer import config_enumerate, TraceEnum_ELBO

UNIFORM3 = torch.ones(3) / 3.0


def make_model(switches):
    @config_enumerate
    def model():
        alice = pyro.sample("alice", dist.Categorical(UNIFORM3))
        prize = pyro.sample("prize", dist.Categorical(UNIFORM3))

        # Normalized montyAvoidBoth(alice, prize): per-door validity mask,
        # renormalized over doors, with the 3-door axis placed LAST so it is the
        # Categorical event axis. Stacking the per-door validity tensors along the
        # last axis broadcasts correctly against whatever enumeration dims alice and
        # prize carry, without hard-coding their shapes.
        per_door = [((alice != d) & (prize != d)).double() for d in range(3)]
        valid = torch.stack(per_door, dim=-1)  # shape: (<enum dims>, 3)
        monty_probs = valid / valid.sum(dim=-1, keepdim=True)
        monty = pyro.sample("monty", dist.Categorical(monty_probs))

        if switches:
            # Alice switches to the remaining door (not hers, not Monty's): 0+1+2=3.
            new_alice = 3 - alice - monty
            win = (new_alice == prize)
        else:
            win = (alice == prize)

        win_int = win.long()
        win_probs = torch.nn.functional.one_hot(win_int, 2).double()
        pyro.sample("win", dist.Categorical(win_probs))

    return model


def win_dist(switches):
    model = make_model(switches)
    marg = TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
    w = marg["win"]
    sup = w.enumerate_support()
    probs = w.log_prob(sup).exp()
    out = {}
    for s, pr in zip(sup, probs):
        out[bool(int(s.item()))] = float(pr.item())
    return out


ANSWER = {"stay": win_dist(False), "switch": win_dist(True)}
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0000 (record) |
| solver re-derivation | accept | 2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000 |
There are three doors {1, 2, 3}. Exactly one door hides a prize; the others are empty. Alice picks a door uniformly at random. Monty picks a door uniformly at random from the doors that are not Alice's door (he may inadvertently reveal the prize). We observe that Monty's door turns out to be neither Alice's door nor the prize door. If Alice switches, she moves to the one remaining door that is neither her original door nor Monty's door.
Alice and the prize are each placed uniformly and independently among the three doors. Monty selects uniformly among doors that avoid only Alice's choice, without regard to the prize. The joint world is conditioned on Monty's door turning out to be different from both Alice's and the prize door. Alice's final door is determined by her strategy (stay or switch).
Return a record with two fields: `stay` — the posterior distribution over whether Alice wins if she keeps her original door; `switch` — the posterior distribution over whether Alice wins if she switches to the remaining door.
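Here Monty's choice depends only on Alice's door, and the observation that he did not reveal the prize discards some worlds, so the surviving weights must be renormalized. A minimal plain-Python sketch of that two-stage weighting is shown below; `win_probability` is an illustrative helper.

```python
from fractions import Fraction as F
from itertools import product

DOORS = (1, 2, 3)


def win_probability(switches):
    """P(Alice wins | Monty, uniform over non-Alice doors, did not reveal the prize)."""
    win = total = F(0)
    for alice, prize in product(DOORS, repeat=2):
        # Monty is uniform over the two doors that are not Alice's.
        for monty in (d for d in DOORS if d != alice):
            if monty == prize:
                continue  # the observation rules out worlds where Monty hits the prize
            weight = F(1, 9) * F(1, 2)
            final = next(d for d in DOORS if d not in (alice, monty)) if switches else alice
            total += weight
            if final == prize:
                win += weight
    return win / total  # renormalize over the surviving worlds


answer = {"stay": win_probability(False), "switch": win_probability(True)}
print(answer)
```

Worlds where Alice holds the prize keep both of Monty's options, while the others keep only one, so the two cases end up with equal total weight and both strategies win with probability 1/2.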
answer spec
{
"kind": "record",
"fields": {
"stay": {
"kind": "dist",
"domain": "bool"
},
"switch": {
"kind": "dist",
"domain": "bool"
}
}
}
var removeBadItems = function(l, badItems) {
  return reduce(function(badItem, remainingL) {
    return remove(badItem, remainingL)
  }, l, badItems);
};

var doors = [1, 2, 3];

var montyAvoidAlice = function(aliceDoor, prizeDoor) {
  return Infer({method: 'enumerate'}, function() {
    var montyDoor = categorical({vs: doors});
    condition(montyDoor != aliceDoor);
    return montyDoor;
  });
};

var model = function(switches) {
  var aliceDoor = categorical({vs: doors});
  var prizeDoor = categorical({vs: doors});
  var montyDoorDist = montyAvoidAlice(aliceDoor, prizeDoor);
  var montyDoor = sample(montyDoorDist);
  condition(montyDoor != prizeDoor);
  condition(montyDoor != aliceDoor);
  var aliceDoor = switches ? removeBadItems(doors, [aliceDoor, montyDoor])[0] : aliceDoor;
  return aliceDoor == prizeDoor;
};
var ANSWER = (({
  stay: Infer({method: 'enumerate'}, function() { return model(false); }),
  switch: Infer({method: 'enumerate'}, function() { return model(true); })
}));
# probmods2-social-cognition/ex2.4
# Monty Hall. Alice and prize uniform on {1,2,3}. Monty uniform over the 3
# doors, conditioned to avoid Alice's door (ignoring the prize). Condition on
# Monty's door != prize and != Alice. Report P(win) for stay and switch via
# exact enumeration. The win outcome is its own enumerated sample site so the
# marginal is produced by Pyro inference, not hand computation.

def make_model(switches):
    @pyro.infer.config_enumerate
    def model():
        a = pyro.sample("alice", dist.Categorical(torch.ones(3) / 3))
        pr = pyro.sample("prize", dist.Categorical(torch.ones(3) / 3))
        m = pyro.sample("monty", dist.Categorical(torch.ones(3) / 3))
        # monty != alice and monty != prize (hard conditioning)
        pyro.factor("monty_avoid_alice",
                    torch.where(m != a, torch.tensor(0.0), torch.tensor(float("-inf"))))
        pyro.factor("monty_not_prize",
                    torch.where(m != pr, torch.tensor(0.0), torch.tensor(float("-inf"))))
        if switches:
            # the single door that is neither alice's nor monty's (indices 0,1,2)
            final = 3 - a - m
            win = (final == pr).long()
        else:
            win = (a == pr).long()
        # record the win outcome as an enumerated sample site pinned to its value
        win_probs = torch.nn.functional.one_hot(win, num_classes=2).double()
        pyro.sample("win", dist.Categorical(win_probs))
        return win

    return model

def win_dist(switches):
    model = make_model(switches)
    marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(
        model, lambda: None
    )
    w = marg["win"].probs
    return {True: w[1].item(), False: w[0].item()}

ANSWER = {"stay": win_dist(False), "switch": win_dist(True)}
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0000 (record) |
| solver re-derivation | accept | 2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000 |
There are three doors {1, 2, 3}. Exactly one door hides a prize; the others are empty. Alice picks a door uniformly at random. Monty picks a door uniformly at random from the doors that are not the prize door (he may inadvertently pick Alice's door). We observe that Monty's door turns out to be neither Alice's door nor the prize door. If Alice switches, she moves to the one remaining door that is neither her original door nor Monty's door.
Alice and the prize are each placed uniformly and independently among the three doors. Monty selects uniformly among doors that avoid only the prize door, without regard to Alice's choice. The joint world is conditioned on Monty's door turning out to be different from both Alice's and the prize door. Alice's final door is determined by her strategy (stay or switch).
Return a record with two fields: `stay` — the posterior distribution over whether Alice wins if she keeps her original door; `switch` — the posterior distribution over whether Alice wins if she switches to the remaining door.
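This variant can also be worked as a direct application of Bayes' rule on the single event "Alice already holds the prize", since staying wins exactly in that case and, once Monty's door is known to be empty and not Alice's, switching wins exactly otherwise. The sketch below is one way to lay out that arithmetic in plain Python; `p_observation` and the variable names are illustrative assumptions.

```python
from fractions import Fraction as F

DOORS = (1, 2, 3)


def p_observation(alice, prize):
    """P(Monty's door is neither Alice's nor the prize | alice, prize),
    when Monty is uniform over the non-prize doors."""
    candidates = [d for d in DOORS if d != prize]
    ok = [d for d in candidates if d != alice]
    return F(len(ok), len(candidates))


prior_same = F(1, 3)             # P(Alice's door == prize door)
like_same = p_observation(1, 1)  # Alice on the prize: both of Monty's options qualify
like_diff = p_observation(1, 2)  # Alice off the prize: one of Monty's options is her door
# By symmetry the specific door labels (1, 1) and (1, 2) stand in for every such pair.
evidence = prior_same * like_same + (1 - prior_same) * like_diff
p_stay = prior_same * like_same / evidence
p_switch = 1 - p_stay
print({"stay": p_stay, "switch": p_switch})
```

The observation is twice as likely when Alice holds the prize, which exactly offsets the 1/3 versus 2/3 prior, giving 1/2 for both strategies.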
answer spec
{
"kind": "record",
"fields": {
"stay": {
"kind": "dist",
"domain": "bool"
},
"switch": {
"kind": "dist",
"domain": "bool"
}
}
}
var removeBadItems = function(l, badItems) {
  return reduce(function(badItem, remainingL) {
    return remove(badItem, remainingL)
  }, l, badItems);
};

var doors = [1, 2, 3];

var montyAvoidPrize = function(aliceDoor, prizeDoor) {
  return Infer({method: 'enumerate'}, function() {
    var montyDoor = categorical({vs: doors});
    condition(montyDoor != prizeDoor);
    return montyDoor;
  });
};

var model = function(switches) {
  var aliceDoor = categorical({vs: doors});
  var prizeDoor = categorical({vs: doors});
  var montyDoorDist = montyAvoidPrize(aliceDoor, prizeDoor);
  var montyDoor = sample(montyDoorDist);
  condition(montyDoor != prizeDoor);
  condition(montyDoor != aliceDoor);
  var aliceDoor = switches ? removeBadItems(doors, [aliceDoor, montyDoor])[0] : aliceDoor;
  return aliceDoor == prizeDoor;
};
var ANSWER = (({
  stay: Infer({method: 'enumerate'}, function() { return model(false); }),
  switch: Infer({method: 'enumerate'}, function() { return model(true); })
}));
import pyro.infer
from pyro.infer import config_enumerate, infer_discrete
from collections import defaultdict

# Monty Hall variant (Monty avoids only the PRIZE door when picking, then the
# extra conditions monty != prize and monty != alice are applied). Exact discrete
# enumeration over the three door latents with config_enumerate; the joint
# posterior is drawn with infer_discrete and each sampled triple is scored for
# stay/switch wins, aggregated into the two boolean distributions.

DOORS = [0, 1, 2]
NEG_INF = torch.tensor(float("-inf"))
ZERO = torch.tensor(0.0)


@config_enumerate
def model():
    alice = pyro.sample("alice", dist.Categorical(probs=torch.ones(3) / 3))
    prize = pyro.sample("prize", dist.Categorical(probs=torch.ones(3) / 3))
    monty = pyro.sample("monty", dist.Categorical(probs=torch.ones(3) / 3))
    # montyAvoidPrize selects monty with monty != prize; the outer model then
    # additionally conditions monty != prize and monty != alice. The net
    # constraint is monty != prize AND monty != alice.
    valid = (monty != prize) & (monty != alice)
    pyro.factor("monty_cond", torch.where(valid, ZERO, NEG_INF))


serving = infer_discrete(config_enumerate(model), first_available_dim=-1)

N = 4000
stay = defaultdict(float)
switch = defaultdict(float)
for _ in range(N):
    tr = pyro.poutine.trace(serving).get_trace()
    a = int(tr.nodes["alice"]["value"].item())
    p = int(tr.nodes["prize"]["value"].item())
    m = int(tr.nodes["monty"]["value"].item())
    stay[a == p] += 1.0
    other = [d for d in DOORS if d != a and d != m][0]
    switch[other == p] += 1.0

stay_total = sum(stay.values())
switch_total = sum(switch.values())
ANSWER = {
    "stay": {True: stay[True] / stay_total, False: stay[False] / stay_total},
    "switch": {True: switch[True] / switch_total, False: switch[False] / switch_total},
}
| check | status | evidence |
|---|---|---|
| GT self-consistency | ok | floor 0.0000 (record) |
| solver re-derivation | accept | 2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6 |
| cross-language (pyro vs webppl) | pass | d=0.0045 ≤ tol 0.0480 · floors 0.0240/0.0000 |